Phil Lessner, YAGEO CTO on Artificial Intelligence – Why Now?

This blog post was written by Philip Lessner, Chief Technology Officer at YAGEO Group, and originally posted on his LinkedIn profile as "AI – Why Now?".

The latest AI tools can answer your questions, generate an image from a text description, and even make a video for you. The progress in the last couple of years seems amazing. It’s almost magical what’s happening. But why now? The answer is the convergence of three elements: algorithms, data, and computational power.

Before we examine these in more detail to see how they contribute to the current “AI Revolution”, let’s first define what AI is (Figure 1).

AI is the simulation of intelligent behavior by machines, and the concept has been around since the middle of the last century.

Machine learning is one way to practically approach the goals of AI. In classic programming, the data is fed through a series of steps that are predetermined by the programmer. In machine learning, the data is used to train a model to recognize patterns in the data, and this model is then applied to new data to make predictions. One of the oldest uses of machine learning is spam detection: the model is trained on data labeled Spam or Not Spam, and when a new email is fed to the model, it classifies it as Spam or Not Spam. This prediction step is called inference.
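
To make the train-then-infer idea concrete, here is a minimal sketch of a spam classifier in Python, assuming the scikit-learn library and a tiny set of made-up example emails (both are illustrative choices, not part of the original post):

```python
# Minimal sketch of the train-then-infer workflow described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labeled training set: 1 = Spam, 0 = Not Spam (made-up examples)
emails = [
    "win a free prize now",
    "limited offer, claim your reward",
    "meeting agenda for Monday",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Training: turn the text into feature vectors and fit the model to the labels
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(emails)
model = MultinomialNB()
model.fit(X_train, labels)

# Inference: classify a new, unseen email
new_email = ["claim your free reward today"]
X_new = vectorizer.transform(new_email)
print("Spam" if model.predict(X_new)[0] == 1 else "Not Spam")
```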

There are many machine learning algorithms, but the most powerful ones that have been applied fall under the heading of ‘Deep Learning’. These are based on algorithms that loosely mimic the structure of the brain, so they are also called neural networks.
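
As a rough illustration (the sizes, random weights, and choice of NumPy are mine, not the author's), a "deep" network is just data flowing through stacked layers of simple artificial neurons, each layer computing weighted sums followed by a nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_outputs):
    """One layer of artificial neurons: weighted sum of the inputs plus a nonlinearity."""
    W = rng.normal(size=(n_outputs, x.shape[0]))  # weights: the parameters adjusted during training
    b = np.zeros(n_outputs)                       # bias terms
    return np.maximum(0.0, W @ x + b)             # ReLU activation (the nonlinearity)

x = rng.normal(size=4)   # a 4-feature input
h1 = layer(x, 8)         # first hidden layer
h2 = layer(h1, 8)        # second hidden layer ("deep" = several layers stacked)
y = layer(h2, 1)         # output neuron
print(y)
```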

Figure 1. The evolution of AI

The artificial neuron was first proposed in 1943 (Figure 2), but for a long time no one knew a practical algorithm for using data to determine the parameters of a neural network, a process called “training” the network. The breakthrough came in 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams popularized a technique called backpropagation that made training neural networks practical.
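
For readers who want to see what "training with backpropagation" looks like in practice, here is a minimal sketch using PyTorch, whose autograd engine implements backpropagation; the toy XOR dataset, network size, and optimizer settings are illustrative assumptions, not the author's:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dataset: XOR, a classic problem a single neuron cannot solve
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A small network of artificial neurons
model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()      # backpropagation: gradients of the loss w.r.t. every parameter
    optimizer.step()     # adjust the parameters using those gradients

print(model(X).detach().round())  # typically converges to [[0.], [1.], [1.], [0.]]
```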

In the late 1980s, two types of neural networks were developed that allowed efficient recognition of images and voice: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). However, it wasn’t until roughly 23 years later that these were widely applied in apps like Amazon Alexa, Apple Siri, and Google Lens, and in industrial applications like automated optical inspection (which is used in YAGEO factories). These types of networks also find use in emerging applications like precision agriculture and image recognition in autonomous vehicles.
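
To give a feel for what a convolutional layer computes, here is a minimal sketch of its core operation: sliding a small filter over an image and recording its response at each position. The hand-picked edge-detection filter below stands in for the filters a real CNN would learn from data (the image and filter are invented for the example):

```python
import numpy as np

# A simple 6x6 "image": dark on the left half, bright on the right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-picked 3x3 filter that responds to vertical edges
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

# Slide the filter over the image and record its response at each position
response = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        response[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(response)  # large-magnitude responses line up with the vertical edge
```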

In 2013, Tomas Mikolov and his colleagues at Google determined how to represent words and sentences as mathematical vectors and, more importantly, how to make using them practical in machine learning models. This did for natural language processing what CNNs and RNNs did for image recognition. Three years later, this work was applied to Google Translate, which dramatically improved the quality of machine translation.
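
As a toy illustration of "words as vectors" (the three-dimensional vectors below are invented for the example; real word-embedding models learn hundreds of dimensions from large text corpora), similar words end up with nearby vectors, and that closeness can be measured numerically:

```python
import numpy as np

# Hand-made toy word vectors (illustrative only)
vectors = {
    "king":  np.array([0.90, 0.10, 0.10]),
    "queen": np.array([0.85, 0.15, 0.20]),
    "apple": np.array([0.10, 0.90, 0.10]),
}

def cosine_similarity(a, b):
    """Similarity between two word vectors: values near 1.0 mean the words are closely related."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1: related words
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much lower: unrelated words
```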

In 2017, this was extended by another type of neural network called the Transformer, which led to so-called Large Language Models (LLMs), the most famous of which is ChatGPT, released five years later in 2022.
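
The central operation inside a Transformer is self-attention, in which each word's vector is updated using a weighted mix of the other word vectors in the sequence. Here is a minimal sketch of scaled dot-product attention in NumPy; the sequence length, vector size, and random values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4, 8                      # a "sentence" of 4 tokens, each an 8-dimensional vector

Q = rng.normal(size=(seq_len, d))      # queries
K = rng.normal(size=(seq_len, d))      # keys
V = rng.normal(size=(seq_len, d))      # values

scores = Q @ K.T / np.sqrt(d)          # how relevant each token is to every other token
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row becomes a set of attention weights
attended = weights @ V                 # each output vector is a weighted mix of the value vectors

print(attended.shape)                  # (4, 8): one updated vector per token
```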

Why the lag between the development of the algorithms and the commercial release of these products?

Figure 2. Algorithms and Applications

It turns out that you need a lot of data to train these algorithms. The neural network models have millions or billions of parameters, and the number of parameters has been growing by roughly 10x per year (see Figure 3). It takes millions of images and billions of words to properly train these models.

Figure 3. Number of Model Parameters

This was really enabled by the growth of the internet, when millions of images and billions of words of text became readily available. This is represented by ‘unstructured data’ in the graph in Figure 4.

Figure 4. The Growth of Data

In addition to the math (algorithms) and the data, you also need lots of computational power to train and deploy these models with billions of parameters. By the mid-2000s, computational power had advanced enough to begin running these deep learning models (Figure 5). The real advance that provided enough computational power was the development of the GPU, or graphics processing unit. GPUs can process many operations in parallel, and the operations they are optimized for in graphics (chiefly large matrix multiplications) have the same form as those needed for deep learning. Now there are also dedicated chips that are, in some cases, even faster than GPUs at processing this data.
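
To see why this parallelism matters, note that the forward pass through a neural-network layer is essentially one large matrix multiplication: millions of independent multiply-adds that a GPU or dedicated accelerator can run at the same time. A minimal sketch (sizes are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.normal(size=(256, 1024))     # 256 inputs with 1024 features each
weights = rng.normal(size=(1024, 4096))  # the parameters of one network layer

# One layer's forward pass is one big matrix multiply: millions of independent
# multiply-adds that a GPU (or a dedicated AI accelerator) can run in parallel.
activations = np.maximum(0.0, batch @ weights)
print(activations.shape)                 # (256, 4096)
```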

Figure 5. History of Compute

So here we are in 2024, able to access all these amazing applications because of the convergence of algorithms, data, and computational power over the last 80 years.

There are storm clouds on the horizon, however. Compute takes power (and energy), and the amount of power needed may soon outstrip our ability to supply it. In the next part of this series, we’ll take a deeper dive into this problem and then look at what’s being done to address it.
