What is AI? (a conceptual overview)


AI is a hot topic, to say the least. The last few years have seen a veritable explosion of AI, machine learning, and deep learning products and apps. Most recently, the mainstream adoption of ChatGPT has sent tremors across society, to the point that some tech experts caution that progress may be occurring too quickly. But what are these technologies in the first place? The purpose of this post is to provide a thoughtful and clear breakdown of what the terms really mean, from the perspective of a mathematically aware individual who also values conceptual understanding and accessible communication. I have been blessed in many ways throughout life, not least of which has been my education in theoretical and mathematical physics. I would like to make a few posts sharing perspectives that my background has afforded me, in case it is helpful to anyone. (It will be beneficial to me as well as I reflect on and refine my understanding of various topics!)

AI, or artificial intelligence, is the ability of a constructed entity to solve problems of human interest. It is a broad umbrella: I would consider a calculator to be a form of AI, and all useful computer programs are a form of AI insofar as they automate tasks for people. Within the superset of AI there is a subset called machine learning, which is responsible for the contemporary leaps we see around us today. More specifically still, there is a subset within machine learning called deep learning, which is responsible for the most powerful AI applications, e.g., computer vision apps that can identify objects in photos, or natural language processing apps that can generate text, answer questions, and even hold realistic human-like conversations. Making sense so far? The big picture is that AI is a broad superset of which machine learning is a subset; deep learning is in turn a subset of machine learning. These subsets are at the core of modern advancements.

There was a time in the early history of AI, in the 20th century, when programmers attempted to codify problem-solving abilities in rules-based systems. For example, imagine trying to write an instruction manual on how to identify different kinds of fruit. You might write something like "if the object is yellow, then it is a banana". If you think a bit further, you quickly realize that you will need to be increasingly specific and careful in your conditions; there are a great many objects in the world that are yellow and not bananas! Trying to literally spell out in a computer program the exact rules and conditions for solving problems turned out to be a dead end of complexity. In contrast, the strides of the 21st century have come through machine learning, which teaches computer programs through examples and data rather than exact expressions of theoretical logic. You could say machine learning is an empirical approach to reasoning, as opposed to an a priori one.
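
To make the brittleness concrete, here is a minimal sketch of what such a rules-based classifier might look like. The rules and example objects are entirely hypothetical; the point is only to show how quickly hand-written conditions break down.

```python
def classify_fruit(color, shape):
    """A toy rules-based classifier: every case must be spelled out by hand."""
    if color == "yellow" and shape == "long":
        return "banana"
    if color == "red" and shape == "round":
        return "apple"
    if color == "orange" and shape == "round":
        return "orange"
    return "unknown"

# Works for the cases we anticipated...
print(classify_fruit("yellow", "long"))   # banana
# ...but fails as soon as reality gets more varied:
print(classify_fruit("yellow", "round"))  # "unknown" -- a lemon? a grapefruit?
print(classify_fruit("green", "long"))    # "unknown" -- an unripe banana!
```

Every exception demands yet another hand-written rule, and the rules soon start to conflict with one another. That combinatorial pile-up is the "dead end of complexity" described above.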

To reiterate, machine learning amounts to teaching computer programs how to solve problems via examples and data. There are two ways in which this data can be provided: in a manner explicitly labeled by humans to direct the computer program on which relations to learn, or in an unlabeled manner whereby the computer program must discover relations in the data on its own. The former case, using labeled data, is referred to as supervised machine learning, whereas the latter is referred to as unsupervised machine learning. For example, teaching a computer program to spot tumors in MRI scans by using previously labeled MRI images would be supervised learning. On the other hand, if you were to provide unlabeled MRI scans to a computer program and allow it to cluster the images into groups based on patterns and relations it recognized on its own, that would be unsupervised learning.
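
As an illustrative sketch (not a real medical pipeline), here is how the two paradigms look side by side using scikit-learn. The feature vectors stand in for MRI scans and the labels for radiologists' annotations; both are made-up toy data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
scans = rng.normal(size=(100, 5))        # 100 toy "scans", 5 features each
labels = (scans[:, 0] > 0).astype(int)   # toy labels: 1 = "tumor", 0 = "healthy"

# Supervised learning: the model is told the correct answer for each example.
clf = LogisticRegression().fit(scans, labels)
print(clf.predict(scans[:3]))            # predicted labels for the first 3 scans

# Unsupervised learning: no labels; the model groups similar scans on its own.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(scans)
print(clusters[:3])                      # cluster assignments it discovered itself
```

Notice the only structural difference: the supervised model receives `labels` during fitting, while the unsupervised one receives nothing but the raw data.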

So where are we so far? AI is the superset; machine learning is the modern subset driven by teaching through examples and data. What then is deep learning? Deep learning is a subset of machine learning in which the underlying models used by the computer programs to learn are, to put it plainly, really big. When we say really big, we mean really, really big. For example, OpenAI's GPT-3 has 175 billion parameters (that's 1.75 × 10^11), and GPT-4, the model behind the latest version of ChatGPT, is widely believed to be even larger, though its exact size has not been disclosed. In contrast, the simplest machine learning model would be a linear regression with only 2 parameters: a slope and an intercept. A shrewd reader may point out that regression models can be generalized to include an arbitrary number of degrees of freedom; indeed, you can model even 2 variables with hundreds, thousands, or billions of polynomial terms if you like, i.e., a highly nonlinear regression model. Would such a model still be considered deep learning purely based on the number of tunable parameters? It's a fair point, and it requires us to expand on what it means for a deep learning model to be big or "deep". To go there, we need to understand machine learning and deep learning in terms of their underlying mathematical structure, that is to say, the functional architecture that underlies the notion of learning by examples and data.
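
To make the parameter-counting point concrete, here is a minimal sketch using NumPy. A degree-1 (linear) fit has exactly 2 parameters, while cranking up the polynomial degree adds parameters without making the model "deep" in any architectural sense; the data is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=x.shape)  # a noisy line

# Linear regression: degree 1 -> 2 parameters (slope and intercept).
linear_coeffs = np.polyfit(x, y, deg=1)
print(len(linear_coeffs))   # 2

# Polynomial regression: degree 10 -> 11 parameters, but still a "shallow"
# model: one function of x, not a deep stack of composed layers.
poly_coeffs = np.polyfit(x, y, deg=10)
print(len(poly_coeffs))     # 11
```

Both fits are single-layer curve fits; depth, as we'll see in the next post, is about composing functions, not just piling on coefficients.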

Let's take a break here. Getting into the underlying functional architecture of what makes deep learning deep will be a great next post!


References:

- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning (deeplearningbook.org). Check out the first chapter of this classic and free textbook for a definitive explanation of what AI really means and how its modern manifestations fit together.

