What makes Deep Learning deep?
In a recent post I defined AI as the ability to solve human problems through constructed means; machine learning is the subset where those means are constructed through models trained on data, and deep learning is the subset of ML where those models are, in some sense, big or deep. I would like to expand here on what it means for Deep Learning to be deep.
Taking a step back, one can think of most problems or processes in life as functions in the mathematical sense. In mathematics, the concept of a function is central: it is a structure that takes inputs and maps them to outputs. The position of an object over time is a function; ordering food is a sort of function, the order or request being an input (X) and the food itself being the output (Y). The problem-solving process is itself a sort of function, taking the problem statement and available information as inputs (X) and producing solutions as outputs (Y).
Insofar as AI is fundamentally about solving problems, then, it is useful to view its manifestations through a functional framework. Indeed, the realization behind modern AI algorithms is that this functional perspective is not just philosophical in nature but, if taken as a literal mathematical prescription, can be extremely powerful when supplied with sufficient data and computational resources to construct the function (f) optimally.
Simple machine learning models like linear (or even nonlinear) regression assume a straightforward form for the function (f), say a polynomial in the input (x). The crux of Deep Learning is that, in complex problem-solving endeavors, it is profitable to construct the function (f) as a composition, that is, as multiple layers of consecutive functions.
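To make the contrast concrete, here is a minimal sketch in Python (using NumPy; the function names and the particular three-layer split are illustrative choices for this post, not any specific library's API):

```python
import numpy as np

# A "shallow" model: one function with a fixed, simple form,
# e.g. a quadratic in the input x with tunable coefficients a, b, c.
def shallow_model(x, a, b, c):
    return a * x**2 + b * x + c

# A "deep" model of the same input: the mapping is built as a
# composition of layers, f(x) = f3(f2(f1(x))), each layer carrying
# its own tunable parameters.
def layer1(x, w1, b1):
    return np.tanh(w1 * x + b1)

def layer2(h, w2, b2):
    return np.tanh(w2 * h + b2)

def layer3(h, w3, b3):
    return w3 * h + b3

def deep_model(x, params):
    w1, b1, w2, b2, w3, b3 = params
    return layer3(layer2(layer1(x, w1, b1), w2, b2), w3, b3)

# Example: evaluate both on the same input with arbitrary parameter values.
print(shallow_model(2.0, 1.0, -1.0, 0.5))
print(deep_model(2.0, (0.5, 0.1, 1.2, -0.3, 2.0, 0.0)))
```

The shallow model commits to one simple functional form up front; the deep model builds its form by stacking simpler transformations, which is the compositional idea the rest of this post unpacks.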
In particular, when these consecutive layers involve matrix multiplications or, in their generalized form, tensorial operations, we refer to such a functional architecture as a deep neural network. (The popular programming library TensorFlow gets its name from the fact that its core operations act on tensors.) Historically, basic neural networks are made up of matrix multiplications acting on an input vector (x), followed by element-wise activation functions. At the end of the day, it's just composition of functions. These layered functions (f) are defined by tunable parameters that are fitted to match the relations in the data provided to the model for training/learning, in much the same way that linear regression starts with an unspecified slope and y-intercept that we fit to data to generate a trend line (line of best fit).
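As a rough sketch of what such a layered function looks like in code (again illustrative: the layer sizes, the ReLU activation, and the randomly initialized, untrained parameters are all assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tunable parameters: a weight matrix and bias vector per layer, playing
# the role that the slope and y-intercept play in linear regression.
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)   # layer 1: 4 -> 16
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)    # layer 2: 16 -> 3

def relu(z):
    # Element-wise activation function.
    return np.maximum(0.0, z)

def forward(x):
    # Each layer: a matrix multiplication, then an element-wise nonlinearity.
    h = relu(W1 @ x + b1)   # first transformation of the input vector x
    y = W2 @ h + b2         # final layer maps to the output
    return y

x = rng.normal(size=4)      # an example input vector
print(forward(x))           # output of the (untrained) network
```

Training would then consist of adjusting W1, b1, W2, b2 so that the outputs match the data, just as fitting a trend line adjusts its slope and intercept.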
In deep learning models, these tunable parameters are now spread across the various matrices, tensors, and intermediary transformations that define the functional architecture. The various forms that the original input (x) takes as it transmutes through the layers, massaged and modified by each one, are sometimes referred to as representations or embeddings. (In a rigorous mathematical sense, the term representation is usually reserved for cases where the transformation is invertible, implying that the information associated with the original input is fully recoverable and has not been lost by the function.) Basically, each layer of the deep neural network takes the input data and transforms it to bring it closer to the nugget we care about, that is, the solution to our original problem.
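Continuing in the same illustrative spirit, one can expose those intermediate representations simply by returning them from the forward pass (a self-contained sketch with made-up layer sizes and random, untrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

def forward_with_embeddings(x):
    # Each intermediate form the input takes is a "representation" or
    # "embedding" of the original data.
    h1 = np.maximum(0.0, W1 @ x + b1)   # embedding after layer 1
    y = W2 @ h1 + b2                    # final output: the answer we care about
    return y, h1

y, embedding = forward_with_embeddings(rng.normal(size=4))
print(embedding.shape)   # (16,) -- the dimensionality of this representation
```

The vector h1 here is nothing mysterious: it is just the value flowing between two layers of the composition, the input partway along its journey toward the answer.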