The technology press is abuzz these days with stories about Machine Learning (ML) and Artificial Intelligence (AI) — every other week it seems we’re hearing about a new AI surpassing human ability at some task or other, and just as often we hear about exciting new start-ups revolutionizing traditional problem spaces using machine learning. We also see the odd notable AI failure every now and then.
It can be hard to conceptualize what people are talking about when it comes to AI, and of course there’s also the question of the so-called Singularity (an artificial “superintelligence” arising and causing runaway technological growth): just how near is it?
This post is the first in a series on machine learning, and aims to bring some clarity to the subject, explaining how the concepts of machine learning and artificial intelligence relate to each other. It also describes at a high level how basic ML works today to solve problems.
We are starting to use ML techniques here at Acquia both internally and to enhance some of our products, and we are incredibly excited about the possibilities. With this series of posts we hope to spread the excitement, not the hype, about these technologies.
A Little History
In order to understand the way the terms “Artificial Intelligence” and “Machine Learning” are used today, it’s worth taking a quick look at the history of the respective fields. AI research, which started back in the 1950s, was originally about building machines that could “think” — it was about mimicking human intelligence, in all its flexible glory. So problems such as translating text from one language to another were tackled in ways that assumed that all kinds of intermediate steps were required, like imbuing the machine with an “understanding” of the rules of language. Researchers had some early successes (although notably not in the area of translation) that led to great optimism, which in turn led to great disappointment when things turned out to be much harder than originally thought.
One thread of AI research back in those early days broke away from the rules-based approach and is of particular significance in the story of what we now call Machine Learning; it centered around something called a perceptron. This was an algorithm that had been inspired by the way neurons work in the brain. It learns the weights to apply to input neurons in order to produce a correct binary response in the output neuron.
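To make the idea concrete, here is a minimal sketch of a perceptron in plain Python. It is an illustration only, not a reconstruction of any historical implementation: the function names, the learning rate, the number of epochs, and the tiny logical-AND dataset are all assumptions chosen for the example.

```python
def predict(weights, bias, inputs):
    """Fire (output 1) if the weighted sum of the inputs exceeds zero."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias using the classic perceptron update rule.

    `examples` is a list of (inputs, target) pairs with target 0 or 1.
    The epoch count and learning rate are illustrative defaults.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # error is -1, 0, or +1; weights only change on a mistake
            error = target - predict(weights, bias, inputs)
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: the logical AND of two binary inputs.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_examples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_examples]
print(predictions)
```

AND is a deliberately easy target because a single straight line separates its two classes. A lone perceptron cannot learn a problem where no such line exists, which is part of why the hidden layers described next matter.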
This idea was the seed for more complicated “Artificial Neural Networks,” where hidden layers of neurons between the input layer and the output enabled great flexibility in tackling different types of problems. The direct descendants of the lowly perceptron are right now enjoying the most celebrated successes in problems like image classification and speech recognition under the umbrella term of “Deep Learning.”
Developments along this particular fork in the path of AI research moved away not just from the methodologies traditionally employed (using rule-based learning), but also from the broader goals of those earlier systems. It was no longer about producing something that could think like a human, but was instead simply about solving practical problems.
The development of Deep Learning suffered its own specific setbacks that led to years during which the very mention of the term “neural network” was almost taboo. But breakthroughs involving the clever application of some tricks from calculus (the chain rule, in the technique known as backpropagation) eventually put it back on track as a theoretically sound approach to answering questions with data.
Specifically, this research focused on classification problems: learning to classify examples as belonging to one class or another. One very early and very successful application was training a neural network to recognize handwritten digits. Here, each possible digit 0-9 is a class, and the network needs to take an image of a handwritten digit as input (raw pixel data) and output the correct class (digit).
The network is shown thousands and thousands of handwritten digits and told what each one of them is, and it needs to learn the features that distinguish each one. Once it has been trained to do this, when it sees a brand new image of a handwritten digit, it makes a prediction about which class it belongs to (which digit it is).
Meanwhile, over in the world of statistics, statisticians had for decades been using techniques such as Linear Regression for making predictions about real-valued quantities (i.e., estimating numerical values as opposed to categories) of a dependent variable from one or more independent variables. One example is estimating the sales of a product as a function of advertising spend.
Of course, this approach can be used to make predictions based on many more variables, for example predicting sales based on advertising spend per channel, per market segment, and at different times of day. This information can help marketers make day-to-day decisions that increase sales while reducing spend.
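The single-variable version of the sales example can be sketched in a few lines of Python using the standard least-squares formulas for a straight line. The figures below are made up purely for illustration (they lie exactly on a line, which real data never would), and the function and variable names are our own.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # since the 1/n factors cancel in the ratio below).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (predictor) and sales (outcome).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [5.0, 7.0, 9.0, 11.0, 13.0]

slope, intercept = fit_line(ad_spend, sales)

# Use the fitted line to estimate sales at a spend level we have not seen.
predicted_sales = intercept + slope * 6.0
print(slope, intercept, predicted_sales)
```

With more predictors (spend per channel, market segment, time of day) the same idea applies, but the fit is normally done with matrix methods from a numerical library rather than by hand.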
Both regression and classification are about taking labeled data — examples where you have the answer (“this image is of the digit 7” or “the sales figure for x advertising spend was y”) — and using it to train a model that can then be used to make predictions about new data. This general method is referred to as supervised learning. The dependent variable is often referred to as the outcome, and the independent variables are referred to as predictors or features.
A New Field Is Born
Up to now we’ve talked about the Deep Learning (DL) researchers working on classification problems and the statisticians working on regression problems, but the field of statistics had also developed methods for doing classification. In fact, the DL folks found themselves reinventing techniques from statistics and data mining as they worked to improve their neural network algorithms.
Gradually, the overlap in these fields became a field in its own right, that of Machine Learning.
The general idea is to define a loss function, which outputs a measure of how wrong your statistical model is about your data, and to use optimization algorithms to adjust the model so as to minimize that loss (a.k.a. cost or error). Many more algorithms have been invented for solving both classification and regression problems. Examples include “tree-based” methods, whereby the decision to classify an example (or estimate its real-valued output as lying within a particular range) depends on the answers to questions asked of the predictors at each branching point in the tree.
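As a rough sketch of the loss-minimization idea, the Python below defines a mean squared error loss for a straight-line model and uses plain gradient descent to reduce it. Mean squared error and gradient descent are just one common pairing of loss and optimizer, picked here for simplicity; the data, learning rate, and step count are assumptions for the example.

```python
def mse_loss(slope, intercept, xs, ys):
    """Mean squared error: how wrong the line is about the data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly nudge slope and intercept downhill on the MSE loss."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each parameter.
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data generated from y = 3 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 7.0, 9.0, 11.0, 13.0]

loss_before = mse_loss(0.0, 0.0, xs, ys)
slope, intercept = gradient_descent(xs, ys)
loss_after = mse_loss(slope, intercept, xs, ys)
print(loss_before, loss_after, slope, intercept)
```

The learning rate matters: too large and the updates overshoot and diverge, too small and training takes far more steps. Choosing it is usually a matter of experimentation.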
For much of the ’90s the various approaches were on a relatively even playing field: some were better for particular problem types but worse for others. Then some developments not directly related to this field of research led to the DL approach, i.e., using sophisticated neural networks, really taking off.
Deep Learning Takes the World by Storm
The algorithms involved in DL had certainly developed in sophistication over the years, but they suffered from certain drawbacks: they required a lot of data and a lot of computing resources to train to a level of accuracy competitive with that of other types of algorithms.