Machine Learning and Artificial Intelligence: A Primer
by Katherine Bailey
The technology press is abuzz these days with stories about Machine Learning (ML) and Artificial Intelligence (AI) — every other week it seems we’re hearing about a new AI surpassing human ability at some task or other, and just as often we hear about exciting new start-ups revolutionizing traditional problem spaces using machine learning. We also see the odd notable AI failure every now and then.
It can be hard to conceptualize what people are talking about when it comes to AI. And of course there’s also the question of the so-called Singularity (an artificial “superintelligence” arising and causing runaway technological growth): just how near is it?
This post is the first in a series on machine learning, and aims to bring some clarity to the subject, explaining how the concepts of machine learning and artificial intelligence relate to each other. It also describes at a high level how basic ML works today to solve problems.
We are starting to use ML techniques here at Acquia both internally and to enhance some of our products, and we are incredibly excited about the possibilities. With this series of posts we hope to spread the excitement, not the hype, about these technologies.
A Little History
In order to understand the way the terms “Artificial Intelligence” and “Machine Learning” are used today, it’s worth taking a quick look at the history of the respective fields. AI research, which started back in the 1950s, was originally about building machines that could “think” — it was about mimicking human intelligence, in all its flexible glory. So problems such as translating text from one language to another were tackled in ways that assumed that all kinds of intermediate steps were required, like imbuing the machine with an “understanding” of the rules of language. Researchers had some early successes (although notably not in the area of translation) that led to great optimism, which in turn led to great disappointment when things turned out to be much harder than originally thought.
One thread of AI research back in those early days broke away from the rules-based approach and is of particular significance in the story of what we now call Machine Learning; it centered around something called a perceptron. This was an algorithm that had been inspired by the way neurons work in the brain. It learns the weights to apply to input neurons in order to produce a correct binary response in the output neuron.
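To make the idea of "learning the weights" concrete, here is a minimal sketch of a perceptron in plain Python. It is an illustration rather than a historical reconstruction: the function names, the learning rate, the number of passes over the data, and the toy task (the logical AND of two inputs) are all assumptions chosen for brevity.

```python
# A minimal perceptron sketch. All names and settings here are
# illustrative assumptions, not taken from any particular library.

def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias from (inputs, target) pairs."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # The error is +1, 0, or -1; weights only move on a mistake.
            error = target - predict(weights, bias, inputs)
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


# Toy labeled data: the output should be 1 only when both inputs are 1.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_EXAMPLES]
print(predictions)
```

The weights start at zero and are nudged only when the perceptron gets an example wrong, which is all the "learning" there is. A single perceptron can only separate classes with a straight line, which is part of why the hidden layers of later networks mattered.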
This idea was the seed for more complicated “Artificial Neural Networks,” where hidden layers of neurons between the input layer and the output enabled great flexibility in tackling different types of problems. The direct descendants of the lowly perceptron are right now enjoying the most celebrated successes in problems like image classification and speech recognition under the umbrella term of “Deep Learning.”
Developments along this particular fork in the path of AI research moved away not just from the methodologies traditionally employed (hand-written rules), but also from the broader goals of those earlier systems. It was no longer about producing something that could think like a human, but was instead simply about solving practical problems.
The development of Deep Learning suffered its own specific setbacks that led to years during which the very mention of the term “neural network” was almost taboo. But breakthroughs involving the clever application of some tricks from calculus, such as the chain rule that underlies backpropagation, eventually put it back on track as a theoretically sound approach to answering questions with data.
Neural network research focused specifically on classification problems: learning to classify examples as belonging to one class or another. One very early and very successful application was training a neural network to recognize handwritten digits. Here, each possible digit 0-9 is a class, and the network needs to take an image of a handwritten digit as input (raw pixel data) and output the correct class (digit).
The way it works is that the network is shown thousands and thousands of handwritten digits and told what each one of them is, and it needs to learn the features that distinguish each one. Once it has been trained to do this, when it sees a brand new image of a handwritten digit, it makes a prediction about which class it belongs to (which digit it is).
Meanwhile, over in the world of statistics, statisticians had for decades been using techniques such as Linear Regression for making predictions about real-valued quantities (i.e., estimating numerical values as opposed to categories) of a dependent variable from one or more independent variables. For example, one might estimate the sales of a product as a function of advertising spend.
Of course, this approach can be used to make predictions based on many more variables. For example, one could predict sales based on advertising spend per channel, per market segment, and at different times of day. This information can help marketers make day-to-day decisions that increase sales while reducing spend.
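As a rough illustration of the single-variable case, the sketch below fits a straight line to a handful of (advertising spend, sales) pairs using ordinary least squares. The figures are made up for the example and the function name fit_line is hypothetical; a real analysis would use far more data and usually a statistics library.

```python
# Simple linear regression with one predictor, using the closed-form
# least-squares solution. The data and names are illustrative assumptions.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # since the normalizing factors cancel in the ratio).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: sales happen to equal 10 + 2 * spend exactly.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, sales)

# Predict sales for a spend level that was not in the data.
predicted_sales = intercept + slope * 6.0
print(slope, intercept, predicted_sales)
```

Here advertising spend is the predictor and sales is the outcome. With several predictors the same idea applies, but the model learns one coefficient per predictor instead of a single slope.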
Both regression and classification are about taking labeled data — examples where you have the answer (“this image is of the digit 7” or “the sales figure for x advertising spend was y”) — and using it to train a model that can then be used to make predictions about new data. This general method is referred to as supervised learning. The dependent variable is often referred to as the outcome, and the independent variables are referred to as predictors or features.
A new field is born
Up to now we’ve talked about the Deep Learning (DL) researchers working on classification problems and the statisticians working on regression problems, but the field of statistics had also developed methods for doing classification. In fact, the DL folks found themselves reinventing techniques from statistics and data mining as they worked to improve their neural network algorithms.
Gradually, the overlap in these fields became a field in its own right, that of Machine Learning.
The general idea is to define some loss function, which outputs a measure of how wrong your statistical model is about your data, and to use optimization algorithms to adjust the model so as to minimize that loss (a.k.a. cost or error). Many more algorithms have been invented for solving both classification and regression problems. Examples include “tree-based” methods, whereby the decision to classify an example (or estimate its real-valued output as lying within a particular range) depends on the answers to questions asked of the predictors at each branching point in the tree.
For much of the ’90s the various approaches were on a relatively even playing field: some were better for particular problem types but worse for others. Then some developments not directly related to this field of research caused the DL approach, i.e. using sophisticated neural networks, to really take off.
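The loss-plus-optimization recipe can be sketched in a few lines. The example below uses mean squared error as the loss and plain gradient descent as the optimizer to fit a line to toy data. Both are common choices but only one of many possible pairings, and the learning rate, step count, and data are assumptions picked so the example converges quickly.

```python
# Minimizing a loss function with gradient descent. The loss, the
# optimizer settings, and the data are illustrative assumptions.

def mean_squared_error(slope, intercept, xs, ys):
    """Loss: the average squared gap between predictions and true values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly nudge the model parameters in the direction that lowers the loss."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

initial_loss = mean_squared_error(0.0, 0.0, xs, ys)
slope, intercept = gradient_descent(xs, ys)
final_loss = mean_squared_error(slope, intercept, xs, ys)
print(initial_loss, final_loss, slope, intercept)
```

The model starts out badly wrong and each step reduces the loss a little, until the parameters settle close to the values that generated the data. Training a neural network follows the same loop, just with many more parameters and a more elaborate way of computing the gradients.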
Deep Learning Takes the World by Storm
The algorithms involved in DL had certainly developed in sophistication over the years, but they suffered from certain drawbacks: they required a lot of data and a lot of computing resources to train to a level of accuracy competitive with that of other types of algorithms.
Well, if the story of computing over the last couple of decades hasn’t been all about greater processing power (cloud computing, GPUs) then it has been about Big Data. That’s right: the very two things that DL needed more of in order to flourish are things that have seen spectacular growth in recent years. There are unprecedented amounts of data available to train classifiers with: just think of all the tweets being tweeted, and all the images being uploaded to the internet every day. And with cloud computing and various technologies enabling massive parallelization of algorithms, training on these massive datasets has become vastly faster.
Deep Learning has seen tremendous successes in the last decade, particularly in the areas of image classification and speech recognition. It’s what Facebook uses to identify faces in your photos, what Siri uses to understand what you are saying, what Google image search uses to show you images related to your search terms. And the wonderful thing is, these and other large machine learning companies are falling over each other to open source their Deep Learning libraries: there’s TensorFlow from Google, DSSTNE from Amazon, DeepText from Facebook. It is an incredibly exciting time for anyone looking to get started in the field, not to mention companies looking to leverage these powerful technologies to solve their business problems.
Where does this leave the broader field of AI?
Not everyone has abandoned AI’s original goal of mimicking human intelligence, but we use the term Strong AI to differentiate that endeavour from the (weak) AI that most researchers are working on these days. A good rule of thumb is, when you hear the phrase “Artificial Intelligence” and it has not been qualified by the term “strong” or “weak”, it is most likely weak AI. Nobody has solved strong AI and so it is only ever talked about theoretically.
Even in cases where we talk about “an AI,” meaning “an artificially intelligent entity,” such as Siri or Alexa, we are still talking about weak AI; it’s just that it has been packaged into something you can interact with in human-like ways. Speech recognition, where Siri and Alexa figure out the words you are saying based on sounds, is one ML task that then feeds into the next task of Natural Language Understanding (NLU), i.e. figuring out the intent of the utterance in order to know how to respond. The tasks are combined to give the illusion of an artificially intelligent entity. But make no mistake, Siri and Alexa are a long way from “waking up” as sentient beings ;)
The AI may be weak, but the force is strong
Just because the current state of AI is not producing robot overlords does not mean it isn’t doing amazing things. In the next post in this series, we’ll take a look at some of the ways AI is being put to use to achieve spectacular successes in a wide range of different fields, solving real-world problems.