Machine Learning's Not Magic, but It Can Work Wonders
There’s a fairly common tendency to use the word “magic” when talking about Machine Learning (ML). In some ways this word is appropriate given the amazing feats being accomplished using ML techniques, but in other ways it’s an unfortunate choice of words as it suggests these things are impossible for us to understand. If we believe that, then we can also be led to believe that “anything’s possible” when it comes to ML, and we lose our ability to be skeptical about claims being made about it.
The “anything’s possible” idea can also lead to unwarranted fears about an imminent Singularity, where the exponential rate of technological growth brings about a so-called superintelligence and the subsequent subjugation of the entire human race.
This post is about celebrating some of the most important successes of Machine Learning in a way that hopefully gets across how unmagical but nonetheless remarkable and full of promise these ML techniques are. It is the second post in our series on ML (you can read the first one here) which aims to spread interest and enthusiasm, not hype, about the great potential of Machine Learning in solving real problems.
Google’s AlphaGo has beaten the world Go champion
This story, more than any other from the past year, is probably responsible for a lot of recent converts to the idea that the Singularity must surely be near. Here are some of the reasons why AlphaGo’s victory against Lee Sedol, world Go champion, was such a big deal:
- The number of legal configurations of a Go board is greater than the number of atoms in the universe
- The branching factor of Go, i.e. the number of legal moves from a given position, averages around 250 (chess has an average branching factor of 35), which makes any purely brute-force (search-based) approach infeasible
- In 2014, AI researchers working on the problem thought it would take at least another decade for an AI to be able to beat the best humans at Go
So how did AlphaGo do it? The full technical details are provided in a paper in Nature, but on AlphaGo’s website the approach is described as combining “Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play.” We talked a little bit about deep neural networks as a form of supervised learning in the last post. In this case, the networks were trained on board configurations from real historical games played by human experts, so they could learn the probability of a win for a given move from a given position. AlphaGo then honed its skills by playing against itself over and over, which means it played - and learned from - more games of Go than any human ever has. That was the secret to its success.
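To make the tree-search side of this a bit more concrete, here is a minimal sketch of the kind of selection rule AlphaGo’s search uses: each candidate move is scored by combining the average value seen in simulations with the policy network’s prior probability, discounted as the move accumulates visits. The move statistics and the constant `c_puct` below are entirely made up for illustration, not taken from the paper.

```python
import math

def puct_score(prior, value, visits, parent_visits, c_puct=1.0):
    """Score a candidate move by balancing the average value from
    simulations (exploitation) against the network's prior probability,
    scaled down as the move accumulates visits (exploration)."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return value + exploration

# Hypothetical statistics for three candidate moves at one tree node:
# (prior from the policy network, mean value from rollouts, visit count)
moves = {
    "A": (0.6, 0.48, 50),
    "B": (0.3, 0.55, 20),
    "C": (0.1, 0.40, 5),
}
parent_visits = sum(v for _, _, v in moves.values())
best = max(moves, key=lambda m: puct_score(*moves[m], parent_visits))
# Move B wins here: its higher simulated value outweighs its lower prior.
```

The interesting property is that a move with a strong prior gets explored early, but as its visit count grows, the observed simulation results increasingly dominate the score.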
As remarkable a feat as this is, it is important to bear in mind that AlphaGo cannot do anything besides play games of Go. It is another example of narrow or weak AI, focused on a very narrowly defined task.
Big Data is helping researchers understand cancer
Genomics, the study of the entire genomes of organisms, involves analyzing extremely long sequences of characters representing the building blocks of DNA. Because DNA consists of two complementary strands forming a double helix, genome sizes are given in pairs of characters (so-called base pairs), and the human genome is estimated to contain about 3.2 billion base pairs. Researchers investigating the genetic foundations of cancer now have access to two petabytes of cancer genome data on which they can train ML models, for example to identify drivers of tumor growth or to predict the most effective treatment for specific cancer types.
Both supervised and unsupervised learning techniques are used in cancer research. So far in this series we’ve only touched on supervised learning; unsupervised learning is about looking for patterns in data without having a specific outcome you’re trying to predict. One example is clustering: given a batch of data with various features, figure out meaningful ways of dividing it into clusters, based on those features. In cancer research this technique can be used to identify different types of cells exhibiting particular characteristics that prove to be significant in driving tumor growth.
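To illustrate clustering, here is a minimal k-means sketch in pure Python on toy one-dimensional data. The data and the two-cluster setup are invented for illustration only, not drawn from any real cancer dataset.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy data: imagine one measured feature per cell, with two obvious groups.
data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
centroids, clusters = kmeans(data, k=2)
# The two centroids converge to roughly 1.0 and 10.0.
```

Nothing here tells the algorithm what the groups mean; it only discovers that the data naturally splits in two. Interpreting the clusters (say, as cell types with distinct characteristics) is still the researcher’s job.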
Elsewhere in cancer research, Deep Learning is helping reduce the error rate in breast cancer diagnosis by a staggering 85%.
Bigger than Big Data: Astronomical Data
If you’ve ever wondered what the biggest data is, it is the data of space. The Square Kilometre Array telescope, due for completion in 2024, will produce an exabyte of raw data per day. That’s 10^18 bytes, or one million terabytes. Machine learning techniques such as dimensionality reduction are required right from the start in order to discard the useless data and retain a manageable amount of relevant data. But ML finds all kinds of uses in the study of space. NASA held a competition in 2011, via the ML competition site Kaggle, where the challenge was to “create a cosmological image analysis program to measure the small distortion in galaxy images caused by dark matter.” Within a week, the then state-of-the-art algorithm for this task had been beaten. The winners used an Artificial Neural Network to recognize patterns in the 100,000-image dataset.
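To give a feel for dimensionality reduction, here is a small sketch using principal component analysis (PCA) on synthetic data: five correlated “channels” that are really driven by two underlying signals, so projecting onto the top two principal components keeps nearly all the variance. The data is simulated for illustration; this shows the technique, not the SKA’s actual pipeline.

```python
import numpy as np

# Simulated readings: 200 samples of 5 channels that are really
# driven by only 2 underlying signals, plus a little noise.
rng = np.random.default_rng(42)
signal = rng.normal(size=(200, 2))            # 2 true underlying factors
mixing = rng.normal(size=(2, 5))              # how factors appear per channel
data = signal @ mixing + 0.01 * rng.normal(size=(200, 5))

# PCA by hand: centre the data, then project onto the top-k
# eigenvectors of the covariance matrix.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(data) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
top2 = eigvecs[:, -2:]                        # two largest components
reduced = centered @ top2                     # 200 x 2 instead of 200 x 5

explained = eigvals[-2:].sum() / eigvals.sum()
# 'explained' is close to 1.0: two dimensions capture almost everything.
```

The same idea at telescope scale means most of the raw stream can be discarded early, because a handful of components carry nearly all the information.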
The statistical approach to language translation beats all others
As mentioned briefly in the first post in this series, one of the main tasks the early AI researchers focused on was solving Machine Translation (MT): creating software that would take as input a sentence in one language and produce as output that sentence translated into another language. After decades of mediocre results from translation systems that attempted to use syntactic rules to translate from one language to another, the data-driven approach emerged and quickly left rules-based approaches in the dust.
Google researchers wrote about this in 2009 in an article entitled The Unreasonable Effectiveness of Data. They noted that far from being one of the simplest tasks in Natural Language Processing (NLP), MT is in fact one of the hardest, but the reason the data-driven approach was so effective was the availability of massive amounts of data to train on: text is translated every day by humans from one language to another. The general trend seems to be that once you throw enough data at any NLP problem, the data-driven approach will eventually win out over rules-based approaches. As we noted before, the availability of large amounts of data has been a driving factor in the huge uptake of ML in recent years, and this is as true of NLP as it is of any other problem space.
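A toy example can show the spirit of the data-driven approach: estimate how likely a source word is to translate to a target word purely from co-occurrence counts in a parallel corpus. This is a deliberate simplification (real statistical MT systems use word-alignment models and vastly more data), and the three phrase pairs below are invented for illustration.

```python
from collections import Counter

# Tiny hypothetical parallel corpus: aligned English/French phrases.
corpus = [
    ("the house", "la maison"),
    ("the cat", "le chat"),
    ("the small house", "la petite maison"),
]

# Count how often each (source word, target word) pair co-occurs in
# aligned phrases, then normalise into translation probabilities.
pair_counts = Counter()
src_counts = Counter()
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            pair_counts[(s, t)] += 1
        src_counts[s] += len(tgt.split())

def p_translate(s, t):
    """Relative-frequency estimate of P(target word | source word)."""
    return pair_counts[(s, t)] / src_counts[s]

# "house" co-occurs with "maison" more often than with "petite",
# so the model prefers "maison" - learned from data, with no rules at all.
```

With only three phrase pairs the estimates are crude, but the mechanism is the same one that improves steadily as you add more translated text, which is exactly the effect the Google researchers described.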
At Acquia, we’re excited about many of the possible applications of ML, from using unsupervised learning to understand user data, to predicting traffic spikes so we can optimize our Cloud service. And we’re particularly excited about applications in Natural Language Processing, because the data they’re concerned with is text, i.e. content. In the next post, we’ll look more closely at NLP and some of the techniques used to analyze text data.