Dr. Alok Aggarwal
CEO and Chief Data Scientist
Scry Analytics, California, USA
Office: +1 408 872 1078; Mobile: +1 914 980 4717
January 20, 2018
In memory of Alan Turing, Marvin Minsky and John McCarthy
Every decade seems to have its technological buzzwords: we had personal computers in the 1980s; the Internet and the World Wide Web in the 1990s; smartphones and social media in the 2000s; and Artificial Intelligence (AI) and Machine Learning in this decade. However, the field of AI is 67 years old, and this is the third in a series of five articles.
As mentioned in a previous article, the 1950-82 era saw a new field of Artificial Intelligence (AI) being born, a lot of pioneering research being done, and massive hype being created but eventually fizzling out. The 1983-2004 era saw research and development in AI gradually picking up and leading to a few key accomplishments (e.g., Deep Blue beating Kasparov in chess) and commercialized solutions (e.g., Cyberknife), but its pace really picked up between 2005 and 2010.
Since 2011, AI research and development has been witnessing hypergrowth, and researchers have created several AI solutions that are almost as good as – or better than – humans in several domains; these include playing games, healthcare, computer vision and object recognition, speech to text conversion, speaker recognition, and improved robots and chat-bots for solving specific problems. The table in the Appendix lists key AI solutions that are rivaling humans in various domains and six of these solutions are described below. After discussing these six AI solutions, we discuss key reasons for this hypergrowth including the effects of Moore’s law, parallel and distributed computing, open source software, availability of Big Data, growing collaboration between academia and industry, and the amount of research that is being done in AI and its subfields.
In 2006, IBM Watson Research Center embarked on creating IBM Watson, a system that would use machine learning, natural language processing and information retrieval techniques to beat humans at the quiz show Jeopardy!. IBM Watson had 90 servers, each of which used an eight-core processor with four threads per core (i.e., a total of 2,880 processor threads), and 16 terabytes of RAM. This processing power allowed IBM Watson to process 500 gigabytes, or about a million books, per second. Today, such a system would cost around 600,000 US Dollars.
IBM researchers realized early on that out of 3,500 randomly selected Jeopardy! questions, Wikipedia titles contained at least 95% of the answers. Hence, IBM Watson contained all of Wikipedia, and this “feature engineering” was one of the key insights that allowed it to win Jeopardy!. It also contained 200 million pages of other content including Wiktionary, Wikiquote, multiple editions of the Bible, encyclopedias, dictionaries, thesauri, newswire articles, and other literary works, and it used various other databases, taxonomies, and ontologies (e.g., DBPedia, WordNet, and Yago) to connect various documents.
IBM Watson had an ensemble of around 100 algorithms, many of which were supervised learning algorithms. Although researchers tried using deep neural networks, logistic regression and related techniques performed much better. This is not surprising since deep learning networks require massive amounts of data, whereas IBM Watson was trained on only around 25,000 questions, many of which were taken from old Jeopardy! shows. Former Jeopardy! contestants and others also trained IBM Watson, and it played around 100 “rehearsal” matches in which it was correct 71% of the time and won 65% of the matches.
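The hardware figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a sanity check: it assumes decimal gigabytes (10^9 bytes) and treats "a million books per second" as an even split of the 500 gigabytes, which the text does not state explicitly.

```python
# Figures taken from the paragraph above.
SERVERS = 90
CORES_PER_SERVER = 8
THREADS_PER_CORE = 4

# Total hardware threads across the cluster.
total_threads = SERVERS * CORES_PER_SERVER * THREADS_PER_CORE

# Assumption: decimal units, 1 GB = 10**9 bytes.
BYTES_PER_GB = 10**9
throughput_bytes_per_second = 500 * BYTES_PER_GB
BOOKS_PER_SECOND = 1_000_000

# Implied average size of one "book" if the throughput is split evenly.
bytes_per_book = throughput_bytes_per_second // BOOKS_PER_SECOND

print(f"processor threads: {total_threads}")
print(f"implied bytes per book: {bytes_per_book}")
```

Under these assumptions, the thread count matches the 2,880 quoted in the text, and each "book" works out to about half a megabyte of text.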
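To illustrate the general idea of combining many scorers with logistic regression, here is a toy sketch. It is not IBM Watson's pipeline: the three "scorers", the training rows, the learning rate and the function names `train_logistic` and `confidence` are all hypothetical, and the model is fitted with plain batch gradient descent in pure Python.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def confidence(weights, bias, x):
    """Estimated probability that a candidate answer is correct."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical data: each row holds the outputs of three made-up scorers
# for one candidate answer; label 1 means the candidate was correct.
train_rows = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.6, 0.9],
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3],
    [0.3, 0.1, 0.2],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_rows, train_labels)
strong = confidence(weights, bias, [0.85, 0.75, 0.80])
weak = confidence(weights, bias, [0.15, 0.25, 0.20])
print(f"strong candidate: {strong:.3f}, weak candidate: {weak:.3f}")
```

A model of this kind has only one weight per scorer plus a bias, which is one reason it can be fitted on tens of thousands of examples where a deep network would struggle.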
In 2016, researchers at Google’s DeepMind created AlphaGo, which defeated the reigning world champion, Lee Sedol, in the game of Go. AlphaGo evaluated positions and selected moves using deep neural networks, which were trained by supervised learning using human expert moves, and by reinforcement learning from self-play. In 2017, DeepMind researchers introduced AlphaGo Zero, which was based solely on reinforcement learning, without human data, guidance or domain knowledge, except for incorporating the rules of the game [94,105]. By playing 4.9 million games against itself, AlphaGo Zero improved and eventually won 100–0 against the previous champion, AlphaGo.
In the 1980s, researchers at Carnegie Mellon University built the first autonomous car prototype, but it had limited capabilities. In 2005, the U.S. Government (via DARPA) launched the “Urban Challenge” for autonomous cars to obey traffic rules and operate in an urban environment, and in 2009, researchers at Google built such a self-driving car. In 2015, Nevada, Florida, California, Virginia, Michigan and Washington, D.C. allowed the testing of autonomous cars on public roads, and in 2017, Waymo (a sister company of Google) announced that it had begun testing driverless cars without any person in the driver’s position (but still somewhere inside the car). Most autonomous car driving software is based on supervised learning and reinforcement learning techniques as well as computer vision and image processing.
In 2015, a research group led by Joel Dudley at Mount Sinai Hospital in New York created a three-layer unsupervised deep learning network called Deep Patient. Researchers provided Deep Patient with data on several hundred variables (e.g., medical history, test results, doctor visits, drugs prescribed) for about 700,000 patients. The system was unsupervised, and yet it was able to discover patterns in the hospital data that indicated who was likely to get liver cancer soon. A more interesting aspect was that it could largely anticipate the onset of psychiatric disorders like schizophrenia. Since schizophrenia is notoriously difficult to predict even for psychiatrists, Dudley sadly remarked, “We can build these models, but we don’t know how they work.”
Commercial chatbots started with Siri, which was developed at SRI’s Artificial Intelligence Center. Its speech recognition engine was later provided by Nuance Communications, and Siri was released as an app for Apple iPhones in February 2010. Other commercial chatbots that were developed during 2011-17 include Microsoft’s Cortana, Xbox, Skype’s Translator, Amazon’s Alexa, Google’s Now and Allo, Baidu and iFlyTek voice search, and Nuance speech-based products. The following three humanoid robots are particularly interesting and there are several others in production or being sold, e.g., Milo, Ekso GT, Deka, and Moley:
Most of these robots use sophisticated control engineering, computer vision, and deep learning networks (specifically Long Short-Term Memory), but by and large, chatbot dialog still falls far short of human dialog, and there are no accepted benchmarks for comparing them.
Robotic Process Automation (RPA) software is configured to execute the steps that a human user follows while doing a specific task; such configuration is achieved by demonstrating the steps rather than by coding them in a computer language. RPA’s aim is to provide a software robot, a virtual worker that can be rapidly “trained” or configured by a business user in an intuitive manner, much as an experienced colleague trains a new human user. Although the roots of this software go back to the 1980s, practitioners started using it in a big way only around 2010. By itself, RPA software suffers from several drawbacks.
To overcome most of these drawbacks, during the last two years, RPA has been combined with machine learning and natural language processing systems to build more holistic automation systems.
In 1965, Moore observed that the number of transistors in an electronic circuit doubles approximately every year, and he predicted that this rate of growth would continue for a decade. In 1975, he revised his prediction to doubling every two years.
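The difference between the two doubling periods is easy to underestimate. As a minimal sketch (the function name `transistor_growth` is ours, and the model is the idealized "doubles every fixed period" reading of the observation), the growth factor after a given number of years is 2 raised to the number of elapsed doubling periods:

```python
def transistor_growth(years: float, doubling_period_years: float) -> float:
    """Growth factor after `years` if the count doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# 1965 observation: doubling every year, sustained for a decade.
factor_yearly = transistor_growth(10, 1)

# 1975 revision: doubling every two years, over the same span.
factor_two_yearly = transistor_growth(10, 2)

print(f"doubling every year, 10 years:      x{factor_yearly:.0f}")
print(f"doubling every two years, 10 years: x{factor_two_yearly:.0f}")
```

Over one decade the yearly rate gives a factor of 1,024, while the two-year rate gives a factor of 32.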
It is important to note that Moore’s law is not really a law, but a set of observations made by Dr. Gordon Moore, a co-founder of Intel Corporation. In fact, in 2015, Moore himself said, “I see Moore’s law dying here in the next decade or so”, which is not surprising since the size of today’s transistors can be reduced by at most a factor of 4,900 before reaching the theoretical limit of one silicon atom, which also limits the size and speed of a perceptron. Nevertheless, this exponential increase in computing power, as well as the reduction in size and cost, has had the largest effect on the field of AI.
As mentioned in the previous article, most AI algorithms require enormous computing power, and by 2004, parallel and distributed computing had become practical. Since electronic communication, storage and computing have become inexpensive and pervasive, many companies (e.g., Amazon, Microsoft, IBM, Google) are now selling computation power by the hour or even by the minute, which in turn is helping researchers and practitioners exploit parallel and distributed computing and execute their algorithms on several thousand computers simultaneously (by using Hadoop, Spark and related frameworks).
Machine learning algorithms, especially deep learning algorithms, require enormous amounts of data. For example, a supervised, fully connected neural network with 50 input attributes (or variables), three hidden layers containing 50 perceptrons each, and one output perceptron has 7,550 connections (excluding bias terms), and this network may require a hundred thousand or more labeled data points for training since each connection’s weight needs to be optimized. Fortunately, inexpensive and easily available hardware and network connectivity have allowed humans to produce more than 8 trillion gigabytes (i.e., 8 zettabytes) of data by 2017. Many researchers and developers started using freely available data to create “open” databases for specific problems and started “crowdsourcing” the labeling of this data. MNIST, created in 1998, was the first such database, and ImageNet, created in 2011, has been the largest [72,73]. ImageNet contains more than 14 million URLs of images, of which more than 10 million have been hand-labeled to indicate what they represent.
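The connection count for the example network can be checked by multiplying the sizes of adjacent layers. The sketch below assumes a fully connected feed-forward layout with no bias terms (the text states neither); under that assumption, 50 inputs, three hidden layers of 50 and one output give 7,550 weights, whereas a count of 10,050 would correspond to four hidden layers of 50. It also checks the unit conversion for 8 zettabytes, assuming decimal (SI) prefixes. The function name `count_connections` is ours.

```python
def count_connections(layer_sizes):
    """Number of weights in a fully connected feed-forward network (no biases)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 50 inputs -> three hidden layers of 50 -> 1 output.
three_hidden = [50, 50, 50, 50, 1]
# Same network with a fourth hidden layer of 50, for comparison.
four_hidden = [50, 50, 50, 50, 50, 1]

connections_three = count_connections(three_hidden)
connections_four = count_connections(four_hidden)

# One bias per non-input perceptron, if biases were counted as well.
biases_three = sum(three_hidden[1:])

# Unit check with SI prefixes: zetta = 10**21, giga = 10**9.
ZETTA = 10**21
GIGA = 10**9
gigabytes_in_8_zettabytes = 8 * ZETTA // GIGA

print(f"three hidden layers: {connections_three} connections")
print(f"four hidden layers:  {connections_four} connections")
print(f"biases (three hidden layers): {biases_three}")
print(f"8 ZB in GB: {gigabytes_in_8_zettabytes:,}")
```

The ratio between the suggested training-set size and the weight count (roughly 100,000 examples for several thousand weights) is a rule of thumb rather than a fixed requirement.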
Open source software gives users the freedom to run, modify, and redistribute copies of it, with or without changes. Richard Stallman, formerly of the MIT Artificial Intelligence Laboratory, launched the Free Software Foundation in 1985. Torch, released in 2002, was the first such machine learning software, and since then many others (e.g., Caffe, Theano, Keras, MXNet, DeepLearning4J, TensorFlow) have been introduced [118,119]. This has allowed researchers and practitioners to experiment immensely with open source software and build new algorithms, which, if successful, are often made open source too.
According to our estimates, since 1950, more than 200,000 research articles have been written in AI and its subfields. Out of these, more than 125,000 have been published during 2008-2017 alone. Similarly, there has been a tremendous growth in industry-academia collaboration since 2008, which is leading to hyper-growth in building new AI solutions.
“History doesn’t repeat itself but it often rhymes,” is a quote attributed to Mark Twain, and it seems to be true in AI given the enormous excitement that occurred in the 1950s and again during the last seven years. In both cases, researchers got extremely enthusiastic with the hope of quickly creating AI machines that could mimic humans, and in both cases, this led to hyper-growth in AI research and development.
During the 1950s and 1960s, seminal research was done in AI and many of its subfields were born, whereas in the current phase, powerful and tedious engineering as well as inexpensive and abundant computing have led to more than 20 AI systems that are rivaling or beating humans. And just like the 1950s and 1960s, this has again created euphoria among researchers, developers, practitioners, investors and the public, which in turn has started a new hype cycle. We will discuss the characteristics of this hype cycle in the next article, “The Current Hype Cycle in Artificial Intelligence”.
|Playing Checkers||In 1952, Arthur Samuel (IBM) built programs to play checkers; Chinook was built by Jonathan Schaeffer and colleagues at the University of Alberta and it beat the reigning champion, Don Lafferty, in 1995.|
|Playing Backgammon||In 1992, Gerald Tesauro (IBM) built a reinforcement learning program, TD-Gammon, that played Backgammon almost at the grandmaster level.|
|Playing Othello||In 1997, Logistello, a program that plays Othello, was built by Michael Buro (Univ. of Paderborn) and it beat the world champion Takeshi Murakami in a 6:0 match.|
|Playing Chess||In 1997, IBM’s Deep Blue beat chess champion Garry Kasparov in a 3.5:2.5 match; chess programs on smartphones can now play at grandmaster level.|
|Playing Jeopardy!||In 2011, the IBM Watson system beat Brad Rutter and Ken Jennings in the quiz show Jeopardy! and won the first-place prize of $1 million.|
|Playing Atari games||In 2015, Google’s DeepMind built a reinforcement learning system to play 49 Atari games. This system achieved human-level performance in most games (e.g., Breakout) but not in others (e.g., Montezuma’s Revenge).|
|Playing Go||In 2016, Google’s AlphaGo beat the world champion, Lee Sedol, 4:1; in 2017, AlphaGo Zero played 4.9 million games against itself and then won 100–0 against AlphaGo.|
|Playing PACMAN||In 2017, Microsoft researchers created an AI system that reached PACMAN’s maximum point value of 999,900 on Atari 2600.|
|Playing Poker||In 2017, Lengpudashi or “cold poker master,” a version of Libratus created by researchers at Carnegie Mellon University, beat four top poker professionals in a 20-day, 120,000-hand Heads-Up No-Limit Texas Hold’em competition.|
|Autonomous vehicle driving||In 2009, researchers at Google built a self-driving car that obeyed traffic rules and operated in an urban environment. In 2017, Waymo (a sister company of Google) started testing driverless cars without any person in the driver’s position but still somewhere inside the car.|
|Predicting a biomolecular target||In 2012, Dahl and colleagues won the “Merck Molecular Activity Challenge” using deep neural networks to predict the biomolecular target of one drug.|
|Detecting toxic effects of chemicals||In 2014, Hochreiter’s group used deep learning to detect off-target and toxic effects of environmental chemicals in nutrients, household products and drugs and won the “Tox21 Data Challenge” of NIH, FDA and NCATS.|
|Predicting Schizophrenia||In 2015, Mount Sinai Hospital’s Deep Patient was able to discover patterns in the hospital data that indicated which people were likely to get liver cancer soon. Also, it could largely anticipate the onset of psychiatric disorders like schizophrenia, which is quite difficult even for psychiatrists to predict.|
|Detecting Skin Cancer||In 2017, Esteva and colleagues at Stanford University used a deep learning network that was trained on 129,450 clinical images arising from 2,032 different diseases and compared its performance against 21 dermatologists; their system classified skin cancer at the same level as dermatologists.|
|Face Recognition||In 2014, researchers at Facebook created DeepFace that identified human faces in digital images with an accuracy of 97.35%, thereby matching human visual recognition; it has a nine-layer neural net with over 120 million directed edges and was trained on four million images from Facebook.|
|Recognizing Handwritten Characters||In 2015, researchers at the Massachusetts Institute of Technology, New York University and the University of Toronto created computer vision software using “Bayesian Program Learning” that outperformed humans in identifying handwritten characters based on a single example. This is important since neural networks require several thousand labeled data points.|
|Large Scale Visual Recognition||In 2015, researchers at Microsoft created software to classify objects from a set of 1,000 categories and accurately detect all instances of objects in 200 categories; it had to distinguish between objects such as bicycles and cars, both of which may seem to have only two wheels (when seen from a specific direction), and it was at least as good as humans in doing so.|
|Answering Questions Related to Wikipedia||In 2016, Stanford University researchers created the Stanford Question Answering Dataset (SQuAD), where questions are posed by crowd-workers on a set of Wikipedia articles and answers are segments of text or come from a reading passage. Humans can achieve 91% accuracy in finding the correct answers in such datasets, whereas AI-based algorithms achieved 82% in 2016.|
|Speaker Recognition||In 2014, GoVivace deployed a speaker identification system that searches for an individual among millions of speakers by using a single example recording of the individual’s voice. In 2016, HSBC started offering 15 million customers its biometric banking software to access banking accounts using voice.|
|Speech to Text Conversion||In 2017, Microsoft (and separately IBM) built a deep learning (LSTM) network that has more than 95% recognition accuracy on the “Switchboard corpus” on a vocabulary of 165,000 words, thereby rivaling human performance.|
|Robots for stacking dishwashers||In 2016, Boston Dynamics (owned by Google) introduced a dog-shaped robot that can stack a dishwasher and fetch a soda can from the fridge; however, as some videos show, “you may have to fight to get that can of soda”.|
|Robots to make people happy||In 2016, Aldebaran Robotics (owned by SoftBank) introduced Pepper, a humanoid robot intended “to make people happy”, enhance people’s lives, facilitate relationships, have fun with people and connect them with the rest of the world. Pepper is currently being used as a receptionist at several offices in the UK and can identify visitors with the use of facial recognition, send alerts to meeting organizers and arrange for drinks to be made.|
|“Humanoid” Robot||In 2017, Honda provided a new version of ASIMO (Advanced Step in Innovative Mobility), which is the world’s most advanced bipedal humanoid robot. It has enhanced hand dexterity and can sign in both American and Japanese sign language, recognize human faces, climb stairs, hop, jump, balance on one foot, and transition seamlessly between walking and running.|