IoT Doctor – Anomaly detection for Smart Meters

Our client, a utilities company (“Company”) responsible for building the digital infrastructure of a smart city, uses smart metering systems to enable hyperconnectivity & data intelligence for city operations. The data from these smart meters (essentially IoT devices) is used for monitoring consumption patterns as well as billing. These operations rely heavily on the quality of data transmitted by IoT devices. However, such data often comes embedded with several quality issues, such as:

  • Missing data or data coming in at irregular intervals or abrupt gradient changes
  • Out-of-range data, presence of synthetic data or unusual ‘noise’ patterns
  • Lack of correlation with other parameters
  • Nonconformance to sensor’s expected behavior (i.e., ‘fingerprint’)

In addition to the above, smart meters possess limited in-built capabilities to detect anomalies and provide alerts. The Company’s existing platform was unable to mitigate these challenges, and it needed a new approach. The Company joined forces with Scry and trained Concentio® to create an end-to-end system that not only uncovers issues with IoT data but also improves its quality using Scry’s proprietary suite of advanced analytics algorithms.

Our Solution

Our team of data scientists and subject matter experts (SMEs) first worked to understand the problem in depth. Next, Scry trained Concentio® on historical meter data to establish the expected behavior of these smart meters. Simultaneously, Concentio’s data preprocessing steps ensured that the data used in its advanced algorithms was of good quality. Once trained, the software enabled the following capabilities:

  • Augmenting missing data using algorithms such as interpolation, time series analysis, and resampling.
  • Raising smart alerts in real time based on a device’s deviation (anomaly) from its expected behavior (fingerprint).
  • Managing alert severity and confidence level as a function of the anomaly’s magnitude & duration.
  • Suppressing false device alerts by removing unwanted noise from the data.
  • Enabling feedback loop for retraining the model and continuous improvements.
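As an illustration of the gap-filling and fingerprint-deviation ideas above, here is a minimal sketch. It is not Concentio®’s actual algorithm: the linear interpolation, the rolling-window “fingerprint,” and the 3-sigma threshold are all assumptions chosen for the example.

```python
import pandas as pd

def fill_and_flag(readings, window=4, n_sigmas=3.0):
    """Interpolate missing meter readings, then flag readings that deviate
    from the device's recent 'fingerprint' (rolling stats of past values)."""
    filled = readings.interpolate(method="linear", limit_direction="both")
    prior = filled.shift(1)  # fingerprint is built from past readings only
    mean = prior.rolling(window, min_periods=2).mean()
    std = prior.rolling(window, min_periods=2).std()
    anomalies = (filled - mean).abs() > n_sigmas * std
    return filled, anomalies

# Hourly consumption with one missing reading and one spike
readings = pd.Series([5.0, 5.1, None, 5.2, 50.0, 5.1, 5.0])
filled, anomalies = fill_and_flag(readings)
# filled[2] is interpolated to 5.15; only the spike at index 4 is flagged
```

Building the fingerprint from past readings only (the `shift(1)`) keeps the spike itself from inflating the threshold it is tested against.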

This solution proved its value by improving the data quality and enabling automated anomaly detection, thus helping the client leverage smart meters data for improving city operations.

Business Benefits

  • 99% – Suppression of false and repetitive alerts
  • 90% – Accuracy in handling missing values
  • 88% – Improvement in alert detection time
  • 76% – Share of anomalies detected faster than by the device itself


Automated Value Extraction from Royalty Contracts

One of the top three software companies in the world also builds and sells a video game player. Third parties (including companies and individuals) build video games for this player and receive royalties from the software company; the corresponding royalty agreements signed by the two parties are called Title Licensing Agreements (TLAs). Since old TLAs expire and new TLAs come in every day, it is vital for this software company to extract 35-40 key-value pairs and key attributes from each incoming TLA, including but not limited to: (a) name and address of the company or individual producing the game, (b) name of the game itself, (c) the operating system the game runs on, (d) various obligations that the company has to fulfil, (e) terms and conditions regarding royalty fees and incentive fees, (f) terms regarding payments, e.g., due dates, discounts, and additional charges, (g) expiration term, (h) power of attorney or transfer of the game from the owner to someone else, (i) amendment clauses, and (j) outliers in fee calculations, payment schedules, auto-renewals, etc.
The erstwhile process for this software company was manual and hence very laborious and error prone. Also, it could be scaled up only by adding more analysts to their current team.

Our Solution

Our team of data scientists and subject matter experts (SMEs) first worked to understand the problem in depth. Next, Scry trained Collatio® – Contract Intelligence on approximately 250 old TLAs to extract the required 35-40 key value pairs and attributes. Once trained, this software enabled the automated extraction of these attributes for incoming TLAs:

  • Extract attributes (e.g., names of contracting parties, payment terms, address, royalty rate, tax).
  • Identify non-standard clauses and flag them as anomalies.
  • Identify non-standard terms and conditions within the clause (e.g., payment terms less than 30 days) and flag them as anomalies.
  • Build knowledge graphs to qualitatively search across contracts (e.g., show all contracts having payment terms of 30 days).
  • Classify contracts for set-up purposes – TLA, Amendments, Addendums, etc.
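The non-standard terms check above can be sketched as follows. This is a toy stand-in for Collatio®’s trained models: the “Net N” pattern and the 30-day standard are assumptions made for the example.

```python
import re

STANDARD_NET_DAYS = 30  # assumed policy threshold

def flag_payment_terms(clause):
    """Return (days, is_anomaly) for the first 'Net N' payment term found,
    flagging terms shorter than the assumed 30-day standard."""
    m = re.search(r"[Nn]et\s+(\d+)", clause)
    if not m:
        return None, False
    days = int(m.group(1))
    return days, days < STANDARD_NET_DAYS

days, anomaly = flag_payment_terms("Royalty payments are due Net 15 days from invoice.")
# days == 15, anomaly == True
```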

Business Benefits

  • 92.2% – Straight-through processing across all agreements
  • 97.4% – Accuracy for Title Licensing Agreements
  • 92.6% – Accuracy for non-Title Licensing Agreements (but related ones, e.g., Amendments)

Enhanced Due Diligence and KYC

Banks, insurance companies, procurement groups, government agencies, and many other organizations are required to check the identities of other entities that they work with; such entities could be individuals, firms, or trusts and may be their clients, partners, or vendors. To do so, many firms need to annually extract data from several million documents that include identity cards (e.g., passports, driver’s licenses), incorporation papers, trust documents, and tax forms or certificates. Furthermore, after extracting the relevant data, these institutions need to check the identities of these entities against watch-lists to ensure that these entities are not bad actors or involved in nefarious activities. Because of the sheer volume of such incoming documents, manual extraction of data is extremely laborious and error prone, which can lead to massive risks and potential penalties for these firms.

Our Solution

Our team of consultants and subject matter experts (SMEs) first worked to uncover primary pain points in the domain through various interviews, discussions, and analysis. Then they trained Scry’s software, Collatio® – Enhanced Due Diligence and KYC (Know Your Customer), which contains more than 30 AI-based proprietary algorithms, to extract key-value pairs from various ID documents and validate them against 22 watch-lists (for due diligence). After training, it extracts data with more than 90% accuracy and has the following additional features:

  • Identification of similarity and “best match” pair from different data sources and documents.
  • Intelligent collation of external data and relationships with internal attributes for a more complete 360° view of an entity.
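A minimal sketch of the “best match” idea above, using the standard library’s `difflib` as a stand-in for the proprietary similarity algorithms. The watch-list names and the 0.8 threshold are invented for the example.

```python
from difflib import SequenceMatcher

# Invented example watch-list; real lists contain millions of entries
WATCH_LIST = ["Ivan Petrov", "Acme Shell Holdings Ltd", "John Q. Launderer"]

def best_match(name, threshold=0.8):
    """Return (best watch-list candidate, score) if similarity clears the
    threshold, else (None, score of the closest candidate)."""
    scored = [(c, SequenceMatcher(None, name.lower(), c.lower()).ratio())
              for c in WATCH_LIST]
    candidate, score = max(scored, key=lambda cs: cs[1])
    return (candidate, score) if score >= threshold else (None, score)
```

A misspelled name such as "Ivan Petrof" still resolves to "Ivan Petrov", while unrelated names fall below the threshold and return no match.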

Business Benefits

  • 90%+ – Accuracy in extracting the key-value pairs
  • 70% – Reduction in time over current process
  • 70% – Reduction in cost over current process

AI-Based Automated Spreading of Financial Statements

Currently, the financial statements – which include balance sheets, income statements, cash-flow statements, changes in equity, and rent rolls, and which mainly come in scanned or machine-readable PDF formats – require subject matter experts to review and reconcile them, extract relevant data, and feed it into credit rating, loan origination, and other financial processes. This manual extraction and reconciliation are expensive to scale and prone to human error, thereby creating potential risks for downstream processes.

Our Solution

Our team of data scientists and subject matter experts (SMEs) first worked to understand the problem in depth. Next, Scry trained Collatio® – Financial Spreading on approximately 10,000 financial spreading documents to convert this data into an electronic format, recreate appropriate tables, reverse engineer various formulas in these documents, and then map the appropriate key-value pairs from each financial spread onto a template that can be fed to the credit model, loan origination, or other processes. Once trained, this AI-based product first classifies the incoming document as one of the five document types mentioned above, and then extracts the data from these documents and reconciles it. The software uses more than 40 pre-trained proprietary algorithms, a pre-built financial ontology, and appropriate business rules. It can ingest financial spreading documents in PDF, image, Excel, and CSV input formats, and it provides a workflow-based user interface for analysts to review and verify all the output information.
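The reconciliation step can be sketched as follows: once key-value pairs are extracted from a spread, verify that the recreated totals obey basic balance-sheet identities before mapping them onto the downstream template. This is an illustration only; the field names and tolerance are assumptions, not the product’s actual schema.

```python
def reconcile_balance_sheet(fields, tol=0.01):
    """Return a list of identity violations found in the extracted figures."""
    issues = []
    # Reverse-engineered formula 1: total assets = sum of asset line items
    asset_sum = fields["cash"] + fields["receivables"] + fields["fixed_assets"]
    if abs(asset_sum - fields["total_assets"]) > tol:
        issues.append("total_assets != sum of asset line items")
    # Reverse-engineered formula 2: assets = liabilities + equity
    if abs(fields["total_assets"] - (fields["total_liabilities"] + fields["equity"])) > tol:
        issues.append("assets != liabilities + equity")
    return issues

sample = {"cash": 120.0, "receivables": 80.0, "fixed_assets": 300.0,
          "total_assets": 500.0, "total_liabilities": 350.0, "equity": 150.0}
issues = reconcile_balance_sheet(sample)
# issues == [] for this consistent example
```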

Business Benefits

  • 80% – Reduction in time and cost
  • 90%+ – Accuracy achieved
  • 100% – Customizable user interface for real time analysis

Intelligent Automation of Invoice Reconciliation

Executive Summary

Around 13 billion invoices were sent in the United States in 2019 from businesses to each other and to government agencies. To remit these invoices, firms primarily used manual labor to extract and reconcile relevant fields, which cost them around 2% of invoice value and 25-30 minutes of accountants’ time. Despite this cost, most firms were unable to analyze these invoices in detail, and some ended up overpaying their suppliers or continued paying for products and services they no longer used. This article discusses an AI-enabled software from Scry Analytics, Collatio® – Invoice Reconciliation, which extracts and reconciles relevant data with more than 93% accuracy from invoices, purchase orders and related agreements, and provides insights and decision support for reducing overpayments.


According to industry estimates, in 2019, around 13 billion invoices were sent in the United States from businesses to each other and to government agencies [1,2,3]. Since the corresponding sales were approximately ten trillion dollars [4,5], the value of an average invoice was around $770. To pay their suppliers, almost all accounts payable groups conform to the following process:

  1. Extract relevant information from invoices: Approximately 35% of invoices come in paper, fax, or scanned formats; 55% in machine-readable PDF formats; and 10% in EDI/XML or other e-invoice formats. Data from the first two formats is either typed manually into accounts payable systems or entered after converting these invoices into an electronic format via Optical Character Recognition (OCR) [6,7]. Such manual extraction of relevant information is error prone.
  2. Reconcile extracted information: Information extracted from invoices is manually reconciled with other invoices, with purchase orders (POs), Master Services Agreements (MSAs), or statements of work (SoWs). Such reconciliation is also error prone and time consuming; it takes another 12-15 minutes per invoice and costs an additional seven dollars. Moreover, processing errors in this step often cause delays that lead to losing early payment discounts, incurring late payment penalties, repeated supplier inquiries, and strained supplier relationships.
  3. Pay the vendor via check, ACH, wire-transfer, commercial cards, or other means.

Artificial Intelligence to the rescue

Collatio® – Invoice Reconciliation software from Scry Analytics uses proprietary AI-based algorithms and automates the invoice reconciliation process, thereby reducing the drudgery of manual reconciliation while ensuring accuracy in payments with minimal errors. It consists of four software modules, discussed below:

Automated digitization of each invoice; extraction and harmonization of relevant information

This AI-based software module executes the following steps that are depicted in the diagram given below:

  1. Upload: An analyst uploads invoices in batch mode or one by one, or invoices are ingested via an API (“Document Express”) or via email.
  2. Data Extraction: If an invoice is in scanned, fax, JPEG, or TIFF format, then this module uses its deep learning network (NN-OCR), specifically trained to convert it into digitized data. If the invoice is in other machine-readable formats that provide higher resolution, it uses other proprietary AI-based conversion modules to obtain better accuracy in the electronic output.
  3. Attributes Extraction: It uses proprietary AI-based and graph algorithms with various knowledge bases, such as an ontology and external data feeds, to extract more than 100 attributes or entities from invoices and related agreements with more than 93% accuracy. It also extracts attributes for each line item from the invoice item table. The in-built ontology and pre-trained algorithms not only help in extracting relevant information from proforma, interim, past-due, recurring, and final invoices, they also provide superior results for multinational invoices and special-legend invoices (e.g., law-firm invoices and tax invoices).
  4. Enhanced Due Diligence: Using the in-built ontology and graph-based algorithms, it then determines the connections among the extracted entities and reconciles them. For example, it “reverse engineers” to compute the tax rate and determines that the multiplication of unit price and the total number of units would equal the total item amount. Similarly, it uses external data enrichment to reconcile the information related to the vendor and its financial institution. By doing so, it detects incorrect and potentially fraudulent invoices with the wrong price, wrong quantity, missing tax amount, missing tax identification number, no purchase order or other contracts, incorrect net amount, and more.
  5. Attributes Verification and Reinforcement Learning: Finally, using in-built APIs and a user interface, it provides various extracted attributes and the item table to the analyst for review and modifications, thereby achieving 99%+ accuracy quickly, and it uses the changes made by the analyst for improving its accuracy going forward.
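The arithmetic part of the “reverse engineering” in step 4 can be sketched as follows. This is a simplified illustration, not the product’s actual algorithm; the field names and tolerance are assumptions.

```python
def check_invoice(items, net_amount, tax_amount, tol=0.01):
    """items: list of (unit_price, quantity, line_total) tuples.
    Return (list of discrepancies, implied tax rate) for one invoice."""
    problems = []
    # Check that unit price x quantity equals each line total
    for i, (unit_price, qty, line_total) in enumerate(items):
        if abs(unit_price * qty - line_total) > tol:
            problems.append(f"line {i}: {unit_price} x {qty} != {line_total}")
    # Check that the line items add up to the stated net amount
    computed_net = sum(t for _, _, t in items)
    if abs(computed_net - net_amount) > tol:
        problems.append("net amount does not match sum of line items")
    # 'Reverse engineer' the tax rate from the stated amounts
    implied_tax_rate = tax_amount / net_amount if net_amount else 0.0
    return problems, implied_tax_rate

problems, rate = check_invoice([(10.0, 3, 30.0), (5.0, 2, 11.0)],
                               net_amount=41.0, tax_amount=4.10)
# line 1 fails the price x quantity check; the implied tax rate is 0.10
```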

Time-series analytics of all invoice related information

For each vendor of a payor firm, this module creates a chronological sequence of all invoices that were provided to this firm. Next, using an AI-based algorithm, it determines duplicate invoices as well as invoices that may not be exact duplicates but contain duplicates at the invoice-item level; e.g., an unpaid invoice from the previous month may be included as an item in the new invoice (rather than being mentioned as an unpaid amount from the past). Also, for each invoice item, it computes the cost of that item and alerts the analyst if it is higher than the threshold set by the firm. Indeed, the firm can set additional thresholds that provide insights as to which invoice items constitute hefty expenses.
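The item-level duplicate check described above can be sketched as follows, assuming a deliberately naive matching key of (description, amount); the real module uses AI-based matching rather than exact keys.

```python
def find_item_duplicates(prior_invoices, new_invoice):
    """prior_invoices: list of invoices, each a list of (description, amount)
    items. Return items on new_invoice that repeat a prior item."""
    seen = {(desc.lower().strip(), round(amt, 2))
            for inv in prior_invoices for desc, amt in inv}
    return [(desc, amt) for desc, amt in new_invoice
            if (desc.lower().strip(), round(amt, 2)) in seen]

prior = [[("Consulting March", 2000.0), ("Travel", 350.0)]]
new = [("Consulting April", 2000.0), ("Consulting March", 2000.0)]
dups = find_item_duplicates(prior, new)
# flags ("Consulting March", 2000.0): a prior month's item resubmitted
```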

Automated digitization, extraction and harmonization of relevant data from POs, MSAs and SoWs

This module is similar to the first module described above; it extracts and harmonizes relevant data after digitizing appropriate POs, MSAs and SoWs. However, in addition to extracting structured data, it also extracts relevant clauses that are important for checking the accuracy of an invoice, and it stores harmonized data in a structured format and clauses in a textual format.

Reconciliation of extracted information from an invoice with other invoices and other agreements

This module first compares entities and values in various invoices with those in the corresponding POs and alerts the user if it finds any discrepancy. Similarly, for MSAs and SoWs, it uses proprietary Natural Language Processing algorithms to determine discrepancies between invoices and the clauses of MSAs or SoWs, and provides to analysts the amounts that do not obey the MSAs or SoWs and need to be adjusted.

Finally, since more than 80% of all invoices are received by small and medium-sized businesses (SMBs), who have limited funding for capital expenses [6], Collatio® – Invoice Reconciliation software is sold in SaaS (software as a service) mode but can also be installed behind the information technology firewall.


Resolving the COBOL Crisis with Artificial Intelligence

Executive Summary

COBOL is a 61-year-old computer language for processing data. Although highly inefficient by modern standards, millions of COBOL programs remain pervasive in government and industry and are responsible for transactions worth three trillion dollars. Recreating them in contemporary computer languages is extremely time consuming, laborious, and expensive. Also, there is an acute shortage of COBOL programmers since universities no longer teach this language. This article illustrates the use of Artificial Intelligence to decode legacy COBOL programs and reduce the dependence on COBOL programmers by 85% as well as the cost of conversion by 75%.


COBOL (“COmmon Business-Oriented Language”) was designed in 1959 by CODASYL to create an English-like, portable computer language for processing data [1]. In 1997, Gartner estimated that there were about 300 billion lines of computer code worldwide, with 80% (240 billion lines) of it in COBOL and 20% (60 billion lines) in all other computer languages combined [2,3]. Today, approximately 12 million COBOL programs with more than 200 billion total lines of code remain in use by organizations across the information technology, education, financial services, healthcare, and retail sectors, and these handle three trillion dollars in commerce – mainly for batch transaction processing.

With the growth of the Internet and Cloud Computing, many new companies – particularly in finance and retail sectors – now serve customers in real-time, instead of batch mode. For example, customers can place an online order through Amazon or Target in seconds, and merchants can receive their credit card monies through Square or Stripe almost instantaneously. This is well beyond the original functionality of COBOL programs, which typically run in batch mode one or two times a day, thereby leading to substantially longer delays in order fulfilment and payments than is acceptable by modern standards.

The inability of COBOL programs to scale up and quickly handle so many simultaneous requests has now become a critical problem. This urgency has become particularly pronounced during the COVID-19 pandemic, when outdated COBOL programs used by both federal and state governments have led to delays in disbursing funds and processing unemployment claims. Indeed:

  1. The US Internal Revenue Service scrambled to patch its COBOL-based Individual Master File in order to disburse around 150 million payments mandated by the Coronavirus Aid, Relief, and Economic Security (CARES) Act [4].
  2. With the ensuing unemployment surge in New Jersey, Governor Phil Murphy recently put out a call for volunteers who know how to code in COBOL, because many of New Jersey’s systems still run on old mainframes [5].
  3. Connecticut admitted that it too was struggling to process the large volume of unemployment claims with its 40-year-old COBOL mainframe system and is working to develop a new benefits system with the states of Maine, Rhode Island, Mississippi, and Oklahoma [7].

Impediments to replacing COBOL programs

The above issues highlight the need to replace COBOL programs with newer ones written in modern languages. However, understanding these COBOL programs is a huge impediment because of the following reasons:

  1. Spaghetti code: Unlike contemporary language programs, COBOL programs have intertwined pieces of code (“spaghetti code”), and since most COBOL programs have several thousand lines of code and deal with terabytes of data, updating them often produces inaccurate results or a complete breakdown.
  2. Verbosity: Although COBOL was meant to be easy for programmers to learn, use and maintain while still being readable to non-technical personnel such as managers, by 1984, many COBOL programs had become verbose and incomprehensible.
  3. Little documentation: Since COBOL was built to make the code self-documenting, little or no documentation was provided by the programmers. Hence, government agencies and businesses still rely on “folklore” and long-retired COBOL programmers.
  4. Lots of COBOL variants: Although meant to be extremely portable, around 300 dialects and 104,976 official variants were created by 2001, rendering maintenance extremely difficult [8].

Unfortunately, COBOL experts who can decipher these programs are in short supply. Estimates suggest only two million such programmers remain in the world, with about half retired [3]. These numbers continue to decrease, as colleges long ago stopped teaching the language in favor of better ones. The few graduating students who know COBOL do not want to work in it for fear of being labelled “blue-collar tech workers” [9]. Speaking to the Tampa Bay Times, one COBOL programmer aptly summarized his experience transitioning from COBOL to Java: “It’s taken them four years, and they’re still not done” [10]. Recently, Reuters reported that when Commonwealth Bank of Australia replaced its core COBOL platform in 2012, the effort took five years and cost $749.9 million. Finally, there are several solutions in the market for converting COBOL programs to other languages, but they too rely heavily on COBOL programmers, and the cost and time needed to replace COBOL programs with modernized ones remain immense.

Since the cost of replacing COBOL code is around 25 dollars per line [3, 11], the total cost and time of replacing 200 billion lines of code would be about five trillion dollars and 40 million person-years, of which approximately half (2.5 trillion dollars and 20 million person-years) would be spent deciphering COBOL programs. Fortuitously, since the number of non-COBOL programmers is around 24 million and growing [12], if these black-box COBOL programs could somehow be decoded, then upgrading them to superior ones may take only a few years. However, since there are only about a million active COBOL programmers and their number is dwindling, the task of decoding the 12 million COBOL programs is likely to take at least twenty years, which will be the fundamental bottleneck going forward.

Artificial Intelligence to the rescue

As discussed above, replacing COBOL programs with those containing enhanced features (e.g., real-time capability and handling surges of requests) involves the following two tasks:

  1. Understanding the COBOL programs and creating flow-charts describing how they work.
  2. Using these flow-charts to create new programs with improved features in contemporary languages.

Evidently, the lack of COBOL programming expertise and the huge expense it entails are the biggest hurdles in replacing legacy COBOL programs. Fortunately, the following two factors have enabled us to develop Artificial-Intelligence (AI) based software to help decipher COBOL programs, thereby reducing the conversion time and cost by 75% and dropping the number of COBOL programmers required to just 15%:

  1. COBOL is a comparatively simple language with no pointers, no data structures, no user-defined functions or types, and no recursion, and with data types limited to numbers and text.
  2. Most COBOL programs spend around 70% of their time executing input/output and read/write operations and their output tables provide a good synopsis of the entire execution.

Collatio® – Data Flow Mapping software from Scry Analytics ingests all input and output tables related to a given COBOL program and uses proprietary AI-based algorithms to reverse engineer the transformations performed by this COBOL program, thereby inferring the steps executed by the program and helping the user create a flow-chart of its inner workings.

Below, we explain how this software works via an example of a legacy COBOL program for approving or rejecting unemployment claims filed by 100,000 people during the ongoing COVID-19 crisis:

  1. One input table ingested by the legacy COBOL program: Typical forms for filing unemployment claims are 10 to 15 pages long and require the applicant to fill 250-300 fields and values (e.g., names, addresses, social security numbers, past employment details). Hence, this table is likely to contain 100,000 rows (one for each applicant) and 250 to 300 columns.
  2. Other potential input tables ingested by the legacy COBOL program: These are likely to include “Single Source of Truth” tables that are used to verify the identity of applicants as well as various employers, etc. Other tables may also include conditions for providing unemployment wages and the formulas for computing the corresponding amounts. Alternatively, some of this information may be “hard-coded” in the legacy program itself.
  3. Execution of the legacy program: Once or a few times a day, this program will ingest all input tables, and using these tables and the hard-coded information and formulas (including “If ….Then … Else….” type of formulas), it will be executed in a “batch mode.”
  4. Output of the legacy program: To ensure data lineage, auditability, and traceability, as the program goes through a set of instructions, it is likely to write one or more results in the output table(s). For example, after checking the social security number and driver’s license, it may realize that the applicant made a typographical error in the last name; it will then use the actual name provided in one of the “Single Source of Truth” tables and provide it in one of the output columns. In another output column, it may provide the date-time stamp for when it accomplished this task. In another column, it may output the approval/rejection result, and in yet another, it may provide the computed unemployment wage for this applicant. In summary, the output table contains a “fingerprint” of all the steps executed by this program, and this output table may also have around 100,000 rows and 200 to 300 columns. In fact, the number of rows in the output table may differ from that in the input table, especially if there are duplicate or blank rows.

Since most COBOL programs spend around 70% of their time performing input/output and read/write operations and the remaining 30% calculating formulas and manipulating numbers and strings, Collatio® – Data Flow Mapping software uses the “fingerprint” given in the output tables and “reverse engineers” the transformations. It primarily uses the cell values in the input and output tables. More precisely, it determines which columns of the input tables and which potential formulas and constants (hard-coded in the legacy program) are being used to produce each column of the output table. During the entire process, it seldom uses column names, and since most COBOL programs have little or no documentation, it almost never uses the corresponding ontology (i.e., text-based relationships among various columns).
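A toy version of this reverse-engineering idea is sketched below: for one output column, score every (input column, candidate formula) pair by the fraction of rows it reproduces. The three-rule candidate library is an invented stand-in for the product’s proprietary probabilistic algorithms.

```python
# Deliberately tiny candidate library of transformations (an assumption)
CANDIDATES = {
    "identity": lambda x: x,
    "x2": lambda x: x * 2,
    "plus_100": lambda x: x + 100,
}

def infer_transformation(input_cols, output_col):
    """input_cols: dict of column name -> list of row values.
    Return (input column, rule, fraction of rows matched) with best score."""
    best = (None, None, 0.0)
    n = len(output_col)
    for col_name, values in input_cols.items():
        for rule_name, f in CANDIDATES.items():
            hits = sum(1 for v, o in zip(values, output_col) if f(v) == o)
            accuracy = hits / n  # the 'accuracy measure' for this pair
            if accuracy > best[2]:
                best = (col_name, rule_name, accuracy)
    return best

cols = {"wage": [100, 200, 300], "age": [25, 40, 60]}
out = [200, 400, 600]
best = infer_transformation(cols, out)
# best == ("wage", "x2", 1.0): doubling the wage column reproduces the output
```

The returned fraction plays the role of the accuracy measure described later: a transformation that explains only some rows scores below 1.0 and would be reported with lower confidence.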

Key features of this decision support software are given below:

  1. The proprietary algorithms in this software are probabilistic and use advanced techniques from Math and Computer Science, and specifically, from Artificial Intelligence (Machine Learning and Natural Language Processing) and Operations Research.
  2. Since the underlying algorithms are AI-based, the software provides the transformations along with a confidence level (a plausible transformation along with a probability of being correct) and an accuracy measure (percentage of records in the output table that represent the transformation function identified) for each transformation.
  3. Since it is a decision support software, it comes with a preconfigured graphical user interface (GUI) that depicts the transformations among various columns and helps the user in experimenting and determining the steps being executed by the COBOL program. It also provides an API for downloading these transformations to a spreadsheet or JSON format.
  4. Unsurprisingly, this software is highly compute- and memory-intensive. Hence, it has been optimized for parallel and distributed processing. The number of processing cores and amount of random-access memory (RAM) required depends upon the size of the input and output tables, which can easily contain a thousand columns and several million rows.


About the Author

Dr. Alok Aggarwal received his PhD in Electrical Engineering and Computer Science from Johns Hopkins University and worked at IBM Watson Research Center from 1984 to 2000. During 1989-90, he taught at MIT and advised two PhD students, and during 1998-2000, he founded IBM India Research Lab and grew it to 60 researchers. He co-founded Evalueserve in 2000 and was its chairman until 2013; this company provides research and analytics services worldwide and has 3,500 employees. In 2014, Dr. Aggarwal founded Scry Analytics to develop AI-based enterprise applications and a comprehensive platform that enable clients to re-think and automate their data-driven and manually intensive business operations.

Contact: Office: +1 408 872 1078; Mobile: +1 914 980 4717



References:

  1. Ferguson, Andrew. “A History of Computer Programming Languages.”
  2. Brown, Gary DeWard. COBOL: The Failure That Wasn’t.
  3. Jones, Capers. The Global Economic Impact of the Year 2000 Software Problem (Jan. 1997).
  4. Long, Heather; Stein, Jeff; Rein, Lisa; Romm, Tony. “Stimulus checks and other coronavirus relief hindered by dated technology and rocky government rollout.” Washington Post, April 17, 2020.
  7. Lämmel, Ralf; Verhoef, Chris. “Cracking the 500-Language Problem.” IEEE Software 18(6), November–December 2001. doi:10.1109/52.965809.
  10. Gartner Research Note, “Forecasting the Worldwide IT Services Industry: 1999.”

Data Lineage

Data comes from various sources, moves through several systems and gets transformed

Problem & Current State

  • Data lineage is difficult to establish: tracing data’s origins, what happens to it, where it moves over time, and tracing errors back to their root cause
  • Lineage is often represented via time-graphs showing how the data gets transformed along the way, how the representation and parameters change, and how the data splits or converges after each hop
  • Data governance plays a key role in metadata management for guidelines, strategies, policies, implementation, which can be incorporated in Collatio®


Our Solution

  1. Identify data lineage based on:
  • Rows & columns that did not change; deleted or added columns
  • Columns in the new table that are composites or functions of old columns
  2. System has a large library of transformation rules for various data types and can be configured and trained for specific types of transformations
  3. GUI to define the transformation rules and to configure, schedule & execute them; dashboards to visualize overall transformation rules and table/column-specific transformation results
  4. System generates exceptions and alerts on incremental data based on the transformation rules
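The lineage rules sketched above can be illustrated with a minimal column-diffing example. This is an assumption-laden sketch: columns are plain Python lists in matching row order, and the only composite rule tried is the sum of two old columns.

```python
def diff_columns(old, new):
    """old/new: dicts mapping column name -> list of row values.
    Classify New columns as unchanged, deleted, added, or composites."""
    def numeric(col):
        return all(isinstance(v, (int, float)) and not isinstance(v, bool)
                   for v in col)
    unchanged = [c for c in old if c in new and old[c] == new[c]]
    deleted = [c for c in old if c not in new]
    added = [c for c in new if c not in old]
    composites = {}
    for c in added:
        if not numeric(new[c]):
            continue
        # Try the simplest composite rule: sum of two old numeric columns
        for a in old:
            for b in old:
                if a < b and numeric(old[a]) and numeric(old[b]) and \
                   all(x + y == z for x, y, z in zip(old[a], old[b], new[c])):
                    composites[c] = f"{a} + {b}"
    return unchanged, deleted, added, composites

old = {"base": [10, 20], "bonus": [1, 2], "dept": ["A", "B"]}
new = {"base": [10, 20], "total": [11, 22], "dept": ["A", "B"]}
unchanged, deleted, added, composites = diff_columns(old, new)
# "total" is detected as the composite "base + bonus"
```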


Business Outcomes

  • 75% annual cost savings
  • 95% reduction in overall processing time
  • >98% accuracy

Financial Spreading

Automated Spreading of Balance Sheets and Cashflow Statements

Problem & Current State

  • Manual digitization of financial documents
  • Expensive to scale and maintain accuracy
  • Unidentified human-errors potentially resulting in unknown risk

Our Solution

  • Workflow-based & automated with analyst verification
  • Ingests financials in PDF, image, Excel and CSV input formats
  • ML, NLP and graph-based solution to “reverse engineer” formulas
  • 40+ pre-trained proprietary algorithms to digitize, validate values & fix errors
  • Reinforcement learning on “its own” and via a “human loop”
  • Pre-built financial ontology & business rules
  • Output maps to required financial templates
  • Points out specific areas where the analyst should review

Business Outcomes

  • 80% annual cost savings (55% in the first year)
  • 90% reduction in overall processing time
  • >96% accuracy

Genesis of AI: The First Hype Cycle


Every decade seems to have its technological buzzwords: we had personal computers in the 1980s; the Internet and World Wide Web in the 1990s; smartphones and social media in the 2000s; and Artificial Intelligence (AI) and Machine Learning in this decade. However, the field of AI is 67 years old, and this is the first of a series of five articles wherein:

  1. This article discusses the genesis of AI and the first hype cycle during 1950-1982
  2. The second article discusses a resurgence of AI and its achievements during 1983-2010
  3. The third article discusses the domains in which AI systems are already rivaling humans
  4. The fourth article discusses the current hype cycle in Artificial Intelligence
  5. The fifth article discusses what 2018-2035 may portend for brains, minds, and machines


While artificial intelligence (AI) is among today’s most popular topics, a commonly forgotten fact is that it was actually born in 1950 and went through a hype cycle between 1956 and 1982. The purpose of this article is to highlight some of the achievements that took place during the boom phase of this cycle and explain what led to its bust phase. The lessons to be learned from this hype cycle should not be overlooked – its successes formed the archetypes for machine learning algorithms used today, and its shortcomings indicated the dangers of overenthusiasm in promising fields of research and development.

The Pioneering Question

Although the first computers were developed during World War II [1,2], what seemed to truly spark the field of AI was a question proposed by Alan Turing in 1950 [3]: can a machine imitate human intelligence? In his seminal paper, “Computing Machinery and Intelligence,” he formulated a game, called the imitation game, in which a human, a computer, and a (human) interrogator are in three different rooms. The interrogator’s goal is to distinguish the human from the computer by asking them a series of questions and reading their typewritten responses; the computer’s goal is to convince the interrogator that it is the human [3]. In a 1952 BBC interview, Turing suggested that, by the year 2000, the average interrogator would have less than a 70% chance of correctly identifying the human after a five-minute session [4].

Figure 1: The imitation game, as proposed by Alan Turing

Turing was not the only one to ask whether a machine could model intelligent life. In 1951, Marvin Minsky, a graduate student inspired by earlier neuroscience research indicating that the brain was composed of an electrical network of neurons firing with all-or-nothing pulses, attempted to computationally model the behavior of a rat. In collaboration with physics graduate student Dean Edmonds, he built the first neural network machine called Stochastic Neural Analogy Reinforcement Computer (SNARC) [5]. Although primitive (consisting of about 300 vacuum tubes and motors), it was successful in modeling the behavior of a rat in a small maze searching for food [5].

The notion that it might be possible to create an intelligent machine was an alluring one indeed, and it led to several subsequent developments. For instance, Arthur Samuel built a Checkers-playing program in 1952 that was the world’s first self-learning program [15]. Later, in 1955, Newell, Simon and Shaw built Logic Theorist, which was the first program to mimic the problem-solving skills of a human and would eventually prove 38 of the first 52 theorems in Whitehead and Russell’s Principia Mathematica [6].

Figure 2: Picture of a single neuron contained in SNARC (Source: Gregory Loan, 1950s)

The Beginning of the Boom Phase

Inspired by these successes, young Dartmouth professor John McCarthy organized a conference in 1956 to gather twenty pioneering researchers to “explore ways to make a machine that could reason like a human, was capable of abstract thought, problem-solving and self-improvement” [7]. It was in his 1955 proposal for this conference that the term “artificial intelligence” was coined [7,40,41,42], and it was at this conference that AI gained its vision, mission, and hype.

Researchers soon began making audacious claims about the incipience of powerful machine intelligence, and many anticipated that a machine as intelligent as a human would exist in no more than a generation [40, 41, 42]. For instance:

In 1958, Simon and Newell said, “within ten years a digital computer will be the world’s chess champion,” and, “within ten years a digital computer will discover and prove an important new mathematical theorem”[8].
In 1961, Minsky wrote, “within our lifetime machines may surpass us in general intelligence,” [9] and in 1967 he reiterated, “within a generation, I am convinced, few compartments of intellect will remain outside the machine’s realm – the problem of creating ‘artificial intelligence’ will be substantially solved” [10, 11, 12].

within our lifetime machines may surpass us in general intelligence – Marvin Minsky, 1961

AI had even caught Hollywood’s attention. In 1968, Arthur Clarke and Stanley Kubrick produced the movie, 2001: A Space Odyssey, whose antagonist was an artificially intelligent computer, HAL 9000, which exhibited creativity, a sense of humor, and the ability to scheme against anyone who threatened its survival. This was based on the belief, held by Turing, Minsky, McCarthy and many others, that such a machine would exist by 2000; in fact, Minsky served as an adviser for this film, and one of its characters, Victor Kaminski, was named in his honor.

Figure 3: HAL 9000 as shown in 2001: A Space Odyssey

Sub-fields of AI are Born

Between 1956 and 1982, the unabated enthusiasm in AI led to seminal work, which gave birth to several subfields of AI that are explained below. Much of this work led to the first prototypes for the modern theory of AI.

Figure 4: Important sub-fields of AI as of 1982

Rule-Based Systems
Rule-based expert systems try to solve complex problems by implementing a series of “if-then-else” rules. One advantage of such systems is that their instructions (what the program should do when it sees “if” or “else”) are flexible and can be modified by the coder, the user, or the program itself. Such expert systems were created and used in the 1970s by Feigenbaum and his colleagues [13], and many of them constitute the foundation blocks for AI systems today.
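As a toy illustration of the idea (not any specific 1970s expert system), a rule base can be applied by repeatedly firing “if-then” rules against a set of known facts until nothing new can be derived; the rules and facts below are entirely hypothetical:

```python
# Minimal forward-chaining rule engine: each rule is a pair
# (set of required facts, fact to conclude).

def forward_chain(rules, facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            # the "if" part: all required facts are known
            if condition <= facts and conclusion not in facts:
                facts.add(conclusion)   # the "then" branch fires
                changed = True
    return facts

# Hypothetical medical-style rules, in the spirit of 1970s expert systems.
rules = [({"has_fever", "has_rash"}, "suspect_measles"),
         ({"suspect_measles"}, "recommend_isolation")]
```

Note that rules can be added or edited without touching the engine itself, which is the flexibility the paragraph above describes.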

Machine Learning
The term “machine learning” was coined by Arthur Samuel in 1959, who defined it as “the field of study that gives computers the ability to learn without being explicitly programmed” [14]. Machine learning is a vast field and its detailed explanation is beyond the scope of this article. The second article in this series – see the Prologue on the first page and [57] – will briefly discuss its subfields and applications. However, below we give one example of a machine learning program, known as the perceptron network.

Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed – Arthur Samuel, 1959

Single and Multilayer Perceptron Networks

Inspired by the work of McCulloch and Pitts in 1943 and of Hebb in 1949 [15,16], Rosenblatt in 1957 introduced the perceptron network as an artificial model of communicating neurons [17]. This model is shown in Figure 5 and can be briefly described as follows. One layer of vertices, where input variables are entered, is connected to a hidden layer of vertices (also called perceptrons), which in turn is connected to an output layer of perceptrons. A signal coming via a connection from an input vertex to a perceptron in the hidden layer is calibrated by a “weight” associated with that connection, and this weight is assigned during a “learning process”. Signals from hidden layer perceptrons to output layer perceptrons are calibrated in an analogous way. Like a human neuron, a perceptron “fires” if the total weight of all incoming signals exceeds a specified potential. However, unlike for humans, signals in this model are only transmitted towards the output layer, which is why these networks are often called “feed-forward.” Perceptron networks with only one hidden layer of perceptrons (i.e., with two layers of weighted edge connections) later became known as “shallow” artificial neural networks. Although shallow networks were limited in power, Rosenblatt managed to create a one-layer perceptron network, which he called Mark 1, that was able to recognize basic images [17].

Figure 5: A human neuron versus a perceptron and a shallow perceptron network
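The feed-forward computation described above can be sketched in a few lines. This is a minimal illustration; the hand-picked weights, which happen to implement the XOR function, are an assumption for demonstration rather than the result of Rosenblatt’s learning process:

```python
def step(x):
    # all-or-nothing "firing" once the weighted sum exceeds the potential (0)
    return 1.0 if x > 0 else 0.0

def layer(inputs, weights):
    # weights[j] holds the weights on the connections into perceptron j
    return [step(sum(w * i for w, i in zip(ws, inputs))) for ws in weights]

def forward(inputs, w_hidden, w_out):
    # input layer -> hidden perceptrons -> output perceptrons (feed-forward)
    return layer(layer(inputs, w_hidden), w_out)

# Hand-picked weights (illustrative assumption): the two hidden perceptrons
# detect "x1 without x2" and "x2 without x1"; the output fires if either
# does, so the network computes XOR.
w_hidden = [[1.0, -1.0], [-1.0, 1.0]]
w_out = [[1.0, 1.0]]
```

The learning process mentioned above would adjust these weights from data instead of fixing them by hand.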

Today, the excitement is about “deep” (two or more hidden layers) neural networks, which were also studied in the 1960s. Indeed, the first general learning algorithm for deep networks goes back to the work of Ivakhnenko and Lapa in 1965 [18,19]. Networks as deep as eight layers were considered by Ivakhnenko in 1971, when he also provided a technique for training them [20].

Natural Language Processing (NLP)
In 1957, Chomsky revolutionized linguistics with universal grammar, a rule-based system for understanding syntax [21]. This formed the first model that researchers could use to create successful NLP systems in the 1960s, including SHRDLU, a program which worked with small vocabularies and was partially able to understand textual documents in specific domains [22]. During the early 1970s, researchers started writing conceptual ontologies, which are data structures that allow computers to interpret relationships between words, phrases and concepts; such ontologies remain widely in use today [23].

Speaker Recognition and Speech to Text Processing
The question of whether a computer could recognize speech was first proposed by a group of three researchers at AT&T Bell Labs in 1952, when they built a system for isolated digit recognition for a single speaker [24]. This system was vastly improved upon during the late 1960s, when Reddy created the Hearsay I, a program which had low accuracy but was one of the first to convert large vocabulary continuous speech into text. In 1975, his students Baker and Baker created the Dragon System [25], which further improved upon Hearsay I by using the Hidden Markov Model (HMM), a unified probabilistic model that allowed them to combine various sources such as acoustics, language, and syntax. Today, the HMM remains an effective framework for speech recognition [26].
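The HMM idea can be illustrated with the Viterbi algorithm, which recovers the most likely sequence of hidden states (e.g., word states) from a sequence of acoustic observations. The model below uses made-up toy numbers, not the Dragon System’s actual parameters:

```python
def viterbi(obs, states, start, trans, emit):
    # prob[s]: probability of the best state path so far that ends in s
    prob = {s: start[s] * emit[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        # both dicts below are built from the *previous* prob/path, because
        # the tuple's right-hand side is evaluated before the assignment
        prob, path = (
            {s: max(prob[r] * trans[r][s] for r in states) * emit[s][o]
             for s in states},
            {s: path[max(states, key=lambda r: prob[r] * trans[r][s])] + [s]
             for s in states},
        )
    return path[max(states, key=lambda s: prob[s])]

# Toy model (hypothetical numbers): two hidden word states, three acoustic
# observation symbols a1, a2, a3.
states = ["yes", "no"]
start = {"yes": 0.6, "no": 0.4}
trans = {"yes": {"yes": 0.7, "no": 0.3}, "no": {"yes": 0.4, "no": 0.6}}
emit = {"yes": {"a1": 0.1, "a2": 0.4, "a3": 0.5},
        "no":  {"a1": 0.6, "a2": 0.3, "a3": 0.1}}
```

The unification the Dragon System achieved comes from the fact that acoustics (emissions), language, and syntax (transitions) all live in one probabilistic model.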

Image Processing and Computer Vision
In the summer of 1966, Minsky hired a first-year undergraduate student at MIT and asked him to solve the following problem: connect a television camera to a computer and get the machine to describe what it sees [27]. The aim was to extract three-dimensional structure from images, thereby enabling robotic sensory systems to partially mimic the human visual system. Research in computer vision in the early 1970s formed the foundation for many algorithms that exist today, including extracting edges from images, labeling lines and circles, and estimating motion in videos [28].

Commercial Applications

The above theoretical advances led to several applications, most of which fell short of being used in practice at that time but set the stage for their derivatives to be used commercially later. Some of these applications are discussed below.

Chatterbots or Chat-Bots
Between 1964 and 1966, Weizenbaum created the first chat-bot, ELIZA, named after Eliza Doolittle, who was taught to speak properly in George Bernard Shaw’s play, Pygmalion (later adapted into the movie, My Fair Lady). ELIZA could carry out conversations that would sometimes fool users into believing that they were communicating with a human but, as it happens, ELIZA only gave standard responses that were often meaningless [29]. Later, in 1972, medical researcher Colby created a “paranoid” chatbot, PARRY, which was also a mindless program. Still, in short imitation games, psychiatrists were unable to distinguish PARRY’s ramblings from those of a paranoid human [30].

Figure 6: Conversations with Weizenbaum’s ELIZA and Colby’s PARRY

Robotics

In 1954, Devol built the first programmable robot, called Unimate, which was one of the few AI inventions of its time to be commercialized; it was bought by General Motors in 1961 for use in automobile assembly lines [31]. Significantly improving on Unimate, in 1972, researchers at Waseda University built the world’s first full-scale intelligent humanoid robot, WABOT-1 [32]. Although it was almost a toy, its limb system allowed it to walk, grip, and transport objects with its hands; its vision system (consisting of its artificial eyes and ears) allowed it to measure distances and directions to objects; and its artificial mouth allowed it to converse in Japanese [32]. This gradually led to innovative work in machine vision, including the creation of robots that could stack blocks [33].

Figure 7: Timeline of Important Inventions in Artificial Intelligence During 1950-75

The Bust Phase and the AI Winter

Despite some successes, by 1975 AI programs were largely limited to solving rudimentary problems. In hindsight, researchers realized two fundamental issues with their approach.

Limited and Costly Computing Power
In 1976, the world’s fastest supercomputer (which would have cost over five million US Dollars) was only capable of performing about 100 million instructions per second [34]. In contrast, the 1976 study by Moravec indicated that even the edge-matching and motion detection capabilities alone of a human retina would require a computer to execute such instructions ten times faster [35]. Likewise, a human brain has about 86 billion neurons and one trillion synapses; basic computations using the figures provided in [36,37] indicate that creating a perceptron network of that size would have cost over 1.6 trillion USD, more than the entire U.S. GDP in 1974.

The Mystery Behind Human Thought
Scientists did not understand how the human brain functions and remained especially unaware of the neurological mechanisms behind creativity, reasoning and humor. The lack of an understanding as to what precisely machine learning programs should be trying to imitate posed a significant obstacle to moving the theory of artificial intelligence forward. In fact, in the 1970s, scientists in other fields even began to question the notion of, ‘imitating a human brain,’ proposed by AI researchers. For example, some argued that if symbols have no ‘meaning’ for the machine, then the machine could not be described as ‘thinking’ [38].

Eventually it became obvious to the pioneers that they had grossly underestimated the difficulty of creating an AI computer capable of winning the imitation game. For example, in 1969, Minsky and Papert published the book, Perceptrons [39], in which they indicated severe limitations of Rosenblatt’s one-hidden-layer perceptron. Coauthored by one of the founders of artificial intelligence and attesting to the shortcomings of perceptrons, this book served as a serious deterrent to research in neural networks for almost a decade [40,41,42].

In the following years, other researchers began to share Minsky’s doubts in the incipient future of strong AI. For example, in a 1977 conference, a now much more circumspect John McCarthy noted that creating such a machine would require ‘conceptual breakthroughs,’ because ‘what you want is 1.7 Einsteins and 0.3 of the Manhattan Project, and you want the Einsteins first. I believe it’ll take five to 500 years’ [43].

The hype of the 1950s had raised expectations to such audacious heights that, when the results did not materialize by 1973, the U.S. and British governments withdrew research funding in AI [41]. Although the Japanese government temporarily provided additional funding in 1980, it quickly became disillusioned by the late 1980s and withdrew its investments again [42, 40]. This bust phase (particularly between 1974 and 1982) is commonly referred to as the “AI winter,” as it was when research in artificial intelligence almost stopped completely. Indeed, during this time and the subsequent years, “some computer scientists and software engineers would avoid the term artificial intelligence for fear of being viewed as wild-eyed dreamers” [44].

because what you want is 1.7 Einsteins and 0.3 of the Manhattan Project, and you want the Einsteins first. I believe it’ll take five to 500 years – John McCarthy, 1977

The prevailing attitude during the 1974-1982 period was highly unfortunate, as the few substantial advances that took place during this period essentially went unnoticed, and significant effort was undertaken to recreate them. Two such advances are the following:

The first is the backpropagation technique, which is commonly used today to efficiently train neural networks in assigning near-optimal weights to their edges. Although it was introduced by several researchers independently (e.g., Kelley, Bryson, Dreyfus, and Ho) in the 1960s [45] and implemented by Linnainmaa in 1970 [46], it was mainly ignored. Similarly, the 1974 thesis of Werbos, which proposed that this technique could be used effectively for training neural networks, was not published until 1982, when the bust phase was nearing its end [47,48]. In 1986, this technique was rediscovered by Rumelhart, Hinton and Williams, who popularized it by showing its practical significance [49].
The second is the recurrent neural network (RNN), which is analogous to Rosenblatt’s perceptron network except that it is not feed-forward: it allows connections to go towards both the input and output layers. Such networks were proposed by Little in 1974 [55] as a more biologically accurate model of the brain. Regrettably, RNNs went unnoticed until Hopfield popularized them in 1982 and improved them further [50,51].
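What backpropagation computes can be sketched numerically in its simplest possible setting: the gradient of the error with respect to a weight, obtained via the chain rule, drives a gradient-descent update. This single-neuron example is illustrative only (the target, learning rate, and iteration count are arbitrary assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One sigmoid neuron with a single weight w is trained to map input x = 1.0
# to target 0.0. The error is E = (y - target)^2 / 2, and the chain rule
# gives dE/dw = (y - target) * y * (1 - y) * x.
w, x, target, lr = 0.5, 1.0, 0.0, 1.0
for _ in range(200):
    y = sigmoid(w * x)                        # forward pass
    grad = (y - target) * y * (1.0 - y) * x   # backward pass (chain rule)
    w -= lr * grad                            # gradient descent step
```

In a multi-layer network, the same chain-rule bookkeeping is propagated backward layer by layer, which is exactly what Rumelhart, Hinton and Williams popularized in 1986.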


The defining characteristics of a hype cycle are a boom phase, when researchers, developers and investors become overly optimistic and enormous growth takes place, and a bust phase, when investments are withdrawn and growth slows substantially. From the story presented in this article, we can see that AI went through such a cycle between 1956 and 1982.

Born from the vision of Turing and Minsky that a machine could imitate intelligent life, AI received its name, mission, and hype from the conference organized by McCarthy at Dartmouth College in 1956. This marked the beginning of the boom phase of the AI hype cycle. Between 1956 and 1973, many penetrating theoretical and practical advances were made in the field of AI, including rule-based systems; shallow and deep neural networks; natural language processing; speech processing; and image recognition. The achievements that took place during this time formed the initial archetypes for current AI systems.

What also took place during this boom phase was “irrational exuberance” [52]. The pioneers of AI were quick to make exaggerated predictions about the future of strong artificially intelligent machines. By 1974, these predictions had not come to pass, and researchers realized that their promises had been inflated. By this point, investors had also become skeptical and withdrew funding. This resulted in a bust phase, also called the AI winter, when research in AI was slow and even the term, “artificial intelligence,” was spurned. Most of the few inventions of this period, such as backpropagation and recurrent neural networks, went largely overlooked, and substantial effort was spent to rediscover them in the subsequent decades.

In general, hype cycles are double-edged swords, and the one exhibited by AI between 1956 and 1982 was no different. Care must be taken to learn from it: the successes of its boom phase should be remembered and appreciated, but its overenthusiasm should be viewed with at least some skepticism to avoid the full penalties of the bust phase. However, as with most hype cycles, “green shoots” started appearing again in the mid-1980s, and there was a gradual resurgence of AI research during 1983-2010; we will discuss these and related developments in our next article, “Resurgence of Artificial Intelligence During 1983-2010” [57].


About the Author

Dr. Alok Aggarwal is the founder and CEO of Scry Analytics; prior to this, he was the co-founder and Chairman of Evalueserve. He received his PhD in Electrical Engineering and Computer Science from Johns Hopkins University in 1984 and worked at IBM Watson Research Center from 1984 to 2000; during 1989-90, he taught at MIT and advised two PhD students, and during 1998-2000, he founded IBM India Research Lab and grew it to 60 researchers.

Contact: Office: +1 408 872 1078; Mobile: +1 914 980 4717


References for all articles in this series can be found at


Additional information about the history of AI can be found in:

McCorduck, Pamela (2004), Machines Who Think (2nd ed.), Natick, MA: A. K. Peters, Ltd. ISBN 1-56881-205-1, OCLC 52197627.

Crevier Daniel (1993). AI: The Tumultuous Search for Artificial Intelligence. New York, NY: Basic Books. ISBN 0-465-02997-3.

Russell Stuart; Norvig, Peter (2003). Artificial Intelligence: A Modern Approach. London, England: Pearson Education. ISBN 0-137-90395-2.


Resurgence of Artificial Intelligence During 1983-2010


Every decade seems to have its technological buzzwords: we had personal computers in the 1980s; the Internet and World Wide Web in the 1990s; smartphones and social media in the 2000s; and Artificial Intelligence (AI) and Machine Learning in this decade. However, the field of AI is 67 years old, and this is the second of a series of five articles wherein:

  1. The first article discusses the genesis of AI and the first hype cycle during 1950-1982
  2. This article discusses a resurgence of AI and its achievements during 1983-2010
  3. The third article discusses the domains in which AI systems are already rivaling humans
  4. The fourth article discusses the current hype cycle in Artificial Intelligence
  5. The fifth article discusses what 2018-2035 may portend for brains, minds, and machines

Resurgence of Artificial Intelligence

The 1950-82 era saw the new field of Artificial Intelligence (AI) being born, a lot of pioneering research being done, massive hype being created, and AI going into hibernation when this hype did not materialize and the research funding dried up [56]. Between 1983 and 2010, research funding ebbed and flowed, and research in AI continued to gather steam, although “some computer scientists and software engineers would avoid the term artificial intelligence for fear of being viewed as wild-eyed dreamers” [43].

During the 1980s and 90s, researchers realized that many AI solutions could be improved by using techniques from mathematics and economics, such as game theory, stochastic modeling, classical numerical methods, operations research and optimization. Better mathematical descriptions were developed for deep neural networks as well as for evolutionary and genetic algorithms, which matured during this period. All of this led to new sub-domains and commercial products in AI being created.

In this article, we first briefly discuss supervised learning, unsupervised learning and reinforcement learning, as well as shallow and deep neural networks, which became quite popular during this period. Next, we discuss some of the reasons that helped AI research and development gain steam: hardware and network connectivity became cheaper and faster; parallel and distributed computing became practical; and lots of data (“Big Data”) became available for training AI systems. Finally, we discuss a few AI applications that were commercialized during this era.


Machine Learning Techniques Improve Substantially

Supervised Machine Learning
These techniques must be trained by humans using labeled data [58]. Suppose we are given several thousand pictures of faces of dogs and cats, and we would like to partition them into two groups – one containing dogs and the other cats. Rather than doing this manually, a machine learning expert writes a computer program that includes the attributes that differentiate dog-faces from cat-faces (e.g., length of whiskers, droopy ears, angular faces, round eyes). After enough attributes have been included and the program checked for accuracy, the first picture is given to this “black box” program. If its output is not the same as that provided by a “human trainer” (who may be training in person or may have provided a pre-labeled picture), the program modifies some of its internal code to ensure that its answer becomes the same as that of the trainer (or the pre-labeled picture). After going through several thousand such pictures and modifying itself accordingly, this black box learns to differentiate the faces of dogs from those of cats. By 2010, researchers had developed many algorithms that could be used inside the black box, most of which are mentioned in the Appendix; today, applications that commonly use these techniques include object recognition, speaker recognition, and speech to text conversion.


Figure 1: Process Flow for Supervised Learning Techniques
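The training loop described above can be sketched as a simple perceptron-style learner. This is a toy illustration; the two numeric attributes and the labeled samples are hypothetical:

```python
# Toy supervised "black box": each picture is reduced to numeric attributes,
# and the program nudges its internal weights whenever its answer disagrees
# with the trainer's label (1 = dog, 0 = cat).

def train(samples, epochs=20, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for attrs, label in samples:
            s = sum(w * a for w, a in zip(weights, attrs)) + bias
            guess = 1 if s > 0 else 0
            error = label - guess            # trainer's correction signal
            # "modify internal code": adjust weights toward the label
            weights = [w + lr * error * a for w, a in zip(weights, attrs)]
            bias += lr * error
    return weights, bias

def predict(attrs, weights, bias):
    return 1 if sum(w * a for w, a in zip(weights, attrs)) + bias > 0 else 0

# Hypothetical attributes: (whisker length, ear droopiness).
labeled = [((2.0, 8.0), 1), ((1.5, 9.0), 1), ((7.0, 1.0), 0), ((8.0, 2.0), 0)]
```

After training on the labeled samples, the learner classifies the training pictures correctly; real systems of the era used richer features and thousands of examples.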

Unsupervised learning algorithms
These techniques do not require any pre-labeled data, and they try to determine hidden structure in “unlabeled” data [59]. One important use case of unsupervised learning is computing the hidden probability distribution with respect to the key attributes and explaining it, e.g., understanding the data by using its attributes and then clustering and partitioning it into “similar” groups. There are several techniques in unsupervised learning, most of which are mentioned in the Appendix. Since the data points given to these algorithms are unlabeled, their accuracy is usually hard to define. Applications that use unsupervised learning include recommender systems (e.g., if a person bought x, will the person buy y), creating cohorts for marketing purposes (e.g., clustering by gender, spending habits, education, zip code), and creating cohorts of patients for improving disease management. Since k-means is one of the most common techniques, it is briefly described below:

Suppose we are given a lot of data points, each having n attributes (which can be labelled as n coordinates), and we want to partition them into k groups. Since each point has n coordinates, we can imagine these data points as being in an n-dimensional space. To begin with, the algorithm partitions these data points arbitrarily into k groups. Now, for each group the algorithm computes its centroid, which is an imaginary point with each of its coordinates being the average of the same coordinates of all the points in that group, i.e., this imaginary point’s first coordinate is the average of all first coordinates of the points in this group, its second coordinate is the average of all second coordinates, and so on. Next, for each data point, the algorithm finds the centroid that is closest to that point, thereby achieving a new partition of the data points into k new groups. The algorithm again finds the centroids of these groups and repeats these steps until it either converges or has gone through a specified number of iterations. An example in a two-dimensional space with k=2 is shown in the picture below:

Figure 2: Typical Output of a 2-means algorithm for partitioning red and blue points in a plane
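The steps described above translate almost directly into code. This minimal sketch uses the first k points as the arbitrary starting centroids, one common choice:

```python
import math

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])   # an arbitrary initial choice of centers
    for _ in range(iterations):
        # assignment step: each point joins the group of its nearest centroid
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # update step: each centroid becomes the coordinate-wise average of
        # its group (an empty group keeps its old centroid)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups
```

For two well-separated clumps in the plane, two iterations already suffice for the centroids to settle on the clump centers.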

Another technique, hierarchical clustering, creates hierarchical groups, which at the top level contain ‘super groups,’ each containing sub-groups, which may in turn contain sub-sub-groups, and so on. K-means clustering is often used for creating hierarchical groups as well.

Reinforcement Learning
Reinforcement Learning (RL) algorithms learn from the consequences of their actions, rather than from being taught by humans or by using pre-labeled data [60]; this is analogous to Pavlovian conditioning, as when Pavlov noticed that his dogs would begin to salivate whenever he entered the room, even when he was not bringing them food [61]. The rules that such algorithms should obey are given upfront, and they select their actions on the basis of their past experiences and by considering new choices. Hence, they learn by trial and error in a simulated environment. At the end of each “learning session,” the RL algorithm provides itself a “score” that characterizes its level of success or failure, and over time, the algorithm tries to perform those actions that maximize this score. Although IBM’s Deep Blue, which won the chess match against Kasparov, did not use Reinforcement Learning, as an example, we describe a potential RL algorithm for playing chess:

As input, the RL algorithm is given the rules of playing chess, e.g., the 8×8 board, the initial location of pieces, what each chess piece can do in one step, a score of zero if the player’s king is check-mated, a score of one if the opponent’s king is check-mated, and 0.5 if only the two kings are left on the board. In this embodiment, the RL algorithm creates two identical solutions, A and B, which start playing chess against each other. After each game is over, the RL algorithm assigns the appropriate scores to A and B, but it also keeps the complete history of the moves and countermoves made by A and B, which can be used to train A and B (individually) to play better. After playing several thousand such games in the first round, the RL algorithm uses this “self-generated” labelled data – with outcomes of 0, 0.5, and 1 for each game, along with all the moves played in that game – and, by using learning techniques, determines the patterns of moves that led A (and similarly B) to a poor score. For the next round, it refines the solutions for A and B by optimizing the play of such “poor moves,” thereby improving them for the second round, then the third round, and so on, until the improvements from one round to the next become minuscule, at which point A and B end up being reasonably well-trained solutions.
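Chess is too heavy for a short sketch, but the score-driven loop described above can be illustrated on a trivial task (entirely hypothetical; Deep Blue itself relied on search, not RL): an agent repeatedly tries actions, records a score for each session, and comes to prefer the action with the best average score.

```python
import random

def reinforce(actions, reward, rounds=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}   # learned average score per action
    count = {a: 0 for a in actions}
    for _ in range(rounds):
        if rng.random() < epsilon:
            a = rng.choice(actions)          # occasionally explore a new choice
        else:
            a = max(actions, key=value.get)  # otherwise exploit past experience
        r = reward(a)                        # end-of-session "score"
        count[a] += 1
        value[a] += (r - value[a]) / count[a]   # running average of scores
    return max(actions, key=value.get)

# Hypothetical scores: action "b" yields 0.7 per session, "a" only 0.3.
best = reinforce(["a", "b"], lambda a: 0.7 if a == "b" else 0.3)
```

The chess embodiment replaces the two actions with move patterns and the fixed rewards with game outcomes of 0, 0.5, and 1, but the refine-by-score loop is the same.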

In 1951, Minsky and Edmonds built the first neural network machine, SNARC (Stochastic Neural Analogy Reinforcement Computer); it successfully modeled the behavior of a rat in a maze searching for food, and as it made its way through the maze, the strength of some synaptic connections would increase, thereby reinforcing the underlying behavior, which seemed to mimic the functioning of living neurons [5]. In general, Reinforcement Learning algorithms perform well while solving optimization problems, in game theoretic situations (e.g., in playing Backgammon [62] or GO [94]) and in problems where the business rules are well defined (e.g., autonomous car driving) since they can self-learn by playing against humans or against each other.

Mixed learning
Mixed learning techniques use a combination of one or more of supervised, unsupervised and reinforcement learning techniques. Semi-supervised learning is particularly useful in cases where it is expensive or time-consuming to label a large dataset, e.g., while differentiating dog-faces from cat-faces when the database contains some images that are labeled but most are not. Some of their broad uses include classification, pattern recognition, anomaly detection, and clustering/grouping.

Figure 4: Potential Uses of Machine Learning Algorithms

Resurgence of Neural Networks – Both Shallow and Deep

As discussed in the previous article [56], a one-layer perceptron network consists of an input layer, connected to one hidden layer of perceptrons, which is in turn connected to an output layer of perceptrons [17]. A signal coming via a connection is recalibrated by the “weight” of that connection, and this weight is assigned to the connection during the “learning process”. Like a human neuron, a perceptron “fires” if all the incoming signals together exceed a specified potential; but unlike in humans, in most such networks signals only move from one layer to the one in front of it. The term, Artificial Neural Networks (ANNs), was coined by Igor Aizenberg and colleagues in 2000 for Boolean threshold neurons but is now used for perceptrons and other “neurons” of the same ilk [63]. Examples of one-hidden-layer and eight-hidden-layer networks are given below:

Figure 5: One-hidden-layer network (left) and eight-hidden-layer network (right) (Source: Google)

Although multi-layer perceptrons were invented in 1965 and an algorithm for training an 8-layer network was provided in 1971 [18, 19, 20], the term Deep Learning was introduced by Rina Dechter only in 1986 [64]. For our purposes, a deep learning network is one that has more than one hidden layer.

Given below are important deep learning networks that were developed between 1975 and 2006 and are frequently used today; a detailed description of them is beyond the scope of this article:

  • In 1979, Fukushima provided the first “convolutional neural network” (CNN) when he developed the Neocognitron, in which he used a hierarchical, multilayered design [65]. CNNs are widely used for image processing, speech-to-text conversion, document processing, and bioactivity prediction in structure-based drug discovery [97].
  • In 1983, Hopfield popularized Recurrent Neural Networks (RNNs), which were originally introduced by Little in 1974 [51,52,55]. RNNs are analogous to Rosenblatt’s perceptron networks but are not feedforward, because they allow connections to go towards both the input and output layers; this allows RNNs to exhibit temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of incoming data. RNNs have since been used for speech-to-text conversion, natural language processing, and early detection of heart failure onset [98].
  • In 1997, Hochreiter and Schmidhuber developed a specific kind of deep learning recurrent neural network called LSTM (long short-term memory) [66]. LSTMs mitigate some problems that occur while training RNNs, and they are well suited for predictions involving time series. Applications of such networks include those in robotics, time series prediction, speech recognition, grammar learning, handwriting recognition, protein homology detection, and prediction in medical care pathways [99].
  • In 2006, Hinton, Osindero and Teh invented Deep Belief Networks and showed that in many situations, multi-layer feedforward neural networks could be pre-trained one layer at a time by treating each layer as an unsupervised machine, and then fine-tuned using supervised backpropagation [67]. Applications of such networks include image recognition, handwriting recognition, and identifying the onset of diseases such as liver cancer and schizophrenia [100, 109].
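The “internal memory” that distinguishes the recurrent networks above from feedforward ones can be caricatured in a few lines. The weights below are hand-picked for illustration (a real RNN or LSTM learns them): the hidden state h carries information forward, so later outputs still reflect an input seen at the first step.

```python
import math

# Caricature of a recurrent step: the hidden state h is fed back into the
# unit, so the network retains a (decaying) memory of earlier inputs --
# something a feedforward pass cannot do. Weights are toy values.

def rnn_step(x, h, w_in=0.5, w_rec=0.9):
    return math.tanh(w_in * x + w_rec * h)

h = 0.0
for x in [1.0, 0.0, 0.0]:          # input arrives only at the first step
    h = rnn_step(x, h)
    print(round(h, 3))             # the memory of the first input decays slowly
```

LSTMs replace this single update with gated updates precisely so that such memories decay (or persist) in a controlled way during training.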

Parallel and Distributed Computing Improve AI Capabilities

Between 1983 and 2010, hardware became much cheaper and more than 500,000 times faster; however, for many problems, one computer was still not enough to execute many machine learning algorithms in a reasonable amount of time. At a theoretical level, computer science research during 1950-2000 had shown that such problems could be solved much faster by using many computers simultaneously and in a distributed manner. However, the following fundamental problems related to distributed computing remained unresolved until 2003: (a) how to parallelize computation, (b) how to distribute data “equitably” among computers and do automatic load balancing, and (c) how to handle computer failures and interrupt computers that go into infinite loops. In 2003, Google published its Google File System paper and followed it up in 2004 by publishing MapReduce, a framework and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster [68]. Since MapReduce was proprietary to Google, in 2006, Cutting and Cafarella (from the University of Washington but working at Yahoo) created an open-source, free version of this framework called Hadoop [69]. In 2012, Spark and its resilient distributed datasets were introduced, which reduced the latency of many applications when compared to MapReduce and Hadoop implementations [70]. Today a Hadoop-Spark based infrastructure can handle 100,000 or more computers and several hundred million gigabytes of storage.
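The MapReduce programming model itself is simple; the engineering value of the framework lies in running it across thousands of machines. The classic word-count example, sketched here on a single machine with the map, shuffle, and reduce phases made explicit (function names are ours, not the framework's API):

```python
from collections import defaultdict

# MapReduce word count on one machine. In the real framework, map and reduce
# run on many computers, and shuffling, load balancing, and failure handling
# are done by the system.

def map_phase(doc):
    return [(word, 1) for word in doc.split()]       # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)                       # group all values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big compute", "big cluster"]
pairs = [p for d in docs for p in map_phase(d)]
print(reduce_phase(shuffle(pairs)))  # {'big': 3, 'data': 1, 'compute': 1, 'cluster': 1}
```

Because map calls are independent and reduce operates per key, both phases parallelize naturally, which is what lets the framework scale to clusters of the size mentioned above.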

Big Data begins to help AI systems

In 1998, John Mashey (at Silicon Graphics) seemingly first coined the term “Big Data,” referring to the large volume, variety, and velocity at which data is being generated and communicated [71]. Since most learning techniques require lots of data (especially labelled data), the data stored in organizations’ repositories and on the World Wide Web became vital for AI. By the early 2000s, social media websites such as Facebook, Twitter, Pinterest, Yelp, and YouTube, as well as weblogs and a plethora of electronic devices, started generating Big Data, which set the stage for creating several “open databases” with labeled and unlabeled data for researchers to experiment with [72,73]. By 2010, humans had already created almost a quadrillion gigabytes (i.e., one zettabyte) of data, most of which was either structured (e.g., spreadsheets, relational databases) or unstructured (e.g., text, images, audio, and video files) [74].

Figure 6: Important Sub-fields of Artificial Intelligence in 2010

Progress in Sub-fields of AI and Commercial Applications

Reinforcement Learning Algorithms play Backgammon
In 1992, IBM’s Gerald Tesauro built TD-Gammon, a reinforcement learning program that played backgammon; its level of play was slightly below that of the top human backgammon players at the time [62].

Machines beat humans in Chess
Alan Turing was the first to design a computer chess program, in 1953, although he “ran the program by flipping through the pages of the algorithm and carrying out its instructions on a chessboard” [75]. In 1989, the chess-playing programs HiTech and Deep Thought, developed at Carnegie Mellon University, defeated a few chess masters [76]. In 1997, IBM’s Deep Blue became the first computer chess-playing system to beat the world champion, Garry Kasparov. Deep Blue’s success was essentially due to considerably better engineering and its ability to process 200 million moves per second [77].

Robots for Surgery and Space Exploration
In 1994, Adler and his colleagues at Stanford University invented Cyberknife, a stereotactic radiosurgery-performing robot that could surgically remove tumors; it is almost as accurate as human doctors, and during the last 20 years, it has treated over 100,000 patients [78]. In 1997, NASA built Sojourner, a small robot that could perform semi-autonomous operations on the surface of Mars [79].

Better Chat-bots
In 1995, Wallace created A.L.I.C.E., which was based on pattern matching but had no reasoning capabilities [80]. Thereafter, Jabberwacky (renamed Cleverbot in 2008) was created, which had web-searching and game-playing abilities [81] but was still limited in nature. Both chatbots used improved NLP algorithms for communicating with humans.

Improved Natural Language Processing (NLP)
Until the 1980s, most NLP systems were based on complex sets of hand-written rules. In the late 1980s, researchers started using machine learning algorithms for language processing, owing both to faster, cheaper hardware and to the reduced dominance of Chomsky-based theories of linguistics. Instead of writing rules, researchers created statistical models that made probabilistic decisions by assigning weights to appropriate input features, and they also started using supervised and semi-supervised learning techniques with partially labeled data [82,83].
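The contrast with hand-written rules can be made concrete with a toy weighted-feature model. The words, weights, and task below are invented for illustration (a real system learns the weights from labeled text): each input feature carries a weight, and a logistic link turns the weighted sum into a probability instead of a hard rule.

```python
import math

# Toy statistical text classifier: weights on word features are hand-picked
# here purely for illustration; real systems estimate them from labeled data.

WEIGHTS = {"great": 2.0, "good": 1.0, "bad": -1.5, "awful": -2.5}

def prob_positive(sentence):
    score = sum(WEIGHTS.get(w, 0.0) for w in sentence.lower().split())
    return 1 / (1 + math.exp(-score))          # logistic link: score -> probability

print(round(prob_positive("a great good movie"), 3))   # high probability
print(round(prob_positive("an awful movie"), 3))       # low probability
```

A hand-written rule system would instead enumerate patterns explicitly; the statistical model degrades gracefully on unseen words (they simply get weight 0).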

Speech and Speaker Recognition
During the late 1990s, SRI researchers used deep neural networks for speaker recognition and achieved significant success [84]. In 2009, Hinton and Deng collaborated with several colleagues from the University of Toronto, Microsoft, Google, and IBM, and demonstrated substantial progress in speech recognition using deep neural networks [85,86].

Recommender Systems
By 2010, several companies (e.g., TiVo, Netflix, Facebook, Pandora) had built recommendation engines using AI and started using them for marketing and sales purposes, thereby improving their revenue and profit margins [87].

Recognizing hand-written digits
In 1989, LeCun and colleagues provided the first practical demonstration of backpropagation: they combined convolutional neural networks (CNNs) with backpropagation to read “handwritten” digits. This system was eventually used to read the numbers on handwritten checks, and by the early 2000s, such networks processed an estimated 10% to 20% of all the checks written in the United States [88].


The year 2000 had come and gone, but Alan Turing’s prediction of humans creating an AI computer remained unfulfilled [3,4]; the Loebner Prize had been initiated in 1990 with the aim of developing such a computer [89]. Nevertheless, substantial progress was made in AI, especially with respect to deep neural networks, which were invented in 1965, with the first algorithm for training them given in 1971 [18,19,20]. Between 1983 and 2010, exemplary research done by Hinton, Schmidhuber, Bengio, LeCun, Hochreiter, and others ensured rapid progress in deep learning techniques [90,91,92,93], and some of these networks began to be used in commercial applications. Because of these techniques, and the availability of inexpensive hardware and data that made them practical, the pace of research and development picked up substantially between 2005 and 2010, which in turn led to a substantial growth in AI solutions that started rivaling humans between 2011 and 2017; we will discuss such solutions in the next article, “Domains in Which AI Systems are Rivaling Humans” [151].


About the Author

Dr. Alok Aggarwal is the founder and CEO of Scry Analytics (; prior to this, he was the co-founder and Chairman of Evalueserve ( He received his PhD in Electrical Engineering and Computer Science from Johns Hopkins University in 1984 and worked at IBM Watson Research Center between 1984 and 2000; during 1989-90, he taught at MIT and advised two PhD students, and during 1998-2000, he founded the IBM India Research Lab and grew it to 60 researchers.

Contact: Office: +1 408 872 1078; Mobile: +1 914 980 4717


References for all articles in this series can be found at


Frequently Used Approaches and Techniques for Supervised & Unsupervised Learning