Dr. Alok Aggarwal
CEO and Chief Data Scientist
Scry Analytics, California, USA
Office: +1 408 872 1078; Mobile: +1 914 980 4717
June 20, 2020
COBOL is a 61-year-old computer language for processing data. Although highly inefficient by modern standards, millions of COBOL programs remain in use across government and industry and are responsible for transactions worth three trillion dollars. Recreating them in contemporary computer languages is extremely time-consuming, laborious, and expensive. There is also an acute shortage of COBOL programmers, since universities no longer teach this language. This article illustrates the use of Artificial Intelligence to decode legacy COBOL programs, reducing the dependence on COBOL programmers by 85% and the cost of conversion by 75%.
COBOL (“COmmon Business-Oriented Language”) was designed in 1959 by CODASYL as an English-like, portable computer language for processing data. In 1997, Gartner estimated that there were about 300 billion lines of computer code worldwide, with 80% (240 billion lines) of it in COBOL and 20% (60 billion lines) in all other computer languages combined [2,3]. Today, approximately 12 million COBOL programs with more than 200 billion total lines of code remain in use by organizations across the information technology, education, financial services, healthcare, and retail sectors. These programs handle three trillion dollars in commerce, mainly through batch transaction processing.
With the growth of the Internet and Cloud Computing, many new companies – particularly in finance and retail sectors – now serve customers in real-time, instead of batch mode. For example, customers can place an online order through Amazon or Target in seconds, and merchants can receive their credit card monies through Square or Stripe almost instantaneously. This is well beyond the original functionality of COBOL programs, which typically run in batch mode one or two times a day, thereby leading to substantially longer delays in order fulfilment and payments than is acceptable by modern standards.
The inability of COBOL programs to scale up and quickly handle so many simultaneous requests has now become a critical problem. The urgency has become particularly pronounced during the COVID-19 pandemic, when outdated COBOL programs used by both federal and state governments have led to delays in disbursing funds and processing unemployment claims. Indeed:
The above issues highlight the need to replace COBOL programs with newer ones written in modern languages. However, understanding these COBOL programs is a huge impediment, for the following reasons:
Unfortunately, COBOL experts who can decipher these programs are in short supply. Estimates suggest that only about two million such programmers remain in the world, about half of whom are retired. These numbers continue to decrease, as colleges have long stopped teaching this language due to the existence of better ones. The few graduating students who know COBOL do not want to work in it for fear of being labelled “blue-collar tech workers”. Speaking to the Tampa Bay Times, one COBOL programmer aptly summarized his experience of a transition from COBOL to Java when he said, “It’s taken them four years, and they’re still not done”. Recently, Reuters reported that when Commonwealth Bank of Australia replaced its core COBOL platform in 2012, it took five years and cost $749.9 million. Finally, there are several solutions in the market for converting COBOL programs to other languages, but they too rely heavily on COBOL programmers, and the cost and time needed to replace COBOL programs with modernized ones remain immense.
Since the cost of replacing COBOL code is around 25 dollars per line [3, 11], the total cost and time of replacing 200 billion lines of code will be about five trillion dollars and 40 million person-years, of which approximately half (2.5 trillion dollars and 20 million person-years) will be spent in deciphering COBOL programs. Fortuitously, since the number of non-COBOL programmers is around 24 million and growing, if these black-box COBOL programs could somehow be decoded, then upgrading them to superior ones might take only a few years. However, since there are only a million active COBOL programmers and their number is dwindling, the task of decoding the 12 million COBOL programs is likely to take at least twenty years, which will be the fundamental bottleneck going forward.
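The arithmetic behind these estimates can be checked with a short sketch. All figures are the ones quoted above. The one added assumption, labeled in the code, is that every active COBOL programmer works full time on decoding, which is why twenty years is best read as a lower bound.

```python
# Figures quoted in the text above.
LINES_OF_COBOL = 200_000_000_000       # lines of COBOL still in use
COST_PER_LINE_USD = 25                 # replacement cost per line
TOTAL_PERSON_YEARS = 40_000_000        # total replacement effort
ACTIVE_COBOL_PROGRAMMERS = 1_000_000   # active COBOL programmers

# Total replacement cost: lines of code times cost per line.
total_cost_usd = LINES_OF_COBOL * COST_PER_LINE_USD

# Roughly half of the cost and effort goes into deciphering the programs.
decoding_cost_usd = total_cost_usd // 2
decoding_person_years = TOTAL_PERSON_YEARS // 2

# ASSUMPTION (for illustration): every active COBOL programmer works
# full time on decoding and the workforce does not shrink.
decoding_years = decoding_person_years / ACTIVE_COBOL_PROGRAMMERS

print(f"Total cost:       ${total_cost_usd:,}")
print(f"Decoding cost:    ${decoding_cost_usd:,}")
print(f"Decoding effort:  {decoding_person_years:,} person-years")
print(f"Decoding horizon: {decoding_years:.0f} years")
```

Because the number of active COBOL programmers is falling rather than constant, the actual horizon would be longer than this simple division suggests.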
As discussed above, replacing COBOL programs with those containing enhanced features (e.g., real-time capability and handling surges of requests) involves the following two tasks:
Evidently, the lack of COBOL programming expertise and the huge expense that it entails are the biggest hurdles in replacing legacy COBOL programs. Fortunately, the following two reasons have enabled us to develop Artificial-Intelligence (AI) based software to help decipher COBOL programs, thereby reducing the conversion time and cost by 75% and cutting the number of COBOL programmers required to just 15% of what would otherwise be needed:
Collatio®-Data Flow Mapping software from Scry Analytics ingests all input and output tables related to a given COBOL program and uses proprietary AI-based algorithms to reverse-engineer the transformations performed by that program, thereby inferring the steps it executes and helping the user create a flow-chart of its inner workings.
Below, we explain how this software works via an example of a legacy COBOL program for approving or rejecting unemployment claims filed by 100,000 people during the ongoing COVID-19 crisis:
Since most COBOL programs spend around 70% of their time on input-output reads and writes and the remaining 30% on calculating formulas and manipulating numbers and strings, Collatio®-Data Flow Mapping software uses the “fingerprint” left in the output tables to reverse-engineer the transformations. It primarily uses the cell values in the input and output tables. More precisely, it determines which columns of the input tables, together with which potential formulas and constants (hard-coded in the legacy program), are used to produce each column of the output table. During the entire process, it seldom uses column names, and since most COBOL programs have little or no documentation, it almost never uses the corresponding ontology (i.e., text-based relationships among various columns).
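The general idea of inferring a transformation purely from cell values can be illustrated with a toy sketch. This is not the proprietary Collatio® algorithm. It is a deliberately simple brute-force search over a few candidate formulas (a column scaled by a hard-coded constant, or the sum or product of two columns). The function name `infer_formula`, the column names, and the sample data are all hypothetical and introduced only for illustration.

```python
from itertools import combinations
from math import isclose


def matches(candidate, target):
    """True if two columns agree on every row (within floating-point tolerance)."""
    return all(isclose(c, t, rel_tol=1e-9, abs_tol=1e-9)
               for c, t in zip(candidate, target))


def infer_formula(inputs, target):
    """Return a description of the first candidate formula that reproduces
    `target` on every row, or None if no candidate fits.

    Only cell values are compared; column names are used purely as labels.
    """
    # Candidate 1: one input column times a hard-coded constant.
    for name, col in inputs.items():
        ratios = [t / x for x, t in zip(col, target) if x != 0]
        if ratios and matches([x * ratios[0] for x in col], target):
            k = ratios[0]
            return name if isclose(k, 1.0) else f"{name} * {k:g}"

    # Candidate 2: sum or product of two input columns.
    for (n1, c1), (n2, c2) in combinations(inputs.items(), 2):
        if matches([a + b for a, b in zip(c1, c2)], target):
            return f"{n1} + {n2}"
        if matches([a * b for a, b in zip(c1, c2)], target):
            return f"{n1} * {n2}"
    return None


# Hypothetical input and output tables for an unemployment-claims program.
inputs = {
    "weekly_wage": [800.0, 500.0, 1200.0, 650.0],
    "weeks_worked": [30.0, 12.0, 45.0, 20.0],
}
outputs = {
    "weekly_benefit": [400.0, 250.0, 600.0, 325.0],
    "total_wages": [24000.0, 6000.0, 54000.0, 13000.0],
}

for out_name, out_col in outputs.items():
    print(out_name, "=", infer_formula(inputs, out_col))
```

A production system would need a far richer space of candidate transformations (conditionals, string manipulation, joins across tables) and tolerance for noisy or missing data. The sketch only shows why input and output values alone can be enough to recover a formula and its hard-coded constants.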
Key features of this decision support software are given below: