Intelligent Automation of Invoice

Dr. Alok Aggarwal
CEO and Chief Data Scientist
Scry Analytics, California, USA
Office: +1 408 872 1078; Mobile: +1 914 980 4717
October 26, 2020

Executive Summary

Around 13 billion invoices were sent in the United States in 2019 from businesses to each other and to government agencies. To pay these invoices, firms relied primarily on manual labor to extract and reconcile the relevant fields, which cost them around 2% of invoice value and 25-30 minutes of accountants’ time per invoice. Despite this cost, most firms were unable to analyze these invoices in detail, and some ended up overpaying their suppliers or continued to pay for products and services they no longer used. This article discusses AI-enabled software from Scry Analytics, Collatio® - Invoice Reconciliation, which extracts and reconciles relevant data from invoices, purchase orders, and related agreements with more than 93% accuracy, and provides insights and decision support for reducing overpayments.


According to industry estimates, around 13 billion invoices were sent in the United States in 2019 from businesses to each other and to government agencies [1,2,3]. Since the corresponding sales were approximately ten trillion dollars [4,5], the average invoice value was around $770. To pay their suppliers, almost all accounts payable groups follow this process:

  1. Extract relevant information from invoices: Approximately 35% of invoices arrive in paper, fax, or scanned formats, 55% arrive as machine-readable PDFs, and 10% arrive in EDI/XML or other e-invoice formats. Data from the first two formats is either typed manually into accounts payable systems or entered after the invoices have been converted into an electronic format via Optical Character Recognition (OCR) [6,7]. Such manual extraction of relevant information is error prone.

  2. Reconcile extracted information: Information extracted from invoices is manually reconciled with other invoices, purchase orders (POs), Master Services Agreements (MSAs), or statements of work (SoWs). Such reconciliation is also error prone and time consuming; it takes another 12-15 minutes per invoice and costs an additional seven dollars. Moreover, processing errors in this step often cause delays that lead to lost early payment discounts, late payment penalties, repeated supplier inquiries, and strained supplier relationships.

  3. Pay the vendor via check, ACH, wire-transfer, commercial cards, or other means.

Salient drawbacks related to manual processing of invoices

In addition to the above-mentioned disadvantages, given below are three drawbacks of this process:

  1. Duplicate payments: Surveys reveal that the duplicate payment rate ranges between 0.1% and 1% [8]. To mitigate this issue, 64% of all firms try to detect duplicate invoices manually, and some even use third parties [9]. Third-party recovery firms typically charge a contingent fee of one-third of the recovered amount. Clients find such fees exorbitant, especially for large invoices, whereas recovery firms are reluctant to recover money for smaller ones. Besides, third-party firms usually do not identify the underlying causes of duplicate payments, which limits clients' ability to make process improvements.

  2. Overbilling and over-payments: Overbilling usually occurs because of double billing, padding hours, charging higher rates while providing lower-level or less experienced professionals, charging for on-the-job training, and charging excessively for trivial tasks (e.g., photocopying), travel expenses, or other overheads. To avoid over-payments, around 65% of invoices are reconciled with associated POs, and some of the remainder with MSAs and SoWs [10]. Since many invoices from law firms, consulting firms, accounting firms, and information technology firms do not have any associated POs, and since reconciling with MSAs or SoWs is grueling, the probability of over-paying such invoices is higher.

  3. Money slippage due to inadequate analysis: Since it is arduous to analyze lengthy invoices, most finance departments are unable to control potential leakage that occurs because:

    • When firms down-size, they often forget to right-size the corresponding services or products required by their smaller teams.

    • Organizations often execute “special projects” that require them to buy additional products and services, but they forget to discontinue them after their completion.

    • Firms are unable to benchmark and compare vendors with similar products or services since this involves analyzing many invoices concurrently at the item level.

    • Companies are unable to reconcile invoices properly to determine missing volume or early payment discounts, even though such terms were included in their MSAs or SoWs.

All direct and indirect costs mentioned above often add up to around 4% of the total invoice value, and saving even half of these costs is likely to improve profit margins for most firms. In fact, organizations that spend one-quarter to one-half of their revenue on external suppliers can see their profit margins increase by 5% to 10%. Also, speedier handling of these invoices will help improve the financial health of suppliers, thereby cementing better relationships with them [11].
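The arithmetic behind these figures can be sketched as follows. This is a back-of-envelope illustration only: the invoice count, sales value, 4% cost share, and "half saved" figure come from the text above, while the baseline profit margin of 10% of revenue is a hypothetical assumption introduced here to show how a 5% to 10% relative improvement could arise.

```python
# Back-of-envelope check of the figures quoted in the text.
# ASSUMPTION (not from the article): a baseline profit margin of 10% of revenue.

INVOICES_PER_YEAR = 13e9      # invoices sent in the US in 2019 (from the text)
TOTAL_INVOICE_VALUE = 10e12   # approximate corresponding sales in dollars (from the text)
COST_SHARE = 0.04             # direct + indirect costs as a share of invoice value
SAVED_FRACTION = 0.5          # "saving even half of these costs"
BASELINE_MARGIN = 0.10        # hypothetical profit margin, as a share of revenue


def average_invoice_value():
    """Average dollar value of one invoice."""
    return TOTAL_INVOICE_VALUE / INVOICES_PER_YEAR


def relative_margin_gain(supplier_spend_share, baseline_margin=BASELINE_MARGIN):
    """Relative increase in profit margin when half of the processing
    costs on supplier spend are saved.

    supplier_spend_share: supplier spend as a fraction of revenue.
    """
    savings_share_of_revenue = supplier_spend_share * COST_SHARE * SAVED_FRACTION
    return savings_share_of_revenue / baseline_margin


if __name__ == "__main__":
    print(f"average invoice value: ${average_invoice_value():.0f}")
    for share in (0.25, 0.50):
        gain = relative_margin_gain(share)
        print(f"supplier spend {share:.0%} of revenue -> margin up {gain:.0%}")
```

Under the assumed 10% baseline margin, supplier spend of 25% and 50% of revenue yields relative margin gains of about 5% and 10%. A different baseline margin would change these percentages proportionally.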

Artificial Intelligence to the rescue

Collatio® - Invoice Reconciliation software from Scry Analytics uses proprietary AI-based algorithms to automate the invoice reconciliation process, thereby reducing the drudgery of manual reconciliation while keeping payment errors to a minimum. It consists of four software modules that are discussed below:

  1. Automated digitization of each invoice; extraction and harmonization of relevant information
    This AI-based software module executes the following steps that are depicted in the diagram given below:
    1. Upload: An analyst uploads invoices either in batch mode or one by one; alternatively, invoices are ingested via an API (“Document Express”) or via email.

    2. Data Extraction: If an invoice is in a scanned, fax, JPEG, or TIFF format, this module uses a deep learning network (NN-OCR) that is specifically trained to convert it into digitized data. If the invoice is in another machine-readable format that provides higher resolution, the module uses other proprietary AI-based conversion modules to improve the accuracy of the electronic output.

    3. Attributes Extraction: It uses proprietary AI-based and graph algorithms, together with knowledge bases such as an ontology and external data feeds, to extract more than 100 attributes or entities from invoices and related agreements with more than 93% accuracy. It also extracts attributes for each line item from the invoice item table. The in-built ontology and pre-trained algorithms not only help in extracting relevant information from proforma, interim, past due, recurring, and final invoices, but also provide superior results for multinational invoices and special legend invoices (e.g., law-firm invoices and tax invoices).

    4. Enhanced Due Diligence: Using the in-built ontology and graph-based algorithms, it then determines the connections among the extracted entities and reconciles them. For example, it “reverse engineers” the tax rate from the extracted amounts and verifies that the unit price multiplied by the number of units equals the total item amount. Similarly, it uses external data enrichment to reconcile the information related to the vendor and its financial institution. By doing so, it detects incorrect and potentially fraudulent invoices with the wrong price, wrong quantity, missing tax amount, missing tax identification number, no purchase order or other contract, incorrect net amount, and more.

    5. Attributes Verification and Reinforcement Learning: Finally, using in-built APIs and a user interface, it provides various extracted attributes and the item table to the analyst for review and modifications, thereby achieving 99%+ accuracy quickly, and it uses the changes made by the analyst for improving its accuracy going forward.
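The arithmetic checks mentioned under Enhanced Due Diligence can be illustrated with a minimal sketch. It is not the product's ontology- or graph-based method; it is a simple rule-based stand-in, and the names `LineItem`, `implied_tax_rate`, and `check_invoice`, as well as the tolerances, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    unit_price: Decimal
    quantity: Decimal
    amount: Decimal  # total stated on the invoice for this line


def implied_tax_rate(net_amount, tax_amount):
    """'Reverse engineer' the tax rate from the stated net and tax amounts."""
    if net_amount == 0:
        raise ValueError("net amount must be non-zero")
    return tax_amount / net_amount


def check_invoice(items, net_amount, tax_amount, total_amount,
                  expected_tax_rate, tolerance=Decimal("0.01")):
    """Return a list of alerts for arithmetic inconsistencies.

    tolerance is a hypothetical rounding allowance in currency units.
    """
    alerts = []
    for item in items:
        # unit price x quantity should equal the stated line amount
        if abs(item.unit_price * item.quantity - item.amount) > tolerance:
            alerts.append(f"line item mismatch: {item.description}")
    # the line amounts should add up to the stated net amount
    if abs(sum((i.amount for i in items), Decimal("0")) - net_amount) > tolerance:
        alerts.append("incorrect net amount")
    if tax_amount is None:
        alerts.append("missing tax amount")
    else:
        if abs(net_amount + tax_amount - total_amount) > tolerance:
            alerts.append("incorrect total amount")
        rate = implied_tax_rate(net_amount, tax_amount)
        if abs(rate - expected_tax_rate) > Decimal("0.0005"):
            alerts.append("unexpected tax rate")
    return alerts


if __name__ == "__main__":
    items = [
        LineItem("Consulting", Decimal("150.00"), Decimal("10"), Decimal("1500.00")),
        LineItem("Travel", Decimal("200.00"), Decimal("2"), Decimal("450.00")),
    ]
    print(check_invoice(items, Decimal("1950.00"), Decimal("156.00"),
                        Decimal("2106.00"), Decimal("0.08")))
```

In this made-up invoice the travel line states 450.00 although 200.00 x 2 is 400.00, so only that line is flagged. The totals and the implied 8% tax rate are internally consistent with the stated amounts.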

  2. Time-series analytics of all invoice-related information
    For each vendor of a payor firm, this module creates a chronological sequence of all invoices that were provided to this firm. Next, using an AI-based algorithm, it determines duplicate invoices as well as invoices that may not be exact duplicates but contain duplicates at the invoice-item level, e.g., an unpaid invoice from the previous month may be included as an item in the new invoice (rather than being mentioned as an unpaid amount from the past). Also, for each invoice item, it computes and alerts the analyst whether the cost of that item was higher than the threshold set by the firm. Indeed, the firm can set additional thresholds that provide insights as to which invoice items constitute hefty expenses.
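The three checks described in this module (exact duplicates, a prior invoice reappearing as a line item, and per-item thresholds) can be sketched with simple rules. This is an illustrative stand-in under stated assumptions, not the AI-based algorithm the product uses: the `Invoice` structure, the alert names, and the matching rule (equal amounts in integer cents) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Invoice:
    vendor: str
    number: str
    issued: date
    total_cents: int
    item_cents: tuple  # amounts of the individual line items, in cents


def analyze_invoices(invoices, item_threshold_cents):
    """Scan each vendor's invoices chronologically and return alerts."""
    by_vendor = defaultdict(list)
    for inv in sorted(invoices, key=lambda i: i.issued):
        by_vendor[inv.vendor].append(inv)

    alerts = []
    for history in by_vendor.values():
        seen = {}            # (total, items) -> number of the earliest such invoice
        earlier_totals = {}  # invoice total -> number of the earliest such invoice
        for inv in history:
            key = (inv.total_cents, inv.item_cents)
            if key in seen:
                alerts.append(("duplicate_invoice", inv.number, seen[key]))
            for amount in inv.item_cents:
                # a line item equal to an earlier invoice total may be that
                # earlier invoice billed again as an item
                if amount in earlier_totals:
                    alerts.append(("item_repeats_invoice", inv.number,
                                   earlier_totals[amount]))
                if amount > item_threshold_cents:
                    alerts.append(("item_over_threshold", inv.number, amount))
            seen.setdefault(key, inv.number)
            earlier_totals.setdefault(inv.total_cents, inv.number)
    return alerts


if __name__ == "__main__":
    sample = [
        Invoice("Acme", "A-1", date(2020, 1, 5), 100_000, (100_000,)),
        Invoice("Acme", "A-2", date(2020, 2, 5), 220_000, (120_000, 100_000)),
        Invoice("Acme", "A-3", date(2020, 2, 20), 220_000, (120_000, 100_000)),
    ]
    for alert in analyze_invoices(sample, item_threshold_cents=150_000):
        print(alert)
```

Amounts are kept in integer cents so that equality comparisons are exact. A production system would need fuzzier matching (for example, tolerance for rounding or for partially repeated invoices) than this sketch attempts.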

  3. Automated digitization, extraction and harmonization of relevant data from POs, MSAs and SoWs
    This module is similar to the first module described above; it extracts and harmonizes relevant data after digitizing the appropriate POs, MSAs, and SoWs. However, in addition to extracting structured data, it also extracts the clauses that matter for checking the accuracy of an invoice, and it stores harmonized data in a structured format and clauses in a textual format.

  4. Reconciliation of extracted information from an invoice with other invoices and other agreements
    This module first compares entities and values in various invoices with those in the corresponding POs and alerts the user if it finds any discrepancy. Similarly, for MSAs and SoWs, it uses proprietary Natural Language Processing algorithms to determine discrepancies between invoices and the clauses of MSAs or SoWs, and provides analysts with the amounts that do not comply with the MSAs or SoWs and need to be adjusted.

[Figure: workflow diagram. Recoverable labels: Upload documents manually or automatically; Integration Layer; Data; Attributes; Graph Generation; Enhanced Due Diligence; Reconciliation; Attributes Verification & Reinforcement learning; System; Analyst reviews only fields & entities alerted by software.]
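The PO comparison in the fourth module can be sketched as a line-by-line check that reports the amount to be adjusted. This is a minimal rule-based illustration, not the product's implementation: the function name `reconcile_with_po`, the item codes, and the representation of each document as a mapping from item code to (unit price, quantity) are assumptions made for the example. Clause-level reconciliation against MSAs or SoWs, which the text says relies on Natural Language Processing, is out of scope here.

```python
from decimal import Decimal


def reconcile_with_po(invoice_lines, po_lines):
    """Compare invoice lines with PO lines.

    Both arguments map an item code to (unit_price, quantity).
    Returns a list of (item_code, reason, adjustment), where adjustment is
    the billed amount in excess of what the PO allows.
    """
    alerts = []
    for code, (price, qty) in invoice_lines.items():
        billed = price * qty
        if code not in po_lines:
            # nothing on the PO authorizes this line
            alerts.append((code, "not on PO", billed))
            continue
        po_price, po_qty = po_lines[code]
        reasons = []
        if price > po_price:
            reasons.append("price above PO")
        if qty > po_qty:
            reasons.append("quantity above PO")
        if reasons:
            # amount the PO allows: agreed price for at most the agreed quantity
            allowed = min(price, po_price) * min(qty, po_qty)
            alerts.append((code, ", ".join(reasons), billed - allowed))
    return alerts


if __name__ == "__main__":
    invoice = {
        "LAPTOP-15": (Decimal("1250.00"), 10),
        "DOCK-01": (Decimal("180.00"), 12),
        "BAG-02": (Decimal("45.00"), 10),
    }
    purchase_order = {
        "LAPTOP-15": (Decimal("1200.00"), 10),
        "DOCK-01": (Decimal("180.00"), 10),
    }
    for code, reason, adjustment in reconcile_with_po(invoice, purchase_order):
        print(f"{code}: {reason}; adjust by {adjustment}")
```

With these made-up figures, the laptop line is overpriced by 500.00, the dock line bills two units beyond the PO (360.00), and the bag line (450.00) has no PO entry at all.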

Finally, since more than 80% of all invoices are received by small and medium-sized businesses (SMBs), which have limited funding for capital expenses [6], Collatio® - Invoice Reconciliation software is sold in SaaS (software as a service) mode but can also be installed behind a client's information technology firewall.

About the Author

Dr. Alok Aggarwal received his PhD in Electrical Engineering and Computer Science from Johns Hopkins University and worked at IBM Watson Research Center from 1984 to 2000. During 1989-90, he taught at MIT and advised two PhD students, and during 1998-2000, he founded IBM India Research Lab and grew it to 60 researchers. He co-founded Evalueserve in 2000 and was its chairman until 2013; this company provides research and analytics services worldwide and has 3,500 employees. In 2014, Dr. Aggarwal founded Scry Analytics to develop AI-based enterprise applications and a comprehensive platform that enable clients to re-think and automate their data-driven and manually intensive business operations.

Contact: Office: +1 408 872 1078; Mobile: +1 914 980 4717

References:

  1. Federal Reserve Bank of Minneapolis, “U.S. Adoption of Electronic Invoicing: Challenges and Opportunities,” See footnote number 3 on page 3 of this white paper, June 2016.

  2. PayStream Advisors, “2014 Global E-Invoicing Report.” 2014.

  3. Koch, Bruno. “Implementing E-Invoicing on a Broad Scale.” Consultancy Services on behalf of Australian Tax Office. Billentis, July 16, 2015.

  4. Wisconsin Procurement Institute, A Procurement Assistance Technical Center “Can I Really Sell to the Government.” 2019.

  5. Forrester Research, Inc. “Mapping the $9 Trillion US B2B Online Commerce Market.” 2018.

  6. PayStream Advisors, “2018 Payables Insight Report.” 2018.

  7. iPayables, Inc. “Why Automation Matters,” 2016.

  8. Ardent Partners Ltd., “ePayables 2013: AP’s New Dawn.” See also Institute of Finance and Management, “2013: AP Department Benchmark and Analysis,” 2013.

  9. Based on data published in KPMG, “Low Tech Approach to High Risk Challenges: KPMG Pulse Survey results from the 2012 RSA Archer GRC Summit.” June 2012.

  10. Ardent Partners Ltd., “The State of ePayables 2017: The Convergence of Cash, Suppliers and Intelligence.”