Data Lineage
By Dr. Alok Aggarwal
Data comes from various sources, moves through several systems and gets transformed

Problem & Current State

  • Data lineage – it is difficult to trace data’s origins, what happens to it and where it moves over time, and to trace errors back to their root cause
  • Lineage is often represented as time-graphs that show how the data gets transformed along the way, how its representation and parameters change, and how it splits or converges after each hop
  • Data governance plays a key role in metadata management (guidelines, strategies, policies and implementation), which can be incorporated in Collatio®
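
The graph view of lineage described above can be sketched in a few lines. This is an illustrative sketch only, not how Collatio® stores lineage: the class name LineageGraph, its methods and the dataset names are all hypothetical. Each hop is recorded as an edge from a parent dataset to a child dataset, and origins() walks the edges backwards to find the root sources, which is the same walk used to trace an error to its root cause.

```python
from collections import defaultdict


class LineageGraph:
    """Directed graph: an edge parent -> child means child was derived from parent."""

    def __init__(self):
        self.parents = defaultdict(set)  # dataset -> datasets it was derived from
        self.transform = {}              # (parent, child) -> description of the hop

    def add_hop(self, parent, child, description):
        """Record one hop; several hops into the same child model a convergence."""
        self.parents[child].add(parent)
        self.transform[(parent, child)] = description

    def origins(self, dataset):
        """Return the root sources that `dataset` ultimately derives from."""
        seen, stack, roots = set(), [dataset], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            upstream = self.parents.get(node, set())
            if not upstream:
                roots.add(node)  # nothing upstream: this is an original source
            stack.extend(upstream)
        return roots


# Hypothetical example: two sources converge into one report.
graph = LineageGraph()
graph.add_hop("crm_export", "customers_clean", "drop duplicates")
graph.add_hop("billing_db", "invoices_clean", "normalise currency")
graph.add_hop("customers_clean", "revenue_report", "join on customer_id")
graph.add_hop("invoices_clean", "revenue_report", "join on customer_id")
print(sorted(graph.origins("revenue_report")))
```

A real lineage store would also record when each hop ran, which is what turns the plain graph into a time-graph.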


  1. Identify data lineage based on:
  • Rows and columns that did not change, and columns that were deleted or added
  • Columns in the new data that are composites or functions of columns in the old data
  2. System has a large library of transformation rules for various data types and can be configured and trained for specific types of transformations
  3. GUI to define, configure, schedule and execute the transformation rules; dashboards to visualize the overall transformation rules and table- or column-specific transformation results
  4. System generates exceptions and alerts on incremental data based on the transformation rules
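
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: tables are plain dicts of column name to list of values with rows in the same order, the rule library is a hypothetical dict of candidate functions, and a rule "matches" only on exact equality. The function names classify_columns and find_exceptions are invented for this example.

```python
def classify_columns(old, new, rules):
    """Classify each column of `new` relative to `old`.

    old, new: dict of column name -> list of values (same row order assumed).
    rules: dict of rule name -> (input column names, function of those values).
    """
    report = {"unchanged": [], "changed": [], "added": [], "deleted": [], "derived": {}}
    report["deleted"] = [name for name in old if name not in new]
    for name, values in new.items():
        if name in old and old[name] == values:
            report["unchanged"].append(name)
            continue
        # Try each candidate rule: does applying it to old columns reproduce this column?
        for rule_name, (inputs, fn) in rules.items():
            if not all(col in old for col in inputs):
                continue
            expected = [fn(*row) for row in zip(*(old[c] for c in inputs))]
            if expected == values:
                report["derived"][name] = (rule_name, inputs)
                break
        else:
            report["changed" if name in old else "added"].append(name)
    return report


def find_exceptions(increment, derived, rules):
    """Return (row index, column) pairs in new rows that violate an identified rule."""
    exceptions = []
    for column, (rule_name, inputs) in derived.items():
        fn = rules[rule_name][1]
        for i, row in enumerate(increment):
            if row[column] != fn(*(row[c] for c in inputs)):
                exceptions.append((i, column))
    return exceptions


# Hypothetical data and rule library.
rules = {
    "concat_with_space": (("first", "last"), lambda a, b: f"{a} {b}"),
    "product": (("price", "qty"), lambda p, q: p * q),
}
old = {"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"],
       "price": [10.0, 20.0], "qty": [2, 3], "legacy_id": [1, 2]}
new = {"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"],
       "full_name": ["Ada Lovelace", "Alan Turing"],
       "total": [20.0, 60.0], "region": ["EU", "UK"]}
report = classify_columns(old, new, rules)

# Incremental rows: the second row's total does not equal price * qty.
increment = [
    {"first": "Grace", "last": "Hopper", "price": 5.0, "qty": 4,
     "full_name": "Grace Hopper", "total": 20.0},
    {"first": "Edsger", "last": "Dijkstra", "price": 3.0, "qty": 3,
     "full_name": "Edsger Dijkstra", "total": 10.0},
]
alerts = find_exceptions(increment, report["derived"], rules)
```

Here report["derived"] links full_name and total back to the old columns they were computed from, and alerts flags only the second incremental row's total. Exact-equality matching is the simplest possible choice; a trained system would presumably tolerate noise and search a much larger rule space.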


  • 75% annual cost savings
  • 95% reduction in overall processing time
  • >98% accuracy