As data producers embraced directed acyclic graphs (DAGs) to represent data pipelines over the last decade, could a similar evolution occur for data consumers with a "metrics graph"?
In recent years, DAGs have become foundational for data producers, offering a structured way to manage data pipelines. These DAGs facilitate the transition of data from raw to processed states, enabling data producers to organize transformation code and intermediate datasets and to manage workflows effectively.
Debugging, fixing pipeline runs, adding or editing data models, running tests, and deploying changes have all become more seamless. But, on the flip side, data consumers, positioned at the receiving end of these pipelines, grapple with an increasingly complex mix of data outputs.
The demand for reliable data across expanding use cases results in a conglomerate of atomic activities, facts, dimensions, and multiple sets of pre-aggregated metrics. Each time a consumer poses the classic "one more question," it may translate into adding another object to the growing data DAG. Assessing the relative importance of these objects and associated pipelines becomes a challenge for data producers.
One reason for this dataset entropy is that data models and DAGs do not explicitly capture the business processes. In other words, they are too “low level” an abstraction.
In addition, this low level of abstraction leaves data consumers at the mercy of analytics engineering to add or edit objects. In practice, this leads to bottlenecks and “shadow pipelines”: entire transformation processes that run outside the core data pipeline so that consumers can meet their needs.
Modeling the business processes explicitly is a promising avenue for progress. We can envision consumers starting with the business model in mind and constructing relationships between metrics and their sub-metrics, as well as between related metrics. This elevates the abstraction from a “data graph” to a “metrics graph”, and metric trees are one manifestation of this concept.
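To make the idea of a metric tree concrete, here is a minimal sketch in Python. The metric names (revenue, orders, sessions, conversion rate, average order value), the multiplicative relationships between them, and the `Metric` class itself are all hypothetical, chosen only for illustration; a real metrics graph would be shaped by the business in question.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Callable, Dict, List, Optional


@dataclass
class Metric:
    """One node in a hypothetical metric tree."""
    name: str
    children: List["Metric"] = field(default_factory=list)
    # How the children's values combine into this metric (None for leaf metrics).
    combine: Optional[Callable[[List[float]], float]] = None

    def value(self, inputs: Dict[str, float]) -> float:
        """Evaluate this metric: leaves read from inputs, parents combine children."""
        if not self.children:
            return inputs[self.name]
        return self.combine([child.value(inputs) for child in self.children])

    def leaves(self) -> List[str]:
        """Names of the input metrics this metric ultimately depends on."""
        if not self.children:
            return [self.name]
        return [name for child in self.children for name in child.leaves()]


# Assumed relationships, for illustration only:
#   orders  = sessions * conversion_rate
#   revenue = orders * average_order_value
orders = Metric("orders", [Metric("sessions"), Metric("conversion_rate")], prod)
revenue = Metric("revenue", [orders, Metric("average_order_value")], prod)

# Made-up input values for the leaf metrics.
inputs = {"sessions": 1000, "conversion_rate": 0.25, "average_order_value": 40.0}

print(revenue.value(inputs))  # 10000.0
print(revenue.leaves())       # ['sessions', 'conversion_rate', 'average_order_value']
```

The point of the sketch is that the consumer declares how metrics relate to one another, and the leaf metrics then indicate which inputs the backend data graph needs to supply.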
Tools that natively understand metric trees can translate consumer requests into the data models needed to serve them. Common workflows such as root-cause analysis of metric changes, experimentation, and strategic planning and operations can be redesigned on top of these metric trees.
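As a rough illustration of what root-cause analysis on a metric tree might look like, the sketch below swaps in one input at a time and measures how the top-level metric moves. This one-at-a-time substitution is just one simple technique, not a method prescribed here, and the formula, metric names, and numbers are all made up for the example.

```python
from typing import Callable, Dict

Inputs = Dict[str, float]


def revenue_formula(inputs: Inputs) -> float:
    # Hypothetical flattened tree: revenue = sessions * conversion_rate * average_order_value
    return inputs["sessions"] * inputs["conversion_rate"] * inputs["average_order_value"]


def one_at_a_time_impact(
    formula: Callable[[Inputs], float], before: Inputs, after: Inputs
) -> Dict[str, float]:
    """Change in the top-level metric when a single input moves to its new value
    while every other input stays at its old value."""
    baseline = formula(before)
    impacts = {}
    for name in before:
        trial = dict(before)
        trial[name] = after[name]
        impacts[name] = formula(trial) - baseline
    return impacts


# Made-up values for two periods.
before = {"sessions": 1000, "conversion_rate": 0.25, "average_order_value": 40.0}
after = {"sessions": 1200, "conversion_rate": 0.125, "average_order_value": 40.0}

impacts = one_at_a_time_impact(revenue_formula, before, after)
total_change = revenue_formula(after) - revenue_formula(before)

print(impacts)       # {'sessions': 2000.0, 'conversion_rate': -5000.0, 'average_order_value': 0.0}
print(total_change)  # -4000.0
```

In this made-up case the drop in conversion rate dominates the decline even though sessions grew. Note that the individual impacts do not add up to the total change, because inputs that move together interact; a production tool would need a more careful attribution scheme.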
This way of working not only puts the consumer in the driver’s seat but also rationalizes the data producer’s work, because the mapping between metric use cases and backend data graphs and objects becomes more evident.
The concept of the metrics graph holds significant promise, potentially evolving into a foundational unit for data-driven business strategy and operations. The next decade may witness the maturation of the "metrics graph" and its intricate interplay with the "data DAG."