Data transformation is not solely an upstream engineering function. As use cases proliferate, every analyst and end consumer needs to transform data flexibly to extract value.
However, there are different types of transformations in the overall process, akin to an industrial factory with specialized steps and roles.
- The first stage in the data factory is squarely in the hands of data engineers who collect the raw assets. Their goal is to ensure the raw feeds match what is expected. This step can include lightweight entity-based organization, and standardization of time modeling.
- The second stage is in the hands of a data or analytics engineer - here, they establish the base calculations or measurements of interest, map the key entities involved, and associate core attributes. The contours of the business processes emerge at the end of this step.
- The third stage is in the hands of either analytics engineers or analyst consumers, and deciding who owns it is a core ongoing challenge in organizations. In this step, meaningful business logic is infused, and data is frequently joined, filtered, aggregated, and pivoted to fully flesh out the business processes and generate outputs for various use cases.
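
The three stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the record fields (`order_id`, `region`, `amount`, `ts`), the function names, and the `min_amount` threshold are all hypothetical and do not come from any particular data platform.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw feed; the field names are assumptions for illustration.
RAW = [
    {"order_id": "1", "region": "east", "amount": "120.0", "ts": "2024-01-05T10:00:00+00:00"},
    {"order_id": "2", "region": "west", "amount": "80.5", "ts": "2024-01-05T12:30:00+00:00"},
    {"order_id": "3", "region": "east", "amount": "40.0", "ts": "2024-01-06T09:15:00+00:00"},
    {"order_id": "4", "region": "west", "amount": None, "ts": "2024-01-06T11:00:00+00:00"},
]

REQUIRED = ("order_id", "region", "amount", "ts")


def stage1_validate(rows):
    """Stage 1: keep rows that match the expected shape and standardize time to UTC."""
    clean = []
    for row in rows:
        if any(row.get(key) is None for key in REQUIRED):
            continue  # the raw record does not match what is expected
        ts = datetime.fromisoformat(row["ts"]).astimezone(timezone.utc)
        clean.append({**row, "ts": ts})
    return clean


def stage2_measure(rows):
    """Stage 2: establish the base measurement, the key entity, and core attributes."""
    return [
        {
            "order_id": int(r["order_id"]),    # key entity
            "region": r["region"],             # core attribute
            "amount": float(r["amount"]),      # base measurement
            "day": r["ts"].date().isoformat(),
        }
        for r in rows
    ]


def stage3_business_logic(rows, min_amount=50.0):
    """Stage 3: filter by a business rule, aggregate, and pivot region by day."""
    pivot = defaultdict(lambda: defaultdict(float))
    for r in rows:
        if r["amount"] >= min_amount:          # hypothetical "large order" rule
            pivot[r["region"]][r["day"]] += r["amount"]
    return {region: dict(days) for region, days in pivot.items()}


result = stage3_business_logic(stage2_measure(stage1_validate(RAW)))
```

In this sketch, the first two functions change rarely, while `stage3_business_logic` is the part each consumer would want to vary (a different threshold, grouping, or pivot), which is where the ownership question discussed next arises.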
In the third stage, some organizations lean on analytics engineers to pre-compute heavily inside the data platform, but these engineers are surprised at how much ad-hoc code and how many shadow pipelines end up being built to extract deeper insights and actionable data. A second-order effect is that consumers feel perpetually under-served.
Other organizations empower their end consumers to twist and turn data with sundry code and no-code tools. However, the resulting rapid growth of chaotic transformations leads to inconsistencies, which frustrates the upstream analytics engineers.
In either scenario, the analysts, who sit in the middle between the engineers and end consumers, struggle. In the former, they become a service desk writing a lot of ad-hoc code, and in the latter, they spend their time playing reconciliation cops across teams and tools.
Where do we go from here? The key is realizing that use cases today demand fast and flexible data outputs. A data-infused culture depends on this flexibility, but it must not come at the cost of consistency.
We must also realize that we have to meet consumers where they are in their operational workflows. To pretend that analytics engineers can anticipate and service all the needs of the consumer is folly.
So, if analytics engineers can shape the building blocks on the data platform with rich metadata, the next generation of consumption tools needs to push the frontier of what’s possible by democratizing flexible yet reliable consumption for data analysts, product managers, FP&A experts, logistics specialists, rev ops experts, and so on. This will enable end consumers to get what they need when they need it.
We are excited to be working on this and pushing this frontier forward.