Data is vitally important to decision-making in any enterprise. But when decision-makers have to wait for IT to build a data pipeline to access it, opportunities are missed and decisions are suboptimal. This is the challenge facing most large enterprises trying to become more data-driven to improve their performance.
One of the biggest data management challenges is that yesterday’s technology cannot support today’s growing demand for data. The ETL approach is decades old, and centralized governance structures that worked in simpler times cannot scale to meet the complexity of the AI age.
Innovative approaches that are more distributed, agile and flexible are starting to come to market. A data mesh strategy is one example.
A data mesh is a modern data integration strategy. It is based on a distributed data architecture that moves away from consolidated and centralized data storage and management to a more shared and federated approach. It is an alternative to the ETL data pipelines and data lakes that are built on monolithic architectures and rely on numerous dependencies.
The data mesh architecture is more than technology; it is a comprehensive strategy that changes contributors’ roles in data management and data consumption. There are four tenets of a data mesh strategy, each described below: domain-driven ownership, federated governance, data as a product, and self-service access. A true data mesh strategy must embody all four.
The data mesh architecture redistributes command and control over data to independent domains. Domains are groups engaged in a particular area of the business, such as a regional operation, a line of business, or a function like sales, marketing, HR or finance.
These domains collect a significant amount of data while carrying on their daily business operations. The domain-driven approach puts responsibility for controlling and managing this data in the hands of those who collected it, not a centralized authority.
With a data mesh, domains have greater autonomy, but they are not free to do whatever they please. In a federated data governance approach, responsibility for data governance is shared between central IT authorities and those at the domain level. IT creates frameworks and policies that apply uniformly across all domains, while each individual domain manages rules that apply only to its own data and business processes.
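The layering of uniform global policies under domain-specific rules can be sketched as follows. This is a minimal illustration, not a reference implementation; the policy names, record fields, and `Domain` class are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class GovernancePolicy:
    name: str
    check: callable  # takes a record dict, returns True if compliant

# Global policies defined once by central IT and applied uniformly
# to every domain (illustrative rules).
GLOBAL_POLICIES = [
    GovernancePolicy("no_missing_id", lambda rec: rec.get("id") is not None),
    GovernancePolicy("email_masked", lambda rec: "@" not in str(rec.get("email", ""))),
]

@dataclass
class Domain:
    name: str
    local_policies: list = field(default_factory=list)

    def validate(self, record: dict) -> list:
        """Return names of violated policies: global rules first, then domain-local ones."""
        violations = []
        for policy in GLOBAL_POLICIES + self.local_policies:
            if not policy.check(record):
                violations.append(policy.name)
        return violations

# The sales domain layers on a rule that applies only to its own records.
sales = Domain("sales", [
    GovernancePolicy("amount_positive", lambda rec: rec.get("amount", 0) > 0),
])

print(sales.validate({"id": 1, "email": "a***", "amount": 50}))  # []
print(sales.validate({"id": None, "amount": -5}))  # ['no_missing_id', 'amount_positive']
```

The key design point: the global list is owned centrally and never edited by domains, while each domain owns only its `local_policies`, mirroring the shared-responsibility split described above.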
When you shift to a data mesh architecture you switch from a project mentality to a product-based approach. Instead of creating an ad-hoc ETL pipeline every time a new set of data is needed, domain teams work to proactively build reusable data products that deliver the data required by decision makers.
To be effective, these products must be discoverable (easy to find in a catalog), addressable (available at a stable, unique location), trustworthy (backed by published quality standards), and self-describing (carrying the metadata consumers need to understand and use them).
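One way to picture these four attributes is as fields on a data product's descriptor. The sketch below is hypothetical; the field names, address scheme, and quality-check format are assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                 # discoverable: findable by name in a catalog
    address: str              # addressable: a stable, unique location
    owner_domain: str         # the domain accountable for the product
    schema: dict              # self-describing: documents its own structure
    quality_checks: list = field(default_factory=list)  # trustworthy: published checks

    def is_trustworthy(self) -> bool:
        # A product is served only while every published quality check passes.
        return all(passed for _, passed in self.quality_checks)

orders = DataProduct(
    name="monthly_orders",
    address="mesh://sales/monthly_orders/v1",  # illustrative address scheme
    owner_domain="sales",
    schema={"order_id": "int", "total": "decimal", "month": "date"},
    quality_checks=[("row_count > 0", True), ("no_null_order_id", True)],
)

print(orders.is_trustworthy())  # True
```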
In a data mesh architecture, data products must be accessible to non-technical data consumers without assistance from technical professionals. This could be through a data product marketplace or through technology that enables direct access to data products from an analytics or modeling tool. One of the biggest pain points the data mesh solves is breaking down the technical barriers between data and those who consume it. Self-service improves the quality and speed of decision making. It also relieves demands on data engineers who are overwhelmed with fulfilling data requests.
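Self-service discovery can be as simple as a keyword search over the product catalog, with no ticket to IT required. The catalog structure and product names below are assumptions for illustration.

```python
# A toy product catalog; in practice this would be served by a
# marketplace or catalog tool rather than an in-memory list.
CATALOG = [
    {"name": "monthly_orders", "domain": "sales", "tags": ["revenue", "orders"]},
    {"name": "campaign_results", "domain": "marketing", "tags": ["ads", "revenue"]},
    {"name": "headcount", "domain": "hr", "tags": ["staffing"]},
]

def discover(keyword: str) -> list:
    """Return the names of data products tagged with the given keyword."""
    return [p["name"] for p in CATALOG if keyword in p["tags"]]

print(discover("revenue"))  # ['monthly_orders', 'campaign_results']
```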
In today’s environment, the demand for data is outstripping the ability of IT operations to deliver it. Organizations know that data-driven decisions lead to better outcomes and performance, yet supplying access to the right data, fit for purpose and trustworthy, is both a technical and a cultural challenge.
In the typical organization, data is collected and stored in silos, whether in a legacy transaction application or a SaaS CRM. Sharing data across these silos is difficult. To meet the demand, knowledgeable programmers must build pipelines to move data between them. These developers must be versed in technologies such as Python, SQL, R and Java to deliver on data requests. Unfortunately, there are simply not enough skilled developers to keep up with demand. In many cases, by the time data requests are fulfilled, the need is no longer there, resulting in missed opportunities. With business decisions already being made at a lightning pace and AI positioned to increase that pace exponentially, this approach will not work in the future.
A data mesh enables people and sophisticated technology to work together so decision makers across the organization can get the data they need when they need it.
From a cultural perspective, a data mesh strategy empowers individuals by providing more ownership and responsibility to steward the data in their domain. This makes them more engaged in ensuring data is accessible and trustworthy. Each stakeholder in the process has a role.
Self-serve capabilities of the data mesh and robust data catalogs enable data analysts to explore and consume the data they need through data products. These analysts are no longer forced to struggle with manual tasks or wait for IT to access data. They can deliver more insights and analysis to decision makers with the skills they already have.
Domain managers, who understand the data they collect better than a central governance authority does, are empowered to steward it. This greater understanding of the context around their data puts them in the best position to manage it and boost its value.
In shifting to a data mesh, IT professionals and data engineers become empowered to enhance the value they provide by delivering more strategic services. Data engineers can spend less time coding ETL processes and work more closely with data product producers to make quality data efficiently accessible. They can advise on domain-level governance rules and enforce quality metrics. Data engineers can also play a larger role in managing the infrastructure that empowers their colleagues.
A data mesh runs on a distributed architecture. Instead of dumping data into a data lake, data remains in the system that collected it. When data is needed, it is pulled from the source instead of being copied into another database for analysis. This means storage costs are reduced and discrepancies across redundant data stores are minimized.
A distributed system is also more scalable, agile and accessible. While the actual data stays in place, the metadata is consolidated into a single database. By separating metadata from the data it describes, data assets can be discovered in a single catalog and data queries can be built independent of the data. This enables several capabilities:
A single query can access data in multiple systems simultaneously using the same data model.
Data also doesn’t need to be moved through a batch process; it can be merged in real time, and changes can be made on the fly.
By separating the data from the logic, dependencies created by endless data pipelines can be reduced, enabling greater scalability.
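The pull-from-source pattern above can be sketched with two in-memory SQLite databases standing in for separate domain systems. The table layouts and the revenue-by-region query are assumptions for illustration; the point is that each source is queried in place and the results are joined at read time rather than copied into a central store.

```python
import sqlite3

# Two "domain" source systems, each keeping its own data (illustrative schemas).
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
sales_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")])

def federated_revenue_by_region() -> dict:
    """Join data from both sources at read time; nothing is copied to a central store."""
    # Pull only what the query needs from each source system.
    totals = dict(sales_db.execute(
        "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"))
    regions = dict(crm_db.execute("SELECT customer_id, region FROM customers"))
    # Merge the two result sets in memory.
    by_region = {}
    for cust, total in totals.items():
        by_region[regions[cust]] = by_region.get(regions[cust], 0) + total
    return by_region

print(federated_revenue_by_region())  # {'EMEA': 100.0, 'APAC': 50.0}
```

Because the sources are queried live, a change in either system is reflected the next time the function runs, without a batch pipeline in between.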
Higher authority does not always mean better security. The professionals collecting data are in a much better position to understand its sensitivity. This puts them in a position to implement smarter data governance policies than a central authority could.
A flexible, framework-based governance hierarchy can also be much more effective in ensuring data is accurate, secure, and accessible. By giving domains the autonomy to work within a broader framework, they can create policies that work best for them while still meeting organizational governance standards. Being closer to the data, they are also in a better position to make changes as threats and demands change.
More autonomy also reduces the tendency for analysts to resort to unsanctioned solutions. If rules are too restrictive or inapplicable to a certain use case, operators will find ways around them. This creates opaque vulnerabilities that can lead to serious security threats.
As technology and systems mature, they typically become more sophisticated, complex and distributed. With less centralized control, data systems can evolve quickly and be more agile and resilient. By entrusting people with data while creating appropriate guardrails to ensure order, data becomes more accessible and useful.