It’s a debate that’s been raging for years: should you invest in a consolidated, single-vendor technology suite or should you adopt a best-of-breed approach? As software markets evolve, this question inevitably comes up, with the pendulum swinging from one extreme to the other. And because the DataOps tooling market is still in its early stages, that’s exactly what is happening there right now.
The real DataOps question
But when it comes to data tooling, I think debating single vendor vs. best-of-breed misses the point. Think about it. At the heart of your DataOps ecosystem is data. And that data lives in a myriad of systems and tools spread across your organization. Inevitably, the data in these systems is dirty. That’s reality. But if your data is dirty – and you don’t address it – do your DataOps tools even matter?
I would argue that they do not. Here’s why. Time and time again, I’ve seen leading organizations spend time, money, and resources on the implementation of data transformation, data movement, and data governance systems. They implement data catalogs to make the data easier for business users to find. And they roll out self-service analytics to give data access to the masses. But because the data that is running through these systems is dirty, their output will be dirty, too.
Here’s an analogy to help put a finer point on the topic. Think about the plumbing in your house. From the pipes to the faucets, you think a lot about the quality of the materials, the aesthetic of each part, and how well they move water from one part of the house to another. But if the water that runs through your plumbing is murky, brown, and polluted, does the plumbing even matter? After all, nobody wants to turn on a beautiful new faucet only to see gluggy brown water come out.
Addressing the dirty data issue
So, if having clean data is priority number one, how do you fix the dirty data running rampant across your organization? You can attempt to fix it at the point of entry, but that’s really difficult because the data is always messy and changing. Instead, I believe that you need to address data quality at the point of consumption through next-gen master data management (MDM).
Next-gen MDM or “data mastering” adds an entity or business topic layer between your back-end source systems and your front-end self-service tools. Using machine learning, data mastering lets you view messy data and translate it into the business topic areas that matter to your users. Using contextually relevant definitions, users will have visibility into data related to customers, suppliers, parts, products, and more.
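To make the idea of an entity layer a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular data mastering product works: the records, the `normalize` and `master_records` helpers, and the 0.85 threshold are all hypothetical, and a simple string-similarity score from the standard library stands in for the machine learning model a real tool would use.

```python
from difflib import SequenceMatcher

# Hypothetical customer records from two back-end source systems.
SOURCE_RECORDS = [
    {"system": "crm", "id": "c-1", "name": "Acme Corp."},
    {"system": "erp", "id": "e-7", "name": "ACME Corporation"},
    {"system": "crm", "id": "c-2", "name": "Globex Inc"},
    {"system": "erp", "id": "e-9", "name": "Globex, Inc."},
    {"system": "erp", "id": "e-3", "name": "Initech"},
]

# Legal suffixes that carry no identifying information (illustrative list).
SUFFIXES = {"corp", "corporation", "inc", "llc", "ltd"}


def normalize(name):
    """Lowercase the name, drop punctuation, and remove legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(word for word in cleaned.split() if word not in SUFFIXES)


def similarity(a, b):
    """Score two names between 0 and 1 (a stand-in for a trained matching model)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def master_records(records, threshold=0.85):
    """Group source records into entities, one per real-world customer."""
    entities = []
    for record in records:
        for entity in entities:
            if similarity(record["name"], entity["name"]) >= threshold:
                # Same customer seen in another system: link it to the entity.
                entity["sources"].append((record["system"], record["id"]))
                break
        else:
            # No existing entity matched, so this record starts a new one.
            entities.append({
                "name": record["name"],
                "sources": [(record["system"], record["id"])],
            })
    return entities


CUSTOMERS = master_records(SOURCE_RECORDS)
for customer in CUSTOMERS:
    print(customer["name"], customer["sources"])
```

The source systems are left untouched here. The cleanup happens in the layer that front-end tools read from, which is the sense in which quality is addressed at the point of consumption.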
Finally, data mastering
Ultimately, I believe that best-of-breed is the right approach when building your DataOps ecosystem. And I would go one step further and argue that while all tools play an important role, data mastering is the most important tool in your modern DataOps ecosystem. And the best data mastering tools are the ones that are cloud-based and machine learning-driven.
By fixing the data, you elevate the value of the tooling in your DataOps ecosystem, allowing you to focus on what matters most: enabling better decision making across the organization.