I have a cherished stuffed animal that I received from my grandmother, who sadly passed away at an early age. I keep it safely tucked away in one of my cabinets, away from curious paws, to protect it; it is something I take great pride in.
DataOps is the data equivalent of cleaning and caring for my cherished stuffed animal. Unfortunately, many data teams do not give this process the attention it deserves.
By implementing DataOps, we can finally give data the love and care it needs.
What Is DataOps?
DataOps stands for data operations: the practice of using agile techniques and software engineering best practices to optimize and streamline data-driven processes. It includes strategies such as automation, continuous integration, and continuous delivery to increase efficiency and productivity in data-related tasks.
DataOps is the process of managing data products, an area where many companies need the most help. You might think that this sounds a lot like DevOps. I would agree with you, and that is exactly the problem: most definitions of DataOps floating around forget about the data moving between each process. Most importantly, we neglect to monitor that data.
We do not know the answers to questions such as:
Is the data the same in each process?
Are we experiencing data drift?
Where are we using the data?
What processes should be in place to ensure data is managed efficiently?
Are there any reporting requirements that need to be fulfilled?
And many more.
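The first two questions, whether the data is the same in each process and whether it is drifting, are ones we can start checking automatically. The following is a minimal sketch, assuming each pipeline stage can hand us its rows as Python tuples and that there is one numeric column worth watching. The helpers `fingerprint` and `mean_shift` are hypothetical names for illustration, and the three-standard-deviation threshold is an arbitrary assumption rather than a standard.

```python
import hashlib
import statistics
from typing import Iterable, Sequence, Tuple


def fingerprint(rows: Iterable[tuple]) -> Tuple[int, str]:
    """Return (row count, order-independent digest) for a batch of rows.

    Hypothetical helper: each row is hashed with SHA-256 and the hashes are
    summed modulo 2**64, so the same rows in a different order give the same
    digest, while a dropped, added, or altered row changes it.
    """
    count = 0
    total = 0
    for row in rows:
        count += 1
        row_hash = hashlib.sha256(repr(row).encode("utf-8")).digest()
        total = (total + int.from_bytes(row_hash[:8], "big")) % 2**64
    return count, f"{total:016x}"


def mean_shift(baseline: Sequence[float], current: Sequence[float]) -> float:
    """How many baseline standard deviations the current mean has moved.

    A deliberately simple, hypothetical drift signal; baseline needs >= 2 values.
    """
    base_mean = statistics.fmean(baseline)
    base_sd = statistics.stdev(baseline)
    delta = abs(statistics.fmean(current) - base_mean)
    if base_sd == 0:
        # A constant baseline: any movement at all counts as drift.
        return 0.0 if delta == 0 else float("inf")
    return delta / base_sd


DRIFT_THRESHOLD = 3.0  # assumed cut-off in standard deviations; tune per dataset

if __name__ == "__main__":
    extracted = [(1, "alice", 30.0), (2, "bob", 42.5), (3, "carol", 38.0)]
    loaded = [(3, "carol", 38.0), (1, "alice", 30.0), (2, "bob", 42.5)]
    # Is the data the same in each process? Same rows in a different order match.
    print("same data:", fingerprint(extracted) == fingerprint(loaded))

    # Are we experiencing data drift? Compare a numeric column to a baseline.
    baseline_amounts = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0]
    todays_amounts = [20.0, 22.0, 21.0]
    shift = mean_shift(baseline_amounts, todays_amounts)
    print(f"mean shift: {shift:.1f} sd, drift: {shift > DRIFT_THRESHOLD}")
```

Comparing fingerprints only works if every stage serializes rows the same way (for example, a database returning decimals while a file returns floats would produce different digests for the same values), so a real check would first normalize types. The mean-shift test is the simplest possible drift signal; dedicated statistical tests or data-quality tools would be the next step.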
No single person, or even a handful of people, can answer all of these questions; it takes a team, and more likely several teams. Up-skilling data engineers to handle CI/CD and software engineering best practices is better than teaching DevOps engineers data best practices: it's easier to teach a data engineer software engineering (a hard skill) than to teach a software engineer to think like a data person (a soft skill). And there is a whole world of practices that software engineers are well versed in and that we, as data engineers, are not yet aware of.
Why is it, then, that software is treated as a product, nurtured, protected, and watered, while data is hidden away in the attic without much water or sunlight, yet expected to deliver the same return on investment?
Creating a Data Product
Building a data product has more benefits than most people realize. Let's walk through a few of them:
Purpose: we know what we are building and what the vision is.
Teamwork: we are part of one team with a product we can be proud of, instead of a loose combination of teams (data analysts, engineers, or scientists) trying to get requests in and out.
Ownership and responsibility: we know where our responsibilities start and end.
Service of others: we make sure our customers are happy with what we provide.
Return on investment: ROI is defined for the product as a whole, instead of for the individual folks within the data teams.
Prioritization of data
But most importantly, we will be able to define things in data products that otherwise would not be defined at all. The more we can define within the data realm, the more likely we are to see data practices adopted, not just within data teams but across the organization as a whole.
We care about whether data practices are adopted across the organization.
Compliance
This is probably the most obvious practice we want the whole company to abide by. Let's take the General Data Protection Regulation (GDPR) as an example. GDPR is a regulation in EU law on data protection and privacy for all individuals within the European Union (EU) and the European Economic Area (EEA). If an organization does not follow GDPR, it can face significant consequences, including:
Financial penalties: Non-compliance with GDPR can result in substantial fines.
Reputational damage: Organizations that do not follow GDPR can suffer from negative publicity, damaging their reputation and brand.
Legal liability: Organizations that violate GDPR can face legal action from individuals whose personal data has been mishandled or misused.
Loss of trust: Organizations that do not follow GDPR can lose the trust of customers, employees, and partners, which can have a significant impact on business.
Disruptions to operations: Organizations that do not follow GDPR may face disruptions to their operations, including the fallout from data breaches and the need to stop processing personal data.
Bottom line: heavy fines, lawsuits, and, in the worst case, the shutdown of the company.
Answering with Data
Many business cultures still rely heavily on intuition or gut feeling when answering questions. By creating a data product, we emphasize the importance of using data to inform our decisions. Using data alongside intuition gives us a more rigorous, objective, and evidence-based way of making decisions, which reduces personal bias and makes decision-making more accurate and efficient overall.
Prioritizing data as a product is an invaluable tool for developing a deeper understanding of the issue at hand. When data is organized in a way that is easy to understand, it becomes easier to identify the key elements of a problem and develop a plan of action to address it. Data prioritization can also surface potential solutions that might otherwise have been overlooked, allowing for a more creative and comprehensive approach to problem-solving. In essence, data prioritization is an immensely useful asset for tackling almost any issue, providing a more detailed and nuanced perspective that helps bring about the desired result.
Conclusion
Building a data product encourages good data practices across the company, which benefits the organization as a whole. And all it takes is prioritizing DataOps.