The need for DataOps will only grow stronger in the coming months and years. DataOps is a set of best practices that helps engineers build, test, deploy, and maintain data-driven applications efficiently. But most of us are at a loss about where to begin. This practical guide aims to provide a helping hand: it gives an overview of the key components of DataOps and offers tips for successful implementation.
Define Company and Data Strategy
Before we can even start thinking about DataOps, we need to define the company strategy and the data strategy. Think about the following questions:
What is the company’s north star metric?
How does data play into this?
Who owns which data, and how do we give them the permissions they need?
How do we as the data team want folks to ask for data?
Answering these questions will help to determine the overall goal of the DataOps implementation and ensure that the processes and tools that are used are tailored to the company’s needs.
Define DataOps Strategy
Once the company and data strategies have been defined, it is important to create a strategy for implementing DataOps. DataOps is a complex topic that deserves an entire book of its own, but for now, here is an article that will help you gather the bits and pieces to understand what I mean by it.
Creating a strategy for implementing DataOps includes recording the current status of the problem, determining the goals of the project, outlining the resources required, and creating an approximate timeline for completion. Additionally, it is important to plan for any potential technical challenges that may arise over the course of the project and to create a plan for addressing them.
There are a few things that are must-haves in these discussions:
Focus on the things you (or the team) currently see as a problem and want to see changed.
Focus on personas and not on the people themselves.
Avoid any tools in these discussions.
Address the pain points that the current data team (data engineers, data scientists, and data analysts) and, as a result, the stakeholders are experiencing.
Once the initial strategy has been defined and written down, you will need to share this with the rest of the team. DataOps is a culture, not something that can be fulfilled by one or two people.
Discuss with the Team
As engineers, we tend to come up with a plan and then immediately build whatever we thought of (I am still working on this part), but the most important thing here is to slow down. DataOps is an expansive project that cannot be done by one or two people; it has to be done with the entire team.
As a team, discuss:
completion of the strategy
standards e.g. what is our service level agreement on our data?
practicality and implications
risks and limitations
rewards and benefits
Through this process, the team can gain a better understanding of the true potential of the plan, make an informed decision about its viability, and make changes accordingly.
Once the plan has been discussed and agreed upon (which could very well take several months), it is time to start implementing it.
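To make the standards item from the team discussion more concrete, here is a minimal sketch of how a service level agreement on data freshness could be written down and checked. Everything in it is an assumption for illustration: the `FreshnessSla` class, the `meets_sla` function, the `orders` dataset, and the 24-hour threshold are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessSla:
    """Hypothetical SLA: a dataset must have been loaded within max_age."""
    dataset: str
    max_age: timedelta


def meets_sla(sla: FreshnessSla, last_loaded_at: datetime, now: datetime) -> bool:
    """Return True when the latest load is recent enough to satisfy the SLA."""
    return now - last_loaded_at <= sla.max_age


# Illustrative values only; the dataset name and threshold are made up.
orders_sla = FreshnessSla(dataset="orders", max_age=timedelta(hours=24))
now = datetime(2023, 1, 2, 12, 0, tzinfo=timezone.utc)

# Loaded 6 hours ago: within the 24-hour agreement.
fresh = meets_sla(orders_sla, datetime(2023, 1, 2, 6, 0, tzinfo=timezone.utc), now)
# Loaded 54 hours ago: the agreement is broken.
stale = meets_sla(orders_sla, datetime(2022, 12, 31, 6, 0, tzinfo=timezone.utc), now)
```

Writing the agreement down as data, rather than keeping it in people's heads, gives the team something specific to debate and revise during the discussion.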
Implementation
Implementation is one of the hardest steps to figure out. Which area(s) should we focus on first? Is it data quality? How about data modeling? Or should we start by implementing unit tests?
It is hard for me to say exactly where to go from here, as I do not know what your use case is. Whatever you decide to tackle first, remember to start small and iterate from there.
I will walk through data quality here as an example as it is a pertinent problem among the majority of companies.
Example.
I would first write a design document (or presentation) about data quality that addresses the following areas:
Background
Explanation of the problem
Requirements for an acceptable and long-term solution
Proposed solution
Alternatives that were considered
Note that the solution should be iterative, e.g. first implement static data quality checks, then dynamic checks, and so on.
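As a rough illustration of that iteration, the sketch below shows one possible reading of the two stages: a static check applies fixed rules known ahead of time, while a dynamic check compares today's data with a range derived from history. The function names, the `order_id` and `amount` columns, and the three-standard-deviation tolerance are all assumptions made for this example, not a prescribed method.

```python
from __future__ import annotations

from statistics import mean, stdev


def static_check(rows: list[dict]) -> list[str]:
    """Iteration 1: fixed rules that are known ahead of time."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("order_id") is None:
            problems.append(f"row {i}: order_id is missing")
        amount = row.get("amount")
        if amount is None or amount < 0:
            problems.append(f"row {i}: amount must be a non-negative number")
    return problems


def dynamic_check(
    todays_row_count: int, past_row_counts: list[int], tolerance: float = 3.0
) -> bool:
    """Iteration 2: compare today's volume with a range learned from history.

    Needs at least two historical counts, because stdev is undefined otherwise.
    """
    center = mean(past_row_counts)
    spread = stdev(past_row_counts)
    return abs(todays_row_count - center) <= tolerance * spread


# Made-up sample data for illustration.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": -5.0},
]
problems = static_check(rows)

history = [1000, 1020, 980, 1010, 990]
normal_day = dynamic_check(1005, history)
suspicious_day = dynamic_check(400, history)
```

Starting with the static rules keeps the first iteration small. The dynamic check can follow once enough history has been collected to make its thresholds meaningful.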
These discussions will naturally lead to tools that we can implement to help solve these problems. Take a look at Chapter 4 of Fundamentals of Data Engineering for thoughts about how to decide which tool will be the best.
These changes require the whole team, as DataOps is an organization-wide effort. We need to have everyone on board before we can get started. You do not want an individual contributor engineer to implement a tool that they had no input on. That is a jerk move and goes against what DataOps stands for. To reiterate, two principles of the DataOps Manifesto apply here: number 4 and number 7. The first says that DataOps is a team sport, and the second calls for reducing heroism so that nobody has to be a hero.
Agility and Monitoring
Once the initial implementation has been completed, the team should be able to monitor their DataOps performance. This involves setting up metrics to measure the success of the implementation, as well as creating feedback loops to ensure that the DataOps processes are agile and able to adapt to the changing needs of the organization.
Monitoring should also include tracking the performance of the data and the data teams, as well as assessing the effectiveness of the tools and processes used. This will provide the team with invaluable insights that can be used to optimize the DataOps implementation and ensure that the organization is getting the most out of its data.
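As one possible starting point for such metrics, the sketch below summarizes the results of data quality check runs into a pass rate and a list of failing checks. The `CheckRun` record, the function names, and the sample check names are hypothetical. In practice the inputs would come from whatever tool the team has chosen.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CheckRun:
    """Hypothetical record of a single data quality check execution."""
    check_name: str
    passed: bool


def pass_rate(runs: list[CheckRun]) -> float:
    """Share of check runs that passed, between 0.0 and 1.0."""
    if not runs:
        raise ValueError("no check runs recorded")
    return sum(run.passed for run in runs) / len(runs)


def failing_checks(runs: list[CheckRun]) -> list[str]:
    """Names of checks that failed at least once, sorted for stable output."""
    return sorted({run.check_name for run in runs if not run.passed})


# Made-up results for illustration.
runs = [
    CheckRun("orders_not_null", passed=True),
    CheckRun("orders_row_count", passed=False),
    CheckRun("customers_not_null", passed=True),
    CheckRun("customers_unique_id", passed=True),
]
rate = pass_rate(runs)
failures = failing_checks(runs)
```

Tracking a number like this over time gives the feedback loop something measurable: if the pass rate drops after a process change, the team has a clear signal to revisit that change.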
Conclusion
Implementing a successful DataOps strategy requires careful planning and execution. By defining a strategy, establishing the processes and tools that will be used, and implementing and monitoring the system, collaborating teams within companies can ensure a successful DataOps implementation.