Data-Driven Culture Starts with Data Engineers
Want consistent, reliable data? Hire more data engineers and implement data observability.
I took a deep breath as I had to fix one more report on a Friday afternoon. Again. The sole data engineer at a small start-up, I supported the company and fixed its data problems.
We often forget that these reactive moments of fixing reports are only a short-term fix, not a way to build a data-driven environment. What I want to discuss here is a long-term solution: creating a data-driven environment that benefits both the data engineers and the organization.
I started as a consultant working with some of the biggest companies today. Luckily, I worked in an organization whose primary focus was analytics, and my team was like-minded when it came to data. Together, we would participate in brainstorming sessions, share ideas, and provide feedback.
Even though the scope of these projects was set in stone, how to accomplish them was up to the teams themselves. Our top priority was to deliver clean, high-quality data and simple code.
The team's processes were already in place. We had our daily stand-ups and code reviews. We had engineers, scientists, and designers come together to talk about what needed to be done. And most importantly, we were given the resources required to do those tasks.
The resource I want to discuss here is leadership support. Leadership wants clean, reliable data so that the company can make accurate decisions. Unfortunately, leaders often do not realize that providing clean, reliable data requires a lot of behind-the-scenes work from data engineers. They expect that hiring a data engineer will suddenly "fix" their data pipelines.
Unfortunately, most of these older pipelines lack stability and robustness. They are spread out over various places, e.g., a Mac still sitting inside the office, a virtual machine in the cloud that no one wants to touch, and a half-baked pipeline in Azure Data Factory with no accurate monitoring. Then upper management is surprised when they hear from people inside the company, and sometimes even from customers, that these band-aided data pipelines consistently fall apart.
If you are working in a start-up, this continual, reactive fixing can be a real challenge, as data teams often do not even have a manager and usually have fewer than five members. A data team is typically a combination of data analysts, data engineers, and data scientists.
But the worst of all is the uphill fight to get the resources needed, and I am not talking here about money. Leaders need to prioritize a proactive approach to data pipelines instead of sucking the data people dry by continually having them chase report inconsistencies. Sure, investigating these inconsistencies has its time and place. Still, if we want to get good data out of our data pipelines, we need to focus on building solid, robust data pipelines with monitoring.
These data pipelines will only become a priority once the environment of a start-up, or any business, becomes more data-driven, and that will only happen once people realize that technology is not what stands in their way. Rather, a company's people and culture are what limit its overall data strategy. In its 2021 survey, NewVantage Partners reported that only 7.8% of the 85 participating firms saw technology as the main challenge, whereas 92.2% cited cultural barriers and processes as the main obstacles to becoming fully data-driven.¹ Overcoming these cultural barriers is therefore the key to becoming a successful data-driven start-up. While obtaining the right technology or hiring more people can certainly help, neither guarantees the transformation of a data-resistant culture; as management consultant Peter Drucker often said, "[c]ulture eats strategy for breakfast."
Instead, we should focus on buy-in and support from executives and upper-level management. In Thomas Davenport's book Analytics at Work: Smarter Decisions, Better Results, he describes how a consumer goods company refused to reallocate its marketing budget to more effective channels because its executives "either didn't believe [their own] analysis or weren't comfortable with the implications." ² These marketing executives signaled that their internal data did not matter, and since leadership sets the tone, their teams are likely to follow. More importantly, how do the people who created those reports feel? They had just completed a great report with actionable items, but the executives dismissed their work and deprioritized their needs, leaving those employees demotivated. These executives had an excellent opportunity to showcase the importance of data in their company but instead demoralized their data teams by limiting their resources.
Rather, we should bring business and analytics as close together as possible. In David Waller's article "10 Steps to Creating a Data-Driven Culture," the first step he describes is that companies with strong data-driven cultures "have top managers who set an expectation that decisions must be anchored in data." ³ The best way to implement this step is to put the primary data in front of executives and have them use it daily. Executives should be able to ask probing questions about this data, understand its context, and ultimately make decisions with it.
Before you implement this in your company, please consider the resources (read: people) already in place. My colleagues and I have, on numerous occasions, been pushed beyond our limits to research problems or find ways to get our data pipelines temporarily "working," because taking the time to build these data pipelines properly is not a priority. Yet management is surprised when the reports that customers and employees see consistently fall apart.
Fixing these reports is vital, but only when we have these established data pipelines can we begin to fuel a few core reports reliably and robustly, gaining the trust of employees and customers. This suggestion coincides with the 9th step in David Waller's article 10 Steps to Creating a Data-Driven Culture where we should "trade flexibility for consistency — at least in the short term." ³
Choosing which data pipelines to solidify has its challenges. First, how do we define whether a data pipeline is robust? The critical criterion is data downtime: periods when data is partial, incomplete, inaccurate, or stale. If we were to scrutinize our reports today, how reliable would they be? Could we trust the data? More often than not, the answer is "no." We need more insight into these data pipelines before we can compare their priorities. The tooling itself does not matter, but gathering metrics, traces, and logs does. I fully admit that even this may take some time or prove to be an impossible feat. Many of the tools we use to cobble together data pipelines have incomplete or limited metrics and do not integrate easily with metrics tooling. In that case, I would make the most critical data pipeline, the one that could do the most damage if inaccurate, the priority.
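To make "data downtime" more concrete, here is a minimal Python sketch of the kind of checks that can flag a table as stale, incomplete, or likely inaccurate after a pipeline run. It assumes you can already collect a few pieces of metadata per table (load time, loaded row count, expected row count, and nulls in a required key column). The `TableSnapshot` class, the `downtime_signals` function, and every threshold are hypothetical, chosen only for illustration and not taken from any particular observability tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableSnapshot:
    """Hypothetical metadata gathered for one table after a pipeline run."""
    name: str
    last_loaded_at: datetime  # when the pipeline last wrote to the table
    row_count: int            # rows actually loaded
    expected_rows: int        # rows the source system says should exist
    null_key_count: int       # loaded rows missing a required key column


def downtime_signals(
    snap: TableSnapshot,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # illustrative thresholds only
    min_completeness: float = 0.99,
    max_null_rate: float = 0.01,
) -> list[str]:
    """Return the reasons, if any, that this table should count as 'down'."""
    problems = []
    # Stale: the newest data is older than we are willing to tolerate.
    if now - snap.last_loaded_at > max_age:
        problems.append("stale")
    # Partial or missing: fewer rows arrived than the source reports.
    if snap.expected_rows > 0 and snap.row_count / snap.expected_rows < min_completeness:
        problems.append("incomplete")
    # Inaccurate (a crude proxy): too many rows lack a required key.
    if snap.row_count > 0 and snap.null_key_count / snap.row_count > max_null_rate:
        problems.append("inaccurate")
    return problems


now = datetime(2021, 6, 4, 17, 0, tzinfo=timezone.utc)
orders = TableSnapshot("orders", now - timedelta(hours=30), 9_500, 10_000, 20)
print(downtime_signals(orders, now))  # prints ['stale', 'incomplete']
```

Logging these signals on every run, rather than only when someone complains about a report, is what turns them into the metrics needed to compare pipelines.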
To identify the most critical data pipelines, we should establish their service level indicators (SLIs) and service level objectives (SLOs). An SLI is a measurement of the service, such as data freshness, data correctness, or data isolation/load balancing, while an SLO is the target level that an SLI should meet over a certain period. If you want more information, Google defines these terms and what they mean in this article. The data pipelines with the most rigid criteria that affect customers and employees should take precedence.
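The difference between an SLI and an SLO can be sketched in a few lines of Python. In this hypothetical example, the SLI is the share of freshness checks in a review window that found the data fresh, the SLO is the target that share must meet, and pipelines are ranked with customer-facing ones first, then stricter targets, then the largest shortfall. The `PipelineSLO` class, the `prioritize` function, and the ranking rule are assumptions made for illustration; they are one possible ordering, not a standard.

```python
from dataclasses import dataclass


@dataclass
class PipelineSLO:
    """Hypothetical freshness objective for one pipeline over a review window."""
    name: str
    customer_facing: bool     # does a bad run reach customers (or many employees)?
    target: float             # SLO: share of checks that must find fresh data
    fresh_checks: list[bool]  # one entry per scheduled freshness check

    def sli(self) -> float:
        """SLI: the measured share of checks that found the data fresh."""
        if not self.fresh_checks:
            return 0.0  # no measurements: treat as not meeting any target
        return sum(self.fresh_checks) / len(self.fresh_checks)

    def meets_slo(self) -> bool:
        return self.sli() >= self.target


def prioritize(pipelines: list[PipelineSLO]) -> list[str]:
    """Order pipelines to harden first.

    One possible rule: customer-facing pipelines first, then stricter targets,
    then the largest shortfall between measured SLI and SLO target.
    """
    ranked = sorted(
        pipelines,
        key=lambda p: (not p.customer_facing, -p.target, p.sli() - p.target),
    )
    return [p.name for p in ranked]


billing = PipelineSLO("billing", True, 0.99, [True] * 95 + [False] * 5)
marketing = PipelineSLO("marketing", False, 0.95, [True] * 90 + [False] * 10)
support = PipelineSLO("support", True, 0.95, [True] * 100)

print(billing.sli(), billing.meets_slo())         # prints 0.95 False
print(prioritize([marketing, support, billing]))  # prints ['billing', 'support', 'marketing']
```

Keeping the ranking as an explicit sort key makes the prioritization rule easy to discuss and change with stakeholders, which matters more than the particular rule chosen here.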
Now that we have a prioritized list of data pipelines, they need to be allocated the right amount of resources. Building these robust pipelines is a much easier problem to solve, as it requires allocating or hiring data engineers and software engineers who are focused on data pipelines. Regardless of how many people are newly hired, the key to rebuilding these pipelines is simplicity and robustness.
Now that we have robust data pipelines and simple reports, we can talk about the inherent inaccuracies of those reports. Identifying the level of uncertainty behind the collected data helps us understand what we are missing. This awareness leads us to create and run experiments, and these experiments let us learn more about our users and how they interact with our product. Brent Dykes' article "Creating A Data-Driven Culture: Why Leading By Example Is Essential" highlighted that former Google executive Marissa Mayer led a project to test 41 different shades of blue for Google's advertising links, and these experiments increased Google's ad revenue by £200M that year.⁴ A data-driven culture can therefore drive revenue growth as well as further investigations, which in turn generate more data. The more data we receive, the more data pipelines we need, and, more importantly, the more data engineers we need.
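Experiments like the shades-of-blue test are usually judged by whether the difference between variants is larger than chance alone would explain. As one hedged illustration (the source does not describe the statistics Google actually used), here is a standard two-proportion z-test on click-through rates in plain Python. The function name and the example counts are made up.

```python
import math


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in click-through rate between variants A and B.

    Returns (z statistic, p-value). Uses the pooled normal approximation,
    which is reasonable only when both variants have plenty of views and clicks.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Under the null hypothesis both variants share one click-through rate.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        raise ValueError("no variation in clicks; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: variant B gets 2.2% click-through vs. 2.0% for variant A.
z, p = two_proportion_z_test(1_000, 50_000, 1_100, 50_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts, z comes out a little above 2.2 and the p-value is around 0.03, below the conventional 0.05 threshold. Every such experiment produces another dataset to collect and another pipeline to keep reliable.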
Data engineers are the foundation of a data-driven culture. They provide the clean, robust data pipelines that encourage the rest of the company to use the data. Leadership then needs to use these reports daily so that the data-first mentality trickles down to the rest of the company. Although we may surface some uncertainties in newly reported data, we do so to make experimentation and growth a standard practice rather than a sideline.
References
[1] "Big Data and AI Executive Survey 2021." NewVantage Partners, 2021. https://www.newvantage.com/_files/ugd/e5361a_76709448ddc6490981f0cbea42d51508.pdf
[2] Davenport, Thomas H., et al. Analytics at Work: Smarter Decisions, Better Results. Harvard Business Review Press, 2010.
[3] Waller, David. "10 Steps to Creating a Data-Driven Culture." Harvard Business Review, 6 Feb. 2020, https://hbr.org/2020/02/10-steps-to-creating-a-data-driven-culture.
[4] Dykes, Brent. "Creating A Data-Driven Culture: Why Leading By Example Is Essential." Forbes, 26 Oct. 2017, https://www.forbes.com/sites/brentdykes/2017/10/26/creating-a-data-driven-culture-why-leading-by-example-is-essential.