What is DataOps? Applying DevOps to Data
The definition of DataOps is simple: it applies the practices of DevOps to data.
Both DataOps and DevOps apply the best practices of technology development and operations to improve quality, increase speed, reduce security threats, delight customers and provide meaningful and challenging work for skilled professionals. DevOps and DataOps share goals to accelerate product delivery by automating as many process steps as possible. For DataOps, the objective is a resilient data pipeline and trusted insights from data analytics.
The heritage of DevOps is lean manufacturing and Agile development. These traditions help build a DevOps culture deeply rooted in maximizing customer value without breaking the bank. DataOps is a relative newcomer seeking to expand the virtues of DevOps to data pipelines and analytics.
Gartner estimates that 50-80% of data analytics projects fail. Getting analytics right is critical because organizations compete on the effectiveness of the data-driven insights that inform their decision-making. Analytics, machine learning (ML), and continuous testing help reduce costly errors.
Neither DevOps nor DataOps is mature. Both processes are dynamic and evolving, and the blueprint of the future hasn’t been drawn. Consequently, the marketplace for tools is chaotic and may be this way for some time. Many tools overlap, and job roles are fluid.
“DataOps gives programmers confidence they are using the right data at the right time,” said Mitch Martin, Director of Software Engineering, Data Society, a Skillsoft learning partner. “Code is solid, while data is fluid and has a more complex life cycle. DataOps provides the orchestration needed.”
The advantages of applying DevOps' strengths to data are too great to sit this one out, but that doesn’t make it any easier to know where to start. Data literacy is low in many organizations, leading to less-than-optimal decisions about data sources and uses.
However, overcoming the chaos in the marketplace isn’t optional.
“DataOps aims to benefit multiple data consumers through data-use cases from the simple data sharing to the full spectrum of data analytics popularized by the Gartner model: descriptive, diagnostic, predictive, and prescriptive. It brings together self-contained teams with data analytics, data science, data engineering, DevOps skills, and line of business expertise in close collaboration. The goal of DataOps for data science is to turn unprocessed data into a useful data science product that provides utility to customers through a rapid, scalable, and repeatable process.”
-Harvinder Atwal, Practical DataOps: Delivering Agile Data Science at Scale.
DataOps is a nascent approach that covers the data supply chain end to end and addresses unmet data needs. It is guided by the DataOps Manifesto, a set of principles to govern data science, analytics, data visualization, data metrics, and data administration. Key principles from the DataOps Manifesto include:
- Continually satisfy your customer
- Value working analytics
- Embrace change
- It's a team sport
- Daily interactions
- Reduce heroism
- Analytics is code
- Make it reproducible
- Disposable environments
- Analytics is manufacturing
- Quality is paramount
- Monitor quality and performance
- Improve cycle times
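Several of these principles, such as "Analytics is code", "Quality is paramount", and "Monitor quality and performance", can be read as a call to treat data checks as automated tests. The following is a minimal sketch of that idea in Python, not something the manifesto prescribes. The record fields (`order_id`, `amount`), the `check_rows` function, and the `QualityReport` type are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int
    failures: list  # (row_index, reason) tuples


def check_rows(rows):
    """Validate a batch of records before it moves down the pipeline."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        order_id = row.get("order_id")
        amount = row.get("amount")
        # Every record needs a unique identifier.
        if order_id is None:
            failures.append((i, "missing order_id"))
        elif order_id in seen_ids:
            failures.append((i, "duplicate order_id"))
        else:
            seen_ids.add(order_id)
        # Amounts must be numeric and non-negative.
        if not isinstance(amount, (int, float)) or amount < 0:
            failures.append((i, "invalid amount"))
    return QualityReport(total=len(rows), failures=failures)


# Hypothetical sample batch with three deliberate problems.
sample = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 1, "amount": 5.00},   # duplicate id
    {"order_id": 2, "amount": -3.00},  # negative amount
    {"amount": 7.50},                  # missing id
]
report = check_rows(sample)
```

A check like this could run on every batch, with the count of failures tracked over time as a simple quality metric.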
DataOps formalizes data job roles, clarifies career paths, and provides an organizational model for delivering data products and services at speed and scale.
In an open and connected environment, you need to secure each step so that it doesn't leak secret or confidential information. The work moves through a build phase, an integration phase, and a deployment phase, and you iterate through them: you build models, integrate them, and then improve the results by iterating or rebuilding based on new information. This build-integrate-deploy loop is what DevOps practices improve and automate. Organizational secrets and client data must be secured along the way. Tactical elements such as tools help, but the process culture has to be ingrained from the start.
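As a rough illustration of securing each step, the sketch below runs three stages in order and redacts a secret from every log line before it is recorded. It assumes a hypothetical environment variable, `PIPELINE_DB_PASSWORD`, and hypothetical stage functions. A real pipeline would more likely rely on a dedicated secrets manager and an orchestration tool.

```python
import os


def redact(message, secrets):
    """Replace any secret value that appears in a log message."""
    for value in secrets:
        if value:
            message = message.replace(value, "***")
    return message


def run_pipeline(stages, secrets):
    """Run each stage in order and return the redacted log lines."""
    log = []
    artifact = None
    for name, stage in stages:
        artifact = stage(artifact)
        log.append(redact(f"{name}: {artifact}", secrets))
    return log


# Hypothetical secret, read from the environment rather than hard-coded.
# The fallback value exists only so the sketch runs on its own.
db_password = os.environ.get("PIPELINE_DB_PASSWORD") or "example-password"

# Hypothetical stages; the integrate stage carelessly embeds the secret.
stages = [
    ("build", lambda _: "model-v1"),
    ("integrate", lambda a: f"{a} connected with password {db_password}"),
    ("deploy", lambda a: f"deployed {a}"),
]
log_lines = run_pipeline(stages, [db_password])
```

Redacting at the point where log lines are written means a careless stage cannot leak the secret into later output, which is one way to build the safeguard into the process from the start.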
For example, the open-source Apache Phoenix project emphasizes the need to shorten feedback loops to help data scientists learn from mistakes. The project is relatively new and addresses aspects of data access and availability.
Winning with analytics takes great data and a robust data supply chain. DataOps does not remove job roles or eliminate the need for specialization. In DataOps, for example, the database administrator (DBA) can grow into data engineering by embracing cultural change. Being curious is an advantage.
“DataOps is more of an empowerment idea. Empowering people to take back control of how they produce insight in a way that allows them to have higher job happiness and deliver more results.”
-The DataOps Podcast, episode “Data is Empowerment.”