DataOps is changing data, analytics, and machine learning

A DataOps team helps an organization get the most from its data. Here is how the people, processes, technology, and culture of DataOps come together.

Have you noticed that most companies are trying to do more with their own data?

Companies are investing heavily in data science programs, self-service business intelligence tools, and artificial intelligence initiatives to improve data-driven decision making. Some develop customer-facing applications by embedding data visualizations in web and mobile products, or by capturing new data from sensors (internet of things), wearables, and third-party APIs. Others extract information from unstructured data sources such as documents, images, videos, and spoken language.

Most of the visible work around data and analytics aims to derive value from it: dashboards and data visualizations for reporting, models created by data scientists to predict outcomes, or applications that integrate data, analytics, and models.

Before data is ready for people to analyze, or for applications to present to end users, the underlying data operations work (that is, DataOps) has to be done. Yet the value of this work is often underestimated.

DataOps includes all the work of assembling, processing, cleansing, storing, and managing data. We use a fairly complex vocabulary to describe its different functions: data integration, data processing, ETL (extract, transform, and load), data preparation, data quality, master data management, data masking, and test data management.
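Much of that vocabulary describes variations on one pattern: extract data from a source, transform and cleanse it, and load it somewhere consumers can use it. As a minimal sketch (the file, table, and column names here are hypothetical), a toy ETL job in Python might look like this:

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw records from a source system (here, a CSV export).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: apply simple cleansing and data-quality rules.
    cleaned = []
    for row in rows:
        if not row.get("customer_id", "").strip():
            continue  # quality rule: every record needs an identifier
        cleaned.append({
            "customer_id": row["customer_id"].strip(),
            "name": row.get("name", "").strip(),
        })
    return cleaned

def load(rows, db_path):
    # Load: write cleansed records into an analytics-ready table.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS customers (customer_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (:customer_id, :name)", rows)
    conn.commit()
    conn.close()

load(transform(extract("customers.csv")), "warehouse.db")
```

Real pipelines wrap scheduling, monitoring, and exception handling around this skeleton, which is where the rest of the DataOps vocabulary comes in.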

Just as a car is more than the sum of its parts, so is DataOps. DataOps is a relatively new umbrella term for these data management practices, with the goal of enabling data consumers (including managers, data scientists, and applications) to successfully capture business value from data.

How DataOps works with other technical practices

DataOps brings together many agile working methods that drive iterative improvement in data processing metrics and quality. It also borrows the strengths of DevOps, especially in automating dataflows, enabling more frequent changes to data processing functions, and reducing recovery time when responding to data operations incidents.

Proponents of DataOps have even published a DataOps Manifesto, whose 18 principles cover culture (continually satisfy your customer), team dynamics (self-organize, daily interactions), technical practices (create disposable environments), quality (monitor quality and performance), and many other aspects.

You may wonder why this term is needed at all. The answer is that simplified language and defined roles for key business functions help drive investment, align teams, and set priorities around business goals. This new term is best understood in terms of people, processes, technology, and culture.

The people of DataOps

On the people side, several roles relate to DataOps:

Customers are the direct beneficiaries of the data, analytics, applications, and machine learning being produced. They may be actual customers of a product, consumers of a service, or internal customers such as executives and leaders who use analytics in decision making, or employees who use data as part of a business process.

Data end users include data scientists, dashboard developers, report writers, application developers, citizen data scientists, and anyone else who works with the data and delivers results through tools such as applications, data visualizations, and APIs.

Data operations staff work directly with the data itself: database engineers, data engineers, and developers who manage dataflows and database tooling.

Data stewards are responsible for data quality, definitions, and linkages.

Business owners are often the buyers of data services, making decisions around purchasing, funding, and the strategies and processes of the data supply chain.

Defining dataflow, development, and operational processes

DataOps involves many processes and practices, and their maturity and the investment they require depend largely on the nature of the business requirements, data types, data complexity, service levels, and compliance obligations.

On one hand, DataOps covers the flow of data from source to delivery: a manufacturing process managed through DataOps development and operational processes. Development of dataflows, or data pipelines, can build on different data integration technologies, data cleansing techniques, and data management platforms. These processes do more than ingest data; they also give data stewards tools to manage exceptions to data quality and data rules, enable data logging and other metadata functions, and perform data archiving and deletion procedures.
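To make the exception-management idea concrete, here is a minimal sketch (the rules and field names are hypothetical) of a dataflow step that, rather than silently dropping bad records, routes them to a queue where a data steward can inspect and repair them:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    clean: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)  # routed to a steward review queue

def validate(record):
    # Apply data-quality rules; return the list of violations (empty if valid).
    issues = []
    if not record.get("email"):
        issues.append("missing email")
    if record.get("age") is not None and not (0 <= record["age"] <= 120):
        issues.append("age out of range")
    return issues

def run_flow(records):
    result = PipelineResult()
    for record in records:
        issues = validate(record)
        if issues:
            # Park the record, with the violated rules attached, where a
            # data steward can inspect and repair it.
            result.exceptions.append({"record": record, "issues": issues})
        else:
            result.clean.append(record)
    return result

result = run_flow([
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 250},
])
print(len(result.clean), "clean;", len(result.exceptions), "for steward review")
```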

The second side of DataOps is the development process through which the various aspects of the dataflow are maintained and enhanced. This process runs through several phases: sandbox management, development, orchestration, testing, deployment, and monitoring. The orchestration, testing, and deployment phases resemble the stages of a DevOps CI/CD pipeline.
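As one illustration of that resemblance, here is a minimal orchestration sketch, assuming Apache Airflow 2.4 or later as the orchestrator (the DAG, tasks, and callables are hypothetical placeholders, and other orchestration tools would look similar):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull data from source systems")

def validate():
    print("run data-quality tests")

def publish():
    print("deploy tables and models to production")

with DAG(
    dag_id="customer_dataflow",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run the dataflow once a day
    catchup=False,
) as dag:
    # The stages mirror a CI/CD pipeline: build -> test -> deploy.
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)
    t_publish = PythonOperator(task_id="publish", python_callable=publish)
    t_ingest >> t_validate >> t_publish
```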

DataOps processes also cover operating and managing the infrastructure. As with DevOps, part of this work centers on managing production dataflows to ensure reliability, security, and performance. Because data science workloads, especially machine learning, are highly variable, building scalable, high-performance environments that support such diverse workloads is even more challenging.
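Monitoring those production dataflows usually starts with simple operational metrics such as step duration and record counts. A minimal sketch, assuming nothing beyond the Python standard library (the step and metric names are hypothetical):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dataops.metrics")

def monitored(step_name):
    # Decorator that records duration and row counts for a pipeline step,
    # the kind of signal a centralized monitor can alert on.
    def wrap(fn):
        def inner(rows):
            start = time.perf_counter()
            out = fn(rows)
            log.info("step=%s rows_in=%d rows_out=%d seconds=%.3f",
                     step_name, len(rows), len(out),
                     time.perf_counter() - start)
            return out
        return inner
    return wrap

@monitored("dedupe")
def dedupe(rows):
    # Keep the last record seen for each id.
    return list({r["id"]: r for r in rows}.values())

dedupe([{"id": 1}, {"id": 1}, {"id": 2}])
```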

The future of DataOps technology

DataOps covers a wide range of data orchestration, processing, and management functions, so many technologies have adopted the term. And because so many companies are investing in big data, data science, and machine learning, vendors compete fiercely in this space.

Amazon Web Services (AWS) offers seven types of databases, including familiar relational databases, document stores, and key-value databases. Azure likewise offers several database types.
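The differences between these database types show up in how the same record is stored and queried. A toy contrast using only the Python standard library (a plain dict stands in for a real key-value or document store, and the names are hypothetical):

```python
import json
import sqlite3

# Relational: a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
print(conn.execute("SELECT name FROM customers WHERE city = 'London'").fetchall())

# Document / key-value: a flexible blob looked up by key, the way a
# document store or key-value database would hold it.
kv_store = {
    "customer:1": json.dumps({"name": "Ada", "city": "London", "tags": ["vip"]}),
}
print(json.loads(kv_store["customer:1"])["name"])
```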

A large collection of tools supports integrating data and creating dataflows, spanning data integration and dataflow platforms; data quality and master data management capabilities live within these dataflows.

Many tools tie closely to DataOps development, data science, and testing. Although many organizations use Jupyter, there are other options for data science work; for testing, you might consider tools such as Delphix and QuerySurge.
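Whatever tool you choose, automated data tests tend to look like assertions over the data itself. A minimal sketch in generic pytest (not the API of any product named above; the table and rules are hypothetical):

```python
import sqlite3
import pytest

@pytest.fixture
def warehouse():
    # Stand up a throwaway copy of the table under test.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
    conn.execute("INSERT INTO orders VALUES ('o-1', 10.0), ('o-2', 25.5)")
    yield conn
    conn.close()

def test_no_null_keys(warehouse):
    # Quality rule: every order must have an identifier.
    nulls = warehouse.execute(
        "SELECT COUNT(*) FROM orders WHERE order_id IS NULL").fetchone()[0]
    assert nulls == 0

def test_amounts_positive(warehouse):
    # Quality rule: order amounts must be positive.
    bad = warehouse.execute(
        "SELECT COUNT(*) FROM orders WHERE amount <= 0").fetchone()[0]
    assert bad == 0
```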

Alteryx, Databricks, and Dataiku provide end-to-end analytics and machine learning platforms that combine DataOps, data science, and DevOps.

Other tools handle data security, data masking, and related data operations functions.

Competition is driving DataOps culture

DevOps was born of the conflict between application development teams, which must release code frequently to move fast, and operations teams, which naturally slow things down to ensure reliability, performance, and security. DevOps teams reconciled this tension well and drove investments in automation such as CI/CD, automated testing, infrastructure as code, and centralized monitoring to bridge the gap.

DataOps is a newer analogue. Data scientists, dashboard developers, data engineers, database developers, and other engineers collaborate on dataflows and data quality. Beyond managing release velocity and the performance, reliability, and security of the infrastructure, a DataOps team can increase the competitive value of data, analytics, machine learning models, and data delivery.

That competitive value depends on the deliverables of the overall analytics effort and on how well the DataOps team handles complex data processing. How fast does data move through the dataflow? What data volumes and quality levels can be supported? How quickly can the team integrate new data sources? Can the database platforms support a growing variety of data modeling needs?

These are just some of the questions and performance metrics a DataOps team must address. As more organizations capture business value from their data and analytics investments, the need for DataOps practices and culture grows accordingly.

The author of this article, Isaac Sacolick, wrote "Driving Digital: The Leader's Guide to Business Transformation Through Technology," which covers many practices around agile, DevOps, and data science that are critical to successful digital transformation initiatives.