SRE Data Pipelines & Integrity: Data Pipelines

SRE    |    Intermediate
  • 21 videos | 1h 11m 12s
  • Includes Assessment
  • Earns a Badge
Rating: 4.6 (20 ratings)
Site reliability engineers often find data processing complex as demands for faster, more reliable, and more cost-effective results continue to evolve. In this course, you'll explore techniques and best practices for managing a data pipeline. You'll start by examining the various pipeline application models and their recommended uses. You'll then learn how to define and measure service level objectives, plan for dependency failures, and create and maintain pipeline documentation. Next, you'll outline the phases of a pipeline development lifecycle's typical release flow before investigating more challenging topics such as managing data processing pipelines, using big data with simple data pipelines, and using periodic pipeline patterns. Lastly, you'll delve into the components of Google Workflow and recognize how to work with this system.


  • Discover the key concepts covered in this course
  • Describe the characteristics of and rationale for using data processing pipelines
  • Recognize characteristics of the extract, transform, load (ETL) pipeline model
  • Define business intelligence and data analytics in the context of data processing and give an example data analytics use case
  • List characteristics of machine learning (ML) applications
  • Define what is meant by service-level objectives (SLOs) and describe how they relate to pipeline data
  • Outline how to plan for dependency failures
  • Recognize how to create and maintain pipeline documentation
  • Outline the stages of a typical development lifecycle
  • Describe how to reduce hotspotting
  • Recognize how to implement autoscaling to handle spikes in workloads
  • Describe how best to adhere to access control and security policies
  • Plan escalation paths that ensure quick and proactive communication
  • Describe the effect big data can have on simple pipeline patterns
  • List the challenges of using the periodic pipeline pattern
  • Describe the issues that can occur due to uneven work distribution
  • List the potential drawbacks of periodic pipelines in distributed environments
  • Describe what comprises Google Workflow and outline how it works
  • Outline the stages of execution in Google Workflow, describing what they entail
  • Recognize the key factors in ensuring business continuity in big data pipelines using Google Workflow
  • Summarize the key concepts covered in this course


  • 1.  Course Overview
    1m 53s
  • 2.  Data Processing Pipelines
    4m 38s
    Data processing pipelines are software systems that organize and transform large datasets into usable information. They play an important role in modern business operations by providing fast, reliable, and accurate results. Data collection today differs greatly from the past, and data processing pipelines are essential to transforming these massive volumes of data into useful information.
  • 3.  The Extract Transform Load (ETL) Pipeline Model
    3m 8s
    The extract, transform, load (ETL) model is a data transformation model used in business intelligence and IT. ETL pipelines transform data from one format to another and can serve a variety of purposes, such as preparing data for analysis or serving it to another application.
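The course blurbs don't include code, but the ETL model above can be sketched minimally. This is an illustrative example, not course material; the function and field names are assumptions:

```python
# Minimal ETL sketch: extract raw records, transform them into a
# normalized shape, and load them into a target store (here, a list).
# All names and sample rows are illustrative.

def extract():
    # Pretend these rows came from a CSV export or an upstream API.
    return [
        {"name": " Ada ", "spend": "120.50"},
        {"name": "Grace", "spend": "99"},
    ]

def transform(rows):
    # Normalize whitespace and convert string fields to numeric types.
    return [{"name": r["name"].strip(), "spend": float(r["spend"])} for r in rows]

def load(rows, target):
    # "Load" here is appending to an in-memory target standing in for a warehouse.
    target.extend(rows)
    return target

warehouse = []
load(transform(extract()), warehouse)
```

In a real pipeline each stage would talk to external systems, but the shape stays the same: each phase takes the previous phase's output.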
  • 4.  Business Intelligence and Data Processing
    2m 52s
    In this video, you'll learn more about business intelligence, or BI: the technologies, tools, and practices for gathering, integrating, analyzing, and presenting large volumes of data to support better business decision-making. You'll learn that BI uses databases to store and link data from various areas of a business so that you can run reports and analyze that data to make key business decisions.
  • 5.  Features of Machine Learning Apps
    4m 18s
    In this video, you'll learn more about machine learning, an application of artificial intelligence that gives systems the ability to learn and improve from experience and data. Machine learning involves giving the computer data and having it learn from that data. There are different types of machine learning, including supervised, unsupervised, and reinforcement learning.
  • 6.  Service-level Objectives (SLOs) and Data Pipelines
    4m 35s
    In this video, you'll learn how to define service-level objectives, or SLOs. You'll discover that an SLO is a target value or range of values for a service level that is measured by an SLI, a service-level indicator. The SLI is what we measure (latency, availability, or uptime, for example), and the SLO is the target we want to meet.
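The SLI/SLO relationship described above can be shown in a few lines. This is a hedged sketch, not from the course; the 99.9% target and request counts are made-up example numbers:

```python
# Sketch: an availability SLI measured against an SLO target.
# The target and the request counts below are illustrative.

def availability_sli(successful, total):
    """SLI: the fraction of requests that succeeded."""
    return successful / total

SLO_TARGET = 0.999  # e.g. "99.9% of requests succeed" (example value)

sli = availability_sli(successful=999_500, total=1_000_000)
slo_met = sli >= SLO_TARGET  # 0.9995 >= 0.999, so the SLO is met
```

The same pattern applies to latency or freshness SLIs: pick a measurable indicator, then state the target it must meet.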
  • 7.  Planning for Dependency Failure
    3m 32s
    This session is about planning for dependency failures. The speaker discusses how to design for the largest failure that a service-level agreement allows and how to stage planned outages in order to be proactive rather than reactive.
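One common tactic for tolerating dependency failures is capped retries with exponential backoff. The sketch below is illustrative only (the course doesn't prescribe this code); all names and the fake dependency are assumptions:

```python
import time

# Sketch: calling a flaky dependency with capped retries and exponential
# backoff, so a transient dependency outage doesn't fail the whole run.

def call_with_retries(dependency, attempts=3, base_delay=0.01):
    for attempt in range(attempts):
        try:
            return dependency()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller escalate
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

# A fake dependency that fails twice, then recovers.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("dependency unavailable")
    return "ok"

result = call_with_retries(flaky)
```

Capping the attempts matters: unbounded retries against a hard-down dependency just turn one failure into a pile-up.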
  • 8.  Managing System Documentation
    5m 3s
    The objectives of this session are to discuss how to create and maintain system documentation and how to identify when a pipeline is running slowly. The presenter also discusses ways to document processes and how to automate them.
  • 9.  Development Lifecycle Stages
    5m 39s
    In this video, you'll watch a demo on outlining the stages of a typical development lifecycle. You'll learn that prototyping is the first phase of pipeline development and is used to verify semantics: it allows us to make sure we can implement the business logic needed to execute the pipeline. This may mean choosing one programming language over another because it integrates with existing libraries.
  • 10.  Reducing Hotspotting
    4m 8s
    In this video, you'll learn about the concept of hotspotting. Hotspotting occurs when a resource receives excessive access and becomes overloaded, which often results in operational failure. Pipelines are susceptible to workload patterns in which reads and writes cause delays in isolated regions of data. You'll learn that when the data for a particular query is concentrated on a limited number of nodes, it can hotspot.
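One widely used mitigation for a hot key is "salting": spreading writes for a single popular key across several shard ranges. This sketch is an assumption-laden illustration, not course material; the shard count and key names are invented:

```python
import hashlib

# Sketch: salt a hot key so its writes fan out over several shard
# ranges instead of concentrating on one node. NUM_SALTS and the
# key names are illustrative.

NUM_SALTS = 4

def salted_key(key, record_id):
    # Deterministically pick one of NUM_SALTS prefixes per record,
    # so one hot key maps to NUM_SALTS distinct storage ranges.
    salt = int(hashlib.md5(str(record_id).encode()).hexdigest(), 16) % NUM_SALTS
    return f"{salt}:{key}"

# 1,000 writes to the same hot key now land on 4 distinct key prefixes.
keys = {salted_key("popular_user", i) for i in range(1000)}
prefixes = {k.split(":", 1)[0] for k in keys}
```

The trade-off is that reads must now fan out across all salt prefixes and merge the results.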
  • 11.  Implementing Autoscaling for Workload Spikes
    3m 38s
    In this video, you'll learn about the concept of autoscaling, which can help handle workload spikes. Autoscaling is useful when your pipeline needs additional server resources to satisfy the number of processing jobs. You'll learn how to implement autoscaling to handle spikes in workloads.
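The core of an autoscaling policy is a sizing decision like the one sketched below. This is an illustrative example only; the thresholds and limits are invented, not from the course:

```python
# Sketch: size the worker pool to the queue, within fixed bounds.
# jobs_per_worker, min_workers, and max_workers are example values.

def desired_workers(queued_jobs, jobs_per_worker=10,
                    min_workers=1, max_workers=100):
    # Ceiling division: enough workers to keep per-worker load at target.
    needed = -(-queued_jobs // jobs_per_worker)
    return max(min_workers, min(max_workers, needed))

spike = desired_workers(queued_jobs=450)  # spike: scale up to 45 workers
quiet = desired_workers(queued_jobs=3)    # quiet: scale down to the floor of 1
```

Real autoscalers add smoothing and cooldown periods on top of this so brief spikes don't cause thrashing.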
  • 12.  Adhering to Security Policies
    2m 59s
    In this session, we discuss security policies and how to adhere to them. One of the most important aspects of security is data privacy: the practice of protecting data from unauthorized access and exposure. Security policies are necessary to protect data from unauthorized access and alteration, and adhering to them protects data by limiting who has access to it, when they have access to it, and how they use it.
  • 13.  Planning Escalation Paths
    2m 16s
    In this video, you'll learn about pipeline design and how to ensure resiliency when designing your pipelines. Resilient data pipelines adapt in the event of failure. You'll learn that a resilient data pipeline needs to detect failures, recover from failures, and return accurate data to the customer.
  • 14.  Big Data and Simple Pipelines
    2m 23s
    The objective of this session is to discuss the effect big data has on simple pipelines. Multiphase pipelines are typically used when processing big data: they increase the pipeline's depth, splitting the work into smaller phases that are easier to reason about and troubleshoot.
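The multiphase idea above can be sketched as a chain of small, single-purpose phases. This is an illustrative sketch with invented phase names, not course code:

```python
# Sketch: a multiphase pipeline composed of small phases, each of
# which can be tested and reasoned about in isolation. Phase names
# and sample data are illustrative.

def parse(lines):
    # Phase 1: turn raw text lines into integers.
    return [int(x) for x in lines]

def filter_valid(values):
    # Phase 2: drop invalid (negative) values.
    return [v for v in values if v >= 0]

def aggregate(values):
    # Phase 3: reduce to a single total.
    return sum(values)

def run_pipeline(data, phases):
    # Feed each phase's output into the next.
    for phase in phases:
        data = phase(data)
    return data

total = run_pipeline(["3", "-1", "7"], [parse, filter_valid, aggregate])
```

Because each phase has one job, a bad record or a slow stage can be pinned down to a single phase rather than debugged across one monolithic program.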
  • 15.  The Periodic Pipeline Pattern
    2m 25s
  • 16.  Issues with Uneven Work Distribution
    2m 51s
  • 17.  Periodic Pipelines in Distributed Environments
    3m 21s
  • 18.  Google Workflow's Composition
    3m 17s
  • 19.  Google Workflow's Stages of Execution
    1m 43s
  • 20.  Ensuring Business Continuity with Google Workflow
    4m 50s
  • 21.  Course Summary
    1m 42s


Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

