Aspire Journeys

Data Infrastructure with Apache Kafka

7 Courses | 7h 54m 5s
1 Lab | 2h


In the Data Infrastructure with Apache Kafka journey, you will explore Apache Kafka, integrate Kafka with Python and use consumer groups, integrate Kafka with Apache Spark, and use Kafka with Cassandra and Confluent. You will explore the Kafka architecture for event streaming, including setting up topics, creating brokers, and handling messages. You will also learn how to produce and consume messages using Kafka and how to tweak Kafka broker configurations. This journey also covers Kafka performance optimization and structured streaming with Apache Spark, which includes building Spark applications that process data streamed to Kafka topics using DataFrames and integrating Kafka with Spark and Cassandra for NoSQL data.

Track 1: Intro to Data Infrastructure

In this track of the Data Infrastructure with Apache Kafka Skillsoft Aspire journey, the focus will be on data infrastructure in an organization, data mesh architecture, data tools, messaging platforms, and data stores.

1 Course | 45m 46s

Track 2: Apache Kafka

In this track of the Data Infrastructure with Apache Kafka Skillsoft Aspire journey, the focus will be on Apache Kafka and Apache Spark.

6 Courses | 7h 8m 19s
1 Lab | 2h

COURSES INCLUDED

Setting up the Data Infrastructure in an Organization

In this course, you will look into the data mesh architecture and the process of selecting data platforms that best fulfill the needs of a large, data-driven organization. Begin by delving into two approaches to managing the data in an organization: a centralized data team and a data mesh architecture, which is a more federated approach. Explore how a data mesh allows individual domain teams in an organization to manage their own data as long as it is made available to other teams and adheres to certain standards. Next, discover the various considerations for selecting data-related tools in an organization. You will get a glimpse into Apache Kafka and RabbitMQ, two widely used messaging tools, and will see use cases where each of them excels. Finally, you will look into two use cases for data stores: one for a web or mobile app and another for a team performing data analysis. Here, you will look into the use of Apache Cassandra and the Snowflake platform.

7 videos | 45m Assessment Badge

FREE ACCESS

COURSES INCLUDED

Processing Data: Getting Started with Apache Kafka

Apache Kafka is a popular event streaming platform used by Fortune 100 companies for both real-time and batch data processing. In this course, you will explore the characteristics of event streaming and how Kafka architecture allows for scalable streaming data. Install Kafka and create some topics, which are essentially channels of communication between apps and data. Set up and work with multiple topics for durable storage. Create multiple brokers and a cluster of nodes to handle messages and store their replicas. Then, monitor the settings and logs for those brokers. Finally, see how topic partitions and replicas provide redundancy and maintain high availability.

11 videos | 1h 31m Assessment Badge

Processing Data: Integrating Kafka with Python & Using Consumer Groups

Producers and consumers are applications that write events to and read events from Kafka. In this course, you will focus on integrating Python applications with a Kafka environment, implementing consumer groups, and tweaking Kafka configurations. Begin by connecting to Kafka from Python. You will produce messages to and consume messages from a Kafka topic using Python. Next, discover how to tweak Kafka broker configurations. You will place limits on the size of messages and disable deletion of topics. Then, publish messages to partitioned topics and explore the use of partitioning algorithms to determine the placement of messages on partitions. Explore consumer groups, which allow a set of consumers to process messages published to partitioned Kafka topics in parallel - without any duplication of effort. Finally, learn different ways to optimize Kafka's performance, using configurations for brokers and topics, as well as producer and consumer apps.

12 videos | 1h 24m Assessment Badge
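
To make the partitioning and consumer group ideas in this course description concrete, here is a minimal pure-Python sketch. It does not connect to Kafka and is not taken from the course. The function names `choose_partition` and `assign_partitions` are hypothetical, `zlib.crc32` is only a stand-in for whatever hash a real Kafka client uses, and the round-robin assignment is one simple strategy chosen for illustration.

```python
import zlib
from typing import Dict, List


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition using a stable hash.

    Assumption: crc32 is a stand-in hash for illustration only;
    real Kafka clients use their own partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(consumers: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer in the group (round-robin).

    Because no partition is shared, no message is processed twice
    within the group.
    """
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # Messages with the same key always land on the same partition.
    for key in ["user-1", "user-2", "user-1"]:
        print(key, "-> partition", choose_partition(key, 3))

    # Two consumers in one group split three partitions between them.
    print(assign_partitions(["consumer-a", "consumer-b"], 3))
```

The point of the sketch is the invariant, not the specific hash: a given key maps to one partition, and each partition has one owner inside a consumer group, which is what lets the group work in parallel without duplicating effort.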

Processing Data: Introducing Apache Spark

Apache Spark is a powerful distributed data processing engine that can handle petabytes of data by chunking that data and dividing it across a cluster of resources. In this course, explore Spark's structured streaming engine, including components like the PySpark shell. Begin by downloading and installing Apache Spark. Then create a Spark cluster and run a job from the PySpark shell. Monitor an application and job runs from the Spark web user interface. Then, set up a streaming environment, reading and manipulating the contents of files that are added to a folder in real time. Finally, run apps in both Spark standalone and local modes.

13 videos | 1h 44m Assessment Badge

Processing Data: Integrating Kafka with Apache Spark

Flexible and intuitive, DataFrames are a popular data structure in data analytics. In this course, build Spark applications that process data streamed to Kafka topics using DataFrames. Begin by setting up a simple Spark app that streams in messages from a Kafka topic, processes and transforms them, and publishes them to an output sink. Next, leverage the Spark DataFrame application programming interface by performing selections, projections, and aggregations on data streamed in from Kafka, while also exploring the use of SQL queries for those transformations. Finally, you will perform windowing operations - both tumbling windows, where the windows do not overlap, and sliding windows, where there is some overlapping of data.

12 videos | 1h 45m Assessment Badge
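
The difference between tumbling and sliding windows mentioned above can be illustrated without Spark. The following is a minimal pure-Python sketch, not Spark's implementation and not taken from the course. The function names are hypothetical, timestamps are assumed to be integer seconds, and windows are treated as half-open intervals `[start, end)`. In a PySpark application, this grouping would typically be expressed with `pyspark.sql.functions.window` rather than written by hand.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in seconds


def tumbling_window(ts: int, size: int) -> Window:
    """Return the single non-overlapping window that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """Return every window of length `size`, starting at a multiple of
    `slide`, that contains ts. With slide < size the windows overlap,
    so one event can belong to several of them.
    """
    windows: List[Window] = []
    start = (ts // slide) * slide  # latest window start that is <= ts
    while start > ts - size:       # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


if __name__ == "__main__":
    event_ts = 12
    print("tumbling:", tumbling_window(event_ts, size=10))
    print("sliding: ", sliding_windows(event_ts, size=10, slide=5))
```

With a window size of 10 and a slide of 5, an event at second 12 falls into exactly one tumbling window but into two sliding windows, which is the overlap the description refers to.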

Processing Data: Using Kafka with Cassandra & Confluent

Apache Cassandra is a trusted open-source NoSQL distributed database that easily integrates with Apache Kafka as part of an ETL pipeline. This course focuses on the integration of Kafka, Spark, and Cassandra and explores a managed version of Kafka with the Confluent data streaming platform. Begin by integrating Kafka with Apache Cassandra as part of an ETL pipeline involving a Spark application. Discover Apache Cassandra and learn the steps involved in linking Spark with this wide-column database. Next, examine the various features of the Confluent platform and find out how easy it is to set up and work with a Kafka environment. After completing this course, you will be prepared to implement and manage stream processing systems in your organization.

7 videos | 41m Assessment Badge

Final Exam: Apache Kafka

Final Exam: Apache Kafka will test your knowledge and application of the topics presented throughout the Apache Kafka track of the Skillsoft Aspire Data Infrastructure Journey.

1 video | 32s Assessment Badge

FREE ACCESS

EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE TRACKS

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

