Processing Data: Integrating Kafka with Apache Spark

Apache Kafka 3.2    |    Intermediate
  • 12 Videos | 1h 45m 38s
  • Includes Assessment
  • Earns a Badge
Flexible and intuitive, DataFrames are a popular data structure in data analytics. In this course, build Spark applications that use DataFrames to process data streamed to Kafka topics. Begin by setting up a simple Spark app that streams in messages from a Kafka topic, processes and transforms them, and publishes them to an output sink. Next, leverage the Spark DataFrame application programming interface (API) by performing selections, projections, and aggregations on data streamed in from Kafka, and explore the use of SQL queries for the same transformations. Finally, perform windowing operations of two kinds: tumbling windows, where the windows do not overlap, and sliding windows, where consecutive windows overlap and an event can fall into more than one of them.
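The difference between tumbling and sliding windows mentioned above can be illustrated with a minimal plain-Python sketch. It is not taken from the course and does not use Spark; the function name `window_starts` and the integer-second timestamps are assumptions made for illustration. It computes which windows of the form [start, start + size) contain a given event time.

```python
def window_starts(ts, size, slide=None):
    """Return the start times of every window [start, start + size)
    that contains the event time ts.

    Hypothetical helper for illustration only. All values are whole
    seconds. If slide is omitted, the window is tumbling (slide == size).
    """
    slide = size if slide is None else slide
    # Latest window start at or before ts, aligned to a multiple of slide.
    start = (ts // slide) * slide
    starts = []
    # Walk backwards while the window still covers ts.
    while start > ts - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


# Tumbling: 10-second windows, so each event lands in exactly one window.
print(window_starts(12, size=10))           # [10]
# Sliding: 10-second windows every 5 seconds, so each event lands in two.
print(window_starts(12, size=10, slide=5))  # [5, 10]
```

In PySpark, the corresponding grouping is typically expressed with `pyspark.sql.functions.window`, which takes a window duration and an optional slide duration; the sketch above only mirrors the assignment logic, not Spark's streaming behavior.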

WHAT YOU WILL LEARN

  • discover the key concepts covered in this course
  • build a Spark application that reads from a Kafka topic
  • manipulate streaming data and publish the output to the console
  • subscribe to multiple Kafka topics from a Spark application
  • write an app that generates data to periodically send to a Kafka topic
  • develop a Spark application that publishes transformed data to a Kafka topic
  • transform streaming data with Spark SQL
  • perform aggregations on Spark DataFrames and order their contents
  • perform group by, aggregations, and ordering
  • describe what windows are in the context of Spark streaming and define them using DataFrames
  • define operations on tumbling and sliding windows
  • summarize the key concepts covered in this course

IN THIS COURSE

  1. Course Overview (1m 5s)
  2. Integrating Spark with Kafka (10m 12s)
  3. Transforming Kafka Messages with PySpark (10m)
  4. Reading from Multiple Kafka Topics (11m 35s)
  5. Setting up a Producer and Consumer with Kafka (11m)
  6. Publishing to Kafka from PySpark (10m 25s)
  7. Transforming Data with Spark SQL (7m 28s)
  8. Aggregations on Streaming Data (10m 49s)
  9. Exploring Grouping and Ordering (10m)
  10. Defining Window Operations (12m 6s)
  11. Creating Tumbling and Sliding Windows (8m 41s)
  12. Course Summary (2m 17s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of this course. You can share the badge on any social network or business platform.

Digital badges are yours to keep, forever.
