Streaming Data Architectures: Processing Streaming Data with Spark

Data Science    |    Intermediate
  • 11 Videos | 56m 40s
  • Includes Assessment
  • Earns a Badge
Process streaming data with Spark, the analytics engine that can run on top of Hadoop. In this course, you will discover how to develop Spark applications that work with streaming data and generate output. Topics include configuring a streaming data source; using Netcat and writing applications to process the data stream; understanding the effects of Update mode on your stream processing application's output; writing a monitoring application that listens for new files added to a directory; comparing Append output with Update mode; developing applications that limit the files processed in each trigger; using Spark's Complete mode for output; performing aggregation operations on streaming data with the DataFrame API; and processing streaming data with Spark SQL queries.

WHAT YOU WILL LEARN

  • install the latest available version of PySpark
  • configure a streaming data source using Netcat and write an application to process the stream
  • describe the effects of using the Update mode for the output of your stream processing application
  • write an application to listen for new files being added to a directory and process them as soon as they come in
  • compare the Append output mode to the Update mode and distinguish between the two
  • develop applications that limit the files processed in each trigger and use Spark's Complete mode for the output
  • perform aggregation operations on streaming data using the DataFrame API
  • work with Spark SQL to process streaming data using SQL queries
  • define and apply standard, reusable transformations for streaming data
  • recall the key ways to use Spark for streaming data and explore the ways to process streams and generate output
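
The difference between the Append, Update, and Complete output modes named in the objectives above can be illustrated without a Spark installation. What follows is a plain-Python simulation of the general semantics, not Spark code and not material from the course: the function names `run_stream` and `run_append` and the sample batches are hypothetical. It assumes a running word count for Update and Complete, and a non-aggregating query for Append.

```python
from collections import Counter


def run_stream(batches, mode):
    """Simulate a running word count over micro-batches.

    Returns one dict per trigger holding what that trigger would emit.
    """
    if mode not in ("update", "complete"):
        raise ValueError(f"unsupported mode: {mode}")
    totals = Counter()  # the running result table
    emitted = []
    for batch in batches:
        # Count the words that arrived in this micro-batch only.
        changed = Counter(word for line in batch for word in line.split())
        totals.update(changed)
        if mode == "complete":
            # Complete: rewrite the whole result table on every trigger.
            emitted.append(dict(totals))
        else:
            # Update: write only the rows whose count changed in this trigger.
            emitted.append({word: totals[word] for word in changed})
    return emitted


def run_append(batches):
    """Simulate Append on a non-aggregating query (an assumption made here):
    each trigger emits only the rows that arrived since the previous one."""
    return [list(batch) for batch in batches]


# Hypothetical input: two micro-batches of text lines.
batches = [["spark streams", "spark"], ["streams data"]]
update_out = run_stream(batches, "update")
complete_out = run_stream(batches, "complete")
append_out = run_append(batches)
print(update_out)
print(complete_out)
print(append_out)
```

In the second trigger, the simulated Update output contains only `streams` and `data`, while the simulated Complete output repeats the unchanged `spark` row as well. That is the trade-off between the two modes: smaller writes versus a full snapshot on every trigger.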

IN THIS COURSE

  1. Course Overview (2m 10s)
  2. PySpark Setup (2m 39s)
  3. Setting Up a Socket Stream with Netcat (8m 53s)
  4. The Update Output Mode (3m 32s)
  5. Using a File Input Stream (7m 50s)
  6. The Append Output Mode (2m 15s)
  7. The Complete Output Mode (6m 35s)
  8. Aggregations on Streaming Data (4m 6s)
  9. SQL Operations on Streaming Data (4m 59s)
  10. User-Defined Functions (UDFs) (4m 41s)
  11. Exercise: Processing Streaming Data (4m 30s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
