Streaming Data Architectures: Processing Streaming Data
Overview/Description
Spark is an analytics engine for big data that is commonly deployed alongside Hadoop. It can be used for data science and for processing both batch and streaming data. In this course, you will discover how to develop Spark applications that work with streaming data, and explore the different ways to process streams and generate output.
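The different ways of generating output from a stream correspond to Spark's output modes, set with `writeStream.outputMode("complete")`, `"update"`, or `"append"`. The sketch below is a plain-Python conceptual model of how those modes differ for a running word count over micro-batches, not Spark's implementation. The helpers `run_word_count` and `run_append_words` are hypothetical names used only for illustration. It assumes Spark's documented behavior that Append mode is rejected for streaming aggregations unless a watermark is defined, so the aggregation model covers only Complete and Update.

```python
from collections import Counter


def run_word_count(batches, mode):
    """Model a streaming word count over micro-batches (not Spark's API).

    Each element of `batches` is the list of lines that arrive in one trigger.
    Returns one dict per trigger: the rows written to the sink in that mode.
    """
    if mode not in ("complete", "update"):
        # Spark rejects Append for an aggregation like this unless a watermark
        # is set, so this model only covers Complete and Update.
        raise ValueError("mode must be 'complete' or 'update'")
    result_table = Counter()  # running word -> count state kept across triggers
    emitted = []
    for batch in batches:
        changed = set()
        for line in batch:
            for word in line.split():
                result_table[word] += 1
                changed.add(word)
        if mode == "complete":
            rows = dict(result_table)  # the whole result table, every trigger
        else:
            rows = {w: result_table[w] for w in changed}  # only rows changed this trigger
        emitted.append(dict(sorted(rows.items())))
    return emitted


def run_append_words(batches):
    """Model a non-aggregating query (split lines into words, no groupBy).

    In Append mode each new row is written once, in the trigger it arrives.
    """
    return [[w for line in batch for w in line.split()] for batch in batches]


if __name__ == "__main__":
    batches = [["spark streams data"], ["spark reads data"]]
    print(run_word_count(batches, "complete"))
    # [{'data': 1, 'spark': 1, 'streams': 1}, {'data': 2, 'reads': 1, 'spark': 2, 'streams': 1}]
    print(run_word_count(batches, "update"))
    # [{'data': 1, 'spark': 1, 'streams': 1}, {'data': 2, 'reads': 1, 'spark': 2}]
    print(run_append_words(batches))
    # [['spark', 'streams', 'data'], ['spark', 'reads', 'data']]
```

In the second trigger, Complete mode re-emits `streams` even though its count did not change, while Update mode omits it; that difference is the main trade-off between output size and having the full table at every trigger.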
Expected Duration (hours)
0.9
Lesson Objectives
Course Overview
install the latest available version of PySpark
configure a streaming data source using Netcat and write an application to process the stream
describe the effects of using the Update mode for the output of your stream processing application
write an application to listen for new files being added to a directory and process them as soon as they come in
compare the Append output mode with the Update mode and distinguish between the two
develop applications that limit the files processed in each trigger and use Spark's Complete mode for the output
perform aggregation operations on streaming data using the DataFrame API
work with Spark SQL in order to process streaming data using SQL queries
define and apply standard, re-usable transformations for streaming data
recall the key ways to use Spark for streaming data and explore the ways to process streams and generate output
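Spark's file stream source watches a directory and picks up new files as they arrive, and its `maxFilesPerTrigger` option caps how many new files each trigger processes. The sketch below is a plain-Python toy model of that idea, not Spark's implementation. `DirectoryStreamModel` and `next_batch` are hypothetical names, and the oldest-first ordering by modification time is an assumption of the model.

```python
import os
import tempfile


class DirectoryStreamModel:
    """Toy model of a file stream source with a per-trigger file limit.

    Hypothetical helper, not Spark's implementation: it only shows how a
    limit like Spark's maxFilesPerTrigger splits a backlog across triggers.
    """

    def __init__(self, path, max_files_per_trigger=None):
        self.path = path
        self.max_files = max_files_per_trigger  # None means "no limit"
        self.seen = set()  # files already handed to an earlier trigger

    def next_batch(self):
        """Return the names of the files processed by the next trigger."""
        candidates = []
        for name in os.listdir(self.path):
            full = os.path.join(self.path, name)
            if os.path.isfile(full) and name not in self.seen:
                candidates.append((os.path.getmtime(full), name))
        candidates.sort()  # oldest first; ties broken by file name
        names = [name for _, name in candidates]
        if self.max_files is not None:
            names = names[: self.max_files]
        self.seen.update(names)
        return names


if __name__ == "__main__":
    incoming = tempfile.mkdtemp()
    for i, name in enumerate(["a.csv", "b.csv", "c.csv"]):
        path = os.path.join(incoming, name)
        with open(path, "w") as fh:
            fh.write("id,value\n1,10\n")
        os.utime(path, (1000 + i, 1000 + i))  # give each file a distinct mtime
    source = DirectoryStreamModel(incoming, max_files_per_trigger=2)
    print(source.next_batch())  # ['a.csv', 'b.csv']
    print(source.next_batch())  # ['c.csv']
    print(source.next_batch())  # []
```

With a limit of 2, a backlog of three files takes two triggers; without the limit, all three would be picked up in the first trigger. Capping files per trigger keeps each micro-batch small and predictable when many files land at once.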
Course Number
it_dssdardj_02_enus
Expertise Level
Intermediate