Streaming Data Architectures: Processing Streaming Data with Spark
Data Science | Intermediate
- 11 Videos | 52m 10s
- Includes Assessment
- Earns a Badge
Process streaming data with Apache Spark, an analytics engine that can run on Hadoop clusters. In this course, you will discover how to develop Spark applications that work with streaming data and generate output. Topics include: configuring a streaming data source; using Netcat and writing applications to process the data stream; the effects of Update mode on your stream processing application's output; writing a monitoring application that listens for new files added to a directory; comparing Append output with Update mode; developing applications that limit the files processed in each trigger; using Spark's Complete mode for output; performing aggregation operations on streaming data with the DataFrame API; and processing streaming data with Spark SQL queries.
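As a rough illustration of the first topics above, the following is a minimal sketch of a PySpark Structured Streaming word count that reads from a socket fed by Netcat and writes in Update mode. It is not taken from the course itself: the function name `run_socket_word_count`, the host, and the port are assumptions chosen for illustration, and it presumes Netcat is already listening (for example, `nc -lk 9999`) and that PySpark is installed.

```python
# Hypothetical defaults for illustration; match them to your Netcat session.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 9999


def run_socket_word_count(host=DEFAULT_HOST, port=DEFAULT_PORT):
    """Count words arriving on a socket and print changed counts per trigger."""
    # Imported here so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    spark = SparkSession.builder.appName("SocketWordCount").getOrCreate()

    # Each line typed into Netcat becomes one row in the "value" column.
    lines = (
        spark.readStream.format("socket")
        .option("host", host)
        .option("port", port)
        .load()
    )

    # Split each line on spaces and emit one row per word.
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()

    # Update mode prints only the rows whose counts changed in this trigger.
    query = counts.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()
```

Calling `run_socket_word_count()` starts the query and blocks until it is stopped; text typed into the Netcat terminal should then appear as updated word counts on the console.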
WHAT YOU WILL LEARN
- install the latest available version of PySpark
- configure a streaming data source using Netcat and write an application to process the stream
- describe the effects of using the Update mode for the output of your stream processing application
- write an application to listen for new files being added to a directory and process them as soon as they come in
- compare the Append output to the Update mode and distinguish between the two
- develop applications that limit the files processed in each trigger and use Spark's Complete mode for the output
- perform aggregation operations on streaming data using the DataFrame API
- work with Spark SQL in order to process streaming data using SQL queries
- define and apply standard, reusable transformations for streaming data
- recall the key ways to use Spark for streaming data and explore the ways to process streams and generate output
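To make the difference between the Append, Update, and Complete output modes more concrete, here is a toy model in plain Python. It does not use Spark at all and is only a sketch of the idea: the function `simulate_output_modes` and its input format are hypothetical. Append is modeled on the raw, non-aggregated rows (each trigger emits only the new rows), while Update and Complete are modeled on a running word count.

```python
from collections import Counter


def simulate_output_modes(batches):
    """Return what each output mode would emit for every micro-batch.

    `batches` is a list of micro-batches, each a list of words.
    """
    totals = Counter()  # running result table of word counts
    emitted = {"append": [], "update": [], "complete": []}

    for batch in batches:
        # Append: only the rows that arrived in this micro-batch.
        emitted["append"].append(list(batch))

        totals.update(batch)

        # Update: only the aggregate rows whose counts changed in this batch.
        emitted["update"].append({word: totals[word] for word in set(batch)})

        # Complete: the entire result table, changed or not.
        emitted["complete"].append(dict(totals))

    return emitted
```

For example, with the batches `[["a", "b", "a"], ["b", "c"]]`, the second trigger emits `{"b": 2, "c": 1}` in the Update model but the full table `{"a": 2, "b": 2, "c": 1}` in the Complete model.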
IN THIS COURSE
1. Course Overview (2m 10s)
2. PySpark Setup (2m 39s)
3. Setting Up a Socket Stream with Netcat (8m 53s)
4. The Update Output Mode (3m 32s)
5. Using a File Input Stream (7m 50s)
6. The Append Output Mode (2m 15s)
7. The Complete Output Mode (6m 35s)
8. Aggregations on Streaming Data (4m 6s)
9. SQL Operations on Streaming Data (4m 59s)
10. User-Defined Functions (UDFs) (4m 41s)
11. Exercise: Processing Streaming Data (4m 30s)
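Several of the lessons above (file input streams, limiting files per trigger, Complete mode, and SQL on streams) can be combined in one small program. The following is a hedged sketch rather than the course's own code: the directory `data/incoming`, the two-column CSV schema, the view name `sales`, and the function name are all assumptions made for illustration, and PySpark is assumed to be installed.

```python
# Hypothetical input location and schema, chosen only for this sketch.
INPUT_DIR = "data/incoming"
SCHEMA_DDL = "category STRING, amount DOUBLE"
TOTALS_SQL = (
    "SELECT category, SUM(amount) AS total "
    "FROM sales GROUP BY category"
)


def run_file_stream_totals(input_dir=INPUT_DIR):
    """Watch a directory for CSV files and print running totals per category."""
    # Imported here so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("FileStreamTotals").getOrCreate()

    # File sources need an explicit schema; maxFilesPerTrigger caps how many
    # new files are read in each micro-batch.
    sales = (
        spark.readStream.schema(SCHEMA_DDL)
        .option("header", "true")
        .option("maxFilesPerTrigger", 1)
        .csv(input_dir)
    )

    # Register the stream as a temporary view so it can be queried with SQL.
    sales.createOrReplaceTempView("sales")
    totals = spark.sql(TOTALS_SQL)

    # Complete mode rewrites the whole aggregated table on every trigger.
    query = totals.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()
```

Dropping a new CSV file into the watched directory while `run_file_stream_totals()` is running should cause the full table of totals to be printed again on the next trigger.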
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft offers you the opportunity to earn a digital badge upon successful completion of this course. You can share the badge on any social network or business platform.
Digital badges are yours to keep, forever.