Processing Data: Introducing Apache Spark

Apache Spark 3.2    |    Intermediate
  • 13 Videos | 1h 44m 10s
  • Earns a Badge
Apache Spark is a powerful distributed data processing engine that can handle petabytes of data by splitting the data into chunks and distributing them across a cluster of resources. In this course, explore Spark's structured streaming engine and related components such as the PySpark shell. Begin by downloading and installing Apache Spark. Next, create a Spark cluster and run a job from the PySpark shell, then monitor the application and its job runs from the Spark web user interface. After that, set up a streaming environment that reads and manipulates the contents of files as they are added to a folder in real time. Finally, run apps in both Spark standalone and local modes.
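
The idea of chunking a dataset and dividing the chunks across workers can be sketched in plain Python. This is not Spark code and says nothing about Spark's internals: the function names `split_into_chunks` and `distributed_sum` are hypothetical, and a local thread pool stands in for a cluster purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split a list into at most num_chunks contiguous chunks of near-equal size."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    size, remainder = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks each take one extra element.
        end = start + size + (1 if i < remainder else 0)
        if end > start:  # skip empty chunks when data is shorter than num_chunks
            chunks.append(data[start:end])
        start = end
    return chunks


def distributed_sum(data, num_workers=4):
    """Sum a list by summing each chunk on a worker, then combining the partial sums."""
    chunks = split_into_chunks(data, num_workers)
    # Assumption: threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

Calling `distributed_sum(list(range(101)))` splits the input into four chunks, sums each one independently, and combines the partial results, which is the same split, process, and combine shape the description attributes to Spark at a much larger scale.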

WHAT YOU WILL LEARN

  • discover the key concepts covered in this course
  • describe how Apache Hadoop and Spark work
  • recall the architecture and features of Apache Spark
  • recognize the use cases of Spark in general and of its structured streaming engine in particular
  • install and configure Apache Spark
  • create a Spark cluster with a master and a worker
  • run a job on the PySpark shell and view its details from the Spark web user interface (UI)
  • execute Spark commands and monitor jobs with the Spark web UI
  • configure a Spark cluster using the spark-env.sh file
  • set up an environment to stream files, and build an app to process files in real time
  • execute apps on a Spark standalone cluster
  • distinguish between Spark standalone and local deployment modes
  • summarize the key concepts covered in this course
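
The difference between the standalone and local deployment modes named in the objectives above is visible in the master URL an application is started with: `local`, `local[K]`, or `local[*]` runs Spark inside a single process, while `spark://HOST:PORT` points at a standalone cluster master (7077 is the default port). The sketch below classifies such strings. The helper name `deployment_mode` and the host name `master-host` are hypothetical, and the patterns deliberately cover only these common forms, not every master URL Spark accepts.

```python
import re

# local, local[K], or local[*]: run Spark in one process with 1, K, or all cores
_LOCAL_PATTERN = re.compile(r"local(\[(\*|\d+)\])?")
# spark://HOST:PORT: connect to the master of a standalone cluster
_STANDALONE_PATTERN = re.compile(r"spark://[^\s:/,]+:\d+")


def deployment_mode(master_url):
    """Return 'local', 'standalone', or 'other' for a Spark master URL string."""
    if _LOCAL_PATTERN.fullmatch(master_url):
        return "local"
    if _STANDALONE_PATTERN.fullmatch(master_url):
        return "standalone"
    # Anything else (for example a YARN or Kubernetes master) is out of scope here.
    return "other"
```

For example, `deployment_mode("local[4]")` returns `"local"` and `deployment_mode("spark://master-host:7077")` returns `"standalone"`.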

IN THIS COURSE

  1. Course Overview (1m 23s)
  2. Apache Spark (12m 30s)
  3. Apache Spark Architecture (13m 7s)
  4. Structured Streaming in Apache Spark (8m 11s)
  5. Downloading and Installing Spark (6m 50s)
  6. Deploying a Spark Cluster (9m 53s)
  7. Launching a Spark Job (11m 9s)
  8. Monitoring Spark Apps with the Web UI (7m 31s)
  9. Configuring a Spark Cluster (6m 33s)
  10. Building a Spark Streaming App (9m 50s)
  11. Running Apps on a Standalone Cluster (8m 29s)
  12. Running Apps on Spark Local (6m 14s)
  13. Course Summary (2m 30s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.