Using Apache Spark for AI Development

Apache Spark 2.4    |    Intermediate
  • 13 Videos | 42m 22s
  • Includes Assessment
  • Earns a Badge
Spark is a leading open-source cluster-computing framework used for large-scale distributed data processing and machine learning. Although it was not designed primarily for AI, Spark lets you take advantage of data parallelism and the large distributed systems used in AI development, so AI practitioners should be able to recognize when Spark is the right tool for a particular application. In this course, you'll explore advanced techniques for working with Apache Spark and identify the key advantages of using Spark over other platforms. You'll define resilient distributed datasets (RDDs) and explore several workflows related to them. You'll then learn how to work with a Spark DataFrame, identifying its features and use cases. Finally, you'll learn how to create a machine learning pipeline using Spark ML Pipelines.

WHAT YOU WILL LEARN

  • discover the key concepts covered in this course
  • identify cases in which it is advantageous to use Spark over other platforms
  • define a resilient distributed dataset and identify typical sources of data
  • specify the unique features of a resilient distributed dataset
  • describe how to create a resilient distributed dataset
  • list the operations available on resilient distributed datasets and define their roles
  • list potential sources of data for a Spark DataFrame and outline how to import them into Spark
  • name the features of a Spark DataFrame and some useful operations it supports
  • outline how to create a Spark DataFrame
  • specify how Spark ML Pipelines can be used to create and tune ML models
  • describe the fundamental concepts of Spark ML Pipelines
  • create an ML pipeline using Spark ML Pipelines
  • summarize the key concepts covered in this course

IN THIS COURSE

  1. Course Overview (2m 46s)
  2. Spark vs. Other Platforms (5m)
  3. Resilient Distributed Dataset Sources (3m 22s)
  4. Resilient Distributed Dataset Features (2m 2s)
  5. Resilient Distributed Dataset Creation (2m 43s)
  6. Resilient Distributed Dataset Operations (2m 53s)
  7. Spark DataFrame Sources (1m 58s)
  8. Spark DataFrame Features (1m 42s)
  9. Spark DataFrame Creation (2m 46s)
  10. Spark ML Pipelines (3m 55s)
  11. Spark ML Pipeline Concepts (2m)
  12. Creating a Pipeline with Spark ML (4m 55s)
  13. Course Summary (51s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.