Spark for High-speed Big Data Analytics

Big Data 2021    |    Beginner
  • 12 Videos | 50m 51s
  • Includes Assessment
  • Earns a Badge
Spark is an open-source, massively parallel, in-memory solution that allows you to run big data analytics pipelines at high speed. Use this course to learn how Apache Spark works and gain an understanding of its architecture. As you progress, investigate the industry-leading examples of Uber and Alibaba to recognize how Spark can add business value to data across many industries. Moving along, compare the functionality of Spark and Hadoop across use cases, identifying when Spark is the most advantageous choice. Finally, explore fundamental Spark characteristics, optimization techniques, and best practices. When you've completed this course, you'll have a solid theoretical understanding of how and when to use Apache Spark for specific big data analytics tasks.

WHAT YOU WILL LEARN

  • discover the key concepts covered in this course
  • recognize how Spark offers an open-source, scalable, massively parallel, in-memory solution for analytics applications
  • outline the two main components of the Spark architecture: Resilient Distributed Dataset and Directed Acyclic Graph
  • describe how Spark is providing business value to Uber
  • describe how Spark is providing business value to Alibaba
  • describe how Spark is providing business value to the healthcare industry
  • compare and name the main differences between Spark and Hadoop with respect to ease of use, latency, security, and cost
  • specify in which scenarios and conditions Spark is a better choice than its alternatives
  • list the main features of Spark, such as loading behavior, file formats, parallelism, caching, and data skew
  • name the most important performance optimization techniques in Apache Spark, such as file format selection, level of parallelism, and API selection
  • name simple best practices when using Spark, like starting small or resolving skewness
  • summarize the key concepts covered in this course

IN THIS COURSE

  1. Course Overview (1m 55s)
  2. The Core Characteristics of Apache Spark (6m)
  3. Components of the Apache Spark Architecture (4m 10s)
  4. Apache Spark Use Case: Uber Using Spark (4m 52s)
  5. Apache Spark Use Case: Alibaba Using Spark (4m 29s)
  6. Apache Spark Use Case: The Healthcare Industry (3m 4s)
  7. Apache Spark vs. Hadoop (3m 31s)
  8. Top Apache Spark Use Cases (5m 23s)
  9. Apache Spark's Main Features (4m 12s)
  10. Apache Spark Performance Optimization Techniques (3m 42s)
  11. Apache Spark Best Practices (3m 26s)
  12. Course Summary (1m 10s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft offers you the opportunity to earn a digital badge upon successful completion of this course. You can share the badge on any social network or business platform.

Digital badges are yours to keep, forever.
