Data Analysis Using the Spark DataFrame API

Apache Spark | Beginner

16 videos | 1h 10m 46s
Includes Assessment
Earns a Badge

From Channel: Apache Spark

From Journey: Data Analyst to Data Scientist

Apache Spark, an open-source cluster-computing framework used for data science, has become the de facto big data framework. In this Skillsoft Aspire course, learners explore how to analyze real datasets by using DataFrame API methods. Discover how to optimize operations with shared variables and combine data from multiple DataFrames using joins. Explore the features that make Spark 2.x significantly faster than Spark 1.x. Other topics include how to create a Spark DataFrame from a CSV file; apply DataFrame transformations, grouping, and aggregation; and perform operations on a DataFrame to analyze categories of data in a dataset. Learn how to visualize the contents of a Spark DataFrame with Matplotlib. Conclude by studying how to work with broadcast variables and how to store DataFrame contents in text file format.

WHAT YOU WILL LEARN

Recognize the features that make Spark 2.x versions significantly faster than Spark 1.x

Specify the reasons for using shared variables in your Spark application and distinguish between the two options available for sharing variables

Create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame

Define a transformation to view a random sample of data from a large DataFrame

Apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset

Use Matplotlib to visualize the contents of a Spark DataFrame

Perform operations to prepare your dataset for analysis by trimming unnecessary columns and rows containing missing data

Define and apply a generic transformation on a DataFrame

Apply complex transformations on a DataFrame to extract meaningful information from a dataset

Work with broadcast variables and perform a join operation with a DataFrame that has been broadcast

Use a Spark accumulator as a counter

Store the contents of a DataFrame in a text file for archiving or sharing

Define and work with a custom accumulator to count a vector of values

Perform different join operations on Spark DataFrames to combine data from multiple sources

Analyze data using the DataFrame API

IN THIS COURSE

2m 25s

6m 14s

After completing this video, you will be able to recognize the features that make Spark 2.x versions significantly faster than Spark 1.x versions.

3. Broadcast Variables and Accumulators

4m 54s

Upon completion of this video, you will be able to specify the reasons for using shared variables in your Spark application and distinguish between the two options available for sharing variables.

4. Loading Data into a DataFrame

6m 11s

In this video, you will learn how to create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame.

5. Sampling the Contents of a DataFrame

4m 9s

In this video, you will learn how to define a transformation to view a random sample of data from a large DataFrame.

6. Grouping and Aggregations

6m 23s

To analyze categories of data in a dataset, find out how to apply grouping and aggregation operations on a DataFrame.

7. Visualizing Data in a DataFrame

7m 34s

In this video, you will learn how to use Matplotlib to visualize the contents of a Spark DataFrame.

8. Trimming and Cleaning Data

4m 32s

Learn how to perform operations to prepare your dataset for analysis by trimming unnecessary columns and rows that contain missing data.

9. User-Defined Functions and DataFrames

4m 36s

Learn how to define and apply a generic transformation to a DataFrame.

10. Combining Filters, Aggregations, and Sorting

3m 31s

In this video, you will learn how to apply complex transformations on a DataFrame to extract meaningful information from a dataset.

11. Using Broadcast Variables

3m 39s

In this video, you will learn how to work with broadcast variables and perform a join operation with a DataFrame that has been broadcast.

12. Using Accumulators

3m 59s

During this video, you will learn how to use a Spark accumulator as a counter.

13. Exporting DataFrame Contents

2m 15s

During this video, you will learn how to store the contents of a DataFrame in a text file for archival purposes or sharing.

14. Custom Accumulators

2m 56s

In this video, you will learn how to define and work with a custom accumulator to count a vector of values.

15. Join Operations

3m 28s

In this video, you will learn how to perform different join operations on Spark DataFrames to combine data from multiple sources.

16. Exercise: Data Analysis Using the DataFrame API

4m 1s

In this video, you will analyze data using the DataFrame API.

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. You can share the badge on any social network or business platform.

Digital badges are yours to keep, forever.

YOU MIGHT ALSO LIKE

Course: Visualizing Data for Impact: Introduction to Data Visualization

Book: Practical Machine Learning with Spark: Uncover Apache Spark's Scalable Performance with High-Quality Algorithms Across NLP, Computer Vision and ML

Course: Apache Spark Getting Started

PEOPLE WHO VIEWED THIS ALSO VIEWED THESE

Course: Streaming Data Architectures: An Introduction to Streaming Data in Spark

Course: Fundamentals of BigQuery

Course: Processing Data: Introducing Apache Spark
