Filtering Data Using Hadoop MapReduce

Apache Hadoop | Beginner

9 videos | 58m 7s
Includes Assessment
Earns a Badge

From Channel:

Apache Hadoop

From Journey:

Data Analyst to Data Scientist

Extracting meaningful information from a very large dataset can be painstaking. In this Skillsoft Aspire course, learners examine how Hadoop's MapReduce can be used to speed up this operation. In a new project, code the Mapper for an application that counts the passengers in each Titanic class in the input dataset. Then develop a Reducer and Driver to generate the final passenger counts for each class. Build the project with Maven and run it on the Hadoop master node to check that the output shows the correct count for each passenger class. Apply MapReduce to keep only the surviving Titanic passengers from the input dataset. Execute the application and verify that the filtering worked correctly; examine the job and output files with the YARN cluster manager and HDFS (Hadoop Distributed File System) NameNode web user interfaces. Using a restaurant app's dataset, use MapReduce to obtain the distinct set of cuisines offered. Build and run the application and confirm the output in HDFS from both the command line and the web application. The course closes with an exercise on filtering data by using MapReduce.
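The per-class count described above follows the usual map, group-by-key, reduce flow. The block below is only a minimal sketch of that flow in plain Java, run in memory with no Hadoop dependency; it is not the course's code. The class name, the column position of the passenger class, and the sample rows are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of map -> shuffle -> reduce. Not Hadoop code.
class PassengerClassCountSketch {

    // Assumed (hypothetical) CSV layout: id,pclass,survived
    static final int PCLASS_COLUMN = 1;

    // "Map" step: emit one (passengerClass, 1) pair per well-formed row.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] fields = line.split(",");
        if (fields.length > PCLASS_COLUMN) {
            out.add(Map.entry(fields[PCLASS_COLUMN].trim(), 1));
        }
        return out;
    }

    // "Reduce" step: sum all the 1s that were emitted for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // "Driver": wires the steps together. Grouping by key stands in for the shuffle.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the course dataset.
        List<String> rows = List.of("1,3,0", "2,1,1", "3,3,1", "4,2,0");
        System.out.println(run(rows)); // prints {1=1, 2=1, 3=2}
    }
}
```

In a real Hadoop job the same three roles would be split across Mapper, Reducer, and Driver classes, and the framework would perform the grouping between the map and reduce phases.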

WHAT YOU WILL LEARN

Create a new project and code the Mapper for an application that counts the number of passengers in each class of the Titanic in the input dataset

Develop a Reducer and Driver for the application to generate the final passenger counts in each class of the Titanic

Build the project using Maven and run it on the Hadoop master node to check that the output correctly shows the numbers in each passenger class

Apply MapReduce to filter through only the surviving passengers on the Titanic from the input dataset

Execute the application and verify that the filtering has worked correctly; examine the job and the output files using the YARN cluster manager and HDFS NameNode web UIs

Use MapReduce to obtain a distinct set of the cuisines offered by the restaurants in a dataset

Build and run the application and confirm the output using HDFS from both the command line and the web application

Identify configuration functions used to customize a MapReduce job and recognize the types of input and output when null values are transmitted from the Mapper to the Reducer
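The survivor filter in the objectives above can be pictured as a map step that either passes a row through unchanged or emits nothing. The following is a rough sketch in plain Java with no Hadoop dependency. The class name, the position of the survived column, and the "1 means survived" encoding are assumptions for illustration, and whether the course's version also uses a Reducer is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of a MapReduce-style filter. Not Hadoop code.
class SurvivorFilterSketch {

    // Assumed (hypothetical) CSV layout: id,pclass,survived with "1" = survived.
    static final int SURVIVED_COLUMN = 2;

    // "Map" step: emit the row unchanged if it passes the predicate, else emit nothing.
    static List<String> map(String line) {
        String[] fields = line.split(",");
        boolean survived = fields.length > SURVIVED_COLUMN
                && "1".equals(fields[SURVIVED_COLUMN].trim());
        return survived ? List.of(line) : List.of();
    }

    // Applies the map step to every row and collects whatever was emitted.
    static List<String> run(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            kept.addAll(map(line));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the course dataset.
        List<String> rows = List.of("1,3,0", "2,1,1", "3,3,1", "4,2,0");
        for (String row : run(rows)) {
            System.out.println(row); // prints the rows for ids 2 and 3
        }
    }
}
```

Because each row is judged on its own, a filter like this needs no aggregation across rows, which is why filtering is often treated as the simplest MapReduce pattern.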

IN THIS COURSE

2m 49s

7m 14s

In this video, you will create a new project and code the Mapper for an application that counts the number of passengers in each class of the Titanic in the input dataset.
3. The Reducer and Driver Programs

5m 19s

Find out how to develop a Reducer and Driver for the application to generate the final passenger counts in each class of the Titanic.
4. Building and Executing the Application

8m 32s

In this video, you will build the project using Maven and run it on the Hadoop master node to check that the output correctly displays the numbers in each passenger class.
5. A Simple Filter Using MapReduce

8m 42s

In this video, you will learn how to apply MapReduce to filter through only the passengers who survived on the Titanic from the input dataset.
6. Executing and Examining the Output

7m 31s

During this video, you will learn how to execute the application and verify that the filtering has worked correctly. You will also examine the job and the output files using the YARN Cluster Manager and HDFS NameNode web UIs.
7. Extracting the Unique Values in a Column

7m 59s

In this video, you will learn how to use MapReduce to obtain a distinct set of cuisines offered by restaurants in a dataset.
8. Viewing the Distinct Values Extracted

5m 27s

In this video, find out how to build and run the application and confirm the output using HDFS from both the command line and the web application.
9. Exercise: Filtering Data Using MapReduce

4m 33s

In this video, you will learn how to identify configuration functions used to customize a MapReduce job and recognize the types of input and output when null values are transmitted from the Mapper to the Reducer.
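The distinct-values task and the null values mentioned in the videos above are commonly related: when only the key matters, the Mapper can emit the column value as the key with an empty value, and grouping by key leaves each value exactly once. In Hadoop the empty value is typically the NullWritable type. The sketch below imitates that idea in plain Java with no Hadoop dependency. The class name, the column position of the cuisine, and the sample rows are assumptions for illustration, not details from the course.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// In-memory sketch of the "distinct values" pattern. Not Hadoop code.
class DistinctCuisinesSketch {

    // Assumed (hypothetical) CSV layout: restaurantName,cuisine
    static final int CUISINE_COLUMN = 1;

    // "Map" step: the cuisine becomes the key. No value is needed, which is
    // the role a null value plays in a real job. Returns null for bad rows.
    static String mapKey(String line) {
        String[] fields = line.split(",");
        if (fields.length <= CUISINE_COLUMN) {
            return null;
        }
        String cuisine = fields[CUISINE_COLUMN].trim();
        return cuisine.isEmpty() ? null : cuisine;
    }

    // Grouping by key collapses duplicates; a sorted set models that here.
    static Set<String> run(List<String> lines) {
        Set<String> distinct = new TreeSet<>();
        for (String line : lines) {
            String key = mapKey(line);
            if (key != null) {
                distinct.add(key);
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the course dataset.
        List<String> rows = List.of("A,Thai", "B,Italian", "C,Thai", "D,Mexican");
        System.out.println(run(rows)); // prints [Italian, Mexican, Thai]
    }
}
```

In a real job the Reducer would simply write each key it receives once, ignoring the empty values.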

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of our courses. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
