Filtering Data Using Hadoop MapReduce

Apache Hadoop 2.9
  • 9 Videos | 1h 1m 37s
  • Includes Assessment
  • Earns a Badge
Extracting meaningful information from a very large dataset can be painstaking. In this Skillsoft Aspire course, learners examine how Hadoop's MapReduce can be used to speed up this operation. In a new project, code the Mapper for an application that counts the number of passengers in each Titanic class in the input dataset. Then develop a Reducer and Driver to generate the final passenger counts for each class. Build the project using Maven and run it on the Hadoop master node to check that the output correctly shows the number of passengers in each class. Apply MapReduce to filter the input dataset down to only the surviving Titanic passengers. Execute the application and verify that the filtering has worked correctly; examine the job and output files with the YARN Cluster Manager and HDFS (Hadoop Distributed File System) NameNode web user interfaces. Using a restaurant app's dataset, use MapReduce to obtain the distinct set of cuisines offered. Build and run the application and confirm the output with HDFS from both the command line and the web application. The concluding exercise involves filtering data by using MapReduce.
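
The counting step described above can be sketched in plain Java without the Hadoop API, purely to show the Mapper, shuffle, and Reducer roles. This is a minimal illustration, not the course's code: the class name PassengerClassCountSketch, the method names, and the three-column CSV layout (passengerId,survived,pclass) are all assumptions made for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a MapReduce count; no Hadoop classes are used.
class PassengerClassCountSketch {

    // "Mapper": emit one (pclass, 1) pair per valid input record.
    // Assumes a hypothetical CSV layout: passengerId,survived,pclass
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] fields = line.split(",");
        if (fields.length < 3 || fields[0].equals("passengerId")) {
            return out; // skip the header row and malformed rows
        }
        out.add(new AbstractMap.SimpleEntry<>(fields[2].trim(), 1));
        return out;
    }

    // "Reducer": sum all the 1s that were emitted for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // "Driver": run the mapper, group values by key (the shuffle), then reduce.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "passengerId,survived,pclass",
            "1,0,3", "2,1,1", "3,1,3", "4,1,1", "5,0,3");
        System.out.println(run(lines)); // prints {1=2, 3=3}
    }
}
```

In a real Hadoop job, the grouping done here inside run is performed by the framework between the map and reduce phases, and the Driver only configures and submits the job.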

WHAT YOU WILL LEARN

  • create a new project and code the Mapper for an application that counts the number of passengers in each class of the Titanic in the input dataset
  • develop a Reducer and Driver for the application to generate the final passenger counts in each class of the Titanic
  • build the project using Maven and run it on the Hadoop master node to check that the output correctly shows the numbers in each passenger class
  • apply MapReduce to filter the input dataset down to only the surviving passengers on the Titanic
  • execute the application and verify that the filtering has worked correctly; examine the job and the output files using the YARN Cluster Manager and HDFS NameNode web UIs
  • use MapReduce to obtain a distinct set of the cuisines offered by the restaurants in a dataset
  • build and run the application and confirm the output using HDFS from both the command line and the web application
  • identify configuration functions used to customize a MapReduce job and recognize the types of input and output when null values are transmitted from the Mapper to the Reducer
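
The filtering and distinct-value objectives can be sketched the same way in plain Java, again without the Hadoop API. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, the passenger rows are assumed to be laid out as passengerId,survived,pclass with 1 marking a survivor, and the restaurant rows are assumed to be name,cuisine with no header row. In the distinct case only the key carries information; in Hadoop, the NullWritable type is commonly used as the placeholder value in that situation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java simulation of two MapReduce patterns; no Hadoop classes are used.
class FilterAndDistinctSketch {

    // Filter: the "mapper" emits a record only when it passes the test,
    // so no aggregation is needed afterwards.
    // Assumes a hypothetical layout: passengerId,survived,pclass (1 = survived)
    static List<String> filterSurvivors(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length >= 3 && fields[1].trim().equals("1")) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Distinct: the "mapper" emits (cuisine, null); grouping by key collapses
    // duplicates, and the "reducer" writes each key exactly once.
    // Assumes a hypothetical layout: name,cuisine (no header row)
    static List<String> distinctCuisines(List<String> lines) {
        SortedSet<String> keys = new TreeSet<>(); // stands in for the shuffle
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length >= 2 && !fields[1].trim().isEmpty()) {
                keys.add(fields[1].trim());
            }
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        List<String> passengers = Arrays.asList("1,0,3", "2,1,1", "3,1,3");
        for (String row : filterSurvivors(passengers)) {
            System.out.println(row);
        }
        List<String> restaurants = Arrays.asList(
            "Alpha,Thai", "Beta,Italian", "Gamma,Thai");
        System.out.println(distinctCuisines(restaurants));
    }
}
```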

IN THIS COURSE

  1. Course Overview (2m 49s)
  2. Counting the Data Points in Each Category (7m 14s)
  3. The Reducer and Driver Programs (5m 19s)
  4. Building and Executing the Application (8m 32s)
  5. A Simple Filter Using MapReduce (8m 42s)
  6. Executing and Examining the Output (7m 31s)
  7. Extracting the Unique Values in a Column (7m 59s)
  8. Viewing the Distinct Values Extracted (5m 27s)
  9. Exercise: Filtering Data Using MapReduce (4m 33s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

On successful completion of this course, you can earn a digital badge from Skillsoft that can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
