Advanced Operations Using Hadoop MapReduce

Apache Hadoop 2.9    |    Intermediate
  • 9 Videos | 51m 46s
  • Includes Assessment
  • Earns a Badge
In this Skillsoft Aspire course, explore how MapReduce can be used to extract the five most expensive vehicles in a data set and then to build an inverted index for the words appearing in a set of text files. Begin by defining a vehicle type that represents automobiles and can be stored in a Java PriorityQueue, then configure a Mapper to use a PriorityQueue to keep the five most expensive automobiles it has processed from the data set. Learn how to use a PriorityQueue in the Reducer to receive the five most expensive automobiles from each mapper and write the top five overall to the output, then execute the application to verify the results. Next, explore how to use the MapReduce framework to generate an inverted index, and configure the Reducer and Driver for the inverted index application. You will then run the application and examine the inverted index on HDFS (Hadoop Distributed File System). The course concludes with an exercise on advanced operations using MapReduce.
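
The top-N pattern described above can be sketched in plain Java, without the Hadoop API. This is only an illustration under stated assumptions: the `Vehicle` class, its `model` and `price` fields, and the `topN` helper are hypothetical names, not the course's actual code. The sketch uses a bounded min-heap so that each simulated mapper keeps only its N most expensive vehicles, and a final pass plays the role of the Reducer by merging those candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {
    // Hypothetical vehicle type; the field names are assumptions for illustration.
    static final class Vehicle implements Comparable<Vehicle> {
        final String model;
        final double price;

        Vehicle(String model, double price) {
            this.model = model;
            this.price = price;
        }

        // Natural ordering by price, so a PriorityQueue keeps the cheapest at its head.
        @Override
        public int compareTo(Vehicle other) {
            return Double.compare(price, other.price);
        }
    }

    // Returns the n most expensive vehicles, most expensive first.
    static List<Vehicle> topN(Iterable<Vehicle> vehicles, int n) {
        PriorityQueue<Vehicle> heap = new PriorityQueue<>(); // min-heap by price
        for (Vehicle v : vehicles) {
            heap.offer(v);
            if (heap.size() > n) {
                heap.poll(); // evict the cheapest so at most n remain
            }
        }
        List<Vehicle> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        int n = 2; // the course uses five; two keeps the demo short

        // Two lists stand in for the input splits seen by two mappers.
        List<Vehicle> split1 = Arrays.asList(
                new Vehicle("Sedan", 25000), new Vehicle("Coupe", 60000), new Vehicle("Hatch", 18000));
        List<Vehicle> split2 = Arrays.asList(
                new Vehicle("SUV", 45000), new Vehicle("Truck", 52000), new Vehicle("Van", 30000));

        // "Map" side: each split contributes only its local top n.
        List<Vehicle> candidates = new ArrayList<>();
        candidates.addAll(topN(split1, n));
        candidates.addAll(topN(split2, n));

        // "Reduce" side: merge the candidates into the overall top n.
        for (Vehicle v : topN(candidates, n)) {
            System.out.println(v.model + "\t" + v.price);
        }
    }
}
```

Keeping only N items per mapper means the single reducer receives at most N records from each mapper rather than the whole data set, which is the main reason for filtering on the map side.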

WHAT YOU WILL LEARN

  • define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
  • configure a Mapper to use a PriorityQueue to store the five most expensive vehicles it has processed from the dataset
  • use a PriorityQueue in the Reducer of the application to receive the five most expensive automobiles from each mapper and write the top 5 vehicles overall to the output
  • execute the application and examine the output on HDFS to confirm that the five most expensive automobiles have been written out
  • define the Mapper for a MapReduce application to build an inverted index from a set of text files
  • configure the Reducer and the Driver for the inverted index application
  • run the application and examine the inverted index on HDFS
  • recognize the data structures and configurations involved when extracting the top N values from a data set
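
As a rough illustration of the inverted index objectives above, the following plain-Java sketch maps each word to the set of files that contain it. It does not use the Hadoop API, and the class name `InvertedIndexSketch`, the `build` method, and the sample file names are assumptions made for this example. The inner loop corresponds to a map phase that emits (word, file name) pairs, and the grouping into sets corresponds to the reduce phase.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {
    // Builds word -> sorted set of file names from a map of file name -> file contents.
    static Map<String, Set<String>> build(Map<String, String> files) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // "Map" step: split the text into lowercase words and pair each with its file.
            for (String word : file.getValue().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                // "Reduce" step: collect the distinct file names seen for each word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(file.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical sample documents standing in for text files on HDFS.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "Hadoop map reduce");
        files.put("b.txt", "Map side join");

        for (Map.Entry<String, Set<String>> entry : build(files).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Using a set per word removes duplicate file names when a word appears several times in the same file; an index that also records positions or counts would need a richer value type.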

IN THIS COURSE

  1. Course Overview (2m 30s)
  2. Defining a User-Defined Type for a PriorityQueue (6m 45s)
  3. Implementing a PriorityQueue in a Mapper (5m 31s)
  4. Using a PriorityQueue in a Reducer (6m 29s)
  5. Running and Verifying the Results (5m 7s)
  6. Building an Inverted Index - Map Phase (6m 5s)
  7. Building an Inverted Index - Reduce Phase (5m 31s)
  8. Executing the Application and Viewing the Index (5m 1s)
  9. Exercise: Advanced Operations Using MapReduce (5m 17s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
