
Getting Started with Hadoop: MapReduce Applications With Combiners

Overview/Description

Apache Hadoop is a collection of open-source software utilities for solving data science problems. Hadoop speeds up the analysis of large datasets by distributing them across a cluster and processing them in parallel. Explore how Combiners make MapReduce applications more efficient by minimizing the data transferred between nodes.
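To show how a Combiner cuts down on data transfers, here is a minimal sketch that simulates, in memory and without any Hadoop APIs, the output of one map task that counts automobiles per make. The class and method names are hypothetical and this is not the course's code. It assumes Java 9 or later (for `Map.entry` and `List.of`). The Mapper emits one (make, 1) pair per record, and the Combiner sums those pairs locally, so only one record per make would need to cross the network during the shuffle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory simulation of a single map task's output; no Hadoop APIs are used.
public class CombinerTransferDemo {

    // Mapper step: emit one (make, 1) pair for every input record.
    static List<Map.Entry<String, Integer>> map(List<String> makes) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String make : makes) {
            out.add(Map.entry(make, 1));
        }
        return out;
    }

    // Combiner step: sum the counts for each key locally, before the shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            partial.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        // Hypothetical input split holding six records for three makes.
        List<String> split = List.of("audi", "bmw", "audi", "audi", "bmw", "honda");
        List<Map.Entry<String, Integer>> mapped = map(split);
        Map<String, Integer> combined = combine(mapped);
        System.out.println("records shuffled without combiner: " + mapped.size()); // 6
        System.out.println("records shuffled with combiner: " + combined.size());  // 3
    }
}
```

In an actual Hadoop job, the Combiner class is registered on the job with `Job.setCombinerClass(...)`; the simulation above only illustrates the reduction in records that the shuffle would otherwise have to move.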



Expected Duration (hours)
1.4

Lesson Objectives


  • recognize the need for Combiners to optimize the execution of a MapReduce application by minimizing data transfers within a cluster
  • recall the steps involved in processing data in a MapReduce application
  • describe how a Combiner performs a partial reduction of the data output by the Mapper
  • configure a Combiner to optimize a MapReduce application that calculates an average value
  • use Maven to create a new project for a MapReduce application and plan out the Map and Reduce phases by examining the auto prices dataset
  • develop the Mapper and Reducer for the application that will calculate the average price for each make of automobile in the input dataset
  • create the driver program for the MapReduce application
  • run the MapReduce application and check the output to get the average price for each automobile make
  • code a Combiner for the MapReduce application and configure the Driver to use it for a partial reduction on the Mapper nodes of the cluster
  • fix the bug in the previous application by defining a type that holds both the aggregate price and the count of automobiles, so that the average price can be calculated correctly
  • compare the output of the modified application with the previous buggy version and verify that the average prices for the vehicles are being calculated correctly
  • identify the shortcomings of regular MapReduce operations that Combiners address, and how Combiners differ from Reducers
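The averaging bug and its fix described in the objectives can be illustrated with a small in-memory simulation. This is a minimal sketch, assuming Java 16 or later (for records), that uses no Hadoop APIs; the class, the `SumCount` record, and the method names are hypothetical stand-ins rather than the course's code. When a Combiner collapses each node's prices into a local average, the Reducer ends up averaging averages and loses track of how many prices each one represents. Emitting a (sum, count) pair instead lets the Reducer add the partial sums and counts and divide only once.

```java
import java.util.List;

// Hypothetical in-memory sketch (no Hadoop APIs): prices for one make seen on two mapper nodes.
public class AverageCombinerDemo {

    // Hypothetical stand-in for a custom value type holding a partial sum and a count.
    record SumCount(double sum, long count) {
        SumCount plus(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }
    }

    // Buggy combiner: collapses one node's prices into a local average.
    static double localAverage(List<Double> prices) {
        double total = 0;
        for (double p : prices) total += p;
        return total / prices.size();
    }

    // Buggy reducer: averages the partial averages, ignoring how many prices each one covers.
    static double averageOfAverages(List<List<Double>> nodes) {
        double total = 0;
        for (List<Double> node : nodes) total += localAverage(node);
        return total / nodes.size();
    }

    // Fixed combiner: emits (sum, count), so the size of each group is preserved.
    static SumCount partial(List<Double> prices) {
        double total = 0;
        for (double p : prices) total += p;
        return new SumCount(total, prices.size());
    }

    // Fixed reducer: adds partial sums and counts, then divides once at the end.
    static double correctAverage(List<List<Double>> nodes) {
        SumCount acc = new SumCount(0, 0);
        for (List<Double> node : nodes) acc = acc.plus(partial(node));
        return acc.sum() / acc.count();
    }

    public static void main(String[] args) {
        List<List<Double>> nodes = List.of(
                List.of(10000.0, 20000.0, 30000.0), // mapper node A
                List.of(40000.0));                  // mapper node B
        System.out.println("average of averages: " + averageOfAverages(nodes)); // 30000.0
        System.out.println("correct average: " + correctAverage(nodes));        // 25000.0
    }
}
```

Because Hadoop may run a Combiner zero, one, or several times, its input and output types must match the map output types. In the fixed design this means the Mapper would also emit a (price, 1) pair, which the Combiner can merge any number of times without changing the final result.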

Course Number
it_dshpfddj_04_enus

Expertise Level
Intermediate