
Getting Started with Hadoop: Developing a Basic MapReduce Application




Overview/Description

Apache Hadoop is a collection of open-source software utilities that facilitates solving data science problems. In this course, you will discover how to use Hadoop's MapReduce: you will provision a Hadoop cluster on the cloud and then build a "hello world" MapReduce application that calculates the word frequencies in a text document.



Expected Duration (hours)
1.2

Lesson Objectives

Getting Started with Hadoop: Developing a Basic MapReduce Application

  • create and configure a Hadoop cluster on the Google Cloud Platform using its Cloud Dataproc service
  • work with the YARN Cluster Manager and HDFS NameNode web applications that come packaged with Hadoop
  • use Maven to create a new Java project for the MapReduce application
  • develop a Mapper for the word frequency application that includes the logic to parse one line of the input file and produce a collection of keys and values as output (see the Mapper sketch after this list)
  • create a Reducer for the application that will collect the Mapper output and calculate the word frequencies in the input text file (see the Reducer sketch after this list)
  • specify the configurations of the MapReduce application in the Driver program and the project's pom.xml file (see the Driver sketch after this list)
  • build the MapReduce word frequency application with Maven to produce a jar file and then prepare to execute it from the master node of the Hadoop cluster
  • run the application and examine the outputs generated to get the word frequencies in the input text document
  • identify the apps packaged with Hadoop and the purposes they serve, and recall the classes/methods used in the Map and Reduce phases of a MapReduce application
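The course's own source files are not reproduced on this page. As a minimal, illustrative sketch of the Mapper objective (class and variable names are assumptions, not the course's code), the standard org.apache.hadoop.mapreduce API can be used to split each input line into words and emit (word, 1) pairs:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical Mapper for the word frequency application (names are assumptions).
public class WordFrequencyMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Parse one line of the input file and emit each word with a count of 1.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken().toLowerCase());
            context.write(word, ONE);
        }
    }
}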
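A matching Reducer sketch, again with assumed names: it receives each word together with all of the 1s emitted by the Mapper and sums them to obtain that word's frequency.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical Reducer for the word frequency application (names are assumptions).
public class WordFrequencyReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();  // each value is 1 unless a combiner pre-aggregated
        }
        total.set(sum);
        context.write(word, total);
    }
}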
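A Driver of the kind the objectives mention could wire the two classes together as shown below. The class names and the use of command-line arguments for the input and output paths are assumptions, though this is the usual shape of a word-count driver.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical Driver that configures and submits the word frequency job.
public class WordFrequencyDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word frequency");
        job.setJarByClass(WordFrequencyDriver.class);

        job.setMapperClass(WordFrequencyMapper.class);
        job.setReducerClass(WordFrequencyReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output locations are taken from the command line (assumption).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}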
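For the build and run objectives, the typical workflow (details may differ from the course) is to package the project with Maven's mvn package goal, copy the resulting jar to the cluster's master node, and submit it with the hadoop jar command, passing the input and output paths; the word frequencies then appear in the part-r-* files written to the output directory.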
Course Number
it_dshpfddj_02_enus

Expertise Level
Beginner