Getting Started with Hadoop: Fundamentals & MapReduce


Overview/Description

Apache Hadoop is a collection of open-source software utilities that facilitate solving data science problems at scale. In this course, you will explore the theory behind big data analysis using Hadoop and how MapReduce enables the parallel processing of large datasets distributed across a cluster of machines.



Expected Duration (hours)
1.1

Lesson Objectives

  • describe what big data is and list the various sources and characteristics of data available today
  • recognize the challenges involved in processing big data and the options available to address them such as vertical and horizontal scaling
  • specify the role of Hadoop in processing big data and describe the function of its components such as HDFS, MapReduce, and YARN
  • identify the purpose and describe the workings of Hadoop's MapReduce framework to process data in parallel on a cluster of machines
  • recall the steps involved in building a MapReduce application and the specific workings of the Map phase in processing each row of data in the input file
  • recognize the functions of the Shuffle and Reduce phases in sorting and interpreting the output of the Map phase to produce a meaningful output
  • summarize the techniques for scaling data processing tasks, working with clusters, and using MapReduce, and identify the Hadoop components and their functions

Course Number
it_dshpfddj_01_enus

Expertise Level
Beginner
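
The Map, Shuffle, and Reduce phases covered in the objectives can be sketched as a minimal, framework-free simulation in plain Python (this is an illustration of the processing model only, not Hadoop's actual Java API):

```python
from collections import defaultdict

def map_phase(rows):
    """Map: process each input row independently, emitting (key, value) pairs."""
    for row in rows:
        for word in row.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: sort and group all emitted values by key, as Hadoop does
    between the Map and Reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's grouped values into the final output."""
    return {key: sum(values) for key, values in groups.items()}

# Two input "rows", as if read from lines of a file in HDFS.
rows = ["Hadoop stores big data", "MapReduce processes big data"]
counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts["big"])  # "big" appeared in both rows
```

In a real cluster, the Map calls run in parallel on the machines holding each block of the input file, and the Shuffle moves intermediate pairs over the network so that all values for a given key reach the same Reducer; the single-process version above preserves only the logical flow.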