Hadoop HDFS: Introduction



HDFS is the distributed file system used in Hadoop, which enables the parallel processing of big data across a distributed cluster. In this Skillsoft Aspire course, you will explore the concepts involved in analyzing large datasets and discover how Hadoop and HDFS make this process efficient.

Lesson Objectives

  • Course Overview
  • recognize the need to process massive datasets at scale
  • describe the benefits of horizontal scaling for processing big data and the challenges of this approach
  • recall the features of a distributed cluster which address the challenges of horizontal scaling
  • identify the features of HDFS which enable large datasets to be distributed across a cluster
  • describe the simple and high-availability architectures of HDFS and the implementations for each of them
  • identify the role of Hadoop's MapReduce in processing chunks of big datasets in parallel
  • recognize the role of the YARN resource negotiator in enabling Map and Reduce operations to execute on a cluster
  • describe the steps involved in resource allocation and job execution for operations on a Hadoop cluster
  • recall how Apache Zookeeper enables the HDFS NameNode and YARN ResourceManager to run in high-availability mode
  • identify various technologies which integrate with Hadoop and simplify the task of big data processing
  • recognize the key features of distributed clusters, HDFS, and the inputs and outputs of the Map and Reduce phases
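
The Map and Reduce phases mentioned in the objectives above can be illustrated with a toy word-count example. This is a minimal sketch in plain Python, not actual Hadoop code: in a real cluster, the map tasks would run in parallel on separate nodes against HDFS blocks, and YARN would schedule them, but the flow of data from chunks to key-value pairs to grouped results is the same.

```python
from collections import defaultdict

def map_phase(chunk):
    # Map: each chunk of input is processed independently,
    # emitting a (word, 1) pair for every word it contains.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Reduce: group the mapped pairs by key (word) and
    # sum the counts to produce the final result.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Two "chunks" standing in for HDFS blocks spread across a cluster.
chunks = ["big data big cluster", "data cluster data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(mapped)
print(counts)  # {'big': 2, 'data': 3, 'cluster': 2}
```

Because each chunk is mapped with no knowledge of the others, the map work parallelizes naturally; only the reduce step needs to see pairs grouped by key, which is what Hadoop's shuffle stage arranges between the two phases.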