Hadoop HDFS Getting Started

Apache Hadoop 2.9
  • 12 Videos | 1h 19m 36s
  • Includes Assessment
  • Earns a Badge
Likes 25
Explore the concepts of analyzing large datasets in this 12-video Skillsoft Aspire course on Hadoop and its Hadoop Distributed File System (HDFS), which enables efficient parallel processing of big data in a distributed cluster. The course assumes a conceptual understanding of Hadoop and its components; it is purely theoretical and contains no labs, providing just enough information to understand how Hadoop and HDFS allow big data to be processed in parallel. The course opens by explaining vertical and horizontal scaling, then discusses the functions Hadoop serves to horizontally scale data processing tasks. Learners explore the roles of YARN, MapReduce, and HDFS, covering how HDFS keeps track of where the pieces of large files are distributed, how data is replicated, and how HDFS is used with Zookeeper: a tool maintained by the Apache Software Foundation that provides coordination and synchronization in distributed systems, along with other services related to distributed computing, such as a naming service and configuration management. Finally, learn about Spark, a data analytics engine for distributed data processing.
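Though the course itself is purely theoretical, the MapReduce model it describes can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not Hadoop's actual API: each chunk of a dataset is mapped independently (as it would be on the DataNode holding that chunk), and the emitted key-value pairs are then reduced into final counts.

```python
from collections import defaultdict
from itertools import chain

# Map phase: each chunk is processed independently, emitting (key, value)
# pairs -- here, (word, 1) for a word count.
def map_words(chunk):
    return [(word, 1) for word in chunk.split()]

# Reduce phase: all values emitted for the same key are combined.
def reduce_counts(pairs):
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Two chunks, as if stored on two different DataNodes in an HDFS cluster.
chunks = ["big data big cluster", "big cluster"]
mapped = chain.from_iterable(map_words(c) for c in chunks)
print(reduce_counts(mapped))  # {'big': 3, 'data': 1, 'cluster': 2}
```

In a real Hadoop job the map calls run in parallel on the nodes where the data already lives, and a shuffle step groups pairs by key before the reducers run; the sequential version above only shows the shape of the computation.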

WHAT YOU WILL LEARN

  • recognize the need to process massive datasets at scale
  • describe the benefits of horizontal scaling for processing big data and the challenges of this approach
  • recall the features of a distributed cluster that address the challenges of horizontal scaling
  • identify the features of HDFS that enable large datasets to be distributed across a cluster
  • describe the simple and high-availability architectures of HDFS and the implementations of each
  • identify the role of Hadoop's MapReduce in processing chunks of big datasets in parallel
  • recognize the role of the YARN resource negotiator in enabling Map and Reduce operations to execute on a cluster
  • describe the steps involved in resource allocation and job execution for operations on a Hadoop cluster
  • recall how Apache Zookeeper enables the HDFS NameNode and YARN ResourceManager to run in high-availability mode
  • identify various technologies that integrate with Hadoop and simplify the task of big data processing
  • recognize the key features of distributed clusters and HDFS, and the inputs and outputs of the Map and Reduce phases
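The HDFS behaviors in the objectives above, distributing a large file's blocks across a cluster and replicating each block, can be illustrated with a short Python sketch. The block size, node names, and round-robin placement here are illustrative assumptions, not HDFS's actual placement policy (real HDFS defaults to 128 MB blocks and a replication factor of 3, with rack-aware placement):

```python
BLOCK_SIZE = 4   # bytes; tiny for demonstration (HDFS default is 128 MB)
REPLICATION = 2  # HDFS default is 3

# Split a file's bytes into fixed-size blocks, as HDFS does on write.
def split_into_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# Toy round-robin placement: each block's replicas land on distinct
# DataNodes, so losing any single node never loses a block.
def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hdfs!")  # 11 bytes -> 3 blocks
nodes = ["dn1", "dn2", "dn3"]
print(place_replicas(len(blocks), nodes))
# {0: ['dn1', 'dn2'], 1: ['dn2', 'dn3'], 2: ['dn3', 'dn1']}
```

The mapping of blocks to DataNodes is the metadata the NameNode maintains; the course's HDFS videos cover how that bookkeeping and replication make the cluster tolerant of node failure.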

IN THIS COURSE

  1. Course Overview (2m 17s)
  2. Scaling Datasets (4m 29s)
  3. Horizontal Scaling for Big Data (7m 12s)
  4. Distributed Clusters and Horizontal Scaling (8m 1s)
  5. Overview of HDFS (4m 52s)
  6. HDFS Architectures (6m 51s)
  7. MapReduce for HDFS (8m 24s)
  8. YARN for HDFS (6m 49s)
  9. The Mechanism of Resource Allocation in Hadoop (2m 43s)
  10. Apache Zookeeper for HDFS (8m 25s)
  11. The Hadoop Ecosystem (8m 9s)
  12. Exercise: An Introduction to HDFS (6m 26s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
