Hadoop HDFS Getting Started

Apache Hadoop    |    Beginner
  • 12 videos | 1h 14m 36s
  • Includes Assessment
  • Earns a Badge
Rating: 4.7 (45 users)
Explore the concepts behind analyzing large data sets in this 12-video Skillsoft Aspire course, which covers Hadoop and the Hadoop Distributed File System (HDFS), the combination that enables efficient parallel processing of big data across a distributed cluster. The course assumes a conceptual understanding of Hadoop and its components; purely theoretical, it contains no labs and provides just enough information to understand how Hadoop and HDFS allow big data to be processed in parallel. It opens by explaining vertical and horizontal scaling, then discusses the functions Hadoop serves to horizontally scale data processing tasks. Learners explore the roles of YARN, MapReduce, and HDFS, covering how HDFS keeps track of where all the pieces of large files are distributed, how data is replicated, and how HDFS is used with ZooKeeper, a tool maintained by the Apache Software Foundation that provides coordination and synchronization in distributed systems, along with other distributed-computing services such as a naming service and configuration management. Finally, learn about Spark, a data analytics engine for distributed data processing.
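To make the HDFS ideas above concrete, the following minimal Java sketch (illustrative only, not part of the course material) uses Hadoop's FileSystem client API to copy a file into HDFS and then ask the NameNode where the file's blocks and replicas live. The NameNode address, file paths, and the HdfsBlockReport class name are placeholders chosen for this example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockReport {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode (placeholder host and port).
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into HDFS; the NameNode records which DataNodes
        // hold each block, while the data itself flows to the DataNodes.
        Path src = new Path("/tmp/transactions.csv");   // local path (placeholder)
        Path dst = new Path("/data/transactions.csv");  // HDFS path (placeholder)
        fs.copyFromLocalFile(src, dst);

        // Ask the NameNode for the file's metadata: size, block size, replication factor.
        FileStatus status = fs.getFileStatus(dst);
        System.out.printf("size=%d bytes, blockSize=%d, replication=%d%n",
                status.getLen(), status.getBlockSize(), status.getReplication());

        // List the DataNodes hosting each block replica of the file.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("block at offset " + block.getOffset()
                    + " hosted on " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```

By default HDFS splits files into 128 MB blocks and keeps three replicas of each block; the loop above simply prints which DataNodes hold each replica. Running the sketch requires the Hadoop client libraries and a reachable cluster.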

WHAT YOU WILL LEARN

  • Recognize the need to process massive datasets at scale
  • Describe the benefits of horizontal scaling for processing big data and the challenges of this approach
  • Recall the features of a distributed cluster which address the challenges of horizontal scaling
  • Identify the features of HDFS which enable large datasets to be distributed across a cluster
  • Describe the simple and high-availability architectures of HDFS and the implementations for each of them
  • Identify the role of Hadoop's MapReduce in processing chunks of big datasets in parallel (a minimal word-count sketch follows this list)
  • Recognize the role of the YARN resource negotiator in enabling Map and Reduce operations to execute on a cluster
  • Describe the steps involved in resource allocation and job execution for operations on a Hadoop cluster
  • Recall how Apache ZooKeeper enables the HDFS NameNode and YARN ResourceManager to run in high-availability mode
  • Identify various technologies which integrate with Hadoop and simplify the task of big data processing
  • Recognize the key features of distributed clusters, HDFS, and the inputs and outputs of the Map and Reduce phases
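
As a concrete illustration of the Map and Reduce phases listed above, here is the canonical word-count job written against Hadoop's Java MapReduce API (an illustrative sketch, not course material): each mapper emits a (word, 1) pair for every token in its input split, the framework shuffles the pairs by key, and each reducer sums the counts for one word. Input and output HDFS paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each line of input, emit (word, 1) for every token.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: all counts for one word arrive together; sum them.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```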

IN THIS COURSE

  • 1.  (2m 17s)
  • 2.  (4m 29s)
    After completing this video, you will be able to recognize the need to process massive datasets quickly.
  • 3.  Horizontal Scaling for Big Data (7m 12s)
    After completing this video, you will be able to describe the benefits of horizontal scaling for processing big data and the challenges of this approach.
  • 4.  Distributed Clusters and Horizontal Scaling (8m 1s)
    After completing this video, you will be able to recall the features of a distributed cluster that address the challenges of horizontal scaling.
  • 5.  Overview of HDFS (4m 52s)
    In this video, find out how to identify the features of HDFS which enable large datasets to be distributed across a cluster.
  • 6.  HDFS Architectures (6m 51s)
    Upon completion of this video, you will be able to describe the simple and high-availability architectures of HDFS and the implementations for each of them.
  • 7.  MapReduce for HDFS (8m 24s)
    In this video, you will identify the role of Hadoop's MapReduce in processing chunks of big datasets in parallel.
  • 8.  YARN for HDFS (6m 49s)
    Upon completion of this video, you will be able to recognize the role of the YARN resource negotiator in enabling Map and Reduce operations to execute on a cluster.
  • 9.  The Mechanism of Resource Allocation in Hadoop (2m 43s)
    Upon completion of this video, you will be able to describe the steps involved in resource allocation and job execution for operations on a Hadoop cluster.
  • 10.  Apache Zookeeper for HDFS (8m 25s)
    Upon completion of this video, you will be able to recall how Apache ZooKeeper enables the HDFS NameNode and YARN ResourceManager to run in high-availability mode.
  • 11.  The Hadoop Ecosystem (8m 9s)
    In this video, you will identify various technologies that integrate with Hadoop and simplify the task of big data processing.
  • 12.  Exercise: An Introduction to HDFS (6m 26s)
    After completing this video, you will be able to recognize the key features of distributed clusters, HDFS, and the inputs and outputs of the Map and Reduce phases.
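
The high-availability material in videos 6, 9, and 10 rests on ZooKeeper's coordination primitives. The Java sketch below illustrates the underlying idea only; it is not how HDFS or YARN are actually configured (HDFS uses the ZKFailoverController and YARN's ResourceManager uses an embedded elector for this). A process claims leadership by creating an ephemeral znode; if that process dies, its session expires, the znode disappears, and a standby can take over. The ensemble addresses, znode paths, and the LeaderElectionSketch class name are placeholders.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble (placeholder hosts).
        ZooKeeper zk = new ZooKeeper(
                "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
                15000 /* session timeout, ms */,
                event -> { /* watcher: events ignored in this sketch */ });

        // Ensure the parent znode exists (persistent: it survives client sessions).
        if (zk.exists("/demo-ha", false) == null) {
            zk.create("/demo-ha", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        try {
            // An ephemeral znode exists only as long as this client's session does.
            // Whichever process creates it first becomes the active node.
            zk.create("/demo-ha/active", "node-1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("This process is now the active node.");
        } catch (KeeperException.NodeExistsException e) {
            // Another process already holds the role; stay in standby and, in a
            // real implementation, watch the znode to be notified when it vanishes.
            System.out.println("Standby: an active node already exists.");
        }
        // Keep the session alive while doing work; if this process crashes,
        // the session expires, the ephemeral znode is deleted, and a standby
        // can claim the active role.
    }
}
```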

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
