Apache Hadoop: Apache Hadoop 2.0 Beginner

Apache Hadoop is an open source framework for the storage and processing of big data. Come explore the ins and outs of Hadoop.

GETTING STARTED

Fundamentals & Installation

  • 1. What is Hadoop? What Does it Mean for the Developer? (3m 14s)
  • 2. Who is Using Hadoop? (2m 54s)

GETTING STARTED

Hadoop HDFS Getting Started

  • 1. Course Overview (2m 17s)
  • 2. Scaling Datasets (4m 29s)

GETTING STARTED

Ecosystem for Hadoop

  • 1. Mapping Big Data (4m 52s)
  • 2. Continuing to Map Big Data (3m 22s)

GETTING STARTED

Designing Clusters

  • 1. Defining Supercomputing (5m 50s)
  • 2. Examining Engineering Teams (5m 43s)

GETTING STARTED

Managing Big Data Using HDInsight Hadoop

  • 1. Features of HDInsight (6m 37s)
  • 2. Fundamentals and Types of Clusters in HDInsight (5m 25s)

GETTING STARTED

Hadoop HDFS File Permissions

  • 1. Course Overview (2m 11s)
  • 2. The HDFS count and du Commands (7m 42s)

GETTING STARTED

Hadoop Distributed File System

  • 1. HDFS Architecture (6m 55s)
  • 2. HDFS Considerations (3m 19s)

COURSES INCLUDED

Fundamentals & Installation
Apache Hadoop is a set of algorithms for distributed storage and distributed processing of very large data sets. Get started with Hadoop by learning about big data, and how to install and use Hadoop.
12 videos | 45m has Assessment available Badge
Storage & MapReduce
MapReduce is a framework for writing applications to process huge amounts of data. Let's look at Hadoop storage, MapReduce, and how to use MapReduce with associated development tools.
11 videos | 51m has Assessment available Badge
Programming with MapReduce
You must have a good understanding of MapReduce to be able to program with it. Here we look at MapReduce in detail, and demonstrate the basics of programming in MapReduce.
16 videos | 1h 13m has Assessment available Badge
Using Hive & Pig with Hadoop
There are components other than MapReduce that let you write code to process large data sets stored in Hadoop. Let's see how to work with two such components - Hive and Pig.
7 videos | 35m has Assessment available Badge
Introduction
Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Explore Hadoop, its key tools, and applications.
10 videos | 46m has Assessment available Badge
Ecosystem & MapReduce
Hadoop is a framework providing for distributed storage and processing of large data sets. Explore the Hadoop ecosystem and Java MapReduce.
10 videos | 43m has Assessment available Badge
Introduction to Data Modeling
Discover various data genres and management tools, the reasons behind the evolving plethora of new big data platforms from the perspective of big data management systems, and analytical tools.
15 videos | 1h 8m has Assessment available Badge

COURSES INCLUDED

Hadoop HDFS Getting Started
Explore the concepts of analyzing large data sets in this 12-video Skillsoft Aspire course on Hadoop and the Hadoop Distributed File System (HDFS), which enables efficient parallel processing of big data in a distributed cluster. The course assumes a conceptual understanding of Hadoop and its components; it is purely theoretical and contains no labs, providing just enough information to understand how Hadoop and HDFS allow processing big data in parallel. The course opens by explaining the ideas of vertical and horizontal scaling, then discusses functions served by Hadoop to horizontally scale data processing tasks. Learners explore functions of YARN, MapReduce, and HDFS, covering how HDFS keeps track of where all pieces of large files are distributed, replication of data, and how HDFS is used with ZooKeeper, a tool maintained by the Apache Software Foundation that provides coordination and synchronization in distributed systems, along with other services related to distributed computing such as a naming service and configuration management. Learn about Spark, a data analytics engine for distributed data processing.
12 videos | 1h 19m has Assessment available Badge
Introduction to the Shell for Hadoop HDFS
In this Skillsoft Aspire course, learners discover how to set up a Hadoop cluster on the cloud and explore two bundled web apps: the YARN Cluster Manager app and the HDFS (Hadoop Distributed File System) NameNode UI. This 9-video course assumes a good understanding of what Hadoop is, and how HDFS enables processing of big data in parallel by distributing large data sets across a cluster; learners should also be familiar with running commands from the Linux shell, with some fluency in basic Linux file system commands. The course opens by exploring two web applications which are packaged with Hadoop, the UI for the YARN cluster manager and the NameNode UI for HDFS. Learners then explore two shells which can be used to work with HDFS, the Hadoop FS shell and Hadoop DFS shell. Next, you will explore basic commands which can be used to navigate HDFS; discuss their similarities with Linux file system commands; and discuss distributed computing. In a closing exercise, practice identifying web applications used to explore and monitor Hadoop.
9 videos | 56m has Assessment available Badge
Working with Files in Hadoop HDFS
In this Skillsoft Aspire course, learners will encounter basic Hadoop file system operations such as viewing the contents of directories and creating new ones. This 8-video course assumes a good understanding of what Hadoop is, and how HDFS enables processing of big data in parallel by distributing large data sets across a cluster; learners should also be familiar with running commands from the Linux shell, with some fluency in basic Linux file system commands. Begin by working with files in various ways, including transferring files between a local file system and HDFS (Hadoop Distributed File System), and explore ways to create and delete files on HDFS. Then examine different ways to modify files on HDFS. After exploring the distributed computing concept, prepare to begin working with HDFS in a production setting. In the closing exercise, write a command to create the directory /data/products/files on HDFS, where /data/products may not yet exist; then list two commands for two copy operations: one from the local file system to HDFS, and another for the reverse transfer, from HDFS to the local host.
8 videos | 50m has Assessment available Badge
Hadoop & MapReduce Getting Started
In this course, learners will explore the theory behind big data analysis using Hadoop, and how MapReduce enables parallel processing of large data sets distributed on a cluster of machines. Begin with an introduction to big data and the various sources and characteristics of data available today. Look at challenges involved in processing big data and options available to address them. Next, get a brief overview of Hadoop, its role in processing big data, and the functions of its components such as the Hadoop Distributed File System (HDFS), MapReduce, and YARN (Yet Another Resource Negotiator). Explore how Hadoop's MapReduce framework processes data in parallel on a cluster of machines. Recall steps involved in building a MapReduce application and specifics of the Map phase in processing each row of the input file's data. Recognize the functions of the Shuffle and Reduce phases in sorting and interpreting the output of the Map phase to produce a meaningful output. To conclude, complete an exercise on the fundamentals of Hadoop and MapReduce.
8 videos | 1h 6m has Assessment available Badge
Developing a Basic MapReduce Hadoop Application
In this Skillsoft Aspire course, discover how to use Hadoop's MapReduce; provision a Hadoop cluster on the cloud; and build an application with MapReduce to calculate word frequencies in a text document. To start, create a Hadoop cluster on the Google Cloud Platform using its Cloud Dataproc service; then work with the YARN Cluster Manager and HDFS (Hadoop Distributed File System) NameNode web applications that come packaged with Hadoop. Use Maven to create a new Java project for the MapReduce application, and develop a Mapper for the word frequency application. Create a Reducer for the application that will collect Mapper output and calculate word frequencies in input text files, and identify configurations of MapReduce applications in the Driver program and the project's pom.xml file. Next, build the MapReduce word frequency application with Maven to produce a jar file and prepare for execution from the master node of the Hadoop cluster. Finally, run the application and examine outputs generated to get word frequencies in the input text document. The exercise involves developing a basic MapReduce application.
10 videos | 1h 17m has Assessment available Badge
Filtering Data Using Hadoop MapReduce
Extracting meaningful information from a very large dataset can be painstaking. In this Skillsoft Aspire course, learners examine how Hadoop's MapReduce can be used to speed up this operation. In a new project, code the Mapper for an application to count the number of passengers in each Titanic class in the input data set. Then develop a Reducer and Driver to generate final passenger counts in each Titanic class. Build the project by using Maven and run it on the Hadoop master node to check that the output correctly shows passenger class numbers. Apply MapReduce to filter only surviving Titanic passengers from the input data set. Execute the application and verify that filtering has worked correctly; examine job and output files with the YARN cluster manager and HDFS (Hadoop Distributed File System) NameNode web user interfaces. Using a restaurant app's data set, use MapReduce to obtain the distinct set of cuisines offered. Build and run the application and confirm output with HDFS from both the command line and the web application. The exercise involves filtering data by using MapReduce.
9 videos | 1h 1m has Assessment available Badge
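
The Map, Shuffle, and Reduce phases named in the course descriptions above can be illustrated without a cluster. What follows is only a minimal single-process sketch in Python, not the Java code used in the courses; the function names (map_phase, shuffle_phase, reduce_phase, word_frequencies) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key and sort the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reducer: collapse the values for one key into a single count."""
    return key, sum(values)


def word_frequencies(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_frequencies(["big data", "Big clusters"]))
```

In a real Hadoop job the mappers and reducers run on different machines and the framework performs the shuffle; here everything runs in one process purely to show how data flows between the phases.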

COURSES INCLUDED

Ecosystem for Hadoop
Hadoop is a framework providing for distributed storage and processing of large data sets. Introduce yourself to a big data model and the Hadoop ecosystem.
8 videos | 36m has Assessment available Badge
Hadoop Design Principles
Hadoop’s HDFS is a highly fault-tolerant distributed file system suitable for applications that have large data sets. Explore the principles of supercomputing and Hadoop's open source software components.
11 videos | 48m has Assessment available Badge
Selecting & Creating an Environment
Learn how to prepare your environment for a Hadoop installation. Here we review the minimum system requirements, create a development environment, install Java, and set up SSH for Hadoop.
4 videos | 27m available Badge
Installation & Configuration
Once your environment is set up, you are ready to install Hadoop. Follow the step-by-step instructions for installing Hadoop in a pseudo-mode, and learn more about the Hadoop architecture.
9 videos | 1h 5m has Assessment available Badge
Configuration & Troubleshooting
After installation, there are tasks you need to perform before using Hadoop. Learn how to first use HDFS, WordCount, & Web UIs, perform configuration changes, and troubleshoot installation errors.
8 videos | 39m has Assessment available Badge
Data Repository with HDFS & HBase
It is vital you understand the Hadoop Distributed File System (HDFS). Explore the server architecture, and learn about the command line interface and common HDFS administration issues facing all end users.
13 videos | 1h 6m has Assessment available Badge
HBase & ZooKeeper
Hadoop is all about big data. Explore the theory of HBase as another data repository built alongside or on top of HDFS. Also, learn how to install and configure HBase and ZooKeeper, and use the HBase command line.
7 videos | 54m has Assessment available Badge
Data Repository with Flume
Flume is a tool for dealing with extraction and loading of unstructured data. Learn about the theory of Flume, its functional parts, and how to install Flume for use.
12 videos | 52m has Assessment available Badge
Timestamps, Sources, & Troubleshooting
Flume is a tool for dealing with extraction and loading of unstructured data. Learn how to work with Flume sinks, sources, & agents, and how to troubleshoot Flume agents & failures.
12 videos | 51m has Assessment available Badge
Data Repository with Sqoop
Sqoop is a tool for transferring structured data between Hadoop and an RDBMS. Explore the architecture and installation of Sqoop, how to perform imports and exports, Hive SQL statements, and more.
16 videos | 1h 14m available Badge
Data Refinery with YARN
YARN is a parallel processing framework that provides the resources for data computations. Explore the theory of parallel processing and the architecture of the YARN framework.
7 videos | 26m has Assessment available Badge
Data Refinery with MapReduce
MapReduce is a set of classes that abstract away the complexity of parallel processing. Learn how MapReduce can take a single compute job and run it on our supercomputing platform.
13 videos | 58m available Badge
Data Factory with Hive
Hive is a SQL-like tool for interfacing with Hadoop. Learn how to install and configure Hive, and how to use basic SQL commands in Hive to access and manipulate data.
11 videos | 1h 3m available Badge
Hive Joining, Partitioning, & Troubleshooting
Hive is a SQL-like tool for interfacing with Hadoop. Learn how to use Hive joins and views, partition Hive data, create Hive buckets, and troubleshoot errors.
10 videos | 42m available Badge
Data Factory with Pig
Pig is a data flow language for interfacing with Hadoop to extract, transform, and load data. Learn how to install & configure Pig, and use the command line to write and execute Pig scripts.
12 videos | 49m available Badge
Pig Functions & Troubleshooting
Pig is a data flow language for interfacing with Hadoop to extract, transform, and load data. Learn how to work with Pig joins, groups, & user-defined functions, and troubleshoot & debug with Pig.
8 videos | 47m available Badge
HiveServer2 & HCatalog
Oozie is a workflow tool for coordinating other components in Hadoop. To use Oozie, a number of other components must be installed first. Learn the purpose of and how to install and configure the Hive metastore, HiveServer2, and HCatalog.
6 videos | 53m available Badge
Data Factory with Oozie
Oozie is a workflow tool for coordinating other components of the Hadoop ecosystem. Learn how to install, configure, & use Oozie to create and run workflows.
10 videos | 58m available Badge
Data Factory with Hue
Hue is an easy-to-use web UI for working with HDFS, MapReduce, Hive, Pig, & Oozie. Learn how to install, configure, & use Hue to work with Hadoop components.
6 videos | 34m has Assessment available Badge
Data Flow for the Hadoop Ecosystem
Data must move into and through Hadoop for it to function. Here we look at Hadoop and data life cycle management, and use Sqoop and Hive to flow data.
12 videos | 1h 3m available Badge
Ecosystem Components
Explore components for the Hadoop ecosystem and best practices for pseudo-mode implementation. Also, examine the administration tasks of troubleshooting classpath errors and creating complex configuration files.
5 videos | 39m has Assessment available Badge

COURSES INCLUDED

Designing Clusters
Hadoop is a framework providing fast and reliable analysis of large data sets. Introduce yourself to supercomputing, and explore the design principles of using Hadoop as a supercomputing platform.
6 videos | 35m has Assessment available Badge
Hadoop Cluster Architecture
Learn how to design a Hadoop cluster by taking an in-depth look at the hardware, network concepts, and the architecture that make up the cluster.
11 videos | 57m has Assessment available Badge
Cluster Deployment Planning
Let's take a look at Hadoop cluster deployment, including planning a deployment and how to avoid or overcome deployment problems.
5 videos | 40m has Assessment available Badge
Hadoop in the Cloud
Amazon Web Services (AWS) is a secure cloud-computing platform offered by Amazon.com. Explore the key services offered by AWS and learn how to set up a Hadoop cluster.
16 videos | 1h 38m has Assessment available Badge
Data Migration & EMR
Discover how to use the AWS command line interface, examine AWS Elastic MapReduce (EMR), learn how to set up an EMR cluster, and explore the various ways to run EMR jobs.
10 videos | 1h 14m available Badge
Cluster Deployment Tools & Images
To deploy a Hadoop Cluster, you must ensure networks, disks, and hosts are configured correctly. Examine the configuration management tools, learn how to create configuration items, and set up a CM environment.
6 videos | 49m available Badge
Cluster Architecture Configuration
To deploy a Hadoop Cluster, you must ensure networks, disks, and hosts are configured correctly. Explore the Hadoop cluster architecture, learn how to start, stop, & configure Hadoop clusters, and configure logging & MySQL databases.
8 videos | 1h 1m has Assessment available Badge
Cluster Deployment
To deploy a Hadoop Cluster, you must ensure networks, disks, and hosts are configured correctly. Learn how to set up some of the common open-source software used to create and deploy a Hadoop ecosystem.
8 videos | 1h 7m available Badge
Cluster Availability
Nothing is more important than having your Hadoop cluster available for use. Discover how Hadoop leverages fault tolerance, and explore a number of the reliability features that have been designed into Hadoop.
10 videos | 1h 11m has Assessment available Badge
Availability Configuration
To be useful, your Hadoop cluster must be available. Here we discuss and demonstrate high availability for HDFS NameNode and how to recover from failures.
6 videos | 53m available Badge
YARN Availability
Nothing is more important than having your Hadoop cluster available for use. Examine YARN container and job reliability, and discover how to set up high-availability for YARN's ResourceManager.
8 videos | 42m has Assessment available Badge
Securing Clusters
Hadoop lets big data technologies reach companies, but as this grows so do the security concerns. Examine the risks and learn how to implement security groups and work with Kerberos.
8 videos | 1h 8m has Assessment available Badge
Securing with Kerberos
Hadoop lets big data technologies reach companies, but as this grows so do the security concerns. Examine the risks and learn how to implement security for HDFS, YARN, Hive, and other components.
10 videos | 1h 16m has Assessment available Badge
Managing Security
Hadoop lets big data technologies reach companies, but as this grows so do the security concerns. Examine the risks and learn how to manage user security, access control lists, and other features.
9 videos | 1h 6m has Assessment available Badge
Operating Hadoop Clusters
Hadoop is a framework for running applications on large clusters of commodity hardware. Discover service levels, Hadoop releases, change management, and rack awareness.
5 videos | 38m has Assessment available Badge
Cluster Administration
Hadoop is a framework for running applications on large clusters of commodity hardware. Discover HDFS administration, quotas, DataNodes, HDFS scaling, and more.
10 videos | 1h 9m available Badge
Balancing, Backup, & Upgrades
Hadoop is a framework for running applications on large clusters of commodity hardware. Discover how to balance a Hadoop cluster, manage jobs, and perform backup and recovery for HDFS.
8 videos | 1h 3m available Badge
Stabilizing Clusters
Tuning Hadoop clusters is vital to improve cluster performance. Explore the importance of incident management and working with Nagios.
8 videos | 1h 26m available Badge
Ganglia & Metrics2
Tuning Hadoop clusters is vital to improve cluster performance. Discover Ganglia functionality and Metrics2.
7 videos | 1h 3m has Assessment available Badge
Monitoring & Troubleshooting
Tuning Hadoop clusters is vital to improve cluster performance. Explore log management, problem management, and best practices for root cause analysis.
10 videos | 1h 12m has Assessment available Badge
Capacity Management Strategies
Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. Explore capacity management of Hadoop clusters, including strategies and schedulers.
4 videos | 28m has Assessment available Badge
Capacity Management
Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. Explore resource management through scheduling, the Fair Scheduler tool, and how to plan for scaling.
16 videos | 1h 46m has Assessment available Badge
Performance Tuning Best Practices
Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage. Discover performance tuning concepts, including compression, tune up options, and memory optimization.
11 videos | 1h 20m has Assessment available Badge
Cluster Performance Tuning
Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage. Examine tune up options, best practices for performance tuning, HDFS, YARN and MapReduce.
13 videos | 1h 22m has Assessment available Badge
Cloudera Manager & Hadoop Clusters
Cloudera Manager is a simple, automated, customizable management tool for Hadoop clusters. Explore web consoles for Cloudera Manager, cluster management tools, and cluster deployment.
6 videos | 58m has Assessment available Badge
Cloudera Manager Administration
Cloudera Manager is a simple, automated, customizable management tool for Hadoop clusters. Discover Cloudera Manager administration, including cluster management, services, and resource management.
7 videos | 1h 6m available Badge
Cloudera Manager Tools & Configuration
Cloudera Manager is a simple, automated, customizable management tool for Hadoop clusters. Discover Cloudera Manager tools and configuration, including performance tweaking, Impala, Sentry, Hive, Hue with MySQL, and Oozie workflows.
12 videos | 1h 56m available Badge

COURSES INCLUDED

Managing Big Data Using HDInsight Hadoop
Explore the fundamentals of Azure HDInsight and the essential architectural components.
12 videos | 1h 10m has Assessment available Badge
Microsoft Analytics Platform System & Hive
Explore the Microsoft Analytics Platform System and using Hive to manage data from a data warehouse perspective.
17 videos | 1h 35m has Assessment available Badge
HDInsight & Retail Sales Implementation Using Hive
This course covers the implementation of data warehousing in retail sales. Learn to design and implement data warehousing solutions using Hive and PowerBI on HDInsight.
11 videos | 50m has Assessment available Badge
Working with Spark Using HDInsight & Cluster Management
Discover how to work with Spark and its in-memory capabilities of data management. How to manage and troubleshoot HDInsight clusters using Ambari and the Azure CLI tool is also covered.
12 videos | 1h has Assessment available Badge

COURSES INCLUDED

Hadoop HDFS File Permissions
Explore reasons why not all users should have free rein over all data sets when managing a data warehouse. In this 9-video Skillsoft Aspire course, learners explore how file permissions can be viewed and configured in HDFS (Hadoop Distributed File System) and how the NameNode UI is used to monitor and explore HDFS. For this course, you need a good understanding of Hadoop and HDFS, along with familiarity with the HDFS shells, and confidence in working with and manipulating files on HDFS, and exploring it from the command line. The course focuses on different ways to view permissions, which are linked to files and directories, and how these can be modified. Learners explore automating many tasks involving HDFS by simply scripting them, and using the HDFS NameNode UI to monitor the distributed file system and explore its contents. Review distributed computing and big data. The closing exercise involves writing a command to be used on the HDFS dfs shell to count the number of files within a directory on HDFS, and to perform related tasks.
9 videos | 52m has Assessment available Badge
Hadoop MapReduce Applications With Combiners
In this Skillsoft Aspire course, explore the use of Combiners to make MapReduce applications more efficient by minimizing data transfers. Start by learning about the need for Combiners to optimize the execution of a MapReduce application by minimizing data transfers within a cluster. Recall the steps to process data in a MapReduce application, and look at using a Combiner to perform partial reduction of data output from the Mapper. Then create a new project to calculate average automobile prices using Maven for a MapReduce application. Next, develop the Mapper and Reducer to calculate the average price for automobile makes in the input data set. Create a driver program for the MapReduce application, run it, and check output to get the average price per automobile. Learn how to code up a Combiner for a MapReduce application, fix the bug in the application so it can be used to correctly calculate the average price, then run the fixed application to verify that the prices are being calculated correctly. The concluding exercise concerns optimizing MapReduce with Combiners.
13 videos | 1h 28m has Assessment available Badge
Advanced Operations Using Hadoop MapReduce
In this Skillsoft Aspire course, explore how MapReduce can be used to extract the five most expensive vehicles in a data set, then build an inverted index for the words appearing in a set of text files. Begin by defining a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue, then configure a Mapper to use a PriorityQueue to store the five most expensive automobiles it has processed from the dataset. Learn how to use a PriorityQueue in the Reducer of the application to receive the five most expensive automobiles from each mapper and write the top five automobiles overall to the output, then execute the application to verify the results. Next, explore how you can utilize the MapReduce framework in order to generate an inverted index and configure the Reducer and Driver for the inverted index application. This leads on to running the application and examining the inverted index on HDFS (Hadoop Distributed File System). The concluding exercise involves advanced operations using MapReduce.
9 videos | 51m has Assessment available Badge
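
The Combiners course above mentions fixing a bug in an average-price application without saying what the bug is. One common pitfall, assumed here purely for illustration, is letting the Combiner emit per-split averages, so that the Reducer ends up averaging averages. The Python sketch below simulates both variants in a single process; all function names and the sample data are hypothetical, and the courses themselves use Java.

```python
def group_by_make(split):
    """Group the prices in one input split by automobile make."""
    grouped = {}
    for make, price in split:
        grouped.setdefault(make, []).append(price)
    return grouped


def naive_average_price(splits):
    """Assumed buggy variant: the combiner emits a per-split average,
    and the reducer averages those averages."""
    partials = {}
    for split in splits:
        for make, prices in group_by_make(split).items():
            partials.setdefault(make, []).append(sum(prices) / len(prices))
    return {make: sum(avgs) / len(avgs) for make, avgs in partials.items()}


def average_price(splits):
    """Fixed variant: the combiner emits (sum, count) pairs, which can be
    merged safely, and the reducer divides only once at the end."""
    partials = {}
    for split in splits:
        for make, prices in group_by_make(split).items():
            partials.setdefault(make, []).append((sum(prices), len(prices)))
    result = {}
    for make, pairs in partials.items():
        total = sum(s for s, _ in pairs)
        count = sum(c for _, c in pairs)
        result[make] = total / count
    return result


if __name__ == "__main__":
    # Two splits of made-up data, as if handled by two mappers.
    splits = [[("audi", 10.0), ("audi", 20.0)], [("audi", 60.0)]]
    print(naive_average_price(splits))
    print(average_price(splits))
```

The two variants disagree whenever splits hold different numbers of records for the same make, which is why a Combiner is only safe when its output can be merged without changing the final result.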

COURSES INCLUDED

Hadoop Distributed File System
Discover the HDFS architecture and its main building blocks. In addition, explore data replication, communication protocols, and accessibility.
11 videos | 37m has Assessment available Badge
Clusters
Clusters are used to store and analyze large volumes of data in a distributed computer environment. Explore the best practices to follow when implementing clusters in Hadoop.
8 videos | 51m has Assessment available Badge
Hadoop on Amazon EMR
Hadoop can be used with Amazon EMR to process vast amounts of data. Explore how to use Hadoop with Amazon EMR.
10 videos | 52m has Assessment available Badge
Hadoop Ranger
Apache Ranger is used to provide data security across a Hadoop implementation. Explore the installation of Ranger and Ranger authentication considerations, as well as customizing services to run Ranger alongside Hadoop.
9 videos | 55m has Assessment available Badge
Maintenance & Distributions
Distributions provide performance and functionality enhancements over the base open source code Apache provides. Explore the various distributions available and common maintenance tasks in a Hadoop environment.
10 videos | 41m has Assessment available Badge

EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

BOOKS INCLUDED

Book

Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
Book Duration 3h 47m Book Authors By Benoy Antony, et al.

Book

Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
Book Duration 4h 56m Book Authors By Deepak Vohra

Book

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (like architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage - and then explains, in an easily understood manner and through numerous examples, how to use each tool.
Book Duration 5h 27m Book Authors By Michael Frampton

Book

Pro Hadoop
Written from the perspective of a principal engineer with down–in–the–trenches knowledge of what to do wrong with Hadoop, this book shows how to avoid the common, expensive first errors that everyone makes with creating their own Hadoop system.
Book Duration 7h Book Authors By Jason Venner

Book

Hadoop Architecture and SQL: The Best HiveQL Book in the Universe
Including hundreds of pages of SQL examples and explanations, this book is perfect for anyone who wants to query Hadoop with SQL and educates readers on how to create tables, how the data is distributed, and how the system processes the data.
Book Duration 1h 32m Book Authors By Jason Nolander, Tom Coffing

Book

Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
Book Duration 6h 39m Book Authors By Dirk deRoos, et al.

Book

Processing Big Data with Azure HDInsight: Building Real-World Big Data Systems on Azure HDInsight Using the Hadoop Ecosystem
As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer.
Book Duration 3h 4m Book Authors By Vinit Yadav

BOOKS INCLUDED

Book

Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
Book Duration 4h 17m Book Authors By Mayank Bhushan

Book

Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
Book Duration 4h 56m Book Authors By Deepak Vohra

Book

Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
Book Duration 3h 47m Book Authors By Benoy Antony, et al.

Book

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (such as architect and tester), shows how the Hadoop toolset can be used at each system stage, and then explains, in an easily understood manner and through numerous examples, how to use each tool.
Book Duration 5h 27m Book Authors By Michael Frampton

BOOKS INCLUDED

Book

Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
Book Duration 3h 47m Book Authors By Benoy Antony, et al.

Book

Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
Book Duration 4h 17m Book Authors By Mayank Bhushan

Book

Pro Apache Hadoop, Second Edition
Taking you quickly to the seasoned pro level on the hottest cloud-computing framework, this book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data.
Book Duration 7h 26m Book Authors By Jason Venner, Madhu Siddalingaiah, Sameer Wadkar

Book

Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
Book Duration 6h 39m Book Authors By Dirk deRoos, et al.

Book

Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem
Emphasizing best practices to ensure coherent, efficient development, this book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.
Book Duration 3h 4m Book Authors By Kerry Koitzsch

Book

Professional Hadoop Solutions
With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.
Book Duration 8h 2m Book Authors By Alexey Yakubovich, Boris Lublinsky, Kevin T. Smith

Book

Big Data Processing Beyond Hadoop and MapReduce
Authored by EMC Proven Professionals, Knowledge Sharing articles present ideas, expertise, unique deployments, and best practices. This article provides an overview of various new and upcoming alternatives to Hadoop MR.
Book Duration 23m Book Authors By Ravi Sharda

Book

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (such as architect and tester), shows how the Hadoop toolset can be used at each system stage, and then explains, in an easily understood manner and through numerous examples, how to use each tool.
Book Duration 5h 27m Book Authors By Michael Frampton

BOOKS INCLUDED

Book

Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
Book Duration 3h 47m Book Authors By Benoy Antony, et al.

Book

Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
Book Duration 4h 17m Book Authors By Mayank Bhushan

Book

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (such as architect and tester), shows how the Hadoop toolset can be used at each system stage, and then explains, in an easily understood manner and through numerous examples, how to use each tool.
Book Duration 5h 27m Book Authors By Michael Frampton

Book

Practical Hadoop Security
For administrators planning a production Hadoop deployment who want to secure their Hadoop clusters, this resource takes you through a comprehensive study of how to implement defined security within a Hadoop cluster in a hands-on way.
Book Duration 3h 40m Book Authors By Bhushan Lakhe

Book

Professional Hadoop Solutions
With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.
Book Duration 8h 2m Book Authors By Alexey Yakubovich, Boris Lublinsky, Kevin T. Smith

Book

Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem
Emphasizing best practices to ensure coherent, efficient development, this book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.
Book Duration 3h 4m Book Authors By Kerry Koitzsch

Book

Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
Book Duration 6h 39m Book Authors By Dirk deRoos, et al.

Book

Pro Apache Hadoop, Second Edition
Taking you quickly to the seasoned pro level on the hottest cloud-computing framework, this book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data.
Book Duration 7h 26m Book Authors By Jason Venner, Madhu Siddalingaiah, Sameer Wadkar

BOOKS INCLUDED

Book

Processing Big Data with Azure HDInsight: Building Real-World Big Data Systems on Azure HDInsight Using the Hadoop Ecosystem
As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer.
Book Duration 3h 4m Book Authors By Vinit Yadav

BOOKS INCLUDED

Book

Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
Book Duration 4h 17m Book Authors By Mayank Bhushan

Book

Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
Book Duration 4h 56m Book Authors By Deepak Vohra

Book

Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
Book Duration 3h 47m Book Authors By Benoy Antony, et al.

Book

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (such as architect and tester), shows how the Hadoop toolset can be used at each system stage, and then explains, in an easily understood manner and through numerous examples, how to use each tool.
Book Duration 5h 27m Book Authors By Michael Frampton

BOOKS INCLUDED

Book

Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
Book Duration 4h 56m Book Authors By Deepak Vohra

Book

Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem
Emphasizing best practices to ensure coherent, efficient development, this book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.
Book Duration 3h 4m Book Authors By Kerry Koitzsch

Book

Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
Book Duration 3h 47m Book Authors By Benoy Antony, et al.

Book

Practical Hive: A Guide to Hadoop's Data Warehouse System
From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, this go-to resource gives you a detailed treatment of the software.
Book Duration 3h 57m Book Authors By Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard, Scott Shaw

Book

Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (such as architect and tester), shows how the Hadoop toolset can be used at each system stage, and then explains, in an easily understood manner and through numerous examples, how to use each tool.
Book Duration 5h 27m Book Authors By Michael Frampton

Book

Hadoop Architecture and SQL: The Best HiveQL Book in the Universe
Including hundreds of pages of SQL examples and explanations, this book is perfect for anyone who wants to query Hadoop with SQL and educates readers on how to create tables, how the data is distributed, and how the system processes the data.
Book Duration 1h 32m Book Authors By Jason Nolander, Tom Coffing

Book

Pro Apache Hadoop, Second Edition
Taking you quickly to the seasoned pro level on the hottest cloud-computing framework, this book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data.
Book Duration 7h 26m Book Authors By Jason Venner, Madhu Siddalingaiah, Sameer Wadkar

Book

Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
Book Duration 6h 39m Book Authors By Dirk deRoos, et al.