Data Warehousing: Apache Hive 2.3.2 intermediate

Rating 3.0 of 1 users
 
Explore enterprise data warehousing (EDW), a well-established approach that can enhance business productivity by providing systems that facilitate reporting and data analysis.

COURSES INCLUDED

Scalable Data Architectures: Getting Started
Explore theoretical foundations of the need for and characteristics of scalable data architectures in this 8-video course. Learn to use data warehouses to store, process, and analyze big data. Key concepts covered here include how to recognize the need to scale architectures to keep up with needs for storage and processing of big data; how to identify characteristics of data warehouses ideally suiting them to tasks of big data analysis and processing; and how to distinguish between relational databases and data warehouses. Next, learn to recognize specific characteristics of systems meant for online transaction processing and online analytical processing, and how data warehouses are an example of online analytical processing (OLAP) systems. Then, learn to identify various components of data warehouses enabling them to work with varied sources, extract and transform big data, and generate reports of analysis operations efficiently. Finally, study features of Amazon Redshift enabling big data to be processed at scale; features of data warehouses, contrasted with those of relational databases; and two options available to scale compute capacity.
8 videos | 52m | Assessment available | Badge
Scalable Data Architectures: Using Amazon Redshift
Using a hands-on lab approach, explore how to use Amazon Redshift to set up and configure a data warehouse on the cloud in this 9-video course. Discover how to interact with Redshift service with both the console and Amazon Web Services (AWS) Command Line Interface (CLI). Key concepts covered here include how to use the Amazon Redshift Quick Launch feature to provision a data warehouse; provisioning a Redshift cluster with the default cluster; and tool configuration options for a Redshift cluster, and metrics available to optimize a cluster configuration. Next, learn how to create Identity and Access Management (IAM) roles on AWS that include necessary permissions to interact with Redshift and S3 services; to provision an IAM user that can connect to and interact with AWS using the CLI; and to install the AWS command-line interface to create and delete Redshift clusters. Then learn to use Redshift Query Editor to create tables, load data, and run queries; and learn features of Amazon Redshift and commands and configurations needed to work with Redshift by using the CLI.
9 videos | 54m | Assessment available | Badge
Scalable Data Architectures: Using Amazon Redshift & QuickSight
In this 12-video course, explore the loading of data from an external source such as Amazon S3 into a Redshift cluster, as well as configuration of snapshots and resizing of clusters. Discover how to use Amazon QuickSight to visualize data. Key concepts covered in this course include using the AWS console to load data sets to Amazon S3 and then into a table provisioned on a Redshift cluster; running queries on data in a Redshift cluster with the query evaluation feature; and working with SQL Workbench to connect to and query data in a Redshift cluster. Learn how to disable automated snapshots for a Redshift cluster and configure a table to be excluded from snapshots; recover an individual table from the snapshot of an entire cluster; and create a security group rule enabling access from Amazon's QuickSight servers to a Redshift cluster. Next, configure Amazon QuickSight to load data from a table in a Redshift cluster for analysis; and use the QuickSight dashboard to generate a time series plot to visualize sales at a retailer over time.
12 videos | 1h 17m | Assessment available | Badge
Traditional Data Architectures: Relational Databases
Databases are essential in working with large amounts of data. Managers, leaders, and decision-makers need to choose the right approach when working on a large data project, distinguishing among multiple database types and their use cases. A relational database is a primary traditional data architecture commonly used by most businesses. Working with relational databases has some key advantages but also poses certain limitations. In this course, learn how to critically evaluate and work with relational databases. Explore normalization and denormalization of datasets along with specific use cases of these opposite approaches. Examine two main online information processing systems, Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) systems. Finally, investigate the concepts of data warehousing, data marts, and data mining. Upon completion, you'll be able to identify when and how to use a relational database.
12 videos | 34m | Assessment available | Badge
Traditional Data Architectures: Data Warehousing and ETL Systems
Data warehouses are actively used for business intelligence and, because they integrate data from multiple sources, have advantages over simple databases in many instances. Because modern companies often have ETL-based data warehousing systems, decision-makers need to understand how these systems operate and how they are managed. In this course, learn the necessary concepts and processes required to work with and manage projects related to data warehousing. Study data warehousing architectures and schemas and investigate some core data warehouse elements, such as dimension tables, fact tables, and keys. Furthermore, examine the extract, transform, and load (ETL) approach for working with data warehouses, specifying process flow, tools, and software as well as best practices. When you're done, you'll know how to adopt data warehousing and ETL systems for your business intelligence and data management needs.
12 videos | 38m | Assessment available | Badge

COURSES INCLUDED

Data Warehouse Essential: Concepts
Discover the fundamentals of data warehousing and the approaches to implementing it. Explore data warehouse planning, processes, schemas, and terms. You will also examine global and local data warehouses and compare data warehouses with RDBMSs and data lakes.
18 videos | 1h 43m | Assessment available | Badge
Data Warehouse Essential: Architecture Frameworks & Implementation
Examine architectures of data warehouse implementations, including logical and physical design. How to effectively implement and manage data warehousing projects is also covered.
11 videos | 1h 1m | Assessment available | Badge

COURSES INCLUDED

Getting Started with Hive
This 9-video Skillsoft Aspire course focuses solely on theory and involves no programming or query execution. Learners begin by examining what a data warehouse is and how it differs from a relational database, a distinction that matters because Apache Hive is primarily a data warehouse despite offering a SQL-like interface to query data. Hive facilitates work on very large data sets stored as files in the Hadoop Distributed File System, and lets users perform operations in parallel on data in these files by transforming Hive queries into MapReduce operations. Next, you will hear about the types of data and operations that data warehouses and relational databases handle, before moving on to the basic components of the Hadoop architecture. Finally, the course discusses the features of Hive that make it popular among data analysts. The concluding exercise recalls differences between online transaction processing and online analytical processing systems, and asks learners to identify Hadoop's three major components; list Hadoop offerings on three major cloud platforms (AWS, Microsoft Azure, and Google Cloud Platform); and list benefits of Hive for data analysts.
10 videos | 55m | Assessment available | Badge
Loading & Querying Data with Hive
Among the market's most popular data warehouses used for data science, Apache Hive simplifies working with large data sets in files by representing them as tables. In this 12-video Skillsoft Aspire course, learners explore how to create, load, and query Hive tables. For this hands-on course, learners should have a conceptual understanding of Hive and its basic components, and prior experience with querying data from tables using SQL (structured query language) and with using the command line. Key concepts covered include cluster, joining tables, and modifying tables. Demonstrations include using the Beeline client for Hive for simple operations, creating tables, loading them with data, and then running queries against them. Only tables with primitive data types are used here, with data loaded into these tables from HDFS (Hadoop Distributed File System) and from local machines. Learners will work with the Hive metastore and temporary tables, and see how they can be used. You will become familiar with the basics of using the Hive query language and quite comfortable working with HDFS.
13 videos | 1h 19m | Assessment available | Badge
Viewing & Querying Complex Data with Hive
Learners explore working with complex data types in Apache Hive in this Skillsoft Aspire course, which assumes previous work with Hive tables using the Hive query language, and comfort using a command-line interface or Hive client to run queries. Learners begin this 12-video, hands-on course by working with Hive tables whose columns are of complex data types (arrays, maps, and structs). Watch demonstrations of set operations and of transforming complex types into tabular form with the explode operation. Then use lateral views to add more data to exploded outputs. Course labs use the Beeline client; the instructor's Beeline terminal runs on the master node of a Hadoop cluster provisioned on Google Cloud Platform using its Dataproc service, and learners are assumed to have access to a Hadoop cluster and Beeline, on-premises or in the cloud. Finally, learners observe how to use views to aggregate contents of multiple columns. As the course concludes, you should be comfortable working with all types of data in Hive and performing analysis tasks on tables with both primitive types and complex data.
12 videos | 1h 12m | Assessment available | Badge
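
The explode operation and lateral views mentioned in the course description above can be illustrated with a small sketch. This is plain Python rather than HiveQL, and the table, column names, and the helper `explode_lateral` are hypothetical, chosen only for illustration. It mimics the behavior of a lateral view without the OUTER keyword, where a row whose array is empty produces no output rows.

```python
from typing import Any, Dict, Iterator, List


def explode_lateral(
    rows: List[Dict[str, Any]], array_col: str, alias: str
) -> Iterator[Dict[str, Any]]:
    """Yield one output row per array element, keeping the other columns.

    Hypothetical helper: rows with an empty array yield nothing, which
    mirrors a lateral view used without OUTER.
    """
    for row in rows:
        for element in row[array_col]:
            # Copy every column except the array, then add the element.
            out = {k: v for k, v in row.items() if k != array_col}
            out[alias] = element
            yield out


# Illustrative data: each order carries an array-typed "items" column.
orders = [
    {"order_id": 1, "items": ["pen", "ink"]},
    {"order_id": 2, "items": ["paper"]},
    {"order_id": 3, "items": []},
]

exploded = list(explode_lateral(orders, "items", "item"))
for row in exploded:
    print(row)
```

Each element of the array becomes its own row, joined back to the columns of the row it came from, which is what makes the result usable in ordinary tabular queries.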

COURSES INCLUDED

Modern Data Warehouses
In today's world, data warehouses have become necessary for making informed business decisions. The wide availability of data comes at an increased cost of storing it efficiently, a necessity for any business working with large amounts of data. Learn more about the key concepts, architecture, stages, use cases, and available solutions for data warehouses in this course. You will examine data warehousing solutions, architecture, and techniques, discover Amazon Redshift and Google BigQuery, and explore concepts such as batch, stream, and real-time analytics. This course also highlights the considerations for implementing a data warehouse for a business, along with the implementation steps and best practices required. After completing this course, you will have a foundational knowledge of implementing a data warehousing solution for your business.
12 videos | 1h 4m | Assessment available | Badge
Azure Databricks & Data Pipelines
Azure Databricks is a data analytics platform optimized to work with Microsoft Azure cloud services and is an example of a cloud platform designed to serve business analytics needs. Use this course to explore the architecture, features, advantages, and disadvantages of Azure Databricks - a leading cloud-based tool used for data engineering, and Snowflake - a data warehouse-as-a-service. Examine different types of data pipelines and their components and advantages. You will also compare various data pipeline tools and learn more about building a data pipeline through a case study. Upon finishing this course, you will be able to recognize the capabilities of different data warehouses and the steps required for building data pipelines.
12 videos | 56m | Assessment available | Badge

COURSES INCLUDED

Data Warehousing with Azure: Architecture & Modeling Techniques
Explore the fundamentals of data warehousing and the essential architectures and components being implemented to manage data.
15 videos | 1h 17m | Assessment available | Badge
Data Warehousing with Azure: Implementing Azure SQL Data Warehouse
Explore the practical implementation of Azure SQL Data Warehouse. Examine how to design, model, and apply ELT approaches of extracting, loading, and transforming data.
11 videos | 1h 7m | Assessment available | Badge
Data Warehousing with Azure: Working with SQL Data Warehouse Objects
Explore how to create and utilize SQL Data Warehouse objects and work with T-SQL to implement tables of diversified categories.
17 videos | 1h 26m | Assessment available | Badge
Data Warehousing with Azure: Analytics & Reporting
Discover how to use Azure Analysis Services and Power BI to prepare reports that can be used to analyze data in SQL Data Warehouse and Azure Data Lake.
10 videos | 44m | Assessment available | Badge
Data Warehousing with Azure: Data Lake Implementation Using Azure
Explore the fundamentals of data lakes and approaches for building and using data lakes. How to build and use an Azure Data Lake using Gen1 and Gen2 implementation approaches is also covered.
13 videos | 58m | Assessment available | Badge
Data Warehousing with Azure: Managing Azure Data Lake
Explore the advanced features of Azure Data Lakes with additional focus on managing various scenarios of data ingestion. Securing and tuning an Azure Data Lake for performance enhancement is also covered.
13 videos | 59m | Assessment available | Badge

COURSES INCLUDED

Optimizing Query Executions with Hive
In this 7-video Skillsoft Aspire course, learners can explore optimizations allowing Apache Hive to handle parallel processing of data, while users can still contribute to improving query performance. For this course, learners should have previous experience with Hive and familiarity with querying big data for analysis purposes. The course focuses only on concepts; no queries are run. Learners begin to understand how to optimize query executions in Hive, starting with the different options available in Hive to query data in an optimal manner. Discuss how to split data into smaller chunks, specifically by partitioning and bucketing, so that queries need not scan full data sets each time. Hive democratizes access to data stored in a Hadoop cluster, eliminating the need to know MapReduce to process cluster data, and makes data accessible using the Hive query language. All files in Hadoop are exposed in the form of tables. Watch demonstrations of structuring queries to reduce the number of MapReduce operations generated by Hive and to speed up query executions. Other concepts covered include partitioning, bucketing, and joins.
7 videos | 42m | Assessment available | Badge
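
The idea in the description above, that partitioning lets a query avoid scanning the full data set, can be sketched in a few lines. This is a conceptual Python model, not Hive itself: the `sales` rows, the `category` partition column, and the helpers `partition_by` and `query_partitioned` are all hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def partition_by(rows: List[Row], column: str) -> Dict[Any, List[Row]]:
    """Group rows by the value of the partition column."""
    parts: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)


def query_partitioned(
    partitions: Dict[Any, List[Row]],
    wanted_value: Any,
    predicate: Callable[[Row], bool],
) -> Tuple[List[Row], int]:
    """Scan only the partition matching wanted_value.

    Returns the matching rows and the number of rows actually scanned.
    """
    scanned = partitions.get(wanted_value, [])
    return [r for r in scanned if predicate(r)], len(scanned)


# Illustrative data set with a low-cardinality "category" column.
sales = [
    {"category": "books", "amount": 12},
    {"category": "toys", "amount": 30},
    {"category": "books", "amount": 7},
    {"category": "garden", "amount": 55},
]

partitions = partition_by(sales, "category")
matches, rows_scanned = query_partitioned(
    partitions, "books", lambda r: r["amount"] > 10
)
print(matches, rows_scanned, "of", len(sales))
```

Only the rows in the requested partition are examined, so the work done by the filter depends on the size of that partition rather than on the size of the whole table.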
Using Hive to Optimize Query Executions with Partitioning
Continue to explore the versatility of Apache Hive, among today's most popular data warehouses, in this 10-video Skillsoft Aspire course. Learners are shown ways to optimize query executions, including the powerful technique of partitioning data sets. The hands-on course assumes previous work with Hive tables using the Hive query language and in processing complex data types, along with theoretical understanding of improving query performance by partitioning very large data sets. Demonstrations focus on the basics of partitioning and how to create partitions and load data into them. Learners work with both Hive-managed tables and external tables to see how partitioning works for each; then watch a demonstration of navigating to the shell of the Hadoop master node and creating new directories in the Hadoop file system. Observe dynamic partitioning of tables and how this simplifies loading of data into partitions. Finally, you explore how a table's data can be partitioned on multiple columns. During this course, learners will acquire a sound understanding of how large data sets can be partitioned into smaller chunks to improve query performance.
10 videos | 1h | Assessment available | Badge
Bucketing & Window Functions with Hive
Learners explore how Apache Hive query executions can be optimized, including techniques such as bucketing data sets, in this Skillsoft Aspire course. Using windowing functions to extract meaningful insights from data is also covered. This 10-video course assumes previous work with partitions in Hive, as well as conceptual understanding of how buckets can improve query performance. Learners begin by focusing on how to use the bucketing technique to process big data efficiently. Then take a look at HDFS (Hadoop Distributed File System) by navigating to the shell of the Hadoop master node; from there, make use of the hadoop fs -ls command to examine the contents of the directory. Observe three subdirectories corresponding to three partitions based on the value of the category column. You will then explore how to combine the partitioning and bucketing techniques to further improve query performance. Finally, learners will explore the concept of windowing, which helps users analyze a subset of ordered data, and then see how this technique can be implemented in Hive.
9 videos | 1h 3m | Assessment available | Badge
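
Bucketing and windowing, as named in the description above, can be sketched conceptually as follows. This is plain Python, not HiveQL, and it rests on stated assumptions: the rows, the column names, and the helpers `bucket_of` and `running_total` are hypothetical, and the integer key modulo the bucket count stands in for whatever hash function a real engine applies to the bucketing column.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def bucket_of(key: int, num_buckets: int) -> int:
    """Assign a row to a bucket: stand-in hash of the key, modulo bucket count."""
    return key % num_buckets


def running_total(
    rows: List[Row], partition_col: str, order_col: str, value_col: str
) -> List[Row]:
    """Running sum of value_col within each partition, ordered by order_col.

    A rough analogue of a windowed SUM over an ordered partition.
    """
    ordered = sorted(rows, key=itemgetter(partition_col, order_col))
    out: List[Row] = []
    for _, group in groupby(ordered, key=itemgetter(partition_col)):
        total = 0
        for row in group:
            total += row[value_col]
            out.append({**row, "running_total": total})
    return out


# Illustrative rows; "id" is the assumed bucketing column.
rows = [
    {"id": 1, "category": "books", "amount": 12},
    {"id": 2, "category": "toys", "amount": 30},
    {"id": 3, "category": "books", "amount": 7},
    {"id": 4, "category": "toys", "amount": 5},
]

buckets = {row["id"]: bucket_of(row["id"], 2) for row in rows}
totals = running_total(rows, "category", "amount", "amount")
print(buckets)
for row in totals:
    print(row)
```

Bucketing spreads rows into a fixed number of groups by key, while the window computation walks an ordered subset of rows per category and carries a value forward, which is the "subset of ordered data" the description refers to.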

EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

BOOKS INCLUDED

Book

Scalable Big Data Architecture: A Practitioner's Guide to Choosing Relevant Big Data Architecture
Covering real-world, concrete industry use cases, this book is for developers, data architects, and data scientists looking for a better understanding of how to choose the most relevant pattern for a big data project and which tools to integrate into that pattern.
Duration: 1h 51m | Authors: Bahaaldine Azarmi

Book

Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault
Drawing upon years of practical experience and using numerous examples and an easy to understand framework, this timely guide defines the importance of data architecture and how it can be used effectively to harness big data within existing systems.
Duration: 4h 27m | Authors: Daniel Linstedt, W.H. Inmon

BOOKS INCLUDED

Book

Emerging Perspectives in Big Data Warehousing
This book is an essential research publication that explores current innovative activities focusing on the integration between data warehousing and data mining with an emphasis on the applicability to real-world problems.
Duration: 6h 48m | Authors: David Taniar, Johanna Wenny Rahayu (eds)

Book

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence Remastered Collection, Second Edition
Organized for quick navigation and easy reference, this vital resource is the essential reference for data warehouse and business intelligence design, packed with best practices, design tips, and valuable insight from industry pioneer Ralph Kimball and the Kimball Group.
Duration: 21h 34m | Authors: Margy Ross, Ralph Kimball

Book

Data Mining and Data Warehousing, BPB Publications (c) 2015
Providing comprehensive coverage of various aspects of data mining and warehousing concepts, this book offers examples, diagrams, and questions in a simple language, a crystal clear approach, and a straightforward comprehensible presentation.
Duration: 2h 2m | Authors: Akash Saxena, Khushboo Saxena, Sandeep Saxena

Book

Enterprise Business Intelligence and Data Warehousing: Program Management Essentials
Covering best practices for managing and leading an enterprise-scale business intelligence (BI) and data warehousing (DW) program, this essential book describes what the Enterprise Program Manager must accomplish to orchestrate the many moving parts involved.
Duration: 1h 22m | Authors: Alan Simon

Book

Data Warehousing in the Age of Big Data
Helping you navigate through the complex layers of Big Data and data warehousing, this practical and timely book provides information on how to effectively think about using all of the technologies and architectures to design the next-generation data warehouse.
Duration: 6h 39m | Authors: Krish Krishnan

BOOKS INCLUDED

Book

Hadoop Architecture and SQL: The Best HiveQL Book in the Universe
Including hundreds of pages of SQL examples and explanations, this book is perfect for anyone who wants to query Hadoop with SQL and educates readers on how to create tables, how the data is distributed, and how the system processes the data.
Duration: 1h 32m | Authors: Jason Nolander, Tom Coffing

Book

Practical Hive: A Guide to Hadoop's Data Warehouse System
From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, this go-to resource gives you a detailed treatment of the software.
Duration: 3h 57m | Authors: Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard, Scott Shaw

Book

Apache Hive: 34 Most Asked Questions On Apache Hive
Offering a thorough view of key knowledge and detailed insight, this all-embracing guide provides comprehensive answers and extensive details and references for everything you want to know about Apache Hive.
Duration: 35m | Authors: Jacqueline Douglas

BOOKS INCLUDED

Book

Microsoft Azure SQL Data Warehouse: Architecture and SQL (Book 18)
Including numerous SQL examples and explanations, this book details the architecture of the Azure SQL Data Warehouse and the SQL commands available, and educates readers on how to create tables and indexes, how the data is distributed, and how the system processes the data.
Duration: 3h 22m | Authors: Todd Wilson, Tom Coffing

Book

Data Warehouse Systems: Design and Implementation
With extensive coverage of all data warehouse issues, ranging from basic technologies to the most recent findings and systems, this book illustrates the concepts with an ongoing example based on the Northwind database using Microsoft Analysis Services and Pentaho Business Analytics.
Duration: 11h 14m | Authors: Alejandro Vaisman, Esteban Zimányi

