Bucketing & Window Functions with Hive
Apache Hive 2.3.2 | Intermediate
- 9 Videos | 1h 3m 14s
- Includes Assessment
- Earns a Badge
Learners explore how Apache Hive query execution can be optimized in this Skillsoft Aspire course, including techniques such as bucketing data sets. Using window functions to extract meaningful insights from data is also covered. This 9-video course assumes previous work with partitions in Hive, as well as a conceptual understanding of how buckets can improve query performance. Learners begin by focusing on how to use the bucketing technique to process big data efficiently. They then look at HDFS (Hadoop Distributed File System) by navigating to the shell of the Hadoop master node. From there, they use the hadoop fs -ls command to examine the contents of the directory and observe three subdirectories corresponding to three partitions based on the value of the category column. Learners then explore how to combine the partitioning and bucketing techniques to further improve query performance. Finally, learners explore the concept of windowing, which helps users analyze a subset of ordered data, and see how this technique can be implemented in Hive.
WHAT YOU WILL LEARN
- implement bucketing for a Hive table and explore the structure of the table and its buckets on HDFS
- apply both bucketing and partitioning to a table and describe the structure of such a table on HDFS
- extract further performance from Hive queries by sorting the contents of buckets
- work with samples of a Hive table by dividing it into buckets
- perform join operations on three or more tables by chaining the joins
- implement a window function to calculate running totals on an ordered dataset
- apply a window function within a partition of your dataset
- apply bucketing to Hive tables to boost query performance, and use window functions
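As a rough illustration of the running-total objectives, the following Python sketch (not taken from the course) reproduces what a HiveQL expression such as `SUM(amount) OVER (PARTITION BY category ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` computes. The `category`, `day`, and `amount` columns and the sample rows are hypothetical. The sketch uses ROWS-style framing; if the ORDER BY column contains ties, Hive's default RANGE frame would give tied rows the same total, which this sketch does not model.

```python
from itertools import groupby
from operator import itemgetter


def running_totals(rows, partition_key, order_key, value_key):
    """Cumulative sum of value_key within each partition, in order_key order.

    Mimics SUM(value) OVER (PARTITION BY partition_key ORDER BY order_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    """
    result = []
    # Sort by (partition, order) so each partition is contiguous and ordered.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0  # the running total restarts at every new partition
        for row in group:
            total += row[value_key]
            result.append({**row, "running_total": total})
    return result


# Hypothetical sample data; column names are illustrative only.
sales = [
    {"category": "books", "day": 1, "amount": 10},
    {"category": "books", "day": 2, "amount": 5},
    {"category": "toys", "day": 1, "amount": 7},
    {"category": "books", "day": 3, "amount": 20},
    {"category": "toys", "day": 2, "amount": 3},
]

for r in running_totals(sales, "category", "day", "amount"):
    print(r["category"], r["day"], r["running_total"])
```

Dropping the PARTITION BY clause in the SQL corresponds to treating all rows as a single group here, which yields one running total over the whole ordered dataset.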
IN THIS COURSE
- 1. Course Overview (2m 9s)
- 2. Apply Bucketing for a Table in Hive (8m 58s)
- 3. Using Bucketing and Partitioning Together in Hive (8m 13s)
- 4. Sorting a Bucket's Contents in Hive (4m 41s)
- 5. Sampling a Table in Hive (7m 42s)
- 6. Joining Multiple Tables in Hive (7m 16s)
- 7. Introducing Window Functions in Hive (9m 31s)
- 8. Window Functions with Partitions in Hive (9m 23s)
- 9. Exercise: Bucketing and Window Functions in Hive (5m 22s)
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of this course, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.