Using Hive to Optimize Query Executions with Partitioning

Apache Hive 2.3.2
  • 10 Videos | 1h 4m 47s
  • Includes Assessment
  • Earns a Badge
Continue to explore the versatility of Apache Hive, among today’s most popular data warehouses, in this 10-video Skillsoft Aspire course. Learners are shown ways to optimize query execution, including the powerful technique of partitioning data sets. The hands-on course assumes previous experience working with Hive tables using the Hive query language and processing complex data types, along with a theoretical understanding of how partitioning very large data sets improves query performance. Demonstrations focus on the basics of partitioning, including how to create partitions and load data into them. Learners work with both Hive-managed tables and external tables to see how partitioning works for each, then watch how to navigate to the shell of the Hadoop master node and create new directories in the Hadoop file system. Observe dynamic partitioning of tables and how it simplifies loading data into partitions. Finally, explore how multiple columns in a table can be used to partition its data. By the end of the course, learners will have a sound understanding of how large data sets can be partitioned into smaller chunks to improve query performance.

WHAT YOU WILL LEARN

  • use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster (not required if you already have a Hadoop environment set up with Hive)
  • define a table which will contain data partitioned based on the value in one of its columns
  • insert data into partitions of a Hive table and explore the partition and its data on HDFS
  • load data into table partitions from files
  • create and populate partitions in an external table
  • alter the definition of a partition to modify its contents
  • define and work with dynamic partitions on your Hive tables
  • configure a table to use more than one column to define partitions and explore the partition on HDFS
  • use partitioning to boost query performance in HDFS
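
As a rough illustration of the idea behind these objectives: Hive stores each partition of a table as a subdirectory named `column=value` under the table's directory, and a query that filters on the partition column only needs to read the matching subdirectories. The sketch below mimics that layout in plain Python on the local file system. It is not Hive and does not use HDFS; the function names (`write_partitioned`, `read_partition`, `demo`), the `data.csv` file name, and the sample rows are all hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(table_dir, rows, partition_cols):
    """Write rows into Hive-style `col=value` subdirectories, one per partition."""
    for row in rows:
        part_dir = Path(table_dir)
        for col in partition_cols:
            part_dir = part_dir / f"{col}={row[col]}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # Partition values are encoded in the path, so only the
        # remaining columns are written to the data file.
        data = [v for k, v in row.items() if k not in partition_cols]
        with open(part_dir / "data.csv", "a", newline="") as f:
            csv.writer(f).writerow(data)


def read_partition(table_dir, **filters):
    """Read only files under the directories that match the filters.

    Filters must be given in partition-column order (outermost first).
    """
    base = Path(table_dir)
    for col, value in filters.items():
        base = base / f"{col}={value}"
    rows = []
    for path in sorted(base.rglob("data.csv")):
        with open(path, newline="") as f:
            rows.extend(csv.reader(f))
    return rows


def demo():
    # Hypothetical sample data, partitioned by two columns.
    rows = [
        {"id": 1, "amount": 10.0, "country": "US", "year": 2019},
        {"id": 2, "amount": 20.0, "country": "US", "year": 2020},
        {"id": 3, "amount": 30.0, "country": "UK", "year": 2020},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        write_partitioned(tmp, rows, ["country", "year"])
        partitions = sorted(
            p.parent.relative_to(tmp).as_posix()
            for p in Path(tmp).rglob("data.csv")
        )
        # Only the country=US subtree is scanned; country=UK is never opened.
        us_rows = read_partition(tmp, country="US")
    return partitions, us_rows


if __name__ == "__main__":
    partitions, us_rows = demo()
    print(partitions)
    print(us_rows)
```

Using two partition columns produces nested directories (for example `country=US/year=2019`), which is the same shape the course explores on HDFS for multi-column partitions. Skipping non-matching directories is what reduces the amount of data a filtered query has to read.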

IN THIS COURSE

  1. Course Overview (2m 26s)
  2. Setting up a Hadoop Cluster on the Google Cloud (4m 52s)
  3. Creating a Partitioned Table in Hive (6m 16s)
  4. Working with Partitions in Hive (7m 2s)
  5. Populating Partitions in Hive (7m 43s)
  6. Partitioning External Tables in Hive (7m 21s)
  7. Modifying Partitions in Hive (4m 28s)
  8. Dynamic Partitions in Hive (7m 12s)
  9. Using Multiple Columns for Partitioning in Hive (7m 47s)
  10. Exercise: Optimize Executions with Partitioning (5m 41s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
