Course details

Getting Started with Hive: Optimizing Query Executions with Partitioning


Overview/Description

Apache Hive is one of the most popular data warehouse systems used for data science. Hive allows big data to be processed in parallel through a simple query interface. In this Skillsoft Aspire course, you will explore ways to optimize query execution, including the powerful technique of partitioning datasets.



Expected Duration (hours)
1.0

Lesson Objectives

  • Course Overview
  • use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster (not required if you already have a Hadoop environment set up with Hive)
  • define a table which will contain data partitioned based on the value in one of its columns
  • insert data into partitions of a Hive table and explore the partition and its data on HDFS
  • load data into table partitions from files
  • create and populate partitions in an external table
  • alter the definition of a partition to modify its contents
  • define and work with dynamic partitions on your Hive tables
  • configure a table to use more than one column to define partitions and explore the partition on HDFS
  • use partitioning to boost query performance in HDFS

Course Number
it_dsgshvdj_05_enus

Expertise Level
Intermediate