Data Engineering on Microsoft Azure: Designing Data Storage Structures

Azure | Beginner
  • 11 videos | 1h 8m 8s
  • Includes Assessment
  • Earns a Badge
Rating 4.5 of 71 users
Planning the structure of data storage is integral to performance in big data operations. In this course, you'll learn about key considerations for data lakes and how to determine which file type and file format are most appropriate for your use case. Then, you'll explore how to design table storage for efficient querying and how data pruning skips unnecessary data to accelerate queries. You'll examine folder structures and data lake zones for organizing data effectively. Finally, you'll learn how to define storage tiers and how to manage the lifecycle of data. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Describe key considerations for designing a data lake
  • Identify and evaluate criteria for selecting a file format for big data applications
  • Recognize the defining characteristics of the file formats supported in Azure Data Lake
  • Describe steps for efficient read operations in a table storage service
  • Describe the dynamic data pruning feature in Databricks at the file and partition level
  • Recognize an efficient folder structure design
  • Define the zones within a data lake for organizing data distribution
  • Describe the data access tiers in Azure Blob Storage and how data can be moved between them for efficient and cost-effective storage
  • Describe the steps to archive data in an Azure Blob Storage container, rehydrate blob data, and automate access tiers using lifecycle management
  • Summarize the key concepts covered in this course

IN THIS COURSE

  • 1m 30s
    This course will introduce you to data lakes and the file types that are most appropriate for your use case. See how to design table storage that supports efficient queries, how to prune unneeded data, and how to organize data effectively.
  • 7m 12s
    Study how to design a data lake. Consider the reasons to use a data lake and compare its purpose with that of a data warehouse. Examine use cases for data lakes, such as descriptive, diagnostic, predictive, and prescriptive analysis. Review the characteristics of a data lake and some of its challenges.
  • 3. Big Data File Type Planning
    7m 17s
    Identify and evaluate criteria for selecting a file format for big data applications, such as text or binary encoding, data types, schema, and OLTP or OLAP workloads. Review big data storage considerations, such as splittability, compression support, batch or streaming processing, organizational standards, and data catalog needs.
  • 4. Big Data File Formats
    8m 46s
    Explore the defining characteristics of the file formats supported in Azure Data Lake. Examine the benefits and drawbacks of the comma-separated values (CSV) format, Extensible Markup Language (XML), Apache Avro, Apache Parquet, and Optimized Row Columnar (ORC). Review protocol buffers.
  • 5. Designing Table Storage for Querying
    8m 31s
    In this video, you will study how to design table storage for efficient queries. Discover the benefits of denormalized data, point queries, and the log tail pattern. Consider alternative approaches to domain models.
  • 6. Dynamic Data Pruning
    6m 25s
    Examine the dynamic data pruning feature in Databricks at the file and partition level. Review nested filters, static partition pruning, pruning challenges, and dynamic partition pruning considerations.
  • 7. Designing a Folder Structure
    7m 3s
    Watch how to design a folder structure. Discover why a deliberate structure is important in a data lake, so that it does not become a data swamp. Review governance practices for metadata management. See why nested elements should be avoided.
  • 8. Data Lake Zones
    5m 51s
    Explore how to define the zones within a data lake to organize data distribution. Review the roles of data separation, governance, service level agreements, and security. Consider the requirements of the raw zone, the structured zone, the curated zone, the serving zone, and the exploratory zone.
  • 9. Storage Archiving Tier
    6m 47s
    Examine the data access tiers in Azure Blob Storage and how data can be moved between them for efficient and cost-effective storage. Look at blob types, access tiers, blob lifecycle management, the archive tier, and immutable blobs.
  • 10. Data Archiving, Rehydrating, and Life Cycle Management
    7m 52s
    Walk through the steps to archive data in an Azure Blob Storage container, rehydrate blob data, and automate access tiers using lifecycle management. See how to move data between tiers for cost-effective data management.
  • 11. Course Summary
    54s
    This course introduced you to data lakes and the file types that are most appropriate for your use case. You learned how to design table storage that supports efficient queries, how to prune unneeded data, and how to organize data effectively.
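
The splittability criterion from "Big Data File Type Planning" can be illustrated with a small sketch. It assumes an uncompressed, newline-delimited text file (such as a CSV) that is divided into byte ranges, one per worker, and it follows a convention similar to Hadoop's line reader: a record belongs to the split that contains its first byte, and a worker reads past the end of its range to finish its last record. The function name `records_in_split` is hypothetical. A gzip-compressed file cannot be processed this way, because a gzip stream cannot be decoded starting from an arbitrary byte offset, which is why compression support and splittability are weighed together.

```python
def records_in_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the newline-delimited records whose first byte lies in [start, end)."""
    if start == 0:
        pos = 0
    else:
        # A record starts right after a newline. Searching from start - 1 finds the
        # boundary even when a record begins exactly at `start`.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []  # no record begins inside this split
        pos = newline + 1
    records = []
    # Keep reading while the next record starts inside the split; the last record
    # may run past `end`, just as a worker reads past its boundary to finish a line.
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


if __name__ == "__main__":
    data = b"id,amount\n1,10\n2,25\n3,40\n"
    # Two workers, each assigned an arbitrary byte range of the same file.
    for start, end in [(0, 12), (12, len(data))]:
        print(start, end, records_in_split(data, start, end))
```

Because each record is claimed by exactly one split, the workers together read every record once, no matter where the byte boundaries fall.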
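
Point queries and the Azure Table storage log tail pattern (retrieving the most recently added entities in a partition) can be sketched with an in-memory stand-in for the service. In Azure Table storage, every entity is identified by a PartitionKey and a RowKey, and entities within a partition are returned in RowKey order. A point query supplies both keys; the log tail pattern stores an inverted timestamp in the RowKey so that the newest entities sort first. The class `InMemoryTable`, the constant `MAX_TIMESTAMP_MS`, and the key format are hypothetical illustrations, not the Azure SDK.

```python
from dataclasses import dataclass, field

# Hypothetical upper bound for millisecond timestamps; subtracting from it reverses sort order.
MAX_TIMESTAMP_MS = 9_999_999_999_999


def log_tail_row_key(timestamp_ms: int) -> str:
    """Zero-padded inverted timestamp: newer events get lexically smaller RowKeys."""
    return f"{MAX_TIMESTAMP_MS - timestamp_ms:013d}"


@dataclass
class InMemoryTable:
    """Toy model: entities keyed by (PartitionKey, RowKey), read in RowKey order."""
    partitions: dict = field(default_factory=dict)

    def upsert(self, partition_key: str, row_key: str, entity: dict) -> None:
        self.partitions.setdefault(partition_key, {})[row_key] = entity

    def point_query(self, partition_key: str, row_key: str):
        # Both keys are known, so this is a direct lookup with no scan.
        return self.partitions.get(partition_key, {}).get(row_key)

    def top(self, partition_key: str, n: int) -> list:
        # A range read within one partition, in ascending RowKey order.
        rows = self.partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)[:n]]


if __name__ == "__main__":
    table = InMemoryTable()
    events = [(1_700_000_000_000, "created"), (1_700_000_060_000, "paid"), (1_700_000_120_000, "shipped")]
    for ts, status in events:
        # Denormalized entity: everything the query needs is stored on the row itself.
        table.upsert("order-42", log_tail_row_key(ts), {"status": status, "customer": "Contoso", "ts": ts})
    print(table.top("order-42", 2))  # the two most recent events, newest first
    print(table.point_query("order-42", log_tail_row_key(1_700_000_060_000)))
```

Zero-padding matters here: RowKeys are compared as strings, so without a fixed width the lexical order would not match the numeric order.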
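
The difference between static and dynamic partition pruning, covered in "Dynamic Data Pruning", can be simulated in plain Python to show the idea rather than the Databricks implementation. The hypothetical fact table is stored as one partition per `date_key`. Static pruning uses a literal filter on the partition column, so the partitions to read are known when the query is planned. Dynamic pruning derives the partition values at run time from a filter on a joined dimension table, and only the matching fact partitions are scanned. All table names and data below are assumptions for illustration.

```python
# Hypothetical fact table partitioned by date_key: partition value -> rows in that partition.
FACT_PARTITIONS = {
    20240101: [{"store": 1, "sales": 100}, {"store": 2, "sales": 80}],
    20240102: [{"store": 1, "sales": 120}],
    20240106: [{"store": 2, "sales": 95}],
    20240107: [{"store": 1, "sales": 60}],
}

# Hypothetical date dimension, joined to the fact table on date_key.
DATE_DIM = [
    {"date_key": 20240101, "is_weekend": False},
    {"date_key": 20240102, "is_weekend": False},
    {"date_key": 20240106, "is_weekend": True},
    {"date_key": 20240107, "is_weekend": True},
]


def scan(partition_keys):
    """Read only the requested partitions; return (rows, partitions_read)."""
    rows, read = [], []
    for key in sorted(partition_keys):
        if key in FACT_PARTITIONS:
            read.append(key)
            rows.extend(FACT_PARTITIONS[key])
    return rows, read


def static_pruning(date_key):
    # The predicate is a literal on the partition column: prune while planning.
    return scan({date_key})


def dynamic_pruning(dim_predicate):
    # The partition values are unknown until the dimension filter runs; its result
    # is then applied as a filter on the fact table's partition column.
    keys = {d["date_key"] for d in DATE_DIM if dim_predicate(d)}
    return scan(keys)


if __name__ == "__main__":
    rows, read = dynamic_pruning(lambda d: d["is_weekend"])
    print(f"read {len(read)} of {len(FACT_PARTITIONS)} partitions: {read}")
    print("weekend sales:", sum(r["sales"] for r in rows))
```

Without pruning, all four partitions would be read and the non-weekend rows discarded after the join; with dynamic pruning, only the two weekend partitions are touched.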
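
A folder layout that combines the zones from "Data Lake Zones" with the guidance from "Designing a Folder Structure" might look like the sketch below. The zone names come from the course description; the ordering `zone/source/dataset/yyyy/mm/dd/filename` and the function `build_path` are assumptions, one common convention among several. Date folders are placed last so that engines can skip whole directories when a query filters by date.

```python
from datetime import date

# Zone names as listed in the course description.
ZONES = {"raw", "structured", "curated", "serving", "exploratory"}


def build_path(zone: str, source: str, dataset: str, day: date, filename: str) -> str:
    """Build a hypothetical zone/source/dataset/yyyy/mm/dd/filename path."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    for part in (source, dataset, filename):
        # Keep each segment flat: an embedded slash would add unplanned nesting.
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return f"{zone}/{source}/{dataset}/{day:%Y}/{day:%m}/{day:%d}/{filename}"


if __name__ == "__main__":
    print(build_path("raw", "erp", "orders", date(2024, 3, 5), "orders.csv"))
    print(build_path("curated", "erp", "orders", date(2024, 3, 5), "orders.parquet"))
```

Validating segments at write time is a lightweight governance control: every pipeline that uses the helper produces the same predictable layout.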
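
The lifecycle management step in "Data Archiving, Rehydrating, and Life Cycle Management" is driven by a JSON policy attached to the storage account. The sketch below builds such a policy as a Python dictionary using the rule schema described in the Azure Blob Storage lifecycle management documentation (`tierToCool`, `tierToArchive`, `delete`, `daysAfterModificationGreaterThan`, `blobTypes`, `prefixMatch`). The rule name, prefix, and day thresholds are illustrative assumptions, and `action_for_age` is a hypothetical helper that mirrors which action the rule would apply to a block blob of a given age. Remember that a blob in the archive tier is offline and must be rehydrated to an online tier before it can be read.

```python
import json

# Illustrative thresholds in days since last modification; adjust to your retention rules.
COOL_AFTER, ARCHIVE_AFTER, DELETE_AFTER = 30, 90, 365


def build_policy(prefix: str) -> dict:
    """One lifecycle rule for block blobs under `prefix` (the prefix starts with a container name)."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "archiveRawData",
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": COOL_AFTER},
                            "tierToArchive": {"daysAfterModificationGreaterThan": ARCHIVE_AFTER},
                            "delete": {"daysAfterModificationGreaterThan": DELETE_AFTER},
                        }
                    },
                },
            }
        ]
    }


def action_for_age(days_since_modified: int) -> str:
    """Hypothetical helper. When several conditions match, lifecycle management applies
    the least expensive action: delete, then tierToArchive, then tierToCool."""
    if days_since_modified > DELETE_AFTER:
        return "delete"
    if days_since_modified > ARCHIVE_AFTER:
        return "tierToArchive"
    if days_since_modified > COOL_AFTER:
        return "tierToCool"
    return "none"


if __name__ == "__main__":
    print(json.dumps(build_policy("datalake/raw/"), indent=2))
    for age in (10, 45, 120, 400):
        print(age, action_for_age(age))
```

The printed JSON could be saved to a file and applied, for example, with `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.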

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft provides the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.