NLP with LLMs: Working with Tokenizers in Hugging Face

Large Language Models (LLMs) | Intermediate
  • 15 videos | 2h 18m 14s
  • Includes Assessment
  • Earns a Badge
Hugging Face, a leading company in the field of artificial intelligence (AI), offers a comprehensive platform that enables developers and researchers to build, train, and deploy state-of-the-art machine learning (ML) models with a strong emphasis on open collaboration and community-driven development. In this course, you will discover the extensive libraries and tools Hugging Face offers, including the Transformers library, which provides access to a vast array of pre-trained models and datasets. Next, you will set up your working environment in Google Colab. You will also explore the critical components of the text preprocessing pipeline: normalizers, which clean up raw text, and pre-tokenizers, which split it into word-level pieces. Finally, you will master various tokenization techniques, including byte pair encoding (BPE), WordPiece, and Unigram tokenization, which are essential for working with transformer models. Through hands-on exercises, you will build and train BPE and WordPiece tokenizers, configuring normalizers and pre-tokenizers to customize how each tokenizer processes text.
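As a rough illustration of the two preprocessing stages mentioned above, the following minimal sketch uses only the Python standard library. It is not the Hugging Face implementation: the function names `normalize` and `pre_tokenize` are hypothetical, and the specific rules (NFD decomposition, accent stripping, lowercasing, splitting on words and punctuation) are assumptions chosen to resemble a typical BERT-style setup.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase the text and strip accents (hypothetical helper)."""
    # NFD splits an accented character into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") that NFD exposed.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.lower()


def pre_tokenize(text: str) -> list[tuple[str, tuple[int, int]]]:
    """Split text into words and single punctuation marks, keeping character offsets."""
    return [(m.group(), m.span()) for m in re.finditer(r"\w+|[^\w\s]", text)]


normalized = normalize("Héllo, Wörld!")
pieces = pre_tokenize(normalized)
print(normalized)  # hello, world!
print(pieces)      # [('hello', (0, 5)), (',', (5, 6)), ('world', (7, 12)), ('!', (12, 13))]
```

The tokenization model (BPE, WordPiece, or Unigram) then operates on these word-level pieces rather than on the raw string.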

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Provide an overview of the Hugging Face platform
  • Outline how tokenization works for transformer models
  • Work with the Hugging Face platform
  • Set up a Colab notebook
  • Explore normalization and pre-tokenization
  • Perform byte pair encoding (BPE) and WordPiece tokenization
  • Set up a BPE tokenizer
  • Implement BPE tokenization
  • Set up a WordPiece tokenizer
  • Implement WordPiece tokenization
  • Train a BPE tokenizer
  • Perform normalization and pre-tokenization with WordPiece
  • Train a WordPiece tokenizer
  • Summarize the key concepts covered in this course
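To give a sense of what training a BPE tokenizer involves, here is a small standard-library sketch of the merge-learning loop. It is an illustration under stated assumptions, not the course's code or the Hugging Face `tokenizers` implementation: the helpers `merge_pair`, `train_bpe`, and `encode` are hypothetical names, the toy corpus is made up, and words are assumed to be already normalized and pre-tokenized.

```python
from collections import Counter


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a list of pre-tokenized words."""
    # Every distinct word starts as a tuple of single characters with its frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge rule to the whole corpus before the next round.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (made up for illustration): word frequencies drive which pairs merge first.
toy_words = ["hug"] * 10 + ["pug"] * 5 + ["pun"] * 12 + ["bun"] * 4 + ["hugs"] * 5
merges = train_bpe(toy_words, num_merges=3)
print(merges)                  # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
print(encode("hugs", merges))  # ['hug', 's']
print(encode("bug", merges))   # ['b', 'ug']
```

A production trainer adds details this sketch omits, such as a target vocabulary size, special tokens, and a minimum pair frequency.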

IN THIS COURSE

  • 2m 10s
    In this video, we will discover the key concepts covered in this course. FREE ACCESS
  • 12m 5s
    After completing this video, you will be able to provide an overview of the Hugging Face platform. FREE ACCESS
  • 3. Hugging Face Tokenizers
    11m 20s
    Upon completion of this video, you will be able to outline how tokenization works for transformer models. FREE ACCESS
  • 4. Exploring the Hugging Face Platform
    11m 23s
    Discover how to work with the Hugging Face platform. FREE ACCESS
  • 5. Setting up the Colab Environment
    5m 38s
    Learn how to set up a Colab notebook. FREE ACCESS
  • 6. Normalizers and Pre-tokenizers
    8m 2s
    In this video, we will explore normalization and pre-tokenization. FREE ACCESS
  • 7. Byte Pair Encoding (BPE), WordPiece, and Unigram Tokenization
    10m 18s
    During this video, you will learn how to perform byte pair encoding (BPE) and WordPiece tokenization. FREE ACCESS
  • 8. Implementing Byte Pair Encoding Tokenization - I
    12m 2s
    Find out how to set up a BPE tokenizer. FREE ACCESS
  • 9. Implementing Byte Pair Encoding Tokenization - II
    11m 58s
    In this video, discover how to implement BPE tokenization. FREE ACCESS
  • 10. Implementing WordPiece Tokenization - I
    12m 12s
    Learn how to set up a WordPiece tokenizer. FREE ACCESS
  • 11. Implementing WordPiece Tokenization - II
    9m 36s
    Discover how to implement WordPiece tokenization. FREE ACCESS
  • 12. Building and Training a BPE Tokenizer
    11m 22s
    In this video, find out how to train a BPE tokenizer. FREE ACCESS
  • 13. Configuring the Normalizer and Pre-tokenizer for WordPiece Tokenization
    7m 30s
    In this video, you will learn how to perform normalization and pre-tokenization with WordPiece. FREE ACCESS
  • 14. Building and Training a WordPiece Tokenizer
    9m 31s
    Discover how to train a WordPiece tokenizer. FREE ACCESS
  • 15. Course Summary
    3m 7s
    In this video, we will summarize the key concepts covered in this course. FREE ACCESS
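For readers curious how WordPiece differs from BPE at tokenization time, the sketch below shows the greedy longest-match-first lookup commonly used by BERT-style WordPiece tokenizers. It is a simplified standard-library illustration, not the Hugging Face implementation: the function name `wordpiece_tokenize` and the tiny vocabulary are hypothetical, and the `##` continuation prefix and `[UNK]` token are assumed to follow the usual BERT conventions.

```python
def wordpiece_tokenize(
    word: str,
    vocab: set[str],
    unk_token: str = "[UNK]",
    prefix: str = "##",
) -> list[str]:
    """Split one word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # pieces inside a word carry the prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece matches at this position, so the whole word is unknown.
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


# Hypothetical vocabulary; a trained tokenizer would learn this from a corpus.
toy_vocab = {"[UNK]", "hug", "h", "b", "##u", "##g", "##s", "##gs"}
print(wordpiece_tokenize("hugs", toy_vocab))  # ['hug', '##s']
print(wordpiece_tokenize("bugs", toy_vocab))  # ['b', '##u', '##gs']
print(wordpiece_tokenize("mug", toy_vocab))   # ['[UNK]']
```

Unlike BPE, which replays learned merge rules in order, this lookup needs only the final vocabulary.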

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
