Aspire Journeys

Natural Language Processing and LLMs

  • 14 Courses | 22h 20m 59s
Embark on an enriching journey into the world of Natural Language Processing (NLP) and Large Language Models (LLMs) with our comprehensive track series. Beginning with the "Fundamentals of Natural Language Processing," participants will build a solid foundation in NLP techniques, mastering text preprocessing, representation, and classification. Moving forward, the "Natural Language Processing with Deep Learning" track delves into advanced deep learning methodologies for NLP tasks, while the "Natural Language Processing with LLMs" track pushes the boundaries with cutting-edge LLMs, attention mechanisms, and transformer architectures. From understanding attention mechanisms to implementing state-of-the-art LLMs for tasks like language translation and text summarization, this journey equips participants with the knowledge and skills to navigate the forefront of NLP innovation.

Track 1: Natural Language Processing

The "Fundamentals of Natural Language Processing" track provides a comprehensive introduction to the core concepts and techniques in NLP. Beginning with an overview of NLP components, including natural language understanding (NLU) and natural language generation (NLG), the track explores common NLP tasks such as speech recognition and sentiment analysis. Participants will then delve into preprocessing text data using NLTK, covering essential techniques such as text cleaning, sentence segmentation, and parts-of-speech tagging. Additionally, the track explores methods for representing text in numeric format, including one-hot encoding and TF-IDF encoding, before introducing classification models for text data. Through hands-on exercises and practical examples, participants will learn how to build classification models using rule-based approaches, Naive Bayes classification, and other techniques, leveraging tools like Scikit-learn pipelines and grid search for optimal performance. Participants will then harness the power of TensorFlow for building deep learning models, followed by an in-depth exploration of text preprocessing techniques such as normalization, tokenization, and text vectorization. Through hands-on exercises, learners will delve into the intricacies of modeling building, training, and evaluation for text classification tasks, encompassing binary classification and multi-class classification using dense neural networks, recurrent neural networks (RNNs), and RNNs with LSTM cells. The track will also cover hyperparameter tuning using the Keras tuner to optimize model performance. Participants will gain proficiency in leveraging word embeddings, including training embedding layers in models, exploring and visualizing embeddings, and utilizing embeddings for tasks like word and semantic similarity. Moreover, the track will explore text translation using RNNs and demonstrate the utilization of pre-trained models for semantic textual similarity, providing participants with a comprehensive understanding of cutting-edge NLP techniques in the context of deep learning.

  • 9 Courses | 13h 31m 9s
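
For a concrete taste of the classical tooling this track covers, here is a minimal, illustrative sketch of a scikit-learn pipeline tuned with grid search for text classification; the toy documents, labels, and parameter grid are assumptions for demonstration rather than course materials.

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import GridSearchCV

    # Tiny illustrative dataset: 1 = positive, 0 = negative.
    texts = ["great product, works well", "terrible, broke in a day",
             "excellent value and quality", "poor quality, very disappointed",
             "works well and looks great", "broke quickly, not recommended"]
    labels = [1, 0, 1, 0, 1, 0]

    # Chain TF-IDF encoding and a Naive Bayes classifier into a single pipeline.
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultinomialNB()),
    ])

    # Grid search over preprocessing and model hyperparameters.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__alpha": [0.5, 1.0],
    }
    search = GridSearchCV(pipe, param_grid, cv=2)
    search.fit(texts, labels)
    print(search.best_params_, search.best_score_)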

Track 2: Architecting LLMs for Your Technical Solutions

The "Natural Language Processing with LLMs" track is designed to immerse participants in the transformative world of Large Language Models (LLMs), leveraging state-of-the-art techniques powered by deep learning and attention mechanisms. Participants will gain a deep understanding of attention mechanisms and the revolutionary transformer architecture, including self-attention and multi-head attention mechanisms. Through hands-on exercises and practical demonstrations, learners will explore the foundational concepts of LLMs and delve into implementing translation models using transformers. Moreover, participants will be introduced to the Hugging Face platform, learning to leverage pre-trained models from the Hugging Face library and fine-tune them for specific use cases. From text classification to language translation, question answering, text summarization, and natural language generation, participants will acquire the skills needed to harness the full potential of LLMs for a wide range of NLP tasks.

  • 5 Courses | 8h 49m 50s

COURSES INCLUDED

Fundamentals of NLP: Introducing Natural Language Processing
Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that focuses on programmatically working with text or speech. The term 'natural' here emphasizes that the program must work with, and be aware of, everyday language, grammar, and semantics, rather than structured text data such as might be found in a database or in string processing. In this course, you will learn about the two main branches of NLP: natural language understanding and natural language generation. You will also explore the Natural Language Toolkit (NLTK) and spaCy, two popular Python libraries for natural language processing and analysis. Next, you will delve into common preprocessing steps for natural language data. This includes cleaning and tokenizing data, removing stopwords from your text, performing stemming and lemmatization, part-of-speech (POS) tagging, and named entity recognition (NER). Finally, you will get set up with your Python environment and libraries for NLP and explore some text corpora that NLTK offers for working with text.
6 videos | 47m | Assessment | Badge
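
As a quick illustration of the kind of NLTK setup and preprocessing this course introduces, the sketch below tokenizes a made-up sentence, removes stopwords, and applies part-of-speech tagging; the nltk.download calls fetch the resources these steps rely on (exact resource names can vary slightly between NLTK versions).

    import nltk
    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords

    # One-time downloads: tokenizer model, stopword list, and POS tagger.
    # (Newer NLTK releases may also require "punkt_tab" and "averaged_perceptron_tagger_eng".)
    nltk.download("punkt")
    nltk.download("stopwords")
    nltk.download("averaged_perceptron_tagger")

    text = "Natural language processing lets programs work with everyday language."

    tokens = word_tokenize(text)                                  # split the sentence into word tokens
    stop_words = set(stopwords.words("english"))
    content = [t for t in tokens if t.lower() not in stop_words]  # drop common low-information words
    print(nltk.pos_tag(content))                                  # part-of-speech tags for the remaining tokens
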
Fundamentals of NLP: Preprocessing Text Using NLTK & SpaCy
Tokenization, stemming, and lemmatization are essential natural language processing (NLP) tasks. Tokenization involves breaking text into units (tokens), such as words or phrases, facilitating analysis. Stemming reduces words to a common base form by removing prefixes or suffixes, promoting simplicity in representation. In contrast, lemmatization considers grammatical aspects to transform words into their base or dictionary form. You will begin this course by tokenizing text using the Natural Language Toolkit (NLTK) and spaCy, which involves splitting a large block of text into smaller units called tokens, usually words or sentences. You will then remove stopwords, common words such as "a" and "the" that add little meaning to text. Next, you'll explore the WordNet lexical database, which contains information about the semantic relationships between words. You'll use synsets to view similar words and explore hypernyms, hyponyms, meronyms, and holonyms. Finally, you'll compare stemming and lemmatization, exploring both processes with NLTK and performing lemmatization using spaCy.
13 videos | 1h 56m | Assessment | Badge
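
To make the stemming-versus-lemmatization contrast concrete, here is a small illustrative sketch, assuming NLTK's wordnet data and spaCy's en_core_web_sm model have been downloaded; the example words and sentence are made up.

    import nltk
    import spacy
    from nltk.stem import PorterStemmer
    from nltk.corpus import wordnet

    nltk.download("wordnet")

    # Stemming: crude suffix stripping, so results need not be real words.
    stemmer = PorterStemmer()
    print([stemmer.stem(w) for w in ["running", "studies", "better"]])   # ['run', 'studi', 'better']

    # WordNet: semantic relationships such as hypernyms (more general terms).
    dog = wordnet.synsets("dog")[0]
    print(dog.definition())
    print(dog.hypernyms())

    # Lemmatization with spaCy: dictionary forms informed by part of speech.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The geese were running and studying better strategies.")
    print([(token.text, token.lemma_) for token in doc])
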
Fundamentals of NLP: Rule-based Models for Sentiment Analysis
Sentiment Analysis is a common use-case within the discipline of Natural Language Processing (NLP). Here, a model attempts to understand the contents of a text document well enough to capture the feelings, or sentiments, conveyed by the text. Sentiment Analysis is widely used by political forecasters, marketing professionals, and hedge fund managers looking to spot trends in voter, user, or market behavior. You will start this course by loading and preprocessing your data. You will read in data on movie reviews from IMDB and explore the dataset. You will then visualize the data using histograms and box plots to understand review length distribution. After that, you will perform basic data cleaning on text, utilizing regular expressions to remove elements like URLs and digits. Finally, you will conduct sentiment analysis using the Valence Aware Dictionary and Sentiment Reasoner (VADER) and TextBlob.
7 videos | 45m | Assessment | Badge
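
A minimal sketch of the cleaning and rule-based scoring covered here, assuming nltk (with the vader_lexicon resource) and textblob are installed; the example review and URL are made up.

    import re
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer
    from textblob import TextBlob

    nltk.download("vader_lexicon")

    raw = "Loved it!!! Full review at https://example.com/123 - 9/10 would watch again."
    cleaned = re.sub(r"http\S+|\d+", "", raw)        # strip URLs and digits before scoring

    vader = SentimentIntensityAnalyzer()
    print(vader.polarity_scores(cleaned))            # neg/neu/pos proportions plus a compound score

    print(TextBlob(cleaned).sentiment)               # polarity (-1..1) and subjectivity (0..1)
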
Fundamentals of NLP: Representing Text as Numeric Features
When performing sentiment classification using machine learning, it is necessary to encode text into a numeric format because machine learning models can only parse numbers, not text. There are a number of encoding techniques for text data, such as one-hot encoding, count vector encoding, and word embeddings. In this course, you will learn how to use one-hot encoding, a simple technique that builds a vocabulary from all words in your text corpus. Next, you will move on to count vector encoding, which tracks word frequency in each document and explore term frequency-inverse document frequency (TF-IDF) encoding, which also creates vocabularies and document vectors but uses a TF-IDF score to represent words. Finally, you will perform sentiment analysis using encoded text. You will use a count vector to encode your input data and then set up a Gaussian Naïve-Bayes model. You will train the model and evaluate its metrics. You will also explore how to improve the model performance by stemming words, removing stopwords, and using N-grams.
15 videos | 2h | Assessment | Badge
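
The encode-then-classify flow described above might look like the following illustrative sketch in scikit-learn; the toy reviews and labels are assumptions standing in for the larger review dataset used in the course.

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.naive_bayes import GaussianNB

    reviews = ["loved the movie", "terrible plot and acting",
               "great cast and a great story", "not worth watching"]
    labels = [1, 0, 1, 0]                              # 1 = positive, 0 = negative

    count_vec = CountVectorizer(ngram_range=(1, 2), stop_words="english")
    X_counts = count_vec.fit_transform(reviews)        # word and bigram counts per document (sparse)

    tfidf_vec = TfidfVectorizer()
    X_tfidf = tfidf_vec.fit_transform(reviews)         # TF-IDF weighted document vectors

    model = GaussianNB()
    model.fit(X_counts.toarray(), labels)              # GaussianNB expects dense arrays
    print(model.predict(count_vec.transform(["a great story"]).toarray()))
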
Fundamentals of NLP: Word Embeddings to Capture Relationships in Text
Before training any text-based machine learning model, it is necessary to encode that text into a machine-readable numeric form. Embeddings are the preferred way to encode text, as they capture data about the meaning of text, and are performant even with large vocabularies. You will start this course by working with Word2Vec embeddings, which represent words and terms in feature vector space, capturing the meaning and context of a word in a sentence. You will generate Word2Vec embeddings on your data corpus, set up a Gaussian Naïve-Bayes classification model, and train it on Word2Vec embeddings. Next, you will move on to GloVe embeddings. You will use the pre-trained GloVe word vector embeddings and explore how to view similar words and identify the odd one out in a set. Finally, you will perform classification using many different models, including Naive-Bayes and Random Forest models.
8 videos | 1h | Assessment | Badge
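
As an illustration of the embedding workflows described here, the sketch below trains Word2Vec on a toy corpus with gensim and probes pre-trained GloVe vectors; the corpus and the "glove-wiki-gigaword-50" download (fetched via gensim's downloader) are illustrative choices.

    import gensim.downloader as api
    from gensim.models import Word2Vec

    sentences = [["the", "movie", "was", "great"],
                 ["the", "film", "was", "terrible"],
                 ["a", "great", "cast", "and", "story"]]

    # Train Word2Vec embeddings on the (tiny) corpus.
    w2v = Word2Vec(sentences, vector_size=50, window=3, min_count=1)
    print(w2v.wv["movie"][:5])                         # first few dimensions of one word vector

    # Load pre-trained GloVe vectors and explore the embedding space.
    glove = api.load("glove-wiki-gigaword-50")
    print(glove.most_similar("king", topn=3))          # nearest neighbours in embedding space
    print(glove.doesnt_match(["apple", "banana", "car", "grape"]))   # identify the odd one out
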
Natural Language Processing Using Deep Learning
Deep learning has revolutionized natural language processing (NLP), offering powerful techniques for understanding, generating, and processing human language. Through deep neural networks (DNNs), NLP models can now comprehend complex linguistic structures, extract meaningful information from vast amounts of text data, and even generate human-like responses. Begin this course by learning how to utilize Keras and TensorFlow to construct and train neural networks. Next, you will build a DNN to classify messages as spam or not. You will find out how to encode data using count vector and term frequency-inverse document frequency (TF-IDF) encodings via the Keras TextVectorization layer. To enhance the training process, you will employ Keras callbacks to gain insights into metrics tracking, TensorBoard integration, and model checkpointing. Finally, you will apply sentiment analysis using word embeddings, explore the use of pre-trained GloVe word vector embeddings, and incorporate convolutional layers to grasp local text context.
14 videos | 1h 55m | Assessment | Badge
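
A compressed, illustrative sketch of the TextVectorization-based workflow this course builds up; the tiny spam/not-spam examples are assumptions, not course data.

    import tensorflow as tf

    texts = ["win a free prize now", "meeting moved to 3pm",
             "claim your reward today", "see you at lunch"]
    labels = [1.0, 0.0, 1.0, 0.0]                      # 1 = spam, 0 = not spam

    # Encode raw strings as TF-IDF vectors with the TextVectorization layer.
    vectorize = tf.keras.layers.TextVectorization(max_tokens=1000, output_mode="tf_idf")
    vectorize.adapt(texts)                             # learn the vocabulary and IDF weights
    X = vectorize(tf.constant(texts))

    # A small dense network for binary classification.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, tf.constant(labels), epochs=10, verbose=0)
    print(model.predict(vectorize(tf.constant(["free prize waiting"]))))
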
Using Recurrent Networks For Natural Language Processing
Recurrent neural networks (RNNs) are a class of neural networks designed to efficiently process sequential data. Unlike traditional feedforward neural networks, RNNs possess internal memory, which enables them to learn patterns and dependencies in sequential data, making them well-suited for a wide range of applications, including natural language processing. In this course, you will explore the mechanics of RNNs and their capacity for processing sequential data. Next, you will perform sentiment analysis with RNNs, generating and visualizing word embeddings through the TensorBoard embedding projector plug-in. You will construct an RNN, employing these word embeddings for sentiment analysis and evaluating the RNN's efficacy on a set of test data. Then, you will investigate advanced RNN applications, focusing on long short-term memory (LSTM) and bidirectional LSTM models. Finally, you will discover how LSTM models enhance the processing of long text sequences and you will build and train a bidirectional LSTM model to process data in both directions and capture a more comprehensive understanding of the text.
8 videos | 1h 14m | Assessment | Badge
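
The recurrent architectures discussed here can be sketched in a few lines of Keras; the vocabulary size and sequence length below are illustrative placeholders, and real training would feed integer-encoded, padded token sequences with sentiment labels.

    import tensorflow as tf

    vocab_size, seq_len = 5000, 100                    # illustrative values

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len,), dtype="int32"),
        tf.keras.layers.Embedding(vocab_size, 64),                  # learn word embeddings
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),    # read the sequence in both directions
        tf.keras.layers.Dense(1, activation="sigmoid"),             # binary sentiment output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()
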
Using Out-of-the-Box Transformer Models for Natural Language Processing
Transfer learning is a powerful machine learning technique that involves taking a pre-trained model on a large dataset and fine-tuning it for a related but different task, significantly reducing the need for extensive datasets and computational resources. Transformers are groundbreaking neural network architectures that use attention mechanisms to efficiently process sequential data, enabling state-of-the-art performance in a wide range of natural language processing tasks. In this course, you will discover transfer learning, the TensorFlow Hub, and attention-based models. Then you will learn how to perform subword tokenization with WordPiece. Next, you will examine transformer models, specifically the FNet model, and you will apply the FNet model for sentiment analysis. Finally, you will explore advanced text processing techniques using the Universal Sentence Encoder (USE) for semantic similarity analysis and the Bidirectional Encoder Representations from Transformers (BERT) model for sentence similarity prediction.
10 videos | 1h 29m | Assessment | Badge
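
As a concrete taste of the out-of-the-box usage described here, the following sketch loads the Universal Sentence Encoder from TensorFlow Hub and compares sentence embeddings with cosine similarity; the example sentences are made up.

    import numpy as np
    import tensorflow_hub as hub

    embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    sentences = ["How do I reset my password?",
                 "I forgot my login credentials.",
                 "The weather is lovely today."]
    vectors = embed(sentences).numpy()                 # one 512-dimensional vector per sentence

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(vectors[0], vectors[1]))              # related sentences score high
    print(cosine(vectors[0], vectors[2]))              # an unrelated sentence scores lower
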
Attention-based Models and Transformers for Natural Language Processing
Attention mechanisms in natural language processing (NLP) allow models to dynamically focus on different parts of the input data, enhancing their ability to understand context and relationships within the text. This significantly improves the performance of tasks such as translation, sentiment analysis, and question-answering by enabling models to process and interpret complex language structures more effectively. Begin this course by setting up language translation models and exploring the foundational concepts of translation models, including the encoder-decoder structure. Then you will investigate the basic translation process by building a translation model based on recurrent neural networks without attention. Next, you will incorporate an attention layer into the decoder of your language translation model. You will discover how transformers process input sequences in parallel, improving efficiency and training speed through the use of positional and word embeddings. Finally, you will learn about queries, keys, and values within the multi-head attention layer, culminating in training a transformer model for language translation.
15 videos | 2h 20m | Assessment | Badge
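
The queries, keys, and values mentioned above combine in scaled dot-product attention, the building block inside multi-head attention; here is a small NumPy sketch of that computation using toy random matrices, purely for illustration.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarity, scaled
        scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
        weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ V, weights                      # weighted sum of values

    # Toy example: 3 tokens with 4-dimensional queries, keys, and values.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
    output, attn = scaled_dot_product_attention(Q, K, V)
    print(attn.round(2))       # each row sums to 1: how strongly each token attends to the others
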

COURSES INCLUDED

NLP with LLMs: Working with Tokenizers in Hugging Face
Hugging Face, a leading company in the field of artificial intelligence (AI), offers a comprehensive platform that enables developers and researchers to build, train, and deploy state-of-the-art machine learning (ML) models with a strong emphasis on open collaboration and community-driven development. In this course, you will discover the extensive libraries and tools Hugging Face offers, including the Transformers library, which provides access to a vast array of pre-trained models and datasets. Next, you will set up your working environment in Google Colab. You will also explore the critical components of the text preprocessing pipeline: normalizers and pre-tokenizers. Finally, you will master various tokenization techniques, including byte pair encoding (BPE), WordPiece, and Unigram tokenization, which are essential for working with transformer models. Through hands-on exercises, you will build and train BPE and WordPiece tokenizers, configuring normalizers and pre-tokenizers to fine-tune these tokenization methods.
15 videos | 2h 18m | Assessment | Badge
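
A condensed, illustrative sketch of the tokenizer-building workflow covered here, using the Hugging Face tokenizers library; the two-sentence corpus and the vocabulary size are toy assumptions.

    from tokenizers import Tokenizer, normalizers, pre_tokenizers, trainers
    from tokenizers.models import WordPiece

    # A WordPiece model with normalization and whitespace pre-tokenization.
    tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
    tokenizer.normalizer = normalizers.Sequence(
        [normalizers.NFD(), normalizers.Lowercase(), normalizers.StripAccents()])
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

    # Train the subword vocabulary on a (tiny) corpus.
    trainer = trainers.WordPieceTrainer(
        vocab_size=200, special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
    corpus = ["Hugging Face provides tokenizers and transformer models.",
              "Tokenization splits text into subword units."]
    tokenizer.train_from_iterator(corpus, trainer=trainer)

    print(tokenizer.encode("Tokenizers split text into subwords.").tokens)
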
NLP with LLMs: Hugging Face Classification, QnA, & Text Generation Pipelines
Sentiment analysis, named entity recognition (NER), question answering, and text generation are pivotal tasks in the realm of Natural Language Processing (NLP) that enable machines to interpret and understand human language in a nuanced manner. In this course, you will be introduced to the concept of Hugging Face pipelines, a streamlined approach to applying pre-trained models to a variety of NLP tasks. Through hands-on exploration, you will learn how to classify text using zero-shot classification techniques, perform sentiment analysis with DistilBERT, and apply models to specialized tasks, utilizing the power of NLP to adapt to niche domains. Next, you will discover how to employ models to accurately answer questions based on provided contexts and understand the mechanics behind model-based answers, including their limitations and capabilities. Finally, you will discover various text generation strategies such as greedy search and beam search, learning how to balance predictability with creativity in generated text. You will also explore text generation through sampling techniques and the application of mask filling with BERT models.
13 videos | 1h 50m | Assessment | Badge
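
To illustrate how little code the pipeline abstraction requires, here is a sketch covering the task types in this course; each call downloads a default pre-trained checkpoint on first use, and the example inputs are made up.

    from transformers import pipeline

    # Zero-shot classification: score text against labels it was never trained on.
    zero_shot = pipeline("zero-shot-classification")
    print(zero_shot("The new GPU is incredibly fast.",
                    candidate_labels=["technology", "sports", "cooking"]))

    # Sentiment analysis (defaults to a DistilBERT checkpoint fine-tuned on SST-2).
    classifier = pipeline("sentiment-analysis")
    print(classifier("I really enjoyed this course!"))

    # Extractive question answering over a provided context.
    qa = pipeline("question-answering")
    print(qa(question="What does NLP stand for?",
             context="NLP stands for natural language processing."))

    # Text generation with GPT-2, limited to a few new tokens.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Large language models can", max_new_tokens=20)[0]["generated_text"])
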
NLP with LLMs: Language Translation, Summarization, & Semantic Similarity
Language translation, text summarization, and semantic textual similarity are advanced problems within the field of Natural Language Processing (NLP) that are increasingly solvable due to advances in the use of large language models (LLMs) and pre-trained models. In this course, you will learn to translate text between languages with state-of-the-art pre-trained models such as T5, M2M 100, and Opus. You will also gain insights into evaluating translation accuracy with BLEU scores and explore multilingual translation techniques. Next, you will explore the process of summarizing text, utilizing the powerful BART and T5 models for abstractive summarization. You will see how these models extract and generate key information from large texts and learn to evaluate the quality of summaries using ROUGE scores. Finally, you will master the computation of semantic textual similarity using sentence transformers and apply clustering techniques to group texts based on their semantic content. You will also learn to compute embeddings and measure similarity directly.
10 videos | 1h 29m | Assessment | Badge
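
A brief, illustrative sketch of these three task families using pre-trained checkpoints; the model names and example texts are assumptions, and the course goes further with M2M 100 and Opus models plus BLEU and ROUGE evaluation.

    from transformers import pipeline
    from sentence_transformers import SentenceTransformer, util

    # Translation with a pre-trained T5 checkpoint (T5 supports English-to-French out of the box).
    translator = pipeline("translation_en_to_fr", model="t5-small")
    print(translator("Machine learning is changing how we work.")[0]["translation_text"])

    # Abstractive summarization with BART.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    article = ("Large language models are trained on vast text corpora and can be adapted "
               "to tasks such as translation, summarization, and question answering with "
               "little or no task-specific training data.")
    print(summarizer(article, max_length=25, min_length=5)[0]["summary_text"])

    # Semantic textual similarity with sentence embeddings.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    emb = encoder.encode(["How old are you?", "What is your age?"], convert_to_tensor=True)
    print(util.cos_sim(emb[0], emb[1]))                # paraphrases score close to 1
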
NLP with LLMs: Fine-tuning Models for Classification & Question Answering
Fine-tuning in the context of text-based models refers to the process of taking a pre-trained model and adapting it to a specific task or dataset with additional training. This technique leverages the general language understanding capabilities acquired by the model during its initial extensive training on a large corpus of text and refines its abilities to perform well on a more narrowly defined task or domain-specific data. In this course, you will learn how to fine-tune a model for sentiment analysis, starting with the preparation of datasets optimized for this purpose. You will be guided through setting up your computing environment and preparing a BERT classifier for sentiment analysis. Next, you will discover how to structure text data and align named entity recognition (NER) tags with subword tokenization. You will build on this knowledge to fine-tune a BERT model specifically for NER, training it to accurately identify and classify entities within text. Finally, you will explore the domain of question answering, learning to handle the challenges of long contexts to extract precise answers from extensive texts. You will prepare QnA data for fine-tuning and utilize a DistilBERT model to create an effective QnA system.
12 videos | 1h 33m | Assessment | Badge
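
A compressed sketch of the fine-tuning loop this course walks through, using the Hugging Face Trainer with a BERT checkpoint; the IMDB dataset, subset sizes, and hyperparameters are illustrative assumptions to keep the example small.

    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              TrainingArguments, Trainer)

    dataset = load_dataset("imdb")                                   # example sentiment dataset
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

    encoded = dataset.map(tokenize, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    args = TrainingArguments(output_dir="bert-sentiment",
                             num_train_epochs=1, per_device_train_batch_size=16)

    trainer = Trainer(model=model, args=args,
                      train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
                      eval_dataset=encoded["test"].select(range(500)))
    trainer.train()
    print(trainer.evaluate())
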
NLP with LLMs: Fine-tuning Models for Language Translation & Summarization
Causal language modeling (CLM), text translation, and summarization demonstrate the versatility and depth of language understanding and generation by artificial intelligence (AI). Fine-tuning helps improve model performance on these specific tasks. In this course, you will explore CLM with DistilGPT-2 and masked language modeling (MLM) with DistilRoBERTa, learning how to prepare, process, and fine-tune models for generating and predicting text. Next, you will dive into the nuances of language translation, focusing on translating English to Spanish. You will prepare and evaluate training data and learn to use BLEU scores for assessing translation quality. You will fine-tune a pre-trained T5-small model, enhancing its accuracy and broadening its linguistic capabilities. Finally, you will explore the intricacies of text summarization. Starting with data loading and visualization, you will establish a benchmark using the pre-trained T5-small model. You will then fine-tune this model for summarization tasks, learning to condense extensive texts into succinct summaries.
12 videos | 1h 38m | Assessment | Badge
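
To show what the text-to-text fine-tuning setup looks like in practice, here is a minimal sketch of preparing a single English-to-Spanish training example for T5-small and computing its loss, the quantity a fine-tuning loop would minimize; the sentence pair is made up, and transformers plus sentencepiece are assumed installed.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # T5 frames every task as text-to-text, so the task is spelled out in the input prefix.
    source = "translate English to Spanish: The weather is nice today."
    target = "El clima está agradable hoy."

    batch = tokenizer(source, text_target=target, return_tensors="pt")   # builds input_ids and labels
    outputs = model(**batch)
    print(outputs.loss)                                                  # loss to minimize during fine-tuning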

EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE TRACKS

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.