Fundamentals of NLP: Representing Text as Numeric Features

Natural Language Processing    |    Intermediate
  • 15 videos | 2h 17s
  • Includes Assessment
  • Earns a Badge
When performing sentiment classification using machine learning, you must encode text into a numeric format because machine learning models operate on numbers, not raw text. There are a number of encoding techniques for text data, such as one-hot encoding, count vector encoding, and word embeddings. In this course, you will learn how to use one-hot encoding, a simple technique that builds a vocabulary from all the words in your text corpus. Next, you will move on to count vector encoding, which tracks word frequency in each document, and explore term frequency-inverse document frequency (TF-IDF) encoding, which also creates vocabularies and document vectors but uses a TF-IDF score to represent words. Finally, you will perform sentiment analysis using encoded text. You will use a count vector to encode your input data and then set up a Gaussian Naive Bayes model. You will train the model and evaluate its metrics. You will also explore how to improve model performance by stemming words, removing stopwords, and using n-grams.
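As a rough illustration of the three encodings named above, here is a minimal sketch in plain Python. The toy corpus, the function names, and the whitespace tokenizer are all assumptions made for this example and are not taken from the course. The TF-IDF variant uses the plain textbook form idf = log(N / df), while libraries commonly apply smoothed variants, so the scores will not match theirs exactly.

```python
import math
from collections import Counter

# Toy corpus (hypothetical reviews, not the course dataset).
docs = [
    "good product good price",
    "bad product",
    "good service",
]

# Naive whitespace tokenization, assumed here for simplicity.
tokenized = [doc.split() for doc in docs]

# Vocabulary: every distinct word in the corpus, sorted for a stable column order.
vocab = sorted({word for tokens in tokenized for word in tokens})


def one_hot(tokens):
    """Document-level one-hot vector: 1 if the word occurs, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


def count_vector(tokens):
    """Count vector: how many times each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def tf_idf(tokens):
    """TF-IDF vector with tf = count / document length and idf = log(N / df)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for word in vocab:
        tf = counts[word] / len(tokens)
        # Document frequency: number of documents containing the word.
        df = sum(1 for other in tokenized if word in other)
        vector.append(tf * math.log(n_docs / df))
    return vector


print(vocab)
print(one_hot(tokenized[0]))
print(count_vector(tokenized[0]))
print(tf_idf(tokenized[0]))
```

In this sketch, the one-hot vector records only whether a word is present, the count vector keeps how often it occurs, and the TF-IDF vector scales each word's frequency down when the word appears in many documents.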

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Understand one-hot encoding representations
  • Perform one-hot encoding on text
  • Use the CountVectorizer object for one-hot encoding
  • Outline how to encode text based on frequencies
  • Encode text as count vectors
  • Explore bag-of-words and bag-of-n-grams encoding
  • Encode data using term frequency–inverse document frequency (TF-IDF) scores
  • Explore and analyze data
  • Create a Naive Bayes model for sentiment analysis
  • Stem words and remove stopwords for machine learning
  • Filter words based on frequency for classification
  • Train classification models on n-grams
  • Train models on TF-IDF encodings
  • Summarize the key concepts covered in this course
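
To make the stopword and n-gram objectives above more concrete, the following is a small sketch in plain Python. The stopword list, the function names, and the sample sentence are hypothetical and chosen only for illustration. Real stopword lists are much longer, and stemming is left out because it normally relies on a dedicated library.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "is", "a", "this", "and"}


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_min=1, n_max=2):
    """Count every n-gram from n_min to n_max after stopword removal."""
    tokens = remove_stopwords(text.lower().split())
    bag = Counter()
    for n in range(n_min, n_max + 1):
        bag.update(ngrams(tokens, n))
    return bag


bag = bag_of_ngrams("This product is not good")
print(bag)
```

Here the bigram "not good" keeps the negation attached to the word it modifies, which a bag of single words would lose. That is one reason n-grams can help a sentiment classifier.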

IN THIS COURSE

  • 2m 8s
    In this video, we will discover the key concepts covered in this course.
  • 7m 26s
    After completing this video, you will be able to understand one-hot encoding representations.
  • 3.  Utilizing One-hot Encoding to Represent Text Data
    10m 5s
    Find out how to perform one-hot encoding on text.
  • 4.  Performing One-hot Encoding Using the Count Vectorizer
    7m 6s
    Discover how to use the CountVectorizer object for one-hot encoding.
  • 5.  Frequency-based Encodings to Represent Text in Numeric Form
    5m 47s
    Upon completion of this video, you will be able to outline how to encode text based on frequencies.
  • 6.  Perform Count Vector Encoding Using the Count Vectorizer
    4m 56s
    In this video, you will learn how to encode text as count vectors.
  • 7.  Working with Bag-of-Words and Bag-of-N-grams Representation
    12m 2s
    In this video, we will explore bag-of-words and bag-of-n-grams encoding.
  • 8.  Perform TF-IDF Encoding to Represent Text Data
    11m 37s
    In this video, find out how to encode data using term frequency–inverse document frequency (TF-IDF) scores.
  • 9.  Exploring the Product Reviews Dataset
    11m 28s
    Learn how to explore and analyze data.
  • 10.  Building a Classification Model Using Count Vector Encoding
    10m
    During this video, you will discover how to create a Naive Bayes model for sentiment analysis.
  • 11.  Comparing Models Trained with Stemmed Words and Stopword Removed
    8m 40s
    Discover how to stem words and remove stopwords for machine learning.
  • 12.  Classifying Text Using Frequency Filtering and TF-IDF Encodings
    8m 38s
    Find out how to filter words based on frequency for classification.
  • 13.  Training Classification Models Using Bag of N-grams
    10m 14s
    Learn how to train classification models on n-grams.
  • 14.  Training Classification Models with N-grams and TF-IDF Representation
    7m 16s
    In this video, you will learn how to train models on TF-IDF encodings.
  • 15.  Course Summary
    2m 54s
    In this video, we will summarize the key concepts covered in this course.

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. You can share the badge on any social network or business platform.

Digital badges are yours to keep, forever.
