Building ML Training Sets: Introduction

Machine Learning | Beginner

10 videos | 1h 9m 21s
Includes Assessment
Earns a Badge

Many options are available for scaling and encoding the features and labels in a dataset so that machine learning (ML) algorithms perform at their best. In this 10-video course, explore techniques such as standardizing, normalizing, and one-hot encoding. Learners begin by using the pandas library to load a dataset from a CSV file and perform exploratory analysis on its features. They then use scikit-learn's Binarizer to transform the continuous data in a series to binary values; apply the MinMaxScaler to a dataset so that two similar columns share the same range of values; and standardize multiple columns with scikit-learn's StandardScaler. Examine the differences between the Normalizer and other scaling techniques, and learn how to represent the values in a column as a proportion of the maximum absolute value by using the MaxAbsScaler. Finally, discover how to use the pandas library to one-hot encode one or more features of a dataset, and distinguish between this technique and label encoding. The concluding exercise involves building ML training sets.

WHAT YOU WILL LEARN

Use the pandas library to load a dataset in the form of a CSV file and perform some exploratory analysis on its features

Transform the continuous data in a series to binary values by using scikit-learn's Binarizer

Apply the MinMaxScaler to a dataset to get two similar columns to have the same range of values

Standardize multiple columns in your dataset using scikit-learn's StandardScaler

Distinguish between the Normalizer and other scaling techniques and apply this scaler to the continuous features of a dataset

Represent the values in a column as a proportion of the maximum absolute value by using the MaxAbsScaler

Apply label encoding to the features and target in your dataset and recognize its limitations when applied to input features

Use the pandas library to one-hot encode one or more features of your dataset and distinguish between this technique and label encoding

Transform a continuous series into a categorical (binary) one, distinguish between normalization and other scaling techniques, score each product as a proportion of the top product's sales, and encode the "VehicleType" field, which contains the values ["Hatchback", "Sedan", "SUV"]

IN THIS COURSE

2m 37s

FREE ACCESS
9m 7s

In this video, you will use the pandas library to load a dataset in the form of a CSV file and perform some exploratory analysis on its features. FREE ACCESS
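
As a rough sketch of the loading and exploration step, the snippet below reads CSV text with pandas and prints a few summaries. The column names and values are invented for illustration and are not the course's dataset; an in-memory string stands in for a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """model,price,mileage,vehicle_type
A,12000,45000,Hatchback
B,18500,30000,Sedan
C,27000,12000,SUV
D,15500,52000,Sedan
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())                          # first rows
print(df.dtypes)                          # column types
print(df.describe())                      # summary statistics for numeric columns
print(df["vehicle_type"].value_counts())  # category frequencies
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV.
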
3. The Binarizer

6m 20s

In this video, learn how to transform continuous data in a series to binary values by using scikit-learn's Binarizer. FREE ACCESS
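
A minimal sketch of binarizing a continuous column with scikit-learn's `Binarizer` might look like the following. The mileage values and the threshold of 40,000 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical continuous values, shaped as a single column (n_samples, 1).
mileage = np.array([[45000.0], [30000.0], [12000.0], [52000.0]])

# Values strictly greater than the threshold become 1; all others become 0.
binarizer = Binarizer(threshold=40000.0)
high_mileage = binarizer.fit_transform(mileage)

print(high_mileage.ravel())
```
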
4. The MinMaxScaler

8m 38s

In this video, you will learn how to apply the MinMaxScaler on a dataset to get two similar columns to have the same range of values. FREE ACCESS
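
One way to picture this is the sketch below, which rescales two made-up columns measured on different scales so that both span the range 0 to 1. The column names and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical scores recorded on two different scales.
df = pd.DataFrame({"score_a": [10.0, 20.0, 30.0, 50.0],
                   "score_b": [200.0, 400.0, 600.0, 1000.0]})

# Each column is rescaled independently: (x - min) / (max - min).
scaler = MinMaxScaler()  # the default feature_range is (0, 1)
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(scaled)
```
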
5. The StandardScaler

7m 54s

Learn how to standardize multiple columns in your dataset using the StandardScaler from scikit-learn. FREE ACCESS
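
As an illustrative sketch, standardizing several columns at once could look like this. The data is invented; the point is that each column ends up with a mean of roughly 0 and a standard deviation of roughly 1.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric columns with very different magnitudes.
df = pd.DataFrame({"price": [12000.0, 18500.0, 27000.0, 15500.0],
                   "mileage": [45000.0, 30000.0, 12000.0, 52000.0]})

# Each column becomes (x - mean) / std, using the population std (ddof=0).
scaler = StandardScaler()
standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(standardized.mean())       # approximately 0 for every column
print(standardized.std(ddof=0))  # approximately 1 for every column
```
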
6. The Normalizer

8m 54s

In this video, you will learn how to distinguish between the Normalizer and other scaling techniques, and apply this scaler on the continuous features of a dataset. FREE ACCESS
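
The key difference can be sketched as follows: the scalers above work column by column, whereas scikit-learn's `Normalizer` rescales each row (sample) to unit norm. The input array is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical samples, one per row.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [6.0, 8.0]])

# Each row is divided by its own L2 norm, so every row ends up with length 1.
normalizer = Normalizer(norm="l2")
X_unit = normalizer.fit_transform(X)

print(X_unit)
print(np.linalg.norm(X_unit, axis=1))  # row norms after normalization
```

Note that the first and third rows map to the same unit vector: normalization keeps the direction of a sample and discards its magnitude.
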
7. The MaxAbsScaler

5m 11s

Find out how to represent the values in a column as a proportion of the maximum absolute value by using the MaxAbsScaler. FREE ACCESS
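
A small sketch of this idea, using invented sales figures, is shown below. Each value is divided by the largest absolute value in its column, so the top product scores 1.0.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical sales figures, one product per row.
sales = np.array([[250.0], [1000.0], [400.0], [50.0]])

# Each value is divided by the column's maximum absolute value (1000 here).
scaler = MaxAbsScaler()
share_of_top = scaler.fit_transform(sales)

print(share_of_top.ravel())
```
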
8. Label Encoding

8m 46s

In this video, learn how to apply label encoding to the features and target in your dataset and recognize its limitations when applied to input features. FREE ACCESS
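
As a hedged illustration, label encoding with scikit-learn's `LabelEncoder` might look like the sketch below. The category values are borrowed from the course exercise; the order of the sample list is made up.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column.
vehicle_types = ["Sedan", "SUV", "Hatchback", "Sedan"]

# Classes are sorted, then each one is mapped to an integer code.
encoder = LabelEncoder()
codes = encoder.fit_transform(vehicle_types)

print(list(encoder.classes_))
print(list(codes))
print(list(encoder.inverse_transform(codes)))  # map codes back to labels
```

The integer codes imply an ordering between categories that does not really exist, which is the limitation that matters when the encoded column is used as an input feature rather than as a target.
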
9. One-Hot Encoding

4m 23s

During this video, you will learn how to use the pandas library to one-hot encode one or more features of your dataset and distinguish between this technique and label encoding. FREE ACCESS
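
One possible sketch of one-hot encoding with pandas uses `get_dummies`, as below. The DataFrame is invented for illustration; the category values are taken from the course exercise.

```python
import pandas as pd

# Hypothetical dataset with one categorical feature.
df = pd.DataFrame({"VehicleType": ["Hatchback", "Sedan", "SUV", "Sedan"],
                   "price": [12000, 18500, 27000, 15500]})

# One indicator column per category; dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["VehicleType"], dtype=int)

print(encoded)
```

Unlike label encoding, no category receives a larger number than another, so no artificial ordering is introduced.
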
10. Exercise: Building ML Training Sets

7m 31s

In this video, you will learn how to transform a continuous series into a categorical (binary) one, distinguish between normalization and other scaling techniques, score each product as a proportion of the top product's sales, and encode the "VehicleType" field, which contains the values ["Hatchback", "Sedan", "SUV"]. FREE ACCESS

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

Course Fundamentals of AI & ML: Advanced Data Science Methods

(136)

Audiobook Ensemble Methods for Machine Learning

Channel Introduction to Machine Learning Bootcamp

(1)

PEOPLE WHO VIEWED THIS ALSO VIEWED THESE

Course Linear Regression Models: Introduction to Logistic Regression

(13)

Course Convolutional and Recurrent Neural Networks

(45)

Course Getting Started with Python: Introduction

(4247)
