Building ML Training Sets: Preprocessing Datasets for Linear Regression
Machine Learning | Beginner
- 7 videos | 50m 12s
- Includes Assessment
- Earns a Badge
This 7-video course shows learners how to apply machine learning scaling techniques, such as standardization and min-max scaling, to continuous data, and one-hot encoding to categorical features, in order to improve the performance of linear regression models. In the first tutorial, you will use the Pandas library to load a CSV file into a DataFrame and analyze its contents with Pandas and Matplotlib. You will then create a linear regression model with scikit-learn to predict the sale price of a house and evaluate it with metrics such as mean squared error and R-squared. Next, you will apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a data set. You will then assess the benefits of scaling and encoding by evaluating the performance of a regression model built with the preprocessed data. You will also use scikit-learn's StandardScaler on a data set's continuous features and compare its effects with those of min-max scaling. The concluding exercise involves preprocessing data for regression.
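As a rough illustration of the first two steps described above (loading a CSV with Pandas, then fitting and evaluating a scikit-learn linear regression), here is a minimal sketch. The in-memory CSV, its column names (`area`, `bedrooms`, `sale_price`), and its values are invented stand-ins, not the course's data set, and the model is scored on its own training data purely to keep the example short.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical stand-in for a housing CSV file; columns and values are invented.
csv_text = """area,bedrooms,sale_price
1000,2,200000
1500,3,295000
1200,2,231000
2000,4,380000
1800,3,344000
900,1,182000
"""

# Load the CSV into a DataFrame and take a first look at its contents.
houses = pd.read_csv(io.StringIO(csv_text))
print(houses.describe())

X = houses[["area", "bedrooms"]]  # continuous input features
y = houses["sale_price"]          # regression target

# Fit the model, then score it on the training data (a simplification).
model = LinearRegression().fit(X, y)
predictions = model.predict(X)

mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)
print(f"MSE: {mse:.2f}, R-squared: {r2:.4f}")
```

In practice you would normally hold out a test set (for example with `sklearn.model_selection.train_test_split`) so that both metrics reflect performance on unseen data.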
WHAT YOU WILL LEARN
- Use the Pandas library to load a CSV file into a DataFrame and analyze its contents using Pandas and Matplotlib
- Create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared
- Apply min-max scaling on the continuous fields and one-hot encoding on the categorical columns of a dataset
- Recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data
- Use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling
- Identify the characteristics of the StandardScaler, encode a feature column which contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot
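The preprocessing objectives above can be sketched as follows. This is a minimal example under stated assumptions: the DataFrame, its `area` and `neighborhood` columns, and their values are hypothetical, and `pandas.get_dummies` is used here as one common way to one-hot encode; the course may use a different encoder.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: one continuous column and one categorical column.
houses = pd.DataFrame({
    "area": [900.0, 1200.0, 1500.0, 2000.0],
    "neighborhood": ["north", "south", "north", "east"],
})

# Min-max scaling maps each value x to (x - min) / (max - min), so the
# smallest area becomes 0 and the largest becomes 1.
area_scaled = MinMaxScaler().fit_transform(houses[["area"]]).ravel()

# One-hot encoding replaces the categorical column with one indicator
# column per distinct category.
prepared = pd.get_dummies(houses.assign(area=area_scaled), columns=["neighborhood"])
print(prepared)
```

The resulting frame holds only numeric or indicator columns, which is the form a linear regression model expects.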
IN THIS COURSE
- 2m 40s
- 9m 21s: During this video, you will learn how to use the Pandas library to load a CSV file into a DataFrame and analyze its contents using Pandas and Matplotlib.
- 8m 30s: In this video, you will create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared.
- 6m 15s: Learn how to apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a dataset.
- 7m 29s: After completing this video, you will be able to recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data.
- 7m 37s: In this video, you will use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling.
- 8m 20s: In this video, find out how to identify the characteristics of the StandardScaler, encode a feature column which contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot.
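To make the StandardScaler versus min-max comparison mentioned in the videos concrete, here is a small sketch on a single hypothetical feature (the `areas` values are invented). It only shows how the two scikit-learn scalers transform the same column; it does not reproduce the course's own comparison of model performance.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical continuous feature, shaped as one column (n_samples, 1).
areas = np.array([[900.0], [1200.0], [1500.0], [2000.0]])

# StandardScaler subtracts the column mean and divides by the column's
# (population) standard deviation: the result has mean 0 and std 1.
standardized = StandardScaler().fit_transform(areas)

# MinMaxScaler rescales the column to the fixed range [0, 1].
min_maxed = MinMaxScaler().fit_transform(areas)

print("standardized mean/std:", standardized.mean(), standardized.std())
print("min-max range:", min_maxed.min(), min_maxed.max())
```

Standardized values are not bounded and can be negative, whereas min-max scaled values always fall between 0 and 1; both preserve the ordering of the original values.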
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.
Digital badges are yours to keep, forever.