# Building ML Training Sets: Preprocessing Datasets for Linear Regression

Machine Learning    |    Beginner
• 7 videos | 50m 12s
• Includes Assessment
Rating: 4.3 (32 users)
This 7-video course helps learners discover how to implement machine learning scaling techniques, such as standardization and min-max scaling on continuous data and one-hot encoding on categorical features, to improve the performance of linear regression models. In the first tutorial, you will use the pandas library to load a CSV file into a DataFrame and analyze its contents with pandas and Matplotlib. You will then learn how to create a linear regression model with scikit-learn to predict the sale price of a house and evaluate this model with metrics such as mean squared error and R-squared. Next, you will apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a dataset. You will then analyze the effects of preprocessing, recognizing the benefits of scaling and encoding by evaluating the performance of a regression model built with the preprocessed data. You will also learn how to use scikit-learn's StandardScaler on a dataset's continuous features and compare its effects with those of min-max scaling. The concluding exercise involves preprocessing data for regression.
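The workflow described above (scale the continuous columns, one-hot encode the categorical ones, fit a linear regression, then score it with mean squared error and R-squared) can be sketched with scikit-learn's `Pipeline` and `ColumnTransformer`. This is a minimal sketch, not the course's own notebook: the column names `LivingArea`, `Neighborhood`, and `SalePrice` and the synthetic data are hypothetical stand-ins for the house-price CSV used in the videos.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical synthetic data standing in for the course's house-price CSV.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "LivingArea": rng.uniform(500, 3000, n),        # continuous feature
    "Neighborhood": rng.choice(["A", "B", "C"], n),  # categorical feature
})
offsets = df["Neighborhood"].map({"A": 0, "B": 20_000, "C": 50_000})
df["SalePrice"] = 100 * df["LivingArea"] + offsets + rng.normal(0, 5_000, n)

continuous_cols = ["LivingArea"]
categorical_cols = ["Neighborhood"]

# Min-max scale continuous columns to [0, 1]; one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("scale", MinMaxScaler(), continuous_cols),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])

X = df[continuous_cols + categorical_cols]
y = df["SalePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The scaler and encoder are fit on the training split only, inside the pipeline.
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"MSE: {mse:,.0f}  R-squared: {r2:.3f}")
```

Wrapping the preprocessing in a `Pipeline` means the scaler's minimum and maximum and the encoder's categories are learned from the training split alone, so no information from the test split leaks into the model.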

## WHAT YOU WILL LEARN

• Use the pandas library to load a CSV file into a DataFrame and analyze its contents using pandas and Matplotlib
• Create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared
• Apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a dataset
• Recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data
• Use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling
• Identify the characteristics of the StandardScaler, encode a feature column that contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot
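To compare the two scalers named above, the sketch below applies scikit-learn's `MinMaxScaler` and `StandardScaler` to a small, made-up feature containing one outlier and checks each result against its formula. The data are hypothetical; the point is the characteristic behavior: min-max output is bounded to [0, 1], while standardized output has mean 0 and unit variance but no fixed range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One continuous feature (as a column vector) with a single large outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Min-max scaling: (x - min) / (max - min), so values land in [0, 1].
x_minmax = MinMaxScaler().fit_transform(x)

# Standardization: (x - mean) / std, giving mean 0 and (population) std 1.
x_standard = StandardScaler().fit_transform(x)

# The same transforms written out by hand with NumPy.
manual_minmax = (x - x.min()) / (x.max() - x.min())
manual_standard = (x - x.mean()) / x.std()  # np.std defaults to ddof=0, as StandardScaler does

print("min-max:     ", x_minmax.ravel().round(3))
print("standardized:", x_standard.ravel().round(3))
```

Notice how the single outlier compresses the other four min-max values into roughly the bottom 3% of the [0, 1] range; standardization is also affected by the outlier, since it shifts the mean and inflates the standard deviation.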
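The two regression metrics mentioned in the list, mean squared error and R-squared, are short formulas. The sketch below computes both by hand on a few made-up values and cross-checks them with scikit-learn's `mean_squared_error` and `r2_score`; the arrays are purely illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small made-up targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Mean squared error: the average of the squared residuals.
mse = np.mean((y_true - y_pred) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE = {mse:.3f}, R-squared = {r2:.3f}")
```

MSE is in the squared units of the target (for house prices, squared currency), so lower is better but its size depends on the target's scale; R-squared is unitless, equals 1 for a perfect fit, and is 0 for a model that always predicts the mean.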

## IN THIS COURSE

• In this video, you will learn how to use the pandas library to load a CSV file into a DataFrame and analyze its contents using pandas and Matplotlib.
• 3.  Building and Evaluating a Linear Regression Model
In this video, you will create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared.
• 4.  Scaling and Encoding the Data
Learn how to apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a dataset.
• 5.  Analyzing the Effects of Preprocessing
After completing this video, you will be able to recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data.
• 6.  Standardizing Continuous Data
In this video, you will use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling.
• 7.  Exercise: Preprocessing Data for Regression
In this video, you will identify the characteristics of the StandardScaler, encode a feature column that contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot.
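The loading-and-inspection step from the first listed video might look like the sketch below. It is an illustration under assumptions: the inline CSV text and its column names are hypothetical, read through `io.StringIO` so the example runs without a file; with the real dataset you would pass its path to `pd.read_csv` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the course's CSV file (the last row is missing a price).
csv_text = """LivingArea,Neighborhood,SalePrice
1200,A,150000
1850,B,210000
950,A,120000
2400,C,320000
1600,B,
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # (rows, columns)
print(df.dtypes)        # numeric vs non-numeric (categorical) columns
print(df.describe())    # count, mean, std, min, quartiles, max of numeric columns
print(df.isna().sum())  # missing values per column

# Separate continuous and categorical columns by dtype.
continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# A Matplotlib box plot of the target, ignoring the missing value.
fig, ax = plt.subplots()
ax.boxplot(df["SalePrice"].dropna())
ax.set_ylabel("SalePrice")
fig.savefig(io.BytesIO(), format="png")  # write to memory instead of a file
plt.close(fig)
```

Splitting columns by dtype is a convenient first pass for deciding which fields get scaled and which get one-hot encoded, though numerically coded categories would still need to be moved to the categorical list by hand.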
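A box plot conveys the median, the first and third quartiles (the box), the whiskers, and any outliers. The sketch below computes those quantities for a made-up sample using the convention Matplotlib's `boxplot` applies by default (`whis=1.5`: whiskers reach the most extreme data points within 1.5 * IQR of the box, and anything beyond is drawn as an outlier).

```python
import numpy as np

# Hypothetical sample with one obvious outlier.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# The box: first quartile, median, third quartile (NumPy's default linear interpolation).
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Fences: 1.5 * IQR beyond each edge of the box (Matplotlib's default whis=1.5).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences.
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the fences are drawn individually as outliers ("fliers").
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```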

## EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
