Building ML Training Sets: Preprocessing Datasets for Linear Regression
Machine Learning | Beginner
- 7 Videos | 50m 12s
- Includes Assessment
- Earns a Badge
This 7-video course shows learners how to apply feature scaling techniques, such as standardization and min-max scaling, to continuous data, and one-hot encoding to categorical features, in order to improve the performance of linear regression models. In the first tutorial, you will use the Pandas library to load a CSV file into a data frame and analyze its contents with Pandas and Matplotlib. You will then learn how to create a linear regression model with scikit-learn to predict the sale price of a house and evaluate the model with metrics such as mean squared error and R-squared. Next, you will apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a data set. You will then analyze the effects of preprocessing, recognizing the benefits of scaling and encoding by evaluating the performance of a regression model built with the preprocessed data. You will also learn how to use scikit-learn's StandardScaler on a data set's continuous features and compare its effects with those of min-max scaling. The concluding exercise involves preprocessing data for regression.
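The two scaling techniques named above can be illustrated with a minimal sketch of the arithmetic involved. It uses plain NumPy rather than the course's own code, and the `area` values and the helper names `min_max_scale` and `standardize` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical house areas in square feet (illustrative values only).
area = np.array([1000.0, 1500.0, 2000.0, 3000.0])


def min_max_scale(x):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    return (x - x.min()) / (x.max() - x.min())


def standardize(x):
    """Center to mean 0 and scale to unit variance (population std, ddof=0)."""
    return (x - x.mean()) / x.std()


area_minmax = min_max_scale(area)
area_standard = standardize(area)

print(area_minmax)
print(area_standard)
```

Min-max scaling bounds every value to a fixed range, while standardization leaves values unbounded but expresses them in units of standard deviation from the mean.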
WHAT YOU WILL LEARN
- use the Pandas library to load a CSV file into a dataframe and analyze its contents using Pandas and Matplotlib
- create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared
- apply min-max scaling on the continuous fields and one-hot encoding on the categorical columns of a dataset
- recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data
- use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling
- identify the characteristics of the StandardScaler, encode a feature column which contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a boxplot
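As a rough sketch of how these objectives fit together, the example below one-hot encodes a categorical column, min-max scales a continuous one, fits a scikit-learn linear regression, and reports mean squared error and R-squared. The toy data and the column names `area`, `neighborhood`, and `sale_price` are assumptions, not the course's actual dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data; the course's real CSV and column names are not given.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000, 3500],
    "neighborhood": ["A", "B", "A", "C", "B", "C"],
    "sale_price": [200000, 260000, 330000, 420000, 450000, 540000],
})

# One-hot encode the categorical column: one 0/1 column per category.
X = pd.get_dummies(df[["neighborhood"]], columns=["neighborhood"], dtype=float)

# Min-max scale the continuous column into [0, 1] and add it to the features.
area_scaled = MinMaxScaler().fit_transform(df[["area"]])
X.insert(0, "area", area_scaled.ravel())

y = df["sale_price"]

# Fit and score on the same rows only to keep the sketch short;
# a real evaluation would hold out a separate test set.
model = LinearRegression().fit(X, y)
predictions = model.predict(X)

mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)
print(f"MSE: {mse:.2f}, R-squared: {r2:.4f}")
```

Swapping `MinMaxScaler` for `StandardScaler` in the same position is one way to compare the two scaling approaches on identical data.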
IN THIS COURSE
- 1. Course Overview (2m 40s)
- 2. Loading and Analyzing a Dataset (9m 21s)
- 3. Building and Evaluating a Linear Regression Model (8m 30s)
- 4. Scaling and Encoding the Data (6m 15s)
- 5. Analyzing the Effects of Preprocessing (7m 29s)
- 6. Standardizing Continuous Data (7m 37s)
- 7. Exercise: Preprocessing Data for Regression (8m 20s)
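For the kind of loading and analysis covered in the second video, a minimal Pandas sketch might look like the following. The inline CSV text is a hypothetical stand-in for the course's housing file, and the 1.5 x IQR fences follow the common boxplot whisker convention (also Matplotlib's default); the course's own code may differ.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the course's housing file.
csv_text = """area,sale_price
1000,200000
1500,260000
2000,330000
2500,420000
9000,1500000
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()
print(summary)

# The values a boxplot conveys: quartiles, median, and whisker fences.
q1, median, q3 = df["sale_price"].quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are the ones a boxplot draws as outliers.
outliers = df[(df["sale_price"] < lower_fence) | (df["sale_price"] > upper_fence)]
print(outliers)
```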
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft offers you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.
Digital badges are yours to keep, forever.