Building ML Training Sets: Preprocessing Datasets for Linear Regression

Machine Learning    |    Beginner
  • 7 Videos | 52m 42s
  • Includes Assessment
  • Earns a Badge
This 7-video course helps learners discover how to implement machine learning preprocessing techniques, such as standardization and min-max scaling on continuous data and one-hot encoding on categorical features, to improve the performance of linear regression models. In the first tutorial, you will use the Pandas library to load a CSV file into a data frame and analyze its contents by using Pandas and Matplotlib. You will then learn how to create a linear regression model with scikit-learn to predict the sale price of a house and evaluate this model by using metrics such as mean squared error and R-squared. Next, learners will examine the application of min-max scaling on the continuous fields and one-hot encoding on the categorical columns of a data set. You will then analyze the effects of preprocessing, recognizing the benefits of scaling and encoding by evaluating the performance of a regression model built with the preprocessed data. You will also learn how to use scikit-learn's StandardScaler on a data set's continuous features and compare its effects with those of min-max scaling. The concluding exercise involves preprocessing data for regression.
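To make the two scaling techniques named above concrete, the following is a minimal sketch in plain Python. The `areas` values and the helper names `min_max_scale` and `standardize` are hypothetical and are not taken from the course material. Min-max scaling maps each value into the range 0 to 1, while standardization subtracts the mean and divides by the population standard deviation, which is the same arithmetic scikit-learn's MinMaxScaler (with its default range) and StandardScaler apply to each column.

```python
from statistics import mean, pstdev

# Hypothetical living-area values in square feet (illustrative only,
# not the course's housing data set).
areas = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Center values on their mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


scaled = min_max_scale(areas)
standardized = standardize(areas)

print("min-max scaled:", scaled)
print("standardized:  ", [round(z, 4) for z in standardized])
```

Both transformations keep the ordering and relative spacing of the values. They differ in the resulting range: min-max output is bounded, whereas standardized output has mean 0 and unit variance but no fixed bounds.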

WHAT YOU WILL LEARN

  • use the Pandas library to load a CSV file into a data frame and analyze its contents using Pandas and Matplotlib
  • create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared
  • apply min-max scaling on the continuous fields and one-hot encoding on the categorical columns of a data set
  • recognize the benefits of scaling and encoding data sets by evaluating the performance of a regression model built with preprocessed data
  • use scikit-learn's StandardScaler on the continuous features of a data set and compare its effects with those of min-max scaling
  • identify the characteristics of the StandardScaler, encode a feature column which contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot
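The objectives above can be sketched end to end as follows. This is an illustrative outline only, not the course's own code: the `houses` data frame, its column names, and its values are invented for the example, and the model is evaluated on the same rows it was trained on purely to keep the sketch short. It assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical housing data (assumption: the real course loads a CSV file).
# Prices are constructed as 100 * area plus a per-neighborhood offset,
# so a linear model can fit them almost exactly.
houses = pd.DataFrame({
    "area": [800, 1200, 1600, 2000, 2400, 1000, 1800, 2200],
    "neighborhood": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "sale_price": [80000, 140000, 160000, 250000,
                   260000, 150000, 180000, 240000],
})

X = houses[["area", "neighborhood"]]
y = houses["sale_price"]

# Min-max scale the continuous column and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("scale", MinMaxScaler(), ["area"]),
    ("encode", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"]),
])

model = make_pipeline(preprocess, LinearRegression())
model.fit(X, y)

# Evaluate with the two regression metrics named in the objectives.
# Note: scoring on the training rows overstates real-world performance.
predictions = model.predict(X)
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)

print(f"Mean squared error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
```

Swapping `MinMaxScaler()` for `StandardScaler()` (also in `sklearn.preprocessing`) is one way to compare the two scaling approaches on the same model. In practice, a held-out test split would give a more honest estimate of the metrics.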

IN THIS COURSE

  1. Course Overview (2m 40s)
  2. Loading and Analyzing a Dataset (9m 21s)
  3. Building and Evaluating a Linear Regression Model (8m 30s)
  4. Scaling and Encoding the Data (6m 15s)
  5. Analyzing the Effects of Preprocessing (7m 29s)
  6. Standardizing Continuous Data (7m 37s)
  7. Exercise: Preprocessing Data for Regression (8m 20s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft offers you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
