Building ML Training Sets: Preprocessing Datasets for Linear Regression

Machine Learning    |    Beginner
  • 7 videos | 50m 12s
  • Includes Assessment
  • Earns a Badge
Rating 4.3 (32 users)
This 7-video course shows how to apply machine learning preprocessing techniques, such as standardization and min-max scaling on continuous data and one-hot encoding on categorical features, to improve the performance of linear regression models. In the first tutorial, you will use the pandas library to load a CSV file into a DataFrame and analyze its contents with pandas and Matplotlib. You will then create a linear regression model with scikit-learn to predict the sale price of a house and evaluate the model with metrics such as mean squared error and R-squared. Next, you will apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a dataset, and then assess the benefits of scaling and encoding by evaluating a regression model built with the preprocessed data. You will also use scikit-learn's StandardScaler on a dataset's continuous features and compare its effects with those of min-max scaling. The concluding exercise involves preprocessing data for regression.
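As a rough illustration of the load, fit, and evaluate workflow described above, here is a minimal sketch. The course's actual CSV file and column names are not given here, so the inline data and the names `area`, `bedrooms`, and `sale_price` are invented for the example. The toy prices are exactly linear in the features, so the fit is perfect, which real housing data would not be.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a CSV file on disk; the columns are assumptions.
csv_text = """area,bedrooms,sale_price
1000,2,150000
1500,3,215000
1200,2,170000
2000,4,280000
1800,3,245000
900,1,125000
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.describe())  # quick numeric summary of each column

X = df[["area", "bedrooms"]]
y = df["sale_price"]

# Hold out two rows so the metrics are computed on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mse = mean_squared_error(y_test, predictions)  # lower is better
r2 = r2_score(y_test, predictions)             # 1.0 is a perfect fit
print(f"MSE: {mse:.4f}, R-squared: {r2:.4f}")
```

With real data, a histogram or box plot of each column (for example via `df.hist()` with Matplotlib installed) would typically precede the modeling step.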

WHAT YOU WILL LEARN

  • Use the pandas library to load a CSV file into a DataFrame and analyze its contents using pandas and Matplotlib
  • Create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared
  • Apply min-max scaling on the continuous fields and one-hot encoding on the categorical columns of a dataset
  • Recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data
  • Use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling
  • Identify the characteristics of the StandardScaler, encode a feature column which contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot
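The scaling and encoding objectives above can be sketched as follows. This is an illustrative example only: the DataFrame and its `area` and `neighborhood` columns are hypothetical, and `pd.get_dummies` is used here as one common way to one-hot encode, which may differ from the approach taken in the course.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: one continuous field and one categorical column.
df = pd.DataFrame({
    "area": [900, 1200, 1500, 2100],
    "neighborhood": ["north", "south", "north", "east"],
})

# Min-max scaling maps each value to (x - min) / (max - min), i.e. into [0, 1].
scaler = MinMaxScaler()
df["area_scaled"] = scaler.fit_transform(df[["area"]]).ravel()

# One-hot encoding replaces the categorical column with one 0/1 column
# per distinct category.
encoded = pd.get_dummies(df, columns=["neighborhood"], dtype=int)

print(encoded)
```

Each row ends up with exactly one `1` among the `neighborhood_*` columns, so the model sees the category as numeric indicators rather than as text.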

IN THIS COURSE

  • 2m 40s
  • 9m 21s
    During this video, you will learn how to use the pandas library to load a CSV file into a DataFrame and analyze its contents using pandas and Matplotlib.
  • 3.  Building and Evaluating a Linear Regression Model
    8m 30s
    In this video, you will create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared.
  • 4.  Scaling and Encoding the Data
    6m 15s
    Learn how to apply min-max scaling to the continuous fields and one-hot encoding to the categorical columns of a dataset.
  • 5.  Analyzing the Effects of Preprocessing
    7m 29s
    After completing this video, you will be able to recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data.
  • 6.  Standardizing Continuous Data
    7m 37s
    In this video, you will use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling.
  • 7.  Exercise: Preprocessing Data for Regression
    8m 20s
    In this exercise, you will identify the characteristics of the StandardScaler, encode a feature column which contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot.
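To make the comparison between standardizing and min-max scaling from videos 4 and 6 concrete, the sketch below works through the arithmetic of both on a small made-up list, using only the Python standard library. It is not the course's code. scikit-learn's StandardScaler divides by the population standard deviation, which is why `statistics.pstdev` is used here.

```python
import statistics

# Made-up continuous feature values, chosen so the numbers come out evenly.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Standardization: z = (x - mean) / std, giving mean 0 and unit variance.
mean = statistics.mean(values)       # 5
std = statistics.pstdev(values)      # 2.0 (population standard deviation)
standardized = [(x - mean) / std for x in values]

# Min-max scaling: (x - min) / (max - min), giving values in [0, 1].
lo, hi = min(values), max(values)
min_max_scaled = [(x - lo) / (hi - lo) for x in values]

print(standardized)    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(min_max_scaled)  # first value is 0.0, last value is 1.0
```

Standardized values are not bounded to a fixed range, whereas min-max scaled values always fall between 0 and 1; both are linear rescalings of the original feature.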

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
