Course details

Building ML Training Sets: Preprocessing Datasets for Linear Regression





Overview/Description

Discover how to apply scaling techniques such as standardization and min-max scaling to continuous data, and one-hot encoding to categorical features, to improve the performance of linear regression models.
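As a rough illustration of what these three techniques compute, here is a minimal sketch in plain Python. It shows only the underlying arithmetic and is not the scikit-learn workflow taught in the course; the helper names (`min_max_scale`, `standardize`, `one_hot`) and the toy housing values are hypothetical and chosen for illustration only.

```python
from statistics import fmean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standardize(values):
    """Center values on mean 0 and scale them to unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def one_hot(labels):
    """Turn each label into a 0/1 row with one column per distinct category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical toy data: one continuous and one categorical feature.
living_area = [1000.0, 1500.0, 2000.0, 3000.0]
neighborhood = ["north", "south", "north", "east"]

scaled_area = min_max_scale(living_area)
standardized_area = standardize(living_area)
categories, encoded = one_hot(neighborhood)
```

Min-max scaling bounds every value between 0 and 1, while standardization leaves values unbounded but centered on zero. One-hot encoding avoids implying an order among categories, which matters for a linear model that treats inputs as numeric magnitudes.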



Expected Duration (hours)
0.9

Lesson Objectives

  • Course Overview
  • use the Pandas library to load a CSV file into a DataFrame and analyze its contents using Pandas and Matplotlib
  • create a linear regression model using scikit-learn to predict the sale price of a house and evaluate this model using metrics such as mean squared error and R-squared
  • apply min-max scaling on the continuous fields and one-hot encoding on the categorical columns of a dataset
  • recognize the benefits of scaling and encoding datasets by evaluating the performance of a regression model built with preprocessed data
  • use scikit-learn's StandardScaler on the continuous features of a dataset and compare its effects with those of min-max scaling
  • identify the characteristics of the StandardScaler, encode a feature column that contains certain values, recall two metrics used to evaluate regression models, and enumerate the details conveyed in a box plot

Course Number
it_mlbmltdj_02_enus

Expertise Level
Beginner