Simple Linear Regression

Simple linear regression is the most basic model in machine learning: we predict a scalar target y from a single scalar feature x using a linear model. We illustrate it with a basic demo on automobile data. The lab then applies the same concepts to a standard housing dataset.
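The linear model is y ≈ β0 + β1·x, and least squares gives the coefficients in closed form: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and β0 = ȳ − β1·x̄. Below is a minimal sketch of that formula on synthetic data. The helper name `fit_simple_linear` and the generated data are hypothetical and are not part of the demo.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Least-squares fit of y ~ beta0 + beta1 * x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    # Slope: sum of centered cross-products divided by sum of centered squares of x
    beta1 = np.sum((x - xm) * (y - ym)) / np.sum((x - xm) ** 2)
    # Intercept: the fitted line passes through the point of means (xm, ym)
    beta0 = ym - beta1 * xm
    return beta0, beta1

# Synthetic data generated from y = 2 + 3x plus small Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(scale=0.5, size=50)

beta0, beta1 = fit_simple_linear(x, y)
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")
```

The closed form should agree with NumPy's own least-squares polynomial fit, `np.polyfit(x, y, 1)`, which returns the coefficients in the order (slope, intercept).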

Simple Linear Regression for Automobile mpg Data

In this demo, you will see how to:

  • Load data from a text file using the pandas package.
  • Create a scatter plot of data.
  • Handle missing data.
  • Fit a simple linear model.
  • Plot the linear fit with the test data.
  • Use a nonlinear transformation for an improved fit.
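Several of the steps above (handling missing data, fitting a linear model, and fitting after a nonlinear transformation) can be sketched on a tiny made-up table. This is only an illustration: the data is hypothetical, and regressing y on z = 1/x is an assumed transformation chosen for the example, not necessarily the one the demo uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table built from y = 10 + 200/x, with one missing x value
df = pd.DataFrame({
    "x": [50.0, 100.0, np.nan, 200.0, 400.0],
    "y": [14.0, 12.0, 11.0, 11.0, 10.5],
})

# Handle missing data: drop every row that has a NaN in any column
df = df.dropna()
x = df["x"].to_numpy()
y = df["y"].to_numpy()

# Plain linear fit y ~ b0 + b1 * x (np.polyfit returns the slope first)
b1, b0 = np.polyfit(x, y, 1)
rss_linear = np.sum((y - (b0 + b1 * x)) ** 2)

# Assumed nonlinear transformation: still a linear model, but in z = 1/x
z = 1.0 / x
c1, c0 = np.polyfit(z, y, 1)
rss_transformed = np.sum((y - (c0 + c1 * z)) ** 2)

print(f"RSS linear in x:   {rss_linear:.4f}")
print(f"RSS linear in 1/x: {rss_transformed:.4f}")
```

The model stays linear in its coefficients, so the same least-squares fit applies. Only the feature changes, and a well-chosen transformation can lower the residual sum of squares.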

Loading the Data

The Python pandas library is a powerful package for data analysis. In this course, we will use only a small portion of its features: reading data from files and writing data to files. After reading the data, we will convert it to NumPy arrays for all numerical processing, including running machine learning algorithms.
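The read-then-convert pattern can be sketched with a small in-memory CSV file standing in for a file on disk. The column names and values here are hypothetical.

```python
import io
import numpy as np
import pandas as pd

# A small in-memory CSV stands in for a file on disk (hypothetical data)
csv_text = """x,y
1.0,2.1
2.0,3.9
3.0,6.2
"""

# Read the file with pandas ...
df = pd.read_csv(io.StringIO(csv_text))

# ... then convert columns to NumPy arrays for all numerical work
x = df["x"].to_numpy()
y = df["y"].to_numpy()
print(type(x), x.shape, x.dtype)  # <class 'numpy.ndarray'> (3,) float64

# Writing works in reverse: DataFrame -> CSV text (index column omitted)
out = io.StringIO()
df.to_csv(out, index=False)
```

To use a real file, pass its path to `pd.read_csv` or `df.to_csv` in place of the `StringIO` object.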

We begin by loading the packages.

import pandas as pd
import numpy as np

The data for this demo comes from a survey of cars that relates mpg (miles per gallon) to engine characteristics. The data can be found in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg
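To our knowledge, the raw data file in that directory is whitespace-delimited, and missing horsepower values are marked with `?`. The following is a minimal sketch of parsing that layout with pandas. It assumes the column names from the dataset description (mpg, cylinders, displacement, horsepower, weight, acceleration, model year, origin, car name). So that it runs without a download, it parses a few made-up rows from a string and omits the trailing quoted car-name column. The choice of displacement as the feature x is also only an assumption for illustration.

```python
import io
import numpy as np
import pandas as pd

# Column names assumed from the dataset description (car name omitted here)
names = ["mpg", "cylinders", "displacement", "horsepower",
         "weight", "acceleration", "model year", "origin"]

# Made-up rows in a whitespace-delimited layout; '?' marks a missing value
raw = """18.0  8  307.0  130.0  3504.0  12.0  70  1
25.0  4  110.0  ?      2600.0  16.0  75  2
31.0  4   90.0  70.0   2100.0  15.5  80  3
"""

# sep=r"\s+" splits on runs of whitespace; na_values turns '?' into NaN
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=names, na_values="?")
print(df["horsepower"].isna().sum())  # 1

# Drop incomplete rows, then extract x (displacement) and y (mpg) as arrays
df = df.dropna()
x = df["displacement"].to_numpy()
y = df["mpg"].to_numpy()
```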