Linear Regression via Least Squares

Supervised learning

Suppose that you observe \(n\) inputs, each of them \(d\)-dimensional:

\[ \mathbf{x}_{1:n} = \{\mathbf{x}_1,\dots,\mathbf{x}_n\}, \]

and outputs:

\[ \mathbf{y}_{1:n} = \{y_1,\dots,y_n\}. \]

The supervised learning problem consists of using the data \(\mathbf{x}_{1:n}\) and \(\mathbf{y}_{1:n}\) to find the map that connects the inputs to the outputs.

We call the inputs \(\mathbf{x}\) features.

The outputs \(\mathbf{y}\) are also called targets.
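To make the notation concrete, here is a minimal sketch of how such a dataset is commonly stored in NumPy. The array names `X` and `y`, the sizes, and the random values are assumptions made for illustration only; they do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

n, d = 5, 2  # hypothetical number of observations and input dimension

# Row i of X holds the input x_i (the features); X has shape (n, d).
X = rng.normal(size=(n, d))

# Entry i of y holds the output y_i (the target); y has shape (n,).
y = rng.normal(size=n)

print(X.shape, y.shape)
```

Storing one observation per row is a convention, not a requirement, but it is the layout most numerical libraries expect.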

The regression problem

When the outputs are continuous variables, e.g., dollars, weight, and mass, we say we have a regression problem. You may have heard of linear regression. That’s when the map that connects the inputs to the outputs is a linear function of the inputs. There is also the generalized linear regression problem, where the map is a nonlinear function of the inputs but linear in the parameters. For example, the map could be a polynomial function of the inputs. You can also do nonlinear regression, where the map is nonlinear in both the inputs and the parameters. Neural networks are an example of nonlinear regression.
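As an illustration of a map that is nonlinear in the inputs but linear in the parameters, the sketch below fits a quadratic polynomial by least squares. The synthetic data, the true coefficients `true_w`, and the noise level are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 1 + 2 x - 3 x^2 + small noise.
x = np.linspace(-1.0, 1.0, 50)
true_w = np.array([1.0, 2.0, -3.0])
y = true_w[0] + true_w[1] * x + true_w[2] * x**2 + 0.01 * rng.normal(size=x.shape)

# Design matrix with columns 1, x, x^2. The model Phi @ w is a nonlinear
# function of x, but it is linear in the parameters w.
Phi = np.vander(x, N=3, increasing=True)

# Least-squares estimate of w: minimizes the sum of squared residuals.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print("estimated parameters:", w)
```

Because the model is linear in `w`, the fit reduces to a linear least-squares problem even though the fitted curve is a parabola.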

The classification problem

When the outputs are discrete labels, e.g., 0 or 1, “cat” or “dog,” we say we have a classification problem. Logistic regression is an example of a model that solves a classification problem. You can also solve the classification problem with neural networks.
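For a rough idea of what a classification model looks like in code, here is a minimal sketch of logistic regression on a one-dimensional input, trained by plain gradient descent. The synthetic data, the step size, and the number of iterations are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary labels (assumed): the label is 1 when a noisy copy of x is positive.
x = rng.normal(size=200)
labels = (x + 0.3 * rng.normal(size=200) > 0).astype(float)

# Design matrix with a bias column and the raw input.
X = np.column_stack([np.ones_like(x), x])


def sigmoid(z):
    """Map real numbers to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Gradient descent on the average negative log-likelihood (cross-entropy loss).
w = np.zeros(2)
for _ in range(2000):
    p = sigmoid(X @ w)                      # predicted probability of label 1
    grad = X.T @ (p - labels) / len(labels)  # gradient of the average loss
    w -= 0.5 * grad                          # step size 0.5 is an arbitrary choice

# Classify by thresholding the predicted probability at 0.5.
pred = (sigmoid(X @ w) >= 0.5).astype(float)
accuracy = np.mean(pred == labels)
print("training accuracy:", accuracy)
```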

How do we train a supervised learning model?

There are two basic ways to train a supervised learning model:

  1. Optimization. We find the parameters by maximizing the likelihood of the data. Equivalently, we minimize the negative log-likelihood of the data (the loss).

  2. Probabilistic inference. We apply Bayes’ rule to find the posterior distribution of the parameters given the data. We then use the posterior distribution to make predictions. We can characterize the posterior distribution analytically, sample from it, or approximate it with a simpler distribution.
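The two approaches above can be compared on a small linear model. The sketch below assumes Gaussian measurement noise with known precision `beta` and a zero-mean isotropic Gaussian prior on the weights with precision `alpha`. These modeling choices, the synthetic data, and all variable names are assumptions made for illustration. Under them, maximizing the likelihood is the same as solving a least-squares problem, and the posterior is Gaussian with a closed-form mean and covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed): y = 0.5 + 2 x + Gaussian noise with standard deviation sigma.
n, sigma = 100, 0.1
x = rng.uniform(-1.0, 1.0, size=n)
y = 0.5 + 2.0 * x + sigma * rng.normal(size=n)
Phi = np.column_stack([np.ones(n), x])  # design matrix with columns 1, x

# 1. Optimization: with Gaussian noise, minimizing the negative log-likelihood
#    is equivalent to minimizing the sum of squared errors.
w_ml, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# 2. Probabilistic inference: prior w ~ N(0, I / alpha), noise precision beta.
#    The posterior over w is Gaussian with covariance S and mean m.
alpha, beta = 1.0, 1.0 / sigma**2
S = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)  # posterior covariance
m = beta * S @ Phi.T @ y                                   # posterior mean

print("maximum likelihood estimate:", w_ml)
print("posterior mean:", m)
print("posterior standard deviations:", np.sqrt(np.diag(S)))
```

The optimization route returns a single parameter vector, while the inference route also returns `S`, which quantifies how uncertain the parameters remain after seeing the data.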

Note

This lecture moves fast. If you have never seen least squares before, please go over the following material from my undergraduate course: