Unsupervised Learning#

You are given some observations \(\mathbf{x}_{1:n} = (\mathbf{x}_1,\dots,\mathbf{x}_n)\), e.g., a bunch of pictures, and you want to find some structure in the data. This is an unsupervised learning problem. The critical difference from supervised learning is that you do not have any labels/targets/outputs. The unsupervised learning problem may sound very open-ended (and it is), but here are some specific examples we are going to study:

  • Clustering: Split the observations into, say, \(K\) distinct clusters so that observations in the same cluster are similar to each other.

  • Dimensionality reduction: Find a lower-dimensional representation of the data that preserves as much of its structure as possible.

  • Density estimation: Learn the probability density that gave rise to the data, i.e., learn how to generate new samples with features similar to those of the observations.

  • and more.
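To make the three tasks concrete, here is a minimal sketch on a toy two-dimensional dataset. This is an illustration only, not the algorithms we will develop later; the synthetic data and all variable names are our own, and we use plain NumPy (a few K-means iterations for clustering, PCA via the SVD for dimensionality reduction, and a single Gaussian fit for density estimation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations x_1, ..., x_n: two Gaussian blobs in 2-D.
n = 200
X = np.vstack([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(n // 2, 2)),
    rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(n // 2, 2)),
])

# Clustering: a few iterations of K-means with K = 2.
K = 2
centers = X[rng.choice(len(X), size=K, replace=False)]
for _ in range(10):
    # Assign each observation to its nearest center.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = np.argmin(dists, axis=1)
    # Move each center to the mean of its assigned observations.
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])

# Dimensionality reduction: project onto the top principal component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[0]  # one number per observation instead of two

# Density estimation: fit a single Gaussian and draw new samples from it.
mu, Sigma = X.mean(axis=0), np.cov(X.T)
new_samples = rng.multivariate_normal(mu, Sigma, size=5)

print(centers)            # roughly (-2, 0) and (+2, 0), in some order
print(Z.shape)            # one coordinate per observation
print(new_samples.shape)  # five fresh 2-D samples
```

Each of the three blocks above is a crude stand-in for a full algorithm (K-means, PCA, and Gaussian density fitting, respectively), but it shows the shape of the problems: no labels are provided, and the structure is discovered from \(\mathbf{x}_{1:n}\) alone.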

Each of these tasks is an unsupervised learning problem that would take several lectures to develop fully. Since we do not have the time, we will study the basic algorithms for the three most popular unsupervised learning problems: clustering, density estimation, and dimensionality reduction.