Singular Value Decomposition#

Singular value decomposition (SVD) factorizes a matrix into a product of three matrices. It is used in many applications such as data compression, denoising, and solving linear systems of equations. In scientific machine learning, it underlies principal component analysis (PCA), the Karhunen-Loève transform, dynamic mode decomposition, and proper orthogonal decomposition.

More details on the theory can be found in the book Data-Driven Science and Engineering by Brunton and Kutz.

Let \(\mathbf{X}\) be an \(n \times m\) matrix. Think of \(\mathbf{X}\) as the matrix you get by running \(n\) experiments and measuring \(m\) different quantities in each one, so that each row holds the measurements of one experiment. The SVD of \(\mathbf{X}\) is given by

\[ \mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T \]

where \(\mathbf{U}\) is an \(n \times n\) orthogonal matrix, \(\mathbf{\Sigma}\) is an \(n \times m\) matrix with non-negative real numbers on the diagonal and zeros elsewhere, and \(\mathbf{V}\) is an \(m \times m\) orthogonal matrix.

The columns of \(\mathbf{U}\) are called the left singular vectors of \(\mathbf{X}\), the columns of \(\mathbf{V}\) are called the right singular vectors of \(\mathbf{X}\), and the diagonal elements of \(\mathbf{\Sigma}\) are called the singular values of \(\mathbf{X}\). By convention, the singular values are sorted in decreasing order, \(\sigma_1 \geq \sigma_2 \geq \dots \geq 0\).
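As a quick sanity check of the definition, here is a minimal sketch that computes the full SVD of a small random matrix with `np.linalg.svd` and rebuilds \(\mathbf{X}\) from the three factors. The matrix size, the random seed, and the variable names (`Sigma`, `X_rebuilt`) are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n, m = 5, 3
X = rng.standard_normal((n, m))

# Full SVD: U is n x n, s holds the min(n, m) singular values,
# and Vt is the transpose of V (m x m).
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# NumPy returns the singular values as a 1D array, so we place them
# on the diagonal of an n x m matrix ourselves.
r = min(n, m)
Sigma = np.zeros((n, m))
Sigma[:r, :r] = np.diag(s)

X_rebuilt = U @ Sigma @ Vt

print(U.shape, Sigma.shape, Vt.shape)
print("reconstruction matches:", np.allclose(X, X_rebuilt))
print("U orthogonal:", np.allclose(U.T @ U, np.eye(n)))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(m)))
```

Note that NumPy returns \(\mathbf{V}^T\) rather than \(\mathbf{V}\), so the right singular vectors are the rows of `Vt`.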

Economy-size SVD#

Assume that \(n \geq m\). Then, \(\mathbf{\Sigma}\) has the form:

\[\begin{split} \mathbf{\Sigma} = \begin{bmatrix} \hat{\mathbf{\Sigma}} \\ \mathbf{0} \end{bmatrix}, \end{split}\]

where \(\hat{\mathbf{\Sigma}}\) is an \(m \times m\) diagonal matrix with the non-negative singular values on its diagonal, and \(\mathbf{0}\) is an \((n-m) \times m\) matrix of zeros. The last \(n-m\) columns of \(\mathbf{U}\) only ever multiply this zero block, so only the first \(m\) columns of \(\mathbf{U}\) are needed to represent \(\mathbf{X}\). We write the economy-size SVD as:

\[ \mathbf{X} = \mathbf{U}_m \hat{\mathbf{\Sigma}} \mathbf{V}^T, \]

where \(\mathbf{U}_m\) is an \(n \times m\) matrix with the first \(m\) columns of \(\mathbf{U}\).
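The difference between the full and economy-size factorizations can be seen directly in NumPy through the `full_matrices` flag of `np.linalg.svd`. The sketch below uses a small random tall matrix (sizes and seed are arbitrary) and checks that both forms reproduce \(\mathbf{X}\).

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
n, m = 6, 4                      # tall matrix, n >= m
X = rng.standard_normal((n, m))

# Full SVD: U_full is n x n.
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=True)

# Economy-size SVD: U_m is n x m.
U_m, s, Vt = np.linalg.svd(X, full_matrices=False)

print(U_full.shape, U_m.shape)   # (6, 6) (6, 4)

# The economy-size factors reproduce X exactly (up to round-off).
X_econ = U_m @ np.diag(s) @ Vt
print(np.allclose(X, X_econ))

# Dropping the last n - m columns of the full U gives the same result.
X_from_full = U_full[:, :m] @ np.diag(s_full) @ Vt_full
print(np.allclose(X, X_from_full))
```

For a very tall matrix the savings are substantial, since the full \(\mathbf{U}\) has \(n^2\) entries while \(\mathbf{U}_m\) has only \(nm\).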

Truncated SVD#

The truncated SVD is a low-rank approximation of \(\mathbf{X}\). It is given by:

\[ \mathbf{X} \approx \mathbf{U}_k \hat{\mathbf{\Sigma}}_k \mathbf{V}_k^T, \]

where \(\mathbf{U}_k\) is an \(n \times k\) matrix with the first \(k\) columns of \(\mathbf{U}\), \(\hat{\mathbf{\Sigma}}_k\) is a \(k \times k\) diagonal matrix with the first \(k\) singular values of \(\mathbf{\Sigma}\), and \(\mathbf{V}_k\) is an \(m \times k\) matrix with the first \(k\) columns of \(\mathbf{V}\).

We can also write:

\[ \mathbf{X} \approx \mathbf{X}_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T, \]

where \(\sigma_i\) is the \(i\)-th singular value, and \(\mathbf{u}_i\) and \(\mathbf{v}_i\) are the \(i\)-th columns of \(\mathbf{U}\) and \(\mathbf{V}\), respectively.
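To make the equivalence of the two forms concrete, the following sketch builds \(\mathbf{X}_k\) once as a product of truncated factors and once as a sum of rank-one outer products, and checks that they agree. The matrix, the seed, and the choice \(k = 2\) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
X = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2

# Matrix form: U_k Sigma_k V_k^T.
X_k_matrix = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Sum form: sigma_i * u_i * v_i^T for i = 1, ..., k.
# The i-th right singular vector v_i is the i-th row of Vt.
X_k_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

print(np.allclose(X_k_matrix, X_k_sum))
print("rank of X_k:", np.linalg.matrix_rank(X_k_sum))
```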

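One can show that the matrix \(\mathbf{X}_k\) is the best rank-\(k\) approximation of \(\mathbf{X}\) in the Frobenius norm, i.e., no other matrix of rank at most \(k\) is closer to \(\mathbf{X}\). This result is known as the Eckart-Young theorem.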
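The optimality statement can be probed numerically. The sketch below is an illustration, not a proof: it checks that the Frobenius error of the truncated SVD equals the square root of the sum of the squared discarded singular values, and that one competing rank-\(k\) approximation (a projection onto a random \(k\)-dimensional subspace, chosen here purely as an example) does no better. All sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed
n, m, k = 20, 12, 3
X = rng.standard_normal((n, m))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Error of the truncated SVD versus the discarded singular values.
err_svd = np.linalg.norm(X - X_k, "fro")
err_formula = np.sqrt(np.sum(s[k:] ** 2))

# A competing approximation of rank at most k: project X onto a
# random k-dimensional subspace of the column space R^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))   # n x k, orthonormal columns
X_alt = Q @ (Q.T @ X)
err_alt = np.linalg.norm(X - X_alt, "fro")

print("error matches formula:", np.isclose(err_svd, err_formula))
print("truncated SVD is at least as good:", err_svd <= err_alt)
```

Both checks are expected to hold for any choice of the random subspace, since the theorem covers every matrix of rank at most \(k\).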

Demonstration - SVD for image compression#

Let’s download an image and compress it using the truncated SVD.

# If you are working on Google Colab run this to download the image:
!curl -O "https://raw.githubusercontent.com/PredictiveScienceLab/advanced-scientific-machine-learning/main/book/hup/funcin/neom-DMGDdksVoWI-unsplash.jpg"

from PIL import Image
with open("neom-DMGDdksVoWI-unsplash.jpg", "rb") as f:
    img = Image.open(f)
    img = img.convert("L")  # convert to grayscale
img

[Figure: the downloaded image converted to grayscale.]

Extract the matrix that represents the image:

import numpy as np
X = np.array(img)
X.shape

(1798, 2400)

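Compute the SVD of the matrix:

# full_matrices=False requests the economy-size SVD.
# s is a 1D array of singular values and Vt is the transpose of V.
U, s, Vt = np.linalg.svd(X, full_matrices=False)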

Let’s look at the singular values as a function of the index:

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
ax.plot(s)
ax.set(yscale='log', xlabel="Singular value index", ylabel="Singular value")
sns.despine();

[Figure: singular values versus singular value index, with a logarithmic vertical axis.]

Typically, we pick \(k\) so that the first \(k\) singular values capture a prescribed fraction of the sum of the squares of all the singular values. For example, we can pick the smallest \(k\) such that:

\[ \frac{\sum_{i=1}^k \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2} \geq 0.998, \]

where \(r = \min(n, m)\) is the number of singular values.
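The selection rule above can be sketched in a few lines of NumPy. The helper name `choose_rank` and the toy singular values are hypothetical, introduced only for illustration. In the demonstration you would pass the array `s` returned by `np.linalg.svd` instead.

```python
import numpy as np

def choose_rank(s, threshold=0.998):
    """Return the smallest k whose cumulative squared singular values
    reach `threshold` of the total, together with the cumulative fractions.

    `s` is assumed to be sorted in decreasing order, as returned by
    np.linalg.svd.
    """
    s = np.asarray(s, dtype=float)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # searchsorted finds the first index where energy >= threshold;
    # add 1 to turn the zero-based index into a count of singular values.
    k = int(np.searchsorted(energy, threshold)) + 1
    # Guard against round-off leaving the last fraction just below 1.
    return min(k, len(s)), energy

# Toy singular values (made up for this example).
s_toy = np.array([10.0, 5.0, 1.0, 0.1])
k, energy = choose_rank(s_toy, threshold=0.998)
print("chosen k:", k)
print("cumulative fractions:", np.round(energy, 5))
```

Plotting `energy` against the index gives a curve from which the threshold crossing can be read off visually.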

To facilitate our choice we can do the following plot:

[Figure: plot used to guide the choice of \(k\).]

Let’s plot the compressed images for some choices of \(k\):

ks = [1, 2, 4, 8, 16, 32, 64, 128, 256, 682]

for k in ks:
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    fig, ax = plt.subplots()
    ax.imshow(X_k, cmap='gray')
    ax.axis('off')
    ax.set_title(f"k={k}, compressed size = {100*((X.shape[0] + X.shape[1]) * k)/np.prod(X.shape):.2f}% original")

[Figures: the reconstructed image \(\mathbf{X}_k\) for each value of \(k\) in ks, one figure per value.]
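The compression percentages in the figure titles come from counting stored numbers. The sketch below makes that bookkeeping explicit under one assumption that differs slightly from the title formula: it also counts the \(k\) singular values, giving \(k(n + m + 1)\) stored numbers instead of \(k(n + m)\). The function names are hypothetical. The count is in matrix entries, not bytes, so it ignores that the original image stores 8-bit integers while the SVD factors are floating-point.

```python
def storage_fraction(n, m, k):
    """Numbers stored by rank-k SVD factors relative to the n*m entries
    of the full matrix (k columns of U, k singular values, k rows of Vt)."""
    return k * (n + m + 1) / (n * m)

def max_useful_rank(n, m):
    """Largest k for which the rank-k factors hold fewer numbers
    than the full n x m matrix."""
    return (n * m - 1) // (n + m + 1)

n, m = 1798, 2400   # shape of the image matrix in the demonstration
for k in [1, 16, 256, 682]:
    print(f"k={k}: {100 * storage_fraction(n, m, k):.2f}% of the original entries")
print("break-even rank:", max_useful_rank(n, m))
```

Beyond the break-even rank, storing the truncated factors takes more numbers than storing the matrix itself, so larger values of \(k\) no longer compress anything.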
