import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");

The Multivariate Normal - Diagonal Covariance Case#

Consider the \(N\)-dimensional multivariate normal:

\[ \mathbf{X} \sim N\left(\boldsymbol{\mu}, \operatorname{diag}\left(\sigma_1^2,\dots,\sigma_N^2\right)\right), \]

where \(\boldsymbol{\mu}\) is an \(N\)-dimensional vector and the \(\sigma_i\) are positive numbers. This is known as the multivariate normal with diagonal covariance. All components of the vector are independent. Let’s first visualize the joint PDF of this random vector in 2D by plotting its contours.
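Because the covariance matrix is diagonal, the joint PDF factorizes into a product of univariate normal PDFs; this is exactly the statement that the components are independent:

\[ p(\mathbf{x}) = \prod_{i=1}^N \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left\{-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right\}. \]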

import scipy.stats as st
import numpy as np
# The mean vector
mu = [1.0, 2.0]
# The variance of each component
sigma2 = [1.0, 1.0]
# The covariance matrix of the multivariate normal
Sigma = np.diag(sigma2)
# Create the random variable using scipy.stats
X = st.multivariate_normal(mean=mu, cov=Sigma)

Here is how you can sample from \(X\).

X.rvs()
array([-0.80991631,  2.69009153])

Here is how you can evaluate the PDF at a point.

pdf_val_X = X.pdf([0.5, -1.0])
print(f"PDF at (0.5, -1.0) = {pdf_val_X:.3e}")
PDF at (0.5, -1.0) = 1.560e-03
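Because the components are independent, this value should equal the product of the two univariate normal PDFs of the components. Here is a quick sanity check (a sketch using `scipy.stats.norm` with the same `mu` and `sigma2` as above):

```python
import numpy as np
import scipy.stats as st

mu = [1.0, 2.0]
sigma2 = [1.0, 1.0]
X = st.multivariate_normal(mean=mu, cov=np.diag(sigma2))

point = [0.5, -1.0]
# Product of the univariate PDFs, one per component
pdf_prod = np.prod(
    [st.norm(loc=m, scale=np.sqrt(s2)).pdf(p)
     for m, s2, p in zip(mu, sigma2, point)]
)
print(np.allclose(X.pdf(point), pdf_prod))  # True
```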

Now, let’s draw the contours of the PDF. We will do it slowly, because it is very important to learn how to make contour plots in Python.

It takes three steps. First, we need a grid of x1 and x2 points.

# Points along x1 dimension
x1 = np.linspace(-3, 5, 64)
# Points along x2 dimension
x2 = np.linspace(-3, 5, 64)
# Create the grid
X1, X2 = np.meshgrid(x1, x2)
# These are 64 x 64 matrices
print(X1.shape)
(64, 64)

Here is what the grid looks like:

fig, ax = plt.subplots()
ax.plot(X1.flatten(), X2.flatten(), '.', markersize=1)
sns.despine(trim=True)
ax.set_xlabel(r'$x_1$')
ax.set_ylabel(r'$x_2$');
(Figure: the 64 × 64 grid of points.)

We are going to evaluate the PDF at all these points. To this end, we create a \(64^2 \times 2\) array of all the grid points by flattening X1 and X2:

X_flat = np.hstack(
    [
        X1.flatten()[:, None],
        X2.flatten()[:, None]
    ]
)
print(X_flat.shape)
(4096, 2)

Now we can evaluate the PDF at all these points:

z = X.pdf(X_flat)
print(z.shape)
print(z)
(4096,)
[1.98968008e-10 3.28001464e-10 5.32065780e-10 ... 1.58606574e-06
 9.77758585e-07 5.93115274e-07]

We reshape z to 64 x 64:

Z = z.reshape((64, 64))
print(Z.shape)
(64, 64)
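The reshape undoes the flattening, so each entry of `Z` should be the PDF evaluated at the corresponding grid point. Here is a quick check of a single entry (a sketch that rebuilds the same `X`, `X1`, `X2`, and `Z` as above; it works because `flatten` and `reshape` both use the same row-major order by default):

```python
import numpy as np
import scipy.stats as st

X = st.multivariate_normal(mean=[1.0, 2.0], cov=np.diag([1.0, 1.0]))
x1 = np.linspace(-3, 5, 64)
X1, X2 = np.meshgrid(x1, x1)
X_flat = np.hstack([X1.flatten()[:, None], X2.flatten()[:, None]])
Z = X.pdf(X_flat).reshape(X1.shape)

# Z[i, j] is the PDF at the grid point (X1[i, j], X2[i, j])
i, j = 10, 20
print(np.isclose(Z[i, j], X.pdf([X1[i, j], X2[i, j]])))  # True
```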

And now we can plot the contour:

fig, ax = plt.subplots()
c = ax.contour(X1, X2, Z, levels=5)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.clabel(c, inline=1, fontsize=10);
sns.despine(trim=True);
(Figure: contour plot of the PDF.)

Now, let’s take samples of X and add them in this figure:

num_samples = 500
x_samples = X.rvs(size=num_samples)
ax.plot(x_samples[:, 0], x_samples[:, 1], '.', markersize=2, alpha=0.5, label="samples")
fig
(Figure: contour plot of the PDF with 500 samples overlaid.)

Questions#

  • Rerun the steps above after moving \(\boldsymbol{\mu}\) to \((0, 1)\). Observe how the contours of the PDF move.

  • Rerun the steps above for \(\sigma_1^2 = 0.1\) and \(\sigma_2^2 = 1\). Observe how the contours of the PDF change.

  • Rerun the steps above for \(\sigma_1^2 = 1\) and \(\sigma_2^2 = 0.1\). Observe how the contours of the PDF change.

  • Rerun the steps above for \(\sigma_1^2 = 1\) and \(\sigma_2^2 = 0.01\). Observe how the contours of the PDF change.

Sampling the multivariate normal with diagonal covariance using the standard normal#

We will show how you can sample from a multivariate normal with diagonal covariance using samples from the standard normal. Recall that in Lecture 4 we found a connection between any univariate normal and the standard normal. We will now extend this to the multivariate case. Let \(\mathbf{Z}\) be an \(N\)-dimensional standard normal:

\[ \mathbf{Z} \sim N(0,\mathbf{I}), \]

where \(\mathbf{I}\) is the \(N\times N\) identity matrix (ones on the diagonal, zeros everywhere else). Define the random vector:

\[ \mathbf{X} = \boldsymbol{\mu} + \operatorname{diag}\left(\sigma_1, \dots,\sigma_N\right)\mathbf{Z}. \]

Observe that its component \(X_i\) is given by:

\[ X_i = \mu_i + \sigma_i Z_i. \]

So, using what we learned in Lecture 4, we have:

\[ X_i \sim N\left(\mu_i, \sigma_i^2\right). \]

Collectively, we have:

\[ \mathbf{X} \sim N\left(\boldsymbol{\mu}, \operatorname{diag}\left(\sigma_1^2,\dots,\sigma_N^2\right)\right). \]
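This can also be checked numerically before making any plots: the sample mean and sample variance of the transformed samples should be close to \(\boldsymbol{\mu}\) and \(\left(\sigma_1^2, \sigma_2^2\right)\). Here is a sketch with illustrative parameter values (the agreement is only approximate because of Monte Carlo error):

```python
import numpy as np

np.random.seed(0)  # for reproducibility
mu = np.array([1.0, 2.0])
sigma = np.array([0.5, 2.0])  # standard deviations, not variances

z = np.random.randn(10_000, 2)  # samples of Z ~ N(0, I)
x = mu + sigma * z              # X_i = mu_i + sigma_i Z_i (elementwise)

print(x.mean(axis=0))  # close to [1.0, 2.0]
print(x.var(axis=0))   # close to [0.25, 4.0]
```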

Let’s verify this by creating samples of \(\mathbf{X}\) using samples of \(\mathbf{Z}\) and then plotting these samples together with the contours of \(N\left(\boldsymbol{\mu}, \operatorname{diag}\left(\sigma_1^2,\dots,\sigma_N^2\right)\right)\).

# The multivariate normal that you want to study:
# The mean vector
mu = [1.0, 2.0]
# The variance of each component
sigma2 = [1.0, 1.0]
# The covariance matrix of the multivariate normal
Sigma = np.diag(sigma2)
# Create the random variable using scipy.stats
X = st.multivariate_normal(mean=mu, cov=Sigma)

# The number of samples you want to take:
num_samples = 500
# Here is how you can sample from Z:
z_samples = np.random.randn(num_samples, 2)
# Transform these to samples of X:
# (num_samples x 2)-matrix times (2 x 2)-matrix gives a
# (num_samples x 2)-matrix, and adding the 2-vector mu
# broadcasts over the rows.
# Please pay attention to the sqrt(Sigma):
# we multiply by the standard deviations, NOT the variances.
x_samples = mu + z_samples @ np.sqrt(Sigma)

# ******************************************************
# In case you missed it, "@" does matrix multiplication.
# ******************************************************

# Visualize everything
fig, ax = plt.subplots()
x1 = np.linspace(-3, 5, 64)
X1, X2 = np.meshgrid(x1, x1)
X_flat = np.hstack(
    [
        X1.flatten()[:, None],
        X2.flatten()[:, None]
    ]
)
pdf_X_flat = X.pdf(X_flat).reshape(X1.shape)
c = ax.contour(X1, X2, pdf_X_flat)
ax.clabel(c, inline=1, fontsize=10);
ax.plot(
    x_samples[:, 0],
    x_samples[:, 1],
    '.',
    markersize=2
)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
sns.despine(trim=True);
(Figure: contour plot of the PDF with the transformed samples overlaid.)

Questions#

  • Rerun the steps above, changing \(\boldsymbol{\mu}\), \(\sigma_1^2\), and \(\sigma_2^2\), and observe that the samples always follow the contours of the corresponding PDF.
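A quantitative way to do this check, for any choice of parameters, is to compare the sample covariance matrix of the transformed samples with \(\operatorname{diag}\left(\sigma_1^2, \sigma_2^2\right)\); in particular, the off-diagonal entries should be close to zero. Here is a sketch, using the variances from one of the questions above:

```python
import numpy as np

np.random.seed(1)
mu = np.array([0.0, 1.0])
sigma2 = np.array([0.1, 1.0])  # variances, as in the questions above

z = np.random.randn(100_000, 2)
x = mu + np.sqrt(sigma2) * z   # elementwise scaling by standard deviations

print(np.cov(x, rowvar=False))  # close to diag(0.1, 1.0)
```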