import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");
The Multivariate Normal - Conditioning
Consider the \(N\)-dimensional multivariate normal:
\[
\mathbf{X} \sim N\left(\boldsymbol{\mu}, \boldsymbol{\Sigma}\right),
\]
where \(\boldsymbol{\mu}\) is an \(N\)-dimensional vector and \(\boldsymbol{\Sigma}\) is an \(N\times N\) positive-definite matrix.
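For reference, recall that the joint PDF of this distribution is:
\[
p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\left\{-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^T\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right\}.
\]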
Now split \(\mathbf{X}\) into two vectors \(\mathbf{X}_1\) and \(\mathbf{X}_2\) of dimensions \(N_1\) and \(N_2\), respectively (\(N_1 + N_2 = N\)):
\[
\mathbf{X} = \begin{bmatrix}
\mathbf{X}_1 \\
\mathbf{X}_2
\end{bmatrix}.
\]
Similarly, split \(\boldsymbol{\mu}\) into two vectors \(\boldsymbol{\mu}_1\) and \(\boldsymbol{\mu}_2\) of dimensions \(N_1\) and \(N_2\):
\[
\boldsymbol{\mu} = \begin{bmatrix}
\boldsymbol{\mu}_1 \\
\boldsymbol{\mu}_2
\end{bmatrix}.
\]
Similarly for \(\boldsymbol{\Sigma}\):
\[
\boldsymbol{\Sigma} = \begin{bmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\
\boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}
\end{bmatrix},
\]
where \(\boldsymbol{\Sigma}_{ii}\) are \(N_i\times N_i\) matrices, \(\boldsymbol{\Sigma}_{12}\) is an \(N_1\times N_2\) matrix, and \(\boldsymbol{\Sigma}_{21} = \boldsymbol{\Sigma}_{12}^T\) by symmetry.
Using marginalization, we can show that:
\[
\mathbf{X}_1 \sim N\left(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}\right)
\]
and
\[
\mathbf{X}_2 \sim N\left(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}\right).
\]
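Before moving on to conditioning, we can verify the marginalization result numerically. The sketch below is our own addition: it draws samples from a hypothetical 2D Gaussian and compares the sample mean and variance of the first component to \(\boldsymbol{\mu}_1\) and \(\boldsymbol{\Sigma}_{11}\).

import numpy as np
import scipy.stats as st

# A hypothetical 2D Gaussian used only for this check
joint = st.multivariate_normal(
    mean=np.array([1.0, 2.0]),
    cov=np.array(
        [
            [2.0, 0.9],
            [0.9, 4.0]
        ]
    )
)
samples = joint.rvs(size=100_000, random_state=0)
# The first component should be approximately N(mu_1 = 1.0, Sigma_11 = 2.0)
print(f"sample mean of x_1: {samples[:, 0].mean():.3f}")
print(f"sample variance of x_1: {samples[:, 0].var():.3f}")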
Suppose now we know the value of \(\mathbf{X}_2\), i.e., \(\mathbf{X}_2 = \mathbf{x}_2\). What is the distribution of \(\mathbf{X}_1\)? To find it, we apply Bayes’ rule:
\[
p(\mathbf{x}_1 | \mathbf{x}_2) = \frac{p(\mathbf{x}_1, \mathbf{x}_2)}{p(\mathbf{x}_2)}.
\]
We have all the required terms on the right-hand side; we only need to substitute and do the algebra. If we do, using the “complete the square” trick, we get:
\[
\mathbf{X}_1 | \mathbf{X}_2 = \mathbf{x}_2 \sim N\left(\boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2}\right),
\]
where
\[
\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\left(\mathbf{x}_2 - \boldsymbol{\mu}_2\right)
\]
and
\[
\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.
\]
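These formulas take only a few lines of code for arbitrary \(N_1\) and \(N_2\). Here is a minimal sketch (the function name and signature are ours), using linear solves instead of explicitly inverting \(\boldsymbol{\Sigma}_{22}\):

import numpy as np

def condition_mvn(mu1, mu2, Sigma11, Sigma12, Sigma22, x2):
    """Return the mean and covariance of X_1 given X_2 = x2."""
    # Sigma_22^{-1} (x2 - mu_2), computed with a linear solve
    a = np.linalg.solve(Sigma22, x2 - mu2)
    # Sigma_22^{-1} Sigma_21, with Sigma_21 = Sigma_12^T by symmetry
    B = np.linalg.solve(Sigma22, Sigma12.T)
    mu_cond = mu1 + Sigma12 @ a
    Sigma_cond = Sigma11 - Sigma12 @ B
    return mu_cond, Sigma_cond

For \(N_1 = N_2 = 1\) this reduces to the scalar formulas we implement by hand in the example below.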
Note
If you want to see the details, read Chapter 2.3 of [Bishop, 2006].
Let’s demonstrate this with an example.
import numpy as np
import scipy.stats as st
# This is the multivariate normal we are going to play with
X = st.multivariate_normal(
mean=np.array([1.0, 2.0]),
cov=np.array(
[
[2.0, 0.9],
[0.9, 4.0]
]
)
)
print("X ~ N(mu, Sigma),")
print(f"mu = {X.mean}")
print("Sigma = ")
print(X.cov)
print("")
x2_observed = -1.0
print(f"x_2 = {x2_observed:.2f} (hypothetical observation)")
X ~ N(mu, Sigma),
mu = [1. 2.]
Sigma =
[[2. 0.9]
[0.9 4. ]]
x_2 = -1.00 (hypothetical observation)
Let’s plot the contour of the joint and see where \(x_2\) falls:
fig, ax = plt.subplots()
x1 = np.linspace(-3, 5, 64)
x2 = np.linspace(-3, 6, 64)
Xg1, Xg2 = np.meshgrid(x1, x2)
Xg_flat = np.hstack(
[
Xg1.flatten()[:, None],
Xg2.flatten()[:, None]
]
)
Z = X.pdf(Xg_flat).reshape(Xg1.shape)
c = ax.contour(Xg1, Xg2, Z)
ax.plot(
x1,
x2_observed * np.ones(x1.shape[0]),
"--",
label=r"Observed $x_2$")
ax.clabel(c, inline=1, fontsize=10)
plt.legend(loc="best", frameon=False)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
sns.despine(trim=True);
Intuitively, the probability density of getting a particular value \(x_1\) is proportional to the joint PDF of \(x_1\) and \(x_2\) evaluated along the dashed line. Let’s see what answer the theory gives. We need to calculate the mean and variance of \(x_1\) conditional on observing \(x_2\). Because \(x_1\) is one-dimensional, it is very simple to implement the formulas above.
# Extract the blocks of the mean and covariance
Sigma11 = X.cov[0, 0]
Sigma12 = X.cov[0, 1]
Sigma22 = X.cov[1, 1]
mu1 = X.mean[0]
mu2 = X.mean[1]
# Conditional mean and variance of x_1 given x_2 (formulas above)
mu1_cond = mu1 + Sigma12 * (x2_observed - mu2) / Sigma22
Sigma11_cond = Sigma11 - Sigma12 ** 2 / Sigma22
print(f"x_1 | x_2 ~ N(mu = {mu1_cond:.2f}, sigma^2 = {Sigma11_cond:.2f})")
x_1 | x_2 ~ N(mu = 0.32, sigma^2 = 1.80)
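We can also sanity-check this result by brute force. The sketch below is our own addition: it reuses X and x2_observed from above, samples many points from the joint, keeps only those whose \(x_2\) falls in a narrow band around the observed value, and looks at the mean and variance of the surviving \(x_1\) values.

samples = X.rvs(size=200_000, random_state=0)
# Keep only the samples with x_2 close to the observed value
mask = np.abs(samples[:, 1] - x2_observed) < 0.05
x1_kept = samples[mask, 0]
print(f"Monte Carlo: mu = {x1_kept.mean():.2f}, sigma^2 = {x1_kept.var():.2f}")

The estimates should be close to the theoretical values above, and they improve as the band shrinks and the number of samples grows.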
Let’s plot this conditional PDF for \(x_1\) and compare it to its marginal PDF:
X1_cond = st.norm(
loc=mu1_cond,
scale=np.sqrt(Sigma11_cond)
)
X1_marg = st.norm(
loc=X.mean[0],
scale=np.sqrt(Sigma11)
)
fig, ax = plt.subplots()
ax.plot(
x1,
X1_marg.pdf(x1),
label=r"$p(x_1)$"
)
ax.plot(
x1,
X1_cond.pdf(x1),
label=f"$p(x_1|x_2={x2_observed:1.2f})$"
)
ax.set_xlabel(r"$x_1$")
ax.set_ylabel("Probability")
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);
This is our first example of how Bayes’ rule can be used to condition on observations. In the plot above, you can think of \(p(x_1)\) as your state of knowledge about \(x_1\) before you observe \(x_2\). Because \(x_1\) and \(x_2\) are correlated, your state of knowledge about \(x_1\) changes after you observe \(x_2\). This is captured by the conditional \(p(x_1|x_2)\).
Questions
Rerun the code above with different values of x2_observed to see how the conditional PDF moves around as the observation changes.
Modify the code so that you get the conditional PDF of \(X_2\) given \(X_1=x_1\).