import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");

Visualizing Monte Carlo Uncertainty

In the last two lectures, we repeatedly used the law of large numbers to estimate expectations using samples. In particular, we studied this integral:

\[ I = \mathbb{E}[g(X)]=\int g(x) p(x) dx, \]

where \(X\sim p(x)\) and \(g(x)\) is a function of \(x\). The sampling-based approximation required that \(X_1,X_2,\dots\) be independent copies of \(X\). Then, we considered the random variables \(Y_1 = g(X_1), Y_2 = g(X_2), \dots\), which are also independent and identically distributed. The law of large numbers states that their sample average converges to their mean:

\[ \bar{I}_N=\frac{g(X_1)+\dots+g(X_N)}{N}=\frac{Y_1+\dots+Y_N}{N}\rightarrow I,\;\text{a.s.} \]

This is the Monte Carlo way of estimating integrals. If you played with the hands-on activities, you noticed that for small \( N \), we could get very different answers. Here we will build some intuition about this epistemic uncertainty induced by finite samples.
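As a minimal sketch of the estimator above, consider a toy integrand that is not part of the lecture: \(g(x)=x^2\) with \(X\sim\mathcal{U}([0,1])\), for which \(\mathbb{E}[g(X)]=1/3\). The names `g_toy`, `N_toy`, and `I_bar` are illustrative, and the seed is fixed only to make the sketch reproducible.

```python
import numpy as np

# Hypothetical toy integrand (not from the lecture): E[X^2] = 1/3 for X ~ U([0, 1])
def g_toy(x):
    return x ** 2

rng = np.random.default_rng(0)        # seeded only for reproducibility
N_toy = 100_000                       # number of samples
x_toy = rng.random(N_toy)             # independent copies X_1, ..., X_N of X
y_toy = g_toy(x_toy)                  # Y_i = g(X_i)
I_bar = y_toy.mean()                  # the sample average \bar{I}_N
print(f"MC estimate: {I_bar:.4f}, exact value: {1 / 3:.4f}")
```

With this many samples, the estimate should sit close to the exact value. With only a handful of samples, it can land noticeably further away.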

Example: 1D expectation

Let’s try it out with the same test function we used before (Example 3.4 of [Robert and Casella, 2004]). Assume that \(X\sim\mathcal{U}([0,1])\) and pick:

\[ g(x) = \left(\cos(50x) + \sin(20x)\right)^2. \]

The correct value for the expectation is:

\[ \mathbb{E}[g(X)] \approx 0.965. \]
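One way to sanity-check this reference value is sketched below. The closed form comes from expanding the square and integrating each trigonometric term over \([0,1]\). It is derived here for illustration and is not taken from the lecture, so the sketch cross-checks it against a simple midpoint rule. The names `g_check`, `I_closed`, and `I_mid` are illustrative.

```python
import numpy as np

g_check = lambda x: (np.cos(50 * x) + np.sin(20 * x)) ** 2

# Closed form from expanding the square (derived for this sketch):
#   cos^2(50x)          integrates to 1/2 + sin(100)/200
#   sin^2(20x)          integrates to 1/2 - sin(40)/80
#   2 cos(50x) sin(20x) = sin(70x) - sin(30x)
I_closed = (
    1.0
    + np.sin(100.0) / 200.0
    - np.sin(40.0) / 80.0
    + (1.0 - np.cos(70.0)) / 70.0
    - (1.0 - np.cos(30.0)) / 30.0
)

# Independent check: midpoint rule on a fine uniform grid over [0, 1]
M = 1_000_000
x_mid = (np.arange(M) + 0.5) / M
I_mid = g_check(x_mid).mean()

print(f"closed form: {I_closed:.6f}, midpoint rule: {I_mid:.6f}")
```

Both numbers should agree with each other and round to 0.965.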

Let’s calculate the Monte Carlo estimate a few times and visualize its uncertainty:

# The function of x we would like to consider
g = lambda x: (np.cos(50 * x) + np.sin(20 * x)) ** 2

# How many times do you want to run MC
num_mc = 2

# Number of samples to take
N = 100

# A common plot for all estimates
fig, ax = plt.subplots()
# So do it ``num_mc`` times:
for i in range(num_mc):
    # Generate samples from X
    x_samples = np.random.rand(N)
    # Get the corresponding Y's
    y_samples = g(x_samples)
    # Evaluate the sample average for all sample sizes
    I_running = np.cumsum(y_samples) / np.arange(1, N + 1)
    ax.plot(np.arange(1, N+1), I_running, 'b', lw=0.5)
# The true value
ax.plot(np.arange(1, N+1), [0.965] * N, color='r')
# and the labels
ax.set_xlabel('$N$')
ax.set_ylabel(r'$\bar{I}_N$')
sns.despine(trim=True);

Questions

  • Run the code 2-3 times to observe that you get a slightly different answer every time.

  • Set the number of Monte Carlo runs num_mc to 100 (or higher). Observe how the different MC runs envelop the correct answer. This spread reflects the epistemic uncertainty of the estimate. How can we quantify it without running the estimator repeatedly?

  • Now increase N to 10000 and see how the uncertainty shrinks.
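To put rough numbers on what the plots show, the sketch below repeats the experiment many times in one vectorized call and reports the standard deviation of the running estimate across runs at a few sample sizes. It still relies on brute-force repetition. The names `num_runs`, `N_max`, `checkpoints`, and `spread` are illustrative, and the seed is an arbitrary choice.

```python
import numpy as np

g_fn = lambda x: (np.cos(50 * x) + np.sin(20 * x)) ** 2

rng = np.random.default_rng(1)        # arbitrary seed, for reproducibility only
num_runs, N_max = 200, 10_000

# One row per independent MC run, one column per sample
x = rng.random((num_runs, N_max))
# Running sample averages \bar{I}_1, ..., \bar{I}_{N_max} for every run
running = np.cumsum(g_fn(x), axis=1) / np.arange(1, N_max + 1)

# Standard deviation of the estimate across runs at selected sample sizes
checkpoints = [100, 1_000, 10_000]
spread = [running[:, n - 1].std() for n in checkpoints]
for n, s in zip(checkpoints, spread):
    print(f"N = {n:>6d}: std across runs = {s:.4f}")
```

The spread should decrease as N grows, which matches what the envelope of curves in the figure suggests.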