import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");
Visualizing Monte Carlo Uncertainty#
In the last two lectures, we repeatedly used the law of large numbers to estimate expectations using samples. In particular, we studied this integral:

\[
I = \mathbb{E}[g(X)] = \int g(x)\, p(x)\, dx,
\]
where \(X\sim p(x)\) and \(g(x)\) is a function of \(x\). The sampling-based approximation requires that \(X_1, X_2, \dots\) be independent copies of \(X\). Then, we considered the random variables \(Y_1 = g(X_1), Y_2 = g(X_2), \dots\), which are also independent and identically distributed. The law of large numbers states that their sample average converges to their mean:

\[
\bar{I}_N = \frac{1}{N}\sum_{i=1}^N g(X_i) \longrightarrow \mathbb{E}[g(X)] = I \quad \text{as } N \to \infty.
\]
This is the Monte Carlo way of estimating integrals. If you played with the hands-on activities, you may have noticed that for small \(N\) we could get very different answers. Here we will build some intuition about this epistemic uncertainty induced by using finitely many samples.
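In code, this recipe is just a sample average. As a minimal sketch, assuming for concreteness that \(X\sim\mathcal{U}([0,1])\) (the helper name mc_estimate is ours, not part of the lecture code):

def mc_estimate(g, N):
    """Monte Carlo estimate of E[g(X)] for X ~ U([0, 1]) using N samples."""
    x_samples = np.random.rand(N)  # independent draws of X
    return np.mean(g(x_samples))   # sample average of the Y_i = g(X_i)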
Example: 1D expectation#
Let’s try it out with the same test function we used before (Example 3.4 of [Robert and Casella, 2004]). Assume that \(X\sim\mathcal{U}([0,1])\) and pick:

\[
g(x) = \left(\cos(50x) + \sin(20x)\right)^2.
\]
The correct value for the expectation is:

\[
\mathbb{E}[g(X)] \approx 0.965.
\]
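As a sanity check (this snippet is our addition, not part of the original lecture), we can verify this value with deterministic quadrature, since \(\mathbb{E}[g(X)]\) is just the integral of \(g\) over \([0,1]\):

from scipy.integrate import quad

# Integrate g over [0, 1]; this equals E[g(X)] because X ~ U([0, 1])
true_value, abs_err = quad(lambda x: (np.cos(50 * x) + np.sin(20 * x)) ** 2, 0.0, 1.0)
print(true_value)  # approximately 0.965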
Let’s calculate the Monte Carlo estimate a few times and visualize its uncertainty:
# The function of x we would like to consider
g = lambda x: (np.cos(50 * x) + np.sin(20 * x)) ** 2
# How many times do you want to run MC
num_mc = 2
# Number of samples to take
N = 100
# A common plot for all estimates
fig, ax = plt.subplots()
# So do it ``num_mc`` times:
for i in range(num_mc):
    # Generate samples from X
    x_samples = np.random.rand(N)
    # Get the corresponding Y's
    y_samples = g(x_samples)
    # Evaluate the sample average for all sample sizes
    I_running = np.cumsum(y_samples) / np.arange(1, N + 1)
    ax.plot(np.arange(1, N+1), I_running, 'b', lw=0.5)
# The true value
ax.plot(np.arange(1, N+1), [0.965] * N, color='r')
# and the labels
ax.set_xlabel('$N$')
ax.set_ylabel(r'$\bar{I}_N$')
sns.despine(trim=True);
Questions#
Run the code 2-3 times to observe that you get a slightly different answer every time.

Set the number of Monte Carlo runs num_mc to 100 (or higher). Observe how the different MC runs envelop the correct answer. This is epistemic uncertainty. How can we quantify it without running the simulation repeatedly? (One possible answer is sketched below.)

Now increase N to 10000 and see how the uncertainty disappears.
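To answer the second question above without repeated runs, one can invoke the central limit theorem (this sketch is our addition, not part of the original activity): the standard error of \(\bar{I}_N\) is approximately \(\sigma/\sqrt{N}\), where \(\sigma^2 = \mathbb{V}[g(X)]\) can be estimated from the very same samples:

# Quantify the epistemic uncertainty from a single MC run via the CLT
N = 100
x_samples = np.random.rand(N)
y_samples = g(x_samples)
I_bar = np.mean(y_samples)                         # Monte Carlo estimate of E[g(X)]
std_err = np.std(y_samples, ddof=1) / np.sqrt(N)   # estimated standard error
print(f"I_bar = {I_bar:.3f} +/- {2 * std_err:.3f} (approx. 95% interval)")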