The Principle of Maximum Entropy for Discrete Random Variables

MAKE_BOOK_FIGURES = True

import numpy as np
import scipy.stats as st
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks")

def set_book_style():
    plt.style.use('seaborn-v0_8-white')
    sns.set_style("ticks")
    sns.set_palette("deep")
    mpl.rcParams.update({
        # Font settings
        'font.family': 'serif',  # For academic publishing
        'font.size': 8,
        'axes.labelsize': 8,
        'axes.titlesize': 8,
        'xtick.labelsize': 7,  # Slightly smaller for better readability
        'ytick.labelsize': 7,
        'legend.fontsize': 7,
        # Line and marker settings for consistency
        'axes.linewidth': 0.5,
        'grid.linewidth': 0.5,
        'lines.linewidth': 1.0,
        'lines.markersize': 4,
        # Layout to prevent clipped labels
        'figure.constrained_layout.use': True,
        # Default DPI (will override when saving)
        'figure.dpi': 600,
        'savefig.dpi': 600,
        # Despine - remove top and right spines
        'axes.spines.top': False,
        'axes.spines.right': False,
        # Remove legend frame
        'legend.frameon': False,
        # Additional trim settings
        'figure.autolayout': True,  # Alternative to constrained_layout
        'savefig.bbox': 'tight',    # Trim when saving
        'savefig.pad_inches': 0.1   # Small padding to ensure nothing gets cut off
    })

def set_notebook_style():
    plt.style.use('seaborn-v0_8-white')
    sns.set_style("ticks")
    sns.set_palette("deep")
    mpl.rcParams.update({
        # Font settings - using default sizes
        'font.family': 'serif',
        'axes.labelsize': 10,
        'axes.titlesize': 10,
        'xtick.labelsize': 9,
        'ytick.labelsize': 9,
        'legend.fontsize': 9,
        # Line and marker settings
        'axes.linewidth': 0.5,
        'grid.linewidth': 0.5,
        'lines.linewidth': 1.0,
        'lines.markersize': 4,
        # Layout settings
        'figure.constrained_layout.use': True,
        # Remove only top and right spines
        'axes.spines.top': False,
        'axes.spines.right': False,
        # Remove legend frame
        'legend.frameon': False,
        # Additional settings
        'figure.autolayout': True,
        'savefig.bbox': 'tight',
        'savefig.pad_inches': 0.1
    })

def save_for_book(fig, filename, is_vector=True, **kwargs):
    """
    Save a figure with book-optimized settings.

    Parameters
    ----------
    fig : matplotlib figure
        The figure to save.
    filename : str
        Filename without extension.
    is_vector : bool
        If True, saves as vector at 1000 dpi. If False, saves as raster at 600 dpi.
    **kwargs : dict
        Additional kwargs to pass to savefig.
    """
    # Set appropriate DPI and format based on figure type
    if is_vector:
        dpi = 1000
        ext = '.pdf'
    else:
        dpi = 600
        ext = '.tif'
    # Save the figure with book settings
    fig.savefig(f"{filename}{ext}", dpi=dpi, **kwargs)

def make_full_width_fig():
    return plt.subplots(figsize=(4.7, 2.9), constrained_layout=True)

def make_half_width_fig():
    return plt.subplots(figsize=(2.35, 1.45), constrained_layout=True)

if MAKE_BOOK_FIGURES:
    set_book_style()
else:
    set_notebook_style()

make_full_width_fig = make_full_width_fig if MAKE_BOOK_FIGURES else lambda: plt.subplots()
make_half_width_fig = make_half_width_fig if MAKE_BOOK_FIGURES else lambda: plt.subplots()


The Brandeis dice problem

This problem is from the 1962 Brandeis lectures of E. T. Jaynes.

When a die is tossed, the number of spots up can have any value \(x\) in \(1,\dots,6\). Suppose a die has been tossed \(N\) times, and we are told only that the average number of spots up was not \(3.5\) (as we might expect from an “honest” die) but 4.5. Given this information, and nothing else, what probability should we assign to \(x\) spots on the next toss?

Let \(X\) be a random variable corresponding to the result of tossing the die. The description above imposes the following mean value constraint on the random variable \(X\):

\[ \sum_{x=1}^6 x p(x) = 4.5. \]

We want to develop a probability mass function for \(X\) by maximizing the entropy subject to this mean-value constraint and the normalization constraint \(\sum_{x=1}^6 p(x) = 1\). We saw that this constrained optimization problem has a unique solution of the form:

\[ p(x) = \frac{\exp\{\lambda x\}}{Z(\lambda)}, \]

where \(Z(\lambda)\) is the partition function:

\[ Z(\lambda) = \sum_{i}e^{\lambda i} = e^{\lambda} + e^{2\lambda} + \dots + e^{6\lambda}, \]

and \(\lambda\) is a parameter to be tuned to satisfy the constraint. We will identify \(\lambda\) by solving a root-finding problem. To this end, let us write the partition function as:

\[ Z(\lambda) = \left(e^{\lambda}\right)^1+\left(e^{\lambda}\right)^2 + \dots + \left(e^{\lambda}\right)^6. \]

According to the theory, to find \(\lambda\) we must solve:

\[ \frac{\partial \log Z}{\partial \lambda} = 4.5. \]
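This is the familiar property that the derivative of the log partition function reproduces the constrained expectation. Spelling it out for our \(Z(\lambda)\):

\[ \frac{\partial \log Z}{\partial \lambda} = \frac{1}{Z(\lambda)}\frac{\partial Z}{\partial \lambda} = \frac{1}{Z(\lambda)}\sum_{i=1}^6 i e^{\lambda i} = \mathbb{E}[X]. \]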

Or equivalently:

\[ \frac{1}{Z(\lambda)}\sum_{i=1}^6 i e^{\lambda i} = 4.5. \]

So, to find \(\lambda\), we need to find the root of this function:

\[ f(\lambda) = \frac{1}{Z(\lambda)}\sum_{i=1}^6 i e^{\lambda i} - 4.5. \]

Let’s code it up:

def f(lam: float):
    """Return E[X] - 4.5 for the distribution p(x) proportional to exp(lam * x), x = 1, ..., 6."""
    p_unnormalized = np.exp(np.arange(1, 7) * lam)
    p = p_unnormalized / np.sum(p_unnormalized)
    E_X = np.sum(np.arange(1, 7) * p)
    return E_X - 4.5
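
Before handing \(f\) to a root finder, it is worth a quick sanity check that it changes sign on the bracketing interval we are about to use (the values quoted in the comments are approximate):

# At lambda = 0 the die is fair, so E[X] = 3.5 and f(0) = -1
print(f(0.0))
# A positive lambda shifts mass toward x = 6; f(1.0) is roughly +0.93
print(f(1.0))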

To find the root, we will use Brent's method as implemented in SciPy:

import scipy.optimize

# Bracketing interval for lambda (f changes sign on [a, b])
a = -2
b = 2
res = scipy.optimize.root_scalar(
    f,
    bracket=(a, b),
    method='brentq',
    xtol=1e-20,
    rtol=1e-15
)

print(res)

lam = res.root

print(f'Lambda = {lam:.2f}')

# The maximum entropy probabilities
p = np.exp(lam * np.arange(1, 7))
p = p / np.sum(p)

print(f'p = {p}')
      converged: True
           flag: 'converged'
 function_calls: 11
     iterations: 10
           root: 0.3710489380810334
Lambda = 0.37
p = [0.05435317 0.07877155 0.11415998 0.1654468  0.23977444 0.34749407]

Let's check that the expectation indeed comes out to 4.5:

(p * np.arange(1, 7)).sum()
4.5

We are good!
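
As one more sanity check tied to the maximum entropy interpretation, we can compare the entropy of \(p\) with that of the uniform (fair-die) distribution, \(\log 6 \approx 1.79\) nats. The mean constraint removes some uncertainty, so the constrained maximum entropy must be smaller. A minimal sketch, using scipy.stats (imported as st in the setup cell):

# Entropy of the maximum entropy distribution with E[X] = 4.5 (in nats)
print(f'H[p] = {st.entropy(p):.4f}')
# Entropy of the uniform distribution, the unconstrained maximum
print(f'H[uniform] = {np.log(6):.4f}')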

Now, let’s plot the maximum entropy probabilities:

fig, ax = plt.subplots()
ax.bar(np.arange(1, 7), p, alpha=0.5)
ax.set_xlabel('Die result ($x$)')
ax.set_ylabel('Probability $p(x)$')
sns.despine(trim=True);
[Figure: bar chart of the maximum entropy probabilities \(p(x)\) against the die result \(x\).]

Questions

  • Rerun the code above, assuming that the mean is 3.5. What kind of distribution do you find? Why?

  • If you have some time to spare, modify the example above to add the constraint that the variance of \(X\) should be 0.2. Hint: First, translate the constraint about the variance to a constraint about \(\mathbb{E}[X^2]\). Second, you need to introduce one more parameter to optimize for. Call it \(\mu\). The distribution would be \(p(x) = \frac{\exp\{\lambda x + \mu x^2\}}{Z(\lambda,\mu)}\). Then derive the set of non-linear equations you need to solve to find \(\lambda\) and \(\mu\) by expanding these two equations:

\[ \frac{\partial \log Z}{\partial \lambda} = \mathbb{E}[X], \]

and

\[ \frac{\partial \log Z}{\partial \mu} = \mathbb{E}[X^2]. \]

Finally, use scipy.optimize.root to solve the resulting system of nonlinear equations. Be careful: getting this right can take a few hours. A possible starting point is sketched below.
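
For the second question, here is a minimal sketch of how the two-parameter problem could be set up. It assumes the mean constraint stays at 4.5, so the variance constraint translates to \(\mathbb{E}[X^2] = 0.2 + 4.5^2 = 20.45\). The initial guess x0 is arbitrary, and whether a distribution on \(\{1,\dots,6\}\) can actually satisfy both moments is part of the exercise, so the solver is not guaranteed to converge:

import numpy as np
import scipy.optimize

xs = np.arange(1, 7)

def residuals(params):
    """Moment mismatches for p(x) proportional to exp(lam * x + mu * x ** 2)."""
    lam, mu = params
    p_unnormalized = np.exp(lam * xs + mu * xs ** 2)
    p = p_unnormalized / np.sum(p_unnormalized)
    return [np.sum(xs * p) - 4.5,           # E[X] constraint
            np.sum(xs ** 2 * p) - 20.45]    # E[X^2] constraint

# Arbitrary starting point for (lambda, mu)
res = scipy.optimize.root(residuals, x0=[0.0, 0.0])
print(res.x, res.success)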