import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");

Practicing with Joint Probability Mass Functions#

Joint probability mass function of two discrete random variables#

Consider two discrete random variables, \(X\) and \(Y\). Say that \(X\) takes \(5\) values, \(0,1,\dots,4\), and \(Y\) takes \(10\) values, \(0,1,\dots,9\). Then, you can think of the joint probability mass function of \(X\) and \(Y\) as the \(5\times 10\) matrix:

\[ A_{ij} = p(X=i, Y=j). \]

Let’s make up such a matrix to play with:

import numpy as np
# This is to make sure that we all get the same results
np.random.seed(12345)
# First I draw a matrix with random entries in [0,1]
A = np.random.rand(5, 10)
print(A)
[[0.92961609 0.31637555 0.18391881 0.20456028 0.56772503 0.5955447
  0.96451452 0.6531771  0.74890664 0.65356987]
 [0.74771481 0.96130674 0.0083883  0.10644438 0.29870371 0.65641118
  0.80981255 0.87217591 0.9646476  0.72368535]
 [0.64247533 0.71745362 0.46759901 0.32558468 0.43964461 0.72968908
  0.99401459 0.67687371 0.79082252 0.17091426]
 [0.02684928 0.80037024 0.90372254 0.02467621 0.49174732 0.52625517
  0.59636601 0.05195755 0.89508953 0.72826618]
 [0.81835001 0.50022275 0.81018941 0.09596853 0.21895004 0.25871906
  0.46810575 0.4593732  0.70950978 0.17805301]]
# And then I scale it so that the sum of all elements is one:
A = A / np.sum(A)
print(f"A = {A}")
print(f"Sum of A_ij = {np.sum(A):.2f}")
A = [[0.03382266 0.01151084 0.0066916  0.00744261 0.0206558  0.02166798
  0.03509238 0.02376485 0.02724782 0.02377914]
 [0.02720446 0.03497567 0.0003052  0.00387282 0.01086788 0.02388251
  0.02946379 0.03173278 0.03509722 0.02633018]
 [0.02337548 0.02610345 0.01701287 0.0118459  0.01599579 0.02654862
  0.0361657  0.02462701 0.02877286 0.00621845]
 [0.00097687 0.02912024 0.03288056 0.00089781 0.01789147 0.01914699
  0.02169786 0.0018904  0.03256646 0.02649685]
 [0.02977441 0.01819984 0.0294775  0.00349167 0.00796616 0.0094131
  0.01703131 0.01671359 0.02581442 0.00647819]]
Sum of A_ij = 1.00

Now we have a matrix that corresponds to a proper joint probability mass function.

Let’s start by extracting the probability mass function of \(X\) from the joint probability mass function of \(X\) and \(Y\). Remember that you need to marginalize:

\[ p(x) = \sum_{y} p(x,y) = \sum_{y} A_{xy}. \]

This is easy:

p_x = np.sum(A, axis=1) # axis=1 sums over the second axis (the y values), leaving p(x)
print(f"pmf of just X: {p_x}")
pmf of just X: [0.21167568 0.2237325  0.21666614 0.1835655  0.16436018]

Verify that this is indeed a pmf:

print(f"sum of p_x = {np.sum(p_x):.2f}")
sum of p_x = 1.00

With this you can easily find the expectation of \(X\), \(\mathbb{E}[X] = \sum_x x\, p(x)\):

E_X = np.sum(np.arange(5) * p_x)
print(f"E[X] = {E_X:.2f}")
E[X] = 1.87

Similarly for the variance of \(X\), using \(\mathbb{V}[X] = \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2\):

E_X2 = np.sum(np.arange(5) ** 2 * p_x)
V_X = E_X2 - E_X ** 2
print(f"V[X] = {V_X:.2f}")
V[X] = 1.89
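
As a sanity check, the definitional formula \(\mathbb{V}[X] = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right]\) should give the same number:

V_X_alt = np.sum((np.arange(5) - E_X) ** 2 * p_x)
print(f"V[X] (definitional formula) = {V_X_alt:.2f}")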

Let’s do the same for \(Y\):

p_y = np.sum(A, axis=0)
print(f"pmf of just Y: {p_y}")
E_Y = np.sum(np.arange(10) * p_y)
print(f"E[Y] = {E_Y:.2f}")
E_Y2 = np.sum(np.arange(10) ** 2 * p_y)
V_Y = E_Y2 - E_Y ** 2
print(f"V[Y] = {V_Y:.2f}")
pmf of just Y: [0.11515387 0.11991004 0.08636773 0.0275508  0.07337711 0.1006592
 0.13945104 0.09872863 0.14949879 0.0893028 ]
E[Y] = 4.70
V[Y] = 8.98

Alright, we have found all the individual statistics. Let’s now find the covariance of the two random variables. Remember the formula:

\[ \mathbb{C}[X,Y] = \sum_{x,y} (x-\mathbb{E}[X])(y-\mathbb{E}[Y])p(x,y). \]

Here we go:

# We will loop over all the possible values
C_XY = 0.0 # Keeping track of the sum
for x in range(5):
    for y in range(10):
        C_XY += (x - E_X) * (y - E_Y) * A[x, y] # += adds each term to the running sum
print(f"C[X, Y] = {C_XY:.2f}")
C[X, Y] = -0.38
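
If you prefer to avoid the double loop, here is a vectorized sketch of the same sum using np.outer (with the E_X, E_Y, and A defined above):

C_XY_vec = np.sum(np.outer(np.arange(5) - E_X, np.arange(10) - E_Y) * A)
print(f"C[X, Y] (vectorized) = {C_XY_vec:.2f}")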

Remember that the covariance depends on the units of the random variables. Let’s find the correlation coefficient:

rho_XY = C_XY / (np.sqrt(V_X) * np.sqrt(V_Y))
print(f"rho_XY = {rho_XY:.2f}")
rho_XY = -0.09

We see that \(X\) and \(Y\) are slightly negatively correlated. How can we visualize this? We need to sample from the joint distribution of \(X\) and \(Y\). We can do this by sampling a tuple of indices \((i,j)\) with probability \(A_{ij}\). This is like sampling from a categorical with \(5\times 10 = 50\) different labels \(c_0 = (0,0), c_1 = (0,1), \dots, c_{49} = (4, 9)\), each with probability \(A_{00}, A_{01}, \dots, A_{4,9}\), respectively.

This is what the code below does. You can look at the details, if you are so inclined.

import scipy.stats as st

# A.flatten() is the matrix flattened out as a row
XY = st.rv_discrete('Joint XY', values=(range(50), A.flatten()))

# Let's now write a function that samples X and Y using a sample from XY
def sample_X_and_Y():
    """Samples X and Y once."""
    k = XY.rvs()
    # This is integer division
    i = k // 10
    # This is the remainder
    j = k % 10
    return i, j
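
An equivalent, more compact way to sample is np.random.choice combined with np.unravel_index. This is just a sketch of the same idea, not what the cell above uses:

def sample_X_and_Y_alt():
    """Samples (X, Y) by drawing a flat index with probabilities A.flatten()."""
    k = np.random.choice(A.size, p=A.flatten())
    # Convert the flat index back to a (row, column) pair
    return np.unravel_index(k, A.shape)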

Let’s try it out - take 10 samples:

for n in range(10):
    x, y = sample_X_and_Y()
    print(f"x = {x:d}, y = {y:d}")
x = 2, y = 5
x = 0, y = 8
x = 3, y = 6
x = 4, y = 5
x = 2, y = 7
x = 0, y = 7
x = 2, y = 2
x = 1, y = 8
x = 4, y = 0
x = 4, y = 2

Let’s take many more samples and do a scatter plot:


samples = np.array([sample_X_and_Y() for _ in range(100)])
xs = samples[:,0]
ys = samples[:,1]
plt.scatter(xs, ys, alpha=0.1)
plt.xlabel('$x$')
plt.ylabel('$y$')
sns.despine(trim=True)
(Figure: scatter plot of the 100 sampled \((x, y)\) pairs.)

You can sort of see the negative correlation. We will see more concrete patterns when we work with continuous random variables.
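
As a quick numerical check, you can also estimate the correlation coefficient directly from the samples. With only 100 samples the estimate is noisy, so expect it to be only roughly close to the exact value of about -0.09:

print(f"Sample estimate of rho_XY = {np.corrcoef(xs, ys)[0, 1]:.2f}")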

Now, let’s write some code to find the expectation of a function \(g(X, Y) = X + Y\). It is:

\[ \mathbb{E}[g(X,Y)] = \sum_{x,y}g(x,y) p(x,y) = \sum_{x,y}g(x,y)A_{xy}. \]

Let’s write a function that calculates the expectation of any \(g\) and then use it for \(g(X,Y) = X + Y\).

def expectation(g, A):
    """Returns the expectation of the function f(X, Y).
    
    Arguments:
    g -- A function g(X, Y).
    A -- A matrix containing the joint probability mass
         function of X and Y. Say that A has dimensions
         n x m. We assume that X takes values
         0, 1, ..., n-1 and that Y takes values
         0, 1, ..., m-1.
    """
    n, m = A.shape
    res = 0.
    for i in range(n):
        for j in range(m):
            res += g(i, j) * A[i, j]
    return res

Notice that expectation is a Python function that takes as input the Python function whose expectation we want to find. This is cool! Functions that take other functions as arguments are called higher-order functions.

Here is how to use expectation. First, you need to define the function whose expectation you want to find.

def g(x, y):
    return x + y

We can calculate the expectation like this:

E_g = expectation(g, A)

print(f"E[g(X, Y)] = {E_g:.2f}")
E[g(X, Y)] = 6.56
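
By the way, when \(g\) works elementwise on NumPy arrays (as \(x + y\) does), you can also vectorize the expectation with np.meshgrid. Here is a sketch:

ii, jj = np.meshgrid(np.arange(5), np.arange(10), indexing='ij')
print(f"E[g(X, Y)] (vectorized) = {np.sum(g(ii, jj) * A):.2f}")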

For the particular case \(g(x,y) = x + y\), we also know that

\[ \mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y]. \]

Let’s test this:

print(f"E[X] + E[Y] = {E_X + E_Y:.2f}")
E[X] + E[Y] = 6.56

Cool! It’s the same!

Now let’s find the variance of \(g(X, Y)\). We can use the formula:

\[ \mathbb{V}[g(X,Y)] = \mathbb{E}\left[g^2(X,Y)\right] - \left\{\mathbb{E}[g(X,Y)]\right\}^2. \]

We need to define a function that corresponds to the square of \(g(X,Y)\). Here it is:

def f(x, y):
    return g(x, y) ** 2

So, now we can do:

V_g = expectation(f, A) - E_g ** 2

print(f"V[g(X, Y)] = {V_g:.2f}")
V[g(X, Y)] = 10.11
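
For this particular \(g\), there is also a closed-form check, since \(\mathbb{V}[X+Y] = \mathbb{V}[X] + \mathbb{V}[Y] + 2\mathbb{C}[X,Y]\). Let's verify it with the quantities we computed earlier:

print(f"V[X] + V[Y] + 2 C[X, Y] = {V_X + V_Y + 2 * C_XY:.2f}")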

Questions#

  • Change the random seed to another number (try a few different ones) and re-run the code above. What happens to the correlation coefficient? How does the scatter plot change?

  • Write code to find the expectation of \(3X + 5Y\).

# Your code here

  • Write code to find the variance of \(3X + 5Y\).

# Your code here

  • Write code that finds the expectation and variance of the function \(h(X,Y) = XY^3\).

# Your code here

If you are feeling bored, learn some functional programming…#

I have to show you a neat way to do this using functional programming ideas. Functional programming is a programming paradigm that is very different from the imperative programming we have been doing so far. It is a very powerful paradigm, and it is the basis of many modern programming languages. Examples of such languages are Haskell, Scala, and Clojure. Python is not a functional programming language, but it has some functional programming features. Feel free to skip this section if you find it weird… It is not required for the rest of the course.

First, here is how you can define a function super-quickly in Python:

f = lambda x, y: x + y

This is called a lambda expression. It is a way to define a function in one line.
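
This lambda is exactly equivalent to the familiar def form:

# The same function defined with def (shown only for comparison)
def f(x, y):
    return x + y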

Now, I am going to define a function that squares any function. Pay attention: I will define a function that takes any function (with any number of arguments) as input and returns another function that is the square of the first. Here it is in one line:

square = lambda f: lambda *args: f(*args) ** 2

Here is what I can do with this:

f2 = square(f)

print(f"f(1,2) = {f(1,2)}, f(1, 3) ** 2 = {f2(1,2)}")
f(1,2) = 3, f(1, 3) ** 2 = 9

So, now we can find the variance like this:

V_f = expectation(square(f), A) - expectation(f, A) ** 2

print(f"V[f(X, Y)] = {V_f:.2f}")
V[f(X, Y)] = 10.11

We can even write another function that finds the variance of any function. Here is how:

variance = lambda f: expectation(square(f), A) - expectation(f, A)**2

Now we can simply do:

print(f"V[f(X, Y)] = {variance(f):.2f}")
V[f(X, Y)] = 10.11

Neat, right?
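
And because variance accepts any function, you can now compute variances of new functions in one line. For example, for \(XY\):

print(f"V[X * Y] = {variance(lambda x, y: x * y):.2f}")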

Functional programming resources#

  • If you want to learn more about functional programming in Python, check out this tutorial.

  • If you want to learn more about the lambda keyword, check out this tutorial.

  • You can also learn a little Haskell here.

  • Or, if you want, you can go and learn some Lisp using the freely available masterpiece Structure and Interpretation of Computer Programs. The language used in the book is Scheme, which is rarely used nowadays. But your brain will thank you after reading the book. You will be a better programmer.