Collections of Random Variables: Theory#

Joint probability mass function#

Consider two random variables, \(X\) and \(Y\). The joint probability mass function of the pair \((X, Y)\) is the function \(f_{X, Y}(x,y)\) giving the probability that \(X=x\) and \(Y=y\). Mathematically (and introducing a simplified notation), we have:

\[ p(x,y) \equiv p(X=x, Y=y) \equiv f_{X,Y}(x,y) := p\left(\{\omega: X(\omega) = x, Y(\omega)=y\}\right). \]

Properties of the joint probability mass function#

  • It is nonnegative:

\[ p(x,y) \ge 0. \]
  • If you sum over all the possible values of all random variables, you should get one:

\[ \sum_x \sum_y p(x,y) = 1. \]
  • If you sum over all the values of one of the random variables (i.e., you marginalize it out), you get the pmf of the other; see the sketch after this list. For example:

\[ p(x) = \sum_y p(x,y), \]

and

\[ p(y) = \sum_x p(x, y). \]
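To make these properties concrete, here is a minimal sketch in Python (using NumPy; the joint pmf values are made up for illustration) that checks nonnegativity and normalization and computes both marginals:

```python
import numpy as np

# A made-up joint pmf p(x, y) for X in {0, 1, 2} (rows) and Y in {0, 1} (columns).
p_xy = np.array([
    [0.10, 0.15],
    [0.20, 0.25],
    [0.05, 0.25],
])

# Nonnegativity and normalization.
assert np.all(p_xy >= 0.0)
assert np.isclose(p_xy.sum(), 1.0)

# Marginalization: sum out the other variable.
p_x = p_xy.sum(axis=1)   # p(x) = sum_y p(x, y) -> 0.25, 0.45, 0.30
p_y = p_xy.sum(axis=0)   # p(y) = sum_x p(x, y) -> 0.35, 0.65
print("p(x) =", p_x)
print("p(y) =", p_y)
```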

Joint probability density function#

Let \(X\) and \(Y\) be two random variables. Their joint probability density function \(f_{X, Y}(x,y)\) is the function that gives the probability that the pair \((X, Y)\) belongs to any “good” subset \(A\) of \(\mathbb{R}^2\) as follows:

\[ p\left((X,Y)\in A\right) = \int\int_{A} f_{X,Y}(x,y)dxdy. \]

Of course, we will be writing:

\[ p(x,y) := f_{X,Y}(x,y), \]

when there is no ambiguity.

If you integrate one of the variables out of the joint, you get the PDF of the other variable. For example:

\[ p(x) = \int_{-\infty}^\infty p(x,y) dy, \]

and

\[ p(y) = \int_{-\infty}^\infty p(x, y) dx. \]

When we integrate out one of the variables, we say that we marginalize it; the process is called marginalization.
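As a sanity check, here is a sketch (assuming NumPy and SciPy are available; the bivariate normal is just an illustrative choice of joint PDF) that marginalizes \(Y\) out numerically on a grid and compares the result to the known marginal:

```python
import numpy as np
import scipy.stats as st

# An illustrative joint PDF: a bivariate normal with correlation 0.5.
rho = 0.5
joint = st.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# Evaluate the joint PDF on a grid and integrate Y out numerically.
x = np.linspace(-5.0, 5.0, 401)
y = np.linspace(-5.0, 5.0, 401)
X, Y = np.meshgrid(x, y, indexing="ij")
p_xy = joint.pdf(np.dstack([X, Y]))

p_x = np.trapz(p_xy, y, axis=1)   # p(x) = int p(x, y) dy

# The X-marginal of this joint is a standard normal, so the two should agree.
print(np.max(np.abs(p_x - st.norm.pdf(x))))   # close to zero
```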

Conditioning a random variable on another#

Consider two random variables, \(X\) and \(Y\). If we had observed that \(Y=y\), how would this change the PDF of \(X\)? The answer is given via Bayes’ rule. Recall Bayes’ rule for two logical statements \(A\) and \(B\):

\[ p(A|B) = \frac{p(A,B)}{p(B)}. \]

To find the correct equation for the conditional probability, pick \(A=(X=x)\) and \(B=(Y=y)\). You get:

\[ p(x|y) = \frac{p(x,y)}{p(y)}. \]

The numerator is the joint PDF, and the denominator is the marginal PDF of \(Y\). Since you can get the marginal PDF of \(Y\) by integrating the joint PDF over \(x\), you can also write:

\[ p(x|y) = \frac{p(x,y)}{\int_{-\infty}^\infty p(x',y)dx'}. \]

So, knowing the joint PDF of two random variables, you can find the conditional PDF of one given the other. The joint contains all the information you need.
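Here is a sketch of the same idea in code (NumPy/SciPy; the bivariate normal is again just an illustrative joint): take a slice of the joint PDF at the observed value of \(Y\), normalize it numerically, and compare to the known conditional:

```python
import numpy as np
import scipy.stats as st

# An illustrative joint PDF: a bivariate normal with correlation 0.5.
rho = 0.5
joint = st.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# Condition on Y = y_obs: slice the joint and normalize numerically,
# p(x | y) = p(x, y) / int p(x', y) dx'.
y_obs = 1.0
x = np.linspace(-5.0, 5.0, 401)
p_slice = joint.pdf(np.column_stack([x, np.full_like(x, y_obs)]))
p_x_given_y = p_slice / np.trapz(p_slice, x)

# For this particular joint, the exact conditional is N(rho * y_obs, 1 - rho^2).
exact = st.norm(loc=rho * y_obs, scale=np.sqrt(1.0 - rho**2)).pdf(x)
print(np.max(np.abs(p_x_given_y - exact)))   # close to zero
```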

Expectation of a function of two random variables#

Let \(X\) and \(Y\) be two random variables. Take a third random variable \(Z = g(X, Y)\). One can show that the expectation of \(Z\) is given in terms of the joint PDF of \(X\) and \(Y\) as follows:

\[ \mathbb{E}[g(X,Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty g(x,y)p(x,y)dxdy. \]

One case that often appears is \(Z = g(X, Y) = X + Y\). In this case, one can show that:

\[ \mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y]. \]
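A quick sampling-based sanity check of both formulas (a minimal sketch; the particular joint distribution of \(X\) and \(Y\) below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Sample (X, Y) jointly; here Y depends on X, so the pair is genuinely correlated.
x = rng.normal(1.0, 2.0, size=n)             # E[X] = 1
y = 0.5 * x + rng.normal(-1.0, 1.0, size=n)  # E[Y] = 0.5 * 1 - 1 = -0.5

# Sample estimate of E[g(X, Y)] for a nonlinear g, just to show the recipe.
print(np.mean(np.sin(x * y)))

# Linearity of expectation: E[X + Y] should be E[X] + E[Y] = 0.5.
print(np.mean(x + y))
```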

The covariance operator#

The covariance operator measures how correlated two random variables \(X\) and \(Y\) are. Its definition is:

\[ \mathbb{C}[X,Y] = \mathbb{E}\left[\left(X-\mathbb{E}[X]\right)\left(Y-\mathbb{E}[Y]\right)\right]. \]

The covariance is the expectation of the product of the deviations of the random variables from their means. Let us try to understand this:

  • The product tends to be positive if the two random variables move in the same direction: when one is above its mean, the other tends to be above its mean as well.

  • The product tends to be negative if the two random variables move in opposite directions: when one is above its mean, the other tends to be below its mean.

In the first case, we say that the two random variables are positively correlated; in the second case, we say they are negatively correlated.
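The sign interpretation is easy to see with samples. Below is a minimal sketch (the positively and negatively correlated pairs are constructed artificially, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(0.0, 1.0, size=n)
y_pos = x + 0.5 * rng.normal(size=n)    # tends to move with X
y_neg = -x + 0.5 * rng.normal(size=n)   # tends to move against X

# Sample version of C[X, Y] = E[(X - E[X]) (Y - E[Y])].
def cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

print(cov(x, y_pos))   # positive (about +1)
print(cov(x, y_neg))   # negative (about -1)
```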

Properties of the covariance operator#

The covariance operator has the following properties:

  • It is symmetric:

\[ \mathbb{C}[X,Y] = \mathbb{C}[Y,X]. \]
  • It is linear in each argument; for example:

\[ \mathbb{C}[aX + bY, Z] = a\mathbb{C}[X,Z] + b\mathbb{C}[Y,Z]. \]

All these properties are easy to prove using the definition of the covariance operator.

Another valuable property of the covariance operator is that it relates the variance of the sum of two random variables to the individual variances:

\[ \mathbb{V}[X + Y] = \mathbb{V}[X] + \mathbb{V}[Y] + 2\mathbb{C}[X,Y]. \]

Let’s try to understand this result. The variance of the sum of two random variables is the sum of the variances plus a term that depends on the covariance. If the two random variables are positively correlated, then the covariance is positive, and the variance of the sum is larger than the sum of the variances. Two random variables that move in the same direction tend to amplify the fluctuations of their sum. If the two random variables are negatively correlated, then the covariance is negative, and the variance of the sum is smaller than the sum of the variances. Here the two random variables tend to cancel each other out. If the two random variables are not correlated, then the covariance is zero, and the variance of the sum is the sum of the variances.
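Here is a short numerical check of the identity (a sketch with an arbitrary correlated pair, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# An arbitrary correlated pair.
x = rng.normal(0.0, 1.0, size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)

# Sample covariance C[X, Y].
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# V[X + Y] versus V[X] + V[Y] + 2 C[X, Y]; the two numbers match.
print(np.var(x + y))
print(np.var(x) + np.var(y) + 2.0 * cov_xy)
```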

The correlation coefficient#

The covariance of two random variables is not a very useful correlation measure because it depends on the units of the random variables. For example, the covariance between a person’s height and weight takes different numerical values depending on whether you measure height in meters or in feet. To solve this problem, we define the correlation coefficient:

\[ \rho_{X,Y} = \frac{\mathbb{C}[X,Y]}{\sqrt{\mathbb{V}[X]\mathbb{V}[Y]}}. \]

The correlation coefficient is a number between \(-1\) and \(1\). The two random variables are positively correlated if it is close to \(1\). The two random variables are negatively correlated if it is close to \(-1\). The two random variables are not correlated if it is close to \(0\). The correlation coefficient is unitless and a more useful measure of correlation.
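A quick sketch of the unit-invariance (the heights and weights below are synthetic, made up just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic heights (meters) and weights (kg) with some positive correlation.
height_m = rng.normal(1.75, 0.1, size=n)
weight_kg = 60.0 + 40.0 * (height_m - 1.6) + rng.normal(0.0, 5.0, size=n)
height_ft = height_m * 3.28084   # same quantity, different units

# The covariance changes with the units...
print(np.cov(height_m, weight_kg)[0, 1], np.cov(height_ft, weight_kg)[0, 1])
# ...but the correlation coefficient does not.
print(np.corrcoef(height_m, weight_kg)[0, 1], np.corrcoef(height_ft, weight_kg)[0, 1])
```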

Independent random variables#

Take two random variables, \(X\) and \(Y\). We say that the two random variables are independent given the background, and we write:

\[ X\perp Y, \]

if and only if conditioning on one does not tell you anything about the other, i.e.,

\[ p(x|y) = p(x). \]

It is easy to show using Bayes’ rule that the definition is consistent, i.e., you also get:

\[ p(y|x) = p(y). \]

Properties of independent random variables#

The joint pmf factorizes:

\[ p(x,y) = p(x)p(y). \]

This property is typically taken as the definition of independence in standard probability theory. But we have shown that it is a consequence of the above definition.
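For a discrete example, the factorization is easy to check directly. Here is a sketch; the marginals are made up, and the joint is built as their outer product so that it factorizes by construction:

```python
import numpy as np

# Made-up marginals of X and Y, and a joint built as their product.
p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.4, 0.6])
p_xy = np.outer(p_x, p_y)   # p(x, y) = p(x) p(y)

# Independence check: does the joint equal the product of its own marginals?
print(np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))))   # True
```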

Another useful property is that the expectation of the product of two independent random variables is the product of the expectations:

\[ \mathbb{E}[XY] = \mathbb{E}[X]\cdot \mathbb{E}[Y]. \]

A critical property worth remembering is that the covariance of two independent random variables is zero:

\[ \mathbb{C}[X,Y] = 0. \]

Be careful: the reverse is not true! Two random variables can be uncorrelated and still be dependent. We will see an example of this in the homework.

A final property is that the variance of the sum of two independent random variables is the sum of the variances:

\[ \mathbb{V}[X+Y] = \mathbb{V}[X] + \mathbb{V}[Y]. \]
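The last three properties are easy to verify with samples from two independent random variables (a minimal sketch; the particular distributions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent random variables (distributions chosen just for illustration).
x = rng.normal(2.0, 1.0, size=n)   # E[X] = 2, V[X] = 1
y = rng.exponential(3.0, size=n)   # E[Y] = 3, V[Y] = 9

print(np.mean(x * y), np.mean(x) * np.mean(y))   # E[XY] vs E[X] E[Y]
print(np.cov(x, y)[0, 1])                        # covariance close to zero
print(np.var(x + y), np.var(x) + np.var(y))      # V[X + Y] vs V[X] + V[Y]
```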