import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");

Continuous Random Variables#

Consider a random variable \(X\) that can take values in \(\mathbb{R}\). We say that \(X\) is a continuous random variable if the range of \(X\) is uncountable, e.g., if it forms an interval. For example, \(X\) can be the mass of a ball bearing, a room’s temperature, or a gas’s pressure. In what follows, we introduce the concepts of cumulative distribution function, probability density function, and expectation for continuous random variables.

The cumulative distribution function#

The cumulative distribution function (CDF) of a continuous random variable \(X\) is defined as:

\[ F_X(x) := p(X\le x) = p\left(\left\{\omega: X(\omega) \le x\right\}\right). \]

In words, it is the probability that \(X\) is less than or equal to \(x\).

# Plot a sigmoid function
import numpy as np
x = np.linspace(-10, 10, 100)
y = 1 / (1 + np.exp(-x))
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel('$F_X(x)$')
sns.despine(trim=True)
plt.title("CDF of a generic random variable $X$");

The CDF has the following properties:

  • \(F_X(x)\) is a non-decreasing function. Intuitively, this is because as \(x\) increases, the event \(\left\{X \le x\right\}\) grows, so its probability cannot decrease.

  • \(F_X(-\infty) = 0\). Intuitively, this is because \(X\) cannot be less than \(-\infty\).

  • \(F_X(+\infty) = 1\). Intuitively, this is because \(X\) is always less than or equal to \(+\infty\).

  • \(p(a\le X \le b) = F_X(b) - F_X(a)\) for \(a \le b\).
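The properties above are easy to check numerically. The following sketch uses a standard normal random variable as a concrete example (an arbitrary choice for illustration; any continuous distribution would do):

```python
# Numerical check of the CDF properties for a standard normal X
import numpy as np
from scipy.stats import norm

F = norm.cdf  # CDF of a standard normal

# Monotonicity: F(x1) <= F(x2) whenever x1 <= x2
assert F(-1.0) <= F(0.0) <= F(1.0)

# Limits at minus and plus infinity
print(F(-np.inf))  # 0.0
print(F(np.inf))   # 1.0

# p(a <= X <= b) = F(b) - F(a)
a, b = -1.0, 1.0
print(F(b) - F(a))  # about 0.6827 for a standard normal
```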

Note

If there is no ambiguity, we will write \(F(x)\) instead of \(F_X(x)\).

(pdf)=

The probability density function#

The probability density function (PDF) is a “function” \(f_X(x)\) that can give us the probability that \(X\) is in any “good” subset \(A\) of \(\mathbb{R}\) as follows:

\[ p(X\in A) = \int_A f_X(x) dx. \]

Note

If there is no ambiguity, we will write \(p(x)\) instead of \(f_X(x)\). Here, when you see \(p(x)\), you should understand that a random variable \(X\) is implicit and that \(p(x)\) is the PDF of \(X\) evaluated at \(x\). As with the PMF, this is a common abuse of notation, especially in machine learning research papers. Again, you cannot write \(p(0.5)\) because you need to know what random variable you are referring to. But you can write \(p(x=0.5)\) to remove the ambiguity.
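Here is the defining identity \(p(X\in A) = \int_A f_X(x)\,dx\) in action, taking \(X\) to be standard normal and \(A = [0, 2]\) (both choices are assumptions for illustration):

```python
# p(X in A) by integrating the PDF over A, versus the CDF formula
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

a, b = 0.0, 2.0

# Integrate the PDF over A = [a, b]
prob_from_pdf, _ = quad(norm.pdf, a, b)

# Compare with p(a <= X <= b) = F(b) - F(a)
prob_from_cdf = norm.cdf(b) - norm.cdf(a)

print(prob_from_pdf, prob_from_cdf)  # the two agree
```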

# Plot the pdf of a normal random variable
from scipy.stats import norm
x = np.linspace(-5, 5, 100)
y = norm.pdf(x)
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
sns.despine(trim=True)
plt.title("PDF of a typical random variable $X$");

The PDF has the following properties:

  • \(p(x) \ge 0\) for all \(x\). This holds because probabilities are non-negative.

  • \(\int_{-\infty}^{\infty} p(x) dx = 1\). It holds because the probability that \(X\) takes any value is one.

  • The derivative of the CDF is the PDF, i.e., \(F_X'(x) = p(x)\). It is a consequence of the fundamental theorem of calculus.
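You can verify the last property with a centered finite difference, again using a standard normal for concreteness (an assumption for illustration):

```python
# Check F_X'(x) = f_X(x) numerically at one point
import numpy as np
from scipy.stats import norm

x = 0.7
h = 1e-5
# Centered finite-difference approximation of the derivative of the CDF
numerical_derivative = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
print(numerical_derivative, norm.pdf(x))  # very close
```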

Note

The PDF is not a probability but a probability density. It can be greater than one. It is the area under the PDF that is a probability. And that area is one.
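To see a density exceeding one, consider a normal with a small standard deviation (the value 0.1 is an arbitrary choice for illustration). Its peak is well above one, yet the total area is still one:

```python
# A density can exceed one, but it still integrates to one
from scipy.stats import norm
from scipy.integrate import quad

peak = norm.pdf(0.0, scale=0.1)
print(peak)  # about 3.99 -- greater than one

# Integrate over [-5, 5], which contains essentially all the mass
total, _ = quad(lambda t: norm.pdf(t, scale=0.1), -5.0, 5.0)
print(total)  # 1.0 (up to numerical error)
```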

Expectations of continuous random variables#

The expectation of a continuous random variable is:

\[ \mathbb{E}[X] = \int_{-\infty}^\infty x p(x)dx. \]

Geometrically, the expected value of \(X\) is the x coordinate of the centroid of the area under the curve \(p(x)\). You can think of it as the value you “expect” \(X\) to take.
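The integral in the definition can be evaluated numerically. A minimal sketch, assuming a standard normal \(X\) just to make it concrete (the answer should be zero):

```python
# E[X] = integral of x * p(x) dx, computed with quadrature
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mean, _ = quad(lambda t: t * norm.pdf(t), -np.inf, np.inf)
print(mean)  # close to zero
```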

# Plot the pdf of a normal random variable and mark the expected value
from scipy.stats import norm
x = np.linspace(-5, 5, 100)
y = norm.pdf(x)
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
plt.axvline(x=0, color='r', linestyle='--')
sns.despine(trim=True)
plt.title("PDF of a typical random variable $X$ with $E[X]=0$");

But remember that the expected value may not be a value that \(X\) can take. Here is an example.

# Plot the pdf of a mixture of two normal random variables with different means but the same variance
from scipy.stats import norm
x = np.linspace(-10, 10, 200)
y = 0.5 * norm.pdf(x, loc=-5) + 0.5 * norm.pdf(x, loc=5)
plt.plot(x, y)
# Add the expected value
plt.axvline(x=0, color='r', linestyle='--', label=r'$\mathbb{E}[X]$')
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
plt.legend(loc='best', frameon=False)
sns.despine(trim=True)
plt.title("Example of a case in which the expected value is not representative of the distribution");

Another useful formula is:

\[ \mathbb{E}[f(X)] = \int_{-\infty}^\infty f(x)p(x)dx. \]

It gives you the expected value of \(f(X)\), where \(f\) is any function.
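For example, with \(f(x) = x^2\) and a standard normal \(X\) (both choices made just for illustration), the formula gives the second moment, which should equal one:

```python
# E[f(X)] = integral of f(x) * p(x) dx, here with f(x) = x^2
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

second_moment, _ = quad(lambda t: t**2 * norm.pdf(t), -np.inf, np.inf)
print(second_moment)  # 1.0 for a standard normal
```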

Note

The expected value of a continuous random variable has the same properties as the expected value of a discrete random variable, see this. The proofs are similar. You just change the sums to integrals.

Variance of continuous random variables#

The variance of a continuous random variable is:

\[ \mathbb{V}[X] = \mathbb{E}\left[\left(X-\mathbb{E}[X]\right)^2\right] = \int_{-\infty}^\infty (x-\mathbb{E}[X])^2 p(x)dx. \]

You can think of it as the average squared distance of \(X\) from its expected value. It tells you how spread out \(X\) is.

# Plot two random variables with the same expected value but different variances
# Mark them as small and large variance using text labels
from scipy.stats import norm
x = np.linspace(-10, 10, 200)
y = norm.pdf(x, loc=0, scale=1)
plt.plot(x, y, label='Small variance')
y = norm.pdf(x, loc=0, scale=3)
plt.plot(x, y, label='Large variance')
# Add the expected value
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);

The variance has units of \(X\) squared. To get a measure of the spread that has the same units as \(X\), we take the square root of the variance. It is called the standard deviation of \(X\) and it is denoted by \(\sigma_X\).

\[ \sigma_X = \sqrt{\mathbb{V}[X]}. \]
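Both quantities can be computed directly from the definitions. A sketch, assuming a normal with standard deviation 3 (an arbitrary choice), so we expect \(\mathbb{V}[X] = 9\) and \(\sigma_X = 3\):

```python
# Variance and standard deviation from the defining integrals
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

scale = 3.0
pdf = lambda t: norm.pdf(t, loc=0.0, scale=scale)

mean, _ = quad(lambda t: t * pdf(t), -np.inf, np.inf)
var, _ = quad(lambda t: (t - mean)**2 * pdf(t), -np.inf, np.inf)
sigma = np.sqrt(var)

print(var, sigma)  # 9.0 and 3.0 (up to numerical error)
```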

Finally, note that the variance of a continuous random variable has the same properties as the variance of a discrete random variable, see this.