import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");
Continuous Random Variables
Consider a random variable \(X\) that takes values in \(\mathbb{R}\). Informally, we say that \(X\) is a continuous random variable if its range is uncountable, e.g., an interval. For example, \(X\) can be the mass of a ball bearing, the temperature of a room, or the pressure of a gas. In what follows, we introduce the cumulative distribution function, the probability density function, the expectation, and the variance of continuous random variables.
The cumulative distribution function
The cumulative distribution function (CDF) of a continuous random variable \(X\) is defined as:
\[F_X(x) := p(X \le x).\]
In words, it is the probability that \(X\) is less than or equal to \(x\).
# Plot the logistic sigmoid 1 / (1 + exp(-x)), which is the CDF of the standard logistic distribution
import numpy as np
x = np.linspace(-10, 10, 100)
y = 1 / (1 + np.exp(-x))
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel('$F_X(x)$')
sns.despine(trim=True)
plt.title("CDF of a generic random variable $X$");
The CDF has the following properties:
\(F_X(x)\) is a non-decreasing function of \(x\). Intuitively, as \(x\) increases, the event \(\{X \le x\}\) can only grow, so its probability cannot decrease.
\(\lim_{x\to-\infty} F_X(x) = 0\), which we write as \(F_X(-\infty) = 0\). Intuitively, this is because \(X\) cannot be less than \(-\infty\).
\(\lim_{x\to+\infty} F_X(x) = 1\), which we write as \(F_X(+\infty) = 1\). Intuitively, this is because \(X\) is always less than or equal to \(+\infty\).
\(p(a\le X \le b) = F_X(b) - F_X(a)\).
Proof
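Let \(a < b\). The event \(\{X \le b\}\) is the disjoint union of the events \(\{X \le a\}\) and \(\{a < X \le b\}\). Therefore:
\[F_X(b) = p(X \le b) = p(X \le a) + p(a < X \le b) = F_X(a) + p(a < X \le b).\]
For a continuous random variable, \(p(X = a) = 0\). Hence \(p(a \le X \le b) = p(a < X \le b) = F_X(b) - F_X(a)\).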
Note
If there is no ambiguity, we will write \(F(x)\) instead of \(F_X(x)\).
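The four CDF properties listed above can be checked numerically. The sketch below assumes SciPy is available and uses a standard normal as an illustrative choice of \(X\); the text does not prescribe that distribution. Plus and minus infinity are approximated by \(\pm 10\). The interval probability \(F_X(b) - F_X(a)\) is compared against the fraction of random samples that land in \([a, b]\).

```python
import numpy as np
from scipy.stats import norm

# Illustrative assumption: X is a standard normal random variable
X = norm(loc=0.0, scale=1.0)

xs = np.linspace(-10, 10, 1001)
F = X.cdf(xs)

# Property 1: F is non-decreasing
is_non_decreasing = bool(np.all(np.diff(F) >= 0))

# Properties 2 and 3: the limits at -inf and +inf, approximated at x = -10 and x = +10
left_tail, right_tail = F[0], F[-1]

# Property 4: p(a <= X <= b) = F(b) - F(a), compared with a Monte Carlo estimate
a, b = -1.0, 2.0
exact = X.cdf(b) - X.cdf(a)
rng = np.random.default_rng(0)
samples = X.rvs(size=100_000, random_state=rng)
mc = np.mean((samples >= a) & (samples <= b))

print(is_non_decreasing, left_tail, right_tail)
print(f"F(b) - F(a) = {exact:.4f}, Monte Carlo = {mc:.4f}")
```

The Monte Carlo estimate should agree with \(F_X(b) - F_X(a)\) to about two decimal places with this many samples.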
The probability density function
The probability density function (PDF) is a “function” \(f_X(x)\) that gives the probability that \(X\) lies in any “good” subset \(A\) of \(\mathbb{R}\) as follows:
\[p(X \in A) = \int_A f_X(x)\,dx.\]
Again, it is not that simple
First, the “good” subsets are the so-called Borel sets of \(\mathbb{R}\). These are all the sets obtained by starting from the open intervals of \(\mathbb{R}\) and repeatedly applying countable unions, countable intersections, and complements. It is hard to construct a set that is not a Borel set. You will learn more about this in a measure theory course.
Second, not all random variables have a PDF that is a function in the usual sense. However, if you allow the PDF to include Dirac’s \(\delta\), many more random variables have a PDF, including all discrete random variables. For example, if \(X\) takes the value \(1\) with probability \(1/2\) and the value \(2\) with probability \(1/2\), then the PDF of \(X\) is:
\[f_X(x) = \frac{1}{2}\delta(x - 1) + \frac{1}{2}\delta(x - 2),\]
where \(\delta(x)\) is Dirac’s delta function.
Note
If there is no ambiguity, we will write \(p(x)\) instead of \(f_X(x)\). When you see \(p(x)\), understand that a random variable \(X\) is implicit and that \(p(x)\) is the PDF of \(X\) evaluated at \(x\). As with the PMF, this is a common abuse of notation, especially in machine learning research papers. Note that \(p(0.5)\) on its own is ambiguous, because it does not say which random variable you are referring to. Writing \(p(x=0.5)\) removes the ambiguity.
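To make the definition \(p(X \in A) = \int_A f_X(x)\,dx\) concrete, here is a small sketch. It assumes SciPy and a standard normal \(X\), and it takes \(A\) to be the union of two disjoint intervals. That set is chosen only for illustration; any union of intervals is a Borel set. The integral over \(A\) splits into one integral per interval, and each piece also equals a difference of CDF values.

```python
from scipy.integrate import quad
from scipy.stats import norm

# A "good" (Borel) set A chosen for illustration: [-1, 0.5] union [1, 2]
A = [(-1.0, 0.5), (1.0, 2.0)]

# p(X in A) = integral of the PDF over A = sum of the integrals over the disjoint pieces
prob_integral = sum(quad(norm.pdf, lo, hi)[0] for lo, hi in A)

# The same probability via the CDF: sum of F(hi) - F(lo) over the pieces
prob_cdf = sum(norm.cdf(hi) - norm.cdf(lo) for lo, hi in A)

print(prob_integral, prob_cdf)
```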
# Plot the pdf of a normal random variable
from scipy.stats import norm
x = np.linspace(-5, 5, 100)
y = norm.pdf(x)
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
sns.despine(trim=True)
plt.title("PDF of a typical variable $X$");
The PDF has the following properties:
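\(p(x) \ge 0\) for all \(x\). This inequality holds because probabilities are non-negative. If \(p(x)\) were negative on some interval, the probability that \(X\) lies in that interval would be negative.
\(\int_{-\infty}^{\infty} p(x)\,dx = 1\). This holds because \(X\) is certain to take some value in \(\mathbb{R}\), i.e., \(p(X \in \mathbb{R}) = 1\).
The derivative of the CDF is the PDF, i.e., \(F_X'(x) = p(x)\) at every \(x\) where \(p\) is continuous. This is a consequence of the fundamental theorem of calculus.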
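Proof
Let us write \(f_X(x)\) instead of \(p(x)\) to avoid ambiguity. Taking \(A = (-\infty, x]\) in the definition of the PDF, we have:
\[F_X(x) = p(X \le x) = \int_{-\infty}^{x} f_X(u)\,du.\]
Differentiating both sides with respect to \(x\) and applying the fundamental theorem of calculus gives \(F_X'(x) = f_X(x)\).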
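The second and third properties can be checked numerically. The sketch below assumes SciPy and, as an illustrative choice, a standard normal \(X\). It integrates the PDF over the whole real line. It then compares a central finite-difference derivative of the CDF with the PDF at a few points; the step size `h` is an arbitrary small value.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Property 2: the PDF integrates to one over the whole real line
total, _ = quad(norm.pdf, -np.inf, np.inf)

# Property 3: F'(x) = p(x), checked with a central finite difference of the CDF
x = np.linspace(-3, 3, 7)
h = 1e-5  # small step, chosen for illustration
dF = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
max_err = np.max(np.abs(dF - norm.pdf(x)))

print(total, max_err)
```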
Note
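The PDF is not a probability but a probability density, so it can be greater than one. For example, the PDF of a random variable uniformly distributed on \([0, 0.5]\) equals \(2\) on that interval. Only areas under the PDF are probabilities, and the total area under the PDF is one.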
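A quick check that a density can exceed one while the total area stays equal to one. This sketch assumes SciPy's `uniform` distribution, where `loc` is the left endpoint and `scale` is the width of the interval, so `uniform(loc=0.0, scale=0.5)` is uniform on \([0, 0.5]\).

```python
from scipy.integrate import quad
from scipy.stats import uniform

# Uniform distribution on [0, 0.5]: loc is the left endpoint, scale is the width
U = uniform(loc=0.0, scale=0.5)

peak = U.pdf(0.25)               # density value inside the interval (greater than one)
area, _ = quad(U.pdf, 0.0, 0.5)  # total area under the PDF (equal to one)

print(peak, area)
```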
Expectations of continuous random variables
The expectation of a continuous random variable is:
\[\mathbb{E}[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx.\]
Geometrically, the expected value of \(X\) is the \(x\) coordinate of the centroid of the area under the curve \(p(x)\). Informally, it is the value you “expect” \(X\) to take on average.
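A minimal sketch of the integral definition, assuming SciPy. The choice of \(X\) as a normal with mean \(1.5\) and standard deviation \(2\) is only an example. The integral \(\int x\,p(x)\,dx\) is compared with SciPy's closed-form mean and with the average of many independent samples, which is the “on average” reading of the expectation.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative assumption: X is normal with mean 1.5 and standard deviation 2
X = norm(loc=1.5, scale=2.0)

# E[X] = integral of x p(x) dx over the real line
EX, _ = quad(lambda x: x * X.pdf(x), -np.inf, np.inf)

# Monte Carlo estimate: the average of many independent samples of X
rng = np.random.default_rng(1)
EX_mc = X.rvs(size=200_000, random_state=rng).mean()

print(EX, X.mean(), EX_mc)
```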
# Plot the pdf of a normal random variable and mark the expected value
from scipy.stats import norm
x = np.linspace(-5, 5, 100)
y = norm.pdf(x)
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
plt.axvline(x=0, color='r', linestyle='--')
sns.despine(trim=True)
plt.title("PDF of a typical variable $X$ with $E[X]=0$");
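But remember that the expected value need not be a likely value of \(X\), or even a value that \(X\) can take at all. Here is an example in which \(X\) is very unlikely to be near its expected value.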
# Plot the pdf of a mixture of two normal random variables with different means but the same variance
from scipy.stats import norm
x = np.linspace(-10, 10, 200)
y = 0.5 * norm.pdf(x, loc=-5) + 0.5 * norm.pdf(x, loc=5)
plt.plot(x, y)
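# Add the expected value (zero by symmetry); a raw string keeps "\m" from being read as an escape sequence
plt.axvline(x=0, color='r', linestyle='--', label=r'$\mathbb{E}[X]$')
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
# Show the label attached to the vertical line
plt.legend(loc="best", frameon=False)
sns.despine(trim=True)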
plt.title("Example of a case in which the expected value is not representative of the distribution");
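To quantify what the plot shows, the sketch below (assuming SciPy) uses the same equal-weight mixture of normals centered at \(-5\) and \(5\) with unit standard deviation. It computes \(\mathbb{E}[X]\) from the integral definition and compares the density at the expected value with the density at one of the modes.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Equal-weight mixture of N(-5, 1) and N(5, 1), matching the plotted example
def p(x):
    return 0.5 * norm.pdf(x, loc=-5) + 0.5 * norm.pdf(x, loc=5)

# E[X] = integral of x p(x) dx; it vanishes because p is symmetric about zero
EX, _ = quad(lambda x: x * p(x), -np.inf, np.inf)

print(f"E[X] = {EX:.6f}")
print(f"p(0) = {p(0.0):.2e}, p(5) = {p(5.0):.3f}")
```

The density at \(x = 0\) is several orders of magnitude smaller than at the modes, so a draw of \(X\) almost never lands near \(\mathbb{E}[X]\).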
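Another useful formula is:
\[\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\,p(x)\,dx.\]
It gives you the expected value of \(g(X)\), where \(g\) is any function, without first having to find the distribution of \(g(X)\).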
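As a sketch of this formula, assuming SciPy, take \(X\) to be a standard normal and the function to be \(\cos\). Both are illustrative choices; the function is called `g` here to avoid clashing with the PDF \(f_X\). For this choice the exact answer is known: \(\mathbb{E}[\cos X] = e^{-1/2}\). The integral and a Monte Carlo average can therefore both be checked against it.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative choice: X ~ N(0, 1) and g(x) = cos(x)
g = np.cos

# E[g(X)] = integral of g(x) p(x) dx; the distribution of g(X) is never needed
E_gX, _ = quad(lambda x: g(x) * norm.pdf(x), -np.inf, np.inf)

# Monte Carlo check: average g over independent samples of X
rng = np.random.default_rng(2)
E_gX_mc = g(norm.rvs(size=200_000, random_state=rng)).mean()

# For this particular choice the exact value is exp(-1/2)
print(E_gX, E_gX_mc, np.exp(-0.5))
```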
Note
The expected value of a continuous random variable has the same properties as the expected value of a discrete random variable (see the corresponding section on discrete random variables). The proofs are similar: you just replace sums with integrals.
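For instance, linearity, \(\mathbb{E}[aX + b] = a\,\mathbb{E}[X] + b\), carries over with integrals in place of sums. The sketch below checks it numerically under illustrative assumptions: SciPy, \(X\) normal with mean \(1\) and standard deviation \(2\), and \(a = 3\), \(b = -4\). The helper `expect` is a hypothetical name introduced here. It simply evaluates \(\int g(x)\,p(x)\,dx\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative assumptions: X ~ N(1, 2^2), a = 3, b = -4
X = norm(loc=1.0, scale=2.0)
a, b = 3.0, -4.0

def expect(g):
    """Hypothetical helper: E[g(X)] = integral of g(x) p(x) dx."""
    value, _ = quad(lambda x: g(x) * X.pdf(x), -np.inf, np.inf)
    return value

lhs = expect(lambda x: a * x + b)  # E[aX + b]
rhs = a * expect(lambda x: x) + b  # a E[X] + b

print(lhs, rhs)
```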
Variance of continuous random variables
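The variance of a continuous random variable is:
\[\mathbb{V}[X] = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \int_{-\infty}^{\infty} (x - \mathbb{E}[X])^2\,p(x)\,dx.\]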
You can think of it as the average squared distance of \(X\) from its expected value. It tells you how spread out \(X\) is.
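A minimal sketch of the variance integral, assuming SciPy. It computes \(\mathbb{V}[X]\) from the definition for two zero-mean normals with standard deviations \(1\) and \(3\), the same two densities that are plotted next. The result is compared with SciPy's closed-form variance. In SciPy, `scale` is the standard deviation, so the variances should be \(1\) and \(9\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def variance(pdf):
    """V[X] = integral of (x - E[X])^2 p(x) dx, computed numerically from a PDF."""
    mean, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)
    var, _ = quad(lambda x: (x - mean) ** 2 * pdf(x), -np.inf, np.inf)
    return var

# In scipy.stats.norm, `scale` is the standard deviation, so the variance is scale**2
for scale in (1.0, 3.0):
    X = norm(loc=0.0, scale=scale)
    print(scale, variance(X.pdf), X.var())
```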
# Plot two random variables with the same expected value but different variances
# Mark them as small and large variance using text labels
from scipy.stats import norm
x = np.linspace(-10, 10, 200)
y = norm.pdf(x, loc=0, scale=1)
plt.plot(x, y, label='Small variance')
y = norm.pdf(x, loc=0, scale=3)
plt.plot(x, y, label='Large variance')
# Mark the common expected value, E[X] = 0
plt.axvline(x=0, color='r', linestyle='--')
plt.xlabel('$x$')
plt.ylabel('$f_X(x)$')
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);
The variance has the units of \(X\) squared. To get a measure of spread with the same units as \(X\), we take the square root of the variance. This is called the standard deviation of \(X\) and is denoted by \(\sigma_X\):
\[\sigma_X = \sqrt{\mathbb{V}[X]}.\]
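To see why the units matter, consider a hypothetical length \(X\) measured in meters, modeled as a normal with mean \(2\) m and standard deviation \(0.05\) m. That model is an illustrative assumption, and SciPy is assumed as well. Converting to centimeters multiplies \(X\) by \(100\). The variance then scales by \(100^2\), while the standard deviation scales by \(100\), just like \(X\) itself.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical length X in meters: normal with mean 2.0 m and standard deviation 0.05 m
X_m = norm(loc=2.0, scale=0.05)
var_m2 = X_m.var()         # units: m^2
sigma_m = np.sqrt(var_m2)  # units: m, the same as X

# The same length in centimeters is 100 * X
X_cm = norm(loc=200.0, scale=5.0)

# The variance scales by 100**2 and the standard deviation by 100
print(var_m2, sigma_m, X_cm.var(), X_cm.std())
```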
Finally, note that the variance of a continuous random variable has the same properties as the variance of a discrete random variable (see the corresponding section on discrete random variables).
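Two standard properties of the variance are \(\mathbb{V}[X] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2\) and \(\mathbb{V}[aX + b] = a^2\,\mathbb{V}[X]\). The sketch below checks both numerically with integrals, under illustrative assumptions: SciPy, \(X\) normal with mean \(1\) and standard deviation \(2\) (so \(\mathbb{V}[X] = 4\)), and \(a = 3\), \(b = -4\). The helper `expect` is a hypothetical name introduced here.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative assumptions: X ~ N(1, 2^2), a = 3, b = -4
X = norm(loc=1.0, scale=2.0)
a, b = 3.0, -4.0

def expect(g):
    """Hypothetical helper: E[g(X)] = integral of g(x) p(x) dx."""
    value, _ = quad(lambda x: g(x) * X.pdf(x), -np.inf, np.inf)
    return value

mean = expect(lambda x: x)
var_def = expect(lambda x: (x - mean) ** 2)                       # V[X] from the definition
var_shortcut = expect(lambda x: x ** 2) - mean ** 2               # E[X^2] - (E[X])^2
var_affine = expect(lambda x: (a * x + b - (a * mean + b)) ** 2)  # V[aX + b]

print(var_def, var_shortcut, var_affine, a ** 2 * var_def)
```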