import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");

The Gaussian Distribution#

The Normal distribution#

The normal (or Gaussian) distribution is ubiquitous: it appears over and over again. There are two explanations as to why it appears so often:

  • It is the distribution of maximum uncertainty that matches a known expectation and a known variance.

  • It is the distribution that arises when you add a lot of random variables together.

We will learn about both of these in subsequent lectures, but the second point is easy to preview numerically (see the sketch below).
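Here is a minimal sketch of that second point, assuming only numpy: we add up many independent uniform random variables, standardize the sum, and check that the result has roughly zero mean and unit variance (the number of terms and the sample size are arbitrary choices):

import numpy as np

# Sum n independent Uniform(0, 1) random variables, 10,000 times.
# A Uniform(0, 1) has expectation 1/2 and variance 1/12.
n = 50
sums = np.random.rand(10000, n).sum(axis=1)
# Standardize: subtract the expectation of the sum, divide by its standard deviation.
standardized_sums = (sums - n * 0.5) / np.sqrt(n / 12.0)
print(f"sample mean = {standardized_sums.mean():.2f}, sample variance = {standardized_sums.var():.2f}")

A histogram of these standardized sums would look very close to the bell curve we are about to meet.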

We write:

\[ X | \mu, \sigma \sim N(\mu, \sigma^2), \]

and we read “\(X\) conditioned on \(\mu\) and \(\sigma\) follows a normal distribution with expected value \(\mu\) and variance \(\sigma^2\).”

Note

First, a lot of people write \(N(\mu, \sigma)\) instead of \(N(\mu, \sigma^2)\), parameterizing the normal by its standard deviation rather than its variance. You should always check what the author means.

Second, many people refer to \(\mu\) as the mean instead of the expected value. We will use the terms “mean” and “expected value” interchangeably when dealing with Gaussian distributions.
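This convention issue shows up in code, too: scipy.stats, which we use below, takes the standard deviation \(\sigma\) as its scale argument, not the variance:

import scipy.stats as st

# scipy.stats parameterizes the normal by loc=mu and scale=sigma (not sigma^2).
X = st.norm(loc=0.0, scale=2.0)
print(X.std())  # prints 2.0, the standard deviation sigma
print(X.var())  # prints 4.0, the variance sigma^2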

The standard normal distribution#

When we have zero mean and unit variance, we say that we have a standard normal distribution. If \(Z\) is a standard normal random variable, we write:

\[ Z\sim N(0,1). \]

The PDF of the standard normal is typically denoted by \(\phi(z)\) and is given by:

\[ \phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{z^2}{2}\right\}. \]

The CDF of the standard normal is denoted by \(\Phi(z)\) and is given by:

\[ \Phi(z) := p(Z \le z) = \int_{-\infty}^z \phi(z')dz'. \]

The CDF of the standard normal is not available in closed form. In the old days, we used to look it up in a table. Nowadays, we use a computer to calculate it.
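For instance, the computer can evaluate it through the error function, using the identity \(\Phi(z) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{z}{\sqrt{2}}\right)\right]\). Here is a minimal sketch using only Python's standard library:

import math

# The standard normal CDF written in terms of the error function.
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"Phi(0.00) = {Phi(0.0):.3f}")   # 0.500, by symmetry
print(f"Phi(1.96) = {Phi(1.96):.3f}")  # about 0.975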

Here is how you can get the PDF of the standard normal. First, let’s make a standard normal random variable in scipy.stats:

import scipy.stats as st
Z = st.norm()

You can evaluate it anywhere you want:

print(f"phi(0.5) = {Z.pdf(0.5):.2f}")
phi(0.5) = 0.35
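As a sanity check, evaluating the formula for \(\phi(z)\) directly gives the same number:

import numpy as np

# phi(0.5) straight from the formula above.
z = 0.5
print(f"phi(0.5) = {np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi):.2f}")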

Let’s plot the PDF:

fig, ax = plt.subplots()
import numpy as np
zs = np.linspace(-4.0, 4.0, 100)
ax.plot(zs, Z.pdf(zs))
ax.set_xlabel("$z$")
ax.set_ylabel("$\phi(z)$")
sns.despine(trim=True);

And here is the CDF:

fig, ax = plt.subplots()
ax.plot(zs, Z.cdf(zs))
ax.set_xlabel("$z$")
ax.set_ylabel("$\Phi(z)$")
sns.despine(trim=True);

Here is the expectation:

print(f"E[Z] = {Z.expect():.2f}")
E[Z] = -0.00

And the variance:

print(f"V[Z] = {Z.var():.2f}")
V[Z] = 1.00

Here is the probability that Z is between two numbers:

a = 1.0
b = 3.0
prob_Z_in_ab = Z.cdf(b) - Z.cdf(a)
print(f"p({a:.2f} <= Z <= {b:.2f}) = {prob_Z_in_ab:.2f}")
p(1.00 <= Z <= 3.00) = 0.16

And here is how you can sample:

Z.rvs(100)
array([-0.68098751, -0.08219633,  0.20786469, -0.50730995,  0.45700647,
       -0.2559003 ,  0.65936334, -0.23402872,  1.09875376, -0.65752522,
        0.87598653, -0.0461331 ,  0.50262267,  0.70764474, -0.98254677,
       -1.41454882, -0.11455164,  0.75359663, -1.02398395, -1.86342273,
       -0.22218493, -0.57929841,  1.61763382, -0.3512537 , -0.58212603,
       -1.42276009,  1.30183894, -0.79181306,  0.87481463,  0.26491516,
       -0.99758581,  0.7516001 , -1.35293929,  0.14887563, -1.81354551,
       -1.03881138, -0.66440162, -1.29772731, -0.03500321,  0.1314219 ,
        0.45940001, -1.50587312,  0.6103862 ,  0.66813906, -0.31945307,
        1.2086607 , -0.04778578, -0.50059914, -0.13445603, -0.86555327,
        0.91601074, -0.97407919, -1.42836505, -0.18455424, -1.26131747,
       -0.19964524,  1.65131343,  0.23724141,  0.20041386,  0.53681951,
       -1.67482189, -1.65050958,  1.14247417,  1.08756538, -0.07532283,
       -1.47098224,  1.24651781, -0.74154696,  0.05659722, -0.27788922,
        0.89646394,  1.07632242,  0.92632264, -0.5623226 ,  0.10293485,
        0.07106044, -0.14816709, -0.55905882,  0.22567311,  0.13283325,
       -1.43633922, -0.40999822,  0.72885276,  0.67071962, -0.45190328,
       -0.819717  ,  1.46093725, -0.55625844, -0.67173311,  0.27282091,
        0.14917277,  0.66093895,  0.49070751,  0.10162476, -2.40113228,
        0.24445444,  1.43220066,  1.47577485, -0.57539355, -1.90800178])

And, of course, you can also sample using the functionality of numpy:

np.random.randn(100)
array([ 0.17675988,  2.08604093, -2.32928242, -0.31336707, -0.63508336,
        0.67371205,  1.90808253, -0.02121881,  0.5099677 , -0.77540186,
       -0.64627549, -1.54157974,  0.26082948,  2.69541908,  0.2675784 ,
       -1.06147151, -1.64492616,  0.68948065,  1.03904888,  0.25079342,
       -0.56337513,  0.38106535,  0.05408547, -0.397527  ,  0.20617051,
        2.06732106, -0.77393155, -0.5615434 ,  1.70336458,  1.27549767,
        2.144291  ,  0.00601306,  0.33266612, -1.13324241, -1.93575747,
       -0.60570986, -0.8859008 ,  1.07228981,  0.60790884, -0.50623704,
       -0.71449871,  1.0671928 , -1.22584563, -1.37429395, -1.38428782,
       -1.19234723, -0.43768159, -0.35739968, -1.15151434, -1.07603713,
        0.42115532,  0.90580295,  0.2056319 ,  0.71246248, -0.554142  ,
        1.59157731, -1.97265792,  0.69586134, -0.69544253,  0.94968473,
        0.12604349, -0.03078179, -1.12016362,  0.39424782,  0.05911653,
       -0.54681854,  1.37700422, -0.35340388,  0.95453985,  0.40723134,
       -1.55549029, -0.42597788, -0.6278352 ,  0.33644935, -0.16735517,
       -1.53184807, -0.94499227,  1.3118843 ,  0.1585745 ,  0.78911817,
        0.4914747 , -0.90895756, -1.18857701, -0.14776013, -0.95868734,
       -1.91836248,  1.10461855,  1.59235448, -0.12523874,  0.70358221,
        0.57531609,  0.28489483, -1.61977487, -1.60483268, -0.97463031,
       -1.20620496, -1.22983379,  0.14890532, -0.14807777,  0.35014314])
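Note that np.random.randn belongs to numpy's legacy interface. Newer numpy code typically samples through a Generator object instead; a minimal sketch (the seed is an arbitrary choice):

import numpy as np

# The recommended modern way: create a Generator and ask it for standard normals.
rng = np.random.default_rng(12345)
z_samples = rng.standard_normal(100)
print(z_samples[:5])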

The general normal distribution#

Take

\[ X \sim N(\mu, \sigma^2). \]

The PDF is given by:

\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}. \]

Note

Some people write \(N(x|\mu, \sigma^2)\) to refer to the PDF of \(N(\mu, \sigma^2)\) evaluated at \(x\).

If you meditate a little bit on this expression, you will notice that you can connect it to the standard normal PDF:

\[ f_X(x) = \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right). \]
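Before we unpack what this means, here is a quick numerical check of the identity (the values of \(x\), \(\mu\), and \(\sigma\) below are arbitrary):

import numpy as np
import scipy.stats as st

# Compare the N(mu, sigma^2) PDF to the rescaled standard normal PDF.
mu, sigma, x = 1.0, 2.0, 0.3
lhs = st.norm(loc=mu, scale=sigma).pdf(x)
rhs = st.norm().pdf((x - mu) / sigma) / sigma
print(f"f_X(x)                  = {lhs:.6f}")
print(f"phi((x-mu)/sigma)/sigma = {rhs:.6f}")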

This highlights two things. First, the mean \(\mu\) shifts the distribution to the right or to the left.

# Plot the PDF of a standard normal and the PDF of a non-zero mean, unit variance normal
# with an arrow indicating the change from the first to the second.
fig, ax = plt.subplots()
zs = np.linspace(-4.0, 4.0, 100)
ax.plot(zs, Z.pdf(zs), label="$\phi(x)$")
ax.plot(zs, st.norm(loc=1.0).pdf(zs), label="$N(x|\mu=1,\sigma^2=1)$")
ax.set_xlabel("$x$")
ax.set_ylabel("$\phi(x)$")
ax.legend(loc="best", frameon=False)
ax.annotate("", xytext=(0.0, 0.4), xy=(1.0, 0.4), arrowprops=dict(arrowstyle="->",
            color=sns.color_palette()[2],
            lw=2))
plt.title("The mean translates the standard normal PDF")
sns.despine(trim=True);

The standard deviation \(\sigma\) scales the distribution.

# Plot the PDF of a standard normal and the pdf of a zero mean Normal with variance two
fig, ax = plt.subplots()
zs = np.linspace(-4.0, 4.0, 100)
ax.plot(zs, Z.pdf(zs), label="$\phi(x)$")
ax.plot(zs, st.norm(loc=0.0, scale=np.sqrt(2.0)).pdf(zs), label="$N(x|\mu=0,\sigma^2=2)$")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
ax.legend(loc="best", frameon=False)
plt.title("The variance scales the standard normal PDF")
sns.despine(trim=True);

Quantiles of the normal#

There are a few more exciting things to know about the standard normal. For example, how can you find a value \(z_q\) such that the probability of \(Z\) being less than \(z_q\) is \(q\)? Mathematically, you wish to find this:

\[ \Phi(z_q) = q. \]

The point \(z_q\) is called the \(q\)-quantile. To find it, you need to do this:

\[ z_q = \Phi^{-1}(q). \]

For example, \(z_{0.50}\) is called the median (and it coincides with the expectation here). Another exciting pair of quantiles is \(z_{0.025}\) and \(z_{0.975}\). Why? Because the probability that \(Z\) lies between them is \(95\)%. Here it is:

\[ p(z_{0.025} \le Z \le z_{0.975}) = \Phi(z_{0.975}) - \Phi(z_{0.025}) = 0.975 - 0.025 = 0.95. \]

Let’s find these quantiles and visualize them using the functionality of scipy.stats. We will use the percent point function (ppf), which is the inverse of the CDF:

z_025 = Z.ppf(0.025) 
z_500 = Z.ppf(0.5)
z_975 = Z.ppf(0.975)
print(f"0.025 quantile of Z = {z_025:.2f}")
print(f"0.50 quantile of Z = {z_500:.2f}")
print(f"0.975 quantile of Z = {z_975:.2f}")
0.025 quantile of Z = -1.96
0.50 quantile of Z = 0.00
0.975 quantile of Z = 1.96

Here is how much probability there is between the two extreme quantiles:

print(f"p({z_025:.2f} <= Z <= {z_975:.2f}) = {Z.cdf(z_975) - Z.cdf(z_025):.2f}")
p(-1.96 <= Z <= 1.96) = 0.95

Note

Observe that the quantiles are symmetric around the mean (which is zero here). The 0.025 quantile is at -1.96 and the 0.975 quantile is at 1.96. Being engineers, we are going to round the 1.96 up and call it a two.

Let’s also visualize the quantiles on top of the PDF:

fig, ax = plt.subplots()
ax.plot(zs, Z.pdf(zs))
ax.plot(z_025, [0.0], "o", label="0.025 quantile")
ax.plot(z_975, [0.0], "o", label="0.975 quantile")
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);

Note

Quantiles vs percentiles: A percentile is a quantile expressed as a percentage. So, the 0.025 quantile is the 2.5th percentile, the 0.975 quantile is the 97.5th percentile, and so on.
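To see the two notions agree, compare the 2.5th percentile of a large standard normal sample (via numpy) with the 0.025 quantile from scipy.stats (the sample size is an arbitrary choice):

import numpy as np
import scipy.stats as st

# The 2.5th percentile of many standard normal samples is close to the 0.025 quantile.
samples = np.random.randn(100000)
print(f"sample 2.5th percentile = {np.percentile(samples, 2.5):.2f}")  # about -1.96
print(f"0.025 quantile          = {st.norm().ppf(0.025):.2f}")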

Questions#

  • Modify the code above so that you find and visualize \(z_{0.001}\) and \(z_{0.999}\).

  • What is the difference between \(z_{0.999}\) and \(z_{0.001}\)?

  • What is the probability that \(Z\) is between \(z_{0.001}\) and \(z_{0.999}\)?

Getting any normal from the standard normal#

Knowledge of the quantiles of the standard normal is sufficient to give us the quantiles of any normal. Let \(Z\) be a standard normal. Take some \(\mu\) and \(\sigma\). Then, the random variable

\[ X = \mu + \sigma Z, \]

follows a \(N(\mu,\sigma^2)\) distribution.

To show this mathematically, we need to show that the PDF of \(X\) is the one we expect.

Proof

First, write down the CDF of \(X\):

\[ F_X(x) = p(X \le x) = p(\mu + \sigma Z \le x) = p\left(Z \le \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right). \]

Now take the derivative of the CDF to get the PDF:

\[ f_X(x) = F'_X(x) = \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right), \]

which is exactly what we wanted.

The formula is extremely useful. For example, you can use it to make samples from any normal distribution using samples from the standard normal. Here is how:

# Parameters of the general normal N(mu, sigma^2)
mu = 1.0
sigma = 0.1
X = st.norm(mu, sigma)
# Points at which to plot the PDF, and samples built as X = mu + sigma * Z
xs = np.linspace(mu - 6.0 * sigma, mu + 6.0 * sigma, 100)
x_samples = mu + sigma * Z.rvs(size=10000)

Compare the histogram of the samples with the PDF of the normal:

fig, ax = plt.subplots()
ax.hist(x_samples, density=True, alpha=0.5, label=r"Samples from $X = \mu + \sigma Z$")
ax.plot(xs, X.pdf(xs), label=r"PDF of $N(\mu, \sigma^2)$")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);

How can you find the quantiles of this normal? Well, you can simply use the functionality of scipy.stats. As an example, let’s find \(x_{0.025}\):

x_025 = X.ppf(0.025)
print(f"0.025-quantile of N({mu:.2f}, {sigma:.2f}^2) = {x_025:1.2f}")
0.025-quantile of N(1.00, 0.10^2) = 0.80

But we can also find this quantile by exploiting the connection between \(X\) and \(Z\). The definition of a quantile of \(X\) is:

\[ p(X \le x_q) = q. \]

But, since \(X=\mu+\sigma Z\), this is equivalent to:

\[ p(\mu + \sigma Z \le x_q) = q, \]

which becomes:

\[ p(\sigma Z \le x_q-\mu) = q, \]

and then:

\[ p\left(Z \le \frac{x_q-\mu}{\sigma}\right) = q. \]

This is just:

\[ \Phi\left(\frac{x_q-\mu}{\sigma}\right) = q, \]

which tells us that \(\frac{x_q-\mu}{\sigma}\) is the \(q\)-quantile of \(Z\), i.e.,

\[ z_q = \frac{x_q-\mu}{\sigma}. \]

Solving for \(x_q\), we get:

\[ x_q = \mu + \sigma z_q. \]

Let’s do a sanity check:

z_025 = Z.ppf(0.025)
print(f"mu + sigma * z_025 = {mu + sigma * z_025:.2f}")
mu + sigma * z_025 = 0.80

which is the same as what we found before. So, let’s also find the 0.975-quantile:

z_975 = Z.ppf(0.975)
x_975 = mu + sigma * z_975
print(f"0.975-quantile of N({mu:.2f}, {sigma:.2f}^2) = {x_975:1.2f}")
0.975-quantile of N(1.00, 0.10^2) = 1.20

Let’s visualize the quantiles like we did before:

fig, ax = plt.subplots()
ax.plot(xs, X.pdf(xs))
ax.plot(x_025, 0, "o", label="0.025-quantile")
ax.plot(x_975, 0, "o", label="0.975-quantile")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);

Now, let’s find the distance between \(x_{0.025}\) and \(x_{0.975}\) in terms of the standard deviation \(\sigma\). We have:

\[ x_{0.975} - x_{0.025} = \mu + \sigma z_{0.975} - \mu - \sigma z_{0.025} = \sigma (z_{0.975} - z_{0.025}). \]

This is:

print(f"x_975 - x_025 ~= sigma * {z_975 - z_025:.2f}")
x_975 - x_025 ~= sigma * 3.92

Okay. So we see that 95% of the probability is contained within an interval of width \(3.92\sigma\). This interval is centered at the median (which here coincides with the mode and the expectation of the probability density). The number 3.92 is hard to remember, so we round it up to 4. An interval of width \(4\sigma\) about the mean, i.e., \(\mu \pm 2\sigma\), contains a bit more than 95% of the probability, but it is much simpler to remember.

Note

Remember For a normal random variable \(N(\mu,\sigma^2)\), the central 95% probability interval is approximately \(\mu \pm 2\sigma\). In other words:

\[ p(\mu - 2\sigma < X < \mu + 2 \sigma) \approx 0.95. \]

Questions#

  • Write code that finds exactly how much probability there is between \(\mu - 2\sigma\) and \(\mu + 2\sigma\), i.e., find \(p(\mu - 2\sigma < X < \mu + 2 \sigma)\).

  • Modify the code you just wrote to find how much probability there is between \(\mu - 3\sigma\) and \(\mu + 3\sigma\), i.e., find \(p(\mu - 3\sigma < X < \mu + 3 \sigma)\). This is the six-sigma interval about the mean. Have you ever heard of the six-sigma process improvement technique?