import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");
The Gaussian Distribution#
The Normal distribution#
The normal (or Gaussian) distribution is a ubiquitous one. It appears over and over again. There are two explanations as to why it appears so often:
It is the distribution of maximum uncertainty that matches a known expectation and a known variance.
It is the distribution that arises when you add many independent random variables together.
We will learn about both of these in subsequent lectures; a quick numerical taste of the second point follows right below.
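Here is a small sketch (not part of the original notebook; the choice of 30 uniforms and 10,000 sums is arbitrary): histogram the sums of many independent uniform random variables and a bell shape appears.
# Sums of many independent uniforms already look approximately normal
import numpy as np

rng = np.random.default_rng(0)  # seeded so the picture is reproducible
sums = rng.uniform(size=(10_000, 30)).sum(axis=1)
fig, ax = plt.subplots()
ax.hist(sums, bins=50, density=True)
ax.set_xlabel("sum of 30 uniforms")
ax.set_ylabel("density")
sns.despine(trim=True);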
We write:
\[
X \mid \mu, \sigma^2 \sim N(\mu, \sigma^2),
\]
and we read “\(X\) conditioned on \(\mu\) and \(\sigma^2\) follows a normal distribution with expected value \(\mu\) and variance \(\sigma^2\).”
Note
First, a lot of people write \(N(\mu, \sigma)\) instead of \(N(\mu, \sigma^2)\). You should always check what the author means.
Second, many people refer to \(\mu\) as the mean instead of the expected value. We will use the terms “mean” and “expected value” interchangeably when dealing with Gaussian distributions.
The standard normal distribution#
When we have zero mean and unit variance, we say that we have a standard normal distribution. If \(Z\) is a standard normal random variable, we write:
\[
Z \sim N(0, 1).
\]
The PDF of the standard normal is typically denoted by \(\phi(z)\) and is given by:
\[
\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{z^2}{2}\right\}.
\]
The CDF of the standard normal is denoted by \(\Phi(z)\) and is given by:
\[
\Phi(z) = p(Z \le z) = \int_{-\infty}^{z} \phi(z')\,dz'.
\]
The CDF of the standard normal is not available in closed form. In the old days, we used to look it up in a table. Nowadays, we use a computer to calculate it.
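As an aside, if you ever need \(\Phi(z)\) without scipy, it can be expressed through the error function as \(\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right]\), which lives in the Python standard library. A minimal sketch:
import math

def std_normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"Phi(1.96) = {std_normal_cdf(1.96):.3f}")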
Here is how you can get the PDF of the standard normal.
First, let’s make a standard normal random variable in scipy.stats:
import scipy.stats as st
Z = st.norm()
You can evaluate it anywhere you want:
print(f"phi(0.5) = {Z.pdf(0.5):.2f}")
phi(0.5) = 0.35
Let’s plot the PDF:
import numpy as np

fig, ax = plt.subplots()
zs = np.linspace(-4.0, 4.0, 100)
ax.plot(zs, Z.pdf(zs))
ax.set_xlabel("$z$")
ax.set_ylabel(r"$\phi(z)$")
sns.despine(trim=True);
And here is the CDF:
fig, ax = plt.subplots()
ax.plot(zs, Z.cdf(zs))
ax.set_xlabel("$z$")
ax.set_ylabel(r"$\Phi(z)$")
sns.despine(trim=True);
Here is the expectation:
print(f"E[Z] = {Z.expect():.2f}")
E[Z] = -0.00
And the variance:
print(f"V[Z] = {Z.var():.2f}")
V[Z] = 1.00
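Note that expect computes the expectation by numerical integration, which is why it prints as -0.00 above. The frozen distribution also exposes its moments directly:
# mean() and std() return the exact moments
print(f"Z.mean() = {Z.mean():.2f}")
print(f"Z.std()  = {Z.std():.2f}")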
Here is the probability that \(Z\) is between two numbers \(a\) and \(b\), i.e., \(p(a \le Z \le b) = \Phi(b) - \Phi(a)\):
a = 1.0
b = 3.0
prob_Z_in_ab = Z.cdf(b) - Z.cdf(a)
print(f"p({a:.2f} <= Z <= {b:.2f}) = {prob_Z_in_ab:.2f}")
p(1.00 <= Z <= 3.00) = 0.16
And here is how you can sample:
Z.rvs(100)
array([-0.68098751, -0.08219633, 0.20786469, -0.50730995, 0.45700647,
-0.2559003 , 0.65936334, -0.23402872, 1.09875376, -0.65752522,
0.87598653, -0.0461331 , 0.50262267, 0.70764474, -0.98254677,
-1.41454882, -0.11455164, 0.75359663, -1.02398395, -1.86342273,
-0.22218493, -0.57929841, 1.61763382, -0.3512537 , -0.58212603,
-1.42276009, 1.30183894, -0.79181306, 0.87481463, 0.26491516,
-0.99758581, 0.7516001 , -1.35293929, 0.14887563, -1.81354551,
-1.03881138, -0.66440162, -1.29772731, -0.03500321, 0.1314219 ,
0.45940001, -1.50587312, 0.6103862 , 0.66813906, -0.31945307,
1.2086607 , -0.04778578, -0.50059914, -0.13445603, -0.86555327,
0.91601074, -0.97407919, -1.42836505, -0.18455424, -1.26131747,
-0.19964524, 1.65131343, 0.23724141, 0.20041386, 0.53681951,
-1.67482189, -1.65050958, 1.14247417, 1.08756538, -0.07532283,
-1.47098224, 1.24651781, -0.74154696, 0.05659722, -0.27788922,
0.89646394, 1.07632242, 0.92632264, -0.5623226 , 0.10293485,
0.07106044, -0.14816709, -0.55905882, 0.22567311, 0.13283325,
-1.43633922, -0.40999822, 0.72885276, 0.67071962, -0.45190328,
-0.819717 , 1.46093725, -0.55625844, -0.67173311, 0.27282091,
0.14917277, 0.66093895, 0.49070751, 0.10162476, -2.40113228,
0.24445444, 1.43220066, 1.47577485, -0.57539355, -1.90800178])
And, of course, you can also sample using the functionality of numpy:
np.random.randn(100)
array([ 0.17675988, 2.08604093, -2.32928242, -0.31336707, -0.63508336,
0.67371205, 1.90808253, -0.02121881, 0.5099677 , -0.77540186,
-0.64627549, -1.54157974, 0.26082948, 2.69541908, 0.2675784 ,
-1.06147151, -1.64492616, 0.68948065, 1.03904888, 0.25079342,
-0.56337513, 0.38106535, 0.05408547, -0.397527 , 0.20617051,
2.06732106, -0.77393155, -0.5615434 , 1.70336458, 1.27549767,
2.144291 , 0.00601306, 0.33266612, -1.13324241, -1.93575747,
-0.60570986, -0.8859008 , 1.07228981, 0.60790884, -0.50623704,
-0.71449871, 1.0671928 , -1.22584563, -1.37429395, -1.38428782,
-1.19234723, -0.43768159, -0.35739968, -1.15151434, -1.07603713,
0.42115532, 0.90580295, 0.2056319 , 0.71246248, -0.554142 ,
1.59157731, -1.97265792, 0.69586134, -0.69544253, 0.94968473,
0.12604349, -0.03078179, -1.12016362, 0.39424782, 0.05911653,
-0.54681854, 1.37700422, -0.35340388, 0.95453985, 0.40723134,
-1.55549029, -0.42597788, -0.6278352 , 0.33644935, -0.16735517,
-1.53184807, -0.94499227, 1.3118843 , 0.1585745 , 0.78911817,
0.4914747 , -0.90895756, -1.18857701, -0.14776013, -0.95868734,
-1.91836248, 1.10461855, 1.59235448, -0.12523874, 0.70358221,
0.57531609, 0.28489483, -1.61977487, -1.60483268, -0.97463031,
-1.20620496, -1.22983379, 0.14890532, -0.14807777, 0.35014314])
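As a side note, modern numpy code typically prefers the Generator API to the legacy np.random functions. A minimal equivalent (the seed is arbitrary):
rng = np.random.default_rng(42)  # seeded so the samples are reproducible
rng.standard_normal(100)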
The general normal distribution#
Take \(X\) to be a normal random variable with mean \(\mu\) and variance \(\sigma^2\):
\[
X \sim N(\mu, \sigma^2).
\]
The PDF is given by:
\[
N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}.
\]
Note
Some people write \(N(x|\mu, \sigma^2)\) to refer to the PDF of \(N(\mu, \sigma^2)\) evaluated at \(x\).
If you meditate a little on this expression, you will notice that you can connect it to the standard normal PDF:
\[
N(x \mid \mu, \sigma^2) = \frac{1}{\sigma} \phi\left(\frac{x - \mu}{\sigma}\right).
\]
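You can check this identity numerically; here is a quick sketch (the particular numbers are arbitrary):
# Check N(x|mu, sigma^2) = phi((x - mu) / sigma) / sigma at one point
mu_c, sigma_c, x_c = 1.0, 2.0, 0.3
lhs = st.norm(loc=mu_c, scale=sigma_c).pdf(x_c)
rhs = st.norm().pdf((x_c - mu_c) / sigma_c) / sigma_c
print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}")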
This highlights two things. First, the mean \(\mu\) shifts the distribution to the right or to the left.
# Plot the PDF of a standard normal and the PDF of a non-zero mean, unit variance normal
# with an arrow indicating the change from the first to the second.
fig, ax = plt.subplots()
zs = np.linspace(-4.0, 4.0, 100)
ax.plot(zs, Z.pdf(zs), label=r"$\phi(x)$")
ax.plot(zs, st.norm(loc=1.0).pdf(zs), label=r"$N(x|\mu=1,\sigma^2=1)$")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
ax.legend(loc="best", frameon=False)
ax.annotate("", xytext=(0.0, 0.4), xy=(1.0, 0.4),
            arrowprops=dict(arrowstyle="->", color=sns.color_palette()[2], lw=2))
ax.set_title("The mean translates the standard normal PDF")
sns.despine(trim=True);
The standard deviation \(\sigma\) scales the distribution.
# Plot the PDF of a standard normal and the PDF of a zero mean normal with variance two
fig, ax = plt.subplots()
zs = np.linspace(-4.0, 4.0, 100)
ax.plot(zs, Z.pdf(zs), label=r"$\phi(x)$")
ax.plot(zs, st.norm(loc=0.0, scale=np.sqrt(2.0)).pdf(zs), label=r"$N(x|\mu=0,\sigma^2=2)$")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
ax.legend(loc="best", frameon=False)
ax.set_title("The variance scales the standard normal PDF")
sns.despine(trim=True);
Quantiles of the normal#
There are a few more exciting things to know about the standard normal. For example, how can you find a value \(z_q\) such that the probability of \(Z\) being less than \(z_q\) is \(q\)? Mathematically, you wish to find this:
\[
p(Z \le z_q) = \Phi(z_q) = q.
\]
The point \(z_q\) is called the \(q\)-quantile. To find it, you need to do this:
\[
z_q = \Phi^{-1}(q).
\]
For example, \(z_{0.50}\) is called the median (and it coincides with the expectation here). Another set of exciting quantiles is \(z_{0.025}\) and \(z_{0.975}\). Why? Because the probability that \(Z\) lies between them is \(95\)%. Here it is:
\[
p(z_{0.025} \le Z \le z_{0.975}) = \Phi(z_{0.975}) - \Phi(z_{0.025}) = 0.975 - 0.025 = 0.95.
\]
Let’s find these quantiles and visualize them using the functionality of scipy.stats. We will use the percent point function (ppf), which is the inverse of the CDF:
z_025 = Z.ppf(0.025)
z_500 = Z.ppf(0.5)
z_975 = Z.ppf(0.975)
print(f"0.025 quantile of Z = {z_025:.2f}")
print(f"0.50 quantile of Z = {z_500:.2f}")
print(f"0.975 quantile of Z = {z_975:.2f}")
0.025 quantile of Z = -1.96
0.50 quantile of Z = 0.00
0.975 quantile of Z = 1.96
Here is how much probability there is between the two extreme quantiles:
print(f"p({z_025:.2f} <= Z <= {z_975:.2f}) = {Z.cdf(z_975) - Z.cdf(z_025):.2f}")
p(-1.96 <= Z <= 1.96) = 0.95
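As a quick sanity check that ppf really is the inverse of the CDF, you can round-trip a probability through both:
q = 0.975
print(f"Z.cdf(Z.ppf({q})) = {Z.cdf(Z.ppf(q)):.3f}")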
Note
Observe that the quantiles are symmetric around the mean (which is zero here). The 0.025 quantile is at -1.96 and the 0.975 quantile is at 1.96. Being engineers, we are going to round that 1.96 up to a two.
Let’s also visualize the quantiles on top of the PDF:
fig, ax = plt.subplots()
ax.plot(zs, Z.pdf(zs))
ax.plot(z_025, [0.0], "o", label="0.025 quantile")
ax.plot(z_975, [0.0], "o", label="0.975 quantile")
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);
Note
Quantiles vs percentiles A percentile is a quantile expressed as a percentage. So, you say the 0.025 quantile is the 2.5 percentile. The 0.975 quantile is the 97.5 percentile. And so on.
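To connect the vocabulary to code: numpy’s percentile function takes a percentage, so the 0.025 quantile of \(Z\) should match the 2.5 percentile of a large batch of samples. A quick (approximate) check:
samples = Z.rvs(size=100_000)
print(f"empirical 2.5 percentile: {np.percentile(samples, 2.5):.2f}")
print(f"exact 0.025 quantile:     {Z.ppf(0.025):.2f}")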
Questions#
Modify the code above so that you find and visualize \(z_{0.001}\) and \(z_{0.999}\).
What is the difference between \(z_{0.999}\) and \(z_{0.001}\)?
What is the probability that \(Z\) is between \(z_{0.001}\) and \(z_{0.999}\)?
Getting any normal from the standard normal#
Knowledge of the quantiles of the standard normal is sufficient to give us the quantiles of any normal. Let \(Z\) be a standard normal. Take some \(\mu\) and \(\sigma\). Then, the random variable
\[
X = \mu + \sigma Z
\]
follows a \(N(\mu, \sigma^2)\) distribution.
To show this mathematically, we need to show that the PDF of \(X\) is the one we expect.
Proof
First, write down the CDF of \(X\):
\[
F(x) = p(X \le x) = p(\mu + \sigma Z \le x) = p\left(Z \le \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right).
\]
Now take the derivative of the CDF to get the PDF:
\[
p(x) = \frac{dF(x)}{dx} = \frac{1}{\sigma} \phi\left(\frac{x - \mu}{\sigma}\right) = N(x \mid \mu, \sigma^2),
\]
which is exactly what we wanted.
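If you want to double-check the derivative step numerically, a central finite difference on the CDF should match the PDF (the particular \(\mu\), \(\sigma\), and evaluation point below are arbitrary):
# Finite-difference check of dF/dx = p(x) for a normal with mu = 1, sigma = 0.5
mu_c, sigma_c, x_c, h = 1.0, 0.5, 1.3, 1e-6
X_c = st.norm(mu_c, sigma_c)
dF_dx = (X_c.cdf(x_c + h) - X_c.cdf(x_c - h)) / (2 * h)
print(f"finite difference: {dF_dx:.6f}")
print(f"exact PDF:         {X_c.pdf(x_c):.6f}")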
The formula is extremely useful. For example, you can use it to make samples from any normal distribution using samples from the standard normal. Here is how:
mu = 1.0
sigma = 0.1
# The target normal distribution, for comparison
X = st.norm(mu, sigma)
xs = np.linspace(mu - 6.0 * sigma, mu + 6.0 * sigma, 100)
# Build samples of X = mu + sigma * Z from standard normal samples
x_samples = mu + sigma * Z.rvs(size=10000)
Compare the histogram of the samples with the PDF of the normal:
fig, ax = plt.subplots()
ax.hist(x_samples, density=True, alpha=0.5, label=r"Samples from $X = \mu + \sigma Z$")
ax.plot(xs, X.pdf(xs), label=r"PDF of $N(\mu, \sigma^2)$")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);
How can you find the quantiles of this normal? Well, you can simply use the functionality of scipy.stats.
As an example, let’s find \(x_{0.025}\):
x_025 = X.ppf(0.025)
print(f"0.025-quantile of N({mu:.2f}, {sigma:.2f}^2) = {x_025:1.2f}")
0.025-quantile of N(1.00, 0.10^2) = 0.80
But we can also find this quantile by exploiting the connection between \(X\) and \(Z\). The definition of a quantile of \(X\) is:
\[
p(X \le x_q) = q.
\]
But, since \(X = \mu + \sigma Z\), this is equivalent to:
\[
p(\mu + \sigma Z \le x_q) = q,
\]
which becomes:
\[
p\left(Z \le \frac{x_q - \mu}{\sigma}\right) = q,
\]
and then:
\[
\Phi\left(\frac{x_q - \mu}{\sigma}\right) = q.
\]
This is just:
\[
\frac{x_q - \mu}{\sigma} = \Phi^{-1}(q),
\]
which tells us that \(\frac{x_q - \mu}{\sigma}\) is the \(q\)-quantile of \(Z\), i.e.,
\[
\frac{x_q - \mu}{\sigma} = z_q.
\]
Solving for \(x_q\), we get:
\[
x_q = \mu + \sigma z_q.
\]
Let’s do a sanity check:
z_025 = Z.ppf(0.025)
print(f"mu + sigma * z_025 = {mu + sigma * z_025:.2f}")
mu + sigma * z_025 = 0.80
which is the same as what we found before. So, let’s also find the 0.975-quantile:
z_975 = Z.ppf(0.975)
x_975 = mu + sigma * z_975
print(f"0.975-quantile of N({mu:.2f}, {sigma:.2f}^2) = {x_975:1.2f}")
0.975-quantile of N(1.00, 0.10^2) = 1.20
Let’s visualize the quantiles like we did before:
fig, ax = plt.subplots()
ax.plot(xs, X.pdf(xs))
ax.plot(x_025, 0, "o", label="0.025-quantile")
ax.plot(x_975, 0, "o", label="0.975-quantile")
ax.set_xlabel("$x$")
ax.set_ylabel("$p(x)$")
plt.legend(loc="best", frameon=False)
sns.despine(trim=True);
Now, let’s find the distance between \(x_{0.025}\) and \(x_{0.975}\) in terms of the standard deviation \(\sigma\). We have:
\[
x_{0.975} - x_{0.025} = (\mu + \sigma z_{0.975}) - (\mu + \sigma z_{0.025}) = \sigma (z_{0.975} - z_{0.025}).
\]
This is:
print(f"x_975 - x_025 ~= sigma * {z_975 - z_025:.2f}")
x_975 - x_025 ~= sigma * 3.92
Okay. So we see that 95% of the probability is contained within an interval of width \(3.92\sigma\). This interval is centered at the median (which, for a normal, coincides with the mode and the expectation). The number 3.92 is not particularly memorable, so we round it up to 4. A \(4\sigma\)-wide interval about the mean, i.e., \(\mu \pm 2\sigma\), contains a bit more than 95% of the probability, but it is much simpler to remember.
Note
Remember For a normal random variable \(N(\mu, \sigma^2)\), the 95% probability interval is about \(\mu \pm 2\sigma\). In other words:
\[
p(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.95.
\]
Questions#
Write code that finds exactly how much probability there is between \(\mu - 2\sigma\) and \(\mu + 2\sigma\), i.e., find \(p(\mu - 2\sigma < X < \mu + 2 \sigma)\).
Modify the code you just wrote to find how much probability there is between \(\mu - 3\sigma\) and \(\mu + 3\sigma\), i.e., find \(p(\mu - 3\sigma < X < \mu + 3\sigma)\). This is a six-sigma interval about the mean (its total width is \(6\sigma\)). Have you ever heard of the six-sigma process improvement technique?