import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('png')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");
Homework 4 - TEMPLATE - DO NOT DO IT YET#
References#
Module 3: Uncertainty Propagation Through Scientific Models
Surrogate modeling
Active learning
Enforcing symmetries
Instructions#
Type your name and email in the “Student details” section below.
Develop the code and generate the figures you need to solve the problems using this notebook.
For the answers that require a mathematical proof or derivation, you should type them using LaTeX. If you have never written LaTeX before and you find it exceedingly difficult, we will likely accept handwritten solutions.
The total homework points are 100. Please note that the problems are not weighted equally.
Student details#
First Name:
Last Name:
Email:
Used generative AI to complete this assignment (Yes/No):
Which generative AI tool did you use (if applicable)?:
Problem 1 - Building a Surrogate Model of a Legacy Code#
The purpose of this homework problem is to teach you how to build a surrogate model of a legacy code.
Suppose you have access to a legacy code that solves an important engineering problem. For example, it could be using a finite element method to solve a partial differential equation. The code is computationally expensive and you would like to build a surrogate model to make predictions faster. I am going to spare you the trouble of actually running a real legacy code in this homework problem. Instead, we are going to use the Branin function as our "legacy code." The Branin function is a simple mathematical function that is often used as a benchmark for optimization algorithms. It is defined as:
\[ f(x) = \left(x_2 - \frac{5.1}{4\pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(x_1) + 10, \]
where \(x = (x_1, x_2)\) with values in the intervals \(x_1 \in [-5, 10]\) and \(x_2 \in [0, 15]\).
Let’s write some C++ code to evaluate the Branin function:
branin_code = r"""
#include <cmath>
#include <iostream>

int main(int argc, char* argv[]) {
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " x1 x2\n";
        return 1;
    }
    double x1 = std::stod(argv[1]);
    double x2 = std::stod(argv[2]);
    double term = x2 - 5.1 * x1 * x1 / (4 * M_PI * M_PI) + 5 * x1 / M_PI - 6;
    double f = term * term + 10 * (1 - 1 / (8 * M_PI)) * std::cos(x1) + 10;
    std::cout << f << std::endl;
    return 0;
}
"""

with open("branin.cpp", "w") as f:
    f.write(branin_code)
Let me compile it for you:
!g++ -std=c++11 -O3 -o branin branin.cpp -lm
Here is the executable of the legacy code:
!ls -l branin
-rwxr-xr-x@ 1 ibilion staff 36728 Feb 19 19:58 branin
You can run it like this:
!./branin 0.1 0.5
48.0926
Part A - Build a Python wrapper for the legacy C++ code#
Write a Python function that takes \(x = (x_1, x_2)\) as input and returns the output of the Branin function. Vectorize the function so that it can take a 2D array of \(x\)’s arranged in a \(n \times 2\) matrix and return a 1D array of \(n\) outputs.
Hints:
You need to use the subprocess module to call the executable and read the standard output.
For the vectorization, use numpy.vectorize - not jax.
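A minimal sketch of the call pattern the hints describe, assuming the compiled `branin` executable sits in the working directory (the `exe` argument is only there so the wrapper can be pointed elsewhere):

```python
import subprocess

import numpy as np


def branin_scalar(x1, x2, exe="./branin"):
    # Run the legacy executable with the two inputs as command-line
    # arguments and parse the single number it prints to stdout.
    result = subprocess.run(
        [exe, str(x1), str(x2)], capture_output=True, text=True, check=True
    )
    return float(result.stdout)


# numpy.vectorize loops over arrays of scalars for us; `excluded` keeps
# the executable path from being broadcast along with the inputs.
branin_vectorized = np.vectorize(branin_scalar, excluded={"exe"})


def branin(X, exe="./branin"):
    # X is an (n, 2) array of inputs; returns a 1D array of n outputs.
    X = np.asarray(X, dtype=float)
    return branin_vectorized(X[:, 0], X[:, 1], exe=exe)
```

This is only one way to structure the wrapper; anything that shells out per point and parses stdout works.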
# Your answer here, with as many code cells as you need.
Part B - Generate initial set of training input-output data and test data#
Use \(n=10\) points from the Sobol sequence, map them to \([-5, 10] \times [0, 15]\), and evaluate the Branin function using the Python wrapper you built in Part A.
Use another \(n_{\text{test}}=50\) points from the Sobol sequence, map them to \([-5, 10] \times [0, 15]\), and evaluate the Branin function using the Python wrapper you built in Part A.
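One way to draw and scale the points, assuming `scipy` is available (`scipy.stats.qmc`); reusing the same `Sobol` instance keeps advancing through the sequence, so the test points do not repeat the training points:

```python
import numpy as np
from scipy.stats import qmc

sobol = qmc.Sobol(d=2, scramble=False)
lower, upper = [-5.0, 0.0], [10.0, 15.0]

# First 10 points for training, the next 50 for testing.
X_train = qmc.scale(sobol.random(10), lower, upper)
X_test = qmc.scale(sobol.random(50), lower, upper)
```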
# Your answer here, with as many code cells as you need.
Part C - Build an initial surrogate model#
Build a Gaussian process surrogate model using the training data generated in Part B. Pick a squared exponential kernel and optimize the hyper-parameters using the marginal likelihood.
Test your model on the test data generated in Part B.
Calculate and report the root mean squared error on the test data:
\[ \text{RMSE} = \sqrt{\frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} (f_{\text{true}}(x_i) - f_{\text{pred}}(x_i))^2}, \]
where \(f_{\text{true}}(x_i)\) is the true value of the Branin function at the test point \(x_i\) and \(f_{\text{pred}}(x_i)\) is the predicted value of the Branin function at the test point \(x_i\).
Plot the predictions on the test data along with the true values.
Calculate the standardized errors on the test data and plot them.
Plot the quantile-quantile plot of the standardized errors.
Hint: Standardize the output data before building the surrogate model, using the statistics of the initial training set.
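Whatever GP library you use, the reported diagnostics only need the predictive means and standard deviations at the test points. A sketch of the computations, assuming arrays `f_true`, `mu`, and `sigma` of your own making:

```python
import numpy as np
from scipy import stats


def diagnostics(f_true, mu, sigma):
    # Root mean squared error of the predictive mean.
    rmse = np.sqrt(np.mean((f_true - mu) ** 2))
    # Standardized errors; if the model is well calibrated these
    # should look like draws from a standard normal.
    std_err = (f_true - mu) / sigma
    # Theoretical vs. empirical quantiles for the Q-Q plot.
    n = len(std_err)
    theoretical = stats.norm.ppf((np.arange(n) + 0.5) / n)
    empirical = np.sort(std_err)
    return rmse, std_err, theoretical, empirical
```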
Part D - Randomly collect more training data#
Pick the next point from the Sobol sequence, map it to \([-5, 10] \times [0, 15]\), and evaluate the Branin function using the Python wrapper you built in Part A.
Add the new point to the training data and rebuild the surrogate model.
Test your model on the test data generated in Part B.
Calculate and report the root mean squared error on the test data.
Iterate this process for 190 iterations.
Plot the root mean squared error as a function of the number of training points (not the number of iterations).
For the last iteration, plot the predictions on the test data along with the true values.
For the last iteration, calculate the standardized errors on the test data and plot them.
For the last iteration, plot the quantile-quantile plot of the standardized errors.
# Your answer here, with as many code cells as you need.
Part E - Active learning#
Use uncertainty sampling to select the next point to evaluate. Implement it as follows:
Start with a clean surrogate model using only the \(n=10\) initial training points.
On each iteration:
Generate \(n_{\text{candidate}} = 1000\) points from the Sobol sequence.
Calculate the predictive variance of the surrogate model at each candidate point.
Select the candidate point with the highest predictive variance.
Evaluate the legacy code at the selected candidate point (don’t forget to scale it to \([-5, 10] \times [0, 15]\)).
Add the new point to the training data and rebuild the surrogate model.
Test your model on the test data generated in Part B.
Calculate and report the root mean squared error on the test data.
Iterate this process for 190 iterations.
Plot the root mean squared error as a function of the number of training points (not the number of iterations).
For the last iteration, plot the predictions on the test data along with the true values.
For the last iteration, calculate the standardized errors on the test data and plot them.
For the last iteration, plot the quantile-quantile plot of the standardized errors.
Plot the root mean squared error as a function of the number of training points and compare it with the random sampling case from Part D.
Did you do better with active learning than with random sampling? Why?
Plot the points that were selected by the active learning process. Where are most of them located?
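The selection step above can be sketched in a few lines; here `predict_var` stands in for whatever your surrogate provides (a callable returning the predictive variance at each row of candidates), and `scipy.stats.qmc` is assumed available:

```python
import numpy as np
from scipy.stats import qmc


def next_query(predict_var, sobol, lower=(-5.0, 0.0), upper=(10.0, 15.0),
               n_candidates=1000):
    # Draw candidates in [0, 1]^2, scale them to the Branin domain, and
    # return the one with the largest predictive variance.
    candidates = qmc.scale(sobol.random(n_candidates), lower, upper)
    return candidates[np.argmax(predict_var(candidates))]
```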
# Your answer here, with as many code cells as you need.
Part F - Optimization#
Now suppose that what you wanted to do was to find the minimum of the legacy code.
Start with a clean surrogate model using only the \(n=10\) initial training points.
On each iteration:
Generate \(n_{\text{candidate}} = 1000\) points from the Sobol sequence.
Calculate the predictive mean \(\mu(x)\) and standard deviation \(\sigma(x)\) of the surrogate model at each candidate point.
Select the candidate point with the highest expected improvement:
\[ \text{EI}(x) = \mathbb{E}[\max(f_{\text{min}} - f(x), 0)] = \sigma(x) [\gamma(x) \Phi(\gamma(x)) + \phi(\gamma(x))], \]
where \(f_{\text{min}}\) is the minimum value of the Branin function found so far, \(\gamma(x) = (f_{\text{min}} - \mu(x)) / \sigma(x)\), \(\Phi\) is the cumulative distribution function of the standard normal distribution, and \(\phi\) is the probability density function of the standard normal distribution.
Add the new point to the training data and rebuild the surrogate model.
Iterate this process for 50 iterations.
Plot the expected improvement as a function of the number of training points.
Plot the minimum value of the Branin function found so far as a function of the number of training points.
Plot the points that were selected by the optimization process. Where are most of them located?
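The expected improvement formula translates directly into code; a small helper, assuming `mu` and `sigma` are the arrays of predictive means and standard deviations at the candidate points:

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, f_min):
    # gamma measures how far below the incumbent the prediction sits,
    # in units of predictive standard deviation.
    gamma = (f_min - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))
```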
# Your answer here, with as many code cells as you need.
Part G - Multi-fidelity approach#
Now, let’s pretend that we have access to a low-fidelity version of the legacy code. Pick this:
\[ f_{\text{low}}(x) = \left(x_2 - \frac{5.1}{4\pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6\right)^2. \]
Notice that this is very similar to the legacy code, but it misses the cosine term and the constant term. Let’s pretend that the low-fidelity code is much cheaper to evaluate than the high-fidelity code. We can use the low-fidelity code to build a surrogate model and then use the high-fidelity code to correct the surrogate model.
Generate \(n_{\text{low}}=200\) points from the Sobol sequence, map them to \([-5, 10] \times [0, 15]\), and evaluate the low-fidelity code at these points.
Build a Gaussian process surrogate model using the low-fidelity data.
Build a multi-fidelity Gaussian process surrogate model using the \(n=10\) initial training points and the low-fidelity surrogate mean. Hint: Just make your covariance function:
\[ k(x, x') = k_1(x, x')k_2(f_{\text{low}}(x), f_{\text{low}}(x')), \]
where \(k_1\) is a squared exponential kernel on the inputs and \(k_2\) is a squared exponential kernel, with different hyper-parameters, on the low-fidelity outputs. Use one variance hyper-parameter (set the other to 1).
Iteratively add more points using uncertainty sampling as in Part E. Go again to 200 points.
Plot the root mean squared error as a function of the number of training points and compare to the single-fidelity case.
Plot the points that were selected by the multi-fidelity active learning process. Where are most of them located?
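The product kernel in the hint can be written down directly in numpy; a sketch, where `f_low` stands for your low-fidelity surrogate mean (here any callable returning one value per input row), `ell_x` and `ell_f` are the two length-scales, and `var` is the single variance hyper-parameter:

```python
import numpy as np


def sqexp(A, B, ell, var=1.0):
    # Squared exponential kernel between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) / ell) ** 2
    return var * np.exp(-0.5 * sq_dists.sum(axis=-1))


def mf_kernel(X, Xp, f_low, ell_x, ell_f, var):
    # k1 acts on the inputs and carries the variance; k2 acts on the
    # low-fidelity outputs with its variance fixed to 1.
    k1 = sqexp(X, Xp, ell_x, var)
    k2 = sqexp(f_low(X)[:, None], f_low(Xp)[:, None], ell_f, 1.0)
    return k1 * k2
```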
# Your answer here, with as many code cells as you need.
Problem 2 - The permutation group#
The purpose of this problem is to teach you the basics of the permutation group and its representations.
The permutation group is particularly important for two reasons. First, according to Cayley’s theorem, every finite group is isomorphic to a subgroup of a permutation group. Second, we will see that every permutation group has a representation as a matrix group with the operation being the common matrix multiplication. Therefore, every finite group has a matrix representation.
The permutation group \(S_n\) is the group of all permutations of \(n\) elements. Each element of the group is a bijective function that maps \(\{1, 2, \ldots, n\}\) to itself. Bijective means one-to-one and onto, i.e., distinct elements of the set are mapped to distinct elements, and every element of the set is the image of some element. For example, an element of \(S_3\) is the function \(\sigma: \{1, 2, 3\} \to \{1, 2, 3\}\) defined by \(\sigma(1) = 2\), \(\sigma(2) = 3\), and \(\sigma(3) = 1\).
The group operation is composition of functions. The identity element is the function that maps each element to itself. The inverse of a function is the function that undoes the permutation.
Part A#
How many elements does \(S_n\) have?
Part B#
One way to represent permutations is as a 2-row matrix where the first row is the input and the second row is the output. For example, the permutation \(\sigma\) defined above can be represented as:
\[\begin{split} \sigma = \begin{pmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{pmatrix}. \end{split}\]
Represent as such a matrix the permutation that maps 1 to 3, 2 to 1, and 3 to 2.
Part C#
A cycle is a permutation that cyclically permutes some of the elements and leaves the others fixed. For example, the permutation:
in \(S_4\) is a cycle. There is a cycle notation that is more compact. We can also write:
And we mean that 1 is mapped to 2, 2 is mapped to 3, and 3 is mapped to 1. The cycle notation is not unique. For example, we could also write:
And we would mean the same permutation. Write the permutation:
in cycle notation.
Part D#
Any permutation can be written as a product of disjoint cycles. For example, the permutation:
can be written as:
To figure out this decomposition, you can start with the first element and follow the permutation until you get back to the first element. Then you write the cycle and remove the elements that are part of the cycle. You repeat this process until you have written all the cycles. Write the permutation:
in \(S_6\) as a product of disjoint cycles.
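The decomposition procedure just described is easy to mechanize; a small sketch, representing a permutation as a tuple whose \(i\)-th entry (1-based) is \(\sigma(i)\):

```python
def disjoint_cycles(sigma):
    # sigma is given 1-based as a tuple: sigma[i - 1] is the image of i.
    n = len(sigma)
    seen = set()
    cycles = []
    for start in range(1, n + 1):
        if start in seen:
            continue
        # Follow the permutation from `start` until it returns.
        cycle, j = [], start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = sigma[j - 1]
        if len(cycle) > 1:  # fixed points are usually omitted
            cycles.append(tuple(cycle))
    return cycles
```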
Part E#
A transposition is a cycle of length 2. Like \((1, 2)\) or \((3, 4)\). Any permutation can be written as a product of transpositions.
For example, take the permutation
in \(S_5\). First, write it as a product of disjoint cycles:
Then write each cycle as a product of transpositions. For example,
And then you have:
Write the permutation:
in \(S_5\) as a product of transpositions.
Part F#
The parity of the number of transpositions in the decomposition of a permutation is always the same: however you decompose it, the number of transpositions is either always even or always odd. If the number of transpositions is even, the permutation is called even. If the number of transpositions is odd, the permutation is called odd.
Is the permutation:
in \(S_5\) even or odd?
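Since a \(k\)-cycle factors into \(k-1\) transpositions, the parity can be read off the disjoint-cycle decomposition. A sketch, representing a permutation as a tuple whose \(i\)-th entry (1-based) is \(\sigma(i)\):

```python
def parity(sigma):
    # Walk each cycle once; a cycle of length k contributes k - 1
    # transpositions, so sum (k - 1) over all cycles and check evenness.
    n = len(sigma)
    seen = [False] * n
    transpositions = 0
    for i in range(n):
        length = 0
        j = i
        while not seen[j]:
            seen[j] = True
            j = sigma[j] - 1
            length += 1
        transpositions += max(length - 1, 0)
    return "even" if transpositions % 2 == 0 else "odd"
```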
Part G#
The set of all even permutations in \(S_n\) is a subgroup of \(S_n\). It is called the alternating group and denoted by \(A_n\).
Show that when you multiply two even permutations you get an even permutation. Hint: If you multiply two even permutations, how many transpositions do you get?
Show that the identity permutation is even. Hint: How many transpositions do you need to write the identity permutation?
Show that the inverse of the transposition \((a, b)\) is itself.
Show that the inverse of the product \((1, 2)(3, 4)\) is \((1, 2)(3, 4)\).
Show that the inverse of the product \((1, 2)(2, 3)\) is \((2, 3)(1, 2)\).
Argue that the inverse of an even permutation is even. Hint: Write the permutation as a product of transpositions and generalize the previous results.
Argue that \(A_n\) contains the identity and is closed under multiplication and inversion, and is therefore a subgroup of \(S_n\).
Part H#
Now let’s represent permutations as square matrices. A permutation matrix is a square matrix that has exactly one 1 in each row and each column and 0’s elsewhere. For example, the permutation:
in \(S_4\) can be represented as the permutation matrix:
The matrix can act on a one-hot vector representation of the numbers from 1 to \(4\). For example, \(2\) is represented as:
\[\begin{split} e_2 = \begin{bmatrix} 0\\ 1\\ 0\\ 0 \end{bmatrix}. \end{split}\]
And the matrix acting on \(2\) gives:
Find the matrix representation, \(D(\sigma)\) and \(D(\tau)\), of the \(S_4\) permutations:
\[ \sigma = (1, 2)(2, 3) \]
and
\[ \tau = (1, 3)(2, 4). \]
Verify, by direct calculation, that the matrix product \(D(\sigma)D(\tau)\) is the matrix representation of the permutation \(\sigma \tau\).
Find the inverse of the matrix \(D(\sigma)\). To which permutation does it correspond? Verify that it is indeed the inverse by multiplying the matrix by its inverse and showing that you get the identity matrix.
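A quick numpy check of the machinery above, with permutations written as 1-based tuples (the \(i\)-th entry is the image of \(i\)); the specific \(\sigma\) and \(\tau\) in the test are just examples:

```python
import numpy as np


def perm_matrix(sigma):
    # D has a 1 in row sigma(i), column i, so that D @ e_i = e_{sigma(i)}.
    n = len(sigma)
    D = np.zeros((n, n), dtype=int)
    for i, image in enumerate(sigma, start=1):
        D[image - 1, i - 1] = 1
    return D


def compose(sigma, tau):
    # (sigma tau)(i) = sigma(tau(i)): apply tau first, then sigma.
    return tuple(sigma[t - 1] for t in tau)
```

With this convention, \(D(\sigma)D(\tau) = D(\sigma\tau)\) holds by direct multiplication, and the inverse of a permutation matrix is its transpose.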
Part I#
Using what you know, you could create a \(3\times 3\) matrix representation of \(S_3\) (the group of permutations of three objects). But it is actually possible to create a faithful representation that is \(2\times 2\), although it will be complex-valued. We will, of course, map the identity permutation to the identity matrix:
But where do we map the rest? Let’s start with the cycle \((1, 2, 3)\). Observe that:
\[ (1, 2, 3)^2 = (1, 3, 2). \]
And once more:
\[ (1, 2, 3)^3 = e. \]
So, we need to find a \(2\times 2\) matrix \(D((1,2,3))\) such that:
\[ D((1,2,3))^3 = \text{Id}. \]
Try a diagonal matrix:
\[\begin{split} D((1,2,3)) = \begin{bmatrix} a & 0\\ 0 & b \end{bmatrix}. \end{split}\]Hint: You will need to use complex numbers and the cube root of unity, \(\omega = e^{2\pi i/3}\).
What is the matrix representation of \((1, 3, 2)\)? Hint: Use the fact that \((1, 3, 2) = (1, 2, 3)^2\).
Pick the matrix representation of \((1,2)\) to be:
\[\begin{split} D((1,2)) = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix}. \end{split}\]Verify that \(D((1,2))^2 = \text{Id}\).
Find the matrix representation of \(D((2,3))\). Hint: Use the fact that \((2,3) = (1, 2)(1, 2, 3)\).
Find the matrix representation of \(D((1,3))\).
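The hints can be sanity-checked numerically. The block below takes \(D((1,2,3)) = \operatorname{diag}(\omega, \omega^2)\) as one consistent diagonal choice (which of \(a\), \(b\) gets \(\omega\) versus \(\omega^2\) is a convention; swapping them also works) and verifies the defining relations:

```python
import numpy as np

omega = np.exp(2j * np.pi / 3)      # cube root of unity
D123 = np.diag([omega, omega**2])   # one consistent diagonal choice
D12 = np.array([[0, 1], [1, 0]], dtype=complex)

# (1,2,3)^3 = e and (1,2)^2 = e must hold in the representation.
assert np.allclose(np.linalg.matrix_power(D123, 3), np.eye(2))
assert np.allclose(D12 @ D12, np.eye(2))

# (2,3) = (1,2)(1,2,3), so its matrix is the corresponding product,
# and as a transposition it must also square to the identity.
D23 = D12 @ D123
assert np.allclose(D23 @ D23, np.eye(2))
```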