import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('png')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks");
Homework 3 - TEMPLATE - DO NOT DO IT YET#
References#
TBD
Instructions#
Type your name and email in the “Student details” section below.
Develop the code and generate the figures you need to solve the problems using this notebook.
For the answers that require a mathematical proof or derivation, you should type them using LaTeX. If you have never written LaTeX before and you find it exceedingly difficult, we will likely accept handwritten solutions.
The total homework points are 100. Please note that the problems are not weighted equally.
Student details#
First Name:
Last Name:
Email:
Used generative AI to complete this assignment (Yes/No):
Which generative AI tool did you use (if applicable)?:
Problem 1 - Implement autoencoders in jax, equinox, and optax#
Implement an autoencoder in jax and train it on the MNIST dataset.
Autoencoders consist of two neural networks: an encoder and a decoder. The encoder maps the input to a latent space (typically of a much smaller dimension than the input), and the decoder maps the latent space back to the input space.
You can think of the encoder as a compression algorithm and the decoder as a decompression algorithm.
Alternatively, you can think of the encoder as the projection of the input data onto a lower-dimensional manifold, and the decoder as the reconstruction operator.
Follow these directions:
Pick the dimension of the latent space to be 2. This means that the encoder will map the input to a 2-dimensional space, and the decoder will map the 2-dimensional space back to the input space.
Your encoder should work on a flattened version of the input image. This means that the input to the encoder is a vector of 784 elements (28x28).
Start by picking your encoder \(z = f(x;\theta_f)\) to be a neural network with 2 hidden layers, each with 128 units and ReLU activations. Increase the number of units and layers if you think it is necessary.
Start by picking your decoder \(x' = g(z;\theta_g)\) to be a neural network with 2 hidden layers, each with 128 units and ReLU activations. Increase the number of units and layers if you think it is necessary.
Make all your neural networks in equinox.
The loss function is the mean squared error between the input and the output of the decoder:
\[ L(\theta_f, \theta_g) = \frac{1}{N}\sum_{i=1}^{N} \big\| x_i - g\big(f(x_i;\theta_f);\theta_g\big) \big\|^2, \]
where \(N\) is the number of samples in the dataset.
Split the MNIST dataset into a training and a test set.
Use optax for the optimization.
Train the autoencoder using the Adam optimizer with a learning rate of 0.001 for 1 epoch to debug. Use a batch size of 32. Feel free to play with the learning rate and batch size. (A minimal training sketch is given after these directions.)
Monitor the loss function on the training and test set. Increase the number of epochs up to the point where the loss function on the test set stops decreasing.
Once you are done training, visualize the projections of the digits in the latent space. Don’t bother drawing little images of the digits. Just plot the 2D points and label them with the digit they correspond to. You can use matplotlib for this.
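Here is a minimal sketch of what such an autoencoder and its training step could look like in equinox and optax. It is a starting point under assumptions, not the required solution; in particular, `X_train_j` stands for the training split converted to a `(num_samples, 784)` float jax array with pixels scaled to \([0, 1]\) (a conversion sketch appears after the dataset cell below).

```python
import jax
import jax.numpy as jnp
import equinox as eqx
import optax

class Autoencoder(eqx.Module):
    encoder: eqx.nn.MLP
    decoder: eqx.nn.MLP

    def __init__(self, key, latent_dim=2, width=128, depth=2):
        k1, k2 = jax.random.split(key)
        # 784 -> latent_dim and latent_dim -> 784, each with `depth` hidden
        # layers of `width` ReLU units
        self.encoder = eqx.nn.MLP(784, latent_dim, width, depth,
                                  activation=jax.nn.relu, key=k1)
        self.decoder = eqx.nn.MLP(latent_dim, 784, width, depth,
                                  activation=jax.nn.relu, key=k2)

    def __call__(self, x):
        return self.decoder(self.encoder(x))

def loss_fn(model, x_batch):
    # mean over the batch of the squared reconstruction error
    x_rec = jax.vmap(model)(x_batch)
    return jnp.mean(jnp.sum((x_batch - x_rec) ** 2, axis=-1))

model = Autoencoder(jax.random.PRNGKey(0))
optimizer = optax.adam(1e-3)
opt_state = optimizer.init(eqx.filter(model, eqx.is_array))

@eqx.filter_jit
def train_step(model, opt_state, x_batch):
    loss, grads = eqx.filter_value_and_grad(loss_fn)(model, x_batch)
    updates, opt_state = optimizer.update(grads, opt_state)
    model = eqx.apply_updates(model, updates)
    return model, opt_state, loss

# One debugging epoch over mini-batches of 32 would then look like:
# for i in range(0, X_train_j.shape[0], 32):
#     model, opt_state, loss = train_step(model, opt_state, X_train_j[i:i + 32])
```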
Here is the dataset:
# Download the MNIST dataset
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, parser='auto')  # parser='auto' silences a FutureWarning on newer scikit-learn
# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train_val, X_test, y_train_val, y_test = train_test_split(
mnist.data, mnist.target, test_size=10000, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
X_train_val, y_train_val, test_size=10000, random_state=42)
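Before training, the splits need to be turned into jax arrays. Here is a small conversion sketch, assuming `mnist.data` came back as a pandas DataFrame and the labels as strings (adjust if your scikit-learn version returns numpy arrays):

```python
import numpy as np
import jax.numpy as jnp

def to_jax(X, y):
    X = jnp.asarray(np.asarray(X, dtype=np.float32) / 255.0)  # pixels scaled to [0, 1]
    y = jnp.asarray(np.asarray(y).astype(np.int32))           # digit labels as integers
    return X, y

X_train_j, y_train_j = to_jax(X_train, y_train)
X_val_j, y_val_j = to_jax(X_val, y_val)
X_test_j, y_test_j = to_jax(X_test, y_test)
```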
Part A#
Put your answer here. Use as many markdown and code blocks as you want.
# your code
Part B#
Pick the first five digits in the test set and plot the original and reconstructed images.
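A possible plotting sketch, assuming the trained `model` and the converted test array `X_test_j` from the earlier sketches:

```python
import jax
import matplotlib.pyplot as plt

x_orig = X_test_j[:5]
x_rec = jax.vmap(model)(x_orig)

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i in range(5):
    axes[0, i].imshow(x_orig[i].reshape(28, 28), cmap="gray")
    axes[1, i].imshow(x_rec[i].reshape(28, 28), cmap="gray")
    axes[0, i].set_title("original")
    axes[1, i].set_title("reconstructed")
    axes[0, i].axis("off")
    axes[1, i].axis("off")
```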
# your code here
Part C#
Plot the projections of the digits in the latent space (training and test).
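A possible sketch, assuming `model`, `X_train_j`, and `y_train_j` from the earlier sketches; here the digits are indicated by color rather than text labels:

```python
import jax
import numpy as np
import matplotlib.pyplot as plt

Z_train = jax.vmap(model.encoder)(X_train_j)

fig, ax = plt.subplots()
sc = ax.scatter(np.asarray(Z_train[:, 0]), np.asarray(Z_train[:, 1]),
                c=np.asarray(y_train_j), cmap="tab10", s=2)
fig.colorbar(sc, ax=ax, label="digit")
ax.set_xlabel("$z_1$")
ax.set_ylabel("$z_2$")
```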
# your code here
Part D#
Use scikit-learn to fit a mixture of Gaussians to the latent space. Use 10 components.
Then sample five times from the fitted mixture of Gaussians, reconstruct the samples, and plot the reconstructed images.
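A possible sketch, assuming the latent projections `Z_train` from Part C and the trained `model`:

```python
import numpy as np
import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=10, random_state=0)
gmm.fit(np.asarray(Z_train))

z_samples, _ = gmm.sample(5)                       # five latent samples
x_samples = jax.vmap(model.decoder)(jnp.asarray(z_samples))

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, x in zip(axes, x_samples):
    ax.imshow(x.reshape(28, 28), cmap="gray")
    ax.axis("off")
```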
# your code here
Problem 2 - Physics-informed Neural Networks for Solving a Neo-Hookean Hyperelasticity Problem#
*The original version of this problem was developed by Atharva Hans as a companion to this.*
Consider a neo-Hookean square body defined on \((x,y) \in [0,1]^2\). Let \(\mathbf{u}(x,y) = (u_1, u_2)\) describe the displacement field for this body. This body is subjected to the following displacement boundary conditions:
\[ u_1(0,y) = 0, \quad u_2(0,y) = 0, \quad u_1(1,y) = \delta, \quad u_2(1,y) = 0, \]
with \(\delta\) referring to the applied displacement along the x-direction.
For this hyperelastic material, the stored energy \(E_b\) in the body can be expressed as:
with
where \(\mathbf{I}\) is an identity matrix.
The final orientation of this body is described by a displacement field that minimizes the stored energy \(E_b\). The idea is to use a neural network to approximate the displacement field and train it by minimizing the stored energy \(E_b\).
To automatically satisfy the boundary conditions, we will use this approximation:
\[ u_1(x,y) = \delta - \delta(1-x) + x(1-x)N_1(x,y;\theta), \]
and
\[ u_2(x,y) = x(1-x)N_2(x,y;\theta), \]
where \(N_1(x,y;\theta)\) and \(N_2(x,y;\theta)\) are neural networks.
Part A#
Solve the problem above for \(\delta=0.1\) using a physics-informed neural network (PINN). Use separate neural networks for \(N_1(x,y;\theta)\) and \(N_2(x,y;\theta)\). Start with a multi-layer perceptron with 3 hidden layers, each with 128 units, and tanh activations. Add a Fourier feature layer at the beginning of the network. Feel free to change the architecture if you think it is necessary.
Use equinox for the neural networks and optax for the optimization.
Use a sampling average of 32 collocation points to compute the integral of the stored energy.
Use the Adam optimizer with a learning rate of 0.001 for 1000 iterations to debug.
Feel free to play with the learning rate, the number of collocation points, and the number of iterations.
Show the evolution of the loss function over the iterations. Plot the final displacement field (plot \(u_1(x,y)\) and \(u_2(x,y)\) separately).
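Here is a rough starting sketch of the pieces described above (a Fourier-feature layer, the displacement ansatz, and a sampling average of the stored energy). The `energy_density` below uses one common compressible neo-Hookean form with made-up material parameters as a placeholder assumption; replace it with the exact expression given in the problem statement.

```python
import jax
import jax.numpy as jnp
import equinox as eqx
import optax

class FourierMLP(eqx.Module):
    B: jnp.ndarray
    mlp: eqx.nn.MLP

    def __init__(self, key, num_features=32, scale=1.0):
        kB, kmlp = jax.random.split(key)
        # random Fourier-feature projection (trainable here; it could also be kept fixed)
        self.B = scale * jax.random.normal(kB, (num_features, 2))
        self.mlp = eqx.nn.MLP(2 * num_features, 1, 128, 3,
                              activation=jnp.tanh, key=kmlp)

    def __call__(self, xy):
        proj = 2.0 * jnp.pi * self.B @ xy
        feats = jnp.concatenate([jnp.sin(proj), jnp.cos(proj)])
        return self.mlp(feats)[0]

def displacement(nets, xy, delta):
    # ansatz that satisfies the displacement boundary conditions by construction
    N1, N2 = nets
    x = xy[0]
    u1 = delta - delta * (1.0 - x) + x * (1.0 - x) * N1(xy)
    u2 = x * (1.0 - x) * N2(xy)
    return jnp.array([u1, u2])

def energy_density(F, mu=1.0, lam=1.0):
    # placeholder: a common compressible neo-Hookean energy density;
    # substitute the expression and material parameters from the problem statement
    J = jnp.linalg.det(F)
    I1 = jnp.trace(F.T @ F)
    return 0.5 * mu * (I1 - 2.0) - mu * jnp.log(J) + 0.5 * lam * jnp.log(J) ** 2

def loss_fn(nets, xy_batch, delta):
    def point_energy(xy):
        grad_u = jax.jacfwd(lambda p: displacement(nets, p, delta))(xy)
        F = jnp.eye(2) + grad_u                      # deformation gradient
        return energy_density(F)
    return jnp.mean(jax.vmap(point_energy)(xy_batch))  # sampling average of E_b

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
nets = (FourierMLP(k1), FourierMLP(k2))
xy_batch = jax.random.uniform(k3, (32, 2))           # 32 collocation points in [0, 1]^2
optimizer = optax.adam(1e-3)
# The optimization loop follows the same optax pattern as in Problem 1,
# resampling the collocation points at every iteration.
```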
Put your answer here. Use as many markdown and code blocks as you want.
# your code here
Part B#
Solve the problem for \(\delta=0.5\) using the same architecture as above. It will likely fail to train. If it does, use the solution for \(\delta=0.1\) as the initial guess for \(\delta=0.2\), then use the solution for \(\delta=0.2\) as the initial guess for \(\delta=0.3\), and so on, until you reach \(\delta=0.5\). This is called transfer learning.
At the end, plot the final displacement field for \(\delta=0.5\).
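A possible sketch of the warm-start loop; `make_nets` and `train` are hypothetical helpers assembled from your Part A code, with `train` returning the trained networks for a given \(\delta\):

```python
import jax

# hypothetical helpers built from the Part A code:
#   make_nets(key)            -> freshly initialized (N1, N2)
#   train(nets, delta, iters) -> networks trained at this delta
nets = make_nets(jax.random.PRNGKey(0))
for delta in [0.1, 0.2, 0.3, 0.4, 0.5]:
    # the networks trained at the previous delta act as the initial guess
    nets = train(nets, delta, iters=1000)
```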
Put your answer here. Use as many markdown and code blocks as you want.
# your code here
Part C#
Solve the parametric problem for \(\delta \in [0,0.5]\). That is, build a neural network that takes \(\delta\) as input and outputs the displacement field. To do this:
Modify the loss function so that the stored energy is also averaged over \(\delta\) sampled from \([0, 0.5]\).
Modify the neural networks to take \(\delta\) as input, say \(N_1(x,y;\delta;\theta)\) and \(N_2(x,y;\delta;\theta)\). Your field will be \(\mathbf{u}(x,y;\delta;\theta)\). Use the following architecture for the neural networks:
\[ N_1(x,y;\delta;\theta) = \sum_{i=1}^{n} b_{1,i}(\delta)\, t_{1,i}(x,y). \]
Here, \(n\) is your choice (start with \(n=10\)), \(b_{1,i}\) is a neural network that takes \(\delta\) as input and outputs a scalar, and \(t_{1,i}(x,y)\) is a multi-layer perceptron with 3 hidden layers, each with 128 units, tanh activations, and Fourier features at the beginning. The same applies to \(N_2(x,y;\delta)\). This representation resembles an expansion in terms of basis functions. The same architecture appears in DeepONet.
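A minimal sketch of this branch-trunk style representation for \(N_1\), assuming the `FourierMLP` class from the Part A sketch above:

```python
import jax
import jax.numpy as jnp
import equinox as eqx

class ParametricNet(eqx.Module):
    branch: eqx.nn.MLP   # b_i: maps delta to n scalar coefficients
    trunks: list         # t_i: n Fourier-feature MLPs of (x, y)

    def __init__(self, key, n=10):
        kb, kt = jax.random.split(key)
        self.branch = eqx.nn.MLP(1, n, 128, 3, activation=jnp.tanh, key=kb)
        self.trunks = [FourierMLP(k) for k in jax.random.split(kt, n)]

    def __call__(self, xy, delta):
        b = self.branch(jnp.array([delta]))                  # shape (n,)
        t = jnp.stack([trunk(xy) for trunk in self.trunks])  # shape (n,)
        return jnp.dot(b, t)                                 # sum_i b_i(delta) t_i(x, y)
```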
Plot the \(x\) and \(y\) displacement at \(x=0.5, y=0.5\) as a function of \(\delta\).