Hide code cell source
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks")

import urllib.request
import os

def download(
    url : str,
    local_filename : str = None
):
    """Download a file from a url.
    
    Arguments
    url            -- The url we want to download.
    local_filename -- The filemame to write on. If not
                      specified 
    """
    if local_filename is None:
        local_filename = os.path.basename(url)
    urllib.request.urlretrieve(url, local_filename)

Classification with Deep Neural Networks#

We implement image classification network in PyTorch. We add L2 regularization, convolutional layers, and hyper parameter tuning. You may find these tutorials useful:

The CIFAR10 dataset#

We will demonstrate multiclass classification using the CIFAR10 dateset. The dataset consists of 60000 32x32 color images in 10 classes (plane, car, bird, cat, deer, dog, frog, horse, ship, and truck), with 6000 images per class. The dataset can be downloaded directly from PyTorch using the torchvision module.

You can think of the original images as 32x32x3 arrays. The first two dimensions correspond to the pixels. The third dimension corresponds to the color (red, green, blue). Of course, we must turn them into PyTorch tensors. Also, it is more convenient to scale them to be between \([-1,1]\). We will achieve this using a transformation. Don’t worry about this now. We will explain it as we go.

import torch
import torchvision
import torchvision.transforms as transforms

# This is the transformation that we will apply to each image
transform = transforms.Compose(
    [transforms.ToTensor(),   # This turns the picture to a Tensor
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # This scales it to [-1, 1]

# Here is how you can download the training dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

# And here is how to download the test dataset:
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)

# These are the class labels
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:04<00:00, 34368893.07it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified

Now, all these data went into the folder “./data.” Here is what this folder contains:

!ls ./data/
cifar-10-batches-py    cifar-10-python.tar.gz

The file cifar-10-python.tar.gz is a compressed file containing everything. The contents were automatically extracted and put in the folder cifar-10-batches-py. Let’s look inside this folder:

!ls -lht data/cifar-10-batches-py
total 363752
-rw-r--r--  1 ibilion  staff    88B Jun  4  2009 readme.html
-rw-r--r--  1 ibilion  staff   158B Mar 31  2009 batches.meta
-rw-r--r--  1 ibilion  staff    30M Mar 31  2009 data_batch_4
-rw-r--r--  1 ibilion  staff    30M Mar 31  2009 data_batch_1
-rw-r--r--  1 ibilion  staff    30M Mar 31  2009 data_batch_5
-rw-r--r--  1 ibilion  staff    30M Mar 31  2009 data_batch_2
-rw-r--r--  1 ibilion  staff    30M Mar 31  2009 data_batch_3
-rw-r--r--  1 ibilion  staff    30M Mar 31  2009 test_batch

You see several files. The important ones are data_batch_1 to data_batch_5 and test_batch. Each of these contains 10,000 images in a binary format. The format is explained here. We can read them as follows:

def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict

data = unpickle('data/cifar-10-batches-py/data_batch_1')
# data is a dictionary
# Here are the keys
print(data.keys())
dict_keys([b'batch_label', b'labels', b'data', b'filenames'])
# One key has to do with the pictures
# It gives you a numpy array:
print(data[b'data'].shape)
(10000, 3072)
# The first dimension correspond to differnt picture
# The second dimension is
32 * 32 * 3
3072
# So this is the first picture:
img = data[b'data'][0, :].reshape((32, 32, 3), order='F')
# Here is the Red channel:
print(img[:, :, 0])
[[ 59  16  25 ... 208 180 177]
 [ 43   0  16 ... 201 173 168]
 [ 50  18  49 ... 198 186 179]
 ...
 [158 123 118 ... 160 184 216]
 [152 119 120 ...  56  97 151]
 [148 122 109 ...  53  83 123]]

The numbers go from 0 (no red) to 255 (full red). Here is how to visualize it:

fig, ax = plt.subplots(figsize=(1, 1))
ax.imshow(np.transpose(img, (1, 0, 2)));
../_images/3155a8cbd231f4e210c288f4c7ee3e785b02f5c45b2484753028e07bc8f12c51.svg

This is clearly a frog. Let’s verify this:

classes[data[b'labels'][0]]
'frog'

This is nice. And we could proceed manually like this. However, PyTorch offers some helpful functionality. Let’s investigate the trainset that was returned by CIFAR10:

trainset
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
           )
# Here are the classes:
trainset.classes
['airplane',
 'automobile',
 'bird',
 'cat',
 'deer',
 'dog',
 'frog',
 'horse',
 'ship',
 'truck']
# Here is the correspondence between classes and discrete labels
trainset.class_to_idx
{'airplane': 0,
 'automobile': 1,
 'bird': 2,
 'cat': 3,
 'deer': 4,
 'dog': 5,
 'frog': 6,
 'horse': 7,
 'ship': 8,
 'truck': 9}
# Here are the images from all training batches
print(trainset.data.shape)
(50000, 32, 32, 3)
# Here are the labels
print(trainset.targets[:10])
[6, 9, 9, 4, 1, 1, 2, 7, 8, 3]

Alright. Now, let’s use PyTorch functionality for looping over the training and the test datasets. We need a DataLoader:

# One for the training data:
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=0)

# One for the test data:
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=0)

These objects work as follows:

# They help you loop over all the data in a random way (because we had shuffle=True)
for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    # Here inputs are of size batch_size x (3 x 32 x 32)
    # Since we had specified, the batch_size to be 4
    # this essentially loads four images per iteration
    if i % 1000 == 0:
        print('Data point:', i, 'input size:', str(inputs.shape))
Data point: 0 input size: torch.Size([4, 3, 32, 32])
Data point: 1000 input size: torch.Size([4, 3, 32, 32])
Data point: 2000 input size: torch.Size([4, 3, 32, 32])
Data point: 3000 input size: torch.Size([4, 3, 32, 32])
Data point: 4000 input size: torch.Size([4, 3, 32, 32])
Data point: 5000 input size: torch.Size([4, 3, 32, 32])
Data point: 6000 input size: torch.Size([4, 3, 32, 32])
Data point: 7000 input size: torch.Size([4, 3, 32, 32])
Data point: 8000 input size: torch.Size([4, 3, 32, 32])
Data point: 9000 input size: torch.Size([4, 3, 32, 32])
Data point: 10000 input size: torch.Size([4, 3, 32, 32])
Data point: 11000 input size: torch.Size([4, 3, 32, 32])
Data point: 12000 input size: torch.Size([4, 3, 32, 32])

When you reach the end of the loop, you have visited all the images once. Notice that PyTorch has reshaped the images to 3 x 32 x 32 3D arrays. This is more convenient for the convolutional layers we will use later. Also, PyTorch uses the transformations we gave it to scale the data to array elements to \([-1, 1]\). Let me show you an example:

for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    print(inputs[0])
    break
tensor([[[ 0.6627,  0.7098,  0.4824,  ..., -0.0510, -0.0902, -0.4039],
         [ 0.6235,  0.7255,  0.6078,  ..., -0.3412, -0.4431, -0.7333],
         [ 0.3647,  0.4588,  0.4588,  ..., -0.4745, -0.5608, -0.7255],
         ...,
         [ 0.2627, -0.1137, -0.2471,  ...,  0.2314, -0.1765, -0.3647],
         [ 0.2863, -0.1373, -0.2863,  ...,  0.2078, -0.2392, -0.3725],
         [-0.0039, -0.5608, -0.5765,  ..., -0.2941, -0.4980, -0.3882]],

        [[ 0.8510,  0.8353,  0.7098,  ...,  0.0667, -0.0118, -0.3569],
         [ 0.7490,  0.6784,  0.6627,  ..., -0.2549, -0.3882, -0.7176],
         [ 0.6078,  0.5294,  0.4431,  ..., -0.4196, -0.5451, -0.7490],
         ...,
         [ 0.5529,  0.1059, -0.0667,  ...,  0.2078, -0.1922, -0.3804],
         [ 0.5294,  0.0431, -0.1529,  ...,  0.1843, -0.2549, -0.3882],
         [ 0.1922, -0.4275, -0.4980,  ..., -0.3176, -0.5216, -0.4039]],

        [[ 1.0000,  0.9216,  0.9059,  ...,  0.3804,  0.2392, -0.1843],
         [ 0.9216,  0.6784,  0.7569,  ..., -0.0667, -0.2549, -0.6157],
         [ 0.9059,  0.6314,  0.5216,  ..., -0.2706, -0.4275, -0.6549],
         ...,
         [ 0.9059,  0.4510,  0.2000,  ...,  0.2627, -0.1608, -0.3725],
         [ 0.9059,  0.3490,  0.0667,  ...,  0.2392, -0.2235, -0.3804],
         [ 0.4745, -0.2392, -0.3490,  ..., -0.2627, -0.4902, -0.3961]]])

Training a classifier using dense DNNs#

Let’s train a classifier using a dense neural network. It could be better, but it is very easy to put together. We will start the network with 3 x 32 x 32 = 3072, followed by a few dense layers ending with ten outputs passed through softmax. However, for numerical stability reasons, we will not end with the softmax layer during training.

import torch.nn as nn

# The classifer - The dimensions of the layers have
# been picked to match those of the convolutional neural network
# that we are going to build later
# For now, just notice that we gradually take the 3072-dimensional input
# down to 10 dimensions (the number of classes we have)
# Also, notice that I do not add the softmax layer at this point
model_dense = nn.Sequential(nn.Linear(3072, 1176), nn.ReLU(),
                            nn.Linear(1176, 400), nn.ReLU(),
                            nn.Linear(400, 120), nn.ReLU(),
                            nn.Linear(120, 84), nn.ReLU(),
                            nn.Linear(84, 10))

# This is our loss function. 
# Read this: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
criterion = nn.CrossEntropyLoss()
# The reason we did not add the Softmax layer at the end is because
# the loss function above is doing it internally.
# It expects that you provide "contain raw, unnormalized scores for each class"
# Here is the optimizer
import torch.optim as optim
optimizer = optim.SGD(model_dense.parameters(), lr=0.001, momentum=0.9)

Let’s train the network. This is going to take a while (about 3 minutes on my desktop):

# How many times do you want to go over the entire dataset?
# Don't pick a very big number because you will overfit
num_epochs = 2

# Here is the main training algorithm
for epoch in range(num_epochs):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model_dense(inputs.reshape(4, 3 * 32 * 32))
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 1000 == 999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 1000))
            running_loss = 0.0

print('Finished Training')
[1,  1000] loss: 2.289
[1,  2000] loss: 2.098
[1,  3000] loss: 1.915
[1,  4000] loss: 1.814
[1,  5000] loss: 1.758
[1,  6000] loss: 1.718
[1,  7000] loss: 1.707
[1,  8000] loss: 1.671
[1,  9000] loss: 1.624
[1, 10000] loss: 1.615
[1, 11000] loss: 1.604
[1, 12000] loss: 1.559
[2,  1000] loss: 1.473
[2,  2000] loss: 1.481
[2,  3000] loss: 1.479
[2,  4000] loss: 1.488
[2,  5000] loss: 1.481
[2,  6000] loss: 1.470
[2,  7000] loss: 1.472
[2,  8000] loss: 1.433
[2,  9000] loss: 1.463
[2, 10000] loss: 1.450
[2, 11000] loss: 1.409
[2, 12000] loss: 1.404
Finished Training

Since training networks takes a while, it’s a good idea to save it:

torch.save(model_dense.state_dict(), 'hands-on-25-model-dense.pth')

Here it is as a file:

!ls -lht hands-on-25-model-dense.pth
-rw-r--r--  1 ibilion  staff    16M Oct 23 16:45 hands-on-25-model-dense.pth

Now let’s make some predictions:

# Get the first four images and their labels
dataiter = iter(testloader)
images, labels = next(dataiter)
print(labels)
tensor([3, 8, 8, 0])
# Make predictions with the net and pass them through 
# softmax to turn them into probabilities
st = nn.Softmax(dim=1)
predictions = st(model_dense(images.reshape(4, 3072)))
def imshow(img, ax):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    ax.imshow(np.transpose(npimg, (1, 2, 0)))

# Plot the pictures and the predictions
for i in range(4):
    fig, ax = plt.subplots(figsize=(1,1))
    imshow(images[i], ax)
    fig2, ax2 = plt.subplots()
    ax2.bar(np.arange(10), predictions[i].detach().numpy())
    ax2.set_xticks(np.arange(10))
    ax2.set_xticklabels(classes)
    sns.despine(trim=True)
../_images/4da3f7ebc55a63541dc45405328ed91a368ccf2112645ecd5ed4333993025393.svg../_images/d12c0ca799d7d1737c03b2b522b95c007103e5878cd4fbfa5d4e7f8044f2ffd8.svg../_images/be36f89eb46f20bee1caec065eb82220a7b31111388f9798140af47b9122b43c.svg../_images/9341c8e128c1c652b17d4cfdd18966747c85adc09b51d32b745ae8b6784b8a21.svg../_images/fa5f48e136a36b2dc2cc9ebbc24b3b69ae566f46b0c2456dbdff49905bb9350f.svg../_images/ed4ca72115a81d6855bfc897c828a0efedf36d35dc808d7f23c09f670318e10a.svg../_images/441138b4d890c2174f7463fc1deef2f0026c52cf75147cf774b09d9dc2229df7.svg../_images/228235a85aafcd93809520953a118e5df7fa2155d1e1bbae867b4efa722d2a78.svg

Now, let’s do the same thing with a convolutional neural network. We are not going to use nn.Sequential this time. Instead, we will use nn.Module to manually create the network. The documentation is here. You need to inherit nn.Module, and implement __init__() and forward().

import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # A convolutional layer:
        # 3 = input channels (colors),
        # 6 = output channels (features),
        # 5 = kernel size
        self.conv1 = nn.Conv2d(3, 6, 5)
        # A 2 x 2 max pooling layer - we are going to use it two times
        self.pool = nn.MaxPool2d(2, 2)
        # Another convolutional layer
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Some linear layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # This function implements your network output
        # Convolutional layer, followed by relu, followed by max pooling
        x = self.pool(F.relu(self.conv1(x)))
        # Same thing
        x = self.pool(F.relu(self.conv2(x)))
        # Flatting the output of the convolutional layers
        x = x.view(-1, 16 * 5 * 5)
        # Go throught the first dense linear layer followed by relu
        x = F.relu(self.fc1(x))
        # Through the second dense layer
        x = F.relu(self.fc2(x))
        # Finish up with a linear transformation
        x = self.fc3(x)
        return x


model_cnn = Net()

Here is the optimizer:

optimizer = optim.SGD(model_cnn.parameters(), lr=0.001, momentum=0.9)

And here is the training loop. Notice that this network trains faster.

# How many times do you want to go over the entire dataset?
# Don't pick a very big number because you will overfit
num_epochs = 5

# Here is the main training algorithm
for epoch in range(num_epochs):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model_cnn(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 1000 == 999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 1000))
            running_loss = 0.0

print('Finished Training')
[1,  1000] loss: 1.835
[1,  2000] loss: 1.763
[1,  3000] loss: 1.705
[1,  4000] loss: 1.674
[1,  5000] loss: 1.667
[1,  6000] loss: 1.645
[1,  7000] loss: 1.619
[1,  8000] loss: 1.609
[1,  9000] loss: 1.607
[1, 10000] loss: 1.542
[1, 11000] loss: 1.567
[1, 12000] loss: 1.564
[2,  1000] loss: 1.523
[2,  2000] loss: 1.519
[2,  3000] loss: 1.478
[2,  4000] loss: 1.515
[2,  5000] loss: 1.481
[2,  6000] loss: 1.506
[2,  7000] loss: 1.455
[2,  8000] loss: 1.493
[2,  9000] loss: 1.465
[2, 10000] loss: 1.448
[2, 11000] loss: 1.446
[2, 12000] loss: 1.450
[3,  1000] loss: 1.404
[3,  2000] loss: 1.380
[3,  3000] loss: 1.376
[3,  4000] loss: 1.405
[3,  5000] loss: 1.435
[3,  6000] loss: 1.381
[3,  7000] loss: 1.412
[3,  8000] loss: 1.416
[3,  9000] loss: 1.393
[3, 10000] loss: 1.406
[3, 11000] loss: 1.374
[3, 12000] loss: 1.349
[4,  1000] loss: 1.313
[4,  2000] loss: 1.317
[4,  3000] loss: 1.349
[4,  4000] loss: 1.353
[4,  5000] loss: 1.337
[4,  6000] loss: 1.312
[4,  7000] loss: 1.316
[4,  8000] loss: 1.327
[4,  9000] loss: 1.313
[4, 10000] loss: 1.333
[4, 11000] loss: 1.338
[4, 12000] loss: 1.333
[5,  1000] loss: 1.227
[5,  2000] loss: 1.266
[5,  3000] loss: 1.286
[5,  4000] loss: 1.267
[5,  5000] loss: 1.258
[5,  6000] loss: 1.282
[5,  7000] loss: 1.280
[5,  8000] loss: 1.271
[5,  9000] loss: 1.255
[5, 10000] loss: 1.283
[5, 11000] loss: 1.303
[5, 12000] loss: 1.260
Finished Training

Make some predictions:

# Make predictions with the net and pass them through 
# softmax to turn them into probabilities
st = nn.Softmax(dim=1)
predictions = st(model_cnn(images))
for i in range(4):
    fig, ax = plt.subplots(figsize=(1,1))
    imshow(images[i], ax)
    fig2, ax2 = plt.subplots()
    ax2.bar(np.arange(10), predictions[i].detach().numpy())
    ax2.set_xticks(np.arange(10))
    ax2.set_xticklabels(classes)
    sns.despine(trim=True)
../_images/20b6bdd6ef97b2996081d167f56d31792b00022a8504071ac92a7cee79cdd203.svg../_images/6535a663e502f6adf0cd97c4fd61eeb170aac88f625b5208a9d6317c4a550abb.svg../_images/41683aa3861889fd521ba88a21bdb2b35cd8c623c8aa3036ed743250b23aabae.svg../_images/d87ffe99777307de8ee74c76ddfa6e26ce389b1725eb3ed2a951b3303ba8337b.svg../_images/1aa152035e3e1814e1d586df97c9e289e11047b616b7bf6bc6c9ad9c15a9604b.svg../_images/0203e0a508ec6ee8f71ed6510aeaa719e45fa749a3d45ed48f66fe3e31df5591.svg../_images/7a636ac1dd16b98282bade556725b668664e53a2146cc9d552727cd3525dfdb9.svg../_images/3e8eb708a9e596a239708e4f4247ec6de2e57fc6eaafa7cfd47d030593bacc58.svg

It doesn’t work equally well for all classes. Here is some code from the PyTorch tutorial to get the accuracy for each class:

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = model_cnn(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
Accuracy of plane : 59 %
Accuracy of   car : 65 %
Accuracy of  bird : 46 %
Accuracy of   cat : 25 %
Accuracy of  deer : 42 %
Accuracy of   dog : 48 %
Accuracy of  frog : 66 %
Accuracy of horse : 62 %
Accuracy of  ship : 59 %
Accuracy of truck : 62 %

This could be better. There are several things that we can do. First, we would run this for more epochs. At least 50 epochs are needed to train it properly. Second, we could add data augmentation. This can be done through a transformation; see this. Third, we have to make the network bigger. Here is a list of large networks trained on CIFAR10. It is possible to reach an accuracy of 95%.

A general advice: When you train neural networks think big. Make them big. Add a lot of data. Train more (but not too much).

Questions#

  • Set the number of epochs for the CNN-based model to 40. How much better accuracy do you get? Make sure you do this before bed and look at it in the morning. It takes a while to train.