My Journey into Teaching Deep Learning: Training a Multi-Layer Perceptron in PyTorch
As I stood in front of a classroom filled with eager MSc students, I couldn’t help but feel a wave of nostalgia wash over me. Just a few years ago, I was in their shoes, grappling with the complexities of deep learning and neural networks. Today, I had the privilege of teaching them how to train their very first Multi-Layer Perceptron (MLP) using PyTorch. The experience was both exhilarating and humbling, as their questions and insights reminded me of my own struggles as a beginner in this fascinating field.
In this blog post, I want to share my journey and the key lessons I learned while guiding these students through their first steps in deep learning. If you’re familiar with NumPy or TensorFlow and are looking to deepen your understanding of deep learning through hands-on coding, you’re in the right place. Let’s dive in!
Getting Started: The Basics of PyTorch
Before we could train our MLP, we needed to set up our environment and import the necessary libraries. Here’s a quick overview of the essential imports:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
The torch.nn package contains all the layers required to build our neural network. Each layer needs to be instantiated before being called in our model. We typically define our trainable components within a class that inherits from torch.nn.Module. This structure allows us to create flexible and modular neural networks.
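For example, a layer is instantiated once (which creates its weights) and then called like a function; here is a minimal sketch:
layer = nn.Linear(in_features=10, out_features=5)  # instantiation allocates the weight and bias
x = torch.randn(2, 10)  # a batch of 2 samples with 10 features each
out = layer(x)  # calling the layer applies the affine transform
print(out.shape)  # torch.Size([2, 5])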
The Importance of Using a GPU
One of the first hurdles I encountered with my students was their reluctance to use a GPU. Many of them were working with small toy datasets and didn’t see the need for the added complexity. I encouraged them to think about scalability—training larger models on bigger datasets is where the GPU truly shines.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('device:', device)
Using a GPU significantly speeds up the training process. However, I emphasized the importance of having the option to switch to CPU execution for debugging purposes. GPU-related errors can be tricky, and running code on the CPU often provides clearer error messages.
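A small flag makes switching painless; this convenience pattern is our own addition, not part of the original lesson code:
debug_on_cpu = False  # flip to True when hunting down a cryptic GPU error
device = 'cpu' if debug_on_cpu else ('cuda:0' if torch.cuda.is_available() else 'cpu')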
Preparing the Data: Image Transforms and Datasets
For our MLP, we decided to use the CIFAR10 dataset, which consists of 50,000 training images and 10,000 test images. To prepare the data, we needed to apply some transformations:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
Normalizing the input data is crucial for training stability. ToTensor first scales pixel values to [0, 1]; Normalize then applies (x - 0.5) / 0.5 per channel, mapping them to [-1, 1]. Keeping the inputs centered around zero helps the gradients remain stable during training, making optimization easier.
Understanding the CIFAR10 Dataset
We loaded the CIFAR10 dataset using PyTorch’s built-in functionality:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
valset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
It’s essential to understand the structure of the dataset and the labels associated with each image. The CIFAR10 dataset contains ten classes, including airplanes, automobiles, birds, and more. Each class is represented by an integer label from 0 to 9; PyTorch’s CrossEntropyLoss consumes these class indices directly, so no one-hot encoding is required.
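A quick inspection confirms the structure (using the trainset defined above):
image, label = trainset[0]
print(image.shape)  # torch.Size([3, 32, 32]): channels, height, width
print(trainset.classes[label])  # human-readable class name, e.g. 'frog'
print(image.min().item(), image.max().item())  # close to -1.0 and 1.0 after normalization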
Building the DataLoader
To efficiently load our data in batches, we used the DataLoader class:
train_loader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True)
val_loader = torch.utils.data.DataLoader(valset, batch_size=256, shuffle=False)
Shuffling the training data is critical: it breaks any ordering in the stored dataset so that each mini-batch is an approximately IID sample of the whole, which keeps the gradient estimates unbiased across the epoch. The validation loader is left unshuffled, since no weight updates depend on it.
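Pulling a single batch is an easy end-to-end check of the pipeline:
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([256, 3, 32, 32])
print(labels.shape)  # torch.Size([256]), integer class indices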
Designing the Multi-Layer Perceptron (MLP)
With our data prepared, it was time to build our MLP. Here’s a simple implementation:
class MLP(nn.Module):
    def __init__(self, in_channels, num_classes, hidden_sizes=[64]):
        super(MLP, self).__init__()
        assert len(hidden_sizes) >= 1, "Specify at least one hidden layer"
        layers = nn.ModuleList()
        layer_sizes = [in_channels] + hidden_sizes
        # Build a Linear + ReLU block for each consecutive pair of sizes
        for dim_in, dim_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            layers.append(nn.Linear(dim_in, dim_out))
            layers.append(nn.ReLU())
        self.layers = nn.Sequential(*layers)
        self.out_layer = nn.Linear(hidden_sizes[-1], num_classes)

    def forward(self, x):
        out = x.view(x.shape[0], -1)  # flatten each image into a 1D feature vector
        out = self.layers(out)
        out = self.out_layer(out)  # raw logits; CrossEntropyLoss applies log-softmax internally
        return out
This implementation allows for a flexible architecture, letting us specify the number of hidden layers and their sizes. The forward method defines how data flows through the network.
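To check the wiring, we can instantiate the model and push a dummy batch through it (the hidden sizes here are illustrative):
model = MLP(in_channels=3 * 32 * 32, num_classes=10, hidden_sizes=[128, 64])
x = torch.randn(8, 3, 32, 32)  # a fake batch of 8 CIFAR10-sized images
print(model(x).shape)  # torch.Size([8, 10]): one logit per class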
Training and Validation Loops
Training a neural network involves iterating over the dataset multiple times, adjusting the weights based on the loss calculated from the predictions. Here’s a simplified version of our training loop:
def train_one_epoch(model, optimizer, train_loader, device):
    model.train()
    criterion = nn.CrossEntropyLoss()
    loss_step = []
    correct, total = 0, 0
    for (inp_data, labels) in train_loader:
        labels = labels.view(labels.shape[0]).to(device)
        inp_data = inp_data.to(device)
        outputs = model(inp_data)
        loss = criterion(outputs, labels)
        # Standard update: clear stale gradients, backpropagate, step the optimizer
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Track running accuracy without building a computation graph
        with torch.no_grad():
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum()
            loss_step.append(loss.item())
    loss_curr_epoch = np.mean(loss_step)
    train_acc = (100 * correct / total).cpu()
    return loss_curr_epoch, train_acc
The validation loop is similar but does not involve updating the model’s weights. Instead, it assesses the model’s performance on unseen data.
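For completeness, here is a minimal validation sketch that mirrors train_one_epoch; the name validate and its exact shape are ours:
@torch.no_grad()
def validate(model, val_loader, device):
    model.eval()  # switch off training-only behavior (a good habit even for a plain MLP)
    criterion = nn.CrossEntropyLoss()
    loss_step, correct, total = [], 0, 0
    for inp_data, labels in val_loader:
        inp_data, labels = inp_data.to(device), labels.to(device)
        outputs = model(inp_data)
        loss_step.append(criterion(outputs, labels).item())
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    return np.mean(loss_step), 100 * correct / total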
Putting It All Together
After defining our model and our training and validation loops, we were ready to train the MLP on the CIFAR10 dataset. With a learning rate of 0.001 and a momentum of 0.9, we trained the model for 50 epochs.
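The prose above fixes the learning rate, momentum, and epoch count; the remaining values below are illustrative stand-ins, since the post does not specify them:
in_channels = 3 * 32 * 32  # each CIFAR10 image flattened: 3072 features
num_classes = 10
hidden_sizes = [128]  # illustrative; pick your own widths
lr, momentum, wd = 1e-3, 0.9, 1e-4  # wd (weight decay) is our assumption
epochs = 50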
model = MLP(in_channels, num_classes, hidden_sizes).to(device)
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=wd)
dict_log = train(model, optimizer, epochs, train_loader, val_loader, device)
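The train helper itself was not shown in the post; here is a minimal sketch, assuming it simply alternates train_one_epoch with the validate function sketched earlier and logs the metrics:
def train(model, optimizer, num_epochs, train_loader, val_loader, device):
    dict_log = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
    for epoch in range(num_epochs):
        train_loss, train_acc = train_one_epoch(model, optimizer, train_loader, device)
        val_loss, val_acc = validate(model, val_loader, device)
        dict_log['train_loss'].append(train_loss)
        dict_log['train_acc'].append(train_acc)
        dict_log['val_loss'].append(val_loss)
        dict_log['val_acc'].append(val_acc)
        print(f'Epoch {epoch + 1}/{num_epochs}: train acc {train_acc:.2f}%, val acc {val_acc:.2f}%')
    return dict_log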
By the end of our training, we achieved a validation accuracy of over 53%, a significant improvement over the 10% expected from random guessing across ten classes.
Key Design Choices and Best Practices
Throughout the process, I emphasized several key design choices:
- Batch Size: Use a batch size that is a multiple of 32 for optimal GPU utilization.
- IID Assumption: Ensure that training data is shuffled to maintain the independent and identically distributed assumption.
- Start Simple: Begin with a small model and gradually increase complexity as needed.
- Regularization: Only add regularization techniques after identifying overfitting.
Conclusion: Where to Go Next
Reflecting on my experience teaching this class, I realized that the journey of learning deep learning is ongoing. While our MLP performed reasonably well, there is still much to explore and improve. I encourage you to apply your models to new datasets, experiment with different architectures, and share your results.
For those looking to deepen their understanding further, consider exploring additional resources or taking structured courses. The entire code for this tutorial is available on GitHub.
As you embark on your own deep learning journey, remember that every expert was once a beginner. Embrace the challenges, ask questions, and most importantly, enjoy the process of learning!