In this article, we dive into advanced training techniques in PyTorch. We explore methods such as transfer learning, creating learning rate schedulers, and sharing models with the PyTorch community. With step-by-step examples, you’ll learn how to modify pre-trained models, fine-tune parameters, and efficiently manage training routines.
Below is an outline of the topics covered:
• Using the TorchVision library to load pre-trained models
• Modifying model output layers for custom tasks
• Utilizing the PyTorch Hub to list, download, and share models
• Creating and using learning rate schedulers
• Freezing model layers and fine-tuning only the final layer
──────────────────────────────
Loading and Using Pre-trained Models with TorchVision
PyTorch offers a wide array of pre-trained models through the TorchVision library. You can leverage these models directly for fine-tuning or use them as feature extractors. The examples below demonstrate how to list the available models, load the VGG19 model, and use both the legacy pretrained=True approach and the modern API based on the weights argument.
# Import PyTorch vision models
from torchvision import models

# List available models and print the count
print(models.list_models())
print(f"Number available: {len(models.list_models())}")

# Load a pre-trained VGG19 model (old API)
model = models.vgg19(pretrained=True)

# Load with weights argument (new API)
model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)

# Display the model parameters
print(model.state_dict())
As demonstrated, the models module downloads the required weights to your local directory. The updated API offers the flexibility of choosing specific pre-trained weights for your models.
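As a quick illustration of that flexibility, the sketch below (not part of the original example) picks an explicit weights enum and reuses the preprocessing pipeline and metadata that torchvision bundles with it:

from torchvision import models

# Choose a specific weights version rather than relying on DEFAULT
weights = models.VGG19_Weights.IMAGENET1K_V1
model = models.vgg19(weights=weights)

# Each weights enum carries its own preprocessing transforms and metadata
preprocess = weights.transforms()
print(preprocess)
print(weights.meta["categories"][:5])  # first few ImageNet class labels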
──────────────────────────────
Pre-trained models are typically configured for 1,000 classes. When working on a custom task with a different number of classes, you must adjust the output layer accordingly. The following example shows how to modify the final classifier layer of the VGG19 model to output 20 classes:
# Display the classifier layers
print(model.classifier)

import torch.nn as nn

# Modify the output layer (for 20 classes)
model.classifier[6] = nn.Linear(4096, 20)

# Verify the updated classifier
print(model.classifier)
Modifying the output layer is crucial for adapting a pre-trained model to new tasks and datasets.
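The same adjustment applies to architectures whose head is a single attribute rather than an indexed Sequential block. As a hedged sketch, ResNet-18 is used below purely for illustration and is not part of this article's workflow:

import torch.nn as nn
from torchvision import models

# ResNet-18 exposes its final layer as `fc` instead of `classifier`
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
print(resnet.fc)

# Swap the head for a hypothetical 20-class task
resnet.fc = nn.Linear(resnet.fc.in_features, 20)
print(resnet.fc)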
The PyTorch Hub provides a seamless way to list, download, and share models from GitHub repositories. It also supports loading pre-trained weights effortlessly. For instance, the following snippet demonstrates how to list available models from the PyTorch Vision repository on GitHub:
from torch import hub

# List models from a specific version of torchvision (v0.10.0)
hub.list('pytorch/vision:v0.10.0')
Similarly, to explore models like YOLOv5 from Ultralytics:
# List available YOLOv5 models
hub.list('ultralytics/yolov5')
To load a model from the hub, you can use the code below:
from torch import hub

# Load the YOLOv5s model
model = hub.load('ultralytics/yolov5', 'yolov5s')

# Provide a batch of images for inference
imgs = ['cat1.jpg', 'zidane.jpg']
results = model(imgs)

# Print the inference results
results.print()
Additionally, here’s how to load a ResNet-50 model with pre-trained weights from the PyTorch Vision GitHub repository:
# Load model weights and the model using the hub
weights = hub.load("pytorch/vision", "get_model_weights", name="resnet50")
model = hub.load("pytorch/vision", "resnet50", weights=weights.DEFAULT)
You can easily share your custom models using a GitHub repository by including a hub configuration file named hubconf.py at the root of the repository. This file defines dependencies, model URLs, and entry points. Below is an excerpt from a typical hubconf.py that specifies a simple neural network model called FakeNet:
# Define dependencies
dependencies = ['torch']

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.hub import load_state_dict_from_url

# Dictionary mapping model names to URLs containing the state dict
models_url = {
    'fake_model': 'https://github.com/kodekloudhub/PyTorch/raw/refs/heads/main/section_3/demos/030-105-additional-training-methods/model_state_dict'
}

# Define the model class
class FakeNet(nn.Module):
    def __init__(self):
        super(FakeNet, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.batch_norm = nn.BatchNorm1d(50)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.batch_norm(x)
        x = self.fc2(x)
        return x

# Entry point for the fake_model
def fake_model(pretrained=False, **kwargs):
    """
    FakeNet model

    pretrained (bool): Load pretrained weights if True.
    """
    model = FakeNet(**kwargs)
    if pretrained:
        model.load_state_dict(load_state_dict_from_url(models_url['fake_model'], progress=True))
    return model
Once hosted on GitHub, users can load your model with the following code:
import torch

# Load the fake_model with pretrained parameters
model = torch.hub.load('kodekloudhub/PyTorch', 'fake_model', pretrained=True)
print(model.state_dict())

# List available models in the repo and get help about the fake_model
print(torch.hub.list('kodekloudhub/PyTorch'))
torch.hub.help('kodekloudhub/PyTorch', 'fake_model')
To load the model without pre-trained weights and modify it for a custom task:
import torch

model = torch.hub.load('kodekloudhub/PyTorch', 'fake_model', pretrained=False)
print(model)

# Modify the network output to have two outputs instead of one
import torch.nn as nn
model.fc2 = nn.Linear(50, 2)
print(model)
Learning rate schedulers are pivotal in adjusting the learning rate during training, ensuring efficient convergence and preventing overshooting. PyTorch provides several schedulers in torch.optim.lr_scheduler, each of which wraps an optimizer. Here are some examples:
import torch.optim as optim

# Define an optimizer for the model parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Create a StepLR scheduler: reduce the learning rate by a factor of 0.1 every 5 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# Create an ExponentialLR scheduler: decay the learning rate by multiplying with 0.1 every epoch
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1, last_epoch=-1)

# Create a ReduceLROnPlateau scheduler: reduce LR by a factor of 0.1 if no improvement for 2 epochs
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

# Print the scheduler state
print(scheduler.state_dict())
Each scheduler has a specific use case. For example, StepLR decreases the learning rate at fixed intervals, while ReduceLROnPlateau monitors a validation metric and adjusts the learning rate when performance stalls.
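The difference also shows up in how step() is called: time-based schedulers such as StepLR and ExponentialLR are stepped without arguments, whereas ReduceLROnPlateau must be passed the monitored metric. Below is a minimal sketch of the plateau convention; the tiny nn.Linear model and the synthetic val_loss value are placeholders for illustration only:

import torch.nn as nn
import torch.optim as optim

# Stand-in model and optimizer purely for illustration
toy_model = nn.Linear(10, 2)
optimizer = optim.SGD(toy_model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

for epoch in range(10):
    # ... training and validation would normally happen here ...
    val_loss = 1.0  # placeholder metric (constant, so the LR drops once patience is exceeded)

    # StepLR / ExponentialLR would be stepped as scheduler.step() with no arguments;
    # ReduceLROnPlateau receives the metric it monitors:
    scheduler.step(val_loss)

    # The current learning rate is stored in the optimizer's parameter groups
    print(f"Epoch {epoch}: lr = {optimizer.param_groups[0]['lr']:.6f}")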
──────────────────────────────
Transfer learning allows you to leverage pre-trained models and fine-tune just the final layers for a specific task. This section illustrates how to adapt the VGG19 model for a 10-class problem using the CIFAR10 dataset. We begin by preparing the dataset and updating the model’s output layer.
Using image transformations and the CIFAR10 dataset (which includes classes like plane, car, bird, cat, deer, dog, frog, horse, ship, and truck), we update the classifier for our custom task:
import torch
import torchvision.transforms.v2 as v2
from torchvision import datasets

# Note: CIFAR10 images are 32x32, while VGG19's ImageNet weights were trained on 224x224 inputs;
# the normalization below matches the statistics used for those weights.
transform = v2.Compose([
    v2.RandomHorizontalFlip(),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # Important normalization
])

# CIFAR10 training dataset
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=1)

import torch.nn as nn
from torchvision import models

# Load VGG19 with default weights
model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)

# Update the final classifier layer for 10 classes
model.classifier[-1] = nn.Linear(4096, 10)
For effective feature extraction, freeze all layers except the final classifier. This ensures that during training, only the last layer’s parameters are updated.
# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the final classifier layer
for param in model.classifier[-1].parameters():
    param.requires_grad = True

# Verify which layers are trainable
for name, param in model.named_parameters():
    print(f"Layer: {name}, requires_grad: {param.requires_grad}")
Freezing layers prevents the alteration of pre-trained features, speeding up training when adapting models to new tasks.
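A quick way to confirm the freeze took effect is to count how many parameters still require gradients. A short sketch using the model defined above:

# Compare trainable parameters against the total after freezing
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,}")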
The training loop below defines a loss function, optimizer (limited to the final classifier), and a learning rate scheduler. It also saves checkpoints periodically, including the state of the optimizer and scheduler.
import torch.optim as optim

# Define loss, optimizer, and scheduler
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.classifier[-1].parameters(), lr=0.001, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

N_EPOCHS = 10

for epoch in range(N_EPOCHS):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        # Zero the gradients
        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print the average loss for this epoch
    print(f"Epoch: {epoch} Loss: {running_loss / len(trainloader):.4f}")

    # Step the scheduler at the end of the epoch
    scheduler.step()

    # Save a checkpoint every 2 epochs
    if epoch % 2 == 0:
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': scheduler.state_dict(),
            'loss': loss,
        }, f'training_checkpoint_{epoch}.tar')

# Save the final checkpoint after the last epoch
torch.save({
    'epoch': N_EPOCHS,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
    'loss': loss,
}, 'training_checkpoint_final.tar')
This training approach demonstrates effective feature extraction by fine-tuning only the final layer over 10 epochs while preserving the state of training through periodic checkpointing.
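Because each checkpoint bundles the model, optimizer, and scheduler state, training can be resumed later. Here is a minimal sketch of restoring the final checkpoint, assuming model, optimizer, and scheduler have been re-created exactly as above:

import torch

# Load the saved dictionary and restore each component's state
checkpoint = torch.load('training_checkpoint_final.tar')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])

start_epoch = checkpoint['epoch']
print(f"Resuming from epoch {start_epoch}, last recorded loss: {checkpoint['loss']}")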
──────────────────────────────
In this article, we explored several advanced training techniques in PyTorch. We covered the process of loading and modifying pre-trained models, using PyTorch Hub for model sharing, setting up various learning rate schedulers, and implementing transfer learning by fine-tuning the model’s final layer. These methodologies not only boost model performance but also streamline the training process. Happy coding!