
PyTorch library

PyTorch is the most popular Python machine learning and deep learning library for implementing ML workflows and deep learning solutions. It is an open-source project and can run code on GPUs/TPUs. PyTorch is also a low-level math library like NumPy, but built for deep learning: it compiles compute graphs into highly efficient C++/CUDA code.

The sources for this content are the product documentation, Zero to Mastery - Learning PyTorch, and the WashU training website.

Environment setup

Use pip or miniconda for package and virtual environment management, and Jupyter notebooks.

Install

  • Using Python 3 and pip3: create a virtual environment and install torch:

    pip3 install torch torchvision torchaudio
    
  • Using Anaconda:

    1. Install miniconda (it is installed in ~/miniconda3):

      # under ~/bin
      curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
      sh Miniconda3-latest-MacOSX-arm64.sh -u
      
    2. Verify installed libraries: conda list

    3. Environments are created under ~/miniconda3/envs. To create a conda environment named "torch", from the miniconda3 folder run: conda create -n torch python=3 anaconda
    4. To activate conda environment: conda activate torch
    5. Install PyTorch: conda install pandas pytorch::pytorch torchvision torchaudio -c pytorch
    6. [optional] Install jupyter packaging: conda install -y jupyter
    7. Register a new runtime env for jupyter: python -m ipykernel install --user --name pytorch --display-name "Python 3.11 (pytorch)"

Run once conda is installed

  1. To activate conda environment: conda activate torch
  2. Test my first program: python basic-torch.py
  3. If Jupyter is needed: run jupyter notebook in the torch env, then open http://localhost:8888/tree.
  4. Select the kernel registered above, e.g. "Python 3.11 (pytorch)".

My code studies are in the pytorch folder.

Concepts

Tensor

The tensor is an important concept for deep learning. It is the numerical representation of data: an n-dimensional matrix.

Tensors are a specialized data structure, similar to NumPy’s ndarrays, except that tensors can run on GPUs.

# device is set as shown in the GPU section below (e.g. "mps", "cuda", or "cpu")
matrix1 = torch.tensor([[1, 1, 1], [1, 1, 2]], device=device, dtype=torch.float16)

Tensor attributes describe their shape, datatype, and the device on which they are stored.

import torch, numpy as np
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
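
Continuing from the snippet above, those attributes can be inspected directly (a minimal sketch):

t = torch.rand(2, 3)
print(t.shape)   # torch.Size([2, 3])
print(t.dtype)   # torch.float32 (the default)
print(t.device)  # cpu, unless the tensor was created on or moved to a GPU device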

Tensor created from NumPy array:

import torch
import numpy as np
# Example NumPy arrays (assumed toy data: X = features, y = labels)
X = np.arange(10, dtype=np.float64).reshape(5, 2)
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
X[:5], y[:5]

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other. See the set of basic operations on tensors.
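
A small sketch of that shared-memory behavior (CPU tensors only):

import torch
t = torch.ones(3)
n = t.numpy()    # n shares memory with t
t.add_(1)        # an in-place change on the tensor...
print(n)         # ...is visible in the NumPy array: [2. 2. 2.]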

See the basic ML workflow using PyTorch to work on data and do a linear regression: workflow-basic.ipynb.

Constructs

PyTorch has two important modules we can use to create neural networks: torch.nn and torch.optim, and two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset. See dataset examples.

Modules Description
torch.nn Contains all of the building blocks for computational graphs.
torch.nn.Parameter Stores tensors that can be used with nn.Module. If requires_grad=True, gradients are calculated automatically.
torch.nn.Module The base class for all neural network modules. It needs to be subclassed and requires a forward() method to be implemented.
torch.optim Various optimization algorithms that tell the model parameters how to change in order to reduce the loss (gradient descent).
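
As a minimal sketch (with an assumed toy dataset of random tensors), Dataset and DataLoader work together like this:

import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    # Stores the samples and their corresponding labels
    def __init__(self, n=100):
        self.X = torch.randn(n, 3)
        self.y = (self.X.sum(dim=1) > 0).float()
    def __len__(self):
        return len(self.X)
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# DataLoader wraps an iterable around the Dataset and batches it
loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
batch_X, batch_y = next(iter(loader))
print(batch_X.shape, batch_y.shape)   # torch.Size([16, 3]) torch.Size([16])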

GPU

On Linux or Windows with an NVIDIA GPU, we need to use the CUDA (Compute Unified Device Architecture) library. See AWS deep learning container. For Mac, use MPS (Metal Performance Shaders).

Here is sample code to set mps to access GPU (on Mac) for tensor computation:

has_mps = torch.backends.mps.is_built()
device = "mps" if has_mps else "cuda" if torch.cuda.is_available() else "cpu"

NumPy works only on the CPU, so we can convert a NumPy array to a tensor, move the tensor to the GPU with tensor.to(device), do the computation, and move the result back to NumPy:

tensor=torch.tensor([1,2,3])
tensor_on_gpu = tensor.to(device)
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()

See Tim Dettmers's guide.

Basic Algebra with PyTorch

See the Algebra using PyTorch Python code.

Loss functions

The choice of cost/loss function and optimizer depends on the problem to solve.

Loss function/Optimizer Problem type PyTorch module
Stochastic Gradient Descent (SGD) optimizer Classification, regression, many others. torch.optim.SGD()
Adam Optimizer Classification, regression, many others. torch.optim.Adam()
Binary cross entropy loss Binary classification torch.nn.BCEWithLogitsLoss or torch.nn.BCELoss
Cross entropy loss Multi-class classification torch.nn.CrossEntropyLoss
Mean absolute error (MAE) or L1 Loss Regression torch.nn.L1Loss
Mean squared error (MSE) or L2 Loss Regression torch.nn.MSELoss

The binary cross-entropy / log loss measures how good the predicted probabilities are. It takes the negative log of the probability assigned to the expected class in {0, 1}, so when the predicted probability for the true class is low, the loss is large.
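
A small sketch of that computation (with assumed logits and labels), comparing torch.nn.BCEWithLogitsLoss to the manual formula:

import torch
from torch import nn

logits = torch.tensor([2.0, -1.0, 0.5])   # raw model outputs (assumed values)
labels = torch.tensor([1.0, 0.0, 1.0])    # true classes in {0, 1}

# BCEWithLogitsLoss applies the sigmoid internally, then averages -[y*log(p) + (1-y)*log(1-p)]
print(nn.BCEWithLogitsLoss()(logits, labels))

p = torch.sigmoid(logits)
manual = -(labels * torch.log(p) + (1 - labels) * torch.log(1 - p)).mean()
print(manual)                             # same value as above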

Neural network

A PyTorch neural network declaration is a class that extends nn.Module. The constructor defines the neural network structure, and the class must implement the forward(x) function to pass the input through the network and get the output. Backpropagation computes the gradients, and an optimizer such as SGD updates the parameters. This is the most flexible way to declare a NN. As an alternative, the following code uses the nn.Sequential method with the non-linear nn.ReLU() function between layers.

model = nn.Sequential(
    nn.Linear(x.shape[1], 50),
    nn.ReLU(),
    nn.Linear(50, 25),
    nn.ReLU(),
    nn.Linear(25, 1)
).to(device)
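
The same architecture written with the nn.Module subclass approach mentioned above would look roughly like this (a sketch, assuming the same layer sizes):

from torch import nn

class SimpleNet(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.layer1 = nn.Linear(in_features, 50)
        self.layer2 = nn.Linear(50, 25)
        self.out = nn.Linear(25, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # forward() defines how the input flows through the layers
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        return self.out(x)

# x and device as in the Sequential example above
model = SimpleNet(in_features=x.shape[1]).to(device)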

A neural network has an input layer with a number of parameters equal to the number of input features, and a number of outputs equal to the number of expected responses (1 output for binary classification). The first layer above is a linear transformation of the incoming data (x):

y = xAᵀ + b

50 is the number of neurons in the first hidden layer. For the activation function between hidden layers, ReLU (max(0, x)) is often used when we want non-linearity. The output layer uses no transfer function for a regression neural network, the logistic (sigmoid) function for binary classification (just two classes), or log softmax for multi-class classification.

The hyper-parameters to tune are:

  • The number of neurons per hidden layer: in general, more hidden neurons means more capability to fit complex problems, but too many will lead to overfitting, and too few may lead to underfitting the problem and will sacrifice accuracy.

  • The number of layers: more layers allow the neural network to perform more of its own feature engineering and data preprocessing.

  • The activation function between hidden layers and for the output layer.
  • The loss and optimizer functions.
  • The learning rate of the optimizer.
  • The number of epochs to train the model. An epoch is one complete pass over the training set.

For multi-class training, CrossEntropyLoss is used as the loss function on the raw logits (it combines LogSoftmax and NLLLoss); alternatively, a LogSoftmax output layer can be paired with NLLLoss. With softmax, the outputs are normalized probabilities that sum up to one.
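
A quick illustration of that normalization (assumed logits for a 3-class problem):

import torch

logits = torch.tensor([[1.0, 2.0, 0.5]])   # raw model outputs (assumed values)
probs = torch.softmax(logits, dim=1)
print(probs)                 # ~tensor([[0.2312, 0.6285, 0.1403]])
print(probs.sum(dim=1))      # tensor([1.])
print(probs.argmax(dim=1))   # predicted class index: tensor([1])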

Some code samples and their evaluation results:

{'model_name': 'FashionMNISTModel', 'model_loss': 0.41334256529808044, 'model_acc': tensor(0.8498, device='mps:0')}
{'model_name': 'FashionNISTCNN', 'model_loss': 0.3709910213947296, 'model_acc': tensor(0.8716, device='mps:0')}

Model training

PyTorch training loop

For the training loop, the steps to build are:

Step What does it do? Code example
Forward pass The model goes through all of the training data once, performing its forward() function calculations. model(x_train)
Calculate the loss The model's predictions are compared to the ground truth and evaluated to see how wrong they are. loss = loss_fn(y_pred, y_train)
Zero gradients The optimizer's gradients are set to zero to be recalculated for the specific training step. optimizer.zero_grad()
Perform back propagation on the loss Computes the gradient of the loss with respect to every model parameter to be updated (each parameter with requires_grad=True). loss.backward()
Update the optimizer (gradient descent) Update the parameters with requires_grad=True with respect to the loss gradients in order to improve them. optimizer.step()

Example of code for training on multiple epochs:

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)

for epoch in range(epochs):
    model.train()
    # 1. Forward pass
    y_logits = model(X_train).squeeze()
    # from logits -> prediction probabilities -> prediction labels
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)
    # 2. Calculate the loss and accuracy
    loss = loss_fn(y_logits, y_train)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)
    # 3. Zero the optimizer gradients
    optimizer.zero_grad()
    # 4. Back propagation on the loss
    loss.backward()
    # 5. Update the parameters (gradient descent)
    optimizer.step()

The steps for performing inference with PyTorch models:

# 1. Set the model to evaluation mode
model.eval()

# 2. Use the inference mode context manager to make predictions
with torch.inference_mode():
    y_preds = model(X_test)

See train_step function in engine.py

PyTorch testing loop

The typical steps include:

Step Description Code example
Forward pass The model goes through all of the test data. model(x_test)
Calculate the loss The model's predictions are compared to the ground truth. loss = loss_fn(y_pred, y_test)
Calculate evaluation metrics Calculate other evaluation metrics such as accuracy on the test set. Custom function
model.eval()
with torch.inference_mode():
    # 1. Forward pass
    test_logits = model(X_test).squeeze() 
    test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)
    # 2. Calculate loss/accuracy
    test_loss = loss_fn(test_logits, y_test)
    test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

See test_step function in engine.py

Improving a model

Model improvement technique What does it do?
Add more layers Each layer potentially increases the learning capabilities of the model, with each layer able to learn some kind of new pattern in the data. Adding more layers is often referred to as making the neural network deeper.
Add more hidden units Similar to the above, more hidden units per layer means a potential increase in the learning capabilities of the model. Adding more hidden units is often referred to as making the neural network wider.
Fitting for longer (more epochs) The model might learn more if it had more opportunities to look at the data.
Changing the activation functions Some data just can't be fit with only straight lines, using non-linear activation functions can help.
Change the learning rate Less model specific; the learning rate of the optimizer decides how much a model should change its parameters each step. Too much and the model over-corrects, too little and it doesn't learn enough.
Change the loss function Different problems require different loss functions.
Use transfer learning Take a pre-trained model from a problem domain similar to ours and adjust it to our own problem.

Evaluate classification models

Classification models can be measured using at least the following metrics (see more PyTorch metrics):

Metric name/ Evaluation method Definition Code
Accuracy Out of 100 predictions, how many does your model get correct? E.g. 95% accuracy means it gets 95/100 predictions correct. torchmetrics.Accuracy() or sklearn.metrics.accuracy_score()
Precision Proportion of true positives over the total number of predicted positives. Higher precision leads to fewer false positives (model predicts 1 when it should've been 0). torchmetrics.Precision() or sklearn.metrics.precision_score()
Recall Proportion of true positives over the total number of true positives and false negatives (model predicts 0 when it should've been 1). Higher recall leads to fewer false negatives. torchmetrics.Recall() or sklearn.metrics.recall_score()
F1-score Combines precision and recall into one metric. 1 is best, 0 is worst. torchmetrics.F1Score() or sklearn.metrics.f1_score()
Confusion matrix Compares the predicted values with the true values in a tabular way; if 100% correct, all values in the matrix will be on the diagonal from top left to bottom right. torchmetrics.classification.ConfusionMatrix (https://lightning.ai/docs/torchmetrics/stable/classification/confusion_matrix.html) or sklearn.metrics.plot_confusion_matrix()
Classification report Collection of some of the main classification metrics such as precision, recall and f1-score. sklearn.metrics.classification_report()
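
For instance, a minimal sketch with torchmetrics (assumed toy predictions for a 3-class problem):

import torch
from torchmetrics.classification import MulticlassAccuracy, MulticlassF1Score

preds = torch.tensor([0, 2, 1, 1])    # predicted class indices (assumed values)
target = torch.tensor([0, 1, 1, 1])   # true class indices

accuracy = MulticlassAccuracy(num_classes=3)
f1 = MulticlassF1Score(num_classes=3)
print(accuracy(preds, target), f1(preds, target))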

PyTorch datasets

PyTorch includes many existing functions to load in various custom datasets in the TorchVision, TorchText, TorchAudio and TorchRec domain libraries.

See prepare_image_dataset.py to get food images from PyTorch vision.
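
For example, a built-in TorchVision dataset can be loaded and wrapped in a DataLoader like this (a sketch using FashionMNIST; the root folder is an assumption):

import torchvision
from torchvision.transforms import v2
from torch.utils.data import DataLoader

transform = v2.Compose([v2.ToTensor()])
train_data = torchvision.datasets.FashionMNIST(root="data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)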

Data augmentation

Data augmentation is the process of altering the data in such a way that this artificially increases the diversity of the training set.

The purpose of torchvision.transforms is to alter the images in some way: turning them into a tensor, cropping them, randomly erasing a portion, randomly rotating them, and so on.

Training a model on this artificially altered dataset hopefully results in a model that is capable of better generalization (the patterns it learns are more robust to future unseen examples).

Research shows that random transforms (like transforms.RandAugment() and transforms.TrivialAugmentWide()) generally perform better than hand-picked transforms.

We usually don't perform data augmentation on the test set. The idea of data augmentation is to artificially increase the diversity of the training set to better predict on the testing set.
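
A sketch of that split, reusing the transform style shown later in the how-to section (sizes are assumptions):

from torchvision.transforms import v2

# Training transforms include random augmentation
train_transform = v2.Compose([
    v2.Resize((224, 224)),
    v2.TrivialAugmentWide(num_magnitude_bins=31),
    v2.ToTensor(),
])
# Test transforms only resize and convert to a tensor, with no augmentation
test_transform = v2.Compose([
    v2.Resize((224, 224)),
    v2.ToTensor(),
])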

See also in PyTorch's Illustration of Transforms examples.

Transfer learning for image classification

The idea of transfer learning is to take a model that already performs well on a problem space similar to the one to address, and then customize it to our own problem.

Custom data going into the model needs to be prepared in the same way as the original training data that went into the model.

Pretrained PyTorch models come with weights, and we can get the matching data transforms from the weights:

import torchvision

weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
transformer = weights.transforms()
model = torchvision.models.efficientnet_b0(weights=weights).to(device)

efficientnet_b0 comes in three main parts:

  • features: A collection of convolutional layers and other activation layers to learn a base representation of vision data.
  • avgpool: Takes the average of the output of the features layer(s) and turns it into a feature vector.
  • classifier: Turns the feature vector into a vector with the same dimensionality as the number of required output classes (ImageNet has 1000 classes, so out_features=1000).

The process of transfer learning usually freezes some base layers of a pre-trained model, typically the features section, and then adjusts the output layers (also called head/classifier layers) to suit our needs.
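
A minimal sketch of that process for the efficientnet_b0 model above (the number of classes is an assumption):

import torchvision
from torch import nn

weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights)

# Freeze the "features" section so the pre-trained base layers are not updated
for param in model.features.parameters():
    param.requires_grad = False

# Replace the classifier head to output our own number of classes (assumed: 3)
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=3),   # 1280 = efficientnet_b0 feature vector size
)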

Some how-tos

How to set the device dynamically
import torch

def getDevice():
    # Prefer Apple MPS, then NVIDIA CUDA, otherwise fall back to the CPU
    if torch.backends.mps.is_available():
        device = torch.device("mps")
    elif torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
    return device
How to save and load a model?
# Saving using PyTorch: persist the model's state_dict (its learned parameters)
MODEL_SAVE_PATH = MODEL_PATH / filename
torch.save(model.state_dict(), MODEL_SAVE_PATH)
# Loading reuses the model class declaration, then loads the saved state_dict
model=FashionNISTCNN(input_shape=1,hidden_units=10,output_shape=10)
model.load_state_dict(torch.load("models/fashion_cnn_model.pth"))
Display the confusion matrix for a multiclass prediction
import matplotlib.pyplot as plt
from torchmetrics.classification import MulticlassConfusionMatrix

def make_confusion_matrix(pred_tensor, test_labels, class_names):
    # Present a confusion matrix between the predicted labels and the true labels from test data
    cm = MulticlassConfusionMatrix(num_classes=len(class_names))
    cm.update(pred_tensor, test_labels)
    fig, ax = cm.plot(labels=class_names)
    plt.show()
Transform an image into a Tensor

Use the torchvision.transforms v2 module:

from torchvision.transforms import v2
train_transformer = v2.Compose([v2.Resize((224, 224)), v2.TrivialAugmentWide(num_magnitude_bins=31), v2.ToTensor()])

How to get visibility into a neural network
import torchinfo
# model is an instantiated nn.Module; input_size (batch, channels, H, W) is an example value
torchinfo.summary(model, input_size=(32, 3, 224, 224))

Code samples

Resources