How to define PyTorch tensor that is only partially trainable

I am trying to build a custom model to train in PyTorch. Long story short, I need a tensor whose elements are all zero except for a rectangular sub-diagonal block, and crucially the optimization process should touch only the elements of this sub-diagonal block, leaving all the zeroes untouched. To do this I defined a custom PyTorch network and defined my rectangular block with nn.Parameter:

import torch
import torch.nn as nn

class My_Network(nn.Module):
    def __init__(self, vertical_dim, horizontal_dim):
        super().__init__()
        self.total_dim = vertical_dim + horizontal_dim
        # registered as a trainable parameter, initialized uniformly in [0, 1)
        self.subdiagonal_block = nn.Parameter(torch.rand(vertical_dim, horizontal_dim))

In this way (correct me if I am wrong) PyTorch should initialize the values of this tensor randomly and register them as parameters of the model to be optimized during training. But now I am stuck: I want to tell PyTorch to build a square tensor with dimension equal to self.total_dim that is all zeroes except for the sub-diagonal block, and, in the computation that I will define in the forward method, PyTorch should train only the sub-diagonal block.
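For what it's worth, the registration part can be checked by listing the model's parameters (a quick sanity check, using the class as defined above):

net = My_Network(2, 3)
for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)  # subdiagonal_block (2, 3) True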

I can add the zeros tensor I want without registering it as a model parameter, like so (if I am not mistaken):

class My_Network(nn.Module):
    def __init__(self, vertical_dim, horizontal_dim):
        super().__init__()
        self.total_dim = vertical_dim + horizontal_dim
        self.subdiagonal_block = nn.Parameter(torch.rand(vertical_dim, horizontal_dim))
        # plain tensor attribute: not a parameter, so it will never be optimized
        self.total_zero_tensor = torch.zeros(self.total_dim, self.total_dim)

But now, how can I tell PyTorch to plug my sub-diagonal block into the bottom-left corner of this matrix of zeroes? I need to define this matrix for the computations that follow (I will need to perform matrix multiplications), but it is crucial that only the little sub-diagonal block be considered a set of parameters to be trained.

I think you can make a boolean mask tensor of 1’s and 0’s, with 1’s (or Trues) marking the values you want to keep/train, then do something like x = torch.where(mask, x, zeros)

Or, if you have values you don’t want to modify in the non-trained part, put them in a separate constant non-parameter tensor and use x = torch.where(mask, x, constant_values)

As long as zeros or constant_values are non-parameters, it’ll backprop through torch.where into x only where the mask is True, and it won’t modify the non-parameter part.
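Something along these lines (a minimal sketch, using the question's dimensions and a bottom-left block; the exact indices are an assumption):

import torch
import torch.nn as nn

vertical_dim, horizontal_dim = 2, 3
total_dim = vertical_dim + horizontal_dim

# boolean mask: True only on the bottom-left (vertical_dim x horizontal_dim) block
mask = torch.zeros(total_dim, total_dim, dtype=torch.bool)
mask[horizontal_dim:, :horizontal_dim] = True

x = nn.Parameter(torch.rand(total_dim, total_dim))
zeros = torch.zeros(total_dim, total_dim)  # plain tensor, not a parameter

masked = torch.where(mask, x, zeros)  # gradients flow into x only where mask is True
masked.sum().backward()
print(x.grad)  # non-zero only inside the block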

You can mask the zero values during the forward pass and that will block the gradient.

Here’s an example assuming the sub-diagonal block is in the upper-left corner of the matrix (shift the mask indices to place it elsewhere, e.g. in the question’s bottom-left corner):

import torch
import torch.nn as nn

vertical_dim = 2
horizontal_dim = 2
total_dim = vertical_dim + horizontal_dim

# mask: ones on the block values, zeros everywhere else
mask = torch.ones(total_dim, total_dim)
mask[vertical_dim:] = 0
mask[:, horizontal_dim:] = 0

# create params using the mask to zero non-block values
params = nn.Parameter(torch.rand(total_dim, total_dim) * mask)

# dummy loss example, applying the mask to params during the forward pass
loss = (params * mask).mean()

loss.backward()

# inspect the grad tensor: only the sub-diagonal block has non-zero gradients
print(params.grad)
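And, continuing that snippet, one optimizer step (just a sketch with plain SGD; any optimizer whose update is zero when the gradient is zero behaves the same) leaves the masked entries at exactly zero, since they start at zero and receive zero gradient:

opt = torch.optim.SGD([params], lr=0.1)
opt.step()
print(params)  # off-block entries are still exactly zero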
