I am trying to build a custom model to train in PyTorch. Long story short, I need to build a tensor with all elements set to zero except for a rectangular sub-diagonal block. Crucially, the optimization process should touch only the elements of this sub-diagonal block and leave all the zeroes untouched. To do this I defined a custom PyTorch network and defined my rectangular block with nn.Parameter:
class My_Network(nn.Module):
    def __init__(self, vertical_dim, horizontal_dim):
        super().__init__()
        self.total_dim = vertical_dim + horizontal_dim
        self.subdiagonal_block = nn.Parameter(torch.rand(vertical_dim, horizontal_dim))
In this way, correct me if I am wrong, PyTorch should initialize this tensor with random values and register it as a parameter of the model to be optimized during training. But now I am stuck: I want to tell PyTorch to build a square matrix tensor, with dimension equal to self.total_dim, that is all zeroes except for the sub-diagonal block. As I was saying, in the computation that I will define in the forward method, PyTorch should train only the sub-diagonal block.
I can add a zeros tensor as I want it, without setting it as a model parameter, like so (if I am not mistaken):
class My_Network(nn.Module):
    def __init__(self, vertical_dim, horizontal_dim):
        super().__init__()
        self.total_dim = vertical_dim + horizontal_dim
        self.subdiagonal_block = nn.Parameter(torch.rand(vertical_dim, horizontal_dim))
        self.total_zero_tensor = torch.zeros(self.total_dim, self.total_dim)
But now how can I tell PyTorch to plug my sub-diagonal block into the lower-left corner of this matrix of zeroes? I need to define this matrix for the computations that follow (I will need to perform matrix multiplications), but it is crucial that only the small sub-diagonal block be treated as a set of parameters to be trained.
I think you can make a boolean mask tensor with True (or 1) for the values you want to keep/train and False (or 0) everywhere else, then do something like x = torch.where(mask, x, zeros).
Or, if you have values you don’t want to modify for the non-training parts, put them in a separate constant non-parameter tensor and do x = torch.where(mask, x, constant_values).
As long as zeros or constant_values are non-parameters, it’ll backprop through torch.where into x but shouldn’t modify the non-parameter part.
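For what it's worth, here is a minimal sketch of the torch.where idea. The sizes are made up for illustration, and I'm assuming the block should sit in the lower-left corner as in the question. Note that x is a full-size parameter here; only the entries under the mask ever reach the output or receive a nonzero gradient.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
vertical_dim, horizontal_dim = 2, 3
total_dim = vertical_dim + horizontal_dim

# Boolean mask: True only on the lower-left (vertical_dim x horizontal_dim) block.
mask = torch.zeros(total_dim, total_dim, dtype=torch.bool)
mask[total_dim - vertical_dim:, :horizontal_dim] = True

x = nn.Parameter(torch.rand(total_dim, total_dim))  # full-size parameter
zeros = torch.zeros(total_dim, total_dim)           # plain tensor, not a parameter

# Take x where the mask is True and the constant zeros everywhere else.
full = torch.where(mask, x, zeros)

# Dummy loss, just to look at where the gradient flows.
full.sum().backward()
print(x.grad)  # nonzero only where mask is True
```

With a plain optimizer such as SGD without weight decay, the off-block entries of x get a zero update, and they are replaced by zeros in the forward pass in any case.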
You can mask the zero values during the forward pass and that will block the gradient.
Here’s an example assuming the block is in the upper left corner of the matrix (adjust the mask slices to move it elsewhere):
import torch
import torch.nn as nn
vertical_dim = 2
horizontal_dim = 2
total_dim = vertical_dim + horizontal_dim
# create mask: 1 on the block values, 0 on the non-block values
mask = torch.ones(total_dim, total_dim)
mask[vertical_dim:] = 0
mask[:, horizontal_dim:] = 0
# create params using the mask to zero non-block values
params = nn.Parameter(torch.rand(total_dim, total_dim) * mask)
# dummy loss example, applying the mask to params during the forward pass
loss = (params * mask).mean()
loss.backward()
# inspect the grad tensor and see that only the block entries have nonzero gradient values
print(params.grad)
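An alternative to masking a full-size parameter, closer to the module in the question: keep only the block as an nn.Parameter and assemble the full matrix inside forward. This is just a sketch under my own assumptions. The full_matrix helper and the matrix multiplication in forward are hypothetical stand-ins for whatever computation you actually need, and the block is placed in the lower-left corner.

```python
import torch
import torch.nn as nn


class My_Network(nn.Module):
    def __init__(self, vertical_dim, horizontal_dim):
        super().__init__()
        self.vertical_dim = vertical_dim
        self.horizontal_dim = horizontal_dim
        self.total_dim = vertical_dim + horizontal_dim
        # Only this block is registered as a trainable parameter.
        self.subdiagonal_block = nn.Parameter(torch.rand(vertical_dim, horizontal_dim))

    def full_matrix(self):
        # Hypothetical helper: fresh zero matrix matching the block's dtype and device.
        full = torch.zeros(
            self.total_dim,
            self.total_dim,
            dtype=self.subdiagonal_block.dtype,
            device=self.subdiagonal_block.device,
        )
        # Write the block into the lower-left corner; autograd tracks this assignment.
        full[self.total_dim - self.vertical_dim:, :self.horizontal_dim] = self.subdiagonal_block
        return full

    def forward(self, x):
        # Placeholder computation: multiply the input by the assembled matrix.
        return x @ self.full_matrix()


model = My_Network(2, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

out = model(torch.rand(4, model.total_dim))
out.sum().backward()
optimizer.step()

print(model.full_matrix())  # zeros everywhere except the lower-left 2 x 3 block
```

Since the zeros are rebuilt on every call and never registered as parameters, the optimizer only ever sees subdiagonal_block, so there is nothing outside the block for it to change.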