What is the correct way to use CTCLoss in PyTorch?

I have been training a conv-LSTM network. The conv net takes an input of shape (batch, 1, 75, 46, 146) and outputs a tensor of shape (batch, 10), which is then fed into the LSTM network.
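Roughly, the tensor shapes involved look like this (dummy stand-ins only, not my real model code; the names here are hypothetical):

import torch

batch = 4                                  # hypothetical batch size
x = torch.randn(batch, 1, 75, 46, 146)     # input to the conv net
conv_out = torch.randn(batch, 10)          # stand-in for the conv net output fed to the LSTM
sentence = torch.randn(batch, 35, 40)      # stand-in for the final model output: 35 timesteps, 40 classes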

The model, however, doesn’t seem to learn anything. I think I’ve given the wrong inputs to the CTC loss I am using (it seems to work slightly better with cross-entropy loss).

Here sentence is the output of the model, of shape (batch, 35, 40), where 35 is the length of every sentence and 40 is the number of classes.

sentence = torch.reshape(sentence, (35, sentence.shape[0], 40))
input_lengths = torch.full(size=(x.shape[0],), fill_value=35, dtype=torch.int)
loss = criterion(sentence, y, input_lengths, input_lengths)

where criterion is defined as nn.CTCLoss(). Still, the model doesn’t seem to learn anything and gives the same gibberish prediction every epoch.
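For completeness, here is a minimal, self-contained version of my current call with dummy tensors standing in for the real data (the shapes match my setup, but the values and names such as batch and y are hypothetical):

import torch
import torch.nn as nn

batch = 4
# stand-in for the model output: batch of 4, 35 timesteps, 40 classes
sentence = torch.randn(batch, 35, 40)
# dummy targets: class indices in [1, 40), padded to length 35 like the inputs
y = ((torch.arange(35) % 39) + 1).unsqueeze(0).repeat(batch, 1)

criterion = nn.CTCLoss()

# this mirrors what I am doing now: reshape to (35, batch, 40) and pass
# the same length tensor for both input_lengths and target_lengths
sentence = torch.reshape(sentence, (35, batch, 40))
input_lengths = torch.full(size=(batch,), fill_value=35, dtype=torch.int)
loss = criterion(sentence, y, input_lengths, input_lengths)
print(loss)

(I picked dummy targets without adjacent repeats just so the snippet returns a finite loss.)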

What is wrong here? Am I not using CTCLoss correctly?
