I defined a nested U-Net (UNet++) to predict mask images, based on this article: https://arxiv.org/abs/1807.10165
The model has 4 outputs at 4 different layers, so that I can compare the results across the 4 supervision depths. Monitoring the gradients in TensorBoard, I found that they are very small (I average the gradients produced by the 4 losses).
Are there any suggestions for fine-tuning the model to eliminate the vanishing gradients?
[TensorBoard screenshot: gradient for the last output layer]
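One thing worth checking before anything else: averaging the gradients from the 4 heads divides each head's contribution by 4 compared with summing them, so the curves in TensorBoard can look 4x smaller without the network actually training worse. A toy NumPy check (illustrative numbers only, not the actual model):

```python
import numpy as np

# Toy illustration: averaging the gradients from 4 supervision heads scales
# each head's contribution by 1/4 compared to summing them, which makes
# already-small gradients look even smaller on a TensorBoard plot.
rng = np.random.default_rng(0)
head_grads = [rng.normal(scale=1e-3, size=8) for _ in range(4)]

avg_grad = np.mean(head_grads, axis=0)
sum_grad = np.sum(head_grads, axis=0)

# sum is exactly 4x the mean, so the norms differ by exactly 4
print(np.linalg.norm(sum_grad) / np.linalg.norm(avg_grad))  # prints 4.0
```

If the small magnitudes are only this scaling artifact, switching from averaging to summing (or to explicit per-head loss weights) changes the effective learning rate rather than fixing a real vanishing-gradient problem.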
What I've tried so far:
- Xavier (Glorot) kernel initialization
- ResNet-style residual blocks in the encoder and decoder (a skip connection from the input to the output of the last layer in each block)
- Deep supervision: I take outputs from 4 layers, compute a loss/gradient for each, and apply them to the optimizer in sequence
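On the Xavier point, here is a quick NumPy sketch (plain dense layers with assumed sizes, not the actual conv blocks) of the property Glorot initialization is supposed to give: activation variance stays roughly constant through depth, which is what keeps gradients from shrinking layer by layer. If the variance still collapses in the real network, the initializer is probably not the culprit.

```python
import numpy as np

# Minimal sketch (assumed layer sizes, linear activations for clarity):
# Glorot/Xavier initialization keeps activation variance roughly constant
# across layers, the property that counteracts vanishing gradients.
rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

x = rng.normal(size=(1024, 256))   # batch of activations with unit variance
for _ in range(10):                # pass through 10 layers
    x = x @ glorot_uniform(256, 256)

print(float(x.var()))              # stays near 1.0 rather than collapsing
```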
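And on the ResNet-style blocks: the reason an identity skip helps is that with y = x + F(x), the local derivative is dy/dx = 1 + F'(x), so the gradient through the block stays near 1 even when the residual branch is nearly flat. A toy finite-difference check (made-up branch F, not the actual encoder):

```python
import numpy as np

# Toy numeric check: with a skip connection y = x + F(x), dy/dx = 1 + F'(x),
# so the gradient cannot shrink toward 0 even when F'(x) is tiny.
def F(x):
    return 0.01 * np.tanh(x)   # a weak residual branch with a small slope

x, eps = 0.5, 1e-6
plain_grad = (F(x + eps) - F(x - eps)) / (2 * eps)
skip_grad = ((x + eps + F(x + eps)) - (x - eps + F(x - eps))) / (2 * eps)

print(plain_grad, skip_grad)   # the skip path keeps the gradient near 1
```

So if the gradients are still tiny with residual blocks in place, it is worth verifying that the skips really are identity connections (not gated or followed by a squashing activation that undoes the effect).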