I defined a nested U-Net (UNet++) to predict mask images, based on this article: https://arxiv.org/abs/1807.10165
The model has 4 outputs at 4 different layers, so that I can compare the results across the 4 supervision depths. Monitoring the gradients in TensorBoard, I found that they are very small (I average the gradients produced by the 4 losses).
Are there any suggestions for fine-tuning the model to eliminate the vanishing gradients?
[TensorBoard screenshot: gradient for the last output layer]
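One thing worth checking before anything else: averaging the gradients from the 4 heads divides each head's contribution by 4 compared with summing them, so the curves in TensorBoard can look 4x smaller without the network actually training worse. A toy NumPy check (illustrative numbers only, not the actual model):

```python
import numpy as np

# Toy illustration: averaging the gradients from 4 supervision heads scales
# each head's contribution by 1/4 compared to summing them, which makes
# already-small gradients look even smaller on a TensorBoard plot.
rng = np.random.default_rng(0)
head_grads = [rng.normal(scale=1e-3, size=8) for _ in range(4)]

avg_grad = np.mean(head_grads, axis=0)
sum_grad = np.sum(head_grads, axis=0)

# sum is exactly 4x the mean, so the norms differ by exactly 4
print(np.linalg.norm(sum_grad) / np.linalg.norm(avg_grad))  # prints 4.0
```

If the small magnitudes are only this scaling artifact, switching from averaging to summing (or to explicit per-head loss weights) changes the effective learning rate rather than fixing a real vanishing-gradient problem.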
What I've tried so far:
- Xavier (Glorot) kernel initialization
- ResNet-style residual blocks in the encoder and decoder (a skip connection from the input to the output of the last layer in each block)
- Deep supervision: I take outputs from 4 layers, compute a loss/gradient for each, and apply them to the optimizer in sequence
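On the Xavier point, here is a quick NumPy sketch (plain dense layers with assumed sizes, not the actual conv blocks) of the property Glorot initialization is supposed to give: activation variance stays roughly constant through depth, which is what keeps gradients from shrinking layer by layer. If the variance still collapses in the real network, the initializer is probably not the culprit.

```python
import numpy as np

# Minimal sketch (assumed layer sizes, linear activations for clarity):
# Glorot/Xavier initialization keeps activation variance roughly constant
# across layers, the property that counteracts vanishing gradients.
rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

x = rng.normal(size=(1024, 256))   # batch of activations with unit variance
for _ in range(10):                # pass through 10 layers
    x = x @ glorot_uniform(256, 256)

print(float(x.var()))              # stays near 1.0 rather than collapsing
```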
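And on the ResNet-style blocks: the reason an identity skip helps is that with y = x + F(x), the local derivative is dy/dx = 1 + F'(x), so the gradient through the block stays near 1 even when the residual branch is nearly flat. A toy finite-difference check (made-up branch F, not the actual encoder):

```python
import numpy as np

# Toy numeric check: with a skip connection y = x + F(x), dy/dx = 1 + F'(x),
# so the gradient cannot shrink toward 0 even when F'(x) is tiny.
def F(x):
    return 0.01 * np.tanh(x)   # a weak residual branch with a small slope

x, eps = 0.5, 1e-6
plain_grad = (F(x + eps) - F(x - eps)) / (2 * eps)
skip_grad = ((x + eps + F(x + eps)) - (x - eps + F(x - eps))) / (2 * eps)

print(plain_grad, skip_grad)   # the skip path keeps the gradient near 1
```

So if the gradients are still tiny with residual blocks in place, it is worth verifying that the skips really are identity connections (not gated or followed by a squashing activation that undoes the effect).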