
Comments (9)

bslin commented on July 17, 2024

Quick update. Found the issue. There is a bug in layers.py, in pixel_wise_softmax_2 and pixel_wise_softmax:

If output_map is too large, exponential_map overflows to infinity, which produces NaN when the cost function is calculated.

The following code fixes it, although we might want to find a better clipping value.

replace:

    exponential_map = tf.exp(output_map)

with:

    exponential_map = tf.exp(tf.clip_by_value(output_map, -np.inf, 50))
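To see the failure mode and the effect of the clip concretely, here is a small NumPy stand-in (not the actual tf_unet code; the function names are made up, and the cap of 50 mirrors the snippet above):

```python
import numpy as np

def naive_softmax(logits):
    # Direct exponentiation: overflows to inf for large logits,
    # and inf / inf then yields NaN in the normalization step.
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

def clipped_softmax(logits, cap=50.0):
    # The workaround above: cap the logits before exponentiating,
    # so exp() stays finite (exp(50) ~ 5.2e21, far below float64 max).
    e = np.exp(np.clip(logits, -np.inf, cap))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([1000.0, 0.0])   # a very large output_map value
print(naive_softmax(logits))       # [nan, 0.] -- exp(1000) overflows
print(clipped_softmax(logits))     # finite, sums to 1
```

Note the clip distorts the probabilities whenever a logit exceeds the cap, which is why a principled fix (log-sum-exp / max subtraction, discussed below in the thread) is preferable.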

BTW thanks for providing the tf_unet code. It has been very helpful! :)


bslin commented on July 17, 2024

The error only gets hit after some number of iterations. It seems to get hit after fewer iterations when I use the Adam optimizer rather than momentum, but that might just be my specific case. After enough iterations I get the error regardless of the optimizer. The same training/testing data works fine when I use cross entropy as the cost function.


jakeret commented on July 17, 2024

Thanks for reporting this. I'm just wondering why the output_map gets so large.


bslin commented on July 17, 2024

Yeah, I'm wondering the same thing. I just noticed that I still get garbage results when training on my data (with cross entropy I was getting something more reasonable).

I have no idea why output_map gets so large; I plan to look into it some more later. Would you happen to have any ideas or theories to look into?


mateuszbuda commented on July 17, 2024

I have also encountered this issue. Using a smaller learning rate helped, so maybe it's just an exploding gradient.


bslin commented on July 17, 2024

Maybe. Another thing I noticed is that the original code uses both channels together when calculating the dice coefficient. When I use only one of the channels, the values I end up getting turn out to be better.


weiliu620 commented on July 17, 2024

This is a typical overflow/underflow issue when computing sum(exp(x)). Searching for 'log-sum-exp' on the web will give some explanation. The trick is to divide and multiply by the same constant around the exp function.

Or you can use tf.reduce_logsumexp, or refer to the source code of that function.
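The divide/multiply trick can be sketched in NumPy (a stand-in, not the TF implementation): factoring exp(m) out of the sum, with m the maximum of x, keeps every remaining exponent non-positive:

```python
import numpy as np

def logsumexp(x, axis=-1):
    # log(sum(exp(x))) = m + log(sum(exp(x - m))) for any constant m;
    # choosing m = max(x) makes every exponent <= 0, so exp cannot overflow.
    m = np.max(x, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))).squeeze(axis)

x = np.array([1000.0, 999.0])
print(np.log(np.sum(np.exp(x))))  # inf -- the naive form overflows
print(logsumexp(x))               # ~1000.31 -- stable
```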


jakeret commented on July 17, 2024

@weiliu620 thanks for the hint. I'm going to look into this.


jakeret commented on July 17, 2024

@weiliu620 following the lines from here referred to in your SO question, we would just have to subtract the result of tf.reduce_max in the tf.exp call, right?
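That max-subtraction version can be sketched in NumPy (a stand-in for the proposed TF change, not tf_unet's actual code): subtracting the per-pixel maximum over the channel axis before exponentiating is exactly the constant-factoring trick, and since softmax is shift-invariant the probabilities are unchanged:

```python
import numpy as np

def stable_pixel_softmax(output_map):
    # NumPy stand-in for tf.exp(output_map - tf.reduce_max(output_map, axis=-1,
    # keepdims=True)) followed by normalization: all exponents are <= 0,
    # so exp never overflows, and the result equals the unshifted softmax.
    shifted = output_map - output_map.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# one "pixel" with two channels and a huge activation
pixel = np.array([[2000.0, 0.0]])
print(stable_pixel_softmax(pixel))  # finite probabilities summing to 1
```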

