
Comments (8)

fluowhy commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
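
For reference, a minimal sketch of that workaround in PyTorch (the function name and tensor shape are illustrative, not taken from the memae-anomaly-detection code):

    import torch

    def entropy_loss(att_weights, eps=1e-10):
        # att_weights: (batch, mem_dim); each row is (roughly) a probability distribution.
        # The small eps keeps log() finite when a weight has been shrunk to exactly 0.
        return (-att_weights * torch.log(att_weights + eps)).sum(dim=1).mean()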

from memae-anomaly-detection.

Zk-soda commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all zeros in the weight w would be more suitable, so that the entropy term becomes 0*log(0+1) for every zero entry of w. What do you think?

from memae-anomaly-detection.

LiUzHiAn commented on September 4, 2024

Hi,

I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?

Thank you in advance.

from memae-anomaly-detection.

fluowhy commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all zeros in the weight w would be more suitable, so that the entropy term becomes 0*log(0+1) for every zero entry of w. What do you think?

Sorry I didn't respond at the time. I don't think it is a suitable solution, because p is used as a probability distribution (it might not be a proper distribution, though). Adding 1 to it could break the purpose of p.
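
A quick illustration of that point (a sketch with made-up numbers): adding 1 to the zero entries changes the sum of the weights, so they no longer behave like a probability distribution.

    import torch

    w = torch.tensor([0.6, 0.4, 0.0, 0.0])   # shrunk attention weights, sums to ~1
    w_shifted = w + (w == 0).float()          # add 1 to every zero entry
    print(w.sum().item(), w_shifted.sum().item())  # ~1.0 vs 3.0 -- no longer normalized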

from memae-anomaly-detection.

sjp611 commented on September 4, 2024

Hi,

I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?

Thank you in advance.

The entropy loss being zero means that the attention over the memory items is close to a one-hot vector, i.e. the addressing is sparse. You can check whether the model works correctly by inspecting the min/max and the argmax of the attention weights.
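
A minimal way to run that check (a sketch; att_weights stands for whatever variable holds the softmax output of the memory module in your code):

    def inspect_attention(att_weights):
        # att_weights: (batch, mem_dim) torch tensor
        print("min:", att_weights.min().item(), "max:", att_weights.max().item())
        # For near one-hot rows, max should be close to 1 and argmax shows which slot is addressed.
        print("argmax per sample:", att_weights.argmax(dim=1).tolist())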

from memae-anomaly-detection.

Wolfybox commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all zeros in the weight w would be more suitable, so that the entropy term becomes 0*log(0+1) for every zero entry of w. What do you think?

Nah. Adding 1 to all zero items will cause an issue in the backward pass during training.

from memae-anomaly-detection.

LiUzHiAn commented on September 4, 2024

Hi,
I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
Thank you in advance.

The entropy loss being zero means that the attention over the memory items is close to a one-hot vector, i.e. the addressing is sparse. You can check whether the model works correctly by inspecting the min/max and the argmax of the attention weights.

Yes, I checked the attention weights before the hard-shrink operation. I found that after the softmax, the attention values are pretty much the same along the memory-slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how I vary the number of memory slots, the situation is the same. These nearly identical values are always less than the shrink_threshold if I set it to a value in the interval [1/N, 3/N]; hence, the entropy loss ends up being ZERO.
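
To make that failure mode concrete, here is a small sketch (N, the score scale, and the use of hardshrink are illustrative assumptions, not the repository's exact code):

    import torch
    import torch.nn.functional as F

    N = 2000                                    # number of memory slots, e.g. 2K
    scores = torch.randn(1, N) * 1e-3           # nearly identical similarity scores
    att = F.softmax(scores, dim=1)              # every entry is roughly 1/N
    shrunk = F.hardshrink(att, lambd=2.0 / N)   # threshold in [1/N, 3/N] zeroes everything
    print(att.min().item(), att.max().item(), shrunk.sum().item())  # ~1/N, ~1/N, 0.0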

from memae-anomaly-detection.

fluowhy commented on September 4, 2024

Hi,
I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
Thank you in advance.

The entropy loss being zero means that the attention over the memory items is close to a one-hot vector, i.e. the addressing is sparse. You can check whether the model works correctly by inspecting the min/max and the argmax of the attention weights.

Yes, I checked the attention weights before the hard-shrink operation. I found that after the softmax, the attention values are pretty much the same along the memory-slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how I vary the number of memory slots, the situation is the same. These nearly identical values are always less than the shrink_threshold if I set it to a value in the interval [1/N, 3/N]; hence, the entropy loss ends up being ZERO.

You should try training without the entropy loss and see whether your model learns something without that constraint. If it does, you could try a threshold smaller than 1/N. If not, I think there may be a bug in your model or data processing.
It is really difficult to know exactly what your problem is, but I recommend you check http://karpathy.github.io/2019/04/25/recipe/ . Please do not treat it as a recipe, but as a source of empirical tips and tricks for debugging your model.
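
One simple way to run that ablation is to expose the entropy term's coefficient and set it to zero (a sketch; the names recon, target, entropy_loss and the 0.0002 default are placeholders, not the repository's actual interface):

    import torch.nn.functional as F

    def total_loss(recon, target, att_weights, entropy_weight=0.0002):
        rec_loss = F.mse_loss(recon, target)         # reconstruction term
        ent_loss = entropy_loss(att_weights)          # sparsity term, as sketched above
        return rec_loss + entropy_weight * ent_loss

    # Ablation: call total_loss(..., entropy_weight=0.0) and check whether the
    # reconstruction still improves without the sparsity constraint.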

from memae-anomaly-detection.
