
Comments (8)

fluowhy commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
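
For reference, a minimal sketch of that workaround in PyTorch (the function name and tensor shape are illustrative, not taken from the memae-anomaly-detection code):

    import torch

    def entropy_loss(att_weights, eps=1e-10):
        # att_weights: (batch, mem_dim); each row is (roughly) a probability distribution.
        # The small eps keeps log() finite when a weight has been shrunk to exactly 0.
        return (-att_weights * torch.log(att_weights + eps)).sum(dim=1).mean()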

from memae-anomaly-detection.

Zk-soda commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all zeros in the weight w would be more suitable, so that the entropy term becomes 0*log(0+1) for every zero entry of w. What do you think?

from memae-anomaly-detection.

LiUzHiAn commented on September 4, 2024

Hi,

I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?

Thank you in advance.

from memae-anomaly-detection.

fluowhy commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all zeros in the weight w would be more suitable, so that the entropy term becomes 0*log(0+1) for every zero entry of w. What do you think?

Sorry I didn't respond at the time. I don't think it is a suitable solution, because p is used as a probability distribution (it might not be a proper distribution, though). Adding 1 to it could break the purpose of p.
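
A quick illustration of that point (a sketch with made-up numbers): adding 1 to the zero entries changes the sum of the weights, so they no longer behave like a probability distribution.

    import torch

    w = torch.tensor([0.6, 0.4, 0.0, 0.0])   # shrunk attention weights, sums to ~1
    w_shifted = w + (w == 0).float()          # add 1 to every zero entry
    print(w.sum().item(), w_shifted.sum().item())  # ~1.0 vs 3.0 -- no longer normalized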

from memae-anomaly-detection.

sjp611 commented on September 4, 2024

Hi,

I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?

Thank you in advance.

The entropy loss being zero means that the attention over the memory items is close to a one-hot vector, i.e. the addressing is sparse. You can check whether the model works correctly by inspecting the min/max and the argmax of the attention weights.
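
A minimal way to run that check (a sketch; att_weights stands for whatever variable holds the softmax output of the memory module in your code):

    def inspect_attention(att_weights):
        # att_weights: (batch, mem_dim) torch tensor
        print("min:", att_weights.min().item(), "max:", att_weights.max().item())
        # For near one-hot rows, max should be close to 1 and argmax shows which slot is addressed.
        print("argmax per sample:", att_weights.argmax(dim=1).tolist())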

from memae-anomaly-detection.

Wolfybox commented on September 4, 2024

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all zeros in the weight w would be more suitable, so that the entropy term becomes 0*log(0+1) for every zero entry of w. What do you think?

Nah. Adding 1 to all zero items will cause an issue in the backward pass during training.

from memae-anomaly-detection.

LiUzHiAn commented on September 4, 2024

Hi,
I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
Thank you in advance.

The entropy loss being zero means that the attention over the memory items is close to a one-hot vector, i.e. the addressing is sparse. You can check whether the model works correctly by inspecting the min/max and the argmax of the attention weights.

Yes, I checked the attention weights before the hard-shrink operation. I found that after the softmax, the attention values are pretty much the same along the memory-slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how I vary the number of memory slots, the situation is the same. These nearly identical values are always less than the shrink_threshold if I set it to a value in the interval [1/N, 3/N]; hence, the entropy loss ends up being ZERO.
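
To make that failure mode concrete, here is a small sketch (N, the score scale, and the use of hardshrink are illustrative assumptions, not the repository's exact code):

    import torch
    import torch.nn.functional as F

    N = 2000                                    # number of memory slots, e.g. 2K
    scores = torch.randn(1, N) * 1e-3           # nearly identical similarity scores
    att = F.softmax(scores, dim=1)              # every entry is roughly 1/N
    shrunk = F.hardshrink(att, lambd=2.0 / N)   # threshold in [1/N, 3/N] zeroes everything
    print(att.min().item(), att.max().item(), shrunk.sum().item())  # ~1/N, ~1/N, 0.0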

from memae-anomaly-detection.

fluowhy commented on September 4, 2024

Hi,
I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory params as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
Thank you in advance.

The entropy loss being zero means that the attention over the memory items is close to a one-hot vector, i.e. the addressing is sparse. You can check whether the model works correctly by inspecting the min/max and the argmax of the attention weights.

Yes, I checked the attention weights before the hard-shrink operation. I found that after the softmax, the attention values are pretty much the same along the memory-slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how I vary the number of memory slots, the situation is the same. These nearly identical values are always less than the shrink_threshold if I set it to a value in the interval [1/N, 3/N]; hence, the entropy loss ends up being ZERO.

You should try training without the entropy loss and see whether your model learns something without that constraint. If it does, you could try a threshold smaller than 1/N. If not, I think there may be a bug in your model or data processing.
It is really difficult to know exactly what your problem is, but I recommend you check http://karpathy.github.io/2019/04/25/recipe/ . Please do not treat it as a recipe, but as a source of empirical tips and tricks for debugging your model.
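
One simple way to run that ablation is to expose the entropy term's coefficient and set it to zero (a sketch; the names recon, target, entropy_loss and the 0.0002 default are placeholders, not the repository's actual interface):

    import torch.nn.functional as F

    def total_loss(recon, target, att_weights, entropy_weight=0.0002):
        rec_loss = F.mse_loss(recon, target)         # reconstruction term
        ent_loss = entropy_loss(att_weights)          # sparsity term, as sketched above
        return rec_loss + entropy_weight * ent_loss

    # Ablation: call total_loss(..., entropy_weight=0.0) and check whether the
    # reconstruction still improves without the sparsity constraint.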

from memae-anomaly-detection.
