
Comments (14)

paulbrodersen commented on June 14, 2024

I am skeptical that this issue has anything to do with my code. Per this list, the error code is a Windows notification of a stack overflow. get_h does not compute the entropy recursively, so I can't see how it would cause the stack overflow. Can you provide a minimal, reproducible example that produces the error in a clean virtual environment (i.e. one containing only the dependencies of this module)?

Also, if you google around, a lot of people seem to run into this error code when using PyCharm, especially in combination with Qt and/or TensorFlow (or packages built on top of TensorFlow, such as Keras). Are you using any of these programs/packages?


paulbrodersen commented on June 14, 2024

To be clear, I am not completely ruling out that my code is at fault; I am just saying that I need a lot more evidence to convince me.


zhangyue233 commented on June 14, 2024

Yes, I figured out why the function get_h doesn't work well. get_h contains the line "sum_log_dist = np.sum(log(2*distances))". In my data, some samples have identical values, which makes some elements of distances zero, so sum_log_dist becomes -inf and the code that follows runs into errors.
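A minimal sketch of that failure mode (illustrative only; the variable names do not correspond to the module's internals):

import numpy as np

# Two samples with identical coordinates have a pairwise distance of exactly zero.
distances = np.array([0.0, 0.1, 0.2])

# log(2 * 0.0) evaluates to -inf (with a divide-by-zero RuntimeWarning),
# so the sum is -inf and poisons everything downstream.
sum_log_dist = np.sum(np.log(2 * distances))
print(sum_log_dist)  # -inf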


zhangyue233 commented on June 14, 2024

So far I have no idea how to handle this situation, so I've decided to read the original paper. Could you give me some advice? Thanks!


paulbrodersen commented on June 14, 2024

get_h has a min_dist parameter which, when set to a non-zero value, should circumvent your issue (distances between points smaller than min_dist are capped to min_dist, so that points with the same coordinates are forced to have non-zero distances to each other). A principled choice for min_dist is half of your measurement precision, typically the minimum non-zero nearest-neighbour distance in your dataset.
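For example, something along these lines (a sketch only; I am assuming the continuous submodule import path and a get_h signature with k and min_dist keyword arguments):

import numpy as np
from scipy.spatial.distance import pdist
from entropy_estimators import continuous

x = np.random.randn(1000, 1)

# Use the smallest non-zero pairwise distance as a proxy for the
# measurement precision, and take half of it as min_dist.
d = pdist(x)
min_dist = 0.5 * d[d > 0].min()

h = continuous.get_h(x, k=5, min_dist=min_dist)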


zhangyue233 commented on June 14, 2024

Thanks for your guidance. While testing get_h and get_h_mvn on my data for feature selection, I found that get_h_mvn works ideally: the calculated entropy values are consistent with intuitive observations of the feature data. In particular, one feature is in fact discrete, and the entropy calculated by get_h_mvn is close to the entropy from the standard information entropy equation for a discrete variable. However, get_h performs awfully. First, the entropy values it calculates are counterintuitive; I also ranked the features based on entropy, and the rankings from get_h and get_h_mvn differ greatly. Second, one feature's values are composed of {0.0: 7950, 0.0003636: 1, 0.0263157: 1}; when running on this feature, get_h_mvn gets stuck at the line "kdtree = cKDTree(x)", and Python stops and prints "Process finished with exit code -1073741571 (0xC00000FD)".


zhangyue233 commented on June 14, 2024

I'm working on feature selection for my project. In my situation the features are used for clustering, and there is little labeled data. I have tried information entropy and the Laplacian score as feature filters. Do you have experience with feature selection in this scenario?


paulbrodersen commented on June 14, 2024

> the entropy calculated by get_h_mvn is close to the entropy from the standard information entropy equation for a discrete variable

That could be entirely accidental.

> when running on this feature, get_h_mvn gets stuck at the line "kdtree = cKDTree(x)", and Python stops and prints "Process finished with exit code -1073741571 (0xC00000FD)"

There is no call to cKDTree in get_h_mvn. It uses the (co-)variance to compute the entropy under the assumption that the samples are drawn from a multivariate normal distribution. Are you sure you are calling get_h_mvn?
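For reference, the closed-form expression that an estimator like get_h_mvn presumably evaluates is the differential entropy of a fitted Gaussian, h = 0.5 * log((2*pi*e)^d * det(Sigma)). A minimal sketch (not the module's actual code):

import numpy as np

def gaussian_entropy(x):
    # x: (n_samples, n_features) array.
    x = np.atleast_2d(x)
    n, d = x.shape
    # Fit a multivariate normal by estimating the covariance matrix.
    sigma = np.cov(x, rowvar=False).reshape(d, d)
    # Differential entropy of N(mu, Sigma) in nats.
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(sigma))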


zhangyue233 commented on June 14, 2024

Oh, no, get_h_mvn works well; it is get_h that gets stuck at cKDTree.


paulbrodersen commented on June 14, 2024

OK. How many samples are in your dataset, and what values are you using for k and min_dist?


zhangyue233 commented on June 14, 2024

I used the default k and selected min_dist as you advised, i.e. the minimum non-zero distance. In fact, these have nothing to do with the aforementioned error; you can run the following code with Python 3.6 and the error will reappear.
import numpy as np
from scipy.spatial import cKDTree
x = [[i] for i in [0]*7950 + [0.0003636, 0.0263157]]
tree = cKDTree(np.array(x))

Process finished with exit code -1073741571 (0xC00000FD)
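For what it's worth, a common workaround when tree construction chokes on many exactly duplicated points is to break the ties with a jitter far below the measurement precision before building the tree (a sketch; whether the crash itself is a scipy bug is a separate question):

import numpy as np
from scipy.spatial import cKDTree

x = np.array([[i] for i in [0]*7950 + [0.0003636, 0.0263157]])

# Perturb each point by an amount many orders of magnitude smaller
# than the data resolution, so that no two points coincide exactly.
rng = np.random.default_rng(0)
tree = cKDTree(x + rng.normal(scale=1e-12, size=x.shape))

Passing balanced_tree=False to cKDTree changes the splitting rule and may also be worth trying on such degenerate inputs.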


zhangyue233 commented on June 14, 2024

If there are both continuous and discrete features in my feature set, I need to rank their information entropies for feature filtering. Is it justified to treat all features as continuous variables and evaluate the entropy using the get_h_mvn() function? Or should only the continuous features' entropies be computed by get_h_mvn(), while the discrete ones are calculated with the Shannon entropy equation, and then all the entropies ranked together? Looking forward to your guidance, thanks!


paulbrodersen commented on June 14, 2024

I can't reproduce your error. If I were you, I would investigate your setup and, if need be, file a bug report with scipy.

In [1]: %paste
import numpy as np
from scipy.spatial import cKDTree
x = [[i] for i in [0]*7950 + [0.0003636, 0.0263157]]
tree = cKDTree(np.array(x))
## -- End pasted text --
In [2]: tree
Out[2]: <scipy.spatial.ckdtree.cKDTree at 0x7f427495f4a8>

Entropy is an extensive property, and discrete (Shannon) entropy and differential entropy are different quantities measured on different scales. So no, I don't think that you can compare entropy values for discrete variables with the entropy values for continuous variables. Even within your continuous features, such a comparison may be nonsensical.
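To make that concrete: differential entropy shifts by log(a) when the data are rescaled by a factor a, and it can even be negative, while discrete Shannon entropy depends only on the probabilities, not on the values. A small illustration using scipy's built-in entropy of a normal distribution:

import numpy as np
from scipy.stats import norm

# Differential entropy of N(0, sigma^2) is 0.5*log(2*pi*e*sigma^2):
# rescaling the data shifts the entropy and can make it negative.
for sigma in (1.0, 10.0, 0.01):
    print(sigma, norm(scale=sigma).entropy())

# Discrete Shannon entropy depends only on the probabilities, so
# rescaling the support values changes nothing.
p = np.array([0.5, 0.25, 0.25])
print(-(p * np.log(p)).sum())  # ~1.04 nats, regardless of the values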


paulbrodersen commented on June 14, 2024

Since I haven't heard from you in a week, I will close this issue for now. Feel free to re-open if necessary.

