Comments (6)

lmcinnes commented on May 26, 2024

So it was quirky. Some code had been added to bail out when the tree splitting was not working well, to avoid excess depth. Unfortunately that meant that, in rare cases, the size of a leaf could exceed the leaf_size that was set. This made things not match up when building the leaf arrays at the end, because we expected every leaf to fit within the leaf size. Now we have a max_leaf_size, and expand the leaf arrays in those rare cases. In theory this could blow up terribly for bad data by consuming ungodly amounts of memory, but that's a very rare case indeed, and I'm not sure there is any way to fix it anyway. The best answer in that case is simply to increase the leaf size in the NNDescent params.
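To make the failure mode concrete, here is a toy, pure-Python sketch of the situation described above. It is not pynndescent's implementation: `build_leaves` and `pack_leaves` are hypothetical helpers, and the median split is only a stand-in for the real tree splitting. Only the names `leaf_size` and `max_leaf_size` come from the comment.

```python
import random


def build_leaves(indices, points, leaf_size, max_depth, rng, depth=0):
    """Split `indices` recursively and return a list of leaves (lists of indices)."""
    if len(indices) <= leaf_size:
        return [indices]
    if depth >= max_depth:
        # Bail out to avoid excess depth; this leaf may exceed leaf_size.
        return [indices]
    # Toy split: median of one random coordinate (stand-in for a random hyperplane).
    dim = rng.randrange(len(points[0]))
    values = sorted(points[i][dim] for i in indices)
    threshold = values[len(values) // 2]
    left = [i for i in indices if points[i][dim] < threshold]
    right = [i for i in indices if points[i][dim] >= threshold]
    if not left or not right:
        # The split separated nothing (e.g. many duplicate points): bail out here too.
        return [indices]
    return (build_leaves(left, points, leaf_size, max_depth, rng, depth + 1)
            + build_leaves(right, points, leaf_size, max_depth, rng, depth + 1))


def pack_leaves(leaves, width):
    """Pack leaves into rows of equal `width`, padding short rows with -1."""
    rows = []
    for leaf in leaves:
        if len(leaf) > width:
            raise ValueError(f"leaf of size {len(leaf)} does not fit width {width}")
        rows.append(leaf + [-1] * (width - len(leaf)))
    return rows


# "Bad" data for this toy splitter: 50 duplicate points plus 20 distinct ones.
points = [(0.0, 0.0)] * 50 + [(float(i), float(i)) for i in range(1, 21)]
leaf_size = 10
leaves = build_leaves(list(range(len(points))), points, leaf_size,
                      max_depth=20, rng=random.Random(0))

# Sizing the array by leaf_size fails once any leaf is oversized...
try:
    pack_leaves(leaves, leaf_size)
    fits = True
except ValueError:
    fits = False

# ...so size it by the largest leaf actually produced instead.
max_leaf_size = max(leaf_size, max(len(leaf) for leaf in leaves))
leaf_array = pack_leaves(leaves, max_leaf_size)
```

In this sketch the packed array has one row per leaf and `max_leaf_size` columns, so a single oversized leaf widens every row. That is the memory concern mentioned above, and it is why raising the leaf size is the suggested workaround when the data splits badly.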

from pynndescent.

thegodone commented on May 26, 2024

I found the problem: I did not pass the distance.

jamestwebber commented on May 26, 2024

I just got this same error on an x86 machine (n2d-highmem-8 GCP VM) and I'm unclear on what you needed to do to fix this. In any case I think this is a bug, as additional arguments shouldn't be necessary.

edit: Of course, as soon as I comment, it starts mysteriously working... it was failing consistently before. I wonder if I had some bad version cached or something.

lmcinnes commented on May 26, 2024

I agree this is odd, and I'll try to keep a lookout for a reproducer.

jamestwebber commented on May 26, 2024

I think I have a reproducer, but I'm not sure how to share it. It seems completely data-specific: I got this error with np.sqrt(X) but not with X (and I don't think it's a dtype issue).

jamestwebber commented on May 26, 2024

I have a sporadic reproducer with a fairly small array (1.8M on disk, saved as numpy npz). It seems like this problem was introduced in a recent update. My suspicion is that it comes from an edge case in how the rows are divided among n_jobs when the split is uneven.

pynndescent_bug_np.npz.zip

edit: The above array seems to fail consistently only when passed through sqrt, but right now I don't want to figure out why that is 🙃
