Comments (6)
So it was quirky. Some code had been added to bail out when the tree splitting was not working well, to avoid excess depth. Unfortunately that meant that, in rare cases, the size of a leaf could exceed the configured `leaf_size`. This made things not match up when building the leaf arrays at the end, because we expected every leaf to fit within the leaf size. Now we have a `max_leaf_size`, and expand things in those rare cases. In theory this could blow up terribly for bad data by consuming ungodly amounts of memory, but that's a very rare case indeed, and I'm not sure there is any way to fix it anyway. The best answer in that case is simply to increase the leaf size in the `NNDescent` params.
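To make the failure mode above concrete, here is a minimal toy sketch in pure Python. It is not pynndescent's code: `build_leaves`, `pack_leaves`, and the `max_depth` parameter are hypothetical names invented for illustration. It only shows how a depth bail-out can leave a leaf larger than `leaf_size`, and how sizing the packed array from the largest leaf actually produced avoids the mismatch.

```python
import random


def build_leaves(points, indices, leaf_size, max_depth, rng, depth=0):
    """Toy recursive splitter (hypothetical; NOT pynndescent's implementation)."""
    # Stop when the leaf is small enough, or bail out when the depth budget is spent.
    # The bail-out is what lets a leaf end up larger than leaf_size.
    if len(indices) <= leaf_size or depth >= max_depth:
        return [indices]

    # Split by proximity to two randomly chosen anchor points.
    a, b = rng.sample(indices, 2)
    left, right = [], []
    for i in indices:
        dist_a = sum((x - y) ** 2 for x, y in zip(points[i], points[a]))
        dist_b = sum((x - y) ** 2 for x, y in zip(points[i], points[b]))
        (left if dist_a <= dist_b else right).append(i)

    if not left or not right:
        # Degenerate split (e.g. duplicate points): retry one level deeper.
        return build_leaves(points, indices, leaf_size, max_depth, rng, depth + 1)
    return (build_leaves(points, left, leaf_size, max_depth, rng, depth + 1)
            + build_leaves(points, right, leaf_size, max_depth, rng, depth + 1))


def pack_leaves(leaves, leaf_size):
    """Pad every leaf with -1 up to the largest leaf actually produced."""
    max_leaf_size = max(leaf_size, max(len(leaf) for leaf in leaves))
    packed = [leaf + [-1] * (max_leaf_size - len(leaf)) for leaf in leaves]
    return packed, max_leaf_size


# Pathological data: 20 identical points can never be separated.
points = [(1.0, 1.0)] * 20
leaves = build_leaves(points, list(range(20)), leaf_size=5, max_depth=8,
                      rng=random.Random(0))
packed, max_leaf_size = pack_leaves(leaves, leaf_size=5)
print(len(leaves), max_leaf_size)  # one oversized leaf; width grows from 5 to 20
```

The same toy also shows the memory concern: every row of `packed` is padded to the width of the single worst leaf, so one oversized leaf widens the whole array. In the sketch, as in the comment above, choosing a larger `leaf_size` up front is the simple way out.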
from pynndescent.
I found the problem: I did not pass the distance
from pynndescent.
I just got this same error on an x86 machine (`n2d-highmem-8` GCP VM) and I'm unclear on what you needed to do to fix this. In any case I think this is a bug, as additional arguments shouldn't be necessary.
edit: Of course, as soon as I comment it starts mysteriously working... It was failing consistently before. I wonder if I had some bad version cached or something.
from pynndescent.
I agree this is odd, and I'll try to keep a lookout for a reproducer.
from pynndescent.
I think I have a reproducer, but I'm not sure how to share it. It seems completely data-specific: I got this error with `np.sqrt(X)` but not `X` (and I don't think it's a dtype issue).
from pynndescent.
I have a sporadic reproducer with a fairly small array (1.8M on disk, saved as numpy `npz`). It seems like this problem was introduced in a recent update. My suspicion is that this comes from an edge case in how the rows are handed out to `n_jobs` when the split is uneven.
edit: The above array seems to fail consistently only when passed through `sqrt`, but right now I don't want to figure out why that is 🙃
from pynndescent.