dillondaudert / nearestneighbordescent.jl Goto Github PK
View Code? Open in Web Editor NEWEfficient approximate k-nearest neighbors graph construction and search in Julia
License: MIT License
Efficient approximate k-nearest neighbors graph construction and search in Julia
License: MIT License
The accessor methods for HeapKNNGraphEdge
flag
and weight
seem very generic. Should they be exported or not?
Right now, DescentGraph
and search
only allow passing data
and queries
, respectively, as Vector{AbstractVector}
.
This should be expanded to allow passing Matrix
types.
It seems the main use case here is batch processing, i.e. compute the graph once and then use that as a fixed thing for NN queries. Is there some way this can also be used for dynamic updates of the graph after it has been built, as new data pours in?
I see that the interfaces are similar and that you both use the Distances.jl
library for metrics. Is there any relation to this package and do you see any reason in the future to interface with them? Also, benchmarks comparing both on the same hardware would be interesting.
Could you make a new release? Currently, compat with Reexport
version 1 is only on master.
nndescent
and search
have methods to turn matrix data into vector-of-vector format (the internal supported format for this package). There should be an external constructor for KNNGraphs to do the same.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
Add support for constructing KNN Graphs on datasets that are too large to fit in memory.
Requested for UMAP.jl here, will likely require adding support in NearestNeighborDescent.
It would be great with a new release, in particular since #29 has been merged to master since the last release.
Figure out an elegant way to handle string datasets.
Hi,
I'm testing the package on simulated data with the following code:
using NearestNeighborDescent
import Distances
test_data = rand(10, 2000)
graph = DescentGraph(test_data, 1, Distances.Euclidean(), max_iters=10000, precision=1e-10, sample_rate=1.0)
nn_per_cell = vec(graph.indices)
real_dists = Distances.pairwise(Distances.Euclidean(), test_data, dims=2);
real_dists[diagind(real_dists)] .= 1e10;
real_nn_per_cell = vec(mapslices(x -> findmin(x)[2], real_dists, dims=1));
nn_real_ids = sum(real_dists .< real_dists[CartesianIndex.(1:length(nn_per_cell), nn_per_cell)], dims=1)
println(mean(real_nn_per_cell .== nn_per_cell))
println(median(nn_real_ids))
It's looking for the 1'st NN first using NearestNeighborDescent
and then by pairwise comparison.
And the output is
0.003
467
Which show that only 0.003 of neighbours are really 1'st. And on average it returns 467'th (!) neighbour out of 2000. Am I doing something wrong, or is there a way to change parameters to increase precision?
Just in case, julia v1.1, Distances v0.8.0, NearestNeighborDescent v0.2.0.
It would be good to eventually add NNDescent.jl to the ANN Benchmarks. Doing this would require something along the lines of:
DescentGraph
and search
Systematize benchmarking the nndescent / search procedures for serial / threaded executions.
Take a look at BenchmarkCI.jl ?
Some simple improvements:
LockHeapKNNGraph
I wonder if it would be possible to implement this interface for the DescentGraph
. It would be useful to have this interoperability with other graph libraries, especially for something like GraphPlot which I am interested in using in conjunction with this library.
Now that DataStructures.jl has min-max heaps, it might be a good idea to re-implement this using them. There is no asymptotic runtime improvement, but in practice they should be faster.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.