
cgtuebingen / ggnn

135 stars, 25 forks, 19 issues, 967 KB

GGNN: State of the Art Graph-based GPU Nearest Neighbor Search

Home Page: https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/computergrafik/lehrstuhl/veroeffentlichungen/ggnn-graph-based-gpu-nearest-neighbor-search/

License: MIT License

CMake 1.90% Shell 0.07% C++ 4.10% Dockerfile 0.15% Makefile 0.02% Cuda 93.75%
Topics: cuda, ann, nearest-neighbor-search, gpu, approximate-nearest-neighbor-search, vector-database, vector-db

ggnn's People

Contributors: grohf

ggnn's Issues

Other Datasets

Hello,

I was wondering if you could provide links to the other datasets used in the paper (specifically GloVe, NYTimes, and deep1B). They don't seem to be available in the TexMex corpus. Are there scripts you used to preprocess the data from other sources?

How do I get the high-quality kNN graph without refinement, as mentioned in the paper?

Hello! Thanks for your excellent work. I'm trying your demo, but I can't reproduce the high-quality kNN graph without refinement mentioned in the paper. For SIFT1M, I only get C@10 of 0.5022 with KBuild=62 (any larger KBuild leads to an illegal memory error).

The paper states that your algorithm achieves C@10 of 0.987 without refinement. Could you kindly tell me how to obtain that result? I also used the getGraph function to export the graph; please let me know if I did something wrong there. Thank you again!

How do you calculate the recall rate?

Hi,

While digging into your implementation, I could not work out how the recall rate is computed:
[screenshot of the recall computation code]
What do the two variables rKQuery_including_duplicates and cKQuery_including_duplicates mean? Which one is the recall rate?

link problem for glog

Hi,

I'm trying to build your code following the commands in the README, but I get a linker error for the third-party glog library, shown below:
[screenshot of the glog linker error]

What causes this, and how can I resolve it?

Questions

Hello, I am very interested in your wonderful work! After reading your paper, I have some questions:

  1. You emphasize the reduction in construction time, but how is that achieved? It looks like a partition-and-merge scheme: partitioning is fast because the subgraphs are small, and whereas in HNSW each node must perform a query to link to its neighbors when subgraphs are merged, here you sample some vertices as the top layer to do this. Is that the main reason for the reduced construction time?
  2. How do you add symmetric links once a kNN graph exists? I understand the rule in Section 4.3 for how x and z are connected, but how are those candidates obtained?
  3. I am a little confused about how queries are performed, as described in Section 4.1 (last paragraph), Section 4.2, and Section 4.5. Do we have to descend from the top layer to the bottom layer? These three sections seem to describe the query procedure differently.
  4. Why is your algorithm well suited to the GPU? Put differently, why would HNSW reimplemented on the GPU be less efficient?
  5. You did not report memory consumption, but I suspect it is large even with 8 GPUs. Assuming 8 GPUs provide roughly 100 GB of device memory in total, how does the method run on 1B points?

How much shared memory does this kernel use?

Hello! I am interested in the performance. Could you provide numbers that also include copying the queries to the device and the results back to the host?
Have you benchmarked it on other GPUs?
