Comments (3)
Hi there, thanks for a good question! This has been on our radar for a while, but so far it hasn't been clear whether pre-computing lookup tables per query would get in the way of other raft optimizations.
In raft, the lookup table fully encodes distance components from the query to all possible vectors in a list (cluster). In particular, it depends on the distance from the query to the cluster center. Hence, it's unique for every (query, probe) pair.
As far as I know, some other implementations encode only the residual distances in the lookup table; in that case the lookup table needs to be constructed only once per query. Both approaches have their pros and cons. A couple of reasons to back up our choice:
- Encoding the full distance for every (query, probe) pair means we save a little bit of compute during the second phase - scanning through the encoded list. This tends to give better performance with larger datasets (100M or 1B records and up), where the construction of the lookup table takes only a small fraction of overall runtime.
- We support an alternative codebook mode, where the codebooks are trained per-cluster rather than per-subspace in the feature space. This mode allows even faster lookup table creation due to better data locality, but it naturally depends on the probed cluster id.
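To make the trade-off concrete, here is a minimal NumPy sketch of the two lookup-table styles for squared L2 distance with per-subspace codebooks. It is an illustration only and not raft's actual kernels: the names (`lut_per_probe`, `lut_per_query`, `cluster_terms`, `scan`) and the toy sizes are hypothetical, and the per-query variant is one common way to split the distance, assumed here for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, pq_dim, n_codes = 8, 4, 16        # toy sizes: 4 subspaces of length 2
sub = dim // pq_dim

codebook = rng.normal(size=(pq_dim, n_codes, sub))   # per-subspace codebooks
center = rng.normal(size=dim)                        # center of the probed cluster
query = rng.normal(size=dim)
codes = rng.integers(0, n_codes, size=(5, pq_dim))   # 5 PQ-encoded residuals

def decode(codes):
    # Reconstruct the residuals (vector minus cluster center) from PQ codes.
    return np.concatenate(
        [codebook[j, codes[:, j]] for j in range(pq_dim)], axis=1)

def scan(lut, codes):
    # Second phase: sum one table entry per subspace for every encoded vector.
    return lut[np.arange(pq_dim), codes].sum(axis=1)

def lut_per_probe(query, center):
    # Full-distance table: depends on the cluster center, so it has to be
    # rebuilt for every (query, probe) pair.
    diff = (query - center).reshape(pq_dim, 1, sub)
    return ((diff - codebook) ** 2).sum(axis=-1)

def lut_per_query(query):
    # Query-only table: -2 * <query_j, codeword>, built once per query.
    q = query.reshape(pq_dim, 1, sub)
    return -2.0 * (q * codebook).sum(axis=-1)

def cluster_terms(center, codes):
    # Query-independent part, ||r||^2 + 2 * <center, r>; it could be stored
    # at index build time (an assumption of this sketch).
    r = decode(codes)
    return (r ** 2).sum(axis=1) + 2.0 * (r @ center)

# Reference: exact distance from the query to the reconstructed vectors.
exact = ((query - center - decode(codes)) ** 2).sum(axis=1)

# Style 1: the scan is a pure table lookup and sum.
d_probe = scan(lut_per_probe(query, center), codes)

# Style 2: the table is shared across probes, but the scan has to add the
# query-to-center distance and a per-vector term.
d_query = (((query - center) ** 2).sum()
           + scan(lut_per_query(query), codes)
           + cluster_terms(center, codes))
```

Both variants reproduce the same distances. The difference is where the work goes: the first pays for a table per probe and keeps the scan minimal, while the second builds one table per query and moves extra additions (and extra stored data) into the scan.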
from raft.
I'd suggest we keep it open as a feature request, so that we can prioritize and try this as an optional optimization at some point.
Thanks @QDXG-CXK and @achirkin. I've gone ahead and converted this to a feature request so we can keep it on our radar.