Comments (4)
Still failing: when I compile the template, ptxas reports "ptxas error : Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored".
Could it be that "InternalID_t" is out of range? Here is its definition:
template <typename Index_t>
struct InternalID_t;

// InternalID_t uses 1 bit for marking (new or old).
template <>
class InternalID_t<int> {
 private:
  using Index_t = int;
  Index_t id_{std::numeric_limits<Index_t>::max()};

 public:
  __host__ __device__ bool is_new() const { return id_ >= 0; }
  __host__ __device__ Index_t& id_with_flag() { return id_; }
  __host__ __device__ Index_t id() const
  {
    if (is_new()) return id_;
    return -id_ - 1;
  }
  __host__ __device__ void mark_old()
  {
    if (id_ >= 0) id_ = -id_ - 1;
  }
  __host__ __device__ bool operator==(const InternalID_t<int>& other) const
  {
    return id() == other.id();
  }
};
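For what it's worth, the ptxas message talks about "threads per SM" and `.minnctapersm`, and `InternalID_t` shows up in it only as a template argument inside the mangled kernel name. To see what the class itself does, here is a minimal host-only sketch of the same sign-flag encoding. `InternalIdHost` and `raw()` are hypothetical names made up for this illustration and are not part of raft.

```cpp
#include <cassert>

// Host-only copy of the flag logic quoted above (hypothetical helper, not raft code).
// A non-negative stored value means "new"; an "old" id is stored as -id - 1.
class InternalIdHost {
 public:
  explicit InternalIdHost(int id) : id_(id) {}

  bool is_new() const { return id_ >= 0; }

  // Decodes the original id regardless of the new/old flag.
  int id() const { return is_new() ? id_ : -id_ - 1; }

  // Flips a "new" id to "old"; calling it on an "old" id changes nothing.
  void mark_old()
  {
    if (id_ >= 0) id_ = -id_ - 1;
  }

  // Stored value including the flag, exposed only for inspection in this sketch.
  int raw() const { return id_; }

 private:
  int id_;
};
```

With this sketch, an id of 5 is stored as 5 while new and as -6 after `mark_old()`, and `id()` returns 5 in both cases.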
from raft.
UPDATE:
I removed every mamba / miniforge environment and deleted all raft source code and the compiled / installed libs from my Ubuntu machine.
ONLY the template folder is left on the machine. I ran the build script via ./build.sh
and compilation still failed with the same error. I then changed line 398 of nn_descent.cuh
(at /path/to/template/build/_deps/raft-src/cpp/include/raft/neighbors/detail/nn_descent.cuh) to
constexpr int BLOCK_SIZE = 256; ///512;
That is, I reduced BLOCK_SIZE from 512 to 256. The compile error is gone.
BUT nn_descent is now even slower than on an RTX 3090, and the kNN graph quality (recall) is nearly zero.
from raft.
OK, fixed.
I changed line 694 of /path/to/template/build/_deps/raft-src/cpp/include/raft/neighbors/detail/nn_descent.cuh
to
#if (__CUDA_ARCH__) == 750 || (__CUDA_ARCH__) == 860 || (__CUDA_ARCH__) == 890
and the bug is fixed.
I did this because of the following comment in the source:
// launch_bounds here denote BLOCK_SIZE = 512 and MIN_BLOCKS_PER_SM = 4
// Per
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications,
// MAX_RESIDENT_THREAD_PER_SM = BLOCK_SIZE * BLOCKS_PER_SM = 2048
// For architectures 750 and 860, the values for MAX_RESIDENT_THREAD_PER_SM
// is 1024 and 1536 respectively, which means the bounds don't work anymore
I found that the RTX 4090's MAX_RESIDENT_THREAD_PER_SM is also 1536 and its arch is 89, so the bound of 512 * 4 = 2048 threads does not fit there either. That is why I added || (__CUDA_ARCH__) == 890
to the condition.
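The arithmetic behind that comment can be written down as a small host-only check. This is only a sketch: the function names are hypothetical, and the per-architecture limits are the ones quoted in the comment above (750 and 860), the RTX 4090 value mentioned here (890), and 800 added as an assumed contrast case with a 2048-thread limit per the CUDA programming guide table.

```cpp
#include <cassert>

// Maximum resident threads per SM for a few architectures (hypothetical helper).
// Architectures not listed return -1.
constexpr int max_resident_threads_per_sm(int cuda_arch)
{
  switch (cuda_arch) {
    case 750: return 1024;
    case 800: return 2048;
    case 860: return 1536;
    case 890: return 1536;
    default: return -1;  // not covered by this sketch
  }
}

// True if launch bounds of (block_size, min_blocks_per_sm) fit on the given arch.
// Architectures unknown to this sketch are reported as not fitting.
constexpr bool launch_bounds_fit(int block_size, int min_blocks_per_sm, int cuda_arch)
{
  return block_size * min_blocks_per_sm <= max_resident_threads_per_sm(cuda_arch);
}

// 512 * 4 = 2048 threads: fits on 800, but not on 750, 860 or 890.
static_assert(launch_bounds_fit(512, 4, 800), "2048 <= 2048");
static_assert(!launch_bounds_fit(512, 4, 750), "2048 > 1024");
static_assert(!launch_bounds_fit(512, 4, 860), "2048 > 1536");
static_assert(!launch_bounds_fit(512, 4, 890), "2048 > 1536");
```

This matches what ptxas complained about: on 890 the requested minimum of 4 blocks of 512 threads cannot be resident on one SM, so the `.minnctapersm` hint is rejected unless that architecture takes the same branch as 750 and 860.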
from raft.
Since this PR has been merged, this issue will be closed.
from raft.