Comments (3)
I ran the test in a loop (both with --gtest_repeat=-1 and with a bash loop) and never triggered the segfault on the host/conda. With your reproducer, I got it in roughly 2 out of every 3 test runs.
The problem is more or less clear: the unspecified order of destruction across translation units and/or between thread-local and static variables. Perhaps the inlined static/thread-local variables end up in different translation units depending on the library we link against? Or it could be something completely unrelated.
from raft.
Thanks for the report! I could reproduce this as well with the same container and using libiomp5 (though I couldn't reproduce it with any of the openmp packages coming with conda). Also, with #1229 in place, the segfault does not happen.
For the record, here are the commands I used (from the raft root folder, using the same test from raft's core test suite):
docker run -v `pwd`:/workspace/raft --gpus all -it --rm nvcr.io/nvidia/pytorch:23.01-py3
...
apt update && apt install -y libgtest-dev libomp-dev gdb
cd /workspace/raft
nvcc -Icpp/include -Xcompiler=-fopenmp --std=c++17 cpp/test/core/interruptible.cu -o test -lgtest -lgtest_main -lomp5
./test
Interestingly, I could only reproduce the error when running all the tests from cpp/test/core/interruptible.cu together, but not when running any one of the tests individually using --gtest_filter. Also, the segmentation fault occurred at the very end, after gtest reported that all tests passed (this is expected, because we assume the program crashes at exit).
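Since gtest prints its summary before the process exits, the log alone can look clean. A minimal sketch of checking the exit status instead, assuming a POSIX-style shell that reports a process killed by signal N as status 128+N (so SIGSEGV, signal 11, appears as 139). `describe_exit` is a hypothetical helper name, not part of gtest or raft:

```shell
# Hypothetical helper: run a command and describe how it terminated.
# The command's own stdout is redirected to stderr so that only the
# one-line description is written to stdout.
describe_exit() {
  "$@" >&2
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "clean exit"
  elif [ "$status" -gt 128 ]; then
    # By shell convention, death by signal N is reported as 128 + N.
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
}
```

Under these assumptions, `describe_exit ./test` would print `killed by signal 11` for a run that segfaults at exit, even when gtest's summary says every test passed.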
FYI conda symlinks llvm openmp to libgomp: https://github.com/conda-forge/_openmp_mutex-feedstock/blob/main/recipe/build.sh#L12
They also build openmp with static libgcc and static libstdc++:
https://github.com/conda-forge/openmp-feedstock/blob/main/recipe/build-llvm-openmp.sh#L21
I understand the symlinking part, but I don't know whether the static linking has anything to do with the segfault not showing up; I'm curious if you know anything about it. Also, did you run the reproducer multiple times? The error is somewhat nondeterministic and showed up in roughly 1 of 3 runs when I tested with libiomp5/libomp, so I wonder whether the segfault not showing up with conda openmp was just a one-off.
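Because the crash is intermittent, a single clean run says little. One way to compare OpenMP builds is to repeat the reproducer and count the runs that die from a signal. A rough sketch, assuming a POSIX-style shell; `count_crashes` is a hypothetical helper, and it treats any exit status above 128 as a signal death:

```shell
# Hypothetical helper: run a command N times and print how many runs
# ended with an exit status above 128 (conventionally, killed by a signal).
count_crashes() {
  runs=$1
  shift
  crashes=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1
    if [ "$?" -gt 128 ]; then
      crashes=$((crashes + 1))
    fi
    i=$((i + 1))
  done
  echo "$crashes"
}
```

For example, `count_crashes 30 ./test` prints how many of 30 runs crashed. If a build never crashes over a few dozen runs while another crashes in a third of them, the difference is unlikely to be a one-off.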