Comments (6)
According to the CUDA description:
cudaLimitPrintfFifoSize controls the size in bytes of the shared FIFO used by the printf() device system call. Setting cudaLimitPrintfFifoSize must not be performed after launching any kernel that uses the printf() device system call - in such case cudaErrorInvalidValue will be returned.
But we don't call printf anywhere (all calls are masked), and I don't understand why we see this problem only on H100...
from dbcsr.
I do not understand it either. I have simply not root-caused the issue, let alone reported the software versions such as CUDA (or HPCSDK). I am currently retrying with this change.
from dbcsr.
I do not understand it either. I have simply not root-caused the issue, let alone reported the software versions such as CUDA (or HPCSDK). I am currently retrying with this change.
It makes sense...
from dbcsr.
Since DeviceSetLimit is governed by ACC_API_CALL, the symbol NDEBUG must not be defined when reproducing the issue.
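To illustrate why NDEBUG matters here, the following is a minimal sketch of an error-checking macro that only reports failures in debug builds. It is not the actual ACC_API_CALL definition: CHECKED_CALL, fake_device_set_limit, and failures_after_call are hypothetical names, and the assumption that a release build still makes the call but drops its result is mine, not something confirmed in this thread.

```cpp
#include <cassert> // assert() follows the same NDEBUG convention
#include <cstdio>

// Hypothetical stand-in for a runtime call such as DeviceSetLimit:
// returns 0 on success and a nonzero error code for a negative size.
static int fake_device_set_limit(long size_bytes) {
  return size_bytes >= 0 ? 0 : 1;
}

// Number of failures the checking macro has reported so far.
static int reported_failures = 0;

#if defined(NDEBUG)
// Assumed release behaviour: the call is made, but its result is dropped.
#define CHECKED_CALL(expr) ((void)(expr))
#else
// Debug behaviour: a nonzero result is printed and counted.
#define CHECKED_CALL(expr)                                           \
  do {                                                               \
    const int rc_ = (expr);                                          \
    if (rc_ != 0) {                                                  \
      std::fprintf(stderr, "%s failed with error %d\n", #expr, rc_); \
      ++reported_failures;                                           \
    }                                                                \
  } while (0)
#endif

// Issues one call through the macro and returns how many failures
// were reported for it (always 0 when NDEBUG is defined).
static int failures_after_call(long size_bytes) {
  reported_failures = 0;
  CHECKED_CALL(fake_device_set_limit(size_bytes));
  return reported_failures;
}
```

Under these assumptions, failures_after_call(-1) reports one failure in a build without NDEBUG and none in a build with -DNDEBUG, which is why a release build would hide the error discussed here.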
from dbcsr.
Let's leave this ticket open... I think the issue occurs when the runtime (RT) fails to build a kernel, but I'm not sure...
from dbcsr.
(Taking over from #777 (comment))
I think we can move the call to a more convenient place...
What do you suggest? Putting it into acc_init may not be the right thing as it is device specific.
I wonder if the code in question should be removed entirely?
I'm starting to think this is the right solution... but I need more time to investigate it (see my previous comment).
from dbcsr.