NNlibCUDA Heisenbug in conv! with nonzero beta about nnlibcuda.jl HOT 7 CLOSED

fluxml commented on July 26, 2024

NNlibCUDA Heisenbug in conv! with nonzero beta

from nnlibcuda.jl.

Comments (7)

maxfreu commented on July 26, 2024 2

We have to differentiate between the actual convolution and the algorithm search. The convolution needs a zero-initialized output buffer, which it modifies in place. The algorithm search also needs an output buffer for its benchmark, but once the search finishes, the values in that buffer are arbitrary. If you use the same buffer for both, the algorithm search runs first and leaves arbitrary values behind, and then the conv adds to them, leading to garbage. As you correctly said, the algorithm search is omitted in subsequent calls, which makes this a semi-heisenbug. Using this branch https://github.com/maxfreu/CUDA.jl/tree/conv-algosearch I get zero errors in the fuzzer.
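To make the mechanism concrete, here is a minimal CPU-only sketch in plain Julia. It does not call CUDA.jl or cuDNN: `toy_conv!`, `toy_algo_search!`, `ALGO_CACHE` and the two wrappers are hypothetical stand-ins, and the "arbitrary values" left by the search are assumed to be a leftover benchmark result.

```julia
# Toy stand-ins only; none of these names exist in CUDA.jl, cuDNN or NNlibCUDA.

# Mimics y = alpha * conv(x, w) + beta * y for a 1x1x1 "convolution".
function toy_conv!(y, x, w; alpha=1f0, beta=1f0)
    y .= alpha .* (x .* w) .+ beta .* y
    return y
end

# Mimics the benchmark run of an algorithm search: it writes into whatever
# output buffer it is given and leaves those values behind.
function toy_algo_search!(out, x, w)
    out .= x .* w
    return :some_algorithm
end

# The search result is cached, so the search only runs on the first call.
const ALGO_CACHE = Dict{Any,Symbol}()

# Buggy variant: the search benchmarks directly into the caller's y.
function conv_shared_buffer!(y, x, w; alpha=1f0, beta=1f0)
    get!(ALGO_CACHE, (:shared, size(x), size(w))) do
        toy_algo_search!(y, x, w)
    end
    return toy_conv!(y, x, w; alpha=alpha, beta=beta)
end

# Fixed variant: the search benchmarks into a throwaway buffer.
function conv_scratch_buffer!(y, x, w; alpha=1f0, beta=1f0)
    get!(ALGO_CACHE, (:scratch, size(x), size(w))) do
        toy_algo_search!(similar(y), x, w)
    end
    return toy_conv!(y, x, w; alpha=alpha, beta=beta)
end

x = ones(Float32, 1, 1, 1)
w = ones(Float32, 1, 1, 1)

first_shared  = conv_shared_buffer!(zeros(Float32, 1, 1, 1), x, w)   # search runs and clobbers y
second_shared = conv_shared_buffer!(zeros(Float32, 1, 1, 1), x, w)   # search is cached
first_scratch = conv_scratch_buffer!(zeros(Float32, 1, 1, 1), x, w)  # search runs on a scratch buffer
```

In this toy model the shared-buffer variant is wrong only on its first call, while the scratch-buffer variant is correct from the start.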


maxfreu commented on July 26, 2024 1

I can replicate it on

[052768ef] CUDA v3.10.1
[872c559c] NNlib v0.8.3
[a00861dc] NNlibCUDA v0.2.3
CUDA 11.4.0
CUDNN 8.30.2
julia-1.6.3


ToucheSir commented on July 26, 2024

The first call to conv! is special because it will reliably trigger an algorithm search. What happens if you go a level lower and call the CUDA.jl functions?


maxfreu commented on July 26, 2024

This happens:

using CUDA, NNlibCUDA, NNlib
x = CUDA.ones(1,1,1)
w = CUDA.ones(1,1,1)
y = CUDA.zeros(1,1,1)
cdims = NNlib.DenseConvDims(x,w)
d, x, _ = NNlibCUDA.cudnnConvolutionDescriptorAndPaddedInput(cdims, x)
CUDA.CUDNN.cudnnConvolutionForward!(y, w, x, d; alpha=1f0, beta=1f0, z=y) # y is 2 (first call, includes the algorithm search)
y = CUDA.zeros(1,1,1)
CUDA.CUDNN.cudnnConvolutionForward!(y, w, x, d; alpha=1f0, beta=1f0, z=y) # y is 1 (second call, search skipped)

I suspect that the crux is here. Instead of y, an array allocated with similar should be used in cudnnFindConvolutionForwardAlgorithmEx. @maleadt, what do you think? The CUDA docs here state that the contents of y will be overwritten with arbitrary values during the algorithm search.


ToucheSir commented on July 26, 2024

I would expect that y would be re-overwritten during the actual forward pass though? Otherwise using similar to allocate that array would lead to strange results for all conv calls, not just the first one.


ToucheSir commented on July 26, 2024

That's what I was missing, y and dy are accumulated into instead of completely overwritten when beta is non-zero. Can we tweak the call chain such that an output buffer is only allocated for the algorithm search when this is the case? My worry is that the search is already causing OOMs for users, so allocating more for it when not required is not ideal.


maxfreu commented on July 26, 2024

Hmm, I wouldn't have expected the search to cause OOMs, as the buffer should be freed right after the search. Apart from the input, weight and output tensors, the search also needs a "workspace", the size of which is calculated here. It already seems to be quite small, but I haven't thought it through. Maybe not small enough? Anyway, it should be possible to allocate only if beta != 0, provided the assumption holds that this is the only case in which y is accumulated into. Should I mark the PR as draft?
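The "allocate only if beta != 0" idea could look roughly like the sketch below. `search_output_buffer` is a hypothetical helper, not something that exists in CUDA.jl or NNlibCUDA, and it relies on the same assumption as above, namely that a non-zero beta is the only case in which y is accumulated into.

```julia
# Hypothetical helper: pick the output buffer that is handed to the algorithm search.
# beta == 0: the conv overwrites y afterwards, so the search may scribble on it.
# beta != 0: the conv accumulates into y, so the search gets a scratch buffer instead.
search_output_buffer(y, beta) = iszero(beta) ? y : similar(y)

y = zeros(Float32, 1, 1, 1)
reused  = search_output_buffer(y, 0f0)   # no extra allocation
scratch = search_output_buffer(y, 1f0)   # fresh array with the same size and element type
```

This keeps the extra allocation off the common beta == 0 path, which is where the OOM worry applies.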

