Comments (7)

maxfreu commented on July 26, 2024

We have to differentiate between the actual convolution and the algorithm search. The convolution needs a zero-initialized output buffer, which it then accumulates into. The algorithm search also needs an output buffer for its benchmark runs, but once the search finishes, the values in that buffer are arbitrary. If the same buffer is used for both, the algorithm search leaves arbitrary values in it, and the convolution then adds its result to those values, producing garbage. As you correctly said, the algorithm search is skipped on subsequent calls, which makes this a semi-heisenbug. With this branch https://github.com/maxfreu/CUDA.jl/tree/conv-algosearch, I get zero errors in the fuzzer.
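A CPU-only sketch of this failure mode, for illustration. Everything in it is an assumption rather than CUDA.jl or cuDNN code: mock_forward! and mock_search! are hypothetical stand-ins, a 1x1 single-channel convolution (as in the (1,1,1) example in this thread) is modeled as an elementwise product, and the search is modeled as leaving its last benchmark result in the buffer, whereas cuDNN only promises arbitrary contents.

```julia
# Hypothetical stand-in for a 1x1, single-channel forward convolution with
# cuDNN-style scaling: y = alpha * conv(x, w) + beta * y.
function mock_forward!(y, x, w; alpha=1.0, beta=0.0)
    if iszero(beta)
        y .= alpha .* (x .* w)               # beta == 0: y is not read
    else
        y .= alpha .* (x .* w) .+ beta .* y  # beta != 0: accumulate into y
    end
    return y
end

# Hypothetical stand-in for the algorithm search: it benchmarks by writing
# into the buffer it is given, so afterwards that buffer is not meaningful.
# Here it simply leaves one benchmark result behind.
function mock_search!(buf, x, w)
    mock_forward!(buf, x, w)
    return :some_algo
end

x, w = ones(1), ones(1)
y = zeros(1)                                  # caller intends y to start at 0
algo = mock_search!(y, x, w)                  # search reuses y, leaves 1.0 in it
mock_forward!(y, x, w; alpha=1.0, beta=1.0)   # accumulates: 1*1 + 1*1
println(y)                                    # [2.0], but 1*1 + 1*0 = 1 was intended
```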

from nnlibcuda.jl.

maxfreu commented on July 26, 2024

I can replicate it on

[052768ef] CUDA v3.10.1
[872c559c] NNlib v0.8.3
[a00861dc] NNlibCUDA v0.2.3
CUDA 11.4.0
CUDNN 8.30.2
julia-1.6.3

ToucheSir commented on July 26, 2024

The first call to conv! is special because it will reliably trigger an algorithm search. What happens if you go a level lower and call the CUDA.jl functions?
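The first-call-only behavior can be modeled with a per-configuration algorithm cache, sketched below on the CPU. The cache key, buggy_conv!, and the elementwise stand-in for the convolution are all hypothetical; the point is only that a bug in the search path shows up on a cache miss and disappears once the algorithm has been cached.

```julia
# Hypothetical cache: an algorithm is searched for only on a cache miss.
const ALGO_CACHE = Dict{Any,Symbol}()

# Hypothetical conv! that, like the bug under discussion, runs the search
# in the caller's output buffer y.
function buggy_conv!(y, x, w; alpha=1.0, beta=1.0)
    key = (size(x), size(w), eltype(y))      # hypothetical cache key
    if !haskey(ALGO_CACHE, key)
        y .= x .* w                          # "benchmark" leaves values in y
        ALGO_CACHE[key] = :some_algo
    end
    y .= alpha .* (x .* w) .+ beta .* y      # 1x1 conv with beta-accumulation
    return y
end

x, w = ones(1), ones(1)
first_call  = buggy_conv!(zeros(1), x, w)    # cache miss: search runs
second_call = buggy_conv!(zeros(1), x, w)    # cache hit: no search
println(first_call, " ", second_call)        # [2.0] [1.0]
```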

maxfreu commented on July 26, 2024

This happens:

```julia
using CUDA, NNlibCUDA, NNlib
x = CUDA.ones(1,1,1)
w = CUDA.ones(1,1,1)
y = CUDA.zeros(1,1,1)
cdims = NNlib.DenseConvDims(x,w)
d, x, _ = NNlibCUDA.cudnnConvolutionDescriptorAndPaddedInput(cdims, x)
# First call triggers the algorithm search: y ends up as 2 (expected 1*1 + 1*0 = 1)
CUDA.CUDNN.cudnnConvolutionForward!(y, w, x, d; alpha=1f0, beta=1f0, z=y) # 2
y = CUDA.zeros(1,1,1)
# Second call skips the search: y is 1, as expected
CUDA.CUDNN.cudnnConvolutionForward!(y, w, x, d; alpha=1f0, beta=1f0, z=y) # 1
```

I suspect that the crux is here. Instead of passing y, a similar array should be allocated and used in cudnnFindConvolutionForwardAlgorithmEx. @maleadt, what do you think? The cuDNN docs here state that the contents of y will be overwritten with arbitrary values during the algorithm search.
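A sketch of the proposed fix under the same kind of hypothetical CPU model: the search gets its own buffer from similar(y), so the caller's y still holds its intended initial values when the real forward pass accumulates into it. conv_with_scratch_search!, mock_search!, and mock_forward! are illustrative names, not NNlibCUDA or CUDA.jl functions, and the elementwise product stands in for a 1x1 convolution.

```julia
# Hypothetical 1x1 forward convolution with cuDNN-style alpha/beta scaling.
function mock_forward!(y, x, w; alpha=1.0, beta=0.0)
    if iszero(beta)
        y .= alpha .* (x .* w)               # beta == 0: y is not read
    else
        y .= alpha .* (x .* w) .+ beta .* y  # beta != 0: accumulate into y
    end
    return y
end

# Hypothetical search that writes benchmark output into the buffer it gets.
function mock_search!(buf, x, w)
    mock_forward!(buf, x, w)
    return :some_algo
end

# Proposed pattern: the search writes into a scratch array from similar(y)
# (same type and shape, undefined contents), leaving y untouched.
function conv_with_scratch_search!(y, x, w; alpha=1.0, beta=1.0)
    scratch = similar(y)
    algo = mock_search!(scratch, x, w)
    mock_forward!(y, x, w; alpha=alpha, beta=beta)
    return y, algo
end

y, algo = conv_with_scratch_search!(zeros(1), ones(1), ones(1))
println(y)   # [1.0]
```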

ToucheSir commented on July 26, 2024

Wouldn't y be overwritten again during the actual forward pass, though? Otherwise, allocating that array with similar would lead to strange results for all conv calls, not just the first one.

ToucheSir commented on July 26, 2024

That's what I was missing: when beta is non-zero, y and dy are accumulated into rather than completely overwritten. Can we tweak the call chain so that an output buffer is allocated for the algorithm search only in that case? My worry is that the search is already causing OOMs for users, so allocating more for it when it isn't required is not ideal.

maxfreu commented on July 26, 2024

Hmm, I wouldn't have expected the search to cause OOMs, since the buffer should be freed right after the search. Apart from the input, weight, and output tensors, the search also needs a "workspace", whose size is calculated here. It already seems to be quite small, but I haven't thought it through. Maybe not small enough? Anyway, it should be possible to allocate only if beta != 0, assuming that this is the only case in which y is accumulated into. Should I mark the PR as draft?
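A sketch of the conditional allocation discussed here, again on a hypothetical CPU model. It assumes cuDNN's documented scaling semantics, under which the output is not read when beta is zero, and it models the search's arbitrary leftovers as NaN. The helper search_buffer and the other function names are hypothetical: search_buffer reuses y when beta is zero and allocates with similar(y) only otherwise.

```julia
# Hypothetical 1x1 forward convolution with cuDNN-style alpha/beta scaling.
function mock_forward!(y, x, w; alpha=1.0, beta=0.0)
    if iszero(beta)
        y .= alpha .* (x .* w)               # beta == 0: y is not read
    else
        y .= alpha .* (x .* w) .+ beta .* y  # beta != 0: accumulate into y
    end
    return y
end

# Hypothetical search: models the "arbitrary values" it leaves behind as NaN.
function mock_search!(buf, x, w)
    fill!(buf, NaN)
    return :some_algo
end

# Hypothetical helper: only pay for an extra buffer when y will be read.
search_buffer(y, beta) = iszero(beta) ? y : similar(y)

function conv_conditional!(y, x, w; alpha=1.0, beta=0.0)
    buf = search_buffer(y, beta)
    mock_search!(buf, x, w)
    mock_forward!(y, x, w; alpha=alpha, beta=beta)
    return y, buf === y                      # result, and whether y was reused
end

x, w = ones(1), ones(1)
y0, reused0 = conv_conditional!(zeros(1), x, w; beta=0.0)  # NaNs in y never read
y1, reused1 = conv_conditional!(zeros(1), x, w; beta=1.0)  # y keeps its zeros
println((y0, reused0, y1, reused1))          # ([1.0], true, [1.0], false)
```

Reusing y when beta is zero avoids the extra allocation raised as an OOM concern; the trade-off is that correctness then rests on the assumption stated above, that beta != 0 is the only case in which y is read.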
