Comments (17)
Also, hopefully, CuArrays/Flux kernels and layers should be competitive with cuDNN for everything other than convolutions (which would be way too much work), so the really annoying things (in particular cuDNN RNNs) likely aren't worth wrapping at all.
from cuarrays.jl.
@MikeInnes Most NN packages do it anyway. For example, PyTorch can be run completely on CPU, but it includes `.cuda()` methods all over the code for convenience. Probably, if sparse NNs or TPUs become popular, they will also have `.sparse()` and `.tpu()` methods.
Anyway, an alternative is to make `NNlib` a CPU-only package defining a common interface, and something like `CuNNlib` (or even the same `CUDNN`) that implements this interface for a specific device. More high-level packages like Flux could then import both libraries, so that a user doesn't have to type `using NNlib, CuNNlib` every time.
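As a sketch of that layout (the module names `MyNNlib`/`MyCuNNlib` and the `FakeCuArray` type are hypothetical stand-ins, not the real NNlib/CuNNlib API), the split maps naturally onto multiple dispatch: the interface package owns the generic function with a CPU fallback, and the device package just adds a method for its own array type:

```julia
# Hypothetical interface package: generic function + CPU fallback.
module MyNNlib

function softmax(x::AbstractArray)
    y = exp.(x .- maximum(x))  # subtract max for numerical stability
    y ./ sum(y)
end

end

# Hypothetical device package: adds a method for its array type.
module MyCuNNlib

import ..MyNNlib

# Stand-in for CuArray; a real package would dispatch on CuArray instead.
struct FakeCuArray{T,N}
    data::Array{T,N}
end

# Device-specific implementation of the same interface function.
MyNNlib.softmax(x::FakeCuArray) = FakeCuArray(MyNNlib.softmax(x.data))

end
```

A high-level package would then depend on both, and a single `using` would make the fast method available to users without any extra incantation.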
Also, the more I think about it, the more confident I get that `CuArrays` shouldn't depend on `CUDNN`: there are other types of users (e.g. graphics engineers) who may take advantage of the `CuArray` data type but have nothing to do with deep learning.
from cuarrays.jl.
I believe the stride info is passed to cuDNN inside the `cdesc` struct. Many frameworks don't wrap cuDNN's structs inside C++ classes the way PyTorch does; Julia certainly shouldn't have to either. One place to look for extremely bare-bones and crazy-fast cuDNN usage is the pure-C deep learning framework darknet.
from cuarrays.jl.
Yes, darknet is a really pleasant piece of software. It may be very useful for implementing CPU versions of the forward and backward passes of convolution.
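As an illustration of how small such a CPU reference implementation can be, here is a naive sketch of the forward pass (not darknet's or NNlib's actual code; it ignores padding, strides, channels, and batching, and computes cross-correlation, as NN frameworks conventionally do):

```julia
# Naive CPU forward pass of a 2-D "valid" convolution
# (cross-correlation): slide the kernel over the input and
# accumulate elementwise products at each position.
function conv2d(x::AbstractMatrix, w::AbstractMatrix)
    H, W = size(x)
    kh, kw = size(w)
    y = zeros(promote_type(eltype(x), eltype(w)), H - kh + 1, W - kw + 1)
    for j in 1:size(y, 2), i in 1:size(y, 1)
        acc = zero(eltype(y))
        for q in 1:kw, p in 1:kh
            acc += x[i + p - 1, j + q - 1] * w[p, q]
        end
        y[i, j] = acc
    end
    y
end
```

A real implementation would add padding/stride handling and batch and channel dimensions, and would typically lower to im2col + GEMM for speed, but the triple loop above is the semantic reference the backward pass can be checked against.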
Regarding strides and other options, it turns out this is o... parameter that captures them and wraps them into a `CD` structure (the same thing as `cdesc` in PyTorch, as @jekbradbury pointed out).
However, it turns out that `cudnnConvolutionForward` doesn't actually work. Tests pass only because the call was commented out at some point. I will take a look at it tomorrow.
from cuarrays.jl.
I'm not as confident that you want a cuDNN wrapper inside CuArrays: cuDNN exposes often-stateful NN layers with both forward and backward functions, so it might be cleaner to wrap it at the Flux level?
from cuarrays.jl.
Yeah, turns out that figuring out where to put this stuff is a hard problem of its own.
Wrapping in Flux requires at least an `@require CuArrays` (to avoid a hard dependency on CUDA), and there's a bunch of messiness in finding the binaries, you can't precompile that code, etc.
Another option would be to keep CUDNN.jl as a separate package that builds on CuArrays. But then you have to figure out how to make everything talk to each other, which as yet is an unsolved problem.
I'm open to suggestions. For now the approach used with `softmax` seems reasonable, and we can always move things around as we figure this stuff out.
from cuarrays.jl.
> Another option would be to keep CUDNN.jl as a separate package that builds on CuArrays. But then you have to figure out how to make everything talk to each other, which as yet is an unsolved problem.
Can you elaborate on this? To me it looks like a good idea to make CUDNN.jl depend on CuArrays.jl (and CUDAdrv, of course).
I'd be very much interested in seeing cuDNN separate from Flux - although I like the library, it's not the only one, and it would be great to have cuDNN as an independent component usable from other projects.
from cuarrays.jl.
I agree with that sentiment, and fortunately you can `s/Flux/NNlib` in my comment above to make things reusable (though the same issues apply).
To elaborate on the issue, it happens when you write code like this:
```julia
using CuArrays, NNlib
softmax(cu(rand(5, 5)))
```
Someone has to do `using CUDNN` to load a fast implementation of this – who? It doesn't make a lot of sense to explicitly do `using CuArrays, CUDNN`, given that this proliferates to every use in every downstream package. NNlib could do it, but it seems really strange to make it aware of specific implementations. If CuArrays has logic for conditionally loading CUDNN, you lose the benefits of them being separate and may as well just fold them together.
from cuarrays.jl.
I'm not sure I see the choice of using cuDNN or not for a particular op as being a good candidate for two methods on the same function -- that forces you to either write CuArrays kernels that replicate cuDNN's often-strange semantics or wrap cuDNN in lots of glue code to hide its strangeness. (e.g. the way softmax chooses a dimension). In general this often makes you choose between image-focused (cuDNN) and non-image-focused API shapes.
I'd rather see a CUDNN package that depends on CuArrays and some kind of DiffRules API and implements lightweight but autodiff-ready bindings to raw cuDNN kernels, so people who use pure Julia with CuArrays and autodiff (or who build DL frameworks on top of that) can choose to use CUDNN.softmax and the like whenever they want to but cuDNN doesn't have to affect the semantics of NNlib/"Base" ops. Basically I think a pure-Julia softmax will be almost as fast and significantly more flexible than cuDNN fairly quickly (I see this as being the case for pretty much all ops other than convolutions and BLAS, although even those might be possible when we can directly program Volta Tensor Cores in CUDAnative) and I don't see why we should have to call out to a fairly-poorly-designed C API any more than we need to.
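To illustrate the flexibility point (a generic sketch, not NNlib's actual implementation): a numerically stable softmax over an arbitrary dimension is a few lines of broadcast code, whereas cuDNN's softmax is tied to its fixed tensor-layout semantics:

```julia
# Numerically stable softmax over an arbitrary dimension `dims`.
# Subtracting the per-slice maximum before exp avoids overflow.
function softmax(x::AbstractArray; dims=1)
    m = maximum(x; dims=dims)
    y = exp.(x .- m)
    y ./ sum(y; dims=dims)
end
```

Because it's written purely in terms of generic array operations, the same definition can in principle run on the GPU via broadcast fusion over `CuArray`, which is exactly the "almost as fast, significantly more flexible" scenario described above.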
from cuarrays.jl.
> NNlib could do it, but it seems really strange to make it aware of specific implementations.
Why not? If a library provides functions common to neural networks, and neural networks are mostly trained on GPUs, it's reasonable to expect that library to at least be aware of GPU backends, even if it doesn't have a strong dependency on them.
I imagine a dependency graph like this (where `==>` means a strong dependency and `~~>` means an optional one):

```
NNlib ~~> CUDNN      ==> CuArrays
      ~~> CLWhatever ==> CLArrays
      ==> Base.Array
```
from cuarrays.jl.
@dfdx From a design standpoint, something being common isn't a good reason to special-case it. Less common cases shouldn't work less well. New approaches will become popular (e.g. sparse ML or new hardware like TPUs) and we'd have to special-case them too, which rapidly defeats the point of NNlib.
Aside from that, CuArrays will have to load NNlib to provide fallback implementations, so it's equivalent to doing the auto-load there and once again loses the benefit. (Unless you also put cuda fallbacks inside NNlib, but that leads to a nightmarish version of the above, adds a lot of complexity to compilation and so on.)
from cuarrays.jl.
@jekbradbury Ok, I see where you're coming from.
It definitely does seem cleaner to create a solid, Julian base for DL functions than to design around cuDNN's particular optimisations. The downside is that if/when cuDNN gives an advantage, the (possibly multiple) ML libraries have to explicitly pull it in. OTOH, it's inevitable that making optimisations will require some knowledge of the implementation beyond `AbstractArray`; special-casing cuDNN's approach can't avoid that forever, in which case there's no real benefit to it.
These approaches are not completely incompatible. We can wrap cuDNN in a lightweight way in CuArrays for some NNlib functions (perhaps only as a stopgap, as for softmax), but then have CUDNN.jl build on that with a full wrapper. That seems like a completely reasonable direction to me.
from cuarrays.jl.
> Also the more I think about it, the more I get confident that CuArrays shouldn't depend on CUDNN - there are other types of people (e.g. graphic engineers) who may take advantage of CuArray data type, but have nothing to do with deep learning.
Yes!
from cuarrays.jl.
Yes, that's pretty obvious and not at all at issue here.
from cuarrays.jl.
For instance, these new softmax kernels are faster than cuDNN and way more flexible: pytorch/pytorch#2899
from cuarrays.jl.
I'm a bit confused about how to handle strides in convolution - `cudnnConvolutionForward` doesn't mention anything like that. In PyTorch, I see strides in the convolution parameters, but it seems like the implementation doesn't use them (at least explicitly). Does anyone have a clue?
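If I understand cuDNN correctly, that's because the stride is supplied earlier, to `cudnnSetConvolution2dDescriptor`, and `cudnnConvolutionForward` only sees the resulting descriptor; the stride's only visible effect at call time is on the output shape. A small helper (hypothetical name, not part of any package) for the standard output-size formula:

```julia
# Output size along one spatial dimension of a strided convolution:
# floor((n + 2*pad - k) / stride) + 1, the same quantity cuDNN's
# cudnnGetConvolution2dForwardOutputDim reports for that dimension.
conv_out_dim(n, k; pad=0, stride=1) = fld(n + 2pad - k, stride) + 1
```

So a wrapper only needs to thread the stride into the descriptor and use this formula when allocating the output array.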
from cuarrays.jl.
@dfdx That may have been me; the CUDNN.jl code was/is very stale so I more or less just wanted to get the minimum viable thing to be usable, rather than fixing up the entire package. I just removed CuArrays' dependency on it, so I think you're basically free to do as you please with it. Looking forward to having this really solid!
I think we have a good plan for this, so we can close this issue for now and discuss specifics like convolutions wherever appropriate.
from cuarrays.jl.