Comments (17)
Also, hopefully, CuArrays/Flux kernels and layers should be competitive with cuDNN for everything other than convolutions (which would be way too much work), so the really annoying things (in particular cuDNN RNNs) likely aren't worth wrapping at all.
from cuarrays.jl.
@MikeInnes Most NN packages do it anyway. For example, PyTorch can be run completely on CPU, but it includes `.cuda()` methods all over the code for convenience. Probably, if sparse NNs or TPUs become popular, they will also have `.sparse()` and `.tpu()` methods.
Anyway, an alternative is to make `NNlib` a CPU-only package defining a common interface, and something like `CuNNlib` (or even the same `CUDNN`) that implements this interface for a specific device. More high-level packages like Flux could then import both libraries, so that a user doesn't have to type `using NNlib, CuNNlib` every time.
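As a sketch of that layout (the module names `MyNNlib`/`MyCuNNlib` and the `FakeCuArray` type are hypothetical stand-ins, not the real NNlib/CuNNlib API), the split maps naturally onto multiple dispatch: the interface package owns the generic function with a CPU fallback, and the device package just adds a method for its own array type:

```julia
# Hypothetical interface package: generic function + CPU fallback.
module MyNNlib

function softmax(x::AbstractArray)
    y = exp.(x .- maximum(x))  # subtract max for numerical stability
    y ./ sum(y)
end

end

# Hypothetical device package: adds a method for its array type.
module MyCuNNlib

import ..MyNNlib

# Stand-in for CuArray; a real package would dispatch on CuArray instead.
struct FakeCuArray{T,N}
    data::Array{T,N}
end

# Device-specific implementation of the same interface function.
MyNNlib.softmax(x::FakeCuArray) = FakeCuArray(MyNNlib.softmax(x.data))

end
```

A high-level package would then depend on both, and a single `using` would make the fast method available to users without any extra incantation.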
Also, the more I think about it, the more confident I get that `CuArrays` shouldn't depend on `CUDNN`: there are other types of users (e.g. graphics engineers) who may take advantage of the `CuArray` data type but have nothing to do with deep learning.
from cuarrays.jl.
I believe the stride info is passed to cuDNN inside the `cdesc` struct. Many frameworks don't wrap cuDNN's structs inside C++ classes the way PyTorch does; Julia certainly shouldn't have to either. One place to look for extremely bare-bones and crazy-fast cuDNN usage is the pure-C deep learning framework darknet.
from cuarrays.jl.
Yes, darknet is a really pleasant piece of software. It may be very useful for implementing CPU versions of the forward and backward passes of convolution.
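As an illustration of how small such a CPU reference implementation can be, here is a naive sketch of the forward pass (not darknet's or NNlib's actual code; it ignores padding, strides, channels, and batching, and computes cross-correlation, as NN frameworks conventionally do):

```julia
# Naive CPU forward pass of a 2-D "valid" convolution
# (cross-correlation): slide the kernel over the input and
# accumulate elementwise products at each position.
function conv2d(x::AbstractMatrix, w::AbstractMatrix)
    H, W = size(x)
    kh, kw = size(w)
    y = zeros(promote_type(eltype(x), eltype(w)), H - kh + 1, W - kw + 1)
    for j in 1:size(y, 2), i in 1:size(y, 1)
        acc = zero(eltype(y))
        for q in 1:kw, p in 1:kh
            acc += x[i + p - 1, j + q - 1] * w[p, q]
        end
        y[i, j] = acc
    end
    y
end
```

A real implementation would add padding/stride handling and batch and channel dimensions, and would typically lower to im2col + GEMM for speed, but the triple loop above is the semantic reference the backward pass can be checked against.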
Regarding strides and other options, it turns out this is o... parameter that captures them and wraps them into a `CD` structure (the same thing as `cdesc` in PyTorch, as @jekbradbury pointed out).
However, it turns out that `cudnnConvolutionForward` doesn't actually work. Tests pass only because the call was commented out at some point. I will take a look at it tomorrow.
from cuarrays.jl.
I'm not as confident that you want a cuDNN wrapper inside CuArrays: cuDNN exposes often-stateful NN layers with both forward and backward functions, so it might be cleaner to wrap it at the Flux level?
from cuarrays.jl.
Yeah, turns out that figuring out where to put this stuff is a hard problem of its own.
Wrapping in Flux requires at least an `@require CuArrays` (to avoid a hard dependency on CUDA), and there's a bunch of messiness in finding the binaries, you can't precompile that code, etc.
Another option would be to keep CUDNN.jl as a separate package that builds on CuArrays. But then you have to figure out how to make everything talk to each other, which as yet is an unsolved problem.
I'm open to suggestions. For now the approach used with `softmax` seems reasonable, and we can always move things around as we figure this stuff out.
from cuarrays.jl.
> Another option would be to keep CUDNN.jl as a separate package that builds on CuArrays. But then you have to figure out how to make everything talk to each other, which as yet is an unsolved problem.
Can you elaborate on this? To me it looks like a good idea to make CUDNN.jl depend on CuArrays.jl (and CUDAdrv, of course).
I'd be very much interested in seeing cuDNN separate from Flux - although I like the library, it's not the only one, and it would be great to have cuDNN as an independent component usable from other projects.
from cuarrays.jl.
I agree with that sentiment, and fortunately you can `s/Flux/NNlib` in my comment above to make things reusable (though the same issues apply).
To elaborate on the issue, it happens when you write code like this:
```julia
using CuArrays, NNlib
softmax(cu(rand(5, 5)))
```
Someone has to do `using CUDNN` to load a fast implementation of this – who? It doesn't make a lot of sense to explicitly do `using CuArrays, CUDNN`, given that this proliferates to every use in every downstream package. NNlib could do it, but it seems really strange to make it aware of specific implementations. If CuArrays has logic for conditionally loading CUDNN, you lose the benefits of them being separate and may as well just fold them together.
from cuarrays.jl.
I'm not sure I see the choice of using cuDNN or not for a particular op as being a good candidate for two methods on the same function -- that forces you to either write CuArrays kernels that replicate cuDNN's often-strange semantics or wrap cuDNN in lots of glue code to hide its strangeness. (e.g. the way softmax chooses a dimension). In general this often makes you choose between image-focused (cuDNN) and non-image-focused API shapes.
I'd rather see a CUDNN package that depends on CuArrays and some kind of DiffRules API and implements lightweight but autodiff-ready bindings to raw cuDNN kernels, so people who use pure Julia with CuArrays and autodiff (or who build DL frameworks on top of that) can choose to use CUDNN.softmax and the like whenever they want to but cuDNN doesn't have to affect the semantics of NNlib/"Base" ops. Basically I think a pure-Julia softmax will be almost as fast and significantly more flexible than cuDNN fairly quickly (I see this as being the case for pretty much all ops other than convolutions and BLAS, although even those might be possible when we can directly program Volta Tensor Cores in CUDAnative) and I don't see why we should have to call out to a fairly-poorly-designed C API any more than we need to.
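To illustrate the flexibility point (a generic sketch, not NNlib's actual implementation): a numerically stable softmax over an arbitrary dimension is a few lines of broadcast code, whereas cuDNN's softmax is tied to its fixed tensor-layout semantics:

```julia
# Numerically stable softmax over an arbitrary dimension `dims`.
# Subtracting the per-slice maximum before exp avoids overflow.
function softmax(x::AbstractArray; dims=1)
    m = maximum(x; dims=dims)
    y = exp.(x .- m)
    y ./ sum(y; dims=dims)
end
```

Because it's written purely in terms of generic array operations, the same definition can in principle run on the GPU via broadcast fusion over `CuArray`, which is exactly the "almost as fast, significantly more flexible" scenario described above.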
from cuarrays.jl.
> NNlib could do it, but it seems really strange to make it aware of specific implementations.
Why not? If a library provides functions common to neural networks, and neural networks are mostly trained on GPUs, it's reasonable to expect that library to at least be aware of GPU backends, even if it doesn't have a strong dependency on them.
I imagine a dependency graph like this (where `==>` means a strong dependency and `~~>` means an optional one):

```
NNlib ~~> CUDNN      ==> CuArrays
      ~~> CLWhatever ==> CLArrays
      ==> Base.Array
```
from cuarrays.jl.
@dfdx From a design standpoint, something being common isn't a good reason to special-case it. Less common cases shouldn't work less well. New approaches will become popular (e.g. sparse ML or new hardware like TPUs) and we'd have to special-case them too, which rapidly defeats the point of NNlib.
Aside from that, CuArrays will have to load NNlib to provide fallback implementations, so it's equivalent to doing the auto-load there and once again loses the benefit. (Unless you also put cuda fallbacks inside NNlib, but that leads to a nightmarish version of the above, adds a lot of complexity to compilation and so on.)
from cuarrays.jl.
@jekbradbury Ok, I see where you're coming from.
It definitely does seem cleaner to create a solid, Julian base for DL functions than to design around cuDNN's particular optimisations. The downside is that if/when cuDNN gives an advantage, the (possibly multiple) ML libraries have to explicitly pull it in. OTOH, it's inevitable that making optimisations will require some knowledge of the implementation beyond `AbstractArray`; special-casing cuDNN's approach can't avoid that forever, in which case there's no real benefit to it.
These approaches are not completely incompatible. We can wrap cuDNN in a lightweight way in CuArrays for some NNlib functions (perhaps only as a stopgap, as for softmax), but then have CUDNN.jl build on that with a full wrapper. That seems like a completely reasonable direction to me.
from cuarrays.jl.
> Also the more I think about it, the more I get confident that CuArrays shouldn't depend on CUDNN - there are other types of people (e.g. graphic engineers) who may take advantage of CuArray data type, but have nothing to do with deep learning.
Yes!
from cuarrays.jl.
Yes, that's pretty obvious and not at all at issue here.
from cuarrays.jl.
For instance, these new softmax kernels are faster than cuDNN and way more flexible: pytorch/pytorch#2899
from cuarrays.jl.
I'm a bit confused about how to handle strides in convolution - `cudnnConvolutionForward` doesn't mention anything like that. In PyTorch, I see strides in the convolution parameters, but it seems like the implementation doesn't use them (at least explicitly). Does anyone have a clue?
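If I understand cuDNN correctly, that's because the stride is supplied earlier, to `cudnnSetConvolution2dDescriptor`, and `cudnnConvolutionForward` only sees the resulting descriptor; the stride's only visible effect at call time is on the output shape. A small helper (hypothetical name, not part of any package) for the standard output-size formula:

```julia
# Output size along one spatial dimension of a strided convolution:
# floor((n + 2*pad - k) / stride) + 1, the same quantity cuDNN's
# cudnnGetConvolution2dForwardOutputDim reports for that dimension.
conv_out_dim(n, k; pad=0, stride=1) = fld(n + 2pad - k, stride) + 1
```

So a wrapper only needs to thread the stride into the descriptor and use this formula when allocating the output array.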
from cuarrays.jl.
@dfdx That may have been me; the CUDNN.jl code was/is very stale so I more or less just wanted to get the minimum viable thing to be usable, rather than fixing up the entire package. I just removed CuArrays' dependency on it, so I think you're basically free to do as you please with it. Looking forward to having this really solid!
I think we have a good plan for this, so we can close this issue for now and discuss specifics like convolutions wherever appropriate.
from cuarrays.jl.