
NNlibCUDA.jl's Introduction

NNlibCUDA.jl

This is a glue package which extends functions from NNlib.jl to work with CUDA.jl. It should be loaded automatically when using Flux.jl, but not when using NNlib.jl by itself.

Julia GPU kernels are in src/, while wrappers around cuDNN are in src/cudnn/.
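
As a rough usage sketch (assuming a working CUDA.jl installation with a functional GPU; the array sizes below are arbitrary), loading this package is enough to make NNlib operations on CuArrays take the CUDA paths:

using CUDA, NNlib, NNlibCUDA

x = CUDA.rand(Float32, 28, 28, 1, 8)   # a WHCN batch of images
w = CUDA.rand(Float32, 3, 3, 1, 4)     # 3x3 kernel, 1 input channel -> 4 output channels
y = NNlib.conv(x, w)                   # dispatches to the GPU path provided by this package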

NNlibCUDA.jl's People

Contributors

akashkgarg, avik-pal, carlolucibello, chengchingwen, colbec, darsnack, dhairyalgandhi, domcrose, domschl, drchainsaw, maleadt, maxfreu, mcabbott, nikopj, pxl-th, touchesir, yuehhua


NNlibCUDA.jl's Issues

NNlibCUDA Heisenbug in conv! with nonzero beta

See JuliaGPU/CUDA.jl#736

Describe the bug

When using the beta keyword argument of NNlib.conv! on a CuArray, I occasionally get absurd, apparently non-deterministic results.

To reproduce
Run the following in a fresh Julia session:

using CUDA
using NNlib
x_cpu = fill(1f0, 1,1,1)
w_cpu = fill(1f0, 1,1,1)
x_gpu = CuArray(x_cpu)
w_gpu = CuArray(w_cpu)
cdims = NNlib.DenseConvDims(x_cpu, w_cpu)
y_cpu = fill(0f0, 1,1,1)
y_gpu = CuArray(y_cpu)
NNlib.conv!(y_cpu, x_cpu, w_cpu, cdims, alpha=1f0, beta=1f0)
NNlib.conv!(y_gpu, x_gpu, w_gpu, cdims, alpha=1f0, beta=1f0)
@show y_cpu
@show y_gpu

y_cpu = Float32[1.0]
y_gpu = Float32[2.0]

If I run it again, y_gpu gives the correct result. (With alpha = 1 and beta = 1 the expected value is conv(x, w) + y = 1 + 0 = 1, so the GPU value of 2 looks as if the convolution result is accumulated twice.) Fuzz testing suggests that at least the first conv! call for a given combination of array sizes goes wrong; it is probably not only the first call that fails, but the first one fails reliably.

using NNlib
using CUDA
using Test
using LinearAlgebra

function fuzz(;max_fails, max_iter)
    fails = 0
    for i in 1:max_iter
        nspacedims = rand(1:1)
        spacesize  = tuple(rand(1:3, nspacedims)...)
        nb         = rand(1:3)
        ncin       = rand(1:3)
        ncout      = rand(1:3)
        
        x_cpu = randn(Float32, spacesize...,ncin, nb)
        kernel_size = ntuple(_->1, nspacedims)
        w_cpu = randn(Float32, kernel_size...,ncin, ncout)
        x_gpu = CuArray(x_cpu)
        w_gpu = CuArray(w_cpu)
        
        cdims = NNlib.DenseConvDims(x_cpu, w_cpu)
        y_cpu = randn(Float32, spacesize...,ncout, nb)
        y_gpu = CuArray(y_cpu)
        NNlib.conv!(y_cpu, x_cpu, w_cpu, cdims, alpha=1f0, beta=1f0)
        NNlib.conv!(y_gpu, x_gpu, w_gpu, cdims, alpha=1f0, beta=1f0)
        if !(collect(y_gpu) ≈ y_cpu)
            @show i
            #@show x_cpu
            #@show x_gpu
            #@show w_cpu
            #@show w_gpu
            #@show y_cpu
            #@show y_gpu
            @show size(x_cpu)
            @show size(w_cpu)
            @show norm(collect(y_gpu) - y_cpu)
            fails += 1
        end
        if fails >= max_fails
            break
        end
    end
    @show fails
end

fuzz(max_fails=1000, max_iter=10000)
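
A workaround I can use in the meantime (my own assumption, not a confirmed fix) is to avoid the nonzero-beta path entirely and accumulate explicitly, at the cost of one extra allocation:

# Workaround sketch: call conv with the default beta = 0, then accumulate by hand,
# instead of NNlib.conv!(y_gpu, x_gpu, w_gpu, cdims, alpha=1f0, beta=1f0).
y_gpu .+= NNlib.conv(x_gpu, w_gpu, cdims)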

I am using current master of CUDA.jl

Manifest.toml

[[AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "8ed9de2f1b1a9b1dee48582ad477c6e67b83eb2c"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.0.0"

[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "ffcfa2d345aaee0ef3d8346a073d5dd03c983ebe"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.2.0"

[[ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"

[[Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[BFloat16s]]
deps = ["LinearAlgebra", "Test"]
git-tree-sha1 = "4af69e205efc343068dc8722b8dfec1ade89254a"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.1.0"

[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[CEnum]]
git-tree-sha1 = "215a9aa4a1f23fbd05b92769fdd62559488d70e9"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.1"

[[CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CompilerSupportLibraries_jll", "DataStructures", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "MacroTools", "Memoize", "NNlib", "Printf", "Random", "Reexport", "Requires", "SparseArrays", "Statistics", "TimerOutputs"]
git-tree-sha1 = "d891e403471f04266c80a03ecf247d9aff6e7879"
repo-rev = "master"
repo-url = "https://github.com/JuliaGPU/CUDA.jl.git"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "2.6.0"

[[ChainRulesCore]]
deps = ["Compat", "LinearAlgebra", "SparseArrays"]
git-tree-sha1 = "de4f08843c332d355852721adb1592bce7924da3"
uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
version = "0.9.29"

[[Compat]]
deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"]
git-tree-sha1 = "919c7f3151e79ff196add81d7f4e45d91bbf420b"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.25.0"

[[CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"

[[DataStructures]]
deps = ["Compat", "InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "4437b64df1e0adccc3e5d1adbc3ac741095e4677"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.18.9"

[[Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[DelimitedFiles]]
deps = ["Mmap"]
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"

[[Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[Downloads]]
deps = ["ArgTools", "LibCURL", "NetworkOptions"]
uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[[ExprTools]]
git-tree-sha1 = "10407a39b87f29d47ebaca8edbc75d7c302ff93e"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.3"

[[GPUArrays]]
deps = ["AbstractFFTs", "Adapt", "LinearAlgebra", "Printf", "Random", "Serialization"]
git-tree-sha1 = "f99a25fe0313121f2f9627002734c7d63b4dd3bd"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "6.2.0"

[[GPUCompiler]]
deps = ["DataStructures", "ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "Serialization", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "ef2839b063e158672583b9c09d2cf4876a8d3d55"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.10.0"

[[InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[LLVM]]
deps = ["CEnum", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "b616937c31337576360cb9fb872ec7633af7b194"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "3.6.0"

[[LazyArtifacts]]
deps = ["Artifacts", "Pkg"]
uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"

[[LibCURL]]
deps = ["LibCURL_jll", "MozillaCACerts_jll"]
uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"

[[LibCURL_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"

[[LibGit2]]
deps = ["Base64", "NetworkOptions", "Printf", "SHA"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"

[[LibSSH2_jll]]
deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"

[[Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[LinearAlgebra]]
deps = ["Libdl"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[MacroTools]]
deps = ["Markdown", "Random"]
git-tree-sha1 = "6a8a2a625ab0dea913aba95c11370589e0239ff0"
uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
version = "0.5.6"

[[Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[MbedTLS_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"

[[Memoize]]
deps = ["MacroTools"]
git-tree-sha1 = "2b1dfcba103de714d31c033b5dacc2e4a12c7caa"
uuid = "c03570c3-d221-55d1-a50c-7939bbd78826"
version = "0.4.4"

[[Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[MozillaCACerts_jll]]
uuid = "14a3606d-f60d-562e-9121-12d972cd8159"

[[NNlib]]
deps = ["ChainRulesCore", "Compat", "LinearAlgebra", "Pkg", "Requires", "Statistics"]
git-tree-sha1 = "df42d0816edfc24f5b82a728f46381613c4dff79"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
version = "0.7.14"

[[NetworkOptions]]
uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"

[[OrderedCollections]]
git-tree-sha1 = "4fa2ba51070ec13fcc7517db714445b4ab986bdf"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.4.0"

[[Pkg]]
deps = ["Artifacts", "Dates", "Downloads", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"

[[Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"

[[Random]]
deps = ["Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[Reexport]]
git-tree-sha1 = "57d8440b0c7d98fc4f889e478e80f268d534c9d5"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "1.0.0"

[[Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "cfbac6c1ed70c002ec6361e7fd334f02820d6419"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.1.2"

[[SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"

[[Scratch]]
deps = ["Dates"]
git-tree-sha1 = "ad4b278adb62d185bbcb6864dc24959ab0627bf6"
uuid = "6c6a2e73-6563-6170-7368-637461726353"
version = "1.0.3"

[[Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[SharedArrays]]
deps = ["Distributed", "Mmap", "Random", "Serialization"]
uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[[Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[SparseArrays]]
deps = ["LinearAlgebra", "Random"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[[Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[[TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

[[Tar]]
deps = ["ArgTools", "SHA"]
uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"

[[Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[TimerOutputs]]
deps = ["Printf"]
git-tree-sha1 = "3318281dd4121ecf9713ce1383b9ace7d7476fdd"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.7"

[[UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[Zlib_jll]]
deps = ["Libdl"]
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"

[[nghttp2_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"

Version info

Details on Julia:

Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: AMD Ryzen 9 3900 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
Environment:
  JULIA_NUM_THREADS = 24

Details on CUDA:

CUDA toolkit 11.2.1, artifact installation
CUDA driver 11.2.0
NVIDIA driver 460.32.3

Libraries: 
- CUBLAS: 11.4.1
- CURAND: 10.2.3
- CUFFT: 10.4.0
- CUSOLVER: 11.1.0
- CUSPARSE: 11.4.0
- CUPTI: 14.0.0
- NVML: 11.0.0+460.32.3
- CUDNN: 8.10.0 (for CUDA 11.2.0)
- CUTENSOR: 1.2.2 (for CUDA 11.1.0)

Toolchain:
- Julia: 1.6.0-rc1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

Preferences:
- Memory pool: None
- Async allocation: true

1 device:
  0: GeForce RTX 2060 (sm_75, 4.931 GiB / 5.787 GiB available)

`batched_gemm!` with CuArrays doesn't work

It seems as if there is a dispatch problem with https://github.com/FluxML/NNlibCUDA.jl/blob/master/src/batchedmul.jl#L3:

julia> using CUDA, NNlib, NNlibCUDA

julia> a=rand(3,3,5);

julia> b=rand(3,3,5);

julia> NNlib.batched_gemm!('N', 'N', 1., a, b, 1., a);

julia> ag=CuArray(a);
  
julia> bg=CuArray(b);

julia> NNlib.batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float64, 3}
Stacktrace:
 [1] unsafe_convert(#unused#::Type{Ptr{Float64}}, x::CuArray{Float64, 3})
   @ CUDA ~/.julia/packages/CUDA/3VnCC/src/array.jl:253
 [2] batched_gemm!(transA::Char, transB::Char, alpha::Float64, A::CuArray{Float64, 3}, B::CuArray{Float64, 3}, beta::Float64, C::CuArray{Float64, 3})
   @ NNlib ~/.julia/packages/NNlib/3MZcC/src/gemm.jl:86
 [3] top-level scope
   @ REPL[7]:1
 [4] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

julia> NNlibCUDA.batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: UndefVarError: batched_gemm! not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:26
 [2] top-level scope
   @ REPL[8]:1
 [3] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

julia> NNlibCUDA._batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: UndefVarError: _batched_gemm! not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:26
 [2] top-level scope
   @ REPL[9]:1
 [3] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

julia> NNlib._batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: MethodError: no method matching _batched_gemm!(::Char, ::Char, ::Float64, ::CuArray{Float64, 3}, ::CuArray{Float64, 3}, ::Float64, ::CuArray{Float64, 3})
Closest candidates are:
  _batched_gemm!(::Type{var"#s117"} where var"#s117"<:Array, ::Char, ::Char, ::Number, ::Any, ::Any, ::Number, ::Any) at /groups/scicompsoft/home/arthurb/.julia/packages/NNlib/3MZcC/src/batched/batchedmul.jl:260
  _batched_gemm!(::Type{var"#s7"} where var"#s7"<:CuArray, ::Char, ::Char, ::Number, ::Any, ::Any, ::Number, ::Any) at /groups/scicompsoft/home/arthurb/.julia/packages/NNlibCUDA/eMmZI/src/batchedmul.jl:3
Stacktrace:
 [1] top-level scope
   @ REPL[10]:1
 [2] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

(@v1.6) pkg> st
      Status `~/.julia/environments/v1.6/Project.toml`
  [6e4b80f9] BenchmarkTools v1.0.0
  [052768ef] CUDA v3.2.1
  [34f1f09b] ClusterManagers v0.4.0
  [864edb3b] DataStructures v0.18.9
  [31c24e10] Distributions v0.25.2
  [5903a43b] Infiltrator v0.3.0
  [4138dd39] JLD v0.12.3
  [872c559c] NNlib v0.7.21
  [a00861dc] NNlibCUDA v0.1.2
  [132c30aa] ProfileSVG v0.2.1
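
As the "closest candidates" above show, the CUDA method is installed on NNlib._batched_gemm! and takes the storage type as its first argument, which is why calling batched_gemm! directly never reaches it. The higher-level NNlib.batched_mul! should route correctly; a minimal sketch (Float32 used here only as the common case):

using CUDA, NNlib, NNlibCUDA

a = CUDA.rand(Float32, 3, 3, 5)
b = CUDA.rand(Float32, 3, 3, 5)
c = CUDA.zeros(Float32, 3, 3, 5)
NNlib.batched_mul!(c, a, b)   # batched C = A * B over the third (batch) dimension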

gpu `scatter` with `CartesianIndex` not supported

MWE:

julia> NNlib.scatter(+, randn(10, 2,2), [CartesianIndex{2}(1,1) CartesianIndex{2}(1,2); CartesianIndex{2}(2,1) CartesianIndex{2}(2,1)])
10×2×2 Array{Float64, 3}: 
[:, :, 1] =
 -0.254159   0.129012
  0.623626  -0.759501
 -0.252346  -0.880688
 -0.442164   0.391463
  0.504293  -0.0432547
  0.863525   1.47864
 -0.803966   1.46345
 -1.48825   -1.46929
  0.736272   0.170795
 -0.629679   1.98309
[:, :, 2] =
  1.69932    0.0
 -0.229177   0.0
  1.76205    0.0
 -1.46482    0.0
  0.328287   0.0
  0.459204   0.0
 -0.351994   0.0
 -0.0894004  0.0
  0.0585911  0.0
 -0.860969   0.0

julia> NNlib.scatter(+, cu(randn(10, 2,2)), cu([CartesianIndex{2}(1,1) CartesianIndex{2}(1,2); CartesianIndex{2}(2,1) CartesianIndex{2}(2,1)]))
ERROR: InvalidIRError: compiling kernel scatter_kernel!(typeof(+), CuDeviceArray{Float32, 3, 1}, CuDeviceArray{Float32, 3, 1}, CuDeviceMatrix{CartesianIndex{2}, 1}, Int64, Int64, Tuple{Int64}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f__apply_iterate)
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/CUDA/9T5Sq/src/device/intrinsics/atomics.jl:438
 [2] scatter_kernel!
   @ ~/.julia/packages/NNlibCUDA/tXguL/src/scatter.jl:18
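
Until CartesianIndex indices are supported by the GPU kernel, a fallback (clearly not ideal for performance, and only a sketch using the same names as the MWE above) is to run the scatter on the CPU and copy the result back:

using CUDA, NNlib

src = cu(randn(Float32, 10, 2, 2))
idx = [CartesianIndex{2}(1,1) CartesianIndex{2}(1,2); CartesianIndex{2}(2,1) CartesianIndex{2}(2,1)]
out = cu(NNlib.scatter(+, Array(src), idx))   # scatter on the CPU, then move the result back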

No frule for some activations

Hello,

It seems there is no frule for some activations (e.g. sigmoid, relu), so trying to perform forward-mode AD on the GPU fails. You can see the discussion on the topic here. The following example is a little more complicated than it needs to be, but it shows the problem:

using ForwardDiff: partials, Dual
using Zygote: pullback
using LinearAlgebra

mutable struct HvpOperator{F, T, I}
	f::F
	x::AbstractArray{T, 1}
	dualCache1::AbstractArray{Dual{Nothing, T, 1}}
	size::I
	nProd::I
end

function HvpOperator(f, x::AbstractVector)
	dualCache1 = Dual.(x, similar(x))
	return HvpOperator(f, x, dualCache1, size(x, 1), 0)
end

Base.eltype(op::HvpOperator{F, T, I}) where{F, T, I} = T
Base.size(op::HvpOperator) = (op.size, op.size)

function LinearAlgebra.mul!(result::AbstractVector, op::HvpOperator, v::AbstractVector)
	op.nProd += 1

	op.dualCache1 .= Dual.(op.x, v)
	val, back = pullback(op.f, op.dualCache1)

	result .= partials.(back(one(val))[1], 1)
end

using Flux, CUDA
data = randn(10, 4) |> gpu
model = Dense(10, 1, σ) |> gpu

ps, re = Flux.destructure(model)
f(θ) = re(θ)(data)

Hop = HvpOperator(f, ps)

v, res = similar(ps), similar(ps)

LinearAlgebra.mul!(res, Hop, v)

which gives the following error:

ERROR: LoadError: MethodError: no method matching cudnnDataType(::Type{Dual{Nothing, Float32, 1}})
Closest candidates are:
  cudnnDataType(::Type{Float16}) at /home/rs-coop/.julia/packages/CUDA/nYggH/lib/cudnn/util.jl:7
  cudnnDataType(::Type{Float32}) at /home/rs-coop/.julia/packages/CUDA/nYggH/lib/cudnn/util.jl:8
  cudnnDataType(::Type{Float64}) at /home/rs-coop/.julia/packages/CUDA/nYggH/lib/cudnn/util.jl:9
  ...
Stacktrace:
  [1] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}; format::CUDA.CUDNN.cudnnTensorFormat_t, dims::Vector{Int32})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/nYggH/lib/cudnn/tensor.jl:9
  [2] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/nYggH/lib/cudnn/tensor.jl:8
  [3] cudnnActivationForward!(y::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}, x::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}; o::Base.Iterators.Pairs{Symbol, CUDA.CUDNN.cudnnActivationMode_t, Tuple{Symbol}, NamedTuple{(:mode,), Tuple{CUDA.CUDNN.cudnnActivationMode_t}}})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/nYggH/lib/cudnn/activation.jl:22
  [4] (::NNlibCUDA.var"#65#69")(src::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}, dst::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/IeeBk/src/cudnn/activations.jl:11
  [5] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(σ), Tuple{CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}}})
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/IeeBk/src/cudnn/activations.jl:30
  [6] rrule(#unused#::typeof(Base.Broadcast.broadcasted), #unused#::typeof(σ), x::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ NNlib ~/.julia/packages/NNlib/y5z4i/src/activations.jl:813
  [7] rrule(::Zygote.ZygoteRuleConfig{Zygote.Context}, ::Function, ::Function, ::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ ChainRulesCore ~/.julia/packages/ChainRulesCore/oBjCg/src/rules.jl:134
  [8] chain_rrule
    @ ~/.julia/packages/Zygote/FPUm3/src/compiler/chainrules.jl:216 [inlined]
  [9] macro expansion
    @ ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:0 [inlined]
 [10] _pullback(::Zygote.Context, ::typeof(Base.Broadcast.broadcasted), ::typeof(σ), ::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:9
 [11] _pullback
    @ ~/.julia/packages/Flux/BPPNj/src/layers/basic.jl:158 [inlined]
 [12] _pullback(ctx::Zygote.Context, f::Dense{typeof(σ), CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}, CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer}}, args::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:0
 [13] _pullback
    @ ~/Documents/research/masters-thesis/CubicNewton/experiments/test.jl:43 [inlined]
 [14] _pullback(ctx::Zygote.Context, f::typeof(f), args::CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:0
 [15] _pullback(f::Function, args::CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface.jl:34
 [16] pullback(f::Function, args::CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface.jl:40
 [17] mul!(result::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, op::HvpOperator{typeof(f), Float32, Int64}, v::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Main ~/Documents/research/masters-thesis/CubicNewton/experiments/test.jl:34
 [18] top-level scope
    @ ~/Documents/research/masters-thesis/CubicNewton/experiments/test.jl:49
 [19] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [20] top-level scope
    @ REPL[14]:1
 [21] top-level scope
    @ ~/.julia/packages/CUDA/nYggH/src/initialization.jl:52

No error occurs when I use a custom version of the activation instead of what is exported from this package.
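
For concreteness, this is roughly what I mean by a custom version (the name mysigmoid is hypothetical; the point is that a locally defined function is not covered by the cuDNN broadcast override, so broadcasting it falls back to the generic CUDA.jl kernel, which accepts Dual numbers):

# Continuing from the snippet above; mysigmoid is a hypothetical stand-in for σ.
mysigmoid(x) = one(x) / (one(x) + exp(-x))
model = Dense(10, 1, mysigmoid) |> gpu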

Unconstrained element type on activations causing errors with e.g. Complex, ForwardDiff.Dual

Currently there don't appear to be any type constraints on the element type for the activation function overrides. This causes errors when non-CUDNN element types (e.g. Complex, ForwardDiff.Dual) are passed in. Further, this breaks code that works before loading NNlibCUDA, e.g.

using CUDA
x = CuArray([1.0 + 1.0im, 2.0 + 2.0im])
tanh.(x)

outputs

2-element CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}:
 1.0839233273386943 + 0.27175258531951174im
 1.0238355945704727 - 0.028392952868232284im

but when immediately followed by

using NNlibCUDA
tanh.(x)

we get

ERROR: MethodError: no method matching cudnnDataType(::Type{ComplexF64})
Closest candidates are:
  cudnnDataType(::Type{Float16}) at C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\util.jl:7
  cudnnDataType(::Type{Float32}) at C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\util.jl:8
  cudnnDataType(::Type{Float64}) at C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\util.jl:9
  ...
Stacktrace:
 [1] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}; format::CUDA.CUDNN.cudnnTensorFormat_t, dims::Vector{Int32})
   @ CUDA.CUDNN C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\tensor.jl:9
 [2] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer})
   @ CUDA.CUDNN C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\tensor.jl:8
 [3] cudnnActivationForward!(y::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, x::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}; o::Base.Iterators.Pairs{Symbol, CUDA.CUDNN.cudnnActivationMode_t, Tuple{Symbol}, NamedTuple{(:mode,), Tuple{CUDA.CUDNN.cudnnActivationMode_t}}})
   @ CUDA.CUDNN C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\activation.jl:22
 [4] (::NNlibCUDA.var"#74#78")(src::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, dst::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer})
   @ NNlibCUDA C:\Users\domin\.julia\packages\NNlibCUDA\i1IW9\src\cudnn\activations.jl:10
 [5] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, typeof(tanh), Tuple{CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}})
   @ NNlibCUDA C:\Users\domin\.julia\packages\NNlibCUDA\i1IW9\src\cudnn\activations.jl:30

Is there an easy way to add some type constraints to this code to make sure it's only applied to CUDNN types? E.g. would changing

function Base.materialize!(dst::DenseCuArray{<:CUDNNFloat},
                                   bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray}})
...
function Base.materialize(bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray}})
...

to

function Base.materialize!(dst::DenseCuArray{<:CUDNNFloat},
                                   bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray{<:CUDNNFloat}}})
...
function Base.materialize(bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray{<:CUDNNFloat}}})
...

suffice? I'm not too familiar with using @eval.
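
As a toy illustration of why the element-type constraint would help (hypothetical names, not the actual NNlibCUDA code): with the constraint in place, a non-CUDNN eltype simply never matches the specialized method and falls through to the generic broadcast.

# Toy dispatch example only; fast_path stands in for the specialized materialize methods.
const MyCUDNNFloat = Union{Float16, Float32, Float64}

fast_path(x::AbstractArray{<:MyCUDNNFloat}) = "specialized cuDNN-style method"
fast_path(x::AbstractArray)                 = "generic fallback"

fast_path(rand(Float32, 3))      # hits the specialized method
fast_path(rand(ComplexF64, 3))   # falls through to the fallback instead of erroring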

Increase CUDA.jl compat to 5.0.0?

Just creating an issue to ask if there are blockers for this to happen. I am using NNlibCUDA.jl to access CUDA bindings for convolution routines, but not using Flux at all.

NNlib 0.9

Not sure why CompatHelper hasn't issued a PR yet given that its cron is 00 * * * *, but NNlibCUDA is holding back an upgrade of NNlib to v0.9 because its Project.toml only lists v0.8.15.
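
If it helps, the fix presumably boils down to widening that [compat] bound. A sketch of doing it programmatically (assuming Julia 1.8 or newer, where Pkg.compat is available, and a placeholder path):

using Pkg
Pkg.activate("path/to/NNlibCUDA")     # activate the package's own environment (placeholder path)
Pkg.compat("NNlib", "0.8.15, 0.9")    # widen the NNlib [compat] entry in Project.toml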

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
