
NNlibCUDA.jl's Introduction

NNlibCUDA.jl

This is a glue package which extends functions from NNlib.jl to work with CUDA.jl. It should be loaded automatically when using Flux.jl, but not when using NNlib.jl by itself.

Julia GPU kernels are in src/, while wrappers around cuDNN are in src/cudnn/.
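
As a rough usage sketch (assuming a working CUDA.jl installation with a functional GPU; the array sizes below are arbitrary), loading this package is enough to make NNlib operations on CuArrays take the CUDA paths:

using CUDA, NNlib, NNlibCUDA

x = CUDA.rand(Float32, 28, 28, 1, 8)   # a WHCN batch of images
w = CUDA.rand(Float32, 3, 3, 1, 4)     # 3x3 kernel, 1 input channel -> 4 output channels
y = NNlib.conv(x, w)                   # dispatches to the GPU path provided by this package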

NNlibCUDA.jl's People

Contributors

akashkgarg, avik-pal, carlolucibello, chengchingwen, colbec, darsnack, dhairyalgandhi, domcrose, domschl, drchainsaw, maleadt, maxfreu, mcabbott, nikopj, pxl-th, touchesir, yuehhua


NNlibCUDA.jl's Issues

NNlibCUDA Heisenbug in conv! with nonzero beta

See JuliaGPU/CUDA.jl#736

Describe the bug

When using the beta keyword argument of NNlib.conv! on a CuArray, I occasionally get absurd, apparently non-deterministic results.

To reproduce
Run the following in a fresh Julia session:

using CUDA
using NNlib
x_cpu = fill(1f0, 1,1,1)
w_cpu = fill(1f0, 1,1,1)
x_gpu = CuArray(x_cpu)
w_gpu = CuArray(w_cpu)
cdims = NNlib.DenseConvDims(x_cpu, w_cpu)
y_cpu = fill(0f0, 1,1,1)
y_gpu = CuArray(y_cpu)
NNlib.conv!(y_cpu, x_cpu, w_cpu, cdims, alpha=1f0, beta=1f0)
NNlib.conv!(y_gpu, x_gpu, w_gpu, cdims, alpha=1f0, beta=1f0)
@show y_cpu
@show y_gpu

y_cpu = Float32[1.0]
y_gpu = Float32[2.0]

If I run it again, y_gpu gives the correct result. (With alpha = 1 and beta = 1 the expected value is conv(x, w) + y = 1 + 0 = 1, so the GPU value of 2 looks as if the convolution result is accumulated twice.) Fuzz testing suggests that at least the first conv! call for a given combination of array sizes goes wrong; it is probably not only the first call that fails, but the first one fails reliably.

using NNlib
using CUDA
using Test
using LinearAlgebra

function fuzz(;max_fails, max_iter)
    fails = 0
    for i in 1:max_iter
        nspacedims = rand(1:1)
        spacesize  = tuple(rand(1:3, nspacedims)...)
        nb         = rand(1:3)
        ncin       = rand(1:3)
        ncout      = rand(1:3)
        
        x_cpu = randn(Float32, spacesize...,ncin, nb)
        kernel_size = ntuple(_->1, nspacedims)
        w_cpu = randn(Float32, kernel_size...,ncin, ncout)
        x_gpu = CuArray(x_cpu)
        w_gpu = CuArray(w_cpu)
        
        cdims = NNlib.DenseConvDims(x_cpu, w_cpu)
        y_cpu = randn(Float32, spacesize...,ncout, nb)
        y_gpu = CuArray(y_cpu)
        NNlib.conv!(y_cpu, x_cpu, w_cpu, cdims, alpha=1f0, beta=1f0)
        NNlib.conv!(y_gpu, x_gpu, w_gpu, cdims, alpha=1f0, beta=1f0)
        if !(collect(y_gpu) ≈ y_cpu)
            @show i
            #@show x_cpu
            #@show x_gpu
            #@show w_cpu
            #@show w_gpu
            #@show y_cpu
            #@show y_gpu
            @show size(x_cpu)
            @show size(w_cpu)
            @show norm(collect(y_gpu) - y_cpu)
            fails += 1
        end
        if fails >= max_fails
            break
        end
    end
    @show fails
end

fuzz(max_fails=1000, max_iter=10000)
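
A workaround I can use in the meantime (my own assumption, not a confirmed fix) is to avoid the nonzero-beta path entirely and accumulate explicitly, at the cost of one extra allocation:

# Workaround sketch: call conv with the default beta = 0, then accumulate by hand,
# instead of NNlib.conv!(y_gpu, x_gpu, w_gpu, cdims, alpha=1f0, beta=1f0).
y_gpu .+= NNlib.conv(x_gpu, w_gpu, cdims)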

I am using current master of CUDA.jl

Manifest.toml

[[AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "8ed9de2f1b1a9b1dee48582ad477c6e67b83eb2c"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.0.0"

[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "ffcfa2d345aaee0ef3d8346a073d5dd03c983ebe"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.2.0"

[[ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"

[[Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[BFloat16s]]
deps = ["LinearAlgebra", "Test"]
git-tree-sha1 = "4af69e205efc343068dc8722b8dfec1ade89254a"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.1.0"

[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[CEnum]]
git-tree-sha1 = "215a9aa4a1f23fbd05b92769fdd62559488d70e9"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.1"

[[CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CompilerSupportLibraries_jll", "DataStructures", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "MacroTools", "Memoize", "NNlib", "Printf", "Random", "Reexport", "Requires", "SparseArrays", "Statistics", "TimerOutputs"]
git-tree-sha1 = "d891e403471f04266c80a03ecf247d9aff6e7879"
repo-rev = "master"
repo-url = "https://github.com/JuliaGPU/CUDA.jl.git"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "2.6.0"

[[ChainRulesCore]]
deps = ["Compat", "LinearAlgebra", "SparseArrays"]
git-tree-sha1 = "de4f08843c332d355852721adb1592bce7924da3"
uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
version = "0.9.29"

[[Compat]]
deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"]
git-tree-sha1 = "919c7f3151e79ff196add81d7f4e45d91bbf420b"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.25.0"

[[CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"

[[DataStructures]]
deps = ["Compat", "InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "4437b64df1e0adccc3e5d1adbc3ac741095e4677"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.18.9"

[[Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[DelimitedFiles]]
deps = ["Mmap"]
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"

[[Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[Downloads]]
deps = ["ArgTools", "LibCURL", "NetworkOptions"]
uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[[ExprTools]]
git-tree-sha1 = "10407a39b87f29d47ebaca8edbc75d7c302ff93e"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.3"

[[GPUArrays]]
deps = ["AbstractFFTs", "Adapt", "LinearAlgebra", "Printf", "Random", "Serialization"]
git-tree-sha1 = "f99a25fe0313121f2f9627002734c7d63b4dd3bd"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "6.2.0"

[[GPUCompiler]]
deps = ["DataStructures", "ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "Serialization", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "ef2839b063e158672583b9c09d2cf4876a8d3d55"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.10.0"

[[InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[LLVM]]
deps = ["CEnum", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "b616937c31337576360cb9fb872ec7633af7b194"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "3.6.0"

[[LazyArtifacts]]
deps = ["Artifacts", "Pkg"]
uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"

[[LibCURL]]
deps = ["LibCURL_jll", "MozillaCACerts_jll"]
uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"

[[LibCURL_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"

[[LibGit2]]
deps = ["Base64", "NetworkOptions", "Printf", "SHA"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"

[[LibSSH2_jll]]
deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"

[[Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[LinearAlgebra]]
deps = ["Libdl"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[MacroTools]]
deps = ["Markdown", "Random"]
git-tree-sha1 = "6a8a2a625ab0dea913aba95c11370589e0239ff0"
uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
version = "0.5.6"

[[Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[MbedTLS_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"

[[Memoize]]
deps = ["MacroTools"]
git-tree-sha1 = "2b1dfcba103de714d31c033b5dacc2e4a12c7caa"
uuid = "c03570c3-d221-55d1-a50c-7939bbd78826"
version = "0.4.4"

[[Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[MozillaCACerts_jll]]
uuid = "14a3606d-f60d-562e-9121-12d972cd8159"

[[NNlib]]
deps = ["ChainRulesCore", "Compat", "LinearAlgebra", "Pkg", "Requires", "Statistics"]
git-tree-sha1 = "df42d0816edfc24f5b82a728f46381613c4dff79"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
version = "0.7.14"

[[NetworkOptions]]
uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"

[[OrderedCollections]]
git-tree-sha1 = "4fa2ba51070ec13fcc7517db714445b4ab986bdf"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.4.0"

[[Pkg]]
deps = ["Artifacts", "Dates", "Downloads", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"

[[Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"

[[Random]]
deps = ["Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[Reexport]]
git-tree-sha1 = "57d8440b0c7d98fc4f889e478e80f268d534c9d5"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "1.0.0"

[[Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "cfbac6c1ed70c002ec6361e7fd334f02820d6419"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.1.2"

[[SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"

[[Scratch]]
deps = ["Dates"]
git-tree-sha1 = "ad4b278adb62d185bbcb6864dc24959ab0627bf6"
uuid = "6c6a2e73-6563-6170-7368-637461726353"
version = "1.0.3"

[[Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[SharedArrays]]
deps = ["Distributed", "Mmap", "Random", "Serialization"]
uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[[Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[SparseArrays]]
deps = ["LinearAlgebra", "Random"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[[Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[[TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

[[Tar]]
deps = ["ArgTools", "SHA"]
uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"

[[Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[TimerOutputs]]
deps = ["Printf"]
git-tree-sha1 = "3318281dd4121ecf9713ce1383b9ace7d7476fdd"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.7"

[[UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[Zlib_jll]]
deps = ["Libdl"]
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"

[[nghttp2_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"

Version info

Details on Julia:

Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: AMD Ryzen 9 3900 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
Environment:
  JULIA_NUM_THREADS = 24

Details on CUDA:

CUDA toolkit 11.2.1, artifact installation
CUDA driver 11.2.0
NVIDIA driver 460.32.3

Libraries: 
- CUBLAS: 11.4.1
- CURAND: 10.2.3
- CUFFT: 10.4.0
- CUSOLVER: 11.1.0
- CUSPARSE: 11.4.0
- CUPTI: 14.0.0
- NVML: 11.0.0+460.32.3
- CUDNN: 8.10.0 (for CUDA 11.2.0)
- CUTENSOR: 1.2.2 (for CUDA 11.1.0)

Toolchain:
- Julia: 1.6.0-rc1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

Preferences:
- Memory pool: None
- Async allocation: true

1 device:
  0: GeForce RTX 2060 (sm_75, 4.931 GiB / 5.787 GiB available)

`batched_gemm!` with CuArrays doesn't work

It seems as if there is a dispatch problem with https://github.com/FluxML/NNlibCUDA.jl/blob/master/src/batchedmul.jl#L3:

julia> using CUDA, NNlib, NNlibCUDA

julia> a=rand(3,3,5);

julia> b=rand(3,3,5);

julia> NNlib.batched_gemm!('N', 'N', 1., a, b, 1., a);

julia> ag=CuArray(a);
  
julia> bg=CuArray(b);

julia> NNlib.batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float64, 3}
Stacktrace:
 [1] unsafe_convert(#unused#::Type{Ptr{Float64}}, x::CuArray{Float64, 3})
   @ CUDA ~/.julia/packages/CUDA/3VnCC/src/array.jl:253
 [2] batched_gemm!(transA::Char, transB::Char, alpha::Float64, A::CuArray{Float64, 3}, B::CuArray{Float64, 3}, beta::Float64, C::CuArray{Float64, 3})
   @ NNlib ~/.julia/packages/NNlib/3MZcC/src/gemm.jl:86
 [3] top-level scope
   @ REPL[7]:1
 [4] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

julia> NNlibCUDA.batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: UndefVarError: batched_gemm! not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:26
 [2] top-level scope
   @ REPL[8]:1
 [3] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

julia> NNlibCUDA._batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: UndefVarError: _batched_gemm! not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:26
 [2] top-level scope
   @ REPL[9]:1
 [3] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

julia> NNlib._batched_gemm!('N', 'N', 1., ag, bg, 1., ag);
ERROR: MethodError: no method matching _batched_gemm!(::Char, ::Char, ::Float64, ::CuArray{Float64, 3}, ::CuArray{Float64, 3}, ::Float64, ::CuArray{Float64, 3})
Closest candidates are:
  _batched_gemm!(::Type{var"#s117"} where var"#s117"<:Array, ::Char, ::Char, ::Number, ::Any, ::Any, ::Number, ::Any) at /groups/scicompsoft/home/arthurb/.julia/packages/NNlib/3MZcC/src/batched/batchedmul.jl:260
  _batched_gemm!(::Type{var"#s7"} where var"#s7"<:CuArray, ::Char, ::Char, ::Number, ::Any, ::Any, ::Number, ::Any) at /groups/scicompsoft/home/arthurb/.julia/packages/NNlibCUDA/eMmZI/src/batchedmul.jl:3
Stacktrace:
 [1] top-level scope
   @ REPL[10]:1
 [2] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

(@v1.6) pkg> st
      Status `~/.julia/environments/v1.6/Project.toml`
  [6e4b80f9] BenchmarkTools v1.0.0
  [052768ef] CUDA v3.2.1
  [34f1f09b] ClusterManagers v0.4.0
  [864edb3b] DataStructures v0.18.9
  [31c24e10] Distributions v0.25.2
  [5903a43b] Infiltrator v0.3.0
  [4138dd39] JLD v0.12.3
  [872c559c] NNlib v0.7.21
  [a00861dc] NNlibCUDA v0.1.2
  [132c30aa] ProfileSVG v0.2.1
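
As the "closest candidates" above show, the CUDA method is installed on NNlib._batched_gemm! and takes the storage type as its first argument, which is why calling batched_gemm! directly never reaches it. The higher-level NNlib.batched_mul! should route correctly; a minimal sketch (Float32 used here only as the common case):

using CUDA, NNlib, NNlibCUDA

a = CUDA.rand(Float32, 3, 3, 5)
b = CUDA.rand(Float32, 3, 3, 5)
c = CUDA.zeros(Float32, 3, 3, 5)
NNlib.batched_mul!(c, a, b)   # batched C = A * B over the third (batch) dimension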

gpu `scatter` with `CartesianIndex` not supported

MWE:

julia> NNlib.scatter(+, randn(10, 2,2), [CartesianIndex{2}(1,1) CartesianIndex{2}(1,2); CartesianIndex{2}(2,1) CartesianIndex{2}(2,1)])
10×2×2 Array{Float64, 3}: 
[:, :, 1] =
 -0.254159   0.129012
  0.623626  -0.759501
 -0.252346  -0.880688
 -0.442164   0.391463
  0.504293  -0.0432547
  0.863525   1.47864
 -0.803966   1.46345
 -1.48825   -1.46929
  0.736272   0.170795
 -0.629679   1.98309
[:, :, 2] =
  1.69932    0.0
 -0.229177   0.0
  1.76205    0.0
 -1.46482    0.0
  0.328287   0.0
  0.459204   0.0
 -0.351994   0.0
 -0.0894004  0.0
  0.0585911  0.0
 -0.860969   0.0

julia> NNlib.scatter(+, cu(randn(10, 2,2)), cu([CartesianIndex{2}(1,1) CartesianIndex{2}(1,2); CartesianIndex{2}(2,1) CartesianIndex{2}(2,1)]))
ERROR: InvalidIRError: compiling kernel scatter_kernel!(typeof(+), CuDeviceArray{Float32, 3, 1}, CuDeviceArray{Float32, 3, 1}, CuDeviceMatrix{CartesianIndex{2}, 1}, Int64, Int64, Tuple{Int64}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f__apply_iterate)
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/CUDA/9T5Sq/src/device/intrinsics/atomics.jl:438
 [2] scatter_kernel!
   @ ~/.julia/packages/NNlibCUDA/tXguL/src/scatter.jl:18
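
Until CartesianIndex indices are supported by the GPU kernel, a fallback (clearly not ideal for performance, and only a sketch using the same names as the MWE above) is to run the scatter on the CPU and copy the result back:

using CUDA, NNlib

src = cu(randn(Float32, 10, 2, 2))
idx = [CartesianIndex{2}(1,1) CartesianIndex{2}(1,2); CartesianIndex{2}(2,1) CartesianIndex{2}(2,1)]
out = cu(NNlib.scatter(+, Array(src), idx))   # scatter on the CPU, then move the result back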

No frule for some activations

Hello,

It seems there is no frule for some activations (e.g. sigmoid, relu), so trying to perform forward-mode AD on the GPU fails. You can see the discussion on the topic here. The following example is a little more complicated than it needs to be, but it shows the problem:

using ForwardDiff: partials, Dual
using Zygote: pullback
using LinearAlgebra

mutable struct HvpOperator{F, T, I}
	f::F
	x::AbstractArray{T, 1}
	dualCache1::AbstractArray{Dual{Nothing, T, 1}}
	size::I
	nProd::I
end

function HvpOperator(f, x::AbstractVector)
	dualCache1 = Dual.(x, similar(x))
	return HvpOperator(f, x, dualCache1, size(x, 1), 0)
end

Base.eltype(op::HvpOperator{F, T, I}) where{F, T, I} = T
Base.size(op::HvpOperator) = (op.size, op.size)

function LinearAlgebra.mul!(result::AbstractVector, op::HvpOperator, v::AbstractVector)
	op.nProd += 1

	op.dualCache1 .= Dual.(op.x, v)
	val, back = pullback(op.f, op.dualCache1)

	result .= partials.(back(one(val))[1], 1)
end

using Flux, CUDA
data = randn(10, 4) |> gpu
model = Dense(10, 1, σ) |> gpu

ps, re = Flux.destructure(model)
f(θ) = re(θ)(data)

Hop = HvpOperator(f, ps)

v, res = similar(ps), similar(ps)

LinearAlgebra.mul!(res, Hop, v)

which gives the following error:

ERROR: LoadError: MethodError: no method matching cudnnDataType(::Type{Dual{Nothing, Float32, 1}})
Closest candidates are:
  cudnnDataType(::Type{Float16}) at /home/rs-coop/.julia/packages/CUDA/nYggH/lib/cudnn/util.jl:7
  cudnnDataType(::Type{Float32}) at /home/rs-coop/.julia/packages/CUDA/nYggH/lib/cudnn/util.jl:8
  cudnnDataType(::Type{Float64}) at /home/rs-coop/.julia/packages/CUDA/nYggH/lib/cudnn/util.jl:9
  ...
Stacktrace:
  [1] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}; format::CUDA.CUDNN.cudnnTensorFormat_t, dims::Vector{Int32})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/nYggH/lib/cudnn/tensor.jl:9
  [2] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/nYggH/lib/cudnn/tensor.jl:8
  [3] cudnnActivationForward!(y::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}, x::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}; o::Base.Iterators.Pairs{Symbol, CUDA.CUDNN.cudnnActivationMode_t, Tuple{Symbol}, NamedTuple{(:mode,), Tuple{CUDA.CUDNN.cudnnActivationMode_t}}})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/nYggH/lib/cudnn/activation.jl:22
  [4] (::NNlibCUDA.var"#65#69")(src::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}, dst::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/IeeBk/src/cudnn/activations.jl:11
  [5] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(σ), Tuple{CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}}})
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/IeeBk/src/cudnn/activations.jl:30
  [6] rrule(#unused#::typeof(Base.Broadcast.broadcasted), #unused#::typeof(σ), x::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ NNlib ~/.julia/packages/NNlib/y5z4i/src/activations.jl:813
  [7] rrule(::Zygote.ZygoteRuleConfig{Zygote.Context}, ::Function, ::Function, ::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ ChainRulesCore ~/.julia/packages/ChainRulesCore/oBjCg/src/rules.jl:134
  [8] chain_rrule
    @ ~/.julia/packages/Zygote/FPUm3/src/compiler/chainrules.jl:216 [inlined]
  [9] macro expansion
    @ ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:0 [inlined]
 [10] _pullback(::Zygote.Context, ::typeof(Base.Broadcast.broadcasted), ::typeof(σ), ::CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:9
 [11] _pullback
    @ ~/.julia/packages/Flux/BPPNj/src/layers/basic.jl:158 [inlined]
 [12] _pullback(ctx::Zygote.Context, f::Dense{typeof(σ), CuArray{Dual{Nothing, Float32, 1}, 2, CUDA.Mem.DeviceBuffer}, CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer}}, args::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:0
 [13] _pullback
    @ ~/Documents/research/masters-thesis/CubicNewton/experiments/test.jl:43 [inlined]
 [14] _pullback(ctx::Zygote.Context, f::typeof(f), args::CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface2.jl:0
 [15] _pullback(f::Function, args::CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface.jl:34
 [16] pullback(f::Function, args::CuArray{Dual{Nothing, Float32, 1}, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/FPUm3/src/compiler/interface.jl:40
 [17] mul!(result::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, op::HvpOperator{typeof(f), Float32, Int64}, v::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Main ~/Documents/research/masters-thesis/CubicNewton/experiments/test.jl:34
 [18] top-level scope
    @ ~/Documents/research/masters-thesis/CubicNewton/experiments/test.jl:49
 [19] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [20] top-level scope
    @ REPL[14]:1
 [21] top-level scope
    @ ~/.julia/packages/CUDA/nYggH/src/initialization.jl:52

No error occurs when I use a custom version of the activation instead of what is exported from this package.
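
For concreteness, this is roughly what I mean by a custom version (the name mysigmoid is hypothetical; the point is that a locally defined function is not covered by the cuDNN broadcast override, so broadcasting it falls back to the generic CUDA.jl kernel, which accepts Dual numbers):

# Continuing from the snippet above; mysigmoid is a hypothetical stand-in for σ.
mysigmoid(x) = one(x) / (one(x) + exp(-x))
model = Dense(10, 1, mysigmoid) |> gpu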

Unconstrained element type on activations causing errors with e.g. Complex, ForwardDiff.Dual

Currently there don't appear to be any type constraints on the element type for the activation function overrides. This causes errors when non-CUDNN element types (e.g. Complex, ForwardDiff.Dual) are passed in. Further, this breaks code that works before loading NNlibCUDA, e.g.

using CUDA
x = CuArray([1.0 + 1.0im, 2.0 + 2.0im])
tanh.(x)

outputs

2-element CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}:
 1.0839233273386943 + 0.27175258531951174im
 1.0238355945704727 - 0.028392952868232284im

but when immediately followed by

using NNlibCUDA
tanh.(x)

we get

ERROR: MethodError: no method matching cudnnDataType(::Type{ComplexF64})
Closest candidates are:
  cudnnDataType(::Type{Float16}) at C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\util.jl:7
  cudnnDataType(::Type{Float32}) at C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\util.jl:8
  cudnnDataType(::Type{Float64}) at C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\util.jl:9
  ...
Stacktrace:
 [1] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}; format::CUDA.CUDNN.cudnnTensorFormat_t, dims::Vector{Int32})
   @ CUDA.CUDNN C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\tensor.jl:9
 [2] CUDA.CUDNN.cudnnTensorDescriptor(array::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer})
   @ CUDA.CUDNN C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\tensor.jl:8
 [3] cudnnActivationForward!(y::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, x::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}; o::Base.Iterators.Pairs{Symbol, CUDA.CUDNN.cudnnActivationMode_t, Tuple{Symbol}, NamedTuple{(:mode,), Tuple{CUDA.CUDNN.cudnnActivationMode_t}}})
   @ CUDA.CUDNN C:\Users\domin\.julia\packages\CUDA\qAl31\lib\cudnn\activation.jl:22
 [4] (::NNlibCUDA.var"#74#78")(src::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, dst::CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer})
   @ NNlibCUDA C:\Users\domin\.julia\packages\NNlibCUDA\i1IW9\src\cudnn\activations.jl:10
 [5] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, typeof(tanh), Tuple{CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}})
   @ NNlibCUDA C:\Users\domin\.julia\packages\NNlibCUDA\i1IW9\src\cudnn\activations.jl:30

Is there an easy way to add some type constraints to this code to make sure it's only applied to CUDNN types? E.g. would changing

function Base.materialize!(dst::DenseCuArray{<:CUDNNFloat},
                                   bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray}})
...
function Base.materialize(bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray}})
...

to

function Base.materialize!(dst::DenseCuArray{<:CUDNNFloat},
                                   bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray{<:CUDNNFloat}}})
...
function Base.materialize(bc::Broadcast.Broadcasted{<:Any,<:Any,typeof($f),<:Tuple{DenseCuArray{<:CUDNNFloat}}})
...

suffice? I'm not too familiar with using @eval.
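
As a toy illustration of why the element-type constraint would help (hypothetical names, not the actual NNlibCUDA code): with the constraint in place, a non-CUDNN eltype simply never matches the specialized method and falls through to the generic broadcast.

# Toy dispatch example only; fast_path stands in for the specialized materialize methods.
const MyCUDNNFloat = Union{Float16, Float32, Float64}

fast_path(x::AbstractArray{<:MyCUDNNFloat}) = "specialized cuDNN-style method"
fast_path(x::AbstractArray)                 = "generic fallback"

fast_path(rand(Float32, 3))      # hits the specialized method
fast_path(rand(ComplexF64, 3))   # falls through to the fallback instead of erroring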

Increase CUDA.jl compat to 5.0.0?

Just creating an issue to ask if there are blockers for this to happen. I am using NNlibCUDA.jl to access CUDA bindings for convolution routines, but not using Flux at all.

NNlib 0.9

Not sure why CompatHelper hasn't issued a PR yet given that its cron is 00 * * * *, but NNlibCUDA is holding back an upgrade of NNlib to v0.9 because its Project.toml only lists v0.8.15.
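
If it helps, the fix presumably boils down to widening that [compat] bound. A sketch of doing it programmatically (assuming Julia 1.8 or newer, where Pkg.compat is available, and a placeholder path):

using Pkg
Pkg.activate("path/to/NNlibCUDA")     # activate the package's own environment (placeholder path)
Pkg.compat("NNlib", "0.8.15, 0.9")    # widen the NNlib [compat] entry in Project.toml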

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
