Comments (8)

nikopj commented on August 30, 2024

There is a similar error for gradients of conv with Float16 for 3D/4D/5D tensors as well.

julia> w = rand(Float16, 3, 1, 1) |> gpu;

julia> gradient(x->sum(conv(x, w)), rand(Float16, 16, 1, 1) |> gpu)
┌ Warning: CuDNN (v8600) function cudnnGetConvolutionForwardAlgorithmMaxCount() called:
│     Info: Traceback contains 44 message(s)
│         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: false == cudnn::cnn::isForwardSupported(handle, xDesc, wDesc, cDesc, yDesc, algo)
│         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: T_ENGINEMAP::isLegacyAlgoSupported(handle, xDesc, wDesc, cDesc, yDesc, algo)
   [...]
│ Time: 2023-02-16T23:53:39.684290 (0d+0h+0m+48s since start)
│ Process=2461157; Thread=2461157; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:151
ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
  [1] throw_api_error(res::cuDNN.cudnnStatus_t)
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
  [2] macro expansion
    @ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
  [3] cudnnConvolutionBackwardFilter(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dyDesc::cuDNN.cudnnTensorDescriptor, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionBwdFilterAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, dwDesc::cuDNN.cudnnFilterDescriptor, dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
    @ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
  [4] #36
    @ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:120 [inlined]
  [5] with_workspace(f::NNlibCUDA.var"#36#38"{Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionBwdFilterAlgoPerfStruct, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnConvolutionDescriptor}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
  [6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
  [7] #with_workspace#1
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
  [8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
  [9] ∇conv_filter!(dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{1, 1, 1, 2, 1}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:119
 [10] ∇conv_filter!
    @ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:107 [inlined]
 [11] #∇conv_filter#237
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:112 [inlined]
 [12] ∇conv_filter
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:107 [inlined]
 [13] #375
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:351 [inlined]
 [14] unthunk
    @ /scratch/npj226/.julia/packages/ChainRulesCore/a4mIA/src/tangent_types/thunks.jl:204 [inlined]
 [15] wrap_chainrules_output
    @ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:110 [inlined]
 [16] map
    @ ./tuple.jl:223 [inlined]
 [17] map
    @ ./tuple.jl:224 [inlined]
 [18] wrap_chainrules_output
    @ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:111 [inlined]
 [19] ZBack
    @ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:211 [inlined]
 [20] Pullback
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
 [21] (::typeof(∂(#conv#231)))(Δ::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
    @ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:0
 [22] Pullback
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50 [inlined]
 [23] Pullback
    @ ./REPL[27]:1 [inlined]
 [24] (::Zygote.var"#60#61"{typeof(∂(#30))})(Δ::Float16)
    @ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:45
 [25] gradient(::Function, ::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, ::Vararg{CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}})
    @ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:97
 [26] top-level scope
    @ REPL[27]:1
 [27] top-level scope
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155

ToucheSir commented on August 30, 2024

Can you set JULIA_DEBUG=CUDA and post the debug output after running the second conv call?
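
JULIA_DEBUG is Julia's standard debug-logging variable, so either form should work; launching a fresh session is safest, since I believe some of the library logging hooks are only installed at initialization time:

$ JULIA_DEBUG=CUDA julia             # set before launch

julia> ENV["JULIA_DEBUG"] = "CUDA"   # or from a running session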

nikopj commented on August 30, 2024
julia> conv(rand(Float16, 16, 16, 16, 1, 1) |> gpu, rand(Float16, 3, 3, 3, 1, 1) |> gpu)
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:276
┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│   version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│  Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
ERROR: ┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│   version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│  Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
CUDNNError: ┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│   version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│  Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
CUDNN_STATUS_NOT_SUPPORTED┌ Debug:  cuBLAS (v11.8) function cublasStatus_t cublasGetVersion_v2(cublasHandle_t, int*) called:
│   handle: type=cublasHandle_t; val=POINTER (IN HEX:0x0x23907a00)
│   version: type=int; val=POINTER (IN HEX:0x0x7ffcee3e1fbc)
│  Time: 2023-02-16T23:53:39 elapsed from start 0.883333 minutes or 53.000000 seconds
│ Process=2461157; Thread=22746666410368; GPU=0; Handle=POINTER (IN HEX:0x0x23907a00); StreamId=POINTER (IN HEX:0x0x4db8580); MathMode=CUBLAS_DEFAULT_MATH
│  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
│
└ @ CUDA.CUBLAS /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/cublas/CUBLAS.jl:224
 (code 9)
Stacktrace:
  [1] throw_api_error(res::cuDNN.cudnnStatus_t)
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
  [2] macro expansion
    @ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
  [3] cudnnConvolutionForward(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, wDesc::cuDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::cuDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
    @ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
  [4] (::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct})(workspace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer})
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:105
  [5] with_workspace(f::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
  [6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
  [7] #with_workspace#1
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
  [8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
  [9] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, activation::cuDNN.cudnnActivationMode_t, convDesc::cuDNN.cudnnConvolutionDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, yDesc::cuDNN.cudnnTensorDescriptor, zDesc::cuDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:103
 [10] cudnnConvolutionForwardWithDefaults(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; padding::Int64, stride::Int64, dilation::Int64, mode::cuDNN.cudnnConvolutionMode_t, mathType::cuDNN.cudnnMathType_t, reorderType::cuDNN.cudnnReorderType_t, group::Int64, format::cuDNN.cudnnTensorFormat_t, convDesc::cuDNN.cudnnConvolutionDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, yDesc::cuDNN.cudnnTensorDescriptor, alpha::Int64, beta::Int64, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, biasDesc::Nothing, zDesc::cuDNN.cudnnTensorDescriptor, activation::cuDNN.cudnnActivationMode_t, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any})
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:96
 [11] #cudnnConvolutionForward!#1150
    @ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:53 [inlined]
 [12] conv!(y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:67
 [13] conv!
    @ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:58 [inlined]
 [14] #conv#233
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:88 [inlined]
 [15] conv
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:83 [inlined]
 [16] #conv#231
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
 [17] conv(x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
    @ NNlib /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50
 [18] top-level scope
    @ REPL[3]:1
 [19] top-level scope
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155

ToucheSir commented on August 30, 2024

OK, that means this might be easier to solve, then, if it's not dimension-specific. I also forgot that the cuDNN functionality had been spun off into its own package, sorry. Do you mind rerunning the test with JULIA_DEBUG=cuDNN instead?

nikopj commented on August 30, 2024

OK, here is the JULIA_DEBUG=cuDNN output for the 5D conv and 3D gradient cases:

julia> conv(rand(Float16, 16, 16, 16, 1, 1) |> gpu, rand(Float16, 3, 3, 3, 1, 1) |> gpu)
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:276
┌ Debug: CuDNN (v8600) function cudnnCreateConvolutionDescriptor() called:
│     convDesc: location=host; addr=0x1526f7168c80;
│ Time: 2023-02-17T00:25:16.053149 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
ERROR: CUDNNError: CUDNN_STATUS_NOT_SUPPORTED┌ Debug: CuDNN (v8600) function cudnnSetConvolutionNdDescriptor() called:
│     convDesc: location=host; addr=0x75faa90;
│     arrayLength: type=int; val=2;
│     padA: type=int; val=[0,0];
│     strideA: type=int; val=[1,1];
│     dilationA: type=int; val=[1,1];
│     mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
│     dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│ Time: 2023-02-17T00:25:16.118505 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
 (code 9)
Stacktrace:
  [1] throw_api_error(res::cuDNN.cudnnStatus_t)
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
  [2] macro expansion
    @ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
  [3] cudnnConvolutionForward(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, wDesc::cuDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::cuDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
    @ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
  [4] (::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct})(workspace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer})
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:105
  [5] with_workspace(f::cuDNN.var"#1153#1155"{CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnActivationMode_t, cuDNN.cudnnConvolutionDescriptor, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionFwdAlgoPerfStruct}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
  [6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
  [7] #with_workspace#1
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
  [8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
  [9] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, activation::cuDNN.cudnnActivationMode_t, convDesc::cuDNN.cudnnConvolutionDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, yDesc::cuDNN.cudnnTensorDescriptor, zDesc::cuDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:103
 [10] cudnnConvolutionForwardWithDefaults(w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}; padding::Int64, stride::Int64, dilation::Int64, mode::cuDNN.cudnnConvolutionMode_t, mathType::cuDNN.cudnnMathType_t, reorderType::cuDNN.cudnnReorderType_t, group::Int64, format::cuDNN.cudnnTensorFormat_t, convDesc::cuDNN.cudnnConvolutionDescriptor, xDesc::cuDNN.cudnnTensorDescriptor, wDesc::cuDNN.cudnnFilterDescriptor, y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, yDesc::cuDNN.cudnnTensorDescriptor, alpha::Int64, beta::Int64, bias::Nothing, z::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, biasDesc::Nothing, zDesc::cuDNN.cudnnTensorDescriptor, activation::cuDNN.cudnnActivationMode_t, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any})
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:96
 [11] #cudnnConvolutionForward!#1150
    @ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/convolution.jl:53 [inlined]
 [12] conv!(y::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:67
 [13] conv!
    @ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:58 [inlined]
 [14] #conv#233
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:88 [inlined]
 [15] conv
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:83 [inlined]
 [16] #conv#231
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
 [17] conv(x::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, w::CUDA.CuArray{Float16, 5, CUDA.Mem.DeviceBuffer})
    @ NNlib /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50
 [18] top-level scope
    @ REPL[4]:1
 [19] top-level scope
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155

julia> w = rand(Float16, 3, 1, 1) |> gpu;

julia> gradient(x->sum(conv(x, w)), rand(Float16, 16, 1, 1) |> gpu)
┌ Debug: CuDNN (v8600) function cudnnSetConvolutionMathType() called:
│     convDesc: location=host; addr=0x75faa90;
│     mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
│ Time: 2023-02-17T00:25:16.118532 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
ERROR: ┌ Debug: CuDNN (v8600) function cudnnCreateTensorDescriptor() called:
│ Time: 2023-02-17T00:25:16.237211 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
CUDNNError: ┌ Debug: CuDNN (v8600) function cudnnSetTensorNdDescriptorEx() called:
│     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
│     dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│     nbDims: type=int; val=4;
│     dimA: type=int; val=[1,1,16,16];
│ Time: 2023-02-17T00:25:16.252704 (0d+0h+0m+40s since start)
│ Process=2464303; Thread=2464303; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/cuDNN.jl:149
CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
  [1] throw_api_error(res::cuDNN.cudnnStatus_t)
    @ cuDNN /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:11
  [2] macro expansion
    @ /scratch/npj226/.julia/packages/cuDNN/7X4E7/src/libcudnn.jl:24 [inlined]
  [3] cudnnConvolutionBackwardFilter(handle::Ptr{cuDNN.cudnnContext}, alpha::Base.RefValue{Float32}, xDesc::cuDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dyDesc::cuDNN.cudnnTensorDescriptor, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, convDesc::cuDNN.cudnnConvolutionDescriptor, algo::cuDNN.cudnnConvolutionBwdFilterAlgo_t, workSpace::CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, dwDesc::cuDNN.cudnnFilterDescriptor, dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
    @ cuDNN /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:26
  [4] #36
    @ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:120 [inlined]
  [5] with_workspace(f::NNlibCUDA.var"#36#38"{Base.RefValue{Float32}, Base.RefValue{Float32}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cuDNN.cudnnConvolutionBwdFilterAlgoPerfStruct, cuDNN.cudnnFilterDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnTensorDescriptor, cuDNN.cudnnConvolutionDescriptor}, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:77
  [6] with_workspace(f::Function, eltyp::Type{UInt8}, size::CUDA.APIUtils.var"#2#3"{UInt64}, fallback::Nothing)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:56
  [7] #with_workspace#1
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53 [inlined]
  [8] with_workspace(f::Function, size::UInt64, fallback::Nothing) (repeats 2 times)
    @ CUDA.APIUtils /scratch/npj226/.julia/packages/CUDA/ZdCxS/lib/utils/call.jl:53
  [9] ∇conv_filter!(dw::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, x::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, dy::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{1, 1, 1, 2, 1}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:119
 [10] ∇conv_filter!
    @ /scratch/npj226/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/conv.jl:107 [inlined]
 [11] #∇conv_filter#237
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:112 [inlined]
 [12] ∇conv_filter
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:107 [inlined]
 [13] #375
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:351 [inlined]
 [14] unthunk
    @ /scratch/npj226/.julia/packages/ChainRulesCore/a4mIA/src/tangent_types/thunks.jl:204 [inlined]
 [15] wrap_chainrules_output
    @ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:110 [inlined]
 [16] map
    @ ./tuple.jl:223 [inlined]
 [17] map
    @ ./tuple.jl:224 [inlined]
 [18] wrap_chainrules_output
    @ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:111 [inlined]
 [19] ZBack
    @ /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/chainrules.jl:211 [inlined]
 [20] Pullback
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
 [21] (::typeof(∂(#conv#231)))(Δ::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
    @ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:0
 [22] Pullback
    @ /scratch/npj226/.julia/packages/NNlib/TZPiH/src/conv.jl:50 [inlined]
 [23] Pullback
    @ ./REPL[6]:1 [inlined]
 [24] (::Zygote.var"#60#61"{typeof(∂(#3))})(Δ::Float16)
    @ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:45
 [25] gradient(f::Function, args::CUDA.CuArray{Float16, 3, CUDA.Mem.DeviceBuffer})
    @ Zygote /scratch/npj226/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:97
 [26] top-level scope
    @ REPL[6]:1
 [27] top-level scope
    @ /scratch/npj226/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155

ToucheSir commented on August 30, 2024

I've been looking into this but haven't found anything conclusive yet. Can you test with NNlibCUDA v0.2.6 and see if it has the same issue? Verifying whether it's a CUDA lib version issue should help us narrow down the possibilities significantly.
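
For anyone reproducing: pinning that version is standard Pkg usage, e.g.

julia> import Pkg
julia> Pkg.add(name="NNlibCUDA", version="0.2.6")

or add NNlibCUDA@0.2.6 from Pkg REPL mode.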

Edit: just tested myself and hit the same issue. This is strange, because when I log all the descriptors everything looks fine, but for whatever reason the algo search at https://github.com/JuliaGPU/CUDA.jl/blob/a70c83e2cbe978873a7aa74f2493838b509aa42c/lib/cudnn/src/convolution.jl#L193 is returning CUDNN_STATUS_NOT_SUPPORTED. It feels like I'm missing something blindingly obvious, but nothing stands out in the cuDNN docs.

Edit2: right after I posted the last edit, I realized that Table 30 under https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnConvolutionForward notes that 3D convs only support PSEUDO_HALF_CONFIG and not TRUE_HALF_CONFIG, whereas 2D convs (Table 29) support both. The main difference is that we'd have to set the conv descriptor's dataType to CUDNN_DATA_FLOAT instead of CUDNN_DATA_HALF. This is currently matched to the eltype of x in https://github.com/JuliaGPU/CUDA.jl/blob/a70c83e2cbe978873a7aa74f2493838b509aa42c/lib/cudnn/src/convolution.jl#L69, and my question is whether it makes more sense to have cuDNN.jl or NNlibCUDA check for this (cc @maleadt for thoughts).
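
To sketch what such a check might look like — purely illustrative on my part, with a made-up helper name, not the actual cuDNN.jl code:

# Pick the convolution-descriptor dataType from the tensor eltype and the
# number of spatial dims. Per the cuDNN tables above, 3D convs only support
# PSEUDO_HALF_CONFIG (Float16 tensors, Float32 compute), while 2D convs also
# allow TRUE_HALF_CONFIG.
function conv_descriptor_dtype(::Type{T}, spatial_dims::Int) where {T}
    T === Float16 && spatial_dims >= 3 && return Float32  # pseudo-half
    return T                                              # match the eltype
end

conv_descriptor_dtype(Float16, 2)  # Float16: TRUE_HALF_CONFIG is allowed
conv_descriptor_dtype(Float16, 3)  # Float32: forces PSEUDO_HALF_CONFIG

The result would then be passed as the conv descriptor's dataType instead of always using eltype(x).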

P.S. @mcabbott you may be interested in Tables like no. 25 and 26 in https://docs.nvidia.com/deeplearning/cudnn/api/index.html. We were wondering what mixtures of datatypes people might use in the wild and I think such tables provide a pretty exhaustive list.
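
In the meantime, one possible user-side stopgap — my suggestion only, not something verified in this thread — is to run the affected convs in Float32 and cast back, which avoids the unsupported TRUE_HALF_CONFIG path at the cost of extra conversions:

# hypothetical helper, not part of NNlib; assumes NNlib's conv is in scope
conv16(x, w; kw...) = Float16.(conv(Float32.(x), Float32.(w); kw...))

Numerically this is close to what PSEUDO_HALF_CONFIG does anyway (Float32 accumulation).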
