
Comments (4)

maleadt commented on August 30, 2024

Try running with JULIA_DEBUG=CUDNN (on latest CUDA.jl) and comparing the params to the error causes listed in https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnConvolutionBackwardFilter.
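For anyone following along, a minimal sketch of what that looks like in practice (JULIA_DEBUG is Julia's standard debug-logging switch; setting it before CUDA.jl is loaded is the safe option):

# From the shell: JULIA_DEBUG=CUDNN julia script.jl
# or from a fresh Julia session, before loading CUDA.jl:
ENV["JULIA_DEBUG"] = "CUDNN"

using CUDA
# ...re-run the code that triggers CUDNN_STATUS_BAD_PARAM; the cuDNN calls
# should then be logged together with their descriptors and parameters.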


DrChainsaw commented on August 30, 2024

@denizyuret, @maleadt sorry for poking.

Would it be easy for you to spot whether the function below is correct usage of the cuDNN API? If so, perhaps this issue belongs in CUDA.jl instead?

function ∇conv_filter!(dw::DenseCuArray{T}, x::DenseCuArray{T}, dy::DenseCuArray{T},
                       cdims::DenseConvDims; alpha=1, beta=0, algo=-1) where T<:CUDNNFloat
    if cudnnversion() < v"6"
        all(x -> x == 1, dilation(cdims)) || error("Only dilation = 1 is supported in cuDNN version < 6")
    end
    if algo != -1
        @warn "The algo option has been deprecated, the fastest algo is computed automatically" maxlog=1
    end
    alpha, beta = scalingParameter(T, alpha), scalingParameter(T, beta)
    # Build cuDNN descriptors from the Julia arrays and conv parameters.
    xDesc, yDesc, wDesc = cudnnTensorDescriptor(x), cudnnTensorDescriptor(dy), cudnnFilterDescriptor(dw)
    convDesc = cudnnConvolutionDescriptor(cdims, x)
    # Let cuDNN pick an algorithm, then run the backward-filter pass with a workspace of the reported size.
    p = cudnnConvolutionBwdFilterAlgoPerf(xDesc, x, yDesc, dy, convDesc, wDesc, dw)
    @workspace size=p.memory workspace->cudnnConvolutionBackwardFilter(handle(), alpha, xDesc, x, yDesc, dy, convDesc, p.algo, workspace, sizeof(workspace), beta, wDesc, dw)
    return dw
end
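
For context, a minimal sketch of the kind of call that ends up in this method (standard NNlib entry points; the shapes are just a small example, not my exact code):

using CUDA, NNlib   # with the CUDA backend (NNlibCUDA) providing the GPU methods

x  = CuArray(rand(Float16, 3, 3, 3, 1))    # W, H, C, N
w  = CuArray(rand(Float16, 3, 3, 3, 64))   # kW, kH, Cin, Cout
cdims = DenseConvDims(x, w)                # no padding, unit stride
dy = CuArray(ones(Float16, 1, 1, 64, 1))   # gradient w.r.t. the conv output
dw = similar(w)
∇conv_filter!(dw, x, dy, cdims)            # dispatches to the GPU method above

This is the path that ends up in cudnnConvolutionBackwardFilter.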


DrChainsaw commented on August 30, 2024

Thanks. That's some great debug output!

For some reason it did not print everything the first time; I had to rerun a couple of times before the relevant function appeared.

ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
┌ Debug: CuDNN (v8200) function cudnnConvolutionBackwardFilter() called:
│     handle: type=cudnnHandle_t; streamId=00000000B8F8EDC0;
│     alpha: type=CUDNN_DATA_FLOAT; val=1.000000;
│     xDesc: type=cudnnTensorDescriptor_t:
│         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│         nbDims: type=int; val=4;
│         dimA: type=int; val=[1,3,3,3];
│         strideA: type=int; val=[27,9,3,1];
│     xData: location=dev; addr=0000000203C01000;
│     dyDesc: type=cudnnTensorDescriptor_t:
│         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│         nbDims: type=int; val=4;
│         dimA: type=int; val=[1,64,1,1];
│         strideA: type=int; val=[64,1,1,1];
│     dyData: location=dev; addr=0000000203C01600;
│     convDesc: type=cudnnConvolutionDescriptor_t:
│         mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
│         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│         mathType: type=cudnnMathType_t; val=CUDNN_DEFAULT_MATH (0);
│         reorderType: type=int; val=0;
│         arrayLength: type=int; val=2;
│         padA: type=int; val=[0,0];
│         strideA: type=int; val=[1,1];
│         dilationA: type=int; val=[1,1];
│         groupCount: type=int; val=1;
│     algo: type=cudnnConvolutionBwdFilterAlgo_t; val=CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 (1);
│     workSpace: location=dev; addr=0000000203C03000;
│     workSpaceSizeInBytes: type=unsigned long long; val=5400;
│     beta: type=CUDNN_DATA_FLOAT; val=0.000000;
│     dwDesc: type=cudnnFilterDescriptor_t:
│         dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
│         vect: type=int; val=0;
│         nbDims: type=int; val=4;
│         dimA: type=int; val=[64,3,3,3];
│         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
│     dwData: location=dev; addr=0000000203C01A00;
│ Time: 2021-06-14T22:20:18.800688 (0d+0h+1m+4s since start)
│ Process=15972; Thread=14580; GPU=0; Handle=00000000E8189540; StreamId=00000000B8F8EDC0.
└ @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\mVgLI\lib\cudnn\CUDNN.jl:123
Stacktrace:

I could not spot anything violating the conditions listed for BAD_PARAM. Looking at the table of supported algorithms, it seems the data types for xDesc, dyDesc, convDesc and dwDesc correspond to TRUE_HALF_CONFIG, which is not listed as supported by CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, but shouldn't that have yielded CUDNN_STATUS_NOT_SUPPORTED rather than BAD_PARAM?

Attempt to be slightly more than useless by listing each condition together with what I think is the relevant part of the output (a quick numeric check of the shape conditions follows the list):

  • At least one of the following is NULL: handle, xDesc, dyDesc, convDesc, dwDesc, xData, dyData, dwData, alpha, beta
      handle: type=cudnnHandle_t; streamId=00000000B8F8EDC0;
      xDesc: type=cudnnTensorDescriptor_t: ...
      dyDesc: type=cudnnTensorDescriptor_t: ...
      convDesc: type=cudnnConvolutionDescriptor_t: ...
      dwDesc: type=cudnnFilterDescriptor_t: ...
      xData: location=dev; addr=0000000203C01000;
      dyData: location=dev; addr=0000000203C01600;
      dwData: location=dev; addr=0000000203C01A00;
      alpha: type=CUDNN_DATA_FLOAT; val=1.000000;
      beta: type=CUDNN_DATA_FLOAT; val=0.000000;
    or could one of those addrs be pointing to NULL?

  • xDesc and dyDesc have a non-matching number of dimensions
      xDesc: nbDims: type=int; val=4;
      dyDesc: nbDims: type=int; val=4;

  • xDesc and dwDesc have a non-matching number of dimensions
      xDesc: nbDims: type=int; val=4;
      dwDesc: nbDims: type=int; val=4;

  • xDesc has fewer than three dimensions
      xDesc: nbDims: type=int; val=4;

  • xDesc, dyDesc, and dwDesc have a non-matching data type.
      xDesc: dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
      dyDesc: dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);
      dwDesc: dataType: type=cudnnDataType_t; val=CUDNN_DATA_HALF (2);

  • xDesc and dwDesc have a non-matching number of input feature maps per image (or group in case of grouped convolutions).
      xDesc: dimA: type=int; val=[1,3,3,3];
      dwDesc: dimA: type=int; val=[64,3,3,3];
              format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);

  • yDesc or dwDesc indicate an output channel count that isn't a multiple of group count (if group count has been set in convDesc).
      convDesc: groupCount: type=int; val=1;
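
To back up the shape-related conditions with the numbers from the log, here is a quick back-of-the-envelope check in Julia. The NCHW/OIHW interpretation of dimA and the usual output-size formula are my assumptions, not something the log states:

# Dimensions reported by the debug log, interpreted as NCHW / OIHW.
x_dims  = (1, 3, 3, 3)     # xDesc dimA:  N=1, C=3,  H=3, W=3
dy_dims = (1, 64, 1, 1)    # dyDesc dimA: N=1, C=64, H=1, W=1
dw_dims = (64, 3, 3, 3)    # dwDesc dimA: Cout=64, Cin=3, kH=3, kW=3
pad, stride, dilation = (0, 0), (1, 1), (1, 1)   # from convDesc

# Usual convolution output size: floor((H + 2*pad - effective_kernel)/stride) + 1
eff_kh = dilation[1] * (dw_dims[3] - 1) + 1
eff_kw = dilation[2] * (dw_dims[4] - 1) + 1
out_h  = div(x_dims[3] + 2 * pad[1] - eff_kh, stride[1]) + 1   # -> 1
out_w  = div(x_dims[4] + 2 * pad[2] - eff_kw, stride[2]) + 1   # -> 1

@assert (out_h, out_w) == (dy_dims[3], dy_dims[4])   # dy spatial size matches
@assert x_dims[2] == dw_dims[2]                      # input channels match (3 == 3)
@assert dy_dims[2] == dw_dims[1]                     # output channels match (64 == 64)

All of those pass, so the shapes themselves look consistent to me.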


DrChainsaw commented on August 30, 2024

Did a bit of hacking around, and it seems like changing the algo to _WINOGRAD_NONFUSED (which seems to be the only one with support for TRUE_HALF_CONFIG) resulted in CUDNN_STATUS_NOT_SUPPORTED. I'm not sure why, since all the fine print in the last column of the table seems to be fulfilled.

Edit: never mind, I missed that the order of the returned algos is not deterministic. _WINOGRAD_NONFUSED works. By the way, it reports CUDA.CUDNN.CUDNN_STATUS_ALLOC_FAILED for the default workspace size, but it seems to succeed anyway (the data looks the same as with ALGO_1 and PSEUDO_HALF_CONFIG). This might explain why the small filter size works but not the large one. Could changing the workspace size in cudnnFindConvolutionAlgorithmWorkspaceSize fix this, or perhaps accepting CUDNN_STATUS_ALLOC_FAILED in cudnnConvolutionAlgoPerfChoose (which sounds risky)?

Changing the convDesc data type to Float32 so that the data type configuration becomes PSEUDO_HALF_CONFIG also works.

Is this the correct fix in cases when _WINOGRAD_NONFUSED is not applicable? The support for TRUE_HALF_CONFIG seems quite limited.
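
In case neither _WINOGRAD_NONFUSED nor touching convDesc is an option, the bluntest user-side workaround I can think of is to cast the data itself to Float32 and back. Sketch below; the helper name is made up, and unlike PSEUDO_HALF_CONFIG this converts whole arrays, so it costs extra memory and bandwidth:

using CUDA, NNlib

# Hypothetical helper: compute the filter gradient in Float32, then cast the
# result back to Float16, avoiding TRUE_HALF_CONFIG in cuDNN entirely.
function ∇conv_filter_via_f32(x::CuArray{Float16}, dy::CuArray{Float16}, cdims::DenseConvDims)
    dw32 = NNlib.∇conv_filter(Float32.(x), Float32.(dy), cdims)
    return Float16.(dw32)
end

The cleaner fix is presumably what was described above: keep the Float16 tensors but use a Float32 convolution compute type (PSEUDO_HALF_CONFIG).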


