rocm / miopen
AMD's Machine Intelligence Library
Home Page: https://rocm.docs.amd.com/projects/MIOpen/en/latest/
License: Other
I've been using the rocm-1.6.0 release of hcc to build MIOpen, but I'm getting the error below. I see this error with both MIOpen v1.2.0 and v1.1.4. Which versions of hcc are supported?
hcc: /builddir/build/BUILD/llvm-rocm-1.6.0/tools/clang/lib/Sema/SemaChecking.cpp:9826: void clang::Sema::DiagnoseAlwaysNonNullPointer(clang::Expr*, clang::Expr::NullPointerConstantKind, bool, clang::SourceRange): Assertion `ParamIter != FD->param_end()' failed.
Hi,
I am trying to run patflick's miopen-benchmark on my R9 Nano.
When I run the AlexNet benchmark, I get 'hipErrorOutOfMemory' from MIOpen, even though I believe 4 GB of RAM should be enough to run AlexNet.
Strangely, even if I change the batch size from 128 to 1, I still get the same 'hipErrorOutOfMemory' error (but from a different layer).
Below is the output when the batch size is 128. I am using a debug build of MIOpen.
I would really appreciate it if you could help me figure out what is going on.
$ ./alexnet
[INFO] Number of HIP devices found: 1
[INFO] Device 0: Fiji [Radeon R9 FURY / NANO Series]
[INFO] Arch: 803
[INFO] GMem: 4096 MiB
[INFO] warps: 64
[INFO] CUs: 64
[INFO] MaxClk: 1000000
[INFO] MemClk: 500000
[INFO] drm: /sys/class/drm/card0
[INFO] hwmon: /sys/class/drm/card0/device/hwmon/hwmon1
[DEBUG] Allocating Float Tensor (64,3,11,11), total size: 90 kB
[DEBUG] Allocating Float Tensor (64,3,11,11), total size: 90 kB
[DEBUG] Allocating Float Tensor (128,64,55,55), total size: 96800 kB
[DEBUG] Allocating Float Tensor (128,64,55,55), total size: 96800 kB
[DEBUG] Allocating Float Tensor (128,64,27,27), total size: 23328 kB
[DEBUG] Allocating Float Tensor (192,64,5,5), total size: 1200 kB
[DEBUG] Allocating Float Tensor (192,64,5,5), total size: 1200 kB
[DEBUG] Allocating Float Tensor (128,192,27,27), total size: 69984 kB
[DEBUG] Allocating Float Tensor (128,192,27,27), total size: 69984 kB
[DEBUG] Allocating Float Tensor (128,192,13,13), total size: 16224 kB
[DEBUG] Allocating Float Tensor (384,192,3,3), total size: 2592 kB
[DEBUG] Allocating Float Tensor (384,192,3,3), total size: 2592 kB
[DEBUG] Allocating Float Tensor (128,384,13,13), total size: 32448 kB
[DEBUG] Allocating Float Tensor (128,384,13,13), total size: 32448 kB
[DEBUG] Allocating Float Tensor (256,384,3,3), total size: 3456 kB
[DEBUG] Allocating Float Tensor (256,384,3,3), total size: 3456 kB
[DEBUG] Allocating Float Tensor (128,256,13,13), total size: 21632 kB
[DEBUG] Allocating Float Tensor (128,256,13,13), total size: 21632 kB
[DEBUG] Allocating Float Tensor (256,256,3,3), total size: 2304 kB
[DEBUG] Allocating Float Tensor (256,256,3,3), total size: 2304 kB
[DEBUG] Allocating Float Tensor (128,256,13,13), total size: 21632 kB
[DEBUG] Allocating Float Tensor (128,256,13,13), total size: 21632 kB
[DEBUG] Dims after Features: (128,256,6,6)
[DEBUG] Allocating Float Tensor (128,9216,1,1), total size: 4608 kB
[DEBUG] Allocating Float Tensor (4096,9216,1,1), total size: 147456 kB
[DEBUG] Allocating Float Tensor (4096,9216,1,1), total size: 147456 kB
[DEBUG] Allocating Float Tensor (128,4096,1,1), total size: 2048 kB
[DEBUG] Allocating Float Tensor (128,4096,1,1), total size: 2048 kB
[DEBUG] Allocating Float Tensor (4096,4096,1,1), total size: 65536 kB
[DEBUG] Allocating Float Tensor (4096,4096,1,1), total size: 65536 kB
[DEBUG] Allocating Float Tensor (128,4096,1,1), total size: 2048 kB
[DEBUG] Allocating Float Tensor (128,4096,1,1), total size: 2048 kB
[DEBUG] Allocating Float Tensor (1000,4096,1,1), total size: 16000 kB
[DEBUG] Allocating Float Tensor (1000,4096,1,1), total size: 16000 kB
[DEBUG] Allocating Float Tensor (128,3,224,224), total size: 75264 kB
[DEBUG] Allocating Float Tensor (128,256,6,6), total size: 4608 kB
[INFO] Init fwd
[DEBUG] Allocating Float Tensor (128,1000,1,1), total size: 500 kB
[DEBUG] Init fwd Conv(11x11,pad=2,s=4) (128,3,224,224)->(128,64,55,55) req workspace: 4392300
[DEBUG] >>> Resizing workspace 0 -> 4392300
runcl -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=2 -DNUM_IM_BLKS=14 -DLOCAL_MEM_SIZE=5265 -DSTRIDE_GT_1=1 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 src/Kernels/MIOpenUtilKernels.cl -k Im2Col -dumpilisa -r 10 if#0: if#0: if#0: iv#0 10752,1,1/256,1,1
key: miopenIm2Col,
Kernel filename: MIOpenUtilKernels.cl
key: miopenConvolutionFwdAlgoGEMM,tC0_tA0_tB0_colMaj1_m3025_n64_k363_lda3025_ldb363_ldc3025_ws0_f32
key: miopenConvolutionFwdAlgoGEMM_beta,tC0_tA0_tB0_colMaj1_m3025_n64_k363_lda3025_ldb363_ldc3025_ws0_f32
key: miopenConvolutionFwdAlgoGEMM_beta tC0_tA0_tB0_colMaj1_m3025_n64_k363_lda3025_ldb363_ldc3025_ws0_f32
key: miopenConvolutionFwdAlgoGEMM tC0_tA0_tB0_colMaj1_m3025_n64_k363_lda3025_ldb363_ldc3025_ws0_f32
runcl -DMLO_DIR_FORWARD=1 -DMLO_GRP_SZ=256 -DMLO_GRP_SZ0=256 -DMLO_GRP_SZ1=1 -DMLO_GRP_SZ2=1 -DMLO_FILTER_SIZE0=11 -DMLO_FILTER_SIZE1=11 -DMLO_FILTER_PAD0=2 -DMLO_FILTER_PAD1=2 -DMLO_FILTER_STRIDE0=4 -DMLO_FILTER_STRIDE1=4 -DSTRIDE_W=4 -DSTRIDE_H=4 -DMLO_N_OUTPUTS=64 -DMLO_N_INPUTS=3 -DMLO_BATCH_SZ=128 -DMLO_N_BATCH_LOOPS=1 -DMLO_OUT_BATCH_STRIDE=193600 -DMLO_OUT_CHANNEL_STRIDE=3025 -DMLO_OUT_STRIDE=55 -DMLO_IN_BATCH_STRIDE=150528 -DMLO_IN_CHANNEL_STRIDE=50176 -DMLO_IN_STRIDE=224 -DMLO_WEI_BATCH_STRIDE=363 -DMLO_WEI_CHANNEL_STRIDE=121 -DMLO_IN_WIDTH=224 -DMLO_IN_HEIGHT=224 -DMLO_OUT_WIDTH=55 -DMLO_OUT_HEIGHT=55 -DMLO_IN_TILE1=1 -DMLO_IN_TILE0=1 -DMLO_N_LCL_BATCHS=1 -DMLO_N_LCL_OUT_MAPS=6 -DMLO_N_LCL_IN_MAPS=1 -DMLO_IN_PIX_TILE1=1 -DMLO_IN_PIX_TILE0=1 -DMLO_OUT_PIX_TILE1=1 -DMLO_OUT_PIX_TILE0=3 -DMLO_OUT_STACKS=1 -DMLO_IN_STACKS=1 -DMLO_N_WAVES=4 -DMLO_N_FILTER_SPLITS0=3 -DMLO_N_FILTER_SPLITS1=3 -DMLO_PROCESSING_WIDTH=19 -DMLO_OUT_EXTENT1=13 -DMLO_LAST_OUT_EXTENT1=3 -DMLO_N_LCL_BATCHS_PASS2=4 -DMLO_TILE_REPLICATE0=2 -DMLO_TILE_REPLICATE1=1 -DMLO_LCL_BWD_MEM_SZ=726 -DMLO_N_IN_BWD_HORIZ_READS=17 -DMLO_N_IN_BWD_VERT_READS=6 -DMLO_READ_TYPE=_FLOAT10 -DMLO_READ_UNIT=10 -DMLO_HW_WAVE_SZ=64 -DMLO_LG2_WAVE_SZ=6 -DMLO_N_WAVES_MASK=3 -DMLO_CONV_BIAS=0 -cl-denorms-are-zero src/Kernels/MIOpenConvFwd_LxL_11.cl -k MIOpenCvFwd11x11 -dumpilisa -r 10 if#77070336: if#92928: if#99123200: iv#0 1024,11,128/256,1,1
key: miopenConvolutionFwdAlgoDirect,3x224x224x11x11x64x55x55x128xNCHWxFP32x1
Kernel filename: MIOpenConvFwd_LxL_11.cl
runcl -DMLO_DIR_FORWARD=1 -DMLO_GRP_SZ=256 -DMLO_GRP_SZ0=256 -DMLO_GRP_SZ1=1 -DMLO_GRP_SZ2=1 -DMLO_FILTER_SIZE0=11 -DMLO_FILTER_SIZE1=11 -DMLO_FILTER_PAD0=2 -DMLO_FILTER_PAD1=2 -DMLO_FILTER_STRIDE0=4 -DMLO_FILTER_STRIDE1=4 -DSTRIDE_W=4 -DSTRIDE_H=4 -DMLO_N_OUTPUTS=64 -DMLO_N_INPUTS=3 -DMLO_BATCH_SZ=128 -DMLO_N_BATCH_LOOPS=1 -DMLO_OUT_BATCH_STRIDE=193600 -DMLO_OUT_CHANNEL_STRIDE=3025 -DMLO_OUT_STRIDE=55 -DMLO_IN_BATCH_STRIDE=150528 -DMLO_IN_CHANNEL_STRIDE=50176 -DMLO_IN_STRIDE=224 -DMLO_WEI_BATCH_STRIDE=363 -DMLO_WEI_CHANNEL_STRIDE=121 -DMLO_IN_WIDTH=224 -DMLO_IN_HEIGHT=224 -DMLO_OUT_WIDTH=55 -DMLO_OUT_HEIGHT=55 -DMLO_IN_TILE1=1 -DMLO_IN_TILE0=1 -DMLO_N_LCL_BATCHS=1 -DMLO_N_LCL_OUT_MAPS=6 -DMLO_N_LCL_IN_MAPS=1 -DMLO_IN_PIX_TILE1=1 -DMLO_IN_PIX_TILE0=1 -DMLO_OUT_PIX_TILE1=1 -DMLO_OUT_PIX_TILE0=3 -DMLO_OUT_STACKS=1 -DMLO_IN_STACKS=1 -DMLO_N_WAVES=4 -DMLO_N_FILTER_SPLITS0=3 -DMLO_N_FILTER_SPLITS1=3 -DMLO_PROCESSING_WIDTH=19 -DMLO_OUT_EXTENT1=13 -DMLO_LAST_OUT_EXTENT1=3 -DMLO_N_LCL_BATCHS_PASS2=4 -DMLO_TILE_REPLICATE0=2 -DMLO_TILE_REPLICATE1=1 -DMLO_LCL_BWD_MEM_SZ=726 -DMLO_N_IN_BWD_HORIZ_READS=17 -DMLO_N_IN_BWD_VERT_READS=6 -DMLO_READ_TYPE=_FLOAT10 -DMLO_READ_UNIT=10 -DMLO_HW_WAVE_SZ=64 -DMLO_LG2_WAVE_SZ=6 -DMLO_N_WAVES_MASK=3 -DMLO_CONV_BIAS=0 -cl-denorms-are-zero src/Kernels/MIOpenConvFwd_LxL_11.cl -k MIOpenCvFwd11x11_2 -dumpilisa -r 10 if#77070336: if#92928: if#99123200: iv#0 256,11,32/256,1,1
key: miopenConvolutionFwdAlgoDirect_pass2,3x224x224x11x11x64x55x55x128xNCHWxFP32x1x1
[INFO] MIOpen Found 2 fwd algorithms, choosing 1:
[INFO] 0) 1 - time: 5.10683, Memory: 0
[INFO] 1) 0 - time: 15.7082, Memory: 4392300
[DEBUG] Init fwd Conv(5x5,pad=2,s=1) (128,64,27,27)->(128,192,27,27) req workspace: 214466560
[DEBUG] >>> Resizing workspace 4392300 -> 214466560
runcl -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=1 -DNUM_IM_BLKS=4 -DLOCAL_MEM_SIZE=432 -DSTRIDE_GT_1=0 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 src/Kernels/MIOpenUtilKernels.cl -k Im2Col -dumpilisa -r 10 if#0: if#0: if#0: iv#0 65536,1,1/256,1,1
key: miopenIm2Col,
Kernel filename: MIOpenUtilKernels.cl
key: miopenConvolutionFwdAlgoGEMM,tC0_tA0_tB0_colMaj1_m729_n192_k1600_lda729_ldb1600_ldc729_ws0_f32
key: miopenConvolutionFwdAlgoGEMM_beta,tC0_tA0_tB0_colMaj1_m729_n192_k1600_lda729_ldb1600_ldc729_ws0_f32
key: miopenConvolutionFwdAlgoGEMM_beta tC0_tA0_tB0_colMaj1_m729_n192_k1600_lda729_ldb1600_ldc729_ws0_f32
key: miopenConvolutionFwdAlgoGEMM tC0_tA0_tB0_colMaj1_m729_n192_k1600_lda729_ldb1600_ldc729_ws0_f32
key: miopenConvolutionFwdAlgoWinograd,64x27x27x5x5x192x27x27x128xNCHWxFP32x1
Kernel filename: conv_u1v1_wheel_alpha_v8_4_4_gfx803.so
runcl -DMLO_HW_WAVE_SZ=64 -DMLO_DIR_FORWARD=1 -DMLO_FILTER_SIZE0=5 -DMLO_FILTER_SIZE1=5 -DMLO_FILTER_PAD0=2 -DMLO_FILTER_PAD1=2 -DMLO_N_OUTPUTS=192 -DMLO_N_INPUTS=64 -DMLO_BATCH_SZ=128 -DMLO_OUT_WIDTH=27 -DMLO_OUT_HEIGHT=27 -DMLO_OUT_BATCH_STRIDE=139968 -DMLO_OUT_CHANNEL_STRIDE=729 -DMLO_OUT_STRIDE=27 -DMLO_IN_WIDTH=27 -DMLO_IN_HEIGHT=27 -DMLO_IN_BATCH_STRIDE=46656 -DMLO_IN_CHANNEL_STRIDE=729 -DMLO_IN_STRIDE=27 -DMLO_IN_TILE0=27 -DMLO_IN_TILE1=27 -DMLO_OUT_TILE0=27 -DMLO_OUT_TILE1=27 -DMLO_GRP_TILE0=16 -DMLO_GRP_TILE1=16 -DMLO_ACTIVE_ALUS=252 -DMLO_N_ALUTILES_PERSTACK=2 -DMLO_OUT_PIX_TILE0=3 -DMLO_OUT_PIX_TILE1=2 -DMLO_N_STACKS=1 -DMLO_N_OUT_TILES=7 -DMLO_N_OUT_TILES_PERSTACK=14 -DMLO_N_IN_TILES_PERSTACK=2 -DMLO_N_READ_PROCS=256 -DMLO_CONV_BIAS=0 -DMLO_ALU_VTILE0=9 -DMLO_ALU_VTILE1=14 src/Kernels/MIOpenConvDirUniC.cl -k MIOpenConvUniC -dumpilisa -r 10 if#23887872: if#1228800: if#71663616: iv#0 256,14,128/256,1,1
key: miopenConvolutionFwdAlgoDirect,64x27x27x5x5x192x27x27x128xNCHWxFP32x1
Kernel filename: MIOpenConvDirUniC.cl
runcl -DCFF_CGEMM_CHOICE_1=1 -DCFF_IMG_SZ_27_27 -DCFF_IMG_H=27 -DCFF_IMG_W=27 -DCFF_BATCH=128 -DCFF_NFILTER=192 -DCFF_CHANNELS=64 -DCFF_HALFW=13404160 src/Kernels/MIOpenConvFFT.cl -k MIOpenConvFFT_fwd_in -dumpilisa -r 10 if#0: if#0: if#0: iv#0 524288,1,1/64,1,1
key: miopenConvolutionFwdAlgoFFT,FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_0
Kernel filename: MIOpenConvFFT.cl
runcl -DCFF_CGEMM_CHOICE_1=1 -DCFF_IMG_SZ_27_27 -DCFF_IMG_H=27 -DCFF_IMG_W=27 -DCFF_BATCH=128 -DCFF_NFILTER=192 -DCFF_CHANNELS=64 -DCFF_HALFW=13404160 src/Kernels/MIOpenConvFFT.cl -k MIOpenConvFFT_fwd_we -dumpilisa -r 10 if#0: if#0: if#0: iv#0 786432,1,1/64,1,1
key: miopenConvolutionFwdAlgoFFT,FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_1
runcl -DCFF_CGEMM_CHOICE_1=1 -DCFF_IMG_SZ_27_27 -DCFF_IMG_H=27 -DCFF_IMG_W=27 -DCFF_BATCH=128 -DCFF_NFILTER=192 -DCFF_CHANNELS=64 -DCFF_HALFW=13404160 src/Kernels/MIOpenConvFFT.cl -k MIOpenConvFFT_transpose_in -dumpilisa -r 10 if#0: if#0: if#0: iv#0 1114112,1,1/256,1,1
key: miopenConvolutionFwdAlgoFFT,FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_2
runcl -DCFF_CGEMM_CHOICE_1=1 -DCFF_IMG_SZ_27_27 -DCFF_IMG_H=27 -DCFF_IMG_W=27 -DCFF_BATCH=128 -DCFF_NFILTER=192 -DCFF_CHANNELS=64 -DCFF_HALFW=13404160 src/Kernels/MIOpenConvFFT.cl -k MIOpenConvFFT_transpose_we -dumpilisa -r 10 if#0: if#0: if#0: iv#0 1671168,1,1/256,1,1
key: miopenConvolutionFwdAlgoFFT,FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_3
runcl -DCFF_CGEMM_CHOICE_1=1 -DCFF_IMG_SZ_27_27 -DCFF_IMG_H=27 -DCFF_IMG_W=27 -DCFF_BATCH=128 -DCFF_NFILTER=192 -DCFF_CHANNELS=64 -DCFF_HALFW=13404160 src/Kernels/MIOpenConvFFT.cl -k MIOpenConvFFT_cgemm -dumpilisa -r 10 if#0: if#0: if#0: iv#0 48,32,544/16,16,1
key: miopenConvolutionFwdAlgoFFT,FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_4
runcl -DCFF_CGEMM_CHOICE_1=1 -DCFF_IMG_SZ_27_27 -DCFF_IMG_H=27 -DCFF_IMG_W=27 -DCFF_BATCH=128 -DCFF_NFILTER=192 -DCFF_CHANNELS=64 -DCFF_HALFW=13404160 src/Kernels/MIOpenConvFFT.cl -k MIOpenConvFFT_transpose_out -dumpilisa -r 10 if#0: if#0: if#0: iv#0 3342336,1,1/256,1,1
key: miopenConvolutionFwdAlgoFFT,FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_5
runcl -DCFF_CGEMM_CHOICE_1=1 -DCFF_IMG_SZ_27_27 -DCFF_IMG_H=27 -DCFF_IMG_W=27 -DCFF_BATCH=128 -DCFF_NFILTER=192 -DCFF_CHANNELS=64 -DCFF_HALFW=13404160 src/Kernels/MIOpenConvFFT.cl -k MIOpenConvFFT_inv_out -dumpilisa -r 10 if#0: if#0: if#0: iv#0 1572864,1,1/64,1,1
key: miopenConvolutionFwdAlgoFFT,FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_6
key: miopenConvolutionFwdAlgoFFT FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_0
key: miopenConvolutionFwdAlgoFFT FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_1
key: miopenConvolutionFwdAlgoFFT FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_2
key: miopenConvolutionFwdAlgoFFT FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_3
key: miopenConvolutionFwdAlgoFFT FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_4
key: miopenConvolutionFwdAlgoFFT FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_5
key: miopenConvolutionFwdAlgoFFT FFT_x_in_h_27_in_w_27_in_n_128_in_c_64_out_c_192_kernel_6
[INFO] MIOpen Found 4 fwd algorithms, choosing 2:
[INFO] 0) 2 - time: 3.4942, Memory: 214466560
[INFO] 1) 3 - time: 6.27657, Memory: 0
[INFO] 2) 1 - time: 11.4962, Memory: 0
[INFO] 3) 0 - time: 21.2581, Memory: 4665600
[DEBUG] Init fwd Conv(3x3,pad=1,s=1) (128,192,13,13)->(128,384,13,13) req workspace: 0
key: miopenConvolutionFwdAlgoWinograd,192x13x13x3x3x384x13x13x128xNCHWxFP32x1
Kernel filename: conv_3x3_wheel_alpha_v3_0b_gfx803_m30.so
[INFO] MIOpen Found 1 fwd algorithms, choosing 3:
[INFO] 0) 3 - time: 2.32301, Memory: 0
[DEBUG] Init fwd Conv(3x3,pad=1,s=1) (128,384,13,13)->(128,256,13,13) req workspace: 0
key: miopenConvolutionFwdAlgoWinograd,384x13x13x3x3x256x13x13x128xNCHWxFP32x1
[INFO] MIOpen Found 1 fwd algorithms, choosing 3:
[INFO] 0) 3 - time: 3.01709, Memory: 0
[DEBUG] Init fwd Conv(3x3,pad=1,s=1) (128,256,13,13)->(128,256,13,13) req workspace: 0
key: miopenConvolutionFwdAlgoWinograd,256x13x13x3x3x256x13x13x128xNCHWxFP32x1
[INFO] MIOpen Found 1 fwd algorithms, choosing 3:
[INFO] 0) 3 - time: 2.0419, Memory: 0
[INFO] Begin warmup runs
[INFO] ======= BEGIN FWD =======
key: miopenConvolutionFwdAlgoDirect 3x224x224x11x11x64x55x55x128xNCHWxFP32x1
key: miopenConvolutionFwdAlgoDirect_pass2 3x224x224x11x11x64x55x55x128xNCHWxFP32x1x1
[INFO] Conv(11x11,pad=2,s=4) (128,3,224,224)->(128,64,55,55): 5.074 ms
runcl -DMLO_NRN_GROUP_SZ0=256 -DMLO_NRN_GROUP_SZ1=1 -DMLO_NRN_OP_ID=3 -DMLO_N_PIXS_OFF=0 -DMLO_MAP_SZ=24780800 -DMLO_MAP_SZ_ALIGNED=6195200 -DMLO_READ_UNIT=4 src/Kernels/MIOpenNeuron.cl -k MIOpenNeuronFwd -dumpilisa -r 10 if#0: if#0: if#0: iv#0 6195200,1,1/256,1,1
key: miopenActivationForward,64x55x55x3x3x64x55x55x128xNCHWxFP32x1
Kernel filename: MIOpenNeuron.cl
MIOpen Error: /home/masa/MIOpen/src/hipoc/hipoc_program.cpp:96: Failed creating module hipErrorOutOfMemory
error: 'StatusUnknownError '(7) at ./layers.hpp:277
We get "Error: There is no device can be used to do the computation" on the HIP/CUDA path while exercising test cases after integrating MIOpen into the MXNet HIP port.
The pre-built MIOpen package is taken from ROCm ($ sudo apt-get install miopen-hip).
Query:
Does the pre-built MIOpen package support both the HIP/CUDA and HIP/ROCm platforms?
Hi.
I'm new to Radeon Open Compute/MIOpen. I'm interested in the ROCm platform and MIOpen because I usually use AMD graphics cards in my home machine.
I often use Chainer as a deep-learning framework, so I want to know whether there is any plan to hipify Chainer and CuPy, its acceleration library.
I also want to ask what I can do if there is no such plan, or if it is far from complete.
Thanks in advance.
The following configuration with a 1x1 kernel fails the forward verification and then seems to hang indefinitely inside the miopenFindConvolutionBackwardWeightsAlgorithm function.
$ ./MIOpenDriver conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1 -t 1
MIOpenDriver: conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1 -t 1
MIOpen Forward Conv. Algorithm: 1
GPU Kernel Time Forward Conv. Elapsed: 3.433880 ms
Forward Convolution Verifies on CPU and GPU
MIOpen Backward Data Conv. Algorithm: 0
GPU Kernel Time Backward Data Conv. Elapsed: 104.385124 ms
^C
(waited a good 10 minutes). Interrupting in gdb shows that it hangs inside the miopenFindConvolutionBackwardWeightsAlgorithm function.
It looks like the contents of gemm.cpp and gemm_geometry.cpp are swapped: the first contains geometry code, while the second provides an interface to GEMM calls.
There's a bug inside the miopenGemm function call resulting in a memory access fault.
I've tried debugging it a bit. So far I have found that the kernel accesses the A matrix one element past its size, i.e., it accesses A[a_data_size].
The issue can be reproduced with the MIOpenDriver, which rules out a cause in my code:
$ ./MIOpenDriver gemm -m 128 -k 9216 -n 4096 -v 0
MIOpenDriver: gemm -m 128 -k 9216 -n 4096 -v 0
Memory access fault by GPU node-1 on address 0x1101808000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
This may be an issue inside MIOpenGemm.
I'm running into a problem building MIOpenDriver after a successful HIP build of MIOpen...
make MIOpenDriver
[ 5%] Built target addkernels
[ 97%] Built target MIOpen
[ 97%] Linking CXX executable ../bin/MIOpenDriver
ld: /tmp/tmp.FuopL936mR/main.cpp.host.o: undefined reference to symbol '_ZN5boost6system15system_categoryEv'
//usr/lib/x86_64-linux-gnu/libboost_system.so.1.58.0: error adding symbols: DSO missing from command line
clang-6.0: error: linker command failed with exit code 1 (use -v to see invocation)
driver/CMakeFiles/MIOpenDriver.dir/build.make:128: recipe for target 'bin/MIOpenDriver' failed
make[3]: *** [bin/MIOpenDriver] Error 1
CMakeFiles/Makefile2:3144: recipe for target 'driver/CMakeFiles/MIOpenDriver.dir/all' failed
make[2]: *** [driver/CMakeFiles/MIOpenDriver.dir/all] Error 2
CMakeFiles/Makefile2:3151: recipe for target 'driver/CMakeFiles/MIOpenDriver.dir/rule' failed
make[1]: *** [driver/CMakeFiles/MIOpenDriver.dir/rule] Error 2
Makefile:1313: recipe for target 'MIOpenDriver' failed
make: *** [MIOpenDriver] Error 2
Does MIOpen not support a Flatten layer?
Hello. Further looking into the crash of MIOpen on my system (#5), I found that the OCL kernel function MIOpenConvUniC is the cause of MIOpen failing tests on my system.
The following code exists in the function Conv inside MIOpenConvUniC.cl:
uint wei_stg_off = wei_stg_base_off +
                   o_c * MLO_N_IN_TILES_PERSTACK * MLO_FILTER_SZ +
                   k_act * MLO_FILTER_SIZE0;
for(uint i = 0; i < MLO_FILTER_SIZE0; ++i)
{
    pvt_wei_stage[i] = lcl_wei[wei_stg_off + i];
}
Here the array lcl_wei has size MLO_WEIGHTS_SZ (3200 in test 5), but wei_stg_off can go up to 6395 (again, in test 5). This behavior can be shown by adding the following snippet before the for loop and commenting the loop out (otherwise OpenCL crashes and the printf buffer is not displayed):
if(wei_stg_off + MLO_FILTER_SIZE0 >= MLO_WEIGHTS_SZ)
{
    printf("%d\n", wei_stg_off);
}
Tested on the Intel OpenCL SDK with an OpenCL CPU device and on Nvidia OpenCL (reproduced on both).
Not sure if this is the best place to put this, but the "Configuring with cmake" section of the MIOpen build instructions should mention that after you create the "build" directory and descend into it, you should run "rm ../CMakeCache.txt". Otherwise the subsequent configuration step will fail, but it will not look like it fails, because the generated objects will be placed in the directory above instead of in the current working directory (i.e. "build") which you created to keep your build separate from the source.
It took me a while to track that down, so I'm sure someone else will come across it unless they are intimately familiar with CMake.
file: src/kernel_cache.cpp
line: 91
std::cout << "runcl " << params << " src/Kernels/" << program_name << " -k " << kernel_name
<< " -dumpilisa -r 10"
<< " if#" << isize * 4 << ": if#" << msize * 4 << ": if#" << osize * 4 << ": iv#0 "
<< vgd[0] << "," << vgd[1] << "," << vgd[2] << "/" << vld[0] << "," << vld[1] << ","
<< vld[2] << std::endl;
When running the command MIOpenDriver softmax -n 64 -W 1 -H 1 -c 1000 -V 0 -i 1 -t 1 with a debug build (NDEBUG is undefined), vgd will be a const vector whose size is 1 (in src/ocl/softmaxocl.cpp#81), so the print above reads vgd[1] and vgd[2] out of bounds.
~/ROCm/MIOpen/build# make
[ 0%] Building CXX object addkernels/CMakeFiles/addkernels.dir/include_inliner.cpp.o
[ 3%] Building CXX object addkernels/CMakeFiles/addkernels.dir/addkernels.cpp.o
[ 3%] Linking CXX executable ../bin/addkernels
c++: error: unrecognized command line option ‘-amdgpu-target=gfx803’
c++: error: unrecognized command line option ‘-amdgpu-target=gfx900’
addkernels/CMakeFiles/addkernels.dir/build.make:120: recipe for target 'bin/addkernels' failed
make[2]: *** [bin/addkernels] Error 1
CMakeFiles/Makefile2:382: recipe for target 'addkernels/CMakeFiles/addkernels.dir/all' failed
make[1]: *** [addkernels/CMakeFiles/addkernels.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2
cmake parameters:
CXX=/opt/rocm/hcc/bin/hcc cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH="/opt/rocm/hcc;/opt/rocm/hip" -DCMAKE_CXX_FLAGS="-isystem /usr/include/aarch64-linux-gnu/" -DHSA_LIBRARY=/opt/hsa-rocr-dev/lib/libhsa-runtime64.so -DHSA_HEADER=/opt/hsa-rocr-dev/include -DCMAKE_CXX_FLAGS=-Wall ..
The function miopenFindConvolutionBackwardWeightsAlgorithm fails with a memory access fault.
This happens, for example, in the following configuration:
$ ./MIOpenDriver conv -H 57 -W 57 -P 1 -k 64 -c 64 -n 128 -p 1 -q 1
MIOpenDriver: conv -H 57 -W 57 -P 1 -k 64 -c 64 -n 128 -p 1 -q 1
Forward Convolution Verifies on CPU and GPU
Memory access fault by GPU node-1 on address 0x118114f000. Reason: Page not present or supervisor privilege.
[1] 14387 abort (core dumped) ./MIOpenDriver conv -H 57 -W 57 -P 1 -k 64 -c 64 -n 128 -p 1 -q 1
with the following stack trace:
(gdb) bt
#0 0x00007ffff61a6bbf in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#1 0x00007ffff61a6d7a in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#2 0x00007ffff488668c in waitComplete () at /home/jenkins/jenkins-root/workspace/compute-rocm-rel-1.6/external/hcc-tot/lib/hsa/mcwamp_hsa.cpp:3715
#3 0x00007ffff488775d in operator() () at /home/jenkins/jenkins-root/workspace/compute-rocm-rel-1.6/external/hcc-tot/lib/hsa/mcwamp_hsa.cpp:3813
#4 _M_invoke<> () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/functional:1530
#5 operator() () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/functional:1520
#6 operator() () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/future:1342
#7 0x00007ffff48876f2 in _M_invoke () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/functional:1856
#8 0x00007ffff4887667 in operator() () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/functional:2267
#9 _M_do_set () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/future:527
#10 0x00007ffff5f69a99 in __pthread_once_slow (once_control=0x79e9d8, init_routine=0x7ffff7b0cac0 <__once_proxy>) at pthread_once.c:116
#11 0x00007ffff4887c4b in __gthread_once () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0/bits/gthr-default.h:699
#12 call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()> *, bool *), std::__future_base::_State_baseV2 *, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()> *, bool *> () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/mutex:738
#13 _M_set_result () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/future:386
#14 _M_complete_async () at /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/future:1606
#15 0x00007ffff7364ff5 in hipEventSynchronize () from /opt/rocm/hip/lib/libhip_hcc.so
#16 0x00007ffff770d4b1 in run () at /home/patrick/miopen/miopen-git/src/hipoc/hipoc_kernel.cpp:89
#17 0x00007ffff76c2dfd in operator()<int, int, int, int, int, int, int, int, const void *, void *, const void *, int *> () at /home/patrick/miopen/miopen-git/src/include/miopen/hipoc_kernel.hpp:134
#18 FindConvBwdWeightsAlgorithm () at /home/patrick/miopen/miopen-git/src/ocl/convolutionocl.cpp:1463
#19 0x00007ffff75f4fff in operator() () at /home/patrick/miopen/miopen-git/src/convolution_api.cpp:380
#20 try_<(lambda at /home/patrick/miopen/miopen-git/src/convolution_api.cpp:379:25)> () at /home/patrick/miopen/miopen-git/src/include/miopen/errors.hpp:71
#21 miopenFindConvolutionBackwardWeightsAlgorithm () at /home/patrick/miopen/miopen-git/src/convolution_api.cpp:379
#22 0x00000000004140ee in FindBackwardWeights () at /home/patrick/miopen/miopen-git/driver/conv_driver.hpp:764
#23 0x0000000000410cb5 in RunBackwardGPU () at /home/patrick/miopen/miopen-git/driver/conv_driver.hpp:834
#24 0x000000000040b131 in main () at /home/patrick/miopen/miopen-git/driver/main.cpp:106
Trying to collect performance counters for DeepBench apps of the ROCm repo here:
https://github.com/ROCmSoftwarePlatform/DeepBench
The applications run correctly with the following command line. Example:
rcprof -t ./conv_bench
But it fails to generate any output:
Failed to generate profile result path/to/out.atp
Further, if we try to run it with the HSA mode of RCP, the application fails to run at all. Example:
$ rcprof -A ./conv_bench
Radeon Compute Profiler V5.3.0 is enabled
Failed to generate profile result path/to/out.atp
I have been working on fixing a bug with a benchmark (bwd_bypass from DNNMark) that is bizarrely using 0 vector registers. After doing some digging, I found the problem is that HCC/HIP is generating no code for the kernel. This comes down to the fact that the behavior of cuDNN and MIOpen for the passthru activation is different -- in cuDNN an explicit copy from the input array to output array will happen, but in MIOpen the copy does not happen. My conversation with the DNNMark developers that documents all of this in more detail is here: shidong-ai/DNNMark#20.
I was wondering if this is a known issue with MIOpen? If so, is there a different setting that should be used to achieve the same effect? Or is this a problem with the activation (passthru) code that needs to be fixed?
Thanks,
Matt
Hello! Why not wrap CopyTensor from the C++ API as miopenTransformTensor in the C API?
In C++, it's not possible to convert enum values back into their corresponding names, since the symbols are discarded during compilation.
Having a method to display the error at run time seems pretty important.
const char* miopenGetErrorString(miopenStatus_t status) {
    switch (status) {
    case miopenStatusSuccess:
        return "miopenStatusSuccess";
    case miopenStatusNotInitialized:
        return "miopenStatusNotInitialized";
    case miopenStatusInvalidValue:
        return "miopenStatusInvalidValue";
    case miopenStatusBadParm:
        return "miopenStatusBadParm";
    case miopenStatusAllocFailed:
        return "miopenStatusAllocFailed";
    case miopenStatusInternalError:
        return "miopenStatusInternalError";
    case miopenStatusNotImplemented:
        return "miopenStatusNotImplemented";
    case miopenStatusUnknownError:
        return "miopenStatusUnknownError";
    default:
        return "Unrecognized Status Code";
    }
}
When using miopenFindConvolutionForwardAlgorithm, my group and I get a segfault. We've been wondering whether it is because we allocate the buffers with malloc rather than with OpenCL. Is this possible?
Apparently miopenLRNForward loses the signs of the input tensor values here, unlike the cuDNN implementation. Isn't this a compatibility issue?
LIDAR is cheap. We need support for 3D convolutions.
Hello, I am using MIOpen as an external library in my own program, but the error below occurs when I call the convolution forward API. I tried linking MIOpenGemm into the program, but it still doesn't work. Does anyone have any clue? Thanks in advance!
ROCmSoftwarePlatform/MIOpen/src/gemm.cpp:329: looking for gemm kernel (does not exist): miopenConvolutionFwdAlgoGEMM, tC0_tA0_tB0_colMaj1_m65536_n32_k75_lda65536_ldb75_ldc65536_ws0_f32
Any plans to support MIOpen on Windows platform?
Maybe provide a visual studio solution to make it easier/possible to build on Windows.
Hi, for benchmarking MIOpen with popular ImageNet models, I would like a Dropout layer implemented in MIOpen.
Sure, it is not hard to implement it myself using the hcRNG library, as done in hipCaffe here. But it would be nice if you could implement an optimized one in MIOpen.
Is a Dropout layer coming?
Thanks
Thanks in advance! 🤣
I want to adjust the GPU frequency, but I don't know how; I am not familiar with AMD GPU settings. I know Nvidia has the nvidia-smi command to adjust the frequency, so I thought AMD might support this too. I found the rocm-smi command, but I don't know how to install this tool, or whether my hardware even supports it.
Of course, I looked up this link: https://rocm.github.io/hardware.html, but since I am not familiar with the AMD product lines, it seems my hardware (R6 series) doesn't support ROCm, which means I can't install this tool to adjust the frequency. But I'm not sure 🤣
More detailed hardware info, obtained with clinfo, is below:
$ clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1912.5)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon R6 Graphics
Device Topology: PCI[ B#0, D#1, F#0 ]
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 576Mhz
Address bits: 64
Max memory allocation: 387935232
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 1551740928
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 387935232
Max global variable size: 349141504
Max global variable preferred total size: 1551740928
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: Yes
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7f361478da18
Name: Carrizo
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 1912.5 (VM)
Profile: FULL_PROFILE
Version: OpenCL 2.0 AMD-APP (1912.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base
_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_gl_depth_images cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_
ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 1200Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 7297716224
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2147483648
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f361478da18
Name: AMD Embedded R-Series RX-416GD Radeon R6
Vendor: AuthenticAMD
Device OpenCL C version: OpenCL C 1.2
Driver version: 1912.5 (sse2,avx,fma4)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1912.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base
_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_am
d_popcnt cl_khr_spir cl_khr_gl_event
Is it possible to install MIOpen on CUDA machines? (OpenCL is included in CUDA.)
Since I don't have an AMD card at the moment, I just want to test whether MIOpen works with Nvidia's OpenCL implementation.
Hi! There are some links in the readme that lead to non-existent pages:
`MIOpenDriver` fails the backward-weights verification for all parameters I tried, including the default ones (see below). Is this reproducible on your side, or is there something wrong with my setup/compilation?
$ ./MIOpenDriver conv
MIOpenDriver: conv
Forward Convolution Verifies on CPU and GPU
Backward Convolution Data Verifies on CPU and GPU
Backward Convolution Weights Failed: 0.993599
Please consider shipping MIOpenDriver in the release package so it's easier for framework developers to diagnose issues.
When running MIOpen code, an a.out file confusingly appears in your current directory. This function appears to be the culprit.
Since it only tests for a compiler bug and the compiled output isn't needed, adding -o /dev/null to the compiler arguments should prevent this.
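As a quick sanity check (my own sketch, using the system C compiler in place of hipcc), -o /dev/null lets a probe compilation run without leaving an a.out behind:

```shell
# compile a throwaway probe without writing a.out to the current directory
# (MIOpen would invoke hipcc here; plain cc illustrates the same flag)
echo 'int main(void){return 0;}' > /tmp/compiler_probe.c
cc /tmp/compiler_probe.c -o /dev/null && echo "probe compiled"
```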
I think the answer is "No", because that's part of the selling point of MIOpen, right? :)
But the readme and docs seem to suggest that whichever backend you choose (HIP or OpenCL), you need a ROCm stack to use MIOpen. If that's not true, it should be made a bit clearer.
Any plans to support macOS? Most Macs have AMD GPUs, so it's odd to see nothing about it.
I am trying to compile hipCaffe with MIOpen on Ubuntu 16.04 with an RX 480 and a Xeon 1231 v3.
I followed the installation manual on the hipCaffe page and succeeded in compiling without MIOpen, but when I try with MIOpen by modifying Makefile.config, it fails with the following error.
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/layer.cpp
CXX src/caffe/data_transformer.cpp
CXX src/caffe/util/db_lmdb.cpp
CXX src/caffe/util/blocking_queue.cpp
CXX src/caffe/util/math_functions.cpp
CXX src/caffe/util/signal_handler.cpp
CXX src/caffe/util/hdf5.cpp
In file included from src/caffe/util/db_lmdb.cpp:2:
In file included from ./include/caffe/util/db_lmdb.hpp:10:
In file included from ./include/caffe/util/db.hpp:6:
In file included from ./include/caffe/common.hpp:19:
In file included from ./include/caffe/util/device_alternate.hpp:48:
./include/caffe/util/cudnn.hpp:53:40: error: use of undeclared identifier 'miopenDouble'
    static const miopenDataType_t type = miopenDouble;
./include/caffe/util/cudnn.hpp:100:55: error: use of undeclared identifier 'miopenCrossCorrelation'; did you mean 'miopenConvolution'?
    MIOPEN_CHECK(miopenInitConvolutionDescriptor(*conv, miopenCrossCorrelation,
./include/caffe/util/cudnn.hpp:13:29: note: expanded from macro 'MIOPEN_CHECK'
    miopenStatus_t status = condition; \
/usr/include/miopen.h:218:5: note: 'miopenConvolution' declared here
    miopenConvolution = 0, /*!< Convolutions */
2 errors generated.
Died at /opt/rocm/bin/hipcc line 452.
Makefile:624: recipe for target '.build_release/src/caffe/util/db_lmdb.o' failed
make: *** [.build_release/src/caffe/util/db_lmdb.o] Error 1
make: *** Waiting for unfinished jobs....
(The same two errors are then emitted for hdf5.o, signal_handler.o, data_transformer.o, math_functions.o, layer.o, and blocking_queue.o.)
I built them with make -j8.
In the debug build of MIOpen, I'm experiencing segfaults inside the FindConvFwdAlgorithm function.
Note that this happens only in the debug build (CMAKE_BUILD_TYPE=Debug); the CMAKE_BUILD_TYPE=Release build doesn't have this issue.
I tried to create a minimal example, but it's still quite long. Steps to reproduce:
I built the current master version of MIOpen with:
mkdir debug && cd debug
cmake -DCMAKE_BUILD_TYPE=Debug ../
make
sudo make install
The following is the almost-minimal code to reproduce this issue (file: conv_segfault.cpp):
#include <hip/hip_runtime_api.h>
#include <miopen/miopen.h>
#include <stdio.h>
#include <iostream>
#define CHECK_HIP(cmd) \
{\
    hipError_t hip_error = cmd;\
    if (hip_error != hipSuccess) { \
        fprintf(stderr, "error: '%s'(%d) at %s:%d\n", hipGetErrorString(hip_error), hip_error, __FILE__, __LINE__); \
        exit(EXIT_FAILURE);\
    }\
}

#define CHECK_MIO(cmd) \
{\
    miopenStatus_t miostat = cmd;\
    if (miostat != miopenStatusSuccess) { \
        fprintf(stderr, " MIOpen error (%d) at %s:%d\n", miostat, __FILE__, __LINE__); \
        exit(EXIT_FAILURE);\
    }\
}

struct Tensor {
    miopenTensorDescriptor_t desc;
    void* data;
    size_t data_size;
    Tensor(int n, int c, int h, int w) {
        CHECK_MIO(miopenCreateTensorDescriptor(&desc));
        CHECK_MIO(miopenSet4dTensorDescriptor(desc, miopenFloat, n, c, h, w));
        data_size = n*c*h*w*sizeof(float);
        CHECK_HIP(hipMalloc(&data, data_size));
    }
};

int main(int argc, char *argv[])
{
    int devcount;
    CHECK_HIP(hipGetDeviceCount(&devcount));
    std::cout << "Number of HIP devices found: " << devcount << std::endl;
    if (devcount <= 0)
        exit(EXIT_FAILURE);

    miopenHandle_t mio_handle;
    CHECK_MIO(miopenCreate(&mio_handle));

    /* create conv desc */
    miopenConvolutionDescriptor_t convdesc;
    CHECK_MIO(miopenCreateConvolutionDescriptor(&convdesc));
    CHECK_MIO(miopenInitConvolutionDescriptor(convdesc, miopenConvolution, 1, 1, 1, 1, 1, 1));

    // create input, output and weights tensors
    Tensor input(128, 3, 32, 32);
    Tensor output(128, 64, 32, 32);
    Tensor weights(64, 3, 3, 3);

    // create workspace
    size_t workspace_size;
    void* workspace;
    CHECK_MIO(miopenConvolutionForwardGetWorkSpaceSize(mio_handle, weights.desc, input.desc, convdesc, output.desc, &workspace_size));
    CHECK_HIP(hipMalloc(&workspace, workspace_size));

    // findalgo: this segfaults
    miopenConvAlgoPerf_t perfs[4];
    int returned_algos;
    CHECK_MIO(miopenFindConvolutionForwardAlgorithm(mio_handle, input.desc, input.data, weights.desc, weights.data, convdesc, output.desc, output.data, 4, &returned_algos, perfs, workspace, workspace_size, false));
    return 0;
}
Compile with:
/opt/rocm/hip/bin/hipcc -g --amdgpu-target=gfx900 -I/opt/rocm/hip/include -I/opt/rocm/include conv_segfault.cpp -L/opt/rocm/lib -L/opt/rocm/opencl/lib/x86_64 -lMIOpen -o segfault
Running the resulting executable yields:
Number of HIP devices found: 1
Device Name: gfx900
runcl -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=1 -DNUM_IM_BLKS=4 -DLOCAL_MEM_SIZE=385 -DSTRIDE_GT_1=0 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 src/Kernels/MIOpenUtilKernels.cl -k Im2Col -dumpilisa -r 10 if#0: if#0: if#0: iv#0 3072,1,1/256,1,1
key: miopenIm2Col,
Kernel filename: MIOpenUtilKernels.cl
Segmentation fault (core dumped)
GDB stacktrace:
Thread 1 "segfault" received signal SIGSEGV, Segmentation fault.
0x00007fffe8e8fad9 in clSetKernelArg () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
(gdb) bt
#0 0x00007fffe8e8fad9 in clSetKernelArg () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#1 0x00007ffff64d6233 in miopen::OCLSetKernelArg::operator()<std::integral_constant<unsigned long, 1ul>, _cl_mem*> (this=0x7fffffffc800, kernel=0xb628a0, i=..., x=@0x7fffffffc9d8: 0x110131d000) at /home/patrick/miopen/miopen-git/src/include/miopen/oclkernel.hpp:64
#2 0x00007ffff64d4a78 in std::_Bind<miopen::OCLSetKernelArg (_cl_kernel*, std::_Placeholder<1>, std::_Placeholder<2>)>::__call<void, std::integral_constant<unsigned long, 1ul>&&, _cl_mem* const&, 0ul, 1ul, 2ul>(std::tuple<std::integral_constant<unsigned long, 1ul>&&, _cl_mem* const&>&&, std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x7fffffffc800, __args=<unknown type in /opt/rocm/lib/libMIOpen.so.1, CU 0x104f80, DIE 0x13e0aa>) at /usr/include/c++/5/functional:1074
#3 0x00007ffff64d3385 in std::_Bind<miopen::OCLSetKernelArg (_cl_kernel*, std::_Placeholder<1>, std::_Placeholder<2>)>::operator()<std::integral_constant<unsigned long, 1ul>, _cl_mem* const&, void>(std::integral_constant<unsigned long, 1ul>&&, _cl_mem* const&) (this=0x7fffffffc800) at /usr/include/c++/5/functional:1133
#4 0x00007ffff6533e55 in miopen::detail::each_args_i_impl<std::_Bind<miopen::OCLSetKernelArg (_cl_kernel*, std::_Placeholder<1>, std::_Placeholder<2>)>, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul, int const&, _cl_mem* const&, unsigned long const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, _cl_mem* const&>(std::_Bind<miopen::OCLSetKernelArg (_cl_kernel*, std::_Placeholder<1>, std::_Placeholder<2>)>, miopen::detail::seq<0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul>, int const&, _cl_mem* const&, unsigned long const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, _cl_mem* const&) (f=...) at /home/patrick/miopen/miopen-git/src/include/miopen/each_args.hpp:68
#5 0x00007ffff6533b33 in miopen::each_args_i<std::_Bind<miopen::OCLSetKernelArg (_cl_kernel*, std::_Placeholder<1>, std::_Placeholder<2>)>, int const&, _cl_mem* const&, unsigned long const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, _cl_mem* const&>(std::_Bind<miopen::OCLSetKernelArg (_cl_kernel*, std::_Placeholder<1>, std::_Placeholder<2>)>, int const&, _cl_mem* const&, unsigned long const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, int const&, _cl_mem* const&) (f=...) at /home/patrick/miopen/miopen-git/src/include/miopen/each_args.hpp:83
#6 0x00007ffff65337c0 in miopen::OCLKernelInvoke::operator()<int, _cl_mem*, unsigned long, int, int, int, int, int, int, int, int, int, int, _cl_mem*> (this=0x7fffffffccc0) at /home/patrick/miopen/miopen-git/src/include/miopen/oclkernel.hpp:97
#7 0x00007ffff6532e55 in miopen::Im2ColGPU (handle=..., data_size=393216, im=0x110131d000, im_offset=0, c=3, h=32, w=32, wei_h=3, wei_w=3, out_h=32, out_w=32, pad_h=1, pad_w=1, stride_h=1, stride_w=1, col=0x11034a2000) at /home/patrick/miopen/miopen-git/src/ocl/utilocl.cpp:92
#8 0x00007ffff64fd85d in miopen::ConvolutionDescriptor::FindConvFwdAlgorithm (this=0xc719e0, handle=..., xDesc=..., x=0x110131d000, wDesc=..., w=0x110349f000, yDesc=..., y=0x110149e000, requestAlgoCount=4, returnedAlgoCount=0x7fffffffd68c, perfResults=0x7fffffffd690, workSpace=0x11034a2000, workSpaceSize=110592, exhaustiveSearch=false) at /home/patrick/miopen/miopen-git/src/ocl/convolutionocl.cpp:360
#9 0x00007ffff6465c85 in <lambda()>::operator()(void) const (__closure=0x7fffffffd460) at /home/patrick/miopen/miopen-git/src/convolution_api.cpp:166
#10 0x00007ffff646b6fb in miopen::try_<miopenFindConvolutionForwardAlgorithm(miopenHandle_t, miopenTensorDescriptor_t, void const*, miopenTensorDescriptor_t, void const*, miopenConvolutionDescriptor_t, miopenTensorDescriptor_t, void*, int, int*, miopenConvAlgoPerf_t*, void*, size_t, bool)::<lambda()> >(<lambda()>) (f=...) at /home/patrick/miopen/miopen-git/src/include/miopen/errors.hpp:71
#11 0x00007ffff646646f in miopenFindConvolutionForwardAlgorithm (handle=0xb45760, xDesc=0xc106d0, x=0x110131d000, wDesc=0xc6f890, w=0x110349f000, convDesc=0xc719e0, yDesc=0xc108e0, y=0x110149e000, requestAlgoCount=4, returnedAlgoCount=0x7fffffffd68c, perfResults=0x7fffffffd690, workSpace=0x11034a2000, workSpaceSize=110592, exhaustiveSearch=false) at /home/patrick/miopen/miopen-git/src/convolution_api.cpp:167
#12 0x0000000000419776 in main (argc=1, argv=0x7fffffffd878) at conv_segfault.cpp:69
conv_u1v1_wheel_alpha_v8_4_4_gfx803.so etc. do not work with the ROCm-OpenCL-Runtime master branch since the metadata format has changed. A 'SymbolName' metadata entry is required, which is usually function_name@kd. Can those files be updated? Thanks.
2018-07-17 09:05:55.622488: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-17 09:05:55.622826: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
1. Where can I find AMD OpenCL header files?
In readme.md it says "OpenCL libraries and header files" are a prerequisite for the OpenCL backend. I successfully installed the 17.30 amdgpu-pro Linux driver on CentOS 7.3, and OpenCL works fine, but the driver only ships with OpenCL libraries, no header files. I also cannot find an OpenCL SDK on developer.amd.com, so I have to use the old AMD APP SDK 3.0 OpenCL headers that I downloaded earlier.
In readme.md:
For OpenCL, run:
cmake -DMIOPEN_BACKEND=OpenCL ..
The above assumes that OpenCL is installed in one of the standard locations. If not, then manually set these two cmake variables:...
What are the standard locations for OpenCL header files?
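For reference, a hedged sketch of what the elided readme instruction presumably amounts to. The include path below is the one I pass to the dependency script later; the library path and the OPENCL_LIBRARIES variable name are my assumptions, since the readme truncates the exact variable names:

```shell
# sketch only -- the readme elides the variable names, so these are
# illustrative; paths are assumptions for this CentOS/amdgpu-pro setup
cmake -DMIOPEN_BACKEND=OpenCL \
      -DOPENCL_INCLUDE_DIRS=/root/OpenCLNet/providers/AMD/include \
      -DOPENCL_LIBRARIES=/opt/amdgpu-pro/lib64/libOpenCL.so ..
```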
2. CMake linking error, as follows:
[root@GPU build]# cmake --build . --config Release --target install
[ 3%] Built target addkernels
Linking CXX shared library ../lib/libMIOpen.so
/usr/bin/ld: cannot find -lOpenSSL::Crypto
collect2: error: ld returned 1 exit status
gmake[2]: *** [lib/libMIOpen.so.1] Error 1
gmake[1]: *** [src/CMakeFiles/MIOpen.dir/all] Error 2
gmake: *** [all] Error 2
How do I link OpenSSL::Crypto? I have installed openssl and openssl-devel via "yum install".
3. The dependencies check also failed, as follows (given the linking failure, I went back to check what the problem is):
[root@GPU MIOpen-master]# /root/software/cmake-3.9.4-Linux-x86_64/bin/cmake -P install_deps.cmake -DOPENCL_INCLUDE_DIRS=/root/OpenCLNet/providers/AMD/include/
CMake Warning (dev) at /usr/local/share/cmake/cmakeget/CMakeGet.cmake:264 (list):
Policy CMP0007 is not set: list command no longer ignores empty elements.
Run "cmake --help-policy CMP0007" for policy details. Use the cmake_policy
command to set the policy and suppress this warning. List has value =
[https:;;github.com;pfultz2;rocm-recipes;archive;HEAD.tar.gz].
Call Stack (most recent call first):
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:347 (cget_fetch)
install_deps.cmake:50 (cmake_get)
This warning is for project developers. Use -Wno-dev to suppress it.
Downloading https://github.com/pfultz2/rocm-recipes/archive/HEAD.tar.gz
-- [download 100% complete]
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning (dev) in CMakeLists.txt:
No cmake_minimum_required command is present. A line of code such as
cmake_minimum_required(VERSION 3.9)
should be added at the top of the file. The version specified may be lower
if you wish to support older CMake versions for this project. For more
information run "cmake --help-policy CMP0000".
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
CGET_CMAKE_ORIGINAL_SOURCE_FILE
OPENCL_INCLUDE_DIRS
-- Build files have been written to: /tmp/cget-00-03-29-enY67-1/build
Install the project...
-- Install configuration: ""
-- Up-to-date: /usr/local/etc/cget/recipes//boost
-- Up-to-date: /usr/local/etc/cget/recipes//boost/1.58
-- Up-to-date: /usr/local/etc/cget/recipes//boost/1.58/package.txt
-- Up-to-date: /usr/local/etc/cget/recipes//boost/1.58/requirements.txt
-- Up-to-date: /usr/local/etc/cget/recipes//boost/package.txt
-- Up-to-date: /usr/local/etc/cget/recipes//boost/requirements.txt
-- Up-to-date: /usr/local/etc/cget/recipes//bzip2
-- Up-to-date: /usr/local/etc/cget/recipes//bzip2/build.cmake
-- Up-to-date: /usr/local/etc/cget/recipes//bzip2/package.txt
-- Up-to-date: /usr/local/etc/cget/recipes//libressl
-- Up-to-date: /usr/local/etc/cget/recipes//libressl/build.cmake
-- Up-to-date: /usr/local/etc/cget/recipes//libressl/package.txt
-- Up-to-date: /usr/local/etc/cget/recipes//zlib
-- Up-to-date: /usr/local/etc/cget/recipes//zlib/package.txt
-- Up-to-date: /usr/local/share//cmake
-- Up-to-date: /usr/local/share//cmake/cget-recipe-utils
-- Up-to-date: /usr/local/share//cmake/cget-recipe-utils/cget-recipe-utils-config.cmake
CMake Warning (dev) at /usr/local/share/cmake/cmakeget/CMakeGet.cmake:264 (list):
Policy CMP0007 is not set: list command no longer ignores empty elements.
Run "cmake --help-policy CMP0007" for policy details. Use the cmake_policy
command to set the policy and suppress this warning. List has value =
[https:;;github.com;RadeonOpenCompute;rocm-cmake;archive;cb666a28b261fe63ffbcfcf3fee946b1941df604.tar.gz].
Call Stack (most recent call first):
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:347 (cget_fetch)
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:405 (cmake_get)
install_deps.cmake:51 (cmake_get_from)
This warning is for project developers. Use -Wno-dev to suppress it.
Downloading https://github.com/RadeonOpenCompute/rocm-cmake/archive/cb666a28b261fe63ffbcfcf3fee946b1941df604.tar.gz
-- [download 100% complete]
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
CGET_CMAKE_ORIGINAL_SOURCE_FILE
OPENCL_INCLUDE_DIRS
-- Build files have been written to: /tmp/cget-00-03-34-OhWsk-1/build
Install the project...
-- Install configuration: ""
-- Up-to-date: /usr/local/./share
-- Up-to-date: /usr/local/./share/rocm
-- Up-to-date: /usr/local/./share/rocm/cmake
-- Up-to-date: /usr/local/./share/rocm/cmake/ROCMConfig.cmake
-- Up-to-date: /usr/local/./share/rocm/cmake/ROCMCreatePackage.cmake
-- Up-to-date: /usr/local/./share/rocm/cmake/ROCMInstallSymlinks.cmake
-- Up-to-date: /usr/local/./share/rocm/cmake/ROCMInstallTargets.cmake
-- Up-to-date: /usr/local/./share/rocm/cmake/ROCMPackageConfigHelpers.cmake
-- Up-to-date: /usr/local/./share/rocm/cmake/ROCMSetupVersion.cmake
CMake Warning (dev) at /usr/local/share/cmake/cmakeget/CMakeGet.cmake:264 (list):
Policy CMP0007 is not set: list command no longer ignores empty elements.
Run "cmake --help-policy CMP0007" for policy details. Use the cmake_policy
command to set the policy and suppress this warning. List has value =
[https:;;github.com;ROCmSoftwarePlatform;MIOpenGEMM;archive;3f7555519207aaf203424afff82e7d912e1020e4.tar.gz].
Call Stack (most recent call first):
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:347 (cget_fetch)
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:405 (cmake_get)
install_deps.cmake:51 (cmake_get_from)
This warning is for project developers. Use -Wno-dev to suppress it.
Downloading https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/archive/3f7555519207aaf203424afff82e7d912e1020e4.tar.gz
-- [download 100% complete]
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OPENCL: /lib64/libOpenCL.so
-- Could NOT find LATEX (missing: LATEX_COMPILER)
Latex builder not found. To build PDF documentation run make in /tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/doc/pdf, once a latex builder is installed.
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
CGET_CMAKE_ORIGINAL_SOURCE_FILE
-- Build files have been written to: /tmp/cget-00-03-37-Wkv0D-1/build
Scanning dependencies of target miopengemm
[ 2%] Building CXX object miopengemm/CMakeFiles/miopengemm.dir/src/accuracytests.cpp.o
[ 4%] Building CXX object miopengemm/CMakeFiles/miopengemm.dir/src/alphagenerator.cpp.o
[ 6%] Building CXX object miopengemm/CMakeFiles/miopengemm.dir/src/apitest.cpp.o
[ 8%] Building CXX object miopengemm/CMakeFiles/miopengemm.dir/src/architests.cpp.o
[ 11%] Building CXX object miopengemm/CMakeFiles/miopengemm.dir/src/basegenerator.cpp.o
[ 13%] Building CXX object miopengemm/CMakeFiles/miopengemm.dir/src/betacgenerator.cpp.o
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp: In instantiation of ‘MIOpenGEMM::apitest::RunStats MIOpenGEMM::apitest::supa_gemm0(_cl_command_queue*&, const MIOpenGEMM::Geometry&, const MIOpenGEMM::Offsets&, T, T, size_t, bool, MIOpenGEMM::apitest::GemmImpl, bool, MIOpenGEMM::owrite::Writer&, const MIOpenGEMM::setabcw::CpuMemBundle) [with T = float; cl_command_queue = _cl_command_queue; size_t = long unsigned int]’:
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:519:73: required from here
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:402:21: error: call of overloaded ‘abs(float&)’ is ambiguous
x = std::abs(x);
^
In file included from /usr/local/include/c++/5.4.0/cstdlib:72:0,
from /usr/local/include/c++/5.4.0/bits/stl_algo.h:59,
from /usr/local/include/c++/5.4.0/algorithm:62,
from /tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:4:
/usr/include/stdlib.h:770:12: note: candidate: int abs(int)
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
^
In file included from /usr/local/include/c++/5.4.0/bits/stl_algo.h:59:0,
from /usr/local/include/c++/5.4.0/algorithm:62,
from /tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:4:
/usr/local/include/c++/5.4.0/cstdlib:166:3: note: candidate: long int std::abs(long int)
abs(long __i) { return __builtin_labs(__i); }
^
/usr/local/include/c++/5.4.0/cstdlib:174:3: note: candidate: long long int std::abs(long long int)
abs(long long __x) { return __builtin_llabs (__x); }
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:409:21: error: call of overloaded ‘abs(float&)’ is ambiguous
x = std::abs(x);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:416:21: error: call of overloaded ‘abs(float&)’ is ambiguous
x = std::abs(x);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:420:69: error: call of overloaded ‘abs(const float&)’ is ambiguous
gg, toff, A_abs.data(), B_abs.data(), C_abs.data(), std::abs(alpha), std::abs(beta), mowri);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:420:86: error: call of overloaded ‘abs(const float&)’ is ambiguous
gg, toff, A_abs.data(), B_abs.data(), C_abs.data(), std::abs(alpha), std::abs(beta), mowri);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp: In instantiation of ‘MIOpenGEMM::apitest::RunStats MIOpenGEMM::apitest::supa_gemm0(_cl_command_queue*&, const MIOpenGEMM::Geometry&, const MIOpenGEMM::Offsets&, T, T, size_t, bool, MIOpenGEMM::apitest::GemmImpl, bool, MIOpenGEMM::owrite::Writer&, const MIOpenGEMM::setabcw::CpuMemBundle) [with T = double; cl_command_queue = _cl_command_queue; size_t = long unsigned int]’:
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:531:74: required from here
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:402:21: error: call of overloaded ‘abs(double&)’ is ambiguous
x = std::abs(x);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:409:21: error: call of overloaded ‘abs(double&)’ is ambiguous
x = std::abs(x);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:416:21: error: call of overloaded ‘abs(double&)’ is ambiguous
x = std::abs(x);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:420:69: error: call of overloaded ‘abs(const double&)’ is ambiguous
gg, toff, A_abs.data(), B_abs.data(), C_abs.data(), std::abs(alpha), std::abs(beta), mowri);
^
/tmp/cget-00-03-37-Wkv0D-1/download/MIOpenGEMM-3f7555519207aaf203424afff82e7d912e1020e4/miopengemm/src/apitest.cpp:420:86: error: call of overloaded ‘abs(const double&)’ is ambiguous
gg, toff, A_abs.data(), B_abs.data(), C_abs.data(), std::abs(alpha), std::abs(beta), mowri);
^
gmake[2]: *** [miopengemm/CMakeFiles/miopengemm.dir/src/apitest.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [miopengemm/CMakeFiles/miopengemm.dir/all] Error 2
gmake: *** [all] Error 2
CMake Error at /usr/local/share/cmake/cmakeget/CMakeGet.cmake:130 (message):
Process failed:
COMMAND;/root/software/cmake-3.9.4-Linux-x86_64/bin/cmake;--build;/tmp/cget-00-03-37-Wkv0D-1/build;--config;Release;--;-j;4
Call Stack (most recent call first):
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:183 (cget_exec)
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:354 (cget_install_dir)
/usr/local/share/cmake/cmakeget/CMakeGet.cmake:405 (cmake_get)
install_deps.cmake:51 (cmake_get_from)
Could anyone please give me some instructions?
Hello! We have been struggling a lot with the miopenFindConvolutionForwardAlgorithm function: every time we call it we get a segfault. We have allocated memory for the buffers using OpenCL, but we cannot seem to find the problem. We are novice OpenCL programmers and might be allocating memory incorrectly, or doing something else wrong. We have included our code below. All help is much appreciated!
#include <iostream>
#include <cstdio>
#include "activ_driver.hpp"
#include "bn_driver.hpp"
#include "conv_driver.hpp"
#include "driver.hpp"
#include "gemm_driver.hpp"
#include "lrn_driver.hpp"
#include "pool_driver.hpp"
#include "softmax_driver.hpp"
#include "rnn_driver.hpp"
#include "miopen/config.h"
#include <miopen/miopen.h>
#include <miopen/tensor.hpp>
#include <miopen/env.hpp>
#include <miopen/convolution.hpp>
#include <CL/cl.h>
#include <CL/cl.hpp>
#include <CL/cl_ext.h>
#include <CL/cl_gl.h>
#include <CL/cl_gl_ext.h>
#include <CL/cl_platform.h>
#include <CL/opencl.h>
#include <stdio.h>
#include <stdlib.h>
#include <algorithm>
#include <cstdlib>
#include <cstring>
#include <float.h>
#include <fstream>
#include <memory>
using namespace std;
#define MEM_FLAG CL_MEM_ALLOC_HOST_PTR
int main()
{
/* _______DECLARATIONS_______ */
int status = 0; // Error codes
int n = 1; // Mini-Batch size (how many images)
int c = 1; // Nr of channels (eg RGB is 3, grayscale is 1)
int stride = 1;
int padding = 1;
int dilation = 1;
size_t WorkSpaceSize = 0;
int requestAlgoCount = 1;
int returnedAlgoCount = 0;
bool ExhaustiveSearch = 0;
/* _______CREATING THE ENVIRONMENT_______ */
miopenHandle_t Network;
status = miopenCreate(&Network);
if (status!=0)
{
printf("Error in miopencreate. Error code %d",status);
return 0;
}
/* _______CREATING THE INPUT TENSOR_______ */
miopenTensorDescriptor_t InputTensorDesc;
status = miopenCreateTensorDescriptor(&InputTensorDesc);
if (status!=0)
{
printf("Error in miopenCreateTensorDescriptor. Error code %d",status);
return 0;
}
status = miopenSet4dTensorDescriptor( InputTensorDesc,
miopenFloat,
n, //batch size
c, //nr of channels
10, //data height img.rows
10); //data width img.cols
if (status!=0)
{
printf("Error in miopenSet4dTensorDescriptor. Error code %d",status);
return 0;
}
/* _________CREATING THE OUTPUT TENSOR______________ */
miopenTensorDescriptor_t OutputTensorDesc;
status = miopenCreateTensorDescriptor(&OutputTensorDesc);
if (status!=0)
{
printf("Error in miopencreate. Error code %d",status);
return 0;
}
status = miopenSet4dTensorDescriptor( OutputTensorDesc,
miopenFloat,
n,
c, //nr of channels
10, //data height img.rows
10); //data width img.cols
if (status!=0)
{
printf("Error in miopenSet4dTensorDescriptor. Error code %d",status);
return 0;
}
/* _______________KERNEL CREATION_____________________________ */
miopenTensorDescriptor_t KernelDescriptor;
status = miopenCreateTensorDescriptor(&KernelDescriptor);
if (status!=0)
{
printf("Error in miopencreate. Error code %d",status);
return 0;
}
status = miopenSet4dTensorDescriptor( KernelDescriptor,
miopenFloat,
n,
c, // nr of channels
3, // filter height
3); // filter width
if (status!=0)
{
printf("Error in miopenSet4dTensorDescriptor. Error code %d",status);
return 0;
}
/* _________CREATING THE CONVOLUTION DESCRIPTOR______________ */
miopenConvolutionDescriptor_t ConvDesc;
status = miopenCreateConvolutionDescriptor(&ConvDesc);
if (status!=0)
{
printf("Error in miopenCreateConvolutionDescriptor. Error code %d",status);
return 0;
}
status = miopenInitConvolutionDescriptor( ConvDesc,
miopenConvolution,
padding, //h
padding, //w
stride, //h
stride, //w
dilation, //h
dilation); //w
if (status!=0)
{
printf("Error in miopenCreateConvolutionDescriptor. Error code %d",status);
return 0;
}
/* _________CONVOLUTION GET WORKSPACE SIZE______________ */
miopenConvolutionDescriptor_t const const_ConvDesc = ConvDesc;
miopenTensorDescriptor_t const const_KernelDescriptor = KernelDescriptor;
miopenTensorDescriptor_t const const_InputTensorDesc = InputTensorDesc;
miopenTensorDescriptor_t const const_OutputTensorDesc = OutputTensorDesc;
status = miopenConvolutionForwardGetWorkSpaceSize( Network,
const_KernelDescriptor,
const_InputTensorDesc,
const_ConvDesc,
const_OutputTensorDesc,
&WorkSpaceSize );
if (status!=0)
{
printf("Error in miopenConvolutionForwardGetWorkSpaceSize. Err code %d",status);
return 0;
}
cout << "WorkSpaceSize = " << WorkSpaceSize << " bytes\n";
/* _______OPENCL ALLOCATING MEMORY_______ */
cl_platform_id platform; // PLATFORM-TYPE
clGetPlatformIDs(1, &platform, NULL); // ADDRESS OF PLATFORM
cl_device_id device; // DEVICE-TYPE
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL); // ADDRESS OF DEVICE
cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL); // CONTEXT
// QUEUES
cl_command_queue queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);
cl_command_queue queue2 = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);
cl_command_queue queue3 = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);
cl_command_queue queue4 = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);
// DECLARATIONS
char buf[4096]; // CHARBUFFER FOR DEVICE NAME
cl_int errorcode; // FOR ERRORCODES
int buffSize = 1024*3600; // MEMORY TO ALLOCATE FOR WORKSPACE
int imageSize = 1024*sizeof(int); // MEMORY TO ALLOCATE FOR THE IMAGE
int outimageSize = 1024*sizeof(int); // MEMORY TO ALLOCATE FOR THE OUTIMAGE
int kernelWeightsSize = 1024*sizeof(int); // MEMORY TO ALLOCATE FOR THE KERNEL WEIGHTS
// ALLOCATE OBJECTS ON GPU
cl_mem buff = clCreateBuffer(context, MEM_FLAG, buffSize, NULL, NULL);
cl_mem image = clCreateBuffer(context, MEM_FLAG, imageSize, NULL, NULL);
cl_mem outimage = clCreateBuffer(context, MEM_FLAG, outimageSize, NULL, NULL);
cl_mem kernelweights = clCreateBuffer(context, MEM_FLAG, kernelWeightsSize, NULL, NULL);
// QUERY DEVICE NAME
errorcode = clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(buf), buf, NULL);
if (errorcode != CL_SUCCESS)
{
printf("Error getting cl Device Info");
exit(-1);
}
printf("Device Name : %s\n", buf);
// GET POINTER TO MEMORY IN GPU
void *WorkSpace = clEnqueueMapBuffer(queue, buff, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, buffSize, 0, NULL, NULL, &errorcode);
if (errorcode != CL_SUCCESS)
{
printf("Error mapping WorkSpace buffer");
exit(-1);
}
cout << "WorkSpace address = " << WorkSpace << endl;
cout << "Memory allocated = " << buffSize << endl;
int *imageDummy = (int*)clEnqueueMapBuffer(queue2, image, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, imageSize, 0, NULL, NULL, &errorcode);
if (errorcode != CL_SUCCESS)
{
printf("Error mapping image buffer");
exit(-1);
}
cout << "imageDummy address = " << imageDummy << endl;
cout << "Memory allocated = " << imageSize << endl;
// CREATING DUMMY IMAGE
for (int i = 0; i < 100; i++)
{
imageDummy[i] = 1;
}
// TESTING IF IMAGE IS ON GPU
for (int i = 0; i < 100; i++)
{
cout << imageDummy[i] << " ";
}
cout << "\n\n\n";
// GET POINTER TO MEMORY IN GPU
void *outimagePtr = clEnqueueMapBuffer(queue3, outimage, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, outimageSize, 0, NULL, NULL, &errorcode);
if (errorcode != CL_SUCCESS)
{
printf("Error mapping outimage buffer");
exit(-1);
}
cout << "outimagePtr address = " << outimagePtr << endl;
cout << "Memory allocated = " << outimageSize << endl;
void *kernelWeightsPtr = clEnqueueMapBuffer(queue4, kernelweights, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, kernelWeightsSize, 0, NULL, NULL, &errorcode);
if (errorcode != CL_SUCCESS)
{
printf("Error mapping kernelweights buffer");
exit(-1);
}
/* _________CONVOLUTION FIND FORWARD ALGORITM______________ */
miopenConvAlgoPerf_t PrefConvAlg;
status = miopenFindConvolutionForwardAlgorithm( Network,
const_InputTensorDesc,
imageDummy, //Data tensor
const_KernelDescriptor,
kernelWeightsPtr, //Weights tensor
const_ConvDesc,
const_OutputTensorDesc,
outimagePtr, //Data tensor,
requestAlgoCount, //request algorithm count
&returnedAlgoCount, //returned Alg Count
&PrefConvAlg, //Pointer to union of best algo returned
WorkSpace, //Ptr to workspace required for the search
WorkSpaceSize, //size in bytes of the memory needed for find
ExhaustiveSearch //A boolean to toggle a full search of all algo and config
);
cout << "Find Algorithm finished\n";
if (status!=0)
{
printf("Error in miopenFindConvolutionForwardAlgorithm. Error code %d\n",status);
return 0;
}
/* _________UNMAP THE POINTERs & CLEANING______________ */
clEnqueueUnmapMemObject(queue, buff, WorkSpace, 0, NULL, NULL);
clEnqueueUnmapMemObject(queue2, image, imageDummy, 0, NULL, NULL);
clEnqueueUnmapMemObject(queue3, outimage, outimagePtr, 0, NULL, NULL);
clEnqueueUnmapMemObject(queue4, kernelweights, kernelWeightsPtr, 0, NULL, NULL);
clReleaseMemObject(buff);
clReleaseMemObject(image);
clReleaseMemObject(outimage);
clReleaseMemObject(kernelweights);
clReleaseCommandQueue(queue);
clReleaseCommandQueue(queue2);
clReleaseCommandQueue(queue3);
clReleaseCommandQueue(queue4);
clReleaseContext(context);
miopenDestroyConvolutionDescriptor(ConvDesc);
miopenDestroyTensorDescriptor(InputTensorDesc);
miopenDestroyTensorDescriptor(KernelDescriptor);
miopenDestroyTensorDescriptor(OutputTensorDesc);
miopenDestroy(Network);
return 0;
}
When I ran the CaffeNet inference sample after following the installation steps in https://github.com/ROCmSoftwarePlatform/hipCaffe/blob/hip/README.ROCm.md, I observed the following warning message. I obtained the same result with the example in the hipCaffe quick start guide. Could you please let me know if this warning can be ignored?
MIOpen(HIP): Warning [ReadFile] File is unreadable.
It looks like the warning is caused by a failure to access /opt/rocm/miopen/share/miopen/db/gfx803_16.cd.pdb.txt, which does not exist. There are only the following files under /opt/rocm/miopen/share/miopen/db/ in my environment.
gfx803_64.cd.pdb.txt gfx900_64.cd.pdb.txt
GPU: RX-560
OS: Ubuntu 16.04.3
Package versions:
rocm-dkms/rocm-libs 1.7.60
hip_base 1.4.17494
hcc 1.1.17493
miopen-hip 1.2.0-a9949e3
miopengemm 1.1.5-3c3145b
Hello. I found that MIOpen can be built with the OpenCL backend without ROCm. Doing so on a non-ROCm OpenCL platform (NVIDIA/Intel OpenCL, to be specific) makes MIOpen fail half of its tests. Applying the fix from #5 lets MIOpen pass 8 tests (tests 5 and 6 failed).
Here is the log.
Is there any chance that this can be fixed so that MIOpen can run on non-AMD hardware? I think it would be a great feature for MIOpen to have: with a cross-platform MIOpen, developers could get decent performance on any hardware, which would make developers' lives easier and bring more users to MIOpen.
Hey everyone,
I get this error when I try to build and run the tests:
Start 1: test_activation
1/12 Test #1: test_activation ..................***Exception: Other 0.28 sec
FAILED: /ROCm/MIOpen/src/ocl/clhelper.cpp:101: Error Building OpenCL Program in BuildProgram()
Error: The binary is incorrect or incomplete. Finalization to ISA couldn't be performed.
Build Program Failure
OS: ubuntu 16.04
ROCm Kernel: 4.11.0-kfd-compute-rocm-rel-1.6-180
GPU: WX 7100
The build process crashed on this line:
clhelper.cpp:88:
auto status = clBuildProgram(program, 1, &device, params.c_str(), nullptr, nullptr);
When I build HelloWorld to test whether ROCm installed successfully, I get no errors, although HelloWorld.cpp calls the same function.
HelloWorld sample:
wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp
wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cl
The output of cmake -DMIOPEN_BACKEND=OpenCL ..
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OPENCL: /usr/lib/libOpenCL.so
-- OpenCL backend selected.
-- AMDGCN assembler: MIOPEN_AMDGCN_ASSEMBLER-NOTFOUND
-- Build with miopengemm
-- Found OpenSSL: /usr/local/lib/libssl.so;/usr/local/lib/libcrypto.so (found version "2.0.0")
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:140 (find_package)
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OPENCL: /usr/lib/libOpenCL.so
-- OpenCL backend selected.
-- AMDGCN assembler: MIOPEN_AMDGCN_ASSEMBLER-NOTFOUND
-- Build with miopengemm
-- Found OpenSSL: /usr/local/lib/libssl.so;/usr/local/lib/libcrypto.so (found version "2.0.0")
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:140 (find_package)
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:140 (find_package)
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- Clang tidy not found
-- Clang tidy checks: *,-android-cloexec-fopen,-cert-err60-cpp,-cert-msc30-c,-cert-msc50-cpp,-clang-analyzer-alpha.core.CastToStruct,-clang-analyzer-optin.performance.Padding,-clang-diagnostic-deprecated-declarations,-clang-diagnostic-extern-c-compat,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-constant-array-index,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-member-init,-cppcoreguidelines-pro-type-reinterpret-cast,-cppcoreguidelines-pro-type-union-access,-cppcoreguidelines-pro-type-vararg,-cppcoreguidelines-special-member-functions,-google-explicit-constructor,-google-readability-braces-around-statements,-google-readability-todo,-google-runtime-int,-google-runtime-references,-hicpp-braces-around-statements,-hicpp-explicit-conversions,-hicpp-signed-bitwise,-hicpp-special-member-functions,-hicpp-use-equals-default,-hicpp-use-override,-llvm-header-guard,-llvm-include-order,-misc-macro-parentheses,-misc-misplaced-const,-misc-misplaced-widening-cast,-modernize-loop-convert,-modernize-pass-by-value,-modernize-use-default-member-init,-modernize-use-emplace,-modernize-use-equals-default,-modernize-use-transparent-functors,-performance-unnecessary-value-param,-readability-braces-around-statements,-readability-else-after-return,-readability-implicit-bool-cast,-readability-implicit-bool-conversion,-readability-misleading-indentation,-readability-named-parameter
-- Could NOT find LATEX (missing: LATEX_COMPILER)
Latex builder not found. Latex builder is required only for building the PDF documentation for MIOpen and is not necessary for building the library, or any other components. To build PDF documentation run make in /ROCm/MIOpen/doc/pdf, once a latex builder is installed.
-- MIOpen_VERSION= 1.1.3
-- CMAKE_BUILD_TYPE= Release
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- MIOpen linking OpenCL: /usr/include
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Configuring done
WARNING: Target "MIOpenDriver" has EXCLUDE_FROM_ALL set and will not be built by default but an install rule has been provided for it. CMake does not define behavior for this case.
-- Generating done
-- Build files have been written to: /ROCm/MIOpen/build
As convolutions with alpha and beta = 1.0 are not implemented in MIOpen yet, I was wondering if it is possible to achieve this with some kind of workaround?
I tried miopenOpTensor, but that doesn't work since the input and output tensors do not match.
I'm also a bit unsure about the blending of the computation using alpha and beta. The cuDNN documentation says dstValue = alpha[0]*computedValue + beta[0]*priorDstValue.
I don't see how this yDesc = yDesc * alpha + xDesc * beta
operation would work if the dimensions are different.
Any suggestions are much appreciated!
I'm trying to run the rnn_bench from DeepBench on multiple HW platforms. In my case, I'm using MIOpen (v1.3.0 -- HEAD of master on GitHub) built with debug flags, and rnn_bench fails as follows:
(...)
Assertion `x.size() == sizeof...(Ns)' failed. (from tensor.hpp, tie_impl)
I added a print to tie_impl, and here are the values of x and Ns:
x.size(): 4, sizeof...(Ns): 2
If I run the exact same benchmark, but built with a release version of MIOpen, it works. I ran valgrind on the debug version, and have attached my latest valgrind trace. There are several invalid reads which may be contributing.
valgrind.rnnBench.miopen130.txt
This is the backtrace with gdb (in frame 4 you can see x is size 4 and Ns is size 2 with the values 0 and 1):
(gdb) bt
#0 0x00007ffff56e3428 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff56e502a in __GI_abort () at abort.c:89
#2 0x00007ffff56dbbd7 in __assert_fail_base (fmt=<optimized out>,
assertion=assertion@entry=0x7ffff64c1b24 "x.size() == sizeof...(Ns)",
file=file@entry=0x7ffff64c1a35 "/home-nis/msinclai/downloads/new/MIOpen/src/include/miopen/tensor.hpp", line=line@entry=48,
function=function@entry=0x7ffff64cec6d "<dependent type> miopen::tie_impl(T &&, detail::seq<Ns...>) [T = const std::vector<unsigned long, std::allocator<unsigned long> > &, Ns = <0, 1>]") at assert.c:92
#3 0x00007ffff56dbc82 in __GI___assert_fail (
assertion=0x7ffff64c1b24 "x.size() == sizeof...(Ns)",
file=0x7ffff64c1a35 "/home-nis/msinclai/downloads/new/MIOpen/src/include/miopen/tensor.hpp", line=48,
function=0x7ffff64cec6d "<dependent type> miopen::tie_impl(T &&, detail::seq<Ns...>) [T = const std::vector<unsigned long, std::allocator<unsigned long> > &, Ns = <0, 1>]") at assert.c:101
#4 0x00007ffff63325c3 in miopen::tie_impl<std::vector<unsigned long, std::allocator<unsigned long> > const&, 0ul, 1ul> (
x=std::vector of length 4, capacity 4 = {...})
at /home-nis/msinclai/downloads/new/MIOpen/src/include/miopen/tensor.hpp:48
#5 0x00007ffff632ebef in miopen::tien<2ul, std::vector<unsigned long, std::allocator<unsigned long> > const&> (x=std::vector of length 4, capacity 4 = {...})
at /home-nis/msinclai/downloads/new/MIOpen/src/include/miopen/tensor.hpp:59
#6 0x00007ffff6438ed9 in miopen::RNNDescriptor::RNNForwardTraining (
this=0xbe7340, handle=..., seqLen=50, xDesc=..., x=0x7ffeeb31b010,
hxDesc=..., hx=0x10976e0, cxDesc=..., cx=0x10a7780, wDesc=...,
w=0x7ffef0025010, yDesc=..., y=0xbe75f0, hyDesc=..., hy=0x109f730,
cyDesc=..., cy=0x10af7d0, workSpace=0x7ffeeb7cd010, workSpaceSize=1638400,
reserveSpace=0x7ffeeb4ac010, reserveSpaceSize=3276800)
at /home-nis/msinclai/downloads/new/MIOpen/src/ocl/rnnocl.cpp:1086
#7 0x00007ffff623aee2 in miopenRNNForwardTraining::$_18::operator() (
this=0x7fffffffc930)
at /home-nis/msinclai/downloads/new/MIOpen/src/rnn_api.cpp:386
#8 0x00007ffff62341d9 in miopen::try_<miopenRNNForwardTraining::$_18> (f=...,
output=true)
at /home-nis/msinclai/downloads/new/MIOpen/src/include/miopen/errors.hpp:72
#9 0x00007ffff623419a in miopenRNNForwardTraining (handle=0xbe71f0,
rnnDesc=0xbe7340, sequenceLen=50, xDesc=0xbc45e0, x=0x7ffeeb31b010,
hxDesc=0xbe6200, hx=0x10976e0, cxDesc=0xbe6580, cx=0x10a7780,
wDesc=0xbe7450, w=0x7ffef0025010, yDesc=0xbdfee0, y=0xbe75f0,
hyDesc=0xbe62e0, hy=0x109f730, cyDesc=0xbe6660, cy=0x10af7d0,
workSpace=0x7ffeeb7cd010, workSpaceNumBytes=1638400,
reserveSpace=0x7ffeeb4ac010, reserveSpaceNumBytes=3276800)
at /home-nis/msinclai/downloads/new/MIOpen/src/rnn_api.cpp:382
#10 0x00000000004ae777 in miopenRNN::forward (this=0x7fffffffdd68, x=...,
hx=..., cx=..., y=..., hy=..., cy=...) at rnn_bench_rocm.cpp:113
#11 0x0000000000484367 in time_rnn<float> (hidden_size=256, batch_size=32,
time_steps=50, type="vanilla", inference=0) at rnn_bench_rocm.cpp:196
#12 0x0000000000483778 in main (argc=1, argv=0x7fffffffe568)
at rnn_bench_rocm.cpp:280
Daniel L. thinks this is happening because cuDNN sends through 3-d tensors, but MIOpen sends through 2-d tensors -- and somewhere it is getting caught.
Hello. I have been studying MIOpen's source code since its launch. After a while, I found that MIOpen can be built on a non-ROCm system with the OpenCL backend (not sure if MIOpen can be built with HIP on Nvidia platforms). However, building and running MIOpen with OpenCL on a non-ROCm system fails half (5/10) of its tests, with something throwing std::bad_alloc.
Digging into the source, I found the following code snippet.
/* First, get the size of device list data */
size_t deviceListSize;
if(clGetContextInfo(
impl->context.get(), CL_CONTEXT_NUM_DEVICES, sizeof(size_t), &deviceListSize, nullptr) !=
CL_SUCCESS)
{
MIOPEN_THROW("Error: Getting Handle Info (device list size, clGetContextInfo)");
}
if(deviceListSize == 0)
{
MIOPEN_THROW("Error: No devices found.");
}
std::vector<cl_device_id> devices(deviceListSize);
According to clGetContextInfo's documentation, when param_name is CL_CONTEXT_NUM_DEVICES, the return type should be cl_uint, not size_t.
This causes deviceListSize to be set incorrectly: it ends up holding a large value, so the std::vector tries to allocate a huge amount of memory and throws.
After changing the type of deviceListSize to cl_uint and changing sizeof(size_t) to sizeof(cl_uint), 8/10 tests pass (tests 5 and 6 fail; I'll open another issue for that).
I guess this is a compatibility issue. Will/can this be fixed in a future version of MIOpen?
Tested on:
Manjaro Linux with the Intel OpenCL SDK for CPU and the Nvidia OpenCL runtime (not sure which one MIOpen uses).
Hi, dear friends!
We are investigating MIOpen's potential for use in our deep learning applications, but Resnet-52 shows strangely long times for layers with 3x3 convolutions (batch size 64).
For 1x1 convolutions the AMD Vega often performs on a par with a GTX 1080 Ti, but for 3x3 convolutions it is up to 10 times slower.
For example,
{28, 28, 128} -> {28, 28, 128} by filter {3, 3} with stride {1, 1}:
NVIDIA 1080 Ti: 13.6 µs (73470 ips)
AMD RX VEGA: 127.2 µs (7862 ips)
All layers perf in TSV: https://gist.github.com/hex000/de2aebf622d2120fb6a57c42a0c7d90e
Here the time is given per image. The test runs for a full second before hipDeviceSynchronize(), so the sync time is amortized.
What can be wrong here? Maybe I haven't set some secret flag? The algorithm chosen by FindConvolution is always "miopenConvolutionFwdAlgoWinograd", and for such convolutions it offers no other options.
The test code is at https://gist.github.com/hex000/81fe99ae8c10f4384e64960275e91554, it compiles both for AMD and NVIDIA.
So I'm running https://github.com/NLPLearn/QANet with tensorflow-upstream and I've had to cut my batch_size down to nothing to fit the model onto the GPU.
This is what happens when I try to train the model:
Building model...
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/layers.py:52: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:134: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:174: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
Total number of trainable parameters: 788673
2018-05-25 22:40:06.553366: W tensorflow/stream_executor/rocm/rocm_driver.cc:404] creating context when one is currently active; existing: 0x7f3818e3d580
2018-05-25 22:40:06.553526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
name: Device 67df
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.34
pciBusID 0000:09:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB
2018-05-25 22:40:06.553537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-05-25 22:40:06.553548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-25 22:40:06.553567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995] 0
2018-05-25 22:40:06.553575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0: N
2018-05-25 22:40:06.553612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Device 67df, pci bus id: 0000:09:00.0)
0%| | 0/60000 [00:00<?, ?it/s]2018-05-25 22:40:41.253877: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:41.253878: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:42.476336: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:43.035086: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:45.046033: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:45.047533: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:46.301007: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:46.983404: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:47.838168: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:48.067349: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:49.404750: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:49.955002: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
Memory access fault by GPU node-1 on address 0x58a404000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
The authors of the project are using a similarly sized GPU (though with twice as much desktop RAM; not sure if that is the problem) and aren't having to drop their batch size down to around 4 to fit the model on their GPU.
From localminimum/QANet#2
"""
Hi @kamalkraj I uploaded the most recent model pretrained weights (EM/F1 = 70.0/79.4) and you can download it here.
The specification of the system I used is:
CPU: i7-3930K CPU @ 3.20GHz
GPU: GTX1080 (8GB)
RAM: 16GB
Training takes about 5~8 hours depending on your gpu/cpu spec. The model takes about 8 GB gpu memory so if you're using anything bigger than 96 as your hidden unit size then you'll get an OOM error. Or if you are using a preoccupied GPU it will also cause an OOM error.
NOTE: If you are using your desktop GPU, try running it in terminal mode (alt + ctrl + F1) and close all applications that require gpu memory (e.g. Xorg)
sudo service lightdm stop
python config.py --mode train
after training,
sudo service lightdm start
"""
I followed the advice to shut down all the other applications and just use the terminal, too. It still won't fit. Any idea why this is happening? My RX 580 is supposed to have the same amount of memory. Curious as to what's going on 😕
$ apt list | grep miopen
miopen-hip/Ubuntu 16.04,now 1.4.2-0258028 amd64 [installed]
miopengemm/Ubuntu 16.04,now 1.1.5-9547fb9 amd64 [installed]
Running the tf_cnn_benchmarks.py as follows with tensorflow-upstream r1.8-rocm branch:
MIOPEN_FIND_ENFORCE=3 python tf_cnn_benchmarks.py --model=vgg16 --batch_size=64 --num_gpus=1 --num_batches=1 --num_warmup_batches=0
Possibly a separate bug, but I saw the output:
2018-08-29 09:42:38.991402: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
miopenFindConvolutionBackwardWeightsAlgorithm: ./bin/MIOpenDriver conv -n 64 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -t 1
MIOpen(HIP): Error [FindSolutionImpl] Search failed for: ConvAsmBwdWrW3x3: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 411041792
So I switched to using MIOpenDriver without all of TF. Without MIOPEN_FIND_ENFORCE=3, this was the output:
./bin/MIOpenDriver conv -n 64 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -t 1
MIOpenDriver: conv -n 64 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -t 1
MIOpen Forward Conv. Algorithm: 3
GPU Kernel Time Forward Conv. Elapsed: 18.477495 ms
Forward Convolution Verifies on CPU and GPU (5.68537e-08)
MIOpen Backward Data Conv. Algorithm: 3
GPU Kernel Time Backward Data Conv. Elapsed: 17.817320 ms
MIOpen Backward Weights Conv. Algorithm: 0
GPU Kernel Time Backward Weights Conv. Elapsed: 61.924541 ms
Backward Convolution Data Verifies on CPU and GPU (6.28033e-08)
Backward Convolution Weights Verifies on CPU and GPU (3.45614e-07)
Using MIOPEN_FIND_ENFORCE=3 selected a failing kernel:
MIOPEN_FIND_ENFORCE=3 ./bin/MIOpenDriver conv -n 64 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -t 1
MIOpenDriver: conv -n 64 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -t 1
MIOpen Forward Conv. Algorithm: 3
GPU Kernel Time Forward Conv. Elapsed: 18.349342 ms
Forward Convolution Verifies on CPU and GPU (5.68537e-08)
MIOpen Backward Data Conv. Algorithm: 3
GPU Kernel Time Backward Data Conv. Elapsed: 18.237505 ms
MIOpen(HIP): Warning [GenericSearch] ConvAsmBwdWrW3x3: Searching the best solution among 5416...
MIOpen(HIP): Warning [Monitor] 10/0/5416 76.4986, best within recent 11: 76.4986 #2 2,0,8,1,1,1, ETA:1829.38 sec.
MIOpen(HIP): Warning [Monitor] 18/0/5416 76.0711, best within recent 8: 76.0711 #12 2,1,8,1,1,1, ETA:2048.48 sec.
MIOpen(HIP): Warning [Monitor] 32/0/5416 53.4908, best within recent 14: 53.4908 #23 3,0,16,1,1,1, ETA:1662.38 sec.
MIOpen(HIP): Warning [Monitor] 45/0/5416 45.7382, best within recent 13: 45.7382 #45 5,0,8,2,1,1, ETA:1559.3 sec.
MIOpen(HIP): Warning [Monitor] 55/0/5416 45.3974, best within recent 10: 45.3974 #52 2,1,8,2,1,1, ETA:1579.78 sec.
MIOpen(HIP): Warning [Monitor] 72/0/5416 45.3929, best within recent 17: 45.3929 #57 7,1,8,2,1,1, ETA:1428.14 sec.
MIOpen(HIP): Warning [Monitor] 86/0/5416 39.2118, best within recent 14: 39.2118 #82 2,0,8,4,1,1, ETA:1388.66 sec.
MIOpen(HIP): Warning [Monitor] 95/0/5416 38.3107, best within recent 9: 38.3107 #92 2,1,8,4,1,1, ETA:1423.62 sec.
MIOpen(HIP): Warning [Monitor] 108/0/5416 37.8397, best within recent 13: 37.8397 #102 2,0,16,4,1,1, ETA:1397.33 sec.
MIOpen(HIP): Warning [Monitor] 124/0/5416 37.8397, best within recent 16: 40.2554 #109 9,0,16,4,1,1, ETA:1349.06 sec.
MIOpen(HIP): Warning [Monitor] 135/0/5416 37.8397, best within recent 11: 44.4192 #134 4,1,8,8,1,1, ETA:1354 sec.
MIOpen(HIP): Warning [Monitor] 148/0/5416 37.8397, best within recent 13: 44.4951 #137 7,1,8,8,1,1, ETA:1340.02 sec.
MIOpen(HIP): Warning [Monitor] 164/0/5416 37.8397, best within recent 16: 70.7134 #161 1,0,16,1,2,1, ETA:1308.35 sec.
MIOpen(HIP): Warning [Monitor] 180/0/5416 37.8397, best within recent 16: 51.9839 #180 0,0,8,2,2,1, ETA:1279.06 sec.
MIOpen(HIP): Warning [Monitor] 194/0/5416 37.8397, best within recent 14: 42.9275 #181 1,0,8,2,2,1, ETA:1265.97 sec.
MIOpen(HIP): Warning [Monitor] 210/0/5416 37.8397, best within recent 16: 40.292 #201 1,0,16,2,2,1, ETA:1243.87 sec.
MIOpen(HIP): Warning [Monitor] 223/0/5416 37.8397, best within recent 13: 42.4137 #211 1,1,16,2,2,1, ETA:1239.31 sec.
MIOpen(HIP): Warning [Monitor] 233/0/5416 37.8397, best within recent 10: 43.6917 #229 9,0,8,4,2,1, ETA:1252.31 sec.
MIOpen(HIP): Warning [Monitor] 244/0/5416 32.7761, best within recent 11: 32.7761 #242 2,0,16,4,2,1, ETA:1263.28 sec.
MIOpen(HIP): Warning [Monitor] 253/0/5416 32.7761, best within recent 9: 36.442 #251 1,1,16,4,2,1, ETA:1279.37 sec.
MIOpen(HIP): Warning [Monitor] 261/0/5416 32.7761, best within recent 8: 43.0791 #256 6,1,16,4,2,1, ETA:1301.46 sec.
MIOpen(HIP): Warning [Monitor] 267/0/5416 32.7761, best within recent 6: 45.4455 #263 3,0,8,8,2,1, ETA:1329.88 sec.
MIOpen(HIP): Warning [Monitor] 273/0/5416 32.7761, best within recent 6: 45.748 #272 2,1,8,8,2,1, ETA:1356.6 sec.
MIOpen(HIP): Warning [Monitor] 279/0/5416 32.7761, best within recent 6: 45.9136 #277 7,1,8,8,2,1, ETA:1383.04 sec.
MIOpen(HIP): Warning [Monitor] 294/0/5416 32.7761, best within recent 15: 77.2678 #285 5,0,8,1,3,1, ETA:1364.04 sec.
MIOpen(HIP): Warning [Monitor] 308/0/5416 32.7761, best within recent 14: 77.6754 #296 6,1,8,1,3,1, ETA:1349.86 sec.
MIOpen(HIP): Warning [Monitor] 323/0/5416 32.7761, best within recent 15: 43.2013 #321 1,0,8,2,3,1, ETA:1333.66 sec.
MIOpen(HIP): Warning [Monitor] 336/0/5416 32.7761, best within recent 13: 43.2116 #329 9,0,8,2,3,1, ETA:1326.67 sec.
MIOpen(HIP): Warning [Monitor] 350/0/5416 32.7761, best within recent 14: 43.2379 #337 7,1,8,2,3,1, ETA:1314.85 sec.
MIOpen(HIP): Warning [Monitor] 362/0/5416 32.7761, best within recent 12: 38.196 #351 1,1,16,2,3,1, ETA:1310.16 sec.
MIOpen(HIP): Warning [Monitor] 370/0/5416 32.7761, best within recent 8: 45.2074 #368 8,0,8,4,3,1, ETA:1321.09 sec.
MIOpen(HIP): Warning [Monitor] 378/0/5416 32.7761, best within recent 8: 45.657 #376 6,1,8,4,3,1, ETA:1333.9 sec.
MIOpen(HIP): Warning [Monitor] 387/0/5416 32.3893, best within recent 9: 32.3893 #380 0,0,16,4,3,1, ETA:1342.44 sec.
MIOpen(HIP): Warning [Monitor] 396/0/5416 32.3893, best within recent 9: 32.4194 #389 9,0,16,4,3,1, ETA:1347.62 sec.
MIOpen(HIP): Warning [Monitor] 406/0/5416 32.386, best within recent 10: 32.386 #397 7,1,16,4,3,1, ETA:1349.91 sec.
MIOpen(HIP): Warning [Monitor] 417/0/5416 32.386, best within recent 11: 78.2849 #412 2,1,8,1,4,1, ETA:1348.55 sec.
--snip long output--
MIOpen(HIP): Warning [Monitor] 5407/0/5416 32.386, best within recent 8: 64.2473 #5407 0,1,16,1,7,8, ETA:2.76698 sec.
MIOpen(HIP): Warning [GenericSearch] Done: 5416/0/5416, best #397 32.386 7,1,16,4,3,1
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.51211 (default time 48.9711)
MIOpen Backward Weights Conv. Algorithm: 1
GPU Kernel Time Backward Weights Conv. Elapsed: 32.823360 ms
Backward Convolution Data Verifies on CPU and GPU (6.28033e-08)
Backward Convolution Weights Failed: 1.08136e-06
Now when I run the same MIOpenDriver config, it's using the failing solution:
./bin/MIOpenDriver conv -n 64 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -t 1
MIOpenDriver: conv -n 64 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -t 1
MIOpen Forward Conv. Algorithm: 3
GPU Kernel Time Forward Conv. Elapsed: 18.376612 ms
Forward Convolution Verifies on CPU and GPU (5.68537e-08)
MIOpen Backward Data Conv. Algorithm: 3
GPU Kernel Time Backward Data Conv. Elapsed: 18.370230 ms
MIOpen Backward Weights Conv. Algorithm: 1
GPU Kernel Time Backward Weights Conv. Elapsed: 48.794930 ms
Backward Convolution Data Verifies on CPU and GPU (6.28033e-08)
Backward Convolution Weights Failed: 1.08136e-06
Input shape = [1,3,800,1280]
Output shape = [1,3,50,80]
miopenPoolingForward returns the error: local memory limit exceeded (72900) in mloPooling.
Even if I increase the output shape to [1,3,100,160], I still get this error. miopenPoolingForward works only if I halve the input shape.
Any idea how to fix this? Thanks.
Hi.
Since v1.3.0 adds preliminary fp16 support for CNN inference, I am looking forward to seeing excellent performance from Vega.
Cheers!