Comments (12)
I wonder whether there is residue in ~/.cache from previous hardware or software versions.
Try removing the ~/.cache folder and repeating the same test.
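If deleting the whole ~/.cache folder feels too drastic, a more cautious variant is to move only the MIOpen cache aside and rerun the test. The sketch below is a hypothetical helper, not part of MIOpen, and it assumes the kernel cache lives under ~/.cache/miopen (adjust the path if yours differs).

```shell
# Hypothetical helper: move a cache directory aside instead of deleting it,
# so it can be restored if the cache turns out not to be the problem.
backup_cache() {
    cache_dir="$1"
    if [ ! -d "$cache_dir" ]; then
        echo "no cache at $cache_dir"
        return 0
    fi
    # Timestamp suffix keeps repeated backups from colliding.
    backup_dir="${cache_dir}.bak.$(date +%s)"
    mv "$cache_dir" "$backup_dir" && echo "moved $cache_dir to $backup_dir"
}

# Usage (assumed cache location):
#   backup_cache "$HOME/.cache/miopen"
```

After the move, rerun the failing test so the kernels are rebuilt from scratch.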
from miopen.
Thank you, @rgiduthuri, for your answer.
Now I get: 92% tests passed, 1 tests failed out of 12
The error still persists in test_bn_spatial_test:
Start 4: test_bn_spatial_test
4/12 Test #4: test_bn_spatial_test .............***Exception: Other 0.41 sec
FAILED: /ROCm/MIOpen/src/ocl/clhelper.cpp:101: Error Building OpenCL Program in BuildProgram()
2 errors generated.
error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
Build Program Failure
This is the output of: ./build/bin/test_bn_spatial_test
FAILED:/ROCm/MIOpen-master/src/ocl/clhelper.cpp:101: Error Building OpenCL Program in BuildProgram()
/tmp/OCL17328T3.cl:160:24: error: use of unknown builtin '__builtin_amdgcn_mov_dpp'
*value += as_float(__builtin_amdgcn_mov_dpp(as_int(*value), 0x111, 0xF, 0xF, 0));
^
/tmp/OCL17328T3.cl:168:23: error: use of unknown builtin '__builtin_amdgcn_readlane'
*value = as_float(__builtin_amdgcn_readlane(as_int(*value), 63));
^
/tmp/OCL17328T3.cl:168:23: note: did you mean '__builtin_amdgcn_mov_dpp'?
/tmp/OCL17328T3.cl:160:24: note: '__builtin_amdgcn_mov_dpp' declared here
*value += as_float(__builtin_amdgcn_mov_dpp(as_int(*value), 0x111, 0xF, 0xF, 0));
^
/home/foreman/sources/stream/opencl/compiler/clc2/ocl-headers/build/lnx64a/B_rel/opencl12_builtins.h:5511:39: note: expanded from macro 'as_float'
#define as_float(x) __builtin_astype((x), float)
^
2 errors generated.
error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
Build Program Failure
Forward Train Spatial Batch Normalization:
Input tensor: 4, 64, 28, 28
terminate called after throwing an instance of 'miopen::Exception'
what(): /ROCm/MIOpen-master/src/ocl/clhelper.cpp:101: Error Building OpenCL Program in BuildProgram()
/tmp/OCL17328T3.cl:160:24: error: use of unknown builtin '__builtin_amdgcn_mov_dpp'
*value += as_float(__builtin_amdgcn_mov_dpp(as_int(*value), 0x111, 0xF, 0xF, 0));
^
/tmp/OCL17328T3.cl:168:23: error: use of unknown builtin '__builtin_amdgcn_readlane'
*value = as_float(__builtin_amdgcn_readlane(as_int(*value), 63));
^
/tmp/OCL17328T3.cl:168:23: note: did you mean '__builtin_amdgcn_mov_dpp'?
/tmp/OCL17328T3.cl:160:24: note: '__builtin_amdgcn_mov_dpp' declared here
*value += as_float(__builtin_amdgcn_mov_dpp(as_int(*value), 0x111, 0xF, 0xF, 0));
^
/home/foreman/sources/stream/opencl/compiler/clc2/ocl-headers/build/lnx64a/B_rel/opencl12_builtins.h:5511:39: note: expanded from macro 'as_float'
#define as_float(x) __builtin_astype((x), float)
^
2 errors generated.
error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
Build Program Failure
Aborted (core dumped)
@daniellowell, unfortunately the error could not be bypassed by uncommenting:
//#ifdef __AMDGCN__
//#undef __AMDGCN__
//#endif
The output of clinfo is:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2442.7)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon (TM) Pro WX 7100 Graphics
Device Topology: PCI[ B#2, D#0, F#0 ]
Max compute units: 36
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1243Mhz
Address bits: 64
Max memory allocation: 4244635648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8349130752
Constant buffer size: 4244635648
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f384e1cf478
Name: Ellesmere
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2442.7
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2442.7)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Ellesmere is gfx803, so that should be OK. Can you run:
rocm_agent_enumerator -t all
It will give you output like:
Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz
gfx000
gfx900
For some reason your front end compiler is not picking up the supported hardware instructions. I don't know if you have another OpenCL device that is being targeted by the compiler.
As for the workaround, there is actually a bug there. It looks like the guard
#ifdef __AMDGCN__
...
#endif
needs to be added in three other places in those files:
MIOpenBatchNormFwdTrainSpatial.cl
Start Line 157:
#ifdef __AMDGCN__
static inline void dppRegReduce64(_FLOAT* value, _FLOAT scale)
...
#endif
End line 216
MIOpenBatchNormFwdTrainSpatial.cl
Start Line 776:
#ifdef __AMDGCN__
unsigned int segment = MIO_BN_GRP1 >> 6;
#endif
End line 778
MIOpenBatchNormBwdSpatial.cl
Start line 151:
#ifdef __AMDGCN__
static inline void dppRegReduce64(_FLOAT* value, _FLOAT scale)
...
#endif
You can add those manually and rebuild, or wait for my fix to the MIOpen GitHub repository later today.
Daniel Lowell
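To see where guards may be missing before editing by hand, it can help to list every line in a kernel file that uses an AMDGCN builtin next to every line that mentions the __AMDGCN__ macro. This is a rough sketch using plain grep; the function name is made up for illustration, and it only lists lines, it does not verify nesting.

```shell
# Hypothetical helper: print (with line numbers) each line that either calls
# an __builtin_amdgcn_* intrinsic or references the __AMDGCN__ macro.
# A builtin with no "#ifdef __AMDGCN__" line above it is a candidate for a
# missing guard.
list_amdgcn_lines() {
    grep -nE '__builtin_amdgcn_|__AMDGCN__' "$1"
}

# Usage (file name taken from the comment above, path is up to you):
#   list_amdgcn_lines MIOpenBatchNormFwdTrainSpatial.cl
```

The output still has to be read by eye; the matching #endif lines are not shown because they do not mention the macro.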
The output of:
rocm_agent_enumerator -t all
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz
gfx000
gfx803
Hey @reger-men where did you get your OpenCL?
You should be using rocm-opencl and rocm-opencl-dev with MIOpen.
I noticed you have this in your cmake output:
-- Found OPENCL: /usr/lib/libOpenCL.so
The install directory for rocm-opencl is /opt/rocm/opencl.
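A quick way to check which library cmake actually picked up is to resolve the path it reported and compare it with the rocm-opencl install directory. The helper below is a hypothetical sketch that relies on GNU `readlink -f`; the default prefix is the /opt/rocm/opencl directory mentioned above.

```shell
# Hypothetical helper: resolve a library path through any symlinks and report
# whether the real file lives under the expected install prefix.
check_opencl_lib() {
    ocl_lib="$1"
    ocl_prefix="${2:-/opt/rocm/opencl}"
    ocl_resolved=$(readlink -f "$ocl_lib")
    case "$ocl_resolved" in
        "$ocl_prefix"/*) echo "ok: $ocl_lib -> $ocl_resolved" ;;
        *) echo "unexpected: $ocl_lib -> $ocl_resolved"; false ;;
    esac
}

# Usage:
#   check_opencl_lib /usr/lib/libOpenCL.so
```

An "ok" result only shows where the loader library lives; it says nothing about which vendor runtime the loader will pick at run time.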
When I compile with: cmake -DMIOPEN_BACKEND=OpenCL -DOPENCL_LIBRARIES=/opt/rocm/opencl/lib/x86_64/libOpenCL.so -DOPENCL_INCLUDE_DIRS=/opt/rocm/opencl/include ..
I get:
FAILED: /home/lashab/Desktop/OpenVX/WorkPlace/ROCm/MIOpen-master/src/ocl/handleocl.cpp:274: clGetPlatformIDs failed. 0
...
The following tests FAILED:
1 - test_activation (SEGFAULT)
3 - test_bn_peract_test (SEGFAULT)
4 - test_bn_spatial_test (SEGFAULT)
5 - test_check_numerics_test (SEGFAULT)
6 - test_conv (SEGFAULT)
7 - test_custom_allocator (SEGFAULT)
8 - test_main (SEGFAULT)
9 - test_pooling_test (SEGFAULT)
10 - test_soft_max (SEGFAULT)
11 - test_tensor_ops (SEGFAULT)
The OpenCL library in /usr/lib is a symlink to /opt/rocm/opencl/lib/x86_64/libOpenCL.so
$ cat /etc/OpenCL/vendors/amdocl64.icd
libamdocl64.so
What do you get when you run the above command?
I would delete the symlink and remove OpenCL from your system, either manually or using
sudo apt-get remove rocm-opencl*
Then add it back in. Your platform is not being detected correctly using the libraries on your system.
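Before reinstalling, it may also be worth listing every ICD file the OpenCL loader can see, since more than one entry can mean a second runtime is registered next to ROCm. This is a small sketch with a made-up function name; it assumes the standard Linux ICD directory /etc/OpenCL/vendors and prints only the first line of each file.

```shell
# Hypothetical helper: print each .icd file in the vendors directory together
# with the runtime library it points to (first line of the file).
list_icd_libs() {
    icd_dir="${1:-/etc/OpenCL/vendors}"
    for icd_file in "$icd_dir"/*.icd; do
        # Skip the unexpanded glob when the directory has no .icd files.
        [ -e "$icd_file" ] || continue
        printf '%s: %s\n' "$(basename "$icd_file")" "$(head -n 1 "$icd_file")"
    done
}

# Usage:
#   list_icd_libs
```

On a clean ROCm-only setup one would expect a single line for amdocl64.icd, matching the cat output above.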
Thanks, @daniellowell, for your support. There were many dependencies in my operating system.
I reinstalled Ubuntu 16.04 and installed ROCm, and now it works.
I think my mistake was installing AMDAPPSDK alongside ROCm.