chip-spv / chipstar Goto Github PK
View Code? Open in Web Editor NEWchipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
License: Other
chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
License: Other
Device variable initialization calls
if (QueuedKernels)
Queue->finish();
which results kernel launches being a blocking operation which violates HIP queue semantics.
Currently the CMAKE_CXX_FLAGS are used for the device compilation too. Unfortunately, the default is -O0 which marks the device functions 'optnone' and the required transformation passes of HIP are not ran to those functions (including the actual kernel function). Incidentally this means that with the default optimization flags the device build is broken (works for the kernels where the passes are not necessary) and the user should pass CMAKE_CXX_FLAGS=-O2 for the intended functionality.
This should be fixed such that we ensure the device binaries are compiled with flags that do not prevent required passes to be ran.
sometimes compilation fails with:
Consolidate compiler generated dependencies of target cuda-clock
[ 73%] Building CXX object samples/cuda_samples/CMakeFiles/cuda-clock.dir/0_Simple/clock/clock.cu.o
/gpfs/jlse-fs0/users/pvelesko/install/clang/clang14/clang14-spirv-omp/bin/opt: symbol lookup error: /home/pvelesko/space/CHIP-SPV/build/lib/libLLVMHipSpvPasses.so: undefined symbol: _ZN4llvm34convertConstantExprsToInstructionsEPNS_11InstructionEPNS_12ConstantExprEPNS_15SmallPtrSetImplIS1_EE
clang-14: error: unable to execute command: No such file or directory
clang-14: error: hipspv-link command failed due to signal (use -v to see invocation)
clang version 14.0.0 (/home/pvelesko/space/llvm-project/clang 2a2286e9859a518a46ead8e3ef5e283662766370)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/pvelesko/space/install/clang/clang14/clang14-spirv-omp/bin
clang-14: note: diagnostic msg: Error generating preprocessed source(s).
make[2]: *** [samples/cuda_samples/CMakeFiles/cuda-clock.dir/build.make:76: samples/cuda_samples/CMakeFiles/cuda-clock.dir/0_Simple/clock/clock.cu.o] Error 255
make[1]: *** [CMakeFiles/Makefile2:14446: samples/cuda_samples/CMakeFiles/cuda-clock.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
... at least two people reported the same symbol-missing error.
For a release, it is important to have a test suite which has only tests that should always pass. This is for the user to check that the build was solid. Currently there seems to be some failing tests and it's not clear are they expected to fail (sometimes) or it's due to a broken/unsupported local setup.
Not sure what the minimum should be but 3.16 is too low
cmake .. -DCMAKE_PREFIX_PATH=~/local -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_INSTALL_PREFIX=/home/ptu1/packages
-- Using llvm-config: /home/ptu1/packages/bin/llvm-config
-- Using llvm-link: /home/ptu1/packages/bin/llvm-link
-- Using llvm-spirv: /home/ptu1/packages/bin/llvm-spirv
-- Using clang-offload-bundler: /home/ptu1/packages/bin/clang-offload-bundler
-- OpenCL_LIBRARY: /soft/libraries/khronos/loader/master-2022.05.18/lib64/libOpenCL.so
-- LevelZero_LIBRARY: /soft/restricted/CNDA/emb/libraries/intel-level-zero/api_+_loader/20220711.1/lib64/libze_loader.so
-- Buiding CHIP-SPV as a shared library
-- CHIP-SPV will be installed to: /home/ptu1/packages
-- LLVM CMake directory: /home/ptu1/packages/lib/cmake/llvm
-- LLVM built with static libraries -> adding workaround for issue #102
CMake Error at llvm_passes/CMakeLists.txt:69 (get_target_property):
get_target_property called with incorrect number of arguments
-- libLLVMCore CONFIG: DEBUG;RELEASE
-- libLLVMCore PATH:
ar: No such file or directory
CMake Error at llvm_passes/CMakeLists.txt:77 (message):
ar command on failed
-- Configuring incomplete, errors occurred!
See also "/home/ptu1/workspace/chip-spv/build/CMakeFiles/CMakeOutput.log".
See also "/home/ptu1/workspace/chip-spv/build/CMakeFiles/CMakeError.log".
There are 3 flavors of memset: D8, D16, D32. Their implementations follow a pattern:
hipError_t hipMemsetD8(hipDeviceptr_t Dst, unsigned char Value, size_t Count) {
CHIP_TRY
CHIPInitialize();
NULLCHECK(Dst);
Backend->getActiveDevice()->initializeDeviceVariables();
Backend->getActiveDevice()->getDefaultQueue()->memFill(Dst, 1 * Count, &Value,
1);
RETURN(hipSuccess);
CHIP_CATCH
}
hipError_t hipMemsetD16(hipDeviceptr_t Dest, unsigned short Value,
size_t Count) {
CHIP_TRY
CHIPInitialize();
NULLCHECK(Dest);
Backend->getActiveDevice()->initializeDeviceVariables();
Backend->getActiveDevice()->getDefaultQueue()->memFill(Dest, 2 * Count,
&Value, 2);
RETURN(hipSuccess);
CHIP_CATCH
};
hipError_t hipMemsetD32(hipDeviceptr_t Dst, int Value, size_t Count) {
CHIP_TRY
CHIPInitialize();
NULLCHECK(Dst);
Backend->getActiveDevice()->initializeDeviceVariables();
Backend->getActiveDevice()->getDefaultQueue()->memFill(Dst, 4 * Count, &Value,
4);
RETURN(hipSuccess);
CHIP_CATCH
}
Of these, hipMemsetD16
fails correctness tests with no obvious reason as to why
Right now, both backends are required to compile CHIP-SPV. Should be able to compile&run with just one, not everybody might have both.
Why not having it in PATH suffice?
Unit_hipTextureFetch_vector
hipTextureObj2D
hipTextureObj1DFetch
hipTex1DFetchCheckModes
hipNormalizedCoords
hipNormalizedFloat
hipTextureObj1D
hipAddressingModes
It may be useful to support a HIP program with hiprand() functions on an Intel GPU.
The values "multiProcessorCount, maxThreadsPerMultiProcessor, warpSize" are not the same using OpenCL and Level0 backends on Intel DevCloud.
OpenCL:
CHIP warning [TID 887554] [1659643269.953287497] : CHIP_BE was not set. Defaulting to OPENCL
--------------------------------------------------------------------------------
device# 0
Name: Intel(R) UHD Graphics P630 [0x3e96]
pciBusID: 16
pciDeviceID: 64
multiProcessorCount: 24
maxThreadsPerMultiProcessor: 10
isMultiGpuBoard: 0
clockRate: 1200 Mhz
memoryClockRate: 1 Mhz
memoryBusWidth: 256
clockInstructionRate: 2.465 Mhz
totalGlobalMem: 50.10 GB
maxSharedMemoryPerMultiProcessor: 0.00 GB
totalConstMem: 4294959104
sharedMemPerBlock: 64.00 KB
regsPerBlock: 64
warpSize: 0
....
Level0
CHIP warning [TID 887678] [1659643337.135896406] : Ignoring alignment. Using hardcoded value 0x1000
CHIP warning [TID 887678] [1659643337.135936619] : Usigned zeMallocHost instead of zeMallocShared due to outstanding bug
--------------------------------------------------------------------------------
device# 0
Name: M
pciBusID: 16
pciDeviceID: 64
multiProcessorCount: 8
maxThreadsPerMultiProcessor: 56
isMultiGpuBoard: 0
clockRate: 1200 Mhz
memoryClockRate: 0 Mhz
memoryBusWidth: 64
clockInstructionRate: 1.2 Mhz
totalGlobalMem: 50.10 GB
maxSharedMemoryPerMultiProcessor: 0.00 GB
totalConstMem: 53794992128
sharedMemPerBlock: 64.00 KB
regsPerBlock: 4096
warpSize: 32
...
It has to be installed from a separate package 'clang-tools-XX' when using the Ubuntu packages.
myQueue.template get_native<sycl::backend::level_zero>(); // returns nullptr.
Are these known? Happens with my Iris Xe mobile GPU with LLVM 15.
OpenCL:
The following tests FAILED:
108 - Unit_hipHostGetFlags_Basic - int (Failed)
109 - Unit_hipHostGetFlags_Basic - float (Failed)
110 - Unit_hipHostGetFlags_Basic - double (Failed)
111 - Unit_hipMallocManaged_MultiChunkSingleDevice (Failed)
112 - Unit_hipMallocManaged_MultiChunkMultiDevice (Failed)
115 - Unit_hipMallocManaged_TwoPointers - int (Failed)
116 - Unit_hipMallocManaged_TwoPointers - float (Failed)
117 - Unit_hipMallocManaged_TwoPointers - double (Failed)
118 - Unit_hipMallocManaged_DeviceContextChange - unsigned char (Failed)
119 - Unit_hipMallocManaged_DeviceContextChange - int (Failed)
120 - Unit_hipMallocManaged_DeviceContextChange - float (Failed)
121 - Unit_hipMallocManaged_DeviceContextChange - double (Failed)
187 - Unit_hipMemcpy_KernelLaunch - int (Failed)
188 - Unit_hipMemcpy_KernelLaunch - float (Failed)
189 - Unit_hipMemcpy_KernelLaunch - double (Failed)
193 - Unit_hipMemcpy_MultiThreadWithSerialization (Subprocess aborted)
197 - Unit_hipMemcpyAsync_KernelLaunch - int (Failed)
198 - Unit_hipMemcpyAsync_KernelLaunch - float (Failed)
199 - Unit_hipMemcpyAsync_KernelLaunch - double (Failed)
204 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - int (Subprocess aborted)
205 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - float (SEGFAULT)
206 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - double (Subprocess aborted)
215 - Unit_ldg (Failed)
450 - Unit_deviceFunctions_CompileTest_modf_double (Failed)
454 - Unit_deviceFunctions_CompileTest_norm_double (Failed)
463 - Unit_deviceFunctions_CompileTest_rhypot_double (Failed)
465 - Unit_deviceFunctions_CompileTest_rnorm_double (Failed)
466 - Unit_deviceFunctions_CompileTest_rnorm3d_double (Failed)
467 - Unit_deviceFunctions_CompileTest_rnorm4d_double (Failed)
474 - Unit_deviceFunctions_CompileTest_sincos_double (Failed)
475 - Unit_deviceFunctions_CompileTest_sincospi_double (Failed)
528 - Unit_hipGetDeviceProperties_ArchPropertiesTst (Failed)
540 - Unit_hipStreamPerThread_MultiThread (Subprocess aborted)
541 - Unit_hipStreamPerThread_DeviceReset_1 (Subprocess aborted)
626 - cuda-reduction (Failed)
Errors while running CTest
Level0:
The following tests FAILED:
108 - Unit_hipHostGetFlags_Basic - int (Failed)
109 - Unit_hipHostGetFlags_Basic - float (Failed)
110 - Unit_hipHostGetFlags_Basic - double (Failed)
111 - Unit_hipMallocManaged_MultiChunkSingleDevice (Failed)
112 - Unit_hipMallocManaged_MultiChunkMultiDevice (Failed)
115 - Unit_hipMallocManaged_TwoPointers - int (Failed)
116 - Unit_hipMallocManaged_TwoPointers - float (Failed)
117 - Unit_hipMallocManaged_TwoPointers - double (Failed)
118 - Unit_hipMallocManaged_DeviceContextChange - unsigned char (Failed)
119 - Unit_hipMallocManaged_DeviceContextChange - int (Failed)
120 - Unit_hipMallocManaged_DeviceContextChange - float (Failed)
121 - Unit_hipMallocManaged_DeviceContextChange - double (Failed)
187 - Unit_hipMemcpy_KernelLaunch - int (Failed)
188 - Unit_hipMemcpy_KernelLaunch - float (Failed)
189 - Unit_hipMemcpy_KernelLaunch - double (Failed)
193 - Unit_hipMemcpy_MultiThreadWithSerialization (Subprocess aborted)
197 - Unit_hipMemcpyAsync_KernelLaunch - int (Failed)
198 - Unit_hipMemcpyAsync_KernelLaunch - float (Failed)
199 - Unit_hipMemcpyAsync_KernelLaunch - double (Failed)
204 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - int (Subprocess aborted)
205 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - float (Subprocess aborted)
206 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThread - double (Subprocess aborted)
215 - Unit_ldg (Failed)
450 - Unit_deviceFunctions_CompileTest_modf_double (Failed)
454 - Unit_deviceFunctions_CompileTest_norm_double (Failed)
463 - Unit_deviceFunctions_CompileTest_rhypot_double (Failed)
465 - Unit_deviceFunctions_CompileTest_rnorm_double (Failed)
466 - Unit_deviceFunctions_CompileTest_rnorm3d_double (Failed)
467 - Unit_deviceFunctions_CompileTest_rnorm4d_double (Failed)
474 - Unit_deviceFunctions_CompileTest_sincos_double (Failed)
475 - Unit_deviceFunctions_CompileTest_sincospi_double (Failed)
528 - Unit_hipGetDeviceProperties_ArchPropertiesTst (Failed)
570 - hipKernelLaunchIsNonBlocking (Subprocess terminated)
626 - cuda-reduction (Failed)
Errors while running CTest
Some errors from logs (LZ):
Command: "/home/pjaaskel/src/chip-spv/build/catch/unit/memory/hipMemcpy2DToArray" "Unit_hipMemcpy2DToArray_Negative"
Directory: /home/pjaaskel/src/chip-spv/build/catch/hipTestMain
"Unit_hipMemcpy2DToArray_Negative" start time: Sep 08 14:28 EEST
Output:
----------------------------------------------------------
CHIP error [TID 13776] [1662636513.907324677] : hipErrorInvalidHandle (passed in nullptr) in /home/pjaaskel/src/chip-spv/src/CHIPException.hh:91:checkIfNullptr
CHIP error [TID 13776] [1662636513.907437162] : Caught Error: hipErrorInvalidHandle
CHIP error [TID 13776] [1662636513.907551852] : hipErrorInvalidHandle (passed in nullptr) in /home/pjaaskel/src/chip-spv/src/CHIPException.hh:91:checkIfNullptr
CHIP error [TID 13776] [1662636513.907561840] : Caught Error: hipErrorInvalidHandle
Filters: Unit_hipMemcpy2DToArray_Negative
===============================================================================
...
"Unit_hipMemcpyParam2D_Negative" start time: Sep 08 14:28 EEST
Output:
----------------------------------------------------------
CHIP error [TID 13900] [1662636519.092989120] : hipErrorTbd (Source Device pointer is null) in /home/pjaaskel/src/chip-spv/src/CHIPBindings.cc:700:hipMemcpyParam2DAsync
CHIP error [TID 13900] [1662636519.093107437] : Caught Error: hipErrorTbd
CHIP error [TID 13900] [1662636519.096081339] : hipErrorTbd (Source Device pointer is null) in /home/pjaaskel/src/chip-spv/src/CHIPBindings.cc:702:hipMemcpyParam2DAsync
CHIP error [TID 13900] [1662636519.096100443] : Caught Error: hipErrorTbd
CHIP error [TID 13900] [1662636519.099407248] : hipErrorTbd (Source and Destination Device pointer is null) in /home/pjaaskel/src/chip-spv/src/CHIPBindings.cc:697:hipMemcpyParam2DAsync
CHIP error [TID 13900] [1662636519.099423555] : Caught Error: hipErrorTbd
CHIP error [TID 13900] [1662636519.102441277] : hipErrorTbd (Width > src/dest pitches) in /home/pjaaskel/src/chip-spv/src/CHIPBindings.cc:706:hipMemcpyParam2DAsync
CHIP error [TID 13900] [1662636519.102455602] : Caught Error: hipErrorTbd
Filters: Unit_hipMemcpyParam2D_Negative
===============================================
Command: "/home/pjaaskel/src/chip-spv/build/catch/unit/memory/hipHostRegister" "Unit_hipHostRegister_Memcpy - int"
Directory: /home/pjaaskel/src/chip-spv/build/catch/hipTestMain
"Unit_hipHostRegister_Memcpy - int" start time: Sep 08 14:28 EEST
Output:
----------------------------------------------------------
CHIP error [TID 14142] [1662636531.471413075] : ZE Build Log:
error: Double type is not supported on this platform.
in kernel: 'void Inc<double>(double*)'
error: backend compiler failed build.
error: Double type is not supported on this platform.
in kernel: 'void Inc<double>(double*)'
error: backend compiler failed build.
error: Double type is not supported on this platform.
in kernel: 'void Inc<double>(double*)'
error: backend compiler failed build.
error: Double type is not supported on this platform.
in kernel: 'void Inc<double>(double*)'
error: backend compiler failed build.
error: Double type is not supported on this platform.
in kernel: 'void Inc<double>(double*)'
error: backend compiler failed build.
error: Double type is not supported on this platform.
in kernel: 'void Inc<double>(double*)'
error: backend compiler failed build.
...
Command: "/home/pjaaskel/src/chip-spv/build/catch/unit/memory/hipMemset2DAsyncMultiThreadAndKernel" "Unit_hipMemset2DAsync_MultiThread"
Directory: /home/pjaaskel/src/chip-spv/build/catch/hipTestMain
"Unit_hipMemset2DAsync_MultiThread" start time: Sep 08 14:30 EEST
Output:
----------------------------------------------------------
CHIP error [TID 15912] [1662636608.615095007] : hipErrorTbd (ZE_RESULT_ERROR_INVALID_ARGUMENT ) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:979:memFillAsyncImpl
601/627 Test: cuda-asyncAPI
Command: "/home/pjaaskel/src/chip-spv/build/samples/cuda_samples/cuda-asyncAPI"
Directory: /home/pjaaskel/src/chip-spv/build/samples/cuda_samples
"cuda-asyncAPI" start time: Sep 08 14:35 EEST
Output:
----------------------------------------------------------
CHIP error [TID 19505] [1662636943.558795687] : hipErrorNotReady (Event Not Ready) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:379:updateFinishStatus
CHIP error [TID 19505] [1662636943.559533885] : Caught Error: hipErrorNotReady
CHIP error [TID 19505] [1662636943.559581970] : hipErrorNotReady (Event Not Ready) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:379:updateFinishStatus
The "Double type is not supported on this platform." is a HW/driver issue. OpenCL/SPIR-V doesn't require double support or its SW emulation. https://registry.khronos.org/OpenCL/sdk/1.0/docs/man/xhtml/cl_khr_fp64.html Can we fail more gracefully at runtime? Should we add a separate test suite for the double tests since they are not supposed to work on OpenCL/LZ devices which do not provide double support?
From LAMMPS:
/gpfs/jlse-fs0/users/pvelesko/lammps/build/lib/gpu/zbl.cu.cpp:115:17: error: no matching function for call to 'tex1Dfetch'
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
^~~~~~~~~~~~~~~~~~~~
/gpfs/jlse-fs0/users/pvelesko/lammps/lib/gpu/lal_pre_cuda_hip.h:138:37: note: expanded from macro 'fetch4'
#define fetch4(ans,i,pos_tex) ans=tex1Dfetch(pos_tex, i);
^~~~~~~~~~
/gpfs/jlse-fs0/users/pvelesko/install/HIP/clang13/chip-spv-1.2/include/hip/spirv_texture_functions.h:241:30: note: candidate template ignored: couldn't infer template argument 'T'
__TEXTURE_FUNCTIONS_DECL__ T tex1Dfetch(hipTextureObject_t TexObj, int X) {
^
/gpfs/jlse-fs0/users/pvelesko/install/HIP/clang13/chip-spv-1.2/include/hip/spirv_texture_functions.h:168:16: note: candidate function not viable: requires 3 arguments, but 2 were provided
DEF_TEX1D_VEC4(tex1Dfetch, float4, int, _chip_tex1dfetchf);
Currently a workaround is implemented to use zeMemAllocHost
instead. To reproduce the segfault compile CHIP-SPV with -DMALLOC_SHARED_WORKAROUND=OFF
and run the these tests:
Start 120: Unit_hipMallocManaged_TwoPointers - int
5/20 Test #120: Unit_hipMallocManaged_TwoPointers - int .....................***Exception: SegFault 0.26 sec
Start 121: Unit_hipMallocManaged_TwoPointers - float
6/20 Test #121: Unit_hipMallocManaged_TwoPointers - float ...................***Exception: SegFault 0.26 sec
Start 122: Unit_hipMallocManaged_TwoPointers - double
7/20 Test #122: Unit_hipMallocManaged_TwoPointers - double ..................***Exception: SegFault 0.27 sec
Start 123: Unit_hipMallocManaged_DeviceContextChange - unsigned char
8/20 Test #123: Unit_hipMallocManaged_DeviceContextChange - unsigned char ...***Exception: SegFault 0.25 sec
Start 124: Unit_hipMallocManaged_DeviceContextChange - int
9/20 Test #124: Unit_hipMallocManaged_DeviceContextChange - int .............***Exception: SegFault 0.26 sec
Start 125: Unit_hipMallocManaged_DeviceContextChange - float
10/20 Test #125: Unit_hipMallocManaged_DeviceContextChange - float ...........***Exception: SegFault 0.26 sec
Start 126: Unit_hipMallocManaged_DeviceContextChange - double
11/20 Test #126: Unit_hipMallocManaged_DeviceContextChange - double ..........***Exception: SegFault 0.27 sec
on branch #76, floatMath.cc
// Issue here
Out[tid] = __powf(2.0f, Out[tid]);
leaving this uncommented causes the kernel build to fail with ZE_RESULT_ERROR_INVALID_MODULE_UNLINKED
WRT StaleEventMonitor, IMO there is a bigger issue: calling CHIPUninitialize from __hipUnregisterFatBinary. That handler itself is called from shared library destructors, which are called after main() function returns, and these are called in reverse order of their constructors (so some of the shared libraries might be already unloaded). Doing this kind of uninitialization after main() is unsafe, because the program is no longer in a normal state. There are two solutions: 1) make a new hip API (hipUninit or such) that the application can to explicitly call, or 2) do nothing, and let the kernel clean up everything. 1) is useful for debugging memory leaks, otherwise the default should be 2).
@franz can you elaborate on what shared libraries might be already unloaded and how that could cause issues?
Overall, I agree that that this function is not necessary - I implemented it to make sure that we don't have events remaining that haven't been garbage collected.
The following kernel calls a double type sqrt
onto the result of a float powf
function causing a hang. If I replace sqrt
with sqrtf
then the kernel works again.
__global__ void kernel(float* x, float* y, int n) {
size_t tid{threadIdx.x};
if (tid < 1) {
for (int i = 0; i < n; i++) {
x[i] = sqrt(powf(3.14159, i));
}
y[tid] = y[tid] + 1.0f;
}
}
Sometimes the reported time between start
and end
ends up being negative.
Might be due to counter wrapping?
I get these warnings (with default logging settings) dumped out when building tests:
CHIP warning [TID 54662] [1660033951.029736425] : Ignoring alignment. Using hardcoded value 0x1000
CHIP warning [TID 54662] [1660033951.029825009] : Usigned zeMallocHost instead of zeMallocShared due to outstanding bug
CHIP warning [TID 54662] [1660033951.211279355] : Remaining 0 events that haven't been collected:
CHIP warning [TID 54662] [1660033951.211318094] : Remaining 0 command lists that haven't been collected:
CHIP warning [TID 54665] [1660033951.237032246] : Ignoring alignment. Using hardcoded value 0x1000
CHIP warning [TID 54665] [1660033951.237127111] : Usigned zeMallocHost instead of zeMallocShared due to outstanding bug
CHIP warning [TID 54665] [1660033951.441705646] : Remaining 0 events that haven't been collected:
CHIP warning [TID 54665] [1660033951.441741497] : Remaining 0 command lists that haven't been collected:
Building the HIP-Common tests fail after the merge of #83:
$ make build_tests_standalone -j1 VERBOSE=1
...
[ 5%] Building CXX object catch/stress/memory/CMakeFiles/memcpy.dir/memcpy.cc.o
cd /mnt/md1/linehill/chip-spv-space/builds/chip-spv/catch/stress/memory && /mnt/md1/linehill/chip-spv-space/install/bin/clang++ -I/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include -I/mnt/md1/linehill/chip-spv-space/install/include -I/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/external/picojson -I/HIP/include -I/include -I/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/external/Catch2/include -I/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/include -I/mnt/md1/linehill/chip-spv-space/chip-spv/include -I/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/external/Catch2/single_include -g -gdwarf-4 -O0 -Wno-duplicate-decl-specifier -Wno-tautological-constant-compare -Wno-c++20-extensions -Wno-unused-result -Wno-delete-abstract-non-virtual-dtor -Wno-deprecated-declarations -Wunused-command-line-argument -g -fPIE -Wno-format-extra-args -Wall -pthread -std=c++17 -MD -MT catch/stress/memory/CMakeFiles/memcpy.dir/memcpy.cc.o -MF CMakeFiles/memcpy.dir/memcpy.cc.o.d -o CMakeFiles/memcpy.dir/memcpy.cc.o -c /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:1:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:24:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_context.hh:24:
/mnt/md1/linehill/chip-spv-space/install/include/hip/hip_runtime.h:77:2: error: ("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
#error("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
^
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:1:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:24:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_context.hh:24:
In file included from /mnt/md1/linehill/chip-spv-space/install/include/hip/hip_runtime.h:124:
/mnt/md1/linehill/chip-spv-space/install/include/hip/hip_runtime_api.h:522:2: error: ("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
#error("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
^
/mnt/md1/linehill/chip-spv-space/install/include/hip/hip_runtime_api.h:5732:61: error: use of undeclared identifier 'hipHostMallocDefault'
unsigned int flags = hipHostMallocDefault) {
^
/mnt/md1/linehill/chip-spv-space/install/include/hip/hip_runtime_api.h:5738:64: error: use of undeclared identifier 'hipMemAttachGlobal'
unsigned int flags = hipMemAttachGlobal) {
^
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:1:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:24:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_context.hh:24:
In file included from /mnt/md1/linehill/chip-spv-space/install/include/hip/hip_runtime.h:125:
/mnt/md1/linehill/chip-spv-space/install/include/hip/library_types.h:44:2: error: ("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
#error("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
^
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:1:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:24:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_context.hh:24:
In file included from /mnt/md1/linehill/chip-spv-space/install/include/hip/hip_runtime.h:127:
/mnt/md1/linehill/chip-spv-space/install/include/hip/hip_vector_types.h:46:2: error: ("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
#error("Must define exactly one of __HIP_PLATFORM_AMD__, __HIP_PLATFORM_NVIDIA__ or __HIP_PLATFORM_SPIRV__");
^
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:1:
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:24:
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_context.hh:52:2: error: "Platform not recognized"
#error "Platform not recognized"
^
In file included from /mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:1:
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:80:13: error: use of undeclared identifier 'hipGetDeviceCount'
HIP_CHECK(hipGetDeviceCount(&dev));
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:80:3: error: use of undeclared identifier 'hipGetErrorString'
HIP_CHECK(hipGetDeviceCount(&dev));
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:34:25: note: expanded from macro 'HIP_CHECK'
INFO("Error: " << hipGetErrorString(localError) << " Code: " << localError << " Str: " \
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:96:13: error: use of undeclared identifier 'hipGetDevice'
HIP_CHECK(hipGetDevice(&device));
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:96:3: error: use of undeclared identifier 'hipGetErrorString'
HIP_CHECK(hipGetDevice(&device));
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:34:25: note: expanded from macro 'HIP_CHECK'
INFO("Error: " << hipGetErrorString(localError) << " Code: " << localError << " Str: " \
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:98:13: error: use of undeclared identifier 'hipGetDeviceProperties'
HIP_CHECK(hipGetDeviceProperties(&props, device));
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:98:3: error: use of undeclared identifier 'hipGetErrorString'
HIP_CHECK(hipGetDeviceProperties(&props, device));
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/./include/hip_test_common.hh:34:25: note: expanded from macro 'HIP_CHECK'
INFO("Error: " << hipGetErrorString(localError) << " Code: " << localError << " Str: " \
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:8:5: error: use of undeclared identifier 'hipFree'
hipFree(d_a);
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:14:5: error: use of undeclared identifier 'hipFree'
hipFree(d_a);
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:20:5: error: use of undeclared identifier 'hipFree'
hipFree(d_a);
^
/mnt/md1/linehill/chip-spv-space/chip-spv/HIP/tests/catch/stress/memory/memcpy.cc:26:5: error: use of undeclared identifier 'hipFree'
hipFree(d_a);
^
17 errors generated.
catch/stress/memory/CMakeFiles/memcpy.dir/build.make:75: recipe for target 'catch/stress/memory/CMakeFiles/memcpy.dir/memcpy.cc.o' failed
As seen in the log the compilation step is missing the necessary HIP flags (-D__HIP_PLATFORM_SPIRV__, -x hip, --offload=spirv64 etc). By a glance the commit dee2dea seems to be the culprit. The commit removes the OFFLOAD_ARCH_STR CMake variable which was used to pass the needed flags to the HIP-Common's test framework CMake system.
This assert doesn't fire:
assert(ChipDev->AllocationTracker && "AllocationTracker was not created!");
if (!ChipDev->AllocationTracker->reserveMem(Size))
return nullptr;
When building the sources against libstdc++ v12 (v11 still works) there is a problem when compiling the CUDA examples due a macro define of noinline and the fact that the shared_ptr_base.h header uses __attribute__((__noinline__)):
[ 80%] Building CXX object samples/cuda_samples/CMakeFiles/cuda-bandwidthTest.dir/1_Utilities/bandwidthTest/bandwidthTest.cu.o
In file included from /home/pjaaskel/src/chip-spv/samples/cuda_samples/1_Utilities/bandwidthTest/bandwidthTest.cu:31:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:77:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
__attribute__((__noinline__))
^
/home/pjaaskel/src/chip-spv/include/hip/spirv_hip.hh:40:37: note: expanded from macro '__noinline__'
#define __noinline__ __attribute__((noinline))
^
In file included from /home/pjaaskel/src/chip-spv/samples/cuda_samples/1_Utilities/bandwidthTest/bandwidthTest.cu:31:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:77:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: type name does not allow function specifier to be specified
/home/pjaaskel/src/chip-spv/include/hip/spirv_hip.hh:40:37: note: expanded from macro '__noinline__'
#define __noinline__ __attribute__((noinline))
^
In file included from /home/pjaaskel/src/chip-spv/samples/cuda_samples/1_Utilities/bandwidthTest/bandwidthTest.cu:31:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:77:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: expected expression
/home/pjaaskel/src/chip-spv/include/hip/spirv_hip.hh:40:46: note: expanded from macro '__noinline__'
#define __noinline__ __attribute__((noinline))
^
3 errors generated when compiling for .
Are the #defines for these HIP keywords (still) necessary in spirv_hip.hh and hip_runtime.api.h or can we remove them? Shouldn't clang frontend bring those in when building in HIP mode?
hipError_t hipMemset2DAsync(void *Dst, size_t Pitch, int Value, size_t Width,
size_t Height, hipStream_t Stream) {
CHIP_TRY
CHIPInitialize();
NULLCHECK(Dst);
hipError_t Res = hipSuccess;
for (int i = 0; i < Height; i++) {
size_t SizeBytes = Width * sizeof(int);
auto Offset = Pitch * i;
char *DstP = (char *)Dst;
auto Res = hipMemset(DstP + Offset, Value, SizeBytes);
if (Res != hipSuccess)
break;
}
RETURN(Res);
CHIP_CATCH
}
hipMemset
is synchronous
Seems to affect both L0 and OCL
642 - stream (Failed)
220 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThreadMultiStream - int (Subprocess aborted) (transient) (only when ctest -j)
221 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThreadMultiStream - float (Subprocess aborted) (transient) (only when ctest -j)
222 - Unit_hipMemcpyAsync_hipMultiMemcpyMultiThreadMultiStream - double (Subprocess aborted) (transient) (only when ctest -j)
Filters: Unit_hipEvent
test: starting sequence with sizeBytes=40000000 bytes, 38.15 MB
test 0x1001: stream=0 waitStart=0 syncMode=syncNone
time= 0.00 error=hipErrorNotReady
negtime= 0.00 error=hipErrorNotReady
test: OK
test 0x1002: stream=0xab48f0 waitStart=0 syncMode=syncNone
time= 0.00 error=hipErrorNotReady
negtime= 0.00 error=hipErrorNotReady
test: OK
test 0x1004: stream=0 waitStart=0x1 syncMode=syncStream
time=133.07 error=hipSuccess
negtime=-133.07 error=hipSuccess
test: OK
test 0x1008: stream=0xab48f0 waitStart=0x1 syncMode=syncStream
time=273.69 error=hipSuccess
negtime=-273.69 error=hipSuccess
test: OK
test 0x1010: stream=0 waitStart=0x1 syncMode=syncStopEvent
time=131.93 error=hipSuccess
negtime=-131.93 error=hipSuccess
test: OK
test 0x1020: stream=0xab48f0 waitStart=0x1 syncMode=syncStopEvent
time=115.68 error=hipSuccess
negtime=-115.68 error=hipSuccess
test: OK
test 0x1: stream=0 waitStart=0 syncMode=syncNone
time= 0.00 error=hipErrorNotReady
negtime= 0.00 error=hipErrorNotReady
test: OK
test 0x2: stream=0xab48f0 waitStart=0 syncMode=syncNone
time= 0.00 error=hipErrorNotReady
negtime= 0.00 error=hipErrorNotReady
test: OK
test 0x4: stream=0 waitStart=0 syncMode=syncStream
time=131.89 error=hipSuccess
negtime=-131.89 error=hipSuccess
test: OK
test 0x8: stream=0xab48f0 waitStart=0 syncMode=syncStream
time=269.45 error=hipSuccess
negtime=-269.45 error=hipSuccess
test: OK
test 0x10: stream=0 waitStart=0 syncMode=syncStopEvent
time=131.66 error=hipSuccess
negtime=-131.66 error=hipSuccess
test: OK
test 0x20: stream=0xab48f0 waitStart=0 syncMode=syncStopEvent
Unit_hipEvent: /gpfs/jlse-fs0/users/pvelesko/CHIP-SPV/HIP/tests/catch/unit/event/Unit_hipEvent.cc:123: void test(unsigned int, int *, int *, int64_t, hipStream_t, int, SyncMode): Assertion `t > 0.0f' failed.
The following test cases fail on Level0 after faec3a6:
64 - cuda-simpleCallback (Subprocess aborted)
72 - cuda-qrng (Failed)
77 - cuda-FDTD3d (SEGFAULT)
~/chip-spv$ git submodule update --init --recursive
Submodule 'HIP' ([email protected]:CHIP-SPV/HIP.git) registered for path 'HIP'
Submodule 'bitcode/ROCm-Device-Libs' (https://github.com/RadeonOpenCompute/ROCm-Device-Libs.git) registered for path 'bitcode/ROCm-Device-Libs'
Submodule 'hip-testsuite' (https://github.com/CHIP-SPV/hip-testsuite.git) registered for path 'hip-testsuite'
Cloning into '/path/to/chip-spv/HIP'...
fatal: Could not read from remote repository.
I tried to build hipblas-mkl library without using CMake on Intel DevCloud. The flag "-fdeclare-spriv-builtins' is not available in hipcc or clang++. Or did I build the example incorrectly? Thanks.
../llvm-install/bin/clang++ -std=c++17 src/hipblas-stub.cpp src/GemmLib/src/gemm_mkl.cpp -Isrc/GemmLib/include/ -I/glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/ -I/glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/ -o hipblas-mkl.o -fsycl
In file included from /glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/CL/sycl/backend/level_zero.hpp:16:
In file included from /glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/ext/oneapi/backend/level_zero.hpp:11:
In file included from /glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/CL/sycl/backend.hpp:11:
In file included from /glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/CL/sycl/accessor.hpp:12:
In file included from /glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/CL/sycl/atomic.hpp:11:
/glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/CL/__spirv/spirv_ops.hpp:112:2: error: "SPIR-V built-ins are not available. Please set -fdeclare-spirv-builtins flag."
#error \
^
/glob/development-tools/versions/oneapi/2022.2/oneapi/compiler/2022.1.0/linux/include/sycl/CL/__spirv/spirv_ops.hpp:132:63: error: unknown type name '__ocl_sampler_t'
extern SYCL_EXTERNAL SampledType __spirv_SampledImage(ImageT, __ocl_sampler_t);
^
Invalid address space
The test tests long running kernels and causes hangs (likely kernel mode busy loops) with a shorter duration to some, longer to some. In this laptop I once waited for 10 minutes for the Linux to wake up before hard power off. I added it to the flaky_tests file for now in #111.
There is a dependency problem in the build files: the parallel build fails on not finding kernellib.bc. Serial build finished OK. Can be reproduced with 'make -j4'.
Referring to #96, I think the current way of relying to symlinks to the correct LLVM binaries calls for trouble as those symlinks can change over time. We should use the direct paths to LLVM binaries by default, and query that using the (given) llvm-config via -DLLVM_CONFIG in cmake.
Add a CHANGES for change logging.
A lot tests result in failures due to CL_INVALID_KERNEL_ARGS
when using clang14 + OpenCL. This is not the case for Clang13
CHIPKernelOpenCL::CHIPKernelOpenCL(const cl::Kernel &&, std::string, OCLFuncInfo *): Assertion `FuncInfo_->ArgTypeInfo.size() == NumArgs' failed.
This is not the case with the default LLVM binary packages for Debian/Ubuntu, which has the major version number appended (for example clang++-14).
This issue is related to compiling Kokkos. AMD has it implemented like so:
#define launch_bounds_impl0(requiredMaxThreadsPerBlock) \
__attribute__((amdgpu_flat_work_group_size(1, requiredMaxThreadsPerBlock)))
#define launch_bounds_impl1(requiredMaxThreadsPerBlock, minBlocksPerMultiprocessor) \
__attribute__((amdgpu_flat_work_group_size(1, requiredMaxThreadsPerBlock), \
amdgpu_waves_per_eu(minBlocksPerMultiprocessor)))
#define select_impl_(_1, _2, impl_, ...) impl_
#define __launch_bounds__(...) \
select_impl_(__VA_ARGS__, launch_bounds_impl1, launch_bounds_impl0)(__VA_ARGS__)
Currently any Kokkos::HIP application segfaults during processing of a SPIR-V Structure.
if (Opcode_ == spv::Op::OpTypeStruct) {
size_t TotalSize = 0;
for (size_t i = 2; i < WordCount_; ++i) {
int32_t MemberId = OrigStream_[i];
TotalSize += TypeMap[MemberId]->size();
}
return new SPIRVtypePOD(Word1_, TotalSize);
}
Furthermore TypeMap
does not contain all the POD sizes which causes a segfault.
Now the BE defaults to OpenCL and if it fails (e.g. no OpenCL drivers installed), it actually gets stuck. It could quite easily try also L0 as a fallback?
Some of the currently failing samples:
96 - abort (Failed)
135 - PrintfSimple (Failed)
136 - PrintfNOP (Failed)
137 - PrintfDynamic (Failed)
This seems a bit too simple to work, just raising the issue here before I forget:
https://github.com/CHIP-SPV/chip-spv/blob/6487cbf143004a6f9abaa3af85dd8910040cda66/llvm_passes/HipPrintf.cpp#L65-L67
Correct me if I am wrong but wouldn't this give different or wrong results for "%%%"
and "%% %"
?
"%%%"
should give 2 or 1 depending if count is implemented with a greedy regexp or not respectively and "%% %"
should yield 2 irrespective of count implementation, which would be wrong.
This is extracted from Issue #135.
Currently when targeting a device without double precision support (which is optional in OpenCL), the failure will happen rather late, when lowering the SPIR-V to the target ISA or when linking in the bitcode lib (if using double precision builtins).
Strictly put we should match what CUDA does when there is no double support. I'm not sure if the CUDA implementations are assumed to software emulate doubles if there is no native ISA support. They are supported for compute capability 7.0 and higher devices, but what is expected to happen when they are not? Some integrated/mobile GPUs might not support doubles.
I belive the client is expected to query the CC before using kernels with doubles. Thus, we could improve here by querying for the support via OpenCL and printing a warning if it's not, or lowering the compute capability expectation in that case.
A nuance of this is that even if the host program never called a function which used doubles, it will fail, thus this prevents "portable" host programs that query for the double support and fallback to a non-double function from working. I do not know if this type of programming pattern is supported in CUDA either. Making it work would require true just-in-time compilation of functions that are actually needed by the called kernels, without building all of them "ahead of time" (or using call graph analysis or such to figure out which ones need to be lowered to target ISA). This is why it affects also the int and float variants of the tests.
Most the test failures listed in Issue #135 are caused by this.
hipcc ./stream.cpp -o stream -fsanitize=thread
InvalidFunctionCall: Unexpected llvm intrinsic:
llvm.returnaddress
clang-14: error: hipspv-link command failed with exit code 6 (use -v to see invocation)
compiled with -DLLVM_ENABLE_SHARED_LIBS=ON
It finishes with OpenCL. Some sort of mutex issue at deinit?
CHIP_BE=level0 gdb --args /home/pjaaskel/src/chip-spv/build/samples/hipKernelLaunchIsNonBlocking/hipKernelLaunchIsNonBlocking
GNU gdb (Ubuntu 12.0.90-0ubuntu1) 12.0.90
...
info: running on device Intel(R) Iris(R) Xe Graphics [0x9a49]
info: copy Host2Device
CHIP error [TID 4574] [1662633912.175776953] : hipErrorNotReady (Event Not Ready) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:379:updateFinishStatus
CHIP error [TID 4574] [1662633912.175984574] : Caught Error: hipErrorNotReady
CHIP error [TID 4574] [1662633912.176015899] : hipErrorNotReady (Event Not Ready) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:379:updateFinishStatus
CHIP error [TID 4574] [1662633912.176031293] : Caught Error: hipErrorNotReady
Launching kernel
Kernel launched successfully
CHIP error [TID 4574] [1662633912.177256979] : hipErrorNotReady (Event Not Ready) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:379:updateFinishStatus
CHIP error [TID 4574] [1662633912.177287781] : Caught Error: hipErrorNotReady
CHIP error [TID 4574] [1662633912.177305132] : hipErrorNotReady (Event Not Ready) in /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:379:updateFinishStatus
CHIP error [TID 4574] [1662633912.177318504] : Caught Error: hipErrorNotReady
Kernel time: 0s
PASSED!
[Thread 0x7ffff55fe640 (LWP 4579) exited]
^C
Thread 1 "hipKernelLaunch" received signal SIGINT, Interrupt.
__futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=4578, futex_word=0x7ffff5dff910) at ./nptl/futex-internal.c:57
57 ./nptl/futex-internal.c: No such file or directory.
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=4578, futex_word=0x7ffff5dff910) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=4578, futex_word=0x7ffff5dff910) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff5dff910, expected=4578, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128)
at ./nptl/futex-internal.c:139
#3 0x00007ffff755b6a4 in __pthread_clockjoin_ex (threadid=140737318483520, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:105
#4 0x00007ffff7f2b4fd in CHIPEventMonitor::join (this=0x5555559ac6a0) at /home/pjaaskel/src/chip-spv/src/CHIPBackend.hh:316
#5 0x00007ffff7f1eadc in CHIPBackendLevel0::uninitialize (this=0x555555571350) at /home/pjaaskel/src/chip-spv/src/backend/Level0/CHIPBackendLevel0.cc:1336
#6 0x00007ffff7e544f4 in CHIPUninitializeCallOnce () at /home/pjaaskel/src/chip-spv/src/CHIPDriver.cc:152
#7 0x00007ffff7e56f72 in std::__invoke_impl<void, void (*)()> (__f=@0x7fffffffdcf8: 0x7ffff7e544e0 <CHIPUninitializeCallOnce()>)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#8 0x00007ffff7e56f55 in std::__invoke<void (*)()> (__fn=@0x7fffffffdcf8: 0x7ffff7e544e0 <CHIPUninitializeCallOnce()>)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#9 0x00007ffff7e56f38 in std::call_once<void (*)()>(std::once_flag&, void (*&&)())::{lambda()#1}::operator()() const (this=0x7fffffffdcc8)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:776
#10 0x00007ffff7e56f14 in std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<void (*)()>(std::once_flag&, void (*&&)())::{lambda()#1}>(void (*&)())::{lambda()#1}::operator()() const (
this=0x7ffff79f9760) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:712
#11 0x00007ffff7e56ee9 in std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<void (*)()>(std::once_flag&, void (*&&)())::{lambda()#1}>(void (*&)())::{lambda()#1}::__invoke() ()
at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:712
#12 0x00007ffff755ef68 in __pthread_once_slow (once_control=0x7ffff7fbad58 <Uninitialized>, init_routine=0x7ffff78aedc0 <__once_proxy>) at ./nptl/pthread_once.c:116
#13 0x00007ffff7e54bcb in __gthread_once (__once=0x7ffff7fbad58 <Uninitialized>, __func=0x7ffff78aedc0 <__once_proxy>)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:700
#14 0x00007ffff7e54d15 in std::call_once<void (*)()> (__once=..., __f=@0x7fffffffdcf8: 0x7ffff7e544e0 <CHIPUninitializeCallOnce()>)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:783
#15 0x00007ffff7e54523 in CHIPUninitialize () at /home/pjaaskel/src/chip-spv/src/CHIPDriver.cc:156
#16 0x00007ffff7ebb2a7 in __hipUnregisterFatBinary (Data=0x5555559992d0) at /home/pjaaskel/src/chip-spv/src/CHIPBindings.cc:3200
#17 0x0000555555556d1f in __hip_module_dtor ()
#18 0x00007ffff750a495 in __run_exit_handlers (status=0, listp=0x7ffff76de838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:113
#19 0x00007ffff750a610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
#20 0x00007ffff74eed97 in __libc_start_call_main (main=main@entry=0x5555555563b0 <main()>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdf08) at ../sysdeps/nptl/libc_start_call_main.h:74
#21 0x00007ffff74eee40 in __libc_start_main_impl (main=0x5555555563b0 <main()>, argc=1, argv=0x7fffffffdf08, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffdef8) at ../csu/libc-start.c:392
#22 0x0000555555556265 in _start ()
I think most of the source files miss the license blurbs.
multi-threaded callback tests fail/hang
Specifically list the missing features so the users know what should work and what not.
Doesn't happen very often
Iris backtrace#1:
(gdb) bt
#0 0x00007ffff6475cdb in raise () from /lib64/libc.so.6
#1 0x00007ffff6477375 in abort () from /lib64/libc.so.6
#2 0x00007ffff64bbb07 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff64c3b8a in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff64c45ec in malloc_consolidate () from /lib64/libc.so.6
#5 0x00007ffff64c6bf0 in _int_malloc () from /lib64/libc.so.6
#6 0x00007ffff64c8638 in malloc () from /lib64/libc.so.6
#7 0x00007ffff70538d5 in operator new (sz=1016) at /dev/shm/spack-stage-servesh/spack-stage-gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/spack-src/libstdc++-v3/libsupc++/new_op.cc:50
#8 0x00007ffff5763533 in L0::CommandList::Allocator<L0::CommandListProductFamily<(PRODUCT_FAMILY)18> >::allocate(unsigned int) () from /soft/libraries/intel-gpu-umd/f81b779-2022.06.09/driver/lib64/libze_intel_gpu.so.1
#9 0x00007ffff579fe2b in L0::CommandList::create(unsigned int, L0::Device*, NEO::EngineGroupType, unsigned int, _ze_result_t&) () from /soft/libraries/intel-gpu-umd/f81b779-2022.06.09/driver/lib64/libze_intel_gpu.so.1
#10 0x00007ffff57a8ab8 in L0::DeviceImp::createCommandList(_ze_command_list_desc_t const*, _ze_command_list_handle_t**) () from /soft/libraries/intel-gpu-umd/f81b779-2022.06.09/driver/lib64/libze_intel_gpu.so.1
#11 0x00007ffff789797c in CHIPQueueLevel0::getCmdListCompute (this=this@entry=0xc3ce10) at /home/pvelesko/space/CHIP-SPV/src/backend/Level0/CHIPBackendLevel0.cc:717
#12 0x00007ffff78a2391 in CHIPQueueLevel0::enqueueBarrierImpl (this=0xc3ce10, EventsToWaitFor=<optimized out>) at /home/pvelesko/space/CHIP-SPV/src/backend/Level0/CHIPBackendLevel0.cc:1121
#13 0x00007ffff782f2f8 in CHIPContext::syncQueues (this=<optimized out>, TargetQueue=0x2) at /home/pvelesko/space/CHIP-SPV/src/CHIPBackend.cc:1009
#14 0x00007ffff7828222 in CHIPQueue::memCopyAsync (this=0xc3ce10, Dst=0xffffd556aa7b0000, Src=0x0, Size=140737325259995) at /home/pvelesko/space/CHIP-SPV/src/CHIPBackend.cc:1471
#15 0x00007ffff784ff94 in hipMemcpyAsync (Dst=0xffffd556aa7b0000, Src=0xffffd556aa7d0000, SizeBytes=131072, Kind=hipMemcpyDeviceToDevice, Stream=0x0) at /home/pvelesko/space/CHIP-SPV/src/CHIPBindings.cc:2075
#16 0x0000000000448c39 in Thread_func<int> (A_d=0xffffd556aa7d0000, B_d=0x2, C_d=<optimized out>, C_h=<optimized out>, Nbytes=131072, mystream=0xc3ce10) at /home/pvelesko/space/CHIP-SPV/HIP/tests/catch/unit/memory/hipMemcpyAsync.cc:54
#17 0x00007ffff707c4f0 in std::execute_native_thread_routine (__p=0xdef1d0) at /dev/shm/spack-stage-servesh/spack-stage-gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/spack-src/libstdc++-v3/src/c++11/thread.cc:80
#18 0x00007ffff682a6ea in start_thread () from /lib64/libpthread.so.0
#19 0x00007ffff6542a8f in clone () from /lib64/libc.so.6
Iris backtracke#2:
Thread 261 "hipMemcpyAsync" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffe64748700 (LWP 27413)]
0x00007ffff6475cdb in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff6475cdb in raise () from /lib64/libc.so.6
#1 0x00007ffff6477375 in abort () from /lib64/libc.so.6
#2 0x00007ffff64bbb07 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff64c3b8a in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff64c5764 in _int_free () from /lib64/libc.so.6
#5 0x00007ffff64c8c8b in __malloc_arena_thread_freeres () from /lib64/libc.so.6
#6 0x00007ffff682a70f in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffff6542a8f in clone () from /lib64/libc.so.6
Arcticus backtrace#1:
#0 0x00007ffff64c6c80 in __malloc_arena_thread_freeres () from /lib64/libc.so.6
#1 0x00007ffff682870f in start_thread () from /lib64/libpthread.so.0
#2 0x00007ffff6540a8f in clone () from /lib64/libc.so.6
seems like a memory leak of some kind, the test uses >20G memory with Level0 backend, with OpenCL backend it runs in ~135M or so.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.