Comments (35)
Tests for rocPRIM on ROCm 1.8.0 and 1.8.1 pass on both Vega and Fiji. They passed on previous ROCm versions too. If they don't pass on ROCm 1.8.2, something may be wrong with ROCm 1.8.2 itself. What do you think?
from rocprim.
Thanks @jszuppe
Let me try with ROCm 1.8.1 and see whether I can find any difference.
Thanks for super-fast response.
from rocprim.
Sure. There may be some breaking changes in 1.8.2 that rocPRIM has not been adjusted to, but there's no way to adjust to a version that is still being developed. By the way, we tested it on MI25 (Vega) and S9300 (Fiji).
from rocprim.
@jszuppe
I also tested on Fiji Nano (2 GPUs) with ROCm 1.8 and 1.8.1; 9 failures are still observed.
I don't have an S9300, so I cannot comment on that.
from rocprim.
S9300 is just two Fiji cores on a single GPU, and I don't see any problems. I'll check if we can test it on Fiji Nano.
from rocprim.
Can you post the hipconfig output from the environment you tested it on?
from rocprim.
Output of hipconfig:
HIP version : 1.5.18151
== hipconfig
HIP_PATH : /opt/rocm
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/include -I/opt/rocm/hcc/include
== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 7.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 86791fc4961dc8ffde77bde20d7dfa5e5cbeff5e) (ssh://gerritgit/compute/ec/hcc-tot/llvm 0ccef158132e1222d549edf2da33d4bc0be6c2d1) (based on HCC 1.2.18175-6e93604-86791fc-0ccef15 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: haswell
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include -I/opt/rocm/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive
=== Environment Variables
PATH=/home/taccuser/bin:/home/taccuser/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
== Linux Kernel
Hostname : ROCM-REL-VG10
Linux ROCM-REL-VG10 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
from rocprim.
We have 3 problems here: warp_reduce, warp_scan, device_histogram. It looks like there is some problem with shared memory.
Let's focus on a single issue. This is the function that is executed for the test RocprimWarpReduceTests/12.ReduceSum. It is a very simple reduction using shared memory within a single warp (that's why it uses volatile stores/loads and no sync). We can't see why it should fail.
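To make the pattern concrete, here is a minimal host-side sketch of a tree reduction over one warp's shared buffer. It is an illustration only, not rocPRIM's code: the name simulated_warp_reduce and the std::vector standing in for shared memory are assumptions made for this example. On a GPU every lane of the warp runs each step in lockstep, which is why the device version can rely on volatile accesses without an explicit barrier; the inner loop below models those lanes one after another.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a shared-memory tree reduction within one warp.
// `storage` plays the role of the per-warp shared-memory buffer and each
// iteration of the inner loop stands for one lane. Hypothetical helper for
// illustration; this is not rocPRIM's implementation.
template <class T, class BinaryOp>
T simulated_warp_reduce(const std::vector<T>& lane_values, BinaryOp op)
{
    const std::size_t warp_size = lane_values.size();
    // The halving scheme below assumes a non-empty, power-of-two warp size.
    assert(warp_size > 0 && (warp_size & (warp_size - 1)) == 0);

    // Step 0: every lane stores its own value into the shared buffer.
    std::vector<T> storage(lane_values);

    // Each step halves the number of active lanes. Active lanes read only
    // slots at index >= offset, which no active lane writes in that step,
    // so running the lanes sequentially gives the same result as lockstep.
    for (std::size_t offset = warp_size / 2; offset > 0; offset /= 2)
    {
        for (std::size_t lane = 0; lane < offset; ++lane)
        {
            storage[lane] = op(storage[lane], storage[lane + offset]);
        }
    }

    // Lane 0 ends up holding the reduction of all inputs.
    return storage[0];
}
```

Called with 64 lanes that each hold 1 and a sum operator, the function returns 64. If a device version of this pattern returns something else on one GPU only, the suspicion falls on how shared-memory stores become visible between lanes rather than on the algorithm itself.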
from rocprim.
How can I proceed further? What's my next action?
from rocprim.
The tests work on Vega (yours and ours) and on our S9300, so we think it's not a bug in the tests, otherwise they would fail everywhere. It also suggests that it's most likely not a bug in rocPRIM, because it would manifest on the S9300 too, since the same ISA is generated for S9300 and Fiji Nano (gfx803, as far as I know). Maybe it's some kind of bug in the ROCm drivers or something like that. We can't locate the exact issue as we don't have a Fiji Nano available right now.
Just to be sure, please send us the ISA from test_hip_warp_reduce, the rocminfo output, and maybe the whole test_hip_warp_reduce binary file.
from rocprim.
Sure. Shall I get the ISA by setting KMDUMPISA=1 and KMDUMPLLVM=1?
from rocprim.
You can just use extractkernel from /opt/rocm/bin.
from rocprim.
Thanks. Please find the ISA for gfx803 attached.
test_hip_warp_reduce-gfx803.zip
from rocprim.
taccuser@ROCM-REL-FIJI:~/Desktop/rocprim/build/test/rocprim$ /opt/rocm/bin/rocminfo
HSA System Attributes
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
Agent 1
Name: Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0
Queue Min Size: 0
Queue Max Size: 0
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768KB
Chip ID: 0
Cacheline Size: 64
Max Clock Frequency (MHz):3500
BDFID: 0
Compute Unit: 16
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16323912KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16323912KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A
Agent 2
Name: gfx803
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 29440
Cacheline Size: 64
Max Clock Frequency (MHz):1000
BDFID: 2304
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 150995968
Dim[2]: 3698327552
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
Agent 3
Name: gfx803
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 29440
Cacheline Size: 64
Max Clock Frequency (MHz):1000
BDFID: 1280
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 83887104
Dim[2]: 3698327552
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*** Done ***
from rocprim.
@jszuppe any update on this?
The issue is still always observed on Fiji, across different ROCm driver packages.
from rocprim.
No update. The only thing these functions have in common is that they use shared memory, but they are not the only functions that do.
Could you check this? (Old ROCm versions had some issues with synchronizations, I doubt it's the cause here, but it's worth trying)
HCC_OPT_FLUSH=0 HIP_LAUNCH_BLOCKING=1 test/rocprim/test_hip_warp_reduce
I've compared your ISA and ours; they're exactly the same. Perhaps the binaries differ? Could you upload test/rocprim/test_hip_warp_reduce? I'll try it on our S9300.
from rocprim.
Output on Fiji for HCC_OPT_FLUSH=0 HIP_LAUNCH_BLOCKING=1 test/rocprim/test_hip_warp_reduce:
See the attached log for details.
rocprim-warp_reduce.log
from rocprim.
Any update on this? How can we resolve this issue?
from rocprim.
I think we installed the same GPU in our servers on Friday, so I'll check it next week.
from rocprim.
Sure, thank you
from rocprim.
I just tested with ROCm 1.9 Beta and the rocPRIM master branch; the failure count on Fiji decreased to only 2:
33 - rocprim.hc.warp_reduce (Failed)
34 - rocprim.hc.warp_scan (Failed)
from rocprim.
With latest roc-master I see test failures with rocprim.hip.device_scan on Fiji Nano as well.
from rocprim.
@ntrost57
roc-master? Which HCC and HIP versions does it include?
Could you also upload the test log?
from rocprim.
HIP version: 1.5.18413
HCC clang version 8.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang dfca79082cc28df5ad8395629e0f5aa43e37354a) (ssh://gerritgit/lightning/ec/llvm 84310fb4caf397ff0f4ba26feecb839c993e2f17) (based on HCC 1.3.18414-023e273-dfca790-84310fb )
~/rocPRIM/build$ ./test/rocprim/test_hip_device_scan
Running main() from /home/nico/rocPRIM/build/googletest-src/googletest/src/gtest_main.cc
[==========] Running 25 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 5 tests from RocprimDeviceScanTests/0, where TypeParam = DeviceScanParams<int, int, false>
[ RUN ] RocprimDeviceScanTests/0.InclusiveScanSumEmptyInput
[ OK ] RocprimDeviceScanTests/0.InclusiveScanSumEmptyInput (35 ms)
[ RUN ] RocprimDeviceScanTests/0.InclusiveScanSum
Memory access fault by GPU node-1 (Agent handle: 0x14c16e0) on address 0xd01ad5000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
from rocprim.
@ntrost57 That looks like a completely new bug. As this potentially is an HCC issue, I think we should first discuss it there to get some broader feedback. If we're sure this is not caused by HCC, a new issue can be created here.
from rocprim.
@ntrost57 @rkothako What is the status of this bug? Could you check with the development branch?
from rocprim.
Using develop branch:
HIP version: 1.5.18494
HCC clang version 8.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 6ec3c61e09fbb60373eaf5a40021eb862363ba2c) (ssh://gerritgit/lightning/ec/llvm ab3b88ffc2ae50f55361a49aec89f6e95d9d0ec4) (based on HCC 1.3.18482-757fb49-6ec3c61-ab3b88f )
~/rocPRIM/build$ ./test/rocprim/test_hip_device_scan
Running main() from /home/nico/rocPRIM/build/googletest-src/googletest/src/gtest_main.cc
[==========] Running 45 tests from 9 test suites.
[----------] Global test environment set-up.
[----------] 5 tests from RocprimDeviceScanTests/0, where TypeParam = DeviceScanParams<unsigned short, unsigned short, rocprim::plus, false>
[ RUN ] RocprimDeviceScanTests/0.InclusiveScanEmptyInput
[ OK ] RocprimDeviceScanTests/0.InclusiveScanEmptyInput (31 ms)
[ RUN ] RocprimDeviceScanTests/0.InclusiveScan
Memory access fault by GPU node-2 (Agent handle: 0xae9250) on address 0x16020f1000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
from rocprim.
We have tested on ROCm 2.0 (instead of 1.9), and a "Memory access fault" does indeed pop up there for various algorithms. We're looking into it.
from rocprim.
@ntrost57 if you have time, could you check this branch https://github.com/ROCmSoftwarePlatform/rocPRIM/tree/develop_stream ?
I suspect that there is a compiler bug, I have a workaround and all tests pass on our S9300 (Fiji) and ROCm 2.0.
from rocprim.
@ex-rzr I still observe the same memory access fault using the develop_stream branch on Fiji.
from rocprim.
Which tests and what is your version of HCC?
from rocprim.
I was linking to the wrong library (been testing with external tests). All tests are passing now for me too. Thanks!
from rocprim.
@ntrost57, thank you!
I've opened an issue (ROCm/hcc#1024), I hope that my findings will help the compiler team.
from rocprim.
@ex-rzr would you mind describing your workaround? What did you need to change?
from rocprim.
master has been updated.
I'm closing the issue, but please report as soon as possible if you notice something strange again (the workaround is not a bullet-proof fix).
from rocprim.