Comments (35)

jszuppe commented on June 7, 2024

Tests for rocPRIM on ROCm 1.8.0 and 1.8.1 pass on both Vega and Fiji, and they passed on previous ROCm versions too. If they don't pass on ROCm 1.8.2, there may be something wrong with ROCm 1.8.2. What do you think?

from rocprim.

rkothako commented on June 7, 2024

Thanks @jszuppe
Let me try with ROCm 1.8.1 and see whether I can find any difference.

Thanks for the super-fast response.

jszuppe commented on June 7, 2024

Sure. There may be some breaking changes in 1.8.2 that rocPRIM has not been adjusted to, but there's no way to adjust to a version that is still being developed. By the way, we tested it on MI25 (Vega) and S9300 (Fiji).

rkothako commented on June 7, 2024

@jszuppe
I also tested on Fiji Nano (2 GPUs) with ROCm 1.8 and 1.8.1, and 9 failures are still observed.
I don't have an S9300, so I cannot comment on that.

jszuppe commented on June 7, 2024

S9300 is just two Fiji cores on a single GPU, and I don't see any problems. I'll check if we can test it on Fiji Nano.

jszuppe commented on June 7, 2024

Can you post the hipconfig output from the environment you tested it on?

rkothako commented on June 7, 2024

Output of hipconfig:

HIP version : 1.5.18151

== hipconfig
HIP_PATH : /opt/rocm
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/include -I/opt/rocm/hcc/include

== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 7.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 86791fc4961dc8ffde77bde20d7dfa5e5cbeff5e) (ssh://gerritgit/compute/ec/hcc-tot/llvm 0ccef158132e1222d549edf2da33d4bc0be6c2d1) (based on HCC 1.2.18175-6e93604-86791fc-0ccef15 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: haswell

Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include -I/opt/rocm/includeHCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

=== Environment Variables
PATH=/home/taccuser/bin:/home/taccuser/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

== Linux Kernel
Hostname : ROCM-REL-VG10
Linux ROCM-REL-VG10 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

jszuppe commented on June 7, 2024

We have 3 problems here: warp_reduce, warp_scan, device_histogram. It looks like there is some problem with shared memory.

Let's focus on a single issue. This is a function that is executed for the test RocprimWarpReduceTests/12.ReduceSum. It is a very simple reduction using shared memory within a single warp (that's why it uses volatile stores/loads and no sync). We can't see why it should fail.
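
For readers unfamiliar with the pattern, here is a minimal host-side C++ sketch of a tree reduction over one warp's worth of shared storage. It is an illustration only and not the rocPRIM function referred to above: the names `kWarpSize` and `warp_reduce_sum` are hypothetical, the lanes are modelled with an ordinary loop instead of running in lockstep on a GPU, and the width of 64 is assumed from the wavefront size that rocminfo reports for gfx803 later in this thread. On the device, the volatile loads and stores mentioned above would keep the compiler from caching the shared values in registers; the host model does not need them.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model parameter: one "warp" of 64 lanes (assumed, see lead-in).
constexpr std::size_t kWarpSize = 64;
static_assert((kWarpSize & (kWarpSize - 1)) == 0, "warp size must be a power of two");

// Models a tree reduction in shared memory within a single warp.
// 'shared' stands in for the per-warp shared-memory buffer, one slot per lane.
int warp_reduce_sum(std::array<int, kWarpSize> shared)
{
    // Halve the number of active lanes each step: 32, 16, 8, 4, 2, 1.
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // Every active lane adds the value stored 'offset' slots above it.
        // Reads touch [offset, 2 * offset) and writes touch [0, offset),
        // so lanes never read a slot that another lane writes in this step.
        for (std::size_t lane = 0; lane < offset; ++lane) {
            shared[lane] += shared[lane + offset];
        }
    }
    // After the last step, slot 0 holds the sum of all lanes.
    return shared[0];
}
```

Calling `warp_reduce_sum` on an array filled with ones yields 64. Because each step only reads slots that no lane writes in that same step, the pattern is expected to be safe without an explicit barrier as long as the whole warp really executes in lockstep, which is the assumption the comment above relies on.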

rkothako commented on June 7, 2024

How can I proceed further? What's my next action?

jszuppe commented on June 7, 2024

Tests work on Vega (yours and ours) and on our S9300, so we think it's not a bug in the tests; otherwise they would fail everywhere. It also suggests that it's most likely not a bug in rocPRIM, because it would manifest on the S9300 too, since the same ISA is generated for the S9300 and Fiji Nano (gfx803, as far as I know). Maybe it's some kind of bug in the ROCm drivers or something like that. We can't locate the exact issue, as we don't have a Fiji Nano available right now.

Just to be sure, please send us the ISA from test_hip_warp_reduce, the rocminfo output, and maybe the whole test_hip_warp_reduce binary.

rkothako commented on June 7, 2024

Sure. Shall I get the ISA by setting KMDUMPISA=1 and KMDUMPLLVM=1?

jszuppe commented on June 7, 2024

You can just use extractkernel from /opt/rocm/bin

rkothako commented on June 7, 2024

Thanks. Please find the ISA for gfx803 attached.
test_hip_warp_reduce-gfx803.zip

rkothako commented on June 7, 2024

taccuser@ROCM-REL-FIJI:~/Desktop/rocprim/build/test/rocprim$ /opt/rocm/bin/rocminfo

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents


Agent 1


Name: Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0
Queue Min Size: 0
Queue Max Size: 0
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768KB
Chip ID: 0
Cacheline Size: 64
Max Clock Frequency (MHz):3500
BDFID: 0
Compute Unit: 16
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16323912KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16323912KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A


Agent 2


Name: gfx803
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 29440
Cacheline Size: 64
Max Clock Frequency (MHz):1000
BDFID: 2304
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 150995968
Dim[2]: 3698327552
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32


Agent 3


Name: gfx803
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 29440
Cacheline Size: 64
Max Clock Frequency (MHz):1000
BDFID: 1280
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 83887104
Dim[2]: 3698327552
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*** Done ***

rkothako commented on June 7, 2024

@jszuppe any update on this?
The issue is STILL consistently observed on Fiji with different ROCm driver packages.

ex-rzr commented on June 7, 2024

No update. The only thing these functions have in common is that they use shared memory, but they are not the only functions that do.

Could you check this? (Old ROCm versions had some issues with synchronization. I doubt it's the cause here, but it's worth trying.)

HCC_OPT_FLUSH=0 HIP_LAUNCH_BLOCKING=1 test/rocprim/test_hip_warp_reduce

I've compared your ISA and ours; they're exactly the same. Perhaps the binaries differ? Could you upload test/rocprim/test_hip_warp_reduce? I'll try it on our S9300.

rkothako commented on June 7, 2024

Output on Fiji for HCC_OPT_FLUSH=0 HIP_LAUNCH_BLOCKING=1 test/rocprim/test_hip_warp_reduce:

See the attached log for details.
rocprim-warp_reduce.log

rkothako commented on June 7, 2024

Any update on this? How can we resolve this issue?

jszuppe commented on June 7, 2024

I think we installed the same GPU in our servers on Friday, so I'll check it next week.

rkothako commented on June 7, 2024

Sure, thank you

rkothako commented on June 7, 2024

I just tested with ROCm 1.9 Beta and the rocPRIM master branch; the failure count on Fiji has decreased to only 2:
33 - rocprim.hc.warp_reduce (Failed)
34 - rocprim.hc.warp_scan (Failed)

ntrost57 commented on June 7, 2024

With latest roc-master I see test failures with rocprim.hip.device_scan on Fiji Nano as well.

ex-rzr commented on June 7, 2024

@ntrost57
roc-master? Which HCC and HIP versions does it include?
Could you also upload the test log?

ntrost57 commented on June 7, 2024

HIP version: 1.5.18413
HCC clang version 8.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang dfca79082cc28df5ad8395629e0f5aa43e37354a) (ssh://gerritgit/lightning/ec/llvm 84310fb4caf397ff0f4ba26feecb839c993e2f17) (based on HCC 1.3.18414-023e273-dfca790-84310fb )

~/rocPRIM/build$ ./test/rocprim/test_hip_device_scan
Running main() from /home/nico/rocPRIM/build/googletest-src/googletest/src/gtest_main.cc
[==========] Running 25 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 5 tests from RocprimDeviceScanTests/0, where TypeParam = DeviceScanParams<int, int, false>
[ RUN ] RocprimDeviceScanTests/0.InclusiveScanSumEmptyInput
[ OK ] RocprimDeviceScanTests/0.InclusiveScanSumEmptyInput (35 ms)
[ RUN ] RocprimDeviceScanTests/0.InclusiveScanSum
Memory access fault by GPU node-1 (Agent handle: 0x14c16e0) on address 0xd01ad5000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

VincentSC commented on June 7, 2024

@ntrost57 That looks like a completely new bug. As this potentially is an HCC issue, I think we should first discuss it there to get some broader feedback. If we're sure this is not caused by HCC, a new issue can be created here.

VincentSC commented on June 7, 2024

@ntrost57 @rkothako What is the status of this bug? Could you check with the development branch?

ntrost57 commented on June 7, 2024

Using develop branch:

HIP version: 1.5.18494
HCC clang version 8.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 6ec3c61e09fbb60373eaf5a40021eb862363ba2c) (ssh://gerritgit/lightning/ec/llvm ab3b88ffc2ae50f55361a49aec89f6e95d9d0ec4) (based on HCC 1.3.18482-757fb49-6ec3c61-ab3b88f )

~/rocPRIM/build$ ./test/rocprim/test_hip_device_scan
Running main() from /home/nico/rocPRIM/build/googletest-src/googletest/src/gtest_main.cc
[==========] Running 45 tests from 9 test suites.
[----------] Global test environment set-up.
[----------] 5 tests from RocprimDeviceScanTests/0, where TypeParam = DeviceScanParams<unsigned short, unsigned short, rocprim::plus, false>
[ RUN ] RocprimDeviceScanTests/0.InclusiveScanEmptyInput
[ OK ] RocprimDeviceScanTests/0.InclusiveScanEmptyInput (31 ms)
[ RUN ] RocprimDeviceScanTests/0.InclusiveScan
Memory access fault by GPU node-2 (Agent handle: 0xae9250) on address 0x16020f1000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

VincentSC commented on June 7, 2024

We have tested on ROCm 2.0 (instead of 1.9), and a "Memory access fault" does indeed pop up there for various algorithms. We're looking into it.

ex-rzr commented on June 7, 2024

@ntrost57 if you have time, could you check this branch https://github.com/ROCmSoftwarePlatform/rocPRIM/tree/develop_stream ?

I suspect that there is a compiler bug. I have a workaround, and all tests pass on our S9300 (Fiji) with ROCm 2.0.

ntrost57 commented on June 7, 2024

@ex-rzr I still observe the same Memory access fault using the develop_stream branch on Fiji.

ex-rzr commented on June 7, 2024

Which tests and what is your version of HCC?

ntrost57 commented on June 7, 2024

I was linking to the wrong library (been testing with external tests). All tests are passing now for me too. Thanks!

ex-rzr commented on June 7, 2024

@ntrost57, thank you!
I've opened an issue (ROCm/hcc#1024), I hope that my findings will help the compiler team.

b-sumner commented on June 7, 2024

@ex-rzr would you mind describing your workaround? What did you need to change?

ex-rzr commented on June 7, 2024

master has been updated.
I'm closing the issue, but please report as soon as possible if you notice something strange again (the workaround is not a bullet-proof fix).
