
gpuopen-tools / radeon_compute_profiler


The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.

License: MIT License

Python 0.29% Makefile 0.52% Shell 0.41% C++ 96.11% C 2.60% Batchfile 0.06%
opencl rocm profiler

radeon_compute_profiler's Issues

Kernel and data transfer events not recorded in OpenCL applications

I was using the earlier version of RCP that shipped with ROCm 1.6. Back then, the profiling result contained the kernel launch information. If I import the atp file into CodeXL, the timeline looks like this.

image

Recently, when I tried the same command again, I could only get a timeline like this.

image

The kernel launch information is no longer recorded. Am I making a mistake when I run the profiler? The command that I used is

rocm-profiler -w . -A --hsaaqlpackettrace [my_application and its arguments]

No perfcounters in the generated csv

Hi,

I use the legacy opencl driver.
As CodeXL 2.5 features an old version of rcprof that hangs with the latest legacy opencl driver, I've followed the instructions here to compile the latest version of rcprof.

This worked out nicely, except that the debug info doesn't include any perfcounters.
My card is an RX480.

rcprof -l gives me:
`The list of valid counters for Graphics IP v6 based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
VWriteInsts, LDSInsts, GDSInsts, VALUUtilization, VALUBusy,
SALUBusy, FetchSize, WriteSize, CacheHit, MemUnitBusy,
MemUnitStalled, WriteUnitStalled, LDSBankConflict

The list of valid counters for Graphics IP v7 based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
VWriteInsts, FlatVMemInsts, LDSInsts, FlatLDSInsts, GDSInsts,
VALUUtilization, VALUBusy, SALUBusy, FetchSize, WriteSize,
CacheHit, MemUnitBusy, MemUnitStalled, WriteUnitStalled, LDSBankConflict

The list of valid counters for Graphics IP v8 based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
VWriteInsts, FlatVMemInsts, LDSInsts, FlatLDSInsts, GDSInsts,
VALUUtilization, VALUBusy, SALUBusy, FetchSize, WriteSize,
CacheHit, MemUnitBusy, MemUnitStalled, WriteUnitStalled, LDSBankConflict

The list of valid counters for Vega based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
VWriteInsts, FlatVMemInsts, LDSInsts, FlatLDSInsts, GDSInsts,
VALUUtilization, VALUBusy, SALUBusy, FetchSize, WriteSize,
L1CacheHit, L2CacheHit, MemUnitBusy, MemUnitStalled, WriteUnitStalled,
LDSBankConflict`

However rcprof --listactive doesn't return anything.

I found out I could fix CodeXL 2.5 by just replacing libRCPCLProfileAgent, but then the perf counters are missing as well. I guess this indicates the issue is related to this library.
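To compare what a build advertises against what a given card should support, the `rcprof -l` listing above can be parsed into a per-generation table. The following is a minimal sketch that assumes the output keeps the "The list of valid counters for ... based graphics cards:" header format quoted in this issue; the function name `parse_counter_list` and the shortened `SAMPLE` text are illustrative and not part of RCP.

```python
import re

# Header format as it appears in the `rcprof -l` output quoted in this issue.
HEADER = re.compile(r"The list of valid counters for (.+?) based graphics cards:")
# Counter names in the listing are plain identifiers such as SALUBusy.
COUNTER_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def parse_counter_list(listing):
    """Map each hardware generation in the listing to its counter names."""
    counters = {}
    current = None
    for raw_line in listing.splitlines():
        line = raw_line.strip().strip("`")
        header = HEADER.search(line)
        if header:
            current = header.group(1)
            counters[current] = []
        elif current is not None:
            # Keep only tokens that look like counter names; skip other text.
            for token in line.split(","):
                token = token.strip()
                if COUNTER_NAME.match(token):
                    counters[current].append(token)
    return counters


# Shortened excerpt of the listing above, used only for illustration.
SAMPLE = """The list of valid counters for Graphics IP v6 based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
SALUBusy, FetchSize, WriteSize, CacheHit, MemUnitBusy

The list of valid counters for Vega based graphics cards:
VALUUtilization, VALUBusy, SALUBusy, FetchSize, WriteSize,
L1CacheHit, L2CacheHit, MemUnitBusy, MemUnitStalled, WriteUnitStalled,
LDSBankConflict"""

for generation, names in parse_counter_list(SAMPLE).items():
    print(generation, len(names))
```

With such a table it is easy to check whether a counter requested in a counter file is listed at all before looking for the cause in the profile agent library.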

Incompatible git command

Hi,

when trying to follow the installation instructions for the master branch on Ubuntu 16.04 I encountered the following error:

$ python Scripts/UpdateCommon.py 
Cloning common-lib-amd-ADL into /home/jstephan/build/Common/Lib/AMD/ADL
error: unknown option `no-tags'
usage: git clone [<options>] [--] <repo> [<dir>]

    -v, --verbose         be more verbose
    -q, --quiet           be more quiet
    --progress            force progress reporting
    -n, --no-checkout     don't create a checkout
    --bare                create a bare repository
    --mirror              create a mirror repository (implies bare)
    -l, --local           to clone from a local repository
    --no-hardlinks        don't use local hardlinks, always copy
    -s, --shared          setup as shared repository
    --recursive           initialize submodules in the clone
    --recurse-submodules  initialize submodules in the clone
    --template <template-directory>
                          directory from which templates will be used
    --reference <repo>    reference repository
    --dissociate          use --reference only while cloning
    -o, --origin <name>   use <name> instead of 'origin' to track upstream
    -b, --branch <branch>
                          checkout <branch> instead of the remote's HEAD
    -u, --upload-pack <path>
                          path to git-upload-pack on the remote
    --depth <depth>       create a shallow clone of that depth
    --single-branch       clone only one branch, HEAD or --branch
    --separate-git-dir <gitdir>
                          separate git dir from working tree
    -c, --config <key=value>
                          set config inside the new repository

'git clone' failed with return code 129

Using python2 or python3 doesn't make a difference as this is a git error. I'm using the default git version in the Ubuntu repository:

$ git --version
git version 2.7.4

Removing the "no-tags" flag from the script worked for me but I don't know if this will cause issues at some point.
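Instead of removing the flag outright, the clone step could add it only when the installed git understands it. The sketch below assumes that `git clone --no-tags` first appeared in Git 2.14, which is consistent with 2.7.4 rejecting it; the helper names `parse_git_version` and `clone_command` are hypothetical and do not reflect how Scripts/UpdateCommon.py is actually structured.

```python
import re

# Assumption: `git clone --no-tags` is available from Git 2.14 onwards.
NO_TAGS_MIN_VERSION = (2, 14)


def parse_git_version(version_output):
    """Turn a string such as 'git version 2.7.4' into the tuple (2, 7, 4)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("unrecognized git version string: %r" % version_output)
    return tuple(int(part) for part in match.groups(default="0"))


def clone_command(repo_url, target_dir, version_output):
    """Build a `git clone` argv, adding --no-tags only when git supports it."""
    command = ["git", "clone"]
    if parse_git_version(version_output)[:2] >= NO_TAGS_MIN_VERSION:
        command.append("--no-tags")
    return command + [repo_url, target_dir]


# Placeholder URL and directory; the version string is the one reported above.
print(clone_command("https://example.invalid/repo.git", "repo", "git version 2.7.4"))
```

In a real script the version string would come from running `git --version`. Dropping `--no-tags` on older git only means that tags are fetched as well, so the clone is slightly larger but otherwise equivalent.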

Remove RCP by script

Hi,
I installed RCP using build_rcp.sh several weeks ago on my Ubuntu 16.04 PC.
Now I want to reinstall it.
Before that, I have to uninstall RCP.
I noticed that there is no uninstall shell script in the RCP repo.

My question is:

Is there a proper way to uninstall RCP?
Or can I only remove it manually?
If so, how?
Thank you for your time.

Kernel execution serialization

Is there an option for making all the kernels execute sequentially (especially when work is launched in multiple queues)? Coming from CUDA and nvprof, I was surprised not to find such a feature, which would make kernel performance easier to understand.

Out of resources when collecting performance counters

So I'm trying to collect performance counters on some hip kernels that are called by a python script. The GPU is an R9 Fury Nano.
I've been using the terminal command
rocm-profiler -o "counters.csv" --counterfile counters.txt -C -w . /usr/bin/python3 <python_app>

The file counters.txt only has a single counter name in the first line, SALUBusy.

However, when I try to run it, I always get the error
### HCC STATUS_CHECK Error: HSA_STATUS_ERROR_OUT_OF_RESOURCES (0x1008) at file:mcwamp_hsa.cpp line:1185

I tried with other programs and it seems to be consistent. I assume that means the GPU doesn't have enough storage to store all the counter data? Or is it something else?

Does this mean I can't do it this way or is there a better way to go about collecting counter values in this case? I would appreciate if somebody could point me in the right direction. Thanks in advance.

build failed according to instruction

Build Instructions
  • cd into the Build\Linux directory
  • Execute ./build_rcp.sh
    • By default this performs a from-scratch build of the release version of RCP.

root@gg-desktop:~/ROCm/RCP/Build/Linux# git checkout m/roc-3.1.0
M Build/Linux/build_rcp.sh
HEAD is now at 3a49405 Add clarification on the units used in the hsa kernel timestamp sectrion
root@gg-desktop:~/ROCm/RCP/Build/Linux# ./build_rcp.sh
Command line arguments passed to build_rcp.sh:
RCPROOT=/root/ROCm/RCP/Build/Linux/../..
=====Building Radeon Compute Profiler======

Building infra projects

Build arguments passed to scons:

----------- Start building ---------------
Sun Apr 5 23:51:32 PDT 2020

scons -C /root/ROCm/RCP/Build/Linux/../../Build/Linux CXL_prefix=/root/ROCm/RCP/Build/Linux/../../Output CXL_build_type=static
*** ERROR during the build of the 64 bit framework ***

----------- End building -----------------
Sun Apr 5 23:51:32 PDT 2020

*** ERROR ***
*** the build failed - see the logs for details ***
root@gg-desktop:~/ROCm/RCP/Build/Linux#

Failed to generate profile result

I'm trying to profile an app and RCP shows "Failed to generate profile result /tmp/Session2.csv."
I use it with:
rcprof-d -o "/tmp/Session2.csv" -w "/home/alex/CPP_code/testrun/" -p -c "/home/alex/CPP_code/testrun/counters_OpenCL_gfx900_26751.txt" [myapp]
strace shows this:

"[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], 0, NULL) = 31785
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=31785, si_uid=1000, si_status=SIGSEGV, si_utime=84, si_stime=15} ---
readlink("/proc/self/exe", "/usr/bin/rcprof-d", 4095) = 17
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fee195e9e50) = 31807
wait4(31807, NULL, 0, NULL)             = 31807
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31807, si_uid=1000, si_status=255, si_utime=0, si_stime=0} ---
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fee19622e00}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fee19622e00}, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fee19622e00}, 8) = 0
openat(AT_FDCWD, "/tmp/Session2.csv", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "Failed to generate profile resul"..., 53Failed to generate profile result /tmp/Session2.csv.
) = 53
unlink("/home/alex/.rcpdata")           = 0
unlink("/home/alex/.rcpdata.amdtperfmarker") = 0
exit_group(11)                          = ?
+++ exited with 11 +++
"

The GPU is Vega 64, kernel version 4.20, amdgpu (mesa) 18.3
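The strace excerpt above shows a child process (pid 31785) being killed by SIGSEGV before the CSV file is looked up, so the missing file is probably a consequence of that crash. One way to narrow this down is to run the application without the profiler and see how it ends. This is a minimal sketch using only the Python standard library; `describe_exit` and `run_standalone` are hypothetical helper names, and the command run here is a stand-in for the real application.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Explain a subprocess return code; on POSIX a negative value means a signal."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_standalone(argv):
    """Run a command without the profiler and report how it ended."""
    completed = subprocess.run(argv)
    return describe_exit(completed.returncode)


# Stand-in command; replace it with the application and its arguments.
print(run_standalone([sys.executable, "-c", "raise SystemExit(3)"]))
```

If the application also reports `killed by SIGSEGV` on its own, the crash is unrelated to the profiler; if it only crashes under rcprof, the profile agent or the counter file is the more likely suspect.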

build problem

Hello,
After a successful build I cannot find the executables.
Process:
git clone https://github.com/GPUOpen-Tools/RCP.git
cd RCP
python Scripts/UpdateCommon.py
cd Build/Linux/
sh ./build_rcp.sh skip-32bitbuild skip-hsaprofiler

NOTE:
The 64-bit HSA build fails if I do not skip it.
I tried to install ROCm, but it seems to interfere with my installed AMDGPU driver (18.30).

With the process above I get the return:
----------- End building -----------------
Fri Aug 31 21:10:06 CEST 2018

*** SUCCESS ***
Build Common, 64-bit...
Build DeviceInfo, 64-bit...
Build CLCommon, 64-bit...
Build CLProfileAgent, 64-bit...
Build CLTraceAgent, 64-bit...
Build CLOccupancyAgent, 64-bit...
Build ProfileDataParser, 64-bit...
Build sprofile, 64-bit...
Build PreloadXInitThreads, 64-bit...


Content of the folder RCP/Output/bin
total 94296
drwxr-xr-x 2 myuser myuser 4096 Aug 31 21:10 ./
drwxr-xr-x 6 myuser myuser 4096 Aug 31 21:10 ../
-rwxr-xr-x 1 myuser myuser 12457628 Aug 31 21:10 libGPUPerfAPICL32.so*
-rwxr-xr-x 1 myuser myuser 23332448 Aug 31 21:10 libGPUPerfAPICL.so*
-rwxr-xr-x 1 myuser myuser 12097328 Aug 31 21:10 libGPUPerfAPICounters32.so*
-rwxr-xr-x 1 myuser myuser 22962744 Aug 31 21:10 libGPUPerfAPICounters.so*
-rwxr-xr-x 1 myuser myuser 4151928 Aug 31 21:10 libRCPCLOccupancyAgent.so*
-rwxr-xr-x 1 myuser myuser 4463464 Aug 31 21:10 libRCPCLProfileAgent.so*
-rwxr-xr-x 1 myuser myuser 5170752 Aug 31 21:10 libRCPCLTraceAgent.so*
-rwxr-xr-x 1 myuser myuser 7584 Aug 31 21:10 libRCPPreloadXInitThreads.so*
-rwxr-xr-x 1 myuser myuser 4747600 Aug 31 21:10 libRCPProfileDataParser.so*
-rwxr-xr-x 1 myuser myuser 7022744 Aug 31 21:10 rcprof*
-rwxr-xr-x 1 myuser myuser 24088 Aug 31 21:10 VkStableClocks*

The log is in an attached file.
Thank you for your help.

RCP_Build_problem.log

[Build failure] [HSAFdnTrace] Missing header and wrong definition

Hello AMD!
Here to inform you that the HSAFdnTrace cannot build in its current state.

These are the issues:

  1. AutoGenerated/HSATraceStringOutput.cpp error: ‘GPU_MEMORY_FAULT_EVENT’ was not declared in this scope; did you mean ‘HSA_AMD_GPU_MEMORY_FAULT_EVENT’?
  2. HSAFdnAPIInfoManager.h includes hsa_ext_profiler.h which is unavailable in the current OSS GIT repositories.

Solved as follows:

  1. The compiler was right: the definition was missing the HSA_AMD_ prefix. This autogenerated code appears to reference an old version of the header that contains this definition (hsa_ext_amd.h). After checking both the old and new headers, it is the very same enumeration, so manually adding the prefix makes the compiler happy.
  2. The hsa_ext_profiler.h header was pulled from hsa-rocr-dev_1.1.9-74-g4eea4a1_amd64.deb and placed where the build expects it, after which it built fine. It looks like the only thing HSAFdnAPI is missing here is hsa_profiler_kernel_time_t.

Thanks for the wonderful OSS stack!
Angelo

SALUBusy Incorrect?

I'm profiling clsparse library in ubuntu 16.04 with rx vega 64.
The data generated from rcprof shows SALUBusy values over 100.
From the documentation, SALUBusy values should be in 0 to 100.
How is that possible?
What is the equation to generate SALUBusy?
Is that a bug of rcprof?

Building RCP-5.6 - HSAFdnCommon misses "amd_hsa_tools_interfaces.h"

I built RCP 5.6 against ROCm 2.6, which results in the missing header file "amd_hsa_tools_interfaces.h" while building the module "HSAFdnCommon".

Using the current state of the git repository does not result in a failure.

Can you point me to the change which resolves this problem?

Doesn't find amd_comgr.h automatically

user@debian:~/RCP/Build/Linux$ dpkg -l | grep comgr
ii  comgr                                                       1.6.0.143-rocm-rel-3.5-30-e24e8c1      amd64        Library to provide support functions
user@debian:~/RCP/Build/Linux$ dpkg -L comgr
/opt
/opt/rocm-3.5.0
/opt/rocm-3.5.0/include
/opt/rocm-3.5.0/include/amd_comgr.h
/opt/rocm-3.5.0/include/opencl1.2-c.pch
/opt/rocm-3.5.0/include/opencl2.0-c.pch
/opt/rocm-3.5.0/lib
/opt/rocm-3.5.0/lib/cmake
/opt/rocm-3.5.0/lib/cmake/amd_comgr
/opt/rocm-3.5.0/lib/cmake/amd_comgr/amd_comgr-config-version.cmake
/opt/rocm-3.5.0/lib/cmake/amd_comgr/amd_comgr-config.cmake
/opt/rocm-3.5.0/lib/cmake/amd_comgr/amd_comgr-targets-release.cmake
/opt/rocm-3.5.0/lib/cmake/amd_comgr/amd_comgr-targets.cmake
/opt/rocm-3.5.0/lib/libamd_comgr.so.1.6.30500
/opt/rocm-3.5.0/share
/opt/rocm-3.5.0/share/amd_comgr
/opt/rocm-3.5.0/share/amd_comgr/LICENSE.txt
/opt/rocm-3.5.0/share/amd_comgr/NOTICES.txt
/opt/rocm-3.5.0/share/amd_comgr/README.md
/opt/rocm-3.5.0/lib/libamd_comgr.so
/opt/rocm-3.5.0/lib/libamd_comgr.so.1

...

g++ -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wno-unknown-pragmas -Wno-strict-aliasing -Wno-non-virtual-dtor -Wno-conversion-null -Wno-ignored-attributes -Werror -msse   -I. -I../../Src/Common -I../../Src/CLCommon -I../../../Common/Src -I../../../Common/Lib/AMD/GPUPerfAPI/3_3/Include -I../../../Common/Lib/AMD/APPSDK/3-0/include -I/include -I../../../Common/Src/DynamicLibraryModule -I../../../Common/Src/DeviceInfo -I../../../Common/Src/TSingleton -isystem../../../Common/Lib/Ext/Boost/boost_1_59_0 -I../../../Common/Lib/Ext/utf8cpp/source -I../../../Common/Lib/AMD/ACL/TOT/include -I../../../Common/Src/ComgrUtils/Src -I../../../Common/Lib/AMD/ADL/include -I../../../Common/Src/ADLUtil -I../../../Common/Src/ACLModuleManager -I/opt/rocm/hsa/include -I/opt/rocm/hsa/include/hsa -I/opt/rocm/hsa/include/comgr/ -I/opt/rocm/hsa/../include  -D_LINUX -DUSE_POINTER_SINGLETON -DCOMGR_DYNAMIC_LINKING -DCL_USE_DEPRECATED_OPENCL_1_0_APIS -DCL_USE_DEPRECATED_OPENCL_1_1_APIS -DCL_USE_DEPRECATED_OPENCL_1_2_APIS -DCL_USE_DEPRECATED_OPENCL_2_0_APIS -DAMDT_PUBLIC -DAMDT_BUILD_SUFFIX=\"\" -DAMDT_PLATFORM_SUFFIX=\"\" -DAMDT_DEBUG_SUFFIX=\"\"   -c ../../../Common/Src/ComgrUtils/Src/ComgrUtils.cpp -o ../../Output/obj/release/x64/CLProfileAgent/ComgrUtils.o
In file included from ../../../Common/Src/ComgrUtils/Src/ComgrUtils.cpp:7:
../../../Common/Src/ComgrUtils/Src/ComgrUtils.h:23:10: fatal error: amd_comgr.h: No such file or directory
   23 | #include "amd_comgr.h"
      |          ^~~~~~~~~~~~~
compilation terminated.
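The compile line above searches /opt/rocm/hsa/../include, while the package listing shows the header under the versioned prefix /opt/rocm-3.5.0/include. A small search over the install prefixes can confirm where the header actually lives. This is a sketch only: `find_header` is a hypothetical helper, the `/opt/rocm*` pattern is an assumption based on the paths in this issue, and how the resulting directory is passed to the RCP build is not covered here.

```python
import glob
import os


def find_header(header_name, install_globs=("/opt/rocm*",)):
    """Return the include directories under matching prefixes that hold header_name."""
    hits = []
    for pattern in install_globs:
        for root in sorted(glob.glob(pattern)):
            include_dir = os.path.join(root, "include")
            if os.path.isfile(os.path.join(include_dir, header_name)):
                hits.append(include_dir)
    return hits


# Prints an empty list on machines without a ROCm installation.
print(find_header("amd_comgr.h"))
```

On the system described above this would be expected to report /opt/rocm-3.5.0/include, which is the directory missing from the include path.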

Build error

I tried a build and it fails as below:

/home/preda/Common/Src/AMDTOSWrappers/src/common/osCpuid.cpp: In constructor 'osCpuid::osCpuid()':
/home/preda/Common/Src/AMDTOSWrappers/src/common/osCpuid.cpp:53:20: error: '((void)& info +12)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
osCpuidParam_t info[MAX_CPUID_REGS];
^~~~
/home/preda/Common/Src/AMDTOSWrappers/src/common/osCpuid.cpp:53:20: error: '((void)& info +8)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
/home/preda/Common/Src/AMDTOSWrappers/src/common/osCpuid.cpp:53:20: error: '((void)& info +4)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
/home/preda/Common/Src/AMDTOSWrappers/src/common/osCpuid.cpp:70:5: error: 'info' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (info[EAX] > 0)
^~
cc1plus: all warnings being treated as errors

Build failure in AMDTActivityLoggerProfileControl

I'm building with gcc 9.1.0 on Ubuntu. All of ROCm is present.
The paths were shortened. This is from RCP_Build.log:

g++ -o ./rcp/src/Common/Src/AMDTActivityLogger/AMDTActivityLogger.os -c -fPIC -Wall -Werror -Wextra -g -fmessage-length=0 -Wno-unknown-pragmas -pthread -std=c++11 -D_LINUX -Wno-expansion-to-defined -Wno-ignored-attributes -Wno-implicit-fallthrough -O3 -DNDEBUG -fvisibility=hidden -Wno-maybe-uninitialized -DAMDT_PUBLIC -std=c++11 -fno-strict-aliasing -D_LINUX -DAMDT_BUILD_SUFFIX= -DAMDT_DEBUG_SUFFIX= -I./rcp/src/Common/Src/AMDTActivityLogger -I./rcp/src/Common/Src -I./rcp/src/Common/Src/TSingleton -I./rcp/src/Common/Lib/Ext/utf8cpp/source -I./rcp/src/Common/Src ./rcp/src/Common/Src/AMDTActivityLogger/AMDTActivityLogger.cpp
g++ -o ./rcp/src/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.os -c -fPIC -Wall -Werror -Wextra -g -fmessage-length=0 -Wno-unknown-pragmas -pthread -std=c++11 -D_LINUX -Wno-expansion-to-defined -Wno-ignored-attributes -Wno-implicit-fallthrough -O3 -DNDEBUG -fvisibility=hidden -Wno-maybe-uninitialized -DAMDT_PUBLIC -std=c++11 -fno-strict-aliasing -D_LINUX -DAMDT_BUILD_SUFFIX= -DAMDT_DEBUG_SUFFIX= -I./rcp/src/Common/Src/AMDTActivityLogger -I./rcp/src/Common/Src -I./rcp/src/Common/Src/TSingleton -I./rcp/src/Common/Lib/Ext/utf8cpp/source -I./rcp/src/Common/Src ./rcp/src/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.cpp
./rcp/src/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.cpp: In member function 'bool AMDTActivityLoggerProfileControl::CallProfileControlEntryPointFromLibraryWithMode(void*&, const wchar_t*, void (*&)(amdtProfilingControlMode), const char*, amdtProfilingControlMode)':
./rcp/src/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.cpp:176:98: error: cast between incompatible function types from 'osProcedureAddress' {aka 'int (*)()'} to 'ProfilingControlProcWithMode' {aka 'void (*)(amdtProfilingControlMode)'} [-Werror=cast-function-type]
  176 |                 profilingControlProc = reinterpret_cast<ProfilingControlProcWithMode>(procAddress);
      |                                                                                                  ^
cc1plus: all warnings being treated as errors
scons: *** [./rcp/src/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.os] Error 1
scons: building terminated because of errors.
*** ERROR during the build of the 64 bit framework ***

dump ISA

How can I dump the GPU ISA code with RCP?
I read that this is a feature but I can not find any command line option to select ISA dumps.

scons not listed as a dependency, but is required.

Build/Linux/build_rcp.sh invokes scons, but scons is not listed anywhere as a requirement. If you start with a clean install of Ubuntu, you will get a confusing error that lists "Build arguments passed to scons", even though nothing can have been passed to scons if it is not installed.

Command line arguments passed to build_rcp.sh:
RCPROOT=/home/crews/RCP/Build/Linux/../..
=====Building Radeon Compute Profiler======

Building infra projects

Build arguments passed to scons:

----------- Start building ---------------
Sun Feb 4 10:44:00 EST 2018

scons -C /home/crews/RCP/Build/Linux/../../Build/Linux CXL_prefix=/home/crews/RCP/Build/Linux/../../Output CXL_build_type=static
*** ERROR during the build of the 64 bit framework ***

(scons -C /home/crews/RCP/Build/Linux/../../Build/Linux CXL_prefix=/home/crews/RCP/Build/Linux/../../Output CXL_arch=x86 CXL_build_type=static )
*** ERROR during the build of the 32 bit framework ***

----------- End building -----------------
Sun Feb 4 10:44:00 EST 2018

*** ERROR ***
*** the build failed - see the logs for details ***

Running sudo apt-get install scons fixed the issue. scons should be listed as a prerequisite along with python.
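Until the prerequisites are documented, a quick check before invoking build_rcp.sh avoids the misleading error. The sketch below uses `shutil.which` from the Python standard library; the helper name `missing_tools` and the tool list are illustrative, covering only the two prerequisites named in this issue.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# Tools named in this issue; other prerequisites may exist.
missing = missing_tools(["scons", "python"])
if missing:
    print("Missing build prerequisites: " + ", ".join(missing))
else:
    print("All listed prerequisites were found.")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use the default is sufficient.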

[RFE] RCP manual profiler triggers / tracing instrumentation

Is there a public API (and docs) for manual instrumentation of kernels (e.g. for simple counters)? Secondly, is there tracing instrumentation for host code, e.g. for markers in CPU-GPU tracing, especially once there is a GUI to work with?

More generally, a full, detailed documentation of RCP seems to be lacking -- this would ideally include all the above information.

Unable to profile some programs on linux (XXX is not a valid application)

I'd like to profile some python application with rcprof on linux. The python application uses pyopencl.

I don't know exactly how rcprof checks whether it can execute the input program, but apparently what it does is too restrictive:
/opt/AMDAPP/CodeXL/rcprof -o "/tmp/test.csv" -p -w "/tmp/" /usr/bin/python2.7 test.py
/usr/bin/python2.7 is not a valid application

Similarly if I make test.py executable with chmod +x and add the correct prefix in the file,
/opt/AMDAPP/CodeXL/rcprof -o "/tmp/test.csv" -p -w "/tmp/" test.py
fails similarly.

On linux, it should be sufficient to just check for the execution bit.

Some other programs are successfully profiled with rcprof, thus it's not a faulty installation.
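The check proposed above, accepting any regular file with the execute bit set, can be sketched as follows. This illustrates the suggestion only and is not how rcprof currently validates its input; `is_launchable` is a hypothetical name.

```python
import os
import sys


def is_launchable(path):
    """True if path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# The running interpreter is used here as an example of a valid target.
print(is_launchable(sys.executable))
```

A check like this accepts both native binaries such as /usr/bin/python2.7 and scripts that start with an interpreter line, leaving it to the operating system to reject anything it cannot actually run.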

Navi10 support on launch day?

Hi,
just asking whether we can expect an RCP release with support for the 5700 XT on Navi launch day (7 July) or shortly after.
If not, is there an ETA?
Thanks.

Build Instructions - Linux - arguments to build_rcp.sh

I need to change the path to HSA. According to the build instructions for Linux it should be done with:

  • build_rcp.sh hsadir=/path/to/hsa

A few lines below, there is the following command:

  • make Dbg HSA_DIR=/home/user/hsa_dir

But nothing seems to work for me.
What am I doing wrong?

Currently I changed the path in "Build/Linux/Common.mk". But I would like to use the argument to build_rcp.sh.

Build failure in AMDTActivityLoggerProfileControl.cpp

...
g++ -o /home/user/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.os -c -fPIC -Wall -Werror -Wextra -g -fmessage-length=0 -Wno-unknown-pragmas -pthread -std=c++11 -D_LINUX -Wno-expansion-to-defined -Wno-ignored-attributes -Wno-implicit-fallthrough -O3 -DNDEBUG -fvisibility=hidden -Wno-maybe-uninitialized -DAMDT_PUBLIC -std=c++11 -fno-strict-aliasing -D_LINUX -DAMDT_BUILD_SUFFIX= -DAMDT_DEBUG_SUFFIX= -I/home/user/Common/Src/AMDTActivityLogger -I/home/user/Common/Src -I/home/user/Common/Src/TSingleton -I/home/user/Common/Lib/Ext/utf8cpp/source -I/home/user/Common/Src /home/user/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.cpp
/home/user/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.cpp: In member function 'bool AMDTActivityLoggerProfileControl::CallProfileControlEntryPointFromLibraryWithMode(void*&, const wchar_t*, void (*&)(amdtProfilingControlMode), const char*, amdtProfilingControlMode)':
/home/user/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.cpp:176:98: error: cast between incompatible function types from 'osProcedureAddress' {aka 'int (*)()'} to 'ProfilingControlProcWithMode' {aka 'void (*)(amdtProfilingControlMode)'} [-Werror=cast-function-type]
  176 |                 profilingControlProc = reinterpret_cast<ProfilingControlProcWithMode>(procAddress);
      |                                                                                                  ^
cc1plus: all warnings being treated as errors
scons: *** [/home/user/Common/Src/AMDTActivityLogger/AMDTActivityLoggerProfileControl.os] Error 1
scons: building terminated because of errors.

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 9.3.0-13' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-F9gimE/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
gcc version 9.3.0 (Debian 9.3.0-13) 

Missing source files

I can not build rcp from source (on NixOS using gcc 7.3.0) as files seem to be missing. First, amd_hsa_tools_interfaces.h is not in any of the repositories the build fetches. I was able to find it here, though that repository is marked as obsolete. After patching things so the build can find that file, I am missing hsa_ext_debugger.h which I have not been able to find anywhere. There may be more missing files, but this is as far as the build goes.

To get this far, I needed to disable -Werror as mentioned in another issue, and add an #include <vector> line to Src/CLCommon/CLDeviceReplacer.cpp. I've also needed to address hard coded paths, but I think there are many that I have yet to fix, so I won't enumerate them here.

Profiling a Python script

Hi, I have a python script that calls HSA kernels, and I want to profile this script.

Ideally, I want to do something like:
$ rocm-profiler -A -w . python style_transfer.py

But I am getting this error message at the moment:
Process failed to run. Make sure you have specified the correct path.

Is there a way to do what I want?

It seems CodeXL can profile python scripts, by specifying the executable name as "python" and setting the script name to the command line argument.
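Since the error message mentions the path, one thing worth ruling out is that the profiler does not search PATH for `python`. The sketch below resolves the interpreter to an absolute path and builds the same command line with the `-A -w .` options used above. Whether this resolves the error is an assumption that has not been verified against rocm-profiler; `profiler_command` is a hypothetical helper.

```python
import shutil
import sys


def profiler_command(interpreter, script, which=shutil.which):
    """Build a rocm-profiler argv that names the interpreter by absolute path."""
    interpreter_path = which(interpreter)
    if interpreter_path is None:
        raise FileNotFoundError("interpreter not found on PATH: %s" % interpreter)
    # Same profiler options as in the command quoted in this issue.
    return ["rocm-profiler", "-A", "-w", ".", interpreter_path, script]


if __name__ == "__main__":
    # Uses the running interpreter as the example target.
    print(" ".join(profiler_command(sys.executable, "style_transfer.py")))
```

This mirrors the CodeXL approach mentioned above: the interpreter is the profiled executable, and the script is passed as its argument.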

"not a valid application" message from rocm-profiler

I am trying to profile HSA applications on a discrete GPU node.

Rocm 1.6; Radeon Compute Profiler (rcprof) 64 bits - V5.1.6400.

I copied the vector copy example in rocm/hsa. Built and ran the code. It ran fine. When I try the profiler, I get a "not a valid application" message.

(t1)%rocm-profiler -C ./vector_copy
/home/apan/code/dlbench/opencl/vcopy/sample/./vector_copy is not a valid application

I am getting similar errors with other applications as well. Probably some flag I need to pass in the build process?

Build RCP for Gentoo Linux using GCC 9.1 - problems and solutions

Hi,
I built RCP for Gentoo Linux using GCC 9.1.
I had to change some things, you can find the patches here -> https://github.com/justxi/rocm/tree/master/dev-util/rcp

Besides some parentheses for Python scripts and path adjustments, there were two problems where I had to change the source code:

Missing basic counters from full list of performance counters

I'm running the current version of RCP (5.6) on a Radeon VII. When I ask for the list of available performance counters, it's incomplete: it only gives derived counters. Basic counters are nowhere to be found, although they are clearly needed for the derived ones. However, when I ask rocprofiler (also the current version), which I understand is what RCP is based on, for a list of metrics, they're all there.

rcprof -l
OpenCL performance counters:
The list of valid counters for Graphics IP v6 based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
VWriteInsts, LDSInsts, GDSInsts, VALUUtilization, VALUBusy,
SALUBusy, FetchSize, WriteSize, CacheHit, MemUnitBusy,
MemUnitStalled, WriteUnitStalled, LDSBankConflict

...

HSA performance counters:
The list of valid counters for Graphics IP v8 based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
VWriteInsts, FlatVMemInsts, LDSInsts, FlatLDSInsts, GDSInsts,
VALUUtilization, VALUBusy, SALUBusy, FetchSize, WriteSize,
CacheHit, MemUnitBusy, MemUnitStalled, WriteUnitStalled, LDSBankConflict


The list of valid counters for Vega based graphics cards:
Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts,
VWriteInsts, FlatVMemInsts, LDSInsts, FlatLDSInsts, GDSInsts,
VALUUtilization, VALUBusy, SALUBusy, FetchSize, WriteSize,
L2CacheHit, MemUnitBusy, MemUnitStalled, WriteUnitStalled, LDSBankConflict
rpl_run.sh --list-basic
RPL: on '190801_110408' from '/home/ddpruitt/rocm' in '/home/ddpruitt/HIP/samples/0_Intro/square'
ROCProfiler: rc-file '/home/ddpruitt/rpl_rc.xml'
Basic HW counters:

  gpu-agent0 : GRBM_COUNT : Tie High - Count Number of Clocks
      block GRBM has 2 counters

  gpu-agent0 : GRBM_GUI_ACTIVE : The GUI is Active
      block GRBM has 2 counters

  gpu-agent0 : SQ_WAVES : Count number of waves sent to SQs. (per-simd, emulated, global)
      block SQ has 8 counters

  gpu-agent0 : SQ_INSTS_VALU : Number of VALU instructions issued. (per-simd, emulated)
      block SQ has 8 counters

  gpu-agent0 : SQ_INSTS_VMEM_WR : Number of VMEM write instructions issued (including FLAT). (per-simd, emulated)
      block SQ has 8 counters

  gpu-agent0 : SQ_INSTS_VMEM_RD : Number of VMEM read instructions issued (including FLAT). (per-simd, emulated)
      block SQ has 8 counters

  gpu-agent0 : SQ_INSTS_SALU : Number of SALU instructions issued. (per-simd, emulated)
      block SQ has 8 counters

  gpu-agent0 : SQ_INSTS_SMEM : Number of SMEM instructions issued. (per-simd, emulated)
      block SQ has 8 counters

  gpu-agent0 : SQ_INSTS_FLAT : Number of FLAT instructions issued. (per-simd, emulated)
      block SQ has 8 counters

...
rpl_run.sh --list-derived
RPL: on '190801_110411' from '/home/ddpruitt/rocm' in '/home/ddpruitt/HIP/samples/0_Intro/square'
ROCProfiler: rc-file '/home/ddpruitt/rpl_rc.xml'
Derived metrics:

  gpu-agent0 : TA_BUSY_avr : TA block is busy. Average over TA instances.
      TA_BUSY_avr = avr(TA_TA_BUSY,16)

  gpu-agent0 : TA_BUSY_max : TA block is busy. Max over TA instances.
      TA_BUSY_max = max(TA_TA_BUSY,16)

  gpu-agent0 : TA_BUSY_min : TA block is busy. Min over TA instances.
      TA_BUSY_min = min(TA_TA_BUSY,16)

  gpu-agent0 : TA_FLAT_READ_WAVEFRONTS_sum : Number of flat opcode reads processed by the TA. Sum over TA instances.
      TA_FLAT_READ_WAVEFRONTS_sum = sum(TA_FLAT_READ_WAVEFRONTS,16)

  gpu-agent0 : TA_FLAT_WRITE_WAVEFRONTS_sum : Number of flat opcode writes processed by the TA. Sum over TA instances.
      TA_FLAT_WRITE_WAVEFRONTS_sum = sum(TA_FLAT_WRITE_WAVEFRONTS,16)

  gpu-agent0 : TCC_HIT_sum : Number of cache hits. Sum over TCC instances.
      TCC_HIT_sum = sum(TCC_HIT,16)

  gpu-agent0 : TCC_MISS_sum : Number of cache misses. Sum over TCC instances.
      TCC_MISS_sum = sum(TCC_MISS,16)

  gpu-agent0 : TCC_EA_RDREQ_32B_sum : Number of 32-byte TCC/EA read requests. Sum over TCC instances.
      TCC_EA_RDREQ_32B_sum = sum(TCC_EA_RDREQ_32B,16)

  gpu-agent0 : TCC_EA_RDREQ_sum : Number of TCC/EA read requests (either 32-byte or 64-byte). Sum over TCC instances.
      TCC_EA_RDREQ_sum = sum(TCC_EA_RDREQ,16)

  gpu-agent0 : TCC_EA_WRREQ_sum : Number of transactions (either 32-byte or 64-byte) going over the TC_EA_wrreq interface. Sum over TCC instances.
      TCC_EA_WRREQ_sum = sum(TCC_EA_WRREQ,16)

/opt/rocm/include/hip/hip_profile.h --> include <CXLActivityLogger.h>

I am a bit lost. HIP seems to require the CXLActivityLogger.h header, but if I understand correctly, that header is no longer part of RCP.

Do I need to get CodeXL? Is this an error in the hip_profile header?

If it helps, the error occurs when I try to compile hipCaffe:

CXX src/caffe/common.cpp
In file included from src/caffe/common.cpp:7:
In file included from ./include/caffe/common.hpp:19:
In file included from ./include/caffe/util/device_alternate.hpp:39:
/opt/rocm/include/hip/hip_profile.h:31:10: fatal error: 'CXLActivityLogger.h' file not found
#include <CXLActivityLogger.h>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.
Died at /opt/rocm/bin/hipcc line 452.
Makefile:624: recipe for target '.build_release/src/caffe/common.o' failed
make: *** [.build_release/src/caffe/common.o] Error 1

Code from hip_profile.h below:

#ifndef HIP_INCLUDE_HIP_HIP_PROFILE_H
#define HIP_INCLUDE_HIP_HIP_PROFILE_H

#if not defined (ENABLE_HIP_PROFILE)
#define ENABLE_HIP_PROFILE 1
#endif

#if defined(__HIP_PLATFORM_HCC__) and (ENABLE_HIP_PROFILE==1)
#include <CXLActivityLogger.h>
#define HIP_SCOPED_MARKER(markerName, group) amdtScopedMarker __scopedMarker(markerName, group, nullptr);
#define HIP_BEGIN_MARKER(markerName, group) amdtBeginMarker(markerName, group, nullptr);
#define HIP_END_MARKER() amdtEndMarker();
#else
#define HIP_SCOPED_MARKER(markerName, group)
#define HIP_BEGIN_MARKER(markerName, group)
#define HIP_END_MARKER()
#endif

#endif
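
Reading the guard in the header above, the `CXLActivityLogger.h` include is only reached when `ENABLE_HIP_PROFILE` equals 1, so one possible workaround is to define it as 0 before the header is included (or, presumably, to pass `-DENABLE_HIP_PROFILE=0` to the compiler). The sketch below reproduces the guard from the excerpt to show that the markers then expand to nothing. `sum_with_markers` is a hypothetical function added for illustration, and whether this is the intended fix for hipCaffe is an assumption.

```cpp
#include <cassert>

// Assumption: this is defined before hip_profile.h is included,
// or supplied as -DENABLE_HIP_PROFILE=0 on the command line.
#define ENABLE_HIP_PROFILE 0

// Guard reproduced from the hip_profile.h excerpt above.
#if not defined (ENABLE_HIP_PROFILE)
#define ENABLE_HIP_PROFILE 1
#endif

#if defined(__HIP_PLATFORM_HCC__) and (ENABLE_HIP_PROFILE==1)
#include <CXLActivityLogger.h>
#define HIP_SCOPED_MARKER(markerName, group) amdtScopedMarker __scopedMarker(markerName, group, nullptr);
#define HIP_BEGIN_MARKER(markerName, group) amdtBeginMarker(markerName, group, nullptr);
#define HIP_END_MARKER() amdtEndMarker();
#else
// With profiling disabled, the markers expand to nothing.
#define HIP_SCOPED_MARKER(markerName, group)
#define HIP_BEGIN_MARKER(markerName, group)
#define HIP_END_MARKER()
#endif

// Hypothetical function that uses the markers; it compiles without
// CXLActivityLogger.h because the macros are empty.
int sum_with_markers(const int* data, int n) {
    assert(data != nullptr || n == 0);
    HIP_SCOPED_MARKER("sum_with_markers", "demo");
    int total = 0;
    HIP_BEGIN_MARKER("loop", "demo");
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    HIP_END_MARKER();
    return total;
}
```

The trade-off is that the HIP marker macros become no-ops, so no marker data would be emitted for a profiler to pick up.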

Build failure in scons

I have already installed SCons, but the build still fails.

rking@rking-Ubuntu:~/RCP/Build/Linux$ ./build_rcp.sh
Command line arguments passed to build_rcp.sh:
RCPROOT=/home/rking/RCP/Build/Linux/../..
=====Building Radeon Compute Profiler======

Building infra projects

Build arguments passed to scons:

----------- Start building ---------------
Mon Mar  9 05:00:38 CST 2020

scons -C /home/rking/RCP/Build/Linux/../../Build/Linux CXL_prefix=/home/rking/RCP/Build/Linux/../../Output CXL_build_type=static
*** ERROR during the build of the 64 bit framework ***

----------- End building -----------------
Mon Mar  9 05:00:38 CST 2020

*** ERROR ***
*** the build failed - see the logs for details ***
rking@rking-Ubuntu:~/RCP/Build/Linux$ scons --version
SCons by Steven Knight et al.:
script: v3.0.1.74b2c53bc42290e911b334a6b44f187da698a668, 2017/11/14 13:16:53, by bdbaddog on hpmicrodog
engine: v3.0.1.74b2c53bc42290e911b334a6b44f187da698a668, 2017/11/14 13:16:53, by bdbaddog on hpmicrodog
engine path: ['/usr/lib/scons/SCons']
Copyright (c) 2001 - 2017 The SCons Foundation

Kernel output with -k all fails to generate the output profile

I don't know if this is the expected behavior, but on Windows, running with -k all fails with the message: Failed to generate profile result.

I have to pass -k cl or leave that option out to be able to generate the profile data.

CodeXL is affected by the same bug, because it runs rcprof-x64.exe with that option forced.
