
pti-gpu's Introduction

Profiling Tools Interfaces for GPU (PTI for GPU)

Overview

This repository describes ways of collecting performance data for Intel(R) Processor Graphics and provides a set of samples to help you get started.

License

Samples for Profiling Tools Interfaces for GPU (PTI for GPU) are distributed under the MIT License.

You may obtain a copy of the License at https://opensource.org/licenses/MIT

Supported OS

  • Linux

Windows support is under development

Supported Platforms

  • Intel(R) Processor Graphics Gen9 (formerly Skylake) and newer
  • Intel® Iris® Xe Graphics
  • Intel® Data Center GPU Flex Series
  • Intel® Data Center GPU Max Series

Some samples may have higher hardware requirements

Regularly Tested Configurations

  • Ubuntu 20.04 with Intel(R) Iris(R) Plus Graphics 655

Profiling Chapters

  1. Runtime API Tracing
  2. Device Activity Tracing
  3. Binary/Source Correlation
  4. Metrics Collection
  5. Binary Instrumentation
  6. Code Annotation
  7. System Management

Profiling & Debug Tools

  • unitrace - unified tracing and profiling tool. In addition to Level Zero and/or OpenCL, it can profile other layers of the software stack, for example SYCL and its plugins, oneCCL, MPI, etc., for scale-up and scale-out applications. It also supports profiling hardware metrics (including instruction-level EU stalls) and software events at the same time.
  • onetrace - host and device tracing tool for OpenCL(TM) and Level Zero backends with support for DPC++ (both CPU and GPU) and OpenMP* GPU offload;
  • oneprof - GPU HW metrics collection tool for OpenCL(TM) and Level Zero backends with support for DPC++ and OpenMP* GPU offload;
  • ze_tracer - "Swiss army knife" for Level Zero API call tracing and profiling (formerly ze_intercept);
  • cl_tracer - "Swiss army knife" for OpenCL(TM) API call tracing and profiling;
  • gpuinfo - provides basic information about the GPUs installed in a system and the list of HW metrics that can be collected for each of them;
  • sysmon - Linux "top"-like utility to monitor GPUs installed in a system;

Sample Tools & Utilities

  • tools for OpenCL(TM), DPC++ (with OpenCL(TM) backend) and OpenMP* GPU offload (with OpenCL(TM) backend):
    • cl_hot_functions - provides a list of the hottest OpenCL(TM) API calls by backend (CPU and GPU);
    • cl_hot_kernels - provides a list of the hottest OpenCL(TM) kernels by backend (CPU and GPU);
    • cl_debug_info - prints source and assembly (GEN ISA) for kernels on GPU;
    • cl_gpu_metrics - provides a list of the hottest OpenCL(TM) GPU kernels along with the percentage of cycles each was active, stalled, and idle (based on continuous metrics collection mode);
    • cl_gpu_query - provides a list of the hottest OpenCL(TM) GPU kernels along with the percentage of cycles each was active, stalled, and idle (based on query metrics collection mode);
  • tools for Level Zero, DPC++ (with Level Zero backend) and OpenMP* GPU offload (with Level Zero backend):
    • ze_hot_functions - provides a list of the hottest Level Zero API calls;
    • ze_hot_kernels - provides a list of the hottest Level Zero kernels;
    • ze_debug_info - prints source and assembly (GEN ISA) for kernels on GPU;
    • ze_metric_query - provides a list of the hottest Level Zero GPU kernels along with the percentage of cycles each was active, stalled, and idle (metrics are collected in query mode);
    • ze_metric_streamer - provides a list of the hottest Level Zero GPU kernels along with the percentage of cycles each was active, stalled, and idle (metrics are collected in streamer mode);
  • tools for OpenMP*:
    • omp_hot_regions - provides a list of the hottest parallel (for CPU) and target (for GPU) OpenMP* regions;
  • tools for binary instrumentation:
    • gpu_inst_count - prints GPU kernel assembly (GEN ISA) annotated with instruction execution counts;
    • gpu_perfmon_read - prints GPU kernel assembly (GEN ISA) annotated with a specific HW metric accumulated in the EU PerfMon register;
  • utilities:
    • dpc_info - prints information on available platforms and devices in DPC++;
    • ze_info - prints information on available platforms and devices in Level Zero;
    • ze_metric_info - prints the list of HW metrics that can be collected with the help of Level Zero;
    • gpu_perfmon_set - allows one to choose the HW metric to collect in the EU PerfMon register;

Prerequisites

More information about what is needed for a particular sample can be found on its description page.

Build and Run

In general, to build a sample one needs to perform the following steps (specific instructions for a particular sample can be found on its description page):

cd <pti_root>/samples/<sample_root>
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

To point to specific headers and libraries, one may use the -DCMAKE_INCLUDE_PATH and -DCMAKE_LIBRARY_PATH options, respectively, e.g.:

cmake -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INCLUDE_PATH=/tmp/level_zero/include \
  -DCMAKE_LIBRARY_PATH=/tmp/level_zero/lib \
  ..

Run instructions may vary significantly from sample to sample, so they are provided on each sample's description page.

Testing

All the samples can be built and tested with a single command, e.g.:

LD_LIBRARY_PATH=/usr/local/lib python <pti_root>/tests/run.py

In case of failed tests, error output will be available in the stderr.log file.

It's also possible to test a specific sample or a group of samples, e.g.:

python <pti_root>/tests/run.py -s cl_hot_functions # build and test an exact sample "cl_hot_functions"
python <pti_root>/tests/run.py -s ze # build and test all L0 samples

To run tests in debug mode, one may use the -d option, e.g.:

python <pti_root>/tests/run.py -s ze_gemm -d

The script creates a build directory inside each sample folder while testing. To remove all of these directories, use:

python <pti_root>/tests/run.py -c

Tested software versions can be found in the SOFTWARE file.

Known Issues

  1. On RHEL, the IGA library may not be found even after installing the Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver. To fix it, create a libiga64.so symlink to libiga64.so.1, e.g.:
    cd /usr/lib64
    sudo ln -s libiga64.so.1 libiga64.so
    cd -
  2. On RHEL, one may need to use a newer compiler. To enable it, adjust the PATH and LD_LIBRARY_PATH variables, e.g.:
    export PATH=/opt/gcc/7.4.0/bin/:$PATH
    export LD_LIBRARY_PATH=/opt/gcc/7.4.0/lib:/opt/gcc/7.4.0/lib64:$LD_LIBRARY_PATH

(*) Other names and brands may be claimed as the property of others

pti-gpu's People

Contributors

al42and, anton-v-gorshkov, eshulankina, ghgmc2, idubinov, igorvorobtsov, jczaja, jdmoeller77, jfedorov, joshibha, kcencele, maaswani, mschilling0, rdower, sarbojit2019, tomasrodi, vladimir-tsymbal, vmustya, yiyao12, zma2


pti-gpu's Issues

[WARNING] Unable to decode kernel binary

I followed the build instructions, but running cl_debug_info always shows this message. However, the gpu_instruction_count test can display the assembly of an application. This seems generic, so a specific application is not listed. Any suggestions?

Thanks

NVTX / RocTX like functionality?

Is there a way for user code to add its own logging to the onetrace output?
For example, I could wrap my MPI calls or certain code sections.

This is really useful in nsight-sys and rocprof for marking up and visualising host code and MPI code as well as the GPU,
and I'd like to be able to do this with SYCL.

Further, a small LD_PRELOAD wrapper could be written that brings in MPI logging automatically.

MPI logging is supported by nsight-sys.
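
A minimal sketch of the kind of user-code annotation meant here, using the ITT task API (this assumes the ittnotify headers and static library from a oneAPI installation are available, and that the tracing tool in use actually consumes ITT tasks; the wrapper name and domain string are made up for illustration):

#include <ittnotify.h>

// Illustrative only: create one domain and one named task handle up front.
static __itt_domain* mpi_domain = __itt_domain_create("MyApp.MPI");
static __itt_string_handle* allreduce_name = __itt_string_handle_create("MPI_Allreduce");

// A hypothetical wrapper around an MPI call, marking its begin/end on the host timeline.
void annotated_allreduce(/* ... MPI arguments ... */) {
  __itt_task_begin(mpi_domain, __itt_null, __itt_null, allreduce_name);
  // MPI_Allreduce(...);  // the real call would go here
  __itt_task_end(mpi_domain);
}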

[Warning] Unable to get GEN binary

Hi,

I was running cl_debug_info.exe and got this warning; no kernel source or ISA is printed. I found the reason: when calling elf_parser.GetGenBinary(), the check that the binary size equals sizeof(Elf64Header) fails; the binary size in the gemm case is 9416.
Could you please help with this problem?

Best Regards

Assertion `instance->device_sync <= queued' failed

run:
~/src/pti-gpu/tools/onetrace/build/onetrace --chrome-call-logging --chrome-device-timeline python -u ds_scripts/bloom/bloom_ds_inference.py --name bigscience/bloom-560m --dtype=float16 --benchmark --ki --greedy
message:
python: /home/alyashev/src/pti-gpu/tools/onetrace/../cl_tracer/cl_kernel_collector.h:417: static void ClKernelCollector::ComputeHostTimestamps(const ClKernelInstance*, cl_ulong, cl_ulong, uint64_t&, uint64_t&, uint64_t&, uint64_t&): Assertion `instance->device_sync <= queued' failed.

[BUG][onetrace][IMME CmdList] Tool got less kernel calls in report than actual submitted

I made a simple case: submitting and executing 101 kernels (1 M2D and 100 add_kernel) with immediate command lists enabled on PVC. I use onetrace and pass the flag -s. In the report, only 1 M2D and 78 add_kernels were captured, and Append(ns) is always 0.
I guess this might be a bug, so I am reporting it to you and look forward to a quick fix.
Thank you.

windows oneprof atexit issues

The methodology of running finalize at oneprof DLL unload on Windows is fundamentally broken. It assumes access to the L0 runtime library, but Microsoft explicitly warns against depending on any other DLL at exit. It regularly causes a segfault.

I've tried various DLL hacks to work around it, but ultimately I think we will need to run finalize when the CL/L0 context is destroyed, via a layer intercept, while we know the DLL is still loaded.

There are also other issues with destructors during DLL unload, such as https://github.com/intel/pti-gpu/blob/master/tools/oneprof/metric_streamer_collector.h#L88, which also faults when the L0 driver is already unloaded.

Global register allocation failed & Not enough free registers while scratch-mapped registers

Running a program (https://github.com/zjin-lcf/oneAPI-DirectProgramming/tree/master/sort-dpct) displays the following messages (including information from cliloader, the Intel OpenCL intercept layer). There are three kernels in the program, and only one kernel's assembly is displayed (not shown here). Thank you for your help.

./gpu_perfmon_read ~/oneAPI-Benchmarks/sort-dpct/main 3 10

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.2-18-g204c386
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 204c386f6c9ccafeab839d5738c9fcde0ad05744
CLIntercept optional features:
cliloader(supported)
cliprof(supported)
kernel overrides(supported)
ITT tracing(NOT supported)
MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept config file: clintercept.conf
Read OpenCL file name from user parameters: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1.2.real
Trying to load dispatch from: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1.2.real
Couldn't get exported function pointer to: clCreateBufferWithProperties
Couldn't get exported function pointer to: clCreateImageWithProperties
Couldn't get exported function pointer to: clSetContextDestructorCallback
... success!
Timer Started!
... loading complete.
Initializing host memory.
Running benchmark with input array length 16777216
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE122_20clES2_EUlNS0_7nd_itemILi3EEEE131_13: Not enough free registers while scratch-mapped registers (SREGs) are disabled
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE122_20clES2_EUlNS0_7nd_itemILi3EEEE131_13: Global register allocation failed
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE152_20clES2_EUlNS0_7nd_itemILi3EEEE167_13: Not enough free registers while scratch-mapped registers (SREGs) are disabled
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE152_20clES2_EUlNS0_7nd_itemILi3EEEE167_13: Global register allocation failed

ze_tracer/onetrace: Assertion `call->command != command' failed with simple SYCL Graph application

Trying to trace a simple application which uses SYCL Graphs with ze_tracer or onetrace triggers an internal failed assertion.

$ clang++ -fsycl -g test_graph_zetrace.cpp -o test_graph_zetrace

$ ONEAPI_DEVICE_SELECTOR=level_zero:0 ~/pti-gpu/tools/ze_tracer/build/ze_tracer ./test_graph_zetrace
Intel(R) Arc(TM) A770 Graphics : native
   Done!
test_graph_zetrace: /home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h:1041: void ZeKernelCollector::RemoveKernelCommands(ze_command_list_handle_t): Assertion `call->command != command' failed.
Aborted (core dumped)

Ubuntu Linux 22.04 (6.2.0-36-generic), Intel Compute Runtime 23.30.26918.9, recent Intel LLVM built from source (a2f02214200ef71d3a8ec6cae1b84a16508513c4), PTI-GPU 90b9230.

Source code:

#include <sycl/sycl.hpp>

namespace syclex = sycl::ext::oneapi::experimental;

int main() {
  for (const auto &dev : sycl::device::get_devices()) {
    using graph_support = syclex::info::device::graph_support;
    using gsl = syclex::graph_support_level;
    const auto gs = dev.get_info<graph_support>();
    std::cout << dev.get_info<sycl::info::device::name>() << " : "
              << (gs == gsl::unsupported
                      ? "unsupported"
                      : (gs == gsl::emulated ? "emulated" : "native"))
              << std::endl;
    if (gs != gsl::unsupported) {
      sycl::context ctx{dev};
      sycl::queue q1{ctx, dev, {sycl::property::queue::in_order()}};
      std::vector<sycl::queue> queuesToRecord{q1};

      const sycl::property_list propList{syclex::property::graph::no_cycle_check()};
      syclex::command_graph<syclex::graph_state::modifiable> graph(ctx, dev, propList);

      int *value_h = sycl::malloc_host<int>(1, ctx);
      int *value_i = sycl::malloc_device<int>(1, dev, ctx);
      int *value_o = sycl::malloc_device<int>(1, dev, ctx);

      value_h[0] = 1;

      q1.memcpy(value_i, value_h, 1 * sizeof(int)).wait_and_throw();

      bool result = graph.begin_recording(queuesToRecord);
      if (!result) {
        std::cout << "  Could not start the recording" << std::endl;
      }

      q1.submit([&](sycl::handler &cgh) {
        cgh.single_task<class Memset>([=]() { value_o[0] = 0; });
      });
      q1.submit([&](sycl::handler &cgh) {
        cgh.single_task<class Memcpy>([=]() { value_i[0] = value_o[0]; });
      });

      graph.end_recording();
      auto instance = graph.finalize();

      q1.ext_oneapi_graph(instance).wait_and_throw();
      std::cout << "   Done!" << std::endl;
      q1.wait_and_throw();
    } // Here it dies when destroying `instance`
  }
  std::cout << "Done!" << std::endl;
  return 0;
}

Stack trace:

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352309824) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352309824) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352309824, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff2642476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff26287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff262871b in __assert_fail_base (fmt=0x7ffff27dd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff7fababd "call->command != command", file=0x7ffff7f9dc20 "/home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h", line=1041, function=<optimized out>) at ./assert/assert.c:92
#6  0x00007ffff2639e96 in __GI___assert_fail (assertion=0x7ffff7fababd "call->command != command", file=0x7ffff7f9dc20 "/home/aland/pti-gpu/tools/ze_tracer/ze_kernel_collector.h", line=1041, function=0x7ffff7faba38 "void ZeKernelCollector::RemoveKernelCommands(ze_command_list_handle_t)") at ./assert/assert.c:101
#7  0x00007ffff7f9c2df in ZeKernelCollector::OnExitCommandListDestroy(_ze_command_list_destroy_params_t*, _ze_result_t, void*, void**) () from /home/aland/pti-gpu/tools/ze_tracer/build/libzet_tracer.so
#8  0x00007ffff7ac8dc5 in tracing_layer::zeCommandListDestroy(_ze_command_list_handle_t*) () from /home/aland/intel-sycl/llvm/build/install//lib/libze_tracing_layer.so.1
#9  0x00007ffff00229e2 in ur_exp_command_buffer_handle_t_::~ur_exp_command_buffer_handle_t_() () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#10 0x00007ffff0023302 in urCommandBufferReleaseExp () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#11 0x00007ffff008805d in piextCommandBufferRelease () from /home/aland/intel-sycl/llvm/build/install/lib/libpi_level_zero.so
#12 0x00007ffff2f2ac87 in sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl::~exec_graph_impl() () from /home/aland/intel-sycl/llvm/build/install//lib/libsycl.so.7
#13 0x0000000000406b8e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x18ca3b0) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:346
#14 0x0000000000406b0a in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffd4a0) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1071
#15 0x0000000000407729 in std::__shared_ptr<sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffd498) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1524
#16 0x0000000000407705 in std::shared_ptr<sycl::_V1::ext::oneapi::experimental::detail::exec_graph_impl>::~shared_ptr (this=0x7fffffffd498) at /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:175
#17 0x00000000004076e5 in sycl::_V1::ext::oneapi::experimental::detail::executable_command_graph::~executable_command_graph (this=0x7fffffffd498) at /home/aland/intel-sycl/llvm/build/install/bin/../include/sycl/ext/oneapi/experimental/graph.hpp:289
#18 0x0000000000406755 in sycl::_V1::ext::oneapi::experimental::command_graph<(sycl::_V1::ext::oneapi::experimental::graph_state)1>::~command_graph (this=0x7fffffffd498) at /home/aland/intel-sycl/llvm/build/install/bin/../include/sycl/ext/oneapi/experimental/graph.hpp:336
#19 0x0000000000403fe1 in main () at test_graph_zetrace.cpp:49

Output with SYCL_PI_TRACE=-1:
sycl_pi_trace.log

[zetracer] Report median time

Hi,

It would be great to have a median metric alongside the average, or to have a choice between them. Many benchmarks run warm-up trials and then many main trials of the same workload. Having the median is useful to exclude the spikes that usually happen during warm-up runs.

I refer to the Average metric from an example report below:

=== API Timing Results: ===

Total Execution Time (ns):    418056422
      Total API Time (ns):    407283268

                         Function,       Calls,     Time (ns),  Time (%),     Average (ns),      Min (ns),      Max (ns)
        zeCommandQueueSynchronize,           4,     182529847,     44.82,         45632461,      45271728,      46364532
                   zeModuleCreate,           1,     111687828,     27.42,        111687828,     111687828,     111687828
zeCommandQueueExecuteCommandLists,           4,     108593458,     26.66,         27148364,       1756304,     102803947
    zeCommandListAppendMemoryCopy,          12,       2493748,      0.61,           207812,         62061,       1037087
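
For reference, the statistic being asked for is just the middle of the per-call durations; a minimal sketch of how a tool could compute it (illustrative only, not existing onetrace/ze_tracer code):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Median of per-call durations in nanoseconds; averages the two middle values
// for an even number of calls. Purely illustrative.
uint64_t MedianNs(std::vector<uint64_t> durations) {
  if (durations.empty()) return 0;
  std::sort(durations.begin(), durations.end());
  const std::size_t mid = durations.size() / 2;
  if (durations.size() % 2 == 1) return durations[mid];
  return (durations[mid - 1] + durations[mid]) / 2;
}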

[PTI-SDK] Memory copy record does not contain copied size

Description

Recently, a proof of concept version for the PTI-SDK was added to this repository. I've had a look at it and tried to implement some basic handling of the interface to check if it is usable for us right now.

Right now, the record pti_view_record_memory_copy is used to describe copy actions between different types of memory (memory to device, shared memory to device, etc.). The struct itself has many descriptive fields. However, I noticed that one crucial field is missing: the amount of memory we're actually copying!

Let's look at the struct:

typedef struct pti_view_record_memory_copy {
  pti_view_record_base _view_kind;          //!< Base record
  pti_view_memcpy_type _memcpy_type;        //!< Memory copy type
  pti_view_memory_type _mem_src;            //!< Memory type
  pti_view_memory_type _mem_dst;            //!< Memory type
  ze_command_queue_handle_t _queue_handle;  //!< Device back-end queue handle
  ze_device_handle_t  _device_handle;       //!< Device handle
  ze_context_handle_t _context_handle;      //!< Context handle
  const char* _name;                        //!< Back-end API name making a memory copy
  char _pci_address[16];                    //!< Device pci_address
  uint64_t _mem_op_id;                      //!< Memory operation ID, unique among
                                            //!< all memory operations instances
  uint32_t _correlation_id;                 //!< ID that correlates this record with records
                                            //!< of other Views
  uint32_t _thread_id;                      //!< Thread ID from which operation submitted
  uint64_t _append_timestamp;               //!< Timestamp of memory copy appending to
                                            //!< back-end command list, ns
  uint64_t _start_timestamp;                //!< Timestamp of memory copy start on device, ns
  uint64_t _end_timestamp;                  //!< Timestamp of memory copy completion on device, ns
  uint64_t _submit_timestamp;               //!< Timestamp of memory copy command list submission
                                            //!< to device, ns
} pti_view_record_memory_copy;

There's certainly some stuff not needed here (like the PCI address and so on, which, if needed, could be returned by a separate function), but this is noted in the TODO already.
The amount of memory copied is crucial for some applications to expose potential bottlenecks. Just showing the elapsed time is not sufficient, and having a separate record/callback for the size would also be inconvenient.
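
To make the request concrete, a hypothetical extension could be as small as one extra field; the type below is only a sketch of the proposal, not an existing PTI-SDK definition:

typedef struct pti_view_record_memory_copy_with_size {
  pti_view_record_memory_copy _base;  //!< Existing record, as shown above
  uint64_t _bytes;                    //!< Proposed addition: number of bytes copied
} pti_view_record_memory_copy_with_size;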

Just as an addition, memory allocations and deletions are also not shown, but I guess this is because the Level0 part of the PTI-SDK is still missing.

intel_gpu_abrt: 9: Bad substitution

Hi!

I was looking for a way to kill a GPU process (there was a segfault in the driver and I see my test name show up in the intel-gpu-top list with pid 0) and found the intel_gpu_abrt, which looked like the thing I needed judging by the name.
I tried using it, but no matter what I do I get "Bad substitution". Is this the right tool? If so, how do I use it? If not, is there anything else?

oneprof fails for LLM workloads

(llama-17oct) user@BA-ARCH-LAB-SPR-PVC-2T:~/17oct/frameworks.ai.pytorch.gpu-models/LLM/generation$ /home/user/17oct/pti-gpu/tools/oneprof/build/./oneprof -q -o newlog_llama7b_oneprof_q_O_log.txt -p /home/user/17oct/oneprof_temp/ -s 1000 python -u run_generation.py --device xpu --ipex --dtype float16 --input-tokens 32 --max-new-tokens 32 --num-beam 1 --benchmark -m decapoda-research/llama-7b-hf --sub-model-name llama-7b
Namespace(model_id='decapoda-research/llama-7b-hf', sub_model_name='llama-7b', device='xpu', dtype='float16', input_tokens='32', max_new_tokens=32, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', accuracy_only=False, num_beam=1, num_iter=10, num_warmup=3, batch_size=1, token_latency=False, print_memory=False, disable_optimize_transformers=False, woq=False, calib_dataset='wikitext2', calib_group_size=-1, calib_output_dir='./', calib_checkpoint_name='quantized_weight.pt', calib_nsamples=128, calib_wbits=4, calib_seed=0, woq_checkpoint_path='')
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:36<00:00, 1.11s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
python: /home/user/17oct/pti-gpu/tools/oneprof/metric_query_cache.h:69: _zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.

Basically, the issue is with the "-q" option; it seems to run fine with the "-k" option.
Can you please check this with priority? It is blocking analysis of LLM workloads.

Add support for per-kernel chrome tracing equivalent to ChromePerformanceTimingPerKernel in OpenCL-intercept-layer

Currently, both cl_tracer and ze_tracer support Chrome tracing. Can you please enable support similar to the ChromePerformanceTimingPerKernel option in the OpenCL intercept layer (https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#chromeperformancetimingperkernel-bool)? It sets the tid value of each device entry to the kernel name. This view (one row per kernel in the Chrome trace viewer) can be beneficial for app developers.

There was a 4~5% variance in the capture results after using the "verbose" parameter.

[PTI-SDK][Discussion] Handling of host events with PTI-SDK PoC

Handling of host events with Level Zero PTI-SDK PoC

This issue is more of a discussion so that we know how potential host events will be handled in PTI-SDK and can prepare for it.

What is the current situation in the PTI-SDK PoC (as of December 19th 2023)

PTI-SDK PoC offers a simple interface for potential tools. Simply said, tools can register two callback functions, where one returns a buffer upon request and the second one is called when this buffer is being flushed.

Tools can enable or disable certain parts of this interface to look only at the operations of interest. From our point of view, the following view kinds will be the interesting ones.

void
scorep_level0_event_device_tracing_enable()
{
    UTILS_DEBUG( "Enable tracing views!" );
    ptiViewEnable( PTI_VIEW_DEVICE_GPU_KERNEL );
    ptiViewEnable( PTI_VIEW_DEVICE_GPU_MEM_COPY );
    ptiViewEnable( PTI_VIEW_DEVICE_GPU_MEM_FILL );
    ptiViewEnable( PTI_VIEW_LEVEL_ZERO_CALLS );
    ptiViewEnable( PTI_VIEW_COLLECTION_OVERHEAD );
}

We decided against SYCL and OpenCL, since we already have an adapter for OpenCL and would prefer a standardised SYCL adapter at some point.

During the buffer flush event, we receive information about the device, queue and context. This is enough to reconstruct our internal structure and write events.

There's one issue from our side, however: right now, we are not able to write a profile or trace successfully. This comes down to a single problem: there are no host events (with PTI-SDK alone)!

How Score-P handles accelerators in other adapters

I'm mostly working on development for our OpenMP adapter, including support for OpenMP offloading, but will try to explain it as best as I can.

Score-P includes several adapters for accelerator libraries, including ROCprofiler/ROCtracer, CUPTI and (in development) OpenMP offload. All those adapters follow a similar principle to PTI-SDK PoC. There is some kind of buffer where events are being stored. At some point, this buffer is flushed and we can write events to locations based on streams, contexts and so on.

For this, devices need to be known before we're writing the events. Especially OpenMP offload is tricky, since events arrive on threads not known by Score-P (essentially helper threads). Here, libraries diverge a bit, but offer the same idea in principle: Callbacks that are triggered on the host.

OpenMP offload takes the simplest approach. At some point a device will need to be initialized and we get a ompt_callback_device_initialize with all required information. For CUPTI, we register a callback via cuptiSubscribe, for ROCtracer we use roctracer_enable_op_callback. On callback calls, we try to find the context/stream and create our internal structures if it isn't found.

In the case of the PTI-SDK PoC, there is no such thing (yet). There are only events in a buffer related to the devices. All host events would need to be registered through the low-level Level0 interface, which seems counterintuitive.
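
For illustration, the kind of host-side registration we have in mind, modeled on cuptiSubscribe and roctracer_enable_op_callback, could look roughly like the following; every name and signature here is hypothetical and does not exist in the PTI-SDK PoC:

#include <cstdint>

// Hypothetical sketch only -- none of these names exist in the PTI-SDK PoC.
typedef int pti_result_t;  // placeholder result type for this sketch

// Callback invoked on the host thread that performs the API call.
typedef void (*pti_host_api_callback_t)(uint32_t api_id,
                                        const void* callback_data,
                                        void* user_data);

// A tool would subscribe once ...
pti_result_t ptiHostCallbackSubscribe(pti_host_api_callback_t cb, void* user_data);
// ... and then enable the domains it cares about (e.g. Level Zero API calls).
pti_result_t ptiHostCallbackEnableDomain(uint32_t domain);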

Questions

Will PTI-SDK handle any kind of host events, similar to CUPTI, rocTracer and other frameworks?

In the current state, tool developers would need to implement both parts of the Level0 interface and PTI-SDK to get a functional adapter, which is, to be honest, still easier than implementing everything with Level0 directly. If that's the plan going forward, there should be at least a short guide on how to implement things. The examples in this repository can be overwhelming to look at. The Tools Programming Guide here doesn't help either, especially since API Tracing, which would be the most interesting section for us, is being deprecated. The new (?) interface can instead be found hidden in the Level0 repository (see here).

How will those host events be delivered to the tool?

Looking at _pti_view_kind, I fear that we will receive host events the same way we get accelerator events: in a buffer at some point during program execution. Simply put, this will not work for our tool, since we require events for a location to be added in timestamp order. PTI-SDK would be the exception here, with all other APIs delivering the events on time.

Suggesting change in the name field in chrome tracing output

Hello, I was using unitrace to trace an AI application. Below is part of my output.

{"ph": "X", "tid": 4294950910, "pid": 4294950911, "name": "gen9_eltwise_bwd[SIMD32 {1568; 1; 1} {512; 1; 1}]", "cat": "gpu_op", "ts": 1703105438709308, "dur": 9, "args": {"id": "3418"}},
{"ph": "X", "tid": 4294950910, "pid": 4294950911, "name": "gen9_fused_reduce_init[SIMD32 {128; 1; 1} {1; 1; 1}]", "cat": "gpu_op", "ts": 1703105438709404, "dur": 3, "args": {"id": "3419"}},

There are many events with the same kernel name but different SIMD shapes. It would be good if the "name" field showed only the kernel name and the SIMD information were put in the "args" field, as follows:

{"ph": "X", "tid": 4294950910, "pid": 4294950911, "name": "gen9_eltwise_bwd", "cat": "gpu_op", "ts": 1703105438709308, "dur": 9, "args": {"id": "3418", "shape": "[SIMD32 {1568; 1; 1} {512; 1; 1}]"}},
{"ph": "X", "tid": 4294950910, "pid": 4294950911, "name": "gen9_fused_reduce_init", "cat": "gpu_op", "ts": 1703105438709404, "dur": 3, "args": {"id": "3419", "shape": "[SIMD32 {128; 1; 1} {1; 1; 1}]"}},

In this way, it is easy to combine information for all the events related to the same kernel.

[unitrace] adding my own events

I want to add my own application events in unitrace. I added itt directly to my application and link against the static library.

class itt_log {
public:
  enum class task_id {
    init,
    sentinel
  };

  itt_log() {
    domain_ = __itt_domain_create("DR");
    for (std::size_t i = 0; i < num_ids_; i++) {
      handles_[i] = __itt_string_handle_create(names_[i]);
    }
  }

  void begin(task_id id) {
    __itt_task_begin(domain_, __itt_null, __itt_null, handles_[std::size_t(id)]);
  }

  void end() {
    __itt_task_end(domain_);
  }

private:
  static constexpr std::size_t num_ids_ = std::size_t(task_id::sentinel);
  __itt_domain *domain_;
  __itt_string_handle *handles_[num_ids_];
  const char *names_[num_ids_] = {"DR_Init"};
};

My test program does:

  itt_log itt;
  itt.begin(itt_log::task_id::init);
  itt.end();

The program runs, but I do not see my DR events in the json file:

idc-beta-batch-pvc-node-16:mhp$ rm -f *.json && unitrace --chrome-kernel-logging ./mhp-quick-test && cat *.json
Enable CPU
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Itt
[ RUN      ] Itt.Basic
[       OK ] Itt.Basic (0 ms)
[----------] 1 test from Itt (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.

[INFO] Timeline is stored in mhp-quick-test.1734039.json
{ "traceEvents":[
{"ph": "X", "tid": 1734039, "pid": 1734039, "name": "zeCommandListCreateImmediate", "cat": "cpu_op", "ts": 1703435877153433, "dur": \1532, "id": 0},
{"ph": "M", "name": "process_name", "pid": 1734039, "ts": 1703435877135907, "args": {"name": "HOST<idc-beta-batch-pvc-node-16>"}}
]
}
idc-beta-batch-pvc-node-16:mhp$

Is there something more I have to do?

Need more details on "Chrome Device Stages"

ze_tracer

Chrome Device Stages mode provides an alternative view of the device queue where each kernel invocation is divided into stages: "appended", "submitted" and "execution".

What is the meaning of the "appended" stage? Does it mean the command groups are in the queue, or that they have been submitted to the compute command streamer but the streamer has not yet submitted them to the engine?
The same question applies to the "submitted" stage.

Thanks!

unitrace fails with: Assertion `zeEventQueryStatus(event) == ZE_RESULT_NOT_READY' failed.

Testing unitrace with ze_peak:

ze_peak: a760ae0b52497b73254b2df8b191aa3693a970c7 (pulled 2023.5.13)

unitrace --version
0.50.0 (b785677885e062c41405fb8d10812f7e7c0c58d9)

unitrace -h -d -v --chrome-call-logging --chrome-device-logging --chrome-kernel-logging ./ze_peak -t dp_compute

Device :

  • name : Intel(R) Data Center GPU Max 1550
  • vendorId : 8086
  • deviceId : 0bd6
  • subdeviceId : 0
  • isSubdevice : FALSE
  • UUID : 00000000-0000-0000-76d4-156c1ee6653b
  • coreClockRate : 1600

Double Precision Compute (GFLOPS)
double : 32644.5 GFLOPS
ze_peak: /nfs/pdx/home/roymoore/GIT/performance.platform.pti-builds/src/pti-gpu/tools/unitrace/src/levelzero/ze_event_cache.h:105: ze_event_handle_t ZeEventCache::GetEvent(ze_context_handle_t): Assertion `zeEventQueryStatus(event) == ZE_RESULT_NOT_READY' failed.
Aborted (core dumped)

System:

  • OS Name : openSUSE Leap

  • OS Version : 15.3

  • Kernel : 5.3.18-150300.59.98-default

  • KMD RPM : intel-i915-dkms-1.23.3.19.230125.14

  • KMD Version : backported to 5.3.18-150300.59.98 from (I915-23.3.6-23-g44490ef1939be1) using backports SLES15_SP3_23.3.19_PSB_230125.14

  • KMD Options : [ enable_hangcheck=N ]

  • Boot Options : [ splash=no net.ifnames=0 quiet linux crashkernel=512M,high crashkernel=256M,low mitigations=off initcall_blacklist=sync_debugfs_init,dma_buf_init,init_tis modprobe.blacklist=ast pcie_ports=native i915.enable_hangcheck=0 pci=pcie_bus_perf ]

  • UMD Version : neo/agama-devel-sp3/644-23.13.26032.30-644

  • dpcpp : 2023.2.0 (2023.x.0.20230514)

[zetracer] `zeCommandListAppendEventReset` bugs

Hi Anton,

@Kerilk and I are also developing an L0 tracer (https://github.com/argonne-lcf/THAPI). Recently we found that we don't handle the case where a user resets an event with zeCommandListAppendEventReset. It looks like your zetracer has the same limitation (see the reproducer below).

In our tool, supporting such a use case will be expensive with the current L0 spec. We have asked many times for L0 to add native callbacks (also on event changes). This would greatly reduce the implementation complexity and overhead of tracing.

So far, our feedback hasn't gotten a lot of traction. Maybe if two independent teams implementing tracing in two different code bases need callbacks, L0 will be more inclined to add them...

So the question is, do you think having callbacks will help onetrace?

Reproducer

ze.cpp

#include <fstream>
#include <iostream>
#include <level_zero/ze_api.h>
#include <limits>
#include <memory>

#define zeCall(myZeCall)                                                                                                                                                                               \
  do {                                                                                                                                                                                                 \
    if (myZeCall != ZE_RESULT_SUCCESS) {                                                                                                                                                               \
      std::cout << "Error at " << #myZeCall << ": " << __FUNCTION__ << ": " << std::dec << __LINE__ << "\n";                                                                                           \
      std::terminate();                                                                                                                                                                                \
    }                                                                                                                                                                                                  \
  } while (0);

void foo(ze_context_handle_t context, ze_device_handle_t device, ze_kernel_handle_t kernel1,ze_kernel_handle_t kernel2) {
  // Some magic number

  const int computeOrdinal = 0;

  ze_command_queue_desc_t cmdQueueDesc = {};
  cmdQueueDesc.mode = ZE_COMMAND_QUEUE_MODE_ASYNCHRONOUS;
  cmdQueueDesc.ordinal = computeOrdinal;
  cmdQueueDesc.index = 0;
  ze_command_queue_handle_t queue;
  zeCall(zeCommandQueueCreate(context, device, &cmdQueueDesc, &queue));

  ze_command_list_desc_t listDesc = {};
  listDesc.commandQueueGroupOrdinal = computeOrdinal;
  ze_command_list_handle_t list;
  zeCall(zeCommandListCreate(context, device, &listDesc, &list));

  ze_group_count_t threadGroupCount = {};
  threadGroupCount.groupCountX = 1u;
  threadGroupCount.groupCountY = 1u;
  threadGroupCount.groupCountZ = 1u;

  // Create event pool
  ze_event_pool_desc_t eventPoolDesc = {
      ZE_STRUCTURE_TYPE_EVENT_POOL_DESC, NULL, ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP,
      1 // One event on the pool
  };

  ze_event_pool_handle_t hEventPool;
  zeCall(zeEventPoolCreate(context, &eventPoolDesc, 1, &device, &hEventPool));

  ze_event_desc_t eventDesc = {
      ZE_STRUCTURE_TYPE_EVENT_DESC, NULL,
      0, // index
      0, // no memory/cache coherency required on signal
      0  // No need for memory/cache coherency on wait
  };
  ze_event_handle_t hEvent;
  zeCall(zeEventCreate(hEventPool, &eventDesc, &hEvent));

#ifdef K1
  std::cout<<"Sumiting K1" << std::endl;
  zeCall(zeCommandListAppendLaunchKernel(list, kernel1, &threadGroupCount, hEvent, 0, nullptr));
#endif
  zeCall(zeCommandListAppendBarrier(list, nullptr, 0, nullptr));
  zeCall(zeCommandListAppendEventReset(list, hEvent));
  zeCall(zeCommandListAppendBarrier(list, nullptr, 0, nullptr));
#ifdef K2
  std::cout<<"Sumiting K2" << std::endl;
  zeCall(zeCommandListAppendLaunchKernel(list, kernel2, &threadGroupCount, hEvent, 0, nullptr));
#endif
  zeCall(zeCommandListClose(list));

  zeCall(zeCommandQueueExecuteCommandLists(queue, 1, &list, nullptr));
  zeCall(zeCommandQueueSynchronize(queue, std::numeric_limits<uint64_t>::max()));

}

int main(int argc, char *argv[]) {
  zeCall(zeInit(ZE_INIT_FLAG_GPU_ONLY));

  uint32_t driverCount = 0;
  zeCall(zeDriverGet(&driverCount, nullptr));
  ze_driver_handle_t driverHandle;

  zeCall(zeDriverGet(&driverCount, &driverHandle));

  ze_context_handle_t context;
  ze_context_desc_t contextDesc = {};
  zeCall(zeContextCreate(driverHandle, &contextDesc, &context));

  // Get the root devices
  uint32_t deviceCount = 0;
  zeCall(zeDeviceGet(driverHandle, &deviceCount, nullptr));
  if (deviceCount == 0) {
    std::cout << "No devices found \n";
    std::terminate();
  }

  ze_device_handle_t device;
  deviceCount = 1;
  zeCall(zeDeviceGet(driverHandle, &deviceCount, &device));

  // Create kernel
  std::string kernelFile = "kernel_XE_HP_COREcore.spv";
  ze_module_format_t kernelFormat = ZE_MODULE_FORMAT_IL_SPIRV;

  std::ifstream file(kernelFile, std::ios_base::in | std::ios_base::binary);
  if (false == file.good()) {
    std::cout << kernelFile << " file not found\n";
    std::terminate();
  }

  uint32_t spirvSize = 0;
  file.seekg(0, file.end);
  spirvSize = static_cast<size_t>(file.tellg());
  file.seekg(0, file.beg);

  auto spirvModule = std::make_unique<char[]>(spirvSize);
  file.read(spirvModule.get(), spirvSize);

  ze_module_handle_t module;
  ze_module_desc_t moduleDesc = {};
  moduleDesc.format = kernelFormat;
  moduleDesc.pInputModule = reinterpret_cast<const uint8_t *>(spirvModule.get());
  moduleDesc.inputSize = spirvSize;
  zeCall(zeModuleCreate(context, device, &moduleDesc, &module, nullptr));

  ze_kernel_handle_t kernel1;
  ze_kernel_desc_t kernelDesc1 = {};
  kernelDesc1.pKernelName = "k1_noop";
  zeCall(zeKernelCreate(module, &kernelDesc1, &kernel1));
  zeCall(zeKernelSetGroupSize(kernel1, 256, 1, 1));

  ze_kernel_handle_t kernel2;
  ze_kernel_desc_t kernelDesc2 = {};
  kernelDesc2.pKernelName = "k2_sleep";
  zeCall(zeKernelCreate(module, &kernelDesc2, &kernel2));
  zeCall(zeKernelSetGroupSize(kernel2, 256, 1, 1));

  void *ptr1 = nullptr;
  ze_device_mem_alloc_desc_t deviceDesc1 = {};
  ze_host_mem_alloc_desc_t hostDesc1 = {};
  zeCall(zeMemAllocShared(context, &deviceDesc1, &hostDesc1, 64, 0, device, &ptr1));

  void *ptr2 = nullptr;
  ze_device_mem_alloc_desc_t deviceDesc2 = {};
  ze_host_mem_alloc_desc_t hostDesc2 = {};
  zeCall(zeMemAllocShared(context, &deviceDesc2, &hostDesc2, 64, 0, device, &ptr2));

  zeCall(zeKernelSetArgumentValue(kernel1, 0, 8, &ptr1));
  zeCall(zeKernelSetArgumentValue(kernel2, 0, 8, &ptr2));

  foo(context, device, kernel1, kernel2);
  return 0;
}

kernel.cl

#define MAD_4(x, y)     x = mad(y, x, y);   y = mad(x, y, x);   x = mad(y, x, y);   y = mad(x, y, x);
#define MAD_16(x, y)    MAD_4(x, y);        MAD_4(x, y);        MAD_4(x, y);        MAD_4(x, y);
#define MAD_64(x, y)    MAD_16(x, y);       MAD_16(x, y);       MAD_16(x, y);       MAD_16(x, y);


__kernel void k1_noop(__global double *ptr) {
    ptr[0] = 9;
}

__kernel void k2_sleep(__global double *ptr) {
    double x = (double)get_local_id(1);
    double y = (double)get_local_id(0);
    for(int i=0; i<1024*64; i++)
    {
        MAD_64(x, y);
    }
    ptr[0] = y;
}

Compile

ocloc compile -file kernel.cl -device $FOO
icpx -lze_loader ze.cpp -Wall -DK1 -o k1
icpx -lze_loader ze.cpp -Wall -DK2 -o k2
icpx -lze_loader ze.cpp -Wall -DK1 -DK2 -o k1k2

What should we expect?

We should expect k1 to show the kernel execution, but we don't see it.

onetrace ./k1

=== API Timing Results: ===

             Total Execution Time (ns):            186368143
    Total API Time for L0 backend (ns):            185654838

== L0 Backend: ==

                         Function,       Calls,           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
                   zeModuleCreate,           1,           181219427,     97.61,           181219427,           181219427,           181219427
              zeCommandListCreate,           1,             1629179,      0.88,             1629179,             1629179,             1629179
             zeCommandQueueCreate,           1,              845735,      0.46,              845735,              845735,              845735
                 zeMemAllocShared,           2,              830426,      0.45,              415213,              178284,              652142
zeCommandQueueExecuteCommandLists,           1,              558334,      0.30,              558334,              558334,              558334
        zeCommandQueueSynchronize,           1,              309534,      0.17,              309534,              309534,              309534
                zeEventPoolCreate,           1,              177962,      0.10,              177962,              177962,              177962
                    zeEventCreate,           1,               53095,      0.03,               53095,               53095,               53095
    zeCommandListAppendEventReset,           1,                7936,      0.00,                7936,                7936,                7936
                   zeKernelCreate,           2,                6768,      0.00,                3384,                 946,                5822
       zeCommandListAppendBarrier,           2,                5811,      0.00,                2905,                1573,                4238
         zeKernelSetArgumentValue,           2,                5096,      0.00,                2548,                1103,                3993
             zeKernelSetGroupSize,           2,                2299,      0.00,                1149,                 224,                2075
                  zeContextCreate,           1,                1710,      0.00,                1710,                1710,                1710
               zeCommandListClose,           1,                 675,      0.00,                 675,                 675,                 675
                      zeDeviceGet,           2,                 374,      0.00,                 187,                 132,                 242
                      zeDriverGet,           2,                 275,      0.00,                 137,                  49,                 226
                           zeInit,           1,                 202,      0.00,                 202,                 202,                 202


=== Device Timing Results: ===

                Total Execution Time (ns):            186368143
    Total Device Time for L0 backend (ns):                 3680

== L0 Backend: ==

                    Kernel,       Calls,           Time (ns),    Time (%),        Average (ns),            Min (ns),            Max (ns)
zeCommandListAppendBarrier,           2,                3680,      100.00,                1840,                1280,                2400

And if we run k1k2, we get a timing for each kernel, but they both correspond only to k2:

onetrace ./k1k2

=== API Timing Results: ===

             Total Execution Time (ns):            253710707
    Total API Time for L0 backend (ns):            252516062

== L0 Backend: ==

                         Function,       Calls,           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
                   zeModuleCreate,           1,           183044952,     72.49,           183044952,           183044952,           183044952
        zeCommandQueueSynchronize,           1,            59487507,     23.56,            59487507,            59487507,            59487507
zeCommandQueueExecuteCommandLists,           1,             6742682,      2.67,             6742682,             6742682,             6742682
              zeCommandListCreate,           1,             1637588,      0.65,             1637588,             1637588,             1637588
                 zeMemAllocShared,           2,              821432,      0.33,              410716,              291064,              530368
             zeCommandQueueCreate,           1,              670053,      0.27,              670053,              670053,              670053
                    zeEventCreate,           1,               51961,      0.02,               51961,               51961,               51961
  zeCommandListAppendLaunchKernel,           2,               16393,      0.01,                8196,                3280,               13113
                zeEventPoolCreate,           1,               13748,      0.01,               13748,               13748,               13748
    zeCommandListAppendEventReset,           1,                7173,      0.00,                7173,                7173,                7173
                   zeKernelCreate,           2,                6697,      0.00,                3348,                 948,                5749
         zeKernelSetArgumentValue,           2,                5205,      0.00,                2602,                1089,                4116
       zeCommandListAppendBarrier,           2,                4852,      0.00,                2426,                1297,                3555
             zeKernelSetGroupSize,           2,                2327,      0.00,                1163,                 252,                2075
                  zeContextCreate,           1,                2051,      0.00,                2051,                2051,                2051
               zeCommandListClose,           1,                 617,      0.00,                 617,                 617,                 617
                      zeDeviceGet,           2,                 325,      0.00,                 162,                 113,                 212
                      zeDriverGet,           2,                 294,      0.00,                 147,                  44,                 250
                           zeInit,           1,                 205,      0.00,                 205,                 205,                 205


=== Device Timing Results: ===

                Total Execution Time (ns):            253710707
    Total Device Time for L0 backend (ns):                 8640

== L0 Backend: ==

                    Kernel,       Calls,           Time (ns),    Time (%),        Average (ns),            Min (ns),            Max (ns)
                     sleep,           1,                3200,       37.04,                3200,                3200,                3200
                      noop,           1,                3200,       37.04,                3200,                3200,                3200
zeCommandListAppendBarrier,           2,                2240,       25.93,                1120,                1120,                1120

Hope this helps,
Don't hesitate if you have any feedback.

Onetrace formatting output options

Hello,

Would it be possible to add onetrace formatting options so that all subsections of output are separate files (and remain comma delimited)? This would ease automatic parsing of output. One could write a separate parser, but it would be nice to have built-in support for more robust output formatting.

Best regards,
Omar Ahmed

Error when tracing workloads which use clCreateCommandQueueWithProperties

  1. A simple application that creates a queue with the following line runs fine both without and with onetrace:
cl_command_queue queue = clCreateCommandQueue(context, deviceID, 0, &status);

However if the queue is created with:

cl_command_queue_properties properties[] = {CL_QUEUE_PROPERTIES, 0, CL_QUEUE_FAMILY_INTEL, 0, CL_QUEUE_INDEX_INTEL, 0, 0};
cl_command_queue queue = clCreateCommandQueueWithProperties(context, deviceID, properties, &status);

then the application runs fine without onetrace but with onetrace it fails with error -35.

  2. Similarly, when tracing https://github.com/openvinotoolkit/open_model_zoo/blob/master/demos/object_detection_demo/python/object_detection_demo.py, execution fails with:
RuntimeError: Error has occured for: Command queues builders
clCreateCommandQueueWithPropertiesINTEL error -30

[PTI-SDK] Buffer event timestamp conversion

Heya,

I've noticed that the repository recently added an initial draft of an SDK that profiling/tracing tools can use to more easily add support for Intel GPUs to their applications.

I installed the current version on my system (Ubuntu 22.04, Intel Core i7-1260P), which worked mostly fine, though I ran into some issues with xtpi because oneAPI is installed as a module on my system and wasn't found by CMake.

Skimming through the headers and available methods, the interface looks fine, though I would need to integrate it into a tool to check whether it fits my requirements. However, I noticed one thing already: right now, I don't see a way to convert the timestamps given by the PTI-SDK.


Timestamp conversion

As far as I can see, PTI-SDK uses nanosecond-resolution timers to collect its events. That's perfect, since some operations take a very small amount of time to complete. However, UNIX systems do not offer just a single timer, but several to choose from. This choice might be available to the user and will only change the timers used by the application itself, with PTI-SDK still delivering the same timestamps.

For pure calculations of the compute time of an action, this is fine. However, more detailed analysis of program executions might rely on comparing timestamps between host and device activities. Here, the current implementation of PTI-SDK will fail.
This is just one example; there are more reasons for timestamp conversion, for example related to output formats.

Other interfaces show similar issues. OpenMP, for example, does have a translate_time function in its specification. However, the implementation in ROCm 5.7.1 translates those timestamps to seconds, making them useless for meaningful analysis. CUDA also didn't have a native way to translate timestamps when using CUPTI until CUDA 11.6, where a direct callback was introduced and tools could register their timestamp function via cuptiActivityRegisterTimestampCallback.
For those interfaces, timestamp conversion had to be done manually, by acquiring timestamps at least twice during program execution and calculating a conversion rate.
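
To make the manual approach concrete, here is a minimal sketch of such a two-point conversion; the sampling of matching (host, device) timestamp pairs is assumed to exist and is not something the PTI-SDK PoC currently provides:

#include <cstdint>

// A pair of timestamps taken "at the same moment" in the two clock domains.
// How these pairs are obtained is the open question raised above.
struct ClockSample {
  uint64_t host_ns;    // host clock chosen by the tool, in ns
  uint64_t device_ns;  // clock domain used by the profiling records, in ns
};

// Map a record timestamp into the host clock domain with a linear fit
// (rate + offset) through two calibration samples.
inline uint64_t DeviceToHostNs(uint64_t device_ns,
                               const ClockSample& first,
                               const ClockSample& second) {
  const double rate =
      static_cast<double>(second.host_ns - first.host_ns) /
      static_cast<double>(second.device_ns - first.device_ns);
  const double offset =
      static_cast<double>(first.host_ns) - rate * static_cast<double>(first.device_ns);
  return static_cast<uint64_t>(rate * static_cast<double>(device_ns) + offset);
}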

For PTI-SDK, however, there are additional hindrances to this approach. Since we (seemingly) do not get events outside of buffer requests and buffer completions at this point, and PTI-SDK itself has no function to get the current timestamp (like cuptiGetTimestamp or get_device_time from OMPT), there is no real way to convert timestamps at all. I'm not familiar enough with Level0 to know whether there's a way to acquire timestamps there, but having a direct way through PTI-SDK would be preferred.


Proposal

There are two ways to solve this issue. Either add a function to get the current timestamp used inside PTI-SDK, for example via

uint64_t PTI_EXPORT  
pti[prefix]GetTimestamp()

or add the option to use tool-defined timestamps via a callback function, like CUPTI already does (see here)

oneprof crashes when using mpirun + workload that calls make

It is in a single node (localhost in the hostfile), and the command line looks like:
oneprof -i -p ~/oneprof_log/ -o ~/oneprof_log/oneprof.log mpirun -n 2 -ppn 2 -hostfile hostfile_mpich python -u pretrain_gpt.py ...

In the Python script pretrain_gpt.py, 'make' is called at https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/data/dataset_utils.py#L82; it is also copied here for your convenience.

def compile_helper():
    """Compile helper function ar runtime. Make sure this
    is invoked on a single process."""
    import os
    import subprocess
    path = os.path.abspath(os.path.dirname(__file__))
    ret = subprocess.run(['make', '-C', path])
    if ret.returncode != 0:
        print("Making C++ dataset helpers module failed, exiting.")
        import sys
        sys.exit(1)

and the command crashes even when 'make' does not invoke the compiler because the target (.so file) is newer than its dependencies.

And it runs successfully if I disable that line to not call make.

[PTI-SDK] Device / context-based buffers instead of thread-based buffers

Device / context-based buffers instead of thread-based buffers

While continuing to evaluate how we might use PTI-SDK to support Level Zero as an adapter in Score-P, I've run into the following issue:

Right now, PTI-SDK collects events for different kinds of activities on accelerators, which can be enabled through ptiViewSetCallbacks. At some point during program execution, the implemented buffer_request function will be called. If requested or when a buffer is full, the SDK may dispatch a callback for buffer evaluation. This is totally fine. However, I noticed a detail that significantly complicates the handling of programs using multiple threads to dispatch events.

To illustrate the issue, we can look at the following (very simple) OpenMP offload program:

int main(void)
{
    #pragma omp parallel num_threads( 2 )
    {
        unsigned long long int x = 0;
        for(int i = 0; i < 10; ++i) {
            #pragma omp target map(tofrom: x)
            {
                ++x;
            }
        }
    }
}

We have two threads working in parallel on a single accelerator. This works, and events are correctly captured by PTI-SDK. Now, let's look at how they are captured.

How PTI-SDK PoC currently captures events

Events can be generally found in view_handler.h. For simplicity, we focus on MemCopyEvent but others follow the same principle.

At the end of the event method, a call to Instance().InsertRecord(...) is made. This is a templated method with the following code:

template <typename T>
inline void InsertRecord(const T& view_record) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "One can only insert trivially copyable types into the "
                  "ViewBuffer (view records)");
    auto& buffer = view_buffers_[std::this_thread::get_id()];

    if (buffer.IsNull()) {
        RequestNewBuffer(buffer);
    }

    buffer.Insert(view_record);
    static_assert(SizeOfLargestViewRecord() != 0, "Largest record not avaiable on compile time");
    if (buffer.FreeBytes() >= SizeOfLargestViewRecord()) {
        // There's space to insert more records. No need for swap.
        return;
    }

    buffer_queue_.Push(std::move(buffer));
}

Note the way the buffer is determined: through the unique id of the thread writing the event, which in the parallel OpenMP region is the executing thread. Looking further at how the buffers are implemented, we end up here: using ViewBufferTable = ThreadSafeHashTable<KeyT, ViewBuffer>;.
This means that events are stored in per-thread buffers and accessed through a hash table with the thread id as the key.
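
To illustrate the alternative the title of this issue suggests, the lookup key could instead be derived from the device and queue a record refers to, so that all records of one queue end up in the same buffer. This is only a rough sketch with simplified stand-in types, not the SDK's actual internals:

#include <utility>

// Simplified stand-in for a view record carrying its queue/device handles.
struct ViewRecord {
  void* queue_handle;
  void* device_handle;
  // ... timestamps, ids, etc.
};

// Hypothetical key: group buffers per (device, queue) instead of per
// writing thread. A real implementation would need a hash function for
// this pair type in the hash table.
using BufferKey = std::pair<void*, void*>;  // {device_handle, queue_handle}

inline BufferKey MakeBufferKey(const ViewRecord& r) {
  return {r.device_handle, r.queue_handle};
}

// The lookup in InsertRecord would then read roughly
//   auto& buffer = view_buffers_[MakeBufferKey(view_record)];
// instead of
//   auto& buffer = view_buffers_[std::this_thread::get_id()];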

What the current implementation does

Regardless of the devices, contexts, and command queues used by a thread, events are stored on a per-thread basis. This can cause issues if tools require events to be delivered in a certain way. In Score-P, for example, we require our locations (where we store our events) to write events in timestamp order. With PTI-SDK, however, this is quite difficult. Let's look at the output of the example above, as printed by a simple consumer of the interface:

Click to open
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061487291 ns
Ze Kernel End Time: 1704727757061490207 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 15
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061632913 ns
Ze Kernel End Time: 1704727757061635829 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 16
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061857128 ns
Ze Kernel End Time: 1704727757061859523 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 18
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061907886 ns
Ze Kernel End Time: 1704727757061910281 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 20
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061985554 ns
Ze Kernel End Time: 1704727757061987949 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 22
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062035503 ns
Ze Kernel End Time: 1704727757062038003 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 24
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062084320 ns
Ze Kernel End Time: 1704727757062086715 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 26
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062133252 ns
Ze Kernel End Time: 1704727757062135647 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 28
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062183094 ns
Ze Kernel End Time: 1704727757062185489 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 30
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062232119 ns
Ze Kernel End Time: 1704727757062234514 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 32
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062280055 ns
Ze Kernel End Time: 1704727757062282555 ns
Kernel Queue Handle: 0x7f5438017ae0
Kernel Device Handle: 0x2324710
Kernel Id : 33
Kernel Thread Id : 670104
--------------------------------------------------------------------------------
Reached End of buffer
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(D2M)
Memory Op Start Time: 1704727757057692365 ns
Memory Op End Time: 1704727757057696219 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 1
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(D2M)
Memory Op Start Time: 1704727757059441071 ns
Memory Op End Time: 1704727757059443883 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 2
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(D2M)
Memory Op Start Time: 1704727757059629202 ns
Memory Op End Time: 1704727757059632952 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 3
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(D2M)
Memory Op Start Time: 1704727757059699457 ns
Memory Op End Time: 1704727757059702790 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 4
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(D2M)
Memory Op Start Time: 1704727757059772795 ns
Memory Op End Time: 1704727757059776232 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 5
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757059906260 ns
Memory Op End Time: 1704727757059910114 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 6
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757060523755 ns
Memory Op End Time: 1704727757060527088 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 7
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757060623218 ns
Memory Op End Time: 1704727757060626447 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 8
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757060690014 ns
Memory Op End Time: 1704727757060693347 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 9
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757060761374 ns
Memory Op End Time: 1704727757060765124 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 10
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757060830073 ns
Memory Op End Time: 1704727757060833406 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 11
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757060907619 ns
Memory Op End Time: 1704727757060911264 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 12
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Memory Record
Memory Op: zeCommandListAppendMemoryCopy(M2D)
Memory Op Start Time: 1704727757061004800 ns
Memory Op End Time: 1704727757061008445 ns
Memory Op Queue Handle: 0x30e63c0
Memory Op Device Handle: 0x2324710
Memory Op CommandList Context Handle: 0x2342b50
Memory Op Id : 13
Memory Op Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061145847 ns
Ze Kernel End Time: 1704727757061148763 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 14
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061840664 ns
Ze Kernel End Time: 1704727757061842955 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 17
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061897810 ns
Ze Kernel End Time: 1704727757061900205 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 19
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061948646 ns
Ze Kernel End Time: 1704727757061951041 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 21
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757061996369 ns
Ze Kernel End Time: 1704727757061998660 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 23
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062046902 ns
Ze Kernel End Time: 1704727757062049297 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 25
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062095791 ns
Ze Kernel End Time: 1704727757062098186 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 27
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062144432 ns
Ze Kernel End Time: 1704727757062146723 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 29
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Found Kernel Record
Ze Kernel Start Time: 1704727757062196928 ns
Ze Kernel End Time: 1704727757062199323 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 31
Kernel Thread Id : 670096
--------------------------------------------------------------------------------
Reached End of buffer

The output is pretty large, but it shows something odd. The following entry can be found in the buffer for Kernel Thread Id = 670104, even though the event belongs to a different thread (670096):

Found Kernel Record
Ze Kernel Start Time: 1704727757061632913 ns
Ze Kernel End Time: 1704727757061635829 ns
Kernel Queue Handle: 0x30e63c0
Kernel Device Handle: 0x2324710
Kernel Id : 16
Kernel Thread Id : 670096

If we evaluate the first buffer and then the second one, we end up with timestamp errors from Score-P, since 1704727757057692365 (first event of the second buffer) < 1704727757061632913 (misplaced event in the first buffer).
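
As a tool-side workaround, one can postpone evaluation, gather the records from all delivered buffers, and sort them by start timestamp before handing them to the locations. A minimal sketch with a simplified record type (not the actual PTI view record layout):

#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified stand-in for a kernel view record.
struct KernelEvent {
  uint64_t start_ns;
  uint64_t end_ns;
  uint64_t thread_id;
};

// Collect events from every completed buffer into one vector, then sort
// once all buffers have been flushed (e.g. at program end).
void SortByStartTime(std::vector<KernelEvent>& events) {
  std::sort(events.begin(), events.end(),
            [](const KernelEvent& a, const KernelEvent& b) {
              return a.start_ns < b.start_ns;
            });
}

The obvious downside is that the tool has to keep all records in memory until the very end of the run, which is exactly the concern described below.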

The issue

From my understanding, each thread will execute events on a separate command queue if possible. My question here is: is it possible that command queues are used by multiple threads at the same time?
In general, I am a bit skeptical about using thread ids as the key. If a buffer is not completely filled but contains events for a context, device, or command queue and is only flushed at the end of the program, performance tools need to keep all events that happen during program execution in memory; otherwise an event might be missed or cause other issues.

For the behavior shown above, some events seem to be stored incorrectly, as I would not expect to see a record from another thread in that buffer.

Side note

It seems like this isn't the only issue with multiple threads. When running the program multiple times, I've also run into the following error:

a.out: /opt/apps/sources/PTI-SDK/9ee0e46cafa145856eaeeefe5f26ec046462300f/sdk/src/levelzero/ze_collector.h:1446: void ZeCollector::GetHostTime(const ZeKernelCommand *, const ze_kernel_timestamp_result_t &, uint64_t &, uint64_t &): Assertion `host_start > command->submit_time' failed.
[1]    669066 IOT instruction  ./a.out

Reproducer

You can use the following code to reproduce the issue:
pti_sdk_openmp_world.zip

To run the example, use the following command:

$ source ~/Env/oneAPI.sh 
$ icpx main.cpp -fiopenmp -fopenmp-targets=spir64 -lpti -lpti_view
$ ./a.out

Environment

  • OS: Ubuntu 22.04 LTS
  • Compiler: Intel oneAPI 2024.0 (Base Toolkit + HPC Toolkit)
  • CPU / GPU: Intel i7-1260P with Integrated Graphics
  • Memory: 16GiB
  • PTI-SDK: 9ee0e46
  • Level Zero:
    • level-zero-dev 1.14.0-744~22.04
    • intel-level-zero-gpu 1.3.27191.42-775~22.04

Feature requests for improving reporting mechanism

Hi, I'm planning to use onetrace as part of an automation framework to collect profiling information about GPU-offloaded scientific applications. Since onetrace is a lightweight command-line tool, it is quite handy. However, I found the following missing features which, if added, would greatly enhance its usability from a user perspective.

  1. Currently, onetrace expects an executable binary as input and fails if given a shell/run script. It would be nice to support using onetrace with run scripts. This is a common usage pattern, since most apps have a shell script that sets the relevant environment variables before launching the binary, and with the current limitation it is cumbersome to integrate onetrace into an automation framework.
  2. For device time, separate out kernel execution time and memory transfer time.
  3. We currently have several offload mechanisms - OpenMP offload, OpenCL, DPC++. Have the kernel section broken out by API type.
  4. Have a unique header tag in the log file (something like: onetrace version X.Y) to identify the log file as produced by onetrace, for easier post-processing.
  5. For (2) and (3), the output produced by nsys, the equivalent tool on NVIDIA, can serve as a reference. I'm attaching a sample log produced by nsys for one of our apps running on NVIDIA GPUs. Ideally, it would be great if onetrace could provide the stats reported by nsys (esp. device transfer costs/bandwidth stats etc.).
Using report7.sqlite for SQL queries.
Running [/opt/hpc_software/sdk/nvidia/hpc_sdk/Linux_x86_64/22.2/profilers/Nsight_Systems/target-linux-x64/reports/cudaapisum.py report7.sqlite]...

 Time (%)  Total Time (ns)  Num Calls    Avg (ns)     Med (ns)    Min (ns)    Max (ns)    StdDev (ns)           Name
 --------  ---------------  ---------  ------------  -----------  ---------  -----------  ------------  ---------------------
     49.7      647,951,615        401   1,615,839.4  1,780,889.0  1,291,574    1,941,593     313,742.7  cudaDeviceSynchronize
     37.4      488,480,350        103   4,742,527.7  1,471,249.0  1,459,420  140,348,963  19,251,795.7  cudaMemcpy
     12.6      164,669,246          4  41,167,311.5    666,255.0     63,206  163,273,530  81,404,642.1  cudaMalloc
      0.2        2,064,936          4     516,234.0    628,231.5     46,267      762,206     320,452.5  cudaFree
      0.1        1,608,923        501       3,211.4      2,987.0      2,721       30,239       1,554.2  cudaLaunchKernel

Running [/opt/hpc_software/sdk/nvidia/hpc_sdk/Linux_x86_64/22.2/profilers/Nsight_Systems/target-linux-x64/reports/gpusum.py report7.sqlite]...

 Time (%)  Total Time (ns)  Instances   Avg (ns)     Med (ns)    Min (ns)    Max (ns)    StdDev (ns)    Category                            Operation
 --------  ---------------  ---------  -----------  -----------  ---------  -----------  ------------  -----------  ----------------------------------------------------------
     30.1      340,984,091        103  3,310,525.2      1,696.0      1,663  140,173,958  19,465,283.5  MEMORY_OPER  [CUDA memcpy DtoH]
     17.0      192,554,900        100  1,925,549.0  1,925,416.5  1,921,656    1,929,816       1,717.0  CUDA_KERNEL  void add_kernel<double>(const T1 *, const T1 *, T1 *)
     17.0      192,464,685        100  1,924,646.9  1,924,280.5  1,921,080    1,928,312       1,778.7  CUDA_KERNEL  void triad_kernel<double>(T1 *, const T1 *, const T1 *)
     12.9      145,723,283        100  1,457,232.8  1,457,242.0  1,446,138    1,471,835       5,201.3  CUDA_KERNEL  void dot_kernel<double>(const T1 *, const T1 *, T1 *, int)
     11.5      130,498,379        100  1,304,983.8  1,305,034.5  1,301,627    1,309,051       1,463.7  CUDA_KERNEL  void mul_kernel<double>(T1 *, const T1 *)
     11.4      129,086,388        100  1,290,863.9  1,290,826.0  1,287,547    1,303,259       1,817.5  CUDA_KERNEL  void copy_kernel<double>(const T1 *, T1 *)
      0.2        1,777,689          1  1,777,689.0  1,777,689.0  1,777,689    1,777,689           0.0  CUDA_KERNEL  void init_kernel<double>(T1 *, T1 *, T1 *, T1, T1, T1)

Running [/opt/hpc_software/sdk/nvidia/hpc_sdk/Linux_x86_64/22.2/profilers/Nsight_Systems/target-linux-x64/reports/gpumemsizesum.py report7.sqlite]...

 Total (MB)  Count  Avg (MB)  Med (MB)  Min (MB)  Max (MB)   StdDev (MB)      Operation
 ----------  -----  --------  --------  --------  ---------  -----------  ------------------
  3,221.430    103    31.276     0.002     0.002  1,073.742      181.443  [CUDA memcpy DtoH]

Running [/opt/hpc_software/sdk/nvidia/hpc_sdk/Linux_x86_64/22.2/profilers/Nsight_Systems/target-linux-x64/reports/gpumemtimesum.py report7.sqlite]...

 Time (%)  Total Time (ns)  Count   Avg (ns)    Med (ns)  Min (ns)   Max (ns)    StdDev (ns)       Operation
 --------  ---------------  -----  -----------  --------  --------  -----------  ------------  ------------------
    100.0      340,984,091    103  3,310,525.2   1,696.0     1,663  140,173,958  19,465,283.5  [CUDA memcpy DtoH]

Running [/opt/hpc_software/sdk/nvidia/hpc_sdk/Linux_x86_64/22.2/profilers/Nsight_Systems/target-linux-x64/reports/openaccsum.py report7.sqlite]... SKIPPED: report7.sqlite does not contain OpenACC event data.

Running [/opt/hpc_software/sdk/nvidia/hpc_sdk/Linux_x86_64/22.2/profilers/Nsight_Systems/target-linux-x64/reports/openmpevtsum.py report7.sqlite]... SKIPPED: report7.sqlite does not contain OpenMP event data.

[feature request]Any way to know memcpy is D2H, H2D or D2D?

When using ze_tracer with the --chrome-device-stages flag, we get commands named "zeCommandListAppendMemoryCopy", but we cannot figure out whether each one is a D2H, H2D, or D2D memcpy.

Can ze_tracer distinguish these three kinds of memcpy? Thanks!
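
For context, Level Zero itself can tell a tracer on which side a pointer lives, so the direction could in principle be classified from the source and destination pointers of zeCommandListAppendMemoryCopy. A rough sketch, assuming the tracer has the command list's context handle available (this is not how ze_tracer currently labels copies):

#include <level_zero/ze_api.h>

// Query the USM memory type of one pointer.
static ze_memory_type_t MemoryTypeOf(ze_context_handle_t context,
                                     const void* ptr) {
  ze_memory_allocation_properties_t props = {};
  props.stype = ZE_STRUCTURE_TYPE_MEMORY_ALLOCATION_PROPERTIES;
  ze_device_handle_t device = nullptr;
  if (zeMemGetAllocProperties(context, ptr, &props, &device) !=
      ZE_RESULT_SUCCESS) {
    return ZE_MEMORY_TYPE_UNKNOWN;
  }
  // For non-USM pointers (e.g. plain malloc'ed host memory) the reported
  // type is UNKNOWN, which is treated as host-side below.
  return props.type;
}

// Derive a simple D2H / H2D / D2D / H2H label for a memory copy
// (shared allocations are deliberately not handled separately here).
static const char* CopyDirection(ze_context_handle_t context,
                                 const void* dst, const void* src) {
  const bool src_dev = (MemoryTypeOf(context, src) == ZE_MEMORY_TYPE_DEVICE);
  const bool dst_dev = (MemoryTypeOf(context, dst) == ZE_MEMORY_TYPE_DEVICE);
  if (src_dev && dst_dev) return "D2D";
  if (src_dev) return "D2H";
  if (dst_dev) return "H2D";
  return "H2H";
}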

Unable to find target metric group: ComputeBasic

Trying to run oneprof on the CloverLeaf application on PVC, using this command:

oneprof -p opout -k mpirun -n 1 ..clover_leaf

I get this error message:

[WARNING] Unable to find target metric group: ComputeBasic
[WARNING] Unable to create metric collector

Can't build gpuinfo

I am trying to build the gpuinfo tool on an ORTCE machine. I am using these modules:
Currently Loaded Modulefiles:

  1) intel/oneapi/2022.0.1  2) intel-comp-rt/agama-ci-prerelease/475(default)  3) intel/pti-gpu-nda/2021-12-03

I get this compile error:
In file included from /nfs/site/home/jbberry/pti-gpu/tools/gpuinfo/main.cc:12:
/nfs/site/home/jbberry/pti-gpu/tools/gpuinfo/../../utils/metric_device.h: In static member function ‘static uint32_t MetricDevice::GetDeviceCount()’:
/nfs/site/home/jbberry/pti-gpu/tools/gpuinfo/../../utils/metric_device.h:32:51: error: invalid conversion from ‘MetricsDiscovery::IAdapterGroup_1_9**’ to ‘MetricsDiscovery::IAdapterGroupLatest**’ {aka ‘MetricsDiscovery::IAdapterGroup_1_13**’} [-fpermissive]
32 | md::TCompletionCode status = OpenAdapterGroup(&adapter_group);
| ^~~~~~~~~~~~~~
| |
| MetricsDiscovery::IAdapterGroup_1_9**

oneprof failed when log filepath includes "."

Issue Description:
When setting an output file for oneprof whose path contains a ".", e.g. "dir.test/test.log", oneprof does not work.

/oneprof -q -o out.txt/test.log ./ze_gemm
ze_gemm: /github/pti-gpu/tools/oneprof/../utils/logger.h:22: Logger::Logger(const string&): Assertion `file_.is_open()' failed.

I debugged the code and found that oneprof tries to rename the log file, but goes wrong when the file path includes a "." that is not part of the file extension.

size_t pos = log_file_.find_first_of('.');
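
A possible fix, sketched here under the assumption that the intent is to split off the file extension: look for the last dot after the last path separator, so dots in directory names are ignored (the helper name and its exact integration into logger.h are illustrative only):

#include <string>

// Insert a suffix before the extension of a log file path, treating only a
// '.' after the last '/' as the start of the extension.
// Example: "dir.test/test.log" + "_0" -> "dir.test/test_0.log"
static std::string WithSuffixBeforeExtension(const std::string& log_file,
                                             const std::string& suffix) {
  const size_t last_sep = log_file.find_last_of('/');
  const size_t ext_pos = log_file.find_last_of('.');
  if (ext_pos == std::string::npos ||
      (last_sep != std::string::npos && ext_pos < last_sep)) {
    return log_file + suffix;  // no extension: just append the suffix
  }
  return log_file.substr(0, ext_pos) + suffix + log_file.substr(ext_pos);
}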

[onetrace] No SYCL kernel, using --demangle or other options yield no output

When there are no SYCL kernels, using the -d or --demangle option produces no profiling info.

#include <sycl/sycl.hpp>

int main(int argc, char **argv)
{
  std::vector<sycl::device> allDevices = sycl::device::get_devices(sycl::info::device_type::gpu);
  auto subDevices = allDevices[0].create_sub_devices<sycl::info::partition_property::partition_by_affinity_domain>(sycl::info::partition_affinity_domain::numa);
  sycl::device *targetDev = new sycl::device(subDevices[0]);
  sycl::context *targetContext = new sycl::context(*targetDev);

  const int nQueues = 2;
  const int n = targetDev->get_info<sycl::info::device::max_mem_alloc_size>() / (2*sizeof(float));
  std::cout << "n : " << n << ", (GBs) : " << n*sizeof(float) * 1.0e-09 << std::endl;
  
  // create events and queues
  sycl::queue *queue[nQueues];
  for (int i = 0; i < nQueues; ++i) {
    queue[i] = new sycl::queue(*targetContext, *targetDev, sycl::property_list{sycl::property::queue::in_order{}});
  }

  // allocate host memory and device memory
  float *h_a   = new float[n];
  float *d_a = sycl::malloc_device<float>(n, *targetDev, *targetContext);

  queue[0]->memcpy(d_a, h_a, n*sizeof(float));

  return 0;
}

ompt_callback_target_data_op_t strange behavior

This is not really a bug in PTI (because it produces correct results); it is probably more a bug in the icpx implementation.
You seem to be the point of contact for this kind of bug. Sorry :)

Official headers for OpenMP tools say

typedef void (*ompt_callback_target_data_op_t)(
    ompt_id_t target_id, ompt_id_t host_op_id, ompt_target_data_op_t optype,
    void *src_addr, int src_device_num, void *dest_addr, int dest_device_num,
    size_t bytes, const void *codeptr_ra);

https://github.com/OpenMP/sources/blob/eb82823fda8b3ead31da8a517946d9219b41f440/include/omp-tools.h#L692-L695

In PTI, however, you are prepending an ompt_scope_endpoint_t endpoint to the argument list:

static void TargetDataOp(
    ompt_scope_endpoint_t endpoint, ompt_id_t target_id,
    ompt_id_t host_op_id, ompt_target_data_op_t optype,
    void *src_addr, int src_device_num,
    void *dest_addr, int dest_device_num,
    size_t bytes, const void *codeptr_ra) {

Strangely enough, this non-OpenMP-5.2-compliant parameter list seems to be needed to get correct results with the public icpx/OpenMP runtime.
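
For reference, this is roughly how a tool registers this callback against the standard prototype from omp-tools.h; if the runtime actually invokes it with a leading endpoint argument, every parameter a standard-conforming callback receives is shifted by one position. A minimal sketch:

#include <omp-tools.h>
#include <cstdio>

// Callback with the OpenMP 5.2 signature quoted above.
static void OnTargetDataOp(ompt_id_t target_id, ompt_id_t host_op_id,
                           ompt_target_data_op_t optype, void* src_addr,
                           int src_device_num, void* dest_addr,
                           int dest_device_num, size_t bytes,
                           const void* codeptr_ra) {
  std::printf("target data op %d: %zu bytes\n",
              static_cast<int>(optype), bytes);
}

// Registration, typically done inside ompt_initialize(). If the runtime
// passes an extra ompt_scope_endpoint_t first (as PTI assumes for icpx),
// the arguments seen by OnTargetDataOp would be misaligned.
static void RegisterCallback(ompt_set_callback_t ompt_set_callback) {
  ompt_set_callback(ompt_callback_target_data_op,
                    reinterpret_cast<ompt_callback_t>(&OnTargetDataOp));
}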

trouble building with intel/llvm

I have the intel/llvm SYCL compiler set up, complete with its environment. It pulls down a Level Zero Loader, so I was hoping to use that to build ze_tracer rather than a fresh installation of the loader.

But despite putting my sycl_workspace/build/include/sycl on CPATH or CMAKE_REQUIRE_INCLUDES (and a few other options), the CMake setup for ze_tracer would never complete. It found the Level Zero loader library easily enough, but the FindL0Headers macro always failed.

I finally just made sure the required path was on CPATH and then commented out the body of FindL0Headers in the CMakeLists.txt file. It built fine after that and is running.

Request to support SYCL graph tracing

SYCL graph is an experimental feature released in oneAPI 2024.0, and it seems that neither unitrace nor onetrace supports tracing of SYCL graph kernels. In the following screenshot, the device activities show only 3 zeCommandListAppendBarrier calls.
[screenshot: sycl-graph-tracing]

I used the following command to trace the binary:
unitrace --demangle --chrome-device-activities --chrom-kernel-activities ./sycl-graph-app
It would be great if unitrace could trace SYCL graphs either at graph granularity or at kernel granularity.

Old MD library name on Lin

PTI uses the old "libmd.so" library name for MDAPI tools instead of the actual "libigdmd.so". This leads to a lack of counters on modern HW.

file: utils/metric_utils.h:34

[oneprof] core dumped when using latest oneprof

With the latest code, I run into a core dump when using oneprof.

  • Command I use: oneprof --kernel-metrics -o memory.csv ./a.out
  • Issue:
a.out: /pti-gpu/tools/oneprof/finalizer.h:686: std::vector<std::vector<long unsigned int> > Finalizer::MakeCache(): Assertion `time_id < metric_list.size()' failed.
Aborted (core dumped)

oneprof -q fails with error "ZE_RESULT_SUCCESS' failed"

I am using oneprof on an HPC+AI application with a large number of kernels (~30). When I run:
oneprof -q -o test.txt $APP_EXE
It fails with error:
oneprof/metric_query_collector.h:307: void MetricQueryCollector::ProcessQuery(const ZeQueryInfo&): Assertion `status == ZE_RESULT_SUCCESS' failed

It generates the output files (result.*, data.* and test.txt), but test.txt contains only the application's total runtime and provides no information about the individual kernels.

I have tested it on one tile and on one GPU. The application does not use MPI; it is a Python-based code.

Unitrace build err, shows `CL/cl.h - not found`

I am trying to build the unitrace tool on PVC. On one PVC it succeeded, while on another it failed. I followed https://github.com/intel/pti-gpu/tree/master/tools/unitrace.
After running cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_MPI=0 .. in the terminal, it shows Looking for C++ include CL/cl.h - not found.

More details can be found here:

(wyt_tf215) mlp_tf@b4969184c85c:/usnfs/yitingw1/workspace/software/pti-gpu/tools/unitrace/build$ cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_MPI=0 ..
-- The C compiler identification is GNU 12.1.0
-- The CXX compiler identification is GNU 12.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/mlp_tf/miniconda3/envs/wyt_tf215/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/mlp_tf/miniconda3/envs/wyt_tf215/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build Type: Release
-- Bitness: 64 bits
-- Found Xptifw: /home/mlp_tf/intel/oneapi/compiler/2024.0/lib/libxptifw.so
-- OpenCL library is found at /home/mlp_tf/intel/oneapi/compiler/2024.0/lib/libOpenCL.so
CMake Warning (dev) at /usnfs/yitingw1/workspace/software/pti-gpu/build_utils/CMakeLists.txt:61 (target_link_libraries):
Policy CMP0023 is not set: Plain and keyword target_link_libraries
signatures cannot be mixed. Run "cmake --help-policy CMP0023" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.

The keyword signature for target_link_libraries has already been used with
the target "unitrace_tool". All uses of target_link_libraries with a
target should be either all-keyword or all-plain.

The uses of the keyword signature are here:

  • CMakeLists.txt:90 (target_link_libraries)

Call Stack (most recent call first):
CMakeLists.txt:108 (FindOpenCLLibrary)
This warning is for project developers. Use -Wno-dev to suppress it.

-- Looking for C++ include CL/cl.h
-- Looking for C++ include CL/cl.h - not found
-- Found Python: /home/mlp_tf/miniconda3/envs/wyt_tf215/bin/python3.10 (found version "3.10.13") found components: Interpreter
-- OpenCL headers are not found, will be downloaded automatically
-- /home/mlp_tf/miniconda3/envs/wyt_tf215/bin/python3.10 /usnfs/yitingw1/workspace/software/pti-gpu/build_utils/get_cl_headers.py
-- /usnfs/yitingw1/workspace/software/pti-gpu/tools/unitrace/build /usnfs/yitingw1/workspace/software/pti-gpu/tools/unitrace/build
CMake Error at /usnfs/yitingw1/workspace/software/pti-gpu/build_utils/CMakeLists.txt:688 (message):
Level Zero loader is not found. You may need to install oneAPI Level Zero
loader to fix this issue.
Call Stack (most recent call first):
CMakeLists.txt:113 (FindL0Library)

-- Configuring incomplete, errors occurred!
See also "/usnfs/yitingw1/workspace/software/pti-gpu/tools/unitrace/build/CMakeFiles/CMakeOutput.log".
See also "/usnfs/yitingw1/workspace/software/pti-gpu/tools/unitrace/build/CMakeFiles/CMakeError.log".

I have looked at 'CMakeError.log', which shows fatal error: CL/cl.h: No such file or directory.
I have tried sudo apt-get install opencl-headers, but it did not help.

Then I looked into unitrace/CMakeLists.txt and found that the error occurs at line 109, FindOpenCLHeaders(unitrace_tool), which is defined in macro(FindOpenCLHeaders TARGET) in build_utils/CMakeLists.txt.
The error occurs at line 77, where CHECK_INCLUDE_FILE_CXX(CL/cl.h OpenCL_INCLUDE_DIRS) fails to find CL/cl.h. It then enters the if-statement at line 89. It is supposed to download the OpenCL headers to ${CMAKE_BINARY_DIR} through the add_custom_command at line 100, but it fails to do so; I have checked ${CMAKE_BINARY_DIR}, and there is no OpenCL-Headers directory.
Even after I downloaded the OpenCL-Headers directory and generated the CL directory through python get_cl_headers.py <include_path> <build_path>, cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_MPI=0 .. still failed with the same error.

Can not generate .json event trace

Running the command shown below:
LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace --chrome-sycl-logging --chrome-dnn-logging --chrome-call-logging --chrome-kernel-logging --chrome-device-logging python test.py
This results in a segmentation fault:
[screenshot of the segmentation fault]
The generated JSON files contain nothing.

When running the command as:
LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace -d -s -t --chrome-kernel-logging --chrome-device-logging --chrome-no-thread-on-device --chrome-no-engine-on-device python test.py

the run aborts with an error:
[screenshot of the abort]
The generated JSON files contain some logging records.

Cannot build onetrace/oneprof on Windows

Hello,
I followed the instructions (installed the OpenCL ICD loader and set the library path to it):
(llm) D:\dev\taylor\pti-gpu\tools\onetrace\build>cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_LIBRARY_PATH=D:\dev\taylor\OpenCL-ICD-Loader\install ..

However, the build crashed as shown below:

[screenshot of the build error]

Am I missing something?
It would be great if I could get a solution for this. Thanks!

execute sysmon running process unknown

Hi all,

I am using this tool, but I have a problem: when I execute sysmon, it does not show the running processes; it just says "unknown".
How can I fix this problem?

[screenshot of sysmon output]

Thanks.
