Comments (6)
Are you certain the application is hanging? Can you check CPU activity from another console while the application is running? I ask because runtime instrumentation unfortunately tends to take a very long time: it parses not only your executable but every library linked to it. That is why I generally recommend binary rewrite if you don’t want to instrument the shared libraries linked to the executable. If you are unsure, it might help to run omnitrace-run with sampling enabled on an uninstrumented executable and see whether the backtraces show a lot of time being spent in the linked libraries.
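As a rough way to check CPU activity from a second console, here is a minimal Python sketch. It assumes a Linux system with /proc mounted; the function names `cpu_ticks` and `cpu_percent` are illustrative and not part of Omnitrace. It samples the user and system CPU time of a process twice and reports the usage in between.

```python
import os
import time


def cpu_ticks(pid: int) -> int:
    """Return user + system CPU time consumed by a process, in clock ticks."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name (field 2) is parenthesized and may contain spaces,
    # so only split the text that follows the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(pid: int, interval: float = 1.0) -> float:
    """Approximate CPU usage (percent of one core) over `interval` seconds."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = cpu_ticks(pid)
    time.sleep(interval)
    after = cpu_ticks(pid)
    return 100.0 * (after - before) / ticks_per_second / interval


if __name__ == "__main__":
    import sys

    # Usage: python cpu_check.py <pid of the profiled application>
    print(f"{cpu_percent(int(sys.argv[1])):.1f}%")
```

A value that stays near zero over several samples suggests the process is blocked rather than busy parsing or instrumenting.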
from omnitrace.
Thanks @jrmadsen for the prompt reply. I'll monitor the CPU activity to verify whether it is running or hanging, and I'll also use omnitrace-run.
I tried omnitrace-run on my binary and it kept running for over an hour, at which point I exited using Ctrl-C. The binary is a basic Triton kernel that executes in less than a couple of seconds with Triton and PyTorch. The build system I use (Buck) packages everything together and generates a 700 MB executable. Unfortunately, running ldd on the file reports that it is not a dynamic executable, so I can't see the linked libraries.
I also tried omnitrace-run --enable-categories rocprofiler -- ./rms_norm.par, but it didn't help. top shows CPU utilization at 0.0%.
❯ omnitrace-run --enable-categories rocprofiler -- ./rms_norm.par
OMNITRACE: HSA_TOOLS_LIB=/home/anupamb/omnitrace/lib/libomnitrace-dl.so.1.11.0
OMNITRACE: HSA_TOOLS_REPORT_LOAD_FAILURE=1
OMNITRACE: LD_PRELOAD=/home/anupamb/omnitrace/lib/libomnitrace-dl.so.1.11.0
OMNITRACE: OMNITRACE_ENABLE_CATEGORIES=rocprofiler
OMNITRACE: OMP_TOOL_LIBRARIES=/home/anupamb/omnitrace/lib/libomnitrace-dl.so.1.11.0
OMNITRACE: ROCP_HSA_INTERCEPT=1
OMNITRACE: ROCP_TOOL_LIB=/home/anupamb/omnitrace/lib/libomnitrace.so.1.11.0
[omnitrace][dl][1292192] omnitrace_main
[omnitrace][1292192][omnitrace_init_tooling] Instrumentation mode: Sampling
______ .___ ___. .__ __. __ .___________..______ ___ ______ _______
/ __ \ | \/ | | \ | | | | | || _ \ / \ / || ____|
| | | | | \ / | | \| | | | `---| |----`| |_) | / ^ \ | ,----'| |__
| | | | | |\/| | | . ` | | | | | | / / /_\ \ | | | __|
| `--' | | | | | | |\ | | | | | | |\ \----./ _____ \ | `----.| |____
\______/ |__| |__| |__| \__| |__| |__| | _| `._____/__/ \__\ \______||_______|
omnitrace v1.11.0 (rev: 77d52814e9050004cfb11d7917e155b00ab861b1, tag: v1.11.0, compiler: GNU v11.4.1, rocm: v6.0.x)
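On the ldd result: ldd reports "not a dynamic executable" both for statically linked ELF files and for files that are not ELF at all. The sketch below is one way to tell these cases apart. It assumes, without confirmation from this thread, that a packaged .par file might be a script header followed by a zip archive; `classify_executable` is a hypothetical helper name.

```python
import zipfile


def classify_executable(path: str) -> list:
    """Return a list of coarse format tags for the file at `path`."""
    with open(path, "rb") as f:
        head = f.read(4)
    kinds = []
    if head == b"\x7fELF":
        kinds.append("elf")      # a real ELF binary (static or dynamic)
    if head[:2] == b"#!":
        kinds.append("script")   # starts with an interpreter line
    if zipfile.is_zipfile(path):
        kinds.append("zip")      # zip archive, possibly with a prepended header
    return kinds


if __name__ == "__main__":
    import sys

    # Usage: python classify.py ./rms_norm.par
    print(classify_executable(sys.argv[1]))
```

If the file turns out to be an archive, the shared libraries are loaded at run time from wherever the archive unpacks, so they only become visible by inspecting the running process.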
I was not aware this was a PyTorch app. If your executable is 700 MB, I’m not surprised Dyninst takes forever to parse the binary. You’ve clearly got a deadlock: sampling doesn’t slow down an app that runs in a couple of seconds to more than a minute or two. Are you executing on multiple GPUs? PyTorch RPATHs its own ROCm libraries (or in your case, it might statically link or dlopen them), and this is not going to play nicely with Omnitrace loading a different ROCm runtime.
Honestly, I’d probably install the omnitrace that doesn’t have support for ROCm. Until we complete our work on a new roctracer/rocprofiler implementation that doesn’t link to the HIP/HSA runtimes, there’s very little that tools like Omnitrace can do for apps like PyTorch, which ship their own “hidden” ROCm distributions, because the result is multiple ROCm runtimes being loaded.
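To see whether a running process really has more than one copy of a ROCm runtime mapped, one option is to scan its memory map. This is a minimal sketch assuming Linux and /proc. The library names `libamdhip64` and `libhsa-runtime64` are the usual HIP and HSA runtime libraries, but the pattern and the helper names are illustrative, not an Omnitrace feature.

```python
import os
import re
from collections import defaultdict

# Library name prefixes treated as "ROCm runtime" for this illustration.
RUNTIME_PATTERN = re.compile(r"^(libamdhip64|libhsa-runtime64)\.so")


def parse_maps(maps_text: str) -> dict:
    """Map each runtime library name to the set of file paths it is loaded from."""
    found = defaultdict(set)
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (optional).
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no backing file
        path = parts[5].strip()
        match = RUNTIME_PATTERN.match(os.path.basename(path))
        if match:
            found[match.group(1)].add(path)
    return dict(found)


def duplicated_runtimes(pid: int) -> dict:
    """Return only the runtimes that are mapped from more than one path."""
    with open(f"/proc/{pid}/maps") as f:
        found = parse_maps(f.read())
    return {name: paths for name, paths in found.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python rocm_maps.py <pid of the running application>
    for name, paths in duplicated_runtimes(int(sys.argv[1])).items():
        print(name, sorted(paths))
```

Any library printed here is loaded from at least two different locations, which matches the conflict described above.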
I got omnitrace working with my Triton kernel on MI300. To get it working, I built PyTorch from source on MI300, installed triton-rocm, and then ran omnitrace on my kernel. It worked flawlessly. Kudos to you for building this high-quality software.
I will be diving deeper into it next week and will reach out if I have more questions, which I most likely will 😄. I love the fact that you dump Perfetto-compatible output.