Comments (9)
conda activate stemdl
Where does this conda env come from?
Do you know how the PyTorch execution model changes when multiple GPUs are used? Does it fork for each additional GPU? Because I’m seeing 3 fork calls, which suggests that might be the root cause of the issue.
from omnitrace.
My mistake, it should have been: `conda create -n stemdl`. Yes, it uses fork. Is there a workaround?
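One generic workaround worth noting (an editorial assumption, not something this thread or the omnitrace docs prescribe) is to switch the multiprocessing start method from fork to spawn. `torch.multiprocessing` mirrors the stdlib `multiprocessing` API, so a stdlib sketch shows the idea:

```python
import multiprocessing as mp

# On Linux the default start method is "fork", which duplicates the whole
# process, including the state of any background threads. "spawn" launches a
# fresh interpreter for each worker instead, sidestepping fork-related issues
# at the cost of slower worker startup.
mp.set_start_method("spawn", force=True)
print(mp.get_start_method())  # -> spawn
```

With PyTorch, the equivalent call is `torch.multiprocessing.set_start_method("spawn", force=True)`, made before any workers are created.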
fork has caused a number of problems in the past, mostly related to perfetto because of a background thread. You might want to try perfetto with the system backend. You will probably want to increase the flush and write periods to match the duration in the perfetto config file (see sample here), because of quirks in how perfetto writes that file and how omnitrace writes some perfetto data. Essentially, once perfetto flushes/writes data, you can’t add any time-stamped data that happened before that point, and a fair amount of the data gathered through sampling isn’t passed to perfetto until finalization, because we have to map instruction pointers to line info, and doing so while sampling adds too much overhead during runtime.
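A minimal perfetto trace-config sketch of that suggestion — the flush and file-write periods are raised to equal the trace duration so perfetto does not commit the file before omnitrace passes along its sampled data at finalization (the buffer size and duration values here are illustrative, not recommendations):

```
# Illustrative values; the point is duration_ms == flush_period_ms == file_write_period_ms.
buffers: {
    size_kb: 1048576
}
data_sources: {
    config {
        name: "track_event"
    }
}
duration_ms: 30000
flush_period_ms: 30000        # match duration_ms
file_write_period_ms: 30000   # match duration_ms
```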
Is there a command example when using omnitrace-python? I have tried without success:
export OMNITRACE_PERFETTO_BACKEND=system
omnitrace-perfetto-traced --background
omnitrace-perfetto --out ./omnitrace-perfetto.proto --txt -c ${OMNITRACE_ROOT}/rocm-5.4/share/omnitrace/omnitrace.cfg --background
omnitrace-python-3.8 -- ./stemdl_classification.py --config ./stemdlConfig.yaml
The option `--perfetto-backend=system` is not valid for omnitrace-python.
Update: I’ve tracked down the issue. It’s not related to perfetto, but rather to the `sys.argv` passed to omnitrace’s `__main__.py` upon re-entry after PyTorch forks. I should have a PR merged with the fix by tomorrow afternoon.
The only difference is that I am not using SLURM.
Ah yeah, I’m running this on Lockhart, and without using SLURM I end up with only 1 CPU available to me (e.g. `nproc` returns 1), whereas `srun nproc` returns 128. Given all the threads that are created, I figured that was desirable and maybe just an omission in the instructions. As it turns out, I assumed, incorrectly, that the execution model would be the same.
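The `nproc` discrepancy can also be checked from inside the Python process itself — a small sketch (Linux-only, since `os.sched_getaffinity` is not available on every platform):

```python
import os

# CPUs this process is actually allowed to run on (what `nproc` reflects
# under SLURM's affinity/cgroup limits), versus CPUs present in the machine.
usable = len(os.sched_getaffinity(0))
total = os.cpu_count()
print(f"usable={usable} total={total}")
```

Under a bare login shell on a SLURM machine, `usable` can be far smaller than `total`, which matches the `nproc` vs `srun nproc` observation above.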
It appears PyTorch will make even more forks when nproc < ngpu, and these forks appear not to retain the variable I stored in #291 to re-patch `sys.argv`. Storing it in an environment variable in #292 appears to do the trick.
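The environment-variable trick generalizes: unlike interpreter state, `os.environ` is inherited by child processes and so survives a fresh re-entry into `__main__`. A hedged sketch of the idea — the variable name and the simulated clobbering below are made up for illustration, not the actual #292 implementation:

```python
import os
import sys

SAVED_ARGV = "EXAMPLE_SAVED_ARGV"  # hypothetical name, not omnitrace's

def save_argv() -> None:
    # Environment variables are inherited by child processes, so the saved
    # value is still visible after a fork or a spawned re-entry. "\x1f" (the
    # ASCII unit separator) is used as a delimiter unlikely to occur in args.
    os.environ[SAVED_ARGV] = "\x1f".join(sys.argv)

def restore_argv() -> None:
    saved = os.environ.get(SAVED_ARGV)
    if saved is not None:
        sys.argv = saved.split("\x1f")

# Simulate the failure mode: argv is saved, clobbered on re-entry, restored.
sys.argv = ["stemdl_classification.py", "--config", "stemdlConfig.yaml"]
save_argv()
sys.argv = ["clobbered"]
restore_argv()
print(sys.argv)  # -> ['stemdl_classification.py', '--config', 'stemdlConfig.yaml']
```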
By the way, if you are also running on Lockhart, I'd highly recommend using srun. PyTorch may try to compensate by forking instead of creating threads, but from viewing `top` while that code was running, all 4 of the forked processes were sharing the same CPU (i.e. their CPU% was roughly ~25% each instead of ~100%, which is what you would see if they were running on separate CPUs).
Thanks, #292 fixed the issue.
Related Issues (20)
- Rename OMNITRACE_USE_PERFETTO to OMNITRACE_TRACE
- Rename OMNITRACE_USE_TIMEMORY to OMNITRACE_PROFILE
- Segmentation fault if no command specified
- Bad metric 'L2CacheHit', var 'TCC_HIT[0]' is not found when running `omnitrace-avail -G omnitrace.cfg --all`
- omnitrace needs dyninst-12.0.0 or higher
- Binary analysis cache
- Command line multi-value passing style is not clear from help
- Segmentation fault when using `omnitrace` for generating instrumented binary
- Allow using external elfutils
- Segfault when OMNITRACE_USE_ROCTX is true
- Problem with flow event
- GPU HW counter metrics broken in ROCm 5.4
- feature request - Energy profiling
- Feature request: Move GPU trace closer to HIP+CPU activity
- omnitrace user API
- Percentiles and other statistics besides mean, min, max for flat profiles
- OpenMP offloading
- `omnitrace-avail` fails on ROCM 5.3 and RX 6800XT
- Update Dyninst submodule