Comments (6)
Hi @gdroy,
I see the same effect when the tool listens to the pid of the job transform. Looking at how MemoryMonitor works in the production workflow, I see we fire up a number of instances listening to the individual pids of runwrapper.X.sh in each step X, and the outputs are piped into individual files. This makes me think that maybe we need a more elaborate framework for testing in the production workflow. Maybe @graeme-a-stewart has an idea. I can think of a super hacky way (aliasing MemoryMonitor to prmon, as the interfaces are exactly the same). Although it's not ideal, it works :)
Best,
Serhan
from prmon.
Hi
Sim_tf.py is a single-step transform so it should work - athena will pop into existence as a child process of the transform and should be monitored. Even for the more complex multi-step transforms the various athenas will still be children of the guiding python process that is the transform, so they should always be found as children.
It is possible for athena to re-exec itself (to save memory) after initialisation, but the PID should stay the same. That seems to be confirmed by the fact that monitoring starts to work again after restarting prmon.
So this is weird and I have no obvious explanation for why it's happening...
I am not sure I can look before next week, I'm afraid. (Maybe a debug mode for prmon would be useful here.)
Cheers
Graeme
from prmon.
Yeah, it's confusing to me. If it was the spawning of the new processes I'd expect it to happen immediately, but I get 60+ seconds of the multiple athena processes running (I can see loads go up and CPU time jumping up) and then the pipe seems to fail. I admit I've got to the limit of my C++ / pipes knowledge as I'm not seeing how this can fail:
snprintf(smaps_buffer,64,"pstree -A -p %ld | tr \\- \\\\n",(long)mother_pid);
FILE* pipe = popen(smaps_buffer, "r");
if (pipe==0) {
  if (verbose)
    std::cerr << "MemoryMonitor: unable to open pstree pipe!" << std::endl;
  return 1;
}
mother_pid shouldn't have changed (especially in my test, as the top-level script is always there) and I can't see how pstree can fail... and then fail constantly (or why it would work after restarting prmon).
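One way to narrow down a failure like this, as a rough sketch only: popen returns a null pointer on failure and on Linux normally leaves the reason in errno, so printing strerror(errno) next to the existing message would show whether the cause is something like EMFILE ("Too many open files"). The helper names pstree_command and open_pstree_pipe below are hypothetical and are not part of prmon or MemoryMonitor; only the command string is taken from the snippet above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

// Hypothetical helper (not in prmon): build the same shell command that the
// snippet above formats into smaps_buffer.
std::string pstree_command(long mother_pid) {
  assert(mother_pid > 0);  // a pid to monitor is expected to be positive
  return "pstree -A -p " + std::to_string(mother_pid) + " | tr \\- \\\\n";
}

// Hypothetical helper: open the pstree pipe and, on failure, report errno so
// that causes such as EMFILE become visible in the log.
FILE* open_pstree_pipe(long mother_pid, bool verbose) {
  const std::string cmd = pstree_command(mother_pid);
  errno = 0;
  FILE* pipe = popen(cmd.c_str(), "r");
  if (pipe == nullptr && verbose) {
    std::cerr << "MemoryMonitor: unable to open pstree pipe: "
              << std::strerror(errno) << std::endl;
  }
  return pipe;  // caller must pclose() a non-null result
}
```

The caller would check the returned pointer exactly as before and call pclose on success; the only change in behaviour is the extra errno text in the diagnostic.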
from prmon.
Yes, it's very odd - especially as that code never changed from MemoryMonitor. I will try and take a look, but all day today I am in a workshop.
from prmon.
Okay, found the bug; it is only indirectly due to not being able to open the pstree pipe. What's actually happening is that we're running out of file descriptors. The real culprit is line 113, where we don't close file3, so we keep accumulating open files of /proc/pid/stat until we reach 1024 and then can't open the pstree pipe. The code should be:
snprintf(stat_buffer,64,"/proc/%llu/stat",(unsigned long long)*it);
FILE *file3 = fopen(stat_buffer,"r");
if(file3==0) {
  openFails.push_back(std::string(stat_buffer));
}
else {
  while(fgets(sbuffer,2048,file3)) {
    tsbuffer = strchr (sbuffer, ')');
    if(sscanf(tsbuffer + 2 , "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %80llu %80llu %80llu %80llu", &utime, &stime, &cutime, &cstime)) {
      valuesCPU[0] += utime;
      valuesCPU[1] += stime;
      valuesCPU[2] += cutime;
      valuesCPU[3] += cstime;
    }
  }
  fclose(file3);
}
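As an alternative to remembering the explicit fclose, the handle could be owned by a smart pointer so it is released on every exit path. The following is only a sketch of that idea, not the fix that went into prmon: FileCloser, FilePtr, read_cpu_ticks and count_open_fds are hypothetical names, and the /proc/self/fd count is just one Linux-specific way to check that descriptors are not accumulating.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <dirent.h>
#include <memory>
#include <string>

// Deleter that closes the FILE* when its owner goes out of scope.
struct FileCloser {
  void operator()(FILE* f) const { if (f) std::fclose(f); }
};
using FilePtr = std::unique_ptr<FILE, FileCloser>;

// Hypothetical helper: read utime and stime (in clock ticks) from a
// /proc/<pid>/stat style file. Returns false if it cannot be opened or parsed.
bool read_cpu_ticks(const std::string& stat_path,
                    unsigned long long& utime, unsigned long long& stime) {
  assert(!stat_path.empty());
  FilePtr file(std::fopen(stat_path.c_str(), "r"));
  if (!file) return false;  // nothing was opened, so nothing to close
  char line[2048];
  if (!std::fgets(line, sizeof line, file.get())) return false;
  // The command name sits in parentheses and may itself contain ')',
  // so continue parsing after the last ')'.
  const char* rest = std::strrchr(line, ')');
  if (!rest || rest[1] == '\0') return false;
  // Skipped fields: state, ppid, pgrp, session, tty_nr, tpgid,
  // flags, minflt, cminflt, majflt, cmajflt; then utime and stime.
  return std::sscanf(rest + 2,
                     "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %llu %llu",
                     &utime, &stime) == 2;
}  // file is closed here on every return path

// Hypothetical debugging aid: count this process's open descriptors by
// listing /proc/self/fd (Linux only). Returns -1 if that is unavailable.
int count_open_fds() {
  DIR* dir = opendir("/proc/self/fd");
  if (!dir) return -1;
  int n = 0;
  while (dirent* entry = readdir(dir)) {
    if (entry->d_name[0] != '.') ++n;  // skip "." and ".."
  }
  closedir(dir);
  return n;
}
```

Calling read_cpu_ticks("/proc/self/stat", u, s) in a loop well past 1024 iterations and comparing count_open_fds() before and after should show a constant count if no handles leak.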
from prmon.
Hi @gdroy,
Thanks! That's an oversight on my part, I'll put in the fix shortly.
Best,
Serhan
from prmon.