
Comments (6)

amete commented on June 18, 2024

Hi @gdroy,

I see the same effect when the tool listens to the pid of the job transform. Looking at how MemoryMonitor works in the production workflow, I see we fire up a number of instances, each listening to the pid of runwrapper.X.sh for its step X, with the outputs piped into individual files. This makes me think we may need a more elaborate framework for testing in the production workflow. Maybe @graeme-a-stewart has an idea. I can think of a super hacky way (aliasing MemoryMonitor to prmon, as the interfaces are exactly the same). It's not ideal, but it works :)

Best,
Serhan

from prmon.

graeme-a-stewart commented on June 18, 2024

Hi

Sim_tf.py is a single step transform so it should work - athena will pop into existence as a child process of the transform and should be monitored. Even for the more complex multi-step transforms the various athenas will still be children of the guiding python process that is the transform, so they should always be found as children.

It is possible for athena to re-exec itself (to save memory) after initialisation, but the PID should stay the same. That seems to be confirmed if restarting prmon makes the monitoring work again.

So this is weird and I have no obvious explanation for why it's happening...

I am not sure I can look before next week, I'm afraid. (Maybe a debug mode for prmon would be useful here.)

Cheers

Graeme


gdroy commented on June 18, 2024

Hi @graeme-a-stewart @amete,

Yeah, it's confusing to me. If it were the spawning of the new processes, I'd expect it to happen immediately, but I get 60+ seconds of the multiple athena processes running (I can see the load go up and the CPU time jump) and then the pipe seems to fail. I admit I've reached the limit of my C++ / pipes knowledge, as I'm not seeing how this can fail:

      snprintf(smaps_buffer,64,"pstree -A -p %ld | tr \\- \\\\n",(long)mother_pid);
      FILE* pipe = popen(smaps_buffer, "r");
      if (pipe==0) {
        if (verbose)
          std::cerr << "MemoryMonitor: unable to open pstree pipe!" << std::endl;
        return 1;
      }

mother_pid shouldn't have changed (especially in my test as the top level script is always there) and I can't see how pstree can fail... and then fail constantly (or why it would work after restarting prmon).
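
One way to narrow down a failure like this is to report why popen() returned null rather than only that it did. The following is a minimal sketch, not prmon code: run_command is a hypothetical helper, and it assumes a POSIX system where popen() sets errno on failure (for example EMFILE when the process has no free file descriptors).

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdio.h>   // popen/pclose are POSIX, declared in <stdio.h>
#include <string>

// Hypothetical helper (not part of prmon): run a shell command through
// popen() and collect its standard output. On failure, `error` receives
// the errno text so the caller can log the underlying cause.
bool run_command(const std::string& command, std::string& output,
                 std::string& error) {
  assert(!command.empty());
  output.clear();
  error.clear();

  errno = 0;
  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) {
    error = std::string("popen failed: ") + std::strerror(errno);
    return false;
  }

  // Read everything the command prints.
  char buffer[256];
  while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
    output += buffer;
  }

  // Always close the pipe, otherwise its descriptor leaks.
  if (pclose(pipe) == -1) {
    error = std::string("pclose failed: ") + std::strerror(errno);
    return false;
  }
  return true;
}
```

Calling run_command with the pstree command line and printing `error` when it returns false would show the errno text alongside the "unable to open pstree pipe" message.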


graeme-a-stewart commented on June 18, 2024

Yes, it's very odd, especially as that code never changed from MemoryMonitor. I will try to take a look, but I am in a workshop all day today.


gdroy commented on June 18, 2024

Okay, found the bug. It is only indirectly due to not being able to open the pstree pipe; what's actually happening is that we're running out of file descriptors. The real culprit is line 113, where we don't close file3, so we keep accumulating open handles to /proc/pid/stat until we reach 1024 and then can't open the pstree pipe. The code should be:

        snprintf(stat_buffer,64,"/proc/%llu/stat",(unsigned long long)*it);
        FILE *file3 = fopen(stat_buffer,"r");
        if(file3==0) {
          openFails.push_back(std::string(stat_buffer));
        }
        else {
          while(fgets(sbuffer,2048,file3)) {
            tsbuffer = strchr (sbuffer, ')');
            if(sscanf(tsbuffer + 2 , "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %80llu %80llu %80llu %80llu", &utime, &stime, &cutime, &cstime)) {
              valuesCPU[0] += utime;
              valuesCPU[1] += stime;
              valuesCPU[2] += cutime;
              valuesCPU[3] += cstime;
            }
          }
          // Close the stat file once it has been read; without this the
          // descriptor leaks on every call.
          fclose(file3);
        }
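
As a hedged illustration of the same fix, the sketch below ties the fclose() call to scope exit so that no return path can skip it, and counts open descriptors to check for leaks. The names FileCloser, FilePtr, count_open_fds and read_first_line are hypothetical and not from prmon, and the /proc/self/fd listing is Linux-specific.

```cpp
#include <cassert>
#include <cstdio>
#include <dirent.h>
#include <memory>
#include <string>

// Deleter so that std::unique_ptr calls fclose() when the handle goes out
// of scope. unique_ptr never invokes the deleter on a null pointer.
struct FileCloser {
  void operator()(FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<FILE, FileCloser>;

// Count this process's open descriptors by listing /proc/self/fd.
// The count includes the descriptor opened for the listing itself, so it
// is most useful for before/after comparisons. Returns -1 on failure.
int count_open_fds() {
  DIR* dir = opendir("/proc/self/fd");
  if (dir == nullptr) return -1;
  int count = 0;
  while (struct dirent* entry = readdir(dir)) {
    if (entry->d_name[0] != '.') ++count;  // skip "." and ".."
  }
  closedir(dir);
  return count;
}

// Read the first line of a file. The handle is closed on every return
// path because FilePtr owns it.
bool read_first_line(const std::string& path, std::string& line) {
  assert(!path.empty());
  FilePtr file(std::fopen(path.c_str(), "r"));
  if (!file) return false;
  char buffer[2048];
  if (std::fgets(buffer, sizeof(buffer), file.get()) == nullptr) return false;
  line = buffer;
  return true;
}
```

Calling read_first_line("/proc/self/stat", line) a few thousand times and comparing count_open_fds() before and after should show a stable count. With a bare fopen() and no fclose(), the count would instead climb until the per-process limit is reached.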


amete commented on June 18, 2024

Hi @gdroy,

Thanks! That's an oversight on my part, I'll put in the fix shortly.

Best,
Serhan

