Comments (24)

lightsighter commented on July 17, 2024

There is blocking in your top-level task that is preventing the runtime from getting ahead. I suspect you haven't adjusted your mapper to correctly cope with the fixed frame code, but you could also be waiting on a future.

from legion.

syamajala commented on July 17, 2024

I did not see anything in #1680 about needing to update the mapper? I will talk to @elliottslaughter.

rohany commented on July 17, 2024

Are you sure it's blocking? It doesn't look like that to me (or at least S3D is pushing out a full iteration and then stopping). On 4 nodes, it takes 300ms for all operations in the trace to make it through the mapping stage of the pipeline, while on 2048 nodes it takes 3 seconds for that to happen. While it would be nice for the application to be farther ahead, that still seems like a problem.
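
For reference, a quick sketch of the arithmetic behind those two measurements. It only restates the numbers quoted above; the variable names are made up for illustration, and the "flat is ideal" reading assumes these are weak-scaling runs where per-node work stays constant.

```python
# Time for all operations in the trace to clear the mapping stage,
# as quoted above (milliseconds).
small_nodes, small_ms = 4, 300
large_nodes, large_ms = 2048, 3000

# How much the machine grew versus how much the mapping time grew.
node_growth = large_nodes // small_nodes  # 512x more nodes
time_growth = large_ms // small_ms        # 10x longer to map the trace

print(f"{node_growth}x nodes -> {time_growth}x mapping time")
```

So the mapping time grows far more slowly than the node count, but under weak scaling it would ideally not grow at all.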

elliottslaughter commented on July 17, 2024

Before Mike fixed #1680, we were running about 2× the requested number of frames in advance. Now that Mike has fixed that bug, we should probably double our min_frames_to_schedule and max_outstanding_frames values, since the runtime is now much more accurate about following what we ask for.

lightsighter commented on July 17, 2024

Are we sure that all the task launches in this program are index space task launches that span the whole machine? There are no individual task launches being done, right (unless they are for future operations)?

syamajala commented on July 17, 2024

Yes every single task launch in S3D should have either __demand(__index_launch) or __demand(__constant_time_launch) on it.

syamajala commented on July 17, 2024

I doubled the number of frames and it doesn't seem like it made much of a difference?

I only ran 2048 nodes: https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pwave_x_2048_ammonia/legion_prof/

lightsighter commented on July 17, 2024

That profile does not seem like it wants to load for me.

All the index launches span the entire machine?

syamajala commented on July 17, 2024

Do we have some way in Regent or the runtime to actually verify this?

elliottslaughter commented on July 17, 2024

Regent doesn't know anything about how big the machine is, and the static analysis is nontrivial.

Something like the LoggingWrapper would report the sizes (and mapping) of index launches. Note that there will be extreme performance degradation from running with it, so this is for debugging purposes only.

syamajala commented on July 17, 2024

Well, I think the first thing I want to check is that there are no single task launches and that every task is in fact being index space launched.
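
A rough way to start on that check is a text scan of the Regent sources. The sketch below is only a heuristic and rests on assumptions: it assumes launches are written as `for ... do` loops with the `__demand(...)` annotation on the same line or on the previous non-blank line, and the task names `foo` and `bar` in the sample are hypothetical. It flags loops without an annotation; it does not understand Regent, so it will also flag loops that launch no tasks, and it cannot see bare task calls outside a loop.

```python
import re

# The two annotations mentioned earlier in this thread.
DEMAND = re.compile(r"__demand\(\s*(__index_launch|__constant_time_launch)\s*\)")
# Any line that opens a Regent-style `for ... do` loop.
FOR_LOOP = re.compile(r"\bfor\b.*\bdo\b")


def unannotated_loops(source: str) -> list[int]:
    """Return 1-based line numbers of `for ... do` loops with no launch annotation."""
    flagged = []
    previous = ""  # last non-blank line seen before the current one
    for number, line in enumerate(source.splitlines(), start=1):
        annotated = DEMAND.search(line) or DEMAND.search(previous)
        if FOR_LOOP.search(line) and not annotated:
            flagged.append(number)
        if line.strip():
            previous = line
    return flagged


# Hypothetical Regent fragment: the first loop is annotated, the second is not.
sample = """\
__demand(__index_launch)
for i = 0, n do
  foo(p[i])
end

for j = 0, n do
  bar(q[j])
end
"""

print(unannotated_loops(sample))
```

Anything it flags still has to be read by hand, and something like the LoggingWrapper mentioned above remains the way to confirm what actually gets launched at run time.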

syamajala commented on July 17, 2024

I guess there is one:

https://gitlab.com/legion_s3d/legion_s3d/-/blob/subranks/rhst/s3d.rg?ref_type=heads#L1471

https://gitlab.com/legion_s3d/legion_s3d/-/blob/subranks/rhst/mpi_tasks.rg?ref_type=heads#L86-93

elliottslaughter commented on July 17, 2024

@lightsighter you can manually load the profile with:

legion_prof --attach http://sapling.stanford.edu/~seshu/s3d_ammonia/pwave_x_2048_ammonia/legion_prof/

Why are you asking about the index launches being across the entire machine?

Seshu's links above go to a task that is called once per timestep to fetch the timestep information. I think we have arranged this to not actually block on MPI 90% of the time. Therefore, the vast majority of these cases should give Legion plenty of time to do the reduce/broadcast on the futures.

I believe the index launches themselves should be across the entire machine.

syamajala commented on July 17, 2024

It seems to be the complete_frame call that is causing the main task to block.

Here is a profile on 1 node with it commented out:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/legion_prof.3/

lightsighter commented on July 17, 2024

Ok, but that only happens every N iterations. The profile looked like it was blocking multiple times each iteration so something else has to be blocking as well.

syamajala commented on July 17, 2024

That was the only thing I changed.

lightsighter commented on July 17, 2024

And what happens if you switch back to non-frame execution?

syamajala commented on July 17, 2024

Still waiting on the 8192 node run but have up to 4096 nodes.

No frames:
http://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/weak_scaling.html

No frames profile at 4 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/pwave_x_4_ammonia/legion_prof/

No frames profile at 2048 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/pwave_x_2048_ammonia/legion_prof/

With frames:
http://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/old/weak_scaling.html

With frames profile at 4 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/old/pwave_x_4_ammonia/legion_prof/

With frames profile at 2048 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/old/pwave_x_2048_ammonia/legion_prof/

lightsighter commented on July 17, 2024

I can't see the profiles. They're not loading. Are the permissions set correctly?

Is there a reason you ran them so large? I would expect to see the difference in waits even on a small number of nodes.

What happens if you grow the number of frames? Do you see the waits spread out?

syamajala commented on July 17, 2024

I am able to view the profiles. There are profiles for smaller node counts available in that directory as well in the pwave_x directories.

1 - 4096 nodes: http://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/

You can try using attach as well:
legion_prof --attach http://sapling.stanford.edu/~seshu/s3d_ammonia/pressure_wave/pwave_x_2048_ammonia/legion_prof/

The only reason I ran it so large is that we have hours to burn: the ALCC allocation expires at the end of June and we didn't use all of it.

I can try running again with frames and use more of them.

lightsighter commented on July 17, 2024

Something changed very dramatically in the four node runs with frames. The main task is not blocking at all in these runs. It is gone before we even start running anything as if we unrolled the whole main task. That doesn't appear to be happening in the old version. What did you set the mapper frame runahead to be?

lightsighter commented on July 17, 2024

Also, just looking at these profiles, the copies look like they are taking longer in the new runs than in the old ones.

syamajala commented on July 17, 2024

In the original run with frames, min_frames_to_schedule was 1 and max_outstanding_frames was 2. In this case 1 frame is 10 timesteps.

I did try min_frames_to_schedule = 2 and max_outstanding_frames = 4 at some point, where 1 frame is 10 timesteps, but it did not look any different to me.
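
To make those settings concrete, here is a small sketch of how far ahead each configuration lets the main task run, in timesteps. It assumes max_outstanding_frames is the bound on how many frames can be in flight, and it models the pre-#1680 behavior with the "about 2×" overshoot mentioned earlier in the thread. The helper name and the `overshoot` parameter are made up for illustration and are not part of any Legion API.

```python
TIMESTEPS_PER_FRAME = 10  # 1 frame is 10 timesteps in these runs


def timesteps_ahead(max_outstanding_frames: int, overshoot: int = 1) -> int:
    """Upper bound on timesteps in flight, assuming the frame limit is the only bound."""
    return max_outstanding_frames * overshoot * TIMESTEPS_PER_FRAME


before_fix = timesteps_ahead(2, overshoot=2)  # runtime ran about 2x what was requested
after_fix = timesteps_ahead(2)                # runtime now follows the request
doubled = timesteps_ahead(4)                  # the doubled settings tried above

print(before_fix, after_fix, doubled)
```

Under these assumptions the doubled settings give the same run-ahead as the original settings did before the fix.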

It looks like Frontier is down, so I can't do any runs today.

lightsighter commented on July 17, 2024

I don't see any difference on the Legion side of things at scale. The trace replays are happening and they are taking the same amount of time to replay the traces. There's very little runtime overhead. Whatever is not scaling, it is not Legion's fault.
