I've done another set of runs of S3D on Frontier and am seeing poor weak scaling perfor

Legion: S3D poor weak scaling performance on Frontier

syamajala commented on July 17, 2024

Legion: S3D poor weak scaling performance on Frontier

from legion.

Comments (24)

lightsighter commented on July 17, 2024

There is blocking in your top-level task that is preventing the runtime from getting ahead. I suspect you haven't adjusted your mapper to correctly cope with the fixed frame code, but you could also be waiting on a future.

syamajala commented on July 17, 2024

I did not see anything in #1680 about needing to update the mapper? I will talk to @elliottslaughter.

rohany commented on July 17, 2024

Are you sure it's blocking? It doesn't look like that to me (or at least S3D is pushing out a full iteration and then stopping). On 4 nodes, it takes 300ms for all operations in the trace to make it through the mapping stage of the pipeline, while on 2048 nodes it takes 3 seconds for that to happen. While it would be nice for the application to be farther ahead, that still seems like a problem.
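For a rough sense of scale, the two timings quoted above can be compared directly. This is only arithmetic on the numbers in the comment (300 ms on 4 nodes, 3 s on 2048 nodes). The variable names are illustrative, and the efficiency figure assumes that under ideal weak scaling the time to map one trace would stay flat as nodes are added.

```python
# Timings quoted in the comment above: time for all operations in the trace
# to make it through the mapping stage of the pipeline.
small_nodes, small_ms = 4, 300.0
large_nodes, large_ms = 2048, 3000.0

node_ratio = large_nodes / small_nodes  # how much larger the machine is
slowdown = large_ms / small_ms          # how much longer mapping takes

# Assumption: ideal weak scaling keeps this time constant, so efficiency
# is the small-run time divided by the large-run time.
efficiency = small_ms / large_ms

print(f"{node_ratio:.0f}x nodes, {slowdown:.0f}x mapping time, "
      f"{efficiency:.0%} of ideal")
```

Under that assumption the mapping stage runs at about a tenth of ideal at 2048 nodes, which is the gap being questioned here regardless of whether the top-level task is blocking.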

elliottslaughter commented on July 17, 2024

Before Mike fixed #1680, we were running about 2× the requested number of frames in advance. Now that Mike has fixed that bug, we should probably double our min_frames_to_schedule and max_outstanding_frames values, since the runtime is now much more accurate about following what we ask for.

lightsighter commented on July 17, 2024

Are we sure that all the task launches in this program are index space task launches that span the whole machine? There are no individual task launches being done, right (unless they are for future operations)?

syamajala commented on July 17, 2024

Yes, every single task launch in S3D should have either __demand(__index_launch) or __demand(__constant_time_launch) on it.

syamajala commented on July 17, 2024

I doubled the number of frames and it doesn't seem to have made much of a difference.

I only ran 2048 nodes: https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pwave_x_2048_ammonia/legion_prof/

lightsighter commented on July 17, 2024

That profile does not seem like it wants to load for me.

All the index launches span the entire machine?

syamajala commented on July 17, 2024

Do we have some way in regent or the runtime to actually verify this?

elliottslaughter commented on July 17, 2024

Regent doesn't know anything about how big the machine is, and the static analysis is nontrivial.

Something like the LoggingWrapper would report the sizes (and mapping) of index launches. Note that there will be extreme performance degradation from running with it, so this is for debugging purposes only.

syamajala commented on July 17, 2024

Well, I think the first thing I want to check is that there are no single task launches and that every task is in fact being index space launched.

syamajala commented on July 17, 2024

I guess there is one:

https://gitlab.com/legion_s3d/legion_s3d/-/blob/subranks/rhst/s3d.rg?ref_type=heads#L1471

https://gitlab.com/legion_s3d/legion_s3d/-/blob/subranks/rhst/mpi_tasks.rg?ref_type=heads#L86-93

elliottslaughter commented on July 17, 2024

@lightsighter you can manually load the profile with:

legion_prof --attach http://sapling.stanford.edu/~seshu/s3d_ammonia/pwave_x_2048_ammonia/legion_prof/

Why are you asking about the index launches being across the entire machine?

Seshu's links above go to a task that is called once per timestep to fetch the timestep information. I think we have arranged this to not actually block on MPI 90% of the time. Therefore, the vast majority of these cases should give Legion plenty of time to do the reduce/broadcast on the futures.

I believe the index launches themselves should be across the entire machine.

syamajala commented on July 17, 2024

It seems to be the complete_frame call that is causing the main task to block.

Here is a profile on 1 node with it commented out:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/legion_prof.3/

lightsighter commented on July 17, 2024

Ok, but that only happens every N iterations. The profile looked like it was blocking multiple times each iteration so something else has to be blocking as well.

syamajala commented on July 17, 2024

That was the only thing I changed.

lightsighter commented on July 17, 2024

And what happens if you switch back to non-frame execution?

syamajala commented on July 17, 2024

Still waiting on the 8192 node run but have up to 4096 nodes.

No frames:
http://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/weak_scaling.html

No frames profile at 4 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/pwave_x_4_ammonia/legion_prof/

No frames profile at 2048 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/pwave_x_2048_ammonia/legion_prof/

With frames:
http://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/old/weak_scaling.html

With frames profile at 4 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/old/pwave_x_4_ammonia/legion_prof/

With frames profile at 2048 nodes:
https://legion.stanford.edu/prof-viewer/?url=https://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/old/pwave_x_2048_ammonia/legion_prof/

lightsighter commented on July 17, 2024

I can't see the profiles. They're not loading. Are the permissions set correctly?

Is there a reason you ran them so large? I would expect to see the difference in waits even on a small number of nodes.

What happens if you grow the number of frames? Do you see the waits spread out?

syamajala commented on July 17, 2024

I am able to view the profiles. Profiles for smaller node counts are also available in that directory, in the pwave_x directories.

1 - 4096 nodes: http://sapling2.stanford.edu/~seshu/s3d_ammonia/pressure_wave/

You can try using attach as well:
legion_prof --attach http://sapling.stanford.edu/~seshu/s3d_ammonia/pressure_wave/pwave_x_2048_ammonia/legion_prof/

The only reason I ran it so large is that we have hours to burn: the ALCC allocation expires at the end of June and we didn't use all of it.

I can try running again with frames and use more of them.

lightsighter commented on July 17, 2024

Something changed very dramatically in the four node runs with frames. The main task is not blocking at all in these runs. It is gone before we even start running anything as if we unrolled the whole main task. That doesn't appear to be happening in the old version. What did you set the mapper frame runahead to be?

lightsighter commented on July 17, 2024

Also, just looking at these profiles, the copies look like they are taking longer going from the old runs to the new ones.

syamajala commented on July 17, 2024

In the original run with frames, min_frames_to_schedule was 1 and max_outstanding_frames was 2. In this case 1 frame is 10 timesteps.

I did try min_frames_to_schedule = 2 and max_outstanding_frames = 4 at some point, where 1 frame is 10 timesteps, but it did not look any different to me.
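To make the two settings concrete, here is a small sketch of how far ahead each one would let the application run, in timesteps. It assumes that max_outstanding_frames acts as a hard cap on frames that have been issued but not yet completed. The helper `runahead_timesteps` is hypothetical and only does the multiplication; it is not part of Legion or the S3D mapper.

```python
TIMESTEPS_PER_FRAME = 10  # "1 frame is 10 timesteps", as stated above


def runahead_timesteps(max_outstanding_frames,
                       timesteps_per_frame=TIMESTEPS_PER_FRAME):
    """Upper bound on timesteps in flight.

    Assumption: the runtime honors max_outstanding_frames exactly.
    """
    return max_outstanding_frames * timesteps_per_frame


# The two configurations mentioned in this comment.
configs = {
    "original": {"min_frames_to_schedule": 1, "max_outstanding_frames": 2},
    "doubled": {"min_frames_to_schedule": 2, "max_outstanding_frames": 4},
}

for name, cfg in configs.items():
    bound = runahead_timesteps(cfg["max_outstanding_frames"])
    print(f"{name}: at most {bound} timesteps outstanding")
```

If the assumption holds, the doubled setting allows up to 40 timesteps in flight instead of 20, so seeing no change between the two suggests the run-ahead limit is not what bounds performance in these runs.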

It looks like Frontier is down, so I can't do any runs today.

lightsighter commented on July 17, 2024

I don't see any difference on the Legion side of things at scale. The trace replays are happening and they are taking the same amount of time to replay the traces. There's very little runtime overhead. Whatever is not scaling, it is not Legion's fault.
