Code Monkey home page Code Monkey logo

Comments (11)

boyceg avatar boyceg commented on June 12, 2024

Would it be easier/better to adopt a simpler grid generation algorithm --- e.g. a tree-structure with fixed-sized patches?

from ibamr.

drwells avatar drwells commented on June 12, 2024

That may be more efficacious, but I think just redoing the load balancer will be easier. If we use a different grid generation algorithm we'll have to redo a load balancer and also the gridding algorithm instead of just the load balancer. Furthermore, being able to chop boxes in more arbitrary ways might be advantageous for IB load balancing.

from ibamr.

drwells avatar drwells commented on June 12, 2024

This has come up a couple of times on slack in different places so I'll summarize what I know here.

I've been running the nozzle benchmark on my machine to try and address some IFED scalability issues - at this point we have more parallelization problems with SAMRAI then IFED, which is good since, fundamentally, we can fix a lot of these by just writing a better LoadBalancer.

A few highlights from callgrind:

  • some basic stuff which is very easy to patch (#1562, #1563, #1564) and marginally improves performance (each is a few percent of the overall solver) (the 'recompute overlapping boxes on demand' patch actually makes my benchmark run twice as fast)
  • SAMRAI's communication uses extra buffers on both ends - we can get another marginal improvement in communication cost by writing directly into PatchData from MPI's buffer.
  • we spend roughly half our time when doing a MatMult communicating ghost data: that's good motivation to get rid of some more intermediate copies in SAMRAI and reduce the total number of ghost cells by writing a better load balancer. Reducing copies is a marginal improvement compared to writing a really good load balancer. Maybe we can collaborate with the AMREX people on this? I haven't found a ton of load-balancing algorithms for patch-based FDM in the literature. I should look more.
  • Nodal IFED is now fast enough that we spend a lot more time communicating Cartesian-grid data than actually doing IFED :) The global FE scatter isn't that bad (and there isn't much more we can do here - see drwells/fiddle@d24a478) but doing a global redistribution of Cartesian grid data is about 10x as expensive as doing IFED stuff. Scattering the accumulated forces back is also quite slow (maybe 3x the cost of doing the actual spreading). This implies that we need to get rid of SecondaryHierarchy and have a single global partitioning of the Cartesian-grid data which load balances both the total number of cells/processor and number of IB points/processor.

Update 1: some relevant discussion on AMR in the Parthenon paper (AFAICT they use trees, not patches)

  • https://arxiv.org/pdf/2202.12309.pdf: "PARTHENON provides infrastructure for AMR with both cell- and face-centered data. The size of these arrays must be the same on all MeshBlocks, and moreover the overall domain must contain an integer number of MeshBlocks in each dimension. However, the number and size of individual MeshBlocks tiling the computational domain is arbitrary."

Update 2: I spent a little bit playing around with a possible new load-balancing algorithm for patches. It does a better job than the default LoadBalancer because it alternates between chopping and balancing. The algorithm is something like:

  1. Calculate a weight for each patch (e.g., number of cells)
  2. use a bin packing algorithm (e.g., Karmarkar-Karp) to balance patches between bins
  3. pair the biggest and smallest bin quartiles together. The bigger bin donates something (either a patch or a subset of a patch) to the smaller bin to move it close to the optimal workload.
  4. Repeat 6-7 times.

My rough Python implementation of this algorithm converges nicely: the difference between the biggest and smallest bins roughly halves at each step and we create a constant (n_procs / 4) number of new patches at most. It's just a simulacrum (I approximate bad chops by adding some randomness) but I get far better results than SAMRAI's load balancer (60 total patches instead of 300, maximum difference of 100 instead of 3000)

Update 3: We may have found a capable undergraduate student to work on this this summer.

Another thing we should examine, if we end up using integer programming, is penalizing using more ghost regions. Formulations of bin packing as an integer program typically involve minimizing some quantity r. We could penalize the existence of ghost regions by just adding up the number of ghost regions present with certain bin packing operations. It might not help a lot but it could keep adjacent patches preferentially together (and then we can just merge them in a postprocessing step).

from ibamr.

knepley avatar knepley commented on June 12, 2024

from ibamr.

drwells avatar drwells commented on June 12, 2024

@knepley We can probably interpret this load-balancing problem as a weighted graph-balancing problem. A fundamental difference between this and a more normal situation (e.g., FEM) is that we can chop patches. For example, on a single processor SAMRAI generates

visit0000

for exactly the same grid as the one in my first picture. Just partitioning this isn't going to work since there are only 16 patches: we need a way to chop them up in a nice way too.

from ibamr.

boyceg avatar boyceg commented on June 12, 2024

In some sense, we don't have to use SAMR as long as the levels are properly nested. Would it make sense to use p4est?

from ibamr.

knepley avatar knepley commented on June 12, 2024

Also, by checking the refinement criteria you input, you can force p4est to deliver blocks if you want, and the 2:1 balance is also optional.

from ibamr.

boyceg avatar boyceg commented on June 12, 2024

Is it possible straightforward to get p4est to generate what SAMR folks call "properly nested" grids?

from ibamr.

drwells avatar drwells commented on June 12, 2024

Would it make sense to use p4est?

Maybe! That's essentially what Parthenon is doing (and a lot of other libraries). Notably, AMREX does not (they still use SAMR).

Is it straightforward to get p4est to generate what SAMR folks call "properly nested" grids?

I believe this is the default - e.g., everything in deal.II is properly nested.

from ibamr.

boyceg avatar boyceg commented on June 12, 2024

Seems pretty tempting to try to use p4est here...

from ibamr.

knepley avatar knepley commented on June 12, 2024

I have been using it for a while now. All the plasma stuff I am doing with Mark Adams uses p4est. You can convert seamlessly to a Plex (and pretty seamlessly back). We have some built-in stuff to define Vec objects that define the refinement/coarsening (see VecTagger and DMAdaptLabel/DMAdaptMetric).

from ibamr.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.