Comments (6)

mccoys commented on May 28, 2024

Hi,

I have a few comments that might help you investigate the causes of this problem, unless you have already tried all of this. Let me try anyway.

Concerning simulations with a cold or hot core, there is always going to be one MPI process that ends up with most of the work. To reduce this problem you may use load balancing:

  • Have you activated MPI load balancing? It could help move patches between MPI processes (see the namelist sketch after this list).

  • Have you tried making use of OpenMP? Your simulations seem to use only 1 OpenMP thread per MPI process. You should look into the architecture of your machines: typically, you want as many MPI processes as there are nodes and as many threads per MPI process as there are cores per node, although this can vary (see the launch example after this list).

  • To have efficient local (OpenMP) and global (MPI) load balancing, you must ensure you have more patches than the total number of threads. You should also make sure that your patches are smaller than the plasma core itself, so that its particles are shared between several patches.
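
For reference, both of these are controlled from the namelist. Below is a minimal sketch, assuming the standard Main and LoadBalancing blocks; the numbers are only illustrative, not tuned for your case:

    # namelist excerpt (Python)
    Main(
        # ... geometry, cell_length, timestep, etc. ...
        number_of_patches = [16, 16],   # keep this larger than the total number of threads
    )

    LoadBalancing(
        initial_balance = True,   # redistribute patches across MPI processes at startup
        every = 150,              # then re-balance every 150 iterations
    )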
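
On the launch side, a hybrid MPI + OpenMP run then typically looks like the lines below (4 nodes of 16 cores taken as an example; the -ppn flag depends on your MPI implementation, and a dynamic OpenMP schedule helps threads share unevenly loaded patches):

$ export OMP_NUM_THREADS=16   # assuming 16 cores per node
$ export OMP_SCHEDULE=dynamic
$ mpirun -np 4 -ppn 1 ./smilei hotcoreinput.py 2>&1 |tee smilei_004_MPI_016_OMP.log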

Concerning the hot homogeneous simulation, this is a bit puzzling. From your outputs, it seems that most of the simulation time is taken in the synchronisation of particles. However, with a decently low temperature, the time for synchronisations should be low compared to the time for computing particles. If the temperature is very high, there might be too many communications because the particles travel fast, causing the synchronisation lag. This still looks surprising. Are you sure the simulation stays homogeneous over time?

jderouillat commented on May 28, 2024

Hi,
I ran the hothomogenouscorescaninput.py simulation on our cluster (each node has 2 Intel Sandy Bridge processors of 8 cores each) for 1 to 256 MPI processes (1 OpenMP thread each).
The test is homogeneous, so I didn't oversplit the domain into patches: I used 1 patch per MPI process and no dynamic load balancing.

I split the domain to be as square as possible. The only change to the namelist is the removal of the patchesx and patchesy definitions, which I instead set through Main.number_of_patches on the command line:

$ mpirun -np 1 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[1,1]  2>&1 |tee smilei_001_MPI_001_OMP.log
$ mpirun -np 2 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[2,1]  2>&1 |tee smilei_002_MPI_001_OMP.log
$ mpirun -np 4 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[2,2]  2>&1 |tee smilei_004_MPI_001_OMP.log
$ mpirun -np 8 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[4,2]  2>&1 |tee smilei_008_MPI_001_OMP.log
$ mpirun -np 16 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[4,4]  2>&1 |tee smilei_016_MPI_001_OMP.log
$ mpirun -np 32 -ppn 16 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[8,4]  2>&1 |tee smilei_032_MPI_001_OMP.log
$ mpirun -np 64 -ppn 16 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[8,8]  2>&1 |tee smilei_064_MPI_001_OMP.log
$ mpirun -np 128 -ppn 16 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[16,8]  2>&1 |tee smilei_128_MPI_001_OMP.log
$ mpirun -np 256 -ppn 16 ./smilei  hothomogenouscorescaninput.py Main.number_of_patches=[16,16] 2>&1 |tee smilei_256_MPI_001_OMP.log
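
(For context, the namelist presumably contained something like the hypothetical lines below before this change; dropping them lets Main.number_of_patches be overridden directly from the command line as in the runs above.)

    # hypothetical lines removed from the namelist
    patchesx = 8
    patchesy = 8
    Main(
        # ...
        number_of_patches = [patchesx, patchesy],
    )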

From my point of view the results are good considering the total number of particles.
In the 256 MPI process simulation, the number of particles per MPI process, counting both species, is 1600:

$ grep "Time in time loop" smilei*
smilei_001_MPI_001_OMP.log: Time in time loop :	74.567	99.843% coverage
smilei_002_MPI_001_OMP.log: Time in time loop :	35.904	99.835% coverage
smilei_004_MPI_001_OMP.log: Time in time loop :	18.314	99.785% coverage
smilei_008_MPI_001_OMP.log: Time in time loop :	9.738	99.722% coverage
smilei_016_MPI_001_OMP.log: Time in time loop :	5.421	99.466% coverage
smilei_032_MPI_001_OMP.log: Time in time loop :	2.813	99.247% coverage
smilei_064_MPI_001_OMP.log: Time in time loop :	1.653	98.879% coverage
smilei_128_MPI_001_OMP.log: Time in time loop :	1.114	97.852% coverage
smilei_256_MPI_001_OMP.log: Time in time loop :	0.895	97.031% coverage
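
As a quick sanity check, the implied strong-scaling speedup and parallel efficiency can be computed from these timings with a few lines of Python (a throwaway helper using the numbers above, not part of Smilei):

    # timings in seconds, copied from the grep output above
    timings = {1: 74.567, 2: 35.904, 4: 18.314, 8: 9.738, 16: 5.421,
               32: 2.813, 64: 1.653, 128: 1.114, 256: 0.895}
    t1 = timings[1]
    for nproc, t in sorted(timings.items()):
        speedup = t1 / t
        print(f"{nproc:4d} MPI: speedup {speedup:6.1f}, efficiency {speedup / nproc:6.1%}")

With these numbers the efficiency stays above roughly 80% up to 32 MPI processes and then falls off (about 33% at 256), which is what you would expect once only ~1600 particles remain per process.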

jderouillat commented on May 28, 2024

I looked at hotcoreinput.py too; for this beginning of the simulation it will be very hard to get honest scaling on more than 4 compute units.
At initialization, the plasma is distributed in a circle with a radius of 1.5.
The cell_length is 0.2, so with a minimum patch size of 6 cells a patch spans 6 × 0.2 = 1.2, almost a quarter of the plasma, and 4 patches nearly cover the whole plasma. By the end of this part of the simulation the particles haven't moved enough to allow better scaling.
Below is the scalability for 1 to 4 MPI processes / patches:

$ grep "Time in time" smilei*.log
smilei_001_MPI_001_OMP.log: Time in time loop :	16.202	98.869% coverage
smilei_002_MPI_001_OMP.log: Time in time loop :	8.825	98.801% coverage
smilei_004_MPI_001_OMP.log: Time in time loop :	4.533	98.629% coverage

JimmyHolloway commented on May 28, 2024

Hi Mccoys, Jderouillat,

I took your advice, made use of OpenMP and load balancing, and investigated the temperature's effect on runtime.

The high temperatures:
I ran a cold-homogeneous simulation set to compare with the hot-homogeneous set. The cold-homogeneous set scaled sensibly. The log files show that the code spent much less time synchronising particles and fields in the cold-homogeneous simulations. I am convinced the problem was just the very high temperature of the hot-homogeneous simulations.
(attached figure: coldhomo)

OpenMP:
I performed the same simulation set as above but with OpenMP used properly and got good computational savings:
(attached figure: coldhomoopm)

I am still testing load balancing, but it looks like it gives further computational savings with the 'core' simulations.

Thank you both for helping me with this!
Best,
Jimmy.

MickaelGrech commented on May 28, 2024

Hi Jimmy,
Are you now happy with how the code performs?
If so, can you close this issue?

JimmyHolloway commented on May 28, 2024

Hi Mickael,
Yes, the issue was resolved. Thank you.
