Comments (6)
Hi,
I have a few comments, that might help you investigate the causes of this problem, unless you have already tried all of that. Let me try anyway.
Concerning simulations with a cold or hot core, there is always going to be one MPI process that takes most of the work. To reduce this problem you may use load balancing:
- Have you activated MPI load balancing? It could help move patches between MPI processes.
- Have you tried making use of OpenMP? Your simulations seem to use only 1 OpenMP thread per MPI process. You should investigate the architecture of your machines. Typically, you should have as many MPI processes as nodes and as many threads per MPI process as cores per node, though this can vary.
- To have efficient local (OpenMP) and global (MPI) load balancing, you must ensure you have more patches than the total number of threads. You should also make sure that your patches are smaller than the plasma core itself, so that the particles are shared between several patches.
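If it helps, here is a minimal sketch of what activating load balancing could look like in the namelist (block and parameter names as in the Smilei documentation; the patch counts below are illustrative examples, not tuned for your case):

```python
# Illustrative excerpt of a Smilei namelist -- the patch counts are example
# values; choose them so the total exceeds MPI ranks x OpenMP threads.
Main(
    # ... your existing Main parameters ...
    number_of_patches = [32, 32],
)

LoadBalancing(
    every = 20,               # rebalance patches every 20 iterations
    initial_balance = True,   # also balance at initialization
)
```

When running hybrid MPI+OpenMP, you will also need to set OMP_NUM_THREADS in the job script so that each MPI process actually spawns the intended number of threads.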
Concerning the hot homogeneous simulation, this is a bit puzzling. From your outputs, it seems that most of the simulation time is spent synchronising particles. However, at a reasonably low temperature, the time for synchronisation should be small compared to the time for computing particles. If the temperature is very high, there may be too many communications because the particles travel fast, causing the synchronisation lag. This still looks surprising. Are you sure the simulation stays homogeneous over time?
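To see why temperature drives the synchronisation cost, here is a rough, order-of-magnitude sketch (the helper function and all numbers are mine, purely illustrative): the fraction of a patch's particles that must be exchanged each step scales with the distance swept by thermal motion relative to the patch size.

```python
# Rough estimate of the per-step particle exchange: particles within a
# boundary layer of depth v_th * dt leave the patch each timestep.
def exchange_fraction(v_th, dt, patch_side):
    """Fraction of a patch's particles crossing a boundary per step,
    estimated as the thermally-swept depth over the patch size."""
    return min(1.0, v_th * dt / patch_side)

cold = exchange_fraction(v_th=0.01, dt=0.1, patch_side=1.2)  # mildly warm plasma
hot  = exchange_fraction(v_th=0.5,  dt=0.1, patch_side=1.2)  # near-relativistic temperature
print(f"cold: {cold:.4f}  hot: {hot:.4f}")  # hot plasma exchanges 50x more particles
```

With these (made-up) numbers, the hot case communicates fifty times as many particles per step, which is consistent with synchronisation dominating the timers.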
from smilei.
Hi,
I ran the hothomogenouscorescaninput.py simulation on our cluster (each node has 2 Intel Sandy Bridge processors with 8 cores each) for 1 to 256 MPI processes (1 OpenMP thread).
The test is homogeneous, so I didn't oversplit the domain into patches: I used 1 patch per MPI process and no dynamic load balancing.
I split the domain to be as square as possible, and the only change in the namelist is the removal of the definition of the patch numbers in x and y, which I modified on the command line instead:
$ mpirun -np 1 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[1,1] 2>&1 |tee smilei_001_MPI_001_OMP.log
$ mpirun -np 2 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[2,1] 2>&1 |tee smilei_002_MPI_001_OMP.log
$ mpirun -np 4 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[2,2] 2>&1 |tee smilei_004_MPI_001_OMP.log
$ mpirun -np 8 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[4,2] 2>&1 |tee smilei_008_MPI_001_OMP.log
$ mpirun -np 16 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[4,4] 2>&1 |tee smilei_016_MPI_001_OMP.log
$ mpirun -np 32 -ppn 16 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[8,4] 2>&1 |tee smilei_032_MPI_001_OMP.log
$ mpirun -np 64 -ppn 16 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[8,8] 2>&1 |tee smilei_064_MPI_001_OMP.log
$ mpirun -np 128 -ppn 16 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[16,8] 2>&1 |tee smilei_128_MPI_001_OMP.log
$ mpirun -np 256 -ppn 16 ./smilei hothomogenouscorescaninput.py Main.number_of_patches=[16,16] 2>&1 |tee smilei_256_MPI_001_OMP.log
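For reference, the "as square as possible" splits used in the commands above can be generated automatically; this small helper (the function name is mine, not part of Smilei) reproduces them:

```python
# Find the most-square factorization [px, py] of n, preferring px >= py
# as in the decompositions used in the scan above.
def squarest_split(n):
    """Return [px, py] with px * py == n and px / py as close to 1
    as possible, keeping the wider dimension first."""
    best = (n, 1)
    for py in range(1, int(n**0.5) + 1):
        if n % py == 0:
            best = (n // py, py)
    return list(best)

for n in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    print(n, squarest_split(n))
```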
From my point of view, the results are good considering the total number of particles. In the 256-MPI-process simulation, the number of particles per MPI process, counting both species, is 1600:
$ grep "Time in time loop" smilei*
smilei_001_MPI_001_OMP.log: Time in time loop : 74.567 99.843% coverage
smilei_002_MPI_001_OMP.log: Time in time loop : 35.904 99.835% coverage
smilei_004_MPI_001_OMP.log: Time in time loop : 18.314 99.785% coverage
smilei_008_MPI_001_OMP.log: Time in time loop : 9.738 99.722% coverage
smilei_016_MPI_001_OMP.log: Time in time loop : 5.421 99.466% coverage
smilei_032_MPI_001_OMP.log: Time in time loop : 2.813 99.247% coverage
smilei_064_MPI_001_OMP.log: Time in time loop : 1.653 98.879% coverage
smilei_128_MPI_001_OMP.log: Time in time loop : 1.114 97.852% coverage
smilei_256_MPI_001_OMP.log: Time in time loop : 0.895 97.031% coverage
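From these timings one can compute the speedup and the parallel efficiency, T(1) / (N · T(N)); a quick script (values copied from the table above):

```python
# "Time in time loop" values from the scan above, keyed by MPI process count.
times = {1: 74.567, 2: 35.904, 4: 18.314, 8: 9.738, 16: 5.421,
         32: 2.813, 64: 1.653, 128: 1.114, 256: 0.895}

for n, t in times.items():
    speedup = times[1] / t
    efficiency = speedup / n   # ideal scaling gives efficiency == 1.0
    print(f"{n:4d} MPI: speedup {speedup:6.1f}, efficiency {efficiency:6.1%}")
```

Efficiency stays high up to 16-32 processes and then falls off as only a few thousand particles per rank remain, which matches the observation that the results are good for the particle count.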
I looked at hotcoreinput.py too. For this beginning of the simulation, it will be very hard to get honest scaling on more than 4 compute units.
At initialization, the plasma is distributed in a circle with a radius of 1.5. The cell_length is 0.2, so with a minimum patch size of 6 cells, a single patch almost covers a quarter of the plasma, and 4 patches cover almost the whole plasma. At the end of this part of the simulation, the particles haven't moved enough to expect better scaling.
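A quick back-of-the-envelope check of this geometry argument, using the numbers quoted above:

```python
# Geometry from the comment above: plasma radius 1.5, cell_length 0.2,
# minimum patch size 6 cells per side.
import math

cell_length = 0.2
min_patch_cells = 6
radius = 1.5

patch_side = min_patch_cells * cell_length   # = 1.2
plasma_area = math.pi * radius**2            # ~7.07
patch_area = patch_side**2                   # = 1.44

# One patch covers ~20% of the plasma area, i.e. nearly a quarter,
# so only about 4 patches meaningfully overlap the plasma at start.
print(f"one patch covers {patch_area / plasma_area:.0%} of the plasma area")
```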
Below is the scalability for 1 to 4 MPI processes / patches:
$ grep "Time in time" smilei*.log
smilei_001_MPI_001_OMP.log: Time in time loop : 16.202 98.869% coverage
smilei_002_MPI_001_OMP.log: Time in time loop : 8.825 98.801% coverage
smilei_004_MPI_001_OMP.log: Time in time loop : 4.533 98.629% coverage
Hi Mccoys, Jderouillat,
I took your advice, made use of OpenMP and load balancing, and investigated the temperature's effect on runtime.
The high temperatures:
I ran a cold-homogeneous simulation set to compare with the hot-homogeneous set. The cold-homogeneous set scaled sensibly. The log files showed that the code spent far less time synchronising particles and fields in the cold-homogeneous simulations. I am convinced the problem was just the very high temperature of the hot-homogeneous simulations.
OpenMP:
I performed the same simulation set as above, but with OpenMP used properly, and got good computational savings.
I am still testing load balancing, but it looks like it gives further computational savings with the 'core' simulations.
Thank you both for helping me with this!
Best,
Jimmy.
Hi Jimmy,
Are you now happy with how the code performs?
If so, can you close this issue?
Hi Mickael,
Yes the issue was resolved. Thank you.