Comments (22)

bwvdnbro commented on June 7, 2024

Sounds like a badly handled corner case. We did introduce some extra dependencies recently, so that could explain why the older version works.

Any chance you could run this through a debugger and send me a stack trace for the point where it crashes (you can set a breakpoint on scheduler.c:106)?

Or could you make the code change in the patch below and let me know what the error message becomes?
unlock_null.txt

from swift.

FHusko commented on June 7, 2024

Hi Bert, this is the new error message: [0003] [00010.3] scheduler.c:scheduler_addunlock():111: Unlocking task is NULL (task unlocks send/tend). Thanks for being on the case!
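For readers following along, the message suggests a guard of roughly the following kind. This is a toy Python model of the idea, not SWIFT's C code: the `Task` class, the `add_unlock` function, and the second error message are hypothetical stand-ins, and the actual change is whatever the attached patch contains.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Toy stand-in for a scheduler task (hypothetical, not SWIFT's struct)."""
    type_name: str
    subtype_name: str = "none"
    unlocks: List["Task"] = field(default_factory=list)


def add_unlock(ta: Optional[Task], tb: Optional[Task]) -> None:
    """Record that finishing `ta` unlocks `tb`, failing loudly on a missing task."""
    if ta is None:
        # Name the dependent task so the log shows which dependency is broken.
        label = "unknown" if tb is None else f"{tb.type_name}/{tb.subtype_name}"
        raise ValueError(f"Unlocking task is NULL (task unlocks {label})")
    if tb is None:
        raise ValueError(
            f"Unlocked task is NULL ({ta.type_name}/{ta.subtype_name} unlocks nothing)"
        )
    ta.unlocks.append(tb)
```

Reporting the type of the dependent task is what narrows the search to call sites that pass a send/tend task as the second argument.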


bwvdnbro commented on June 7, 2024

@MatthieuSchaller looks like this is the only place where scheduler_addunlock() could be called with a first argument that is NULL and a second argument that is a send/tend:
https://github.com/SWIFTSIM/swiftsim/blob/67e0fdc56c335fb75fddb33435f8630f5a5ea74b/src/engine_maketasks.c#L3961

Is this a realistic scenario, or does that mean something else went wrong? Is ci->timestep_collect guaranteed to exist?


MatthieuSchaller commented on June 7, 2024

I wonder whether something goes wrong when some top-level cells (TLCs) are completely empty, in which case timestep_collect could have been missed.
Filip's setup is a zoom(-ish), so maybe there is something I did not think of when changing the dt exchange.


FHusko commented on June 7, 2024

I don't know how it compares to a typical zoom, but the particle masses start growing as r^2 beyond 500 kpc, and the halo extends out to 6000 kpc. On top of that, the mass density in the setup falls as r^-1.5. So the number density of particles falls as r^-3.5 overall throughout most of the box.
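As a quick check of that arithmetic: the number density is the mass density divided by the particle mass, so a mass density falling off with a power of 1.5 combined with a particle mass growing as r^2 gives a number density falling off with a power of 3.5. A minimal Python sketch, with a function name and sample radii chosen purely for illustration:

```python
import math


def number_density_slope(r_inner, r_outer, density_slope=-1.5, mass_slope=2.0):
    """Log-log slope of n(r) = rho(r) / m(r) between two radii.

    Assumes pure power laws: rho ~ r**density_slope and m ~ r**mass_slope.
    """
    def n(r):
        return r**density_slope / r**mass_slope

    return math.log(n(r_outer) / n(r_inner)) / math.log(r_outer / r_inner)


# Slope measured between 500 kpc and 6000 kpc, the range quoted above.
slope = number_density_slope(500.0, 6000.0)
```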


MatthieuSchaller commented on June 7, 2024

You can try the branch Filip_fix in the gitlab. It should now work. If you confirm it does, then I'll clean it up and make it a permanent change.


FHusko commented on June 7, 2024

It does indeed! I have tested it on the smaller test problem and have also started the actual larger run (36 nodes) that prompted this in the first place. That one works too.

Thanks again!


MatthieuSchaller commented on June 7, 2024

Great, thanks for checking. I'll write this up as a proper clean fix and we'll merge it into the main code.


MatthieuSchaller commented on June 7, 2024

I have now pushed a cleaner version. Just to be extra safe, could you pull the latest version of this branch and test it once more? If it starts smoothly that will be enough.

Thanks!


FHusko commented on June 7, 2024

I applied the most recent changes. I now get the following error with the test case:

[0000] [00167.9] stars_spart_has_no_neighbours: WARNING: Star particle with ID 1000008101 treated as having no neighbours (h: 225, wcount: 0).
[0000] [00167.9] ./feedback/EAGLE_thermal/feedback.h:feedback_prepare_feedback():211: Evolving a star particle that should not!

This happens in the initial fake time step. There are some stellar particles in the initial conditions which are probably prompting this error. I got the error on Friday with the earlier set of changes which you had made. I don't know how I managed to get it to run earlier, since I remember the run did happily go for around 20 minutes. Possibly it was because I didn't turn on the --stars and --feedback options with the earlier test run. Also this may not have anything to do with the changes you had made; it could be from some change along the way (I was using a year-old version of SWIFT).

If you don't have a clear idea where this is coming from, I could try my setup with the current master as well, just to see if this is related to the latest changes.


MatthieuSchaller commented on June 7, 2024

mmmh.... Both should be unrelated to the changes here.

The first message is probably because there is a star somewhere far from everything and limited by h_max.
That's a relatively new warning. But the code should survive.

I guess the problem here is that the setup is problematic in terms of some of the stars. Can you print out more information about that star and see whether it is indeed at a strange place?
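One way to run such a check is sketched below in Python. It assumes the particle IDs, positions, and masses have already been read from the ICs into plain sequences; the function and argument names are hypothetical.

```python
import math


def distance_from_centre_of_mass(ids, positions, masses, target_id):
    """Distance of one particle from the mass-weighted centre of all particles.

    `positions` is a sequence of (x, y, z) tuples in the same order as `ids`
    and `masses`. Raises ValueError if `target_id` is not present.
    """
    total_mass = sum(masses)
    centre = [
        sum(m * p[axis] for m, p in zip(masses, positions)) / total_mass
        for axis in range(3)
    ]
    index = ids.index(target_id)
    return math.dist(positions[index], centre)
```

A star whose distance is far larger than the extent of the galaxy is a likely candidate for the "no neighbours" warning.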


FHusko commented on June 7, 2024

Ah, yes, there are some stars that are placed very far away from the centre. This was one of them (about a dozen Mpc from the centre of mass, according to a check I just did). This happens because the script which creates the stellar bulge places stars by drawing random numbers from a distribution, without any maximum radius.

I'll try cutting off the stars at around 100 kpc or something similar, and see if I get the same thing.
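A cut-off of this kind can be imposed by rejection sampling, as in the Python sketch below. The exponential radial distribution is only a placeholder, since the thread does not say which distribution the bulge script uses, and the function and parameter names are hypothetical.

```python
import random


def sample_truncated_radii(n, scale_kpc, r_max_kpc, seed=0):
    """Draw `n` radii, redrawing any that fall beyond `r_max_kpc`.

    The exponential distribution here is a placeholder for whatever
    profile the real IC script samples from.
    """
    rng = random.Random(seed)
    radii = []
    while len(radii) < n:
        r = rng.expovariate(1.0 / scale_kpc)  # mean radius = scale_kpc
        if r <= r_max_kpc:
            radii.append(r)
    return radii
```

Redrawing rejected samples keeps the number of stars fixed, whereas simply deleting the outliers would change the particle count and the total stellar mass.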


MatthieuSchaller commented on June 7, 2024

Can you give the stars far away an age of -1?


FHusko commented on June 7, 2024

That's the odd thing: the stars in the ICs should have a birth time of -1. Could it be that overwrite_birth_time: 1 and birth time: -1 in the stars section of the parameter file is not enough with the newer version of SWIFT? I don't see any other parameters that could affect this, though.


MatthieuSchaller commented on June 7, 2024

No, that hasn't changed. Maybe there is something fishy in the logic.

That should be the same without MPI however.


FHusko commented on June 7, 2024

Yes, the error happens without MPI too with the newer version of SWIFT.


MatthieuSchaller commented on June 7, 2024

Can you show me the full log to that point?


FHusko commented on June 7, 2024

Here it is:

====
Starting job 4900398 at Mon 21 Feb 16:42:42 GMT 2022 for user dc-husk1.
Running on nodes: m6320
====
 Welcome to the cosmological hydrodynamical code
    ______       _________________
   / ___/ |     / /  _/ ___/_  __/
   \__ \| | /| / // // /_   / /
  ___/ /| |/ |/ // // __/  / /
 /____/ |__/|__/___/_/    /_/
 SPH With Inter-dependent Fine-grained Tasking

 Version : 0.9.0
 Revision: v0.9.0-844-g2b88a89d-dirty, Branch: master, Date: 2022-02-04 10:50:57 +0000
 Webpage : www.swiftsim.com

 Config. options: '--with-subgrid=EAGLE-XL --with-hydro=sphenix --with-kernel=wendland-C2 --with-ext-potential=nfw --enable-fixed-boundary-particles=2 --with-parmetis --with-tbbmalloc --disable-optimization --enable-debug --enable-debugging-checks'

 Compiler: ICC, Version: 20.21.20201112
 CFLAGS  : '-g -O0  -debug inline-debug-info -pthread -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -w2 -Wunused-variable -Wshadow -Werror -Wstrict-prototypes'

 HDF5 library version     : 1.10.6
 FFTW library version     : 3.x (details not available)
 GSL library version      : 2.5

[00000.0] main: CPU frequency used for tick conversion: 2599962789 Hz
[00000.0] main: Running on: m6320.pri.cosma7.alces.network
[00000.0] main: WARNING: Debugging checks activated. Code will be slower !
[00000.0] main: sizeof(part)        is  320 bytes.
[00000.0] main: sizeof(xpart)       is  160 bytes.
[00000.0] main: sizeof(sink)        is   96 bytes.
[00000.0] main: sizeof(spart)       is  448 bytes.
[00000.0] main: sizeof(bpart)       is 4128 bytes.
[00000.0] main: sizeof(gpart)       is  152 bytes.
[00000.0] main: sizeof(multipole)   is  200 bytes.
[00000.0] main: sizeof(grav_tensor) is  168 bytes.
[00000.0] main: sizeof(task)        is   96 bytes.
[00000.0] main: sizeof(cell)        is 1376 bytes.
[00000.0] main: Reading runtime parameters from file 'isolated_galaxy.yml'
[00000.1] output_options_init: Reading select output parameters from file 'param_list.yml'
[00000.1] io_prepare_output_fields: WARNING: Trying to change behaviour of field 'Default:FOFGroupIDs_Gas' (read from 'param_list.yml') that does not exist. This may be because you are not running with all of the physics that you compiled the code with.
[00000.1] io_prepare_output_fields: WARNING: Trying to change behaviour of field 'Default:VELOCIraptorGroupIDs_Gas' (read from 'param_list.yml') that does not exist. This may be because you are not running with all of the physics that you compiled the code with.
[00000.1] main: Internal unit system: U_M = 1.988480e+43 g.
[00000.1] main: Internal unit system: U_L = 3.085660e+21 cm.
[00000.1] main: Internal unit system: U_t = 3.085660e+16 s.
[00000.1] main: Internal unit system: U_I = 1.000000e+00 A.
[00000.1] main: Internal unit system: U_T = 1.000000e+00 K.
[00000.1] phys_const_print:    Gravitational constant = 4.301093e+04
[00000.1] phys_const_print:            Speed of light = 2.997925e+05
[00000.1] phys_const_print:           Planck constant = 1.079908e-96
[00000.1] phys_const_print:        Boltzmann constant = 6.943238e-70
[00000.1] phys_const_print:     Thomson cross-section = 6.986924e-68
[00000.1] phys_const_print:             Electron-Volt = 8.057293e-66
[00000.1] phys_const_print:               Proton mass = 8.411560e-68
[00000.1] phys_const_print:                      Year = 1.022696e-09
[00000.1] phys_const_print:         Astronomical Unit = 4.848164e-09
[00000.1] phys_const_print:                    Parsec = 1.000006e-03
[00000.1] phys_const_print:                Solar mass = 9.999648e-11
[00000.1] phys_const_print:    H_0 / h = 100 km/s/Mpc = 9.999943e-02
[00000.1] phys_const_print:                    T_CMB0 = 2.725500e+00
[00000.4] feedback_props_init: Feedback model is EAGLE (EAGLE)
[00000.4] feedback_props_init: Feedback energy fraction min=1.000000, max=1.000000
[00000.4] feedback_props_init: Feedback energy fraction powers: n_n=0.868600, n_Z=0.868600
[00000.4] feedback_props_init: Feedback energy fraction widths: s_n=0.499994, s_Z=0.499994
[00000.4] feedback_props_init: Feedback energy fraction pivots: Z_0=0.001266, n_0_cgs=1.458800
[00012.5] read_cooling_tables: Done reading in general cooling table
[00012.5] cooling_print_backend: Cooling function is 'COLIBRE'.
[00012.5] starformation_print_backend: Star formation model is EAGLE
[00012.5] starformation_print_backend: Density threshold uses subgrid quantities
[00012.5] starformation_print_backend: Particles are star-forming if their properties obey (T_sub < 1.000000e+03 K OR (T_sub < 3.162200e+04 K AND n_H,sub > 1.000000e+01 cm^-3))
[00012.5] starformation_print_backend: Star formation law is a pressure law (Schaye & Dalla Vecchia 2008):
[00012.5] starformation_print_backend: With properties: normalization = 1.515000e-04 Msun/kpc^2/yr, slope of theKennicutt-Schmidt law = 1.400000e+00 and gas fraction = 1.000000e+00
[00012.5] starformation_print_backend: At densities of 1.000000e+03 H/cm^3 the slope changes to 2.000000e+00.
[00012.5] starformation_print_backend: Running with a direct conversion density of: 3.402823e+38 #/cm^3
[00012.5] chemistry_print_backend: Chemistry model is 'EAGLE' tracking 9 elements.
[00012.5] main: Reading ICs from file 'ICs.hdf5'
[00012.5] io_read_unit_system: Reading IC units from ICs.
[00012.5] read_ic_single: Conversion needed from:
[00012.5] read_ic_single: (ICs) Unit system: U_M =      1.988480e+43 g.
[00012.5] read_ic_single: (ICs) Unit system: U_L =      3.085678e+21 cm.
[00012.5] read_ic_single: (ICs) Unit system: U_t =      3.085678e+16 s.
[00012.5] read_ic_single: (ICs) Unit system: U_I =      1.000000e+00 A.
[00012.5] read_ic_single: (ICs) Unit system: U_T =      1.000000e+00 K.
[00012.5] read_ic_single: to:
[00012.5] read_ic_single: (internal) Unit system: U_M = 1.988480e+43 g.
[00012.5] read_ic_single: (internal) Unit system: U_L = 3.085660e+21 cm.
[00012.5] read_ic_single: (internal) Unit system: U_t = 3.085660e+16 s.
[00012.5] read_ic_single: (internal) Unit system: U_I = 1.000000e+00 A.
[00012.5] read_ic_single: (internal) Unit system: U_T = 1.000000e+00 K.
[00012.5] ic_info_read_hdf5: Metadata group ICs_parameters not found in ICs file
[00029.4] main: Reading initial conditions took 16905.954 ms.
[00030.6] part_verify_links: All links OK
[00030.6] part_verify_links: took 962.443 ms.
[00030.6] main: Read 13035246 gas particles, 0 sink particles, 93750 star particles, 1 black hole particles, 0 DM particles, 0 DM background particles, and 0 neutrino DM particles from the ICs.
[00030.6] space_init: Imposing a star smoothing length of 1.050000e+00
[00032.0] space_regrid: (re)griding space cdim=(8 8 8)
[00032.4] main: space_init took 1772.770 ms.
[00032.5] potential_print_backend: External potential is 'NFW' with properties are (x,y,z) = (2.054982e+04, 2.054982e+04, 2.054982e+04), scale radius = 5.157065e+02 timestep multiplier = 1.500000e-02, mintime = 6.672009e-04
[00032.5] potential_print_backend: Properties of the halo M200 = 1.000000e+05, R200 = 2.062826e+03, c = 4.000000e+00
[00032.5] main: space dimensions are [ 41099.648 41099.648 41099.648 ].
[00032.5] main: space isn't periodic.
[00032.5] main: highest-level cell dimensions are [ 8 8 8 ].
[00032.5] main: 13035246 parts in 512 cells.
[00032.5] main: 13128997 gparts in 512 cells.
[00032.5] main: 0 sinks in 512 cells.
[00032.5] main: 93750 sparts in 512 cells.
[00032.5] main: 1 bparts in 512 cells.
[00032.5] main: maximum depth is 0.
[00032.5] engine_init: took 0.324 ms.
[00032.5] engine_config: Running simulation 'IsolatedGalaxy-EAGLE-Ref'.
[00032.5] engine_config: prefer NUMA-distant CPUs
[00032.5] engine_init: cpu map is [ 0 8 1 9 2 10 3 11 4 12 5 13 6 14 7 15 16 24 17 25 18 26 19 27 20 28 21 29 22 30 23 31 ].
[00034.0] engine_policy: engine policies are [  'steal'  'keep'  'numa affinity'  'hydro'  'self gravity'  'external gravity'  'cooling'  'stars'  'star formation'  'feedback'  'black holes'  'time-step limiter'  'time-step sync'  ]
[00034.0] eos_print: Equation of state: Ideal gas.
[00034.0] eos_print: Adiabatic index gamma: 1.666667.
[00034.0] pressure_floor_print: Pressure floor is 'none'
[00034.0] hydro_props_print: Hydrodynamic scheme: SPHENIX (Borrow+ 2020) in 3D.
[00034.0] hydro_props_print: Hydrodynamic kernel: Wendland C2 with eta=1.234800 (57.27 neighbours).
[00034.0] hydro_props_print: Hydrodynamic relative tolerance in h: 0.00010 (+/- 0.0172 neighbours).
[00034.0] hydro_props_print: Hydrodynamic integration: CFL parameter: 0.2000.
[00034.0] hydro_props_print: Hydrodynamic integration: Max change of volume: 1.40 (max|dlog(h)/dt|=0.112157).
[00034.0] hydro_props_print: Neighbour number definition: Unweighted.
[00034.0] hydro_props_print: Maximal smoothing length allowed: 225.0000
[00034.0] hydro_props_print: Maximal time-bin difference between neighbours: 2
[00034.0] hydro_props_print: Minimal gas temperature set to 100.000000
[00034.0] hydro_props_print: No particle splitting
[00034.0] viscosity_print: Artificial viscosity parameters set to alpha: 0.100, max: 2.000, min: 0.000, length: 0.050.
[00034.0] diffusion_print: Artificial diffusion parameters set to alpha: 0.000, max: 1.000, min: 0.000, beta: 1.000.
[00034.0] entropy_floor_print: Entropy floor is 'EAGLE' with:
[00034.0] entropy_floor_print: Jeans limiter with slope n=1.333 at rho=3.286268e-07 (1.000000e-04 H/cm^3) and T=800.0 K
[00034.0] entropy_floor_print:  Cool limiter with slope n=1.000 at rho=3.286268e-08 (1.000000e-05 H/cm^3) and T=10.0 K
[00034.0] gravity_props_print: Self-gravity scheme: With per-particle softening
[00034.0] gravity_props_print: Self-gravity scheme: FMM-MM with m-poles of order 4
[00034.0] gravity_props_print: Self-gravity time integration: eta=0.0250
[00034.0] gravity_props_print: Self-gravity opening angle scheme:  fixed
[00034.0] gravity_props_print: Self-gravity opening angle:  theta_cr=0.7000
[00034.0] gravity_props_print: Self-gravity softening functional form: Wendland-C2
[00034.0] gravity_props_print: Self-gravity DM comoving softening: epsilon=3.150000 (Plummer equivalent: 1.050000)
[00034.0] gravity_props_print: Self-gravity DM maximal physical softening:    epsilon=3.150000 (Plummer equivalent: 1.050000)
[00034.0] gravity_props_print: Self-gravity baryon comoving softening: epsilon=3.150000 (Plummer equivalent: 1.050000)
[00034.0] gravity_props_print: Self-gravity baryon maximal physical softening:    epsilon=3.150000 (Plummer equivalent: 1.050000)
[00034.0] gravity_props_print: Self-gravity neutrino DM comoving softening: epsilon=0.000000 (Plummer equivalent: 0.000000)
[00034.0] gravity_props_print: Self-gravity neutrino DM maximal physical softening:    epsilon=0.000000 (Plummer equivalent: 0.000000)
[00034.0] gravity_props_print: Self-gravity mesh side-length: N=0
[00034.0] gravity_props_print: Self-gravity mesh smoothing-scale: a_smooth=0.000000
[00034.0] gravity_props_print: Self-gravity distributed mesh enabled: 0
[00034.0] gravity_props_print: Self-gravity tree cut-off ratio: r_cut_max=0.000000
[00034.0] gravity_props_print: Self-gravity truncation cut-off ratio: r_cut_min=0.000000
[00034.0] gravity_props_print: Self-gravity mesh truncation function: Gadget-like (using erfc())
[00034.0] gravity_props_print: Self-gravity tree update frequency: f=0.010000
[00034.0] stars_props_print: Stars kernel: Wendland C2 with eta=1.164200 (48.00 neighbours).
[00034.0] stars_props_print: Stars relative tolerance in h: 0.00700 (+/- 1.0150 neighbours).
[00034.0] stars_props_print: Stars integration: Max change of volume: 1.40 (max|dlog(h)/dt|=0.112157).
[00034.0] stars_props_print: Maximal iterations in ghost task set to 30
[00034.0] stars_props_print: Stars' birth time read from the ICs will be overwritten to -1.000000
[00034.0] stars_props_print: Stars' age threshold for unlimited dt: 0.000000e+00 [U_t]
[00034.0] stars_props_print: Stars' young/old age threshold: 1.022718e-02 [U_t]
[00034.0] stars_props_print: Max time-step size of young stars: 1.022718e-04 [U_t]
[00034.0] stars_props_print: Max time-step size of old stars: 1.022718e-03 [U_t]
[00034.0] engine_config: Absolute minimal timestep size: 1.110223e-16
[00034.0] engine_config: Minimal timestep size (on time-line): 7.105427e-15
[00034.0] engine_config: Maximal timestep size (on time-line): 7.812500e-03
[00034.3] engine_config: Restarts will be dumped every 4.000000 hours
[00034.3] engine_config: Using 8 threads in the thread-pool
[00034.3] engine_config: took 1832.496 ms.
[00034.3] main: Running on 13035246 gas particles, 0 sink particles, 93750 stars particles 1 black hole particles, 0 neutrino particles, and 0 DM particles (13128997 gravity particles)
[00034.3] main: from t=0.000e+00 until t=1.600e+01 with 1 ranks, 8 threads / rank and 8 task queues / rank (dt_min=1.000e-14, dt_max=1.000e-02)...
[00034.3] engine_init_particles: Setting particles to a valid state...
[00035.2] engine_init_particles: Computing initial gas densities and approximate gravity.
[00035.2] space_rebuild: (re)building space
[00279.8] engine_init_particles: Converting internal energy variable.
[00280.1] engine_init_particles: Running initial fake time-step.
[00280.1] space_rebuild: (re)building space
[00466.6] ./feedback/EAGLE_thermal/feedback.h:feedback_prepare_feedback():211: Evolving a star particle that should not!
/var/slurm/slurmd/job4900398/slurm_script: line 37: 19234 Aborted                 /cosma/home/durham/dc-husk1/SWIFT_spin_bh_new/swiftsim/examples/swift --stars --star-formation --feedback --external-gravity --self-gravity --hydro --cooling --black-holes --threads=8 --limiter --sync --pin isolated_galaxy.yml


MatthieuSchaller commented on June 7, 2024

Can you point me to the example on cosma? Or, better, to a smaller one that has the same issue?


FHusko commented on June 7, 2024

It turns out I had implemented my changes to the black hole physics incorrectly in the newer version. Now that I have corrected that, the error no longer appears. The weird thing was that the error was related to thermal feedback from stars, and none of my changes to the black hole physics relate to stars at all, so it hadn't occurred to me that they could be the issue.

But sorry for wasting your time with that! The big run is now going happily. With intel_mpi/2020 I was getting an inexplicable MPI error after 2-3 days. I am now using intel_mpi/2018 and the newer version of SWIFT. Hopefully that will avoid the problem. But if I get it again, should I report it here or open a new issue?


MatthieuSchaller commented on June 7, 2024

Ok, so everything solved?

intel_mpi/2020 is buggy, so that's not a swift problem. Not much we can do about it unfortunately.
On cosma, use either intel-mpi 2018 or Openmpi 4.x.y.


FHusko commented on June 7, 2024

Yes, everything solved in terms of this problem. Thanks!

