amrex-astro / Microphysics

Common astrophysical microphysics routines with interfaces for the different AMReX codes.

Home Page: https://amrex-astro.github.io/Microphysics
License: Other
We should have `eos_finalize()` and `actual_eos_finalize()` functionality.
It would be fun to order pizza for those of us participating in the mini-hackathon.
I'd be happy to chip in $10.
We scale based on `abs(y) + dt*abs(ydot)`, but shouldn't we try `abs(y + dt*ydot)` too? Maybe `scaling_method = 3`?
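A sketch of what that branch might look like in the integrator (this assumes the existing methods are dispatched on an integer `scaling_method`; the array names `scal`, `y`, and `ydot` are illustrative, not the actual variables):

```fortran
! sketch only -- variable names are placeholders
select case (scaling_method)
case (1)
   ! current approach: magnitude of the state plus magnitude of the predicted change
   scal(:) = abs(y(:)) + dt * abs(ydot(:))
case (3)
   ! proposed: magnitude of the predicted state itself
   scal(:) = abs(y(:) + dt * ydot(:))
end select
```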
We should investigate making a tabular EOS in (rho, e) -- this would be especially useful for SDC. Perhaps the thing to tabulate is the entropy: we can then express it in terms of (rho, e) and get p and T via partial derivatives.
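For reference, a sketch of the thermodynamics if we tabulate the specific entropy $s(\rho, e)$: from the first law, $\mathrm{d}e = T\,\mathrm{d}s - p\,\mathrm{d}(1/\rho)$, we get

$$\frac{1}{T} = \left(\frac{\partial s}{\partial e}\right)_\rho, \qquad p = -\rho^2\, T \left(\frac{\partial s}{\partial \rho}\right)_e,$$

so both $T$ and $p$ follow directly from derivatives of the tabulated $s$.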
Applying the `temp_scale` and `ener_scale` to the Jacobian elements after they are filled doesn't seem right for the derivative wrt T. E.g., we do:

```fortran
bs % jac(net_itemp,:) = bs % jac(net_itemp,:) * inv_temp_scale
```

but that shouldn't apply to `bs % jac(net_itemp, net_itemp)`: since both the temperature equation and the temperature variable carry the scaling, the factors cancel on the diagonal.
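To spell out the algebra (a sketch): if the state is rescaled by a diagonal matrix $S$, $y' = S^{-1} y$, then the RHS becomes $f'(y') = S^{-1} f(S y')$ and the Jacobian transforms as

$$J' = S^{-1} J S,$$

so diagonal elements like $\partial \dot{T}/\partial T$ are left unchanged by the scaling.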
The current value of `ode_scale_floor` means that trace abundances have essentially no influence on the error estimation and convergence. We should try to link `ode_scale_floor` to `small_x` somehow.
In particular, there are no tests of anything other than `burning_mode = 1`. We also need to check whether we have coverage of `do_constant_volume_burn`.
The size of the system allocated in the BS `actual_integrator_sdc.F90` is `SVAR`, but shouldn't it really be `SVAR_EVOLVE`? This affects, for example, the tolerances.
The SDC integrators don't currently handle how we update species when `nspec_evolve` < `nspec`. Since these still have advective terms, we still need to do some integration. But the current `update_unevolved_species` mechanism is probably not enough.
We don't have any SDC unit tests in the suite
We should create a Microphysics-wide rate tabulation module that operates on the T-dependence of the reaction rates.
None of the tests exercise the `use_eos_in_rhs` or `dT_crit` options.
This gives fairly accurate results relative to the direct rate evaluation method, but is much faster on CPUs and essentially necessary for GPUs.
This can be done by setting `use_tables` to `.true.` in the `aprox13/_parameters` file.
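A sketch of that change (the `_parameters` files list name / datatype / default; treat the exact column layout here as an assumption):

```
# in networks/aprox13/_parameters: make tabulated rates the default
use_tables        logical        .true.
```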
To allow the code to be used by things other than AMReX codes, we should write wrappers for things like `bl_error`, etc.
We want to make VODE work with the SDC interface. Unlike the BS integrator, there is no VODE analog of the `bs_t` type. We need to do the following:

- Create a version of `vode_type.F90` for SDC. This will need to have a `clean_state` that fixes up the internal energy and a `fill_unevolved_variables` routine. There will be no `update_thermodynamics` routine, and there are no `vode_to_eos` or `eos_to_vode` routines.
- Create `vode_to_sdc` and `sdc_to_vode` routines (see the sketch after this list).
- The general `rpar.F90` that lives in `integrator/` will need some different components -- see the `BS/` version for comparison. In particular, it will need the rho and momentum indices. We will probably also need to store the advective sources here.
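A very rough skeleton of the two conversion routines (everything here is hypothetical: the `sdc_t` type and its `y` component, the `irp_u` index, and the exact split between `y` and `rpar` would all need to follow the real SDC data layout):

```fortran
! hypothetical sketch only -- type and index names are placeholders
subroutine sdc_to_vode(sdc, y, rpar)
  type(sdc_t),     intent(in)  :: sdc
  real(kind=dp_t), intent(out) :: y(SVAR_EVOLVE), rpar(n_rpar_comps)

  ! evolved variables go into the VODE integration vector ...
  y(:) = sdc % y(1:SVAR_EVOLVE)
  ! ... unevolved variables (rho, momenta, advective sources) ride along in rpar
  rpar(irp_u:irp_u+(SVAR-SVAR_EVOLVE)-1) = sdc % y(SVAR_EVOLVE+1:SVAR)
end subroutine sdc_to_vode

subroutine vode_to_sdc(y, rpar, sdc)
  real(kind=dp_t), intent(in)    :: y(SVAR_EVOLVE), rpar(n_rpar_comps)
  type(sdc_t),     intent(inout) :: sdc

  sdc % y(1:SVAR_EVOLVE) = y(:)
  sdc % y(SVAR_EVOLVE+1:SVAR) = rpar(irp_u:irp_u+(SVAR-SVAR_EVOLVE)-1)
end subroutine vode_to_sdc
```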
With AMReX coming online, we need to decouple these routines from the BoxLib / AMReX dependency. The main place this comes in is through calls to `bl_error` and the use of `bl_constants_module`.

We can instead provide a `microphysics_error` and `microphysics_constants`. These can simply wrap the BoxLib or AMReX routines, assuming they provide the necessary info. We then need a build-time way of letting Microphysics know which of the libraries to link in.
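A minimal sketch of what the error wrapper could look like (assuming the BoxLib backend for concreteness; the module and routine names follow the issue text, and a build-time switch would select the AMReX equivalent instead):

```fortran
module microphysics_error_module
  implicit none
contains
  subroutine microphysics_error(message)
    ! for now, simply forward to the BoxLib routine
    use bl_error_module, only: bl_error
    character(len=*), intent(in) :: message
    call bl_error(message)
  end subroutine microphysics_error
end module microphysics_error_module
```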
We use `esum()` to do exact sums of specific terms in the RHS of the ODEs to prevent roundoff. But `esum()` is slow. At the moment, we have a general routine with a large `max_esum_size` -- this also causes trouble on the GPUs.

We should experiment with creating specific `esum()` routines for the number of terms involved, e.g., `esum3()`, `esum4()`, `esum5()`, ... We know the count ahead of time, since we are explicitly calling `esum()` on specific combinations of terms in the rate equations.
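For example, a fixed-size three-term version might look like the following (just a sketch, using Neumaier-style compensated summation; the existing general `esum()` may use a different exact-summation algorithm, and `dp_t` is assumed to come from `bl_types`):

```fortran
! sketch: three-term compensated sum (Neumaier variant of Kahan summation)
function esum3(a, b, c) result(s)
  use bl_types, only: dp_t
  real(kind=dp_t), intent(in) :: a, b, c
  real(kind=dp_t) :: s, comp, t

  s = a
  comp = 0.0_dp_t

  t = s + b
  if (abs(s) >= abs(b)) then
     comp = comp + ((s - t) + b)   ! recover low-order bits lost from b
  else
     comp = comp + ((b - t) + s)   ! recover low-order bits lost from s
  end if
  s = t

  t = s + c
  if (abs(s) >= abs(c)) then
     comp = comp + ((s - t) + c)
  else
     comp = comp + ((c - t) + s)
  end if
  s = t

  s = s + comp                     ! apply the accumulated correction
end function esum3
```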
The rate caches should be removed from the `burn_t`.
If we encounter a situation where the Coulomb corrections make the pressure, energy, or entropy negative, we currently just turn them off. We should instead bring them smoothly to zero to prevent discontinuous derivatives.
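One possible way to do that (a sketch; the switch form and thresholds are entirely hypothetical): multiply the Coulomb corrections by a smooth factor such as

$$f = \frac{1}{2}\left[1 + \tanh\left(\frac{x - x_0}{\delta}\right)\right],$$

where $x$ measures how far the corrected quantity is from going negative (e.g., the ratio of the corrected to the uncorrected pressure), and $x_0$ and $\delta$ control where and how sharply the corrections shut off. Unlike the current hard cutoff, $f$ and its derivatives are continuous.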
We should modify the `bdf_t` to include a `burn_t` directly, eliminating much of the work done in `bdf_to_burn`. This will mirror what is done with BS.
We have no GPU tests in the PGI test suite. We should pick some basic tests to add coverage.
At the moment, the EOS returns all possible thermodynamic quantities, but sometimes we don't need all of these. We should create an `EXTRA_THERMO` preprocessor flag that will turn off some of the less-needed quantities. This should also be hooked into the `eos_t` type in the application codes.
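A sketch of how this might look in the type definition (assuming we run the source through the preprocessor; the fields shown are just examples of candidates for the "less-needed" category):

```fortran
type :: eos_t
   real(kind=dp_t) :: rho, T, p, e, s
#ifdef EXTRA_THERMO
   ! derivatives wrt composition -- only compiled in when requested
   real(kind=dp_t) :: dpdA, dpdZ, dedA, dedZ
#endif
end type eos_t
```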
A table has been started to keep track of which integrators are able to integrate different networks on the CPU (space is also available for a similar table for the GPU, but it isn't populated yet; we should work on the CPU before trying to work on the GPU anyway).
This issue addresses VBDF failures on the CPU. As the table shows, VBDF fails for the `aprox13` and `aprox19` networks using the configuration and input found in the unit test.

I'm currently comparing the integration of VBDF with VODE, which in theory implement the same algorithms. For `aprox19` I've isolated the cell that fails for VBDF, which VODE seems fine with. I'm currently working to find where the algorithms deviate such that VODE is able to converge to a result while VBDF is not.
VODE's `clean_state` should include the calls to `renormalize_species` like BS does. Also, VODE doesn't have a check to ensure that the temperature stays reasonable, like BS does.
This comes out of discussions with Sam Jones, Aron Michel, and @carlnotsagan
We should implement the neutrino losses from the weak reactions. This would mean keeping track of each reaction and what the actual Q value is (subtracting neutrino losses), and evolving an enuc equation that uses these Q values.
From Sam:

> I think we estimated the neutrino energy losses, and even though they were smaller than I had expected, I agree that they're still important.
>
> ...
>
> The way I would implement it would be to introduce the Q value (binding energy difference between products and reactants) for each reaction, and additionally a Q_neu for the weak reactions, which is the average neutrino energy per reaction, Q_neu = eps_neu/lambda, where eps_neu and lambda are the neutrino luminosity [MeV/s] and the rate [/s] from the LMP tables, respectively. Q_neu is of course 0 for the reactions involving the strong nuclear force. Then the energy generation is the sum of the number of times a reaction takes place multiplied by (Q - Qneu).
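In equation form (a transcription of Sam's prescription, with $r_k$ the number of times reaction $k$ occurs per unit mass per unit time):

$$\dot{\epsilon}_{\rm nuc} = \sum_k r_k \left(Q_k - Q_{\nu,k}\right), \qquad Q_{\nu,k} = \frac{\epsilon_{\nu,k}}{\lambda_k},$$

with $Q_{\nu,k} = 0$ for reactions mediated by the strong interaction.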
When playing with OpenACC, there were compiler issues with Fortran parameters on GPUs. We got rid of the parameters to make things play nice. With our new CUDA methodology, we should go back to parameters, e.g., the variable `pi` in helmholtz/actual_eos.F90.
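For example (a sketch; the actual declaration and the exact literal used in `actual_eos.F90` may differ):

```fortran
! restore pi to a compile-time parameter rather than a plain module variable
real(kind=dp_t), parameter :: pi = 3.1415926535897932384626433832795029e0_dp_t
```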
From Sam:

> I found a bug in your implementation of approx21 in BoxLib. The Jacobian is fine, but the RHSs do not include terms for fe56 and cr56 (i.e., they are zero). Looks like it was copied from approx19 but not modified for approx21.
Testing has shown system implementations of BLAS are much more efficient than compiling in BLAS ourselves. We should switch the CUDA version of VODE90 to use cuBLAS and check performance.
In particular, since cuBLAS calls require an on-device kernel launch, it will be interesting to see whether the overall performance gains from cuBLAS are worthwhile.
Some networks use `He4`, others `he4` -- we should be consistent.
In `bs_type_sdc` we dimension:

```fortran
real(kind=dp_t) :: u(n_rpar_comps), u_init(n_rpar_comps), udot_a(n_rpar_comps)
```

but these should really be dimensioned as `SVAR-SVAR_EVOLVE`.
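I.e., the declaration would presumably become (assuming `SVAR` and `SVAR_EVOLVE` are accessible where `bs_type_sdc` is defined):

```fortran
real(kind=dp_t) :: u(SVAR-SVAR_EVOLVE), u_init(SVAR-SVAR_EVOLVE), udot_a(SVAR-SVAR_EVOLVE)
```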
This should be done via a call to `eos_get_max_temp` so that the user's probin variables get used properly.
Currently this would be inconsistent with some EOS tables that are pre-generated with a specific set of constants, but it could still be useful in the long run for separate codes to have a shared set of constants.
We should completely remove the `integrate_molar_fraction` option and instead rely on the networks to always return things in terms of dX/dt. This will cut back on the complexity of the code a lot, eliminating many unnecessary conversions.
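For reference, the conversion this pushes into the networks is simple: since $X_k = A_k Y_k$,

$$\frac{\mathrm{d}X_k}{\mathrm{d}t} = A_k \frac{\mathrm{d}Y_k}{\mathrm{d}t},$$

so a network that works internally in molar fractions just multiplies its $\dot{Y}_k$ by $A_k$ before returning.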
The helmholtz EOS can represent a significant computational cost. We could consider vectorizing it.
We ask for species to be evolved to a tolerance of 1.d-12 (in `integration/_parameters`). This is pretty tight. We need to check whether it can be relaxed. We can relax it on a network-by-network basis (using priorities in the `_parameters` files). It seems that the original aprox13 and aprox19 networks used tolerances of 1.e-6.
Max has suggested we profile the SDC integration to determine how expensive the EOS calls really are.
The motivation for this is that the EOS calls use rho, e as input variables and it may be worthwhile to think about how to formulate T integration source terms so we could use rho, T as input variables to the EOS instead.
The cost of the EOS should be more apparent using tabulated rates, so this is related to issue #12
Consider using the new rate from this compilation:
https://journals.aps.org/rmp/abstract/10.1103/RevModPhys.89.035007
@carlnotsagan can advise us :)
At the moment, we have two VODE-style integrators, VODE and VODE90.
Earlier testing indicates they yield identical integration answers but there may be performance differences. We should compare the performance of each and determine whether it is worth switching to VODE90.
Issues to consider:
Since MAESTRO is now moving to the C++ AMReX, we should migrate the test suite drivers in Microphysics to use the C++ AMReX as well. A good starting point is `test_react` in Castro.
VODE90 currently uses LINPACK (since that's what VODE used originally). We should switch it over to LAPACK so we can use system-optimized LAPACK routines. This table shows the LAPACK equivalent functions: http://www.netlib.org/lapack/lug/node147.html (note: the table has routines starting with `s` for single precision, but ours, of course, use `d`).
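For the dense factor/solve pair, the translation should be mechanical. A sketch (double-precision routines; the argument lists follow the standard reference implementations and are worth double-checking against our VODE90 source):

```fortran
! LINPACK: factor A, then solve A x = b (job = 0 for no transpose)
call dgefa(a, lda, n, ipvt, info)
call dgesl(a, lda, n, ipvt, b, 0)

! LAPACK equivalents
call dgetrf(n, n, a, lda, ipvt, info)
call dgetrs('N', n, 1, a, lda, ipvt, b, n, info)
```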
We should add `unit_tests/burn_cell/` to the Docs.
We should move the stellar conductivity routine from MAESTRO to here.
Many of the results from GPU-accelerated unit-test code appear to be wrong. As a concrete example, I've built an accelerated and a CPU-only executable of the `test_react` unit test.

Build and execute the accelerated binary, and move the output for later comparison (note that I've suppressed the output of the commands):

```
cd $MICROPHYSICS_HOME/unit_test/test_react
make COMP=PGI NETWORK_DIR=ignition_simple ACC=t -j6
./main.Linux.PGI.acc.exe inputs_ignition.BS
mv react_ignition_test_react.BS react_ignition_test_react.BS.ACC
```

Build and execute the CPU-only binary:

```
make COMP=PGI NETWORK_DIR=ignition_simple -j6
./main.Linux.PGI.exe inputs_ignition.BS
```

If I now compare the two output files, we see they're very different:

```
fcompare.Linux.gfortran.exe --infile1 react_ignition_test_react.BS --infile2 react_ignition_test_react.BS.ACC

 variable name             absolute error            relative error
                           (||A - B||)               (||A - B||/||A||)
 ----------------------------------------------------------------------
 level = 1
 density                   0.2384185791E-06          0.1192092896E-15
 temperature               0.6854534149E-06          0.9792191642E-15
 Xnew_carbon-12            0.9999999997              0.9999999999
 Xnew_oxygen-16            0.7999999999              0.9999999999
 Xnew_magnesium-24         0.9999999997              9.999436761
 Xold_carbon-12            0.9999999997              0.9999999999
 Xold_oxygen-16            0.7999999999              0.9999999999
 Xold_magnesium-24         0.9999999997              9.999999997
 wdot_carbon-12            0.2812178371E-03          1.000000000
 wdot_oxygen-16            0.1110223025E-14          1.000000000
 wdot_magnesium-24         0.2812178371E-03          1.000000000
 rho_Hnuc                  0.3150192097E+24          1.000000000
```

So while many networks and integrators seem to be able to compile and run without crashing, it's not clear how many are generating correct physical results. I've seen a similar issue with the VBDF integrator, so it doesn't appear to be specific to an integrator or network. These results are from `bender`, which has PGI 16.9 and a GeForce GTX 960 GPU (with CUDA 8.0 drivers and CUDA 7.5 compilers).
The BS integrator does not allow for different tolerances on each component, like we do with VODE. We should generalize it so that we can specify a separate rtol for each integration variable.
The basic GPU test described in Issue #15 fails. On my local machine, I get:

```
[ajacobs@xrb test](development *)$ ./testburn.Linux.PGI.acc.exe
 Initializing Helmholtz EOS and using Coulomb corrections.
FATAL ERROR: data in update device clause was not found on device 1: name=pi
 file:/home/ajacobs/Codebase/Microphysics/networks/ignition_simple/test/../../../EOS/helmholtz/actual_eos.F90 actual_eos_init line:1327
```

On Stony Brook's `bender`, I get what may be an error in the system configuration:

```
[ajacobs@bender test](development)$ ./testburn.Linux.PGI.acc.exe
 Initializing Helmholtz EOS and using Coulomb corrections.
modprobe: FATAL: Module nvidia-uvm not found in directory /lib/modules/4.7.5-200.fc24.x86_64
call to cuInit returned error 999: Unknown
```

The error happens with and without debug symbols. The error seems to be saying `pi` isn't initialized, but it is declared in `actual_eos.F90`. I'm investigating the error now.
Not all the networks need the same amount of rate storage. `num_rate_groups` should be defined on a network-by-network basis.
When building `test_react` with

```
make COMP=PGI NDEBUG= OMP= NETWORK_DIR=ignition_simple INTEGRATOR_DIR=VBDF ACC=t
```

errors like the following come up:

```
ptxas /tmp/pgaccBw5JrAtcYokR.ptx, line 1842; fatal : Parsing error near '-': syntax error
ptxas fatal : Ptx assembly aborted due to errors
PGF90-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (../../integration/VBDF/actual_integrator.F90: 1)
  0 inform,   0 warnings,   1 severes, 0 fatal for
make: *** [t/Linux.PGI.debug.acc/o/actual_integrator.o] Error 2
make: *** Waiting for unfinished jobs....
```

By commenting out and slowly uncommenting, I've traced at least one trigger of the error to a derived-type assignment in Microphysics/integration/VBDF/actual_integrator.F90, in the `initial_timestep()` subroutine: `ts_temp = ts`.

However, after writing and using a copy subroutine for `bdf_ts` types, the error continues. It seems any use of `ts_temp` triggers the error, even `ts_temp%neq = 1`.
We now have the ability to use VODE or BS as the backup integrator for failed integrations -- this needs to be documented.
To my knowledge, the PGI test suite doesn't do any tests that utilize the GPU. I recommend adding a test using the `ignition_simple` network, the `BS` integrator, and the GPU (`ACC=t`). Something along the lines of:

```
$ cd $MICROPHYSICS_HOME/networks/ignition_simple/test
$ make COMP=PGI ACC=t
$ ./testburn.Linux.PGI.debug.acc.exe
```

Something like this should serve as a minimal verification that basic GPU code is working. As more integrators and/or networks robustly utilize the GPU, we can add similar tests for them (in this case, the default GNUmakefile has already chosen the `BS` integrator for us).
If integration fails and we reset to the initial state to try again, we need to reset `T_old` and the cv/cp too, for consistency. Perhaps this would be easier with a `bs_init` variable, so we can just do `bs = bs_init` and go.