
amrex-astro / Microphysics


common astrophysical microphysics routines with interfaces for the different AMReX codes

Home Page: https://amrex-astro.github.io/Microphysics

License: Other

Makefile 0.30% Fortran 0.02% Python 1.46% Shell 0.01% C++ 97.80% CSS 0.01% HTML 0.01% CMake 0.19% Batchfile 0.08% BrighterScript 0.01% Hack 0.12%
equation-of-state reactions nuclear-reactions conductivity stars microphysics-routines

Contributors

abigailbishop, adam-m-jcbs, aisclark91, ajnonaka, asalmgren, benwibking, biboyd, cmsquared, dependabot[bot], doreenfan, dwillcox, harpolea, jaharris87, jmsexton03, kissformiss, maxpkatz, psharda, shardi2, simonguichandut, weiqunzhang, xinlongsbu, yut23, zhichen3, zingale


Issues

add eos_finalize

We should add eos_finalize() and actual_eos_finalize() functionality.
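
A minimal sketch of what this could look like, mirroring the existing eos_init() / actual_eos_init() split (the finalize body shown here is only illustrative):

    subroutine eos_finalize()
      ! mirror eos_init(): give the active EOS a chance to clean up
      use actual_eos_module, only: actual_eos_finalize
      implicit none
      call actual_eos_finalize()
    end subroutine eos_finalize

    subroutine actual_eos_finalize()
      ! e.g., a tabular EOS would deallocate the table arrays read in by
      ! actual_eos_init(); for analytic EOSes this can be a no-op
      implicit none
    end subroutine actual_eos_finalize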

BS scaling method

We scale based on abs(y) + dt*abs(ydot), but shouldn't we try abs(y + dt*ydot) too? Maybe as a scaling_method = 3 option?
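
For concreteness, the two candidate scalings side by side (yscal here stands for the per-component scale vector used in the error test; the variable names are illustrative):

    ! current behavior: scale on the magnitudes separately
    yscal(:) = abs(y(:)) + dt * abs(ydot(:))

    ! possible scaling_method = 3: scale on the magnitude of the predicted state
    yscal(:) = abs(y(:) + dt * ydot(:))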

make a tabular EOS in terms of (rho, e)

We should investigate making a tabular EOS in terms of (rho, e) -- this would be especially useful for SDC. Perhaps the thing to tabulate is entropy; then, given s(rho, e), we can get p and T via partial derivatives.
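
The thermodynamics works out cleanly: writing the first law as $de = T\,ds + (p/\rho^2)\,d\rho$, a tabulated $s(\rho, e)$ gives temperature and pressure from its first derivatives:

    \frac{1}{T} = \left.\frac{\partial s}{\partial e}\right|_{\rho}, \qquad
    p = -T \rho^2 \left.\frac{\partial s}{\partial \rho}\right|_{e}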

scaling of Jacobian elements is not right

Applying the temp_scale and ener_scale to the Jacobian elements after they are filled doesn't seem right for the derivative wrt T. E.g., we do:

bs % jac(net_itemp,:) = bs % jac(net_itemp,:) * inv_temp_scale               

but that shouldn't apply to bs % jac(net_itemp, net_itemp)
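
One way to express the fix (a sketch only): save the diagonal before applying the row scaling and restore it afterward, so only the off-diagonal entries of the temperature row pick up inv_temp_scale:

    jac_TT = bs % jac(net_itemp, net_itemp)
    bs % jac(net_itemp, :) = bs % jac(net_itemp, :) * inv_temp_scale
    bs % jac(net_itemp, net_itemp) = jac_TT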

BS SDC uses SVAR instead of SVAR_EVOLVE

The size of the system allocated in the BS actual_integrator_sdc.F90 is SVAR, but shouldn't it really be SVAR_EVOLVE? This affects, for example, the tolerances.

SDC integrators don't support nspec_evolve != nspec

The SDC integrators don't currently handle how we update species when nspec_evolve < nspec. Since these species still have advective terms, we would still need to do some integration, but the current update_unevolved_species mechanism is probably not enough.

For aprox13, rate tabulation should be the default

This gives fairly accurate results relative to the direct rate evaluation method, but is much faster on CPUs and essentially necessary for GPUs.

This can be done by setting use_tables to .true. in the aprox13/_parameters file.
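
For example, the default in the network's _parameters file would change to something like the following (the exact column layout of the _parameters files may differ slightly from what is shown here):

    # use tabulated rates instead of evaluating the rates directly
    use_tables        logical        .true.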

make a VODE SDC integrator

We want to make VODE work with the SDC interface. Unlike the BS integrator, there is no VODE analog to the bs_t type. We need to do the following:

  • we need to create a version of vode_type.F90 for SDC.

    • This will need to have a clean_state that fixes up the internal energy, and a fill_unevolved_variables routine.

    • There will be no update_thermodynamics routine.

    • there are no vode_to_eos or eos_to_vode routines

    • we need vode_to_sdc and sdc_to_vode routines (a rough sketch follows this list)

  • the general rpar.F90 that lives in integrator/ will need some different components -- see the BS/ version for comparison. In particular, it will need the rho and momentum indices. We will probably also need to store the advective sources here.
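
A rough sketch of one of the copy routines, assuming an sdc_t-like type with y and ydot_a components and hypothetical rpar indices irp_unevolved and irp_ydot_a -- all of these names are assumptions for illustration, not the actual Microphysics interfaces:

    subroutine sdc_to_vode(sdc, y, rpar)
      use sdc_type_module, only: sdc_t, SVAR, SVAR_EVOLVE   ! module name assumed
      use rpar_indices, only: n_rpar_comps, irp_unevolved, irp_ydot_a

      type (sdc_t),    intent(in   ) :: sdc
      real(kind=dp_t), intent(  out) :: y(SVAR_EVOLVE)
      real(kind=dp_t), intent(  out) :: rpar(n_rpar_comps)

      ! the evolved variables become the VODE solution vector
      y(:) = sdc % y(1:SVAR_EVOLVE)

      ! the unevolved state (rho, momenta) and the advective sources ride along in rpar
      rpar(irp_unevolved:irp_unevolved+SVAR-SVAR_EVOLVE-1) = sdc % y(SVAR_EVOLVE+1:SVAR)
      rpar(irp_ydot_a:irp_ydot_a+SVAR-1) = sdc % ydot_a(:)
    end subroutine sdc_to_vode

    ! vode_to_sdc would simply do the reverse copy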

decouple from amrex

With AMReX coming online, we need to decouple these routines from a hard dependency on BoxLib or AMReX.

The main place this comes in is through calls to bl_error and using bl_constants_module.

We can instead provide a microphysics_error and a microphysics_constants module. These can simply wrap the BoxLib or AMReX routines, assuming they provide the necessary functionality. We would then need a build-time way of letting Microphysics know which library to link in.
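
A hedged sketch of what the error wrapper could look like, selected at build time with a preprocessor define (the library-side module and routine names here are assumptions):

    module microphysics_error_module
      implicit none
    contains
      subroutine microphysics_error(message)
    #ifdef AMREX
        use amrex_error_module, only: amrex_error
    #else
        use bl_error_module, only: bl_error
    #endif
        character(len=*), intent(in) :: message
    #ifdef AMREX
        call amrex_error(message)
    #else
        call bl_error(message)
    #endif
      end subroutine microphysics_error
    end module microphysics_error_module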

esum is slow

We use esum() to do exact sums of specific terms in the RHS of the ODEs to prevent roundoff. But esum() is slow. At the moment, we have a general routine with a large max_esum_size -- this also causes trouble on the GPUs.

We should experiment with creating specific esum() routines for the number of terms involved, e.g., esum3(), esum4(), esum5(), ...

We know this ahead of time, since we are explicitly calling esum() on specific combinations of terms in the rate equations.
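
As an illustration of a fixed-size specialization, here is a compensated (Neumaier) three-term sum. Note that the actual esum() in Microphysics implements a different (exact) summation algorithm, so this is a sketch of the idea rather than a drop-in replacement:

    function esum3(a, b, c) result(s)
      real(kind=dp_t), intent(in) :: a, b, c
      real(kind=dp_t) :: s, comp, t, term(3)
      integer :: i

      term = [a, b, c]
      s = 0.0_dp_t
      comp = 0.0_dp_t
      do i = 1, 3
         t = s + term(i)
         ! accumulate the rounding error of each partial sum
         if (abs(s) >= abs(term(i))) then
            comp = comp + ((s - t) + term(i))
         else
            comp = comp + ((term(i) - t) + s)
         end if
         s = t
      end do
      s = s + comp
    end function esum3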

vbdf should carry its own burn_t

We should modify the bdf_t to include a burn_t directly, eliminating much of the work done in bdf_to_burn. This will mirror what is done with BS.
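
The change itself is small; a sketch (the field name is illustrative, and the existing bdf_t components are omitted):

    type :: bdf_t
       ! carry the burn state directly, so bdf_to_burn becomes a trivial copy
       type (burn_t) :: burn_s
    end type bdf_t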

create an EXTRA_THERMO preprocessor

At the moment, the EOS returns all possible thermodynamic quantities, but sometimes we don't need all of these. We should create an EXTRA_THERMO preprocessor flag that will turn off some of the less-needed quantities. This should also be hooked into the eos_t type in the application codes.
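
A sketch of how the flag could be used (the specific fields guarded here are just examples):

    type :: eos_t
       real(kind=dp_t) :: rho, T, p, e, s
    #ifdef EXTRA_THERMO
       ! less commonly needed derivatives, only compiled in when requested
       real(kind=dp_t) :: dpdA, dpdZ, dedA, dedZ
    #endif
    end type eos_t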

VBDF fails for some networks on CPU

A table has been started to keep track of which integrators are able to integrate the different networks on the CPU (space is also available for a similar table for the GPU, but it isn't populated yet; we should get things working on the CPU before trying the GPU anyway).

This issue addresses VBDF failures on the CPU. As the table shows, VBDF fails for the aprox13 and aprox19 networks using the configuration and input found in the unit test.

I'm currently comparing the integration of VBDF with VODE, which in theory implement the same algorithms. For aprox19 I've isolated the cell that fails for VBDF, which VODE seems fine with. I'm currently working to find where the algorithms deviate such that VODE is able to converge to a result while VBDF is not.

add neutrino losses to aprox13

This comes out of discussions with Sam Jones, Aron Michel, and @carlnotsagan

We should implement the neutrino losses from the weak reactions. This would mean keeping track of each reaction and what the actual Q value is (subtracting neutrino losses), and evolving an enuc equation that uses these Q values.

From Sam:

I think we estimated the neutrino energy losses, and even though they were smaller than 
I had expected, I agree that they're still important.
...
The way I would implement it would be to introduce the Q value (binding energy difference 
between products and reactants) for each reaction, and additionally a Q_neu for the weak 
reactions, which is the average neutrino energy per reaction, Q_neu = eps_neu/lambda, 
where eps_neu and lambda are the neutrino luminosity [MeV/s] and the rate [/s] from the 
LMP tables, respectively. Q_neu is of course 0 for the reactions involving the strong 
nuclear force. Then the energy generation is the sum of the number of times a reaction 
takes place multiplied by (Q-Qneu).
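
Written as formulas (restating the description above), for each reaction $r$ with rate $\lambda_r$ and neutrino luminosity $\epsilon_{\nu,r}$ from the LMP tables:

    Q_{\nu,r} = \frac{\epsilon_{\nu,r}}{\lambda_r}, \qquad
    \dot{e}_{\mathrm{nuc}} = \sum_r N_r \left( Q_r - Q_{\nu,r} \right)

where $N_r$ is the number of times reaction $r$ takes place (per unit mass and time), $Q_r$ is the binding-energy difference between products and reactants, and $Q_{\nu,r} = 0$ for reactions involving only the strong force.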

reintroduce parameters into helmholtz/actual_eos.F90

When playing with OpenACC, there were compiler issues with Fortran parameters on GPUs, so we got rid of the parameters to make things play nicely. With our new CUDA methodology, we should go back to parameters -- e.g., the variable pi in helmholtz/actual_eos.F90.
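
That is, restoring declarations of the form (digits shown to double precision):

    real(kind=dp_t), parameter :: pi = 3.1415926535897932384626e0_dp_t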

aprox21 missing rates (reported by Sam Jones)

from Sam:

I found a bug in your implementation of approx21 in BoxLib. The
jacobian is fine but the rhss do not include terms for fe56 and cr56
(i.e. they are zero). Looks like it was copied from approx19 but not
modified for approx21.

Switch CUDA VODE90 to use cuBLAS

Testing has shown system implementations of BLAS are much more efficient than compiling in BLAS ourselves. We should switch the CUDA version of VODE90 to use cuBLAS and check performance.

In particular, since cuBLAS calls require an on-device kernel launch, it will be interesting to see whether the overall performance gains from cuBLAS are worthwhile.

BS SDC dimensioning

In bs_type_sdc we dimension:

     real(kind=dp_t) :: u(n_rpar_comps), u_init(n_rpar_comps), udot_a(n_rpar_comps)

but these should really be dimensioned as SVAR-SVAR_EVOLVE
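
That is, assuming SVAR and SVAR_EVOLVE are both accessible in bs_type_sdc, the declaration would become:

     real(kind=dp_t) :: u(SVAR-SVAR_EVOLVE), u_init(SVAR-SVAR_EVOLVE), udot_a(SVAR-SVAR_EVOLVE)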

Include fundamental constants in this repo?

Currently this would be inconsistent with some EoS tables that are pre-generated with a specific set of constants, but could still be useful in the long-run for separate codes to have a shared set of constants.

remove integrate_molar_fraction option

We should completely remove the integrate_molar_fraction option and instead rely on networks to always return things in terms of dX/dt. This will cut down on the complexity of the code considerably, eliminating a lot of unnecessary conversions.
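
The conversion being eliminated is just the mass fraction / molar fraction relation (with $A_k$ the atomic weight of species $k$), so a network that works internally in $Y$ only needs to convert its right-hand side once:

    X_k = A_k Y_k \quad\Longrightarrow\quad \frac{dX_k}{dt} = A_k \frac{dY_k}{dt}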

Vectorize helmholtz EOS!

The helmholtz EOS can represent a significant computational cost. We could consider vectorizing it.

need to reevaluate the tolerances

We ask for species to be evolved to a tolerance of 1.d-12 (in integration/_parameters).

This is pretty tight. We need to check whether it can be relaxed. We can relax on a network-by-network basis (using priorities in the _parameter files).

It seems that the original aprox13 and aprox19 networks used tolerances of 1.e-6

Profile the SDC implementation in the Microphysics integrators

Max has suggested we profile the SDC integration to determine how expensive the EOS calls really are.

The motivation for this is that the EOS calls use rho, e as input variables and it may be worthwhile to think about how to formulate T integration source terms so we could use rho, T as input variables to the EOS instead.

The cost of the EOS should be more apparent using tabulated rates, so this is related to issue #12

Compare VODE to VODE90. Determine if a switch to VODE90 is in order.

At the moment, we have two VODE-style integrators, VODE and VODE90.

Earlier testing indicates they yield identical integration answers but there may be performance differences. We should compare the performance of each and determine whether it is worth switching to VODE90.

Issues to consider:

  • VODE90's use of derived types may slow down the GPU. We may need to refactor a bit to eliminate derived types.
  • VODE90 seems a bit slower than VODE, but we should check this and figure out why.

Migrate Test Suite to C++

Since MAESTRO is now moving to the C++ AMReX, we should migrate the test suite drivers in Microphysics to use the C++ AMReX as well.

A good starting point is test_react in Castro.

Bad GPU results

Many of the results from GPU-accelerated unit-test code appear to be wrong. As a concrete example, I've built an accelerated and CPU-only executable of the test_react unit test.

Build and execute an accelerated binary, then move the output for later comparison (note that I've suppressed the output of the commands):

cd $MICROPHYSICS_HOME/unit_test/test_react
make COMP=PGI NETWORK_DIR=ignition_simple ACC=t -j6
./main.Linux.PGI.acc.exe inputs_ignition.BS
mv react_ignition_test_react.BS react_ignition_test_react.BS.ACC

Build and execute a CPU-only binary:

make COMP=PGI NETWORK_DIR=ignition_simple -j6
./main.Linux.PGI.exe inputs_ignition.BS

If I now compare the two output files, we see they're very different:

fcompare.Linux.gfortran.exe --infile1 react_ignition_test_react.BS --infile2 react_ignition_test_react.BS.ACC

            variable name            absolute error            relative error
                                        (||A - B||)         (||A - B||/||A||)
 ----------------------------------------------------------------------
 level =  1
 density                           0.2384185791E-06          0.1192092896E-15
 temperature                       0.6854534149E-06          0.9792191642E-15
 Xnew_carbon-12                    0.9999999997              0.9999999999    
 Xnew_oxygen-16                    0.7999999999              0.9999999999    
 Xnew_magnesium-24                 0.9999999997               9.999436761    
 Xold_carbon-12                    0.9999999997              0.9999999999    
 Xold_oxygen-16                    0.7999999999              0.9999999999    
 Xold_magnesium-24                 0.9999999997               9.999999997    
 wdot_carbon-12                    0.2812178371E-03           1.000000000    
 wdot_oxygen-16                    0.1110223025E-14           1.000000000    
 wdot_magnesium-24                 0.2812178371E-03           1.000000000    
 rho_Hnuc                          0.3150192097E+24           1.000000000 

So while many networks and integrators seem to be able to compile and run without crashing, it's not clear how many are generating correct physical results. I've seen a similar issue with the VBDF integrator, so it doesn't appear to be specific to an integrator or network. These results are from bender, which has PGI 16.9 and a GeForce GTX 960 GPU (with CUDA 8.0 drivers and CUDA 7.5 compilers).

BS integrator uses a single rtol

The BS integrator does not allow for different tolerances on each component, like we do with VODE. We should generalize it so that we can specify a separate rtol for each integration variable.
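
A sketch of what the generalized setup could look like, assuming the usual per-group tolerance runtime parameters (rtol_spec, rtol_temp, rtol_enuc) and the standard index names; the surrounding code is illustrative:

    real(kind=dp_t) :: rtol(neq)

    ! per-component relative tolerances, mirroring what the VODE interface allows
    rtol(1:nspec_evolve) = rtol_spec
    rtol(net_itemp)      = rtol_temp
    rtol(net_ienuc)      = rtol_enuc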

Basic GPU test fails

The basic GPU test described in Issue #15 fails. On my local machine, I get

[ajacobs@xrb test](development *)$ ./testburn.Linux.PGI.acc.exe

 Initializing Helmholtz EOS and using Coulomb corrections.

FATAL ERROR: data in update device clause was not found on device 1: name=pi
 file:/home/ajacobs/Codebase/Microphysics/networks/ignition_simple/test/../../../EOS/helmholtz/actual_eos.F90 actual_eos_init line:1327

On Stony Brook's bender, I get what may be an error in the system configuration:

[ajacobs@bender test](development)$ ./testburn.Linux.PGI.acc.exe 

 Initializing Helmholtz EOS and using Coulomb corrections.

modprobe: FATAL: Module nvidia-uvm not found in directory /lib/modules/4.7.5-200.fc24.x86_64
call to cuInit returned error 999: Unknown

The error happens with and without debug symbols.

The error seems to be saying pi isn't initialized, but in actual_eos.F90 it is declared. I'm investigating the error now.

OpenACC F90 test_react w/ ignition_simple & VBDF giving ptx errors

Building test_react with

make COMP=PGI NDEBUG= OMP= NETWORK_DIR=ignition_simple INTEGRATOR_DIR=VBDF ACC=t

Errors like the following come up:

ptxas /tmp/pgaccBw5JrAtcYokR.ptx, line 1842; fatal   : Parsing error near '-': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
PGF90-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (../../integration/VBDF/actual_integrator.F90: 1)
  0 inform,   0 warnings,   1 severes, 0 fatal for 
make: *** [t/Linux.PGI.debug.acc/o/actual_integrator.o] Error 2
make: *** Waiting for unfinished jobs....

Through commenting out and slowly uncommenting, I've traced at least one triggering of the error to a derived type assignment in Microphysics/integration/VBDF/actual_integrator.F90 in the initial_timestep() subroutine: ts_temp = ts.

However, after writing and using a copy subroutine for bdf_ts types, the error continues. It seems any use of ts_temp triggers the error, even ts_temp%neq = 1.

test suite should include some GPU tests

To my knowledge, the PGI test suite doesn't do any tests that utilize the GPU. I recommend adding a test using the ignition_simple network, BS integrator, and the GPU (ACC=t). Something along the lines of:

$ cd $MICROPHYSICS_HOME/networks/ignition_simple/test
$ make COMP=PGI ACC=t
$ ./testburn.Linux.PGI.debug.acc.exe

Something like this should serve as a minimal verification that basic GPU code is working. As more integrators and/or networks are robustly utilizing the GPU, we can add similar tests to test them (in this case, the default GNUMakefile has already chosen the BS integrator for us).

reset of integration needs to reset T_old

If integration failed and we reset to the initial state to try again, we need to reset T_old and the cv/cp too, for consistency. Perhaps this would be easier with a bs_init variable so we can just do bs = bs_init and go.
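
A sketch of the retry logic this suggests (routine and variable names here are illustrative):

    bs_init = bs                ! snapshot the incoming state, including T_old, cv, cp

    call do_integration(bs, ierr)

    if (ierr /= IERR_SUCCESS) then
       bs = bs_init             ! everything is restored consistently in one shot
       ! ... loosen tolerances or change strategy, then retry ...
    end if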
