Many of the results from GPU-accelerated unit-test code appear to be wrong. As a conc

Bad GPU results about microphysics HOT 7 CLOSED

amrex-astro commented on June 20, 2024

Bad GPU results

from microphysics.

Comments (7)

zingale commented on June 20, 2024

the fact that the density is different is telling -- nothing should be changing the density in this unit test.

although it is roundoff-level different

dwillcox commented on June 20, 2024

I've compared temp_zone and dens_zone calculated inside the loop between PGI serial (debug) and gfortran serial (with aprox13's input; I haven't tried it for ignition_simple).

Printing those variables with 15 significant figures shows they can differ by about 1E-7, which is the absolute difference in density you see above. My guess is that the roundoff error differs between the log10 or power operations as implemented for PGI vs. GNU.

Doing the same for PGI-serial vs PGI-acc, I see smaller differences, but at least one difference nonetheless, e.g.

dens_zone = 3414548.873833601

vs.

dens_zone = 3414548.873833600

That suggests the difference you see in density, as Mike said, is roundoff, and that the integrator may not necessarily be doing anything to the density.
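
As a rough check on the roundoff interpretation, the two dens_zone values quoted above can be compared against the spacing of double-precision numbers at that magnitude. This is only a sketch in Python (not part of the unit test), and it assumes the printed 16-digit values are representative of the stored doubles:

```python
import math

# Values copied from the PGI-serial and PGI-acc printouts above.
dens_serial = 3414548.873833601
dens_acc = 3414548.873833600

abs_diff = abs(dens_serial - dens_acc)
rel_diff = abs_diff / abs(dens_serial)

# math.ulp (Python 3.9+) returns the gap between adjacent doubles
# at this magnitude, so this counts how many representable values
# separate the two results.
ulps_apart = abs_diff / math.ulp(dens_serial)

print(f"absolute difference: {abs_diff:.3e}")
print(f"relative difference: {rel_diff:.3e}")
print(f"separation in ulps:  {ulps_apart:.0f}")
```

A separation of only a few ulps is what one would expect if the two builds evaluate log10 or the power operation with slightly different rounding.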

zingale commented on June 20, 2024

good -- forgot that we are doing the exponentiation there.

adam-m-jcbs commented on June 20, 2024

Some sleuthing indicates that xn_zone contains junk on the GPU. I added a print statement to main.f90; building and running on the CPU with

make COMP=PGI  NDEBUG=t   -j6
./main.Linux.PGI.exe inputs_3alpha.BS

indicates that all xn_zone values are bounded by 0 <= xn_zone <= 1.0, as they should be. However,

make COMP=PGI ACC=t NDEBUG=t   -j6
./main.Linux.PGI.acc.exe inputs_3alpha.BS.ACC

produces bad output, such as

...
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             1           15   1.1651353297957938E-004
 j, kk, xn_zone:             1           15   1.1651353297957938E-004
...

I'll continue investigating the origin of this, but wanted to note it in the issue thread.
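
The bounds check described above (every mass fraction should satisfy 0 <= xn_zone <= 1) can be automated over the printed output. The following is a minimal Python sketch; the function name `find_bad_mass_fractions` and the tolerance are hypothetical, and it assumes the print format shown in the sample lines above:

```python
def find_bad_mass_fractions(log_text, tol=1e-12):
    """Return (j, kk, xn_zone) tuples whose value lies outside [0, 1]."""
    bad = []
    for line in log_text.splitlines():
        if "xn_zone:" not in line:
            continue  # skip unrelated output
        # Everything after the colon is: j  kk  value
        fields = line.split(":", 1)[1].split()
        j, kk, xn = int(fields[0]), int(fields[1]), float(fields[2])
        if not (-tol <= xn <= 1.0 + tol):
            bad.append((j, kk, xn))
    return bad


# Two of the lines from the GPU run quoted above.
sample = """\
 j, kk, xn_zone:             4           13    1584893192.466650
 j, kk, xn_zone:             1           15   1.1651353297957938E-004
"""

bad = find_bad_mass_fractions(sample)
for j, kk, xn in bad:
    print(f"out of range: j={j} kk={kk} xn_zone={xn}")
```

Piping the full output of both the CPU and GPU runs through a check like this would show which zones and species are affected.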

adam-m-jcbs commented on June 20, 2024

A quick note for the record: Max and I looked into this in depth, and the issue appears to have two origins: 1) we're using pf on the GPU without ever having it in a data statement (it seems PGI should've complained), and 2) pf has a Fortran character array (not supported by PGI on the GPU) and a bound procedure (also not something I would expect to work on the GPU, though we don't actually try to use the procedure or the character array). It's not clear how, but using this type on the GPU seems to be corrupting memory, which may be why xn_zone contains garbage. Will look into this more tomorrow.

zingale commented on June 20, 2024

oh fun.

adam-m-jcbs commented on June 20, 2024

After the code was changed to not use pf, the error appears to have gone away. Comparing the GPU and CPU results now yields

fcompare react_ignition_test_react.VBDF react_ignition_test_react.VBDF.ACC/

            variable name            absolute error            relative error
                                        (||A - B||)         (||A - B||/||A||)
 ----------------------------------------------------------------------
 level =  1
 density                            0.000000000               0.000000000    
 temperature                        0.000000000               0.000000000    
 Xnew_carbon-12                    0.1465605415E-10          0.4396816244E-10
 Xnew_oxygen-16                     0.000000000               0.000000000    
 Xnew_magnesium-24                 0.1465605415E-10          0.4396775531E-10
 Xold_carbon-12                     0.000000000               0.000000000    
 Xold_oxygen-16                     0.000000000               0.000000000    
 Xold_magnesium-24                  0.000000000               0.000000000    
 wdot_carbon-12                    0.1465605415E-09          0.4748322189E-05
 wdot_oxygen-16                     0.000000000               0.000000000    
 wdot_magnesium-24                 0.1465605415E-09          0.4748322189E-05
 rho_Hnuc                          0.1581619200E+18          0.4574365704E-05

Relative errors are at most about 5e-6 between CPU and GPU.
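
The "at most about 5e-6" figure can be read off the table programmatically. This Python sketch is not part of fcompare; it re-enters the nonzero rows from the output above by hand and picks the largest relative error:

```python
# Nonzero rows copied from the fcompare output above:
# variable name, absolute error, relative error.
FCOMPARE_ROWS = """\
Xnew_carbon-12     0.1465605415E-10  0.4396816244E-10
Xnew_magnesium-24  0.1465605415E-10  0.4396775531E-10
wdot_carbon-12     0.1465605415E-09  0.4748322189E-05
wdot_magnesium-24  0.1465605415E-09  0.4748322189E-05
rho_Hnuc           0.1581619200E+18  0.4574365704E-05
"""

rows = []
for line in FCOMPARE_ROWS.splitlines():
    name, abs_err, rel_err = line.split()
    rows.append((name, float(abs_err), float(rel_err)))

# Largest relative error ||A - B|| / ||A|| across all variables.
worst_name, _, worst_rel = max(rows, key=lambda r: r[2])
print(f"largest relative error: {worst_rel:.3e} ({worst_name})")
```

The large absolute error in rho_Hnuc (about 1.6e17) is not alarming on its own: its relative error is about 4.6e-6, the same order as the wdot terms.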
