jdonners commented on July 18, 2024

One of the issues was that not all arrays were completely initialized; valgrind reports:

10: ==41618== Conditional jump or move depends on uninitialised value(s)
10: ==41618==    at 0x72C54BF: MPIR_MAXF (opmax.c:34)
10: ==41618==    by 0x72FF197: MPIR_Reduce_local_impl (reduce_local.c:95)
10: ==41618==    by 0x7047A89: MPIR_Allreduce_intra (allreduce.c:803)
10: ==41618==    by 0x7049827: MPIR_Allreduce_impl (allreduce.c:1165)
10: ==41618==    by 0x704AD20: PMPI_Allreduce (allreduce.c:1304)
10: ==41618==    by 0x777C34D: PMPI_ALLREDUCE (in /nfs/admin/opt/intel/impi/4.1.0.024/intel64/lib/libmpigf.so.4.1)
10: ==41618==    by 0x430003: vmaxv_ (vmaxv.F:35)
10: ==41618==    by 0x40ACD4: gcurv_ (gcurv.F:279)
10: ==41618==    by 0x424097: MAIN__ (papero.F:220)
10: ==41618==    by 0x406585: main (in /nfs/home1/donners/Tickets/24142/verzicco.xztransform/origin/boutnp)

so I changed initia.F into:

!$OMP PARALLEL DO
!$OMP$ DEFAULT(none)
!$OMP$ SHARED(xstart,xend,n3,dens)
!$OMP$ SHARED(q1,q2,q3,n1m,n2m)
!$OMP$ PRIVATE(i,j,k)
      do i=max(xstart(3)-1,1),min(xend(3)+1,n1m)
      do j=max(xstart(2)-1,1),min(xend(2)+1,n2m)
      do k=1,n3
      q1(k,j,i)=0.d0
      q2(k,j,i)=0.d0
      q3(k,j,i)=0.d0
      dens(k,j,i)=1.d0
      enddo
      enddo
      enddo
!$OMP END PARALLEL DO
      q1=0.d0
      q2=0.d0
      q3=0.d0
      dens=1.d0

and that indeed removes the problem mentioned above. Apparently, not all of the elements that are used were being initialized. However, the optimized binary still outputs irreproducible results.

from afid.

jdonners commented on July 18, 2024

vdpoel wrote:
When I run in serial I do not observe the same problems as you do. Come to think of it, initia.F is in fact completely unnecessary and acts like -zero, which is bad coding. We should remove it, as it hides bugs.

Using the latest trunk version, I commented out initia.F entirely and compared two runs, with and without -ftrapuv. There was no difference.
Also, I found no difference between multiple runs of the same binary (-r8 -fpp -O3 -ip -ipo -openmp).

If the irreproducibility is only observed when running with different MPI tasks and only in quantities obtained by an MPI_REDUCE (like Maximum Divergence), there is nothing to worry about. Correct me if I am wrong, but I presume that because of the hierarchical operation behind MPI_REDUCE, round-off errors can be different. Because Maximum Divergence should be essentially zero by design, it consists entirely of round-off errors, and that is why the irreproducibility appears so strong in that quantity. I think there is nothing to worry about and that this is normal behaviour of MPI_REDUCE.
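
As a small illustration of the round-off argument: floating-point addition is not associative, so a sum accumulated in a different order (for instance by a reduction tree instead of a single loop) can come out differently. The sketch below is plain Python, not AFiD or MPI code; `sequential_sum`, `tree_sum` and the sample values are made up for the illustration.

```python
def sequential_sum(values):
    """Add the values left to right, as a single task would."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Add the values pairwise, in the spirit of a hierarchical reduction."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return tree_sum(values[:mid]) + tree_sum(values[mid:])


# Hypothetical per-task contributions with very different magnitudes.
contributions = [1e17, 1.0, -1e17, 1.0]

print(sequential_sum(contributions))  # 1.0
print(tree_sum(contributions))        # 0.0
```

Neither result equals the exact sum, 2.0; which error you get depends only on the grouping.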

jdonners commented on July 18, 2024

ok, just a few remarks about the last comment:

what is bad about initializing variables?
the -zero option sets only local scalar variables to zero. So arrays or variables on the heap are not affected (like the ones set in initia.F). See 'man ifort'.
the -ftrapuv option only works for stack local variables, so not for heap-allocated variables.
I would expect that functions like MPI_REDUCE(..,MPI_MAX,..) give identical results each time, but it seems that Intel MPI has an option I_MPI_FAST_COLLECTIVES=no to be absolutely sure. Not tested, though, since.. 
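
The expectation about MPI_MAX can be illustrated without MPI: taking a maximum only selects one of its inputs and never rounds, so the order of the comparisons should not matter (assuming no NaNs are involved). A minimal Python sketch with made-up values; it says nothing about what Intel MPI does internally:

```python
from functools import reduce
from itertools import permutations

# Hypothetical per-task local maxima.
local_maxima = [3.15e-15, 3.47e-15, 3.40e-15, 3.13e-15]

# Reduce with max in every possible order and collect the distinct results.
results = {reduce(max, order) for order in permutations(local_maxima)}

print(results)  # {3.47e-15}
```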

I merged the code with the trunk and the results have changed. Now I get the following:

the optimized binary compares ok.
the unoptimized binary compares ok.
when leaving out the option -fpe0, the results are NaNs and huge numbers (..E308). 

jdonners commented on July 18, 2024

vdpoel, replying to donners:

    ok, just a few remarks about the last comment:

    what is bad about initializing variables? 

They are currently initialized twice. The results should not depend on what is written in initia.F. Rodolfo and I will fix this.

    the -zero option sets only local scalar variables to zero. So arrays or variables on the heap are not affected (like the ones set in initia.F). See 'man ifort'. 

I meant that initia.F acts like -zero, but for heap-allocated variables.

    the -ftrapuv option only works for stack local variables, so not for heap-allocated variables. 

Didn't know that, thanks.

    I would expect that functions like MPI_REDUCE(..,MPI_MAX,..) give identical results each time, but it seems that Intel MPI has an option I_MPI_FAST_COLLECTIVES=no to be absolutely sure. Not tested, though, since.. 

    I merged the code with the trunk and the results have changed. Now I get the following:

    the optimized binary compares ok.
    the unoptimized binary compares ok.
    when leaving out the option -fpe0, the results are NaNs and huge numbers (..E308). 

I'll download the source and see what happens.

jdonners commented on July 18, 2024

ok, the offending options are '-xAVX -axAVX'. With these options, the code produces NaNs. Without them, the code does indeed run fine in serial mode (1 MPI task). However, the parallel code still produces different results, even with I_MPI_FAST_COLLECTIVES=off. The parallel code requires -O0 to be reproducible.

jdonners commented on July 18, 2024

The problem with AVX seems to occur at the loops with the ZGTTR* calls. Perhaps turning these options on uncovers a hidden bug somewhere. Or could there be another reason for these options to be problematic?

jdonners commented on July 18, 2024

However, I agree with you that the problems when using optimization are at least worrisome. The first problem is reproducibility (which can be checked with, e.g., quantities that don't depend on MPI_Reduce). The second problem, with lower priority, is the use of -xAVX (which might be a compiler issue).

jdonners commented on July 18, 2024

adding the -fp-model precise flag gives reproducible results:

slurm-635364.out:00:  the best processor grid is probably           16  by            1
slurm-635364.out:00:   initial maximum divergence:    24.5301631160944     
slurm-635364.out:00:  Maximum divergence =   1.636524249448712E-012
slurm-635364.out:00:  Maximum divergence =   3.150257832373882E-015
slurm-635364.out:00:  Maximum divergence =   3.473350079774562E-015
slurm-635364.out:00:  Maximum divergence =   3.400058012914542E-015
slurm-635364.out:00:  Maximum divergence =   3.136380044566067E-015
slurm-635364.out:00:  Maximum divergence =   3.469446951953614E-015
slurm-635364.out:00:  Maximum divergence =   3.358424649491099E-015
slurm-635364.out:00:  Maximum divergence =   3.344546861683284E-015
slurm-635364.out:00:  Maximum divergence =   3.275157922644212E-015
slurm-635364.out:00:  Maximum divergence =   3.206636345343128E-015
slurm-635364.out:00:  Maximum divergence =   3.098216128094577E-015
slurm-635365.out:00:  the best processor grid is probably           16  by            1
slurm-635365.out:00:   initial maximum divergence:    24.5301631160944     
slurm-635365.out:00:  Maximum divergence =   1.636524249448712E-012
slurm-635365.out:00:  Maximum divergence =   3.150257832373882E-015
slurm-635365.out:00:  Maximum divergence =   3.473350079774562E-015
slurm-635365.out:00:  Maximum divergence =   3.400058012914542E-015
slurm-635365.out:00:  Maximum divergence =   3.136380044566067E-015
slurm-635365.out:00:  Maximum divergence =   3.469446951953614E-015
slurm-635365.out:00:  Maximum divergence =   3.358424649491099E-015
slurm-635365.out:00:  Maximum divergence =   3.344546861683284E-015
slurm-635365.out:00:  Maximum divergence =   3.275157922644212E-015
slurm-635365.out:00:  Maximum divergence =   3.206636345343128E-015
slurm-635365.out:00:  Maximum divergence =   3.098216128094577E-015
slurm-635366.out:00:  the best processor grid is probably            8  by            2
slurm-635366.out:00:   initial maximum divergence:    24.5301631160944     
slurm-635366.out:00:  Maximum divergence =   1.636524249448712E-012
slurm-635366.out:00:  Maximum divergence =   3.150257832373882E-015
slurm-635366.out:00:  Maximum divergence =   3.473350079774562E-015
slurm-635366.out:00:  Maximum divergence =   3.400058012914542E-015
slurm-635366.out:00:  Maximum divergence =   3.136380044566067E-015
slurm-635366.out:00:  Maximum divergence =   3.469446951953614E-015
slurm-635366.out:00:  Maximum divergence =   3.358424649491099E-015
slurm-635366.out:00:  Maximum divergence =   3.344546861683284E-015
slurm-635366.out:00:  Maximum divergence =   3.275157922644212E-015
slurm-635366.out:00:  Maximum divergence =   3.206636345343128E-015
slurm-635366.out:00:  Maximum divergence =   3.098216128094577E-015
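
A comparison like this can be automated. A minimal Python sketch, assuming each line has the form `file:rank: message` as in the listing above; the function names are made up, and the check compares the printed digits only, not the underlying bits:

```python
from collections import defaultdict


def divergence_by_run(lines):
    """Collect the divergence messages of each output file, in order."""
    runs = defaultdict(list)
    for line in lines:
        filename, _, rest = line.partition(":")
        _rank, _, message = rest.partition(":")
        if "divergence" in message.lower():
            # Normalise whitespace so only the text and digits are compared.
            runs[filename].append(" ".join(message.split()))
    return dict(runs)


def runs_identical(runs):
    """True if every file reports the same sequence of messages."""
    sequences = list(runs.values())
    return all(seq == sequences[0] for seq in sequences)


sample = [
    "slurm-635364.out:00:  Maximum divergence =   1.636524249448712E-012",
    "slurm-635364.out:00:  Maximum divergence =   3.150257832373882E-015",
    "slurm-635365.out:00:  Maximum divergence =   1.636524249448712E-012",
    "slurm-635365.out:00:  Maximum divergence =   3.150257832373882E-015",
]
print(runs_identical(divergence_by_run(sample)))  # True
```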

The option -fp-model source also gives reproducible results, even when combined with -O3 -xAVX -fopenmp. Performance doesn't seem to suffer.
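
One possible explanation, which is an assumption and not something verified in this thread: with value-unsafe optimizations the compiler may reassociate sums when it vectorizes a loop, and how many elements are handled before the vector loop starts can depend on memory alignment, which may differ between runs. The -fp-model precise and -fp-model source options disallow such reassociation. The Python sketch below only mimics the effect; `lane_sum`, the lane count and the values are hypothetical:

```python
def lane_sum(values, lanes=4, peel=0):
    """Sum `values` the way a vectorized loop might.

    The first `peel` elements are added one by one; the rest are
    accumulated in `lanes` independent partial sums that are combined
    at the end.
    """
    total = 0.0
    for v in values[:peel]:
        total += v
    partial = [0.0] * lanes
    for i, v in enumerate(values[peel:]):
        partial[i % lanes] += v
    for p in partial:
        total += p
    return total


data = [1e17, 1.0, -1e17, 1.0, 1.0, 1.0, 1.0, 1.0]

print(lane_sum(data, peel=0))  # 2.0
print(lane_sum(data, peel=1))  # 3.0
```

The exact sum is 6.0; the same data gives two different answers depending only on where the lane-wise accumulation starts.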
