
AFiD's Introduction

AFiD

AFiD is a highly parallel application for simulating Rayleigh–Bénard and Taylor–Couette flows. See van der Poel et al. (2015) for more details.

It is developed by the University of Twente, SURFsara, and the University of Rome "Tor Vergata".

This application is free and unencumbered software released into the public domain. See the file COPYING for the copying permissions of the 2DECOMP&FFT library.

Installation

The AFiD model has the following prerequisites:

  • MPI
  • BLAS
  • LAPACK
  • FFTW3
  • HDF5 with parallel I/O

It is recommended to download a release tarball of AFiD, which can be found here. To install AFiD, use the 'configure' script; note that you will need to set optimization and debugging options yourself. The easiest way to configure and build AFiD is:

./configure
make
make install prefix=/path/to/install/afid

The configure script tries to find and configure all prerequisites automatically, although it does not always succeed. By default it uses the -O2 optimization flag (if available). The most important configuration options are:

./configure MPIFC=mpif90.gfortran              # set MPIFC to your MPI compiler wrapper for Fortran
./configure --with-blas=/path/to/blas.lib      # library with blas routines
./configure --with-lapack=/path/to/lapack.lib  # library with lapack routines 
./configure FCFLAGS=-O3                        # very high optimization
./configure FCFLAGS="-g -O0"                   # debug info, no optimization

The configure script locates the fftw-wisdom utility to find the root path of the FFTW3 library and it uses the h5pfc compiler wrapper to configure the HDF5 library. You can override these using:

./configure --with-fftw3=<root path to fftw3 installation>
./configure --with-hdf5=<root path to hdf5 installation> 

It is recommended to use vendor-optimized libraries for BLAS and (possibly) LAPACK (e.g. MKL, ESSL, or LibSci). Note that the FFTW3 library cannot be replaced by MKL, since MKL's FFTW interface does not support the calls that are used in AFiD.
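For example, linking BLAS and LAPACK from MKL might look as follows (a sketch only: the exact library names depend on your compiler and MKL version, so consult Intel's link-line advisor; the sequential gfortran LP64 libraries are assumed here):

./configure --with-blas="-L${MKLROOT}/lib/intel64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core" \
            --with-lapack="-L${MKLROOT}/lib/intel64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core"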

Should you want to build from the repository (e.g. when you download the source directly using "Download ZIP"), you first need to create the configure script. For this you need a recent version of the GNU autotools; the configure script is created with the command

autoreconf -i

Usage

See the manual for a description of the input parameters of the code. An example of the usage of the code can be found here.

AFiD's People

Contributors: jaapdijkshoorn, nadnaps, stevensrjam


AFiD's Issues

Configuring with an HDF5 version higher than 1.8.22

Hi,
I have been trying to compile this program on an Ubuntu 20.04 LTS system with different versions of HDF5, and only version 1.8.22 configures and compiles correctly.
My system has these versions:

  • kernel 5.4.0-89
  • regular desktop computer with an Intel i7-7700K processor
  • gcc 9.3.0-17, with gfortran at the same version
  • MPICH 4.0.3
  • libblas and liblapack 3.9.0

I have compiled HDF5 in three different versions (1.8.22, 1.10.7, and 1.12.1) from source with these configuration flags
./configure --enable-parallel --enable-fortran --enable-shared
and all versions configured correctly. I used make and make install to create the libraries, binaries, and include folders.

Afterwards I ran the configure script of the AFiD program with one additional flag,
--with-hdf5=
pointing to the folder of the respective HDF5 version, for example (version 1.12.1):
./configure --with-hdf5=/opt/hdf5/hdf5-1.12.1/hdf5
The configuration stops at the check for the parallel HDF5 library. The error found in config.log is

configure:6404: mpif90 -o conftest -I/opt/hdf5/hdf5-1.12.1/hdf5/include -L/opt/hdf5/hdf5-1.12.1/hdf5/lib conftest.f90 -lhdf5_fortran -lhdf5  -lz -ldl -lm  >&5 
conftest.f90:9:61:

    9 |       call h5pset_fapl_mpio_f(classtype,comm,plist_id,hdferr)
      |                                                             1
Error: Type mismatch in argument 'prp_id' at (1); passed INTEGER(4) to INTEGER(8)
configure:6404: $? = 1 
configure: failed program was:
| 
|       program conftest
|       use hdf5
|       implicit none
|       integer         :: classtype
|       integer         :: comm
|       integer (hid_t) :: plist_id
|       integer         :: hdferr
|       call h5pset_fapl_mpio_f(classtype,comm,plist_id,hdferr)
|       end
configure:6407: error: Unable to find hdf5 Fortran library with parallel I/O 

I have attached the full log file.
config.log

This error only occurs for HDF5 versions 1.10.7 and 1.12.1.
As far as I can see, this is only a problem in the test program that configure uses to check the HDF5 capabilities, not in AFiD itself.
From my point of view there is no need to change anything; just update the README to state the HDF5 version needed to compile the program (1.8.22).
Newer HDF5 versions have some benefits in performance and storage, but if these are not required then there is no need to change the software.
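For context: the mismatch arises because the test program passes a default INTEGER as the first argument of h5pset_fapl_mpio_f, whose prp_id dummy argument is INTEGER(HID_T); the Fortran modules of newer HDF5 versions enforce this through an explicit interface. A minimal test program matching that interface might look like this (a sketch only, intended as a compile/link test in the style of the configure check, not validated against the configure machinery):

      program conftest
      use hdf5
      use mpi
      implicit none
      integer(hid_t) :: plist_id            ! property-list id must be HID_T
      integer        :: comm, info, hdferr
      comm = MPI_COMM_WORLD
      info = MPI_INFO_NULL
      call h5pset_fapl_mpio_f(plist_id, comm, info, hdferr)
      end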

Thank you for this open source program.

Temporary array arguments

Erwin van der Poel mentioned that the code (unnecessarily) creates several temporary arrays for arguments when calling routines. These can be found with the compiler flag:

mpiifort -check arg_temp_created

This can be prevented by putting the routines in modules, or by adding an explicit interface in the calling routine.
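A minimal sketch of the module approach (with hypothetical routine and array names, not taken from AFiD): with an explicit interface and an assumed-shape dummy argument, the compiler can pass a descriptor for a non-contiguous array section instead of copying it into a temporary.

      module update_mod
      implicit none
      contains
        subroutine scale_slice(a, factor)
          real(8), intent(inout) :: a(:,:)   ! assumed-shape: no copy needed
          real(8), intent(in)    :: factor
          a = a*factor
        end subroutine scale_slice
      end module update_mod

      program demo
      use update_mod
      implicit none
      real(8) :: field(16,16,16)
      field = 1.0d0
      ! field(2,:,:) is a non-contiguous section; thanks to the explicit
      ! interface from the module, no temporary array is created here.
      call scale_slice(field(2,:,:), 0.5d0)
      end program demo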

Faster Implicit update

Massimiliano Fatica (NVIDIA) wrote:

While adding the GPU path to the implicit update routines, I found a good improvement (2x-3x) for the CPU code through a better use of the dgttrs call.

Basically, after the dgttrf call, instead of solving each vertical line separately:

      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)

!         Normalize RHS of equation
          fkl(1) = real(0.,fp_kind)
          do kc=2,nxm
            ackl_b = real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
            fkl(kc) = rhs(kc,jc,ic)*ackl_b
          end do
          fkl(nx) = real(0.,fp_kind)

!         Solve equation using LAPACK library
          call dgttrs('N',nx,1,amkT,ackT,apkT,appk,ipkv,fkl,nx,info)

!         Update temperature field
          do kc=2,nxm
            temp(kc,jc,ic) = temp(kc,jc,ic) + fkl(kc)
          end do

        end do
      end do

you can solve all of them together; a single dgttrs call with many right-hand sides amortizes the call overhead and lets LAPACK work on whole contiguous columns:

      nrhs = (xend(3)-xstart(3)+1)*(xend(2)-xstart(2)+1)

!     Normalize RHS (this should be moved into the main loop of the
!     corresponding ImplicitUpdate routine)
      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)
          do kc=2,nxm
            ackl_b = real(1.0,fp_kind)/(real(1.0,fp_kind)-ac3ssk(kc)*betadx)
            rhs(kc,jc,ic) = rhs(kc,jc,ic)*ackl_b
          end do
        end do
      end do

!     Solve all right-hand sides in a single LAPACK call
      call dgttrs('N',nx,nrhs,amkT,ackT,apkT,appk,ipkv,rhs,nx,info)

!     Update temperature field (OpenMP directives can be added to these loops)
      do ic=xstart(3),xend(3)
        do jc=xstart(2),xend(2)
          do kc=2,nxm
            temp(kc,jc,ic) = temp(kc,jc,ic) + rhs(kc,jc,ic)
          end do
        end do
      end do

Inconsistent initialization of temperature

In CreateInitialConditions.F90, the temperature is initialized as:

      do i=xstart(3),xend(3)
        do j=xstart(2),xend(2)
          do k=2,nxm
            temp(k,j,i) = tempbp(j,i) - (tempbp(j,i) - temptp(j,i))*xc(k)
          end do
        end do
      end do

      temp(1,:,:)  = 1.0d0
      temp(nx,:,:) = 0.0d0

The last two assignments should use tempbp for k=1 and temptp for k=nx.
I also suggest pulling them into the previous loop, since temp has halo cells.
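A sketch of the suggested fix, folding the boundary assignments into the loop (using the names from the snippet above):

      do i=xstart(3),xend(3)
        do j=xstart(2),xend(2)
          temp(1,j,i)  = tempbp(j,i)    ! bottom-plate temperature
          temp(nx,j,i) = temptp(j,i)    ! top-plate temperature
          do k=2,nxm
            temp(k,j,i) = tempbp(j,i) - (tempbp(j,i) - temptp(j,i))*xc(k)
          end do
        end do
      end do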

SEGFAULT on Cartesius

The code segfaults on Cartesius, while it does not on HERMIT or on local machines.
It happens in SolvePressureCorrection.F90 (formerly phcalc.F) at this call:

call dfftw_execute_dft_r2c(fwd_guruplan_y,ry1,cy1)

which is line 106 in version 49.

It has something to do with the guru interface, as the equivalent call without the guru planner in older versions of the code works.
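For comparison while debugging, a plan built with the basic (non-guru) legacy Fortran interface looks like this (a minimal 1-D sketch with hypothetical sizes and array names; the actual plan in AFiD is strided and multi-dimensional, which is why the guru planner is used):

      program fftw_basic
      implicit none
      include 'fftw3.f'                  ! defines FFTW_ESTIMATE etc.
      integer, parameter :: n = 128
      integer(8) :: plan                 ! FFTW plans are opaque C pointers
      real(8)    :: rin(n)
      complex(8) :: cout(n/2+1)
      rin = 1.0d0
      ! Non-guru r2c plan/execute/destroy cycle
      call dfftw_plan_dft_r2c_1d(plan, n, rin, cout, FFTW_ESTIMATE)
      call dfftw_execute_dft_r2c(plan, rin, cout)
      call dfftw_destroy_plan(plan)
      end program fftw_basic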

Reproducibility

It is expected that the code should produce identical results for identical runs. Judging from the 'Maximum divergence' after 10 time steps, that seems to be the case when no optimization is used:

FC = h5pfc -r8 -O0 -fpp -g -traceback -fpe0 -warn all -debug all -check all
FC += -fopenmp

However, that seems not to be the case with full optimization:

FC = h5pfc -r8 -ip -ipo -O3 -fpp -g -traceback -fpe0
FC += -fopenmp
FC += -axAVX -xAVX

Why? A plausible explanation: floating-point addition is not associative, and with -O3/-ipo/-axAVX the compiler vectorizes reductions and may select different code paths at run time, so sums can be evaluated in a different order and the rounded results differ.
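A tiny self-contained Fortran illustration of the non-associativity (new example code, not from AFiD):

      program assoc
      implicit none
      real(8) :: a, b, c
      a = 1.0d0
      b = 1.0d-16
      c = 1.0d-16
      ! The two groupings round differently:
      print *, (a + b) + c   ! 1.0 (1e-16 is below half an ulp of 1.0)
      print *, a + (b + c)   ! 1.0000000000000002 (2e-16 rounds up)
      end program assoc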

no vz flow component when running default configuration

Running the default Ra=1e6, Pr=1 configuration provided with the source code yields a flow that has only x and y components. This is because vz is not initialized in CreateInitialConditions.F90.
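A minimal sketch of a fix in CreateInitialConditions.F90 (loop bounds assumed to match the temperature initialization shown in the issue above):

      do i=xstart(3),xend(3)
        do j=xstart(2),xend(2)
          do k=1,nx
            vz(k,j,i) = 0.0d0   ! or seed with a small perturbation
          end do
        end do
      end do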

Abort for grid input of type 2N

Vamsi Spandan wrote: add an error code when the grid input in "bou.in" is of the form "2N".

Currently no error is generated. FFTW is slower for grid input of the form "2n" (NX, NY even). Would it be useful if the program stopped and prompted the user to input (NX, NY) of the form "2n+1" in 'bou.in'? A sketch of such a check is given below.
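The sketch (hypothetical placement after reading 'bou.in'; myid and ierr are assumed to be the MPI rank and an integer status variable already present in the surrounding code):

      ! Abort with a clear message if NX or NY is even (form "2n")
      if (mod(nx,2) == 0 .or. mod(ny,2) == 0) then
        if (myid == 0) write(*,*) 'Error: NX and NY in bou.in must be of the form 2n+1'
        call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
      end if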

too large local residue for mass conservation

R wrote:

  • Test on 384^3 using ./configure FCFLAGS="-O3 -xAVX -axCORE-AVX2": afid-1.1 crashes on an Ivy Bridge node (i.e. AVX) with the message "too large local residue for mass conservation at:". The code works fine on a Haswell node (i.e. AVX2).
  • Using only the "-O3" flag, everything works well on both types of nodes.
  • Using the "-O3 -xAVX" flags, the code fails on both types of nodes, so the problem is in the "-xAVX" part.

It also occurs with the latest compiler version (16.0.1).

some print statements show:

...
16:  dmax,resid=  6.953275390687991E-310  1.000000000000000E-002
18:  dmax,resid=  6.952658331733957E-310  1.000000000000000E-002
00:  dmax,resid= -1.797693134862316E+308  1.000000000000000E-002
00:           too large local residue for mass conservation at:

Only process 0 has an excessive residue. Note that -1.797693134862316E+308 is -HUGE for double precision and that 6.95E-310 is subnormal, which hints at uninitialized or corrupted data in dmax.

The same happens with ./configure FCFLAGS="-xAVX -O0 -g -traceback", i.e. without optimization.
