heffte's People

Contributors

af-ayala


heffte's Issues

Question/performance issue - fastest way to compute r2c

Hi,

I am trying to compute the r2c transform of a 3D distributed field and to get as much performance as possible out of your library.
However, the forward-backward transform I get is very slow, even on a single core.
Could you help me speed up this code?

Here is my current version. The index range owned by each rank is dictated by the external solver we use.

// get the indexes - hard-coded here as an example. In practice, they will be given by the main solver.
const int real_limits[3] = {16, 16, 16};
heffte::box3d<> const real_idx = {{0, 0, 0}, {real_limits[0]-1, real_limits[1]-1, real_limits[2]-1}};
heffte::box3d<> const cplx_idx = {{0, 0, 0}, {real_limits[0]/2-1, real_limits[1]-1, real_limits[2]-1}};

// set the r2c direction (what is that?)
int r2c_dir = 0;

// setup the fft
heffte::fft3d_r2c<heffte::backend::fftw> fft(real_idx, cplx_idx, r2c_dir, comm);

// set iota memory
std::vector<double> heffte_data(fft.size_inbox());
std::iota(heffte_data.begin(), heffte_data.end(), 0);

// run a few warm-up forward/backward transforms
for (int iter = 0; iter < n_warm; ++iter) {
    auto output  = fft.forward(heffte_data, heffte::scale::none);
    auto inverse = fft.backward(output);
}

Regarding this piece of code, I have a few questions:

  • Am I missing something performance-wise? The code seems to be very slow compared to other FFT solvers.
  • What is the purpose of r2c_dir? I guess it is the direction in which the first/last mode is dropped?
  • Should the data be interpreted as cell-centered or node-centered?
  • How can I impose even/odd symmetry?

Thanks a lot for your help and your time.
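
On the r2c_dir question, a small sketch may help. For a real-to-complex transform of N real samples along one axis, Hermitian symmetry leaves N/2 + 1 independent complex coefficients along that axis (integer division). The code below computes the matching complex index box from a real one. The Box struct and the r2c_box helper are hypothetical stand-ins written for this illustration, not heFFTe types, and it is an assumption that heFFTe expects this usual convention. If it does, the cplx_idx box in the snippet above, whose upper index along direction 0 is real_limits[0]/2-1, would be one plane short.

```cpp
#include <array>
#include <cassert>

// Hypothetical inclusive index box, used only for this illustration.
struct Box {
    std::array<int, 3> low;
    std::array<int, 3> high; // inclusive upper indexes

    // Number of indexes along dimension d.
    int size(int d) const { return high[d] - low[d] + 1; }

    // Total number of indexes in the box.
    long long count() const {
        return static_cast<long long>(size(0)) * size(1) * size(2);
    }
};

// Shrink the real box along r2c_dir so that it holds the N/2 + 1
// non-redundant complex coefficients of a real-to-complex transform.
Box r2c_box(Box const &real, int r2c_dir) {
    assert(0 <= r2c_dir && r2c_dir < 3);
    Box cplx = real;
    cplx.high[r2c_dir] = real.low[r2c_dir] + real.size(r2c_dir) / 2;
    return cplx;
}
```

For the 16x16x16 example, r2c_box with r2c_dir = 0 keeps indexes 0 through 8 along the first axis, that is 9 planes instead of 16.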

Performance Issue With Multiple GPUs Using CUFFT

I am observing a large slowdown when using 2 MPI ranks with 2 GPUs on the same node compared to 1 MPI rank with 1 GPU: the time per run goes from about 0.0037 s to about 0.143 s.
Could someone please explain the reason for this slowdown?

Benchmark results:

$ mpirun --bind-to socket -n 1 ./benchmarks/speed3d_r2c cufft double 256 256 256
----------------------------------------------------------------------------- 
heFFTe performance test
----------------------------------------------------------------------------- 
Backend:   cufft
Size:      256x256x256
MPI ranks:    1
Grids: (1, 1, 1)  
Time per run: 0.00373505 (s)
Performance:  539.02 GFlops/s
Memory usage: 768MB/rank
Tolerance:    1e-11
Max error:    3.72476e-15
$ mpirun --bind-to socket -n 2 ./benchmarks/speed3d_r2c cufft double 256 256 256
----------------------------------------------------------------------------- 
heFFTe performance test
----------------------------------------------------------------------------- 
Backend:   cufft
Size:      256x256x256
MPI ranks:    2
Grids: (1, 1, 2)  (1, 2, 1)  (1, 1, 2)  
Time per run: 0.143419 (s)
Performance:  14.0376 GFlops/s
Memory usage: 640MB/rank
Tolerance:    1e-11
Max error:    4.00439e-15
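
To put the two runs side by side, here is a minimal sketch that recomputes the reported rates and the slowdown factor. It assumes the benchmark estimates work with the conventional 5 N log2(N) operation count for an FFT of N points. That formula is inferred from the printed numbers, not stated in the output, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Estimated rate in GFlop/s for an nx x ny x nz transform, assuming the
// conventional 5 * N * log2(N) operation count (an assumption, see above).
double fft_gflops(double nx, double ny, double nz, double seconds) {
    assert(seconds > 0.0);
    double const n = nx * ny * nz;
    return 5.0 * n * std::log2(n) / seconds * 1.0e-9;
}

// Ratio of two times per run; values above 1 mean the first run is slower.
double slowdown(double t_slow, double t_fast) {
    assert(t_fast > 0.0);
    return t_slow / t_fast;
}
```

With the times above, fft_gflops(256, 256, 256, 0.00373505) and fft_gflops(256, 256, 256, 0.143419) come out close to the two printed Performance values, and slowdown(0.143419, 0.00373505) is a bit over 38.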

CMake options:

-- heFFTe 2.0.0
--  -D CMAKE_INSTALL_PREFIX=/vast/home/cyenusah/HEFFTE/cuda_build
--  -D BUILD_SHARED_LIBS=
--  -D CMAKE_BUILD_TYPE=Release
--  -D CMAKE_CXX_FLAGS_RELEASE=-O3 -DNDEBUG
--  -D CMAKE_CXX_FLAGS=
--  -D MPI_CXX_COMPILER=/projects/opt/rhel7/ppc64le/p9/openmpi/4.1.1-gcc_9.4.0/bin/mpicxx
--  -D MPI_CXX_COMPILE_OPTIONS=-pthread
--  -D CUDA_NVCC_FLAGS=-std=c++11
--  -D CUDA_TOOLKIT_ROOT_DIR=/projects/darwin-nv/rhel7/ppc64le/packages/cuda/11.4.0
--  -D Heffte_ENABLE_FFTW=OFF
--  -D Heffte_ENABLE_MKL=OFF
--  -D Heffte_ENABLE_CUDA=ON
--  -D Heffte_ENABLE_ROCM=OFF
--  -D Heffte_ENABLE_PYTHON=OFF
--  -D Heffte_ENABLE_FORTRAN=OFF
--  -D Heffte_ENABLE_TRACING=OFF

Using:

1) cmake/3.19.2
2) gcc/9.4.0
3) cuda/11.4.0
4) openmpi/p9/4.1.1-gcc_9.4.0

Error Compiling MKL Backend

First lines of the error message:
In file included from /heffte/include/heffte_reshape3d.h(14),
from /heffte/src/heffte_reshape3d.cpp(11):
/heffte/include/heffte_backend_mkl.h(166): error: argument list for class template "heffte::box3d" is missing
mkl_executor(box3d const box, int dimension) :
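
The message is the kind a compiler emits when a class template with a defaulted parameter is named without its angle brackets in a function parameter. The sketch below reproduces that pattern with a hypothetical stand-in template. It is not heFFTe's actual box3d definition, and it is not presented as the fix adopted by the library.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in: a class template with a defaulted parameter.
template<typename index = int>
struct box3d {
    std::array<index, 3> low;
    std::array<index, 3> high; // inclusive upper indexes

    // Total number of indexes in the box.
    long long count() const {
        long long total = 1;
        for (int d = 0; d < 3; ++d) {
            assert(high[d] >= low[d]);
            total *= static_cast<long long>(high[d] - low[d] + 1);
        }
        return total;
    }
};

// Ill-formed: a bare template name cannot be used as a parameter type,
// even though the template parameter has a default.
// long long volume_bare(box3d const box);

// Well-formed: empty angle brackets select the default (index = int).
inline long long volume_default(box3d<> const box) { return box.count(); }

// Well-formed: template the function on the index type.
template<typename index>
long long volume(box3d<index> const box) { return box.count(); }
```

Either of the two well-formed spellings compiles under C++11 and later; which one fits a given executor class depends on whether it needs to support index types other than int.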

CMake options:
-- heFFTe 2.0.0
-- -D CMAKE_INSTALL_PREFIX=/vast/home/cyenusah/HEFFTE/mkl_build
-- -D BUILD_SHARED_LIBS=ON
-- -D CMAKE_BUILD_TYPE=Release
-- -D CMAKE_CXX_FLAGS_RELEASE=-O3 -DNDEBUG
-- -D CMAKE_CXX_FLAGS=
-- -D MPI_CXX_COMPILER=/projects/opt/centos8/x86_64/oneAPI/2022.2.0.262/mpi/2021.6.0/bin/mpiicpc
-- -D MPI_CXX_COMPILE_OPTIONS=
-- -D Heffte_ENABLE_FFTW=OFF
-- -D Heffte_ENABLE_MKL=ON
-- -D Heffte_ENABLE_CUDA=OFF
-- -D Heffte_ENABLE_ROCM=OFF
-- -D Heffte_ENABLE_PYTHON=OFF
-- -D Heffte_ENABLE_FORTRAN=OFF
-- -D Heffte_ENABLE_TRACING=OFF

Using:

  1. cmake/3.19.2
  2. intel/19.0.5
  3. intel-mpi/2021.6.0

Missing getting started guide

Hi, I would like to use this library with molecular dynamics codes such as GROMACS and LAMMPS.
I was able to build the library, but the MD codes do not pick it up.
Running ldd on gmx_mpi shows that the binary is not linked against this library.

Do I need to write separate wrapper code, or do I need to modify the GROMACS or LAMMPS sources?
It would be a great help if you could provide a recipe for using it.
