dmalhotra / pvfmm

A parallel kernel-independent FMM library for particle and volume potentials

Home Page: http://pvfmm.org

License: GNU Lesser General Public License v3.0

Shell 12.94% C++ 72.26% C 1.26% Cuda 0.37% Makefile 0.87% M4 6.99% Fortran 1.56% CMake 1.48% Python 2.27%

pvfmm's Introduction

PVFMM

What is PVFMM?

PVFMM is a library for solving certain types of elliptic partial differential equations.

We support Stokes, Poisson, and Helmholtz problems on the unit cube, with free-space or periodic boundary conditions, and with constant or mildly varying coefficients. Our method is based on a volume potential integral equation formulation accelerated by the Kernel Independent Fast Multipole Method.

How to get PVFMM

For the latest stable release of PVFMM visit pvfmm.org

License

PVFMM is distributed under the LGPLv3 licence. See COPYING in the top-level directory of the distribution.

Installing PVFMM

To install PVFMM, follow the steps in the INSTALL file, which is located in the top directory of the source distribution.

Using PVFMM

The file examples/Makefile can be used as a template makefile for any project using the library. In general the MakeVariables file should be included in any makefile and CXXFLAGS_PVFMM and LDFLAGS_PVFMM should be used to compile the code.

Two very simple examples illustrating usage of the library are available:
For particle N-body: examples/src/example1.cpp
For volume potentials: examples/src/example2.cpp

To compile these examples:
make examples/bin/example1
make examples/bin/example2

The volume potentials example takes a long time the first time it is run, since it has to precompute quadrature rules. This data is saved to a file and reused in subsequent runs. See INSTALL for the configure option '--with-precomp-dir=DIR', which sets the default path for precomputed data.

Acknowledgment

This software has been developed as part of the work supported by,

US National Institutes of Health/10042242
US Department of Energy/DE-SC0010518
US Department of Energy/DE-SC0009286
US National Science Foundation/CCF-1337393
US Air Force Office for Scientific Research/FA9550-12-10484

The authors would also like to thank ORNL/OLCF and TACC for providing access to computing resources for the development, testing and benchmarking of this software.

pvfmm's Issues

pvfmm_config.h breaks add_subdirectory() compilation with PVFMM as a third-party library

When PVFMM is auto-downloaded with FetchContent_Declare() in CMake, the resulting call to make fails because pvfmm_config.h is not found. It seems that ${CMAKE_CURRENT_SOURCE_DIR} is not added through target_include_directories(), while the root-level CMakeLists.txt writes pvfmm_config.h to the root directory (seen here).

There seem to be two possible fixes for this:

1. Since having pvfmm_config.h in the project root seems relevant to other parts of the build, simply adding

configure_file(pvfmm_config.h.in include/pvfmm_config.h @ONLY)

below line 70 addresses the problem in an inelegant but sufficient way. This is probably the least-effort solution and makes some sense: files in PVFMM explicitly depend on pvfmm_config.h, so maybe it should live in include/.
2. Explicitly adding something like ${PROJECT_SOURCE_DIR} to the target_include_directories() calls also works. But this could have problematic effects on the calls to install() if it is written without the proper generator expression to switch locations for the install and build interfaces.

A minimal example to reproduce the error.

Root-level CMakeLists.txt:

cmake_minimum_required(VERSION 3.1)
set(CMAKE_CXX_STANDARD 14) 
project(test) 
include(cmake/PVFMM.cmake)

In cmake/PVFMM.cmake:

if (NOT TARGET PVFMM::PVFMM)
    include(FetchContent)
    FetchContent_Declare(
        PVFMM
        SOURCE_DIR ${CMAKE_BINARY_DIR}/_deps/PVFMM
        BINARY_DIR ${CMAKE_BINARY_DIR}/_deps/PVFMM
        GIT_REPOSITORY https://github.com/dmalhotra/pvfmm.git
        GIT_TAG        ffec8376dac7e2df134e56c1a37f22051ec483bb
        GIT_SHALLOW TRUE
    )

    set(CMAKE_BUILD_TYPE Release CACHE INTERNAL "Release or debug mode")

    FetchContent_GetProperties(PVFMM)
    if(NOT pvfmm_POPULATED)
        FetchContent_Populate(PVFMM)
        message("pvfmm_SOURCE_DIR " ${pvfmm_SOURCE_DIR} " " ${pvfmm_BINARY_DIR})
        set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_CURRENT_SOURCE_DIR}/cmake)

        add_subdirectory(${pvfmm_SOURCE_DIR} ${pvfmm_BINARY_DIR})
        add_library(PVFMM::PVFMM INTERFACE IMPORTED)
        target_include_directories(PVFMM::PVFMM SYSTEM INTERFACE
            ${pvfmm_SOURCE_DIR}/include)
        target_link_libraries(PVFMM::PVFMM INTERFACE
            ${pvfmm_BINARY_DIR}/lib/libpvfmm.a)
    endif()
endif()

Then, in the root directory, run:

mkdir build
cd build
cmake ..
make

Thanks!!

Not compiling on Linux & Intel Compiler

The code does not compile with the standard setup sequence (autogen, configure, make) using Intel Composer_XE_2015.0 on Ubuntu. It fails with a strange error, possibly related to MIC.
The compiler log is attached:
output.txt

Documentation/suggestions for repeated invocation?

I'm working on a benchmark that repeatedly calls pvfmm to evaluate a Biot-Savart point-to-point kernel as part of a Runge-Kutta time integration, with the source and target points and the potentials changing between invocations. Are there any particular strategies, documentation, or examples on how to cut down on or reuse matrix initialization, tree construction, or other setup overheads between successive invocations? Is this, for example, what fmm_pts.cpp is doing?

develop branch fails to build when USE_SSE is defined

I tried to compile the code using g++ (Debian 6.3.0-6), but the build failed with the following message when the USE_SSE variable was defined:

In file included from /home/uranix/pvfmm-dev/include/kernel.hpp:202:0,
                 from /home/uranix/pvfmm-dev/include/cheb_utils.hpp:12,
                 from /home/uranix/pvfmm-dev/src/cheb_utils.cpp:8:
/home/uranix/pvfmm-dev/include/kernel.txx: In function ‘void pvfmm::{anonymous}::stokesStressSSE(int, int, const double*, const double*, const double*, const double*, const double*, const double*, const double*, double*)’:
/home/uranix/pvfmm-dev/include/kernel.txx:2043:32: error: ‘T’ was not declared in this scope
         double r = pvfmm::sqrt<T>(r2);
                                ^
/home/uranix/pvfmm-dev/include/kernel.txx:2043:37: error: no matching function for call to ‘sqrt(double&)’
         double r = pvfmm::sqrt<T>(r2);
                                     ^
In file included from /home/uranix/pvfmm-dev/include/pvfmm_common.hpp:62:0,
                 from /home/uranix/pvfmm-dev/include/cheb_utils.hpp:10,
                 from /home/uranix/pvfmm-dev/src/cheb_utils.cpp:8:
/home/uranix/pvfmm-dev/include/math_utils.hpp:29:15: note: candidate: template<class Real_t> Real_t pvfmm::sqrt(Real_t)
 inline Real_t sqrt(const Real_t a){return ::sqrt(a);}

It looks like T was a template argument, but T was later hardcoded to be double.
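The first error can be reproduced outside the library. The sketch below is a stand-alone illustration, not PVFMM code: the namespace `demo` and the functions `radius` and `radius_deduced` are hypothetical stand-ins, and only the shape of the `sqrt` template follows the candidate shown in the log. Inside a non-template function there is no `T` to name, so the type has to be spelled out or left to deduction.

```cpp
#include <cassert>
#include <cmath>

namespace demo {  // hypothetical stand-in for the library namespace
// Same shape as the candidate reported in the compiler log.
template <class Real_t>
inline Real_t sqrt(const Real_t a) { return std::sqrt(a); }
}  // namespace demo

// A non-template function has no template parameter `T` in scope, so
// `demo::sqrt<T>(r2)` would not compile here. Naming the type works.
double radius(double r2) {
  assert(r2 >= 0.0);
  return demo::sqrt<double>(r2);
}

// Alternative: drop the explicit argument and let the compiler deduce it.
double radius_deduced(double r2) {
  assert(r2 >= 0.0);
  return demo::sqrt(r2);
}
```

If the reported line in stokesStressSSE has the same structure, writing the type as double there would be the analogous change. That is an untested suggestion, not a confirmed fix.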

Kernel evaluation - non-symmetric

Dear All,

I am new to pvFMM and am trying to use a non-symmetric kernel.
As a first test, I changed the laplace_potent_kernel to be non-symmetric (as shown in the diff below).

Running the classic example1 with the potential kernel

const pvfmm::Kernel<double>& kernel_fn=pvfmm::LaplaceKernel<double>::potential();

reports that the kernel is non-symmetric and then segfaults in the U2U part:

InitFMM_Pts {
no-symmetry for: laplace
    LoadMatrices {
        ReadFile {
        }
        Broadcast {
        }
    }
//some other output...
RunFMM {
    UpwardPass {
        S2U {
        }
        U2U {
 *** Process received signal ***
 Signal: Segmentation fault: 11 (11)
Signal code:  (0)
Failing at address: 0x0
*** End of error message ***
Segmentation fault: 11

Do you have any tip/idea of what is going on? Do you support non-symmetric kernels?

diff --git a/include/kernel.txx b/include/kernel.txx
index 7867086..f919a8b 100755
--- a/include/kernel.txx
+++ b/include/kernel.txx
@@ -1108,7 +1108,7 @@ void laplace_poten_uKernel(Matrix<Real_t>& src_coord, Matrix<Real_t>& src_value,
 
         Vec_t r2=        mul_intrin(dx,dx) ;
         r2=add_intrin(r2,mul_intrin(dy,dy));
-        r2=add_intrin(r2,mul_intrin(dz,dz));
+        //r2=add_intrin(r2,mul_intrin(dz,dz));
 
         Vec_t rinv=RSQRT_INTRIN(r2);
         tv=add_intrin(tv,mul_intrin(rinv,sv));
@@ -1405,8 +1405,8 @@ void laplace_grad(T* r_src, int src_cnt, T* v_src, int dof, T* r_trg, int trg_cn
 
 
 template<class T> const Kernel<T>& LaplaceKernel<T>::potential(){
-  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1>, laplace_dbl_poten<T,1> >("laplace"     , 3, std::pair<int,int>(1,1),
-      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<T>);
+  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1> >("laplace"     , 3, std::pair<int,int>(1,1),
+      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
   return potn_ker;
 }
 template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
@@ -1418,8 +1418,8 @@ template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
 
 template<> inline const Kernel<double>& LaplaceKernel<double>::potential(){
   typedef double T;
-  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2>, laplace_dbl_poten<T,2> >("laplace"     , 3, std::pair<int,int>(1,1),
-      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<double>);
+  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2> >("laplace"     , 3, std::pair<int,int>(1,1),
+      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
   return potn_ker;
 }
 template<> inline const Kernel<double>& LaplaceKernel<double>::gradient(){

PS: I am running on macOS and compiled with

./configure MPICXX=mpic++ CXX=icpc CC=icc F77=ifort CXXFLAGS="-mavx -g -std=c++11" CFLAGS="-mavx -g" FFLAGS="-mavx -g" --prefix=/opt/intel/intel_lib/pvfmm-1.0.0 --with-openmp-flag="qopenmp" --with-fftw-include="${FFTW_INC}" --with-fftw-lib="-mkl" --with-blas="-mkl" --with-lapack="-mkl" --disable-doxygen-doc --disable-doxygen-dot --disable-doxygen-html

Odd behaviour of example2 with -q 0

When running example2 with the parameters

./examples/bin/example2 -m 2 -q 0 -N 669

the example does not terminate. Instead there is just a single thread constantly allocating memory until the compute node runs out of memory (which in my case would be about 192 GB) and crashes.

When I am running the example like

./examples/bin/example2 -m 2 -q 0 -N 668

this behaviour does not occur. Instead the example terminates almost instantly and needs about 400 MB of memory.

Is this a bug, or am I missing something obvious here? The difference in the memory requirements seems extreme given the two scenarios. I have encountered this behaviour on two separate systems (as I wanted to see whether more memory would fix it).

If it is of any help, here is the configuration I used:
I am using the latest commit (6cd67bd) and have built pvfmm with CUDA support:

./configure --with-cuda=/usr/local/cuda

Output of the configuration:
pvfmm-lib-configuration.txt

no include sctl.hpp

When I compile the library, the build fails because sctl.hpp cannot be found. How can I solve this issue?
/home/alex/dev/pvfmm/include/pvfmm_common.hpp:64:10: fatal error: sctl.hpp: No such file or directory
64 | #include <sctl.hpp>
| ^~~~~~~~~~
compilation terminated.

Laplace potential kernel - rinv evaluation

Dear all,

I am starting out with pvFMM and have run into an issue while trying to change the kernel evaluation. I tried to replace the evaluation of rinv = RSQRT_INTRIN with another provided intrinsic, rsqrt_approx_intrin (which, to my understanding, should be more accurate).

I first ran the program without any modification (downloaded 5 March 2018), and it goes through the example1 test smoothly.

./bin/example1 -N 4096

Maximum Absolute Error:2.47002e-05
Maximum Relative Error:4.5345e-09

Then, changing the rinv intrinsic in the kernel evaluation makes the algorithm blow up (after a make clean and a full rebuild).

./bin/example1 -N 4096

Maximum Absolute Error:3.07983e+09
Maximum Relative Error:565400

Running the potential kernel only gives me an error in the precomputation of the boundary conditions:

Cheb_Integ::Failed to converge.[6.93278e-09,-0.975,-0.975,-0.975]
Cheb_Integ::Failed to converge.[6.40075e-10,-0.975,-0.647222,-0.975]
Cheb_Integ::Failed to converge.[2.07315e-09,-0.975,-0.647222,-0.647222]
Cheb_Integ::Failed to converge.[2.08389e-09,-0.975,-0.647222,-0.319444]
...
Cheb_Integ::Failed to converge.[4.40053e-10,-0.975,0.00833333,0.663889]
Cheb_Integ::Failed to converge.[3.17001e-09,-0.975,0.00833333,0.991667]
Cheb_Integ::Failed to converge.[2.80505e-09,-0.975,0.00833333,1.31944]
Cheb_Integ::Failed to converge.[4.60546e-10,-0.975,0.00833333,1.64722]
Cheb_Integ::Failed to converge.[7.76104e-10,-0.975,0.336111,-0.975]
Cheb_Integ::Failed to converge.[8.40628e-11,-0.975,0.336111,-0.647222]
Cheb_Integ::Failed to converge.[1.85214e-09,-0.975,0.336111,-0.319444]
Cheb_Integ::Failed to converge.[4.40053e-10,-0.975,0.336111,0.00833333]

Here is the git diff against the downloaded code. I checked the kernel, and the values from the two intrinsics differ from each other only by 1e-4 to 1e-6.

Do you have any solution to that issue?

diff --git a/include/kernel.txx b/include/kernel.txx
index 7867086..5abd5df 100755
--- a/include/kernel.txx
+++ b/include/kernel.txx
@@ -1088,7 +1088,7 @@ void laplace_poten_uKernel(Matrix<Real_t>& src_coord, Matrix<Real_t>& src_value,
   for(int i=0;i<NWTN_ITER;i++){
     nwtn_scal=2*nwtn_scal*nwtn_scal*nwtn_scal;
   }
-  const Real_t OOFP = 1.0/(4*nwtn_scal*const_pi<Real_t>());
+  const Real_t OOFP = 1.0/(4*const_pi<Real_t>());
 
   size_t src_cnt_=src_coord.Dim(1);
   size_t trg_cnt_=trg_coord.Dim(1);
@@ -1110,7 +1110,7 @@ void laplace_poten_uKernel(Matrix<Real_t>& src_coord, Matrix<Real_t>& src_value,
         r2=add_intrin(r2,mul_intrin(dy,dy));
         r2=add_intrin(r2,mul_intrin(dz,dz));
 
-        Vec_t rinv=RSQRT_INTRIN(r2);
+        Vec_t rinv=rsqrt_approx_intrin(r2);
         tv=add_intrin(tv,mul_intrin(rinv,sv));
       }
       Vec_t oofp=set_intrin<Vec_t,Real_t>(OOFP);
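The nwtn_scal loop in the first hunk hints at how the two intrinsics may differ. If RSQRT_INTRIN refines a low-precision estimate with Newton steps that leave out the usual factor 1/2, each step multiplies the result by a known constant, and that constant is divided out once through OOFP. The following is a minimal scalar sketch of that technique under this assumption. The names `rsqrt_estimate`, `newton_scale` and `rsqrt_unnormalized` are illustrative, not PVFMM functions, and only the recurrence s = 2*s*s*s is taken from the diff.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for a low-precision hardware estimate of 1/sqrt(x): the exact
// value perturbed by a relative error `eps` (hypothetical helper).
double rsqrt_estimate(double x, double eps) {
  return (1.0 + eps) / std::sqrt(x);
}

// Scale factor accumulated after n unnormalized Newton steps: s <- 2*s^3.
double newton_scale(int n) {
  double s = 1.0;
  for (int i = 0; i < n; i++) s = 2.0 * s * s * s;
  return s;
}

// n Newton steps for 1/sqrt(x) with the per-step factor 1/2 left out.
// If y is close to s/sqrt(x) before a step, then y*(3*s^2 - x*y^2) is
// close to 2*s^3/sqrt(x) afterwards, with the relative error squared.
double rsqrt_unnormalized(double x, int n, double eps) {
  assert(x > 0.0);
  double y = rsqrt_estimate(x, eps);
  double s = 1.0;
  for (int i = 0; i < n; i++) {
    y = y * (3.0 * s * s - x * y * y);
    s = 2.0 * s * s * s;
  }
  return y;  // approximately newton_scale(n) / sqrt(x)
}
```

In this sketch, a starting error of 1e-3 stays at 1e-3 with no refinement steps and drops below 1e-10 after two steps once the result is divided by newton_scale(2). An unrefined estimate is therefore the less accurate of the two, and the scale factor has to match the number of steps actually applied.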

EDIT

When using the potential kernel, I managed to reduce the error by removing the scale-invariance boolean and the dbl_layer kernel:

@@ -1405,8 +1405,8 @@ void laplace_grad(T* r_src, int src_cnt, T* v_src, int dof, T* r_trg, int trg_cn
 template<class T> const Kernel<T>& LaplaceKernel<T>::potential(){
-  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1>, laplace_dbl_poten<T,1> >("laplace"     , 3, std::pair<int,int>(1,1),
-      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<T>);
+  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1> >("laplace"     , 3, std::pair<int,int>(1,1),
+      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
   return potn_ker;
 }
 template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
@@ -1418,8 +1418,8 @@ template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
 
 template<> inline const Kernel<double>& LaplaceKernel<double>::potential(){
   typedef double T;
-  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2>, laplace_dbl_poten<T,2> >("laplace"     , 3, std::pair<int,int>(1,1),
-      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<double>);
+  static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2> >("laplace"     , 3, std::pair<int,int>(1,1),
+      NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
   return potn_ker;
 }
 template<> inline const Kernel<double>& LaplaceKernel<double>::gradient(){

gives

Maximum Absolute Error:21.2006
Maximum Relative Error:0.0541196

PS: I am running on macOS and compiled with

./configure MPICXX=mpic++ CXX=icpc CC=icc F77=ifort CXXFLAGS="-mavx -g -std=c++11" CFLAGS="-mavx -g" FFLAGS="-mavx -g" --with-openmp-flag="qopenmp" --with-fftw-include="${FFTW_INC}" --with-fftw-lib="-mkl" --with-blas="-mkl" --with-lapack="-mkl" --disable-doxygen-doc --disable-doxygen-dot --disable-doxygen-html

where FFTW_INC refers to the include directory of MKL/fftw.

SVD still enters infinite loops in some cases

I can't seem to reopen #5 myself, so I opened a new one.

Bad news. Still seems to be happening with the following matrix:

{ {7044.7734691220212, 0, 0}, {0, -1.284570679187241e-322, 57.264113734770199}, {0, 0, 0} }
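One detail of this matrix may be relevant: the entry -1.284570679187241e-322 is a subnormal double, so it is nonzero yet far below machine precision relative to the other entries. The check below is a stand-alone sketch, not taken from the PVFMM SVD code. `negligible` and `is_subnormal` are hypothetical helpers showing a relative test that treats such an entry as zero, whereas an exact comparison with 0 would not.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: true when |a| cannot affect a value of magnitude
// `scale` at double precision.
bool negligible(double a, double scale) {
  assert(scale >= 0.0);
  return std::fabs(a) <= DBL_EPSILON * scale;
}

// True when `a` is a subnormal (denormalized) double.
bool is_subnormal(double a) {
  return std::fpclassify(a) == FP_SUBNORMAL;
}
```

Whether the convergence test in the SVD routine is affected by this is an open question. The sketch only shows that the entry is subnormal and how a tolerance-based test would handle it.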

SVD loops forever in certain cases

I'm not sure how to fix it, but your SVD algorithm isn't incrementing k0 in some cases, and it then enters an infinite loop.

First found it here: http://stackoverflow.com/questions/3856072/single-value-decomposition-implementation-c/25291714#25291714

Segmentation fault when compiled with cuda

Hi, I am trying to compile your code and get a segmentation fault when it is compiled with CUDA.

Software:
Centos 7, mpicxx(openmpi 1.10.0 + gcc 4.8.5), cuda-7.5, nvidia-driver 367.35
Hardware:
Xeon E5 2643 V3 x2, 128GB mem
pvfmm cloned from the github repo.

When compiled with cpu only, the examples run smoothly. When I configured it with cuda, for example:

./configure MPICXX=/usr/lib64/openmpi/bin/mpicxx --prefix=/home_local/wyan_local/software/PVFMM/install --with-cuda=/usr/local/cuda

the examples fail with a segmentation fault. For example, with 1 OpenMP thread:

example1 -N 512

gives

        W-List {
        }
        U-List {
        }
        V-List {
        }
        D2H_Wait:LocExp {
Segmentation fault (core dumped)

I looked at the code a bit, and it seems that the loop copying from dev_ptr to host_ptr at line 681 in fmm_tree.txx causes the segmentation fault:

  Profile::Tic("D2H_Wait:LocExp",this->Comm(),false,5);
  if(device) if(setup_data[0+MAX_DEPTH*2].output_data!=NULL){
    Real_t* dev_ptr=(Real_t*)&fmm_mat->staging_buffer[0];
    Matrix<Real_t>& output_data=*setup_data[0+MAX_DEPTH*2].output_data;
    size_t n=output_data.Dim(0)*output_data.Dim(1);
    Real_t* host_ptr=output_data[0];
    output_data.Device2HostWait();

    #pragma omp parallel for
    for(size_t i=0;i<n;i++){
      host_ptr[i]+=dev_ptr[i];
    }
  }

I have tried moving from OpenMPI 1.10 to 2.0 (latest) and to the latest MPICH. I also configured pvfmm with different gcc/nvcc compile flags, from '-g -O0' to '-O2' to '-mtune=native' and '-gencode arch=compute_52,code=sm_52'. All combinations give the same segmentation fault.

Could you please help me locate the problem?

Thank you,