
Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support

Home Page: http://www.qmcpack.org

License: Other

CMake 2.96% Shell 0.75% C++ 61.37% Gnuplot 0.01% Python 16.00% TeX 0.19% Makefile 0.01% PostScript 15.19% HTML 0.01% CSS 0.07% Cuda 1.92% C 0.72% Perl 0.62% GAMS 0.02% Emacs Lisp 0.05% Batchfile 0.01% Dockerfile 0.03% BASIC 0.08% Visual Basic 6.0 0.01%
quantum-monte-carlo electronic-structure c-plus-plus high-performance-computing quantum-chemistry cuda gpu hpc mpi

qmcpack's Introduction


QMCPACK is an open-source production-level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, 2D nanomaterials and solids. The solid-state capabilities include metallic systems as well as insulators. QMCPACK is expected to run well on workstations through to the latest generation supercomputers. Besides high performance, particular emphasis is placed on code quality and reproducibility.

Obtaining and installing QMCPACK

Obtain the latest release from https://github.com/QMCPACK/qmcpack/releases or clone the development source from https://github.com/QMCPACK/qmcpack. A full installation guide and steps to perform an initial QMC calculation are given in the extensive online documentation for QMCPACK.

The CHANGELOG.md describes key changes made in each release as well as any major changes to the development version.

Documentation and support

For more information, consult QMCPACK pages at http://www.qmcpack.org, the manual at https://qmcpack.readthedocs.io/en/develop/index.html, or its sources in the docs directory.

If you have trouble using or building QMCPACK, or have questions about its use, please post to the Google QMCPACK group, create a GitHub issue at https://github.com/QMCPACK/qmcpack/issues or contact a developer.

Learning about Quantum Monte Carlo

To learn about the fundamentals of Quantum Monte Carlo through to their practical application to molecular and solid-state systems with QMCPACK, see the materials and tutorials from our most recent QMC workshop. These include a virtual machine to run examples without having to install QMCPACK yourself, and slides and recorded videos of introductory talks through to spin-orbit QMC.

Citing QMCPACK

Please cite J. Kim et al. J. Phys. Cond. Mat. 30 195901 (2018), https://doi.org/10.1088/1361-648X/aab9c3, and if space allows, P. Kent et al. J. Chem. Phys. 152 174105 (2020), https://doi.org/10.1063/5.0004860 . These papers are both open access.

Installation Prerequisites

  • C++ 17 and C99 capable compilers.
  • CMake v3.21.0 or later, build utility, http://www.cmake.org
  • BLAS/LAPACK, numerical library. Use vendor and platform-optimized libraries.
  • LibXml2, XML parser, http://xmlsoft.org/
  • HDF5 v1.10.0 or later, portable I/O library, http://www.hdfgroup.org/HDF5/
  • BOOST v1.61.0 or newer, peer-reviewed portable C++ source libraries, http://www.boost.org
  • FFTW, FFT library, http://www.fftw.org/
  • MPI, parallel library. Optional, but a near requirement for production calculations.
  • Python3. Older versions are not supported as of January 2020.
  • CUDA v11.0 or later. Optional, but required for builds with NVIDIA GPU support. Use 12.3 or newer if possible. 11.3-12.2 have a bug affecting multideterminant calculations. Single determinant calculations are OK.

We aim to support open source compilers and libraries released within two years of each QMCPACK release. Use of software versions over two years old may work but is discouraged and untested. Proprietary compilers (Intel, NVHPC) are generally supported over the same period but may require use of an exact version. We also aim to support the standard software environments on machines such as Frontier and Summit at OLCF, Aurora and Polaris at ALCF, and Perlmutter at NERSC. Use of the most recently released compilers and library versions is particularly encouraged for highest performance and easiest configuration.

Nightly testing currently includes at least the following software versions:

  • Compilers
    • GCC 13.2.0, 11.4.0
    • Clang/LLVM 17.0.4
  • Boost 1.83.0, 1.77.0
  • HDF5 1.14.3
  • FFTW 3.3.10, 3.3.8
  • CMake 3.27.9, 3.21.4
  • MPI
    • OpenMPI 4.1.6
  • CUDA 12.3

GitHub Actions-based tests include additional version combinations from within our two year support window. On a developmental basis we also check the latest Clang and GCC development versions, AMD Clang and Intel OneAPI compilers.

Workflow tests are currently performed with Quantum Espresso v7.2.0 and PySCF v2.2.0. These check trial wavefunction generation and conversion through to actual QMC runs.

Building with CMake

The build system for QMCPACK is based on CMake. It will auto-configure based on the detected compilers and libraries. When these are installed in standard locations, e.g., /usr, /usr/local, there is no need to set either environment or CMake variables.

See the manual linked at https://qmcpack.readthedocs.io/en/develop/ and https://www.qmcpack.org/documentation or buildable using sphinx from the sources in docs/. A PDF version is still available at https://qmcpack.readthedocs.io/_/downloads/en/develop/pdf/

Quick build

On a standard UNIX-like system such as a Linux workstation:

  • The safest quick-build option is to specify the C and C++ compilers through their MPI wrappers. Here we use Intel MPI and the Intel compilers. Move to the build directory, then run CMake and make:
cd build
cmake -DCMAKE_C_COMPILER=mpiicc -DCMAKE_CXX_COMPILER=mpiicpc ..
make -j 8
  • Substitute mpicc and mpicxx or other wrapped compiler names to suit your system, e.g., with OpenMPI use
cd build
cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx ..
make -j 8
  • Non-MPI build:
cd build
cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DQMC_MPI=0 ..
make -j 8
  • If you are feeling particularly lucky, you can skip the compiler specification:
cd build
cmake ..
make -j 8

The complexities of modern computer hardware and software systems are such that you should check that the auto-configuration system has made good choices and picked optimized libraries and compiler settings before doing significant production runs, i.e., check the details below.
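For example (a minimal sketch; the exact cache variable names on your system may differ), the cached configuration can be inspected from the build directory to confirm which BLAS/LAPACK libraries and compiler flags were picked up:

grep -E "BLAS|LAPACK|CXX_FLAGS" CMakeCache.txt
cmake -LA .. | less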

Set the environment

A number of environment variables affect the build. In particular, they can control the default paths for libraries, the default compilers, etc. The list of environment variables is given below:

Environment variable Description
CXX C++ compiler
CC C Compiler
MKL_ROOT Path for MKL
HDF5_ROOT Path for HDF5
BOOST_ROOT Path for Boost
FFTW_HOME Path for FFTW
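As an illustration only (the paths below are placeholders, not defaults), these variables can be exported before invoking cmake; the build.sh example further down follows the same pattern:

export CXX=mpicxx
export CC=mpicc
export HDF5_ROOT=/opt/hdf5
export BOOST_ROOT=/opt/boost
export FFTW_HOME=/opt/fftw
cd build
cmake ..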

CMake options

In addition to reading the environment variables, CMake provides a number of optional variables that can be set to control the build and configure steps. When passed to CMake, these variables take precedence over the environment and default variables. To set them, add -D FLAG=VALUE to the configure line between the cmake command and the path to the source directory.

  • General build options
    CMAKE_C_COMPILER    Set the C compiler
    CMAKE_CXX_COMPILER  Set the C++ compiler
    CMAKE_BUILD_TYPE    A variable which controls the type of build (defaults to Release).
                        Possible values are:
                        None (Do not set debug/optimize flags, use CMAKE_C_FLAGS or CMAKE_CXX_FLAGS)
                        Debug (create a debug build)
                        Release (create a release/optimized build)
                        RelWithDebInfo (create a release/optimized build with debug info)
                        MinSizeRel (create an executable optimized for size)
    CMAKE_SYSTEM_NAME   Set value to CrayLinuxEnvironment when cross-compiling
                        in Cray Programming Environment.
    CMAKE_C_FLAGS       Set the C flags.  Note: to prevent default debug/release flags
                        from being used, set the CMAKE_BUILD_TYPE=None
                        Also supported: CMAKE_C_FLAGS_DEBUG, CMAKE_C_FLAGS_RELEASE,
                                        CMAKE_C_FLAGS_RELWITHDEBINFO
    CMAKE_CXX_FLAGS     Set the C++ flags.  Note: to prevent default debug/release flags
                        from being used, set the CMAKE_BUILD_TYPE=None
                        Also supported: CMAKE_CXX_FLAGS_DEBUG, CMAKE_CXX_FLAGS_RELEASE,
                                        CMAKE_CXX_FLAGS_RELWITHDEBINFO
  • Key QMCPACK build options
    QMC_COMPLEX           ON/OFF(default). Build the complex (general twist/k-point) version.
    QMC_MIXED_PRECISION   ON/OFF(default). Build the mixed precision (mixing double/float) version
                          Mixed precision calculations can be significantly faster but should be
                          carefully validated against full double precision runs,
                          particularly for large electron counts.
    ENABLE_OFFLOAD        ON/OFF(default). Enable OpenMP target offload for GPU acceleration.
    ENABLE_CUDA           ON/OFF(default). Enable CUDA code path for NVIDIA GPU acceleration.
                          Production quality for AFQMC and real-space performance portable implementation.
    QMC_CUDA2HIP          ON/OFF(default). Map all CUDA kernels and library calls to HIP and use ROCm libraries.
                          Set both ENABLE_CUDA and QMC_CUDA2HIP ON to target AMD GPUs.
    ENABLE_SYCL           ON/OFF(default). Enable the SYCL code path. Only Intel GPUs and oneAPI compilers are supported.
    QMC_GPU_ARCHS         Specify GPU architectures. For example, "gfx90a" targets AMD MI200 series GPUs.
                          "sm_80;sm_70" creates a single executable running on both NVIDIA A100 and V100 GPUs.
                          Mixing vendors, e.g. "gfx90a;sm_70", is not supported. If not set, QMCPACK attempts to derive
                          the value from CMAKE_CUDA_ARCHITECTURES or CMAKE_HIP_ARCHITECTURES if available, and then
                          attempts to auto-detect existing GPUs.

  • Additional QMCPACK options
     QE_BIN              Location of Quantum Espresso binaries including pw2qmcpack.x
     RMG_BIN             Location of RMG binary
     QMC_DATA            Specify data directory for QMCPACK performance and integration tests
     QMC_INCLUDE         Add extra include paths
     QMC_EXTRA_LIBS      Add extra link libraries
     QMC_BUILD_STATIC    ON/OFF(default). Add -static flags to build
     QMC_SYMLINK_TEST_FILES Set to zero to require test files to be copied rather than using
                            the space-saving default of symbolic links. Useful if the build is
                            on a separate filesystem from the source, as required on some HPC systems.
     ENABLE_TIMERS       ON(default)/OFF. Enable fine-grained timers. Timers are on by default but at the
                         coarse level to avoid potential slowdown in tiny systems.
                         For systems beyond tiny sizes (100+ electrons) there is no risk.
  • libxml2 related
     LIBXML2_INCLUDE_DIR Include directory for libxml2
     LIBXML2_LIBRARY     Libxml2 library
  • HDF5 related
     HDF5_PREFER_PARALLEL 1(default for MPI build)/0, enable/disable parallel HDF5 library searching.
     ENABLE_PHDF5         1(default for parallel HDF5 library)/0, enable/disable parallel collective I/O.

  • FFTW related
     FFTW_INCLUDE_DIRS   Specify include directories for FFTW
     FFTW_LIBRARY_DIRS   Specify library directories for FFTW

Example configure and build

In the build directory, run cmake with appropriate options, then make.

  • Using Intel compilers and their MPI wrappers. Assumes HDF5 and libxml2 will be automatically detected.
cd build
cmake -DCMAKE_C_COMPILER=mpiicc -DCMAKE_CXX_COMPILER=mpiicpc ..
make -j 8
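  • A hedged sketch of a GPU-enabled configure using the options described above; sm_80 (NVIDIA A100) is only an example value for QMC_GPU_ARCHS and should be replaced to match your hardware.
cd build
cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx \
      -DENABLE_CUDA=ON -DQMC_GPU_ARCHS=sm_80 ..
make -j 8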

Special notes

It is recommended to create a helper script that contains the configure line for CMake. This is particularly useful when environment variables are used, when packages are installed in custom locations, or when the configure line is long or complex. In this case, it is recommended to add "rm -rf CMake*" before the configure line to remove existing CMake configuration files and ensure a fresh configure each time the script is called. An example script build.sh is given below:

export CXX=mpic++
export CC=mpicc
export HDF5_ROOT=/opt/hdf5
export BOOST_ROOT=/opt/boost

rm -rf CMake*

cmake                                               \
  -D CMAKE_BUILD_TYPE=Debug                         \
  -D LIBXML2_INCLUDE_DIR=/usr/include/libxml2      \
  -D LIBXML2_LIBRARY=/usr/lib/x86_64-linux-gnu/libxml2.so \
  -D FFTW_INCLUDE_DIRS=/usr/include                 \
  -D FFTW_LIBRARY_DIRS=/usr/lib/x86_64-linux-gnu    \
  -D QMC_DATA=/projects/QMCPACK/qmc-data            \
  ..

Additional examples:

Set compile flags manually:

   cmake                                                \
      -D CMAKE_BUILD_TYPE=None                          \
      -D CMAKE_C_COMPILER=mpicc                         \
      -D CMAKE_CXX_COMPILER=mpic++                      \
      -D CMAKE_C_FLAGS="  -O3 -fopenmp -malign-double -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -Wno-deprecated -march=native -mtune=native" \
      -D CMAKE_CXX_FLAGS="-O3 -fopenmp -malign-double -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -Wno-deprecated -march=native -mtune=native" \
      ..

Add extra include directories:

   cmake                                                \
      -D CMAKE_BUILD_TYPE=Release                       \
      -D CMAKE_C_COMPILER=mpicc                         \
      -D CMAKE_CXX_COMPILER=mpic++                      \
      -D QMC_INCLUDE="~/path1;~/path2"                  \
      ..

Testing and validation of QMCPACK

We highly encourage tests to be run before using QMCPACK. Details are given in the QMCPACK manual. QMCPACK includes extensive validation tests to ensure the correctness of the code, compilers, tools, and runtime. The tests should ideally be run with each compilation, and certainly before any research use. The tests include checks of the output against known mean-field, quantum chemistry, and other QMC results.

While some tests are fully deterministic, due to QMCPACK's stochastic nature some tests are statistical and can occasionally fail. We employ a range of test names and labeling to differentiate between these, as well as developmental tests that are known to fail. In particular, "deterministic" tests include this in their ctest test name, while tests known to be unstable (stochastically or otherwise) are labeled unstable using ctest labels.

The tests currently use up to 16 cores in various combinations of MPI tasks and OpenMP threads. Current status for many combinations of systems, compilers, and libraries can be checked at https://cdash.qmcpack.org

Note that due to the small electron and walker counts used in the tests, they should not be used for any performance measurements. These should be made on problem sizes that are representative of actual research calculations. As described in the manual, performance tests are provided to aid in monitoring performance.
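A minimal sketch of selecting them (assuming the performance tests follow the "performance" naming described in the manual and that QMC_DATA was set at configure time):

ctest -R performance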

Run the unit tests

From the build directory, invoke ctest specifying only the unit tests

ctest -j 16 -R unit --output-on-failure

All of these tests should pass within a few minutes. Modify the parallelization setting (-j 16) to suit the core count of your system.

Run the deterministic tests

From the build directory, invoke ctest specifying only tests that are deterministic and known to be reliable.

ctest -j 16 -R deterministic -LE unstable --output-on-failure

These tests currently take a few minutes to run, and include all the unit tests. All tests should pass. Failing tests likely indicate a significant problem that should be solved before using QMCPACK further. This ctest invocation can be used as part of an automated installation verification process. Many of the tests use a multiple of 16 processes, so on large core count machines a significant speedup can be obtained with -j 64 etc.

Run the short (quick) tests

From the build directory, invoke ctest specifying only tests whose names include "short" and that are known to be stable.

ctest -j 16 -R short -LE unstable --output-on-failure

These tests currently take up to around one hour. On average, all tests should pass at a three sigma level of reliability. Any initially failing test should pass when rerun.
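For example, only the previously failing tests can be rerun using ctest's standard option:

ctest --rerun-failed --output-on-failure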

Run individual tests

Individual tests can be run by specifying their name

ctest -R name-of-test-to-run
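For example, using one of the test names that appears in the logs quoted later on this page (any name reported by ctest -N can be substituted):

ctest -R short-LiH_dimer_ae-vmc_hf_noj-16-1 --output-on-failure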

Contributing

Contributions of any size are very welcome. Guidance for contributing to QMCPACK is included in the manual https://qmcpack.readthedocs.io/en/develop/introduction.html#contributing-to-qmcpack. We use a git flow model including pull request reviews. A continuous integration system runs on pull requests. See https://github.com/QMCPACK/qmcpack/wiki for details. For an extensive contribution, it can be helpful to discuss on the Google QMCPACK group, to create a GitHub issue, or to talk directly with a developer in advance.

Contributions are made under the same UIUC/NCSA open source license that covers QMCPACK. Please contact us if this is problematic.

qmcpack's People

Contributors

alfc, anbenali, bkclark, bwvdg, camelto2, correaa, cynthiagu23, fdmalone, hyeondeok-shin, jngkim, jptowns, jtkrogel, kgasperich, kpesler, leonotis, lshulen, markdewing, mcbennet, mcminis1, mmorale3, nsblunt, paul-st-young, pdoakornl, prckent, quantumsteve, rcclay, shivupa, walshmm, williamfgc, ye-luo


qmcpack's Issues

Mixed precision preview

Reported by: ye-luo

Hi all,
The mixed precision has been added for QMCPACK.
Add -D QMC_MIXED_PRECISION=1 to activate it.
Single precision is set as the base precision. Double precision is set as the full precision.

The single precision is used almost everywhere including particle/lattice coordinates, distance tables, wave functions (SPO, determinants, Jastrows), Hamiltonians.
To retain accuracy, a lot of reductions (estimators for energy components, gradient/Laplacian of the WF), the Coulomb/pseudopotential initialization, and the random-walk trajectory updates are in DP.
Uniform and Gaussian RNGs are always in DP, though this is not strictly needed for accuracy; it is useful for checking MP against DP.

A recompute step is introduced to rebuild the inverse matrix for the determinants from scratch, with the inversion done in DP.
By default, the recompute happens at the end of every block, as in the GPU code.

The short tests in the test suite all pass.
The mixed precision code has been tested mainly on solids with real/complex builds using VMC, VMC+drift, and DMC runs
with an SD+J1+J2 wavefunction.
Certain parts of WF optimization need DP; this is not fixed yet.

DP is a fully DP calculation, SP is the DP code with an SP spline, and MP is a mostly SP calculation.

  1. In solid ZrO2 with 144x2 electrons, I checked the complex code.

VMC runs:
LocalEnergy Variance ratio
no drift
DP -953.463912 +/- 0.000598 19.293480 +/- 0.013491 0.0202
SP -953.465124 +/- 0.000534 19.286142 +/- 0.005167 0.0202 Good news to tell
MP-nocompute -953.463879 +/- 0.000757 19.287460 +/- 0.006794 0.0202
MP -953.464368 +/- 0.000717 19.296810 +/- 0.009718 0.0202

with drift
DP -953.464105 +/- 0.000571 19.285185 +/- 0.010345 0.0202
SP -953.464476 +/- 0.000586 19.288203 +/- 0.007079 0.0202
MP-nocompute -953.464267 +/- 0.000510 19.293128 +/- 0.011464 0.0202
MP -953.464501 +/- 0.000635 19.291002 +/- 0.006508 0.0202

DMC runs

tw_id energy error tw_x tw_y tw_z kpoint_id weight

tw0 -955.0420 0.0028 -0.25 0.25 0.25 3 0.5000000
tw1 -955.0408 0.0021 -0.25 -0.25 0.25 2 1.0000000
tw2 -955.0375 0.0025 -0.25 -0.25 -0.25 1 0.5000000

all_tw -955.04027 0.00141
12/ncell -79.58669 0.00012

tw_id energy error tw_x tw_y tw_z kpoint_id weight

tw0 -955.0374 0.0027 -0.25 0.25 0.25 3 0.5000000
tw1 -955.0423 0.0026 -0.25 -0.25 0.25 2 1.0000000
tw2 -955.0344 0.0021 -0.25 -0.25 -0.25 1 0.5000000

all_tw -955.03910 0.00156
12/ncell -79.58659 0.00013

The DMC results are consistent within 0.1 mHa per formula unit.

  2. In solid TiO2 with 864x2 electrons, VMC runs.
    LocalEnergy Variance ratio
    cpu-MP-recompute4 -6513.765656 +/- 0.005351 171.717760 +/- 1.120446 0.0264
    cpu-MP -6513.767989 +/- 0.006393 170.280852 +/- 0.323286 0.0261
    cpu-SP -6513.758155 +/- 0.007937 170.783247 +/- 0.306188 0.0262
    cpu-DP -6513.767100 +/- 0.007118 170.458162 +/- 0.217447 0.0262
    gpu -6513.756115 +/- 0.009468 170.240843 +/- 0.190607 0.0261

In brief, the accuracy is not compromised with the mixed precision code.
Actually, reducing the recompute frequency doesn't seem to hurt the accuracy at all.

By default, mixed precision is switched off and the code should behave the same as trunk.
Please provide feedback under this post.
Ye

Porting CPU spline builder to GPU in the GPU_precision_validation branch

Reported by: ye-luo

In this change, the GPU code also takes advantage of the advanced spline builder used by the CPU code.
Features include loading and FFTing the wavefunction in parallel and dumping the real-space h5 wavefunction.
Tested with both real and complex orbitals.
related files:
QMCWaveFunctions/SplineMixedAdoptorReaderP.h
QMCWaveFunctions/MultiGridBsplineSetReader.h
QMCWaveFunctions/SplineAdoptorReaderP.h
QMCWaveFunctions/EinsplineSetBuilder_createSPOs.cpp
QMCWaveFunctions/BsplineReaderBase.h

Clean up output text

Reported by: markdewing

The output text from QMCPACK needs to be structured better, and only what is useful to an end-user should be shown.

  • The output of the ReportEngine is not likely useful to end-users because it requires internal knowledge of the code to make sense. It might be better if this output were disabled by default, and could be enabled with a command line option (either -v, verbose or -d, debug)

  • XML fragments - there should be no XML going to stdout

CMake MKL issue

Reported by: ye-luo

The current cmake was not able to find the MKL on my desktop.

The MKL detection relies on the environment variable CPATH.
If the MKL include paths are not added to CPATH, the current FindMKL.cmake fails to build the test code even if MKL is there.
The reason I found is that the -mkl flag in the try_compile command is not actually added to the trial build script.
Please have a look.

/soft/apps/packages/mpich2-1.4.1p1-intel/bin/mpicxx -openmp -Wno-deprecated -opt-prefetch -ftz -xHost -o CMakeFiles/cmTryCompileExec1434585437.dir/src_mkl.cxx.o -c /sandbox/opt/qmcdev/trunk/build_real_intel16/CMakeFiles/CMakeTmp/src_mkl.cxx
/sandbox/opt/qmcdev/trunk/build_real_intel16/CMakeFiles/CMakeTmp/src_mkl.cxx(2): catastrophic error: cannot open source file "mkl.h"
#include <mkl.h>
^
compilation aborted for /sandbox/opt/qmcdev/trunk/build_real_intel16/CMakeFiles/CMakeTmp/src_mkl.cxx (code 4)

random number generator

Reported by: mcminis1

On my local machine using seed=-1 gets me the same random number seed every time. Is this true for everyone or is it my install that's busted?

population control for branching

Reported by: mcminis1

I run multiple DMC runs and increase the size of the walker population. The max/min number of walkers per node does not change; targetwalkers does change.

This is frozen to the first value.
Max and mimum walkers per node= 601 61

slurm support on crays

Reported by: prckent

Need to switch from aprun to srun. Likely only needs an extra FIND_PROGRAM(MPIEXEC srun) in CMakeLists.txt if aprun is not found.

Needed for all NERSC systems.
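A minimal CMake sketch of the suggested change (hypothetical; the actual handling in QMCPACK's CMakeLists.txt may differ):

# Prefer aprun when available; otherwise fall back to srun for slurm-based Crays
FIND_PROGRAM(MPIEXEC aprun)
IF(NOT MPIEXEC)
  FIND_PROGRAM(MPIEXEC srun)
ENDIF()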

Regenerating DFT data for diamond tests fails with QE 5.3.0

Reported by: markdewing

In tests/solids/diamondC_1x1x1_pp (and diamondC_2x1x1_pp) when re-running the input file in dft-inputs/, QE 5.3.0 fails with the error:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine cdiaghg (10):
S matrix not positive definite
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Full output is attached.

The pseudopotential file (C.BFD.upf) is not included in the dft-inputs directory. I used the one from pseudopotentials/BFD.

QE 5.2.1 also fails with the same error.

Gamess CI converter fails for nact < 6

Reported by: markdewing

Reported by Kevin Gasperich on the QMC google group, quoted below. Example files also attached.

"I've found a minor issue regarding the QMCPACK converter and GAMESS ALDET type calculations (either SCFTYP=MCSCF and CISTEP=ALDET or CITYP=ALDET). If one is using fewer than six orbitals in the active space (NACT in $DET or $CIDET of the GAMESS input), the QMCPACK converter does not properly parse the GAMESS output.

In searching for a determinant expansion, the converter splits lines in the GAMESS output into strings separated by whitespace. It then looks for the strings "ALPHA", "BETA", and "COEFFICIENT" in positions 0, 2, and 4 of the split line.

For NACT>=6, a line similar to " ALPHA | BETA | COEFFICIENT" will precede the determinant expansion, with varying amounts of whitespace around "ALPHA" and "BETA"; this is parsed correctly.

If NACT=5 the line reads "ALPHA |BETA | COEFFICIENT"

NACT=4:
"ALPHA |BETA | COEFFICIENT"

NACT=3:
"ALPHA|BETA | COEFFICIENT"

NACT=2:
"ALPH|BETA| COEFFICIENT"

For any of these four cases, the lack of whitespace around the pipe characters will cause a problem with the parser: the converter will output "Could not find CI expansion." to stderr, and then it will abort.
The relevant code is in function "getCI" in trunk/src/QMCTools/GamesAsciiParser.cpp

Splitting on "|" instead of whitespace would solve this problem for NACT > 2 (for NACT=2 one would also need to account for the truncated "ALPHA")."
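A minimal C++ sketch of the suggested fix (a hypothetical helper, not the actual GamesAsciiParser.cpp code): split the header line on '|' and trim each field before comparing against ALPHA/BETA/COEFFICIENT.

#include <sstream>
#include <string>
#include <vector>

// Split a GAMESS header line such as "ALPHA|BETA      | COEFFICIENT" on '|'
// and strip surrounding whitespace from each field.
std::vector<std::string> splitOnPipe(const std::string& line)
{
  std::vector<std::string> fields;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, '|'))
  {
    const auto first = field.find_first_not_of(" \t");
    const auto last  = field.find_last_not_of(" \t");
    fields.push_back(first == std::string::npos ? std::string() : field.substr(first, last - first + 1));
  }
  return fields;
}
// A header line is then recognized when the trimmed fields are {"ALPHA", "BETA", "COEFFICIENT"},
// with "ALPH" also accepted to cover the truncated NACT=2 case.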

CMake: MKL not autodetected on edison.nersc.gov

Reported by: prckent

Following build recipe in manual with default Intel PrgEnv:

module load cray-hdf5
module load cmake
module load fftw
export FFTW_HOME=$FFTW_DIR/..
module load boost
mkdir build_edison
cd build_edison
cmake ..

The same recipe works on cori.nersc.gov, where MKL is autodetected. Cray LibSci is used for the link instead; this should be OK performance-wise.

DMC stochastic reconfiguration is incorrect

Reported by: prckent

Ye Luo reports odd behavior of stochastic reconfiguration DMC after equilibration suggesting it is not correct. Stochastic reconfiguration is highly desirable when running on large numbers of nodes and near node memory limits.

Stochastic reconfiguration == DMC with fixed total walker count, weights, periodic replacement of walkers with small weights by (split) walkers with large weights.

After verifying standard DMC algorithms, stochastic reconfiguration branch logic etc. needs to be tested, documented: what is the supported and correct way of running stochastic reconfiguration?

clean_and_link_h5.sh is dangerous

Reported by: prckent

cat utils/clean_and_link_h5.sh
rm -rf $2
ln -s ../$1 $2

clean_and_link_h5.sh is extremely dangerous. A miscall could be disastrous and take out a user's home directory: an unprotected rm -rf!!
Scripts such as these should only delete specified files or specific file prefixes/suffixes, etc.
Please put some protection in so that it only deletes specified files!
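A minimal sketch of a safer variant (hypothetical, not the current script): only remove the destination when it is an existing symbolic link or an .h5 file, and refuse anything else.

#!/bin/bash
# Usage: clean_and_link_h5.sh <source-relative-to-parent> <destination>
set -eu
src="$1"
dest="$2"
if [ -L "$dest" ] || [ ! -e "$dest" ]; then
  rm -f "$dest"          # safe: removes only a symlink, or nothing
elif [[ "$dest" == *.h5 && -f "$dest" ]]; then
  rm -f "$dest"          # safe: removes only a regular .h5 file
else
  echo "Refusing to remove '$dest'" >&2
  exit 1
fi
ln -s "../$src" "$dest"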

Spline bug

Reported by: jtkrogel

Problem:

  • Large spikes in the LocalEnergy (from Kinetic and NonLocalECP) lead to population explosions in DMC

Physical system:

  • 32 atom rhombohedral cell of MnO
  • B-spline orbitals from espresso (LDA+U)
  • Krogel-Santana-Reboredo pseudopotentials

Observations:

  • Problem reduces in frequency with increasing meshfactor (vanishes for meshfactor=2.0)
  • Problem frequency varies w/ U (max occurrence at U=5.5 eV)
  • Several other condensed phase calculations involving Mn do not show this issue
  • Moving to double precision (cpu) does not affect the issue
  • A version of QMCPACK prior to the introduction of SplineAdoptor classes segfaults during initialization (revision 5594)

Likely issue:

  • B-splines are formed incorrectly for a small part of phase/orbital space, leading to large energy spikes

Debugging approach:

  • Use TraceManager output to find the size of single walker kinetic energy spikes
    ** Insert the following after and before in the input file:
    ** Use a Python post-processing tool (pop_traces.py) to plot the per walker LE's/KE's (see pop_traces*.png)
    *** set "path" and "prefix" at the head of pop_traces.py
    *** Note: requires h5py and LOTS of memory (>30 GB)
  • Insert sentinel code in, e.g., QMCHamiltonian
    ** Watch for a walker w/ a LE/KE larger than some fraction of the spike (for MnO this is LE<-5000 Ha)
    ** Write the electron coordinates and the walker's energy components to file
    ** Useful quantities are: per electron KE's and momenta, orbital values, total energy component values
    ** Abort after one or more such walkers is found
  • Debug b-spline orbitals using the identified electron coords
    ** Use electron configs to evaluate local KE of each orbital
    ** Hopefully this shows in which spline cell the problem lies

Attached files:

  • Mn.opt.upf -- KSR PP for Mn in UPF format
  • Mn.opt.xml -- KSR PP for Mn in FSAtom format
  • O.opt.upf -- KSR PP for O in UPF format
  • O.opt.xml -- KSR PP for O in FSAtom format
  • scf.in -- input to QM Espresso for MnO primitive cell
  • scf.out -- output from QM Espresso
  • scf.qsub.in -- original submission file used w/ QM Espresso
  • p2q.in -- input to pw2qmcpack
  • p2q.out -- output from pw2qmcpack
  • p2q.qsub.in -- original submission file used w/ pw2qmcpack
  • qmc.in.xml -- input to QMCPACK
  • qmc.out -- log output from QMCPACK
  • qmc.qsub.in -- original submission file used w/ QMCPACK
  • pop_traces.py -- tool to plot per walker energies
  • qmca__energy_vs_step.png -- QMCA trace of DMC local energy vs. ensemble step (qmca -t -q e --noac ./long_trace/*dmc.dat)
  • qmca__weight_vs_step.png -- QMCA trace of DMC ensemble weight vs. ensemble step (qmca -t -q w --noac ./long_trace/*.dmc.dat)
  • pop_traces__energy_vs_step.png -- pop_traces plot of per walker local energy vs. step
  • pop_traces__kinetic_vs_step.png -- pop_traces plot of per walker kinetic energy vs. step

Location of full dataset:

  • vesta.alcf.anl.gov
  • /projects/QMCPACk-Training/transfer/spline_bug_MnO

CUDA build not compatible with new adaptive LMY optimizer

Reported by: prckent

Need to have same "state of the art" optimizer in both CPU and CUDA builds.

Disabled in cmake in commit 7334. Appears to be some missing functionality (functions), but likely easy to fix, not requiring new actual GPU code.

AtomicBasisBuilder.h changeset 7101 breaks build on many platforms

Reported by: prckent

At least unexpected use of auto. Possibly this is because C++11 is not force enabled in most builds (BGQ, Clang on mac OK). If this is the only location I would simply use the explicit type, although we could discuss forcing the C++ standard now.

/.../trunk/src/QMCWaveFunctions/MolecularOrbitals/AtomicBasisBuilder.h:90:8: error: ‘tmp_addsignforM’ does not name a type
auto tmp_addsignforM=addsignforM;
^

bcc H all electron test fails with CUDA

Reported by: prckent

Total energy is incorrect due to bad electron-ion energy. Other quantities are OK.

  Start 10: short-bccH_1x1x1_ae-vmc_sdj-1-16

10/24 Test #10: short-bccH_1x1x1_ae-vmc_sdj-1-16 ................................ Passed 118.58 sec
Start 11: short-bccH_1x1x1_ae-vmc_sdj-1-16-totenergy
11/24 Test #11: short-bccH_1x1x1_ae-vmc_sdj-1-16-totenergy ......................***Failed 0.11 sec
Start 12: short-bccH_1x1x1_ae-vmc_sdj-1-16-samples
12/24 Test #12: short-bccH_1x1x1_ae-vmc_sdj-1-16-samples ........................ Passed 0.16 sec

[pk7 @oxygen short-bccH_1x1x1_ae-vmc_sdj-1-16]$ qmca qmc_short.s000.scalar.dat qmc-ref/qmc_short.s000.scalar.dat

qmc_short series 0
LocalEnergy = -1.5590 +/- 0.0013
Variance = 0.876 +/- 0.023
Kinetic = 0.1847 +/- 0.0016
LocalPotential = -1.74372 +/- 0.00054
ElecElec = -0.77596 +/- 0.00030
IonIon = -0.96 +/- 0.00
ElecIon = -0.00487 +/- 0.00042
LocalEnergy_sq = 3.308 +/- 0.020
BlockWeight = 960.00 +/- 0.00
BlockCPU = 0.11195 +/- 0.00023
AcceptRatio = 0.94546 +/- 0.00014
Efficiency = 19690.95 +/- 0.00
TotalTime = 111.95 +/- 0.00
TotalSamples = 960000 +/- 0

qmc-ref/qmc_short series 0
LocalEnergy = -1.83368 +/- 0.00029
Variance = 0.06250 +/- 0.00027
Kinetic = 0.1845 +/- 0.0017
LocalPotential = -2.0182 +/- 0.0016
ElecElec = -0.77579 +/- 0.00030
IonIon = -0.96 +/- 0.00
ElecIon = -0.2795 +/- 0.0015
LocalEnergy_sq = 3.4250 +/- 0.0012
BlockWeight = 960.00 +/- 0.00
BlockCPU = 0.0015212 +/- 0.0000018
AcceptRatio = 0.94410 +/- 0.00014
Efficiency = 29520407.02 +/- 0.00
TotalTime = 1.52 +/- 0.00
TotalSamples = 960000 +/- 0

Prevent Jastrow cutoffs from ever being larger than the inscribing radius of the simulation cell

Reported by: jtkrogel

QMCPACK currently allows Jastrow cutoffs to be any value specified by the user.

Cutoffs larger than the inscribing radius of the simulation cell can result in extremely unphysical energies if periodic boundary conditions are used (a new user stumbled across this and routinely got -2040 Ha in VMC instead of the correct -1913 Ha, with rcut 2x too large).

The current default behavior if no rcut is provided is wrong: QMCPACK defaults to the Wigner radius, which is often (always?) larger than the inscribing radius. It is not known how badly this choice affects the energy, but given the user's extreme experience, it is clearly best to fix this problem immediately.
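For reference, a minimal C++ sketch (not QMCPACK's internal code) of the geometric check implied here: the largest safe cutoff under periodic boundary conditions is the cell's inscribing radius, i.e. half the smallest distance between opposite cell faces.

#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static Vec3 cross(const Vec3& a, const Vec3& b)
{
  return {a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]};
}

// Half the minimum spacing between opposite faces of the cell spanned by a1, a2, a3.
double inscribingRadius(const Vec3& a1, const Vec3& a2, const Vec3& a3)
{
  const double vol = std::abs(dot(a1, cross(a2, a3)));
  const double d1  = vol / std::sqrt(dot(cross(a2, a3), cross(a2, a3)));
  const double d2  = vol / std::sqrt(dot(cross(a3, a1), cross(a3, a1)));
  const double d3  = vol / std::sqrt(dot(cross(a1, a2), cross(a1, a2)));
  return 0.5 * std::min({d1, d2, d3});
}
// A Jastrow cutoff rcut should satisfy rcut <= inscribingRadius(a1, a2, a3);
// larger values wrap around the cell and give unphysical energies.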

Check_scalars.py reports "nan" deviation when test data exactly matches expected data, e.g. TotalSamples

Reported by: prckent

Not likely to be consequential, but good to avoid nans:

5: Test status: pass
5: Tests for series 0
5: Testing quantity: TotalSamples
5: reference mean value : 1600000.00000000
5: reference error bar : 0.00000000
5: computed mean value : 1600000.00000000
5: computed error bar : 0.00000000
5: pass tolerance : 0.00000000 ( 3.00000000 sigma)
5: deviation from reference : 0.00000000 ( -nan sigma)
5: error bar of deviation : 0.00000000
5: significance probability : -nan (gaussian statistics)
5: status of this test : pass

CMAKE_CXX_FLAGS override not working yet/currently

Reported by: prckent

To reproduce:

cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_FLAGS="-fopenmp -malign-double -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -Wno-deprecated -march=native -mtune=native" ..

Will not result in final configured CXX_FLAGS featuring -march=native etc. will be replaced by internal detected versions.

Open BC bug fixed in GPU code in the GPU_precision_validation branch

Reported by: ye-luo

Fixed a very trivial bug in QMCWaveFunctions/Jastrow/BsplineJastrowCuda.cu.
Now open BC with a spline WF works the same as in the CPU code.
The code was tested on an isolated atom; the result was compared against CPU results.
We need to include test cases in our ctest suite to avoid future code breakage.

Spline bug the second

Reported by: jtkrogel

Note: the crystal structure associated with this ticket is to remain private among the developers

Problem

  • Large VMC variance encountered when using real code at the gamma point
  • Other real valued twists are fine with the real code
  • All twists are fine with the complex code (i.e. gamma run w/ real code is broken, exact same input is fine for complex code)

Observations

  • Problem is apparent w/ Jastrow (twist0 is gamma)

    oahu>qmca -q ev *J3*scalar*
                                LocalEnergy               Variance           ratio 
    dmcJ3_comp_twist0  series 0  -2573.984458 +/- 0.018028   62.769322 +/- 0.380993   0.0244 
    dmcJ3_comp_twist1  series 0  -2573.956292 +/- 0.054766   62.173276 +/- 0.363645   0.0242 

    dmcJ3_real_twist0  series 0  -2272.833662 +/- 2.041285   1099920.558714 +/- 19477.842043   483.9424 
    dmcJ3_real_twist1  series 0  -2574.016425 +/- 0.018762   63.536917 +/- 0.316388   0.0247 
  • And also w/o Jastrow

    oahu>qmca -q ev *J0*scalar*
                                LocalEnergy               Variance           ratio 
    dmcJ0_comp_twist0  series 0  -2554.831042 +/- 0.073880   406.263676 +/- 6.761861   0.1590 
    dmcJ0_comp_twist1  series 0  -2554.869480 +/- 0.046005   407.847141 +/- 5.939818   0.1596 

    dmcJ0_real_twist0  series 0  -2184.008348 +/- 2.959968   1320686.004763 +/- 49900.421665   604.7074 
    dmcJ0_real_twist1  series 0  -2554.944868 +/- 0.106715   404.221088 +/- 3.652783   0.1582 
  • The issue is localized to the kinetic energy

    oahu>qmca -q k dmcJ0_*_twist0.*scalar*
    dmcJ0_comp_twist0  series 0  Kinetic               =  1549.597363 +/- 0.363819 
    dmcJ0_real_twist0  series 0  Kinetic               =  1919.979739 +/- 3.012542 

    oahu>qmca -q p dmcJ0_*_twist0.*scalar*
    dmcJ0_comp_twist0  series 0  LocalPotential        =  -4104.428404 +/- 0.371510 
    dmcJ0_real_twist0  series 0  LocalPotential        =  -4103.988086 +/- 0.514058 
  • In contrast to the first "spline bug", the energy and variance are significantly larger at each and every step

Other details

  • Workflow is SCF->PW2QMCPACK->QMCPACK
  • PWSCF version 5.1 was used to generate the wavefunction (LDA+U)
  • QMCPACK revision 7044 was used for the VMC runs
  • All runs were performed on EOS at OLCF

Files

  • File: bug_package.tgz|File: bug_package.tgz -- tar file w/ all files
  • Original orbital h5 file not included, but it can be transferred
  • Mn.opt.upf -- Mn PP in upf format
  • Mn.opt.xml -- Mn PP in FSAtom format
  • Ni.opt.upf -- Ni PP in upf format
  • Ni.opt.xml -- Ni PP in FSAtom format
  • O.opt.upf -- O PP in upf format
  • O.opt.xml -- O PP in FSAtom format
  • scf.in -- PWSCF input file
  • scf.out -- PWSCF log output
  • p2q.in -- pw2qmcpack input file
  • p2q.out -- pw2qmcpack log output
  • Pattern for QMCPACK files
    ** J0/J3 => No Jastrow/3 body Jastrow
    ** comp/real => complex code/real code
    ** twist0/twist1 => gamma twist/non-gamma twist
    ** .in.xml => input file
    ** .out => log output
    ** .qsub.in => EOS submission file
    ** .s000.scalar.dat => VMC output data
  • All files

dmcJ0_comp_twist0.in.xml           dmcJ0_real_twist1.qsub.in          dmcJ3_real_twist1.in.xml
dmcJ0_comp_twist0.out              dmcJ0_real_twist1.s000.scalar.dat  dmcJ3_real_twist1.out
dmcJ0_comp_twist0.qsub.in          dmcJ3_comp_twist0.in.xml           dmcJ3_real_twist1.qsub.in
dmcJ0_comp_twist0.s000.scalar.dat  dmcJ3_comp_twist0.out              dmcJ3_real_twist1.s000.scalar.dat
dmcJ0_comp_twist1.in.xml           dmcJ3_comp_twist0.qsub.in          Mn.opt.upf
dmcJ0_comp_twist1.out              dmcJ3_comp_twist0.s000.scalar.dat  Mn.opt.xml
dmcJ0_comp_twist1.qsub.in          dmcJ3_comp_twist1.in.xml           Ni.opt.upf
dmcJ0_comp_twist1.s000.scalar.dat  dmcJ3_comp_twist1.out              Ni.opt.xml
dmcJ0_real_twist0.in.xml           dmcJ3_comp_twist1.qsub.in          O.opt.upf
dmcJ0_real_twist0.out              dmcJ3_comp_twist1.s000.scalar.dat  O.opt.xml
dmcJ0_real_twist0.qsub.in          dmcJ3_real_twist0.in.xml           p2q.in
dmcJ0_real_twist0.s000.scalar.dat  dmcJ3_real_twist0.out              p2q.out
dmcJ0_real_twist1.in.xml           dmcJ3_real_twist0.qsub.in          scf.in
dmcJ0_real_twist1.out              dmcJ3_real_twist0.s000.scalar.dat  scf.out

File: bug_package.tgz|File: bug_package.tgz

Defaults for linear optimization are required

Reported by: jtkrogel

Set sensible defaults for as many input parameters as possible for the "linear" optimizer. Preferable to complete prior to the school as newcomers (or veterans!) shouldn't have to turn many knobs. If this is not simple to do then the optimizer should either be fixed or very detailed explanations/examples of driving the optimizer should be included in the manual.

Can not specify total number of walkers in VMC

Reported by: prckent

The current inputs do not allow for specifying the sum total number of walkers. Inputs are only per MPI, modulo minimum 1 walker/thread default. It is impossible to specify a fixed total amount of work for tests or "strong scaling" or to easily achieve a fixed amount of statistics. Inputs must be changed to achieve this.

  • VMC, DMC, and other QMCDrivers should be consistent and hand-off to each other correctly.
  • Check: samples, threading (due to current min walker count/thread)
  • Should remain possible to specify walkers/mpi because this determines computational efficiency and is how most of us work
  • Need to be clear in manual what the normalizations are for all inputs (total, per mpi, or per thread)

Suggested fix: total_walkers input

Test failure with mixed precision on KNL

Reported by: naromero77

A number of tests are failing with the mixed precision complex build. Most of the failures seem to be an energy exceeding three sigma.

This is occurring on KNL with the Intel 17 Update 1 compiler.

Attached files:
  • Hyperion build script, configure log, and build log: build_hyperion.sh, configure-hyperion.log, build-hyperion.log
  • Long tests summary: tests_long.log
  • Long tests log file: LastTest.log
  • Short tests log file: all_tests.log
  • Unit tests log file: unit_test.log

I am attaching my build script and several regression test log files.

Anouar has tested the double precision version and it does not seem to have this issue. The solution to getting that "correct" answer in mixed precision is to run longer --- is this really the path forward?

GPU kernel performance tuned in the GPU_precision_validation branch

Reported by: ye-luo

1. Faster application of the phase in splines
In the complex-to-real wavefunction case, phases are applied after all the splines are evaluated.
The old implementation checks whether a spline needs to be copied twice or not for the complex-to-real wavefunction and writes back to memory almost sequentially.
My solution is to compute the copy index only when the EinsplineSet class is built and then send it to the GPU; this is a very small piece of constant memory. This kernel now drops from 20% to ~3% of the calculation time for both small and large system sizes.
related files:
QMCWaveFunctions/EinsplineSet.h
QMCWaveFunctions/EinsplineSetCuda.cpp
QMCWaveFunctions/PhaseFactors.cu

2. J1/J2 PBC performance tuning by
a) reducing unnecessary slow-memory accesses; faster at any size.
b) using a larger block size (32->128) to increase occupancy. The only downside noticed is that J1 is slower on small systems, but by less than 0.5% of the total computational time.
c) distributing work to multiple blocks; always faster at any size.
related files:
QMCWaveFunctions/Jastrow/BsplineJastrowCudaPBC.cu

3. Determinant update CUDA kernel 2 performance tuning: distribute work to multiple blocks to maximize bandwidth utilization. Always faster, with larger gains on large systems.
related files:
QMCWaveFunctions/Fermion/determinant_update.cu

4. Spline performance tuned by reducing register usage. Noticed that more registers are used starting with CUDA 6.5 (or maybe 6.0). Always faster.
einspline/multi_bspline_cuda_c_impl.h

Overall, VMC is typically 15% faster and DMC 30% faster.

Enabling reconfiguration DMC on GPU in the GPU_precision_validation branch

Reported by: ye-luo

The main problem before was that the output was written to the wrong files because the series id was not properly updated.
Now that's fixed, so the same input file used for the CPU code, with a few reconfiguration blocks of DMC for equilibration followed by population-fluctuation DMC for the main calculation, can also be used in GPU calculations. The CPU and GPU code now have the same reconfiguration behavior.
In addition, there were inconsistent definitions of the DMC input max_walkers (currently not documented) in the code. This has been corrected. The current meaning is the maximum number of walkers per node. It can be used to avoid exhausting memory.

related files
QMCDrivers/SimpleFixedNodeBranch.cpp
QMCDrivers/WalkerControlBase.cpp
QMCDrivers/DMC/WalkerControlFactory.cpp

Allows more than needed k points for supertwists in the GPU_precision_validation branch

Reported by: ye-luo

This is a general change for both CPU and GPU calculations.

The twist analyzer first checks all the k-points in the h5 file and then classifies them by supertwist based on the tiling matrix.
In a supercell-size-4 calculation, 4 k-points should correspond to 1 supertwist. The analyzer checks that every sorted supertwist has 4 k-points.

When you have one extra k-point in your nscf calculation, the analyzer will find that its corresponding supertwist has only one k-point (the extra one you just added) and the code aborts.
My change is to not abort if this supertwist is not the one needed by the current calculation, but only print a message. This allows me to add extra k-points to my DFT calculation.
I made this change because I wanted to try a hybrid functional in pwscf to generate the wavefunction, and I can't run separate scf and nscf calculations.

CMake: CMake rerun on multiple makes when QMC_CUDA=1

Reported by: prckent

It appears that cmake is being rerun at the make step, at least on occasion, after updates associated with the optimizer merge. This is a change of behavior and very probably a bug.

Can someone reproduce?

[pk7 @oxygen build_gcc_cuda]$ cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DQMC_CUDA=1 ..
-- CMAKE_BUILD_TYPE is RELEASE
-- defining the float point precision
Base precision = double
Full precision = double
CUDA base precision = float
CUDA full precision = double
-- LMY engine is not compatiable with CUDA build! Disabling LMY engine
-- Current revision : 7333 modified on
...
[pk7 @oxygen build_gcc_cuda]$ make -j 24
Scanning dependencies of target qmcfakerng
[ 0%] Building NVCC (Device) object src/einspline/CMakeFiles/einspline.dir/einspline_generated_bspline_create_cuda.cu.o
[ 0%] Building NVCC (Device) object src/einspline/tests/CMakeFiles/cudatests.dir/cudatests_generated_test_cuda.cu.o
Scanning dependencies of target getSupercell
[ 1%] Building NVCC (Device) object src/CMakeFiles/qmcutil.dir/Numerics/CUDA/qmcutil_generated_cuda_inverse.cu.o
[...
[100%] Built target test_hamiltonian
[pk7 @oxygen build_gcc_cuda]$ make # <---- should rescan the dependencies and build nothing. Instead readds the tests, recompiles/links
-- CMAKE_BUILD_TYPE is RELEASE
-- defining the float point precision
Base precision = double
Full precision = double
CUDA base precision = float
...
Adding test long-monoO_1x1x1_pp-vmc_sdj-1-16
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pk7/projects/qmc/qmcdev-fresh/build_gcc_cuda
[ 0%] Building NVCC (Device) object src/einspline/CMakeFiles/einspline.dir/einspline_generated_bspline_create_cuda.cu.o
[ 1%] Building NVCC (Device) object src/einspline/CMakeFiles/einspline.dir/einspline_generated_multi_bspline_create_cuda.cu.o
[ 2%] Linking CXX static library ../../lib/libeinspline.a
[ 5%] Built target einspline
...

Update minimum CMake version to 2.8.11

Reported by: markdewing

When adding the unit testing, I unintentionally used some functionality that is only available in cmake 2.8.11 and later (TARGET_COMPILE_DEFINITIONS). Looking online, this functionality would be difficult to simulate or replace.

The build on Titan is failing because of this issue. The default version of CMake on Titan is 2.8.10.2. Newer versions of cmake are available (2.8.11.2 works, I suspect 3.2.3 works)

The required version of CMake should be updated from 2.8.10 to 2.8.11.
Are there any objections to this change?

Vet/fix pseudopotential database

Reported by: jtkrogel

Current (BFD) pseudopotentials were included using files converted from 2011 (perhaps as early as 2009). These converted PPs are untested.

I recently tried the C UPF PP from the database but could not converge graphene in Quantum Espresso. I had converted the PP myself at some other point with ppconvert and it works fine (seemingly).

The pseudopotentials in the database need to be tested. At the very least they should be regenerated with the current version of ppconvert and searched for differences.

Code crashes if spline file not present. Should be error trapped.

Reported by: prckent

As a usability feature, the opening of the HDF5 file should be properly error trapped and a sensible error message printed. The code currently crashes, and the source of the error might not be clear to users.

[ Logical extension of this ticket: error trap every file open in the application ]
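A minimal sketch of the kind of error trapping requested (a hypothetical helper, not the current QMCPACK code), using the plain HDF5 C API:

#include <hdf5.h>
#include <cstdio>
#include <cstdlib>

// Open an orbital/spline h5 file read-only, printing a clear message and
// exiting instead of crashing later if the file cannot be opened.
hid_t openSplineFileOrAbort(const char* filename)
{
  H5Eset_auto2(H5E_DEFAULT, nullptr, nullptr); // silence HDF5's internal error stack
  const hid_t file = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);
  if (file < 0)
  {
    std::fprintf(stderr,
                 "Error: could not open spline/orbital file '%s'.\n"
                 "Check that the path given in the input XML is correct.\n",
                 filename);
    std::exit(EXIT_FAILURE);
  }
  return file;
}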

Spline memory decomposition between GPU and CPU in the GPU_precision_validation branch

Reported by: ye-luo

Now, in the complex-to-real spline wavefunction case, it is possible to set a memory size limit to transfer only part of the spline table to the GPU, leave the rest on the CPU side, and do zero-copy access during computation. This functionality removes the limitation of the GPU memory size. However, the zero-copy access introduces a performance penalty. Users should find the balance between the amount of memory used by the walkers and by the spline table.
related files:
einspline/multi_bspline_cuda_c_impl.h
einspline/multi_bspline_structs_cuda.h
einspline/multi_bspline_create_cuda.cu
QMCWaveFunctions/EinsplineSetBuilder_createSPOs.cpp
Platforms/devices.h
CUDA/gpu_misc.cpp
CUDA/gpu_misc.h

Check the maximal number of electrons-ion pairs in the pseudopotential core radius in the GPU_precision_validation branch

Reported by: ye-luo

The limit was previously hard coded in the NonLocalECPotential_CUDA class as MaxPairs = 2 * NumElecs.
Now the limit has been raised to 3 * NumElecs, and the code aborts and prints an error message when the limit is exceeded.
The vectors sized by MaxPairs store not only the ion-electron pairs but also ion-quadrature-point pairs, so the old limit sometimes created problems.
This is useful when studying high-pressure systems and elements that require a large number of quadrature points.

Major performance failure with GNU compilers and spline basis

Reported by: prckent

QMC runs using splines and GNU are 2.5-3 times slower than with Intel, i.e., something is badly wrong with performance.
Both Intel 2015 and 2016 give similar performance. GNU 4.8.3 is installed on oxygen, where I found this problem.

The problem needs to be reproduced on another machine.

Candidate theories: (1) flags to compilers are wrong, (2) vectorization is badly broken/not activated with GNU builds, (3) quirk on oxygen.

Steps to reproduce: the difference is already visible in the short tests with a similar ratio to the "long" tests. Gaussian basis runs are only moderately slower.

LiH gaussian dimer all electron: 11.2 vs 9.22 secs
LiH 1x1x1 gamma splines: 30.6 vs 9.81 secs
bccH 1x1x1 splines, long : 6260.94 vs 2053.87 secs

Below, output from "ctest --timeout 36000"

gcc:

--- Testing build_gcc
Test project /home/pk7/projects/qmc/qmcdev/trunk/build_gcc
Start 1: short-LiH_dimer_ae-vmc_hf_noj-16-1
1/59 Test #1: short-LiH_dimer_ae-vmc_hf_noj-16-1 .............................. Passed 11.20 sec
...
Start 21: short-LiH_solid_1x1x1_pp-gamma-vmc_hf_noj-1-16
21/59 Test #21: short-LiH_solid_1x1x1_pp-gamma-vmc_hf_noj-1-16 .................. Passed 30.63 sec
...
Start 45: long-bccH_1x1x1_ae-vmc_sdj-1-16
45/59 Test #45: long-bccH_1x1x1_ae-vmc_sdj-1-16 ................................. Passed 6260.94 sec
Start 46: long-bccH_1x1x1_ae-vmc_sdj-1-16-totenergy
46/59 Test #46: long-bccH_1x1x1_ae-vmc_sdj-1-16-totenergy ....................... Passed 0.08 sec
Start 47: long-bccH_1x1x1_ae-vmc_sdj-1-16-samples
47/59 Test #47: long-bccH_1x1x1_ae-vmc_sdj-1-16-samples ......................... Passed 0.14 sec
Start 48: long-diamondC_1x1x1_pp-vmc_sdj-1-16
48/59 Test #48: long-diamondC_1x1x1_pp-vmc_sdj-1-16 ............................. Passed 6913.34 sec
Start 49: long-diamondC_1x1x1_pp-vmc_sdj-1-16-totenergy
49/59 Test #49: long-diamondC_1x1x1_pp-vmc_sdj-1-16-totenergy ................... Passed 0.13 sec
Start 50: long-diamondC_1x1x1_pp-vmc_sdj-1-16-samples
50/59 Test #50: long-diamondC_1x1x1_pp-vmc_sdj-1-16-samples ..................... Passed 0.13 sec
Start 51: long-diamondC_2x1x1_pp-vmc_sdj-1-16
51/59 Test #51: long-diamondC_2x1x1_pp-vmc_sdj-1-16 ............................. Passed 5628.58 sec
Start 52: long-diamondC_2x1x1_pp-vmc_sdj-1-16-totenergy
52/59 Test #52: long-diamondC_2x1x1_pp-vmc_sdj-1-16-totenergy ................... Passed 0.15 sec
Start 53: long-diamondC_2x1x1_pp-vmc_sdj-1-16-samples
53/59 Test #53: long-diamondC_2x1x1_pp-vmc_sdj-1-16-samples ..................... Passed 0.12 sec
Start 54: long-hcpBe_1x1x1_pp-vmc_sdj-1-16
54/59 Test #54: long-hcpBe_1x1x1_pp-vmc_sdj-1-16 ................................ Passed 5712.80 sec
Start 55: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-totenergy
55/59 Test #55: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-totenergy ...................... Passed 0.08 sec
Start 56: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-samples
56/59 Test #56: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-samples ........................ Passed 0.15 sec
Start 57: long-monoO_1x1x1_pp-vmc_sdj-1-16
57/59 Test #57: long-monoO_1x1x1_pp-vmc_sdj-1-16 ................................ Passed 5313.34 sec
Start 58: long-monoO_1x1x1_pp-vmc_sdj-1-16-totenergy
58/59 Test #58: long-monoO_1x1x1_pp-vmc_sdj-1-16-totenergy ...................... Passed 0.09 sec
Start 59: long-monoO_1x1x1_pp-vmc_sdj-1-16-samples
59/59 Test #59: long-monoO_1x1x1_pp-vmc_sdj-1-16-samples ........................ Passed 0.12 sec

100% tests passed, 0 tests failed out of 59

Intel 2016:

--- Testing build_intel2016
Test project /home/pk7/projects/qmc/qmcdev/trunk/build_intel2016
Start 1: short-LiH_dimer_ae-vmc_hf_noj-16-1
1/59 Test #1: short-LiH_dimer_ae-vmc_hf_noj-16-1 .............................. Passed 9.22 sec
...
Start 21: short-LiH_solid_1x1x1_pp-gamma-vmc_hf_noj-1-16
21/59 Test #21: short-LiH_solid_1x1x1_pp-gamma-vmc_hf_noj-1-16 .................. Passed 9.81 sec
...
Start 45: long-bccH_1x1x1_ae-vmc_sdj-1-16
45/59 Test #45: long-bccH_1x1x1_ae-vmc_sdj-1-16 ................................. Passed 2053.87 sec
Start 46: long-bccH_1x1x1_ae-vmc_sdj-1-16-totenergy
46/59 Test #46: long-bccH_1x1x1_ae-vmc_sdj-1-16-totenergy ....................... Passed 0.12 sec
Start 47: long-bccH_1x1x1_ae-vmc_sdj-1-16-samples
47/59 Test #47: long-bccH_1x1x1_ae-vmc_sdj-1-16-samples ......................... Passed 0.14 sec
Start 48: long-diamondC_1x1x1_pp-vmc_sdj-1-16
48/59 Test #48: long-diamondC_1x1x1_pp-vmc_sdj-1-16 ............................. Passed 2052.95 sec
Start 49: long-diamondC_1x1x1_pp-vmc_sdj-1-16-totenergy
49/59 Test #49: long-diamondC_1x1x1_pp-vmc_sdj-1-16-totenergy ................... Passed 0.08 sec
Start 50: long-diamondC_1x1x1_pp-vmc_sdj-1-16-samples
50/59 Test #50: long-diamondC_1x1x1_pp-vmc_sdj-1-16-samples ..................... Passed 0.15 sec
Start 51: long-diamondC_2x1x1_pp-vmc_sdj-1-16
51/59 Test #51: long-diamondC_2x1x1_pp-vmc_sdj-1-16 ............................. Passed 2064.24 sec
Start 52: long-diamondC_2x1x1_pp-vmc_sdj-1-16-totenergy
52/59 Test #52: long-diamondC_2x1x1_pp-vmc_sdj-1-16-totenergy ................... Passed 0.08 sec
Start 53: long-diamondC_2x1x1_pp-vmc_sdj-1-16-samples
53/59 Test #53: long-diamondC_2x1x1_pp-vmc_sdj-1-16-samples ..................... Passed 0.14 sec
Start 54: long-hcpBe_1x1x1_pp-vmc_sdj-1-16
54/59 Test #54: long-hcpBe_1x1x1_pp-vmc_sdj-1-16 ................................ Passed 1940.90 sec
Start 55: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-totenergy
55/59 Test #55: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-totenergy ...................... Passed 0.08 sec
Start 56: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-samples
56/59 Test #56: long-hcpBe_1x1x1_pp-vmc_sdj-1-16-samples ........................ Passed 0.12 sec
Start 57: long-monoO_1x1x1_pp-vmc_sdj-1-16
57/59 Test #57: long-monoO_1x1x1_pp-vmc_sdj-1-16 ................................ Passed 1939.02 sec
Start 58: long-monoO_1x1x1_pp-vmc_sdj-1-16-totenergy
58/59 Test #58: long-monoO_1x1x1_pp-vmc_sdj-1-16-totenergy ...................... Passed 0.08 sec
Start 59: long-monoO_1x1x1_pp-vmc_sdj-1-16-samples
59/59 Test #59: long-monoO_1x1x1_pp-vmc_sdj-1-16-samples ........................ Passed 0.11 sec

100% tests passed, 0 tests failed out of 59

Slow CPU code on Titan

Reported by: ye-luo

Following my discussion with Paul on the GNU performance issue, I just realized that building the CPU version of QMCPACK on Titan resulted in a slow binary because the AMD vector math library was not linked properly as it was before.
I have to use my old binary and hope someone can provide a build script or get it fixed in the CMake setup.

Ye

General spin dependence for 2-body Bspline Jastrow

Reported by: jtkrogel

Original implementation only allowed for uu==dd + ud correlations (dd was explicitly ignored if entered in input). A recent update (revision 6789) partially fixes this issue as VMC now works with dd parameters supplied (tried with dd params explicitly set to previously optimized uu params).

The remaining problem to fix is that the optimizer does not update dd parameters, e.g. if the initial guess is 0 for all bspline coefficients they remain at zero through all opt cycles and energies/variances remain high compared to a uu==dd + ud jastrow.

See src/QMCWaveFunctions/Jastrow/TwoBodyJastrowOrbital.cpp.

Fully recovers the double precision capability in the GPU_precision_validation branch

Reported by: ye-luo

The double precision computation of the GPU code is fully recovered.
I noticed last year that the Coulomb interaction included in the Hamiltonian always requires double precision on medium to large systems.
A macro CUDA_COULOMB_PRECISION was introduced to control this part and is always set to double. The performance penalty of this change is negligible even on small systems.
The precision of the rest of the GPU code is controlled by CUDA_PRECISION. It was set to float by default to use single precision. Now you can use double precision by adding -DCUDA_PRECISION=double to the cmake line.
related files:
einspline/multi_bspline_eval_cuda.h
einspline/multi_bspline_cuda_d_impl.h
QMCWaveFunctions/Jastrow/BsplineJastrowCudaPBC.cu

Size-consistent t-moves are not implemented

Reported by: prckent

Need to implement and add a switch between the different implementations.

Should assess size of this error between the different schemes.

CASINO has size-consistent t-moves

GCC OpenMPI converter tests failing on oxygen due to missing mpi library / bad path

Reported by: prckent

mpirun being used on converters(?) but some part of environment not set, or wrong mpi is used?

Return code nonzero: 127
Stderr not emptry
/scratch/pk7/QMCPACK_CI_BUILDS_DO_NOT_REMOVE/build_gcc/trunk/build_gcc/bin/convert4qmc: error while loading shared libraries: libmpi_cxx.so.1: cannot open shared object file: No such file or directory

mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

Traceback (most recent call last):
File "converter_test.py", line 110, in
ret = run_one_converter_test(args.exe)
File "converter_test.py", line 90, in run_one_converter_test
expect_fail)
File "converter_test.py", line 57, in run_test
if not filecmp.cmp(gold_file, test_file):
File "/usr/lib64/python2.7/filecmp.py", line 43, in cmp
s2 = _sig(os.stat(f2))
OSError: [Errno 2] No such file or directory: 'test.Gaussian-G2.xml'

Intel builds with Intel MPI on same system are OK.

QMCPACK no longer builds on edison

Reported by: prckent

Following the recipe in the manual (which worked 1 Feb 2016) we run into what looks to be a compiler settings issue:
Config:
module load cray-hdf5
module load cmake
module load fftw
export FFTW_HOME=$FFTW_DIR/..
module load boost
cmake ..
make -j 8

make output:

[ 5%] Building CXX object src/CMakeFiles/qmcbase.dir/Lattice/Uniform3DGridLayout.cpp.o
/usr/include/c++/4.3/ext/new_allocator.h(114): error: a value of type "long" cannot be used to initialize an entity of type "qmcplusplus::Uniform3DGridLayout::Grid_t *"
{ ::new((void *)__p) _Tp(std::forward<_Args>(__args)...); }
^
detected during:
instantiation of "void __gnu_cxx::new_allocator<_Tp>::construct(__gnu_cxx::new_allocator<_Tp>::pointer, _Args &&...) [with _Tp=qmcplusplus::Uniform3DGridLayout::Grid_t *, _Args=]" at line 704 of "/usr/include/c++/4.3/bits/stl_vector.h"
instantiation of "void std::vector<_Tp, _Alloc>::push_back(_Args &&...) [with _Tp=qmcplusplus::Uniform3DGridLayout::Grid_t *, _Alloc=std::allocator<qmcplusplus::Uniform3DGridLayout::Grid_t *>, _Args=]" at line 88 of "/global/homes/p/pkent/projects/qmc/qmcdev/trunk/src/Lattice/Uniform3DGridLayout.h"
