
elemental's Introduction

Hydrogen

Hydrogen is a fork of Elemental used by LBANN. It is a redux of Elemental's functionality, ported to make use of GPGPU accelerators. The supported functionality is essentially the core infrastructure plus BLAS-1 and BLAS-3.

Building

Hydrogen builds with CMake (version 3.9.0 or newer). The build system respects the "normal" CMake variables (CMAKE_CXX_COMPILER, CMAKE_INSTALL_PREFIX, CMAKE_BUILD_TYPE, etc.) in addition to the Hydrogen-specific options documented below.
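As a point of reference, a minimal CPU-only configuration that relies only on these standard variables might look like the following sketch (compiler and paths are placeholders):

    mkdir build && cd build
    cmake -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_CXX_COMPILER=g++ \
          -DCMAKE_INSTALL_PREFIX=/path/to/my/install \
          /path/to/hydrogen
    make install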

Dependencies

The most basic build of Hydrogen requires only:

  • CMake: Version 3.9.0 or newer.

  • A C++11-compliant compiler.

  • MPI: An MPI-3.0-compliant MPI library.

  • BLAS: Provides basic linear algebra kernels for the CPU code path.

  • LAPACK: Provides a few utility functions (e.g., norms and 2D copies). This could be demoted to "optional" status with little effort.

Optional dependencies of Hydrogen include:

  • Aluminum: Provides asynchronous blocking and non-blocking communication routines with an MPI-like syntax. The use of Aluminum is highly recommended.

  • CUDA: Version 9.2 or newer. Hydrogen primarily uses the runtime API and also grabs some features of NVML and NVPROF (if enabled).

  • CUB: Version 1.8.0 is recommended. This will become required for CUDA-enabled builds in the very near future.

  • Half: Provides IEEE-754 16-bit ("binary16") floating-point support. (Note: This is a work in progress.)

  • OpenMP: OpenMP 3.0 is probably sufficient for the limited use of the features in Hydrogen.

  • VTune: Proprietary profiler from Intel. May provide more detailed annotations to profiles of Hydrogen CPU code.

Hydrogen CMake options

Some of the options are inherited from Elemental with EL_ replaced by Hydrogen_. Others are unique to Hydrogen. Supported options are:

  • Hydrogen_AVOID_CUDA_AWARE_MPI (Default: OFF): There is a very small amount of logic to try to detect CUDA-aware MPI (it should not give a false positive but is likely to give a false negative). This option causes the library to ignore this detection and assume the MPI library is not CUDA-aware.

  • Hydrogen_ENABLE_ALUMINUM (Default: OFF): Enable the Aluminum library for asynchronous device-aware communication. The use of this library is highly recommended for CUDA-enabled builds.

  • Hydrogen_ENABLE_CUDA (Default: OFF): Enable CUDA support in the library. This enables the device type El::Device::GPU and allows memory to reside on CUDA-aware GPGPUs.

  • Hydrogen_ENABLE_CUB (Default: Hydrogen_ENABLE_CUDA): Only available if CUDA is enabled. This enables device memory management through a memory pool using CUB.

  • Hydrogen_ENABLE_HALF (Default: OFF): Enable IEEE-754 "binary16" 16-bit precision floating point support through the Half library.

  • Hydrogen_ENABLE_BFLOAT16 (Default: OFF): This option is a placeholder. This will enable support for "bfloat16" 16-bit precision floating point arithmetic if/when that becomes a thing.

  • Hydrogen_USE_64BIT_INTS (Default: OFF): Use long as the default signed integer type within Hydrogen.

  • Hydrogen_USE_64BIT_BLAS_INTS (Default: OFF): Use long as the default signed integer type for interacting with BLAS libraries.

  • Hydrogen_ENABLE_TESTING (Default: ON): Build the test suite.

  • Hydrogen_ZERO_INIT (Default: OFF): Initialize buffers to zero by default. There will obviously be a compute-time overhead.

  • Hydrogen_ENABLE_NVPROF (Default: OFF): Enable library annotations using the nvtx interface in CUDA.

  • Hydrogen_ENABLE_VTUNE (Default: OFF): Enable library annotations for use with Intel's VTune performance profiler.

  • Hydrogen_ENABLE_SYNCHRONOUS_PROFILING (Default: OFF): Synchronize computation at the beginning of profiling regions.

  • Hydrogen_ENABLE_OPENMP (Default: OFF): Enable OpenMP on-node parallelization primitives. OpenMP is used for CPU parallelization only; the device offload features of modern OpenMP are not used.

  • Hydrogen_ENABLE_OMP_TASKLOOP (Default: OFF): Use omp taskloop instead of omp parallel for. This is a highly experimental feature. Use with caution.

The following options are legacy options inherited from Elemental; the related functionality is not tested regularly. In practice, this means that nothing specific to these options has been removed from what remains of Elemental, but also that nothing specific to them has been added to any of Hydrogen's new features.

  • Hydrogen_ENABLE_VALGRIND (Default: OFF): Search for valgrind and enable related features if found.

  • Hydrogen_ENABLE_QUADMATH (Default: OFF): Search for the quadmath library and enable related features if found. This is for extended-precision computations.

  • Hydrogen_ENABLE_QD (Default: OFF): Search for the QD library and enable related features if found. This is for extended-precision computations.

  • Hydrogen_ENABLE_MPC (Default: OFF): Search for the GNU MPC library (requires MPFR and GMP as well) and enable related features if found. This is for extended precision.

  • Hydrogen_USE_CUSTOM_ALLTOALLV (Default: OFF): Avoid MPI_Alltoallv for performance reasons.

  • Hydrogen_AVOID_COMPLEX_MPI (Default: OFF): Avoid potentially buggy complex MPI routines.

  • Hydrogen_USE_BYTE_ALLGATHERS (Default: OFF): Avoid BG/P allgather performance bug.

  • Hydrogen_CACHE_WARNINGS (Default: OFF): Warns when using cache-unfriendly routines.

  • Hydrogen_UNALIGNED_WARNINGS (Default: OFF): Warn when performing unaligned redistributions.

  • Hydrogen_VECTOR_WARNINGS (Default: OFF): Warn when vector redistribution chances are missed.

Example CMake invocation

The following builds a CUDA-enabled, CUB-enabled, Aluminum-enabled version of Hydrogen:

    cmake -GNinja \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_SHARED_LIBS=ON \
        -DCMAKE_INSTALL_PREFIX=/path/to/my/install \
        -DHydrogen_ENABLE_CUDA=ON \
        -DHydrogen_ENABLE_CUB=ON \
        -DHydrogen_ENABLE_ALUMINUM=ON \
        -DCUB_DIR=/path/to/cub \
        -DAluminum_DIR=/path/to/aluminum \
        /path/to/hydrogen
    ninja install

Reporting issues

Issues should be reported on GitHub.

elemental's People

Contributors

aidangg, aj-prime, andreasnoack, andy-yoo, benson31, bvanessen, davidsblom, davydden, dylanmckinney, gunney1, jakebolewski, jeffhammond, jiahao, justusc, liruipeng, mcg1969, mcneish1, mcopik, naoyam, ndryden, poulson, restrin, rhl-, rocanale, rools32, tbennun, timmoon10, tkelman, xantares, yingzhouli


elemental's Issues

SendRecv dispatch to Aluminum is wrong for in-place operations.

SendRecv cannot use the same buffer for sendbuf and recvbuf (in plain MPI-land, we'd need to use MPI_Sendrecv_replace for this). However, the in-place SendRecv dispatch to Aluminum uses the same pointer. For HostTransfer, this is fine because the host buffers are distinct, but with NCCL, this causes big problems.

We have identified the issue and are working on a fix.
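For reference, a minimal sketch of the distinction described above in plain MPI (illustrative only, not Hydrogen's actual dispatch code):

    // Illustrative only: an in-place exchange must use MPI_Sendrecv_replace;
    // MPI_Sendrecv requires distinct send and receive buffers.
    #include <mpi.h>
    #include <vector>

    void InPlaceExchange(std::vector<double>& buf, int partner, MPI_Comm comm)
    {
        const int n = static_cast<int>(buf.size());

        // Correct: one buffer, overwritten in place with the received data.
        MPI_Sendrecv_replace(buf.data(), n, MPI_DOUBLE,
                             partner, /*sendtag=*/0,
                             partner, /*recvtag=*/0,
                             comm, MPI_STATUS_IGNORE);

        // Incorrect (what the in-place dispatch effectively does): passing the
        // same pointer as both sendbuf and recvbuf to MPI_Sendrecv.
        // MPI_Sendrecv(buf.data(), n, MPI_DOUBLE, partner, 0,
        //              buf.data(), n, MPI_DOUBLE, partner, 0,
        //              comm, MPI_STATUS_IGNORE);
    }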

Issue in headers found when building LBANN

Hi,
I compiled all of the LBANN dependencies from source with CMake. I ran the tests for Hydrogen, which all passed successfully. Then, when I try to compile LBANN, it gets through approximately 65% of the compilation (CMakeFiles/lbann.dir/src/execution_algorithms/kfac/kfac_block_gru.cpp.o) and then fails due to an issue with a header file from Hydrogen (include/El/core/imports/mpi/aluminum_comm.hpp), with the following error message:

[ 66%] Building CXX object CMakeFiles/lbann.dir/src/execution_algorithms/kfac/kfac_block_gru.cpp.o
In file included from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/comm.hpp:8:0,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi.hpp:16,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core.hpp:268,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El.hpp:14,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/lbann-latest/include/lbann/base.hpp:30,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/lbann-latest/include/lbann/execution_algorithms/execution_context.hpp:30,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/lbann-latest/include/lbann/execution_algorithms/kfac/execution_context.hpp:29,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/lbann-latest/include/lbann/execution_algorithms/kfac/kfac_block.hpp:30,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/lbann-latest/include/lbann/execution_algorithms/kfac/kfac_block_gru.hpp:30,
                 from /lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/lbann-latest/src/execution_algorithms/kfac/kfac_block_gru.cpp:28:
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp: In instantiation of ‘El::mpi::AluminumComm::CommSync<BackendT> El::mpi::AluminumComm::GetComm(const hydrogen::SyncInfo<D2>&) const [with BackendT = Al::MPIBackend; hydrogen::Device D = (hydrogen::Device)1]’:
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/lbann-latest/src/execution_algorithms/kfac/kfac_block_gru.cpp:153:69:   required from here
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:207:20: error: no matching function for call to ‘El::mpi::AluminumComm::CommSync<Al::MPIBackend>::CommSync(const hydrogen::SyncInfo<(hydrogen::Device)0>&, const hydrogen::SyncInfo<(hydrogen::Device)1>&, std::__shared_ptr_access<Al::internal::mpi::MPICommunicator, (__gnu_cxx::_Lock_policy)2, false, false>::element_type&)’
             return comm_sync_type(syncinfo, syncinfo_in,
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                   *(comm_map.front().second));
                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:110:9: note: candidate: El::mpi::AluminumComm::CommSync<BackendT>::CommSync(const hydrogen::SyncInfo<El::mpi::AluminumComm::CommSync<BackendT>::D>&, const hydrogen::SyncInfo<El::mpi::AluminumComm::CommSync<BackendT>::D>&, El::mpi::AluminumComm::CommSync<BackendT>::comm_type&) [with BackendT = Al::MPIBackend; El::mpi::AluminumComm::CommSync<BackendT>::comm_type = Al::internal::mpi::MPICommunicator]
         CommSync(SyncInfo<D> const& master, SyncInfo<D> const& other,
         ^~~~~~~~
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:110:9: note:   no known conversion for argument 2 from ‘const hydrogen::SyncInfo<(hydrogen::Device)1>’ to ‘const hydrogen::SyncInfo<(hydrogen::Device)0>&’
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note: candidate: constexpr El::mpi::AluminumComm::CommSync<Al::MPIBackend>::CommSync(const El::mpi::AluminumComm::CommSync<Al::MPIBackend>&)
     struct CommSync
            ^~~~~~~~
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note:   candidate expects 1 argument, 3 provided
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note: candidate: constexpr El::mpi::AluminumComm::CommSync<Al::MPIBackend>::CommSync(El::mpi::AluminumComm::CommSync<Al::MPIBackend>&&)
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note:   candidate expects 1 argument, 3 provided
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:211:16: error: no matching function for call to ‘El::mpi::AluminumComm::CommSync<Al::MPIBackend>::CommSync(const hydrogen::SyncInfo<(hydrogen::Device)0>&, const hydrogen::SyncInfo<(hydrogen::Device)1>&, std::__shared_ptr_access<Al::internal::mpi::MPICommunicator, (__gnu_cxx::_Lock_policy)2, false, false>::element_type&)’
         return comm_sync_type(syncinfo, syncinfo_in, *it->second);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:110:9: note: candidate: El::mpi::AluminumComm::CommSync<BackendT>::CommSync(const hydrogen::SyncInfo<El::mpi::AluminumComm::CommSync<BackendT>::D>&, const hydrogen::SyncInfo<El::mpi::AluminumComm::CommSync<BackendT>::D>&, El::mpi::AluminumComm::CommSync<BackendT>::comm_type&) [with BackendT = Al::MPIBackend; El::mpi::AluminumComm::CommSync<BackendT>::comm_type = Al::internal::mpi::MPICommunicator]
         CommSync(SyncInfo<D> const& master, SyncInfo<D> const& other,
         ^~~~~~~~
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:110:9: note:   no known conversion for argument 2 from ‘const hydrogen::SyncInfo<(hydrogen::Device)1>’ to ‘const hydrogen::SyncInfo<(hydrogen::Device)0>&’
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note: candidate: constexpr El::mpi::AluminumComm::CommSync<Al::MPIBackend>::CommSync(const El::mpi::AluminumComm::CommSync<Al::MPIBackend>&)
     struct CommSync
            ^~~~~~~~
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note:   candidate expects 1 argument, 3 provided
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note: candidate: constexpr El::mpi::AluminumComm::CommSync<Al::MPIBackend>::CommSync(El::mpi::AluminumComm::CommSync<Al::MPIBackend>&&)
/lustre/scafellpike/local/HT04543/jxc06/jxw92-jxc06/hydrogen/build_newompi3cuda11/install/include/El/core/imports/mpi/aluminum_comm.hpp:105:12: note:   candidate expects 1 argument, 3 provided
make[2]: *** [CMakeFiles/lbann.dir/build.make:5172: CMakeFiles/lbann.dir/src/execution_algorithms/kfac/kfac_block_gru.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1805: CMakeFiles/lbann.dir/all] Error 2

I am not really sure how to resolve this issue; I thought it was maybe the wrong version of MPI (I was previously using OpenMPI 4.0.4). I am also not sure whether it is an issue with Aluminum, Hydrogen, or LBANN. I guess something is wrong with the final argument *(comm_map.front().second), but I am not really sure what it is doing. Any help would be greatly appreciated.

Best,
Josh

FYI, Packages used:
openmpi 3.1
gcc 7.2
cuda 11.2
Hydrogen latest version
Aluminum LLNL/Aluminum@8cf6dfb
cmake 3.26.3

Compilation error with Intel 2019.5

Compilation fails at the file Instantiate.cpp because an object of an abstract (purely virtual) class is being created.
CMake options used:

cmake .. -GNinja -D -MATH_LIBS="-L/opt/intel/mkl/lib/intel64 -mkl" -D CMAKE_CXX_COMPILER=/opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc -D CMAKE_C_COMPILER=/opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc -D CMAKE_Fortran_COMPILER=/opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort -D HYDROGEN_HAVE_MKL=ON

Complete error message:
FAILED: CMakeFiles/Hydrogen_CXX.dir/src/core/Instantiate.cpp.o
/opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc -I../include -I../include/El -Iinclude -isystem /opt/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/include -fPIC -Wall -Wextra -Wno-unused-parameter -pedantic -std=gnu++14 -MD -MT CMakeFiles/Hydrogen_CXX.dir/src/core/Instantiate.cpp.o -MF CMakeFiles/Hydrogen_CXX.dir/src/core/Instantiate.cpp.o.d -o CMakeFiles/Hydrogen_CXX.dir/src/core/Instantiate.cpp.o -c ../src/core/Instantiate.cpp
../include/El/core/Matrix/impl_cpu.hpp(79): error: object of abstract class type "El::AbstractMatrix<uint8_t={__uint8_t={unsigned char}}>" is not allowed:
function "El::AbstractMatrix::Copy [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::Construct [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::MemorySize [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::GetDevice [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::MemoryMode [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::SetMemoryMode [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::Buffer() [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::Buffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer() const [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) const [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::CRef [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) const [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::Ref [with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) [with T=uint8_t={_uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::do_get
[with T=uint8_t={_uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::do_set
[with T=uint8_t={_uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::do_empty
[with T=uint8_t={_uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::do_resize
[with T=uint8_t={_uint8_t={unsigned char}}]" is a pure virtual function
function "El::AbstractMatrix::do_swap
[with T=uint8_t={__uint8_t={unsigned char}}]" is a pure virtual function
: AbstractMatrix{std::move(A)},
^
detected during instantiation of "El::Matrix<T, hydrogen::Device::CPU>::Matrix(El::Matrix<T, hydrogen::Device::CPU> &&) [with T=uint8_t={__uint8_t={unsigned char}}]" at line 698

../include/El/core/Matrix/impl_cpu.hpp(79): error: object of abstract class type "El::AbstractMatrix<El::Int={int}>" is not allowed:
function "El::AbstractMatrix::Copy [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::Construct [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::MemorySize [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::GetDevice [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::MemoryMode [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::SetMemoryMode [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::Buffer() [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::Buffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer() const [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) const [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::CRef [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) const [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::Ref [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::do_get_ [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::do_set_ [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::do_empty_ [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::do_resize_ [with T=El::Int={int}]" is a pure virtual function
function "El::AbstractMatrix::do_swap_ [with T=El::Int={int}]" is a pure virtual function
: AbstractMatrix{std::move(A)},
^
detected during instantiation of "El::Matrix<T, hydrogen::Device::CPU>::Matrix(El::Matrix<T, hydrogen::Device::CPU> &&) [with T=El::Int={int}]" at line 114 of "../include/El/macros/Instantiate.h"

../include/El/core/Matrix/impl_cpu.hpp(79): error: object of abstract class type "El::AbstractMatrix<float>" is not allowed:
function "El::AbstractMatrix::Copy [with T=float]" is a pure virtual function
function "El::AbstractMatrix::Construct [with T=float]" is a pure virtual function
function "El::AbstractMatrix::MemorySize [with T=float]" is a pure virtual function
function "El::AbstractMatrix::GetDevice [with T=float]" is a pure virtual function
function "El::AbstractMatrix::MemoryMode [with T=float]" is a pure virtual function
function "El::AbstractMatrix::SetMemoryMode [with T=float]" is a pure virtual function
function "El::AbstractMatrix::Buffer() [with T=float]" is a pure virtual function
function "El::AbstractMatrix::Buffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) [with T=float]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer() const [with T=float]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) const [with T=float]" is a pure virtual function
function "El::AbstractMatrix::CRef [with T=float]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) const [with T=float]" is a pure virtual function
function "El::AbstractMatrix::Ref [with T=float]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) [with T=float]" is a pure virtual function
function "El::AbstractMatrix::do_get_ [with T=float]" is a pure virtual function
function "El::AbstractMatrix::do_set_ [with T=float]" is a pure virtual function
function "El::AbstractMatrix::do_empty_ [with T=float]" is a pure virtual function
function "El::AbstractMatrix::do_resize_ [with T=float]" is a pure virtual function
function "El::AbstractMatrix::do_swap_ [with T=float]" is a pure virtual function
: AbstractMatrix{std::move(A)},
^
detected during instantiation of "El::Matrix<T, hydrogen::Device::CPU>::Matrix(El::Matrix<T, hydrogen::Device::CPU> &&) [with T=float]" at line 122 of "../include/El/macros/Instantiate.h"

../include/El/core/Matrix/impl_cpu.hpp(79): error: object of abstract class type "El::AbstractMatrix<double>" is not allowed:
function "El::AbstractMatrix::Copy [with T=double]" is a pure virtual function
function "El::AbstractMatrix::Construct [with T=double]" is a pure virtual function
function "El::AbstractMatrix::MemorySize [with T=double]" is a pure virtual function
function "El::AbstractMatrix::GetDevice [with T=double]" is a pure virtual function
function "El::AbstractMatrix::MemoryMode [with T=double]" is a pure virtual function
function "El::AbstractMatrix::SetMemoryMode [with T=double]" is a pure virtual function
function "El::AbstractMatrix::Buffer() [with T=double]" is a pure virtual function
function "El::AbstractMatrix::Buffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) [with T=double]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer() const [with T=double]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) const [with T=double]" is a pure virtual function
function "El::AbstractMatrix::CRef [with T=double]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) const [with T=double]" is a pure virtual function
function "El::AbstractMatrix::Ref [with T=double]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) [with T=double]" is a pure virtual function
function "El::AbstractMatrix::do_get_ [with T=double]" is a pure virtual function
function "El::AbstractMatrix::do_set_ [with T=double]" is a pure virtual function
function "El::AbstractMatrix::do_empty_ [with T=double]" is a pure virtual function
function "El::AbstractMatrix::do_resize_ [with T=double]" is a pure virtual function
function "El::AbstractMatrix::do_swap_ [with T=double]" is a pure virtual function
: AbstractMatrix{std::move(A)},
^
detected during instantiation of "El::Matrix<T, hydrogen::Device::CPU>::Matrix(El::Matrix<T, hydrogen::Device::CPU> &&) [with T=double]" at line 125 of "../include/El/macros/Instantiate.h"

../include/El/core/Matrix/impl_cpu.hpp(79): error: object of abstract class type "El::AbstractMatrix<El::Complex>" is not allowed:
function "El::AbstractMatrix::Copy [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Construct [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::MemorySize [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::GetDevice [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::MemoryMode [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::SetMemoryMode [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Buffer() [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Buffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer() const [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) const [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::CRef [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) const [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Ref [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_get_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_set_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_empty_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_resize_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_swap_ [with T=El::Complex]" is a pure virtual function
: AbstractMatrix{std::move(A)},
^
detected during instantiation of "El::Matrix<T, hydrogen::Device::CPU>::Matrix(El::Matrix<T, hydrogen::Device::CPU> &&) [with T=El::Complex]" at line 146 of "../include/El/macros/Instantiate.h"

../include/El/core/Matrix/impl_cpu.hpp(79): error: object of abstract class type "El::AbstractMatrix<El::Complex>" is not allowed:
function "El::AbstractMatrix::Copy [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Construct [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::MemorySize [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::GetDevice [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::MemoryMode [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::SetMemoryMode [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Buffer() [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Buffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer() const [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::LockedBuffer(El::AbstractMatrix::index_type={El::Int={int}}, El::AbstractMatrix::index_type={El::Int={int}}) const [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::CRef [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) const [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::Ref [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::operator()(El::Int={int}, El::Int={int}) [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_get_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_set_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_empty_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_resize_ [with T=El::Complex]" is a pure virtual function
function "El::AbstractMatrix::do_swap_ [with T=El::Complex]" is a pure virtual function
: AbstractMatrix{std::move(A)},
Is there an obvious fix I'm missing?

CMake build fails because of "Unknown option 'pthread'"

I already reported this at CMake.
I noticed that Aluminum v0.1 has the same problem.
Am I doing something wrong? I can't see this behavior documented anywhere. Can a fix be implemented inside the CMakeLists.txt for this? How is Elemental supposed to be built with CMake? I think I have already tried CMake 3.10, 3.13, and 3.14; all of them have this problem.

Build steps and error:

apt-get -y install libopenblas-dev libopenmpi-dev cmake
wget -qO- https://github.com/LLNL/Elemental/archive/v1.0.1.tar.gz | tar -xz && cd Elemental-* && mkdir build && cd $_
export CUB_DIR=$( find /usr/local/ -name 'cub.cuh' | sed 's|/cub/cub\.cuh||' )
cmake -DHydrogen_USE_64BIT_INTS=ON -DHydrogen_ENABLE_OPENMP=ON -DBUILD_SHARED_LIBS=ON -DHydrogen_ENABLE_ALUMINUM=ON -DHydrogen_ENABLE_CUDA=ON ..
    fatal: not a git repository (or any of the parent directories): .git
    -- Found CUB: /usr/local/cuda-10.0/targets/x86_64-linux/include/thrust/system/cuda/detail
    -- Found OpenMP_CXX: -fopenmp
    -- Aluminum support enabled.
    -- Aluminum detected with NCCL2 backend support.
    -- Aluminum detected with MPI-CUDA backend support.
    -- Using __restrict__ keyword.
    -- Found OpenMP_CXX: -fopenmp
    -- A library with BLAS API found.
    -- A library with LAPACK API found.
    -- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;/usr/lib/x86_64-linux-gnu/libopenblas.so
    -- Using BLAS with trailing underscore.
    -- Using LAPACK with trailing underscore.
    BUILD SHARED
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /opt/Elemental-1.0.1/build

make -j 4 install
    Scanning dependencies of target Hydrogen
    [  0%] Building CUDA object CMakeFiles/Hydrogen.dir/src/blas_like/level1/GPU/Axpy.cu.o
    nvcc fatal   : Unknown option 'pthread'
    CMakeFiles/Hydrogen.dir/build.make:62: recipe for target 'CMakeFiles/Hydrogen.dir/src/blas_like/level1/GPU/Axpy.cu.o' failed
    make[2]: *** [CMakeFiles/Hydrogen.dir/src/blas_like/level1/GPU/Axpy.cu.o] Error 1
    CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Hydrogen.dir/all' failed
    make[1]: *** [CMakeFiles/Hydrogen.dir/all] Error 2
    Makefile:140: recipe for target 'all' failed
    make: *** [all] Error 2

Workaround:

cmake <...>
find . -type f | xargs -I{} bash -c 'if grep -q "nvcc.* -pthread" "$0"; then sed -i -r "/nvcc.* -pthread/{s: -pthread( |$): :g}" "$0"; fi' {}
make -j 4 install

Update cmake to find_package(CUDAToolkit)

The current CMakeLists.txt uses both enable_language(CUDA) and find_package(CUDA). The latter is deprecated in favor of FindCUDAToolkit, introduced in CMake 3.17.

A straightforward fix would be to add

    find_package(CUDAToolkit)
    target_link_libraries(cuda::toolkit INTERFACE CUDA::cublas ...)

but CUB still has to be added manually, and the device-specific compile options should be re-visited too.
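A rough sketch of the proposed direction, assuming a target named Hydrogen_CXX purely for illustration (FindCUDAToolkit and the imported CUDA::* targets are available from CMake 3.17 onward):

    # Sketch only; target and library names are illustrative.
    cmake_minimum_required(VERSION 3.17)
    find_package(CUDAToolkit REQUIRED)

    target_link_libraries(Hydrogen_CXX
      PUBLIC
        CUDA::cudart   # CUDA runtime API
        CUDA::cublas)  # GPU BLAS kernels

    # CUB must still be located separately (e.g., via a CUB_DIR hint), and the
    # device-specific compile options need to be revisited, as noted above.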

error: 'Conj' is missing exception specification 'noexcept'

Build breaks:

/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/Element.cpp:115:23: error: 'Conj' is missing exception specification 'noexcept'
Complex<DoubleDouble> Conj( const Complex<DoubleDouble>& alpha )
                      ^
                                                                 noexcept
/usr/ports/math/elemental/work/Elemental-1.5.3/include/El/core/Element/decl.hpp:342:23: note: previous declaration is here
Complex<DoubleDouble> Conj( const Complex<DoubleDouble>& alpha ) EL_NO_EXCEPT;
                      ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/Element.cpp:122:21: error: 'Conj' is missing exception specification 'noexcept'
Complex<QuadDouble> Conj( const Complex<QuadDouble>& alpha )
                    ^
                                                             noexcept
/usr/ports/math/elemental/work/Elemental-1.5.3/include/El/core/Element/decl.hpp:343:21: note: previous declaration is here
Complex<QuadDouble> Conj( const Complex<QuadDouble>& alpha ) EL_NO_EXCEPT;
                    ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/Element.cpp:130:6: error: 'Conj' is missing exception specification 'noexcept'
void Conj
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/include/El/core/Element/decl.hpp:356:6: note: previous declaration is here
void Conj
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/Element.cpp:137:6: error: 'Conj' is missing exception specification 'noexcept'
void Conj
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/include/El/core/Element/decl.hpp:359:6: note: previous declaration is here
void Conj
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/Element.cpp:146:19: error: 'Conj' is missing exception specification 'noexcept'
Complex<BigFloat> Conj( const Complex<BigFloat>& alpha )
                  ^
                                                         noexcept
/usr/ports/math/elemental/work/Elemental-1.5.3/include/El/core/Element/decl.hpp:346:19: note: previous declaration is here
Complex<BigFloat> Conj( const Complex<BigFloat>& alpha ) EL_NO_EXCEPT;
                  ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/Element.cpp:153:6: error: 'Conj' is missing exception specification 'noexcept'
void Conj( const Complex<BigFloat>& alpha, Complex<BigFloat>& alphaConj )
     ^
                                                                          noexcept
/usr/ports/math/elemental/work/Elemental-1.5.3/include/El/core/Element/decl.hpp:364:6: note: previous declaration is here
void Conj
     ^
6 errors generated.

/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:2562:1: error: explicit instantiation of 'TaggedISend' does not refer to a function template, variable template, member function, member class, or static data member
MPI_PROTO_COMPLEX(Complex<BigFloat>)
^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:2507:5: note: expanded from macro 'MPI_PROTO_COMPLEX'
    MPI_PROTO_DEVICELESS_COMPLEX(T)                    \
    ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:2341:19: note: expanded from macro 'MPI_PROTO_DEVICELESS_COMPLEX'
    template void TaggedISend<T>(                                       \
                  ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:708:6: note: candidate template ignored: could not match 'void (const Complex<BigFloat> *, int, int, int, const Comm &, Request<Complex<BigFloat>> &) noexcept' (aka 'void (const Complex<BigFloat> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<BigFloat>> &) noexcept') against 'void (const Complex<Complex<BigFloat>> *, int, int, int, const Comm &, Request<Complex<Complex<BigFloat>>> &) noexcept' (aka 'void (const Complex<Complex<BigFloat>> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<Complex<BigFloat>>> &) noexcept')
void TaggedISend
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:727:6: note: candidate template ignored: could not match 'void (Complex<BigFloat>, int, int, const Comm &, Request<Complex<BigFloat>> &) noexcept' (aka 'void (Complex<BigFloat>, int, int, const El::mpi::AluminumComm &, Request<Complex<BigFloat>> &) noexcept') against 'void (const Complex<Complex<BigFloat>> *, int, int, int, const Comm &, Request<Complex<Complex<BigFloat>>> &) noexcept' (aka 'void (const Complex<Complex<BigFloat>> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<Complex<BigFloat>>> &) noexcept')
void TaggedISend( T b, int to, int tag, Comm const& comm, Request<T>& request )
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:687:6: note: candidate template ignored: requirement 'IsPacked<El::Complex<El::BigFloat>>::value' was not satisfied [with Real = Complex<BigFloat>]
void TaggedISend
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:673:6: note: candidate template ignored: could not match 'void (const Complex<BigFloat> *, int, int, int, const Comm &, Request<Complex<BigFloat>> &) noexcept' (aka 'void (const Complex<BigFloat> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<BigFloat>> &) noexcept') against 'void (const Complex<Complex<BigFloat>> *, int, int, int, const Comm &, Request<Complex<Complex<BigFloat>>> &) noexcept' (aka 'void (const Complex<Complex<BigFloat>> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<Complex<BigFloat>>> &) noexcept')
void TaggedISend
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:2562:1: error: explicit instantiation of 'TaggedISSend' does not refer to a function template, variable template, member function, member class, or static data member
MPI_PROTO_COMPLEX(Complex<BigFloat>)
^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:2507:5: note: expanded from macro 'MPI_PROTO_COMPLEX'
    MPI_PROTO_DEVICELESS_COMPLEX(T)                    \
    ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:2345:19: note: expanded from macro 'MPI_PROTO_DEVICELESS_COMPLEX'
    template void TaggedISSend<T>(                                      \
                  ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:838:6: note: candidate template ignored: could not match 'void (const Complex<BigFloat> *, int, int, int, const Comm &, Request<Complex<BigFloat>> &) noexcept' (aka 'void (const Complex<BigFloat> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<BigFloat>> &) noexcept') against 'void (const Complex<Complex<BigFloat>> *, int, int, int, const Comm &, Request<Complex<Complex<BigFloat>>> &) noexcept' (aka 'void (const Complex<Complex<BigFloat>> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<Complex<BigFloat>>> &) noexcept')
void TaggedISSend
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:856:6: note: candidate template ignored: could not match 'void (Complex<BigFloat>, int, int, const Comm &, Request<Complex<BigFloat>> &) noexcept' (aka 'void (Complex<BigFloat>, int, int, const El::mpi::AluminumComm &, Request<Complex<BigFloat>> &) noexcept') against 'void (const Complex<Complex<BigFloat>> *, int, int, int, const Comm &, Request<Complex<Complex<BigFloat>>> &) noexcept' (aka 'void (const Complex<Complex<BigFloat>> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<Complex<BigFloat>>> &) noexcept')
void TaggedISSend( T b, int to, int tag, Comm const& comm, Request<T>& request )
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:817:6: note: candidate template ignored: requirement 'IsPacked<El::Complex<El::BigFloat>>::value' was not satisfied [with Real = Complex<BigFloat>]
void TaggedISSend
     ^
/usr/ports/math/elemental/work/Elemental-1.5.3/src/core/imports/mpi.cpp:804:6: note: candidate template ignored: could not match 'void (const Complex<BigFloat> *, int, int, int, const Comm &, Request<Complex<BigFloat>> &) noexcept' (aka 'void (const Complex<BigFloat> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<BigFloat>> &) noexcept') against 'void (const Complex<Complex<BigFloat>> *, int, int, int, const Comm &, Request<Complex<Complex<BigFloat>>> &) noexcept' (aka 'void (const Complex<Complex<BigFloat>> *, int, int, int, const El::mpi::AluminumComm &, Request<Complex<Complex<BigFloat>>> &) noexcept')
void TaggedISSend
     ^

Version: 1.5.3
clang-16
FreeBSD 13.2
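The first group of errors above is a declaration/definition mismatch: the header declares Conj with EL_NO_EXCEPT (which presumably expands to noexcept on this toolchain) while the definitions in Element.cpp omit it, and clang-16 rejects that. A minimal illustration of the pattern, not Elemental's actual code:

    // Minimal illustration only (not Elemental's actual code).
    struct Complex { double re, im; };

    // Declaration in a header: promises noexcept.
    Complex Conj(const Complex& alpha) noexcept;

    // The definition must repeat the same exception specification; omitting it
    // triggers "error: 'Conj' is missing exception specification 'noexcept'".
    Complex Conj(const Complex& alpha) noexcept
    {
        return Complex{alpha.re, -alpha.im};
    }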

Dense linear solve with GPUs

I am new to Elemental and I would like to ask whether El::LinearSolve supports GPUs or runs only on CPUs. Thank you. Vinh Dang

It isn't clear how to run tests

With -DHydrogen_ENABLE_TESTING=ON test executables are built, but it isn't clear how to run them.

Is there a build target that runs tests and displays their run summary?
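For what it's worth, the usual CMake/CTest workflow would be the following, assuming the test executables are registered with CTest (the CTest output quoted in a later issue suggests they are):

    # From the build directory, after building with -DHydrogen_ENABLE_TESTING=ON:
    ctest                                # run every registered test
    ctest -R Gemm --output-on-failure    # run a subset by name, show failure output
    # Makefile generators also expose the equivalent "make test" target.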

Recent changes broke fp16 support.

The uncovered issues so far:

  1. PROTO(cpu_half_type) in the L3500-ish range of TranslateBetweenGrids() should be PROTO_DIFF(cpu_half_type).
  2. Fill_GPU_1D_impl is called with T=half_float::half_float. So... that's not good.

Enum for memory modes

We currently specify memory modes for the Memory class with an unsigned int:

void SetMode(unsigned int mode);

Things are messy since the same number means different things on different devices (mode 1 means pinned memory on CPU and CUB memory pool on GPU). This was not a good design (mea culpa). Imagine an idyllic future with something like:

Matrix<float,Device::CPU> A;
A.MemoryMode(); // Returns memory_mode_type::default
A.SetMemoryMode(memory_mode_type::pinned_memory);

This would require making simultaneous changes in LBANN.
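A minimal sketch of what such a scoped enumeration could look like; all names here are hypothetical, and note that "default" itself is a reserved keyword in C++, so the real enumerator would need a different spelling:

    // Hypothetical sketch only; these names do not exist in Hydrogen today.
    enum class memory_mode_type
    {
        default_mode,   // device-appropriate default allocation
        pinned,         // page-locked host memory (CPU path)
        pooled          // CUB memory pool (GPU path)
    };

    class Memory
    {
    public:
        memory_mode_type MemoryMode() const noexcept { return mode_; }
        void SetMemoryMode(memory_mode_type mode) noexcept { mode_ = mode; }
    private:
        memory_mode_type mode_ = memory_mode_type::default_mode;
    };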

What is the status of this project and what are its goals?

The tl;dr version: what is the status of this project and what are its goals? To the best of my knowledge, Jack no longer actively maintains the root Elemental repo, aside from the occasional PR merge. At first glance, the README for this project looks essentially identical to that of the root Elemental repo, but there are clearly new features added to this fork, like support for the Aluminum GPU communication library. Is the intention to continue to maintain this Elemental fork as long as LBANN is maintained?

The LLNL reduced order modeling library libROM is interested in using this fork of Elemental for its distributed memory version of QR factorization with column pivoting (QRCP), but we'd like to have some idea as to whether this fork will be maintained for the foreseeable future before we invest effort in using it.

Project fails to reconfigure with '-DHydrogen_ENABLE_TESTING=ON'

I first build with -DHydrogen_ENABLE_TESTING=OFF, and it succeeds.
Then I reconfigure with -DHydrogen_ENABLE_TESTING=ON in order to build and run the tests.
This fails:

===>  Testing for hydrogen-linear-algebra-1.5.1.29
fatal: ambiguous argument 'hydrogen': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
-- Using __restrict__ keyword.
-- Found LAPACK: /usr/local/lib/libopenblas.so;-lpthread;-lm;-ldl
-- Using BLAS with trailing underscore.
-- Using LAPACK with trailing underscore.
-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_TESTING

Most CMake-based projects can be reconfigured with testing enabled and then run their tests after a build.

It also isn't clear what make target runs tests. The README doesn't say anything about this.

DifferentGridsGeneral tests need work and documentation.

The setup logic in the DifferentGridsGeneral tests needs to be clarified. There were minimal checks that the grid sizes were compatible with the communicator size, and the few I saw were not fatal (MPI will happily blow up, though).

For now (after #158 merges), the tests are skipped. If it's possible to reenable them, we should. I think these will only make sense to add to CTest with the 4-rank version. A single-rank test cannot have different (meaningful, non-overlapping) grids.

Missing -lgfortran when linking tests

Hello,

I am building Hydrogen using CMake 3.15.1 and GNU 7.4.0 (OpenBLAS v0.3.9, OpenMPI 4.0.1, Aluminum v0.3.3). Everything is built from scratch, including GCC; each package is installed in its own directory.

From the command line and errors below, it seems that -lgfortran is missing from the LDFLAGS.

FYI: it is not a blocking issue as I just set LDFLAGS=-lgfortran when running cmake.

Thanks,
-Tristan

$ /opt/gcc/7.4.0/release/install/bin/g++    -Wl,-rpath -Wl,/opt/openmpi/4.0.1/release/install/lib -Wl,--enable-new-dtags -pthread CMakeFiles/EntrywiseMap.dir/blas_like/EntrywiseMap.cpp.o  -o EntrywiseMap -Wl,-rpath,/opt/aluminum/v0.3.3/release/install/lib64:/opt/hwloc/hwloc-2.2.0/release/install/lib:/opt/openmpi/4.0.1/release/install/lib ../libHydrogen_CXX.a /opt/aluminum/v0.3.3/release/install/lib64/libAl.so /opt/hwloc/hwloc-2.2.0/release/install/lib/libhwloc.so /opt/gcc/7.4.0/release/install/lib64/libgomp.so -lpthread
 /opt/openmpi/4.0.1/release/install/lib/libmpi.so /opt/openblas/v0.3.9/release/install/lib64/libopenblas.a -lpthread
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(sgesvd.f.o): In function `sgesvd_':
sgesvd.f:(.text+0x45e): undefined reference to `_gfortran_concat_string'
sgesvd.f:(.text+0x12c6): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(shseqr.f.o): In function `shseqr_':
shseqr.f:(.text+0x5eb): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(sormbr.f.o): In function `sormbr_':
sormbr.f:(.text+0x372): undefined reference to `_gfortran_concat_string'
sormbr.f:(.text+0x413): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(sormbr.f.o):sormbr.f:(.text+0x4be): more undefined references to `_gfortran_concat_string' follow
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(slasd0.f.o): In function `slasd0_':
slasd0.f:(.text+0x71a): undefined reference to `_gfortran_pow_i4_i4'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(slasda.f.o): In function `slasda_':
slasda.f:(.text+0x1243): undefined reference to `_gfortran_pow_i4_i4'
slasda.f:(.text+0x1297): undefined reference to `_gfortran_pow_i4_i4'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(dgesvd.f.o): In function `dgesvd_':
dgesvd.f:(.text+0x45e): undefined reference to `_gfortran_concat_string'
dgesvd.f:(.text+0x12c6): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(dhseqr.f.o): In function `dhseqr_':
dhseqr.f:(.text+0x5ee): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(dormbr.f.o): In function `dormbr_':
dormbr.f:(.text+0x372): undefined reference to `_gfortran_concat_string'
dormbr.f:(.text+0x413): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(dormbr.f.o):dormbr.f:(.text+0x4be): more undefined references to `_gfortran_concat_string' follow
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(dlasd0.f.o): In function `dlasd0_':
dlasd0.f:(.text+0x71a): undefined reference to `_gfortran_pow_i4_i4'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(dlasda.f.o): In function `dlasda_':
dlasda.f:(.text+0x1243): undefined reference to `_gfortran_pow_i4_i4'
dlasda.f:(.text+0x1297): undefined reference to `_gfortran_pow_i4_i4'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(cgesvd.f.o): In function `cgesvd_':
cgesvd.f:(.text+0x44c): undefined reference to `_gfortran_concat_string'
cgesvd.f:(.text+0x76d): undefined reference to `_gfortran_concat_string'
cgesvd.f:(.text+0x11d6): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(chseqr.f.o): In function `chseqr_':
chseqr.f:(.text+0x62c): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(ctrevc3.f.o): In function `ctrevc3_':
ctrevc3.f:(.text+0x278): undefined reference to `_gfortran_concat_string'
/opt/openblas/v0.3.9/release/install/lib64/libopenblas.a(cunmbr.f.o):cunmbr.f:(.text+0x39d): more undefined references to `_gfortran_concat_string' follow
collect2: error: ld returned 1 exit status
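A minimal sketch of the workaround described above; CMake picks up LDFLAGS from the environment on the first configure, and the paths are placeholders:

    LDFLAGS=-lgfortran cmake -DCMAKE_INSTALL_PREFIX=/path/to/install /path/to/hydrogen
    make -j 4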

Support for GPU gathers without the Aluminum MPI-CUDA backend

When I build without the Aluminum MPI-CUDA backend, GPU gathers fail at runtime since NCCL doesn't support gather. I see several options:

  • Write a fallback gather implementation that synchronously copies GPU data to the CPU, calls MPI_Gather, and copies back to the GPU (see the sketch after this list).
  • Make MPI-CUDA a requirement when building Hydrogen with CUDA support. If MPI-CUDA is too unstable, NCCL can take precedence for the operations it supports.
  • How hard is it to make invalid type/device/collective combinations a compile-time error? I'm thinking we could make the default template implementation include a failed static assert.
  • Change the LBANN build script to build with MPI-CUDA by default.
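A rough sketch of the first option, staging through the host with plain MPI and the CUDA runtime; the function name and types are illustrative, not Hydrogen's actual API:

    // Illustrative fallback only: synchronously stage GPU data through the host,
    // gather with plain MPI, and copy the result back to the device on the root.
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <vector>

    void GatherViaHost(const float* device_sendbuf, float* device_recvbuf,
                       int count, int root, MPI_Comm comm)
    {
        int rank = 0, nprocs = 0;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        // Device-to-host copy of the local contribution.
        std::vector<float> host_send(count);
        cudaMemcpy(host_send.data(), device_sendbuf, count * sizeof(float),
                   cudaMemcpyDeviceToHost);

        // Host-side gather with plain MPI.
        std::vector<float> host_recv;
        if (rank == root)
            host_recv.resize(static_cast<size_t>(count) * nprocs);
        MPI_Gather(host_send.data(), count, MPI_FLOAT,
                   host_recv.data(), count, MPI_FLOAT, root, comm);

        // Host-to-device copy of the gathered result on the root rank.
        if (rank == root)
            cudaMemcpy(device_recvbuf, host_recv.data(),
                       host_recv.size() * sizeof(float), cudaMemcpyHostToDevice);
    }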

Cannot build LAPACK examples.

I'm unable to build the lapack_like examples due to the following error:

/home/sameer.deshmukh/gitrepos/Elemental/tests/lapack_like/LU.cpp: In function ‘int main(int, char**)’:
/home/sameer.deshmukh/gitrepos/Elemental/tests/lapack_like/LU.cpp:255:27: error: use of deleted function ‘El::mpi::PlainComm::PlainComm(const El::mpi::PlainComm&)’
     mpi::Comm comm = mpi::COMM_WORLD;
                           ^~~~~~~~~~
In file included from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi/comm.hpp:6:0,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi.hpp:16,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core.hpp:280,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El.hpp:14,
                 from /home/sameer.deshmukh/gitrepos/Elemental/tests/lapack_like/LU.cpp:9:
/home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi/plain_comm.hpp:15:7: note: ‘El::mpi::PlainComm::PlainComm(const El::mpi::PlainComm&)’ is implicitly deleted because the default definition would be ill-formed:
 class PlainComm : public CommImpl<PlainComm>
       ^~~~~~~~~
/home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi/plain_comm.hpp:15:7: error: use of deleted function ‘El::mpi::CommImpl<SpecificCommImpl>::CommImpl(const El::mpi::CommImpl<SpecificCommImpl>&) [with SpecificCommImpl = El::mpi::PlainComm]’
In file included from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi/plain_comm.hpp:5:0,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi/comm.hpp:6,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi.hpp:16,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core.hpp:280,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El.hpp:14,
                 from /home/sameer.deshmukh/gitrepos/Elemental/tests/lapack_like/LU.cpp:9:
/home/sameer.deshmukh/gitrepos/Elemental/include/El/core/imports/mpi/comm_impl.hpp:73:5: note: declared here
     CommImpl(CommImpl<SpecificCommImpl> const&) = delete;
     ^~~~~~~~
/home/sameer.deshmukh/gitrepos/Elemental/tests/lapack_like/LU.cpp:285:50: error: use of deleted function ‘El::mpi::PlainComm::PlainComm(const El::mpi::PlainComm&)’
         const Grid grid( comm, gridHeight, order );
                                                  ^
In file included from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core/Matrix/decl.hpp:13:0,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El/core.hpp:315,
                 from /home/sameer.deshmukh/gitrepos/Elemental/include/El.hpp:14,
                 from /home/sameer.deshmukh/gitrepos/Elemental/tests/lapack_like/LU.cpp:9:
/home/sameer.deshmukh/gitrepos/Elemental/include/El/core/Grid.hpp:20:14: note:   initializing argument 1 of ‘El::Grid::Grid(El::mpi::Comm, int, El::GridOrderNS::GridOrder)’
     explicit Grid(mpi::Comm comm, int height, GridOrder order=COLUMN_MAJOR);

On further investigation I found that src/lapack_like/factor does not get compiled, since the folder has been commented out in src/lapack_like/CMakeLists.txt. Has support for LU been removed?

4 tests fail

Test project /usr/ports/math/elemental/work/.build
      Start  1: Axpy.test
 1/46 Test  #1: Axpy.test ........................................   Passed    0.31 sec
      Start  2: Axpy_mpi_np4.test
 2/46 Test  #2: Axpy_mpi_np4.test ................................   Passed    0.68 sec
      Start  3: BasicGemm.test
 3/46 Test  #3: BasicGemm.test ...................................   Passed   14.27 sec
      Start  4: BasicGemm_mpi_np4.test
 4/46 Test  #4: BasicGemm_mpi_np4.test ...........................   Passed   20.96 sec
      Start  5: ColumnNorms.test
 5/46 Test  #5: ColumnNorms.test .................................   Passed    0.29 sec
      Start  6: ColumnNorms_mpi_np4.test
 6/46 Test  #6: ColumnNorms_mpi_np4.test .........................   Passed    0.39 sec
      Start  7: Dot.test
 7/46 Test  #7: Dot.test .........................................   Passed    0.31 sec
      Start  8: Dot_mpi_np4.test
 8/46 Test  #8: Dot_mpi_np4.test .................................   Passed    0.42 sec
      Start  9: EntrywiseMap.test
 9/46 Test  #9: EntrywiseMap.test ................................   Passed    0.34 sec
      Start 10: EntrywiseMap_mpi_np4.test
10/46 Test #10: EntrywiseMap_mpi_np4.test ........................   Passed    0.30 sec
      Start 11: Gemm.test
11/46 Test #11: Gemm.test ........................................   Passed    1.94 sec
      Start 12: Gemm_mpi_np4.test
12/46 Test #12: Gemm_mpi_np4.test ................................   Passed   10.33 sec
      Start 13: Gemm_Suite.test
13/46 Test #13: Gemm_Suite.test ..................................   Passed    0.35 sec
      Start 14: Gemm_Suite_mpi_np4.test
14/46 Test #14: Gemm_Suite_mpi_np4.test ..........................   Passed    0.32 sec
      Start 15: Gemv.test
15/46 Test #15: Gemv.test ........................................   Passed    0.44 sec
      Start 16: Gemv_mpi_np4.test
16/46 Test #16: Gemv_mpi_np4.test ................................   Passed    0.35 sec
      Start 17: Hadamard.test
17/46 Test #17: Hadamard.test ....................................   Passed    0.36 sec
      Start 18: Hadamard_mpi_np4.test
18/46 Test #18: Hadamard_mpi_np4.test ............................   Passed    0.34 sec
      Start 19: BasicBlockDistMatrix.test
19/46 Test #19: BasicBlockDistMatrix.test ........................   Passed    0.30 sec
      Start 20: BasicBlockDistMatrix_mpi_np4.test
20/46 Test #20: BasicBlockDistMatrix_mpi_np4.test ................   Passed    0.40 sec
      Start 21: Constants.test
21/46 Test #21: Constants.test ...................................   Passed    0.36 sec
      Start 22: Constants_mpi_np4.test
22/46 Test #22: Constants_mpi_np4.test ...........................   Passed    0.30 sec
      Start 23: DifferentGrids.test
23/46 Test #23: DifferentGrids.test ..............................   Passed    0.27 sec
      Start 24: DifferentGrids_mpi_np4.test
24/46 Test #24: DifferentGrids_mpi_np4.test ......................   Passed    0.30 sec
      Start 25: DifferentGridsGeneralAllreduce.test
25/46 Test #25: DifferentGridsGeneralAllreduce.test ..............***Failed    0.31 sec
Rank is 0
Optional arguments:
  --colMajor [bool,1,1,NOT found]
    column-major ordering?

  --colMajorSqrt [bool,1,1,NOT found]
    colMajor sqrt?

  --height [int,50,50,NOT found]
    height of matrix

  --width [int,100,100,NOT found]
    width of matrix

  --print [bool,0,0,NOT found]
    print matrices?

  --iters [int,100,100,NOT found]
    Iterations (default:100)?

  --g1Width [int,1,1,NOT found]
    width of grid 1?

  --g1Height [int,2,2,NOT found]
    height of grid 1?

  --warmup [int,10,10,NOT found]
    warmup iterations?

  --numSubGrids [int,2,2,NOT found]
    number of subgrids?

Out of 0 required arguments, 0 were not specified.
Out of 10 optional arguments, 10 were not specified.

[yv:15118] *** An error occurred in MPI_Group_incl
[yv:15118] *** reported by process [3909287937,0]
[yv:15118] *** on communicator MPI_COMM_WORLD
[yv:15118] *** MPI_ERR_RANK: invalid rank
[yv:15118] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[yv:15118] ***    and potentially your MPI job)

      Start 26: DifferentGridsGeneralAllreduce_mpi_np4.test
26/46 Test #26: DifferentGridsGeneralAllreduce_mpi_np4.test ......   Passed    0.38 sec
      Start 27: DifferentGridsGeneralBroadcastAll.test
27/46 Test #27: DifferentGridsGeneralBroadcastAll.test ...........***Failed    0.28 sec
Rank is 0
Optional arguments:
  --colMajor [bool,1,1,NOT found]
    column-major ordering?

  --colMajorSqrt [bool,1,1,NOT found]
    colMajor sqrt?

  --height [int,50,50,NOT found]
    height of matrix

  --width [int,100,100,NOT found]
    width of matrix

  --print [bool,0,0,NOT found]
    print matrices?

  --iters [int,100,100,NOT found]
    Iterations (default:100)?

  --g1Width [int,1,1,NOT found]
    width of grid 1?

  --g2Width [int,1,1,NOT found]
    width of grid 2?

  --g1Height [int,2,2,NOT found]
    height of grid 1?

  --g2Height [int,2,2,NOT found]
    height of grid 2?

  --warmup [int,10,10,NOT found]
    warmup iterations?

  --numvectors [int,10,10,NOT found]
    number of vectors?

Out of 0 required arguments, 0 were not specified.
Out of 12 optional arguments, 12 were not specified.

[yv:15127] *** An error occurred in MPI_Group_incl
[yv:15127] *** reported by process [3909812225,0]
[yv:15127] *** on communicator MPI_COMM_WORLD
[yv:15127] *** MPI_ERR_RANK: invalid rank
[yv:15127] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[yv:15127] ***    and potentially your MPI job)

      Start 28: DifferentGridsGeneralBroadcastAll_mpi_np4.test
28/46 Test #28: DifferentGridsGeneralBroadcastAll_mpi_np4.test ...***Failed    2.33 sec
Rank is 3
Rank is 1
Rank is 2
Rank is 0
Optional arguments:
  --colMajor [bool,1,1,NOT found]
    column-major ordering?

  --colMajorSqrt [bool,1,1,NOT found]
    colMajor sqrt?

  --height [int,50,50,NOT found]
    height of matrix

  --width [int,100,100,NOT found]
    width of matrix

  --print [bool,0,0,NOT found]
    print matrices?

  --iters [int,100,100,NOT found]
    Iterations (default:100)?

  --g1Width [int,1,1,NOT found]
    width of grid 1?

  --g2Width [int,1,1,NOT found]
    width of grid 2?

  --g1Height [int,2,2,NOT found]
    height of grid 1?

  --g2Height [int,2,2,NOT found]
    height of grid 2?

  --warmup [int,10,10,NOT found]
    warmup iterations?

  --numvectors [int,10,10,NOT found]
    number of vectors?

Out of 0 required arguments, 0 were not specified.
Out of 12 optional arguments, 12 were not specified.

[yv:15132] *** An error occurred in MPI_Group_incl
[yv:15132] *** reported by process [3909746689,2]
[yv:15132] *** on communicator MPI_COMM_WORLD
[yv:15132] *** MPI_ERR_RANK: invalid rank
[yv:15132] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[yv:15132] ***    and potentially your MPI job)
[yv.noip.me:15129] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2198
[yv.noip.me:15129] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[yv.noip.me:15129] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

      Start 29: DifferentGridsGeneralGather.test
29/46 Test #29: DifferentGridsGeneralGather.test .................   Passed    0.35 sec
      Start 30: DifferentGridsGeneralGather_mpi_np4.test
30/46 Test #30: DifferentGridsGeneralGather_mpi_np4.test .........   Passed    0.32 sec
      Start 31: DifferentGridsGeneralScatter.test
31/46 Test #31: DifferentGridsGeneralScatter.test ................***Failed    0.23 sec
Rank is 0
Optional arguments:
  --colMajor [bool,1,1,NOT found]
    column-major ordering?

  --colMajorSqrt [bool,1,1,NOT found]
    colMajor sqrt?

  --height [int,50,50,NOT found]
    height of matrix

  --width [int,100,100,NOT found]
    width of matrix

  --print [bool,0,0,NOT found]
    print matrices?

  --iters [int,100,100,NOT found]
    Iterations (default:100)?

  --g1Width [int,1,1,NOT found]
    width of grid 1?

  --g1Height [int,2,2,NOT found]
    height of grid 1?

  --warmup [int,10,10,NOT found]
    warmup iterations?

  --numSubGrids [int,2,2,NOT found]
    number of subgrids?

  --numMatrix [int,2,2,NOT found]
    number of matrices per subgrid?

  --dim [int,2,2,NOT found]
    dimension of slice dim?

Out of 0 required arguments, 0 were not specified.
Out of 12 optional arguments, 12 were not specified.

[yv:15187] *** An error occurred in MPI_Group_incl
[yv:15187] *** reported by process [3913744385,0]
[yv:15187] *** on communicator MPI_COMM_WORLD
[yv:15187] *** MPI_ERR_RANK: invalid rank
[yv:15187] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[yv:15187] ***    and potentially your MPI job)

      Start 32: DifferentGridsGeneralScatter_mpi_np4.test
32/46 Test #32: DifferentGridsGeneralScatter_mpi_np4.test ........   Passed    0.31 sec
      Start 33: Matrix.test
33/46 Test #33: Matrix.test ......................................   Passed    0.39 sec
      Start 34: Matrix_mpi_np4.test
34/46 Test #34: Matrix_mpi_np4.test ..............................   Passed    0.31 sec
      Start 35: Pow.test
35/46 Test #35: Pow.test .........................................   Passed    0.26 sec
      Start 36: Pow_mpi_np4.test
36/46 Test #36: Pow_mpi_np4.test .................................   Passed    0.29 sec
      Start 37: QDToInt.test
37/46 Test #37: QDToInt.test .....................................   Passed    0.38 sec
      Start 38: QDToInt_mpi_np4.test
38/46 Test #38: QDToInt_mpi_np4.test .............................   Passed    0.35 sec
      Start 39: SafeDiv.test
39/46 Test #39: SafeDiv.test .....................................   Passed    0.32 sec
      Start 40: SafeDiv_mpi_np4.test
40/46 Test #40: SafeDiv_mpi_np4.test .............................   Passed    0.34 sec
      Start 41: Version.test
41/46 Test #41: Version.test .....................................   Passed    0.35 sec
      Start 42: Version_mpi_np4.test
42/46 Test #42: Version_mpi_np4.test .............................   Passed    0.51 sec
      Start 43: HermitianEig.test
43/46 Test #43: HermitianEig.test ................................   Passed    0.55 sec
      Start 44: HermitianEig_mpi_np4.test
44/46 Test #44: HermitianEig_mpi_np4.test ........................   Passed    0.49 sec
      Start 45: Testing Sequential Matrix - float
45/46 Test #45: Testing Sequential Matrix - float ................   Passed    0.07 sec
      Start 46: Testing Sequential Matrix - double
46/46 Test #46: Testing Sequential Matrix - double ...............   Passed    0.08 sec

91% tests passed, 4 tests failed out of 46

Total Test time (real) =  63.88 sec

The following tests FAILED:
	 25 - DifferentGridsGeneralAllreduce.test (Failed)
	 27 - DifferentGridsGeneralBroadcastAll.test (Failed)
	 28 - DifferentGridsGeneralBroadcastAll_mpi_np4.test (Failed)
	 31 - DifferentGridsGeneralScatter.test (Failed)
Errors while running CTest
*** Error code 8

Version: 1.5.2
FreeBSD 13.2

cudaDeviceSynchronize within EL_CHECK_MPI

Is this cudaDeviceSynchronize necessary?

EL_CHECK_CUDA(cudaDeviceSynchronize()); \

It seems to get called every time EL_CHECK_MPI is used, regardless of whether or not it is a debug build. We have seen extraordinarily high latency from cudaDeviceSynchronize, so for performance reasons we do not want it in performance-critical paths.
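
If the synchronization is only intended as a debugging aid, one option would be to compile it out of release builds. A minimal sketch of such a guard (EL_SYNC_DEVICE_FOR_MPI is a hypothetical name, not Hydrogen's actual macro, and EL_RELEASE stands in for whatever release-mode guard the library uses):

#ifdef EL_RELEASE
// Release builds: no implicit device synchronization around MPI calls.
#define EL_SYNC_DEVICE_FOR_MPI() do {} while (false)
#else
// Debug builds: flush outstanding device work so errors surface near the offending call.
#define EL_SYNC_DEVICE_FOR_MPI() EL_CHECK_CUDA(cudaDeviceSynchronize())
#endif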

This may be a correctness problem too. For example, a data exchange between a pair of MPI ranks (argument and buffer names below are illustrative; recvBufDev and sendBufDev are device pointers):

MPI_Request reqs[2];
EL_CHECK_MPI(MPI_Irecv(recvBufDev, count, MPI_DOUBLE, rankX, tag, comm, &reqs[0]));
EL_CHECK_MPI(MPI_Isend(sendBufDev, count, MPI_DOUBLE, rankX, tag, comm, &reqs[1]));
EL_CHECK_MPI(MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE));

should work. However, the cudaDeviceSynchronize issued after MPI_Irecv may never return if the MPI library implements MPI_Irecv with a persistent CUDA kernel that runs until the receive has completed.
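
To make the hazard concrete, this is roughly the ordering the exchange above ends up with once a device-wide synchronize is folded into every wrapped call (a sketch of the effective behaviour, not Hydrogen's literal macro expansion; names as above are illustrative):

MPI_Request reqs[2];
MPI_Irecv(recvBufDev, count, MPI_DOUBLE, rankX, tag, comm, &reqs[0]);
cudaDeviceSynchronize(); // can block forever if the irecv is progressed by a
                         // persistent device kernel that only finishes once the
                         // matching send below has been posted
MPI_Isend(sendBufDev, count, MPI_DOUBLE, rankX, tag, comm, &reqs[1]);
cudaDeviceSynchronize();
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);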

Is there any relevant documentation available?

Is there any relevant documentation available? I couldn't find any in the Hydrogen README, and many pages of Elemental's documentation are no longer accessible. I want to update a project whose dependency on Elemental is outdated so that it works with the latest Hydrogen. Can you help me with this?
