
Umpire's Introduction

Umpire v2024.02.1


Umpire is a resource management library that allows the discovery, provision, and management of memory on machines with multiple memory devices like NUMA and GPUs.
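A minimal sketch of typical usage (the allocation size and pointer type here are arbitrary illustrations; see the tutorial for the canonical examples):

```cpp
#include "umpire/ResourceManager.hpp"
#include "umpire/Allocator.hpp"

int main()
{
  // Every Umpire application starts from the ResourceManager singleton.
  auto& rm = umpire::ResourceManager::getInstance();

  // "HOST" is the built-in CPU memory resource; GPU builds also expose
  // resources such as "DEVICE" and "UM".
  umpire::Allocator allocator = rm.getAllocator("HOST");

  // Allocate and release 1024 doubles through the allocator.
  auto* data = static_cast<double*>(allocator.allocate(1024 * sizeof(double)));
  allocator.deallocate(data);
  return 0;
}
```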

Umpire uses CMake and BLT to handle builds. Since BLT is included as a submodule, first make sure you run:

$ git submodule init && git submodule update
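Equivalently, a fresh checkout can pull the submodule in one step (repository URL taken from the project's LLNL/Umpire home):

```shell
$ git clone --recursive https://github.com/LLNL/Umpire.git
```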

Then, make sure that you have a modern compiler loaded, and the configuration is as simple as:

$ mkdir build && cd build
$ cmake ..

CMake will provide output about which compiler is being used. Once CMake has completed, Umpire can be built with Make:

$ make

For more advanced configuration you can use standard CMake variables.
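For example, a hypothetical Release build with CUDA enabled and a custom install location might look like the following (option names taken from the configure lines reported in the issues below):

```shell
$ mkdir build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=../install \
        -DENABLE_CUDA=On \
        ..
$ make -j && make install
```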

Documentation

Both user and code documentation are available here.

The Umpire tutorial provides a step-by-step introduction to Umpire features.

If you have build problems, we have comprehensive build system documentation too!

Getting Involved

Umpire is an open-source project, and we welcome contributions from the community.

Mailing List

The Umpire mailing list is hosted on Google Groups, and is a great place to ask questions.

Contributions

We welcome all kinds of contributions: new features, bug fixes, documentation edits; it's all great!

To contribute, open a pull request with develop as the destination branch. We use Travis to run CI tests, and your branch must pass these tests before being merged.

For more information, see the contributing guide.

Authors

Thanks to all of Umpire's contributors.

Umpire was created by David Beckingsale ([email protected]).

Citing Umpire

If you are referencing Umpire in a publication, please use the following citation:

Release

Umpire is released under an MIT license. For more details, please see the LICENSE and RELEASE files.

LLNL-CODE-747640 OCEC-18-031


Umpire's Issues

build error on lassen

On lassen, configured with

cmake3 -DCMAKE_INSTALL_PREFIX=../install_xlC_lassen -DENABLE_CUDA=On -DENABLE_OPENMP=Off -DCMAKE_CXX_COMPILER=xlC -DCMAKE_C_COMPILER=xlc -DENABLE_CUDA=On -DCMAKE_CUDA_FLAGS="-arch sm_70" -DENABLE_C=On ../
-- Using CMake version 3.14.7
-- BLT Version: 0.3.6
-- CMake Version: 3.14.7
-- CMake Executable: /usr/bin/cmake3
-- Git Support is ON
-- Git Executable: /usr/tcetmp/bin/git
-- Git Version: 2.29.1
-- MPI Support is OFF
-- OpenMP Support is Off
-- CUDA Support is On
-- CUDA Version:       10.1
-- CUDA Toolkit Root Dir: /usr/tce/packages/cuda/cuda-10.1.243
-- CUDA Compiler:      /usr/tce/packages/cuda/cuda-10.1.243/bin/nvcc
-- CUDA Host Compiler: /usr/tce/packages/xl/xl-2021.03.11/bin/xlC
-- CUDA Include Path:  /usr/tce/packages/cuda/cuda-10.1.243/include
-- CUDA Libraries:     /usr/tce/packages/cuda/cuda-10.1.243/lib64/libcudart_static.a;dl;/usr/lib64/librt.so
-- CUDA Compile Flags: -arch sm_70
-- CUDA Link Flags:    -L/usr/tce/packages/cuda/cuda-10.1.243/lib64
-- CUDA Separable Compilation:  OFF
-- CUDA Link with NVCC:         
-- CUDA Implicit Link Libraries:   
-- CUDA Implicit Link Directories: 
-- HIP Support is Off
-- HCC Support is OFF
-- Sphinx support is ON
-- Valgrind support is ON
-- AStyle support is ON
-- Failed to locate AStyle executable (missing: ASTYLE_EXECUTABLE) 
-- ClangFormat support is ON
-- Failed to locate ClangFormat executable (missing: CLANGFORMAT_EXECUTABLE) 
-- Uncrustify support is ON
-- Failed to locate Uncrustify executable (missing: UNCRUSTIFY_EXECUTABLE) 
-- Yapf support is ON
-- Failed to locate Yapf executable (missing: YAPF_EXECUTABLE) 
-- CMakeFormat support is ON
-- Failed to locate CMakeFormat executable (missing: CMAKEFORMAT_EXECUTABLE) 
-- Cppcheck support is ON
-- Failed to locate Cppcheck executable (missing: CPPCHECK_EXECUTABLE) 
-- ClangQuery support is Off
-- ClangTidy support is ON
-- Failed to locate ClangTidy executable (missing: CLANGTIDY_EXECUTABLE) 
-- C Compiler family is XL
-- Adding optional BLT definitions and compiler flags
-- Setting CMAKE_CXX_EXTENSIONS to Off
-- Standard C++11 selected
-- Enabling all compiler warnings on all targets.
-- Fortran support disabled.
-- CMAKE_C_FLAGS flags are:  -qthreaded    
-- CMAKE_CXX_FLAGS flags are:  -qthreaded -std=c++11     
-- CMAKE_EXE_LINKER_FLAGS flags are:  
-- Google Test Support is ON
-- Google Mock Support is On
-- Memcheck suppressions file: /usr/workspace/li50/Umpire/cmake/valgrind.supp
-- Setting C standard to 99
-- Checking for std::filesystem
-- std::filesystem NOT found, using POSIX
-- Host Shared Memory Disabled
-- Configuring done

Error

[ 20%] Built target blt_cuda_runtime_smoke
/usr/workspace/li50/Umpire/src/umpire/op/CudaMemsetOperation.cpp:41:10: error: no matching constructor for initialization of 'camp::resources::EventProxy<camp::resources::Resource>'
  return camp::resources::EventProxy<camp::resources::Resource>{ctx};
         ^                                                     ~~~~~
/usr/workspace/li50/Umpire/src/umpire/tpl/camp/include/camp/resource.hpp:165:5: note: candidate constructor not viable: no known conversion from 'camp::resources::Resource' to 'camp::resources::v1::Resource *' for 1st argument; take the address of the argument with &
    EventProxy(Res* r) :
    ^
/usr/workspace/li50/Umpire/src/umpire/tpl/camp/include/camp/resource.hpp:160:5: note: candidate constructor not viable: no known conversion from 'camp::resources::Resource' to 'camp::resources::v1::EventProxy<camp::resources::v1::Resource>' for 1st argument
    EventProxy(EventProxy &&) = default;
    ^
/usr/workspace/li50/Umpire/src/umpire/tpl/camp/include/camp/resource.hpp:161:5: note: candidate constructor not viable: no known conversion from 'camp::resources::Resource' to 'const camp::resources::v1::EventProxy<camp::resources::v1::Resource>' for 1st argument
    EventProxy(EventProxy const &) = delete;
    ^
1 error generated.
Error while processing /usr/workspace/li50/Umpire/src/umpire/op/CudaMemsetOperation.cpp.
make[2]: *** [src/umpire/op/CMakeFiles/umpire_op.dir/build.make:258: src/umpire/op/CMakeFiles/umpire_op.dir/CudaMemsetOperation.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Compiling Umpire 6.0.0 from inside TiledArray for CUDA platform leads to cmake failing on non-existing targets

Describe the bug

Compiling Umpire 6.0.0 from inside TiledArray github project for CUDA platform leads to cmake errors:
CMake Error at cmake/SetupUmpireThirdParty.cmake:66 (get_target_property):
get_target_property() called with non-existent target "cuda".

CMake Error at cmake/SetupUmpireThirdParty.cmake:66 (get_target_property):
get_target_property() called with non-existent target "cuda_runtime".

The error comes from Umpire/cmake/SetupUmpireThirdParty.cmake

set(TPL_DEPS)
blt_list_append(TO TPL_DEPS ELEMENTS cuda cuda_runtime IF ENABLE_CUDA)
blt_list_append(TO TPL_DEPS ELEMENTS hip hip_runtime IF ENABLE_HIP)
blt_list_append(TO TPL_DEPS ELEMENTS openmp IF ENABLE_OPENMP)
blt_list_append(TO TPL_DEPS ELEMENTS mpi IF ENABLE_MPI)

foreach(dep ${TPL_DEPS})
  # If the target is EXPORTABLE, add it to the export set
  get_target_property(_is_imported ${dep} IMPORTED)
  if(NOT ${_is_imported})
    install(TARGETS ${dep}
            EXPORT umpire-targets
            DESTINATION lib)
    # Namespace target to avoid conflicts
    set_target_properties(${dep} PROPERTIES EXPORT_NAME umpire::${dep})
  endif()
endforeach()

It should be possible to revise SetupUmpireThirdParty.cmake so that cmake does not fail when Umpire is configured from another project.
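One possible revision (an untested sketch, not the project's actual fix) would guard each lookup so that dependencies the parent project never defined are simply skipped:

```cmake
foreach(dep ${TPL_DEPS})
  # A parent project embedding Umpire may not define these targets,
  # so only touch targets that exist in this configure tree.
  if(TARGET ${dep})
    get_target_property(_is_imported ${dep} IMPORTED)
    if(NOT ${_is_imported})
      install(TARGETS ${dep}
              EXPORT umpire-targets
              DESTINATION lib)
      set_target_properties(${dep} PROPERTIES EXPORT_NAME umpire::${dep})
    endif()
  endif()
endforeach()
```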

To Reproduce

Steps to reproduce the behavior:
Download TiledArray from github
Replace Umpire tag with that of Umpire 6.0.0 in tiledarray/external/versions.cmake
Comment out line GIT_SUBMODULES "" in tiledarray/external/umpire.cmake to allow recursive git clone
Execute TiledArray cmake configuration

The error happens when Umpire is configured as a child cmake process from a parent cmake process. The TiledArray project depends on Umpire. TiledArray downloads, configures, and compiles Umpire library.

The current workaround is to comment out the line
blt_list_append(TO TPL_DEPS ELEMENTS cuda cuda_runtime IF ENABLE_CUDA)
in Umpire/cmake/SetupUmpireThirdParty.cmake

With that, TiledArray project can configure and compile Umpire library.

Expected behavior

It should be possible to configure and compile Umpire library from another (parent) project that depends on Umpire.

Compilers & Libraries (please complete the following information):

  • Compiler & version: Clang 14.0.0
  • CUDA version (if applicable): 11.2.0

Additional context

Configuring and compiling Umpire outside of the TiledArray project works just fine. The error happens only when Umpire is configured from inside a parent cmake project as a dependency. TiledArray uses ExternalProject_Add(Umpire ..) to add Umpire as an external cmake project; see tiledarray/external/umpire.cmake in the TiledArray project on GitHub.

Error when trying to build with HIP support

Describe the bug
I encounter the following compilation error:

[ 17%] Building CXX object src/umpire/resource/CMakeFiles/umpire_resource.dir/HostResourceFactory.cpp.o
: error: macro names must be identifiers
In file included from /home/users/coe0093/work/umpire/Umpire/src/umpire/alloc/MallocAllocator.hpp:16,
                 from /home/users/coe0093/work/umpire/Umpire/src/umpire/resource/HostResourceFactory.cpp:9:
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:387:2: error: #error ("Must define exactly one of __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__");
 #error("Must define exactly one of __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__");
  ^~~~~
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:412:61: error: ‘hipHostMallocDefault’ was not declared in this scope
                                        unsigned int flags = hipHostMallocDefault) {
                                                             ^~~~~~~~~~~~~~~~~~~~
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:412:61: note: suggested alternative: ‘hipComputeModeDefault’
                                        unsigned int flags = hipHostMallocDefault) {
                                                             ^~~~~~~~~~~~~~~~~~~~
                                                             hipComputeModeDefault
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:418:61: error: ‘hipMemAttachGlobal’ was not declared in this scope
                                        unsigned int flags = hipMemAttachGlobal) {

To Reproduce
I configured using the following:

cmake -DENABLE_HIP=On -DENABLE_C=On ../Umpire

Expected behavior

I expect to be able to compile and install Umpire with HIP support on machines with AMD GPUs.

Compilers & Libraries (please complete the following information):

  • gcc 8.3.1 (default system compiler)
  • ROCm 4.1.1

Additional context
Here is the output with VERBOSE=1:

[ 17%] Building CXX object src/umpire/resource/CMakeFiles/umpire_resource.dir/HostResourceFactory.cpp.o
cd /home/users/coe0093/work/umpire/build/src/umpire/resource && /usr/bin/c++   -I/home/users/coe0093/work/umpire/Umpire/src/umpire/tpl/camp/include -isystem /opt/rocm-4.1.1/hip/include -I/home/users/coe0093/work/umpire/Umpire/src -I/home/users/coe0093/work/umpire/build/include  -Wpedantic       -Wall -Wextra  -fPIC   -D -Wno-unused-parameter -std=c++11 -o CMakeFiles/umpire_resource.dir/HostResourceFactory.cpp.o -c /home/users/coe0093/work/umpire/Umpire/src/umpire/resource/HostResourceFactory.cpp
: error: macro names must be identifiers
In file included from /home/users/coe0093/work/umpire/Umpire/src/umpire/alloc/MallocAllocator.hpp:16,
                 from /home/users/coe0093/work/umpire/Umpire/src/umpire/resource/HostResourceFactory.cpp:9:
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:387:2: error: #error ("Must define exactly one of __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__");
 #error("Must define exactly one of __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__");
  ^~~~~
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:412:61: error: ‘hipHostMallocDefault’ was not declared in this scope
                                        unsigned int flags = hipHostMallocDefault) {
                                                             ^~~~~~~~~~~~~~~~~~~~
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:412:61: note: suggested alternative: ‘hipComputeModeDefault’
                                        unsigned int flags = hipHostMallocDefault) {
                                                             ^~~~~~~~~~~~~~~~~~~~
                                                             hipComputeModeDefault
/opt/rocm-4.1.1/hip/include/hip/hip_runtime_api.h:418:61: error: ‘hipMemAttachGlobal’ was not declared in this scope
                                        unsigned int flags = hipMemAttachGlobal) {

So it appears that -D__HIP_PLATFORM_HCC__ is not being passed on the compilation line. But I also see that hipcc is not being used, so I tried explicitly using hipcc as the C++ compiler; however, the configure step did not complete successfully:

cmake -DCMAKE_CXX_COMPILER=/opt/rocm-4.1.1/bin/hipcc -DENABLE_HIP=On ../Umpire/
-- The CXX compiler identification is Clang 12.0.0
-- The C compiler identification is Clang 12.0.0
-- Check for working CXX compiler: /opt/rocm-4.1.1/bin/hipcc
-- Check for working CXX compiler: /opt/rocm-4.1.1/bin/hipcc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - failed
-- Check for working C compiler: /opt/cray/pe/craype/2.7.7.4/bin/cc
-- Check for working C compiler: /opt/cray/pe/craype/2.7.7.4/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Using CMake version 3.11.4
-- BLT Version: 0.3.6
-- CMake Version: 3.11.4
-- CMake Executable: /usr/bin/cmake
-- Found Git: /usr/bin/git (found version "2.18.4") 
-- Git Support is ON
-- Git Executable: /usr/bin/git
-- Git Version: 2.18.4
-- MPI Support is OFF
-- OpenMP Support is Off
-- CUDA Support is Off
-- HIP Support is On
-- Found HIP: /opt/rocm-4.1.1/hip (found version "4.1.21114-e9025c25") 
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
-- HIP version:      4.1.21114-e9025c25
-- HIP platform:     amd
-- HCC Support is OFF
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- Sphinx support is ON
-- Failed to locate Sphinx executable (missing: SPHINX_EXECUTABLE) 
-- Valgrind support is ON
-- Found Valgrind: /usr/bin/valgrind  
-- AStyle support is ON
-- Failed to locate AStyle executable (missing: ASTYLE_EXECUTABLE) 
-- ClangFormat support is ON
-- Found ClangFormat: /opt/rocm-4.1.1/llvm/bin/clang-format  
-- Uncrustify support is ON
-- Failed to locate Uncrustify executable (missing: UNCRUSTIFY_EXECUTABLE) 
-- Yapf support is ON
-- Failed to locate Yapf executable (missing: YAPF_EXECUTABLE) 
-- CMakeFormat support is ON
-- Failed to locate CMakeFormat executable (missing: CMAKEFORMAT_EXECUTABLE) 
-- Cppcheck support is ON
-- Failed to locate Cppcheck executable (missing: CPPCHECK_EXECUTABLE) 
-- ClangQuery support is Off
-- ClangTidy support is ON
-- Found ClangTidy: /opt/rocm-4.1.1/llvm/bin/clang-tidy  
-- C Compiler family is Clang
-- Adding optional BLT definitions and compiler flags
-- Setting CMAKE_CXX_EXTENSIONS to Off
-- Standard C++11 selected
-- Enabling all compiler warnings on all targets.
-- Fortran support disabled.
-- CMAKE_C_FLAGS flags are:    -Wall -Wextra 
-- CMAKE_CXX_FLAGS flags are:       -Wall -Wextra 
-- CMAKE_EXE_LINKER_FLAGS flags are:  
-- Google Test Support is ON
-- Google Mock Support is On
-- Found PythonInterp: /usr/bin/python (found version "3.6.8") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE  
-- Memcheck suppressions file: /home/users/coe0093/work/umpire/Umpire/cmake/valgrind.supp
-- Setting C standard to 99
-- Checking for std::filesystem
-- Performing Test UMPIRE_ENABLE_FILESYSTEM
-- Performing Test UMPIRE_ENABLE_FILESYSTEM - Failed
-- std::filesystem NOT found, using POSIX
-- Performing Test UMPIRE_HAS_ASAN
-- Performing Test UMPIRE_HAS_ASAN - Success
-- Umpire may be built with ASAN support
-- Configuring done
CMake Error in src/umpire/CMakeLists.txt:
  No known features for CXX compiler

  "Clang"

  version 12.0.0.

Please let me know if I can provide other details. Thanks!
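An untested workaround sketch, in case it helps triage: the bare "-D" in the VERBOSE=1 compile line above hints at an empty definition being generated, so forcing the HIP platform macro onto the compile line explicitly might unblock the first configuration (macro name taken from the hip_runtime_api.h error message):

```shell
cmake -DENABLE_HIP=On -DENABLE_C=On \
      -DCMAKE_CXX_FLAGS="-D__HIP_PLATFORM_HCC__" \
      ../Umpire
```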

Umpire Segfaults during initialization of DEVICE_CONST

Describe the bug

Umpire segfaults while creating the DEVICE_CONST allocator.

To Reproduce

I am using CHAI + Umpire in a large multiphysics code. I have not yet attempted to reproduce this in a smaller executable. The problem occurs during initialization of Umpire.

This is on a P8+ P100 system.

Expected behavior

Don't segfault.

Compilers & Libraries (please complete the following information):

  • Compiler & version:
    rzmanta23{probinso}95: /usr/tce/packages/spectrum-mpi/spectrum-mpi-2018.05.18-clang-coral-2018.04.17/bin/mpiclang++ --version
    clang version 3.8.0 (ibmgithub:/CORAL-LLVM-Compilers/clang.git c4747093b1b58b63a096b78ddcd716c7bd7e9c2c) (ibmgithub:/CORAL-LLVM-Compilers/llvm.git aa08e5a3c3670cd86fb4bee034a7626bb26ad57e)
    Target: powerpc64le-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /usr/tce/packages/clang/clang-coral-2018.04.17/ibm/bin

  • CUDA version (if applicable): 9.2.88

Additional context
Umpire version:
f92f367 Merge pull request #39 from LLNL/feature/coalesce-only-when-coalesceable

Stack Trace:

#0 std::operator<< <char, std::char_traits<char>, std::allocator<char> > (__os=..., __str=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/basic_string.h:2777
#1 umpire::util::Logger::logMessage (this=<optimized out>, level=<optimized out>, message=..., fileName=..., line=38) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/util/Logger.cpp:61
#2 0x000000001560bea8 in umpire::resource::CudaConstantMemoryResource::CudaConstantMemoryResource (this=0x4aa1ddd0, name=..., id=<optimized out>, traits=...) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/CudaConstantMemoryResource.cu:38
#3 0x000000001560bb6c in __gnu_cxx::new_allocator<umpire::resource::CudaConstantMemoryResource>::construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=<optimized out>, __p=0x4aa1ddd0, __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/ext/new_allocator.h:120
#4 0x000000001560b8cc in std::allocator_traits<std::allocator<umpire::resource::CudaConstantMemoryResource> >::_S_construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__p=<optimized out>, __args=..., __args=..., __args=..., __a=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:253
#5 std::allocator_traits<std::allocator<umpire::resource::CudaConstantMemoryResource> >::construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__p=<optimized out>, __a=..., __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:399
#6 std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=<optimized out>, __a=..., __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:515
#7 __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2> >::construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>, std::allocator<umpire::resource::CudaConstantMemoryResource> const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&>(std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>*, std::allocator<umpire::resource::CudaConstantMemoryResource> const&&, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&) (this=<optimized out>, __p=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/ext/new_allocator.h:120
#8 std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2> > >::_S_construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>, std::allocator<umpire::resource::CudaConstantMemoryResource> const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __p=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:253
#9 std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2> > >::construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>, std::allocator<umpire::resource::CudaConstantMemoryResource> const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __p=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:399
#10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=0x3fffffffb6e8, __a=..., __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:619
#11 0x000000001560b690 in std::__shared_ptr<umpire::resource::CudaConstantMemoryResource, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __args=..., __args=..., __args=..., this=<optimized out>, __tag=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:1089
#12 std::shared_ptr<umpire::resource::CudaConstantMemoryResource>::shared_ptr<std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __args=<optimized out>, __args=<optimized out>, this=<optimized out>, __tag=..., __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:316
#13 std::allocate_shared<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:587
#14 std::make_shared<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:603
#15 umpire::resource::CudaConstantMemoryResourceFactory::create (this=<optimized out>, id=4) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/CudaConstantMemoryResourceFactory.cpp:48
#16 0x000000001560acb0 in umpire::resource::MemoryResourceRegistry::makeMemoryResource (this=0x4a9f5da0, name=..., id=<optimized out>) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/MemoryResourceRegistry.cpp:50
#17 0x00000000155f44dc in umpire::ResourceManager::initialize (this=0x4a9e5ab0) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:122
#18 0x00000000155f2364 in umpire::ResourceManager::ResourceManager (this=0x4a9e5ab0) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:96
#19 0x00000000155f0fe4 in umpire::ResourceManager::getInstance () at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:50
#20 0x00000000155ebf10 in chai::ArrayManager::ArrayManager (this=0x4a9e59e0) at /g/g18/probinso/ale3d/bugfixday/imports/chai/src/chai/ArrayManager.cpp:66
#21 0x00000000155ebe0c in chai::ArrayManager::getInstance () at /g/g18/probinso/ale3d/bugfixday/imports/chai/src/chai/ArrayManager.cpp:58
#22 0x0000000010787840 in chai::ManagedArray<globalID>::ManagedArray (this=0x46fbff50 <nodemap>)

strategy_tests failure

Describe the bug

During the execution of strategy_tests, I get the following failure on the fourth iteration of the loop at line 167 of strategy_tests.cpp:
[ERROR][/home/phenning/UmpProj/pkg/Umpire/src/umpire/alloc/CudaMallocAllocator.hpp:35]: allocate cudaMalloc( bytes = 4029677568 ) failed with error: out of memory
/home/phenning/UmpProj/pkg/Umpire/tests/integration/strategy_tests.cpp:172: Failure
Expected: { alloc1 = allocator.allocate(alloc_size); } doesn't throw an exception.
Actual: it throws.
[ FAILED ] Strategy.Device (313179 ms)

The long execution time is due to some debugging code I added, which allowed me to monitor device memory usage with nvidia-smi. What I saw at the top of each iteration on GPU 0 was:
829MiB / 16130MiB
4669MiB / 16130MiB
8511MiB / 16130MiB
12353MiB / 16130MiB (next allocation fails)
After adding a bit more debugging code, I note that the "allocator.deallocate(alloc1)" call never makes it to CudaMallocAllocator::deallocate(), although each allocator.alloc() call does call CudaMallocAllocator::allocate().

Any suggestions as to how I can track this down?

Compilers & Libraries (please complete the following information):

  1. cmake/3.12.4
  2. git/2.17.1
  3. cuda/10.1
  4. gcc/7.3.0
  5. ibm/xlc-16.1.1.2-xlf-16.1.1.2-gcc-7.3.0

Additional context
Power9 system (Darwin at LANL)

Tue Jul 16 08:40:08 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04 Driver Version: 418.40.04 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000004:04:00.0 Off | 0 |
| N/A 24C P0 31W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000035:03:00.0 Off | 0 |
| N/A 26C P0 33W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------


Installation problems with ENABLE_C_API

Describe the bug

When attempting to install a build with ENABLE_C_API=ON, the umpire.h, shroudrt.hpp, wrapAllocator.h and wrapResourceManager.h files are installed in $PREFIX/include, rather than $PREFIX/include/interface and $PREFIX/include/interface/c_fortran.

To Reproduce

cmake \
    -DENABLE_CUDA=OFF -DENABLE_STATISTICS=ON \
    -DENABLE_C_API=ON \
    -DSHROUD_EXECUTABLE=$PREFIX/bin/shroud \
    -DCMAKE_INSTALL_PREFIX=$PREFIX

Error with -DENABLE_NUMA=On

Describe the bug

Hello, I am trying to install Umpire with the NUMA option, but I am receiving the error below. If I configure without that option, it works! Do you have any ideas?

To Reproduce

cmake .. -DCMAKE_INSTALL_PREFIX=/home/arubio/Umpire -DENABLE_CUDA=On -DENABLE_NUMA=On -DENABLE_STATISTICS=Off -DENABLE_GMOCK=On -DENABLE_TESTS=Off -DENABLE_OPENMP=On -DENABLE_WARNINGS_AS_ERRORS=On -DENABLE_C=On -DENABLE_BENCHMARKS=On -DENABLE_TOOLS=On

make
Error
.
.
.
[ 13%] Building CXX object src/umpire/strategy/CMakeFiles/umpire_strategy.dir/MonotonicAllocationStrategy.cpp.o
[ 13%] Building CXX object src/umpire/strategy/CMakeFiles/umpire_strategy.dir/SlotPool.cpp.o
[ 14%] Building CXX object src/umpire/strategy/CMakeFiles/umpire_strategy.dir/SizeLimiter.cpp.o
[ 14%] Building CXX object src/umpire/strategy/CMakeFiles/umpire_strategy.dir/ThreadSafeAllocator.cpp.o
[ 15%] Building CXX object src/umpire/strategy/CMakeFiles/umpire_strategy.dir/NumaPolicy.cpp.o
In file included from /home/arubio/Umpire/src/umpire/strategy/NumaPolicy.cpp:7:0:
/home/arubio/Umpire/src/umpire/strategy/NumaPolicy.hpp:40:10: error: conflicting return type specified for ‘virtual long int umpire::strategy::NumaPolicy::getCurrentSize() const’
long getCurrentSize() const noexcept;
^
In file included from /home/arubio/Umpire/src/umpire/strategy/NumaPolicy.hpp:12:0,
from /home/arubio/Umpire/src/umpire/strategy/NumaPolicy.cpp:7:
/home/arubio/Umpire/src/umpire/strategy/AllocationStrategy.hpp:69:25: error: overriding ‘virtual std::size_t umpire::strategy::AllocationStrategy::getCurrentSize() const’
virtual std::size_t getCurrentSize() const noexcept = 0;
^
In file included from /home/arubio/Umpire/src/umpire/strategy/NumaPolicy.cpp:7:0:
/home/arubio/Umpire/src/umpire/strategy/NumaPolicy.hpp:41:10: error: conflicting return type specified for ‘virtual long int umpire::strategy::NumaPolicy::getHighWatermark() const’
long getHighWatermark() const noexcept;
^
In file included from /home/arubio/Umpire/src/umpire/strategy/NumaPolicy.hpp:12:0,
from /home/arubio/Umpire/src/umpire/strategy/NumaPolicy.cpp:7:
/home/arubio/Umpire/src/umpire/strategy/AllocationStrategy.hpp:77:25: error: overriding ‘virtual std::size_t umpire::strategy::AllocationStrategy::getHighWatermark() const’
virtual std::size_t getHighWatermark() const noexcept = 0;
^
make[2]: *** [src/umpire/strategy/CMakeFiles/umpire_strategy.dir/NumaPolicy.cpp.o] Error 1
make[1]: *** [src/umpire/strategy/CMakeFiles/umpire_strategy.dir/all] Error 2
make: *** [all] Error 2

Support out of source BLT with BLT_SOURCE_DIR

It would be useful if you supported out-of-source BLT instances. Here is an example of how to do so:

################################
# BLT
################################
if (DEFINED BLT_SOURCE_DIR)
  # Support having a shared BLT outside of the repository if given a BLT_SOURCE_DIR
  if (NOT EXISTS ${BLT_SOURCE_DIR}/SetupBLT.cmake)
    message(FATAL_ERROR "Given BLT_SOURCE_DIR does not contain SetupBLT.cmake")
  endif()
else()
  # Use internal BLT if no BLT_SOURCE_DIR is given
  set(BLT_SOURCE_DIR "${PROJECT_SOURCE_DIR}/cmake/blt" CACHE PATH "")
  if (NOT EXISTS ${BLT_SOURCE_DIR}/SetupBLT.cmake)
    message(FATAL_ERROR
      "The BLT submodule is not present. "
      "Run the following two commands in your git repository: \n"
      "    git submodule init\n"
      "    git submodule update" )
  endif()
endif()

include(${BLT_SOURCE_DIR}/SetupBLT.cmake)
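With logic like the above in place, a consumer could point the build at a shared BLT checkout at configure time (the path here is illustrative, not a required location):

```shell
# Configure against an external BLT checkout instead of the submodule.
cmake -DBLT_SOURCE_DIR=/usr/workspace/shared/blt ..
```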

CUDA compilation on Summit with gcc 10.2 and cuda 11.3

What's the correct way to specify the CUDA architecture when configuring with CMake? I tried:

-DCMAKE_CUDA_ARCHITECTURES=70

However, the CMake configure step fails with:

-- CUDA Support is ON
-- The CUDA compiler identification is NVIDIA 11.3.109
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - failed
-- Check for working CUDA compiler: /sw/summit/cuda/11.3.1/bin/nvcc
-- Check for working CUDA compiler: /sw/summit/cuda/11.3.1/bin/nvcc - broken
CMake Error at /autofs/nccs-svm1_sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-10.2.0/cmake-3.20.2-hvb6iokb3qczolaj7v63ouqnqsd4ecie/share/cmake-3.20/Modules/CMakeTestCUDACompiler.cmake:52 (message):
The CUDA compiler

"/sw/summit/cuda/11.3.1/bin/nvcc"

is not able to compile a simple test program.

It fails with the following output:

Change Dir: /gpfs/alpine/cfd116/scratch/mullowne/Umpire/build_cuda/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_3e6f2/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_3e6f2.dir/build.make CMakeFiles/cmTC_3e6f2.dir/build
gmake[1]: Entering directory '/gpfs/alpine/cfd116/scratch/mullowne/Umpire/build_cuda/CMakeFiles/CMakeTmp'
Building CUDA object CMakeFiles/cmTC_3e6f2.dir/main.cu.o
/sw/summit/cuda/11.3.1/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/sw/summit/gcc/10.2.0-2/bin/g++   -arch sm_  -std=c++11 -MD -MT CMakeFiles/cmTC_3e6f2.dir/main.cu.o -MF CMakeFiles/cmTC_3e6f2.dir/main.cu.o.d -x cu -c /gpfs/alpine/cfd116/scratch/mullowne/Umpire/build_cuda/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_3e6f2.dir/main.cu.o
nvcc fatal   : Value 'sm_' is not defined for option 'gpu-architecture'
gmake[1]: *** [CMakeFiles/cmTC_3e6f2.dir/build.make:79: CMakeFiles/cmTC_3e6f2.dir/main.cu.o] Error 1
gmake[1]: Leaving directory '/gpfs/alpine/cfd116/scratch/mullowne/Umpire/build_cuda/CMakeFiles/CMakeTmp'
gmake: *** [Makefile:127: cmTC_3e6f2/fast] Error 2

CMake will not be able to correctly generate this project.
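A detail worth noting in the failing command is `-arch sm_` with an empty value, which suggests the architecture is being read from a BLT-style variable rather than from CMAKE_CUDA_ARCHITECTURES. A possible workaround, hedged: the variable name below follows older BLT conventions and should be verified against the BLT version in your checkout:

```shell
# Pass the architecture in the form older BLT builds expect, in addition to
# (or instead of) CMAKE_CUDA_ARCHITECTURES.
cmake -DENABLE_CUDA=On -DCUDA_ARCH=sm_70 ..
```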

support for SYCL

Is your feature request related to a problem? Please describe.

Need to be able to execute on Intel GPUs

Describe the solution you'd like

introduce support for SYCL allocations

Describe alternatives you've considered

We are dependent on Umpire, and would like to avoid the need to find alternatives, if possible :)

Additional context

N/A

Compiler update and categories

Goal

I would like to update the compilers/toolchains tested on LC systems, for example adding those used by RAJA.
However:

  • There are a lot of them, some probably useless.
  • There is no need to test all of them for each change, we could test all of them nightly.

That’s why I would like to classify them into two categories: core/main/basic and extension/secondary/advanced.

Current state

On toss3 machines, the list of compilers/toolchains would be:

Umpire

  • clang_3_9_1
  • clang_4_0_0
  • cudatoolkit_9_1
  • gcc_4_9_3
  • gcc_6_1_0
  • gcc_7_1_0
  • icpc_16_0_4
  • icpc_17_0_2
  • icpc_18_0_0
  • pgi_17_10
  • pgi_18_5

from RAJA

  • clang_6_0_0
  • clangcuda_6_0_0_nvcc_8_0
  • gcc_7_3_0
  • gcc_8_1_0
  • icpc_18_0_2_gcc7headers
  • icpc_19_0_4_gcc8headers

On blueOS machines this list currently is:

Umpire

  • clang_3_9_1
  • clang_4_0_0
  • clang_coral_2017_06_29
  • clang_coral_2017_08_31
  • clang_coral_2017_09_06
  • clang_coral_2017_09_18
  • gcc_4_9_3
  • nvcc_clang_coral_2017_06_29
  • nvcc_clang_coral_2017_08_31
  • nvcc_clang_coral_2017_09_06
  • nvcc_clang_coral_2017_09_18
  • nvcc_gcc_4_9_3
  • nvcc_xl-beta-2017.09.13

from RAJA

  • clang_6_0_0
  • clang_coral_2018_08_08
  • clang_upstream_2018_12_03
  • clangcuda_upstream_2018_12_03_nvcc_9_2
  • gcc_7_3_1
  • nvcc_10_clang_6_0_0
  • nvcc_10_gcc_7_3_1
  • nvcc_10_xl_2019_02_07
  • nvcc_10_xl_2019_04_19
  • nvcc_10_xl_2019_06_12
  • nvcc_9_1_clang_coral_2018_08_08
  • nvcc_9_2_clang_6_0_0
  • nvcc_9_2_clang_coral_2018_08_08
  • nvcc_9_2_clang_upstream_2018_12_03
  • nvcc_9_2_gcc_7_3_1
  • nvcc_9_2_xl_2019_02_07
  • nvcc_9_2_xl_2019_04_19
  • nvcc_9_2_xl_2019_06_12
  • pgi_19_7
  • xl_2019_02_07
  • xl_2019_04_19
  • xl_2019_06_12

Additional considerations:

  • RAJA toolchains use several different CUDA versions, which I would like to do for Umpire as well.
  • The selection could also serve to identify which toolchains we would like to test for Spack packaging.

Summary:

I can add all toolchains fairly easily. But I need a selection that would be used on every PR.
Then we should remove unnecessary toolchains.

Compiler error when enabling CUDA

Describe the bug

I encountered the following compiler error when trying to compile Umpire:

In file included from /fs/hypre/Umpire/src/umpire/tpl/camp/include/camp/camp.hpp:17,
from /fs/hypre/Umpire/tests/integration/test_helpers.hpp:9,
from /fs/hypre/Umpire/tests/integration/operation_tests.cpp:7:
/fs/hypre/Umpire/src/umpire/tpl/camp/include/camp/defines.hpp: In function 'cudaError_t camp::cudaAssert(cudaError_t, const char*, const char*, int)':
/fs/hypre/Umpire/src/umpire/tpl/camp/include/camp/defines.hpp:203:16: error: 'runtime_error' is not a member of 'std'

It looks like the issue is that `#include <stdexcept>` was missing in defines.hpp.

To Reproduce

cmake -DCMAKE_INSTALL_PREFIX=$PWD/install -DENABLE_CUDA=On -DENABLE_OPENMP=Off -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc -DCMAKE_CUDA_ARCHITECTURES=70 -DENABLE_C=On ../
make -j8

Expected behavior

Compilation with no errors.

Compilers & Libraries (please complete the following information):

  • Compiler & version: GCC 11.1.0
  • CUDA version (if applicable): 11.4


Hostconfig and Spack package: Where should we draw the line

The context

In the Spack package, we can change a lot of parameters to configure the build.
Not all of these parameters are transposed to the host-config file, because that would make the file too specific, and naming the resulting files (for example) would be a pain.

Instead, a host-config file aims at configuring the toolchain, and how the project adapts to it.

In practice:

Everything "host-config" goes into the hostconfig function:

def hostconfig(self, spec, prefix, py_site_pkgs_dir=None):

We find there the flags, the compilers paths, the cuda info, etc.

Everything "build-config" goes into the cmake_args function:

We find there the languages support, tests mode, openmp option, etc.

The issue

I wanted to point this out to the team because, in order to build the project manually, picking a host-config may not be enough, unless one wants to rely on the project's default values (if they are set). Instead, one may have to specify some variables on the command line.

Also some parameters may be misplaced:

  • "deviceconst" may move from hostconfig to cmake_args,
  • "openmp" may move from cmake_args to hostconfig.

Solution

I would appreciate it if someone could review the different parameters and discuss with me whether each should be placed in the hostconfig or specified on the command line (typically).

Takeaway

A host-config file does not suffice to reproduce a Spack build, although I may have said it did.

Prefetching for CUDA Managed Memory

Is your feature request related to a problem? Please describe.

I want to use Umpire with a UM pool to help our Fortran directives-based code simplify data statements (we wouldn't need them anymore). We also need the pool to get rid of the cost of many cudaMallocManaged calls. However, I looked at the allocate path, and it appears not to prefetch. I did a test with a simple stack-based pool linked to Fortran via iso_c_binding, and with only cudaMallocManaged, each kernel that touches new data incurs a large latency from the data being allocated on the GPU last-minute upon first touch.

I was hoping the entire pool would be pulled over upon first touch, but in the Fortran context, it appears to do it individually for each variable.

Describe the solution you'd like

If I add the following to my simple stack pool, it gets rid of the latency for each of the kernels touching new data and significantly improves performance:

cudaMallocManaged( &pool_ptr , pool_bytes );        // allocate the whole pool in unified memory
cudaMemPrefetchAsync( pool_ptr , pool_bytes , 0 );  // prefetch the pool to device 0 up front
cudaDeviceSynchronize();                            // make sure the prefetch completes

Sample Code

program test
  use gator_mod
  implicit none
  integer, parameter :: n = 32*1024
  integer(8), pointer :: a(:), b(:), c(:), d(:)
  integer :: i

  call gator_init(n*4*8)

  call gator_allocate(a,[n])
  call gator_allocate(b,[n])
  call gator_allocate(c,[n])
  call gator_allocate(d,[n])

  !$acc parallel loop default(present)
  do i=1,n
    a(i) = i
  enddo

  !$acc parallel loop default(present)
  do i=1,n
    b(i) = 2*i
  enddo

  !$acc parallel loop default(present)
  do i=1,n
    c(i) = 3*i
  enddo

  !$acc parallel loop default(present)
  do i=1,n
    d(i) = a(i) + b(i) + c(i)
  enddo

  write(*,*) sum(d)

  call gator_deallocate(d)
  call gator_deallocate(c)
  call gator_deallocate(b)
  call gator_deallocate(a)

  call gator_finalize()

end program test

Nvidia Visual Profiler without prefetching

Screenshot from 2019-10-10 09-37-48

Nvidia Visual Profiler with prefetching

Screenshot from 2019-10-10 10-00-59

Thanks for considering adding this feature to the UM pool allocator.

Compilation Error when enabling CUDA

Describe the bug

Compilation Error with CUDA enabled: "Unknown option pthread"

To Reproduce

mkdir build
cd build 
cmake -DENABLE_CUDA=ON -DCMAKE_INSTALL_PREFIX=/home/crtrott/Software/umpire/install/cuda ..
make

Expected behavior

Compilation succeeds

Compilers & Libraries (please complete the following information):
  • Compiler & version: GCC 6.4.0
  • CUDA version (if applicable): 10.0

Additional context

Compile line with make VERBOSE=1

[ 37%] Linking CUDA device code CMakeFiles/version_tests.dir/cmake_device_link.o
cd /home/crtrott/Software/umpire/build/tests/unit && /projects-sems/sems/install/rhel6-x86_64/sems/utility/cmake/3.12.2/bin/cmake -E cmake_link_script CMakeFiles/version_tests.dir/dlink.txt --verbose=1
/home/projects/x86-64/cuda/10.0/bin/nvcc -ccbin=/projects/sems/install/rhel6-x86_64/sems/compiler/gcc/6.1.0/base/bin/g++  -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/version_tests.dir/version_tests.cpp.o -o CMakeFiles/version_tests.dir/cmake_device_link.o  -L/home/projects/x86-64/cuda/10.0/lib64/stubs  -L/home/projects/x86-64/cuda/10.0/lib64 ../../lib/libumpire.a ../../lib/libgtest_main.a ../../lib/libgtest.a -pthread /home/projects/x86-64/cuda/10.0/lib64/libcudart_static.a -lpthread -ldl -Xnvlink /usr/lib64/librt.so -lcudadevrt -lcudart_static -lrt -lpthread -ldl
nvcc fatal   : Unknown option 'pthread'

The offending option is the "-pthread" in the compile line, not the "-lpthread".
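In general, nvcc rejects host-only flags such as -pthread unless they are explicitly forwarded to the host compiler; the usual spelling is -Xcompiler. This is a general nvcc rule, shown here on a trivial command rather than the actual Umpire link line:

```shell
# Forward the host-only -pthread flag through nvcc to the host compiler.
nvcc -Xcompiler=-pthread main.cu -o main
```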

Build problem with [email protected] with CUDA enabled

Describe the bug

Compiler GCC 10.3.0 crashes when building umpire+cuda.

Tested versions 5.0.1, 6.0.0, and develop (5201a47). Switching to GCC 10.2.0 works.

[ 27%] Building CUDA object src/umpire/CMakeFiles/umpire_device.dir/DeviceAllocator.cpp.o
cd /run/user/25632/ialberto/spack-stage-umpire-develop-werao5orxyvmb33f2nlk2as4jbmu2pvg/spack-build-werao5o/src/umpire && /opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/project/csstaff/ialberto/opt/spack-daint/lib/spack/env/case-insensitive/CC -DCAMP_HAVE_CUDA -I/project/csstaff/ialberto/opt/spack-daint/opt/spack/cray-cnl7-broadwell/gcc-10.3.0.21.05/camp-0.2.2-zlny4bxpdkelb7qkzuspcm2vhqjcdsq4/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -I/run/user/25632/ialberto/spack-stage-umpire-develop-werao5orxyvmb33f2nlk2as4jbmu2pvg/spack-src/src -I/run/user/25632/ialberto/spack-stage-umpire-develop-werao5orxyvmb33f2nlk2as4jbmu2pvg/spack-build-werao5o/include -O2 -g -DNDEBUG -Xcompiler=-fPIC -std=c++11 -MD -MT src/umpire/CMakeFiles/umpire_device.dir/DeviceAllocator.cpp.o -MF CMakeFiles/umpire_device.dir/DeviceAllocator.cpp.o.d -x cu -dc /run/user/25632/ialberto/spack-stage-umpire-develop-werao5orxyvmb33f2nlk2as4jbmu2pvg/spack-src/src/umpire/DeviceAllocator.cpp -o CMakeFiles/umpire_device.dir/DeviceAllocator.cpp.o
/opt/gcc/10.3.0/snos/include/g++/chrono: In substitution of 'template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]':
/opt/gcc/10.3.0/snos/include/g++/chrono:473:154:   required from here
/opt/gcc/10.3.0/snos/include/g++/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
0xc989bf crash_signal
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/toplev.c:328
0x79007d tsubst(tree_node*, tree_node*, int, tree_node*)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:15310
0x7a3086 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:13225
0x79ba76 tsubst_aggr_type
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:13428
0x7a5d6f tsubst_function_decl
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:13816
0x79c719 tsubst_decl
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:14267
0x78a701 tsubst_copy
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:16512
0x78dffa tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:20707
0x78cb56 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19274
0x78cb56 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19896
0x78bf8d tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19274
0x78bf8d tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19588
0x78bf56 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19274
0x78bf56 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19587
0x79e534 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19274
0x79e534 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:18886
0x7a3086 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:13225
0x79ba76 tsubst_aggr_type
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:13428
0x78b977 tsubst_qualified_id
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:16215
0x78d69d tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
        ../../cray-gcc-10.3.0-202104220029.0777bcc28ac1d/gcc/cp/pt.c:19625
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[2]: *** [src/umpire/CMakeFiles/umpire_device.dir/build.make:79: src/umpire/CMakeFiles/umpire_device.dir/DeviceAllocator.cpp.o] Error 1
make[2]: Leaving directory '/run/user/25632/ialberto/spack-stage-umpire-develop-werao5orxyvmb33f2nlk2as4jbmu2pvg/spack-build-werao5o'
make[1]: *** [CMakeFiles/Makefile2:430: src/umpire/CMakeFiles/umpire_device.dir/all] Error 2
make[1]: Leaving directory '/run/user/25632/ialberto/spack-stage-umpire-develop-werao5orxyvmb33f2nlk2as4jbmu2pvg/spack-build-werao5o'
make: *** [Makefile:139: all] Error 2

Compilers & Libraries

Just reporting the environment of the last test I did.

 -   werao5o  umpire@develop%[email protected]+c+cuda~deviceconst~examples~fortran~ipo~numa~openmp~rocm~shared amdgpu_target=none build_type=RelWithDebInfo cuda_arch=none tests=none arch=cray-cnl7-broadwell
[+]  ttv45xg      ^[email protected]%[email protected] arch=cray-cnl7-broadwell
[+]  5ugkz3c          ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt build_type=Release arch=cray-cnl7-broadwell
[+]  ydvoltg              ^[email protected]%[email protected]~symlinks+termlib abi=6 arch=cray-cnl7-broadwell
[+]  okfekkr              ^[email protected]%[email protected]~docs+systemcerts arch=cray-cnl7-broadwell
[+]  zlny4bx      ^[email protected]%[email protected]+cuda~ipo~rocm~tests amdgpu_target=none build_type=RelWithDebInfo cuda_arch=none arch=cray-cnl7-broadwell
[+]  ih6iopb          ^[email protected]%[email protected] arch=cray-cnl7-broadwell
[+]  ksp6dcr          ^[email protected]_11.2%[email protected]~allow-unsupported-compilers~dev arch=cray-cnl7-broadwell

Version of BLT fails to compile with gcc 9.3

Describe the bug

gtest via BLT does not compile with GCC 9.3 (fixed in current BLT). Consider updating BLT or adding an option to not compile gtest.

To Reproduce

compile with gcc 9.3

Compiling Umpire 6.0.0 with Clang compiler requires adding #include <stdexcept> to camp/defines.hpp

Describe the bug

The Clang compiler requires adding `#include <stdexcept>` to camp/defines.hpp.

To Reproduce

git clone --recursive https://github.com/LLNL/Umpire.git Umpire
cd Umpire
git checkout e6bd629
cmake -H. -Bbuild -DENABLE_CUDA=On -DUMPIRE_ENABLE_FILESYSTEM=OFF -DENABLE_TESTS=On -DENABLE_EXAMPLES=On -DCMAKE_CUDA_ARCHITECTURES=70 -DCMAKE_CUDA_HOST_COMPILER=icpx -DCMAKE_CUDA_FLAGS_INIT=-allow-unsupported-compiler -DBLT_CXX_STD=c++17
cd build; export VERBOSE=1; make

Expected behavior

Compilation should succeed. Instead, it ends with an error when compiling primary_pool_tests.cpp:

In file included from /home/vanisimov/tiledarray/Umpire/tests/integration/primary_pool_tests.cpp:11:
In file included from /home/vanisimov/tiledarray/Umpire/src/umpire/tpl/camp/include/camp/camp.hpp:17:
/home/vanisimov/tiledarray/Umpire/src/umpire/tpl/camp/include/camp/defines.hpp:203:16: error: no member named 'runtime_error' in namespace 'std'
throw std::runtime_error(msg);
~~~~~^
1 error generated.

Compilers & Libraries (please complete the following information):

  • Compiler & version: Clang 14.0.0
  • CUDA version (if applicable): 11.2.0
  • V100 nVidia GPU

Additional context

The latest commit of Umpire, 2775bbe, does not require adding `#include <stdexcept>` to camp/defines.hpp. However, after compilation, many unit tests fail under `make test`.

Build error with examples

Describe the bug

I tried to build Umpire with CUDA support and examples enabled, but the examples fail with a link error. It seems nvcc cannot locate some device functions during the device link step. The error is:

[ 85%] Linking CUDA device code CMakeFiles/device_allocator_example.dir/cmake_device_link.o
cd /tmp/harmen/spack-stage/spack-stage-umpire-4.1.2-n2oy7tm7xpeswr2n7jzpgirvuhhbgqwc/spack-build-n2oy7tm/examples && /home/harmen/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cmake-3.19.2-yd5pspkxptim2xuzisherb653h5szshn/bin/cmake -E cmake_link_script CMakeFiles/device_allocator_example.dir/dlink.txt --verbose=1
/home/harmen/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.0.2-ntccyyogx3gp23d3v6lwq3sk3janvn5m/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -O2 -g -DNDEBUG -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/device_allocator_example.dir/device-allocator.cpp.o -o CMakeFiles/device_allocator_example.dir/cmake_device_link.o  /home/harmen/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.0.2-ntccyyogx3gp23d3v6lwq3sk3janvn5m/lib64/libcudart_static.a -ldl  -lpthread -lcudadevrt -lcudart_static -lrt 
nvlink error   : Undefined reference to '_ZN6umpire15DeviceAllocator8allocateEm' in 'CMakeFiles/device_allocator_example.dir/device-allocator.cpp.o'

To Reproduce

$ spack spec umpire+cuda %gcc@:9 cuda_arch=75 ^cuda@:11.0 
Input spec
--------------------------------
umpire%gcc@:9+cuda cuda_arch=75
    ^cuda@:11.0

Concretized
--------------------------------
[email protected]%[email protected]+c+cuda~deviceconst+examples~fortran~ipo~numa~openmp~rocm+shared amdgpu_target=none build_type=RelWithDebInfo cuda_arch=75 patches=7d912d31cd293df005ba74cb96c6f3e32dc3d84afff49b14509714283693db08 tests=none arch=linux-ubuntu20.04-zen2
    ^[email protected]%[email protected] arch=linux-ubuntu20.04-zen2
        ^[email protected]%[email protected]~doc+ncurses+openssl+ownlibs~qt arch=linux-ubuntu20.04-zen2
    ^[email protected]%[email protected]~cuda~ipo~rocm~tests amdgpu_target=none build_type=RelWithDebInfo cuda_arch=none arch=linux-ubuntu20.04-zen2
    ^[email protected]%[email protected] arch=linux-ubuntu20.04-zen2
$ spack install -v umpire+cuda %gcc@:9 cuda_arch=75 ^cuda@:11.0 

Compilers & Libraries (please complete the following information):
See above

Additional context

I can make it compile when I add src/umpire/CMakeFiles/umpire.dir/DeviceAllocator.cpp.o to the device linking step:

$ pwd
/tmp/harmen/spack-stage/spack-stage-umpire-4.1.2-n2oy7tm7xpeswr2n7jzpgirvuhhbgqwc/spack-build-n2oy7tm/examples

$ /home/harmen/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.0.2-ntccyyogx3gp23d3v6lwq3sk3janvn5m/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -O2 -g -DNDEBUG -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/device_allocator_example.dir/device-allocator.cpp.o ../src/umpire/CMakeFiles/umpire.dir/DeviceAllocator.cpp.o -o CMakeFiles/device_allocator_example.dir/cmake_device_link.o  /home/harmen/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/cuda-11.0.2-ntccyyogx3gp23d3v6lwq3sk3janvn5m/lib64/libcudart_static.a -ldl  -lpthread -lcudadevrt -lcudart_static -lrt

$ ls CMakeFiles/device_allocator_example.dir/cmake_device_link.o 
CMakeFiles/device_allocator_example.dir/cmake_device_link.o

but obviously we don't have this object file around in real life, only the shared lib libumpire.so.

xlc compiler

Describe the bug

Cannot build with XL compiler on Summit at OLCF. Error message:

In file included from /gpfs/alpine/mat190/scratch/jeanluc/GIT/Umpire/src/umpire/strategy/MixedPool.cpp:8:
In file included from /gpfs/alpine/mat190/scratch/jeanluc/GIT/Umpire/src/umpire/strategy/MixedPool.hpp:10:
In file included from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/map:62:
/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/stl_multimap.h:130:41: error: 'rebind'
following the 'template' keyword does not refer to a template
typedef typename _Alloc::template rebind<value_type>::other
^~~~~~
/gpfs/alpine/mat190/scratch/jeanluc/GIT/Umpire/src/umpire/strategy/QuickPool.hpp:150:5: note: in instantiation of
template class 'std::multimap<unsigned long, umpire::strategy::QuickPool::Chunk *, std::less,
umpire::strategy::QuickPool::pool_allocator<std::pair<const unsigned long, umpire::strategy::QuickPool::Chunk
*> > >' requested here
SizeMap::iterator size_map_it;
^
In file included from /gpfs/alpine/mat190/scratch/jeanluc/GIT/Umpire/src/umpire/strategy/MixedPool.cpp:8:
In file included from /gpfs/alpine/mat190/scratch/jeanluc/GIT/Umpire/src/umpire/strategy/MixedPool.hpp:10:
In file included from /usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/map:60:
/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/stl_tree.h:335:24: error: type 'int'
cannot be used prior to '::' because it has no members
typedef typename _Alloc::template rebind<_Rb_tree_node<_Val> >::other
^
/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/stl_multimap.h:136:17: note: in
instantiation of template class 'std::_Rb_tree<unsigned long, std::pair<const unsigned long,
umpire::strategy::QuickPool::Chunk *>, std::_Select1st<std::pair<const unsigned long,
umpire::strategy::QuickPool::Chunk *> >, std::less, int>' requested here
_Rep_type _M_t;
^
/gpfs/alpine/mat190/scratch/jeanluc/GIT/Umpire/src/umpire/strategy/QuickPool.hpp:150:5: note: in instantiation of
template class 'std::multimap<unsigned long, umpire::strategy::QuickPool::Chunk *, std::less,
umpire::strategy::QuickPool::pool_allocator<std::pair<const unsigned long, umpire::strategy::QuickPool::Chunk
*> > >' requested here
SizeMap::iterator size_map_it;
^

To Reproduce

(using modules loaded by default)
module load cmake
mkdir build
cd build
cmake -DCMAKE_CXX_COMPILER=xlc++ ..
make

Expected behavior

Complete build

Compilers & Libraries (please complete the following information):

  • Compiler & version: xl/16.1.1-5
  • CUDA version (if applicable):

Additional context

I was able to build successfully with the GNU compiler.

Custom alignment doesn't work in DynamicPoolMap

Describe the bug

#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <umpire/Umpire.hpp>
#include <umpire/strategy/DynamicPool.hpp>

int
main(int argc, char **argv)
{
  int numIter=16;
  if (2 == argc) {
    numIter = std::atoi(argv[1]);
  }
  auto &rm(umpire::ResourceManager::getInstance());
  auto host(rm.getAllocator("HOST"));
  auto pool(rm.makeAllocator<umpire::strategy::DynamicPool>("HOST_pool",
                                                            host,
                                                            512*1024*1024,
                                                            1024*1024, 64));
  for(int i =0 ; i < numIter; ++i) {
    void *ptr(pool.allocate(2048));
    if (0 != (reinterpret_cast<std::ptrdiff_t>(ptr) % 64)) {
      std::cerr << "Pointer is not 64-byte aligned: " << ptr << "\n";
    }
  }
}

To Reproduce

Compile the program above and run it. The returned pointer isn't 64-byte aligned as requested.

Expected behavior

I expected allocate to return 64-byte aligned memory addresses.

Compilers & Libraries (please complete the following information):

g++ version 8.3.1 20190223 (Red Hat 8.3.1-2)


Regroup builds and unit tests in Gitlab CI

I would like to merge builds with their associated tests in the quartz CI.

I think the separation of builds and unit tests, which applies to quartz only, has proved to be of little interest, and it’s a pain to maintain (duplication).

The only downside is that it becomes less straightforward to tell at a glance whether a job failed in the build or in the tests, but we could improve the error messages in the log to help with this.

Note:

  • For a project managed with GitLab, one could export the test reports to have them formatted in a Merge Request, making the build/unit-test separation of even smaller interest for projects on GitLab.
  • A similar result (reporting) could be achieved with a CDash server, I think.

Documentation of resource registration

Is your feature request related to a problem? Please describe.

Insufficient documentation of (or a possible error with) registering an external allocation with Umpire.

Describe the solution you'd like

A small working example of an allocator registering a host pointer as a resource in the documentation.

Describe alternatives you've considered

We are currently copying data from a natively allocated host array into an umpire-allocated array before moving to the device. We would much rather register the pointer and avoid those extra copies.

auto& resmgr = umpire::ResourceManager::getInstance();
auto hostallocator = resmgr.getAllocator("HOST");
auto devallocator = resmgr.getAllocator("DEVICE");

std::size_t size = N * sizeof(double);
auto host_ptr = static_cast<double*>(hostallocator.allocate(size));
auto dev_ptr = static_cast<double*>(devallocator.allocate(size));

// what we are doing:
// we get `double* source` from elsewhere...
std::memcpy(host_ptr, source, size);
resmgr.copy(dev_ptr, host_ptr);

// what we would like to do:
umpire::strategy::AllocationStrategy* strategy = hostallocator.getAllocationStrategy();
umpire::util::AllocationRecord record{source, size, strategy};
resmgr.registerAllocation(source, record);
resmgr.copy(dev_ptr, source);

I have attempted the above based on umpire's source, but the data is not copied to the device.
How should we be registering resources?

Cannot compile Umpire without git

I cannot compile Umpire on an environment without git (which may happen for CI/CD containers).
I get:

CMake Error at src/tpl/umpire/CMakeLists.txt:96 (blt_git_hashcode):
  Unknown CMake command "blt_git_hashcode".

Using persistent files for memory allocation

Adding persistent memory will require being able to pass file names to be used with the mmap call. We can move the shared mmap code into a new class in the alloc directory. This new allocator can be used by both the FileMemoryResource and a new PersistentMemoryResource.

  • Create MmapAllocator in alloc directory
  • Modify FileMemoryResource to use the new MmapAllocator

HIP Runtime Linker Errors

Describe the bug

Exported CMake targets explicitly link against the target hip_runtime, although hip_runtime is just a collection of headers and not a concrete library as far as I know (is hip_runtime an interface/alias CMake target internal to Umpire's CMake?).

To circumvent this, I build Umpire and then manually remove hip_runtime from the array on line 76 of <prefix>/share/umpire/cmake/umpire-targets.cmake. From there I'm able to build all of my targets that depend on Umpire and HIP.

Expected behavior

Umpire's CMake targets should be importable without an explicit -lhip_runtime being propagated to the user's link line.

Compilers & Libraries (please complete the following information):

I'll have to ensure hardware on early access machines can be discussed here... @pelesh please advise.

Additional context

Installed using Spack, for what it's worth.

If I'm simply using the imported cmake targets incorrectly, please let me know. Thank you!
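For what it's worth, consumers linking HIP code usually rely on the interface targets exported by ROCm's own CMake package rather than a bare hip_runtime name. A sketch (assuming a ROCm installation whose find_package(hip) config provides the hip::host target; my_app is a placeholder):

```cmake
# Sketch: link against ROCm's exported interface target instead of a
# bare "hip_runtime" library name that has no backing library file.
find_package(hip REQUIRED)
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE umpire hip::host)
```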

More flexibility in configuring cxxopts

Is your feature request related to a problem? Please describe.

The cxxopts submodule is a required dependency for building Umpire, but it appears to be necessary only for building Umpire's tests, examples, and tools.

Describe the solution you'd like

Ideally,

  1. Umpire could be configured not to require cxxopts when its dependent targets (e.g. tools, examples, tests) are disabled.
  2. One could pass in a path to an external cxxopts in a similar manner to how an external blt is passed in.

Describe alternatives you've considered

Since cxxopts is relatively small, perhaps it could be bundled directly into the repository.

Additional context

I'm working on a project that does not allow submodules and having a cxxopts submodule appears to currently be mandatory.
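A sketch of what option 1 might look like, with a hypothetical UMPIRE_CXXOPTS_DIR cache variable covering option 2 as well (all names and paths here are illustrative, not Umpire's actual build code):

```cmake
# Sketch: require cxxopts only when a consumer of it is enabled, and
# allow an external checkout to replace the submodule.
if (ENABLE_TOOLS OR ENABLE_TESTS OR ENABLE_EXAMPLES)
  set(UMPIRE_CXXOPTS_DIR "${PROJECT_SOURCE_DIR}/src/tpl/cxxopts"
      CACHE PATH "Path to a cxxopts checkout")
  if (NOT EXISTS "${UMPIRE_CXXOPTS_DIR}/include/cxxopts.hpp")
    message(FATAL_ERROR
      "cxxopts not found; set UMPIRE_CXXOPTS_DIR or initialize submodules")
  endif ()
endif ()
```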

Erroneous link to `cuda_runtime`

Describe the bug

Umpire's exported targets interface-link against cuda_runtime instead of cudart here:

# in <prefix>/share/umpire/cmake/umpire-targets.cmake
set_target_properties(umpire PROPERTIES
  INTERFACE_COMPILE_DEFINITIONS "CAMP_HAVE_CUDA"
  INTERFACE_INCLUDE_DIRECTORIES "<camp prefix>/include;<cuda prefix>/include;${_IMPORT_PREFIX}/include;${_IMPORT_PREFIX}/include"
  INTERFACE_LINK_LIBRARIES "camp;umpire_alloc;cuda_runtime;cuda_runtime;cuda_runtime;dl" <--- here
)

To Reproduce

Install Umpire with CUDA. An example spack spec I installed which exhibited this behaviour:

[email protected]+c+cuda~deviceconst~examples~fortran~hip~ipo~numa+openmp~shared amdgpu_target=none build_type=RelWithDebInfo cuda_arch=none tests=none
  ^[email protected]~dev

Expected behavior

Umpire to link against exported or otherwise available targets, such as the cudart target from find_package(CUDAToolkit).
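A sketch of the expected wiring, using the CUDA::cudart imported target that CMake's FindCUDAToolkit module has provided since CMake 3.17:

```cmake
# Sketch: link the imported CUDA runtime target so consumers resolve a
# real target instead of the bare name "cuda_runtime".
find_package(CUDAToolkit REQUIRED)
target_link_libraries(umpire PUBLIC CUDA::cudart)
```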

Compilers & Libraries (please complete the following information):

I encountered this on 2/3 platforms on which I upgraded our stack to v5.0.1.

Encountered:

  • gcc/7.4.0
  • cuda/10.2.89
  • power9

Encountered:

  • gcc/7.3.0
  • cuda/10.2.89
  • x86, Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

Did not encounter (identical to first system):

  • gcc/7.4.0
  • cuda/10.2.89
  • power9

On the final platform the absolute path to the static cudart library was provided in Umpire's exported targets, while on the first platform (nearly identical to the last), the dynamic library was used after I removed cuda_runtime from Umpire's targets. I'm wondering if this behavior only occurs when Umpire's build system exports targets that link dynamically against cudart.

CC @pelesh @cnpetra @nychiang @jwang125

Support using a different host compiler for CUDA

I'm compiling my code using gcc8 and CUDA 9. But CUDA 9 does not support gcc 6 and later, so I used gcc5 as the CUDA host compiler. The CMake configuration looks like this:

cmake \
  -DCUDA_NVCC_FLAGS="-ccbin;/path/to/gcc5/bin;--std;c++11" \
  -DCMAKE_INSTALL_PREFIX=/path/to/install/Umpire \
  -DENABLE_OPENMP=OFF \
  -DENABLE_TESTS=OFF \
  -DENABLE_ASSERTS=ON \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  /path/to/source/Umpire

The Umpire CMake script failed when testing the CUDA compiler, because it is not able to pick the correct host compiler (gcc5) in this case.
The issue seems to be in BLT: https://github.com/LLNL/blt/blob/master/cmake/thirdparty/SetupCUDA.cmake#L48
It uses enable_language(cuda), which does not seem to pick up the correct host compiler.
I worked around this by removing that line.
Any idea how to fix this cleanly? I would like my program to build Umpire automatically instead of relying on this manual hack-and-build.
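With the enable_language path, CMake's native mechanism for choosing nvcc's host compiler is the CMAKE_CUDA_HOST_COMPILER variable. A sketch (the path is a placeholder, matching the example above):

```cmake
# Sketch: equivalent to nvcc's -ccbin flag; must be set before
# enable_language(CUDA) runs, e.g. in the cache or on the command line:
#   cmake -DCMAKE_CUDA_HOST_COMPILER=/path/to/gcc5/bin/g++ ...
set(CMAKE_CUDA_HOST_COMPILER "/path/to/gcc5/bin/g++")
```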

Request for CI to produce meaningful email notifications

For Umpire, there are only a few possible outcomes for a CI run against any given project:

  1. CI Success
  2. Build failure (compile/link errors) - most likely
  3. Test Failures
  4. Git failure
  5. CI Framework failure - least likely

In general, each email summary line should contain: <Build #>

Status should be one of "CI Success", "Build Failure", "Test Failure", "Git Failure", "CI Framework Failure".

The content of the mail should begin with links to the applicable assets. This should be followed with a snippet of the error logs showing what the specific error(s) is/are.

Consider using tagged commits of submodules when releasing new versions

Is your feature request related to a problem? Please describe.

In Spack it is preferable to use tarballs over cloning git repos with submodules, since that way we can add checksums to verify the integrity of the sources. I tried to set that up for Umpire, but couldn't, because there is only a single release of Umpire where BLT is on a tagged version. I didn't check camp.

Describe the solution you'd like

When releasing a new version of Umpire, make sure its dependencies are pinned to tagged versions, so that package managers can actually refer to the dependencies and add checksums for all of them.

Question: Wrapping statically allocated memory (C interface)

Is your feature request related to a problem? Please describe.

I would like to know if there is a way to wrap statically allocated memory, or memory allocated by another library, in order to have access to it for operations such as copy.

Describe the solution you'd like

I propose an API that takes the pointer and the appropriate allocator (or memory resource), identified by the object, the id, or the string name, so the memory can be added to the internal hashtable and operations like copy become available. It may be necessary to restrict or forbid operations like reallocation, move, and deallocate, since the allocation is not owned by the library, but it could still be useful. Another option is to provide an API to create our own AllocationRecord.

Describe alternatives you've considered

I haven't checked yet whether such a function is available. I think it is in C++, but maybe not in C? I can always do the wrapping between C and C++ myself and then access the functions from the resource manager.

Ability to shut down Umpire's I/O cleanly

Is your feature request related to a problem? Please describe.

umpire::initialize_io grabs std::cerr's rdbuf and flushes it at the end of the lifetime of the corresponding static object (s_error_buffer). This assumes that the buffer is still around. In our app we reset std::cerr's rdbuf, and unfortunately there is no way to access s_error_buffer.

Describe the solution you'd like

Ideally there should be a way to control the lifetime of ResourceManager. Or there should be a finalize_io function that can be called by the user.

Describe alternatives you've considered

Not messing with std::cerr's buffer ... but I don't see anything in the standard forbidding this.

Additional context

n/a.

Error in documentation

Describe the bug

The documentation indicates the option for enabling the tests to be ENABLE_TESTING, whereas the correct option (as indicated in CMakeLists.txt) is ENABLE_TESTS. On configuration, CMake reports an unused variable when ENABLE_TESTING is used.

To Reproduce

cmake -DENABLE_TESTING=OFF

Question: static variable allocated with umpire

Background: I have an array class that uses Umpire to allocate and deallocate memory. I need to store some resources statically (for performance reasons), but if an Array object is declared statically I have a problem. When it comes time to destruct it, I ask Umpire for the host allocator, which I assume is stored inside a static variable; the issue is that the variable storing the allocator is already gone, i.e. when I ask for the allocator it no longer exists and an exception is thrown.

Is there a way around this? I have a workaround for now, but it's not ideal.

Umpire is missing an umpire-config-version.cmake file

Currently only an umpire-config.cmake file is generated. Since there is no umpire-config-version.cmake file, it's not possible to find umpire using a version constraint (find_package(umpire x.y.z REQUIRED)). find_package fails since the version "found" is "unknown" without umpire-config-version.cmake.

Edit: this can be used to generate the file: https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html?highlight=cmakepackageconfighelpers#command:write_basic_package_version_file.
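A sketch of generating the missing file with that helper (the version value and install destination are illustrative):

```cmake
# Sketch: generate and install umpire-config-version.cmake so that
# find_package(umpire x.y.z REQUIRED) can check the version.
include(CMakePackageConfigHelpers)
write_basic_package_version_file(
  "${CMAKE_CURRENT_BINARY_DIR}/umpire-config-version.cmake"
  VERSION ${PROJECT_VERSION}
  COMPATIBILITY SameMajorVersion)
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/umpire-config-version.cmake"
        DESTINATION share/umpire/cmake)
```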

Link hypre with umpire using host linker

Hi, Umpire developers,

We now use the host linker for the drivers in hypre. With Umpire, we got the following error:

Building ij ... 
mpixlC -o ij ij.o -L/g/g92/li50/workspace/hypre/test/src/hypre/lib -lHYPRE -Wl,-rpath,/g/g92/li50/workspace/hypre/test/src/hypre/lib           -lm  -L/usr/tce/packages/cuda/cuda-10.1.243/lib64 -lcudart -lcusparse -lcurand      -L/usr/workspace/li50/Umpire-git/Umpire/install_xlC_lassen/lib  -lumpire 
/usr/workspace/li50/Umpire-git/Umpire/install_xlC_lassen/lib/libumpire.a(Allocator.cpp.o): In function `__sti____cudaRegisterAll()':
/var/tmp/li50/tmpxft_000189aa_00000000-5_Allocator.cudafe1.cpp:(.text+0xac0): undefined reference to `__cudaRegisterLinkedBinary_44_tmpxft_000189aa_00000000_6_Allocator_cpp1_ii_a17095a1'
/usr/workspace/li50/Umpire-git/Umpire/install_xlC_lassen/lib/libumpire.a(Replay.cpp.o): In function `__sti____cudaRegisterAll()':
/var/tmp/li50/tmpxft_000189ab_00000000-5_Replay.cudafe1.cpp:(.text+0x5d4): undefined reference to `__cudaRegisterLinkedBinary_41_tmpxft_000189ab_00000000_6_Replay_cpp1_ii_5eca6429'
/usr/workspace/li50/Umpire-git/Umpire/install_xlC_lassen/lib/libumpire.a(ResourceManager.cpp.o): In function `__sti____cudaRegisterAll()':
/var/tmp/li50/tmpxft_000189ac_00000000-5_ResourceManager.cudafe1.cpp:(.text+0x22134): undefined reference to `__cudaRegisterLinkedBinary_50_tmpxft_000189ac_00000000_6_ResourceManager_cpp1_ii_42a9a1b2'
/usr/workspace/li50/Umpire-git/Umpire/install_xlC_lassen/lib/libumpire.a(Umpire.cpp.o): In function `__sti____cudaRegisterAll()':
/var/tmp/li50/tmpxft_000189ad_00000000-5_Umpire.cudafe1.cpp:(.text+0x3088): undefined reference to `__cudaRegisterLinkedBinary_41_tmpxft_000189ad_00000000_6_Umpire_cpp1_ii_4507af2f'
make: *** [Makefile:138: ij] Error 1

It seems that we need a separate device-link (separable compilation) step for libumpire.a. I did this with

/usr/tce/packages/cuda/cuda-10.1.243/bin/nvcc -ccbin=mpixlC -gencode arch=compute_70,code=sm_70 -dlink /usr/workspace/li50/Umpire-git/Umpire/install_xlC_lassen/lib/libumpire.a -o dlink.o

and added dlink.o to the link line, and it worked.

Are we supposed to do this on our side?

Thanks

-Ruipeng
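If this were to be handled on the Umpire side, one possibility is to let CMake perform the device link when the static archive is created. This is only a sketch using standard CMake CUDA target properties, not a confirmed fix:

```cmake
# Sketch: request separable compilation and have CMake resolve device
# symbols in libumpire.a, so consumers linking with the host linker do
# not need a manual nvcc -dlink step.
set_target_properties(umpire PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON
  CUDA_RESOLVE_DEVICE_SYMBOLS ON)
```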

thread safe DynamicPool for cuda unified memory

Hi,

I'm trying to use Umpire to manage CUDA unified memory. The program is multi-threaded, so I tried to make a thread-safe allocator.
Here is my code:

auto& rm = umpire::ResourceManager::getInstance();
auto um_dynamic_pool = rm.makeAllocator<umpire::strategy::DynamicPool>(
          "UMDynamicPool", rm.getAllocator("UM"));
auto thread_safe_um_dynamic_pool =
          rm.makeAllocator<umpire::strategy::ThreadSafeAllocator>(
              "ThreadSafeUMDynamicPool", rm.getAllocator("UMDynamicPool"));

The thread_safe_um_dynamic_pool allocator is stored in a global class and used by all threads to allocate and deallocate cuda unified memory.

However, I found that my program intermittently throws an error when deallocating unified memory with multiple threads.
Sometimes it throws the error here https://github.com/LLNL/Umpire/blob/develop/src/umpire/util/AllocationMap.cpp#L121
sometimes here
https://github.com/LLNL/Umpire/blob/develop/src/umpire/util/AllocationMap.cpp#L75

I tried to generate a clean log file by putting a lock around the logMessage function, but the lock hides the error and I can't trigger it again.

I attached the original log file, but the output is a mess because of the multi-threading. Sorry about the inconvenience.
umpire_log.txt

So, am I doing the correct thing if I want to use a thread-safe dynamic pool for CUDA unified memory?
Would you take a look at this in case there is a real issue?

Thanks,

Minor tweak to src/umpire/util/memory_sanitizers.hpp

When compiling Umpire with our newest compiler (based on clang) I ran across a particular problem in src/umpire/util/memory_sanitizers.hpp. Basically, the problem occurs here:

#if (defined(__clang__) && !defined(__ibmxl__)) || \
    (defined(__GNUC__) && __GNUC__ > 4)
#if ( !defined(__SYCL_COMPILER_VERSION) )
#include <sanitizer/asan_interface.h>
#endif

In our particular version, the include file doesn't exist, perhaps because it's in development, but there may be other reasons it's absent. In any event, I worked around this with the following:

#if defined __has_include
#if __has_include(<sanitizer/asan_interface.h>)
#include <sanitizer/asan_interface.h>
#endif
#endif

This should work for gcc as well as clang and clang-based compilers. I don't know about XL, though last I knew there was talk of moving XL to clang. If this has a side effect, please let me know.

Place C++ standard requirements in build system

Is your feature request related to a problem? Please describe.

Umpire specifies either -std=c++11 or -std=c++17 for Intel compilers in host-config files.

Describe the solution you'd like

In CMake, it is possible to specify compiler requirements (CMake compile features). By specifying the exact C++ features Umpire requires, CMake will adapt the flags to provide the minimum required standard coverage.

I think this is better since it avoids having these flags (are they even required?) in host-config files.
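A sketch of the compile-features approach (cxx_std_14 is illustrative; the actual level depends on Umpire's minimum standard):

```cmake
# Sketch: declare the standard as a usage requirement so CMake emits
# the right -std flag per compiler instead of hard-coding it in
# host-config files.
target_compile_features(umpire PUBLIC cxx_std_14)
```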

Describe alternatives you've considered

BLT also provides a way to set the C++ standard requirement. I think the CMake method will work well for Umpire.

Additional context

This issue comes up now because I am tracking compiler-specific requirements to reduce them to the minimum and integrate them into the Umpire Spack package. (Related to host-config file generation.)

Umpire does not find camp after spack build

Describe the bug

Umpire does not find camp after a Spack build. When I try to include Umpire in my project, it fails on the camp directory, although Umpire itself built fine. Does Umpire require camp, and if so, why doesn't it depend on camp inside Spack?

Support compiling with clang and -stdlib=libc++

Describe the bug

There is some code to demangle C++ names in tools/replay/ReplayInterpreter.cpp that relies on a non-standard GCC extension, abi::__cxa_demangle. When compiling with clang and using the -stdlib=libc++ option, which uses LLVM's implementation of the standard library, this function is not available.

To Reproduce

Compile the code with

module load clang/9.0.0
cmake  ../Umpire -DCMAKE_BUILD_TYPE=Release  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS="-stdlib=libc++ -DGTEST_HAS_CXXABI_H_=0" -DCMAKE_C_FLAGS="-DGTEST_HAS_CXXABI_H_=0" -DENABLE_CUDA=OFF -DENABLE_FORTRAN=OFF -DENABLE_C=ON  -DCMAKE_BUILD_TYPE=Release -DENABLE_TESTS=TRUE -DENABLE_GTEST=TRUE -DENABLE_GMOCK=TRUE -DENABLE_FRUIT=FALSE -DENABLE_EXAMPLES=TRUE -DENABLE_BENCHMARKS=TRUE -DCMAKE_VERBOSE_MAKEFILE=TRUE -DBLT_SOURCE_DIR=../blt -DENABLE_TOOLS=ON

Expected behavior

The code compiles with clang and libc++

Compilers & Libraries (please complete the following information):

  • clang (any version) with the -stdlib=libc++ flag
  • You also need to define -DGTEST_HAS_CXXABI_H_=0 in the C/C++ flags to get around the same issue in Google Test.

Additional context

This compiler set/option is one that KULL uses regularly to test.

I have a branch with this patch on it, but I can't push to the repo:

diff --git a/tools/replay/ReplayInterpreter.cpp b/tools/replay/ReplayInterpreter.cpp
index d0e5859b..b8a9e383 100644
--- a/tools/replay/ReplayInterpreter.cpp
+++ b/tools/replay/ReplayInterpreter.cpp
@@ -16,7 +16,8 @@
 #include "ReplayFile.hpp"
 #include "umpire/tpl/json/json.hpp"
 
-#if !defined(_MSC_VER)
+#include <ciso646>
+#if !defined(_MSC_VER) && !defined(_LIBCPP_VERSION)
 #include <cxxabi.h>
 #endif
 
@@ -352,6 +353,9 @@ void ReplayInterpreter::replay_compileAllocator( void )
       const std::string mangled_type = 
         (type_prefix == "_Z") ? raw_mangled_type : std::string{"_Z"} + raw_mangled_type;
 
+#if defined(_MSC_VER) || defined(_LIBCPP_VERSION)
+      type = mangled_type;
+#else
       auto result = abi::__cxa_demangle(
           mangled_type.c_str(),
           nullptr,
@@ -362,6 +366,7 @@ void ReplayInterpreter::replay_compileAllocator( void )
       }
       type = std::string{result};
       ::free(result);
+#endif
     } else {
       type = raw_mangled_type;
     }
