acts-project / vecmem

Vectorised data model base and helper classes.

Home Page: https://acts-project.github.io/vecmem/

License: Mozilla Public License 2.0

CMake 9.39% C++ 88.16% Cuda 2.42% Dockerfile 0.03%

vecmem's Introduction

VecMem

VecMem is part of the ACTS project (R&D line for parallelization). The ACTS project can be found at https://github.com/acts-project/acts.

This project provides a set of base and helper classes for implementing a vectorised data model that can be used efficiently across multiple device types.

vecmem's People

stephenswat beomki-yeo cgleggett paulgessinger konradkusiak97 yhatoh krasznaa zzalscv2 drmcfox chamodya-ka yusuf-manjra guilhermealmeida1 stewmh

vecmem's Issues

Buggy behaviour in `jagged_vector_buffer` to `jagged_vector` copy for value_type whose size is 512 X N bytes

Copying a jagged_vector_buffer to a jagged_vector fails when the jagged vector has more than one inner vector and the element size is 512 bytes or an integer multiple of it:

unknown file: Failure
C++ exception with description "/home/beomki/projects/vecmem/vecmem/cuda/src/utils/cuda/copy.cpp:52 Failed to execute: cudaMemcpy(to_ptr, from_ptr, size, copy_type_translator[cptype]) (invalid argument)" thrown in the test body.

The following code is a minimal reproduction of the error.

template <std::size_t ROWS, std::size_t COLS>
using matrix = vecmem::static_array<vecmem::static_array<double, ROWS>, COLS>;

/// A 512-byte object (an 8x8 matrix of doubles) triggers the error in the buffer2 copy!
/// Any other dimension seems OK unless the size is an integer multiple of 512 bytes
using value_type = matrix<8,8>;

/// Test buffers with "large" elements (for which alignment becomes important)
TEST_F(cuda_containers_test, very_large_buffer) {

    // The memory resource(s).
    vecmem::cuda::managed_memory_resource managed_resource;

    // Test a (2D) jagged vector with (2D) matrix
    vecmem::data::jagged_vector_buffer<value_type> buffer1({200},
                                                             managed_resource);
    m_copy.setup(buffer1);
    vecmem::jagged_vector<value_type> buffer1_host(&managed_resource);
    
    /// This is OK because the size of jagged vector is 1
    m_copy(buffer1, buffer1_host);

    // Test a (2D) jagged vector with (2D) matrix
    vecmem::data::jagged_vector_buffer<value_type> buffer2({1, 1},
                                                             managed_resource);
    m_copy.setup(buffer2);
    vecmem::jagged_vector<value_type> buffer2_host(&managed_resource);
    
    /// This is NOT OK because the size of jagged vector is larger than 1
    m_copy(buffer2, buffer2_host);
}

It's quite bad (or lucky) that I was playing with an 8x8 matrix on the detray side :/
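The size arithmetic behind the report can be checked at compile time. The sketch below uses std::array as a stand-in for vecmem::static_array (an assumption: the two are taken to have the same memory layout), and the helper names are made up for illustration. It only confirms the 512-byte figure and does not reproduce the CUDA failure.

```cpp
#include <array>
#include <cstddef>

// Stand-in for the matrix alias from the report; std::array replaces
// vecmem::static_array here (assumed to have the same memory layout).
template <std::size_t ROWS, std::size_t COLS>
using matrix = std::array<std::array<double, ROWS>, COLS>;

// Size of one matrix element in bytes (hypothetical helper).
template <std::size_t ROWS, std::size_t COLS>
constexpr std::size_t element_bytes() {
    return sizeof(matrix<ROWS, COLS>);
}

// True if the element size is a multiple of 512 bytes (hypothetical helper).
template <std::size_t ROWS, std::size_t COLS>
constexpr bool is_multiple_of_512() {
    return element_bytes<ROWS, COLS>() % 512 == 0;
}

static_assert(element_bytes<8, 8>() == 512, "8x8 doubles occupy 512 bytes");
static_assert(is_multiple_of_512<8, 16>(), "8x16 doubles occupy 1024 bytes");
static_assert(!is_multiple_of_512<6, 6>(), "6x6 doubles occupy 288 bytes");
```

A check like this could be used to pick further element types for the test, for example 8x16, which should also fall into the failing category described above.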

Issue with using contiguous memory resource for traccc_seq_example_cuda

I have been trying to use the binary page memory resource and the contiguous memory resource (c_mr) in the traccc CUDA examples. The contiguous memory resource works well for file IO, but using c_mr for the CUDA algorithms (the seeding algorithm here) produces an invalid global read error caused by a misaligned address.
I also tested with a 2 GB contiguous memory allocation limit, but got the same error.

Side note: I got the same issue with the clusterization CUDA algorithm from traccc PR #206.

Repo to reproduce the error: branch mem_resource_issues

The changes made:

...
    vecmem::host_memory_resource host_mr;
    vecmem::cuda::managed_memory_resource mng_mr;
    vecmem::contiguous_memory_resource c_mr(mng_mr,pow(2,30));//<--
    traccc::clusterization_algorithm ca(mng_mr);
    traccc::spacepoint_formation sf(mng_mr);
    traccc::seeding_algorithm sa(host_mr);
    traccc::track_params_estimation tp(host_mr);

    traccc::cuda::seeding_algorithm sa_cuda(c_mr); //<--
    traccc::cuda::track_params_estimation tp_cuda(c_mr); //<--
...
        traccc::cell_container_types::host cells_per_event =
            traccc::read_cells_from_event(
                event, i_cfg.cell_directory, common_opts.input_data_format,
                surface_transforms, digi_cfg, c_mr);//<--
...

Error:

========= Invalid __global__ read of size 8 bytes
=========     at 0xeb8 in /home/chamodya/myrepos/recent/traccc/build/_deps/vecmem-src/core/include/vecmem/containers/impl/vector_view.ipp:57:vecmem::data::vector_view<traccc::internal_spacepoint<traccc::spacepoint>>::size_ptr() const
=========     by thread (253,0,0) in block (47,0,0)
=========     Address 0x7fdf7ec10244 is misaligned
=========     Device Frame:/home/chamodya/myrepos/recent/traccc/build/_deps/vecmem-src/core/include/vecmem/containers/impl/device_vector.ipp:20:vecmem::device_vector<traccc::internal_spacepoint<traccc::spacepoint>>::device_vector(const vecmem::data::vector_view<traccc::internal_spacepoint<traccc::spacepoint>> &) [0xe10]
=========     Device Frame:/home/chamodya/myrepos/recent/traccc/build/_deps/vecmem-src/core/include/vecmem/containers/impl/jagged_device_vector.ipp:64:vecmem::jagged_device_vector<traccc::internal_spacepoint<traccc::spacepoint>>::at(unsigned long) [0xe10]
=========     Device Frame:/home/chamodya/myrepos/recent/traccc/build/_deps/detray-src/core/include/detray/grids/grid2.hpp:187:detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::device_vector, vecmem::jagged_device_vector, detray::darray, detray::dtuple, traccc::internal_spacepoint<traccc::spacepoint>, (bool)0, (unsigned int)1>::bin(unsigned long, unsigned long) [0xe10]
=========     Device Frame:/home/chamodya/myrepos/recent/traccc/build/_deps/detray-src/core/include/detray/grids/grid2.hpp:203:detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, vecmem::device_vector, vecmem::jagged_device_vector, detray::darray, detray::dtuple, traccc::internal_spacepoint<traccc::spacepoint>, (bool)0, (unsigned int)1>::bin(unsigned long) [0xe10]
=========     Device Frame:/home/chamodya/myrepos/recent/traccc/device/common/include/traccc/seeding/device/impl/populate_grid.ipp:51:traccc::device::populate_grid(unsigned long, const traccc::seedfinder_config &, const traccc::container_view<const unsigned long, const traccc::spacepoint> &, const vecmem::data::vector_view<const std::pair<unsigned long, unsigned long>> &, detray::grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, detray::dvector, detray::djagged_vector, detray::darray, detray::dtuple, traccc::internal_spacepoint<traccc::spacepoint>, (bool)0, (unsigned int)1>>) [0xe10]
=========     Device Frame:/home/chamodya/myrepos/recent/traccc/device/cuda/src/seeding/spacepoint_binning.cu:43:traccc::cuda::kernels::populate_grid(traccc::seedfinder_config, traccc::container_view<const unsigned long, const traccc::spacepoint>, vecmem::data::vector_view<const std::pair<unsigned long, unsigned long>>, detray::grid2_view<detray::grid2<detray::attach_populator, detray::axis::circular, detray::axis::regular, detray::serializer2, detray::dvector, detray::djagged_vector, detray::darray, detray::dtuple, traccc::internal_spacepoint<traccc::spacepoint>, (bool)0, (unsigned int)1>>) [0xf8]
=========     Saved host backtrace up to driver entry point at kernel launch time
...

Running the example:

compute-sanitizer build/bin/traccc_seq_example_cuda --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --cell_directory=tml_full/ttbar_mu200/ --events=2 --input-binary --run_cpu=0

Moreover, the binary page memory resource did not produce the expected improvement for file IO (it actually made it slower). I will create another issue after looking into it further.
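The sanitizer report above shows an 8-byte read at an address ending in 0x244. The sketch below checks that address against 4-byte and 8-byte alignment. The helper names are made up for illustration, and the report itself does not confirm why the resource returned such an address.

```cpp
#include <cstddef>
#include <cstdint>

// True if `address` is a multiple of `alignment` (which must be a power of two).
constexpr bool is_aligned(std::uint64_t address, std::size_t alignment) {
    return (address & (alignment - 1)) == 0;
}

// Round `address` up to the next multiple of `alignment` (a power of two).
constexpr std::uint64_t align_up(std::uint64_t address, std::size_t alignment) {
    return (address + (alignment - 1)) &
           ~static_cast<std::uint64_t>(alignment - 1);
}

// Address copied from the compute-sanitizer output.
constexpr std::uint64_t reported_address = 0x7fdf7ec10244;

static_assert(is_aligned(reported_address, 4), "fine for a 4-byte read");
static_assert(!is_aligned(reported_address, 8), "not valid for an 8-byte read");
static_assert(align_up(reported_address, 8) == 0x7fdf7ec10248,
              "next 8-byte boundary");
```

The address is 4-byte aligned but not 8-byte aligned, which would be consistent with an allocation that was placed directly after a 4-byte object without padding.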

Support for intel/llvm 2022-09

The Intel compiler changed a few things, which makes VecMem fail to compile against their latest tag.

[ 38%] Building SYCL object sycl/CMakeFiles/vecmem_sycl.dir/src/utils/sycl/device_selector.sycl.o
In file included from /mnt/hdd1/krasznaa/projects/vecmem/vecmem/sycl/src/utils/sycl/device_selector.sycl:9:
/mnt/hdd1/krasznaa/projects/vecmem/vecmem/sycl/src/utils/sycl/device_selector.hpp:18:42: warning: 'device_selector' is deprecated: Use Callable instead to select device. [-Wdeprecated-declarations]
class device_selector : public cl::sycl::device_selector {
                                         ^
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/device_selector.hpp:34:21: note: 'device_selector' has been explicitly marked deprecated here
class __SYCL_EXPORT __SYCL2020_DEPRECATED(
                    ^
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/detail/defines_elementary.hpp:52:40: note: expanded from macro '__SYCL2020_DEPRECATED'
#define __SYCL2020_DEPRECATED(message) __SYCL_DEPRECATED(message)
                                       ^
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/detail/defines_elementary.hpp:43:38: note: expanded from macro '__SYCL_DEPRECATED'
#define __SYCL_DEPRECATED(message) [[deprecated(message)]]
                                     ^
In file included from /mnt/hdd1/krasznaa/projects/vecmem/vecmem/sycl/src/utils/sycl/device_selector.sycl:9:
/mnt/hdd1/krasznaa/projects/vecmem/vecmem/sycl/src/utils/sycl/device_selector.hpp:29:15: warning: 'default_selector' is deprecated: Use the callable sycl::default_selector_v instead. [-Wdeprecated-declarations]
    cl::sycl::default_selector m_defaultSelector;
              ^
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/device_selector.hpp:50:21: note: 'default_selector' has been explicitly marked deprecated here
class __SYCL_EXPORT __SYCL2020_DEPRECATED(
                    ^
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/detail/defines_elementary.hpp:52:40: note: expanded from macro '__SYCL2020_DEPRECATED'
#define __SYCL2020_DEPRECATED(message) __SYCL_DEPRECATED(message)
                                       ^
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/detail/defines_elementary.hpp:43:38: note: expanded from macro '__SYCL_DEPRECATED'
#define __SYCL_DEPRECATED(message) [[deprecated(message)]]
                                     ^
/mnt/hdd1/krasznaa/projects/vecmem/vecmem/sycl/src/utils/sycl/device_selector.sycl:31:18: error: no type named 'device' in namespace 'sycl::info'; did you mean 'CUdevice'?
    using info = cl::sycl::info::device;
                 ^~~~~~~~~~~~~~~~~~~~~~
                 CUdevice
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/detail/backend_traits_cuda.hpp:25:13: note: 'CUdevice' declared here
typedef int CUdevice;
            ^
/mnt/hdd1/krasznaa/projects/vecmem/vecmem/sycl/src/utils/sycl/device_selector.sycl:33:26: error: use of undeclared identifier 'info'; did you mean '::sycl::info'?
        (device.get_info<info::name>() == m_deviceName)) {
                         ^~~~
                         ::sycl::info
/software/intel/clang-2022-09/x86_64-centos9-gcc11-opt/bin/../include/sycl/info/info_desc.hpp:24:11: note: '::sycl::info' declared here
namespace info {
          ^
/mnt/hdd1/krasznaa/projects/vecmem/vecmem/sycl/src/utils/sycl/device_selector.sycl:33:32: error: no member named 'name' in namespace 'sycl::info'
        (device.get_info<info::name>() == m_deviceName)) {
                         ~~~~~~^
2 warnings and 3 errors generated.
make[2]: *** [sycl/CMakeFiles/vecmem_sycl.dir/build.make:105: sycl/CMakeFiles/vecmem_sycl.dir/src/utils/sycl/device_selector.sycl.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1129: sycl/CMakeFiles/vecmem_sycl.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

I don't think there is anything magic to be done, but the code will have to be taught how to choose between the right incantations depending on the version of the compiler being used.

Alignment support should be added to all memory resources

Following #181, this is a long-term goal we should complete: many of our memory resources are not currently able to support alignment.
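As an illustration of what honouring an alignment request could look like, here is a minimal sketch of a bump allocator built on std::align. The class aligned_bump_allocator and the helper pointer_is_aligned are hypothetical and are not part of VecMem; a real memory resource would also need deallocation and upstream handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical bump allocator over a caller-provided byte range. It is not a
// VecMem class; it only illustrates honouring the alignment argument.
class aligned_bump_allocator {
public:
    aligned_bump_allocator(void* buffer, std::size_t size)
        : m_next(buffer), m_space(size) {}

    // Return `bytes` bytes aligned to `alignment`, or nullptr if the
    // remaining space is too small.
    void* allocate(std::size_t bytes, std::size_t alignment) {
        // std::align requires a power-of-two alignment.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // On success std::align moves m_next to the aligned address and
        // reduces m_space by the padding that was skipped.
        void* result = std::align(alignment, bytes, m_next, m_space);
        if (result == nullptr) {
            return nullptr;
        }
        // Consume the block itself.
        m_next = static_cast<char*>(m_next) + bytes;
        m_space -= bytes;
        return result;
    }

private:
    void* m_next;         // First unused byte.
    std::size_t m_space;  // Bytes left in the range.
};

// Helper for checking a returned pointer.
inline bool pointer_is_aligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

With this approach a 4-byte allocation followed by an 8-byte aligned one leaves a 4-byte gap, instead of handing out a misaligned pointer.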

Calling a host function in static_vector implementation of push_back

I tried using a static_vector in device code, which unfortunately is not possible due to the call to memmove in push_back.

vecmem/core/include/vecmem/containers/impl/static_vector.ipp

Line 169 in 7d47188

memmove(static_cast<void*>(id.m_ptr + 1),

warning #20011-D: calling a host function("memmove") from a host device function ("vecmem::static_vector::insert") is not allowed
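One possible workaround is to replace memmove with an element-wise loop, which contains no host-only calls. The sketch below is plain C++ with made-up function names; in CUDA code the functions would additionally need the `__host__ __device__` qualifiers. It assumes the destination slots hold already-constructed, assignable objects, which may not match how static_vector manages its storage.

```cpp
#include <cassert>
#include <cstddef>

// Shift the elements in [first, last) one slot towards higher addresses,
// freeing the slot at `first`. The slot at `last` must be valid storage.
template <typename T>
void shift_right_by_one(T* first, T* last) {
    // Walk backwards so that no element is overwritten before it is copied.
    for (T* p = last; p != first; --p) {
        *p = *(p - 1);
    }
}

// Insert `value` at index `pos` of an array currently holding `size`
// elements, and return the new size. The array must have room for one more.
template <typename T>
std::size_t insert_at(T* data, std::size_t size, std::size_t pos, T value) {
    assert(pos <= size);
    shift_right_by_one(data + pos, data + size);
    data[pos] = value;
    return size + 1;
}
```

For trivially copyable types this does the same work as the memmove call, at the cost of leaving any vectorisation to the compiler.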

vecmem::data::vector_buffer doesn't work for `unsigned long`

CUDA version : 11.5.0

vector_buffer doesn't work with unsigned long; it fails with the following error message:

what(): /home/beomki/projects/vecmem/vecmem/cuda/src/memory/cuda/managed_memory_resource.cpp:33 Failed to execute: cudaFree(p) (misaligned address)

int and unsigned int were OK.

You can reproduce the error with the following code:

test_cuda_containers.cpp

TEST_F(cuda_containers_test, buffer_type_test) {

    // The memory resource(s).
    vecmem::cuda::managed_memory_resource managed_resource;

    vecmem::data::vector_buffer<int> int_vec(3, 0, managed_resource);
    m_copy.setup(int_vec);

    int_buffer_test(int_vec); // OK

    vecmem::data::vector_buffer<unsigned int> uint_vec(3, 0, managed_resource);
    m_copy.setup(uint_vec);

    uint_buffer_test(uint_vec); // OK

    vecmem::data::vector_buffer<unsigned long> ulong_vec(3, 0,
                                                         managed_resource);
    m_copy.setup(ulong_vec);

    ulong_buffer_test(ulong_vec);  // doesnt work
}

test_cuda_containers_kernels.cuh

void int_buffer_test(vecmem::data::vector_view<int> vec);

void uint_buffer_test(vecmem::data::vector_view<unsigned int> vec);

void ulong_buffer_test(vecmem::data::vector_view<unsigned long> vec);

test_cuda_containers_kernels.cu

__global__ void int_buffer_test_kernel(
    vecmem::data::vector_view<int> vec_data) {
    vecmem::device_vector<int> vec(vec_data);
    vec.push_back(0);
}
void int_buffer_test(vecmem::data::vector_view<int> vec) {
    int_buffer_test_kernel<<<1, 1>>>(vec);
    // Check whether it succeeded to run.
    VECMEM_CUDA_ERROR_CHECK(cudaGetLastError());
    VECMEM_CUDA_ERROR_CHECK(cudaDeviceSynchronize());
}
__global__ void uint_buffer_test_kernel(
    vecmem::data::vector_view<unsigned int> vec_data) {
    vecmem::device_vector<unsigned int> vec(vec_data);
    vec.push_back(0);
}
void uint_buffer_test(vecmem::data::vector_view<unsigned int> vec) {
    uint_buffer_test_kernel<<<1, 1>>>(vec);
    // Check whether it succeeded to run.
    VECMEM_CUDA_ERROR_CHECK(cudaGetLastError());
    VECMEM_CUDA_ERROR_CHECK(cudaDeviceSynchronize());
}
__global__ void ulong_buffer_test_kernel(
    vecmem::data::vector_view<unsigned long> vec_data) {
    vecmem::device_vector<unsigned long> vec(vec_data);
    vec.push_back(0);
}
void ulong_buffer_test(vecmem::data::vector_view<unsigned long> vec) {
    ulong_buffer_test_kernel<<<1, 1>>>(vec);
    // Check whether it succeeded to run.
    VECMEM_CUDA_ERROR_CHECK(cudaGetLastError());
    VECMEM_CUDA_ERROR_CHECK(cudaDeviceSynchronize());
}

Using binary page memory resource in Traccc for IO takes more time than expected.

Using the binary page memory resource for file IO and algorithm IO in traccc takes more time than not using any downstream memory resource, which contradicts what is expected. The times get worse with each event. Below are the file IO times for traccc_seq_example_cuda processing 10 events, first with the managed memory resource and then with the binary page memory resource using managed memory as upstream.

Managed memory resource


$ build/bin/traccc_seq_example_cuda --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --cell_directory=tml_full/ttbar_mu200/ --events=10 --input-binary --run_cpu=0
Running build/bin/traccc_seq_example_cuda tml_detector/trackml-detector.csv tml_full/ttbar_mu200/ 10
0 File_IO: 0.127675
1 File_IO: 0.0567025
2 File_IO: 0.0524904
3 File_IO: 0.0517798
4 File_IO: 0.0586979
5 File_IO: 0.0564024
6 File_IO: 0.0545458
7 File_IO: 0.0526243
8 File_IO: 0.0541083
9 File_IO: 0.0492084

Binary page memory resource

$ build/bin/traccc_seq_example_cuda --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --cell_directory=tml_full/ttbar_mu200/ --events=10 --input-binary --run_cpu=0
Running build/bin/traccc_seq_example_cuda tml_detector/trackml-detector.csv tml_full/ttbar_mu200/ 10
0 File_IO: 0.849157
1 File_IO: 1.45001
2 File_IO: 1.58589
3 File_IO: 1.69902
4 File_IO: 1.74604
5 File_IO: 1.86683
6 File_IO: 1.9433
7 File_IO: 2.03939
8 File_IO: 2.17419
9 File_IO: 2.15853

In addition, when using the binary page memory resource with the host memory resource as upstream for traccc_seq_example (CPU), the algorithm IO takes a really long time. Below are the timings for 2 events.

Host memory resource

$ build/bin/traccc_seq_example --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --cell_directory=tml_full/ttbar_mu200/ --events=2 --input-binary 
Running build/bin/traccc_seq_example tml_detector/trackml-detector.csv tml_full/ttbar_mu200/ 2
0 File_IO: 0.0103446
0 Clusterization: 0.0153028
0 Spacepoint: 0.00112334
0 Seeding: 0.174703
0 Track params est : 0.0041265
1 File_IO: 0.00498015
1 Clusterization: 0.0167117
1 Spacepoint: 0.0012795
1 Seeding: 0.230505
1 Track params est : 0.0055673

Binary page memory resource

$ build/bin/traccc_seq_example --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --cell_directory=tml_full/ttbar_mu200/ --events=2 --input-binary 
Running build/bin/traccc_seq_example tml_detector/trackml-detector.csv tml_full/ttbar_mu200/ 2
0 File_IO: 0.795708
0 Clusterization: 251.087
0 Spacepoint: 10.6834
0 Seeding: 7.35312
0 Track params est : 0.0117272
1 File_IO: 1.34328
1 Clusterization: 466.343
1 Spacepoint: 10.4089
1 Seeding: 8.56579
1 Track params est : 0.0199618

Get function for vecmem::static_array

Could we have a vecmem::get function for vecmem::static_array, similar to std::get as defined here?

detray is using std::get for array in several places.
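A minimal sketch of what such an accessor could look like is given below. The type simple_array is a hypothetical stand-in for vecmem::static_array, and the get overloads are illustrative only; the actual VecMem interface may differ.

```cpp
#include <cstddef>

// Hypothetical stand-in for vecmem::static_array: a plain aggregate that
// stores N elements of type T.
template <typename T, std::size_t N>
struct simple_array {
    T m_data[N];
};

// std::get-style accessors; the index is checked at compile time.
template <std::size_t I, typename T, std::size_t N>
constexpr T& get(simple_array<T, N>& a) noexcept {
    static_assert(I < N, "index out of range");
    return a.m_data[I];
}

template <std::size_t I, typename T, std::size_t N>
constexpr const T& get(const simple_array<T, N>& a) noexcept {
    static_assert(I < N, "index out of range");
    return a.m_data[I];
}

// Usage example, evaluated at compile time.
constexpr simple_array<int, 3> example{{1, 2, 3}};
static_assert(get<1>(example) == 2, "get<1> returns the second element");
```

Because the index is a template parameter, an out-of-range access such as get<3>(example) is rejected by the compiler rather than failing at run time.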

adding sort function for vecmem

Would it be possible for a vecmem vector to have a sort function that runs on the device? There are some use cases (ACTS seeding) for this functionality.
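As a rough idea of the simplest form this could take, the sketch below is a single-threaded insertion sort that relies only on comparison and assignment, so each device thread could in principle sort its own small vector with it. It is an illustration and not a proposal for the VecMem API; the function name is made up, it is not a parallel sort, and in CUDA code it would need the `__host__ __device__` qualifiers.

```cpp
#include <cassert>
#include <cstddef>

// In-place insertion sort over a contiguous range, in ascending order.
// It uses only operator< and assignment on the elements.
template <typename T>
void insertion_sort(T* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    for (std::size_t i = 1; i < size; ++i) {
        T key = data[i];
        std::size_t j = i;
        // Shift larger elements one slot up to make room for the key.
        while (j > 0 && key < data[j - 1]) {
            data[j] = data[j - 1];
            --j;
        }
        data[j] = key;
    }
}
```

Insertion sort is quadratic in the worst case, so this would only be reasonable for short per-thread ranges.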

Size type of `vecmem::jagged_device_vector` should be `unsigned int`

This is to bring it in sync with the (non-jagged) device vector. Having these two classes use different size types is unlikely to cause any real problems, but it is slightly sloppy design that I think we should fix.

Vectors do not support boolean template types

The standard library specifies std::vector<bool> (and, by extension, vecmem::vector<bool>) as a specialization that is not guaranteed to provide the .data() member function on which vecmem relies. This means booleans cannot be stored in vecmem vectors.
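The limitation can be made visible with a small type trait. The trait name has_element_pointer is made up for this sketch. It compares the return type of data() with value_type* rather than only testing whether the member exists, since an implementation may declare a data() member for std::vector<bool> that does not return a bool*.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: assume there is no usable data() member.
template <typename V, typename = void>
struct has_element_pointer : std::false_type {};

// Selected only when V::data() exists and returns exactly value_type*.
template <typename V>
struct has_element_pointer<
    V, std::enable_if_t<std::is_same<decltype(std::declval<V&>().data()),
                                     typename V::value_type*>::value>>
    : std::true_type {};

static_assert(has_element_pointer<std::vector<int>>::value,
              "std::vector<int> exposes its elements through an int*");
static_assert(!has_element_pointer<std::vector<bool>>::value,
              "std::vector<bool> does not expose a bool*");
```

A common workaround in similar situations is to store a one-byte integer type instead of bool, although whether that is acceptable depends on the calling code.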

Tests for alignment should be added

Also following #181, we need to add new tests to ensure that the alignment requirements are actually met. Preferably after #183 is completed.

Jagged vector buffer constructor of non-static size not using given sizes.

I stumbled upon this error by chance, as we currently have no use for creating resizable (non-static-size) buffers that start with a size other than 0, but it could come in handy in the future.
Right now the sizes of the created buffer are not initialised, only its capacities.

vecmem/core/include/vecmem/containers/impl/jagged_vector_buffer.ipp

Lines 104 to 145 in 971a7ef

    
template <typename TYPE>
jagged_vector_buffer<TYPE>::jagged_vector_buffer(
    const std::vector<std::size_t>& sizes,
    const std::vector<std::size_t>& capacities, memory_resource& resource,
    memory_resource* host_access_resource)
    : base_type(sizes.size(), nullptr),
      m_outer_memory(::allocate_jagged_buffer_outer_memory<TYPE>(
          (host_access_resource == nullptr ? 0 : sizes.size()), resource)),
      m_outer_host_memory(::allocate_jagged_buffer_outer_memory<TYPE>(
          sizes.size(),
          (host_access_resource == nullptr ? resource
                                           : *host_access_resource))) {

    using header_t = typename vecmem::data::jagged_vector_buffer<
        TYPE>::value_type::size_type;
    // Determine the allocation size.
    std::size_t total_elements = std::accumulate(
        capacities.begin(), capacities.end(), static_cast<std::size_t>(0));

    header_t* header_ptr = nullptr;
    TYPE* data_ptr = nullptr;
    std::tie(m_inner_memory, header_ptr, data_ptr) =
        details::aligned_multiple_placement<header_t, TYPE>(
            resource, capacities.size(), total_elements);

    // Some sanity check.
    assert(sizes.size() == capacities.size());

    // Point the base class at the newly allocated memory.
    base_type::m_ptr =
        ((host_access_resource != nullptr) ? m_outer_memory.get()
                                           : m_outer_host_memory.get());
    base_type::m_host_ptr = m_outer_host_memory.get();

    // Set up the vecmem::vector_view objects in the host accessible memory.
    std::ptrdiff_t ptrdiff = 0;
    for (std::size_t i = 0; i < capacities.size(); ++i) {
        new (base_type::host_ptr() + i) value_type(
            static_cast<typename value_type::size_type>(capacities[i]),
            &header_ptr[i], data_ptr + ptrdiff);
        ptrdiff += capacities[i];
    }
}

Resizable (jagged) vectors can't have a non-zero starting size

As I mentioned in #94, this has been bothering me for a while... With a constructor like

    /// Resizable data constructor
    vector_buffer(size_type capacity, size_type size,
                  memory_resource& resource);

, one should assume that the created vector buffer would have a starting size of size. But in the current implementation it doesn't. It always starts from 0. 😦

When you set up a resizable (jagged) vector currently, you do it like:

    vecmem::data::vector_buffer<int> output_buffer(input.size(), 0,
                                                   device_resource);
    m_copy.setup(output_buffer);

Where m_copy is one of the "platform specific" copy objects. The problem here is that the buffer constructor currently has no way of storing the starting size that it received, in the memory that it would allocate with the memory resource that it receives. In case the memory resource allocates non-host-accessible memory, one needs to write to that memory using the appropriate "copy class". But by the time that we use the vecmem::copy::setup(...) function, that original size variable is lost. (The buffer does not store it anywhere.)

There are a couple of ways out of it, but they either require some bigger re-organisation in the code, or a setup that I wouldn't really like. I think the following are all our options here:

1. We add one more variable to the buffer classes, which would hold the "starting size" of the buffer. This variable would then be used by vecmem::copy::setup(...) when initialising the (device) memory. Technically this could work (I think...), but I just really don't like this design. (But maybe you guys could convince me otherwise...)
2. In order to create a resizable (jagged) vector buffer, one would need to provide both a memory resource and a copy object to its constructor, so that the constructor itself would take care of all of the setup that it needs. I'm leaning towards this option at the moment... (However, we might run into problems with circular dependencies on this one, since vecmem::copy already knows about the buffer types...)
3. We create "factory functions" on vecmem::copy for constructing resizable (jagged) vector buffers. This is very similar to option number 2, and would introduce a design that we don't use anywhere else in the project so far.

So... Any objections to having to create resizable buffers with code like:

vecmem::cuda::device_memory_resource resource;
vecmem::cuda::copy copy;

vecmem::data::vector_buffer<int> buffer1(1000, 100, resource, copy);
vecmem::data::jagged_vector_buffer<int> buffer2({1000, 1000, 1000}, {10, 100, 200}, resource, copy);

@konradkusiak97, I'm interested in your opinion as well. 😉

Update vecmem::atomic to use sycl::atomic_ref when it's available

The latest versions of the Intel compiler started producing the following types of warnings:

[ 92%] Building SYCL object tests/sycl/CMakeFiles/vecmem_test_sycl.dir/test_sycl_containers.sycl.o
In file included from /data/ssd-1tb/projects/vecmem/vecmem/tests/sycl/test_sycl_containers.sycl:15:
In file included from /data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/containers/const_device_vector.hpp:10:
In file included from /data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/containers/device_vector.hpp:12:
In file included from /data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/memory/atomic.hpp:109:
/data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/memory/impl/atomic.ipp:70:12: error: 'atomic<unsigned int, sycl::access::address_space::global_space>' is deprecated: sycl::atomic is deprecated since SYCL 2020 [-Werror,-Wdeprecated-declarations]
    return __VECMEM_SYCL_ATOMIC_CALL0(load, m_ptr);
           ^
/data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/memory/impl/atomic.ipp:19:19: note: expanded from macro '__VECMEM_SYCL_ATOMIC_CALL0'
        cl::sycl::atomic<value_type>(cl::sycl::global_ptr<value_type>(PTR)))
                  ^
/data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/containers/impl/device_vector.ipp:399:22: note: in instantiation of member function 'vecmem::atomic<unsigned int>::load' requested here
        return asize.load();
                     ^
/data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/containers/impl/device_vector.ipp:80:18: note: in instantiation of member function 'vecmem::device_vector<int>::size' requested here
    assert(pos < size());
                 ^
/data/ssd-1tb/projects/vecmem/vecmem/tests/sycl/test_sycl_containers.sycl:73:27: note: in instantiation of member function 'vecmem::device_vector<int>::at' requested here
                outputvec.at(i) = inputvec.at(i) * constantarray1.at(0) +
                          ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/atomic.hpp:171:7: note: 'atomic<unsigned int, sycl::access::address_space::global_space>' has been explicitly marked deprecated here
class __SYCL2020_DEPRECATED(
      ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/detail/defines_elementary.hpp:54:40: note: expanded from macro '__SYCL2020_DEPRECATED'
#define __SYCL2020_DEPRECATED(message) __SYCL_DEPRECATED(message)
                                       ^
/home/krasznaa/software/intel/clang/nightly-20220217/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/detail/defines_elementary.hpp:45:38: note: expanded from macro '__SYCL_DEPRECATED'
#define __SYCL_DEPRECATED(message) [[deprecated(message)]]
                                     ^
In file included from /data/ssd-1tb/projects/vecmem/vecmem/tests/sycl/test_sycl_containers.sycl:15:
In file included from /data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/containers/const_device_vector.hpp:10:
In file included from /data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/containers/device_vector.hpp:12:
In file included from /data/ssd-1tb/projects/vecmem/vecmem/core/include/vecmem/memory/atomic.hpp:109:

So we need to start using sycl::atomic_ref instead of sycl::atomic.

Unfortunately at first we must wait for a resolution to intel/llvm#5647, since NVIDIA backend support is of course a must for our code...

const expression for `vecmem::static_array`

While integrating vecmem functionality into detray, I found that vecmem::static_array does not support constant expressions. Is this intended, or is it simply missing?

Segfault in Resizable Vector Buffer

Ignore this if you think it is a duplicate of #95.
A vector buffer with zero size doesn't work. The following is a minimal example that reproduces the problem:

TEST_F(cuda_containers_test, zero_size) {

    vecmem::cuda::device_memory_resource device_resource;

    vecmem::data::vector_buffer<int> buff0(10, 0, device_resource);

    // OK
    ASSERT_EQ(buff0.capacity(), 10);
    // Not OK
    ASSERT_EQ(buff0.size(), 0); // Seg fault

    vecmem::data::vector_buffer<int> buff1(10, 10, device_resource);

    // OK
    ASSERT_EQ(buff1.capacity(), 10);
    // OK
    ASSERT_EQ(buff1.size(), 10);
}

Segmentation fault (core dumped)

Harmonise sizes and capacities ordering in buffer creations.

It would be good for the user if these two constructors used the same argument ordering rather than opposite ones. (I'm aware that for the jagged case we're not actually using the capacities, as mentioned in #95, but this would be a good change regardless.)
We could also template the size_type for the second one.

vecmem/core/include/vecmem/containers/data/jagged_vector_buffer.hpp

Lines 104 to 105 in ee21039

    
    jagged_vector_buffer(const std::vector<SIZE_TYPE>& sizes,
                         const std::vector<SIZE_TYPE>& capacities,

vecmem/core/include/vecmem/containers/data/vector_buffer.hpp

Lines 55 to 56 in ee21039

    
    vector_buffer(size_type capacity, size_type size,
                  memory_resource& resource);
