computationalradiationphysics / nbody-alpaka

N-body simulation with alpaka

C++ 85.05% CMake 12.14% Python 2.80%

nbody-alpaka's Issues

Simulation class uses one accelerator for kernels with different Dimensions

The Simulation class in the branch simulation-class uses a single template parameter TAcc for both kernels. This might not work, because the two kernels we have use different indexing dimensions.
The accelerator type, however, needs this dimension to be specified.
I'm not sure whether a typename parameter can accept a template. If it can, we could specify the indexing dimension and type later.

File in question: simulation.hpp
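
On the question of whether a parameter can accept a template: C++ supports this through template template parameters. The following is a minimal sketch of the idea, not the code from simulation.hpp. `FakeAcc`, `DimInt`, `ForceMatrixAcc`, and `SummationAcc` are hypothetical names, and the dimensions 2 and 1 are assumptions chosen for illustration. `FakeAcc` only mimics an accelerator template that takes a dimension type and an index type.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an accelerator template that is
// parameterized by an indexing dimension and an index type.
template <typename TDim, typename TIdx>
struct FakeAcc {
    using Dim = TDim;
    using Idx = TIdx;
};

// Convenience alias for a compile-time dimension (assumed helper).
template <std::size_t N>
using DimInt = std::integral_constant<std::size_t, N>;

// TAccTmpl is a template template parameter: the caller passes the
// accelerator *template*, and the class instantiates it once per
// kernel with the dimension that kernel needs.
template <template <typename, typename> class TAccTmpl, typename TIdx>
struct Simulation {
    using ForceMatrixAcc = TAccTmpl<DimInt<2>, TIdx>;  // assumed 2D kernel
    using SummationAcc = TAccTmpl<DimInt<1>, TIdx>;    // assumed 1D kernel
};

// The dimension is no longer fixed at the point of use.
using Sim = Simulation<FakeAcc, std::size_t>;
```

With this shape, the user names the accelerator family once and each kernel gets its own fully specified accelerator type.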

force-matrix-test: incomplete type GetWorkDiv

I've added shared memory usage in the branch sharedmem-force-matrix-kernel.

After fixing most of the errors I am stuck with the last one:

(...)/tests/forceMatrixKernel/forcematrix_test.cpp:243:17:   required from here
(...)/alpaka/include/alpaka/workdiv/Traits.hpp:73:29: error: incomplete type ‘alpaka::workdiv::traits::GetWorkDiv<alpaka::acc::AccCpuSerial<std::integral_constant<long unsigned int, 2ul>, long unsigned int>, alpaka::origin::Block, alpaka::origin::Thread, void>’ used in nested name specifier
                 ::getWorkDiv(

The part of the code where I call getValidWorkDiv to create the work division hasn't changed. So I'm guessing it's either some class-ordering issue, or there is a different way of creating a work division if the kernel uses extern shared memory.

Add shared memory code anew.

Shared memory code might have little to no effect on performance on CUDA hardware, but the results might be different on other hardware.

Said code can be found in commit 3a6f863

Calculate only half of the force matrix

By using F_ij = -F_ji, the force matrix can be reduced to half its size.
This can be achieved either by using the already proposed chessboard pattern or by calculating only one side of the diagonal, resulting in a "triangle".
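
The "triangle" variant can be sketched in plain serial C++ as follows. This is an illustration under stated assumptions, not the project's kernel: `Vec2`, `pairForce`, `triIndex`, `upperTriangleForces`, and `totalForces` are hypothetical names, the force law is Newtonian gravity with G = 1, and coincident bodies are not handled. Only the pairs with i < j are computed and stored in packed form; the missing half is recovered with F_ji = -F_ij.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x; double y; };

// Force on body i exerted by body j (assumption: gravity with G = 1).
inline Vec2 pairForce(Vec2 pi, double mi, Vec2 pj, double mj) {
    const double dx = pj.x - pi.x;
    const double dy = pj.y - pi.y;
    const double r2 = dx * dx + dy * dy;
    const double s = mi * mj / (r2 * std::sqrt(r2));
    return Vec2{s * dx, s * dy};
}

// Position of the pair (i, j) with i < j in the packed upper triangle.
inline std::size_t triIndex(std::size_t i, std::size_t j, std::size_t n) {
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

// Computes only the n*(n-1)/2 entries above the diagonal.
std::vector<Vec2> upperTriangleForces(const std::vector<Vec2>& pos,
                                      const std::vector<double>& mass) {
    assert(pos.size() == mass.size());
    const std::size_t n = pos.size();
    std::vector<Vec2> tri;
    tri.reserve(n * (n - 1) / 2);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            tri.push_back(pairForce(pos[i], mass[i], pos[j], mass[j]));
    return tri;
}

// Sums the total force per body, using F_ji = -F_ij for the lower half.
std::vector<Vec2> totalForces(const std::vector<Vec2>& pos,
                              const std::vector<double>& mass) {
    const std::size_t n = pos.size();
    const std::vector<Vec2> tri = upperTriangleForces(pos, mass);
    std::vector<Vec2> total(n, Vec2{0.0, 0.0});
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const Vec2 f = tri[triIndex(i, j, n)];
            total[i].x += f.x;  total[i].y += f.y;  // F_ij acts on i
            total[j].x -= f.x;  total[j].y -= f.y;  // -F_ij acts on j
        }
    }
    return total;
}
```

The packed layout needs n(n-1)/2 entries instead of n², at the cost of a slightly more involved index calculation.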

Proposal: Join kernels together, reduces memory usage, increases runtime

We could join both kernels, which might make the force matrix in global memory obsolete because we sum up forces one by one.

Pro:

Less memory.

Con:

Pretty sure that will need more runtime.
Summing in log n steps might not be possible anymore.
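
The trade-off above can be seen in a small serial sketch of a joined kernel. It is only an illustration: `Vec2` and `fusedForces` are hypothetical names, the force law is assumed to be gravity with G = 1, and coincident bodies are not handled. No n × n matrix is stored, but each body's sum becomes a serial loop over all other bodies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x; double y; };

// Computes and sums the forces in one pass. Memory use is O(n) for the
// result instead of O(n^2) for a force matrix, but the inner loop adds
// the contributions one by one rather than in a tree-shaped reduction.
std::vector<Vec2> fusedForces(const std::vector<Vec2>& pos,
                              const std::vector<double>& mass) {
    assert(pos.size() == mass.size());
    const std::size_t n = pos.size();
    std::vector<Vec2> total(n, Vec2{0.0, 0.0});
    for (std::size_t i = 0; i < n; ++i) {      // one "thread" per body
        Vec2 acc{0.0, 0.0};                    // running sum, never stored per pair
        for (std::size_t j = 0; j < n; ++j) {
            if (j == i) continue;              // no self-interaction
            const double dx = pos[j].x - pos[i].x;
            const double dy = pos[j].y - pos[i].y;
            const double r2 = dx * dx + dy * dy;
            const double s = mass[i] * mass[j] / (r2 * std::sqrt(r2));
            acc.x += s * dx;
            acc.y += s * dy;
        }
        total[i] = acc;
    }
    return total;
}
```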

Optimization: Sum some elements in ForceMatrixKernel

With support for the alpaka elements layer now implemented, we could already sum some elements in the ForceMatrixKernel, thereby reducing the size of the force matrix in global memory.
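
A rough serial sketch of this idea, without any alpaka calls: each "thread" handles a fixed number of consecutive columns and writes one partial sum, so a row shrinks from n entries to ceil(n / elemsPerThread). The names `reducedForceMatrix`, `elemsPerThread`, and `pairForce` are hypothetical, and forces are scalars here only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a reduced force matrix in row-major order. pairForce(i, j) is
// any callable returning the force on body i from body j.
template <typename TPairForce>
std::vector<double> reducedForceMatrix(std::size_t n,
                                       std::size_t elemsPerThread,
                                       TPairForce pairForce) {
    assert(elemsPerThread > 0);
    const std::size_t cols = (n + elemsPerThread - 1) / elemsPerThread;
    std::vector<double> reduced(n * cols, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t c = 0; c < cols; ++c) {  // one "thread" per (i, c)
            const std::size_t begin = c * elemsPerThread;
            const std::size_t end = std::min(begin + elemsPerThread, n);
            double partial = 0.0;
            for (std::size_t j = begin; j < end; ++j) {
                if (j != i) partial += pairForce(i, j);  // skip self-interaction
            }
            reduced[i * cols + c] = partial;  // one write per element group
        }
    }
    return reduced;
}
```

A later summation step would then only have to reduce `cols` values per body instead of n.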

ForceMatrixKernel test: an illegal memory access was encountered

Running the test with CUDA on Taurus:
error.txt

Edit: AccCpuSerial as well as AccCpuOmp2Blocks and Threads run as expected.
It's probably the different hardware.
Maybe the compiler doesn't compile the operators for the GPU?

Edit2: English.
