computationalradiationphysics / nbody-alpaka
n body simulation with alpaka
The simulation class in the branch simulation-class uses a single template parameter TAcc for both kernels, which might not work because our two kernels use different indexing dimensions.
The accelerator type, however, needs this dimension to be specified.
I'm not sure whether a typename parameter can itself be a template (a template template parameter). That way we could specify the indexing dimension and type later.
File in question: simulation.hpp
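A minimal sketch of the template template parameter idea, in plain C++ without alpaka. `FakeAcc`, `DimInt` and this `Simulation` layout are hypothetical stand-ins, not the code in simulation.hpp. The only thing taken from the thread is that the accelerator template has the shape `Acc<TDim, TSize>`, as in `AccCpuSerial<std::integral_constant<...>, ...>`. That the second kernel is one-dimensional is an assumption.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an accelerator template of the form Acc<TDim, TSize>.
template <typename TDim, typename TSize>
struct FakeAcc
{
    using Dim = TDim;
    using Size = TSize;
};

// Helper alias (assumption): a dimension expressed as an integral constant.
template <std::size_t N>
using DimInt = std::integral_constant<std::size_t, N>;

// TAcc is a template template parameter: the class receives the accelerator
// template itself and fixes the indexing dimension separately for each kernel.
template <template <typename, typename> class TAcc, typename TSize>
struct Simulation
{
    using ForceMatrixAcc = TAcc<DimInt<2>, TSize>; // 2D indexing
    using SummationAcc = TAcc<DimInt<1>, TSize>;   // 1D indexing (assumed)
};
```

With this layout a user would write `Simulation<FakeAcc, std::size_t>` and never name the dimension; each kernel gets its own fully specified accelerator type.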
I've added the use of shared memory in the branch sharedmem-force-matrix-kernel.
After fixing most of the errors, I am stuck on the last one:
(...)/tests/forceMatrixKernel/forcematrix_test.cpp:243:17: required from here
(...)/alpaka/include/alpaka/workdiv/Traits.hpp:73:29: error: incomplete type ‘alpaka::workdiv::traits::GetWorkDiv<alpaka::acc::AccCpuSerial<std::integral_constant<long unsigned int, 2ul>, long unsigned int>, alpaka::origin::Block, alpaka::origin::Thread, void>’ used in nested name specifier
::getWorkDiv(
The part of the code where I call getValidWorkDiv to create the work division hasn't changed. So I'm guessing it's either some class order issue or there is another way of creating a workdiv if the kernel uses extern shared memory.
The shared memory code might have little to no effect on performance on CUDA hardware, but the results might differ on other hardware.
The code in question can be found in commit 3a6f863.
By using F_ij = -F_ji the force matrix can be reduced to half its size.
This can be achieved either by using the already proposed chessboard pattern or by only calculating one side of the matrix resulting in a "triangle".
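The "triangle" variant can be sketched as follows. This is a host-side illustration in plain C++, not the repository's kernel. `Vec3`, `triangleSize`, `triangleIndex` and `sumForces` are hypothetical names, and the row-by-row packed layout is an assumption. Only the entries F_ij with i < j are stored; F_ji is recovered as -F_ij when summing.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Number of entries in the strict upper triangle of an n x n matrix.
std::size_t triangleSize(std::size_t n) { return n * (n - 1) / 2; }

// Packed index of entry (i, j) with i < j, stored row by row.
std::size_t triangleIndex(std::size_t i, std::size_t j, std::size_t n)
{
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

// Total force on every body from the packed triangle.
// packed[triangleIndex(i, j, n)] holds F_ij, the force on body i due to body j.
std::vector<Vec3> sumForces(const std::vector<Vec3>& packed, std::size_t n)
{
    std::vector<Vec3> total(n, Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i + 1; j < n; ++j)
        {
            const Vec3& f = packed[triangleIndex(i, j, n)];
            // F_ij acts on body i ...
            total[i].x += f.x; total[i].y += f.y; total[i].z += f.z;
            // ... and F_ji = -F_ij acts on body j.
            total[j].x -= f.x; total[j].y -= f.y; total[j].z -= f.z;
        }
    }
    return total;
}
```

Note that the diagonal is dropped as well, so the storage is n(n-1)/2 entries rather than exactly half of n². On a parallel accelerator the two updates per entry would write to the same `total` element from different threads, so a real kernel would need atomics or a separate reduction step.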
We could join both kernels, which might make the force matrix in global memory obsolete because we sum up forces one by one.
Pro:
Contra:
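For reference, the joined-kernel idea from the first sentence of this issue could look roughly like the sketch below: each body sums its forces one by one in a local variable, so no force matrix is ever written to memory. This is a sequential plain C++ illustration, not alpaka code. `Body`, `Vec3`, `accumulateForces`, the gravitational constant `G` and the softening parameter `eps` are all hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { double x, y, z, mass; };
struct Vec3 { double x, y, z; };

// For each body i, sum the gravitational forces from all other bodies directly.
// The outer loop corresponds to what one thread per body would do in a kernel.
std::vector<Vec3> accumulateForces(const std::vector<Body>& bodies,
                                   double G = 1.0, double eps = 0.0)
{
    std::vector<Vec3> total(bodies.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < bodies.size(); ++i)
    {
        Vec3 sum{0.0, 0.0, 0.0}; // running sum replaces row i of the matrix
        for (std::size_t j = 0; j < bodies.size(); ++j)
        {
            if (i == j) continue;
            const double dx = bodies[j].x - bodies[i].x;
            const double dy = bodies[j].y - bodies[i].y;
            const double dz = bodies[j].z - bodies[i].z;
            const double r2 = dx * dx + dy * dy + dz * dz + eps * eps;
            // G * m_i * m_j / r^3, multiplied by the distance vector below.
            const double s = G * bodies[i].mass * bodies[j].mass / (r2 * std::sqrt(r2));
            sum.x += s * dx; sum.y += s * dy; sum.z += s * dz;
        }
        total[i] = sum;
    }
    return total;
}
```

This variant computes every pair twice, so it gives up the F_ij = -F_ji saving in exchange for not storing the matrix.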
Now that support for the alpaka elements layer is implemented, we could already sum elements in the ForceMatrixKernel, thereby reducing the size of the force matrix in global memory.
Running the test with CUDA on Taurus:
error.txt
Edit: AccCpuSerial as well as AccCpuOmp2Blocks and Threads run as expected.
It's probably the different hardware.
Maybe the compiler doesn't compile the operators for the GPU?
Edit2: English.