To compile the code using the Makefile, you'll need to fulfill the following requirements:
- build-essential package with a gcc version <= 11.x
- python-dev package: make sure the dev package version matches your Python version (3.10 was used during development).
- PASIf uses pybind11 (https://github.com/pybind/pybind11) to expose the C++/CUDA code as a Python module. After cloning the project, make sure the submodule has been downloaded as well:
  > git submodule init
  > git submodule update
  Make sure the PYBIND11 macro in the Makefile matches your installed version.
- To compile the CUDA code you'll need the NVIDIA compiler nvcc, which ships with the NVIDIA HPC SDK: https://developer.nvidia.com/hpc-sdk
  The SDK is installed in /opt/nvidia/ by default. Add the compiler directory to your PATH, replacing *version* with your installed version:
  > export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/*version*/compilers/bin:$PATH
  Then make sure the NVCCINCLUDE macro in the Makefile matches your installed version of the HPC SDK.
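Before running make, the requirements above can be checked with a short Python script. This is a hypothetical helper, not part of PASIf: it looks up gcc and nvcc on PATH and assumes the pybind11 submodule is checked out in ./pybind11, so adjust that path if the repository places it elsewhere.

```python
# check_requirements.py: hypothetical pre-build check, not part of PASIf.
# Assumptions: gcc and nvcc are looked up on PATH, and the pybind11
# submodule lives in ./pybind11 (adjust pybind_dir if it does not).
import shutil
import subprocess
import sys
import sysconfig
from pathlib import Path


def gcc_major(dumpversion: str) -> int:
    """Major version from `gcc -dumpversion` output such as "11" or "11.4.0"."""
    return int(dumpversion.strip().split(".")[0])


def check_requirements(repo_root: Path = Path(".")) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []

    gcc = shutil.which("gcc")
    if gcc is None:
        problems.append("gcc not found: install build-essential")
    else:
        version = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True).stdout
        if version.strip() and gcc_major(version) > 11:
            problems.append(f"gcc {version.strip()} is newer than 11.x")

    # python-dev installs Python.h into the running interpreter's include directory.
    include_dir = Path(sysconfig.get_paths()["include"])
    if not (include_dir / "Python.h").is_file():
        v = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python.h not found in {include_dir}: install the python{v} dev package")

    # A missing or empty directory means the git submodule commands were not run.
    pybind_dir = repo_root / "pybind11"
    if not pybind_dir.is_dir() or not any(pybind_dir.iterdir()):
        problems.append(f"{pybind_dir} is missing or empty: run the git submodule commands")

    if shutil.which("nvcc") is None:
        problems.append("nvcc not on PATH: add the HPC SDK compilers/bin directory")

    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All requirements found.")
```

Run it from the project root with python3; it prints each missing requirement, or a confirmation when every check passes.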
To compile, just run:
> mkdir build
> make
To switch from single to double precision, change float to double in src/helpers.cuh.
For the best performance, make sure the -arch flag matches the architecture of the GPU you are compiling for (for example, sm_86 for a GPU with compute capability 8.6).
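To find the right value, the compute capability of the installed GPU(s) can be read from nvidia-smi. The sketch below is a hypothetical helper, not part of PASIf; it assumes a driver recent enough for nvidia-smi to accept the compute_cap query field, and it only prints the flag, leaving the Makefile edit to you.

```python
# Hypothetical helper: derive the nvcc -arch value for the local GPU(s).
# Assumes a driver whose nvidia-smi supports --query-gpu=compute_cap.
import re
import shutil
import subprocess

# Matches a compute capability such as "8.6", tolerating surrounding whitespace.
_CAP = re.compile(r"\s*(\d+)\.(\d+)\s*")


def arch_flag(compute_cap: str) -> str:
    """Map a compute capability such as "8.6" to the nvcc flag "-arch=sm_86"."""
    match = _CAP.fullmatch(compute_cap)
    if match is None:
        raise ValueError(f"not a compute capability: {compute_cap!r}")
    major, minor = match.groups()
    return f"-arch=sm_{major}{minor}"


def detect_arch_flags() -> list[str]:
    """Return one -arch flag per distinct GPU type, or [] if nothing can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    # nvidia-smi prints one line per GPU; skip anything that is not "X.Y".
    lines = [line for line in result.stdout.splitlines() if _CAP.fullmatch(line)]
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(arch_flag(line) for line in lines))


if __name__ == "__main__":
    print(detect_arch_flags() or "No GPU found through nvidia-smi")
```

On a machine with a single compute capability 8.6 GPU this would report -arch=sm_86, the value to use for nvcc's -arch flag.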
The compilation process will generate a .so Python module in the ./build folder.
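Once make has finished, the module can be imported from any script by putting ./build on the module search path. The following is a minimal sketch with hypothetical helper names; it discovers the module name from the file name rather than assuming it, since with pybind11 that name is the one given to the PYBIND11_MODULE macro in the C++ sources.

```python
# Hypothetical helper, not part of PASIf: list and import the extension
# module(s) that make placed in ./build.
import importlib
import importlib.machinery
import sys
from pathlib import Path


def find_modules(build_dir="build") -> list[str]:
    """Extension module names in build_dir, e.g. "foo" for foo.cpython-310-x86_64-linux-gnu.so."""
    build = Path(build_dir)
    if not build.is_dir():
        return []
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    # The importable name is the part of the file name before the first dot.
    return sorted({p.name.split(".")[0] for p in build.iterdir() if p.name.endswith(suffixes)})


def import_from_build(name: str, build_dir="build"):
    """Import an extension module after putting build_dir at the front of sys.path."""
    path = str(Path(build_dir).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return importlib.import_module(name)


if __name__ == "__main__":
    for name in find_modules():
        print(name, "->", import_from_build(name).__file__)
```

Run from the project root, it prints each module found in ./build together with the file it was loaded from.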
NVIDIA GPU kernels can be profiled with Nsight Compute and Nsight Systems. Both come with the NVIDIA HPC SDK, but you can also download a more recent version of Nsight Systems directly here: https://developer.nvidia.com/gameworksdownload#?dn=nsight-systems-2023-2
# Run the code with Nsight Systems (nsys) event capture; -o sets the report name to report.nsys-rep
> nsys profile -o report python3 FullInterfaceTesting.py
# Open the .nsys-rep output in the graphical interface (nsys-ui), or export it to SQLite format and inspect the result directly in the terminal
> nsys export --output=rep_name.sqlite -t sqlite report.nsys-rep
# Print the summary statistics of the profile in the terminal
> nsys stats *.sqlite
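To post-process the exported database yourself, for example to rank kernels by total GPU time, Python's standard sqlite3 module is enough. This sketch assumes the schema written by recent Nsight Systems versions (a CUPTI_ACTIVITY_KIND_KERNEL table with start, end and shortName columns, and a StringIds table holding the names); the schema can change between nsys versions, so check it with .tables in the sqlite3 shell first.

```python
# kernel_summary.py: hypothetical post-processing of an `nsys export -t sqlite` database.
# Assumed schema (recent Nsight Systems versions):
#   CUPTI_ACTIVITY_KIND_KERNEL(start, "end", shortName, ...), times in nanoseconds
#   StringIds(id, value), mapping shortName ids to kernel names
import sqlite3
import sys

# "end" is an SQL keyword, so the column name is quoted.
QUERY = """
SELECT s.value AS kernel, COUNT(*) AS calls, SUM(k."end" - k.start) AS total_ns
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.shortName
GROUP BY s.value
ORDER BY total_ns DESC
"""


def kernel_summary(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Call count and total GPU time (ns) per kernel name, longest first."""
    return conn.execute(QUERY).fetchall()


def main(argv: list[str]) -> None:
    if len(argv) != 2:
        print(f"usage: python3 {argv[0]} rep_name.sqlite")
        return
    conn = sqlite3.connect(argv[1])
    try:
        for kernel, calls, total_ns in kernel_summary(conn):
            print(f"{total_ns / 1e6:10.3f} ms  {calls:6d} calls  {kernel}")
    finally:
        conn.close()


if __name__ == "__main__":
    main(sys.argv)
```

Called as python3 kernel_summary.py rep_name.sqlite, it prints one line per kernel with its total time in milliseconds and its number of launches.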