This project demonstrates how FTI can be used with CUDA. It performs a simple vector addition, C = A + B, dividing the vector evenly among the MPI processes. Each process then launches a CUDA kernel to compute its partition of the vector.
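The per-process split described above can be sketched in plain C. This is a minimal illustration, not the project's actual code: the helper name `partition_range` is hypothetical, and giving leftover elements to the lowest ranks is an assumption for vector sizes that do not divide evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FTI or of this project's source).
 * Computes the slice [*offset, *offset + *count) of the vector that
 * the process with the given rank would work on. */
void partition_range(size_t vector_size, int rank, int nprocs,
                     size_t *offset, size_t *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    size_t r    = (size_t)rank;
    size_t base = vector_size / (size_t)nprocs; /* elements every rank gets */
    size_t rem  = vector_size % (size_t)nprocs; /* leftover elements */

    /* Assumption: the first `rem` ranks each take one extra element. */
    *count  = base + (r < rem ? 1 : 0);
    *offset = r * base + (r < rem ? r : rem);
}
```

In an MPI program, `rank` and `nprocs` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`, and each process would pass its slice to the CUDA kernel.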
To compile, set the following environment variables:
- MPI_HOME
- CUDA_HOME
- FTI_HOME
These variables should point to the home directories of MPI, CUDA and FTI, respectively. Alternatively, adjust the Makefile directly to point to these locations.
To compile, run make.
Execute the binary with the following argument:
- vector-size: specifies the length of the vector
A successful run requires FTI to be built and configured. For more information on FTI, see its GitHub repository.