Hello, as the title says, I am trying to install Smilei for A100s on

Problem installing Smilei for A100 GPUs about smilei HOT 7 CLOSED

tmiethlinger commented on July 24, 2024

from smilei.

Comments (7)

charlesprouveur commented on July 24, 2024

Hello, can you show the output of "module list"?

The machine file jean_zay_gpu_A100 should be used as inspiration, but modifications are probably needed for your system. For instance, we specify -L/gpfslocalsys/cuda/11.2/lib64/, which would be useless on your cluster.

I see you have COMPILER_INFO : g++. It should be nvc++, which leads me to think that you don't have an nvhpc module loaded, and that your HDF5 module was probably not compiled with it either.
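One way to see which compiler the build really ends up using is to ask the MPI wrapper. The sketch below assumes the build goes through an Open MPI wrapper such as mpicxx; Open MPI's `--showme:command` option prints the underlying compiler. The function name `mpi_wrapper_uses_nvcxx` is hypothetical and not part of Smilei.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Smilei): report which C++ compiler an
# Open MPI wrapper calls underneath, and whether it is NVIDIA's nvc++.
# Returns 0 if nvc++, 1 if another compiler, 2 if the wrapper cannot be queried.
mpi_wrapper_uses_nvcxx() {
  local wrapper=${1:-mpicxx}
  local backend
  # Open MPI wrappers print the underlying compiler with --showme:command
  if ! backend=$("$wrapper" --showme:command 2>/dev/null); then
    echo "cannot query $wrapper (missing, or not an Open MPI wrapper)" >&2
    return 2
  fi
  case "$(basename "$backend")" in
    nvc++) echo "$wrapper -> $backend (OK)" ;;
    *)     echo "$wrapper -> $backend (expected nvc++)"; return 1 ;;
  esac
}
```

Running `mpi_wrapper_uses_nvcxx mpicxx` with a GCC-built Open MPI loaded would be expected to report g++, which matches the COMPILER_INFO seen here.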

tmiethlinger commented on July 24, 2024

Thank you for your reply.

Here's the output of module list (here CUDA 11.8 is used):

Currently Loaded Modules:
1) release/23.04 (S)
2) GCCcore/11.3.0
3) zlib/1.2.12
4) binutils/2.38
5) GCC/11.3.0
6) numactl/2.0.14
7) XZ/5.2.5
8) libxml2/2.9.13
9) libpciaccess/0.16
10) hwloc/2.7.1
11) OpenSSL/1.1
12) libevent/2.1.12
13) UCX/1.12.1
14) libfabric/1.15.1
15) PMIx/4.1.2
16) UCC/1.0.0
17) OpenMPI/4.1.4
18) OpenBLAS/0.3.20
19) FlexiBLAS/3.2.0
20) FFTW/3.3.10
21) FFTW.MPI/3.3.10
22) ScaLAPACK/2.2.0-fb
23) foss/2022a
24) CUDA/11.8.0
25) ncurses/6.3
26) bzip2/1.0.8
27) cURL/7.83.0
28) libarchive/3.6.1
29) CMake/3.24.3
30) Szip/2.1.1
31) HDF5/1.13.2

charlesprouveur commented on July 24, 2024

As expected, you do not have an nvhpc module loaded (it provides the nvc++ compiler that is required to compile the code); the CUDA module alone only contains the nvcc compiler, which compiles the CUDA files but not the rest of the code. I recommend installing nvhpc 23.1, which comes with its own CUDA and OpenMPI. You would then only need to compile HDF5 with it to have all the dependencies ready.
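A minimal sketch of building parallel HDF5 with the nvhpc toolchain, assuming nvhpc's own MPI wrappers (mpicc from comm_libs/mpi/bin) come first on PATH and that an HDF5 source release has already been unpacked. The function name and the exact configure options are assumptions; check them against the HDF5 build instructions for your version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build parallel HDF5 with the nvhpc MPI C wrapper.
# Usage: build_hdf5_nvhpc <unpacked-hdf5-source-dir> <install-prefix>
build_hdf5_nvhpc() {
  local src=$1 prefix=$2
  if [ ! -x "$src/configure" ]; then
    echo "no configure script found in '$src'" >&2
    return 1
  fi
  # The wrapper must be the one shipped with nvhpc, so that HDF5 is built
  # with the same toolchain as Smilei.
  if ! command -v mpicc >/dev/null 2>&1; then
    echo "mpicc not found: put nvhpc's comm_libs/mpi/bin on PATH first" >&2
    return 1
  fi
  # Subshell keeps the caller's working directory unchanged.
  (
    cd "$src" &&
      CC=mpicc ./configure --prefix="$prefix" --enable-parallel --enable-shared &&
      make -j 8 &&
      make install
  )
}
```

For example, `build_hdf5_nvhpc /path/to/hdf5-src "$HOME/lib/hdf5_nvhpc"` would install into a prefix that can then be exported as HDF5_ROOT.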

tmiethlinger commented on July 24, 2024

Hi,
I have now successfully installed nvhpc 23.11.
Which flags would I need to adjust in my machine file? This is my current machine file (tm_gpu_A100):

SMILEICXX.DEPS = nvcc
THRUSTCXX = nvcc
ACCELERATOR_GPU_FLAGS += -w
ACCELERATOR_GPU_FLAGS += -tp=zen3 -ta=tesla:cc80 -std=c++14  -lcurand -Mcudalib=curand
ACCELERATOR_GPU_KERNEL_FLAGS += -O3 --std c++14 $(DIRS:%=-I%)
ACCELERATOR_GPU_KERNEL_FLAGS += --expt-relaxed-constexpr
ACCELERATOR_GPU_KERNEL_FLAGS += $(shell $(PYTHONCONFIG) --includes)
ACCELERATOR_GPU_KERNEL_FLAGS += -arch=sm_80
ACCELERATOR_GPU_FLAGS        += -Minfo=accel # what is offloaded/copied
ACCELERATOR_GPU_FLAGS += -DSMILEI_OPENACC_MODE
ACCELERATOR_GPU_KERNEL_FLAGS += -DSMILEI_OPENACC_MODE
LDFLAGS += -ta=tesla:cc80 -std=c++14 -Mcudalib=curand -lcudart -lcurand -lacccuda -L/home/myuser/lib/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/lib64/
CXXFLAGS +=  -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1

But when I run make machine="tm_gpu_A100" config="gpu_nvidia noopenmp verbose" -j1, I get:

Checking dependencies for src/Tools/tabulatedFunctions.cpp
if [ ! -d "build/src/Tools" ]; then mkdir -p "build/src/Tools"; fi;
nvcc -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -D__VERSION=\"5.0-57-gc23dd350a-master\" -DOMPI_SKIP_MPICXX -std=c++14  -I/home/thmi817d/lib/hdf5_nvhpc/include -Isrc -Isrc/ElectroMagnBC -Isrc/SmileiMPI -Isrc/ParticleInjector -Isrc/DomainDecomposition -Isrc/Pusher -Isrc/Species -Isrc/Particles -Isrc/ElectroMagn -Isrc/Params -Isrc/picsar_interface -Isrc/Profiles -Isrc/Radiation -Isrc/Checkpoint -Isrc/ParticleBC -Isrc/Tools -Isrc/Field -Isrc/Collisions -Isrc/Interpolator -Isrc/ElectroMagnSolver -Isrc/MultiphotonBreitWheeler -Isrc/Ionization -Isrc/MovWindow -Isrc/Diagnostic -Isrc/Python -Isrc/Merging -Isrc/Projector -Isrc/Patch -Isrc/PartCompTime -Ibuild/src/Python -I/home/thmi817d/miniconda3/envs/smilei/include/python3.9 -I/home/thmi817d/miniconda3/envs/smilei/include/python3.9 -I/home/thmi817d/miniconda3/envs/smilei/lib/python3.9/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -O3 -g -MF"build/src/Tools/tabulatedFunctions.d" -MM -MP -MT"build/src/Tools/tabulatedFunctions.d build/src/Tools/tabulatedFunctions.o" src/Tools/tabulatedFunctions.cpp
nvcc fatal   : Unknown option '-MFbuild/src/Tools/tabulatedFunctions.d'
Checking dependencies for src/Tools/PyTools.cpp
...

My current Smilei profile looks like this:

NVARCH=`uname -s`_`uname -m`; export NVARCH
NVCOMPILERS=/home/myuser/lib/nvidia/hpc_sdk; export NVCOMPILERS
MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/23.11/compilers/man; export MANPATH
PATH=$NVCOMPILERS/$NVARCH/23.11/compilers/bin:$PATH; export PATH

export PATH=$NVCOMPILERS/$NVARCH/23.11/comm_libs/mpi/bin:$PATH
export MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/23.11/comm_libs/mpi/man

export HDF5_ROOT=$HOME/lib/hdf5_nvhpc
export LD_LIBRARY_PATH=$HDF5_ROOT/lib:$LD_LIBRARY_PATH

Do you see what the issue might be? The folders 23.11/compilers and 23.11/comm_libs exist, so I think that part is correct.
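Existing folders do not guarantee that the executables are there. A sketch that checks for the tools themselves, assuming the `$NVCOMPILERS/$NVARCH/<version>/compilers/bin` and `comm_libs/mpi/bin` layout used in the profile above; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that an NVIDIA HPC SDK tree contains nvc++ and
# the bundled MPI C++ wrapper.
# Layout assumed: <root>/<arch>/<version>/{compilers,comm_libs/mpi}/bin
check_nvhpc_tree() {
  local root=$1 arch=$2 version=$3
  local base="$root/$arch/$version"
  local missing=0 tool
  for tool in compilers/bin/nvc++ comm_libs/mpi/bin/mpicxx; do
    if [ -x "$base/$tool" ]; then
      echo "found   $base/$tool"
    else
      echo "MISSING $base/$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

With the profile above loaded, `check_nvhpc_tree "$NVCOMPILERS" "$NVARCH" 23.11` would list each tool and return non-zero if one is missing.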

charlesprouveur commented on July 24, 2024

You installed nvhpc 23.11, which may contain CUDA 11.8 and/or CUDA 12.3. With CUDA 12.3 there are currently known issues that we are working on. With CUDA 11.8, modifications to the code might be needed, which is why I recommended nvhpc 23.1; you can get it here: https://developer.nvidia.com/nvidia-hpc-sdk-231-downloads.
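To see which CUDA versions an nvhpc install actually bundles, one can list its cuda directory; the `-L.../23.11/cuda/12.3/lib64/` path in the machine file above suggests this layout. A sketch, with the layout and the function name as assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the CUDA versions bundled in an nvhpc install,
# assuming the layout <root>/<arch>/<nvhpc-version>/cuda/<cuda-version>/
bundled_cuda_versions() {
  local cuda_dir="$1/$2/$3/cuda"
  local d
  if [ ! -d "$cuda_dir" ]; then
    echo "no cuda directory in $1/$2/$3" >&2
    return 1
  fi
  # Only entries starting with a digit (e.g. 11.8, 12.3), skipping bin/, lib64/, ...
  for d in "$cuda_dir"/[0-9]*; do
    if [ -d "$d" ]; then basename "$d"; fi
  done
}
```

For example, `bundled_cuda_versions "$NVCOMPILERS" "$NVARCH" 23.11` prints one version per line.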

To answer your questions:

Change SMILEICXX.DEPS to nvc++.
The -ta=tesla:cc80 option works with nvhpc 23.1 but not with nvhpc > 23.4, where you would need different options; this is another reason to use the older nvhpc. You can look at the machine file ruche_gpu2 as an example: there we compiled and ran with nvhpc 23.9 and CUDA 11.8. It is possible, but some executables had issues, so I do not recommend it at this time.
The "error" messages during the dependency check can be ignored; they are not a problem.
The rest should be fine.
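The first two points can be scripted. In the sketch below (function names hypothetical), `set_deps_compiler` switches SMILEICXX.DEPS to nvc++ in a machine file using GNU sed, and `openacc_target_flags` picks the OpenACC target flag from the nvhpc version using the 23.4 cut-off mentioned above. The `-acc -gpu=cc80` spelling for newer releases is an assumption based on current nvhpc option names, not taken from this thread; compare it with the ruche_gpu2 machine file before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make the dependency check use nvc++ instead of nvcc
# (nvcc failed above on the -MF... option). Assumes GNU sed (-i without suffix).
set_deps_compiler() {
  local machine_file=$1
  # Rewrite an existing "SMILEICXX.DEPS = ..." line in place
  sed -i 's/^SMILEICXX\.DEPS[[:space:]]*=.*/SMILEICXX.DEPS = nvc++/' "$machine_file"
}

# Hypothetical helper: choose the OpenACC GPU target flag for an A100 (cc80).
# $1 is the nvhpc version as MAJOR.MINOR, e.g. 23.1 or 23.11.
openacc_target_flags() {
  local major=${1%%.*} minor=${1#*.}
  if [ "$major" -lt 23 ] || { [ "$major" -eq 23 ] && [ "$minor" -le 4 ]; }; then
    echo "-ta=tesla:cc80"   # accepted by older releases such as 23.1
  else
    echo "-acc -gpu=cc80"   # assumption: replacement spelling in newer nvhpc
  fi
}
```

For example, `set_deps_compiler tm_gpu_A100` edits the machine file in place, and `openacc_target_flags 23.1` prints the flag to use in ACCELERATOR_GPU_FLAGS and LDFLAGS.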

mccoys commented on July 24, 2024

In the future, please use the chat room for support:
https://app.element.io/#/room/!LQrdVpOJEohPSWMlmf:matrix.org

If you need more space to describe your problem, use the discussions:
https://github.com/SmileiPIC/Smilei/discussions/categories/q-a

Use issues here only to report an actual bug or to request a feature.

mccoys commented on July 24, 2024

@tmiethlinger Note that the makefile has been modified to make GPU compilation easier.
See this: https://smileipic.github.io/Smilei/Use/installation.html#setup-environment-variables-for-compilation
and this: https://smileipic.github.io/Smilei/Use/installation.html#compilation-for-gpu-accelerated-nodes
