Comments (7)

charlesprouveur commented on July 24, 2024

Hello, can you show the output of "module list"?

The machine file jean_zay_gpu_A100 should be used only as a starting point; your system will probably need modifications. For instance, we specify -L/gpfslocalsys/cuda/11.2/lib64/, which would be useless on your cluster.

I see you have COMPILER_INFO : g++, but it should be nvc++. That leads me to think you don't have an nvhpc module loaded, and your hdf5 module was probably not compiled with it either.
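
One quick way to see which compilers the current environment actually exposes is sketched below. The helper name have_cmd is hypothetical, and the list of tools is only an assumption about what a GPU build of this kind usually needs; the output depends entirely on your system.

```shell
# Return success if the given command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print where each tool resolves, or report that it is missing.
for tool in nvc++ nvcc mpicxx; do
    if have_cmd "$tool"; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found"
    fi
done
```

If nvc++ is reported as not found, no nvhpc compiler is on PATH, which matches the g++ shown in COMPILER_INFO.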

from smilei.

tmiethlinger commented on July 24, 2024

Thank you for your reply.

Here's the output of module list (here CUDA 11.8 is used):

Currently Loaded Modules:
1) release/23.04 (S)
2) GCCcore/11.3.0
3) zlib/1.2.12
4) binutils/2.38
5) GCC/11.3.0
6) numactl/2.0.14
7) XZ/5.2.5
8) libxml2/2.9.13
9) libpciaccess/0.16
10) hwloc/2.7.1
11) OpenSSL/1.1
12) libevent/2.1.12
13) UCX/1.12.1
14) libfabric/1.15.1
15) PMIx/4.1.2
16) UCC/1.0.0
17) OpenMPI/4.1.4
18) OpenBLAS/0.3.20
19) FlexiBLAS/3.2.0
20) FFTW/3.3.10
21) FFTW.MPI/3.3.10
22) ScaLAPACK/2.2.0-fb
23) foss/2022a
24) CUDA/11.8.0
25) ncurses/6.3
26) bzip2/1.0.8
27) cURL/7.83.0
28) libarchive/3.6.1
29) CMake/3.24.3
30) Szip/2.1.1
31) HDF5/1.13.2

charlesprouveur commented on July 24, 2024

As expected, you do not have an nvhpc module loaded. It provides the nvc++ compiler that is required to compile the code; the cuda module alone only contains the nvcc compiler, which is used to compile the CUDA files but not the rest of the code. I recommend installing nvhpc 23.1, which comes with its own CUDA and OpenMPI. You would then only need to compile HDF5 with it to have all dependencies ready.

tmiethlinger commented on July 24, 2024

Hi,
I have now successfully installed nvhpc 23.11.
Which flags do I need to adjust in my machine file? This is my current machine file (tm_gpu_A100):

SMILEICXX.DEPS = nvcc
THRUSTCXX = nvcc
ACCELERATOR_GPU_FLAGS += -w
ACCELERATOR_GPU_FLAGS += -tp=zen3 -ta=tesla:cc80 -std=c++14  -lcurand -Mcudalib=curand
ACCELERATOR_GPU_KERNEL_FLAGS += -O3 --std c++14 $(DIRS:%=-I%)
ACCELERATOR_GPU_KERNEL_FLAGS += --expt-relaxed-constexpr
ACCELERATOR_GPU_KERNEL_FLAGS += $(shell $(PYTHONCONFIG) --includes)
ACCELERATOR_GPU_KERNEL_FLAGS += -arch=sm_80
ACCELERATOR_GPU_FLAGS        += -Minfo=accel # what is offloaded/copied
ACCELERATOR_GPU_FLAGS += -DSMILEI_OPENACC_MODE
ACCELERATOR_GPU_KERNEL_FLAGS += -DSMILEI_OPENACC_MODE
LDFLAGS += -ta=tesla:cc80 -std=c++14 -Mcudalib=curand -lcudart -lcurand -lacccuda -L/home/myuser/lib/nvidia/hpc_sdk/Linux_x86_64/23.11/cuda/12.3/lib64/
CXXFLAGS +=  -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1

But when I run make machine="tm_gpu_A100" config="gpu_nvidia noopenmp verbose" -j1, I get:

Checking dependencies for src/Tools/tabulatedFunctions.cpp
if [ ! -d "build/src/Tools" ]; then mkdir -p "build/src/Tools"; fi;
nvcc -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -D__VERSION=\"5.0-57-gc23dd350a-master\" -DOMPI_SKIP_MPICXX -std=c++14  -I/home/thmi817d/lib/hdf5_nvhpc/include -Isrc -Isrc/ElectroMagnBC -Isrc/SmileiMPI -Isrc/ParticleInjector -Isrc/DomainDecomposition -Isrc/Pusher -Isrc/Species -Isrc/Particles -Isrc/ElectroMagn -Isrc/Params -Isrc/picsar_interface -Isrc/Profiles -Isrc/Radiation -Isrc/Checkpoint -Isrc/ParticleBC -Isrc/Tools -Isrc/Field -Isrc/Collisions -Isrc/Interpolator -Isrc/ElectroMagnSolver -Isrc/MultiphotonBreitWheeler -Isrc/Ionization -Isrc/MovWindow -Isrc/Diagnostic -Isrc/Python -Isrc/Merging -Isrc/Projector -Isrc/Patch -Isrc/PartCompTime -Ibuild/src/Python -I/home/thmi817d/miniconda3/envs/smilei/include/python3.9 -I/home/thmi817d/miniconda3/envs/smilei/include/python3.9 -I/home/thmi817d/miniconda3/envs/smilei/lib/python3.9/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -O3 -g -MF"build/src/Tools/tabulatedFunctions.d" -MM -MP -MT"build/src/Tools/tabulatedFunctions.d build/src/Tools/tabulatedFunctions.o" src/Tools/tabulatedFunctions.cpp
nvcc fatal   : Unknown option '-MFbuild/src/Tools/tabulatedFunctions.d'
Checking dependencies for src/Tools/PyTools.cpp
...

My current Smilei profile looks like:

NVARCH=`uname -s`_`uname -m`; export NVARCH
NVCOMPILERS=/home/myuser/lib/nvidia/hpc_sdk; export NVCOMPILERS
MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/23.11/compilers/man; export MANPATH
PATH=$NVCOMPILERS/$NVARCH/23.11/compilers/bin:$PATH; export PATH

export PATH=$NVCOMPILERS/$NVARCH/23.11/comm_libs/mpi/bin:$PATH
export MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/23.11/comm_libs/mpi/man

export HDF5_ROOT=$HOME/lib/hdf5_nvhpc
export LD_LIBRARY_PATH=$HDF5_ROOT/lib:$LD_LIBRARY_PATH

Do you see what the issue might be? The folders 23.11/compilers and 23.11/comm_libs exist, so I think that part should be correct.

charlesprouveur commented on July 24, 2024

You installed nvhpc 23.11, which might contain CUDA 11.8 and/or CUDA 12.3. For CUDA 12.3 there are known issues that we are currently working on. For CUDA 11.8, modifications to the code might be needed, which is why I recommended nvhpc 23.1. You can get it here: https://developer.nvidia.com/nvidia-hpc-sdk-231-downloads

To answer your questions:

  • change SMILEICXX.DEPS to nvc++

  • The -ta=tesla:cc80 option works with nvhpc 23.1 but not with nvhpc >23.4, where you would need different options. That is another reason to use the older nvhpc. You can look at the machine file ruche_gpu2 as an example: we compiled and ran with nvhpc 23.9 and CUDA 11.8. It is possible, but some executables had issues, so I do not recommend it at this time.

  • The "error" messages during the dependency check can be ignored; they are not an issue.

  • The rest should be fine.
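
The first point in the list above can be applied by hand or with a small script. The following is only a sketch: the function name use_nvcxx_for_deps is hypothetical, and it touches nothing except the SMILEICXX.DEPS line, leaving THRUSTCXX and all flags as they are.

```shell
# Switch the dependency-scan compiler in a machine file from nvcc to nvc++.
# Usage: use_nvcxx_for_deps path/to/machine_file
use_nvcxx_for_deps() {
    machine_file="$1"
    # Keep a backup copy, then rewrite only the SMILEICXX.DEPS assignment.
    cp "$machine_file" "$machine_file.bak" &&
    sed 's/^SMILEICXX\.DEPS[[:space:]]*=[[:space:]]*nvcc[[:space:]]*$/SMILEICXX.DEPS = nvc++/' \
        "$machine_file.bak" > "$machine_file"
}
```

Call it with the path to your own machine file, for example tm_gpu_A100 in whatever directory your Smilei checkout keeps its machine files (that location is not stated in this thread, so check your tree). The original is preserved next to it with a .bak suffix.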

mccoys commented on July 24, 2024

In the future, please use the chatroom for support:
https://app.element.io/#/room/!LQrdVpOJEohPSWMlmf:matrix.org

If you need more space to write your problem, use the discussions:
https://github.com/SmileiPIC/Smilei/discussions/categories/q-a

Use issues here only when you want to report an actual bug or request a feature.

mccoys commented on July 24, 2024

@tmiethlinger Note that the makefile has been modified to make GPU compilation easier.
See this: https://smileipic.github.io/Smilei/Use/installation.html#setup-environment-variables-for-compilation
and this: https://smileipic.github.io/Smilei/Use/installation.html#compilation-for-gpu-accelerated-nodes
