Code Monkey home page Code Monkey logo

usinggrid5000's Introduction

Guide to use Grid5000 (Chroma Team @ Inria)

Grid'5000 is a large-scale and versatile testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing including Cloud, HPC and Big Data. Grid5000's website

This guide is mainly focused on how to use Grid5000 as an alternative to processing when hardware like GPUs are not available for local use.

First step: Get an account.

There are two main types of accounts:

  • Academics from France: Those currently working on any research project in France or Academics abroad working on a collaboration with academics in France (The former need to ask to their french collaborators for details).
  • Open Access Program: People who are not on a collaboration can request a lower priority account. Private companies interested need to contact Gird5000's executive committee members. For this part you will need to give your ssh public key. If you have not generated one, follow this tutorial to generate one

Second step: Choose a cluster to work in

This is a list of all the hardware available on Grid5000. Check the list to know which cluster better suits your needs. At the moment of creation of this guide, these were the clusters with CUDA capable GPUs:

Site Cluster Available GPUs
Lille chifflet Nvidia GTX 1080Ti x 2
Lille chifflot Nvidia Tesla P100 x 2 and Nvidia Tesla V100 x 2
Lyon orion Nvidia Tesla M2075
Nancy graphique Nvidia Titan Black x 2 and Nvidia GTX 980 x 2
Nancy grele Nvidia GTX 1080Ti x 2
Nancy grimani Nvidia Tesla K40M

Once you have chosen a cluster, you can log in into your account via: ssh [email protected], to then ssh to the site that has the cluster you want to work in, e.g. ssh nancy / ssh lille / ssh lyon. Now you should be able to access your home directory on any of the Grid5000's clusters.

Third step: Set up your work environment

Install Miniconda

The steps listed here are based on this tutorial by the user Ibada. This guide only covers the setting up process with Anaconda, Miniconda specifically since is lighter. All commands are executed from the user's home directory.

First, download Miniconda depending on the version of Python you will be working with. If you are working with Python 2.7, change the version of Miniconda to 2 instead of 3.

For Python 3.7:

user@site:~$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh

Then, run it:

user@site:~$ bash Miniconda3-latest-Linux-x86_64.sh

Here, the instalation guide will prompt you to choose the path where Miniconda will be installed. Also, it will prompt you to choose if the conda environment starts with bash (default is no).

Finally, just copy the .bashrc script available on this repository and change user for your username on lines 112, 116, 117, 120. It contains many useful features to personalize your bash experience, but more importantly if you chose the default option for conda to be disabled, sourcing this script allows you to activate the conda environment.

user@site:~$ source .bashrc
(base) user@site:~$ #Environment now 

Create the virtual environment and Install the needed libraries

With Miniconda set up, you can now create environments for your projects via:

conda create --name env

And install conda supported libraries and packages via Anaconda Cloud, Conda Forge or any other channel you want.

Fourth step: Ask for compute time on the GPU clusters

You can check here the status of each of the clusters' availability, to see if your desired hardware is busy or not.

These bash scripts facilitate the process of asking for jobs. Both are mainly using on the oarsub commands and using the default queue, check the hardware wiki page to which queue the GPUs you are going to use are in:

  • ask_for_job_fixed_time.sh has a fixed job time, can be used to quickly test if the environment recognizes the cluster's GPUs
  • ask_for_job_input_time.sh lets you input time as an argument in the format hh:mm:ss. Can be used e.g. when you have an estimated train time for a network.

For example, using the ask_for_job_input_time.sh:

user@flille:~$ bash ask_for_job_scripts/ask_for_job2.sh 00:05:00
 Remember to source bashrc!
 Remember to activate the conda env!
[ADMISSION RULE] Modify resource description with type constraints
[ADMISSION_RULE] Resources properties : \{'property' => 'type = \'default\'','resources' => [{'resource' => 'host','value' => '1}]}
[ADMISSION RULE] Job properties : (GPU <> 'NO') AND maintenance = 'NO'
Generate a job key...
OAR_JOB_ID=1681786
Interactive mode: waiting...
Starting...

Connect to OAR job 1681786 via the node chifflet-6.lille.grid5000.fr
user@chifflet-6:~$ source .bashrc 
(base) user@chifflet-6:~$ conda activate pytorch_env
(pytorch_env) user@chifflet-6:~$ python pytorch_probe_gpus.py
GeForce GTX 1080 Ti detected on device 0
GeForce GTX 1080 Ti detected on device 1
(pytorch_env) user@chifflet-6:~$ #GPUs detected!

Once you are in a job, you can use the available hardware on that specific cluster for your computations.

Other useful commands

To transfer a file from the machines to your local PC via secure copy:

user@localPC:~$ scp [email protected]:site/path_from_home/file.py /home/user/directory/file.py #for single files
user@localPC:~$ scp -r [email protected]:site/path_from_home/directory /home/user/directory/ #for directories

To transfer a file from your PC to a cluster via secure copy:

user@localPC:~$ scp /home/user/directory/file.py [email protected]:site/path_from_home/file.py #for single files
user@localPC:~$ scp -r /home/user/directory/ [email protected]:site/path_from_home/directory #for directories

Commands to check or delete jobs:

user@site:~$ oarstat -u #check if you have any jobs running on this site and the state of them
user@site:~$ oardel JOB_ID #delete any job you no longer need by giving the JOB_ID number

For more in-depth usage of Grid5000 for Deep Learning, Check Ibada's tutorial

usinggrid5000's People

Contributors

manueldiaz96 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.