Code Monkey home page Code Monkey logo

sw's Introduction

IMPORTANT: Public export toward this repository IN PROGRESS, some files will be missing until this operation is completed and this banner disappears

By ULHPC Licence GitHub issues Github GitHub forks

  _   _ _       _   _ ____   ____   ____  _____ ____ ___ _____   _____  ___
 | | | | |     | | | |  _ \ / ___| |  _ \| ____/ ___|_ _|  ___| |___ / / _ \
 | | | | |     | |_| | |_) | |     | |_) |  _| \___ \| || |_      |_ \| | | |
 | |_| | |___  |  _  |  __/| |___  |  _ <| |___ ___) | ||  _|    ___) | |_| |
  \___/|_____| |_| |_|_|    \____| |_| \_\_____|____/___|_|     |____(_)___/

   Copyright (c) 2020-2021 UL HPC Team <[email protected]>

User Software Management for Uni.lu HPC Facility based on the RESIF 3.0 framework

This is the Public repository exposing the main scripts, concepts and documentation facilitating the dissemination of our concepts.

Accepted paper describing RESIF3 concepts and architecture was presented during ACM PEARC'21 [1] -- doi | orbilu

[1] S. Varrette, E. Kieffer, F. Pinel, E. Krishnasamy, S. Peter, H. Cartiaux, and X. Besseron, "RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility", in ACM Practice and Experience in Advanced Research Computing (PEARC'21), Virtual Event, 2021.

BibTex entry

  ```bibtex
  @InProceedings{VKPKPCB_PEARC21,
    author =       {S. Varrette and E. Kieffer and F. Pinel and E. Krishnasamy and S. Peter and H. Cartiaux and X. Besseron},
    title =        {{RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility}},
    booktitle =    {ACM Practice and Experience in Advanced Research Computing (PEARC'21)},
    year =         {2021},
    publisher =    {Association for Computing Machinery (ACM)},
    OPTseries =    {{PEARC'21}},
    month =        {July},
    address =      {Virtual Event},
    doi =          {10.1145/3437359.3465600},
    isbn =         {978-1-4503-8292-2/21/07},
    url =          {https://pearc.acm.org/pearc21/},
  },
  ```

Installation / Repository Setup

To clone this repository, proceed as follows (adapt accordingly):

$ mkdir -p ~/git/github.com/ULHPC
$ cd ~/git/github.com/ULHPC
$ git clone https://github.com/ULHPC/sw.git

/!\ IMPORTANT: Once cloned and after following Preliminaries guidelines (LMod install, virtualenv setup, Github token etc. -- see docs/setup.md and docs/contributing/setup-github-integration.md), initiate your local copy of the repository by running:

$ cd ~/git/github.com/ULHPC/sw
$ make setup

Post setup checks (laptop)

From that point, you should be able to load Easybuild from your laptop, and all the following commands should succeed:

### On your laptop
source settings/default.sh
eb --version
eb --show-config
# check that you are able to interact/update the ULHPC fork copy
make fork-easyconfigs-update
# check that you can interact with github
eb --check-github   # All checks PASSed!

Post setup checks (on supercomputer login node)

When repeating the setup on the cluster, you can check that you are ready if the following commands should succeed:

### On iris, this will go on a broadwell node
./scripts/get-interactive-job
source settings/${ULHPC_CLUSTER}.sh
# enable SSH agent
eval "$(ssh-agent)"
ssh-add ~/.ssh/id_rsa

eb --version
eb --show-config
# check that you are able to interact/update the ULHPC fork copy
make fork-easyconfigs-update
# check that you can interact with github
eb --check-github   # ONLY new-pr and update-pr should FAIL
# reason is that most probably you don't want the ssh key on the cluster authorized
# to push on ULHPC fork
eval "$(ssh-agent -k)"  # ONLY new-pr and update-pr should FAIL
# reason is that most probably you don't want the ssh key on the cluster authorized
# to push on ULHPC fork

Documentation

See docs/.

The documentation for this project is handled by mkdocs. You might wish to generate locally the docs:

  • Install mkdocs
  • Preview your documentation from the project root by running mkdocs serve and visit with your favorite browser the URL http://localhost:8000
    • Alternatively, you can run make doc at the root of the repository.
  • (eventually) build the full documentation locally (in the site/ directory) by running mkdocs build.

Software set organizations

See docs/swsets.md.

Software sets holds a categorised list of software, defined as Module Bundle for the ULHPC environment holding the dependencies of hierarchical bundles structured under easyconfigs/u as follows:

├── ULHPC/[...]-<version>.eb    #### === Default global bundle for 'regular' nodes ===
│   ├── ULHPC-toolchains/[...]-<version>.eb ### Toolchains, compilers, debuggers, programming languages...
│   ├── ULHPC-bio/[...]-<version>.eb        ### Bioinformatics, biology and biomedical
│   ├── ULHPC-cs/[...]-<version>.eb         ### Computational science, including:
│   └── [...]
└── ULHPC-gpu/[...]-<version>.eb #### === Specific GPU versions compiled under {foss,intel}cuda toolchains ===

See easyconfigs/u/ULHPC*

A strong versioning policy is enforced, which fix the core component versions of the bundles.

User Software builds

Slurm launchers are provided under scripts/ to facilitate software builds.

/!\ IMPORTANT RESIF 3 supports 3 operation modes depicted below:

Operation Mode Architecture Launcher script
Easybuild bootstrap/update * setup.sh
Home/Testing builds default [sbatch] ./scripts/[<version>]/launcher-test-build-cpu.sh
CPU non-default [sbatch] ./scripts/[<version>]/launcher-test-build-cpu-<arch>.sh
GPU optimized [sbatch] ./scripts/[<version>]/launcher-test-build-gpu.sh
Production <version> builds default [sbatch] ./scripts/prod/launcher-resif-prod-build-cpu.sh -v <version>
CPU non-default [sbatch] ./scripts/prod/launcher-resif-prod-build-cpu-<arch>.sh -v <version>
GPU optimized [sbatch] ./scripts/prod/launcher-resif-prod-build-gpu.sh -v <version>

See docs/build.md for more details

Note: for convenience, a GNU screen configuration file config/screenrc is provided to quickly bootstrap the appropriate tabs:

screen -c config/screenrc
# 'SW' tab meant for git / sync operations. To enable the ssh agent:
#    eval "$(ssh-agent)"
#    ssh-add ~/.ssh/id_rsa
#    make up
#    make fork-easyconfigs-update
# 'broadwell' tab for associated build. Ex interactive job:
#    ./scripts/get-interactive-job
# 'skylake' tab for associated build. Ex interactive job:
#    ./scripts/get-interactive-job-skylake
# 'gpu' tab for associated build. Ex interactive job:
#    ./scripts/get-interactive-job-gpu
# 'epyc' tab  for aion builds
#    ssh aion
#    ./scripts/get-interactive-job

Don't forget to kill your ssh agent when you have finish: eval "$(ssh-agent -k)"

In all cases, production builds MUST be run as resif using the launcher scripts under scripts/prod/*. Software and modules will be installed in that case under /opt/apps/resif ($LOCAL_RESIF_ROOT_DIR) -- See Technical Docs. You MUST BE VERY CAREFUL when running these scripts as they alter the production environment.

The final organizations of the software and modules is depicted below:

Workflow and ULHPC Bundle Development guidelines

You first need to review the expected Git workflow

To add a new software to one of the ULHPC bundle module, you need to find and eventually adapt an existing Easyconfig file. Searching such files can be done using either eb -S <pattern>, or via the provided script ./scripts/suggest-easyconfigs [-v <version>] <pattern> which

  1. search for Easyconfigs matching the proposed pattern, sorted by increasing version (sort -V)
  2. check among those easyconfigs is any would be available for the target toolchain as that's probably the one you should use
  3. suggest a single easyconfig to try (most recent version)

Example:

$> ./scripts/suggest-easyconfigs -h
$> ./scripts/suggest-easyconfigs PAPI
=> Searching Easyconfigs matching pattern 'PAPI'
PAPI-5.4.3-foss-2016a.eb
PAPI-5.5.1-GCCcore-6.3.0.eb
PAPI-5.5.1-GCCcore-6.4.0.eb
PAPI-5.6.0-GCCcore-6.4.0.eb
PAPI-5.7.0-GCCcore-7.3.0.eb
PAPI-5.7.0-GCCcore-8.2.0.eb
PAPI-6.0.0-GCCcore-8.3.0.eb
Total:        7 entries

... potential exact match for 2019b toolchain
PAPI-6.0.0-GCCcore-8.3.0.eb
 --> suggesting 'PAPI-6.0.0-GCCcore-8.3.0.eb'

See also docs/workflow.md for more details.

Submitting working Easyconfigs to easybuilders

See docs/contributing/

To limit the explosion of custom easyconfigs as was done in the past, the key objective of this project is to minimize the number of custom easyconfigs to the strict minimum and thus to submit a maximum of easyconfigs to the community for integration in the official easybuilders/easybuild-easyconfigs repository. A set of helper scripts are provided to facilitate this operation -- Typical workflow:

# Creating a new pull requests (typically ion your laptop)
./scripts/PR-create -n easyconfigs/<letter>/<software>/<filename>.eb    # Dry-run
./scripts/PR-create easyconfigs/<letter>/<software>/<filename>.eb
# Complete it with a successfull test report ON IRIS/AION
sbatch ./scripts/PR-rebuild-upload-test-report.sh <ID>

# (eventually) Update/complete the pull-request with new version/additional EB files
eb --update-pr <ID> <file>.eb --pr-commit-msg "<message>" # use native easybuild command here
#  Update your local easyconfigs from remote PR commits
./scripts/update-from-PR [-n] <ID>

# Repo cleanup upon merged pull-request
./scripts/PR-close [-n] <ID>

Issues / Feature request

You can submit bug / issues / feature request using the ULHPC/sw Project Tracker

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.