Code Monkey home page Code Monkey logo

exps-launcher's Introduction

Experiments launcher for python scripts (slurm and local)

Handle batch job submissions with ease, both on remote clusters using slurm or locally in background.

Features tracking

  • Automatic wandb group naming from config .yaml files
  • Automatic sweep over hyperparameters (sweep.foo=[1,2,3] sweep.bar=["alice","bob"])
  • Slurm jobs submission
    • common config files for different scripts
    • additive host.time for config files
    • wandb_group_suffix for default wandb name
    • test run on local machine with exps.test=true
    • host.timefactor : multiply time by this factor for a host in particular
    • ignore host retrieval when exps.test=true (since it does not matter)
  • non-slurm background scripts submission on local machine
  • handle cpu cores constraints for local non-slurm scripts

Installation

git clone <this repo>
cd exps_launcher
pip install -r requirements.txt
pip install .

Getting Started

Create the exps_launcher_configs directory inside your project directory, following the structure below. Then, copy the launch_exps.py file inside your project directory.

  1. export EXPS_HOSTNAME=<customhostname>
  2. Project structure:
.
└── exps_launcher_configs/
    ├── config.yaml              # exps_launcher configs (can be overwritten by cli args exps.<params>)
    ├── host/
    │   ├── default.yaml         # default params for all hosts
    │   ├── host2.yaml           # host-specific sbatch parameters (--partition, --project, ...)
    │   ├── host2.yaml
    │   └── ...
    ├── scripts/
    │   ├── script1/             # `python script1.py ...`
    │   │   ├── default.yaml     # default params for `script1.py`
    │   │   ├── test.yaml        # params for testing with `script.py`
    │   │   ├── conf1.yaml       # configuration
    │   │   └── conf2.yaml
    │   └── script2/
    │       └── ...
    └── sweeps/                  # sweep configurations
        └── fiveseeds.yaml
  1. Launch launch_exps.py script:
    • python launch_exps.py script=script1 config=[conf1,conf2] sweep.config=[fiveseeds] sweep.foo=[1,10,100]

Notes:

  • subdirs of scripts/ must be named with the corresponding python script name. E.g. the example above refers to existing python scripts script1.py and script2.py in your project root dir.
  • the script parameter is mandatory, and must be a single string. A single script can be launched.
  • A sequence of configuration files can be provided with the config parameter. Priority is in decreasing order, i.e. conf2 overwrites conf1 in the example above.
  • check out the config files above (e.g. conf1.yaml and conf2.yaml) for quick examples on how to use them.
  • sweep parameters like sweep.foo=[1,10,100] can also be defined in script-specific config files (e.g. in conf1.yaml).
  • host parameters like host.time="03:00:00" can also be defined in script-specific config files (e.g. in conf1.yaml), which overwrite the host definitions. This way you can, e.g., specify different sbatch times for different scripts and their corresponding configurations, or different sbatch names (host.job-name="myscript").
  • you can pass exps.hostname to overwrite the hostname for the current experiment, e.g. to have different host configurations for the same host

Advanced commands:

  • Use exps.<param_name>=<value> for config options. Accepted params are:
    • exps.test=false [launch script in test mode, using test.yaml config]
    • exps.no_confirmation=false [skip asking for confirmation before actually launching the experiments]
    • exps.fake=false [display summary but do not actually launch the experiments]
    • exps.hostname [overwrite hostname defined in env variable EXPS_HOSTNAME]
    • exps.force_hostname_environ=true [force using env variable to define the hostname]
    • exps.group_suffix="_mySuffix" [assumes script has a --group parameter for the wandb group name]
    • exps.noslurm=false [launch script locally, instead of as a slurm job]
    • CPU usage constraints:
      • (noslurm, single script) exps.cpus-list="50,51,52"
      • (noslurm, multiple scripts) exps.cpus-start=50 exps.cpus-per-task=4

Examples

List of repositories that make use of this experiments launcher that you can use as further reference:

Troubleshooting

exps-launcher's People

Contributors

gabrieletiboni avatar

Stargazers

Florentin-Cristian Udrea avatar Stephany Ortuño Chanelo avatar Christian cancedda avatar Andrea Protopapa avatar Gabriele avatar Gabriele Berton avatar Gabriele Trivigno avatar Gabriele Goletto avatar Francesco Capuano avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.