
asub's Introduction

FAQs

What is asub used for?

asub stands for "array submission". It greatly simplifies batch job submission on LSF (or Grid Engine, with limited support). Briefly, it reads mutually independent command lines from stdin or from a shell script and creates a job array in which each job runs one or more of the input command lines.

What is a job array? What are the benefits?

A job array is a set of LSF/Grid Engine jobs that are submitted together and share the same JobID. It has two major advantages. First, a job array is convenient for a batch of jobs with similar input/output. You can easily kill, stop, resume or modify the whole array or some jobs in the array by specifying, for example, JobID or JobID[10-100]. You can also limit how many jobs in the array run at the same time with bmod -J%10 JobID. Second, Tim Cutts from the Sanger Institute has shown that job arrays put less stress on the LSF scheduler. This makes array jobs much faster to submit and also reduces the burden on the entire LSF system.
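As a rough illustration of the array-level operations mentioned above, the sketch below only prints the standard LSF commands (bkill, bstop, bresume, bmod) rather than running them. The function name array_commands and the job ID are hypothetical, and the element range 10-100 is simply the one used in the text.

```shell
#!/bin/sh
# Dry-run sketch: print LSF commands for managing a job array instead of running them.
# The job ID passed in is a placeholder; use the ID that bsub reports.

# array_commands JOBID: print example management commands for that array.
array_commands() {
    jobid=$1
    echo "bkill ${jobid}"                   # kill the whole array
    echo "bstop \"${jobid}[10-100]\""       # suspend elements 10 through 100
    echo "bresume \"${jobid}[10-100]\""     # resume the same elements
    echo "bmod -J \"%10\" ${jobid}"         # run at most 10 elements at a time
}
```

Calling `array_commands 12345` lists the commands for a hypothetical array 12345; the quotes around `JobID[10-100]` keep the shell from treating the brackets as a glob pattern.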

Why do we need asub?

Submitting array jobs by hand is harder than submitting single jobs. You have to learn how job arrays work, and you frequently need to write a bsub script that takes an array job index as its input. asub simplifies this procedure. You can easily submit a job array if you keep your independent command lines in a file or stream (see examples below).
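For comparison, a hand-written launcher of the kind described above might look like the following sketch. It is not part of asub: the function name run_indexed_line and the file name cmds.sh are hypothetical. It relies only on LSB_JOBINDEX, the variable LSF sets to the index of the current array element.

```shell
#!/bin/sh
# Hand-rolled array launcher (illustrative sketch, not part of asub).
# LSF sets LSB_JOBINDEX to the 1-based index of the current array element.

# run_indexed_line FILE INDEX: run the INDEX-th line of FILE as a shell command.
run_indexed_line() {
    file=$1
    index=$2
    # Extract exactly one line from the command file.
    cmd=$(sed -n "${index}p" "$file")
    if [ -z "$cmd" ]; then
        echo "no command on line $index of $file" >&2
        return 1
    fi
    sh -c "$cmd"
}

# Inside an array job, the script would end with a call such as:
#   run_indexed_line cmds.sh "$LSB_JOBINDEX"
```

Every array element runs the same script and differs only in the value of LSB_JOBINDEX, which is why the command file must hold one independent command per line.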

In addition to array submission, asub also simplifies resource requirements. You can submit a multi-threaded job without -R'span[hosts=1]', or set a memory limit without -R'rusage[mem=4096]' (and you don't need to remember that 4096 is the per-task limit, not the total limit). asub can also optionally group multiple command lines into one job. This feature can be helpful if each individual command line finishes too quickly.

asub has easily been my most-used Perl script for nearly ten years.

Does asub support Grid Engine?

For now, asub has only limited support for Grid Engine. I used to have a better version for SGE, but have lost it. Contributions welcome!

Does asub support SLURM?

Yes, but it is not as well supported as LSF.

How does asub work?

asub has two modes: submission mode and launcher mode. End users only need to care about submission mode. In this mode, asub writes the input command lines into JobID.sh and generates an on-the-fly bsub script that calls asub -k ${LSB_JOBINDEX} JobID.sh to execute the command lines. Here -k <lineno> puts asub in launcher mode. Without -g, it runs the <lineno>-th command line in JobID.sh.
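To make submission mode more concrete, the sketch below prints (but does not run) a bsub pipeline shaped like the dry-run output quoted in the issue report further down. It is an approximation for illustration, not asub's actual code: the function name print_array_submission is hypothetical, and it assumes one command per line in the input file.

```shell
#!/bin/sh
# Sketch of submission mode: print (do not run) a bsub command for a job array.
# Modeled on asub's dry-run output; not asub's actual implementation.

# print_array_submission NAME FILE: emit a pipeline with one array element per line of FILE.
print_array_submission() {
    name=$1
    file=$2
    # Count lines, including a final line that lacks a trailing newline.
    n=$(grep -c '' "$file")
    # Directories that collect per-element stdout and stderr.
    printf "mkdir -p %s.out %s.err && " "$name" "$name"
    # The job body: asub in launcher mode, picking the line by array index.
    printf "echo 'asub -k \${LSB_JOBINDEX} %s' | " "$file"
    # %I in -o/-e is replaced by LSF with the array index.
    printf 'bsub -J"%s[1-%s]" -o %s.out/%%I.out -e %s.err/%%I.err\n' \
        "$name" "$n" "$name" "$name"
}
```

The array range `[1-N]` comes from the number of input lines, so each element's LSB_JOBINDEX maps directly onto a line number for `-k`.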

Basic Examples

  • Compress large FASTQ files:

    ls *.fq | xargs -i echo gzip {} | asub -j run-gzip

    The i-th job in the job array compresses the i-th file.

  • Compress large FASTQ files with parallel jobs:

    ls *.fq | xargs -i echo gzip {} | asub -g2 -q mcore

    The i-th job compresses the (2i-1)-th and 2i-th files in parallel. This example is only useful under particular settings (e.g. at HMS). Usually we would not want to do this.

  • Compress small FASTQ files by serial batching:

    ls *.fq | xargs -i echo gzip {} | asub -Gg2

    The i-th job compresses the (2i-1)-th and 2i-th files in turn. This can be used to group short processes so that the whole job takes a reasonable amount of time. Too many short jobs hurt LSF performance.
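The grouping arithmetic in the last two examples can be sketched as follows. With group size g, job i covers input lines (i-1)*g+1 through i*g, which gives lines 2i-1 and 2i for g=2. This is an illustration rather than asub's own code; the function names and the mode names "serial" and "parallel" are hypothetical.

```shell
#!/bin/sh
# Sketch of the grouping arithmetic (illustrative; not asub's actual code).

# group_range I G: print the first and last input line handled by job I.
group_range() {
    i=$1
    g=$2
    echo "$(( (i - 1) * g + 1 )) $(( i * g ))"
}

# run_group FILE I G MODE: run job I's lines from FILE.
# MODE is "parallel" (all at once) or anything else for one after another.
run_group() {
    file=$1
    mode=$4
    range=$(group_range "$2" "$3")
    first=${range% *}
    last=${range#* }
    if [ "$mode" = parallel ]; then
        # Start each command in the background, then wait for all of them.
        sed -n "${first},${last}p" "$file" | {
            while IFS= read -r cmd; do
                sh -c "$cmd" &
            done
            wait
        }
    else
        # Feed the selected lines to a single shell, which runs them in order.
        sed -n "${first},${last}p" "$file" | sh
    fi
}
```

If the number of input lines is not a multiple of g, the last job simply receives fewer lines, because sed prints only the lines that exist.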

asub's People

Contributors

lh3, warrenson


asub's Issues

asub bug with -R option

Passing an additional -R option to asub when using the OpenLava scheduler causes an error.

asub --help 2>&1 | grep -- '-R'
        With arguments: -P -R -j -k -q -c -w -M -n -m -W -g -C
Options: -R STR    resources string (only one -R allowed) [null]
Note: For option -R and -w, SGE and LSF have different syntax. Please
        -R "select[type==X86_64&&mem>800] rusage[mem=800]"
        -R h_cpu=86400,h_data=1000000000

As the help text notes, only one -R is allowed. If you set the -R option, however, asub itself emits two -R switches, causing bsub to fail.

A reproducible example follows.

## Error
$ echo "hello world" | asub -R "rusage[mem=200]"
Invalid syntax; the -R option was used more than once.
. Job not submitted.

## Dry run - two copies of -R switch
$ echo "hello world" | asub -R "rusage[mem=200]" -p 
mkdir -p asub_27423390.out asub_27423390.err && echo '/software/OSutils/asub-2.1/asub  -g 1 -k ${LSB_JOBINDEX} asub_27423390.sh' | bsub -Jasub_27423390"[1-1]" -o asub_27423390.out/%I.out -e asub_27423390.err/%I.err -R "rusage[mem=200]" -n 1 -R "span[hosts=1]"  

@kiwiroy has written a fix for this:

master...kiwiroy:single-R
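One possible way to avoid the duplicate switch is to merge all resource sections into a single -R string before calling bsub. The sketch below shows that idea only; it is not the code from the linked branch. The function name merge_resources is hypothetical, and the approach assumes the scheduler accepts several sections in one -R string, as the help text example with select[...] and rusage[...] suggests.

```shell
#!/bin/sh
# Sketch: combine several resource sections into one -R string.
# Illustrative only; this is not the code from the linked fix.

# merge_resources SECTION...: join all non-empty sections with single spaces.
merge_resources() {
    merged=
    for section in "$@"; do
        # Skip sections the user did not supply.
        [ -n "$section" ] || continue
        merged="${merged:+$merged }$section"
    done
    printf '%s\n' "$merged"
}

# The user-supplied string and the section seen in the dry run would become
# one argument:  -R "rusage[mem=200] span[hosts=1]"
```

Used this way, the dry-run command above would carry one -R switch instead of two.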
