Code Monkey home page Code Monkey logo

pigeon's Introduction

Pigeon

Introduction

Tool for pipeing inputs and outputs of multiple cli tools. Pigeon takes in only a config file as input. Everything required to run the pipeline are specified in config file. The config file is specified according to python configparser.

Quick Install

Linux&Mac

sudo pip3 install --index-url https://test.pypi.org/simple/ pigeon

Windows

pip install --index-url https://test.pypi.org/simple/ pigeon

Resources for NGS

None of the tools or data files are supplemented by pigeon so they need to be downloaded. For example configuration file, exome sequencing pipeline,

How to use

Create yourself a configuration file

pigeon createconfig

Modify for your analysis. (See below.)

pigeon -c my_config.conf -d

If everything looks alright run for real.

pigeon -c my_config.conf

Config File

Config file consists of three parts.

  • General
  • Pipeline
  • Individual tool blocks

General

Area used to define project name, output directory, input files, and resource files like reference genome or target file. Following variables are necessary for run.

Required:

  • project_name : name of your project

  • output_dir : where to write output files

  • input_files : input files for analysis, space separated, pairs should be next to each other. e.g.

    input_files = A.txt B.txt C.txt

    or for paired

    input_files = A1.txt A2.txt B1.txt B2.txt C1.txt C2.txt

Optional variable can also be decleared here. Based on your or tool requirements. Later these variables can be called in the config file using ${GENERAL:optional_variable}.

Optional(example):

reference_genome = /path/to/my/reference_genome.fa

bed_file = /path/to/my/target.bed

known_snp = /path/to/my/snp.vcf

my_database = /path/to/my/favorite.db

Pipeline

This area should contain paths to tools that is understanble by your shell. As well as the run order of tools. e.g.

pipeline = job1 job2 job3

A = path/to/A

B = path/to/B

C = path/to/C

Tool Blocks

Name of the block should be same as in pipeline. By continuing example above;

[job1]

[job2]

[job3]

Arguments that can be used in these blocks as follows:

Run Args

  • tool: tool variable from pipeline block. e.g.

    tool = ${PIPELINE:A}

  • sub_tool: if tool has a sub tool like 'bwa mem'. e.g.

    sub_tool = mem

  • args: arguments of the tools

  • java: if tool is a jar file add java -jar before it.

  • pass: if True it won't run the block. But the block still be part of the pipeline. This option is helpful for resuming interrupted pipeline.

Input Args

  • input_from: Name of the block that that's output is this jobs input. First jobs input_from should be input_files.

  • input_multi: can be 'paired' or 'all'. Paired option splits input files stream into groups of two. All option uses all of the input files.

  • input_flag_repeat: If tool requires input flag for each input this command will add given flag before each input.

  • secondary_in_placeholder

Output Args

  • suffix: add suffix to output file name

  • ext: file extension of the output

  • dump_dir: creates a directory and outputs there.

  • paired_output: this option will pair the input and the output of the tool.

  • secondary_out_placeholder

  • secondary_suffix

  • secondary_ext

  • secondary_dump_dir

Placeholders

These are joker words that can be used in args.

  • input_placeholder

  • secondary_input_placeholder

  • output_placeholder

  • secondary_output_placeholder

Example Config

[[GENERAL]]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A.txt B.txt C.txt
my_db = /path/to/my.db

[[PIPELINE]]
pipeline = job1 job2 job3
A = /path/to/A
B = /path/to/B
C = /path/to/C

[job1]
tool = A
input_from = input_files
args = -i input_placeholder -o output_placeholder
suffix = job1_A
ext = txt

[job2]
tool = B
input_from = job1
args = -i input_placeholder -o output_placeholder
suffix = job2_B
ext = txt

[job3]
tool = C
input_from = job2
args = -i input_placeholder -o output_placeholder
suffix = job3_C
ext = txt

pigeon's People

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.