Code Monkey home page Code Monkey logo

bacana's Introduction

### ----------------------------------------------------------------------------
### How to use the pipeline
### ----------------------------------------------------------------------------

--- Command to run -------------------------------------------------------------
First, log into a Sanger machine with /software mounted (e.g. pcs4) as yourself
Then type:

> bash
> source /nfs/users/nfs_a/ap12/git/bacana/setup_annotation_environment.sh
> run_annotation_pipeline.sh -d /lustre/scratch101/sanger/ap12/ann_pipeline/ -n Ashahii_WAL8301 -f /nfs/users/nfs_a/ap12/Ashahii_WAL8301_merged.fasta

--- Usage ----------------------------------------------------------------------
The fasta sequence file provided must contain only one record.

 Usage: run_annotation_pipeline.sh -d DEST_DIR -n COMMON_NAME -f FASTA_FILE 

 Options:
 
 -d DEST_DIR  
    Directory where the output of the pipeline will be written to
    e.g. /lustre/scratch101/sanger/ap12/ann_pipeline/

 -n COMMON_NAME     
    Unique common NAME 
    e.g. Ashahii_WAL8301 

 -f FASTA_FILE     
    Path to the fasta FILE 
    e.g. /nfs/pathogen/metahit/refs/Alistipez/shahii_WAL8301/improved/Alistipez_shahii_WAL8301.fasta

The annotation pipeline run the gene finding and gene function steps.

--- Information about completed, failed and todo jobs --------------------------
> run-pipeline -c /lustre/scratch101/sanger/ap12/ann_pipeline/annotation_pipeline.conf --done
> run-pipeline -c /lustre/scratch101/sanger/ap12/ann_pipeline/annotation_pipeline.conf --failed
> run-pipeline -c /lustre/scratch101/sanger/ap12/ann_pipeline/annotation_pipeline.conf --todo

### ----------------------------------------------------------------------------
### How to find the results
### ----------------------------------------------------------------------------

e.g. /lustre/scratch101/sanger/ap12/ann_pipeline/Ashahii_WAL8301/results/

Each step creates its own sub-directory under $DEST_DIR/$COMMON_NAME/. 

Final results are found under sub-directory:
$DEST_DIR/$COMMON_NAME/results/

### ----------------------------------------------------------------------------
### Which steps are being run?
### ----------------------------------------------------------------------------

--- Gene Prediction ------------------------------------------------------------
Task	 	 Software name	 Pipeline action	 Dev status
CDSs	         Glimmer3	 glimmer	         running
CDSs	         Prodigal	 prodigal	         running
rRNA	         RNAmmer	 rnammer	         running
tRNA	         tRNAscan	 trnascan	         running
Repeats	         RepeatScout	 repeatscout	         running
Genomic islands	 AlienHunter	 alienhunter	         running
Pseudo genes	 BlastX	         blastx	                 in progress
ncRNA	         Rfamscan	 rfamscan	         in progress
Merge CDSs	 gfind_merger	 results	         running

--- Gene Annotation ------------------------------------------------------------
Task	         Software name	 Pipeline action	 Dev status
Protein domains	 Pfamscan	 pfamscan	         in progress
Similarities	 BlastP	         blastp	                 in progress
Hydro. features	 SignalP	 signalp	         in progress
Hydro. features	 TMHMM	         tmhmm	                 in progress
Structure	 HTH	         hth     	         in progress
Protein domains	 Prosite	 prosite                 to implement	
Orthologs	 HAMAP		 hamap                   to implement	
Merge features                   results                 in progress


### ----------------------------------------------------------------------------
### How to get the code base
### ----------------------------------------------------------------------------

> git clone [email protected]:pajanne/bacana.git

--- checkout of the pipeline code ----------------------------------------------
/nfs/users/nfs_a/ap12/git/bacana/

### ----------------------------------------------------------------------------
### Environment variables
### ----------------------------------------------------------------------------

see https://github.com/pajanne/bacana/blob/master/setup_annotation_environment.sh

A special EMBOSS installation is required for running merge_sequence.sh, the operating system has been upgraded and /usr/lib/libpq.so.4 has been replaced by libpq.so.5.
/software/pathogen/external/applications/EMBOSS-6.3.1-no-postgres/bin/

### ----------------------------------------------------------------------------
### Configuration files
### ----------------------------------------------------------------------------

Configuration files are generated using generate_config_file.py.

 Usage: generate_config_file.py -n NAME -f FILE -r ROOT
 
 Options:
  -h, --help  show this help message and exit
  -n NAME     Unique common NAME (e.g. Bpseudomallei_K96243)
  -f FILE     Path to the fasta FILE (e.g. /lustre/scratch103/sanger/ap12/test
              _data/Burkholderia_pseudomallei_K96243.fna)
  -r ROOT     ROOT directory where the output of pipeline will be written to
              (e.g. /lustre/scratch101/sanger/ap12/ann_pipeline/)
  -c CONF     CONFiguration file name (e.g. annotation_pipeline.conf)

--- One example ----------------------------------------------------------------
@ /lustre/scratch101/sanger/ap12/ann_pipeline/annotation_pipeline.conf
glimmer        /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_glimmer.conf 
prodigal       /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_prodigal.conf
rnammer        /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_rnammer.conf
trnascan       /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_trnascan.conf
repeatscout    /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_repeatscout.conf 
alienhunter    /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_alienhunter.conf
results	       /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_results.conf

--- Example of Glimmer configuration file --------------------------------------
@ /lustre/scratch101/sanger/ap12/ann_pipeline//conf/Ashahii_WAL8301_glimmer.conf
root    => '/lustre/scratch101/sanger/ap12/ann_pipeline//Ashahii_WAL8301',
module  => 'Pathogens::Annotate::Glimmer',
prefix  => '_',
log	=> '/lustre/scratch101/sanger/ap12/ann_pipeline//log/Ashahii_WAL8301.log',

data => {
   fasta => '/lustre/scratch101/sanger/ap12/ann_pipeline//Ashahii_WAL8301.fasta',
   common_name => 'Ashahii_WAL8301',
},

### ----------------------------------------------------------------------------
### How to add a new step to the pipeline
### ----------------------------------------------------------------------------

--- Create module --------------------------------------------------------------
Add an extra module into modules/Pathogens/Annotate/ using existing one as an example (e.g. https://github.com/pajanne/bacana/blob/master/modules/Pathogens/Annotate/Glimmer.pm)

--- Generate config file -------------------------------------------------------
Add module name into GFIND_STEPS = ['Glimmer', 'Prodigal', 'Rnammer', 'Trnascan', 'RepeatScout', 'AlienHunter', 'Results']
in scripts/generate_config_file.py

--- Merge results --------------------------------------------------------------

bacana's People

Contributors

pajanne avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.