This is the central repository of public MGX pipelines.

Field description:

  • Name: Short pipeline name
  • Description: Pipeline description
  • Author: Pipeline maintainer
  • URL: Web site with additional pipeline information
  • Version: Pipeline version
  • File: Conveyor workflow for read-based analysis
  • File2: Conveyor workflow for gene annotation

All pipelines should use 0-based coordinates when creating sequence observations.
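Tools that the pipelines build on, such as BLAST, report 1-based inclusive coordinates in their tabular output, so those positions need converting before observations are created. The following is a minimal sketch. It assumes MGX keeps the stop position inclusive, so both ends shift by one. The function name `blast_to_observation_coords` is hypothetical.

```python
from typing import Tuple


def blast_to_observation_coords(start: int, stop: int) -> Tuple[int, int]:
    """Convert 1-based inclusive coordinates (e.g. BLAST qstart/qend) to 0-based.

    Assumption: MGX observations keep the stop position inclusive, so both
    ends are shifted by one. Reverse-strand hits (start > stop) keep their order.
    """
    if start < 1 or stop < 1:
        raise ValueError("1-based coordinates must be >= 1")
    return start - 1, stop - 1


print(blast_to_observation_coords(1, 150))    # (0, 149)
print(blast_to_observation_coords(300, 121))  # (299, 120)
```

Under a half-open convention only `start` would be decremented, so the assumed convention should be checked before reusing this.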

Taxonomic assignment

| Name | Description | Author | URL | Version | File | File2 |
|---|---|---|---|---|---|---|
| MGX default taxonomy | MGX default taxonomic classification based on Kraken and Diamond vs. RefSeq proteins | Sebastian Jaenicke | - | 1.0 | mgx_default_taxonomy.xml | |
| Centrifuge | Centrifuge: rapid and sensitive classification of metagenomic sequences | Sebastian Jaenicke | - | 1.0 | centrifuge.xml | mgxgene_centrifuge.xml |
| Kaiju | Kaiju: Fast and sensitive taxonomic classification for metagenomics | Sebastian Jaenicke | - | 1.0 | kaiju.xml | mgxgene_kaiju.xml |
| Kraken | Kraken: ultrafast metagenomic sequence classification using exact alignments | Sebastian Jaenicke | - | 1.0 | kraken.xml | |
| KrakenUniq | Taxonomic sequence classification with KrakenUniq | Sebastian Jaenicke | - | 1.0 | krakenuniq.xml | |
| Kraken 2 | Kraken: ultrafast metagenomic sequence classification using exact alignments | Sebastian Jaenicke | - | 1.0 | kraken2.xml | mgxgene_kraken2.xml |
| MetaPhlAn 4 | MetaPhlAn 4 marker-based taxonomic classification | Sebastian Jaenicke | - | 1.0 | metaphlan4.xml | |
| 16S Pipeline | Classification of 16S rRNA fragments | Sebastian Jaenicke | - | 1.0 | rdp_pipeline.xml | |

Functional profiling

| Name | Description | Author | URL | Version | File | File2 |
|---|---|---|---|---|---|---|
| COG | COG-based functional classification | Sebastian Jaenicke | - | 1.0 | eggnog.xml | mgxgene_eggnog.xml |
| SwissProt EC numbers | EC number annotation based on best-Blast-hit vs. SwissProt database | Sebastian Jaenicke | - | 1.0 | ecnumber.xml | mgxgene_ecnumber.xml |
| FunGene HMM search | HMM search vs. FunGene functional genes | Sebastian Jaenicke | - | 1.0 | fungene.xml | mgxgene_fungene.xml |
| dbCAN | Automated carbohydrate-active enzyme annotation based on dbCAN | Sebastian Jaenicke | - | 1.0 | dbCAN_besthit.xml | mgxgene_dbcan.xml |
| ClusterMine360 | PKS/NRPS analysis based on BLAST vs. ClusterMine360 database | Sebastian Jaenicke | - | 1.0 | clustermine360.xml | mgxgene_clustermine360.xml |
| Pfam | Annotate best Pfam domain hit | Sebastian Jaenicke | - | 1.0 | pfam_besthit.xml | mgxgene_pfam.xml |
| TIGRFAMS | Annotate best TIGRFAMS domain hit | Sebastian Jaenicke | - | 1.0 | tigrfam_besthit.xml | mgxgene_tigrfam.xml |
| PGAP | Annotate best NCBI PGAP HMM hit | Sebastian Jaenicke | - | 1.0 | pgap.xml | mgxgene_pgap.xml |
| MIBiG | MIBiG secondary metabolite screening | Sebastian Jaenicke | - | 1.0 | mibig.xml | mgxgene_mibig.xml |
| KOfam | KO number annotation | Sebastian Jaenicke | - | 1.0 | kofam.xml | mgxgene_kofam.xml |
| UniRef50 | UniRef50 annotation | Sebastian Jaenicke | - | 1.0 | uniref50.xml | mgxgene_uniref50.xml |
| VOGDB | Virus orthologous groups | Sebastian Jaenicke | - | 1.0 | - | mgxgene_vogdb.xml |

Antimicrobial resistance

| Name | Description | Author | URL | Version | File | File2 |
|---|---|---|---|---|---|---|
| ARDB | Antibiotic resistance gene annotation using best Blast hit vs. ARDB database | Sebastian Jaenicke | - | 1.0 | ardb.xml | mgxgene_ardb.xml |
| ARG-Annot | Antibiotic resistance gene annotation using best Blast hit vs. ARG-ANNOT database | Sebastian Jaenicke | - | 1.0 | argannot.xml | |
| BacMet | Antibacterial biocide- and metal-resistance gene annotation using best Blast hit vs. BacMet database | Sebastian Jaenicke | - | 1.0 | bacmet.xml | mgxgene_bacmet.xml |
| CARD | Antibiotic resistance gene screening using Blast vs. CARD (Comprehensive Antibiotic Resistance Database) | Sebastian Jaenicke | - | 1.0 | card.xml | mgxgene_card.xml |
| MVirDB | MvirDB-based virulence analysis | Sebastian Jaenicke | - | 1.0 | mvirdb.xml | mgxgene_mvirdb.xml |
| AMRFinder+ | NCBI AMRFinder+ | Sebastian Jaenicke | - | 1.0 | - | mgxgene_amrfinder.xml |

Reference mapping

| Name | Description | Author | URL | Version | File |
|---|---|---|---|---|---|
| Blast-Mapping | Fragment recruitment based on NCBI Magic-BLAST | Sebastian Jaenicke | - | 1.0 | blastn_refmap.xml |
| Bowtie2 | Bowtie2-based reference mapping | Sebastian Jaenicke | - | 1.0 | bowtie_refmap.xml |
| FR-HIT | Fragment recruitment employing FR-HIT | Sebastian Jaenicke | - | 1.0 | frhit_refmap.xml |

Amplicon analysis

| Name | Description | Author | URL | Version | File |
|---|---|---|---|---|---|
| 16S-Amplicons | Bergey RDP-based taxonomic assignment for 16S amplicons | Burkhard Linke | - | 1.0 | rdp_amplicons.xml |

Misc

| Name | Description | Author | URL | Version | File |
|---|---|---|---|---|---|
| | | Sebastian Jaenicke | - | 1.0 | besthit_aa.xml |
| | | Sebastian Jaenicke | - | 1.0 | discard_rRNAs.xml |
| | | Sebastian Jaenicke | - | 1.0 | ecolifilter.xml |
| | | Sebastian Jaenicke | - | 1.0 | phageprotscreen.xml |
| | | Sebastian Jaenicke | - | 1.0 | phagescreen.xml |
| | | Sebastian Jaenicke | - | 1.0 | pks.xml |
| | | Sebastian Jaenicke | - | 1.0 | plasmidprotscreen.xml |
| | | Sebastian Jaenicke | - | 1.0 | referencefilter.xml |

Contributors

be-el, pblumenkamp, sjaenick

Issues

MvirDB: switch to ghostz

MvirDB annotation pipelines still use BLAST; as the database is sufficiently small, GHOSTZ should be used instead.

Affected files:

mgxgene_mvirdb.xml
mvirdb.xml

Swissprot EC: slow GetSubject

In gitlab by @sjaenick on Feb 28, 2017, 22:31

Data is stalling at the input of the GetSubject node:

id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:24:48 PM
        name: input, state: FINISHED, passed: 1567718, pending: 1713608
        name: output, state: CONNECTED, passed: 1567717, pending: 0
--
id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:26:48 PM
        name: input, state: FINISHED, passed: 1569221, pending: 1712105
        name: output, state: CONNECTED, passed: 1569220, pending: 0
--
id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:28:48 PM
        name: input, state: FINISHED, passed: 1570891, pending: 1710435
        name: output, state: CONNECTED, passed: 1570890, pending: 0
--
id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:30:48 PM
        name: input, state: FINISHED, passed: 1573242, pending: 1708084
        name: output, state: CONNECTED, passed: 1573241, pending: 0
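The status dumps above are enough to estimate how long the stalled node would need to drain its queue. The sketch below parses the four snapshots and computes throughput from the input port's `passed` counter. It then extrapolates from the last `pending` value. It assumes the dump format shown here, that `pending` counts items still queued at the input, and a constant rate. The helper names are illustrative.

```python
import re
from datetime import datetime

# The four GetSubject status snapshots from the report (output lines omitted).
LOG = """\
id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:24:48 PM
        name: input, state: FINISHED, passed: 1567718, pending: 1713608
--
id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:26:48 PM
        name: input, state: FINISHED, passed: 1569221, pending: 1712105
--
id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:28:48 PM
        name: input, state: FINISHED, passed: 1570891, pending: 1710435
--
id: 147, type GetSubject`2, name: , state: CONNECTED, time: 2/28/2017 10:30:48 PM
        name: input, state: FINISHED, passed: 1573242, pending: 1708084
"""

TIME_RE = re.compile(r"time: (\S+ \S+ [AP]M)")
INPUT_RE = re.compile(r"name: input, state: \w+, passed: (\d+), pending: (\d+)")


def parse_snapshots(log):
    """Return (timestamp, passed, pending) for each '--'-separated snapshot."""
    snapshots = []
    for block in log.split("--"):
        t, inp = TIME_RE.search(block), INPUT_RE.search(block)
        if t and inp:
            ts = datetime.strptime(t.group(1), "%m/%d/%Y %I:%M:%S %p")
            snapshots.append((ts, int(inp.group(1)), int(inp.group(2))))
    return snapshots


def estimate(snapshots):
    """Average input throughput (items/s) and hours needed to drain the backlog."""
    (t0, passed0, _), (t1, passed1, pending1) = snapshots[0], snapshots[-1]
    rate = (passed1 - passed0) / (t1 - t0).total_seconds()
    return rate, pending1 / rate / 3600


rate, hours = estimate(parse_snapshots(LOG))
print(f"{rate:.1f} items/s, ~{hours:.0f} h to drain")  # 15.3 items/s, ~31 h to drain
```

Between the first and last snapshot, 5,524 items passed in six minutes, about 15 per second. At that rate, the 1,708,084 items still pending would take roughly 31 hours.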

16S Pipeline needs support for amplicon / microbiome data

In gitlab by @burkhard.linke on Mar 12, 2015, 14:35

Performance with amplicon data

Amplicon reads mostly consist of 16S fragments. The BLAST filter in the default 16S pipeline is not necessary for this kind of data and consumes too many resources (time, compute).

Proposed solution

Since the BLAST filter acts on the stream of reads and filters out reads, it should be possible to use a Conditional node to enable or disable the filter. For amplicon data, the filter should be disabled and all reads passed directly to the RDP classifier.

Compatibility

The workflow has to be modified in a way that keeps the standard behaviour (active BLAST filter) as the default. A workflow configuration option should be implemented and documented.
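The proposed Conditional node can be modelled as a pass-through switch around the filter. This is a minimal Python sketch, not Conveyor code. `conditional_filter`, `Read` and the `is_16s_hit` predicate are hypothetical; the predicate stands in for the BLAST filter. The filter is enabled by default to preserve the standard behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Read:
    name: str
    sequence: str


def conditional_filter(
    reads: Iterable[Read],
    keep: Callable[[Read], bool],
    enabled: bool = True,  # default keeps the existing behaviour (filter active)
) -> Iterator[Read]:
    """Apply `keep` to the read stream when enabled; otherwise pass every read through."""
    for read in reads:
        if not enabled or keep(read):
            yield read


def is_16s_hit(read: Read) -> bool:
    # Hypothetical stand-in for the BLAST-based 16S pre-filter.
    return read.name.startswith("16S")


reads = [Read("16S_a", "ACGT"), Read("other_b", "TTGA")]
print([r.name for r in conditional_filter(reads, is_16s_hit)])                 # ['16S_a']
print([r.name for r in conditional_filter(reads, is_16s_hit, enabled=False)])  # ['16S_a', 'other_b']
```

For amplicon data, the workflow configuration option would set `enabled=False` so that every read reaches the RDP classifier.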

Open questions

  • Automatically deactivate the filter if an amplicon dataset is analysed? Is the sample / extraction information available in Conveyor?

Add phiX-Filter

In gitlab by @sjaenick on Nov 26, 2015, 17:57

phiX residue is still present in many datasets provided by commercial companies. A dedicated pipeline needs to be implemented that sets the discard flag for phiX matches and removes all existing annotations for them (including attrcount).
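The bookkeeping such a pipeline has to do can be sketched as three steps: set the discard flag, drop the annotations, and keep the per-attribute counts consistent. The data model is hypothetical. `Sequence`, its `attributes` list and the `attrcount` Counter only approximate MGX's storage. `phix_hits` is assumed to come from an upstream search against the phiX genome.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Sequence:
    seq_id: int
    discard: bool = False
    # Hypothetical: attribute IDs assigned to this read by earlier pipelines.
    attributes: List[str] = field(default_factory=list)


def discard_phix(sequences: Dict[int, Sequence], phix_hits: Set[int], attrcount: Counter) -> int:
    """Flag phiX matches as discarded, strip their annotations and update attrcount."""
    flagged = 0
    for seq_id in phix_hits:
        seq = sequences.get(seq_id)
        if seq is None or seq.discard:
            continue  # unknown or already discarded
        seq.discard = True
        for attr in seq.attributes:
            attrcount[attr] -= 1
            if attrcount[attr] <= 0:
                del attrcount[attr]  # drop attributes no longer observed
        seq.attributes.clear()
        flagged += 1
    return flagged


seqs = {1: Sequence(1, attributes=["A", "B"]), 2: Sequence(2, attributes=["A"])}
counts = Counter({"A": 2, "B": 1})
print(discard_phix(seqs, {1}, counts), counts)  # 1 Counter({'A': 1})
```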

E.coli-Filter broken?

In gitlab by @sjaenick on Nov 26, 2015, 17:54

got exception during validation in node Id:16 () : Conveyor.Core.MissingLinkException: link endpoint jobIn not connected in node Id:16 () (type Conveyor.MGX.MarkDiscard)
at Conveyor.Core.Node.Verify () [0x00027] in /homes/sjaenick/TEST/conveyor/Conveyor.Core/Conveyor.Core/Node.cs:202

Pipeline coordinates

In gitlab by @sjaenick on Mar 8, 2015, 19:02

Make sure all pipelines use 0-based coordinates for start/stop.

Checked and fixed:

  • GC
  • readlength
  • besthit-AA
  • MvirDB
  • MetaCV
  • Pfam/TIGRFAMS
