LCAStar: an entropy-based measure for taxonomic assignment within assembled metagenomes

Niels W. Hanson, Kishori M. Konwar, Steven J. Hallam

Abstract

A perennial problem in the analysis of large meta'omic datasets is the taxonomic classification of unknown reads or assembled contigs to their likely taxa of origin. Although the assembly of metagenomic samples has its difficulties, once contigs are found it is often important to classify them to a taxonomy based on their ORF annotations. The popular Lowest Common Ancestor (LCA) algorithm addresses a similar problem with ORF annotations, and it is intuitive to apply the same taxonomic annotation procedure to the annotation of contigs, a procedure we call LCA². Inspired by information and voting theory, we developed an alternative statistic, LCA*, by viewing the taxonomic classification problem as an election among the different taxonomic annotations, and generalized an algorithm to obtain a sufficiently strong majority (an α-majority) while respecting the entropy of the taxonomic distribution and the phylogenetic tree structure of the NCBI Taxonomy Database. Further, using results from order and supremacy statistics, we formulate a likelihood-ratio hypothesis test and p-value for testing the supremacy of the final reported taxonomy. In simulated metagenomic contig experiments, we empirically demonstrate that voting-based methods, majority vote and LCA*, are significantly more accurate than LCA², and that in many cases LCA* is superior to the simple majority vote procedure. LCA* and its statistical tests have been implemented as a stand-alone Python library, and have been integrated into the latest release of the MetaPathways pipeline.
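
To make the voting idea concrete, here is a minimal sketch of how a plain LCA and an α-majority vote can differ on the same set of ORF annotations. It is an illustration only and not the code in LCAStar.py: the toy taxonomy `PARENT` and the function names `lineage`, `subtree_support`, `lca`, `alpha_majority`, and `entropy` are all assumptions made for this example, and the likelihood-ratio test is not covered.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); the real tool reads the NCBI tree.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Pseudomonas": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def subtree_support(taxa):
    """Count, for every node, how many votes fall in its subtree; record depths."""
    support, depth = Counter(), {}
    for taxon in taxa:
        path = lineage(taxon)
        for i, node in enumerate(path):
            support[node] += 1
            depth[node] = len(path) - 1 - i  # root has depth 0
    return support, depth


def lca(taxa):
    """Deepest node whose subtree contains every vote."""
    support, depth = subtree_support(taxa)
    shared = [n for n in support if support[n] == len(taxa)]
    return max(shared, key=lambda n: depth[n])


def alpha_majority(taxa, alpha=0.5):
    """Deepest node whose subtree holds strictly more than alpha of the votes."""
    if not taxa or not 0.5 <= alpha < 1.0:
        raise ValueError("need at least one vote and 0.5 <= alpha < 1")
    support, depth = subtree_support(taxa)
    winners = [n for n in support if support[n] > alpha * len(taxa)]
    return max(winners, key=lambda n: depth[n])


def entropy(taxa):
    """Shannon entropy (in bits) of the vote distribution."""
    total = float(len(taxa))
    return -sum((c / total) * math.log(c / total, 2)
                for c in Counter(taxa).values())


# One contig with four ORF annotations (made-up votes).
votes = ["Escherichia", "Escherichia", "Pseudomonas", "Bacillus"]
print(lca(votes))             # Bacteria
print(alpha_majority(votes))  # Gammaproteobacteria
print(entropy(votes))         # about 1.5 bits
```

In this toy case a single outlying annotation pulls the plain LCA up to Bacteria, while the α-majority with α = 0.5 settles on the more specific Gammaproteobacteria, which is supported by three of the four votes.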

Installation

LCA* is released as a Python library, requiring Python 2.6 or greater. More installation and usage information can be found on the wiki.

Contents

  • Compute_LCAStar.py: Driver script for running LCAStar.py

    • Usage:
    python Compute_LCAStar.py -i blast_results/refseq.*.parsed.txt \
                              -m preprocessed/*.mapping.txt \
                              --ncbi_tree resources/ncbi_taxonomy_tree.txt \
                              --ncbi_megan_map resources/ncbi.map \
                              -a \
                              -v \
                              --contig_taxa_ref ...contigmap.txt \
                              -o LCAStar.output.txt
    

    where,

    • -i: is a MetaPathways parsed.txt annotation file
    • -m: is a MetaPathways mapping file .mapping.txt
    • --ncbi_tree: the MetaPathways ncbi_taxonomy_tree.txt
    • -a: computes all methods: Majority, LCAStar, and LCA^2
    • -v: verbose mode
    • --contig_taxa_ref: tab-delimited file specifying the original taxonomy of the input contigs
    • -o: output text file
  • lca_star_analysis/: contains analysis code for the validation experiments found in the text. The main RMarkdown document can be found here.

  • python_resources/: contains the LCAStar Python library as well as other Python libraries required to perform the analysis.

  • resources/: other resource files required for the analysis

Downloads

Some required files are too large to fit into a GitHub repository and can be found at the following links:

  • lca_star_data.zip: contains MetaPathways output and NCBI genome files used for the validation experiments
