Code Monkey home page Code Monkey logo

viralverify's Introduction

viralVerify: viral contig verification tool

viralVerify classifies contigs (output of metaviralSPAdes or other assemblers) as viral, non-viral or uncertain, based on gene content. Also for non-viral contigs it can optionally provide plasmid/non-plasmid classification.

viralVerify predicts genes in the contig using Prodigal in the metagenomic mode, runs hmmsearch on the predicted proteins and classifies the contig as vrial or non-viral by applying the Naive Bayes classifier (NBC). For the set of predicted HMMs, viralVerify uses trained NBC to classify this set to be viral or chromosomal.

To improve results in the case of metagenomes with possible host contamination, we recommend users to filter out reads that align to the host genome prior to assembly. Since viralVerify is based on gene classification, it can be used on contigs on any length, and short viruses can be detected as long as they contain a recognizable virus-specific gene. To help analyze the rapidly growing amount of novel data, we have added a script that allows users to construct their own training database from a set of viral, chromosomal and plasmid contigs, as well as custom HMM database

Requirements

viralVerify is a Python script, thus, installation is not required. However, it has the following dependencies:

or

  • recent release of the Pfam-A database (ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/).

To work properly, viralVerify require Prodigal and hmmsearch in your PATH environment variable.

Optional BLAST verification

You can verify your output by BLAST to check if you found novel viruses or plasmids. In this case, you need to have blastn in your $PATH, Biopython installed, and provide a path to the local copy of nt database.

Usage

./viralverify.py 
        -f Input fasta file
        -o output_directory 
        --hmm HMM  Path to HMM database

        Optional arguments:
        -h, --help  Show the help message and exit
        --db DB     Run BLAST on input contigs against provided db
        -t          Number of threads
        -p          Output predicted plasmidic contigs separately

Output file: comma-separated table <input_file>_result_table.csv

Output format: contig name, prediction result, log-likelihood ratio, list of predicted HMMs

Fasta files with prediction results can be found in the Prediction_results_fasta folder

Retraining classifier

You can retrain the classifier with your custom data using provided script training_script.py. It takes viral, chromosomal and plasmid (optionally) training sequences in fasta format and set of HMMS, predict genes and HMM hits, and returns the frequency table. To use the retrained classifier, replace the "classifier_table.txt" file in the viralVerify directory with the obtained table.

viralverify's People

Contributors

mikeraiko avatar dmitry-antipov avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.