viromeqc's Introduction

ViromeQC

Description

  • Provides an enrichment score for VLP viromes with respect to metagenomes
  • Useful benchmark for the quality of enrichment of a virome
  • Tested on Linux Ubuntu Server 16.04 LTS and on Linux Mint 19

Requires:

Update: ViromeQC now works with newer versions of diamond (e.g. v0.9.29). Thanks to Ryan Cook (@RyanCookAMR) for the new diamond db.

Usage

Step 1: clone or download the repository

git clone --recurse-submodules https://github.com/SegataLab/viromeqc.git

or download the repository from the releases page

Step 2: install the database:

This step downloads the database file. It only needs to be done the first time you run ViromeQC, and may take a few minutes depending on your internet connection.

viromeQC.py --install

Alternatively, you can download the database files from Zenodo. Once the files are downloaded, create a folder named index/ in the ViromeQC installation folder and unzip all the files into it.
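The manual route can be sketched in Python as follows. This is a minimal sketch, not part of ViromeQC: it assumes the downloaded files are .zip archives sitting in one folder, and the function name unpack_database and both of its arguments are hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_database(download_dir, install_dir):
    """Extract every .zip archive in download_dir into install_dir/index/.

    Assumption: the database was downloaded as .zip archives. Returns the
    index directory and the names of the extracted members.
    """
    index_dir = Path(install_dir) / "index"
    index_dir.mkdir(parents=True, exist_ok=True)  # create index/ if missing
    extracted = []
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(index_dir)
            extracted.extend(zf.namelist())
    return index_dir, extracted
```

Calling unpack_database("downloads", "viromeqc") would leave the unzipped files in viromeqc/index/, which is the layout the step above describes.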

Step 3: Run on your sample

viromeQC.py -i <input_virome_file(s)> -o <report_file.txt>

Please note: you can pass more than one file as input (e.g. for multiple runs or paired-end reads). However, this command processes only one sample at a time. If you want to parallelize the execution, you can do so with GNU Parallel or an equivalent tool.
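As an alternative to an external tool, one process per sample can be launched from Python. The sketch below is only an illustration: the helper names, the sample dictionary layout and the report file naming are assumptions, and only the -i, -o and --sample_name options listed under Parameters are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(sample_name, fastq_files, report_file, script="viromeQC.py"):
    """Build one viromeQC.py invocation for one sample (one or more FASTQ files)."""
    return [script, "-i", *fastq_files, "-o", report_file,
            "--sample_name", sample_name]


def run_samples(samples, max_workers=2, script="viromeQC.py"):
    """Run one viromeQC.py process per sample, at most max_workers at a time.

    samples maps a sample name to its list of FASTQ files (hypothetical layout).
    Returns a dict of sample name -> process exit code.
    """
    def run_one(item):
        name, files = item
        cmd = build_command(name, files, f"{name}_report.txt", script)
        return name, subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, samples.items()))
```

Each process also starts Bowtie2 and Diamond with their own thread counts (4 each by default), so max_workers is best chosen with the total number of cores in mind.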

You can try the test example (test/test.sh), which analyzes 10,000 reads from the sample SRR829034. This should take approximately one or two minutes.

Parameters:

usage: viromeQC.py -i <input_virome_file> -o <report_file.txt>

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT [INPUT ...]], --input [INPUT [INPUT ...]]
                        Raw Reads in FASTQ format. Supports multiple inputs
                        (plain, gz o bz2) (default: None)
  -o OUTPUT, --output OUTPUT
                        output file (default: None)
  --minlen MINLEN       Minimum Read Length allowed (default: 75)
  --minqual MINQUAL     Minimum Read Average Phred quality (default: 20)
  --bowtie2_threads BOWTIE2_THREADS
                        Number of Threads to use with Bowtie2 (default: 4)
  --diamond_threads DIAMOND_THREADS
                        Number of Threads to use with Diamond (default: 4)
  -w {human,environmental}, --enrichment_preset {human,environmental}
                        Calculate the enrichment basing on human or
                        environmental metagenomes. Defualt: human-microbiome
                        (default: human)
  --bowtie2_path BOWTIE2_PATH
                        Full path to the bowtie2 command to use, deafult
                        assumes that bowtie2 is present in the system path
                        (default: bowtie2)
  --diamond_path DIAMOND_PATH
                        Full path to the diamond command to use, deafult
                        assumes that diamond is present in the system path
                        (default: diamond)
  --version             Prints version informations (default: False)
  --install             Downloads database files (default: False)
  --sample_name SAMPLE_NAME
                        Optional label for the sample to be included in the
                        output file (default: None)
  --tempdir TEMPDIR     Temporary Directory override (default is the system
                        temp. directory) (default: None)
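To make the --minlen and --minqual thresholds concrete, the sketch below keeps a read only if it is long enough and its average Phred quality is high enough. It is not ViromeQC's own filter: the function names are hypothetical, and it assumes four-line FASTQ records with Phred+33 quality encoding.

```python
def mean_phred(quality_line, offset=33):
    """Average Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def filter_fastq(lines, minlen=75, minqual=20):
    """Yield (header, sequence, quality) for reads that pass both thresholds.

    lines is an iterable of FASTQ lines, four lines per read.
    """
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _, qual = records[i:i + 4]
        if not qual:
            continue  # skip empty reads to avoid dividing by zero
        if len(seq) >= minlen and mean_phred(qual) >= minqual:
            yield header, seq, qual
```

With the defaults shown in the help text, a read shorter than 75 bases or with a mean quality below 20 is dropped.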

Pipeline structure

ViromeQC starts from FASTQ files (compressed files are supported), and will:

  1. Eliminate short and low-quality reads
    • adjust the minqual and minlen parameters if you want to change the thresholds
  2. Map the reads against a curated collection of rRNAs and single-copy bacterial markers
  3. Filter the reads to remove short and divergent alignments
  4. Compute the enrichment value of the sample, compared to the median observed in human metagenomes
    • use -w environmental for environmental reads
    • reference medians for un-enriched metagenomes are taken from medians.csv; you can provide your own data to ViromeQC by changing this file accordingly
  5. Produce a report file with the alignment rates and the final enrichment score (which is the minimum enrichment observed across SSU-rRNA, LSU-rRNA and single-copy markers)
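Steps 4 and 5 can be illustrated with a short sketch. The only part taken from the description above is that the final score is the minimum across the three marker classes. The per-class formula (reference median divided by the sample's alignment rate) is an assumption made for illustration, as are the function name, the dictionary keys and the handling of a zero alignment rate.

```python
import math


def enrichment_scores(sample_rates, reference_medians):
    """Return (per-class enrichment, final score).

    Assumption: enrichment = reference median alignment rate / sample
    alignment rate, so fewer aligned reads give a higher enrichment.
    A class with no aligned reads is treated as infinitely enriched.
    """
    per_class = {}
    for marker, median in reference_medians.items():
        rate = sample_rates[marker]
        per_class[marker] = median / rate if rate > 0 else math.inf
    # The final score is the minimum enrichment across all classes.
    return per_class, min(per_class.values())
```

Taking the minimum is the conservative choice: a sample scores highly only if it is depleted in all three marker classes at once.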

Output

Output is given as a TSV file with the following structure:

Sample         | Reads | Reads_HQ | SSU rRNA alignment (%) | LSU rRNA alignment (%) | Bacterial_Markers alignment (%) | total enrichmnet score
your_sample.fq | 40000 | 39479    | 0.00759898             | 0.0227969              | 0.01266496                      | 5.795329

  • An enrichment score of 5.8 means that the virome is 5.8 times more enriched than a comparable metagenome
  • High scores (e.g. 10-50) reflect high VLP enrichment
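Since the report is a TSV file, the score can be read back with Python's csv module. This is a minimal sketch with a hypothetical function name. It finds the score column by looking for a header that ends in "score" rather than relying on an exact spelling, and it assumes the report has a header row followed by one row per sample.

```python
import csv
import io


def read_report(handle):
    """Return a list of (sample, enrichment score) pairs from a TSV report."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Locate the score column by its suffix instead of its exact spelling.
    score_column = next(name for name in reader.fieldnames
                        if name.strip().lower().endswith("score"))
    return [(row["Sample"], float(row[score_column])) for row in reader]


if __name__ == "__main__":
    example = io.StringIO(
        "Sample\tReads\tReads_HQ\ttotal enrichment score\n"
        "your_sample.fq\t40000\t39479\t5.795329\n"
    )
    print(read_report(example))
```

For a real report, pass an open file instead of the in-memory example, e.g. `with open("report_file.txt", newline="") as handle: read_report(handle)`.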

Citation

If you find this tool useful, please cite:

Zolfo, M., Pinto, F., Asnicar, F., Manghi, P., Tett, A., Segata, N. Detecting contamination in viromes using ViromeQC. Nature Biotechnology 37, 1408–1412 (2019)

viromeqc's People

Contributors

azufre451

Forkers

hwl26

viromeqc's Issues

[Errno 8] Exec format error:

Hi,
I am getting the following error when I attempt to run the example file that came with the software.

% bash test.sh
Checking Database Files                                           [ -   OK  - ]
[fastq_len_filter] | 9384 / 10000 (94.0%) reads selected          [ -  DONE - ]
[SILVA_SSU]   | Bowtie2 Aligning                                  [ -  ...  - ]Fatal error running Bowtie2 on SSU rRNA. Error message:
[Errno 8] Exec format error:
'/g/scb2/bork/mocat/software/viromeqc/1.0/cmseq/cmseq/filter.py'  [ -  FAIL - ]

I am using the following versions of the dependencies

python/3.6 diamond/0.9.24 samtools/1.7 bowtie2/2.3.4.3

Thank you in advance for your help.
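For anyone hitting the same message: "[Errno 8] Exec format error" is what the operating system reports when it is asked to execute a file it does not recognise as a program. For a script such as filter.py, a missing `#!` interpreter line is one common cause, although nothing in this report confirms that it is the cause here. The sketch below, with a hypothetical function name, checks a script for the usual suspects.

```python
import os
from pathlib import Path


def check_script(path):
    """Return a list of problems that could stop `path` being executed directly."""
    script = Path(path)
    if not script.is_file():
        return [f"{script} does not exist"]
    problems = []
    with script.open("rb") as handle:
        # A directly executed script needs an interpreter line such as
        # '#!/usr/bin/env python3' as its very first bytes.
        if handle.read(2) != b"#!":
            problems.append("no '#!' interpreter line at the top of the file")
    if not os.access(script, os.X_OK):
        problems.append("file is not marked executable")
    return problems
```

Running check_script on the cmseq/cmseq/filter.py path shown in the error would return an empty list if none of these problems applies.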

biopython error

Dear developers, I've downloaded viromeqc but I have some problems when running. I created a conda environment for it with python3.8, and I installed pandas and biopython, but after this, the following error occurs:

(viromeqc) pgen@pgen:~/sw/viromeqc$ python3.8 ./viromeQC.py
Failed in importing Biopython. Please check Biopython is
installed properly on your system! [ - FAIL - ]

when I try to reinstall biopython it seems to be properly installed:

(viromeqc) pgen@pgen:~/sw/viromeqc$ pip install biopython
Requirement already satisfied: biopython in /home/pgen/miniconda3/envs/viromeqc/lib/python3.8/site-packages (1.79)
Requirement already satisfied: numpy in /home/pgen/miniconda3/envs/viromeqc/lib/python3.8/site-packages (from biopython) (1.23.2)

Any help will be great!
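A quick way to narrow this down is to ask the same interpreter that runs viromeQC.py whether it can locate the modules at all. The sketch below is a generic diagnostic, not part of ViromeQC, and the function name is hypothetical. Biopython is imported under the name Bio. The check only tells you whether a module can be found, not whether importing it succeeds.

```python
import importlib.util
import sys


def check_modules(names=("Bio", "pandas")):
    """Map each module name to True if the running interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The interpreter path shows which environment is actually in use.
    print("interpreter:", sys.executable)
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the interpreter printed is not the one inside the conda environment, packages installed there with pip or conda will not be visible to it.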

Singularity/Docker

Hi,
Do you have any plan to release a Singularity/Docker version of viromeqc?
Thanks

Can I turn off length and quality filtering?

Hi,

Can I turn off length and quality filtering?

I'd like to use viromeqc on data that has already been trimmed by other tools such as trimmomatic. In this case, I think filtering by viromeqc may not be necessary.

Further, the filtering step seems to take much longer than the bowtie2 and diamond steps because it uses only one thread.

Thanks.

Ilnam

Error in testing viromeQC

Hi,

I am new to this tool and am using it to evaluate my virome sequence dataset. I always get the following error when I attempt to run the example file that came with the tool:

  File "/Users/Ernie/viromeqc/test/../viromeQC.py", line 9, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

I've installed pandas via conda/bioconda, and its version is 1.3.2. Can someone help me resolve this issue, please?

Many thanks in advance for your kind help!

Can't detect LSU and SSU reads

Hey SegataLab,
I've installed viromeqc but I'm experiencing the following problems:

  1. Getting the git repo doesn't download cmseq by default. One needs to get it manually and put it into the correct folder

  2. LSU and SSU reads are not detected, skewing the enrichment score, making the enrichment score better than it is

  3. Although viromeQC runs to completion, there's the following error in the output:

Traceback (most recent call last):
  File "/home/shiraz/apps/viromeqc/cmseq/cmseq/filter.py", line 5, in <module>
    from cmseq import __version__
ImportError: cannot import name '__version__' from 'cmseq' (/home/shiraz/apps/viromeqc/cmseq/cmseq/cmseq.py)
[main_samview] fail to read the header from "-".
Traceback (most recent call last):
  File "/home/shiraz/apps/viromeqc/cmseq/cmseq/filter.py", line 5, in <module>
    from cmseq import __version__
ImportError: cannot import name '__version__' from 'cmseq' (/home/shiraz/apps/viromeqc/cmseq/cmseq/cmseq.py)
[main_samview] fail to read the header from "-".

Failed to retrieve DB error

Hi,
When I run viromeQC.py --install , I got "Failed to retrieve DB". Could you tell me how to solve this problem? Thank you very much.

can't get the virome db:

I failed to retrieve the DB from Zenodo or Dropbox!

can you give me another way to solve this problem?

thanks a lot!

viromeqc does not work with current versions of python and samtools

Dear SegataLab,
I'm trying to run viromeQC, but there are a series of errors that seem to relate to dependency versions.

E.g. cmseq does not install by default. Upon getting the newest version of cmseq and inserting it into the viromeQC folder, LSU and SSU reads are undetected with a cmseq python error

Upon installing an old version of cmseq matching the date of the viromeQC release, the following error is produced upon running:
[main_samview] fail to read the header from "-".

This seems to be related to the way newer versions of samtools process stdin and stdout.

Trying to install an older version of samtools through conda fails because it requires an outdated version of Python (3.1), which is older than the 3.6 you tested viromeqc with.

Is it possible for you to update the viromeQC repo so that it can run with current versions of dependencies?

Support for already-qc'd fna files of reads

Dear Moreno,
It would be so convenient to have support for read fna files that have already been QC'd so they can be mapped directly by bowtie2. The current QC step is slow, and allowing for this would speed up viromeQC a lot!

A lot of us have MDA amplified viromes, where a read deduplication step is necessary after read filtering and trimming to remove redundant reads prior to assembly, mapping, etc. Such deduplication is already done by other tools (such as vsearch or rmdup) and dramatically reduces the size of the read files. Following qc and deduplication, the quality scores are no longer important, so the output is normally fna files of reads instead of fastq files.

Incorporating support for qc'd fna files (bowtie2 also supports this) would dramatically increase the speed of viromeQC because the length filtering step is omitted, plus fewer reads have to be mapped by bowtie2. E.g. many of my MDA amplified virome files are often 10x smaller after passing my own qc and deduplication pipeline.

Please consider this, as it would be extremely easy for you to implement!

How to select the parameter "--enrichment_preset"

Dear author,

Thanks for your excellent work. I have some VLP enrichment samples from the animal gut. When I use the tool, how should I set the "--enrichment_preset" parameter? The default is "human", but I do not know whether I should calculate the enrichment based on the human microbiome. I would appreciate your guidance.

Best Regards,
Jiaojiao
