Comments (3)
Hi,
You shouldn't need to perform host removal yourself, as Hecatomb already performs this step, but the results will differ because of the way Hecatomb prepares the host reference for filtering. Viral-like sequences in the host genome are masked to avoid removing real viral reads that happen to be similar to the host, but this means some host reads (those matching the masked regions) pass through and need to be filtered later. Hecatomb currently doesn't remove PhiX, but I think this will change in the next version. I'd be interested to hear what your preference would be regarding filtering, as we've discussed several times which approach would be best.
I wouldn't expect to find many RNA viruses in a DNA metagenome, but you might still have hits to known RNA viruses if they share homology to DNA viruses in your sample.
from hecatomb.
Thanks for your response. It would be great to remove the phix genome in the next version of Hecatomb. I will be looking forward to the next version.
Regarding the filtering process, unfortunately, there is no easy answer. Based on what I saw in my toy dataset, I think many of the host DNA reads were wrongly classified as RNA viruses (this is suggested by the large proportion of RNA viruses retrieved from a DNA dataset, which is unexpected). This may be a problem with short-read datasets in general. On the other hand, you may also lose some DNA viruses if you filter beforehand. I think if you would like to be more conservative and avoid false positives as much as possible, then removing host DNA beforehand might be needed. However, this still needs benchmarking on synthetic datasets that mix microbial (including viral) and host short reads before drawing firmer conclusions.
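To make the benchmarking idea concrete, here is a minimal sketch of how such a synthetic dataset could be scored. It assumes you already have the known origin of every simulated read and a yes/no viral call per read from whatever pipeline you are testing; the function name `summarize_benchmark` and the labels `host`, `viral`, and `microbial` are made up for illustration and are not part of Hecatomb.

```python
from collections import Counter


def summarize_benchmark(truth, calls):
    """Score viral calls against the known origin of simulated reads.

    truth: dict of read ID -> "host", "viral" or "microbial" (hypothetical labels)
    calls: dict of read ID -> True if the pipeline called the read viral
    Reads missing from `calls` are treated as not called viral.
    """
    counts = Counter()
    for read_id, origin in truth.items():
        called_viral = calls.get(read_id, False)
        if origin == "viral":
            counts["tp" if called_viral else "fn"] += 1
        else:
            counts["fp" if called_viral else "tn"] += 1
            # Track host reads separately, since they are the concern here.
            if called_viral and origin == "host":
                counts["host_fp"] += 1

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "host_false_positives": counts["host_fp"],
    }


# Made-up example: five simulated reads, two of them called viral.
truth = {"r1": "viral", "r2": "viral", "r3": "host", "r4": "host", "r5": "microbial"}
calls = {"r1": True, "r3": True}
summary = summarize_benchmark(truth, calls)
```

Running the same scoring with and without host removal beforehand would show the trade-off directly: lost DNA viruses lower recall, while misclassified host reads lower precision.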
In addition, you may wish to include a feature to only search in the DNA vs RNA viral catalogue or both. This way, it may better suit the type of the dataset you are investigating.
I am curious to know your thoughts!
from hecatomb.
Yes, I agree 100%. This misclassification of host DNA as RNA viruses is very typical. I like the idea of switching off searching for RNA viruses; I'll have to think of the best way to implement it as we want to do the same thing for phages.
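In the meantime, one possible workaround is to filter the results after the run rather than restrict the search itself. The sketch below is not a Hecatomb feature: it assumes a tab-separated results table with a column holding the Baltimore genome type, here called `baltimoreType`. That column name is an assumption, so check the header of your own file. The function name `filter_by_genome_type` and the example rows are made up for illustration.

```python
import csv
import io


def filter_by_genome_type(rows, keep="DNA", type_column="baltimoreType"):
    """Yield only rows whose genome type matches `keep` ("DNA", "RNA" or "both").

    rows: iterable of dicts, e.g. from csv.DictReader
    type_column: assumed column name holding values such as "dsDNA" or "ssRNA"
    """
    keep = keep.upper()
    if keep not in {"DNA", "RNA", "BOTH"}:
        raise ValueError("keep must be 'DNA', 'RNA' or 'both'")
    for row in rows:
        genome_type = (row.get(type_column) or "").upper()
        # Substring match, so "ssDNA" and "dsDNA" both count as DNA.
        if keep == "BOTH" or keep in genome_type:
            yield row


# Made-up example table; in practice, open your own TSV file instead.
EXAMPLE = (
    "seqID\tfamily\tbaltimoreType\n"
    "read1\tMicroviridae\tssDNA\n"
    "read2\tPicornaviridae\tssRNA\n"
    "read3\tHerelleviridae\tdsDNA\n"
)
reader = csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t")
dna_hits = list(filter_by_genome_type(reader, keep="DNA"))
```

Note that this only hides the RNA virus hits from a DNA dataset; it does not tell you whether they were misclassified host reads, and rows with no genome type are dropped unless `keep="both"`.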
from hecatomb.