Comments (3)

beardymcjohnface avatar beardymcjohnface commented on June 30, 2024

Hi,
You shouldn't need to perform host removal yourself, as Hecatomb already performs this step, but the results will differ because of the way Hecatomb prepares the references for filtering. Viral-like sequences in the host reference are masked to avoid removing real viral sequences that happen to be similar, but this means some host sequences slip through and need to be filtered later. Hecatomb currently doesn't remove PhiX, but I think this will change in the next version. I'm interested in hearing what your preference would be for filtering, as we've discussed several times which approach would be best.
I wouldn't expect to find many RNA viruses in a DNA metagenome, but you might still get hits to known RNA viruses if they share homology with DNA viruses in your sample.
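
To make the masking idea concrete, here is a minimal sketch, not Hecatomb's actual code. It assumes the viral-like regions of the host reference are already known as half-open `(start, end)` intervals; the function name `mask_intervals` and the toy sequence are hypothetical.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Replace each half-open [start, end) interval of the sequence with mask_char.

    Intervals are assumed (hypothetically) to come from an earlier
    comparison of the host reference against viral references.
    """
    masked = list(sequence)
    for start, end in intervals:
        # Clamp the interval so out-of-range coordinates are ignored.
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Toy host sequence with two hypothetical viral-like regions.
host = "ACGTACGTACGTACGT"
viral_like = [(4, 8), (12, 14)]
masked_host = mask_intervals(host, viral_like)
print(masked_host)  # ACGTNNNNACGTNNGT
```

Reads that come from a masked region no longer match the host reference, so they pass the host filter. That protects true viral reads, but it also lets through host reads from the same regions, which is why they need to be filtered later.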

from hecatomb.

mhmism avatar mhmism commented on June 30, 2024

Thanks for your response. It would be great to have PhiX removal in the next version of Hecatomb, and I look forward to it.
Regarding the filtering process, unfortunately there is no easy answer. Based on what I saw in my toy dataset, I think many of the host DNA reads were wrongly classified as RNA viruses (this was suggested by the large proportion of RNA viruses retrieved from a DNA dataset, which is unexpected behaviour). This may be a problem in short-read datasets in general. On the other hand, you may also lose some DNA viruses if you filter beforehand. If you would like to be more conservative and avoid false positives as much as possible, then removing host DNA beforehand might be needed. However, this still needs benchmarking on synthetic datasets that include a mix of microbial (including viral) and host short reads before reaching firmer conclusions.

In addition, you may wish to include an option to search only the DNA viral catalogue, only the RNA viral catalogue, or both. That way, the search could better suit the type of dataset being investigated.

I am curious to know your thoughts!

beardymcjohnface avatar beardymcjohnface commented on June 30, 2024

Yes, I agree 100%. This misclassification of host DNA as RNA viruses is very typical. I like the idea of switching off the search for RNA viruses; I'll have to think about the best way to implement it, as we want to do the same thing for phages.
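
As a rough illustration of the idea only, the sketch below filters already-annotated hits by genome type rather than switching off a search, and it is not an existing Hecatomb feature. The record fields (`read`, `family`, `genome_type`) and the function `filter_hits` are hypothetical.

```python
def filter_hits(hits, allowed_types):
    """Keep only hits whose genome type is in allowed_types (case-insensitive)."""
    allowed = {t.upper() for t in allowed_types}
    return [hit for hit in hits if hit["genome_type"].upper() in allowed]


# Hypothetical annotated hits; real output columns may differ.
hits = [
    {"read": "r1", "family": "Microviridae", "genome_type": "DNA"},
    {"read": "r2", "family": "Picornaviridae", "genome_type": "RNA"},
    {"read": "r3", "family": "Herpesviridae", "genome_type": "DNA"},
]

# For a DNA metagenome, keep only hits to DNA viruses.
dna_only = filter_hits(hits, ["DNA"])
print([hit["read"] for hit in dna_only])  # ['r1', 'r3']
```

The same pattern would extend to phages if each hit carried a flag for that, which is also an assumption here.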
