
Comments (2)

franciscozorrilla commented on September 4, 2024

Hey Zhaoju,

The numbers you describe sound pretty normal to me: I would expect that an assembly-free approach used by short-read profilers like kraken/metaphlan/mOTUs will have many more "hits" for genomes compared to the assembly-based approach used by metaGEM and other similar workflows.

However, I would warn that many of the low relative abundance hits from short-read profilers may be false positives from closely related species. I would expect that if you use a relative abundance cutoff to filter out low abundance species from the short-read profiler output, then the number of species will start to approach the number obtained from the assembly-based approach. This reflects the fact that assembly-based approaches work great for high coverage or high abundance genomes, but not so well for low abundance/coverage genomes.
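To make the cutoff idea concrete, here is a minimal sketch of that filtering step. It assumes the profiler output has already been parsed into a mapping from taxon name to read count or abundance; the function name, the taxon names, the numbers, and the 1% cutoff are all hypothetical and only for illustration, so pick a threshold that suits your own data.

```python
def filter_by_relative_abundance(profile, cutoff=0.01):
    """Keep taxa whose relative abundance is at least `cutoff`.

    `profile` maps taxon name -> count or abundance (any non-negative unit).
    `cutoff` is a fraction of the total, e.g. 0.01 means 1%.
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile has no positive abundance")
    # Convert raw values to fractions of the total before thresholding.
    relative = {taxon: value / total for taxon, value in profile.items()}
    return {taxon: frac for taxon, frac in relative.items() if frac >= cutoff}


# Hypothetical profile (made-up taxa and counts, not real data).
example_profile = {
    "Species A": 5200,
    "Species B": 3100,
    "Species C": 1500,
    "Species D": 150,
    "Species E": 40,
    "Species F": 10,
}

kept = filter_by_relative_abundance(example_profile, cutoff=0.01)
print(len(example_profile), "taxa before filtering,", len(kept), "after")
```

Sweeping the cutoff over a few values and watching how quickly the species count drops is a simple way to see how much of the profile sits in the low abundance tail.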

Also, 8-12 genomes is on the lower side. Did you use coverage across multiple samples for binning? This extra mapping information should allow you to get more out of your samples. As an example, consider this publication (https://doi.org/10.1016/j.cell.2019.01.001), where they reconstruct 154,723 genomes from 9,428 human gut metagenomes (~16 genomes/sample); note that they did not use coverage across multiple samples for binning. In the metaGEM paper (https://doi.org/10.1093/nar/gkab815), we reconstructed 4,133 genomes from 137 human gut metagenomes (~30 genomes/sample), and this was using coverage across samples.
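For reference, the per-sample averages quoted above come from a simple division, sketched here. The helper name is hypothetical, and since the two studies use different datasets and pipelines, the ratio is only a rough guide and not a controlled comparison.

```python
def genomes_per_sample(n_genomes, n_samples):
    """Average number of reconstructed genomes per metagenomic sample."""
    return n_genomes / n_samples


# Figures as quoted in the comment above.
single_sample_binning = genomes_per_sample(154_723, 9_428)  # no cross-sample coverage
cross_sample_binning = genomes_per_sample(4_133, 137)       # coverage across samples

print(f"single-sample binning: {single_sample_binning:.1f} genomes/sample")
print(f"cross-sample binning:  {cross_sample_binning:.1f} genomes/sample")
```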

Note also that sequencing depth and sample complexity will play a big role in the number of genomes reconstructed: if your samples are shallowly sequenced and complex, you will recover a low number of genomes. If possible, try increasing sequencing depth in your next experiment, or search for a dataset with higher sequencing depth.

I think that the approach you mention, using short-read profilers to select AGORA models for simulation, is understandable but not very elegant. The whole point of metaGEM is to enable direct reconstruction of metabolic models from metagenomes in order to capture the context- and strain-specific information available in your sequencing samples, which is missing from reference genomes and reference-genome-based metabolic models (e.g. AGORA). Consider the following text from the metaGEM paper:

Pangenome analysis of the human gut microbiome demonstrated that the functional repertoire of gut species differ significantly, with a median core genome proportion of only 66% [14], revealing differences in metabolic potentials of individual microbiomes.

There is significant variation in the functional repertoire of the same species across humans, and I would expect the differences in metabolism of the same microbial species across human and cow to be even greater.

Hope this helps, let me know if you have further questions!
Best,
Francisco

from metagem.

franciscozorrilla commented on September 4, 2024

See also #24

