Comments (2)
Hey Zhaoju,
The numbers you describe sound pretty normal to me: I would expect that an assembly-free approach used by short-read profilers like kraken/metaphlan/mOTUs will have many more "hits" for genomes compared to the assembly-based approach used by metaGEM and other similar workflows.
However, I would warn that many of the low relative abundance hits from short-read profilers may be false positives from closely related species. I would expect that if you use a relative abundance cutoff to filter out low-abundance species from the short-read profiler output, then the number of species will start to approach that obtained from the assembly-based approach. This reflects the fact that assembly-based approaches work great for high-coverage or high-abundance genomes, but not so well for low-abundance/low-coverage genomes.
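As a rough illustration of the cutoff idea, here is a minimal sketch in Python. It assumes you have already parsed the profiler output into a dictionary mapping taxon to abundance; the function name, the 1% cutoff, and the example values are all made up for illustration and are not part of metaGEM or any profiler.

```python
def filter_by_relative_abundance(profile, cutoff=0.01):
    """Return taxa whose share of the total abundance is at least `cutoff`.

    `profile` maps taxon name -> abundance (any non-negative unit).
    The returned dict maps taxon name -> relative abundance (0..1).
    """
    total = sum(profile.values())
    if total <= 0:
        return {}
    relative = {taxon: value / total for taxon, value in profile.items()}
    return {taxon: frac for taxon, frac in relative.items() if frac >= cutoff}


# Hypothetical profile with made-up numbers, for illustration only.
example_profile = {
    "species_A": 60.0,
    "species_B": 39.0,
    "species_C": 0.9,
    "species_D": 0.1,
}

kept = filter_by_relative_abundance(example_profile, cutoff=0.01)
print(sorted(kept))
```

The right cutoff depends on your data and profiler, so it is worth trying a few values and seeing how quickly the species count drops.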
Also, 8-12 genomes is on the lower side. Did you use coverage across multiple samples for binning? This extra mapping information should allow you to get more out of your samples. As an example, consider this publication (https://doi.org/10.1016/j.cell.2019.01.001), where they reconstruct 154,723 genomes from 9,428 human gut metagenomes, or ~16 genomes/sample; note that they did not use coverage across multiple samples for binning. In the metaGEM paper (https://doi.org/10.1093/nar/gkab815), we reconstructed 4,133 genomes from 137 human gut metagenomes, or ~30 genomes/sample; note that this was using coverage across samples.
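For reference, the per-sample figures are just the genome counts divided by the sample counts quoted above. A quick check in Python (the helper name is made up for illustration):

```python
def genomes_per_sample(n_genomes, n_samples):
    """Average number of reconstructed genomes per metagenomic sample."""
    return n_genomes / n_samples


# Counts quoted above for the two studies.
without_cross_sample_coverage = genomes_per_sample(154_723, 9_428)
with_cross_sample_coverage = genomes_per_sample(4_133, 137)

print(round(without_cross_sample_coverage, 1))  # 16.4
print(round(with_cross_sample_coverage, 1))     # 30.2
```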
Note also that the sequencing depth and complexity of your samples play a big role in the number of genomes reconstructed: if your samples are shallowly sequenced and complex, you will recover few genomes. If possible, try increasing sequencing depth in your next experiment, or search for a dataset with higher sequencing depth.
I think that the approach you mention, using short-read profilers to select AGORA models for simulation, is understandable but not very elegant. The whole point of metaGEM is to enable direct reconstruction of metabolic models from metagenomes in order to capture the context- and strain-specific information available in your sequencing samples, which is missing from reference genomes and reference-genome-based metabolic models (e.g. AGORA). Consider the following text from the metaGEM paper:
Pangenome analysis of the human gut microbiome demonstrated that the functional repertoire of gut species differ significantly, with a median core genome proportion of only 66% [14], revealing differences in metabolic potentials of individual microbiomes.
There is significant variation in the functional repertoire of the same species across humans, and I would expect the differences in metabolism of the same microbial species across human and cow to be even greater.
Hope this helps, let me know if you have further questions!
Best,
Francisco
See also #24