imgmc's Introduction

iMGMC - integrated Mouse Gut Metagenomic Catalog

Description

Creation of a new mouse gut gene catalog with special features:

  • more diverse samples from different studies (12 Vendors incl. wild mice and various gut locations)
  • clustering-free approach: all-in-one assembly, keeping track of each ORF to contigs to bins
  • higher taxonomic resolution and more accuracy by using contigs for annotation
  • 16S rRNA gene integration via linkage to bins
  • expansion by 20,927 MAGs from sample-wise assembly of 871 mouse gut metagenomic samples, representing 1,296 species
  • pipelines and tutorials to process your own data

See our paper for details.

An integrated metagenome catalog enables new insights into the murine gut microbiome
Till R. Lesker, Abilash C. Durairaj, Eric J.C. Gálvez, Ilias Lagkouvardos, John F. Baines, Thomas Clavel, Alexander Sczyrba, Alice C. McHardy, Till Strowig. Cell Reports 30, no. 9 (2020): 2909-2922. https://doi.org/10.1016/j.celrep.2020.02.036

Data:

Genecatalog:

Please download the files using the links in the table, use the script provided here (download all iMGMC data), or use the alternative download at Zenodo.

| Description | Size | Link |
| --- | --- | --- |
| Catalog ORF sequences | 1 GB | iMGMC-GeneID.fasta.gz |
| Full assembly contigs | 1.3 GB | iMGMC-ConitgID.fasta.gz |
| Mapping File (GeneID->ContigID->BinID) | 30 MB | iMGMC-map-Gene-Contig-Bin.tab.gz |
| Taxonomic annotations | 42 MB | iMGMC_map_taxonomy.tar.gz |
| Functional annotations | 38 MB | iMGMC_map_functionality.tar.gz |
| 16S rRNA sequences | 2 MB | iMGMC-16SrRNAgenes.fasta |
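
The mapping file links each gene to its contig and bin. Below is a minimal Python sketch of how such a table could be loaded and queried. It assumes three tab-separated columns in the order GeneID, ContigID, BinID and no header line; this layout is inferred from the file description only and should be checked against the downloaded file. The function name `load_gene_map` and all IDs in the sample are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_gene_map(handle):
    """Read GeneID -> (ContigID, BinID) from a tab-separated text handle.

    Assumed layout (not confirmed by the README): three columns,
    GeneID, ContigID, BinID, without a header line.
    """
    gene_to_location = {}
    bin_to_genes = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        gene_id, contig_id, bin_id = row[:3]
        gene_to_location[gene_id] = (contig_id, bin_id)
        bin_to_genes[bin_id].append(gene_id)
    return gene_to_location, dict(bin_to_genes)


# Made-up rows standing in for the content of the mapping file
sample = io.StringIO(
    "gene_1\tcontig_1\tbin_A\n"
    "gene_2\tcontig_1\tbin_A\n"
    "gene_3\tcontig_7\tbin_B\n"
)
gene_to_location, bin_to_genes = load_gene_map(sample)
print(gene_to_location["gene_3"])  # contig and bin of one gene
print(bin_to_genes["bin_A"])       # all genes assigned to one bin
```

For the real file, the handle could be opened with `gzip.open("iMGMC-map-Gene-Contig-Bin.tab.gz", "rt")` from the standard library. Holding both dictionaries in memory is a convenience for lookups in either direction; for a file of this size a streaming approach may be preferable.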

Metagenome-assembled genomes (MAGs) :

| Description | Size | Link |
| --- | --- | --- |
| integrated MAGs | 0.5 GB | iMGMC_MAGs.tar.gz |
| representative mMAGs (n=1296) | 1 GB | iMGMC-mMAGs-dereplicated_genomes.tar.gz |
| representative hqMAGs (n=830) | 0.7 GB | iMGMC-hqMAGs-dereplicated_genomes.tar.gz |
| all mMAGs (n=20,927) | 15 GB | iMGMC-mMAGs.tar.gz |
| Annotations by CheckM, dRep-Clustering, GTDB-Tk | 2 MB | MAG-annotation_CheckM_dRep_GTDB-Tk.tar.gz |
| Functional annotations (hqMAGs by eggNOG mapper v2) | 187 MB | hqMAGs.emapper.annotations.gz |

For species abundance determination you can use CoverM or our bbmap-pipeline.

Mouse gut metagenomic libraries (Raw Data Fastq):

Accession codes of the gut metagenome sequences used: European Nucleotide Archive (ERP008710, ERA473426, PRJEB32890) and MG-RAST (ID 4661127.3 / mgp5130).

Updates:

The following files are additional or updated data. The updates mainly affect annotation files, such as KEGG database links and taxonomic descriptions from GTDB.

| Description | Size | Link |
| --- | --- | --- |
| translated ORF sequences | 0.7 GB | iMGMC-GeneID.proteins.gz |
| KEGG KofamScan 03/20 | 14 MB | iMGMC-GeneID-KofamScan.fasta.gz |
| mMAGs Taxonomy GTDB-Tk-v1.3 rs95 | 1 MB | iMGMC-mMAGs-GTDB-Tk_v1.3.0-r95.tsv |

Pipelines:

We recommend the use of Bioconda

Use of the gene catalog (mapping pipeline using the catalog or MAGs)

Instruction to process your own WGS samples with iMGMC

Using MAGs with iMGMC/sMAGS with your own WGS samples

PICRUSt (mouse gut specific)

Instruction to process your own 16S rRNA amplicon samples with PICRUSt and iMGMC

IMNGS (resource of processed 16S rRNA microbial profiles)

Instruction to work with iMGMC-IMNGS data

Workflows to create the iMGMC Catalog

Code for assembly, binning and 16S rRNA gene reconstruction

Code for linking 16S rRNA genes to bins

Workflows to create per-sample MAGs (sample-wise assembly)

Code for the generation and clustering of sample-wise assembly MAGs

Code for the evaluation of sample-wise assembly MAGs versus all-in-one assembly MAGs


Tutorials

Explore MAG abundances with CoverM and Krona Plot

Compare MAG abundances with CoverM and R heatmap

Ordination of samples by gene and KO profiles


Please open an issue if the problem cannot be solved. We will need to know how to reproduce your problem.

External studies providing data:

Xiao, Liang, et al. "A catalog of the mouse gut metagenome." Nature biotechnology 33.10 (2015): 1103-1108. http://doi.org/10.1038/nbt.3353

Wang, Jun, et al. "Dietary history contributes to enterotype-like clustering and functional metagenomic content in the intestinal microbiome of wild mice." Proceedings of the National Academy of Sciences 111.26 (2014): E2703-E2710. http://doi.org/10.1073/pnas.1402342111

Lagkouvardos, Ilias, et al. "The Mouse Intestinal Bacterial Collection (miBC) provides host-specific insight into cultured diversity and functional potential of the gut microbiota." Nature microbiology 1 (2016): 16131. http://doi.org/10.1038/nmicrobiol.2016.131

Acknowledgements: TS was funded by the Helmholtz Association (VH-NG-933), by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, STR-1343/1 and STR-1343/2) and the European Union (StG337251). JFB was funded by the DFG under Germany's Excellence Strategy – EXC 22167-390884018 and by the DFG Collaborative Research Center (CRC) 1182 “Origin and Function of Metaorganisms”. TC received funding from the DFG (project CL481/2-1 and grants within Collaborative Research Center 1382).

imgmc's Issues

Further stratification

Hello,

Great database, thank you!

  1. In the summary image in the pipeline/tutorial section, "KO Pathway" is listed in the functions section, but I don't see a way to arrange the KO annotations into pathways using the current tools. Is that something that is still being worked on, or am I missing the script for it somewhere here? Do I need to move my data into some KEGG-specific pipeline to get pathway sums?

  2. Is there an easy way to stratify KOs by taxonomy? I can get taxonomy TPM sums, and KO TPM sums, but it's not clear to me how to get KO sums by taxonomy. Thanks for any info you can provide!
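
One conceivable way to approach the second question is to go back to gene-level TPM values and sum them per (taxon, KO) pair. The Python sketch below illustrates only that aggregation step; it is not a script shipped with iMGMC. The function name `stratify_ko_by_taxon`, the three input dictionaries, and all IDs and values are hypothetical, and it assumes that gene-level TPM, a gene-to-KO map, and a gene-to-taxon map have already been loaded from the mapping output and the annotation tables.

```python
from collections import defaultdict


def stratify_ko_by_taxon(gene_tpm, gene_to_ko, gene_to_taxon):
    """Sum gene-level TPM into (taxon, KO) cells.

    Genes without a KO or without a taxon assignment are skipped
    in this sketch; a real analysis may prefer an "unassigned" bucket.
    """
    totals = defaultdict(float)
    for gene_id, tpm in gene_tpm.items():
        ko = gene_to_ko.get(gene_id)
        taxon = gene_to_taxon.get(gene_id)
        if ko is None or taxon is None:
            continue
        totals[(taxon, ko)] += tpm
    return dict(totals)


# Made-up example values, for illustration only
gene_tpm = {"gene_1": 10.0, "gene_2": 5.0, "gene_3": 2.5, "gene_4": 1.0}
gene_to_ko = {"gene_1": "K00001", "gene_2": "K00001", "gene_3": "K00001"}
gene_to_taxon = {
    "gene_1": "Alistipes",
    "gene_2": "Alistipes",
    "gene_3": "Bacteroides",
    "gene_4": "Bacteroides",
}

stratified = stratify_ko_by_taxon(gene_tpm, gene_to_ko, gene_to_taxon)
for (taxon, ko), tpm in sorted(stratified.items()):
    print(taxon, ko, tpm)
```

Because each gene carries both annotations, summing at the gene level keeps the taxonomy and KO totals consistent with each other, which is not possible when starting from the two already-summed tables.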

Template filepath does not exist: iMGMC-16SrRNA-alignment.fasta

Hello,

I am trying to use iMGMC for picrust analysis using the following code:

conda create -n iMGMC-PICRUSt -c bioconda picrust pynast fasttree
./iMGMC-PICRUSt.bash run2_feature.biom run2_sequences.fasta

as detailed on the PICRUSt information page. However, I seem to be missing a number of required files, such as:
iMGMC-CopyNr-16SrRNA.tab
iMGMC-KO_traits.tab
along with a number of other traits files.

I can't seem to find these files as part of the downloads on here - am I supposed to generate these somehow?

Thanks for your advice

Gene/ORF coordinates

Hello, I see how the .tab files connect genes and their associated function/taxonomy to the contigs they were annotated in, but is there coordinate information available that describes where in those contigs each ORF starts/stops? Thanks for info you have!

Submit genomes to NCBI

Hello, thank you for this great resource. If I'm not mistaken, you didn't submit your genomes to NCBI. I know it can be difficult sometimes. But the advantage of genomes submitted to NCBI is that they are incorporated in GTDB and other databases ..

If you provide me with a table with the NCBI sample names we can upload your genomes to NCBI and link it to your publication/Bioproject.

Kind regards

CC @trickovicmatija

PICRUSt2 compatible files

Hi Till

I get the following error when I run conda create iMGMC-PICRUSt -c bioconda picrust pynast fasttree:

usage: conda [-h] [-V] command ...
conda: error: unrecognized arguments: picrust pynast fasttree

Do you know how to solve this issue?

Also I would like to use PICRUSt 2 with my amplicon sequences with your reference database. Do you have the PICRUST2 ready files in the following format?
pro_ref.fna.gz
pro_ref.hmm
pro_ref.model
pro_ref.tre

If not, I would greatly appreciate some guidance on how to get the files ready for PICRUST2.

thank you, kind regards
Adrian

Using the pre-built links between 16S rRNA genes and metagenome-assembled genomes?

Hi,
It's a really exciting tool for mouse gut microbiome analysis! In my study, I obtained 16S rRNA gene V3-V4 amplicon sequencing data from C57BL/6J mice. I'm trying to use the pre-built links between 16S rRNA genes and metagenome-assembled genomes in iMGMC. In the paper, full-length, unique 16S rRNA gene sequences were used to build the links to metagenome-assembled genomes. I'm wondering whether my 16S rRNA amplicon sequencing data could also be used with this tool to get the metagenome-assembled genome information, for example with the pipeline "Instruction to process your own WGS samples with iMGMC"? Many thanks!
Regards!

mgs16S by SRA sample feature table

Hi,
thanks to your great resource, I could match one differentially abundant PacBio full-length 16S feature to your mgs16S-0293. You assign the taxonomy Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Rikenellaceae;Alistipes to it and I'd love to find out more about this specific bug.
Your Graphical Abstract says something about 168,573 SRA samples as input for your pipeline. Do you have a table that lists the prevalence of mgs16S-0293 in all those samples?
I am looking for something like your S6 table, with more than the 11 broad categories.
Thanks in advance,
Stefan

Mice specific 16S database for metabarcoding taxonomic assignments

Dear iMGMC developers,

Thanks for developing this amazing tool.
I was wondering whether I could use the iMGMC tool/data to assign 16S V4 ASVs against a reduced, mouse-specific microbiota database.
I obtained those ASVs using the dada2 pipeline and ultimately would like to assign taxonomy using DECIPHER::IdTaxa().
So I would need a fasta file with mouse-specific 16S sequences as well as the taxonomic path.

Does it sound feasible? Would you suggest an alternative?

Thanks a lot for your help.

Flo

final-unambiguousReads.tab does not get created

The file "final-unambiguousReads.tab" does not get created. Also, there is no mention of it in the script shown in the screenshot from the linking tutorial.

[image: screenshot of the script]

Maybe that's an error in the documentation.

An issue in Mapping part

I couldn't run bbmap by following the command given in genecatalog-pipeline.md.
My BBMap version is 38.86.

bbmap.sh -Xmx30g unpigz=t threads=${usedCores} minid=0.90 \
ref=${iMGCM-data} nodisk \
statsfile=${SampleName}.statsfile \
scafstats=${SampleName}.scafstats \
covstats=${SampleName}.covstat \
rpkm=${SampleName}.rpkm \
sortscafs=f nzo=f \
in=${SampleName}_R1_rmhost.fastq.gz \
in2=${SampleName}_R2_rmhost.fastq.gz

I have to use "path" instead of "ref"; otherwise, BBMap fails to find the index.
Also, I don't know whether this happens on your system, but on RedHat a minus sign cannot be used in an environment variable name.
I hope to hear from you soon.
