tillrobin / imgmc

Repository of the integrated mouse gut metagenomic catalog

License: GNU General Public License v3.0

Shell 66.42% R 33.58%

imgmc's Introduction

iMGMC - integrated Mouse Gut Metagenomic Catalog

Description

Creation of a new mouse gut gene catalog with special features:

more diverse samples from different studies (12 vendors, including wild mice, and various gut locations)
clustering-free approach: all-in-one assembly, keeping track of each ORF to contigs to bins
higher taxonomic resolution and more accuracy by using contigs for annotation
16S rRNA gene integration via linkage to bins
expansion by 20,927 MAGs from sample-wise assembly of 871 mouse gut metagenomic samples, representing 1,296 species
Pipelines and tutorials to process your own data

See our paper for details.

An integrated metagenome catalog enables new insights into the murine gut microbiome
Till R. Lesker, Abilash C. Durairaj, Eric J.C. Gálvez, Ilias Lagkouvardos, John F. Baines, Thomas Clavel, Alexander Sczyrba, Alice C. McHardy, Till Strowig. Cell Reports 30, no. 9 (2020): 2909-2922. https://doi.org/10.1016/j.celrep.2020.02.036

Data:

Genecatalog:

Please download the files using the links in the table, use the script provided here (download all iMGMC data), or use the alternative download at Zenodo.

Description	Size	Link
Catalog ORF sequences	1 GB	iMGMC-GeneID.fasta.gz
Full assembly contigs	1.3 GB	iMGMC-ConitgID.fasta.gz
Mapping File (GeneID->ContigID->BinID)	30 MB	iMGMC-map-Gene-Contig-Bin.tab.gz
Taxonomic annotations	42 MB	iMGMC_map_taxonomy.tar.gz
Functional annotations	38 MB	iMGMC_map_functionality.tar.gz
16S rRNA sequences	2 MB	iMGMC-16SrRNAgenes.fasta
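
As a rough illustration of how the mapping file can be used, the sketch below looks up the contig and bin for a list of gene IDs with awk. It assumes the file is tab-separated with three columns in the order GeneID, ContigID, BinID (inferred from the file description, not verified against the actual file), and all IDs in the toy input are invented. With the real data, the toy file would be replaced by the decompressed iMGMC-map-Gene-Contig-Bin.tab.gz.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Toy stand-in for the decompressed mapping file (hypothetical IDs;
# assumed column order: GeneID <TAB> ContigID <TAB> BinID).
printf 'gene_1\tcontig_A\tbin_01\n'  > "$workdir/map.tab"
printf 'gene_2\tcontig_A\tbin_01\n' >> "$workdir/map.tab"
printf 'gene_3\tcontig_B\tbin_02\n' >> "$workdir/map.tab"

# Genes of interest, one ID per line.
printf 'gene_3\ngene_1\n' > "$workdir/genes.txt"

# First pass (NR==FNR) stores the wanted gene IDs; the second pass prints
# every mapping row whose first column is one of them.
lookup=$(awk -F'\t' 'NR==FNR { want[$1]; next } $1 in want' \
    "$workdir/genes.txt" "$workdir/map.tab")

printf '%s\n' "$lookup"
rm -r "$workdir"
```

Rows are reported in the order of the mapping file, not in the order of the gene list.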

Metagenome-assembled genomes (MAGs) :

Description	Size	Link
integrated MAGs	0.5 GB	iMGMC_MAGs.tar.gz
representative mMAGs (n=1296)	1 GB	iMGMC-mMAGs-dereplicated_genomes.tar.gz
representative hqMAGs (n=830)	0.7 GB	iMGMC-hqMAGs-dereplicated_genomes.tar.gz
all mMAGs (n=20,927)	15 GB	iMGMC-mMAGs.tar.gz
Annotations by CheckM, dRep-Clustering, GTDB-Tk	2 MB	MAG-annotation_CheckM_dRep_GTDB-Tk.tar.gz
Functional annotations (hqMAGs by eggNOG mapper v2)	187 MB	hqMAGs.emapper.annotations.gz

For species abundance determination you can use CoverM or our bbmap-pipeline.
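
A minimal sketch of a CoverM call for this purpose is shown below; it is not the authors' pipeline. The read file names, the directory name dereplicated_genomes, the fa extension, and the output file name are placeholders that have to be adapted to the unpacked archive and your samples. The command is only printed unless coverm and the input files are actually present.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholders (assumptions, adapt to your data).
reads_r1="sample_R1.fastq.gz"
reads_r2="sample_R2.fastq.gz"
genome_dir="dereplicated_genomes"   # unpacked representative MAGs
genome_ext="fa"                     # file extension of the MAG FASTA files
threads=8

# Build the command as an array so that arguments stay intact.
coverm_cmd=(coverm genome
    --coupled "$reads_r1" "$reads_r2"
    --genome-fasta-directory "$genome_dir"
    --genome-fasta-extension "$genome_ext"
    --methods relative_abundance
    --threads "$threads")

# Run only if CoverM and the inputs exist; otherwise just show the command.
if command -v coverm >/dev/null 2>&1 && [ -d "$genome_dir" ] && [ -f "$reads_r1" ]; then
    "${coverm_cmd[@]}" > sample.coverm.tsv
else
    printf '%s\n' "${coverm_cmd[*]}"
fi
```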

Mouse gut metagenomic libraries (Raw Data Fastq):

Accession codes of the gut metagenome sequences used: European Nucleotide Archive (ENA): ERP008710, ERA473426, PRJEB32890; MG-RAST: ID 4661127.3 / mgp5130

Updates:

The following files are additional or updated data. They mainly affect annotation files such as KEGG database links and taxonomic descriptions from GTDB.

Description	Size	Link
translated ORF sequences	0.7 GB	iMGMC-GeneID.proteins.gz
KEGG KofamScan 03/20	14 MB	iMGMC-GeneID-KofamScan.fasta.gz
mMAGs Taxonomy GTDB-Tk-v1.3 rs95	1 MB	iMGMC-mMAGs-GTDB-Tk_v1.3.0-r95.tsv

Pipelines:

We recommend the use of Bioconda

Tutorials

Explore MAG abundances with CoverM and Krona Plot

Compare MAGs abundances with CoverM and R heatmap

Ordination of samples by gene and KO profiles

Please open an issue if the problem cannot be solved. We will need to know how to reproduce your problem.

External studies providing data:

Xiao, Liang, et al. "A catalog of the mouse gut metagenome." Nature biotechnology 33.10 (2015): 1103-1108. http://doi.org/10.1038/nbt.3353

Wang, Jun, et al. "Dietary history contributes to enterotype-like clustering and functional metagenomic content in the intestinal microbiome of wild mice." Proceedings of the National Academy of Sciences 111.26 (2014): E2703-E2710. http://doi.org/10.1073/pnas.1402342111

Lagkouvardos, Ilias, et al. "The Mouse Intestinal Bacterial Collection (miBC) provides host-specific insight into cultured diversity and functional potential of the gut microbiota." Nature microbiology 1 (2016): 16131. http://doi.org/10.1038/nmicrobiol.2016.131

Acknowledgements: TS was funded by the Helmholtz Association (VH-NG-933), by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, STR-1343/1 and STR-1343/2) and the European Union (StG337251). JFB was funded by the DFG under Germany's Excellence Strategy – EXC 22167-390884018 and by the DFG Collaborative Research Center (CRC) 1182 “Origin and Function of Metaorganisms”. TC received funding from the DFG (project CL481/2-1 and grants within Collaborative Research Center 1382).

imgmc's Issues

Further stratification

Hello,

Great database, thank you!

In the summary image in the pipeline/tutorial section, "KO Pathway" is listed in the functions section, but I don't see a way to arrange the KO annotations into pathways using the current tools. Is that something that is still being worked on, or am I missing the script for it somewhere here? Do I need to move my data into some KEGG-specific pipeline to get pathway sums?
Is there an easy way to stratify KOs by taxonomy? I can get taxonomy TPM sums, and KO TPM sums, but it's not clear to me how to get KO sums by taxonomy. Thanks for any info you can provide!
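
One possible way to get KO sums by taxonomy, sketched with awk under stated assumptions: three tab-separated tables keyed by GeneID (gene to KO, gene to taxon, gene to TPM). These file names, layouts, and IDs are hypothetical and do not describe the actual iMGMC annotation files; the sketch also assumes at most one KO and one taxon per gene.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Toy inputs with invented IDs; all files are tab-separated, keyed by GeneID.
printf 'gene_1\tK00001\ngene_2\tK00001\ngene_3\tK00002\n' > "$workdir/gene2ko.tab"
printf 'gene_1\tAlistipes\ngene_2\tBacteroides\ngene_3\tAlistipes\n' > "$workdir/gene2tax.tab"
printf 'gene_1\t10\ngene_2\t5\ngene_3\t2.5\n' > "$workdir/gene_tpm.tab"

# Files 1 and 2 fill lookup tables; file 3 adds each gene's TPM to the
# (taxon, KO) pair it belongs to. Genes lacking a KO or taxon are skipped.
stratified=$(awk -F'\t' -v OFS='\t' '
    FNR == 1  { file++ }
    file == 1 { ko[$1]  = $2; next }
    file == 2 { tax[$1] = $2; next }
    ($1 in ko) && ($1 in tax) { sum[tax[$1] OFS ko[$1]] += $2 }
    END { for (key in sum) print key, sum[key] }
' "$workdir/gene2ko.tab" "$workdir/gene2tax.tab" "$workdir/gene_tpm.tab" | sort)

printf '%s\n' "$stratified"
rm -r "$workdir"
```

Each output row holds a taxon, a KO, and the summed TPM of all genes assigned to both.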

Template filepath does not exist: iMGMC-16SrRNA-alignment.fasta

Hello,

I am trying to use iMGMC for picrust analysis using the following code:

conda create -n iMGMC-PICRUSt -c bioconda picrust pynast fasttree
./iMGMC-PICRUSt.bash run2_feature.biom run2_sequences.fasta

This follows the PICRUSt information page, but I seem to be missing a number of required files, such as:
iMGMC-CopyNr-16SrRNA.tab
iMGMC-KO_traits.tab
along with a number of other traits files

I can't seem to find these files as part of the downloads on here - am I supposed to generate these somehow?

Thanks for your advice

Gene/ORF coordinates

Hello, I see how the .tab files connect genes and their associated function/taxonomy to the contigs they were annotated in, but is there coordinate information available that describes where in those contigs each ORF starts/stops? Thanks for info you have!

Submit genomes to NCBI

Hello, thank you for this great resource. If I'm not mistaken, you didn't submit your genomes to NCBI. I know it can be difficult sometimes, but the advantage of genomes submitted to NCBI is that they are incorporated into GTDB and other databases.

If you provide me with a table with the NCBI sample names we can upload your genomes to NCBI and link it to your publication/Bioproject.

Kind regards

CC @trickovicmatija

PICRUSt2 compatible files

Hi Till

I get the following error when I run conda create iMGMC-PICRUSt -c bioconda picrust pynast fasttree:
usage: conda [-h] [-V] command ...
conda: error: unrecognized arguments: picrust pynast fasttree
Do you know how to solve this issue?

Also, I would like to use PICRUSt2 on my amplicon sequences with your reference database. Do you have PICRUSt2-ready files in the following format?
pro_ref.fna.gz
pro_ref.hmm
pro_ref.model
pro_ref.tre

If not, I would greatly appreciate some guidance on how to get the files ready for PICRUSt2.

thank you, kind regards
Adrian

Using the pre-built links between 16S rRNA genes and metagenome-assembled genomes?

Hi,
It's really an exciting tool for mouse gut microbiome analysis! In my study, I obtained 16S rRNA gene V3-V4 amplicon sequencing data from C57BL/6J mice. I'm trying to use the pre-built links between 16S rRNA genes and metagenome-assembled genomes in iMGMC. In the paper, full-length, unique 16S rRNA gene sequences were used to build the links to metagenome-assembled genomes. I'm wondering whether my 16S rRNA amplicon sequencing data could also be used with this tool to obtain the metagenome-assembled genome information, for example with the pipeline "Instruction to process your own WGS samples with iMGMC"? Many thanks!
Regards!

Error during 16s abundance calculation

Does not produce the required directory for the next step.

This results in

mgs16S by SRA sample feature table

Hi,
thanks to your great resource, I could match one differentially abundant PacBio full-length 16S feature to your mgs16S-0293. You assign the taxonomy Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Rikenellaceae;Alistipes to it, and I'd love to find out more about this specific bug.
Your graphical abstract mentions 168,573 SRA samples as input for your pipeline. Do you have a table that lists the prevalence of mgs16S-0293 in all those samples?
I am looking for something like our S6 table with more than the 11 broad categories.
Thanks in advance,
Stefan

Mouse-specific 16S database for metabarcoding taxonomic assignments

Dear iMGMC developers,

Thanks for developing this amazing tool.
I was wondering whether I could use the iMGMC tool/data to assign 16S V4 ASVs against a reduced, mouse-specific microbiota database.
I obtained those ASVs using the dada2 pipeline and ultimately would like to assign taxonomy using DECIPHER::IdTaxa().
So I would need a FASTA file with mouse-specific 16S sequences as well as the taxonomic path.

Does it sound feasible? Would you suggest an alternative?

Thanks a lot for your help.

Flo

final-unambiguousReads.tab does not get created

The file "final-unambiguousReads.tab" does not get created. Also, there is no mention of it in the script shown in the screenshot from the linking tutorial.
Maybe that is an error in the documentation.

An issue in Mapping part

I couldn't run bbmap when following the command given in genecatalog-pipeline.md.
My BBMap version is 38.86.

bbmap.sh -Xmx30g unpigz=t threads=${usedCores} minid=0.90 \
ref=${iMGCM-data} nodisk \
statsfile=${SampleName}.statsfile \
scafstats=${SampleName}.scafstats \
covstats=${SampleName}.covstat \
rpkm=${SampleName}.rpkm \
sortscafs=f nzo=f \
in=${SampleName}_R1_rmhost.fastq.gz \
in2=${SampleName}_R2_rmhost.fastq.gz

I have to use "path" instead of "ref"; otherwise, BBMap fails to find the index.
Besides, I don't know whether this happens on your system, but on Red Hat a minus sign cannot be used in an environment variable name.
I hope to hear from you soon.
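
The remark about the minus sign matches standard shell parameter expansion and is not specific to Red Hat: `${iMGCM-data}` is read as "the value of iMGCM, or the word data if iMGCM is unset". The short sketch below demonstrates this, which may also explain why the `ref=` argument in the command above did not point to the catalog. The replacement variable name iMGMC_data and the path are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

unset iMGCM 2>/dev/null || true

# "${iMGCM-data}" is parsed as "${parameter-word}": because iMGCM is unset,
# the expansion yields the literal word "data", not the value of a
# variable named "iMGCM-data" (such a name is not a valid identifier).
expanded="${iMGCM-data}"
printf 'ref=%s\n' "$expanded"

# A name with an underscore is a valid identifier (hypothetical name and path).
iMGMC_data="/path/to/iMGMC-GeneID.fasta.gz"
printf 'ref=%s\n' "${iMGMC_data}"
```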

Pipelines:

Use of the gene catalog (mapping pipeline using the catalog or MAGs)

PICRUSt (mouse gut specific)

IMNGS (resource of processed 16S rRNA microbial profiles)

Workflows to create the iMGMC Catalog

Workflows to create by sample MAGs (single wise)