Investigating Bacillus subtilis query neighborhoods in metagenome graphs

Biological questions

Clouds of species diversity in local environments

Prophage insertion sites

how is this novel? / how does it compare to existing studies?

analyze sgc neighborhoods, not assemblies (assemblies will drop too much of the juicy gene bits)
- if we are working with sgc nbhds, we should probably use protein kmers to get ANI/AAI estimates
MAGs are composite genomes; how do we handle them?
- must analyze across samples, not w/in a single sample
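As a rough illustration of the protein k-mer idea in the notes above, the sketch below compares two amino acid sequences by their k-mer sets. The sequences, the k-mer size, and the function names are all made up for illustration; a real analysis would presumably use sourmash protein sketches rather than exact k-mer sets, and this does not convert the similarity into an AAI value.

```python
def kmers(seq, k):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared k-mers over all distinct k-mers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the k-mers in `a` that also occur in `b`."""
    return len(a & b) / len(a) if a else 0.0


# Toy protein fragments (invented, not real B. subtilis genes).
query = "MKKLLAVAGLSLLTACSS"
neighborhood = "MKKLLAVAGLSLLTACSSGGNEQ"

k = 7  # hypothetical protein k-mer size; tune for the data
q, n = kmers(query, k), kmers(neighborhood, k)
print(f"jaccard={jaccard(q, n):.3f} containment={containment(q, n):.3f}")
```

Containment is reported alongside Jaccard because a query neighborhood is usually larger than the query itself, which pulls Jaccard down even when the whole query is recovered.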

Approach

Clouds of species diversity

generate a B. sub isolate reference pangenome query
- Start with a B. sub query
- Run sourmash prefetch against GTDBrs202
- Filter to jaccard similarity of >= 0.1
- Format for genome-grist
- genome-grist download prefetch matches
- charcoal decontaminate genomes
- roary generate reference "pangenome" query
spacegraphcats query each metagenome CAtlas
estimate ANI, AAI, Jaccard
MDS plot
pangenome statistics of b. sub
- multifasta annotation of b. sub catlas (genes present)?
- protein k-mers?
determine additional content recovered by sgc queries
- megahit assemble query nbhds
- map nbhd reads back against query nbhds
- count number of unmapped reads
- kmer content analysis in full nbhd vs in assemblies
- ANI estimation from assemblies
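The prefetch filtering step near the top of this list could look something like the sketch below. It assumes the prefetch CSV has `match_name` and `jaccard` columns and that the accession is the first whitespace-delimited token of `match_name`; check both against the actual output before relying on it. The example rows and accessions are invented, and the genome-grist formatting step is not covered here.

```python
import csv
import io

JACCARD_MIN = 0.1  # threshold from the list above


def filter_prefetch(rows, min_jaccard=JACCARD_MIN):
    """Keep prefetch matches whose Jaccard similarity meets the threshold."""
    return [row for row in rows if float(row["jaccard"]) >= min_jaccard]


def accession(match_name):
    """Take the first whitespace-delimited token as the genome accession."""
    return match_name.split(" ", 1)[0]


# Stand-in for a prefetch CSV; the rows are invented for illustration.
example_csv = """match_name,jaccard
GCF_000000001.1 Bacillus subtilis strain A,0.42
GCF_000000002.1 Bacillus subtilis strain B,0.10
GCF_000000003.1 Bacillus velezensis strain C,0.03
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
kept = filter_prefetch(rows)
for row in kept:
    print(accession(row["match_name"]), row["jaccard"])
```

For a real run, replace the `io.StringIO` stand-in with an open file handle on the prefetch output.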

Data

The 605 gut microbiome metagenomes analyzed in this repository were originally analyzed in the 2020-ibd repository as a meta-cohort of IBD subtypes (CD, UC, and nonIBD). The download and preprocessing code has been copied over to this workflow, but the pre-processed files have been linked into this repository rather than regenerated, to save hard disk space. Specifically, the k-mer abundance trimmed pre-processed files have been symlinked into outputs/abundtrim, and the spacegraphcats metagenome CAtlases have been hard linked into outputs/sgc_genome_queries. The sample metadata file has been copied into this repository from here.

Literature

B. subtilis

Other species:

Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species

Random notes

GTDB extract b sub genomes:

grep -i s__Bacillus gtdb-rs202.taxonomy.v2.csv > bacillus.csv
# then add headers, and...
sourmash sig extract --picklist bacillus.csv:ident:ident /group/ctbrowngrp/gtdb/databases/ctb/gtdb-rs202.genomic.k31.zip -o bacillus.zip
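The grep and "add headers" steps above can also be done in one pass, which avoids hand-editing the CSV. This is only a sketch: it assumes the taxonomy file has `ident` and `species` columns, the function names are hypothetical, and the rows shown are invented. The default prefix mirrors the case-insensitive `s__Bacillus` match in the grep; passing `"s__Bacillus subtilis"` instead would narrow the selection to B. sub only.

```python
import csv
import io


def select_idents(rows, species_prefix="s__Bacillus"):
    """Idents of rows whose species label starts with the prefix (case-insensitive)."""
    prefix = species_prefix.lower()
    return [r["ident"] for r in rows if r["species"].lower().startswith(prefix)]


def write_picklist(idents, handle):
    """Write a one-column CSV with an `ident` header, for use as a picklist."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ident"])
    writer.writerows([i] for i in idents)


# Invented taxonomy rows; only the columns used here are shown.
taxonomy_csv = """ident,genus,species
GCF_000000001.1,g__Bacillus,s__Bacillus subtilis
GCF_000000002.1,g__Bacillus,s__Bacillus velezensis
GCF_000000003.1,g__Shewanella,s__Shewanella baltica
"""

rows = csv.DictReader(io.StringIO(taxonomy_csv))
idents = select_idents(rows)
out = io.StringIO()
write_picklist(idents, out)
print(out.getvalue(), end="")
```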

Getting started

conda env create --name bsub --file environment.yml
conda activate bsub

snakemake -j 16 --use-conda --rerun-incomplete --latency-wait 15 --resources mem_mb=500000 --cluster "sbatch -t 10080 -J bsub -p bmm -n 1 -N 1 -c {threads} --mem={resources.mem_mb}" -k -n

Source repository: taylorreiter / 2021-bsub
