
2021-bsub

Investigating Bacillus subtilis query neighborhoods in metagenome graphs

Biological questions

Clouds of species diversity in local environments

Prophage insertion sites

How is this novel? How does it compare to existing studies?

  • analyze sgc (spacegraphcats) query neighborhoods, not assemblies (assembly drops too much of the interesting gene content)

    • if we are working with sgc nbhds, we should probably use protein k-mers to get ANI/AAI estimates
  • MAGs are composite genomes; how do we handle that?

    • must analyze across samples, not within a single sample
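One common way to turn k-mer overlap into an identity estimate is the Mash distance, D = -(1/k) ln(2J / (1 + J)), with identity ≈ 1 − D.

- Applied to amino-acid k-mers (for example, from translated neighborhood sequence), it gives a rough AAI.
- Applied to nucleotide k-mers, it gives a rough ANI.

Below is a minimal sketch, assuming exact (unsketched) k-mer sets. The function names, the toy protein fragments, and k = 7 are hypothetical, and this is not necessarily how this workflow computes ANI/AAI.

```python
import math


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_identity(j: float, k: int) -> float:
    """Convert a k-mer Jaccard into an identity estimate via the Mash distance.

    D = -(1/k) * ln(2J / (1 + J)); identity ~= 1 - D.
    Returns 0.0 when no k-mers are shared (D is infinite).
    """
    if j <= 0.0:
        return 0.0
    d = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 1.0 - d)


# Toy protein fragments that differ at the last residue (hypothetical data).
p1 = "MKKLLPTAAAGLLLLAAQPAMA"
p2 = "MKKLLPTAAAGLLLLAAQPAMS"
k = 7  # hypothetical protein k-mer size
j = jaccard(kmers(p1, k), kmers(p2, k))
print(round(j, 3), round(mash_identity(j, k), 3))
```

The estimate assumes mutations are spread roughly at random. On short or highly repetitive sequences it can differ noticeably from alignment-based identity.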

Approach

Clouds of species diversity

  • generate a B. sub isolate reference pangenome query
    • Start with a B. sub query
    • Run sourmash prefetch against GTDB rs202
    • Filter to Jaccard similarity >= 0.1
    • Format for genome-grist
    • genome-grist download prefetch matches
    • charcoal decontaminate genomes
    • roary generate reference "pangenome" query
  • spacegraphcats query each metagenome CAtlas
  • estimate ANI, AAI, Jaccard
  • MDS plot
  • pangenome statistics of B. sub
    • multifasta annotation of the B. sub CAtlas (genes present)?
    • protein k-mers?
  • determine additional content recovered by sgc queries
    • megahit assemble query nbhds
    • map nbhd reads back against query nbhds
    • count number of unmapped reads
    • k-mer content analysis in the full nbhd vs. in assemblies
    • ANI estimation from assemblies
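The Jaccard filter step could look like the minimal sketch below. It makes two assumptions, both worth checking against your sourmash version:

- the CSV written by `sourmash prefetch ... -o` has `jaccard` and `match_name` columns;
- GTDB signature names begin with the genome accession.

The function names and toy rows are hypothetical. The exact input format genome-grist expects is not shown here.

```python
import csv
import io
from typing import TextIO


def filter_prefetch(handle: TextIO, min_jaccard: float = 0.1) -> list[dict[str, str]]:
    """Return prefetch CSV rows whose Jaccard similarity is >= min_jaccard."""
    reader = csv.DictReader(handle)
    return [row for row in reader if float(row["jaccard"]) >= min_jaccard]


def accession(match_name: str) -> str:
    """Assume the accession is the first whitespace-separated token of the name."""
    return match_name.split()[0]


# Toy prefetch output (hypothetical accessions and values).
example = io.StringIO(
    "match_name,jaccard\n"
    "GCF_000000001.1 hypothetical genome A,0.45\n"
    "GCF_000000002.1 hypothetical genome B,0.05\n"
    "GCF_000000003.1 hypothetical genome C,0.1\n"
)
kept = filter_prefetch(example)
print([accession(row["match_name"]) for row in kept])
```

The threshold is inclusive, so a match with Jaccard exactly 0.1 is kept.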

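For the unmapped-read count, reads can be counted from a SAM file using the FLAG field.

- Bit 0x4 marks an unmapped segment.
- Secondary (0x100) and supplementary (0x800) records are skipped, so each read is counted once.
- `samtools view -c -f 4 -F 0x900` does the same from the command line.

Below is a minimal standard-library sketch; the function name and toy records are hypothetical. Paired-end mates are counted as separate reads.

```python
from typing import Iterable

UNMAPPED = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def count_unmapped(sam_lines: Iterable[str]) -> tuple[int, int]:
    """Return (unmapped, total) primary read records from SAM text lines."""
    unmapped = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # count each read once, via its primary record
        total += 1
        if flag & UNMAPPED:
            unmapped += 1
    return unmapped, total


# Toy SAM records (hypothetical): one mapped, one unmapped, one secondary.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tctg1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tctg1\t9\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
unmapped, total = count_unmapped(toy_sam)
print(f"{unmapped}/{total} reads unmapped")
```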
Data

The 605 gut microbiome metagenomes analyzed in this repository were originally analyzed in the 2020-ibd repository as a meta-cohort of IBD subtypes (CD, UC, and nonIBD). The download and preprocessing code has been copied over to this workflow. The preprocessed files themselves are linked into this repository rather than copied, to save hard disk space. Specifically, k-mer abundance-trimmed files are symlinked into outputs/abundtrim, and spacegraphcats metagenome CAtlases are hard linked into outputs/sgc_genome_queries. The sample metadata file has been copied into this repository from here.
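The two link types behave differently if the 2020-ibd outputs are moved or deleted:

- A symbolic link stores a path, so it breaks when the source moves.
- A hard link is another name for the same data on disk, so it survives deletion of the original name. It must live on the same filesystem as the source.

Below is a minimal standard-library sketch. `link_files`, the glob pattern, and the source path in the usage comment are hypothetical.

```python
import os
from pathlib import Path


def link_files(src_dir: str, dst_dir: str, pattern: str = "*", hard: bool = False) -> list[Path]:
    """Link every file matching pattern in src_dir into dst_dir.

    hard=False makes symbolic links (break if the source moves);
    hard=True makes hard links (same data on disk; same filesystem required).
    Existing files or links in dst_dir are left untouched.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    made = []
    for path in sorted(src.glob(pattern)):
        if not path.is_file():
            continue
        target = dst / path.name
        if target.exists() or target.is_symlink():
            continue  # do not overwrite anything already in place
        if hard:
            os.link(path, target)
        else:
            os.symlink(path.resolve(), target)
        made.append(target)
    return made


# Hypothetical usage (source path is an assumption):
# link_files("../2020-ibd/outputs/abundtrim", "outputs/abundtrim", "*.fq.gz")
```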

Literature

B. subtilis

Other species:

Random notes

Extract Bacillus genomes from GTDB (the grep matches every species name beginning with Bacillus, not only B. subtilis):

# keep the header row so the CSV can be used as a picklist (column "ident")
head -n 1 gtdb-rs202.taxonomy.v2.csv > bacillus.csv
grep -i s__Bacillus gtdb-rs202.taxonomy.v2.csv >> bacillus.csv
sourmash sig extract --picklist bacillus.csv:ident:ident /group/ctbrowngrp/gtdb/databases/ctb/gtdb-rs202.genomic.k31.zip -o bacillus.zip
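The picklist needs a header row naming the `ident` column, and `grep` on its own does not keep one. As a hypothetical pure-Python alternative to the grep step, the sketch below copies the header plus every row containing the search string, case-insensitively. The function name and the toy taxonomy columns are assumptions.

```python
import csv
from typing import TextIO


def write_picklist(taxonomy: TextIO, out: TextIO, needle: str = "s__Bacillus") -> int:
    """Copy the CSV header plus rows containing `needle` (case-insensitive).

    Mirrors `grep -i s__Bacillus` but keeps the header, so the output can be
    passed to `sourmash sig extract --picklist out.csv:ident:ident`.
    Returns the number of data rows kept.
    """
    reader = csv.reader(taxonomy)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # header row, assumed to include "ident"
    needle = needle.lower()
    kept = 0
    for row in reader:
        if any(needle in field.lower() for field in row):
            writer.writerow(row)
            kept += 1
    return kept
```

Like the grep, this also keeps GTDB-split genera whose species names start with "Bacillus" (for example `s__Bacillus_A ...`).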

Getting started

conda env create --name bsub --file environment.yml
conda activate bsub

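# -n makes this a dry run; remove it to submit jobs to the cluster. -k keeps going after a job fails.
snakemake -j 16 --use-conda --rerun-incomplete --latency-wait 15 --resources mem_mb=500000 --cluster "sbatch -t 10080 -J bsub -p bmm -n 1 -N 1 -c {threads} --mem={resources.mem_mb}" -k -n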
