Pipeline to pull microbial reads from WGS data and perform metagenomic analysis

License: GNU General Public License v3.0

Python 11.11% R 6.57% Shell 0.14% Dockerfile 2.76% HTML 48.17% Jupyter Notebook 4.47% JavaScript 12.07% CSS 13.62% TeX 1.09%

bioinformatics metagenomics pipeline

minuur's Introduction

MINUUR - Microbial INsights Using Unmapped Reads

DOI for manuscript: https://doi.org/10.12688/wellcomeopenres.19155.1

To reproduce my analysis or to apply the pipeline to your host of interest, please follow the tutorial in my Jupyter Book, available here: https://aidanfoo96.github.io/MINUUR/ :)

MINUUR is a Snakemake pipeline I developed to extract non-host sequencing reads from mosquito whole genome sequencing (WGS) data and to characterise potential host-associated microbes using a range of metagenomic analyses. It can also be applied to WGS data from other hosts. MINUUR aims to leverage pre-existing WGS data to recover microbial information about host-associated microbiomes.

MINUUR utilises:

Kraken2: classify taxa from unmapped read sequences
KrakenTools: extract classified reads for downstream analysis
Bracken: re-estimate taxonomic abundance from Kraken2 classifications
MetaPhlAn3: classify taxa using marker genes
MEGAHIT: assemble metagenomes from unmapped reads
QUAST: compute assembly statistics for MEGAHIT assemblies
MetaBAT2: bin contiguous sequences (contigs) from MEGAHIT assemblies
CheckM: assess the quality of MetaBAT2 bins

Installation of Snakemake

MINUUR is run using the workflow manager Snakemake.

Snakemake is best installed using the package manager Mamba.

Once Mamba is installed, run:

mamba create -c bioconda -c conda-forge --name snakemake snakemake

Installation of MINUUR

Clone the repository and move into the workflow directory:

git clone https://github.com/aidanfoo96/MINUUR/
cd MINUUR/workflow

This is the reference point from which the pipeline is run. See the Jupyter Book page for a full tutorial on configuring the pipeline.

Update 09/05/2023:

Added GitHub Actions.
A dummy dataset is now included in workflow/data, and a tutorial for running it is on the Jupyter Book page. Use it to check that the pipeline works on your machine.
Added the option to run BUSCO to help assess eukaryotic contamination in MAGs.

For feedback or bug reports, please open an issue or contact: [email protected]

minuur's Issues

database set up

Rather than downloading all the databases with the shell script, collect the databases and zip them into a single folder. Users would then install all the databases into the pipeline's resources folder from that compressed file.
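The install step proposed here could look roughly like the sketch below. It is only an illustration: the function name, the archive name and the layout of the resources folder are all assumptions, since the issue does not specify them.

```python
import zipfile
from pathlib import Path


def install_databases(archive, resources_dir):
    """Unpack a zipped database bundle into the pipeline's resources folder.

    Both arguments are hypothetical: the issue names neither the archive
    nor the final layout of the resources folder.
    """
    dest = Path(resources_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as bundle:
        # testzip() returns the name of the first corrupt member, or None.
        bad = bundle.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        bundle.extractall(dest)
        # Report what was unpacked so the user can check the result.
        return sorted(bundle.namelist())
```

A user would call something like install_databases("minuur_databases.zip", "resources"), where both names are placeholders. Checking the archive before extraction is a design choice: database bundles are large, and a truncated download is easier to diagnose here than halfway through a pipeline run.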

re-edit figure

There are typos in the workflow figure. Fix these

fix hard code threads

Thread counts are hard-coded in some rules. Specify them in the config file instead.
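One possible shape for this change is a small helper that looks up each rule's thread count in the config and falls back to a default. The helper name, the "threads" config key and the rule names used as keys are assumptions for illustration, not the pipeline's actual config layout.

```python
def threads_for(config, rule_name, default=1):
    """Look up a rule's thread count in the config, with a fallback.

    Assumed (hypothetical) config layout:
        threads:
          megahit: 8
          kraken2: 4
    """
    per_rule = config.get("threads") or {}
    threads = int(per_rule.get(rule_name, default))
    if threads < 1:
        raise ValueError(f"threads for {rule_name!r} must be >= 1, got {threads}")
    return threads


# Stand-in for the dict Snakemake builds from the config file.
config = {"threads": {"megahit": 8, "kraken2": 4}}
print(threads_for(config, "megahit"))  # configured value
print(threads_for(config, "quast"))    # not configured, falls back to the default
```

Because a Snakefile is Python, a rule could then declare threads: threads_for(config, "megahit") instead of a literal number. The fallback means rules missing from the config still run single-threaded rather than failing.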

deploying application - docker and conda envs

support for ARM Mac

Hi, I've been trying to run this pipeline on an ARM MacBook. I configured a new x86_64 env for Snakemake, cloned the repo and used the "--use-conda" option; however, the pipeline fails because several dependencies support Linux only.

Bam2fastq is one of those, as is QUAST.

Testing application - docker and conda envs

No module named pandas

Hi @aidanfoo96 and team,
I was going through the MINUUR pipeline and configured everything following the guidelines, but when I run snakemake -np to check that everything is okay, it displays the error message below:

ModuleNotFoundError in line 10 of /home/mahamat_g/MINUUR/workflow/Snakefile:
No module named 'pandas'
File "/home/mahamat_g/MINUUR/workflow/Snakefile", line 10, in

Could someone please help with this? I'd be very grateful.

Cheers

G_M
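The traceback says that the Python interpreter running Snakemake cannot import pandas, which the Snakefile needs at line 10. A likely fix is to install pandas into the same environment that Snakemake runs from. The sketch below is a quick way to check that environment; the helper name is made up, and the module list contains only the name taken from the traceback.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level modules from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "pandas" comes from the traceback; extend the list with whatever
# else the Snakefile imports.
missing = missing_modules(["pandas"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All listed modules can be imported.")
```

Run it with the environment activated. If pandas is reported as missing, installing it there (for example with mamba install pandas) should let snakemake -np get past line 10.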

Organize wiki

map_read_to_contig

Find out why map_read_to_contig and index_megahit_contigs do not play nicely together. Occasionally, Snakemake runs the map_read_to_contig rule before index_megahit_contigs. This results in an error because the indexed contigs do not exist yet. Find out how to specify the rule so that Snakemake knows to do the indexing first. The cause appears to be that the index files are not specified in the input of map_read_to_contig.
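Snakemake derives execution order from files: a rule is scheduled after another only if it lists that rule's outputs among its inputs. The toy model below illustrates why adding the index files to the input fixes the ordering. It is not Snakemake's scheduler. The file names and the "megahit" rule name are hypothetical; only map_read_to_contig and index_megahit_contigs come from the issue.

```python
from graphlib import TopologicalSorter


def execution_order(rules):
    """Order rules so each one runs after the rules that produce its inputs.

    `rules` maps a rule name to {"input": [...], "output": [...]}.
    """
    producer = {out: name for name, spec in rules.items() for out in spec["output"]}
    graph = {
        name: {producer[f] for f in spec["input"] if f in producer}
        for name, spec in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


# With the index file declared as an input, mapping depends on indexing.
fixed_rules = {
    "megahit": {"input": [], "output": ["contigs.fa"]},
    "index_megahit_contigs": {"input": ["contigs.fa"], "output": ["contigs.fa.bwt"]},
    "map_read_to_contig": {
        "input": ["contigs.fa", "contigs.fa.bwt"],
        "output": ["mapped.bam"],
    },
}
print(execution_order(fixed_rules))
# ['megahit', 'index_megahit_contigs', 'map_read_to_contig']
```

If "contigs.fa.bwt" is dropped from the input of map_read_to_contig, both rules depend only on the assembly, so either order is valid. That matches the intermittent failure described in the issue.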

Review the manuscript

to make suggestions to organize and host the DB

to make suggestions to organize the readme with the Wiki

Create docker container

Snakemake: v6.8.1
MEGAHIT: v1.2.9
QUAST: v5.0.2
Samtools: 1.14
BEDTools: v2.30.0
CheckM: v1.1.2
bwa: v0.7.17
MetaBAT2: v2.12.1
FastQC: v0.11.9
cutadapt: v1.15
bowtie2: v9.3.0
Kraken2: v2.1.2
KrakenTools: v1.2
MetaPhlAn3: v3.0.13
Bracken: v2.5.0
HUMAnN3: v3.0.0
R: v3.6.2
R libraries:
--> tidyverse
--> MetBrewer

add plots for humann output

Copy your code into the HUMAnN plot section of the pipeline. The outputs already exist; add the plotting scripts as well.

put updated bracken plots in
