minuur's Introduction

MINUUR - Microbial INsights Using Unmapped Reads

DOI for manuscript: https://doi.org/10.12688/wellcomeopenres.19155.1

Please follow the tutorial in my Jupyter Book, available here: https://aidanfoo96.github.io/MINUUR/, to reproduce my analysis or to apply the pipeline to your host of interest :)

MINUUR is a Snakemake pipeline I developed to extract non-host sequencing reads from mosquito whole genome sequencing (WGS) data and to apply a range of metagenomic analyses to characterise potential host-associated microbes. It can also be applied to other host-associated WGS data. MINUUR aims to leverage pre-existing WGS data to recover microbial information about host-associated microbiomes.

MINUUR utilises:

  • KRAKEN2: Classify taxa from unmapped read sequences
  • KrakenTools: Extract classified reads for downstream analysis
  • BRACKEN: Re-estimate taxonomic abundance from KRAKEN2
  • MetaPhlan3: Classify taxa using marker genes
  • MEGAHIT: Metagenome assemblies using unmapped reads
  • QUAST: Assembly statistics from MEGAHIT assemblies
  • MetaBat2: Bin contiguous sequences from MEGAHIT
  • CheckM: Assess bin quality from MetaBat2

Installation of Snakemake

MINUUR is run using the workflow manager Snakemake.

Snakemake is best installed using the package manager Mamba.

Once Mamba is installed, run:

mamba create -c bioconda -c conda-forge --name snakemake snakemake

Installation of MINUUR

Run git clone https://github.com/aidanfoo96/MINUUR/ and then cd MINUUR/workflow. This directory is the reference point from which the pipeline is run. See the Jupyter Book page for a full tutorial on setting up the configuration needed to run this pipeline.

Update 09/05/2023:

  • Added GitHub Actions
  • A dummy dataset is now included in workflow/data, and a tutorial for running it is included in the Jupyter Book page. Use this to ensure the pipeline works on your machine.
  • Added the option to run BUSCO to help assess eukaryotic contamination in MAGs

For feedback or bug reports, please open an issue or contact: [email protected]

minuur's People

Contributors: aidanfoo96, lcerdeira

minuur's Issues

database set up

Rather than downloading all the databases with the shell script, collect them and zip them into a single folder. The user can then install all the databases into the resources folder of the pipeline from that one compressed file.
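
The install step proposed here could look roughly like the sketch below. It assumes the databases are shipped as a single zip archive. The function name install_databases and the paths in the usage comment are hypothetical and are not part of MINUUR.

```python
import zipfile
from pathlib import Path


def install_databases(archive_path, resources_dir):
    """Unpack a database archive into the pipeline's resources folder.

    Returns the sorted names of the top-level entries found in the
    resources folder after extraction.
    """
    resources = Path(resources_dir)
    # Create the resources folder if it does not exist yet.
    resources.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(resources)
    return sorted(entry.name for entry in resources.iterdir())


# Hypothetical usage (both paths are assumptions):
# install_databases("minuur_databases.zip", "MINUUR/resources")
```

A single archive keeps the set of databases in one download, at the cost of one very large file.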

support for ARM Mac

Hi, I've been trying to run this pipeline on an ARM MacBook. I configured a new x86_64 env for snakemake, cloned the repo and used the "--use-conda" option. However, the pipeline fails because several dependencies support Linux only.

Bam2fastq is one of those, as is QUAST.

No module named pandas

Hi @aidanfoo96 and team,
I was going through the MINUUR pipeline and configured everything following the guidelines, but when I ran snakemake -np to check that everything was okay, it displayed the error message below:

ModuleNotFoundError in line 10 of /home/mahamat_g/MINUUR/workflow/Snakefile:
No module named 'pandas'
File "/home/mahamat_g/MINUUR/workflow/Snakefile", line 10, in

Could someone please help with this? I'd be very grateful.

Cheers

G_M
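
A ModuleNotFoundError like this means that the Python interpreter running Snakemake cannot find pandas in its environment. The check below is a minimal sketch for confirming that before launching the workflow. The helper name missing_modules is hypothetical, and it should be run with the same environment active that is used for snakemake.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "pandas" is the module named in the traceback above.
missing = missing_modules(["pandas"])
if missing:
    print(f"Not importable from {sys.executable}: {', '.join(missing)}")
else:
    print("All required modules are importable.")
```

If pandas is reported as missing, installing it into the active environment (for example with mamba install pandas) is one likely fix, though that has not been confirmed for this setup.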

map_read_to_contig

Find out why map_read_to_contig and index_megahit_contigs do not play nicely together. Occasionally, Snakemake runs the map_read_to_contig rule before index_megahit_contigs, which fails because the indexed contigs do not exist yet. Find out how to specify the rule so that Snakemake knows to do the indexing first. The likely cause is that the index files are not listed in the input of map_read_to_contig.
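
Snakemake orders jobs by matching each rule's declared inputs to the outputs of other rules, so an undeclared input leaves no dependency between two rules. The Python sketch below illustrates that idea only. It is not Snakemake code, and the rule megahit_assembly and all file names are hypothetical stand-ins.

```python
from graphlib import TopologicalSorter


def dependencies(rules):
    """Map each rule to the rules that produce one of its declared inputs."""
    producer = {
        out: name for name, spec in rules.items() for out in spec["output"]
    }
    return {
        name: {producer[inp] for inp in spec["input"] if inp in producer}
        for name, spec in rules.items()
    }


def execution_order(rules):
    """Return one valid order in which every producer precedes its consumers."""
    return list(TopologicalSorter(dependencies(rules)).static_order())


def make_rules(map_inputs):
    """Build a toy workflow; only the inputs of map_read_to_contig vary."""
    return {
        "megahit_assembly": {"input": ["reads.fastq"], "output": ["contigs.fa"]},
        "index_megahit_contigs": {"input": ["contigs.fa"], "output": ["contigs.fa.bwt"]},
        "map_read_to_contig": {"input": map_inputs, "output": ["mapped.bam"]},
    }


without_index = make_rules(["contigs.fa", "reads.fastq"])
with_index = make_rules(["contigs.fa", "contigs.fa.bwt", "reads.fastq"])

# Without the index file as an input, mapping depends on the assembly only.
print(sorted(dependencies(without_index)["map_read_to_contig"]))
# → ['megahit_assembly']

# With the index file declared, indexing becomes a prerequisite of mapping.
print(sorted(dependencies(with_index)["map_read_to_contig"]))
# → ['index_megahit_contigs', 'megahit_assembly']
```

In the first case nothing forces the two rules into a particular order, which would be consistent with the failure occurring only occasionally.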

Create docker container

snakemake: v6.8.1
Megahit: v1.2.9
QUAST: v5.0.2
Samtools: 1.14
BEDTools: v2.30.0
CheckM: v1.1.2
bwa: v0.7.17
metabat2: v2.12.1
FastQC: v0.11.9
cutadapt: v1.15
bowtie2: v9.3.0
Kraken2: v2.1.2
KrakenTools: v1.2
Metaphlan3: v3.0.13
Bracken: v2.5.0
HUMAnN3: v3.0.0
R: v3.6.2
R libraries:
--> tidyverse
--> MetBrewer

add plots for HUMAnN output

Copy your code over into the HUMAnN plot section of the pipeline. The outputs already exist, so add the plotting scripts as well.
