The nanoflow from phiweger

Installation

Clone Jesse's conda-gcc5 repository and create a new environment, nanoflow, with GCC5 installed

git clone https://github.com/ressy/conda-gcc5.git
bash setup.py nanoflow

Clone this repository into a local directory and activate the nanoflow environment

git clone https://github.com/zhaoc1/nanoflow.git nanoflow
cd nanoflow
source activate nanoflow
conda install --file conda-requirements.txt

Clone Ryan Wick's Basecalling-comparison repository

mkdir local
cd local
git clone https://github.com/rrwick/Basecalling-comparison.git

Download the other packages into the local directory

## Nanopolish v0.9.0
git clone --recursive https://github.com/jts/nanopolish.git
cd nanopolish
make
cd ..

## Unicycler
git clone https://github.com/rrwick/Unicycler.git
cd Unicycler
python3 setup.py install
cd ..

## set up for Quast
git clone https://github.com/lucian-ilie/E-MEM.git
cd E-MEM
make
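
Before moving on to the usage steps, it can help to confirm that the command-line tools invoked in this README are on PATH. The following is a minimal sketch, not part of nanoflow; the function name require_tools is hypothetical.

```shell
# Report any tool from the argument list that is not on PATH.
# Returns 0 when every tool is found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools invoked by the installation and usage commands in this README.
require_tools git make conda python3 snakemake || echo "install the missing tools before continuing"
```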

Usage

Basecalling:

snakemake --configfile config.yaml _all_basecalling

Preprocess: quality-filter the long reads, keep the confidently binned reads, and subsample them

snakemake --configfile config.yml _all_preprocess

Hybrid assembly option 1: Canu + Nanopolish + Circlator + Pilon

snakemake --configfile config.yaml --cores 8 _all_draft1
## command to submit jobs to Respublica
snakemake -j 3 --configfile config.yml --cluster-config cluster.json -w 90 --notemp -p -c "qsub -cwd -r n -V -l h_vmem={cluster.h_vmem} -l m_mem_free={cluster.m_mem_free} -pe smp {threads}" _all_draft1
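
The qsub template above reads {cluster.h_vmem} and {cluster.m_mem_free} from the file passed to --cluster-config. As a rough sketch of what such a file could look like, the snippet below writes an example with a single __default__ entry. The memory values are placeholders, not the settings used by the authors, and the file is named cluster.example.json so that an existing cluster.json is not overwritten.

```shell
# Write an example cluster configuration.
# The two keys match the placeholders in the qsub template;
# the values are hypothetical and should be adjusted to the cluster.
cat > cluster.example.json <<'EOF'
{
    "__default__": {
        "h_vmem": "8G",
        "m_mem_free": "6G"
    }
}
EOF

# Show the result.
cat cluster.example.json
```

In Snakemake's cluster configuration, entries keyed by a rule name override the __default__ values for that rule.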

Hybrid assembly option 2: Unicycler
- depth=X in the FASTA header: preserves the relative depths of the contigs. This is mainly useful for plasmid sequences, which should be more highly represented in the reads than the chromosomal sequence.

snakemake --configfile config.yaml _all_draft2
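
To inspect the depth=X values mentioned above, the headers can be parsed with awk. This is a minimal sketch: the example file, its header layout, and the function name list_depths are assumptions made for illustration, so check them against the actual assembly output before relying on it.

```shell
# Hypothetical example headers; real assembly output may differ in detail.
cat > example_assembly.fasta <<'EOF'
>1 length=4641652 depth=1.00x circular=true
ACGT
>2 length=5386 depth=12.50x circular=true
ACGT
EOF

# Print the contig name and depth (tab-separated) for every header
# that carries a depth= field. A trailing "x" on the value is dropped.
list_depths() {
    awk '/^>/ {
        name = substr($1, 2)
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^depth=/) {
                depth = $i
                sub(/^depth=/, "", depth)
                sub(/x$/, "", depth)
                print name "\t" depth
            }
        }
    }' "$1"
}

list_depths example_assembly.fasta
```

A contig whose depth is well above that of the chromosome is a candidate multi-copy plasmid.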

Assembly assessment and comparison

Metrics description
- Misjoins: locations where two adjacent sequences in the assembly should be split apart and placed at distinct locations in order to match the reference.
- Relocation: a misjoin where a segment needs to be moved elsewhere on the chromosome.
- Misassemblies: QUAST categorizes misassemblies as either local (less than 1 kbp discrepancy) or extensive (more than 1 kbp discrepancy).
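
The local versus extensive distinction in the list above comes down to a size threshold. The sketch below only illustrates that rule and is not QUAST's implementation; the function name classify_misassembly is hypothetical, and treating a discrepancy of exactly 1000 bp as extensive is an assumption, since the description leaves that boundary open.

```shell
# Classify a misassembly by the size of its discrepancy in base pairs.
# Assumption: exactly 1000 bp counts as extensive.
classify_misassembly() {
    if [ "$1" -lt 1000 ]; then
        echo "local"
    else
        echo "extensive"
    fi
}

classify_misassembly 250
classify_misassembly 15000
```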
A good reference guide for interpreting the dot plot is available here.
Some good tutorials:
- Align two draft sequences using MUMmer.
- Evaluate the assembly using MUMmer.
- Assembly evaluation with QUAST
- Multiple assemblies comparison using QUAST
- Highly similar sequences with rearrangements using run-mummer3 [TODO].
- Assembly to assembly comparisons using Minimap2 [TODO].
Wish you knew sooner 😔
- Minimap2 and the future of BWA, from Heng Li's blog.
- Long reads assembly: indels cause interrupted genes, from Mick Watson's blog.

snakemake --configfile config.yaml _all_comp --use-conda

IGV: short/long reads mapped to draft assembly

Refer to the subworkflow of sunbeam: sbx_igv

snakemake --configfile config.yaml _all_map_igv

To generate the bioinformatics report, refer to bioinfo_report.Rmd. An example output is shown in bioinfo_report.pdf.
