

Lauring Variant Pipeline

Nominates candidate variants by comparing the sequences in a test sample to those found in a plasmid control. The pipeline runs as one phase, which takes in fastq files and outputs putative variants as well as all base calls above a set frequency. It is then up to the user to filter the putative variants based on the characteristics provided.

Directory list

  • bin

    • variantPipeline.py : a python wrapper that runs a provided bpipe pipeline
    • variant_pipeline.pbs : an example pbs script used to implement the Pipeline
  • doc

    • workflow diagram, examples
  • lib

    • supporting libraries - picard and bpipe live here
  • packrat

    • The R dependencies are listed in the lock file. The packages are also downloaded here during setup
  • scripts: supporting scripts (bpipe, python, R). Some of these are old and no longer used; others are helpful for downstream analysis but are not used by the current stages

    • most have been written to provide a usage message when called with the -h flag
  • test

    • automated tests (mostly python testing the python pipeline)
  • tutorial

    • The directories and instructions needed to run the tutorial. Instructions can be found in the tutorial readme

bin/variantPipeline.py

This script is a thin python wrapper that takes in a bpipe pipeline, input files, an output directory and an options yaml. Whenever it is launched, the bpipe scripts are copied from the scripts directory and stored in the output directory as a log of what was run. The output directory will be made if it doesn't exist.

Usage: python variantPipeline.py -h

See the tutorial for more information.

*NOTE: Your fasta is used in the variant calling step and needs to end in .fa*
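The set-up behaviour described above (create the output directory, copy the bpipe scripts into it as a log, and require a reference ending in .fa) can be sketched roughly as follows. This is an illustration only, not the code in variantPipeline.py; the function name `prepare_run` and its arguments are hypothetical, and it assumes the bpipe scripts are the `.groovy` files in the scripts directory.

```python
import shutil
from pathlib import Path


def prepare_run(scripts_dir, output_dir, reference):
    """Hypothetical helper: check the reference, make the output directory,
    and copy the bpipe scripts into it as a record of what was run."""
    reference = Path(reference)
    # The variant calling step expects the reference fasta to end in .fa
    if reference.suffix != ".fa":
        raise ValueError(f"reference must end in .fa, got {reference.name}")

    output_dir = Path(output_dir)
    # Make the output directory (and any parents) if it does not exist
    output_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # Assumption: the bpipe scripts are the .groovy files in scripts_dir
    for script in sorted(Path(scripts_dir).glob("*.groovy")):
        shutil.copy2(script, output_dir / script.name)
        copied.append(script.name)
    return copied
```

Calling `prepare_run("scripts", "results/run1", "reference/plasmid.fa")` with these placeholder paths would return the names of the copied scripts, and would raise a `ValueError` for a reference named `plasmid.fasta`.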

Outputs

There are 3 main pipelines that can be run. All of the stages for the pipelines are held in ./scripts/variantPipeline.bpipe.stages.groovy

Basic aligning scripts/aligning_pipeline.groovy

  • cutadapt
    • the trimmed fastq files - these are trimmed based on NEBNext primers, which are hard coded in the stage
  • fastqc
    • fastqc data on samples
  • align
    • The aligned bam and sorted sam files
  • removed_duplicated
    • bam files with duplicate reads removed

DeepSNV pipeline scripts/deepsnv_pipeline.groovy

Running this pipeline after the one above is the same as running the old single pipeline.

  • deepSNV
    • csv summary files, coverage files and fasta files from deepSNV
  • parsed_fa
    • deepSNV outputs a concatenated fasta file. The parsed ones are here.
  • Variants
    • csv files containing all variants and additional quality data about each one (MapQ, Phred, read position, etc.)
  • Filter Variants
    • csv files containing variants that meet quality thresholds
  • Final Variants
    • csv files containing variants that meet quality thresholds including amino acid information
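As an illustration of the kind of filtering applied between the Variants and Filter Variants outputs, here is a minimal sketch that keeps only the rows of a variant csv that meet a set of quality thresholds. The column names (`freq`, `mapq`, `phred`), the cutoff values and the example rows are all assumptions made up for this example; they are not the columns or thresholds used by the pipeline.

```python
import csv
import io

# Hypothetical column names and cutoffs, for illustration only
THRESHOLDS = {"mapq": 30.0, "phred": 35.0, "freq": 0.01}


def filter_variants(rows, thresholds=None):
    """Keep rows whose value meets or exceeds every threshold."""
    thresholds = THRESHOLDS if thresholds is None else thresholds
    return [
        row for row in rows
        if all(float(row[col]) >= cutoff for col, cutoff in thresholds.items())
    ]


# Made-up example data standing in for a variants csv
example_csv = """chr,pos,ref,var,freq,mapq,phred
HA,100,A,G,0.05,42,37
HA,250,C,T,0.004,41,38
NA,77,G,A,0.02,18,36
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
kept = filter_variants(rows)
for row in kept:
    print(row["chr"], row["pos"], row["ref"], row["var"])
```

In this made-up data only the first row passes: the second falls below the frequency cutoff and the third below the mapping quality cutoff.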

python pipeline to call all variants and sequencing errors scripts/python_pipeline.groovy

  • consensus
    • The consensus sequence of each sample
  • position-stats
    • JSON files with all bases called at every position including amino acid designation
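To make the relationship between the two outputs concrete, the sketch below derives a consensus sequence and the minor bases above a set frequency from per-position base counts. The JSON layout used here (position, then base, then read count) is an assumption for illustration and is not the schema of the position-stats files; the function name and the counts are likewise hypothetical.

```python
import json

# Assumed layout: position -> base -> read count (not the pipeline's real schema)
counts_json = """
{"1": {"A": 980, "G": 20},
 "2": {"C": 1000},
 "3": {"T": 600, "C": 395, "A": 5}}
"""


def consensus_and_minor(counts, min_freq):
    """Return the consensus string and (position, base, frequency) tuples
    for non-consensus bases at or above min_freq."""
    consensus = []
    minor = []
    for pos in sorted(counts, key=int):
        bases = counts[pos]
        depth = sum(bases.values())
        if depth == 0:
            continue  # no coverage at this position
        # The consensus base is the most frequently called base
        top = max(bases, key=bases.get)
        consensus.append(top)
        for base, n in bases.items():
            freq = n / depth
            if base != top and freq >= min_freq:
                minor.append((int(pos), base, freq))
    return "".join(consensus), minor


sequence, minor_calls = consensus_and_minor(json.loads(counts_json), min_freq=0.01)
print(sequence)
for pos, base, freq in minor_calls:
    print(pos, base, round(freq, 3))
```

With these made-up counts and a 1% cutoff, the A at position 3 (5 of 1000 reads) is dropped while the G at position 1 and the C at position 3 are reported.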

Dependencies

Note: Flux is the name of the computing core used by our lab at the University of Michigan. Some of the directions may be specific to those working on this platform.

The pipeline comes with many of the required programs (bpipe and picard); however, bowtie2, samtools and certain R and python libraries are required by the variant calling.

To run all of these pipelines you must have the Java Development Kit installed. It can be installed from here. If bpipe doesn't run, this is the first place to start.

All the other dependencies, except R and the R packages, are handled by conda. Install conda by following the tutorial here.

We can install the conda environment with the following command (run from the variant_pipeline/ directory)

conda env create -f scripts/environment.yml

We have to activate the environment before running the commands below.

conda activate variant-pipeline

On flux we can achieve an equivalent environment by loading the following modules

module load muscle
module load bowtie2
module load python-anaconda2/201704
module load fastqc
module load R

The R modules are managed by packrat. I am using R 3.5.0. From the main directory run

R
packrat::restore()

to download the needed dependencies. They should be placed in the packrat/lib directory. This is important since the R script will look for them there. You may need to install packrat first if you don't have it.
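Before launching a run it can help to confirm that the external programs mentioned in this README are on the PATH. The following is a small sketch of such a check and is not part of the pipeline; the tool list is taken from the names used in this document and may not match exactly what a given pipeline needs.

```python
import shutil

# Executables named in this README; adjust to the pipeline you plan to run
REQUIRED_TOOLS = ["java", "bowtie2", "samtools", "fastqc", "cutadapt", "muscle", "R"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found.")
```

If java is reported missing, that is the first thing to fix, since bpipe will not run without it.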

Adapted and developed by JT McCrone based on work done by Chris Gates/Peter Ulintz UM BCRCF Bioinformatics Core

