Nextstrain build pipeline for the WestNile 4K Project

This is the repository used to build nextstrain.org/WNV/NA

This repository contains the steps to use augur to build the WNV/NA dataset.

Installation / Set-Up

Install conda
Install augur (and its dependencies) into a conda environment

git clone git@github.com:nextstrain/augur.git # the nextstrain bioinformatics toolkit
cd augur
conda env create -f environment.yml
export NCBI_EMAIL=<YOUR_EMAIL_HERE>

This creates the conda environment augur, which must be active for all remaining steps.

Enable the conda environment

source activate augur

Install auspice

conda install -c conda-forge nodejs
npm install --global auspice

Clone this repository

git clone git@github.com:grubaughlab/WNV-nextstrain.git
cd WNV-nextstrain

Check augur & auspice are installed:

augur -h
auspice -h

File Structure

Snakefile - contains the augur / WNV-custom steps to run the build. Each snakemake command can be run as a bash command on its own, but we use snakemake to simplify things.
./data/* - the input files (private, and not committed to github). You are responsible for creating the two required files here: ./data/full_dataset.fasta and ./data/headers.csv (these are referenced in the Snakefile).
./scripts/* - custom WNV scripts, called by commands in the Snakefile.
./results/ - augur will produce a number of (intermediate) files here, including the alignment, newick trees etc. Not committed to github.
./auspice/ - will contain the JSONs necessary for visualisation by auspice.
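
Because the two input files are private and must be created by hand, a quick pre-flight check before running the build can save a failed run. The following is a minimal sketch, not part of this repository: the function name missing_inputs is hypothetical, and it only checks that the two files named above exist and are non-empty (it does not validate their contents).

```python
from pathlib import Path

# The two inputs this README says you must create yourself.
REQUIRED_INPUTS = ("data/full_dataset.fasta", "data/headers.csv")


def missing_inputs(repo_root="."):
    """Return the required input paths that are absent or empty under repo_root.

    Hypothetical helper for illustration; the Snakefile does not call it.
    """
    root = Path(repo_root)
    missing = []
    for rel in REQUIRED_INPUTS:
        path = root / rel
        # Treat a zero-byte file the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    absent = missing_inputs()
    if absent:
        print("Missing or empty input files: " + ", ".join(absent))
    else:
        print("All required input files are present.")
```

Run it from the repository root before calling snakemake.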

Run the build

The Snakefile details each step in the build (see that file for the specifics). As such, it should be as simple as running

snakemake clean # remove any files from a previous build
snakemake # run the build pipeline (takes about 40 min)

and the entire build will run through.

It's worth explaining some of the commands here, many of which are quick and can be re-run on their own to change the output. (For instance, changing colours doesn't require you to re-run the tree building steps.)

The commands listed below re-run only those steps, so it's best to have run through the entire Snakefile once before tweaking individual steps. Note that you'll also have to run snakemake --printshellcmds --force export to regenerate the auspice JSONs for viewing.

Parsing the metadata CSV & adding authors:

snakemake clean # remove any files from a previous build
snakemake --printshellcmds --force parse
snakemake --printshellcmds --force add_authors
snakemake export # will run all the remaining steps

This parses the input CSV and FASTA, which involves parsing the dates, interpreting the header of the CSV, and so on. The authors are added by a mixture of pattern matching on strain names and querying Entrez for author information. The latter step is slow, so a cache is created at ./results/author_cache.tsv to make repeat runs faster. See ./scripts/add_authors.py for more information.
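
The caching idea can be sketched as follows. This is an illustration only and not the code in ./scripts/add_authors.py: the two-column TSV layout (accession, authors), the function names, and the fetch callback standing in for the slow Entrez query are all assumptions.

```python
import csv
from pathlib import Path


def load_cache(path):
    """Read a two-column TSV (accession, authors) into a dict; empty if absent.

    The column layout is an assumption, not the real cache format.
    """
    cache = {}
    p = Path(path)
    if p.is_file():
        with p.open(newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if len(row) >= 2:
                    cache[row[0]] = row[1]
    return cache


def save_cache(path, cache):
    """Write the cache back as TSV, creating the parent directory if needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for accession in sorted(cache):
            writer.writerow([accession, cache[accession]])


def authors_for(accessions, cache, fetch):
    """Look up each accession, calling fetch(accession) only on a cache miss.

    fetch is a hypothetical stand-in for the slow remote query.
    """
    result = {}
    for accession in accessions:
        if accession not in cache:
            cache[accession] = fetch(accession)
        result[accession] = cache[accession]
    return result
```

On a second run, load_cache returns the previously saved entries, so fetch is only called for accessions that were not seen before.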

Generating colours:

snakemake --printshellcmds --force create_colors
snakemake --printshellcmds --force export

This uses the ./scripts/make_colors.py script to dynamically generate a colour palette. Please edit this file to make changes to the colour scheme.
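
One common way to generate a palette dynamically is to space hues evenly around the colour wheel, one per distinct value. The sketch below shows that approach under stated assumptions: it is not taken from ./scripts/make_colors.py, the function name make_palette is hypothetical, and the fixed saturation and brightness values are arbitrary choices.

```python
import colorsys


def make_palette(values, saturation=0.65, brightness=0.85):
    """Map each distinct value to a hex colour with evenly spaced hues.

    Illustrative only; the real script may choose colours differently.
    """
    unique = sorted(set(values))
    palette = {}
    for index, value in enumerate(unique):
        hue = index / len(unique)  # fraction of the way around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, brightness)
        palette[value] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


if __name__ == "__main__":
    for name, colour in make_palette(["CA", "NY", "TX", "CA"]).items():
        print(name, colour)
```

Sorting the values first keeps the mapping stable between runs, so a given value keeps its colour as long as the set of values does not change.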

Generating lat-longs:

snakemake --printshellcmds --force create_lat_longs
snakemake --printshellcmds --force export

This uses the ./scripts/create_lat_longs.py script to dynamically generate the lat-longs based on the contents of the metadata file. Currently all the states are hardcoded here (though only those present in the metadata are actually exported), and the divisions are created dynamically by averaging the GPS values provided for each sample. The latter approach could be improved.
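
The per-division averaging can be sketched as below. This is a simplified illustration, not the code in ./scripts/create_lat_longs.py: the function name and the (division, latitude, longitude) tuple input are assumptions about how the sample data might be represented.

```python
from collections import defaultdict


def average_lat_longs(samples):
    """Average the coordinates of all samples in each division.

    samples: iterable of (division, latitude, longitude) tuples (assumed shape).
    Returns {division: (mean_latitude, mean_longitude)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, lon sum, count
    for division, lat, lon in samples:
        entry = totals[division]
        entry[0] += lat
        entry[1] += lon
        entry[2] += 1
    return {
        division: (lat_sum / count, lon_sum / count)
        for division, (lat_sum, lon_sum, count) in totals.items()
    }


if __name__ == "__main__":
    demo = [
        ("A", 10.0, -70.0),
        ("A", 12.0, -72.0),
        ("B", 40.0, -100.0),
    ]
    print(average_lat_longs(demo))
```

A plain arithmetic mean like this is sensitive to uneven sampling within a division and would misbehave for longitudes that straddle the ±180° line, which is one reason the approach has room for improvement.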

Visualise the results:

From within the current directory, simply run auspice view --datasetDir ./auspice and then load http://localhost:4000/ in a browser to see the results 🎉

Deploy the JSONs to nextstrain.org

Currently this has to be done from the Bedford lab.
