
phage-dms-vacc-analysis

Comprehensive characterization of the antibody responses to SARS-CoV-2 Spike protein finds additional vaccine-induced epitopes beyond those for mild infection

Published in eLife. Authors: Meghan E. Garrett*, Jared G. Galloway*, Caitlin Wolf, Jennifer K. Logue, Nicholas Franko, Helen Chu, Frederick A. Matsen IV^, Julie Overbaugh^

* these authors contributed equally to this work. ^ co-corresponding authors

What is this?

This repository provides a static archive of all analyses done in our study. The files are a complete set of materials for replicating the analysis in our manuscript (from fastq to figures) with a single execution of a Nextflow pipeline. For exploring the data interactively, please see our DMS-View data repository.

Running the pipeline produces an xarray Dataset (see phippery for more on this dataset organization), as well as the figures seen in the manuscript. The pipeline runs the analysis and plotting code for two sets of phage-display library batch replicates. The figure sets for each batch are separated here:

  1. Figure set from "SPIKE1 Replicates"
  2. Figure set from "SPIKE2 Replicates" <- presented in the manuscript

If you are interested in obtaining the raw data to perform the analysis yourself, please contact Jared Galloway: jgallowa (at) fredhutch (dot) org.

Exploring the data interactively

Running the pipeline is not the suggested approach for exploring our data. Although running it is fairly simple once configured (see Running the Pipeline), it involves processing over 600 sequence alignments and running esoteric downstream analysis and plotting code specific to our samples' metadata, so tweaking parameters may be a headache.

Instead, if you are simply interested in exploring the rich data from this study, we strongly suggest checking out the pre-processed, publicly explorable DMS-View data repository. There, we have formatted and hosted the data for every sample in the study (398 replicates across two phage-display library batches) for use with the DMS-View tool developed by the Bloom Lab. For more on this, see that repository's README.

Material Overview

We provide a fully reproducible, automated workflow that ingests raw sequencing data and performs all analyses presented in the paper. The workflow extends our more general PhIP-Seq alignment pipeline, PhIP-flow. The analysis materials fall into three categories:

  1. image-template/ Configuration scripts defining the container image that bundles the version-specific phippery source code along with the other external Python package dependencies needed for analysis and plotting.

  2. analysis-scripts/ The Python scripts for computing normalizations on the data, as well as the plotting code that produces our final figures.

  3. nextflow-pipeline-config/ The Nextflow pipeline script and all configuration scripts needed to run the workflow either (a) locally on a computer with Docker installed, or (b) on a SLURM-managed cluster with Singularity available.

Running the Pipeline

What do I need?

A working installation of Docker and Nextflow (or, on a SLURM-managed cluster, Nextflow and Singularity). Some computing power may also be needed if starting from raw fastq files.
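A quick way to confirm these prerequisites is a small pre-flight check. The sketch below is our own illustration, not part of this repository: it assumes the executables are named nextflow, docker, and singularity, and that on a cluster any required modules have already been loaded so the tools appear on PATH.

```python
import shutil

# Hypothetical pre-flight check (not part of this repository).
# Nextflow is always required; the container runtime is Docker when running
# locally or Singularity on a SLURM-managed cluster, so either one suffices.
REQUIRED_TOOLS = ["nextflow"]
CONTAINER_RUNTIMES = ["docker", "singularity"]


def missing_prerequisites(which=shutil.which):
    """Return a list of problems; an empty list means the tools were found.

    `which` is injectable so the check can be tested without touching PATH.
    """
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    if not any(which(runtime) for runtime in CONTAINER_RUNTIMES):
        problems.append("neither 'docker' nor 'singularity' found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("Nextflow and a container runtime are available.")
```

Accepting either runtime mirrors the two supported setups: Docker for local runs and Singularity on the cluster.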

How do I run it?

  1. To run locally (not recommended), install Docker and Nextflow. Otherwise, we provide a configuration script that needs very little editing to run the analysis on a SLURM-managed cluster with access to Nextflow and Singularity modules.

  2. Clone this repository and obtain the raw fastq sequences. The data will likely come as a tarball archive which, when extracted, provides an NGS/ folder containing all the demultiplexed sample sequence data described in our sample table. Place the NGS/ directory within the repository's nextflow-pipeline-config subdirectory.

  3. Generate a config script specific to your compute infrastructure. Consult the Nextflow documentation for instructions on fitting the parameters to your specific infrastructure. We provide an example of such a configuration for our Fred Hutch SLURM managed cluster in this file. Whatever your configuration, the file must include parameters specified in the PARAMS{...} block of our config script.

  4. Run the pipeline. An example of how we call the nextflow run command on our compute infrastructure can be found in nextflow-pipeline-config/run_analysis.sh:

(base) quokka phage-dms-vacc-analysis/nextflow-pipeline-config ‹master*› » ./run_analysis.sh 
N E X T F L O W  ~  version 20.07.1
Launching `PhIP-analysis.nf` [golden_ekeblad] - revision: 02870c3fbe
[01/807cac] process > generate_fasta_reference (1) [100%] 1 of 1, cached: 1 ✔
[f3/915808] process > generate_index (1)           [100%] 1 of 1, cached: 1 ✔
[95/1f1df0] process > short_read_alignment (633)   [100%] 633 of 633, cached: 633 ✔
[25/3262fa] process > sam_to_stats (633)           [100%] 633 of 633, cached: 633 ✔
[40/d53f0f] process > sam_to_counts (633)          [100%] 633 of 633, cached: 633 ✔
[37/3ab3b4] process > collect_phip_data (1)        [100%] 1 of 1, cached: 1 ✔
[22/efe9a0] process > compute_enrichment_stats (1) [100%] 1 of 1, cached: 1 ✔
[2d/f9fe6f] process > analysis_plotting (1)        [100%] 1 of 1, cached: 1 ✔

71.45user 5.57system 0:20.03elapsed 384%CPU (0avgtext+0avgdata 1743320maxresident)k
8200inputs+39768outputs (4major+906688minor)pagefaults 0swaps
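If you keep a copy of a progress summary like the one above (for example, by copying it from the terminal), a few lines of Python can flag any process that did not finish all of its tasks. This is a hypothetical helper, not part of the pipeline; the regular expression is written against the Nextflow 20.07.1 display shown here, and other Nextflow versions, or non-interactive output (where Nextflow may print one line per task instead), may not match it.

```python
import re

# Hypothetical helper (not part of the pipeline). The pattern targets the
# progress summary printed by Nextflow 20.07.1, e.g.:
# [95/1f1df0] process > short_read_alignment (633)   [100%] 633 of 633, cached: 633 ✔
PROGRESS_LINE = re.compile(
    r"^\[(?P<hash>[^\]]+)\]\s+process > (?P<name>\S+)"  # task hash and process name
    r"(?:\s+\(\d+\))?"                                  # optional "(N)" task index
    r"\s+\[\s*(?P<pct>\d+)%\]"                          # percent complete
    r"\s+(?P<done>\d+) of (?P<total>\d+)"               # finished vs. total tasks
    r"(?:, cached: (?P<cached>\d+))?"                   # optional cached count
)


def parse_progress(log_text):
    """Map each process name to a (done, total, cached) tuple."""
    results = {}
    for line in log_text.splitlines():
        match = PROGRESS_LINE.match(line.strip())
        if match is None:
            continue  # skip banners, blank lines, and timing output
        cached = int(match.group("cached") or 0)
        results[match.group("name")] = (
            int(match.group("done")),
            int(match.group("total")),
            cached,
        )
    return results


def incomplete(results):
    """Return the names of processes that did not finish every task."""
    return [name for name, (done, total, _) in results.items() if done < total]


if __name__ == "__main__":
    example = """\
[95/1f1df0] process > short_read_alignment (633)   [100%] 633 of 633, cached: 633 ✔
[2d/f9fe6f] process > analysis_plotting (1)        [100%] 1 of 1, cached: 1 ✔
"""
    summary = parse_progress(example)
    print(summary)
    print("incomplete:", incomplete(summary))
```

Passing the eight progress lines above should yield an empty list from incomplete, since every process reports all of its tasks (633 of 633 or 1 of 1) as done.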

The pipeline writes all batch-specific figures and the respective xarray datasets to the phip_data_dir defined in the PhIP-flow configuration scripts.

Static containers

vacc-ms-analysis:vacc-ms-analysis - an extension of the phippery container with all back-end function source code and the dependencies listed in image-template/requirements.txt.

phippery:vacc-ms-analysis - the phippery container.

quay.io/jgallowa/bowtie2:vacc-ms-analysis - a static container with the bowtie2 alignment tool. The original image was hosted by Biocontainers.

quay.io/matsengrp/samtools-1.3:vacc-ms-analysis - a static container with the samtools software. The original image was hosted by Biocontainers.
