Comprehensive characterization of the antibody responses to SARS-CoV-2 Spike protein finds additional vaccine-induced epitopes beyond those for mild infection
Published in eLife. Authors: Meghan E. Garrett*, Jared G. Galloway*, Caitlin Wolf, Jennifer K. Logue, Nicholas Franko, Helen Chu, Frederick A. Matsen IV^, Julie Overbaugh^
* these authors contributed equally to this work. ^ co-corresponding authors
What is this?
This repository is here to provide a static archive for all analyses done in our study. The files provide a complete set of materials to replicate the analysis (from fastq to figures) found in our manuscript with a single execution of a Nextflow pipeline. For exploring the data interactively, please see our DMS-View data repository.
Ultimately, running the pipeline produces an xarray Dataset (see phippery for more on how this dataset is organized), as well as the figures seen in the manuscript. The pipeline runs the analysis and plotting code for two sets of phage-display library batch replicates. The figure sets for each batch are listed separately here:
- Figure set from "SPIKE1 Replicates"
- Figure set from "SPIKE2 Replicates" <- presented in the manuscript
If interested in obtaining raw data to perform the analysis yourself, please feel free to contact Jared Galloway: jgallowa (at) fredhutch (dot) org.
Exploring the data, interactively
Running the pipeline here is not the suggested approach for exploring our data. While running the pipeline is quite simple with some configuration (see Running the Pipeline), it involves processing over 600 sequence alignments and running esoteric downstream analysis and plotting code specific to our samples' metadata. Thus, tweaking parameters may be a headache.
Instead, if you're interested in simply exploring the rich data from this study, we strongly suggest checking out the pre-processed and publicly explorable DMS-View data repository. There, we have formatted and hosted the data for every sample in the study (398 replicates across two library batches of phage display) to be used with the amazing DMS-View tool put out by the Bloom Lab. For more on this, see the repository README.
Material Overview
We provide a fully reproducible and automated workflow which ingests raw sequencing data and performs all analyses presented in the paper. The workflow extends our more generalized PhIP-Seq alignment pipeline, PhIP-flow. The materials for analysis are primarily broken down into three categories:
- image-template/ - The configuration scripts defining a container image, which is used to build the container with all version-specific phippery source code along with other non-local Python package dependencies for analysis and plotting.
- analysis-scripts/ - The Python scripts for computing normalizations on the data, as well as plotting code to produce our final figures.
- nextflow-pipeline-config/ - The Nextflow pipeline script as well as all necessary configuration scripts to run the workflow either (a) locally on a computer with Docker installed, or (b) on a SLURM managed cluster with Singularity available.
Running the Pipeline
What do I need?
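A working installation of Nextflow and Docker (or Singularity, if running on a cluster). Starting from raw fastq also calls for a fair amount of computing power, since the pipeline processes over 600 sequence alignments.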
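As a quick pre-flight check, the sketch below looks for the two command-line tools named above on the PATH. It is a minimal illustration using only the Python standard library. The function name `missing_tools` is hypothetical and not part of this repository, and the executable names `docker` and `nextflow` are assumed to be the default ones.

```python
import shutil

# Executable names assumed to be the defaults for Docker and Nextflow.
REQUIRED_TOOLS = ("docker", "nextflow")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without the
    tools actually being installed (hypothetical helper, not repo code).
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("Docker and Nextflow were both found on the PATH.")
```

On a cluster that uses Singularity instead of Docker, the same function could be called with `("singularity", "nextflow")`, assuming the relevant modules are already loaded.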
How do I run it?
- For running locally (not recommended), install Docker and Nextflow. Otherwise, we provide a configuration script that would take very little editing to run the analysis on a SLURM managed cluster with access to Nextflow and Singularity modules.
- Clone this repository and obtain the raw fastq sequences. The data will likely come in the form of a tarball archive which, when extracted, will provide an NGS/ folder containing all the demultiplexed sample sequence data as described in our sample table. Place the NGS/ directory within the repository's nextflow-pipeline-config subdirectory.
- Generate a config script specific to your compute infrastructure. Consult the Nextflow documentation for instructions on fitting the parameters to your specific infrastructure. We provide an example of such a configuration for our Fred Hutch SLURM managed cluster in this file. Whatever your configuration, the file must include the parameters specified in the PARAMS{...} block of our config script.
- Run the pipeline. An example of how we call the nextflow run command on our compute infrastructure can be seen in nextflow-pipeline-config/run_analysis.sh:
(base) quokka phage-dms-vacc-analysis/nextflow-pipeline-config ‹master*› » ./run_analysis.sh
N E X T F L O W ~ version 20.07.1
Launching `PhIP-analysis.nf` [golden_ekeblad] - revision: 02870c3fbe
[01/807cac] process > generate_fasta_reference (1) [100%] 1 of 1, cached: 1 ✔
[f3/915808] process > generate_index (1) [100%] 1 of 1, cached: 1 ✔
[95/1f1df0] process > short_read_alignment (633) [100%] 633 of 633, cached: 633 ✔
[25/3262fa] process > sam_to_stats (633) [100%] 633 of 633, cached: 633 ✔
[40/d53f0f] process > sam_to_counts (633) [100%] 633 of 633, cached: 633 ✔
[37/3ab3b4] process > collect_phip_data (1) [100%] 1 of 1, cached: 1 ✔
[22/efe9a0] process > compute_enrichment_stats (1) [100%] 1 of 1, cached: 1 ✔
[2d/f9fe6f] process > analysis_plotting (1) [100%] 1 of 1, cached: 1 ✔
71.45user 5.57system 0:20.03elapsed 384%CPU (0avgtext+0avgdata 1743320maxresident)k
8200inputs+39768outputs (4major+906688minor)pagefaults 0swaps
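When the run is captured to a file, progress lines like the ones above can be checked programmatically. The following is a minimal sketch, not part of this repository: the helper names `summarize` and `unfinished` are hypothetical, and the regular expression assumes the progress-line layout shown in the output above, which may differ between Nextflow versions.

```python
import re

# Matches progress lines of the form shown above, for example:
# [95/1f1df0] process > short_read_alignment (633) [100%] 633 of 633, cached: 633 ✔
# The layout is an assumption based on that output only.
PROGRESS = re.compile(
    r"process > (?P<name>\S+).*?\[\s*(?P<pct>\d+)%\]\s+(?P<done>\d+) of (?P<total>\d+)"
)


def summarize(log_text):
    """Map each process name to a (completed, total) tuple of task counts."""
    summary = {}
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if match:
            # Later lines for the same process overwrite earlier ones,
            # so the final reported state wins.
            summary[match["name"]] = (int(match["done"]), int(match["total"]))
    return summary


def unfinished(summary):
    """Return the sorted names of processes with fewer completed than total tasks."""
    return sorted(name for name, (done, total) in summary.items() if done < total)


if __name__ == "__main__":
    sample = (
        "[f3/915808] process > generate_index (1) [100%] 1 of 1, cached: 1 ✔\n"
        "[95/1f1df0] process > short_read_alignment (633) [100%] 633 of 633, cached: 633 ✔\n"
    )
    print(summarize(sample))
    print("unfinished:", unfinished(summarize(sample)))
```

An empty list from `unfinished` indicates that every process listed in the captured output reported all of its tasks as complete.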
The pipeline will put all batch-specific figures and the respective xarray datasets in the phip_data_dir as defined by the phip-flow configuration scripts.
Static containers
- vacc-ms-analysis:vacc-ms-analysis - an extension of the phippery container with all back-end function source code and dependencies listed in image-template/requirements.txt
- phippery:vacc-ms-analysis - the phippery container
- quay.io/jgallowa/bowtie2:vacc-ms-analysis - a static container containing the bowtie2 alignment tool. The original image was hosted by Biocontainers
- quay.io/matsengrp/samtools-1.3:vacc-ms-analysis - a static container containing the samtools software. The original image was hosted by Biocontainers
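Nextflow can typically fetch containers on its own once a container runtime is enabled in the configuration, so pulling by hand is optional. For anyone who wants to pre-fetch the images, the sketch below composes the pull command for either runtime. It covers only the two image references listed above with a full registry path; the helper name `pull_command` is hypothetical and not part of this repository.

```python
# The two fully qualified image references listed above. The other two
# entries are given without a registry prefix, so they are left out here.
IMAGES = (
    "quay.io/jgallowa/bowtie2:vacc-ms-analysis",
    "quay.io/matsengrp/samtools-1.3:vacc-ms-analysis",
)


def pull_command(image, runtime="docker"):
    """Return the shell command that pulls `image` with the given runtime."""
    if runtime == "docker":
        return f"docker pull {image}"
    if runtime == "singularity":
        # Singularity reads Docker registries through the docker:// prefix.
        return f"singularity pull docker://{image}"
    raise ValueError(f"unsupported runtime: {runtime!r}")


if __name__ == "__main__":
    for image in IMAGES:
        print(pull_command(image, runtime="docker"))
        print(pull_command(image, runtime="singularity"))
```

The function only builds command strings and does not execute them, so the output can be reviewed before being pasted into a shell.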