
pdx_exomeseq's Introduction

Whole Exome Sequencing Pipeline for JAX FNA-PDX models of Pancreatic Cancer

Gregory Way (1), Casey Greene (1), Yolanda Sanchez (2)

  1. University of Pennsylvania
  2. Geisel School of Medicine at Dartmouth

Summary

Patient-derived xenograft (PDX) models were derived from primary and metastatic tumors from patients admitted to Dartmouth-Hitchcock Medical Center (DHMC) with pancreatic adenocarcinoma (PAAD). The PDX models and tumor samples underwent whole exome sequencing (WES) to determine how mutations from primary tissue and metastases propagate and evolve. This repository outlines the WES and analysis pipelines.

This is a tumor-only analysis; no pooled or patient-matched normal samples were available. The following flowchart summarizes the WES pipeline.

[Figure 1: PDX WES flowchart]

Figure 1A describes the technical replicates and data types available across tumor and mouse passages. Figure 1B outlines our whole exome sequencing pipeline: we first apply quality control processing to raw reads, then align and remove mouse reads, and finally call and annotate variants.

WES Pipeline

See wes_pipeline.sh for our current variant-calling pipeline for tumor-only WES. This script was run step-by-step on the Dartmouth Discovery compute cluster.

WES Compute Environment

All work was performed on the Dartmouth Discovery compute cluster using the conda environment specified in environment.yml.

Steps to Reproduce

This repository provides three major steps to get from raw sequencing reads to annotated variants.

1. Set up a reproducible computational environment (setup_environment.sh, install.sh)

# Set up the conda (version 4.5 or greater) environment
bash setup_environment.sh

# NOTE: run `conda activate pdx-exomeseq` at the beginning of each session

# Install dependencies and initialize files
# This includes downloading reference genomes and generating several index files
bash install.sh
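
Because the environment must be activated at the start of every session, a small guard can catch a forgotten activation before any jobs are submitted. The following is a minimal sketch and is not part of this repository: the function name `require_conda_env` is hypothetical, and it assumes the environment was activated by name, in which case conda sets `CONDA_DEFAULT_ENV` to that name.

```shell
# Hypothetical helper (not part of this repository): fail early when the
# expected conda environment is not the active one.
require_conda_env() {
    local expected="$1"
    # conda exports CONDA_DEFAULT_ENV with the name of the active environment
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Please run: conda activate ${expected}" >&2
        return 1
    fi
    echo "conda environment '${expected}' is active"
}
```

For example, `require_conda_env pdx-exomeseq || exit 1` could be placed at the top of a session script.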

2. Run data processing pipeline (wes_pipeline.sh)

# NOTE: the commands in the following script must be run sequentially.
# Each command submits several jobs per specified file, and each command can
# take upwards of 12 hours per sample to run. The user must therefore choose
# which command to run by commenting out all the others.
bash wes_pipeline.sh
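
One way to picture the "one command at a time" workflow is a dispatcher that runs a single named stage per invocation. The following is only an illustrative sketch, not how wes_pipeline.sh is actually written: the function `run_stage` and the stage names are hypothetical, chosen to mirror the stages described in the summary, and the `echo` lines stand in for the real job-submission commands.

```shell
# Hypothetical sketch: run exactly one pipeline stage per invocation.
# Stage names are illustrative placeholders; the echo lines stand in for
# the real job-submission commands in wes_pipeline.sh.
run_stage() {
    case "$1" in
        quality_control) echo "submit: quality control jobs" ;;
        align)           echo "submit: alignment jobs" ;;
        remove_mouse)    echo "submit: mouse read removal jobs" ;;
        call_variants)   echo "submit: variant calling jobs" ;;
        annotate)        echo "submit: variant annotation jobs" ;;
        *)
            echo "unknown stage: $1" >&2
            return 1
            ;;
    esac
}
```

With such a wrapper, `run_stage align` would be called only after all jobs from the previous stage have finished.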

Note that the configuration file discovery_variables.yml includes absolute paths to each tool or resource. If the path to a tool changes, this is the only file that needs to be updated.
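
Since the configuration relies on absolute paths, it can help to confirm that each path exists before submitting long-running jobs. The sketch below is an assumption-laden illustration, not part of the repository: `check_config_paths` is a hypothetical name, and it assumes the file consists of flat, unquoted `key: /absolute/path` lines without trailing whitespace. The real layout of discovery_variables.yml may differ.

```shell
# Hypothetical helper: report config entries whose absolute path is missing.
# Assumes flat, unquoted "key: /absolute/path" lines (an assumption about the file).
check_config_paths() {
    local config="$1" missing=0 line key value
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            '#'*) continue ;;   # skip comment lines
            *:*) ;;             # keep "key: value" lines
            *) continue ;;      # skip anything without a colon
        esac
        key="${line%%:*}"
        value="${line#*:}"
        value="${value#"${value%%[![:space:]]*}"}"   # trim leading whitespace
        case "$value" in
            /*)
                if [ ! -e "$value" ]; then
                    echo "missing: ${key} -> ${value}"
                    missing=1
                fi
                ;;
        esac
    done < "$config"
    return "$missing"
}
```

Running `check_config_paths discovery_variables.yml` would then list any entries that point to a nonexistent location and return a nonzero status if there are any.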

3. Visualize and summarize results (analysis_pipeline.sh)

We use Jupyter notebooks and R scripts to visualize and summarize results. We describe the analysis in the next section.

Analysis Pipeline

After obtaining the called variants, we perform a series of analyses and visualizations. These analyses use a separate conda environment, which is specified in analysis_environment.yml.

Computational Environment

Follow these steps to install and begin using this conda environment:

# Using conda version 4.5 or greater
conda env create --force --file analysis_environment.yml
conda activate pdx-exomeseq-analysis

Reproduce Results

To reproduce the results of the analysis pipeline, run the following command. Note that the variants must already be processed before running this pipeline.

bash analysis_pipeline.sh

Scripts

The following notebooks perform the analysis and obtain figures and results:

Script                            Output
1.read-depth-stats.ipynb          Determine read depth against proportion of genome covered
2.disambiguate-reads.ipynb        Visualize the separation of mouse and human reads
3.filter-variants.ipynb           Visualize variant filtration and process filtered VCFs
4.variant-allele-frequency.ipynb  Visualize gnomAD by SIFT scores for replicates and filtered merged files
5.upset-plots.ipynb               Generate UpSet plots to visualize variant overlaps across patient sets
6.generate-oncoprint-data.ipynb   Wrangle variant calls to generate data for input into oncoprint visualization
7.visualize-oncoprint.ipynb       Visualize oncoprint diagrams and variant similarity matrices
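
The numeric prefixes indicate the order in which the notebooks are meant to run. The sketch below shows one way to execute numbered notebooks in that order with `jupyter nbconvert`. It is an illustration under assumptions, not a description of what analysis_pipeline.sh does: the function `run_notebooks`, the directory argument, and the `DRY_RUN` variable are all hypothetical. It relies on shell globbing to order the single-digit prefixes.

```shell
# Hypothetical sketch: execute numbered notebooks in order.
# Set DRY_RUN=1 to print the commands instead of running them.
run_notebooks() {
    local dir="${1:-.}" nb
    # The glob expands in sorted order, so 1.*, 2.*, ... run in sequence
    for nb in "$dir"/[0-9]*.ipynb; do
        [ -e "$nb" ] || continue   # no matching notebooks in this directory
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "jupyter nbconvert --to notebook --execute --inplace $nb"
        else
            jupyter nbconvert --to notebook --execute --inplace "$nb"
        fi
    done
}
```

Calling `DRY_RUN=1 run_notebooks .` first is a cheap way to check the execution order before committing to a full run.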
