IBDVar

A tool for prioritising identity-by-descent (IBD) variants in Whole Genome Sequencing (WGS) data from families with rare heritable diseases. IBDVar consists of a command-line variant prioritisation pipeline and an interactive Shiny dashboard for starting the pipeline and visualising its output.

Overview

The use of IBDVar follows a three-step process:

The prioritisation pipeline is composed of two sub-pipelines, one for short variants and one for structural variants (SV), that are started independently. Users can upload a multi-sample VCF file and configure either pipeline in the Shiny dashboard, or run the pipelines on a multi-sample VCF file at the command line using a configuration file. Once a pipeline has completed, its output can be explored interactively in the corresponding pipeline tab in the Shiny dashboard. Unique to the tool is the integration of IBD segment detection into variant prioritisation for WGS data. An overview of the key steps is shown below.

System Requirements

For running the bash pipeline backend:

Linux OS (developed and tested on Ubuntu 22 LTS)
R (>=4.2)
BCFtools (1.15.1)
ClinVar VCF file (GRCh38)
IBIS (v1.20.9)
Variant Effect Predictor (VEP)
CADD (v1.6) plugin resources (SNVs and indels)
CCDS (release number 22) text file

For deploying the Shiny dashboard, the following R dependencies are required:

shiny
shinydashboard
shinyFiles
shinyjs
htmlwidgets
dplyr
jsonlite
purrr
readxl
DT
ideogram
reshape2

To install these R packages, type the following in an R console:

install.packages(c("shiny", "shinydashboard", "shinyFiles", "shinyjs", "htmlwidgets", "dplyr", "jsonlite", "purrr", "readxl", "DT", "reshape2"))

To install the ideogram library, locate the ideogram tarball file (.tar.gz) and type:

install.packages("path/to/ideogram_0.0.0.9000.tar.gz", type="source", repos=NULL)

Variant Prioritisation Pipelines

IBDVar can prioritise both short variants and structural variants (SV) from multi-sample VCF files generated by the Illumina DRAGEN pipeline. Both prioritisation pipelines can be started from the command line or from the "Start Pipeline" tab of the Shiny dashboard.

Short Variants (Command line start)

Input VCF file

A multi-sample VCF file containing short variants (SNPs/indels) called by the Illumina DRAGEN pipeline is used as input (see the Illumina website for details). The VCF file should adhere to the version 4.2 specification. The pipeline expects chromosome names to be prefixed with "chr"; however, the tool checks that chromosome naming is consistent between the input VCF and the annotation resources used by the pipeline.
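
Before starting a run, it can help to see which naming style a VCF uses. The sketch below is not the pipeline's own check (which is not shown here); it is a minimal illustration that builds a tiny made-up VCF and classifies its contig names with awk. The helper name chrom_style and the file demo.vcf are hypothetical.

```shell
# Build a tiny uncompressed VCF purely for illustration (not real data).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf 'chr1\t12345\t.\tA\tG\t50\tPASS\t.\n' >> demo.vcf
printf 'chr2\t67890\t.\tC\tT\t50\tPASS\t.\n' >> demo.vcf

# Hypothetical helper: report how the CHROM column is named.
# Prints "chr" if every record starts with "chr", "no_chr" if none does,
# "mixed" if both styles occur, and "empty" if there are no records.
chrom_style() {
    awk -F '\t' '
        /^#/        { next }              # skip header lines
        $1 ~ /^chr/ { n_chr++; next }     # contigs such as chr1, chrX
                    { n_plain++ }         # contigs such as 1, X
        END {
            if (n_chr > 0 && n_plain == 0)      print "chr"
            else if (n_plain > 0 && n_chr == 0) print "no_chr"
            else if (n_chr > 0 && n_plain > 0)  print "mixed"
            else                                print "empty"
        }' "$1"
}

chrom_style demo.vcf   # prints: chr
```

For a bgzip-compressed VCF, the records could be streamed into the same awk program with `bcftools view -H` instead of reading the file directly.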

Configuration Parameters

To run the short variants pipeline at the command line, you will need to create a configuration file with parameters (with "=" separating the parameter and its value) described in the table below:

Category	Configuration parameter	Description
General settings	in_vcf	An input file path for the small variants VCF produced from Illumina DRAGEN Germline Pipeline.
	out_dir	An output directory path location to generate pipeline output
	threads	The number of threads (CPU) for executing the pipeline (default: 4)
QC filtering	GQ	Minimum genotype quality threshold for each sample (default: 20)
	DP	Minimum (FORMAT) read depth threshold per sample (default: 10)
	MAF	Minimum allele frequency for variants to be selected for the PLINK dataset
IBD detection	mind	Maximum proportion of missing genotype data per sample, e.g. 0.1 excludes samples with > 10% missing genotype data (default: 0.1)
	geno	Select variants with a missing call rate lower than the provided value (default: 0.1)
	max_af	Maximum allele frequency threshold for rare variants in the gnomAD, ESP or 1000 Genomes Project populations (default: 0.05)
	ibis_mt1	Minimum number of markers for IBIS to call a segment IBD1
	ibis_mt2	Minimum number of markers for IBIS to call a segment IBD2
	genes	A list of genes of interest for selecting variants in specified genes (optional)
Tools	tools_dir	Optional tools base directory path for tools required by the pipeline
	plink	PLINK2 directory path
	vep	VEP executable file path
	ibis	IBIS directory path
Resources	resources	Optional base directory path for resources
	clinvar	ClinVar VCF file path
	genetic_map	The file path for the genetic recombination map for the human genome
	cadd	CADD plugin resource directory path

An example short variants config file is provided in the repository.
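
As a rough illustration of the "parameter=value" layout described above, the sketch below writes a config file from the shell and reads one value back. Every path is a hypothetical placeholder. The values given for MAF, ibis_mt1 and ibis_mt2 are illustrative only, since the table lists no defaults for them. The optional tools_dir, resources and genes entries are left out because their exact format is not described here. The helper get_param is also hypothetical and is not part of IBDVar.

```shell
# Write an illustrative short variants config file.
# Paths are placeholders; MAF, ibis_mt1 and ibis_mt2 are example values only.
cat > pipeline.config <<'EOF'
in_vcf=/path/to/family.vcf.gz
out_dir=/path/to/output
threads=4
GQ=20
DP=10
MAF=0.01
mind=0.1
geno=0.1
max_af=0.05
ibis_mt1=436
ibis_mt2=186
plink=/path/to/tools/plink2
vep=/path/to/tools/ensembl-vep/vep
ibis=/path/to/tools/ibis
clinvar=/path/to/resources/clinvar.vcf.gz
genetic_map=/path/to/resources/genetic_map.txt
cadd=/path/to/resources/cadd
EOF

# Hypothetical helper: print the value stored for a key in a key=value file.
# Everything after the first "=" is returned, so values may contain "=".
get_param() {
    awk -F '=' -v key="$1" \
        '$1 == key { print substr($0, length(key) + 2); exit }' "$2"
}

get_param threads pipeline.config   # prints: 4
```

Reading values back like this is a quick way to confirm that a parameter is spelled as in the table before starting a run that may take hours.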

Using a screen to run the short variants pipeline

As the short variants pipeline can take a few hours to complete, it is highly recommended to run it inside a GNU Screen session so that the pipeline is not terminated if, for example, the connection drops or the SSH session ends unexpectedly. To install GNU Screen on Ubuntu/Debian systems:

sudo apt update
sudo apt install screen

On CentOS/Fedora type:

sudo yum install screen

To create a screen session, type screen in the terminal, or create a named session by typing the following:

screen -S <screen_name>

Creating a session attaches it to the terminal automatically. To attach an existing detached session, type:

screen -r <screen_name>

Once the screen is attached, execute the pipeline as described in the usage section below.

After attaching the screen to the terminal and starting the short variants pipeline, detach the screen by pressing Ctrl+A followed by d, or by typing in the terminal:

screen -d <screen_name>

This lets you close the terminal window without terminating the pipeline. To reattach the screen, type in the terminal:

screen -r <screen_name>

Usage

./short_variants.sh -c pipeline.config [-m in_vcf.md5sum]

Options:

-c: config file (ending with .config) containing all parameters to execute the pipeline (required)
-m: md5sum file used to perform an md5sum check on the input VCF file specified in the config file (optional)
-h: help message with usage details and options
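
If no checksum file was delivered with the VCF, one can be produced with the standard md5sum utility. The sketch below assumes that the -m option accepts the usual "<hash>  <filename>" layout written by md5sum; this has not been confirmed against the pipeline code. The file names are placeholders, and a small stand-in file is created so the commands can be tried safely.

```shell
# Stand-in for the real input VCF (placeholder content, not real data).
printf 'placeholder\n' > demo_input.vcf

# Record the checksum in the layout written by md5sum itself.
md5sum demo_input.vcf > demo_input.md5sum

# Verify the file against the recorded checksum.
md5sum -c demo_input.md5sum   # prints: demo_input.vcf: OK

# The checksum file could then be passed to the pipeline, for example:
# ./short_variants.sh -c pipeline.config -m demo_input.md5sum
```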

Structural Variants (Command line start)

Input VCF File

A multi-sample VCF file containing structural variants called (using Manta) by the Illumina DRAGEN pipeline is used as input (see the Illumina website for details). The VCF file should adhere to the version 4.2 specification. The pipeline expects chromosome names to be prefixed with "chr"; however, the tool checks that chromosome naming is consistent between the input VCF and the annotation resources used by the pipeline.

Configuration File

To start the structural variants pipeline at the command line, you will need to create a configuration file using the parameters specified in the table below:

Category	Configuration parameter	Description
General settings	sv_vcf	Input VCF file path
	out_dir	Directory path for pipeline output
	threads	Number of threads (CPU)
Variant selection	ibd_seg	IBD segment file path (from the short variants pipeline) for selecting SV in IBD segments.
	genes	A list of genes of interest to be used to filter variants.
Tools	tools_dir	(Optional) base directory for tools
Resources	resources	The base directory for resources (optional)
	ccds	CCDS directory path

An example structural variants config file is provided in the repository.
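
A structural variants config file might look like the sketch below, assuming the same "parameter=value" layout as the short variants config. All paths are hypothetical placeholders, including the name of the IBD segment file. The threads value is illustrative, and the optional genes, tools_dir and resources entries are omitted. The list of parameters checked by the loop is an assumption about what a run needs, not a documented requirement.

```shell
# Write an illustrative structural variants config file (placeholder paths).
cat > sv_pipeline.config <<'EOF'
sv_vcf=/path/to/family.sv.vcf.gz
out_dir=/path/to/sv_output
threads=4
ibd_seg=/path/to/output/final_output/ibd_segments.seg
ccds=/path/to/resources/ccds
EOF

# Report any parameter from the (assumed) list that has no entry in the file.
missing=0
for key in sv_vcf out_dir threads ibd_seg ccds; do
    if ! grep -q "^${key}=" sv_pipeline.config; then
        echo "missing parameter: ${key}"
        missing=1
    fi
done
[ "$missing" -eq 0 ] && echo "all listed parameters are set"
```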

Usage

./structural_variants.sh -c pipeline.config

Options:

-c: config file (ending with .config) containing all parameters to execute the pipeline (required)
-h: help message with usage details and options

Shiny Dashboard

The Shiny dashboard allows users to start prioritisation pipelines for short or structural variants and to analyse the output interactively.

To start the Shiny dashboard on the Cranfield University server, log into the Linux server hosting the tool and enter the application URL (available on request from the author) in a web browser. (Note that development and testing were performed using the Google Chrome browser, so performance may vary with other browsers.) The Shiny dashboard can also be started in RStudio; however, this is not recommended, since most of the views have been configured for browser display and the performance of the tool may be affected.

The "Start Pipeline" tab will be open first by default.

Start Pipeline

In the "Start Pipeline" tab you can start the short variants or structural variants pipeline by selecting an input VCF file and an output folder for results, and by configuring the parameters listed in the respective pipeline box.

Once parameters have been specified, click Start in the respective pipeline box to run the pipeline. A notification message should appear in the bottom right corner indicating pipeline initiation.

Short Variants

In the "Short Variants" tab you can explore the short variants pipeline output interactively.
The tab features:

A "Files" box for uploading the following files, which are located in the "final_output" folder of the output folder specified when the pipeline was run:
1. A prioritised and annotated list of variants produced by the short variants prioritisation pipeline.
2. An IBIS IBD segment file produced by the pipeline.
- An optional file containing a list of genes of interest can also be uploaded to filter the variants by these genes.
Interactive variants table - users can filter, sort, search and download a TSV file of variants reported in the table.
Filters panel - contains a series of checkboxes to filter variants by CADD score, predicted consequence, SIFT and PolyPhen calls, clinical significance (ClinVar) and VEP predicted impact (loss of function etc.)
Interactive ideogram - filters variants in the interactive data table below by the IBD region clicked by the user. A tool-tip reporting the chromosome number, start and end position of a given IBD region is displayed when a user hovers over an IBD region.
"Summary" box summarising:
- total number of variants
- number of pathogenic variants identified by ClinVar
- number of detected IBD segments
- total number of deleterious missense variants predicted by SIFT, PolyPhen and CADD
- number of loss of function variants

Structural Variants

In the "Structural Variants" tab, the prioritised SV calls from the pipeline can be interactively explored using filters and an interactive data table.

SV tab features include:

"Files" box for uploading the prioritised list of SV calls (.tsv) file
Interactive table of variants that can be filtered, sorted, searched and downloaded as a TSV file.
"Summary" tab providing summary statistics on the counts of each SV type and the mean SV lengths.
Filters panel containing checkboxes to filter the variants table by: SV type, chromosome number, precision of breakpoints of called SVs and genes of interest.

Questions, Feature Requests, Bug Reports and Issues

For any questions, feature requests, bug reports or issues regarding the latest version of IBDVar, please click on the "issues" tab present at the top-left of the GitHub repository page.

Licence

MIT

Collaborators

This codebase was developed as part of an MSc thesis project (MSc Applied Bioinformatics, Cranfield University 2021-2022) under the supervision of Dr Alexey Larionov.
