Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients

This repository contains the scripts used in Santos et al., "Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients". Supporting data, including validation files and patient clinical histories, are archived in our Figshare collection. Genetic data for the patient cohort are available in our European Genome-phenome Archive study, EGAS00001007573. This repository will remain open to support reproducibility and to track issues.

Abstract

Background

Despite advances in the identification of genetic markers associated with severe COVID-19 symptoms, the full genetic characterisation of the disease remains elusive. Imputation of low-coverage whole genome sequencing has emerged as a competitive method to study such disease-related genetic markers, as it enables genotyping of most common genetic variants used for genome-wide association studies. This study aims to explore the potential use of imputation in low-coverage whole genome sequencing for a highly selected severe COVID-19 patient cohort.

Findings

We generated an imputed dataset of 79 variant call format (VCF) patient files using the GLIMPSE1 tool, each containing, on average, 9.5 million single nucleotide variants. The validation assessment of imputation accuracy yielded a squared Pearson correlation of approximately 0.97 across sequencing platforms, showing that GLIMPSE1 can be used to confidently impute variants with minor allele frequency up to approximately 2% in Spanish ancestry individuals. We conducted a comprehensive analysis on the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms specifically developed to characterise severe COVID-19 symptoms for this cohort.

Conclusion

This dataset highlights the utility and accuracy of low-coverage whole genome sequencing imputation in the study of COVID-19 severity, setting a precedent for other applications in resource-constrained environments linked to comprehensive analyses of genetic components for various complex diseases. The methods and findings presented here may be leveraged in future genomic projects, providing vital insights for health challenges like COVID-19.

Software implementation

All the source code used to generate the results and figures in the paper is in the scripts folder. See the README.md file in each directory for a full description of each figure.

Setup

Getting the code

You can download a copy of all the files in this repository by cloning it with git:

git clone https://github.com/renatosantos98/GLIMPSE-low-coverage-WGS-imputation.git

A copy of the repository is also archived at doi.org/10.25452/figshare.plus.21679799.

Dependencies

You'll need a working Python environment to run the code. We recommend you set up your environment through Anaconda, which provides the conda package manager.

Run the following command in the main repository folder (where environment.yml is located) to create a conda environment and install all required dependencies in it.

conda env create -f environment.yml
conda activate glimpse
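
Before running the pipeline, it can help to confirm that the environment is actually active. The following is a minimal sketch, not part of the repository: it assumes the environment is activated by name, in which case conda exports that name in the `CONDA_DEFAULT_ENV` variable. The helper names `active_conda_env` and `in_expected_env` are hypothetical.

```python
import os

# Environment name used by `conda activate glimpse`.
EXPECTED_ENV = "glimpse"


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is set."""
    environ = os.environ if environ is None else environ
    # conda exports CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") or None


def in_expected_env(environ=None, expected=EXPECTED_ENV):
    """Return True when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected
```

Calling `in_expected_env()` with no arguments inspects the current process environment. Passing a dictionary instead makes the check easy to test in isolation.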

Input data requirements

The data required as input for this pipeline consists of .cram case and validation files, stored inside a directory named bam.

The 1000 Genomes reference panel will be retrieved and set up by the 1_setup.sh script.
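
A quick check of the input layout described above might look like the sketch below. It is illustrative only and not part of the repository: the function name `find_cram_inputs` is hypothetical, and it assumes the .cram files sit directly inside the bam directory rather than in subdirectories.

```python
from pathlib import Path


def find_cram_inputs(bam_dir="bam"):
    """Return the sorted .cram files found directly inside bam_dir.

    Raises FileNotFoundError if the directory is missing or holds no .cram files.
    """
    directory = Path(bam_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"input directory not found: {directory}")
    # Match on the final suffix only, so index files such as x.cram.crai are skipped.
    crams = sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix == ".cram"
    )
    if not crams:
        raise FileNotFoundError(f"no .cram files found in {directory}")
    return crams
```

Running `find_cram_inputs()` from the main repository folder lists the files the pipeline would see, or fails early with a clear message.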

Running the code

All scripts are designed to be run from the main repository folder. To reproduce the data generated in the paper, run the scripts in the following order, using the syntax shown:

bash scripts/1_setup.sh
bash scripts/2_gl_calling.sh
bash scripts/3_glimpse_impute_parallel.sh
bash scripts/4_vcf_filtering.sh
bash scripts/5_glimpse_concordance.sh
bash scripts/6_pca.sh

See the README.md files in the scripts directory for a full description of each script and required files.
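
The six steps can also be driven from a single Python wrapper that stops at the first failing script. This is a hedged sketch rather than something shipped with the repository: the names `PIPELINE`, `build_commands` and `run_pipeline` are hypothetical, and it assumes it is launched from the main repository folder with bash available on the PATH.

```python
import subprocess
from pathlib import Path

# Script names in the order listed in "Running the code".
PIPELINE = [
    "1_setup.sh",
    "2_gl_calling.sh",
    "3_glimpse_impute_parallel.sh",
    "4_vcf_filtering.sh",
    "5_glimpse_concordance.sh",
    "6_pca.sh",
]


def build_commands(scripts_dir="scripts"):
    """Return one `bash <script>` argument list per pipeline step, in order."""
    return [["bash", (Path(scripts_dir) / name).as_posix()] for name in PIPELINE]


def run_pipeline(scripts_dir="scripts", runner=subprocess.run):
    """Run each step in order, stopping at the first step that fails."""
    for command in build_commands(scripts_dir):
        print("running:", " ".join(command))
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so later steps never run on bad input.
        runner(command, check=True)
```

Calling `run_pipeline()` is equivalent to typing the six commands by hand, except that a failure in one step prevents the following steps from running. The `runner` parameter exists only so the ordering can be checked without executing anything.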

License

All source code is made available under an MIT license. You can freely use and modify the code, without warranty. See LICENSE.md for the full license text. The authors reserve the rights to the article content, which is currently submitted for publication.
