
cagi6_sickkids's Introduction

Code repository for the CAGI6 SickKids challenge

A challenge submission by the DROPpers.

The team members are: Julien Gagneur, Christian Mertes, Ines Scheller, Nicholas H. Smith, Vicente A. Yépez.

This is the code repository used to generate the results for our two submissions for the CAGI6 SickKids challenge. We tackled the challenge by predicting the molecular events underlying disease from a patient's genome and transcriptome using variant annotation, aberrant gene expression events, and human phenotype ontology.

The code consists of four parts, described below:

  1. Detecting aberrant events in RNA-seq data using DROP
  2. Annotating and filtering variants
  3. Computing phenotypic similarity scores
  4. Prioritizing events using XGBoost

A detailed description of our full analysis can be found here.

Aberrant event detection in RNA-seq

We used DROP with its default configuration to call aberrant events. In a nutshell, to run the full pipeline: (i) install DROP through bioconda, (ii) put all relevant data into Data/project_data/raw/, and (iii) create a sample annotation in Data/project_data/sample_annotation.tsv. You can then run the full DROP pipeline with:

snakemake -j 20

The main pipeline configuration can be found here.
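As a rough illustration of step (iii), the sketch below builds a tab-separated sample annotation in Python. The column names are assumptions based on the DROP documentation and should be checked against your installed DROP version. The sample IDs, the BAM/VCF file naming scheme, and the group name `sickkids` are hypothetical.

```python
import csv
import io

# Assumed column names; verify them against the DROP documentation
# for the version you installed.
COLUMNS = ["RNA_ID", "RNA_BAM_FILE", "DNA_ID", "DNA_VCF_FILE", "DROP_GROUP", "HPO_TERMS"]


def build_sample_annotation(samples, raw_dir="Data/project_data/raw"):
    """Return the sample annotation as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for sample in samples:
        writer.writerow({
            "RNA_ID": sample["rna_id"],
            # Hypothetical naming scheme: one BAM / VCF per ID in the raw folder.
            "RNA_BAM_FILE": f"{raw_dir}/{sample['rna_id']}.bam",
            "DNA_ID": sample["dna_id"],
            "DNA_VCF_FILE": f"{raw_dir}/{sample['dna_id']}.vcf.gz",
            "DROP_GROUP": "sickkids",  # hypothetical group name
            "HPO_TERMS": ",".join(sample["hpo_terms"]),
        })
    return buffer.getvalue()


# Hypothetical example sample.
samples = [
    {"rna_id": "S1_rna", "dna_id": "S1_dna", "hpo_terms": ["HP:0001250", "HP:0001263"]},
]
annotation_tsv = build_sample_annotation(samples)
print(annotation_tsv)
```

Writing the returned text to Data/project_data/sample_annotation.tsv would give DROP a starting point. Any additional columns your DROP configuration expects still need to be added by hand.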

Variant annotation and filtering

As described in our method description, we used VEP to annotate the variants. In short, we kept all default VEP annotations, added allele frequencies from gnomAD as well as CADD, SpliceAI, and EVE scores, and included ClinVar and UTRannotator information. The respective configuration and scripts can be found here and here. After adapting the config to your local infrastructure and successfully running the DROP pipeline, you should be able to run the annotation with snakemake as follows:

snakemake -j 20 --snakefile Snakefile_vep_anno.smk
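To give an idea of what filtering on these annotations can look like, here is a minimal Python sketch. It is not the filter used in the repository: the `Variant` fields, the thresholds, and the rule itself (rare in gnomAD, and either a high CADD or a high SpliceAI score) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str
    gnomad_af: Optional[float]  # None when the variant is absent from gnomAD
    cadd: Optional[float]       # None when no CADD score is available
    spliceai: Optional[float]   # None when no SpliceAI score is available


# Illustrative thresholds only; these are assumptions, not the values we used.
MAX_AF = 0.01
MIN_CADD = 20.0
MIN_SPLICEAI = 0.5


def is_candidate(variant: Variant) -> bool:
    """Keep rare variants with at least one deleteriousness signal."""
    allele_frequency = 0.0 if variant.gnomad_af is None else variant.gnomad_af
    if allele_frequency > MAX_AF:
        return False
    high_cadd = variant.cadd is not None and variant.cadd >= MIN_CADD
    high_spliceai = variant.spliceai is not None and variant.spliceai >= MIN_SPLICEAI
    return high_cadd or high_spliceai


def filter_variants(variants: List[Variant]) -> List[Variant]:
    return [v for v in variants if is_candidate(v)]


# Hypothetical annotated variants.
variants = [
    Variant(gene="GENE_A", gnomad_af=None, cadd=28.1, spliceai=0.02),     # rare, high CADD
    Variant(gene="GENE_B", gnomad_af=0.12, cadd=31.0, spliceai=0.90),     # too common
    Variant(gene="GENE_C", gnomad_af=0.0004, cadd=5.3, spliceai=0.81),    # rare, splice signal
    Variant(gene="GENE_D", gnomad_af=0.0001, cadd=3.0, spliceai=None),    # rare, no signal
]
kept_genes = [v.gene for v in filter_variants(variants)]
print(kept_genes)
```

Treating a missing gnomAD frequency as zero is a design choice of this sketch: variants never observed in gnomAD are assumed to be rare rather than dropped.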

Phenotypic similarity scores

We computed the phenotypic similarity scores as described by Kopajtich et al. A more detailed description can also be found in our Methods section. The scripts to run this step can be found here.
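For readers new to ontology-based similarity, the following Python sketch shows the general idea on a toy ontology. It is deliberately not the measure described by Kopajtich et al.: it uses a plain Jaccard index over ancestor-closed term sets, and the ontology, term names, and gene annotations are all made up for illustration.

```python
# Toy ontology: each term maps to its parent terms. These are made-up
# terms, not real HPO identifiers.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "seizure": ["neuro"],
    "ataxia": ["neuro"],
    "muscle": ["root"],
    "myopathy": ["muscle"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def closure(terms):
    """Union of the ancestor sets of all given terms."""
    closed = set()
    for term in terms:
        closed |= ancestors(term)
    return closed


def jaccard_similarity(terms_a, terms_b):
    """Jaccard index of two ancestor-closed term sets (0.0 if both are empty)."""
    closed_a, closed_b = closure(terms_a), closure(terms_b)
    union = closed_a | closed_b
    return len(closed_a & closed_b) / len(union) if union else 0.0


# Hypothetical patient phenotype and gene annotations.
patient_terms = ["seizure"]
gene_terms = {"GENE_A": ["ataxia"], "GENE_B": ["myopathy"]}

scores = {
    gene: jaccard_similarity(patient_terms, terms)
    for gene, terms in gene_terms.items()
}
print(scores)
```

In this toy case GENE_A scores higher than GENE_B because its annotation shares the `neuro` branch with the patient's phenotype, while GENE_B only shares the root term.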

Prioritizing events using XGBoost

For the final submission to the SickKids challenge, we used XGBoost to predict the disease-causing gene from an individual's HPO terms, genetic information, and RNA-seq-based aberrant events. The code for our model can be found here and here. The model can be trained once the RNA-seq outliers are called, the variants are annotated, filtered, and preprocessed, and the phenotypic similarity scores are calculated.
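The sketch below illustrates how the three inputs could be combined into one feature row per sample and gene before training. It is an assumption-laden illustration, not the repository's preprocessing: the feature names, the input dictionaries, and the default values for missing evidence are all hypothetical.

```python
def build_feature_rows(outliers, max_cadd, pheno_similarity):
    """Combine the three evidence sources into one row per (sample, gene).

    outliers:          set of (sample, gene) pairs called as RNA-seq outliers
    max_cadd:          dict mapping (sample, gene) to the highest CADD score
                       among the filtered variants in that gene
    pheno_similarity:  dict mapping (sample, gene) to a phenotypic similarity
    """
    # A gene becomes a candidate if it has an outlier call or a filtered variant.
    candidates = sorted(set(outliers) | set(max_cadd))
    rows = []
    for sample, gene in candidates:
        rows.append({
            "sample": sample,
            "gene": gene,
            "is_expression_outlier": int((sample, gene) in outliers),
            # Missing evidence defaults to 0.0 (an assumption of this sketch).
            "max_cadd": max_cadd.get((sample, gene), 0.0),
            "pheno_similarity": pheno_similarity.get((sample, gene), 0.0),
        })
    return rows


# Hypothetical inputs for a single sample.
outliers = {("S1", "GENE_A")}
max_cadd = {("S1", "GENE_A"): 27.4, ("S1", "GENE_C"): 22.0}
pheno_similarity = {("S1", "GENE_A"): 0.5}

feature_rows = build_feature_rows(outliers, max_cadd, pheno_similarity)
for row in feature_rows:
    print(row)
```

The numeric columns of such rows, together with a label marking the known causal gene per solved sample, are the kind of table a gradient-boosted classifier such as XGBoost can be trained on. Genes can then be ranked within each sample by predicted probability.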

Disclaimer

This code was put together for the CAGI6 SickKids challenge and is not production-ready. This repository is meant to complement our method description and to help others get started. If you have any questions about the model or code, please open a new issue.
