Drug-disease profile matching vs disease stratification

Multi-omics disease sub-type specific drug repositioning aided with expression signatures from ConnectivityMap.

Overview

Structured abstract

Background

Attempts to guide the selection of drug candidates with machine learning have increased in recent years. One popular approach is so-called "guilt-by-association" (GBA), which uses drug-drug and disease-disease similarity to guide the selection of drug candidates. Comparing gene expression profiles (perturbation profiles) obtained after treatment with candidate substances (perturbagens) is a well-established way to generate compound similarity maps for GBA methods. Large-scale projects, such as The Connectivity Map, offer ways of systematic perturbagen screening.

It was proposed that perturbation profiles may also be used to find drug candidates by matching them against differential expression profiles of diseases (a simpler alternative to advanced machine-learning methods). This approach is referred to as pattern matching or profile matching and, in the simplest setting, corresponds to searching for anticorrelation between drug and disease profiles.
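The simplest setting described above can be sketched in a few lines of Python. This is a minimal illustration and not the scoring code used in this repository: the function names, the gene identifiers and the profile values are all hypothetical, and Pearson correlation is assumed as the similarity measure.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (vx * vy) ** 0.5


def anticorrelation_score(disease_profile, drug_profile):
    """Score a drug against a disease; higher means stronger anticorrelation.

    Both profiles are dicts mapping gene -> differential expression value.
    Only genes present in both profiles are compared.
    """
    genes = sorted(set(disease_profile) & set(drug_profile))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    r = pearson(
        [disease_profile[g] for g in genes],
        [drug_profile[g] for g in genes],
    )
    return -r  # a drug reversing the disease signature scores close to +1


def rank_candidates(disease_profile, drug_profiles):
    """Return (drug, score) pairs sorted from best to worst candidate."""
    scores = {
        name: anticorrelation_score(disease_profile, profile)
        for name, profile in drug_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: drug_x reverses the disease signature, drug_y mimics it.
disease = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
drugs = {
    "drug_x": {"GENE_A": -2.0, "GENE_B": 1.0, "GENE_C": -0.5},
    "drug_y": {"GENE_A": 4.0, "GENE_B": -2.0, "GENE_C": 1.0},
}
ranking = rank_candidates(disease, drugs)
```

In this toy example the drug whose profile mirrors the disease profile with opposite sign ranks first. Real scoring functions typically operate on thousands of genes and may use rank-based or signature-based statistics instead of a plain correlation.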

Previous studies demonstrated the merits of multi-omics disease stratification by evaluating the predictive ability of novel clusters for cancer patient survival or by analyzing functional enrichment in the clusters.

Introduction

In this work, perturbagen-disease profile matching is applied to diseases and disease sub-types selected by multiple multi-omics stratification methods, in order to prioritize new drug repositioning candidates.

Multiple methods for matching perturbation profiles to disease expression (scoring functions) are evaluated and then applied to cancer cohorts with sufficient data.

Gene-set enrichment (GSE) based methods are hypothesized to provide an overall benefit by incorporating additional biological information and by making stringent significance estimates available.

Finally, a hypothesis is proposed that scoring functions may be used to recognize stratifications based on meaningful molecular clustering, using only the performance of drug indications-contraindications classification.

Materials and methods

Cancer data from The Cancer Genome Atlas are used, with an extensive case study on the breast carcinoma (BRCA) cohort, limited validation with prostate (PRAD) and stomach (STAD) adenocarcinomas, and a pan-cancer analysis.

Indications-contraindications classification performance is used to evaluate the scoring functions. The evaluation is performed on 16 scoring functions, six of which were proposed in previous works. Six scoring functions are chosen and applied in further analyses.
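One way to quantify such classification performance is the area under the ROC curve (AUC): the probability that a randomly chosen indicated drug receives a higher score than a randomly chosen contraindicated drug. The sketch below assumes AUC as the metric, which the text above does not specify; the function name and the scores are hypothetical.

```python
def roc_auc(indication_scores, contraindication_scores):
    """Rank-based AUC: the fraction of (indication, contraindication) pairs
    in which the indication scores higher. Ties count as half a win."""
    if not indication_scores or not contraindication_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in indication_scores:
        for neg in contraindication_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(indication_scores) * len(contraindication_scores))


# Hypothetical scores produced by one scoring function for one disease.
indications = [0.9, 0.4, 0.7]
contraindications = [0.1, 0.5]
auc = roc_auc(indications, contraindications)
```

An AUC of 0.5 corresponds to a scoring function that cannot tell indications from contraindications, and 1.0 to perfect separation. The pairwise loop is quadratic in the number of drugs, which is acceptable for a sketch but not for large screens.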

Performance of the scoring functions is compared using four previously published stratifications of breast cancer (including three based on multi-omics data): PAM50, iCluster, PARADIGM and Pan-Gyn.

Results

  • The ability of profile-matching approaches to recover known drugs (as previously reported) is confirmed.
  • A few previously unreported breast-cancer drug candidates are highlighted.
  • The advantages and disadvantages of the proposed use of indications-contraindications classification are discussed.
  • Multiple cancer drugs are noted to be known carcinogenic substances.
  • GSE-based methods require large numbers of samples and high-performance computing facilities, and may not increase the chances of drug recovery in certain circumstances.

While results obtained with meaningful stratifications do not always outperform random permutations, a limited benefit of stratification is observed for drug recovery performance, with promising results from the XSum and mROAST scoring functions.
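A comparison against random permutations can be sketched as an empirical permutation test: the stratification labels are shuffled many times, the performance measure is recomputed for each shuffle, and the observed value is compared with the resulting distribution. This is an illustrative sketch under stated assumptions and not the procedure implemented in this repository; the function names, the toy performance measure and the data are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(labels, performance, n_permutations=1000, seed=0):
    """Empirical p-value for `performance(labels)` against shuffled labels.

    `performance` is any callable that maps a list of labels to a number,
    where higher is better. Shuffling keeps the group sizes unchanged.
    """
    rng = random.Random(seed)
    observed = performance(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if performance(shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself and avoid p = 0.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical toy measure: how far apart the group means of some
# per-sample value are under a given labelling.
values = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]


def group_separation(labels):
    group_a = [v for v, label in zip(values, labels) if label == "A"]
    group_b = [v for v, label in zip(values, labels) if label == "B"]
    return abs(mean(group_b) - mean(group_a))


true_labels = ["A", "A", "A", "B", "B", "B"]
p_value = permutation_p_value(true_labels, group_separation)
```

With only six samples, few distinct labellings exist, so the p-value cannot become very small; larger cohorts give a finer-grained null distribution.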

Although there is no definite evidence that multi-omics stratifications are superior for classifying drug indications and contraindications, two multi-omics stratifications are highlighted as performing better than the others: PARADIGM and Pan-Gyn.

Graphical results summary

Setup and requirements

Recommended packages for Ubuntu can be installed with:

bash ubuntu_setup.sh

Python 3.7 is recommended (minimum: CPython 3.6). To install the required Python packages, run:

pip3 install -r requirements.txt

R 3.5.1 is required; its dependencies can be installed with:

Rscript install.R

Finally, two major third-party applications (GSEA from the Broad Institute and a custom fork of cudaGSEA) can be installed with:

cd thirdparty
bash download.sh

cudaGSEA needs to be compiled with:

./thirdparty/cudaGSEA/cudaGSEA/src/compile

Testing

A limited number of unit tests are included. They can be run to verify the corresponding application fragments and the integrity of the installation with:

./run_tests.sh

Data

Each data source has a corresponding subdirectory (in the data directory) containing a download.sh script, which downloads the required data. For example, to download the TCGA data use:

bash data/tcga/download.sh

If you wish to reproduce only part of the findings, you may want to download only the required sources, because the files are large.
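Downloading only a chosen subset of sources can be scripted. The helper below is a hypothetical convenience wrapper that is not part of this repository: it assumes the data/<source>/download.sh layout described above and that it is run from the repository root. Only the tcga source name is taken from this README.

```python
import subprocess
from pathlib import Path


def download_script(source, data_dir="data"):
    """Path to the download script of one data source."""
    return Path(data_dir) / source / "download.sh"


def download_sources(sources, data_dir="data", dry_run=True):
    """Build (and optionally run) one `bash .../download.sh` command per source.

    With dry_run=True the commands are only returned, so they can be
    inspected before any large download starts.
    """
    commands = [
        ["bash", download_script(source, data_dir).as_posix()]
        for source in sources
    ]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing download.
            subprocess.run(command, check=True)
    return commands


# List the sources needed for the analysis of interest; "tcga" is the
# only name confirmed above.
commands = download_sources(["tcga"])
```

Calling download_sources(["tcga"], dry_run=False) would execute the same command as the bash example above.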

Acknowledgements

The cell, RNA, DNA and histone pictograms are derivative works based on graphics from the Reactome Icon Library (licensed under the Creative Commons Attribution 4.0 International License).

About

The code in this repository was written as part of an MRes research project at Imperial College London. The research was conducted under the supervision of Dr Paul-Michael Agapow.

