DeepCE - A novel and robust deep learning framework for high-throughput mechanism-driven phenotype compound screening and its application to COVID-19 drug repurposing


Code by Thai-Hoang Pham at Ohio State University.

1. Introduction

DeepCE is a Python implementation of a mechanism-driven neural network model. It captures high-dimensional associations among biological features, as well as non-linear relationships between biological features and outputs, to predict gene expression profiles for a given chemical compound.

DeepCE achieves state-of-the-art results in predicting gene expression profiles compared to other models, not only in the de novo chemical setting but also in the traditional imputation setting. More importantly, DeepCE is shown to be effective on a challenging and urgent problem: finding treatments for COVID-19. In summary, DeepCE could be a powerful tool for phenotype-based compound screening.

2. Pipeline

Figure 1: General framework for training DeepCE for L1000 gene expression profile prediction and using it for a downstream application (i.e. drug repurposing). The objective of the learning process is to minimize the loss between the predicted profiles and the ground-truth profiles in the L1000 dataset. After training, DeepCE is used to generate profiles for new chemicals in external molecular databases (e.g. DrugBank, ChEMBL). These profiles are then used for in silico screening to find potential drugs for disease treatment.

3. DeepCE

Figure 2: Overall architecture of DeepCE

4. Installation

DeepCE depends on NumPy, SciPy, PyTorch (plus the CUDA toolkit if you use a GPU), scikit-learn, and RDKit. These must be installed before using DeepCE.

The simplest way to install them is with conda:

	$ conda install numpy scipy scikit-learn rdkit pytorch
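
Before running any of the scripts, it can help to confirm that the dependencies are visible to the Python interpreter. The following is a minimal sketch, not part of DeepCE: the helper name `find_missing` is hypothetical, and the only assumption about the packages is their usual import names (PyTorch is imported as `torch`, scikit-learn as `sklearn`).

```python
import importlib.util

# Import names for the dependencies listed above; note that PyTorch is
# imported as "torch" and scikit-learn as "sklearn".
REQUIRED_MODULES = ["numpy", "scipy", "torch", "sklearn", "rdkit"]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All DeepCE dependencies are importable.")
```

The check only locates the packages and does not import them, so it does not verify that a GPU build of PyTorch can actually reach the CUDA toolkit.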

5. Usage

5.1. Data

The datasets used to train DeepCE are located in the folder DeepCE/data/

5.2. Training DeepCE

The training script for DeepCE is located in the folder script/

    $ cd script
    $ bash train_deepce.sh

Arguments in this script:

  • --drug_file: path for SMILES representation file
  • --gene_file: path for L1000 gene feature file
  • --train_file: path for L1000 gene expression training data
  • --dev_file: path for L1000 gene expression development data
  • --test_file: path for L1000 gene expression testing data
  • --dropout: dropout value used in DeepCE
  • --batch_size: batch size value for each training step
  • --max_epoch: maximum number of training epochs
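
To make the shape of these arguments concrete, here is a minimal sketch of a command-line parser that accepts the same flags. It is an illustration only and not the parser used by DeepCE: the function name `build_parser` and the argument types are assumptions inferred from the descriptions above, and no default values are implied.

```python
import argparse


def build_parser():
    """Build a parser for the flags listed above (types are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the DeepCE training arguments"
    )
    # File paths are kept as plain strings.
    parser.add_argument("--drug_file", help="path for SMILES representation file")
    parser.add_argument("--gene_file", help="path for L1000 gene feature file")
    parser.add_argument("--train_file", help="path for L1000 training data")
    parser.add_argument("--dev_file", help="path for L1000 development data")
    parser.add_argument("--test_file", help="path for L1000 testing data")
    # Numeric hyperparameters; float/int types are assumed, not confirmed.
    parser.add_argument("--dropout", type=float, help="dropout value")
    parser.add_argument("--batch_size", type=int, help="batch size per step")
    parser.add_argument("--max_epoch", type=int, help="maximum training epochs")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

The values actually used for training are the ones set inside train_deepce.sh, so that script remains the reference for paths and hyperparameters.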

5.3. COVID-19 drug repurposing by scanning all drugs in DrugBank

Besides the DeepCE source code, we also release the chemical-induced gene expression profiles generated by DeepCE for 8 cell lines (A375, A549, HA1E, HELA, HT29, MCF7, PC3, and YAPC) for all drugs in DrugBank (i.e. 11179 drugs), together with COVID-19 patients' gene expression profiles. Drug repurposing for COVID-19 can be conducted by comparing the chemical-induced gene expression profiles with the COVID-19 patients' gene expression profiles. We hope that this dataset can make a significant contribution to drug discovery and development in particular, and to computational chemistry and biology research in general.
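
The comparison between drug-induced and patient profiles can be illustrated with a small sketch. Everything specific here is an assumption rather than the method used by DeepCE: the score is a plain Pearson correlation, drugs whose profiles are most negatively correlated with the patient profile are ranked first (on the idea that such a drug might reverse the disease signature), and the function names and toy numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_drugs(drug_profiles, patient_profile):
    """Sort drug names from most negatively to most positively correlated."""
    scores = {
        name: pearson(profile, patient_profile)
        for name, profile in drug_profiles.items()
    }
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Toy profiles over four genes; real profiles cover many more genes.
    patient = [1.0, 2.0, 3.0, 4.0]
    drugs = {
        "drug_a": [4.0, 3.0, 2.0, 1.0],
        "drug_b": [1.0, 2.0, 3.0, 4.0],
        "drug_c": [1.0, 3.0, 2.0, 4.0],
    }
    print(rank_drugs(drugs, patient))
```

A ranking like this would be computed separately for each cell line, since the released profiles are cell-line specific.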

The script for downloading these chemical-induced gene expression profiles and the COVID-19 patients' gene expression profiles is located in the folder script/

    $ cd script
    $ bash get_gene_expression_data.sh

The downloaded dataset will be located in the folder DeepCE/data/covid_data/

After downloading this dataset, users can generate potential drugs for COVID-19 by running the script located in the folder script/ as follows:

    $ cd script
    $ bash covid_drug_repurposing.sh

Arguments in this script:

  • --data_dir: path for COVID-19 data folder
  • --patient_file: COVID-19 patient gene expression file name
  • --num_cell: minimum number of cell lines in which a drug must appear on the top-ranked list
  • --top: number of top-ranked drugs kept in the list for each cell line
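
One way to read the --top and --num_cell arguments is as a consensus filter across cell lines: keep the first `top` drugs of each cell line's ranked list, then report the drugs that appear in at least `num_cell` of those lists. The sketch below illustrates that reading only. It is based on the argument descriptions above rather than on the script's code, and the function name `consensus_drugs` and the drug identifiers are hypothetical.

```python
from collections import Counter


def consensus_drugs(ranked_by_cell, top, num_cell):
    """Return drugs found in the top `top` of at least `num_cell` cell lines.

    `ranked_by_cell` maps a cell line name to a list of drug names that is
    already sorted from best to worst candidate.
    """
    counts = Counter()
    for ranking in ranked_by_cell.values():
        # A set guards against counting a drug twice within one cell line.
        counts.update(set(ranking[:top]))
    return sorted(name for name, count in counts.items() if count >= num_cell)


if __name__ == "__main__":
    rankings = {
        "A375": ["d1", "d2", "d3"],
        "A549": ["d2", "d1", "d3"],
        "MCF7": ["d3", "d1", "d2"],
    }
    print(consensus_drugs(rankings, top=2, num_cell=2))
```

Raising `num_cell` makes the filter stricter, while raising `top` makes it more permissive, so the two values trade off the length of the candidate list against agreement between cell lines.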

6. Contact

Thai-Hoang Pham < [email protected] >

Department of Computer Science and Engineering, Ohio State University, USA
