

DTINet: A Network Integration Approach for Drug-Target Interaction Prediction

DTINet is a computational pipeline for predicting novel drug-target interactions (DTIs) from a heterogeneous network. DTINet learns a low-dimensional feature vector for each node in the heterogeneous network and then predicts the likelihood of a new DTI from these representations via a vector space projection scheme. See our paper in Nature Communications and the preprint on bioRxiv (bioRxiv:100305).
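The projection scheme can be pictured as a bilinear score: if x_i is the feature vector of drug i, y_j that of protein j, and Z a projection matrix between the two feature spaces, the predicted score is x_i^T Z y_j. The sketch below is only an illustration under that assumption. The function name `dti_scores` and the random matrices are hypothetical, the dimensions 100 and 400 simply mirror the pre-trained feature file names, and Z is assumed to have been learned elsewhere (as far as this README indicates, that is the role of the IMC library).

```python
import numpy as np

def dti_scores(X, Y, Z):
    """Score every drug-protein pair with the bilinear form x_i^T Z y_j.

    X: (n_drugs, d_drug) drug feature vectors, one row per drug
    Y: (n_proteins, d_prot) protein feature vectors, one row per protein
    Z: (d_drug, d_prot) projection matrix (hypothetical; assumed already learned)
    Returns an (n_drugs, n_proteins) matrix of scores.
    """
    return X @ Z @ Y.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))         # 5 drugs with 100-dim vectors
Y = rng.normal(size=(8, 400))         # 8 proteins with 400-dim vectors
Z = 0.01 * rng.normal(size=(100, 400))
S = dti_scores(X, Y, Z)
print(S.shape)  # (5, 8)

# Candidate targets for drug 0, highest score first
ranking = np.argsort(-S[0])
```

Ranking each row of the score matrix gives a list of candidate targets per drug.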

Quick start

We provide an example script to run experiments on our dataset:

  • Run run_DTINet.m: predict drug-target interactions, and evaluate the results with cross-validation.

Note: See the "Tutorial" section below for detailed instructions on how to specify the parameters of DTINet or how to run DTINet on your own dataset.

Code and data

src/ directory

  • DTINet.m: predict drug-target interactions (DTIs)
  • DCA.m: compact feature learning by integrating the heterogeneous networks
  • diffusionRWR.m: network diffusion algorithm (random walk with restart)
  • compute_similarity.m: compute Jaccard similarity based on the interaction/association networks
  • auc.m: evaluation script
  • run_DCA.m: example code of running DCA.m for feature learning
  • run_DTINet.m: example code of running DTINet.m for drug-target prediction
  • train_mf.mexa64: pre-built binary of the inductive matrix completion (IMC) algorithm (downloaded from the IMC website)
  • download_imc.sh: downloads the IMC source code and builds the executable library from it.
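diffusionRWR.m and DCA.m are not reproduced here, but the idea behind them can be sketched: run a random walk with restart from every node to obtain its diffusion state, then compress the log-transformed diffusion states of several networks over the same nodes with a truncated SVD. This is a minimal sketch under stated assumptions. The names `rwr`, `dca_features`, `restart` and `max_iter` are hypothetical, the row-normalized update Q <- (1 - r) Q P + r I is one standard RWR formulation, and the SVD step is a common approximation of diffusion component analysis, not a transcription of DCA.m.

```python
import numpy as np

def rwr(A, restart=0.5, max_iter=50, tol=1e-8):
    """Random walk with restart started from every node at once.

    Row i of the result is the visiting distribution of a walker that starts
    at node i and jumps back to it with probability `restart` at each step.
    """
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    deg = A.sum(axis=1)
    A[np.diag_indices(n)] += (deg == 0)   # self-loop on isolated nodes keeps P row-stochastic
    P = A / A.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    I = np.eye(n)
    Q = I.copy()
    for _ in range(max_iter):
        Q_next = (1 - restart) * Q @ P + restart * I
        delta = np.abs(Q_next - Q).max()
        Q = Q_next
        if delta < tol:                   # stop once the diffusion states settle
            break
    return Q

def dca_features(networks, dim, restart=0.5, max_iter=50):
    """Low-dimensional node vectors from several networks over the same nodes.

    Diffusion states from each network are log-transformed, concatenated
    column-wise, and factorized with a truncated SVD.
    """
    blocks = []
    for A in networks:
        Q = rwr(A, restart, max_iter)
        alpha = 1.0 / Q.shape[0]          # small offset so log() is defined at zero entries
        blocks.append(np.log(Q + alpha) - np.log(alpha))
    M = np.hstack(blocks)
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :dim] * np.sqrt(s[:dim])

# Two toy networks over the same 6 nodes: a ring and a chain
ring = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
chain = np.diag(np.ones(5), 1)
chain = chain + chain.T
feats = dca_features([ring, chain], dim=3)
print(feats.shape)  # (6, 3)
```

Because P is row-stochastic and each update mixes a row-stochastic matrix with the identity, every row of the diffusion matrix keeps summing to 1.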

data/ directory

  • drug.txt: list of drug names
  • protein.txt: list of protein names
  • disease.txt: list of disease names
  • se.txt: list of side effect names
  • drug_dict_map: a complete ID mapping between drug names and DrugBank IDs
  • protein_dict_map: a complete ID mapping between protein names and UniProt IDs
  • mat_drug_se.txt : Drug-SideEffect association matrix
  • mat_protein_protein.txt : Protein-Protein interaction matrix
  • mat_protein_drug.txt : Protein-Drug interaction matrix
  • mat_drug_protein.txt : Drug-Protein interaction matrix (transpose of the above matrix)
  • mat_drug_protein_remove_homo.txt: Drug-Protein interaction matrix from which homologous proteins with sequence identity scores >40% were excluded (see the paper).
  • mat_drug_drug.txt : Drug-Drug interaction matrix
  • mat_protein_disease.txt : Protein-Disease association matrix
  • mat_drug_disease.txt : Drug-Disease association matrix
  • Similarity_Matrix_Drugs.txt : Drug similarity scores based on chemical structures of drugs
  • Similarity_Matrix_Proteins.txt : Protein similarity scores based on primary sequences of proteins

Note: drugs, proteins, diseases and side effects are organized in the same order across all files, including name lists, ID mappings and interaction/association matrices.
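Because all files share one ordering, row i of a drug matrix refers to line i of drug.txt. The sketch below assumes that the matrix files are whitespace-delimited numeric text and that the name lists hold one name per line; check this against the actual files. The helper names are hypothetical, and small in-memory strings stand in for drug.txt, protein.txt, mat_drug_protein.txt and mat_protein_drug.txt. It loads names and matrices, checks that the shapes agree, confirms the transpose relationship, and maps non-zero entries back to names.

```python
import io
import numpy as np

def load_names(f):
    """Read one name per line, ignoring blank lines."""
    return [line.strip() for line in f if line.strip()]

def load_matrix(f):
    """Read a whitespace-delimited numeric matrix (assumed file format)."""
    return np.loadtxt(f, ndmin=2)

def check_consistency(drugs, proteins, drug_protein, protein_drug):
    """Raise ValueError if shapes or the transpose relationship disagree."""
    if drug_protein.shape != (len(drugs), len(proteins)):
        raise ValueError("drug-protein matrix must be n_drugs x n_proteins")
    if not np.array_equal(drug_protein, protein_drug.T):
        raise ValueError("protein-drug matrix must be the transpose of drug-protein")

drugs = load_names(io.StringIO("DrugA\nDrugB\n"))
proteins = load_names(io.StringIO("P1\nP2\nP3\n"))
dp = load_matrix(io.StringIO("1 0 0\n0 1 1\n"))
pd = load_matrix(io.StringIO("1 0\n0 1\n0 1\n"))
check_consistency(drugs, proteins, dp, pd)

# Shared ordering: row i is drugs[i], column j is proteins[j]
pairs = [(drugs[i], proteins[j]) for i, j in zip(*np.nonzero(dp))]
print(pairs)  # [('DrugA', 'P1'), ('DrugB', 'P2'), ('DrugB', 'P3')]
```

With real files, pass open file handles or paths from the data/ folder instead of the `io.StringIO` stand-ins.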

feature/ directory

We provide the pre-trained vector representations of drugs and proteins that were used to produce the results in our paper.

  • drug_vector_d100.txt
  • protein_vector_d400.txt

Third-party software

Our implementation requires the Inductive Matrix Completion (IMC) library. For convenience, we provide a pre-built executable binary in the src/ folder. It was built on a standard 64-bit Ubuntu 14.04 system. If you are using another Linux platform, please consider building the library from source by running bash install_imc.sh.

Tip: We recommend installing the IMC library with the install_imc.sh script. If you download the library yourself from the IMC website, note that DTINet requires the C/C++ version (with Python and MATLAB interfaces). Do not use the other, pure MATLAB implementation: it treats unknown/missing entries in the interaction matrix as zeros, which is not what DTINet requires.
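The difference between training on observed entries only and zero-filling the unknowns can be made concrete with a small gradient-descent sketch of inductive matrix completion. It fits R ≈ X W H^T Y^T using only the entries where a mask is 1; passing a mask of all ones reproduces the zero-filling behavior the tip warns against. This is a hypothetical illustration (`fit_imc`, the learning rate and the regularization weight are assumptions), not the algorithm or interface of the C/C++ IMC library.

```python
import numpy as np

def fit_imc(R, mask, X, Y, rank=3, lam=0.1, lr=1e-3, n_iter=200, seed=0):
    """Fit R ~ X W H^T Y^T by gradient descent on observed entries only.

    R:    (n_drugs, n_proteins) values; entries with mask == 0 are ignored,
          but must still be finite (e.g. fill them with 0)
    mask: same shape, 1 for observed entries and 0 for unknown/missing ones
    X, Y: drug and protein feature matrices (rows are nodes)
    Returns W, H and the loss recorded before each update.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.normal(size=(X.shape[1], rank))
    H = 0.1 * rng.normal(size=(Y.shape[1], rank))
    losses = []
    for _ in range(n_iter):
        A, B = X @ W, Y @ H              # drugs and proteins in the shared latent space
        E = mask * (A @ B.T - R)         # residual, zeroed at unknown entries
        losses.append(0.5 * np.sum(E ** 2)
                      + 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2)))
        grad_W = X.T @ E @ B + lam * W   # gradient of the masked, regularized loss
        grad_H = Y.T @ E.T @ A + lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H, losses

rng = np.random.default_rng(1)
X, Y = rng.normal(size=(20, 5)), rng.normal(size=(15, 4))
R = (rng.random((20, 15)) < 0.2).astype(float)
mask = (rng.random((20, 15)) < 0.5).astype(float)
W, H, losses = fit_imc(R, mask, X, Y)

# Zero-filling, which DTINet does not want, would instead be:
# fit_imc(R, np.ones_like(R), X, Y)
```

Since the residual is multiplied by the mask, the values stored at unknown positions have no influence on W and H; with an all-ones mask they are fitted as if they were true zeros.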

Tutorial

  1. Put interaction/association matrices in the data/ folder.
  2. Create a network/ folder under DTINet/ and run compute_similarity.m, which computes the Jaccard similarity of drugs and proteins based on the interaction/association matrices.
  3. Specify the parameters (number of dimensions of the feature vectors, restart probability, maximum number of iterations) and run run_DCA.m, which learns the feature vectors of drugs and proteins and saves them in the feature/ folder.
  4. Set the paths of the feature vectors and the corresponding parameters in run_DTINet.m and execute it. The script predicts drug-target interactions and evaluates the results using ten-fold cross-validation.
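Steps 2 and 4 rest on two standard computations: the Jaccard similarity between the partner sets of two entities, |N_i ∩ N_j| / |N_i ∪ N_j|, and the ROC AUC of scores on held-out pairs. The sketch below shows both, plus a random split of indices into ten folds. The function names are hypothetical, and how DTINet itself chooses negative pairs and assigns folds is not specified here, so this is an illustration rather than a re-implementation of compute_similarity.m or auc.m.

```python
import numpy as np

def jaccard_similarity(M):
    """Pairwise Jaccard similarity between the rows of a binary matrix.

    Rows are entities (e.g. drugs), columns are their interaction or
    association partners (e.g. side effects). Rows without partners score 0.
    """
    M = (np.asarray(M) > 0).astype(float)
    inter = M @ M.T                                  # |N_i ∩ N_j|
    sizes = M.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter  # |N_i ∪ N_j|
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 0.0)

def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def kfold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

S = jaccard_similarity([[1, 1, 0], [0, 1, 1], [0, 0, 0]])
print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]))  # 0.75
folds = kfold_indices(25)
```

In a cross-validation loop, each fold in turn would be held out for scoring while the remaining folds are used for training.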

Supplementary Information

supplementary/ directory

  • Supplementary_Data_1.xlsx: The top 150 novel drug-target interactions predicted by DTINet, trained on all drugs and targets that have at least one known interacting pair. Known drug-target pairs (the non-zero entries in the drug-target interaction matrix) and novel predicted DTIs that share homologous proteins (sequence identity scores >40%) with known DTIs were excluded from the list.
  • Supplementary_Data_2.xlsx: The entire list of novel drug-target interactions predicted by DTINet, trained on all drugs and targets that have at least one known interacting pair.
  • Supplementary_Data_3.xlsx: Examples of novel predictions that are supported by previously known evidence in the literature.

Citation

Luo, Y., Zhao, X., Zhou, J., Yang, J., Zhang, Y., Kuang, W., Peng, J., Chen, L. & Zeng, J. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature Communications 8 (2017). doi:10.1038/s41467-017-00680-8.

@article{Luo2017,
  author = {Yunan Luo and Xinbin Zhao and Jingtian Zhou and Jinglin Yang and Yanqing Zhang and Wenhua Kuang and Jian Peng and Ligong Chen and Jianyang Zeng},
  title = {A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information},
  doi = {10.1038/s41467-017-00680-8},
  url = {https://doi.org/10.1038/s41467-017-00680-8},
  year  = {2017},
  month = {sep},
  publisher = {Springer Nature},
  volume = {8},
  number = {1},
  journal = {Nature Communications}
}

Contacts

If you have any questions or comments, please feel free to email Yunan Luo (luoyunan[at]gmail[dot]com) and/or Jianyang Zeng (zengjy321[at]tsinghua[dot]edu[dot]cn).
