HIGH-PPI

Hierarchical Graph Learning for Protein-Protein Interaction

Dependencies

HIGH-PPI runs on Python 3.7 to 3.9. To install all dependencies, run:

cd HIGH-PPI-main
conda env create -f environment.yml
conda activate HIGH-PPI

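Download the wheel (.whl) files for the following packages to ./file/: torch-scatter, torch-sparse, torch-cluster, torch-spline-conv. The filenames used below are builds for Python 3.9 (cp39) on Linux x86_64; if your interpreter or platform differs, download the matching wheels and adjust the filenames accordingly.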

cd ./file
pip install torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
pip install torch_sparse-0.6.13-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.6.0-cp39-cp39-linux_x86_64.whl
pip install torch_spline_conv-1.2.1-cp39-cp39-linux_x86_64.whl
pip install torch-geometric
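
A wheel only installs into an interpreter that matches its Python tag, so it can help to compare the tag in the filename with the running interpreter before calling pip, and to confirm afterwards that the packages can be found. The following is a minimal sketch; the helper names (`wheel_python_tag`, `interpreter_tag`, `missing_modules`) are hypothetical and not part of HIGH-PPI.

```python
import importlib.util
import sys

# Import names of the packages installed above (as used in `import ...`).
REQUIRED_MODULES = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
]


def wheel_python_tag(wheel_name):
    """Return the Python tag (e.g. 'cp39') from a wheel filename."""
    # Wheel names end with ...-<python tag>-<abi tag>-<platform tag>.whl
    stem = wheel_name[:-len(".whl")] if wheel_name.endswith(".whl") else wheel_name
    return stem.split("-")[-3]


def interpreter_tag(version_info=None):
    """Return the CPython tag for an interpreter version, e.g. 'cp39'."""
    major, minor = (version_info or sys.version_info)[:2]
    return "cp{}{}".format(major, minor)


def missing_modules(names=REQUIRED_MODULES):
    """List the top-level modules in `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    wheel = "torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl"
    print("wheel tag:", wheel_python_tag(wheel), "| interpreter:", interpreter_tag())
    print("missing modules:", missing_modules())
```

If the two tags differ, pip will refuse the wheel; an empty "missing modules" list after installation indicates that all six packages are importable by name.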

Datasets

Three datasets (SHS27k, SHS148k and STRING) can be downloaded from Google Drive:

  • protein.actions.SHS27k.STRING.pro2.txt: PPI network of SHS27k
  • protein.SHS27k.sequences.dictionary.pro3.tsv: Protein sequences of SHS27k
  • protein.actions.SHS148k.STRING.txt: PPI network of SHS148k
  • protein.SHS148k.sequences.dictionary.tsv: Protein sequences of SHS148k
  • 9606.protein.action.v11.0.txt: PPI network of STRING
  • protein.STRING_all_connected.sequences.dictionary.tsv: Protein sequences of STRING
  • edge_list_12: Adjacency matrix for all proteins in SHS27k
  • x_list: Feature matrix for all proteins in SHS27k
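
To get a quick feel for a PPI network file, one can count how many rows fall under each interaction type. The sketch below assumes the file is tab-separated with a header row that contains a `mode` column, as in STRING's protein actions export; this layout is an assumption and should be checked against the downloaded file. The function name `count_interaction_modes` and the sample rows are made up for illustration.

```python
import csv
from collections import Counter


def count_interaction_modes(lines, mode_column="mode"):
    """Count rows per interaction type in tab-separated text.

    `lines` is any iterable of strings, e.g. an open file handle.
    The column name is an assumption about the file layout.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[mode_column] for row in reader)


# Made-up rows for illustration only; real files have more columns.
sample = [
    "item_id_a\titem_id_b\tmode\n",
    "P1\tP2\tbinding\n",
    "P1\tP3\tbinding\n",
    "P2\tP3\treaction\n",
]
print(count_interaction_modes(sample))

# With a real file:
# with open("protein.actions.SHS27k.STRING.pro2.txt", newline="") as handle:
#     print(count_interaction_modes(handle))
```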

PPI Prediction

Example: predicting unknown PPIs in the SHS27k dataset with native structures:

Using Processed Data for SHS27k Dataset

Download protein.actions.SHS27k.STRING.pro2.txt, protein.SHS27k.sequences.dictionary.pro3.tsv, edge_list_12, x_list and vec5_CTC.txt to ./HIGH-PPI-main/protein_info/.
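
Before training, it may be worth confirming that all five files are in place. The sketch below is one way to do that; `missing_inputs` is a hypothetical helper, not part of HIGH-PPI. Because the commands later in this README refer to `x_list.pt` and `edge_list_12.npy`, the check accepts either the bare name or the name followed by an extension.

```python
from pathlib import Path

# File names as listed in the download step above.
REQUIRED_FILES = [
    "protein.actions.SHS27k.STRING.pro2.txt",
    "protein.SHS27k.sequences.dictionary.pro3.tsv",
    "edge_list_12",
    "x_list",
    "vec5_CTC.txt",
]


def missing_inputs(directory, names=REQUIRED_FILES):
    """Return the names that have no matching file in `directory`.

    A name matches a file called exactly `name` or `name.<extension>`.
    """
    directory = Path(directory)
    present = {p.name for p in directory.iterdir()} if directory.is_dir() else set()
    missing = []
    for name in names:
        if not any(f == name or f.startswith(name + ".") for f in present):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_inputs("./protein_info"))
```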

Data Processing for New Datasets (if applicable)

Prepare all related PDB files. Native protein structures can be downloaded in batches from the RCSB PDB; predicted protein structures, which may contain errors, can be downloaded from the AlphaFold database. Put all of the PDB files in ./protein_info/.

Generate adjacency matrix with native PDB files:

python ./protein_info/generate_adj.py --distance 12
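
The `--distance 12` option and the `edge_list_12` filename suggest a distance cutoff for residue contacts. As an illustration of that idea only, the sketch below builds contact edges from one 3D coordinate per residue, assuming the cutoff is in ångströms and that two residues are connected when they are closer than the cutoff. This is not the code of generate_adj.py, whose exact atom selection and output format are not described here; `contact_edges` and the toy coordinates are made up.

```python
import math


def contact_edges(coords, cutoff=12.0):
    """Return undirected edges (i, j), i < j, for residue pairs closer than `cutoff`."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dx, dy, dz = (a - b for a, b in zip(coords[i], coords[j]))
            if math.sqrt(dx * dx + dy * dy + dz * dz) < cutoff:
                edges.append((i, j))
    return edges


# Made-up coordinates for three residues, for illustration only.
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_edges(toy))  # [(0, 1)]
```

Only the first two residues are within 12 units of each other, so the toy graph has a single edge. The pairwise loop is quadratic in the number of residues, which is acceptable for a sketch but slow for large batches.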

Generate feature matrix:

python ./protein_info/generate_feat.py

Training

To predict PPIs, train HIGH-PPI with the model_train.py script, which takes the following options:

  • ppi_path str, PPI network information
  • pseq_path str, Protein sequences
  • p_feat_matrix str, The feature matrix of all protein graphs
  • p_adj_matrix str, The adjacency matrix of all protein graphs
  • split str, Dataset split mode
  • save_path str, Path for saving models, configs and results
  • epoch_num int, Training epochs

python model_train.py --ppi_path ./protein_info/protein.actions.SHS27k.STRING.pro2.txt --pseq ./protein_info/protein.SHS27k.sequences.dictionary.pro3.tsv --split random --p_feat_matrix ./protein_info/x_list.pt --p_adj_matrix ./protein_info/edge_list_12.npy --save_path ./result_save --epoch_num 500
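
For repeated runs (for example, with different save paths or epoch counts), the same command can be assembled from a dictionary of options. This is a minimal sketch; `build_train_command` is a hypothetical helper, and the flag names and values are copied from the example command above rather than from the script's source.

```python
import shlex
import sys


def build_train_command(options, script="model_train.py"):
    """Turn an options mapping into an argv list for the training script."""
    argv = [sys.executable, script]
    for flag, value in options.items():
        argv += ["--" + flag, str(value)]
    return argv


# Flags and values as they appear in the example command.
options = {
    "ppi_path": "./protein_info/protein.actions.SHS27k.STRING.pro2.txt",
    "pseq": "./protein_info/protein.SHS27k.sequences.dictionary.pro3.tsv",
    "split": "random",
    "p_feat_matrix": "./protein_info/x_list.pt",
    "p_adj_matrix": "./protein_info/edge_list_12.npy",
    "save_path": "./result_save",
    "epoch_num": 500,
}
command = build_train_command(options)
print(" ".join(shlex.quote(part) for part in command))

# To launch training from Python:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.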

Testing

Run the model_test.py script to test HIGH-PPI with the following options:

  • ppi_path str, PPI network information
  • pseq_path str, Protein sequences
  • p_feat_matrix str, The feature matrix of all protein graphs
  • p_adj_matrix str, The adjacency matrix of all protein graphs
  • model_path str, Path for trained model
  • index_path str, Path for index being tested

python model_test.py --ppi_path ./protein_info/protein.actions.SHS27k.STRING.pro2.txt --pseq ./protein_info/protein.SHS27k.sequences.dictionary.pro3.tsv --p_feat_matrix ./protein_info/x_list.pt --p_adj_matrix ./protein_info/edge_list_12.npy --model_path ./result_save/gnn_training_seed_1/gnn_model_valid_best.ckpt --index_path ./train_val_split_data/train_val_split_1.json

Output

Running model_test.py produces the following outputs:

  • valid_label_list Real PPI labels for the test index
  • test_pre_result_list Predicted PPI results for the test index
  • best_f1 Overall performance in terms of best-F1 score
  • aupr Performance in terms of AUPR score for all seven PPI types (reaction, binding, ptmod, activation, inhibition, catalysis and expression)
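
To make the F1 figure concrete, the sketch below computes a micro-averaged F1 and a per-type F1 from multi-label rows. It assumes that labels and predictions are lists of 0/1 vectors with one entry per PPI type, in the order given in `PPI_TYPES`; both the layout and the ordering are assumptions, and the way model_test.py derives best_f1 and AUPR is not described here. All names and the toy data are made up for illustration.

```python
# Assumed column order; the real order used by HIGH-PPI may differ.
PPI_TYPES = ["reaction", "binding", "ptmod", "activation",
             "inhibition", "catalysis", "expression"]


def f1_score(tp, fp, fn):
    """F1 from true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(labels, preds):
    """Return (micro_f1, per_type_f1) for rows of 0/1 labels and predictions."""
    n = len(PPI_TYPES)
    tp, fp, fn = [0] * n, [0] * n, [0] * n
    for y_row, p_row in zip(labels, preds):
        for k in range(n):
            if p_row[k] and y_row[k]:
                tp[k] += 1
            elif p_row[k]:
                fp[k] += 1
            elif y_row[k]:
                fn[k] += 1
    micro = f1_score(sum(tp), sum(fp), sum(fn))
    per_type = {name: f1_score(tp[k], fp[k], fn[k])
                for k, name in enumerate(PPI_TYPES)}
    return micro, per_type


# Toy data: two protein pairs, seven types each.
labels = [[1, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0]]
preds = [[1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1]]
micro, per_type = multilabel_f1(labels, preds)
print(round(micro, 3), per_type["binding"])
```

Micro-averaging pools the counts over all seven types before computing F1, so frequent types weigh more than rare ones; the per-type values show where the errors come from.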

Contributors

zqgao22
