zqgao22 / high-ppi

License: MIT License

Python 100.00%

high-ppi's Issues

Questions about the importance of residues

Can you provide the relevant code on how to get the importance of the residues? Thank you.

Question of ppi label

Thanks for your fantastic work. I have a question.

In Suppl. Data 2, the ground truth label of the PPI "9606.ENSP00000254722 9606.ENSP00000261349" (index 748) is "Reaction Binding Ptmod Activation Inhibition Catalysis Expression = 1 1 0 0 0 1 0". However, the interaction type of "9606.ENSP00000254722 9606.ENSP00000261349" in the file you also provided (https://drive.google.com/file/d/1CtS2V52lCG0bEjss0MguesJq19ZZ2LCZ/view?usp=drive_link) is "9606.ENSP00000254722 9606.ENSP00000261349 inhibition inhibition f f 800". The two files give different labels for the same PPI. Is there another definition of the PPI label?

Thanks.
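
To make the comparison reproducible, here is a minimal sketch of how a multi-hot label could be derived from rows of the interaction file. It assumes the third whitespace-separated column is the interaction mode and that all rows for the same unordered pair are merged; the function name `multi_hot_labels` is hypothetical and this is not confirmed to be how the repository builds its labels. The class order is the one quoted in the question.

```python
from collections import defaultdict

# Class order as quoted in the issue text.
CLASSES = ["reaction", "binding", "ptmod", "activation",
           "inhibition", "catalysis", "expression"]

def multi_hot_labels(rows):
    """Merge every interaction mode listed for each unordered protein pair."""
    labels = defaultdict(lambda: [0] * len(CLASSES))
    for row in rows:
        fields = row.split()
        a, b, mode = fields[0], fields[1], fields[2]  # assumption: column 3 is the mode
        key = tuple(sorted((a, b)))  # treat (a, b) and (b, a) as one pair
        if mode in CLASSES:
            labels[key][CLASSES.index(mode)] = 1
    return dict(labels)

rows = ["9606.ENSP00000254722 9606.ENSP00000261349 inhibition inhibition f f 800"]
pair = ("9606.ENSP00000254722", "9606.ENSP00000261349")
print(multi_hot_labels(rows)[pair])
```

Run on the single row quoted above, this sets only the inhibition position, which differs from the "1 1 0 0 0 1 0" label in Suppl. Data 2. Running it over the full file would show whether other rows for the same pair account for the difference.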

About vec5_CTC.txt: what is the basis for determining these vectors?

Thank you for making this project public.
I have some questions about it:

In the vec5_CTC.txt file, each amino acid has a corresponding vector. What is the basis for determining these vectors? Or do they follow the conventions of a previous project?
Why not use protein language models to get the amino acid embeddings?

How to generate the Fig.2a?

I am primarily focused on a specific PR-curve. PPI presents a multi-label problem, and to the best of my knowledge, it's standard to draw a PR-curve for each class.
However, in Fig.2a, a model is represented by a single line. How is the transition made from a multi-label PR-curve to a single curve?
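
One common way to collapse a multi-label problem into a single PR curve is micro-averaging: every (sample, class) cell is pooled into one long binary problem and a single threshold is swept over all of them. The sketch below illustrates that idea in plain Python. Whether Fig.2a was produced this way is an assumption, and the function name `micro_pr_curve` is hypothetical.

```python
def micro_pr_curve(y_true, y_score):
    """Pool all (sample, class) cells and sweep one threshold over them.

    y_true:  list of rows of 0/1 labels, one column per class
    y_score: list of rows of predicted scores, same shape
    Returns (recall, precision) points, highest score first.
    Tied scores are not grouped, to keep the sketch short.
    """
    flat_true = [t for row in y_true for t in row]
    flat_score = [s for row in y_score for s in row]
    order = sorted(range(len(flat_score)), key=lambda i: -flat_score[i])
    total_pos = sum(flat_true)  # assumed to be non-zero
    tp = fp = 0
    points = []
    for i in order:
        if flat_true[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points

# Two samples, two classes.
y_true = [[1, 0], [0, 1]]
y_score = [[0.9, 0.2], [0.6, 0.8]]
points = micro_pr_curve(y_true, y_score)
```

The alternative, macro-averaging, would compute one curve per class and average them, which weights rare interaction types more heavily.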

error of generate_adj.py and generate_feat.py

Hello! I have a problem when running generate_adj.py and generate_feat.py. They fail with:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
Thanks for your reply.
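
This error message is what NumPy 1.24 and later raise when `np.array` is given a list of arrays with different shapes; older versions only warned and silently built an object array. A minimal sketch of a possible workaround, assuming the scripts collect one matrix per protein into a list (the variable names here are hypothetical, not taken from the repository):

```python
import numpy as np

# Two hypothetical adjacency matrices for proteins with 3 and 5 residues.
adj_list = [np.zeros((3, 3)), np.zeros((5, 5))]

try:
    np.array(adj_list)  # raises ValueError on NumPy >= 1.24
except ValueError as err:
    print("ragged input rejected:", err)

# Build the container explicitly as an object array, one matrix per slot.
adj_all = np.empty(len(adj_list), dtype=object)
for i, adj in enumerate(adj_list):
    adj_all[i] = adj
```

An object array saved with `np.save` has to be read back with `np.load(path, allow_pickle=True)`. Pinning NumPy to a version below 1.24 is another option if changing the scripts is not desirable.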

Questions about the Data Processing for New Datasets

Hello Ziqi, we have some questions about the data processing for new datasets.

We used the IDs in ensp_uniprot.txt to download native protein structures, but over 600 of them cannot be found. Could you please provide the source of the native protein structures, or describe the data processing in detail? Thanks.
We used "generate_adj.py" and "generate_feat.py" to process the downloaded native protein structures, but found that their outputs did not match. Is there a problem that needs to be fixed? Thank you.

environment.yml

Hi! I ran conda env create -f environment.yml and found that version incompatibilities prevent the virtual environment from being created. Can you provide a solution?

Environment not working

Hi, Ziqi. I've been trying to use the environment.yml file to create my Conda environment. However, it fails because some packages conflict with each other. I was wondering if you would mind providing another, more basic environment.yml file.

F1

Hello! I am wondering why the experiment gives different results on every run even though a random seed is set. Because of this, I cannot reproduce your best metrics.
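
For anyone debugging this, a minimal sketch of seeding every random number generator a PyTorch training script typically touches. The helper name `set_seed` is hypothetical and the repository may already do part of this. NumPy and PyTorch are imported only if installed.

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed Python, NumPy and PyTorch RNGs (the latter two only if available)."""
    random.seed(seed)
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Ask cuDNN for deterministic kernels and disable autotuning.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

set_seed(0)
first = random.random()
set_seed(0)
second = random.random()
```

Even with all seeds fixed, some GPU operations (for example scatter-style reductions commonly used in graph neural network layers) are not deterministic, so small run-to-run differences can remain.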

Questions about the gnn_models_sag.py

Hello Ziqi, I checked the gnn_models_sag.py file and observed that the model architectures are quite different from the models depicted in Supplementary Fig. 1 of the paper.

Regarding BGNN, several discrepancies are present, including the feature dimension, activation function type, GCN block count, the order of activation function and batch normalization layers, and additional GCNConv modules following SAG pooling, which contradicts Supplementary Fig. 1. Similarly, in TGNN, replacing the feature_fusion argument with 'concat' (i.e., concatenation used in the paper) is insufficient, and I must modify the feature dimension of fc2. I may have overlooked other discrepancies.

Based on the above findings, it appears that there may be a final version of the model structure. Would it be possible for you to release the final version of the gnn_models_sag.py file so that we can directly implement the best model architecture as described in the paper? Thank you!

Question about how to obtain the PDB files.

Hi~

Thank you for releasing this great work! I need some help with obtaining the PDB files that are needed.

Because I am not familiar with the PDB website, any suggestions would be helpful. Taking the SHS27k dataset as an example, I typed "SHS27k" into the search box, and it returned nothing.

Could you please suggest a way to get the right PDB files for the SHS27k dataset? Thank you in advance!
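
SHS27k is the name of a PPI dataset rather than a PDB entry, which is likely why the search returns nothing: structures have to be looked up protein by protein. Once a four-character PDB ID is known for a protein, the file can be fetched from the RCSB download server. The sketch below assumes that mapping from protein to PDB ID has already been done, which is not covered here; the helper names and the placeholder ID are hypothetical.

```python
import urllib.request
from pathlib import Path

def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the RCSB download URL for a four-character PDB ID."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"

def download_pdb(pdb_id: str, out_dir: str = "pdb_files") -> Path:
    """Download one PDB file into out_dir and return its local path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{pdb_id.upper()}.pdb"
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), target)
    return target

# "1abc" is only a placeholder ID used to show the URL shape.
example_url = rcsb_pdb_url("1abc")
```

Calling `download_pdb` in a loop over the mapped IDs would collect the files locally. Proteins without an experimentally determined structure will not have a PDB entry at all.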

Could you provide PDBs of SHS27k dataset?

Questions about the Metrictor_PPI function

I would like to express my gratitude for making the code open-source. I have a question regarding the Metrictor_PPI function in HIGH-PPI/utils.py and I was hoping you could assist me.

It seems that the arguments of the sklearn precision_recall_curve function are reversed. Could you kindly clarify whether this is the case?
Also, self.pre holds binary (0/1) labels instead of the predicted probability $P(y=1)$; see line 150 in HIGH-PPI/model_train.py. I am concerned that this could affect the correctness of the AUPRC.
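
For reference, sklearn's `precision_recall_curve` takes the true labels as its first argument and the scores as its second. The sketch below shows why the second point matters, using a small pure-Python stand-in (the function `pr_points` is hypothetical, not the repository's code): once scores are thresholded into 0/1 predictions, only two distinct thresholds remain, so the curve collapses to two points and the area under it no longer reflects the ranking quality.

```python
def pr_points(y_true, y_score):
    """One (recall, precision) point per distinct score threshold."""
    total_pos = sum(y_true)  # assumed to be non-zero
    points = []
    for thr in sorted(set(y_score), reverse=True):
        pred = [s >= thr for s in y_score]
        tp = sum(1 for p, t in zip(pred, y_true) if p and t)
        points.append((tp / total_pos, tp / sum(pred)))
    return points

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.6, 0.55, 0.1]                   # predicted probabilities
y_hard = [1 if p > 0.5 else 0 for p in y_prob]   # thresholded predictions

curve_from_probs = pr_points(y_true, y_prob)
curve_from_hard = pr_points(y_true, y_hard)
print(len(curve_from_probs), len(curve_from_hard))
```

Passing the probabilities keeps one point per distinct score, which is what an area-under-curve metric needs.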

How do you define "all_assign.txt" and "vec5_CTC.txt"?

How are these two files made? Are they based on biological knowledge?