fouticus / pipgcn
Protein Interface Prediction using Graph Convolutional Networks
License: MIT License
I am looking for the script used to preprocess protein data, i.e. '.pdb' files. Could you point me to the exact file in the GitHub repo that performs this preprocessing?
Thank you for your help.
I am trying to determine how removing certain node features changes the predictive performance. Are the 70 node features in the array in the same order in which they are presented in Appendix A.1 (e.g. first 20 being PSSM features)?
I just wanted to check with you about the design of some of the layers in nn_components.py.
In the nn_components.py file, the dense() function both starts and ends with a call to the dropout layer, i.e.:
def dense(input, params, out_dims=None, dropout_keep_prob=1.0, nonlin=True, trainable=True, **kwargs):
    input = tf.nn.dropout(input, dropout_keep_prob)
    # some other code...
    Z = tf.nn.dropout(Z, dropout_keep_prob)
    return Z, params
This means that when two of these layers are stacked, the output of dense layer 1 (which already has dropout applied to it) is passed through another dropout layer at the start of dense layer 2. A similar situation also occurs in the no_conv layer.
Was this what was intended?
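For what it's worth, stacking two such layers does compound the dropout: the survivors of the first dropout are thinned again at the start of the second layer, so the effective keep probability at the layer boundary is roughly dropout_keep_prob squared. A small NumPy sketch of the effect (illustrative only, not the repository's TensorFlow code):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, keep_prob, rng):
    # Inverted dropout: zero each element with probability (1 - keep_prob),
    # scale survivors by 1 / keep_prob so the expected value is unchanged.
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

x = np.ones(100_000)
keep = 0.5

once = dropout(x, keep, rng)    # dropout at the end of layer 1
twice = dropout(once, keep, rng)  # dropout again at the start of layer 2

# Fraction of units still non-zero after each pass:
print(round(np.mean(once != 0), 2))   # ~0.5
print(round(np.mean(twice != 0), 2))  # ~0.25, i.e. roughly keep ** 2
```

So with dropout_keep_prob = 0.5, only about a quarter of the activations survive the boundary between two stacked dense layers.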
Dear authors,
Thank you for the code! Could you point to a reference or a script on how to obtain the amino acid features? It's not clear from the .pkl files how to do that.
In line 59 of the nn_components.py file, there is this line:
nh_sizes = tf.expand_dims(tf.count_nonzero(nh_indices + 1, axis=1, dtype=tf.float32), -1) # for fixed number of neighbors, -1 is a pad value
What is the purpose of counting the non-zero elements in nh_indices + 1? The number of non-zero elements in each row of nh_indices + 1 is always 20.
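If it helps, the `+ 1` appears to be there because -1 is used as a pad value for residues with fewer neighbors than the fixed width: adding 1 turns the pad entries into 0, so count_nonzero counts only real neighbors. In the released dataset every row evidently holds exactly 20 real neighbors, which is why you always see 20; the code just supports padding in general. A NumPy sketch with a hypothetical, smaller index matrix:

```python
import numpy as np

# Hypothetical neighbor-index matrix: 3 residues, up to 4 neighbors each,
# padded with -1 where a residue has fewer than 4 neighbors.
nh_indices = np.array([
    [5, 2, 7, -1],
    [1, -1, -1, -1],
    [0, 3, 4, 6],
])

# Adding 1 maps the -1 pad value to 0, so counting non-zero entries
# per row counts only the real neighbor indices.
nh_sizes = np.count_nonzero(nh_indices + 1, axis=1)
print(nh_sizes)  # [3 1 4]
```

With the released data, every row of `nh_indices + 1` is fully populated, so this count is constantly 20, but it would differ if neighborhoods were padded.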
self.in_vertex1 = tf.placeholder(tf.float32, [None, self.in_nv_dims], "vertex1")
self.in_vertex2 = tf.placeholder(tf.float32, [None, self.in_nv_dims], "vertex2")
if self.diffusion:
    self.power_transition1 = tf.placeholder(tf.float32, [None, self.maxpower, None], name="power_transition_matrices")
    self.power_transition2 = tf.placeholder(tf.float32, [None, self.maxpower, None], name="power_transition_matrices")
    input1 = self.in_vertex1, self.power_transition1
    input2 = self.in_vertex2, self.power_transition2
Could you explain why None is used here instead of a concrete value?
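In TF 1.x placeholders, a None dimension means its size is fixed only when data is actually fed in. Since each protein has a different number of residues, the residue dimension cannot be a constant; only the per-residue feature dimension is. A minimal NumPy sketch of why this works (illustrative shapes and names only, not the repository's code):

```python
import numpy as np

in_nv_dims = 70  # node feature dimension, fixed across proteins

# Two hypothetical proteins with different numbers of residues:
protein_a = np.random.rand(185, in_nv_dims)
protein_b = np.random.rand(312, in_nv_dims)

# The same weights apply row-wise regardless of the residue count,
# which is why the placeholder's residue dimension is left as None.
W = np.random.rand(in_nv_dims, 32)
print((protein_a @ W).shape)  # (185, 32)
print((protein_b @ W).shape)  # (312, 32)
```

The same reasoning applies to the power-transition placeholders: both the number of residues and the transition-matrix width vary per protein, so both are left as None.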
Hi,
Thanks for posting the code. Is there a script to get the features for a custom protein from a pdb/seq? Maybe I missed it somewhere.
Cheers,
Hello, thank you for sharing!
I am trying to reproduce your code, but I have some questions about the node features.
I have computed the rASA of each residue, but I don't understand how you normalized the data.
Could you describe it?
Thank you!
Thanks for sharing your code!
The node features have 70 elements for each node, but what does each represent?
The paper describes that 20 of them are amino acid identity and the others are conservation score, accessible surface area, etc., but I can't figure out which is which.
Could you please specify the order of the node features?
In line 59 of pw_classifier.py, there is this line:
self.in_nhood_size = train_data[0]["l_hood_indices"].shape[1]
I know that the shape of train_data[0]["l_hood_indices"] is (185, 20, 1), but I don't know the meaning of each dimension. Could you tell me what each dimension of train_data[0]["l_hood_indices"] means, or how to construct the matrix? I want to apply your method to my own graph.
Thanks!
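My reading of that shape, assuming the released pickle format: 185 is the number of residues in that ligand, 20 is the fixed number of neighbors per residue, and the trailing 1 just wraps the neighbor index so the array is 3-D. A NumPy sketch with randomly generated stand-in data showing how such a tensor is typically consumed:

```python
import numpy as np

num_residues, k = 185, 20
features = np.random.rand(num_residues, 70)  # 70 features per residue

# Stand-in for train_data[0]["l_hood_indices"]:
# (residues, neighbors per residue, 1 index value)
hood_indices = np.random.randint(0, num_residues, size=(num_residues, k, 1))

# Drop the trailing singleton axis and gather each residue's
# neighbor feature vectors by fancy indexing:
neighbor_feats = features[hood_indices[:, :, 0]]
print(neighbor_feats.shape)  # (185, 20, 70)
```

To build this for your own graph, you would fill row i with the indices of the 20 residues you consider neighbors of residue i.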
For the node-average case (Equation 1), the features of the center node i and its neighbor nodes are used to learn a representation z_i at each node of the graph by applying a non-linear activation function.
For each input pair of ligand and receptor proteins with different numbers of residues, node-averaged graphs with different numbers of nodes are produced. What does pipgcn really learn, and what is the role of the activation function?
By the way, how do you handle graphs with different numbers of nodes, i.e. proteins with different numbers of residues?
Looking forward to your explanation. Thanks.
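For concreteness, here is a minimal NumPy sketch of a node-average filter in the spirit of Equation 1 (the names Wc/Wn and the ReLU are my own stand-ins, not necessarily the paper's exact parameterization). Because the center and neighbor weights are shared across all nodes, the same filter applies to a graph of any size, which is how proteins with different residue counts can be handled:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def node_average(X, hood, Wc, Wn, b):
    # X: (n, d) node features; hood: (n, k) neighbor indices.
    # Each node combines its own transformed features with the average
    # of its neighbors' transformed features; the learned weights are
    # shared across nodes, so the output works for any graph size n.
    neighbor_mean = X[hood].mean(axis=1)          # (n, d)
    return relu(X @ Wc + neighbor_mean @ Wn + b)  # (n, out)

rng = np.random.default_rng(0)
n, k, d, out = 30, 5, 8, 16
X = rng.random((n, d))
hood = rng.integers(0, n, size=(n, k))
Wc, Wn, b = rng.random((d, out)), rng.random((d, out)), rng.random(out)

Z = node_average(X, hood, Wc, Wn, b)
print(Z.shape)  # (30, 16)
```

What is learned are the shared weight matrices, not anything tied to a particular graph; the non-linearity is what lets stacked layers express more than a single linear map over the neighborhood averages.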
Hi! I am doing some experiments using your code and data, but I cannot find any residue type information for the sequences (I mean something like FASTA files). Could you share the sequences present in the provided data?
What tools were used to generate the data matrices of structure information from PDB files? I'd like to add other complexes to train the model.
I want to run some tests on my own data, but I don't know how to transform PDB files into model inputs.
Hello,
In your data processing, why does each node have exactly 20 neighbors? I computed the distance between each node and all other nodes in a graph; if it is less than 6 Å, I considered there to be an edge. The number of neighbors obtained this way may be 12 or some other number, but not 20. How do you arrive at 20?
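As far as I can tell, the fixed width suggests the neighborhoods are the k nearest residues (k = 20) rather than the result of a distance cutoff, which is what gives every node exactly 20 neighbors. A sketch of how such a fixed-k neighborhood could be built (hypothetical coordinates, not the authors' actual pipeline):

```python
import numpy as np

def k_nearest_neighbors(coords, k=20):
    # coords: (n, 3) residue coordinates (e.g. one representative
    # atom per residue). Returns (n, k) indices of each residue's
    # k closest residues, excluding the residue itself.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n, n) pairwise distances
    order = np.argsort(dist, axis=1)      # column 0 is the residue itself
    return order[:, 1:k + 1]

rng = np.random.default_rng(1)
coords = rng.random((50, 3)) * 30.0  # 50 stand-in residue positions
hood = k_nearest_neighbors(coords, k=20)
print(hood.shape)  # (50, 20)
```

Unlike a 6 Å cutoff, this always yields exactly k neighbors per residue, at the cost of sometimes including residues that are fairly far away.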
Can you point to a reference for the sample weighting and the operations done on the ground truth labels in the loss function?
Hello, this work is very interesting. Could you share your training dataset and validation dataset from the experiments separately? Thanks!
Thanks for publishing your well-organized code and the data.
I visualized the graph of the GCN (as below). Besides vertex1 and vertex2, there are four other inputs: edge1, edge2, hood_indices1, and hood_indices2. But these last four inputs are not connected to the computation graph. In the corresponding code, I saw that only the first element of the input was used. So I was wondering why the edge and hood_indices data are included.
In the node_edge_average method, the edges and hood_indices are used, but this method was not called. Did I miss anything?