fouticus / pipgcn
Protein Interface Prediction using Graph Convolutional Networks
License: MIT License
I am looking for the script used to preprocess protein data, i.e. '.pdb' files. Could you point me to the exact file in the GitHub repo that performs this preprocessing?
Thank you for your help.
I am trying to determine how removing certain node features changes the predictive performance. Are the 70 node features in the array in the same order in which they are presented in Appendix A.1 (e.g. first 20 being PSSM features)?
I just wanted to check with you about the design of some of the layers in nn_components.py.
In the nn_components.py file, the dense() function both starts and ends with a call to the dropout layer, i.e.:
def dense(input, params, out_dims=None, dropout_keep_prob=1.0, nonlin=True, trainable=True, **kwargs):
    input = tf.nn.dropout(input, dropout_keep_prob)
    # some other code...
    Z = tf.nn.dropout(Z, dropout_keep_prob)
    return Z, params
This means that when two of these layers are stacked, the output of dense layer 1 (which already has dropout applied to it) is passed through another dropout layer at the start of dense layer 2. A similar situation also occurs in the no_conv layer.
Was this what was intended?
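For what it's worth, stacking two such layers does compound the dropout: the survivors of the first dropout are thinned again at the start of the second layer, so the effective keep probability at the layer boundary is roughly dropout_keep_prob squared. A small NumPy sketch of the effect (illustrative only, not the repository's TensorFlow code):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, keep_prob, rng):
    # Inverted dropout: zero each element with probability (1 - keep_prob),
    # scale survivors by 1 / keep_prob so the expected value is unchanged.
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

x = np.ones(100_000)
keep = 0.5

once = dropout(x, keep, rng)    # dropout at the end of layer 1
twice = dropout(once, keep, rng)  # dropout again at the start of layer 2

# Fraction of units still non-zero after each pass:
print(round(np.mean(once != 0), 2))   # ~0.5
print(round(np.mean(twice != 0), 2))  # ~0.25, i.e. roughly keep ** 2
```

So with dropout_keep_prob = 0.5, only about a quarter of the activations survive the boundary between two stacked dense layers.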
Dear authors,
Thank you for the code! Could you point to a reference or a script on how to obtain the amino acid features? It's not clear from the .pkl files how to do that.
In line 59 of the nn_components.py file, there is this line:
nh_sizes = tf.expand_dims(tf.count_nonzero(nh_indices + 1, axis=1, dtype=tf.float32), -1) # for fixed number of neighbors, -1 is a pad value
What is the purpose of counting the non-zero elements in nh_indices + 1? The number of non-zero elements in each row of nh_indices + 1 is always 20.
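If it helps, the `+ 1` appears to be there because -1 is used as a pad value for residues with fewer neighbors than the fixed width: adding 1 turns the pad entries into 0, so count_nonzero counts only real neighbors. In the released dataset every row evidently holds exactly 20 real neighbors, which is why you always see 20; the code just supports padding in general. A NumPy sketch with a hypothetical, smaller index matrix:

```python
import numpy as np

# Hypothetical neighbor-index matrix: 3 residues, up to 4 neighbors each,
# padded with -1 where a residue has fewer than 4 neighbors.
nh_indices = np.array([
    [5, 2, 7, -1],
    [1, -1, -1, -1],
    [0, 3, 4, 6],
])

# Adding 1 maps the -1 pad value to 0, so counting non-zero entries
# per row counts only the real neighbor indices.
nh_sizes = np.count_nonzero(nh_indices + 1, axis=1)
print(nh_sizes)  # [3 1 4]
```

With the released data, every row of `nh_indices + 1` is fully populated, so this count is constantly 20, but it would differ if neighborhoods were padded.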
self.in_vertex1 = tf.placeholder(tf.float32, [None, self.in_nv_dims], "vertex1")
self.in_vertex2 = tf.placeholder(tf.float32, [None, self.in_nv_dims], "vertex2")
if self.diffusion:
    self.power_transition1 = tf.placeholder(tf.float32, [None, self.maxpower, None], name="power_transition_matrices")
    self.power_transition2 = tf.placeholder(tf.float32, [None, self.maxpower, None], name="power_transition_matrices")
    input1 = self.in_vertex1, self.power_transition1
    input2 = self.in_vertex2, self.power_transition2
Could you explain why None is used here instead of a concrete value?
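In TF 1.x placeholders, a None dimension means its size is fixed only when data is actually fed in. Since each protein has a different number of residues, the residue dimension cannot be a constant; only the per-residue feature dimension is. A minimal NumPy sketch of why this works (illustrative shapes and names only, not the repository's code):

```python
import numpy as np

in_nv_dims = 70  # node feature dimension, fixed across proteins

# Two hypothetical proteins with different numbers of residues:
protein_a = np.random.rand(185, in_nv_dims)
protein_b = np.random.rand(312, in_nv_dims)

# The same weights apply row-wise regardless of the residue count,
# which is why the placeholder's residue dimension is left as None.
W = np.random.rand(in_nv_dims, 32)
print((protein_a @ W).shape)  # (185, 32)
print((protein_b @ W).shape)  # (312, 32)
```

The same reasoning applies to the power-transition placeholders: both the number of residues and the transition-matrix width vary per protein, so both are left as None.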
Hi,
Thanks for posting the code. Is there a script to get the features for a custom protein from a pdb/seq? Maybe I missed it somewhere.
Cheers,
Hello, thank you for sharing!
I am trying to reproduce your code, but I have some questions about the node features.
I have computed the rASA of each residue, but I don't understand how you normalized the data.
Could you describe it?
Thank you!
Thanks for sharing your code!
The node features have 70 elements for each node, but what does each represent?
The paper describes that 20 of them are amino acid identity and the others are conservation score, accessible surface area, etc., but I can't figure out which is which.
Could you please specify the order of the node features?
In line 59 of pw_classifier.py, there is this line:
self.in_nhood_size = train_data[0]["l_hood_indices"].shape[1]
I know that the shape of train_data[0]["l_hood_indices"] is (185, 20, 1), but I don't know the meaning of each dimension. Could you tell me what each dimension of train_data[0]["l_hood_indices"] means, or how to construct the matrix? I want to apply your method to my own graph.
Thanks!
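My reading of that shape, assuming the released pickle format: 185 is the number of residues in that ligand, 20 is the fixed number of neighbors per residue, and the trailing 1 just wraps the neighbor index so the array is 3-D. A NumPy sketch with randomly generated stand-in data showing how such a tensor is typically consumed:

```python
import numpy as np

num_residues, k = 185, 20
features = np.random.rand(num_residues, 70)  # 70 features per residue

# Stand-in for train_data[0]["l_hood_indices"]:
# (residues, neighbors per residue, 1 index value)
hood_indices = np.random.randint(0, num_residues, size=(num_residues, k, 1))

# Drop the trailing singleton axis and gather each residue's
# neighbor feature vectors by fancy indexing:
neighbor_feats = features[hood_indices[:, :, 0]]
print(neighbor_feats.shape)  # (185, 20, 70)
```

To build this for your own graph, you would fill row i with the indices of the 20 residues you consider neighbors of residue i.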
For the node-average case (Equation 1), the features of the center node i and its neighbor nodes are used to learn a representation z_i at each node of the graph by applying a non-linear activation function.
For each input pair of ligand and receptor proteins with different numbers of residues, node-averaged graphs with different numbers of nodes are produced. What does pipgcn really learn, and what is the role of the activation function?
By the way, how do you handle graphs with different numbers of nodes, i.e. proteins with different numbers of residues?
Looking forward to your explanation. Thanks.
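For concreteness, here is a minimal NumPy sketch of a node-average filter in the spirit of Equation 1 (the names Wc/Wn and the ReLU are my own stand-ins, not necessarily the paper's exact parameterization). Because the center and neighbor weights are shared across all nodes, the same filter applies to a graph of any size, which is how proteins with different residue counts can be handled:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def node_average(X, hood, Wc, Wn, b):
    # X: (n, d) node features; hood: (n, k) neighbor indices.
    # Each node combines its own transformed features with the average
    # of its neighbors' transformed features; the learned weights are
    # shared across nodes, so the output works for any graph size n.
    neighbor_mean = X[hood].mean(axis=1)          # (n, d)
    return relu(X @ Wc + neighbor_mean @ Wn + b)  # (n, out)

rng = np.random.default_rng(0)
n, k, d, out = 30, 5, 8, 16
X = rng.random((n, d))
hood = rng.integers(0, n, size=(n, k))
Wc, Wn, b = rng.random((d, out)), rng.random((d, out)), rng.random(out)

Z = node_average(X, hood, Wc, Wn, b)
print(Z.shape)  # (30, 16)
```

What is learned are the shared weight matrices, not anything tied to a particular graph; the non-linearity is what lets stacked layers express more than a single linear map over the neighborhood averages.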
Hi! I am doing some experiments using your code and data, but I cannot find any residue type information for the sequences (I mean something like FASTA files). Could you share the sequences present in the provided data?
What tools were used to generate the data matrices of structure information from PDB files? I'd like to add other complexes to train the model.
I want to run some tests on my own data, but I don't know how to transform PDB files into model inputs.
Hello,
In your data processing, why does each node have exactly 20 neighbors? I computed the distance between each node and all other nodes in a graph; if it is less than 6 Å, I considered there to be an edge. The number of neighbors obtained this way may be 12 or some other number, but not 20. How do you arrive at 20?
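As far as I can tell, the fixed width suggests the neighborhoods are the k nearest residues (k = 20) rather than the result of a distance cutoff, which is what gives every node exactly 20 neighbors. A sketch of how such a fixed-k neighborhood could be built (hypothetical coordinates, not the authors' actual pipeline):

```python
import numpy as np

def k_nearest_neighbors(coords, k=20):
    # coords: (n, 3) residue coordinates (e.g. one representative
    # atom per residue). Returns (n, k) indices of each residue's
    # k closest residues, excluding the residue itself.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n, n) pairwise distances
    order = np.argsort(dist, axis=1)      # column 0 is the residue itself
    return order[:, 1:k + 1]

rng = np.random.default_rng(1)
coords = rng.random((50, 3)) * 30.0  # 50 stand-in residue positions
hood = k_nearest_neighbors(coords, k=20)
print(hood.shape)  # (50, 20)
```

Unlike a 6 Å cutoff, this always yields exactly k neighbors per residue, at the cost of sometimes including residues that are fairly far away.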
Can you point to a reference for the sample weighting and the operations done on the ground truth labels in the loss function?
Hello, this work is very interesting. Could you share your training dataset and validation dataset from the experiments separately? Thanks!
Thanks for publishing your well-organized code and the data.
I visualized the graph of the GCN (as below). Besides vertex1 and vertex2, there are four other inputs: edge1, edge2, hood_indices1, and hood_indices2. But these last four inputs are not connected to the computation graph. In the corresponding code, I saw that only the first element of the input was used. So I was wondering why the edge and hood_indices data are included.
In the node_edge_average method, the edges and hood_indices are used, but this method was not called. Did I miss anything?