
graphtransformer's People

Contributors

jmbr, vijaydwivedi75

graphtransformer's Issues

Why did you divide this term?

Hi there,

I was reading your graphtransformer code, and I'm curious about the operation shown below. Why do you divide the wV term by z (the summed 'score' term)? I don't see this term in equation 4 or equation 9 of the paper. Could you clarify?

h_out = g.ndata['wV'] / (g.ndata['z'] + torch.full_like(g.ndata['z'], 1e-6)) # adding eps to all values here

Thanks
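
For context, a minimal sketch (not the authors' explanation) of why dividing wV by z reproduces a softmax-weighted average of the value vectors; scores and V below are toy stand-ins for the per-edge quantities the layer accumulates:

import torch

# Toy example: one destination node with 3 incoming edges, feature dim 4.
scores = torch.randn(3)          # unnormalized attention scores (pre-exp)
V = torch.randn(3, 4)            # value vectors of the 3 source nodes

# What the layer accumulates per destination node:
exp_scores = torch.exp(scores.clamp(-5, 5))      # 'score' after exp/clamp
wV = (exp_scores.unsqueeze(-1) * V).sum(dim=0)   # numerator: sum_j exp(s_j) * V_j
z = exp_scores.sum()                             # denominator: sum_j exp(s_j)

# Dividing wV by z is exactly the softmax-weighted sum of the values:
softmax_out = (torch.softmax(scores.clamp(-5, 5), dim=0).unsqueeze(-1) * V).sum(dim=0)
assert torch.allclose(wV / (z + 1e-6), softmax_out, atol=1e-4)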

about attention

Hello, I have a question about how the attention is calculated. In the formula, attention is computed between node i and its adjacent nodes, but in the final implementation it seems the attention is computed over all nodes, without distinguishing whether the nodes are connected. Is there a problem with my understanding?
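
One way to check this empirically (a sketch, not the authors' answer): in the DGL implementation the attention scores are computed with apply_edges, so scores only exist for node pairs connected by an edge in the graph that is fed to the layer; the full-graph configurations rebuild the graph as a complete graph first. A toy illustration on a plain DGL graph:

import dgl
import torch

# A 4-node graph with only 3 directed edges.
g = dgl.graph(([0, 1, 2], [1, 2, 3]))
g.ndata['K_h'] = torch.randn(4, 8)
g.ndata['Q_h'] = torch.randn(4, 8)

# apply_edges evaluates the score function per existing edge only,
# so the result has one row per edge, not a dense 4x4 attention map.
g.apply_edges(lambda edges: {'score': (edges.src['K_h'] * edges.dst['Q_h']).sum(-1, keepdim=True)})
print(g.edata['score'].shape)  # torch.Size([3, 1]) -- one score per edge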

Error in Using this Graph Transformer Layer on random graph

Hi!
I am trying to use this Graph Transformer layer on a random graph (see below), but the error KeyError: 'wV' occurs.

import torch
import dgl
import networkx as nx
from model import GraphTransformerLayer  # this imports layers/graph_transformer_layer.py

torch.manual_seed(42)

def create_random_graph(num_nodes, node_feature_dim):
    g = dgl.DGLGraph()
    g.add_nodes(num_nodes)
    node_features = torch.randn(num_nodes, node_feature_dim)
    g.ndata['feat'] = node_features
    return g

num_nodes = 10
node_feature_dim = 16
num_heads = 4

random_graph = create_random_graph(num_nodes, node_feature_dim)

model_layer = GraphTransformerLayer(in_dim=node_feature_dim, out_dim=node_feature_dim, num_heads=num_heads)

output_features = model_layer(random_graph, random_graph.ndata['feat'])
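
Not an official answer, but a likely cause: the random graph above has no edges, so the layer's message passing (send_and_recv) never writes 'wV' into g.ndata. A minimal sketch of a workaround under that assumption, adding random edges plus self-loops before calling the layer:

import dgl
import torch

def create_random_graph(num_nodes, node_feature_dim, num_edges=30):
    # Random directed edges plus self-loops, so every node receives messages.
    src = torch.randint(0, num_nodes, (num_edges,))
    dst = torch.randint(0, num_nodes, (num_edges,))
    g = dgl.graph((src, dst), num_nodes=num_nodes)
    g = dgl.add_self_loop(g)
    g.ndata['feat'] = torch.randn(num_nodes, node_feature_dim)
    return g

random_graph = create_random_graph(10, 16)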

Details of the laplacian encoding

Hi,

Neat work! I was looking at the implementation of the laplacian encoding, and some things weren't clear.

import numpy as np
import scipy.sparse as sp
import torch
import dgl

def laplacian_positional_encoding(g, pos_enc_dim):

    # Laplacian
    A = g.adjacency_matrix_scipy(return_edge_ids=False).astype(float)
    N = sp.diags(dgl.backend.asnumpy(g.in_degrees()).clip(1) ** -0.5, dtype=float)
    L = sp.eye(g.number_of_nodes()) - N * A * N

    # Eigenvectors with numpy
    EigVal, EigVec = np.linalg.eig(L.toarray())
    idx = EigVal.argsort() # increasing order
    EigVal, EigVec = EigVal[idx], np.real(EigVec[:,idx])
    g.ndata['lap_pos_enc'] = torch.from_numpy(EigVec[:,1:pos_enc_dim+1]).float() 
    
    return g

Why do you drop the first eigenvector in the last line (i.e. why do you use indices 1:pos_enc_dim+1)? Does this come from the assumption that the first eigenvalue will be very close to 0?

    g.ndata['lap_pos_enc'] = torch.from_numpy(EigVec[:,1:pos_enc_dim+1]).float() 

Another quick question: could you explain why you need to use np.real in the line below? Are there any cases where we would have complex numbers here?

    EigVal, EigVec = EigVal[idx], np.real(EigVec[:,idx])

Thanks in advance!
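
For what it's worth, a small sketch (my reading, not the authors') of the properties behind both questions: the symmetric normalized Laplacian always has a smallest eigenvalue of 0 whose eigenvector is proportional to D^{1/2}·1 and so carries no positional information, and np.linalg.eig can return eigenvectors with tiny imaginary parts when floating-point round-off breaks exact symmetry:

import numpy as np
import scipy.sparse as sp

# Toy undirected graph: a 4-cycle.
A = sp.csr_matrix(np.array([[0, 1, 0, 1],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [1, 0, 1, 0]], dtype=float))
deg = np.asarray(A.sum(axis=1)).flatten()
N = sp.diags(deg.clip(1) ** -0.5)
L = sp.eye(4) - N @ A @ N

EigVal, EigVec = np.linalg.eig(L.toarray())
idx = EigVal.argsort()
print(np.round(np.real(EigVal[idx]), 6))            # smallest eigenvalue is 0 (the trivial one)
print(np.round(np.real(EigVec[:, idx][:, 0]), 3))   # ~constant direction: no positional info
print(EigVec.dtype)  # may be complex128 when round-off breaks exact symmetry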

Superpixel dataset

Thanks for sharing the code for your interesting paper. I'm interested in applying this method to the MNIST superpixel dataset (from your benchmarking work), where the task is graph classification, graphs have both node and edge features, and the number of nodes/edges differs between graphs. What modifications should I make to the current code?

Is the installation instruction still valid?

I'm trying to set up the environment on a Mac. I followed your instructions carefully, but I get the error below when building the environment with conda:
ResolvePackageNotFound:

  • h5py=2.9.0
  • tensorboard=1.14.0
  • requests==2.22.0
  • ipython=7.7.0
  • ipykernel=5.1.2
  • notebook=6.0.0
  • pip=19.2.3
  • scikit-image=0.15.0
  • scipy=1.3.0
  • torchvision==0.7.0
  • pytorch=1.6.0
  • matplotlib=3.1.0
  • pillow==6.1
  • mkl=2019.4
  • dgl=0.6.1
  • scikit-learn=0.21.2
  • python=3.7.4

Sparse graph and full graph

Thanks for the innovative work! Could you please tell me how we can get a full graph? Does 'full graph' mean the full attention map? Does 'sparse graph' mean that we only retain the values of each node's immediate neighbors from the full graph?
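
My reading of the terms (a sketch, not a definitive answer): the sparse setting runs attention only over the original edges, while the full-graph setting rebuilds each graph as a complete graph over the same nodes, which makes the attention equivalent to a full NLP-style attention map. A hedged sketch of constructing such a full graph in DGL (the helper name to_full_graph is mine):

import dgl
import torch

def to_full_graph(g):
    # Connect every ordered pair of nodes (including self-loops), keeping node features.
    n = g.num_nodes()
    src = torch.arange(n).repeat_interleave(n)
    dst = torch.arange(n).repeat(n)
    full_g = dgl.graph((src, dst), num_nodes=n)
    for key, value in g.ndata.items():
        full_g.ndata[key] = value
    return full_g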

Technical question

Hi, thanks for the great paper :)

I was just curious as to what the 'z' variable is in line 59 of the graph_transformer_layer.py code? I cannot seem to find the equivalent in the paper. It seems you are normalizing the output heads by the sum of the attention weights?

Would appreciate a little pointer :)

Thanks,
Devin

pos_enc_dim value

For graphs with large differences in the number of vertices, how should the value of pos_enc_dim be determined? For example, what if one graph has 7 vertices and another has 3? Did you choose pos_enc_dim = 8 because the number of vertices in your experimental datasets is always greater than 8?
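
Not the authors' answer, but one common workaround when a graph has fewer than pos_enc_dim + 1 nodes is to zero-pad the eigenvector matrix so that every graph ends up with the same encoding width. A minimal sketch (the helper pad_lap_pos_enc is hypothetical):

import numpy as np
import torch

def pad_lap_pos_enc(EigVec, pos_enc_dim):
    # EigVec: [n, n] eigenvectors sorted by eigenvalue; returns [n, pos_enc_dim],
    # zero-padded on the right when the graph has fewer than pos_enc_dim + 1 nodes.
    pe = torch.from_numpy(np.real(EigVec[:, 1:pos_enc_dim + 1])).float()
    if pe.shape[1] < pos_enc_dim:
        pe = torch.nn.functional.pad(pe, (0, pos_enc_dim - pe.shape[1]))
    return pe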

Scaling of Laplacian pre-computation

First, I would like to say that I think there are some very good ideas in the paper. Nice work! I have some questions though:

Could you tell me what the largest graph is that you've used this approach on? Do you have any recommendations for Laplacian eigenvector encodings on large graphs? The way it's implemented now, using np.linalg.eig and the .toarray() call, loses the sparsity and could cause problems.
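
One sparse-friendly alternative (a sketch under the assumption that only the pos_enc_dim smallest non-trivial eigenvectors are needed) is scipy.sparse.linalg.eigsh, which works directly on the sparse Laplacian instead of densifying it:

import numpy as np
from scipy.sparse.linalg import eigsh

def sparse_lap_pos_enc(L, pos_enc_dim):
    # L: scipy sparse symmetric normalized Laplacian.
    # k smallest eigenpairs (which='SM'); shift-invert mode is another option for large graphs.
    EigVal, EigVec = eigsh(L.asfptype(), k=pos_enc_dim + 1, which='SM')
    idx = EigVal.argsort()
    return np.real(EigVec[:, idx][:, 1:pos_enc_dim + 1])  # drop the trivial eigenvector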

Memory consumption

Could you provide some additional information about the memory consumption using your Graph Transformer?

You state that sparse attention favors both computation time and memory consumption, but you do not provide actual measurements of the latter in your evaluation, nor do you state clearly if and how your implementation is able to take advantage of it.
Some peak-memory measurements of your experiments, as an addendum to your evaluation of the computation times (e.g. Table 1), could be beneficial to others too. In my case, the quadratic growth of memory consumption with respect to sequence length prevents an efficient use of Transformers for some tasks where connectivity information is given and can simply be modeled by masking out (-Inf) the attention scores in the attention matrix.

Some exemplary or artificial data could also be interesting, e.g. a (mean) number of nodes n = {128, 1024, 2048, 4096} and a (mean) number of edges per node e = {4, 8, 16, 64, 128}, to get an impression of the resource consumption of your Graph Transformer with a sparse graph vs. an NLP-style Transformer (full graph with masking).

(I could probably run the experiments myself, but I suppose your evaluation pipeline is already set up, and data provided by the original authors would be more precise and more trustworthy to other researchers, too.)
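
In case it helps anyone who wants to measure this themselves, a minimal sketch of recording peak GPU memory around a forward pass (plain PyTorch, not tied to this repo's pipeline; the helper peak_memory_mb is mine):

import torch

def peak_memory_mb(model, *inputs):
    # Run one forward pass and report the peak allocated GPU memory in MB.
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        model(*inputs)
    return torch.cuda.max_memory_allocated() / (1024 ** 2)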

Graph Classification

Hello there,
First of all, thank you for providing such amazing work.
I'd like to know how I can leverage graphtransformer for a graph classification task with textual data. For instance, I first extract node and edge information from the text; given the node features and edge information (only one type of edge in my case), the model should generate binary targets based on those features.

Kind Regards
Michael

laplacian positional encoding

How should one handle the Laplacian positional encoding of a directed graph? The adjacency matrix of a directed graph is not a symmetric matrix.
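
One common workaround (a sketch, not something the paper prescribes) is to symmetrize the adjacency matrix, i.e. treat every directed edge as undirected, before building the normalized Laplacian:

import numpy as np
import scipy.sparse as sp

def symmetric_laplacian(A):
    # A: scipy sparse (possibly asymmetric) adjacency matrix of a directed graph.
    A_sym = ((A + A.T) > 0).astype(float)          # treat every directed edge as undirected
    deg = np.asarray(A_sym.sum(axis=1)).flatten()
    N = sp.diags(deg.clip(1) ** -0.5)
    return sp.eye(A_sym.shape[0]) - N @ A_sym @ N  # symmetric normalized Laplacian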

Dataset request

Hello, could you please tell me how the datasets in this article are processed? I want to run other datasets with your code. What should I do? Or, if you have other prepared datasets, could you please send them to me? Thank you very much!

[email protected]

node update

g.send_and_recv(eids, fn.src_mul_edge('V_h', 'score', 'V_h'), fn.sum('V_h', 'wV')) only updates the destination (target) nodes;
head_out = g.ndata['wV'] / g.ndata['z'], so this also only reflects the destination nodes. Are the source nodes not updated?
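
A hedged observation (my understanding, not the authors'): in DGL message passing only the destination nodes of the given edges are updated, so on a directed graph a node that never appears as a destination would indeed not receive a 'wV'. If both directions should be updated, the usual fix is to add reverse edges, e.g.:

import dgl
import torch

g = dgl.graph(([0, 1], [1, 2]))            # directed: node 0 is never a destination
g.ndata['feat'] = torch.randn(3, 4)

# Adding reverse edges makes every node a destination, so message passing
# (send_and_recv / update_all) writes 'wV' for all nodes, not just the targets.
g = dgl.add_reverse_edges(g, copy_ndata=True)
print(g.edges())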

About Equations 11~12

Hi,

Great work!

I want to confirm whether my understanding of equations 11~12 is correct.

I understand equation 12 in this way: (Q h_i * K h_j / sqrt(d_k)) is a scalar, and (E e_ij) is a d_k-dimensional vector, so a scalar multiplying a vector gives a d_k-dimensional vector. In equation 11, this d_k-dimensional vector is then reduced to a scalar by computing w_1 + w_2 + ... + w_{d_k}. Is that correct?

AttributeError: Can't get attribute 'DGLHeteroGraph' on <module 'dgl.heterograph' >

Hi, I try to run the example code "main_SBMs_node_classification.py" by the following command:

python main_SBMs_node_classification.py --gpu_id 0 --config 'configs/SBMs_GraphTransformer_CLUSTER_500k_full_graph_BN.json'

But it comes with the following error:
AttributeError: Can't get attribute 'DGLHeteroGraph' on <module 'dgl.heterograph' from '/home/rody/.local/share/virtualenvs/rody-V7qEFACp/lib/python3.6/site-packages/dgl/heterograph.py'>

How can I fix this bug? Thanks!

Detail on softmax

Great work!

I have a question concerning the implementation of softmax in the graph_transformer_edge_layer.py

When you define the softmax, you use the following function:

def exp(field):
    def func(edges):
        # clamp for softmax numerical stability
        return {field: torch.exp((edges.data[field].sum(-1, keepdim=True)).clamp(-5, 5))}
    return func

Shouldn't the attention weights/scores be scalars? From what I see, each head has an 8-dimensional score vector on which you then call .sum(). The graph_transformer_layer.py layer does not have this .sum() call.

def scaled_exp(field, scale_constant):
    def func(edges):
        # clamp for softmax numerical stability
        return {field: torch.exp((edges.data[field] / scale_constant).clamp(-5, 5))}

    return func

Would appreciate any clarification on this :)

Best,
Devin
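
A small shape sketch of what the .sum(-1, keepdim=True) does (my reading, not an official explanation): in the edge-feature layer the per-edge, per-head score is a d_k-dimensional vector (the elementwise K·Q·E product), and the sum collapses it to one scalar per head before the exponential; without the sum, the exponential is applied elementwise and the score stays a d_k-dimensional quantity:

import torch

num_edges, num_heads, d_k = 5, 8, 8
score = torch.randn(num_edges, num_heads, d_k)   # elementwise K * Q * E product per edge

# Edge-feature layer: reduce to one scalar score per edge and head, then exponentiate.
edge_layer_score = torch.exp(score.sum(-1, keepdim=True).clamp(-5, 5))
print(edge_layer_score.shape)   # torch.Size([5, 8, 1])

# Plain layer (no .sum): the exponential is applied elementwise, so the 'score'
# stays a d_k-dimensional quantity per head.
plain_layer_score = torch.exp((score / d_k ** 0.5).clamp(-5, 5))
print(plain_layer_score.shape)  # torch.Size([5, 8, 8])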

Eval sign flipping

Hi Vijay,

Thanks for your repo!

Question: I see you're doing sign flipping of the eigenvector pos_enc during training, but it seems that you are not doing so at eval time. I understand that we want deterministic predictions, so we don't want random flipping when evaluating. Do you have further comments or justification for this?

Best
Kezhi

Attention Matrix

Hi! Congratulations on your paper, and thank you for making the implementation publicly available as well.

Quick question on this function:

    def src_dot_dst(src_field, dst_field, out_field):
        def func(edges):
            return {out_field: (edges.src[src_field] * edges.dst[dst_field])}
        return func

Why do you do an elementwise multiplication of K and Q and not a dot product? The dimensions of the scores are [num_edges, num_heads, hidden_dim/num_heads], but I expected a [num_edges, num_edges] matrix.

You can also reach me here: [email protected]
Hope to hear from you soon, Pietro Bonazzi
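
For what it's worth (an illustration, not the authors' reply): in the edge-feature layer the elementwise K * Q product is later reduced over the feature dimension (the .sum(-1) inside exp), which recovers exactly the per-edge dot product; either way, the scores are stored per edge rather than as a dense [num_nodes, num_nodes] matrix, since only connected node pairs attend to each other. A toy check:

import torch

K = torch.randn(7, 4, 16)   # per-edge source keys:         [num_edges, num_heads, d_k]
Q = torch.randn(7, 4, 16)   # per-edge destination queries:  [num_edges, num_heads, d_k]

elementwise = (K * Q).sum(-1, keepdim=True)            # reduce feature dim per edge and head
dot = torch.einsum('ehd,ehd->eh', K, Q).unsqueeze(-1)  # the same thing as an explicit dot product
assert torch.allclose(elementwise, dot, atol=1e-5)
print(elementwise.shape)    # torch.Size([7, 4, 1]) -- per-edge scores, not an N x N matrix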
