
grakel's People

Contributors

alessi0x, amirabbasasadi, arokem, benjamin-loison, bryandeng, eddiebergman, fabianp, giannisnik, jan-janssen, karahans, kjacks21, mechcoder, moan0s, pwelke, ramaroberto, tomdlt, vighneshbirodkar, xtomasch, ysig


grakel's Issues

Kernels suitable for continuous adjacency matrices?

Hi, nice library!

I am wondering: can the kernels, such as the WL kernel, together with the SVM classifier, take graph representations whose adjacency matrix is continuous/soft, i.e. not a discrete integer matrix? An example is [[.9, .1], [.1, .9]].

I saw in the documentation that vertex and edge attributes can be continuous features, but can the adjacency matrix be continuous as well? I tested this out and didn't see an obvious error message yet, but wanted to double check to see if I can expect the results to always be right.
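
For context, here is roughly what I tested; the values and labels below are made up for illustration, and I am simply passing the soft matrix straight into the Graph constructor:

    from grakel import Graph
    from grakel.kernels import WeisfeilerLehman

    # Soft/continuous adjacency matrix (values are illustrative).
    soft_adj = [[0.0, 0.9, 0.1],
                [0.9, 0.0, 0.8],
                [0.1, 0.8, 0.0]]
    node_labels = {0: 'A', 1: 'B', 2: 'A'}

    g = Graph(initialization_object=soft_adj, node_labels=node_labels)
    wl_kernel = WeisfeilerLehman(n_iter=2, normalize=True)
    print(wl_kernel.fit_transform([g]))  # runs without an error, but are the values meaningful?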

Thanks!

Dealing with multiple sets of continuous attributes?

Hi there!

First, nice job with GraKel ;) It took me very little time to get my graphs into a format that can be handled by GraKel (thanks to the networkx import method and the corresponding example)!

My graphs (computed from some brain imaging data) have multiple vector-valued attributes on each node... Can any of the kernels available in GraKel deal with this? (I of course have the option of concatenating the different vectors into one, but I'm wondering whether there's a cleaner solution...) Or any kernel in general?
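
To make the concatenation workaround concrete, this is roughly what I mean; the attribute names and shapes are made up, and I am assuming node attributes are passed through the node_labels argument of Graph:

    import numpy as np
    from grakel import Graph

    # Two made-up vector-valued attributes per node.
    attr_a = {0: [0.1, 0.2], 1: [0.3, 0.4]}
    attr_b = {0: [1.0, 0.0, 0.5], 1: [0.0, 1.0, 0.5]}

    # Concatenate them into a single attribute vector per node.
    node_attributes = {v: list(np.concatenate([attr_a[v], attr_b[v]]))
                       for v in attr_a}

    adj = [[0, 1], [1, 0]]
    g = Graph(initialization_object=adj, node_labels=node_attributes)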

Thanks,

Sylvain

TypeError with SubgraphMatching kernel

from grakel import Graph
adj=[[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
node_label={0: '1', 1: '2', 2: '1', 3: '2'}
edge_label={(0, 1): '1', (1, 2): '1', (2, 3): '1'}
g1=Graph(initialization_object=adj, node_labels=node_label, edge_labels=edge_label)
from grakel.kernels import SubgraphMatching
sm_kernel=SubgraphMatching()
sm_kernel.fit_transform([g1])

[screenshot of the TypeError traceback]

Hello!
When I use the SubgraphMatching kernel, I get the error shown in the screenshot above. Is there something wrong with the code I entered? Thanks a lot for your reply!

GraphHopper error

While running your library on an HPC (I have a Singularity image with the required Python packages), I am encountering the following error:

Traceback (most recent call last):
File "MI.py", line 163, in
K_GH = gh_kernel.fit_transform(Gs)
File "/opt/conda/lib/python3.6/site-packages/grakel/kernels/kernel.py", line 194, in fit_transform
self.fit(X)
File "/opt/conda/lib/python3.6/site-packages/grakel/kernels/kernel.py", line 123, in fit
self.X = self.parse_input(X)
File "/opt/conda/lib/python3.6/site-packages/grakel/kernels/graph_hopper.py", line 206, in parse_input
occ_p, des_p = od_vectors_dag(A_cc, D_cc)
File "/opt/conda/lib/python3.6/site-packages/grakel/kernels/graph_hopper.py", line 407, in od_vectors_dag
np.matlib.repmat(np.hstack([0, occ[i, :-1]]), edges_starting_at_ith.shape[0],
AttributeError: module 'numpy' has no attribute 'matlib'

A quick search on Stack Overflow makes me wonder whether numpy.matlib needs to be imported explicitly. I haven't had this issue while running the file locally. Thanks in advance!
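
For what it's worth, the workaround suggested on Stack Overflow would look like the sketch below: importing the submodule explicitly before the kernel touches it. I have not verified this on the HPC yet; Gs stands for my list of graphs from the MI.py script above.

    # numpy.matlib is a submodule that "import numpy" does not load automatically,
    # so import it explicitly before calling the kernel.
    import numpy.matlib  # noqa: F401

    from grakel.kernels import GraphHopper

    gh_kernel = GraphHopper(normalize=True)
    # K_GH = gh_kernel.fit_transform(Gs)  # Gs: the list of graphs from MI.py above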

How to get the extracted kernel-dependent features of the WL kernel?

The documentation says that the fit() function extracts them. However, when I print the result of fit() with print(wl_kernel.fit([G1, G2])), I just get "WeisfeilerLehman(n_iter=1, normalize=True)" on the screen. I am sure that the graphs G1 and G2 and the WeisfeilerLehman kernel are built correctly. I want to compute the Jaccard similarity of two graphs from the features of the WL kernel, so printing the features is necessary. Please help me! Thanks a lot!
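
For reference, grakel kernels follow the scikit-learn convention: fit() returns the estimator itself (which is why its repr gets printed), while the kernel matrix comes from fit_transform(). A minimal sketch, with two toy graphs standing in for G1 and G2; it does not expose the WL feature vectors themselves, but it shows where the printed output comes from:

    from grakel import Graph
    from grakel.kernels import WeisfeilerLehman

    # Two toy labeled graphs standing in for G1 and G2.
    G1 = Graph([[0, 1], [1, 0]], node_labels={0: 'a', 1: 'b'})
    G2 = Graph([[0, 1, 1], [1, 0, 0], [1, 0, 0]], node_labels={0: 'a', 1: 'b', 2: 'b'})

    wl_kernel = WeisfeilerLehman(n_iter=1, normalize=True)
    print(wl_kernel.fit([G1, G2]))            # fit() returns the estimator, so its repr is printed
    print(wl_kernel.fit_transform([G1, G2]))  # this prints the 2x2 kernel matrix instead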

Using networkx graph format

Is there any way to use the networkx graph format, or do you have a function to transform .gexf files into the appropriate format for the library?
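
A sketch of what I have in mind, reading the .gexf file with networkx first and then converting it with graph_from_networkx; the file name is a placeholder, and I am assuming graph_from_networkx takes an iterable of graphs:

    import networkx as nx
    from grakel import graph_from_networkx

    G = nx.read_gexf("my_graph.gexf")  # placeholder path
    grakel_graphs = list(graph_from_networkx([G], as_Graph=True))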

Something wrong when using graphlet sampling

I used the graphlet kernel for graph classification, but got the following error.
Would you please help me?

Traceback (most recent call last):
File "group_kernel.py", line 79, in
K_test = gk.transform(X_test)
File "/home/yukuocen/py_env/lib/python2.7/site-packages/grakel-0.1a2-py2.7-linux-x86_64.egg/grakel/graph_kernels.py", line 344, in transform
K = self.kernel_.transform(X)
File "/home/yukuocen/py_env/lib/python2.7/site-packages/grakel-0.1a2-py2.7-linux-x86_64.egg/grakel/kernels/graphlet_sampling.py", line 247, in transform
Y = self.parse_input(X)
File "/home/yukuocen/py_env/lib/python2.7/site-packages/grakel-0.1a2-py2.7-linux-x86_64.egg/grakel/kernels/graphlet_sampling.py", line 439, in parse_input
self._Y_graph_bins[j], sg):
KeyError: 1

Using Graph Kernels for regression

First, thank you very much for this wonderful library. I am a beginner in graph kernels, and I have a question. Is there any way to use these graph kernels for regression tasks? Thank you very much again.
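
To make the question concrete, here is a minimal sketch of what I imagine: computing a kernel matrix with GraKeL and plugging it into scikit-learn's SVR with a precomputed kernel. The graphs and targets below are toy placeholders:

    import numpy as np
    from sklearn.svm import SVR
    from grakel import Graph
    from grakel.kernels import ShortestPath

    # Toy unlabeled graphs with made-up real-valued targets.
    graphs = [Graph([[0, 1, 1], [1, 0, 0], [1, 0, 0]]),
              Graph([[0, 1, 0], [1, 0, 1], [0, 1, 0]]),
              Graph([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])]
    y = np.array([1.2, 0.7, 2.5])

    gk = ShortestPath(normalize=True, with_labels=False)
    K = gk.fit_transform(graphs)          # precomputed training kernel

    reg = SVR(kernel="precomputed")
    reg.fit(K, y)
    print(reg.predict(gk.transform(graphs)))  # here predicting on the training graphs themselves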

pynauty

Hi,

I am the author of pynauty. Due to some GitHub magic I discovered that at some stage you used pynauty. This is just to let you know that I have recently moved the package to GitHub and updated it. I have also made it available from PyPI; many binary wheels are provided for Linux and macOS platforms.

How to use an adjacency matrix to construct a graph in the weisfeiler_lehman_subtree example

Code:
gk = WeisfeilerLehman(n_iter=10, base_graph_kernel=VertexHistogram, normalize=True)
K_train = gk.fit_transform(G_train)
For example, G_train is 149 adjacency matrices of size 90x90.
fit_transform calls parse_input(), which then calls Graph().
Error:
line 45, in
K_train = gk.fit_transform(G_train)
File "D:\Anaconda\lib\site-packages\grakel\kernels\weisfeiler_lehman.py", line 295, in fit_transform
km, self.X = self.parse_input(X)
File "D:\Anaconda\lib\site-packages\grakel\kernels\weisfeiler_lehman.py", line 176, in parse_input
'a graph like object and node labels ' +
TypeError: each element of X must be either a graph object or a list with at least a graph like object and node labels dict
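
The error message suggests that each element of X also needs node labels, so one option seems to be wrapping every adjacency matrix into a Graph together with a node-label dict. A minimal sketch, labeling each node with its degree (an arbitrary choice) and using tiny matrices in place of the 149 matrices of size 90x90:

    import numpy as np
    from grakel import Graph
    from grakel.kernels import WeisfeilerLehman, VertexHistogram

    # Tiny stand-ins for the 149 adjacency matrices of size 90x90.
    G_train = [np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]),
               np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])]

    def to_grakel_graph(A):
        # WL needs discrete node labels; here every node is labeled with its degree.
        labels = {i: int(d) for i, d in enumerate(A.sum(axis=1))}
        return Graph(initialization_object=A.tolist(), node_labels=labels)

    gk = WeisfeilerLehman(n_iter=10, base_graph_kernel=VertexHistogram, normalize=True)
    K_train = gk.fit_transform([to_grakel_graph(A) for A in G_train])
    print(K_train)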

Kernel for RDF graphs

Hello, I have a Networkx multigraph that I have created using the AIFB dataset. I want to use the kernels to perform classification tasks on nodes. Since these graphs have edge labels, do you have any suggestions as to what kernel might be good for this case?

get_isomorphism assertion error

I encountered another assertion error when using the graphlet_sampling kernel.

  File "/usr/local/lib/python2.7/site-packages/grakel/graph_kernels.py", line 386, in fit_transform
    K = self.kernel_.fit_transform(X)
  File "/usr/local/lib/python2.7/site-packages/grakel/kernels/graphlet_sampling.py", line 300, in fit_transform
    self.fit(X)
  File "/usr/local/lib/python2.7/site-packages/grakel/kernels/kernel.py", line 123, in fit
    self.X = self.parse_input(X)
  File "/usr/local/lib/python2.7/site-packages/grakel/kernels/graphlet_sampling.py", line 417, in parse_input
    if self._graph_bins[k].isomorphic(sg):
  File "grakel/kernels/_isomorphism/bliss.pyx", line 365, in grakel.kernels._isomorphism.bliss.Graph.isomorphic
  File "grakel/kernels/_isomorphism/bliss.pyx", line 361, in grakel.kernels._isomorphism.bliss.Graph.get_isomorphism
AssertionError

It looks like the error was raised within the bliss library. For GraKeL, maybe an easy fix is to catch the error around the conditional check if self._graph_bins[k].isomorphic(sg):
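
A small helper along these lines is what I had in mind (hypothetical and untested; it simply treats a bliss AssertionError as "not isomorphic"):

    def safe_isomorphic(bin_graph, sg):
        """Return False instead of crashing when bliss raises an AssertionError."""
        try:
            return bin_graph.isomorphic(sg)
        except AssertionError:
            return False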

How to deal with this? Thanks. ImportError: cannot import name 'k_to_ij_triangular' from 'grakel.kernels._c_functions'

Traceback (most recent call last):
File "D:\python\test\GraKeL-develop\test.py", line 9, in
from grakel import GraphKernel
File "D:\python\test\GraKeL-develop\grakel_init_.py", line 6, in
from grakel.graph_kernels import GraphKernel
File "D:\python\test\GraKeL-develop\grakel\graph_kernels.py", line 13, in
from grakel.kernels import GraphletSampling
File "D:\python\test\GraKeL-develop\grakel\kernels_init_.py", line 4, in
from grakel.kernels.kernel import Kernel
File "D:\python\test\GraKeL-develop\grakel\kernels\kernel.py", line 17, in
from grakel.kernels._c_functions import k_to_ij_triangular
ImportError: cannot import name 'k_to_ij_triangular' from 'grakel.kernels._c_functions' (unknown location)
[Finished in 1.2s with exit code 1]

I just installed grakel according to the tutorial on GitHub and ran the test. The above bug appeared, and the package cannot be imported.

Weisfeiler Lehman kernel fitting process

Hello,

I'm using the grakel library (thank you for the very clear documentation and for gathering all this work!) in order to do classification, but I'm confused about some results I get with the Weisfeiler-Lehman kernel.

From what I understand, this kernel has no "learning" process: whether we fit it on the entire dataset or on a subset, we should get the same pairwise similarities between the graphs.

However, when I run the following code I do not get the same kernel at the end. First I fit_transform on all MUTAG data, getting a K_from_all kernel, and then I select a subset of this kernel (with respect to the train and test indices).

I then compare it with the same kernel fitted only on a small subset of the data (the train subset) and transformed on the test subset. I get a K_from_small kernel which is different from the K_from_all kernel:

[screenshot comparing the K_from_all and K_from_small kernel matrices]
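
Roughly, the comparison I am running looks like this (indices simplified; the point is that the sub-selected block of K_from_all does not match K_from_small):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from grakel.datasets import fetch_dataset
    from grakel.kernels import WeisfeilerLehman

    MUTAG = fetch_dataset("MUTAG", verbose=False)
    G, y = MUTAG.data, MUTAG.target
    train_idx, test_idx = train_test_split(np.arange(len(G)), test_size=0.1, random_state=42)

    # Kernel computed on the whole dataset, then sub-selected.
    K_from_all = WeisfeilerLehman(normalize=True).fit_transform(G)
    K_all_sub = K_from_all[np.ix_(test_idx, train_idx)]

    # Kernel fitted on the train subset only, then applied to the test subset.
    wl = WeisfeilerLehman(normalize=True)
    wl.fit([G[i] for i in train_idx])
    K_from_small = wl.transform([G[i] for i in test_idx])

    print(np.allclose(K_all_sub, K_from_small))  # False in my runs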

Did I miss some detail about the fitting procedure of the kernel? (For the shortest path kernel I recover the same kernels.)

Thank you very much

Titouan

Problems with random walk

Hi,
I have two problems with the random walk kernel. The first is that with the normalize flag set to True, the similarity matrix is all 1's.
The second is that, if I set the normalize flag to False, the diagonal of the similarity matrix is not 0.
This is the code:

def computeKernelRW(graphs):
    print("-- computing kernel")
    rw_kernel = GraphKernel(kernel=[{"name": "random_walk","lamda":0.5}],normalize=False)
    return  rw_kernel.fit_transform(graphs)

And this is the call on the main:

K = computeKernelSP(graphs)

How to load a networkx graph into GraKel

Hi There,

I'm trying to understand how one would use one's own graph data with GraKel via networkx, and I am having some issues.

For example the following example code will fail:

H2O = scipy.sparse.csr_matrix(([1, 1, 1, 1], ([0, 0, 1, 2], [1, 2, 0, 0])), shape=(3, 3))
G = nx.from_scipy_sparse_matrix(H2O)
graph = grakel.graph_from_networkx(G)
sp_kernel.fit_transform(graph)

With this rather strange error:

AttributeError: 'int' object has no attribute 'nodes'

Any help you could provide on how to use custom data with GraKel would be most helpful! Is it required that graphs must have labels on the vertices and edges to be used?
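
A guess on my side: graph_from_networkx seems to expect an iterable of networkx graphs, so passing a bare graph makes it iterate over the graph's nodes (plain ints), which would explain the 'int' object has no attribute 'nodes' message. Wrapping G in a list is what I would try (untested sketch):

    import scipy.sparse
    import networkx as nx
    import grakel
    from grakel.kernels import ShortestPath

    H2O = scipy.sparse.csr_matrix(([1, 1, 1, 1], ([0, 0, 1, 2], [1, 2, 0, 0])), shape=(3, 3))
    G = nx.from_scipy_sparse_matrix(H2O)

    # Pass a *list* of networkx graphs rather than a single graph.
    graphs = list(grakel.graph_from_networkx([G]))
    sp_kernel = ShortestPath(normalize=True, with_labels=False)
    print(sp_kernel.fit_transform(graphs))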

One to one graph comparison example broken

When following the documentation to conduct one to one graph comparison using the following code:

wl_kernel.fit(H2O).transform(H3O)

It raises an AttributeError exception:

AttributeError                       Traceback (most recent call last)
<ipython-input-267-4071ae7832b8> in <module>()
----> 1 wl_kernel.fit(H2O)

/usr/local/lib/python2.7/site-packages/grakel/graph_kernels.pyc in fit(self, X, y)
    315             self.component_indices_ = inds
    316         else:
--> 317             self.kernel.fit(X)
    318
    319         # Return the transformer

AttributeError: 'dict' object has no attribute 'fit'
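
As a workaround, calling the kernel class directly with single-element lists (the fit(...).transform(...) pattern used elsewhere in the docs) might avoid the GraphKernel wrapper entirely. A sketch with the H2O/H3O toy graphs, built here as grakel Graph objects with node labels:

    from grakel import Graph
    from grakel.kernels import WeisfeilerLehman

    H2O = Graph([[0, 1, 1], [1, 0, 0], [1, 0, 0]], node_labels={0: 'O', 1: 'H', 2: 'H'})
    H3O = Graph([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
                node_labels={0: 'O', 1: 'H', 2: 'H', 3: 'H'})

    wl_kernel = WeisfeilerLehman(normalize=True)
    print(wl_kernel.fit([H2O]).transform([H3O]))  # one-to-one comparison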

Calculate graph similarity by using GraKel

Hi, I wonder if we can use GraKel to calculate graph similarity only? I don't need to classify the graphs; I only want to get the similarity among them. I have generated the graphs with networkX. Since multiple graph kernel strategies have been implemented in GraKel, I think it should be able to export the graph similarity, but sadly I failed to find such an API.
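
In case the question is unclear, what I am after is essentially the normalized kernel matrix itself, something like the sketch below (two toy networkx graphs stand in for my generated ones):

    import networkx as nx
    from grakel import graph_from_networkx
    from grakel.kernels import ShortestPath

    # Toy graphs standing in for the ones I generate with networkX.
    g1 = nx.path_graph(4)
    g2 = nx.cycle_graph(4)
    graphs = list(graph_from_networkx([g1, g2]))

    # With normalize=True the entries can be read as similarity scores.
    sp = ShortestPath(normalize=True, with_labels=False)
    print(sp.fit_transform(graphs))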

Random Walk Kernel

To perform one step of a random walk in the random walk kernel, we need to apply the adjacency matrix of the product graph to the probability vector of the current node. The line P *= XY is equivalent to np.multiply (element-wise multiplication), but we actually need np.matmul (matrix multiplication) instead. So the correct statement should be P = np.matmul(XY, P) to perform one step of the random walk.

if self.p is not None:
    P = np.eye(XY.shape[0])
    S = self._mu[0] * P
    for k in self._mu[1:]:
        P *= XY
        S += k*P
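
For clarity, a standalone version of the loop with the proposed fix applied (XY is the product-graph adjacency matrix and mu the list of coefficients, as in the snippet above):

    import numpy as np

    def p_series_random_walk(XY, mu):
        """Truncated power-series random-walk sum using matrix multiplication."""
        P = np.eye(XY.shape[0])
        S = mu[0] * P
        for k in mu[1:]:
            P = np.matmul(XY, P)  # one random-walk step: a matrix product, not element-wise
            S += k * P
        return S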

joblib raised Deprecation Error

DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
warnings.warn(msg, category=DeprecationWarning)

cannot import name 'joblib' from 'sklearn.externals'
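
The fix suggested by the warning itself is to install joblib and import it directly instead of going through sklearn.externals:

    # pip install joblib
    import joblib  # instead of: from sklearn.externals import joblib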

problem with sklearn's train_test_split after networkx import

Hi there, it seems train_test_split is not happy when you give it a "generator object graph_from_networkx" as input...

To reproduce this:

  1. simply run the grakel example nx_to_grakel.py
  2. and then try to run train_test_split on the resulting G:

G_train, G_test, y_train, y_test = train_test_split(G, [-1,1], test_size=0.5)

I don't know whether it's a bug, but do you have a workaround in the meantime? (Or maybe I'm not doing things correctly...)
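
The workaround I am using for now is to materialize the generator into a list before splitting, on the assumption that train_test_split needs an indexable, sized sequence; nx_graphs stands for the networkx graphs built in nx_to_grakel.py:

    from sklearn.model_selection import train_test_split
    from grakel import graph_from_networkx

    G = list(graph_from_networkx(nx_graphs))  # materialize the generator first
    G_train, G_test, y_train, y_test = train_test_split(G, [-1, 1], test_size=0.5)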

Thanks,

Sylvain

Singular matrix for RandomWalk kernel

I was testing some directed unlabeled unweighted graphs. One of them triggered the "Singular matrix" error, which corresponds to the following code:

                def invert(n, w, v):
                    return (np.real(np.sum(v, axis=0))/n, np.real(w), np.real(np.sum(inv(v), axis=1))/n)

                def add_input(x):
                    return invert(x.shape[0], *eig(x))

How to get the matching graph?

We can now use the transform function to measure the similarity between two graphs, but how can we get the matching graph?
Can we get the corresponding nodes and edges of the compared graphs?
Any suggestions?

How to get intermediate features of graphs

Thanks for your implementation and detailed documentation!
The graph kernels produce a similarity matrix of graphs, which is a dot product of graph features. I want to know how to get these features.
I have studied the implementation of the Weisfeiler-Lehman kernel for half a day and can't figure it out so far :(

Unexpected output to STDOUT when iterating over iterable returned by graph_from_networkx

Thank you so much for your work on this library. I'm a graduate student just getting into this technique, and your framework and documentation have been quite helpful! 🙇‍♂️👏

I'm doing a project involving callgraphs of Android malware (~40k vertices), which are currently in gml format. I've been converting them into grakel graphs by way of networkx, as below:

gml_paths = [os.path.join(root, file) for root, directories, files in os.walk(path) for file in files if file.endswith(".gml")]
networkx_graphs = [nx.read_gml(path) for path in gml_paths]
grakel_graphs = graph_from_networkx(networkx_graphs, as_Graph=True)

The thing that is surprising to me is that the second I use grakel_graphs, for example by coercing it into a list with list(grakel_graphs), a string representation of the graphs gets dumped to STDOUT.

When I work in Jupyter Notebook, this results in the following error:
[screenshot of the Jupyter Notebook output]

Is this a logging feature, or is there some way to disable this behaviour so the conversion to a list happens silently?

Apologies in advance if there is a naive Python issue on my end. I'm an experienced developer, but new to the language ecosystem.

Thanks!

Unable to transform the same graph twice with ShortestPath kernel

Running GraKel version 0.1b7

I'm trying to run the H2O tutorial from the GraKeL introduction, but noticed that I couldn't transform the same graph twice. Does fit_transform or transform do anything to the Graph object?

from grakel import Graph
from grakel.kernels import ShortestPath

H2O_adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
H2O = Graph(initialization_object=H2O_adjacency)

H3O_adjacency = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
H3O = Graph(initialization_object=H3O_adjacency)

sp_kernel = ShortestPath(normalize=True, with_labels=False)
sp_kernel.fit_transform([H2O]) #works
sp_kernel.transform([H3O]) #works
sp_kernel.transform([H2O]) #error

Error on the second transform of H2O:

AttributeError: 'tuple' object has no attribute 'shape'

GraphHopper: data type must provide an itemsize

Hi,
I tried running the GH kernel, but got this error: data type must provide an itemsize:

anaconda3\lib\site-packages\grakel\graph.py:312: UserWarning: changing format from "adjacency" to "all"
warnings.warn('changing format from "adjacency" to "all"')
Traceback (most recent call last):
File "test.py", line 71, in
t11 = sp_kernel2.fit_transform([g1])
File "anaconda3\lib\site-packages\grakel\kernels\kernel.py", line 197, in fit_transform
km = self._calculate_kernel_matrix()
File "anaconda3\lib\site-packages\grakel\kernels\kernel.py", line 231, in calculate_kernel_matrix
K[i, i] = self.pairwise_operation(x, x)
File "anaconda3\lib\site-packages\grakel\kernels\graph_hopper.py", line 261, in pairwise_operation
return self.metric
((xp.reshape(xp.shape[0], m_sq),) + x[1:],
File "anaconda3\lib\site-packages\grakel\kernels\graph_hopper.py", line 282, in linear_kernel
NA_linear_kernel = np.dot(NA_i, NA_j.T)
File "<array_function internals>", line 5, in dot
ValueError: data type must provide an itemsize

Here is my code (which works fine with the shortest path kernel):

from grakel import Graph
from grakel import GraphKernel
#sp_kernel = GraphKernel(kernel="shortest_path")
from grakel.kernels import ShortestPath, GraphHopper
g1_edges = {(1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1}
g1_edge_labels = {(1, 2): [1], (1, 3): [2], (2, 1): [1], (3, 1): [2]}
g1_node_labels = {1:'1',2:'2',3:'3'}
g1 = Graph(g1_edges, node_labels=g1_node_labels, edge_labels=g1_edge_labels)

g2_edges = {(1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1}
g2_edge_labels = {(1, 2): [1], (1, 3): [2], (2, 1): [2], (3, 1): [1]}
g2_node_labels = {1:'1',2:'2',3:'3'}
g2 = Graph(g2_edges, node_labels=g2_node_labels, edge_labels=g2_edge_labels)

sp_kernel2 = GraphHopper(normalize=True)
t11 = sp_kernel2.fit_transform([g1])
t12 = sp_kernel2.transform([g2])
print(t11)
print(t12)

I also tried it with edge labels instead of edge attributes, but it didn't help. The issue happens in this line:
sp_kernel2 = GraphHopper(normalize=True)
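
A guess on my side: since the traceback ends in np.dot(NA_i, NA_j.T) on the node attributes, GraphHopper may need numeric attribute vectors rather than string labels. This is what I would try next (the values are arbitrary):

    from grakel import Graph
    from grakel.kernels import GraphHopper

    g1_edges = {(1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1}
    # Numeric attribute vectors instead of string node labels.
    g1_node_attributes = {1: [1.0], 2: [2.0], 3: [3.0]}
    g1 = Graph(g1_edges, node_labels=g1_node_attributes)

    gh = GraphHopper(normalize=True)
    print(gh.fit_transform([g1]))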

Possible improvement for initialisation of Floyd-Warshall?

Dear Grakel developers,
today I was playing with the Shortest Path kernel (Floyd-Warshall algorithm for building the shortest path matrix) and I was curious to look at the source code.

Your floyd_warshall() algorithm has been nicely divided into an Initialization step and a Calculation step. The initialization step seems a bit inefficient to me (a double for-loop with three if/else branches). May I propose the following vectorized alternative?

dist = copy.deepcopy(adjacency_matrix)
dist[dist==0] = float("Inf")
np.fill_diagonal(dist, 0)

where, obviously np stands for numpy and copy is included in the standard Python library.

I have been trying this new implementation on the DD dataset (a quite challenging one) by measuring the wall-clock time of the original initialization step against the vectorized one (the calculation block has obviously been omitted). On a Linux 19.04 machine with an Intel i7-3770K I obtained 183.60 seconds (original) against 1.03 seconds (vectorized).

Hope you find this little investigation useful.
All the best

Issue with fetch_dataset and MultiscaleLaplacian kernel

Hi,
I'm having some trouble with the ML kernel. Here's the code (very simple):

from __future__ import print_function

from grakel import kernels
from grakel.datasets import fetch_dataset
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import numpy as np

# Loads the dataset
dataset = fetch_dataset("IMDB-BINARY", produce_labels_nodes=True)
G, y = dataset.data, dataset.target

# Splits the dataset into a training and a test set
G_train, G_test, y_train, y_test = train_test_split(G, y, test_size=0.1, random_state=42)

# Uses the shortest path kernel to generate the kernel matrices
gk = kernels.MultiscaleLaplacian()
gk.fit(G_train)
K_train = gk.fit_transform(G_train)
K_test = gk.transform(G_test)

# Uses the SVM classifier to perform classification
print("Starting training")
clf = SVC(kernel="precomputed", verbose=True)
clf.fit(K_train, y_train)
y_pred = clf.predict(K_test)

# Computes and prints the classification accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", str(round(acc * 100, 2)) + "%")

But it fails with this error:

Traceback (most recent call last):
  File ".../gkernel/lib/python3.7/site-packages/grakel/kernels/multiscale_laplacian.py", line 465, in parse_input
    phi = np.array([list(phi_d[i]) for i in range(A.shape[0])])
  File ".../gkernel/lib/python3.7/site-packages/grakel/kernels/multiscale_laplacian.py", line 465, in <listcomp>
    phi = np.array([list(phi_d[i]) for i in range(A.shape[0])])
TypeError: 'int' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/sdb-seagate/graph-kernels/train.py", line 43, in <module>
    gk.fit(G_train)
  File ".../gkernel/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 123, in fit
    self.X = self.parse_input(X)
  File ".../gkernel/lib/python3.7/site-packages/grakel/kernels/multiscale_laplacian.py", line 467, in parse_input
    raise TypeError('Features must be iterable and castable ' +
TypeError: Features must be iterable and castable in total to a numpy array.

Am I doing something wrong? Thank you for your time.

Normalized RandomWalk returning k(G,G) = -1

Hi,

I am using GraKeL as installed from the repository (grakel-dev, version 0.1a4). When playing around with RandomWalk, I found that the normalized kernel of a graph G with itself, k(G,G), is sometimes equal to -1. Here's a minimum working example to reproduce the result:

from grakel import RandomWalk

rw_kernel = RandomWalk(normalize=True)

A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]
labels_A = {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
print(rw_kernel.fit_transform([[A, labels_A]])) # [[1.]]

B = [[0, 1, 1, 1, 1],
     [1, 0, 1, 1, 1],
     [1, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [1, 1, 1, 1, 0]]

labels_B = {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'}
print(rw_kernel.fit_transform([[B, labels_B]])) # [[-1.]]

Why is this happening?

Thanks in advance for your support.

Error when fitting graph kernels

I am trying to work on graph classification and I keep getting this error when fitting different kernels. Are there any guesses as to what might be happening? I am using weighted edges and node features.

/usr/local/lib/python3.6/dist-packages/grakel/kernels/shortest_path.py in lhash_labels(S, u, v, *args)
497
498 def lhash_labels(S, u, v, *args):
--> 499 return (args[0][u], args[0][v], S[u, v])
500
501

KeyError: 0

UnboundLocalError: local variable 'n_samples' referenced before assignment

Hi,
I'm trying to run my project on a server and it returns this error (while on my computer it does not):

  File "/home/nbaldan/.local/lib/python3.5/site-packages/grakel/kernels/graphlet_sampling.py", line 229, in initialize
    self.n_samples_ = n_samples
UnboundLocalError: local variable 'n_samples' referenced before assignment

Do you know why?

RuntimeWarning: invalid value encountered in divide

Got the following RuntimeWarning when using the graphlet_sampling kernel:

/usr/local/lib/python2.7/site-packages/grakel/kernels/graphlet_sampling.py:313: RuntimeWarning: invalid value encountered in divide
  return np.divide(km, np.sqrt(np.outer(self._X_diag, self._X_diag)))

It is probably caused by an INF/NaN value. Not a major problem, but it may correspond to some edge cases.

Which algorithm can be used when both nodes and edges are labeled?

Hi, I want to use GraKel to compare the similarity of programs' function call graphs. Each caller address is a node, and I use the caller address itself as the node's label. Besides, loops always exist in program execution, so one edge may be executed multiple times before the program exits; I therefore use the number of times an edge is executed as its edge label.

I want to know whether there is an algorithm in GraKel that can handle the situation above. I checked the examples, but they only show graphs with labeled nodes. I also tried the WeisfeilerLehman algorithm, but it seems this algorithm does not take edge labels into consideration :(
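
To make the setup concrete, here is a toy version of how I build the graphs (the addresses and counts are made up). As a sanity check I also tried EdgeHistogram, which as far as I understand only looks at edge labels, while SubgraphMatching (see the earlier issue above) appears to use both node and edge labels:

    from grakel import Graph
    from grakel.kernels import EdgeHistogram

    # Toy call graph: nodes are caller addresses (used as node labels),
    # each edge is labeled with how many times it was executed
    # (both directions are labeled since the adjacency matrix is symmetric).
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    node_labels = {0: '0x401000', 1: '0x401050', 2: '0x4010a0'}
    edge_labels = {(0, 1): 3, (1, 0): 3, (1, 2): 1, (2, 1): 1}
    g = Graph(initialization_object=adj, node_labels=node_labels, edge_labels=edge_labels)

    ek = EdgeHistogram(normalize=True)
    print(ek.fit_transform([g]))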

Graph_from_CSV

Hi,

I'm having the same problem as mentioned in a previous issue of not being able to produce the correct input format, neither from a networkX graph nor from a CSV file. Both functions themselves work fine, but as soon as I try to use fit_transform(X) it throws an error:

For the graph_from_csv object it throws:


ValueError Traceback (most recent call last)
in
----> 1 sp_kernel.fit_transform(new_G)

~/anaconda3/envs/py36/lib/python3.6/site-packages/grakel/graph_kernels.py in fit_transform(self, X, y)
386 K = self.kernel_.transform(X).dot(self.nystroem_normalization_.T)
387 else:
--> 388 K = self.kernel_.fit_transform(X)
389
390 return K

~/anaconda3/envs/py36/lib/python3.6/site-packages/grakel/kernels/shortest_path.py in fit_transform(self, X, y)
379 """
380 self._method_calling = 2
--> 381 self.fit(X)
382
383 # calculate feature matrices.

~/anaconda3/envs/py36/lib/python3.6/site-packages/grakel/kernels/kernel.py in fit(self, X, y)
121 raise ValueError('fit input cannot be None')
122 else:
--> 123 self.X = self.parse_input(X)
124
125 # Return the transformer

~/anaconda3/envs/py36/lib/python3.6/site-packages/grakel/kernels/shortest_path.py in parse_input(self, X)
428 elif self._method_calling == 3:
429 self._Y_enum = dict()
--> 430 for (idx, x) in enumerate(iter(X)):
431 is_iter = isinstance(x, collections.Iterable)
432 if is_iter:

~/anaconda3/envs/py36/lib/python3.6/site-packages/grakel/utils.py in graph_from_csv(edge_files, node_files, index_type, directed, sep, as_Graph)
566 type(edge_files[1]) is not bool or
567 type(edge_files[2]) not in [bool, None]):
--> 568 edge_files_error()
569 else:
570 if edge_files[1]:

~/anaconda3/envs/py36/lib/python3.6/site-packages/grakel/utils.py in edge_files_error()
553 """
554 def edge_files_error():
--> 555 raise ValueError('edge_file argument must contain an iterable of strings of edge files, '
556 'a bool weight_flag and attributes_flag bool or None')
557

ValueError: edge_file argument must contain an iterable of strings of edge files, a bool weight_flag and attributes_flag bool or None

Whereas for my networkX input graph I get the same missing edge/node labels error as in the other issue. Essentially I have a large set of unlabelled binary adjacency matrices as CSV files and want to measure their similarity using your kernels. I have already tried the PM kernel you developed in Matlab and it worked like a charm! Thanks for your help.
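
In case it clarifies what I am after, this is the fallback I am considering: loading each CSV adjacency matrix with numpy and building grakel Graph objects directly (the file names are placeholders):

    import numpy as np
    from grakel import Graph
    from grakel.kernels import ShortestPath

    csv_files = ["graph_001.csv", "graph_002.csv"]  # placeholder paths
    graphs = []
    for path in csv_files:
        A = np.loadtxt(path, delimiter=",")
        graphs.append(Graph(initialization_object=A.tolist()))

    # Unlabelled graphs, so the shortest path kernel is used without labels.
    sp = ShortestPath(normalize=True, with_labels=False)
    K = sp.fit_transform(graphs)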

Calculate a default kernel value

I was reading this introduction:
https://ysig.github.io/GraKeL/dev/user_manual/longer_introduction.html

I executed the following lines:

from grakel import GraphKernel
H2O = [[[[0, 1, 1], [1, 0, 0], [1, 0, 0]], {0: 'O', 1: 'H', 2: 'H'}]]
H3O = [[[[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], {0: 'O', 1: 'H', 2: 'H', 3:'H'}]]
gs_kernel = GraphKernel(kernel=dict(name="graphlet_sampling", n_samples=5))
gs_kernel.fit(H2O)

An error occurs in the fit method:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/grakel/graph_kernels.py", line 290, in fit
    self.initialize_()
  File "/usr/local/lib/python3.7/site-packages/grakel/graph_kernels.py", line 430, in initialize_
    self.kernel_ = kernel(**params)
TypeError: __init__() got an unexpected keyword argument 'n_samples'

I'm using grakel 0.1a5 under python 3.7.1
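
For what it's worth, with the GraphletSampling class (rather than the GraphKernel wrapper) my guess is that the sampling options go inside a sampling dict; the parameter placement below is an assumption on my part, not something I have confirmed:

    from grakel.kernels import GraphletSampling

    H2O = [[[[0, 1, 1], [1, 0, 0], [1, 0, 0]], {0: 'O', 1: 'H', 2: 'H'}]]
    # Guess: n_samples is nested inside the "sampling" dict.
    gs_kernel = GraphletSampling(k=3, sampling=dict(n_samples=5))
    print(gs_kernel.fit_transform(H2O))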

NotImplementedError: Pairwise operation is not implemented!

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

from grakel.datasets import fetch_dataset
from grakel.kernels import ShortestPath

# Loads the MUTAG dataset

MUTAG = fetch_dataset("MUTAG", verbose=False)
G, y = MUTAG.data, MUTAG.target

# Splits the dataset into a training and a test set

G_train, G_test, y_train, y_test = train_test_split(G, y, test_size=0.1, random_state=42)

# Uses the shortest path kernel to generate the kernel matrices

gk = ShortestPath(normalize=True)
K_train = gk.fit_transform(G_train)
gk.pairwise_operation(G_test[0], G_test[1])

Traceback (most recent call last):
File "", line 1, in
File "/mnt/blossom/more/kgoyal/repos/graphsearch/venv/lib/python3.6/site-packages/grakel/kernels/kernel.py", line 384, in pairwise_operation
raise NotImplementedError('Pairwise operation is not implemented!')
NotImplementedError: Pairwise operation is not implemented!
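
As a workaround, a one-to-one value can apparently be obtained through fit/transform on single-element lists instead of calling pairwise_operation directly (G_test as in the snippet above):

    from grakel.kernels import ShortestPath

    # Kernel value between two individual test graphs, via fit + transform.
    gk_pair = ShortestPath(normalize=True)
    print(gk_pair.fit([G_test[0]]).transform([G_test[1]]))  # a 1x1 kernel matrix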

Graphlet Sampling kernel errors when sampling is 'None'

I am using the Graphlet Sampling kernel with no sampling, i.e.

GK = grakel.GraphletSampling(n_jobs=None, normalize=False, verbose=False, random_state=None, k=this_k, sampling=None)

where this_k is user-defined.

However, when I do GK.fit_transform() on a dataset, it shows the following error

Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/scipy/optimize/_differentialevolution.py", line 1265, in __call__
    return self.f(x, *self.args)
  File "main_benchmarkKernels.py", line 68, in myFitness
    thisKernelMatrix = GK.fit_transform(thisDataset)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/grakel/kernels/graphlet_sampling.py", line 309, in fit_transform
    self.fit(X)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/grakel/kernels/kernel.py", line 117, in fit
    self.initialize()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/grakel/kernels/graphlet_sampling.py", line 229, in initialize
    self.n_samples_ = n_samples
UnboundLocalError: local variable 'n_samples' referenced before assignment

Looking at the code (graphlet_sampling.py), a variable n_samples is assigned to self.n_samples_ even when sampling is None (line 229).

Maybe it's just a minor if/else issue? The if branch at line 158 handles the case when sampling is None and ends at line 160. The elif branch starts at line 161 and does declare n_samples, which can then be assigned to self.n_samples_ at line 229.

Setup: Python 3.6.9 with grakel-dev==0.1a6

Division by zero error when using Propagation kernel

Hi, sometimes a division by zero error happens when using the Propagation kernel.
It seems that's because of normalizing the rows of the transition matrix in the following line:

transition_matrix[i] = (T.T / np.sum(T, axis=1)).T

When I replaced the above line with sklearn's normalization function, the problem was solved and the division by zero no longer happened:

from sklearn.preprocessing import normalize
transition_matrix[i] = normalize(T, axis=1, norm='l1')

implementation of EMD Kernel

Hi, this graph kernel repo is perfect. Could you provide a code implementation of the EMD kernel, which was proposed in "Matching Node Embeddings for Graph Similarity" (AAAI'17)?
