
nodevectors's Introduction


This package implements fast, scalable node embedding algorithms. It can embed the nodes of graph objects and of arbitrary SciPy CSR sparse matrices. NetworkX graph types are supported natively.


Installing

pip install nodevectors

This package depends on the CSRGraphs package, which pip installs automatically alongside it. Most development happens there, so running pip install --upgrade csrgraph once in a while updates the underlying graph library.

Supported Algorithms

  • Node2Vec (paper). Note that despite its popularity this isn't always the best method. We recommend trying ProNE or GGVec if you run into issues.

  • GGVec (paper upcoming). A flexible default algorithm. Best on large graphs and for visualization.

  • ProNE (paper). The fastest and most reliable sparse matrix/graph embedding algorithm.

  • GraRep (paper)

  • GloVe (paper). This is useful to embed sparse matrices of positive counts, like word co-occurrence.

  • Any Scikit-Learn API model that supports the fit_transform method and has an n_components attribute (e.g. all manifold learning models, UMAP, etc.). Use these with the SKLearnEmbedder object.

Quick Example:

import networkx as nx
from nodevectors import Node2Vec

# Test Graph
G = nx.generators.classic.wheel_graph(100)

# Fit embedding model to graph
g2v = Node2Vec(
    n_components=32,
    walklen=10
)
# way faster than other node2vec implementations
# Graph edge weights are handled automatically
g2v.fit(G)

# query embeddings for node 42
g2v.predict(42)

# Save and load whole node2vec model
# Uses a smart pickling method to avoid serialization errors
# Don't put a file extension after the `.save()` filename, `.zip` is automatically added
g2v.save('node2vec')
# You however need to specify the extension when reading it back
g2v = Node2Vec.load('node2vec.zip')

# Save model to gensim.KeyedVector format
g2v.save_vectors("wheel_model.bin")

# load in gensim
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format("wheel_model.bin")
model[str(43)] # need to make nodeID a str for gensim

Warning: Saving in Gensim format is only supported for the Node2Vec model at this point. Other models build a Dict of embeddings.

Embedding a large graph

NetworkX doesn't support large graphs (>500,000 nodes) because it uses lots of memory for each node. We recommend using CSRGraphs (which is installed with this package) to load the graph in memory:

import csrgraph as cg
import nodevectors

G = cg.read_edgelist("path_to_file.csv", directed=False, sep=',')
ggvec_model = nodevectors.GGVec() 
embeddings = ggvec_model.fit_transform(G)

The read_edgelist function accepts all the file-reading parameters of pandas.read_csv. You can also specify whether the graph is undirected (so all edges go both ways) or directed (so edges are one-way).

Best algorithms to embed a large graph

The ProNE and GGVec algorithms are the fastest. GGVec uses the least RAM to embed larger graphs. Additionally here are some recommendations:

  • Don't change return_weight and neighbor_weight from their defaults if you are using the Node2Vec algorithm. Doing so makes the walk generation step 40x-100x slower.

  • If you are using GGVec, keep order at 1. Using higher order embeddings will take quadratically more time. Additionally, keep negative_ratio low (~0.05-0.1), learning_rate high (~0.1), and use aggressive early stopping values. GGVec generally only needs a few (less than 100) epochs to get most of the embedding quality you need.

  • If you are using ProNE, keep the step parameter low.

  • If you are using GraRep, keep the default embedder (TruncatedSVD) and keep the order low (1 or 2 at most).

Preprocessing to visualize large graphs

You can use our algorithms to preprocess data for algorithms like UMAP or t-SNE. First embed the graph to 16-400 dimensions, then use these embeddings as input to the final visualization algorithm.

Here is an example of this on the full English Wikipedia link graph (6M nodes) by Owen Cornec:

[Image: visualization of the embedded Wikipedia link graph]

The GGVec algorithm often produces the best visualizations, but can have some numerical instability with very high n_components or too high negative_ratio. Node2Vec tends to produce elongated and filamented structures in the visualizations due to the embedding graph being sampled on random walks.

Embedding a VERY LARGE graph

(Upcoming).

GGVec can be used to learn embeddings directly from an edgelist file (or stream) when the order parameter is constrained to be 1. This means you can embed arbitrarily large graphs without ever loading them entirely into RAM.

Related Projects

  • DGL for Graph Neural networks.

  • KarateClub for node embeddings specifically on NetworkX graphs. Because of this, the implementations are less scalable, but the package has more types of embedding algorithms.

  • GraphVite is not a python package but has GPU-enabled embedding algorithm implementations.

  • Cleora, another fast/scalable node embedding algorithm implementation

Why is it so fast?

We leverage CSRGraphs for most algorithms. This uses CSR graph representations and a lot of Numba JIT'ed procedures.
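To give a rough idea of what a CSR graph representation looks like, here is a minimal pure-Python sketch. It is not the CSRGraphs implementation (which relies on NumPy arrays and Numba); the helper names build_csr and neighbors are hypothetical and exist only for this illustration.

```python
def build_csr(n_nodes, edges):
    """Build CSR arrays (indptr, indices) from directed (src, dst) pairs."""
    # Count the out-degree of every node.
    degree = [0] * n_nodes
    for src, _ in edges:
        degree[src] += 1
    # indptr[i]..indptr[i + 1] delimits node i's neighbors inside `indices`.
    indptr = [0] * (n_nodes + 1)
    for i in range(n_nodes):
        indptr[i + 1] = indptr[i] + degree[i]
    # Fill the flat neighbor array using one write cursor per node.
    indices = [0] * len(edges)
    cursor = indptr[:-1].copy()
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def neighbors(indptr, indices, node):
    """Neighbors of `node` are one contiguous slice of `indices`."""
    return indices[indptr[node]:indptr[node + 1]]


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
indptr, indices = build_csr(3, edges)
print(neighbors(indptr, indices, 0))
```

Because every neighbor list is a contiguous slice of one flat array, a random walk step needs only two index lookups, which is the kind of access pattern that compiles well under a JIT.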

nodevectors's People

Contributors

cthoyt, kannankumar, ldorigo, vhranger


nodevectors's Issues

Why is generating walks so slow with non-default parameters?

I initially arrived at this code via your blog post https://www.singlelunch.com/2019/08/01/700x-faster-node2vec-models-fastest-random-walks-on-a-graph/#note-3-692 - and indeed the speedup with default parameters (q=1,p=1) is impressive.

But as you also mention in the readme, much of that is lost when using non-default parameters. I have a network of 100k nodes and 1M edges, and the "default" walk generation takes 14 seconds, while trying different parameters takes well over 10 hours. Is there anything that can be done to improve speed for different values of p and q? Much of the flexibility of Node2Vec comes from being able to capture local vs. global information by tuning the parameters, and even the Node2Vec paper shows that the best results are usually obtained with values for p and q that are different from 1.
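For context, the following pure-Python sketch shows the second-order walk rule from the Node2Vec paper and why p = q = 1 is a special case. It is an illustration of the general technique on an unweighted graph, not the nodevectors or CSRGraphs implementation; the function names biased_step and node2vec_walk are hypothetical.

```python
import random


def biased_step(adj, prev, cur, p, q, rng):
    """Choose the next node from `cur`, given the walk arrived from `prev`."""
    candidates = sorted(adj[cur])
    if p == 1 and q == 1:
        # All weights are equal, so `prev` is irrelevant: one uniform draw.
        return rng.choice(candidates)
    weights = []
    for nxt in candidates:
        if nxt == prev:
            weights.append(1.0 / p)   # step back to the previous node
        elif nxt in adj[prev]:
            weights.append(1.0)       # candidate is also a neighbor of prev
        else:
            weights.append(1.0 / q)   # candidate is two hops from prev
    return rng.choices(candidates, weights=weights, k=1)[0]


def node2vec_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    """Generate one walk of `length` nodes on an unweighted graph."""
    rng = random.Random(seed)
    path = [start]
    while len(path) < length:
        if len(path) == 1:
            # The first step has no previous node, so it is uniform.
            path.append(rng.choice(sorted(adj[start])))
        else:
            path.append(biased_step(adj, path[-2], path[-1], p, q, rng))
    return path


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
fast = node2vec_walk(adj, start=0, length=8)               # p = q = 1
slow = node2vec_walk(adj, start=0, length=8, p=4, q=0.5)   # needs the extra checks
```

With non-default p and q, every step has to test each candidate against the previous node's neighborhood and build a fresh weight vector, so the per-step cost grows with node degree instead of being a single uniform draw.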

G.mat got an asymmetric sparse matrix

Hello! Thanks for the great great work!
I encountered an issue while using nodevectors to train the prone embeddings:
I ran
G = cg.read_edgelist("..", directed=True, sep=',')
g2v = ProNE()
g2v.fit(G)

and I got:

ValueError Traceback (most recent call last)
Input In [34], in <cell line: 2>()
1 g2v = ProNE()
----> 2 g2v.fit(G)

File ~/miniforge3/envs/alphaA/lib/python3.8/site-packages/nodevectors/prone.py:82, in ProNE.fit(self, graph)
78 G = cg.csrgraph(graph)
79 features_matrix = self.pre_factorization(G.mat,
80 self.n_components,
81 self.exponent)
---> 82 vectors = ProNE.chebyshev_gaussian(
83 G.mat, features_matrix, self.n_components,
84 step=self.step, mu=self.mu, theta=self.theta)
85 self.model = dict(zip(G.nodes(), vectors))

File ~/miniforge3/envs/alphaA/lib/python3.8/site-packages/nodevectors/prone.py:154, in ProNE.chebyshev_gaussian(G, a, n_components, step, mu, theta)
151 return a
152 print(G.shape)
--> 154 A = sparse.eye(nnodes) + G
155 DA = preprocessing.normalize(A, norm='l1')
156 # L is graph laplacian

File ~/miniforge3/envs/alphaA/lib/python3.8/site-packages/scipy/sparse/base.py:414, in spmatrix.__add__(self, other)
412 elif isspmatrix(other):
413 if other.shape != self.shape:
--> 414 raise ValueError("inconsistent shapes")
415 return self._add_sparse(other)
416 elif isdense(other):

ValueError: inconsistent shapes

I checked the error further, and it showed that G.mat is an asymmetric (non-square) sparse matrix with shape (830421x830420).
Could you please give me any clue on this?

Node2Vec IndexError: list index out of range

Hi,

After the fix of VHRanger/CSRGraph#3, I was able to load my dataset in CSRGraph. But when I ran the following command, I got an error:
from nodevectors import Node2Vec
g2v = Node2Vec()
g2v.fit(G)

Error - ---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in
3 # way faster than other node2vec implementations
4 # Graph edge weights are handled automatically
----> 5 g2v.fit(G)

~/SageMaker/CSRGraph/nodevectors/nodevectors/node2vec.py in fit(self, nxGraph)
93 node_names = list(nxGraph)
94 G = cg.csrgraph(nxGraph, threads=self.threads)
---> 95 if type(node_names[0]) not in [int, str, np.int32, np.uint32,
96 np.int64, np.uint64]:
97 raise ValueError("Graph node names must be int or str!")

IndexError: list index out of range

IDs in my data file are of the int64 datatype. Interestingly, when I run the following command, it executes successfully.
from nodevectors import GGVec
ggvec_model = GGVec()
embeddings = ggvec_model.fit_transform(G)
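The traceback above fails on node_names[0], which raises IndexError only when list(nxGraph) comes back empty. A defensive check along the following lines would turn that into a clearer message. This is a sketch only: check_node_names is a hypothetical helper, not part of nodevectors, and the accepted types are passed in as an assumption.

```python
def check_node_names(node_names, allowed=(int, str)):
    """Validate node names before fitting; fail clearly on empty input."""
    node_names = list(node_names)
    if not node_names:
        # Without this guard, node_names[0] raises a bare IndexError.
        raise ValueError("Graph yielded no node names when iterated over")
    if not isinstance(node_names[0], allowed):
        raise ValueError("Graph node names must be int or str!")
    return node_names


names = check_node_names([10, 11, 12])
```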

I run the given short example... (partial success)

Hello Mr. Matt Ranger,

I installed the nodevectors package on my Mac OS Sierra, I verified to have all the required Python packages available with 'pip list' and then tried to run the given short example as a filename.py file. Here the CL trace:

% python networkx-test.py
Making walks... Done, T=2.94
Mapping Walk Names... Done, T=0.08
Training W2V... Done, T=0.85
Traceback (most recent call last):
File "networkx-test.py", line 19, in
g2v = Node2vec.load('node2vec.pckl') # it gets blocked at this point.
NameError: name 'Node2vec' is not defined

...any hint/feedback/re-testing would be appreciated.
Thank you, BR

H.

Minor issues with the new release

There seems to be an __init__.py missing in the evaluation folder, which causes an error on import.

Additionally, umap is missing from the requirements.

Also, a small suggestion - when I ran into this issue today I tried installing the last version that worked for me (0.1.12), which is also broken since you don't specify package versions in your requirements. In this case your other package CSRGraph created a compatibility issue, so maybe you only need to specify the CSRGraph version since you're frequently updating it.

I really appreciate the work you've put into this package, when I was looking for a node2vec implementation many months ago yours was by far the cleanest and fastest. Thanks!

pypi

Hi,
Your package is great, but you should really put it on PyPi to make the installation easier.

TypeError: 'method' object is not iterable

I am getting this error when trying to run the unit tests.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-b3bd86b0b5de> in <module>()
      1 # Fit embedding model to graph
      2 g2v = Node2Vec()
----> 3 g2v.fit(G)

/home/dionysis/Documents/git_repos/graph2vec/graph2vec/graph.py in fit(self, nxGraph, verbose)
    356             Whether to print output while working
    357         """
--> 358         node_names = list(nxGraph.nodes)
    359         if type(node_names[0]) not in [int, str, np.int32, np.int64]:
    360             raise ValueError("Graph node names must be int or str!")

TypeError: 'method' object is not iterable

Do you think this line:
https://github.com/VHRanger/graph2vec/blob/8474f7ccf5d9b34d82fbf5ac16f04bcc37143cd6/graph2vec/graph.py#L358

Should change to this:

 node_names = list(nxGraph.nodes())

Reading the edgelist using CSRGraphs.

Thanks for this great work.

I have a big graph (10 GB) and use CSRGraphs to load the edgelist and compute the node embeddings with node2vec. However, I ran into a problem while reading a graph. Here is the error I encountered:

import csrgraph as cg
G = cg.read_edgelist("karate.txt",sep = "\t")
TypeError: sort_values() got an unexpected keyword argument 'ignore_index'

Any suggestions to fix this?
Thanks in advance.

Node2Vec:About the return_weight and neighbor_weight

Dear author,
I read the source code of Node2Vec and found that the default value of return_weight and neighbor_weight is 1. Isn't that DeepWalk?
However, if I change return_weight and neighbor_weight, it becomes very slow. I want to tune the embedding between BFS and DFS behavior; how can I keep it fast?

Handling edge weights?

Hello,

First, thanks for a great package. The performance boost compared to other implementations is pretty incredible.

One thing I don't see is support for using edge weights in the input graph. Is there a way to do this now, or are there plans to add this functionality?

All the best,
Chad

Issue with gensim 4.0.0+

It appears one of the argument names has changed in the newly released version of GenSim. This has also caused some pain in other libraries using this package for node2vec implementations (e.g., krishnanlab/PecanPy#16)

Traceback (most recent call last):
  File "embed_nodevectors.py", line 150, in <module>
    main()
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "embed_nodevectors.py", line 137, in main
    model.fit(graph)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/nodevectors/node2vec.py", line 130, in fit
    self.model = gensim.models.Word2Vec(
TypeError: __init__() got an unexpected keyword argument 'size'

Error reading in CSR graph

I am trying to load a 150MB edgelist in csr graph using the command G = cg.read_edgelist("samplelist.edgelist", sep="\t")
But I get the following error:
ValueError Traceback (most recent call last)
in
2 import nodevectors
3
----> 4 G = cg.read_edgelist("samplelist.edgelist", sep="\t")

~/anaconda3/envs/python3/lib/python3.6/site-packages/csrgraph/graph.py in read_edgelist(f, sep, header, **readcsvkwargs)
457 SRC: {elist.src.max()}, {elist.src.min()}
458 DST: {elist.dst.max()}, {elist.dst.min()}
--> 459 """)
460 elist.src = elist.src.astype(np.uint32)
461 elist.dst = elist.dst.astype(np.uint32)

ValueError:
Invalid uint32 value in node IDs. Max/min :
SRC: 8278237827, 15830
DST: 8237827382738273827382, 2111364
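The message says the node IDs do not fit in a uint32, whose largest value is 4294967295. One possible workaround, sketched below under the assumption that the original IDs only serve as labels, is to remap them to contiguous integers before building the graph. The helpers ids_fit_uint32 and remap_ids are hypothetical names, not CSRGraphs functions.

```python
UINT32_MAX = 2**32 - 1  # 4294967295


def ids_fit_uint32(ids):
    """True when every ID can be stored in an unsigned 32-bit integer."""
    return all(0 <= i <= UINT32_MAX for i in ids)


def remap_ids(edges):
    """Map arbitrary hashable node IDs to contiguous integers 0..n-1."""
    index = {}
    remapped = []
    for src, dst in edges:
        s = index.setdefault(src, len(index))
        d = index.setdefault(dst, len(index))
        remapped.append((s, d))
    return remapped, index


edges = [(8278237827, 15830), (15830, 2111364)]
remapped, index = remap_ids(edges)
# Keep the reverse mapping to translate embeddings back to original IDs.
original_id = {new: old for old, new in index.items()}
```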

is it possible to split n2v to generate walks only?

Hi! ,

I am using node2vec to generate walks on graphs, which I then pass to a different gensim modified by another tool (this is for alignment of temporal models).

Given the speed of carrying out walks with nodevectors, is it possible to separate the walks from the .fit method (as in, have an option to ONLY carry out the walks without fitting the model), so that I can save the walks to take on to the next tool?

thanks!

w2vparams["batch_words"] default parameter cripples node2vec's performance

The Node2Vec class constructor sets the default value of w2vparams["batch_words"] to 128. The default value in gensim's lib is 10000. According to their docs:

batch_words (int, optional) – Target size (in words) for batches of examples passed to worker threads (and thus cython routines).(Larger batches will be passed if individual texts are longer than 10000 words, but the standard cython code truncates to that maximum.)

I don't know what exactly it does behind the scenes, but using the current default value of 128 severely affects the training performance.

Line of code:

"batch_words":128}):

enable export walks

Very nice project. Here is a suggestion: Would be great to be able to call n2v.walks and get a list of all generated random walks after running the fit(). I think it should be an easy upgrade :)

Load into W2V does not work

Awesome work! Unfortunately, when I load my bin file, I get the following error message:
ValueError: invalid vector on line 0 (is this really the text format?)

Any suggestions? There are spaces in the node names (e.g., 'Leonardo da Vinci').
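The word2vec text format stores one token per line followed by its vector components, all separated by single spaces, so a node name that itself contains spaces shifts the fields. The sketch below uses a simplified reader (not gensim's actual parser) to show the effect and one possible workaround; parse_w2v_text_line and safe_name are hypothetical helpers.

```python
def parse_w2v_text_line(line, dim):
    """Parse one 'token v1 v2 ... v_dim' line of the word2vec text format."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != dim + 1:
        raise ValueError(f"expected {dim + 1} fields, got {len(parts)}")
    return parts[0], [float(x) for x in parts[1:]]


def safe_name(name):
    """Replace spaces so the name stays a single token."""
    return name.replace(" ", "_")


raw_line = "Leonardo da Vinci 0.1 0.2"
safe_line = safe_name("Leonardo da Vinci") + " 0.1 0.2"

try:
    parse_w2v_text_line(raw_line, dim=2)
    raw_ok = True
except ValueError:
    raw_ok = False   # the name was split into three fields

name, vector = parse_w2v_text_line(safe_line, dim=2)
```

Renaming nodes before fitting (and mapping the names back afterwards) is one option; whether it suits a given pipeline depends on how the names are used downstream.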

ProNE option: "inconsistent shapes" error

I get an:

raise ValueError("inconsistent shapes")

from:
./nodevectors/prone.py line 61, in fit_transform
./nodevectors/prone.py line 152, in chebyshev_gaussian
.../scipy/sparse/_base.py line 471, in __add__

The defaults work for small graphs (~tens of thousands of nodes) but fail for a graph with 7M nodes and 50M edges.

About painting

Hello, could you share the drawing code for the Wikipedia 6M.png and 3d graph.png images?

How to increase number of components(features) in output vectors

Currently, n_components is set to 32 in all available algorithms like node2vec, GGVec etc. How can I increase to 128? I tried modifying the .py files of these algorithms to increase from 32 to 128. But it did not work. Once I set n_components=128 in .py files and imported package again, running algorithm still outputs vector that has 32 components.

How to get node's list

I have trained and saved the model with

    import csrgraph as cg
    import nodevectors
    G = cg.read_edgelist("edges.txt", directed=False, sep=' ')
    ggvec_model = nodevectors.GGVec()
    embeddings = ggvec_model.fit_transform(G)
    ggvec_model.save("embeddings.emb")

Now I want to load and iterate over the embeddings but I'm unable to find any method that returns the nodes list.

import nodevectors
ggvec_model = nodevectors.GGVec()
ggvec_model.load("embeddings.emb.zip")

Jupyter notebook kernel dies while computing ggvec embeddings

I am loading about 7MM edges in a graph object using networkx and then running
import nodevectors
ggvec_model = nodevectors.GGVec()
embeddings = ggvec_model.fit_transform(G)

After running for a few minutes jupyter notebook kernel dies. Is there any way forward in this scenario ?

Node2Vec in a large graph

Hi,
Thanks for the clarification to solve the Issue number 27. Now that works fine after I upgrade the csrgraph to version 0.1.27. Now the next issue is that I got Segmentation fault while running node2vec. Is there any suggestion to fix this?

G = csr_matrix(G)
n2v_model = nodevectors.Node2Vec()
n2v_model.fit(G)

Segmentation fault

Do I need to update the nodevectors package as well after I update the csrgraph ? If so which version is needed?

Thanks in advance !

Print training progression (node2vec)?

Hi, is there any way to monitor training progression? Even with verbose=True, nothing gets printed out after "Mapping Walk Names... Done" (and if the training can be expected to take several hours, it's a bit annoying to have no idea if anything is actually happening).
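One thing that may help: gensim reports its training progress through Python's standard logging module, so enabling INFO-level logging before calling fit should surface those messages during the "Training W2V" stage. This is a sketch of that setup; whether nodevectors' own verbose flag adds anything beyond it is not something this snippet assumes.

```python
import logging

# Route log records to stderr with timestamps.
logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    level=logging.INFO,
)
# Make sure gensim's loggers emit INFO even if the root logger
# was already configured with a stricter level elsewhere.
logging.getLogger("gensim").setLevel(logging.INFO)
```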

node2vec uses CBOW instead of skip-gram

Node2vec and DeepWalk original proposals are built upon the skip-gram model. By default, nodevectors does not set the parameter w2vparams["sg"] to 1, therefore the underlying Word2Vec model uses the default value of 0, which means using CBOW instead of skip-gram. This has major consequences in the quality of the embeddings.

defining random state or seed option parameters

I need the option to assign random state or seed values to get stable results. I don't think there is such an option.
Unfortunately, my attempts to fix the general seed that I have listed below did not solve the problem.

import random
random.seed(1)
from numpy.random import seed
seed(1)

What can be done about it? Do you have any advice?
thanks in advance
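A possible reason the global seeds have no effect: Numba-compiled code keeps its own random generator state, separate from NumPy's global one, and gensim's Word2Vec takes its own seed parameter and is only fully deterministic with a single worker thread. The pure-Python sketch below shows the general pattern of threading an explicit seed through walk generation; uniform_walks is a hypothetical function and not how nodevectors generates walks.

```python
import random


def uniform_walks(adj, walklen, seed):
    """One uniform random walk per node, reproducible for a given seed."""
    rng = random.Random(seed)   # private generator, unaffected by random.seed()
    walks = []
    for start in sorted(adj):
        path = [start]
        while len(path) < walklen:
            path.append(rng.choice(sorted(adj[path[-1]])))
        walks.append(path)
    return walks


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
first = uniform_walks(adj, walklen=5, seed=1)
second = uniform_walks(adj, walklen=5, seed=1)
```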

Rename repo/package

Unfortunately, graph2vec has already been used in 2017 in a paper on representation learning for whole graphs (not nodes). Link: https://arxiv.org/abs/1707.05005, implementations at https://github.com/MLDroid/graph2vec_tf (author) and https://github.com/benedekrozemberczki/graph2vec (reimplementation)

Also, graph2vec has already been taken on PyPI by another project https://pypi.org/project/graph2vec, but I think you're aware of this.

I think the solution was to rename this to graph2vec-learn but I would encourage you pick a more informative name because this doesn't alleviate the original name conflict.

Either way, could you please update the name of this repo so the PyPI project matches the repo and folder inside the repo?

Bug when Train gensim word2vec model on random walks

import networkx as nx
from nodevectors import Node2Vec
# the edgelist file has 895608 lines
nx.read_weighted_edgelist('edgelist',create_using=nx.DiGraph)
g2v = Node2Vec(n_components=dimension,verbose=True)
g2v.fit(G)

Here is the error trace.

File "./lib/utils/twitter_data.py", line 410, in _learn_node2vec_nodevectors
g2v.fit(G)
File "./venv/lib64/python3.6/site-packages/nodevectors/node2vec.py", line 133, in fit
**self.w2vparams)
File "./venv/lib64/python3.6/site-packages/gensim/models/word2vec.py", line 591, in __init__
self.wv = Word2VecKeyedVectors(size)
File "./venv/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 380, in __init__
super(WordEmbeddingsKeyedVectors, self).__init__(vector_size=vector_size)
File "./venv/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 218, in __init__
self.vectors = zeros((0, vector_size), dtype=REAL)
TypeError: 'str' object cannot be interpreted as an integer

word2vec parameters changed

Hi!

In node2vec.py, you should modify the 'iter' parameter to 'epochs' and the 'size' parameter to 'vector_size'.

(And thank you for the library, I use it extensively in my research!)
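A small compatibility shim can cover both gensim generations. The sketch below only translates keyword names; translate_w2v_kwargs and the RENAMED table are hypothetical, and the table lists just the two renames mentioned in this issue.

```python
# Keyword renames for Word2Vec introduced in gensim 4.0.
RENAMED = {"size": "vector_size", "iter": "epochs"}


def translate_w2v_kwargs(kwargs, gensim_version):
    """Return kwargs using the names expected by the given gensim version."""
    major = int(gensim_version.split(".")[0])
    if major < 4:
        return dict(kwargs)   # the old names are still valid
    return {RENAMED.get(key, key): value for key, value in kwargs.items()}


legacy = {"size": 32, "iter": 5, "sg": 1}
new_style = translate_w2v_kwargs(legacy, "4.3.2")
old_style = translate_w2v_kwargs(legacy, "3.8.3")
```

In real code the version string would come from gensim.__version__, and the translated dict would be unpacked into the Word2Vec constructor.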

fit_transform tries to query non-existent node "0"

from nodevectors import Node2Vec
import networkx as nx

G = nx.Graph()
G.add_edge("1", "2")
n2v = Node2Vec(n_components=128)
n2v.fit_transform(G)

Output:

Making walks... Done, T=3.98
Mapping Walk Names... Done, T=0.07
Training W2V... WARNING: gensim word2vec version is unoptimizedTry version 3.6 if on windows, versions 3.7 and 3.8 have had issues
Done, T=0.39
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-43e45de9791e> in <module>
      2 G.add_edge("1", "2")
      3 n2v = Node2Vec(n_components=128)
----> 4 n2v.fit_transform(G)

~/miniconda3/envs/graphs/lib/python3.7/site-packages/nodevectors/node2vec.py in fit_transform(self, G)
    151             pd.DataFrame.from_records(
    152             pd.Series(np.arange(len(G.nodes)))
--> 153               .apply(self.predict)
    154               .values)
    155         )

~/miniconda3/envs/graphs/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4106             else:
   4107                 values = self.astype(object)._values
-> 4108                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4109 
   4110         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

~/miniconda3/envs/graphs/lib/python3.7/site-packages/nodevectors/node2vec.py in predict(self, node_name)
    166         if type(node_name) is not str:
    167             node_name = str(node_name)
--> 168         return self.model.wv.__getitem__(node_name)
    169 
    170     def save_vectors(self, out_file):

~/miniconda3/envs/graphs/lib/python3.7/site-packages/gensim/models/keyedvectors.py in __getitem__(self, entities)
    351         if isinstance(entities, string_types):
    352             # allow calls like trained_model['office'], as a shorthand for trained_model[['office']]
--> 353             return self.get_vector(entities)
    354 
    355         return vstack([self.get_vector(entity) for entity in entities])

~/miniconda3/envs/graphs/lib/python3.7/site-packages/gensim/models/keyedvectors.py in get_vector(self, word)
    469 
    470     def get_vector(self, word):
--> 471         return self.word_vec(word)
    472 
    473     def words_closer_than(self, w1, w2):

~/miniconda3/envs/graphs/lib/python3.7/site-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
    466             return result
    467         else:
--> 468             raise KeyError("word '%s' not in vocabulary" % word)
    469 
    470     def get_vector(self, word):

KeyError: "word '0' not in vocabulary"

Fitting and then predicting works fine:

n2v.fit(G)

for node in G:
    print(n2v.predict(node))

Output:

Making walks... Done, T=0.00
Mapping Walk Names... Done, T=0.06
Training W2V... WARNING: gensim word2vec version is unoptimizedTry version 3.6 if on windows, versions 3.7 and 3.8 have had issues
Done, T=0.38
[ 0.01669522  0.01119813 -0.00566072 -0.0134473   0.01121703  0.00379648
  0.01170088 -0.0121789  -0.01429367 -0.00849178  0.00943886 -0.00981773
  0.00337284 -0.0013884  -0.01287963 -0.00460479 -0.00217993 -0.01019352
  0.00615602 -0.00658679  0.01679845 -0.00747446  0.0019177  -0.00912566
 -0.01688758  0.00983168  0.00286994  0.00739604  0.01249113  0.00116864
  0.00235101 -0.01515406 -0.00786685 -0.01675885 -0.01421799 -0.00829282
 -0.00385966 -0.00779916 -0.00067812  0.01312324  0.0154448  -0.0107193
 -0.00059914 -0.00439935 -0.01970238 -0.00585162 -0.01741348 -0.00118494
 -0.01365886 -0.007099    0.00806013 -0.00448715 -0.00633816 -0.009869
  0.01835089  0.01462685  0.00408294  0.01042183  0.00773886  0.00500051
  0.00697436 -0.00052141 -0.00307364  0.00916708 -0.0059573  -0.00794462
  0.00316458 -0.01120937  0.00820292 -0.00175512 -0.00426679  0.00403081
  0.0036373  -0.00538955  0.00169757 -0.00476247  0.00011785 -0.00015604
 -0.02005355  0.00293106 -0.00457922  0.01199162 -0.01039407 -0.00975906
 -0.00386479  0.00380202  0.0150509   0.00117078  0.01009431 -0.01518334
 -0.01550014 -0.00316153 -0.01638743  0.00911983 -0.00656796 -0.01130522
  0.00696332  0.00222521 -0.01348531  0.01745371 -0.01043333  0.00377076
  0.00168364 -0.01029514 -0.01187336 -0.00047892  0.01747731  0.01539742
 -0.00317966  0.01036133  0.00348293  0.00357884  0.01691393 -0.01314759
 -0.00387712  0.01349622  0.00886216  0.01269572 -0.014981    0.01047694
 -0.01591979  0.00815849  0.0053769  -0.01705019  0.00478466 -0.00967307
  0.00100743 -0.00627678]
[ 1.74459908e-02  9.29250382e-03 -5.62654436e-03 -1.58256646e-02
  6.62352284e-03 -1.04596815e-03  7.46087125e-03 -1.52283600e-02
 -1.47760203e-02 -4.99586575e-03  8.37715156e-03 -1.14215305e-02
  8.03218782e-03 -4.57122130e-03 -1.37374401e-02 -6.70122309e-03
  5.60258329e-03 -1.36625227e-02  2.69854977e-03 -2.01221928e-03
  1.41100660e-02 -1.21530667e-02  7.38256099e-03 -7.29203923e-03
 -1.45003749e-02  8.89602769e-03 -1.07536477e-03  1.66074419e-03
  7.48369843e-03  8.18155764e-04  3.80413979e-03 -1.41491415e-02
 -1.12004904e-03 -1.57257933e-02 -1.23076690e-02 -9.28518735e-03
 -5.15399221e-03 -5.42826438e-03  9.19695070e-04  9.03129764e-03
  1.57911442e-02 -5.36569115e-03 -1.36574614e-03 -2.82609137e-03
 -1.89300030e-02 -5.67972986e-03 -1.65421404e-02 -3.22455773e-04
 -1.18535999e-02 -7.90045224e-03  9.72144585e-03 -7.91174080e-03
 -4.45207767e-03 -1.19799254e-02  1.93504207e-02  1.06750363e-02
  4.26934101e-03  1.17199738e-02  6.25003641e-03  1.98470801e-03
  4.88949660e-03  7.53012951e-04 -8.29974841e-03  6.85363356e-03
 -2.72968784e-03 -5.58869634e-03  1.48452440e-04 -8.40961654e-03
  3.35645187e-03 -3.52724968e-03  3.98239447e-03 -2.40911031e-03
  4.06429684e-03 -3.92150227e-03  6.94983220e-03 -8.35845713e-03
  9.88924527e-04 -1.79716619e-03 -1.90840866e-02  2.46768352e-03
 -4.37452644e-03  1.30511560e-02 -6.40019309e-03 -1.33609995e-02
  3.72520881e-04  5.42262476e-03  1.41993044e-02  7.35963322e-03
  1.08134123e-02 -1.49347940e-02 -1.22990599e-02 -9.69778374e-03
 -1.74602009e-02  8.74316972e-03 -5.31877764e-03 -7.91502465e-03
  3.98375420e-03  4.59250668e-03 -1.26426788e-02  1.60577614e-02
 -1.03733260e-02  4.70442930e-03  6.72380021e-03 -1.34339379e-02
 -1.50517235e-02  3.45687894e-03  1.50700649e-02  1.58219878e-02
  4.28991532e-03  9.33015719e-03  7.03065936e-03  3.41207208e-03
  1.49237625e-02 -1.07398266e-02 -1.00340396e-02  9.12039913e-03
  1.27081424e-02  1.08739929e-02 -1.16528282e-02  4.42440435e-03
 -1.53663196e-02  3.64650693e-03  5.37529076e-03 -1.76296048e-02
  3.67483153e-05 -7.88922701e-03 -5.40610822e-03 -1.80462585e-03]
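The traceback in this report shows fit_transform calling predict on np.arange(len(G.nodes)), that is, on positions 0..n-1 rather than on the node names, which fails as soon as the names are not those integers. A sketch of the name-based lookup follows; the dict stands in for a fitted name-to-vector model, and embeddings_in_node_order is a hypothetical helper.

```python
def embeddings_in_node_order(model, node_names):
    """Return one vector per node, looked up by name rather than by position."""
    return [model[str(name)] for name in node_names]


model = {"1": [0.1, 0.2], "2": [0.3, 0.4]}   # stand-in for a fitted model
node_names = ["1", "2"]

rows = embeddings_in_node_order(model, node_names)

# Looking up positions instead reproduces the failure: "0" is not a key.
missing = [str(i) for i in range(len(node_names)) if str(i) not in model]
```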

Could not broadcast input array

Hi VHRanger,
Thanks for your great work. I am trying to run node2vec on a graph with around 4 million nodes and more than 48 million edges, but I got this error. Can you give me some advice on dealing with a graph this big?

sys:1: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "node2vec/graph_builder/csr.py", line 26, in <module>
    gr = CSRGraphNode2Vec()
  File "node2vec/graph_builder/csr.py", line 9, in __init__
    self.graph = cg.read_edgelist(file_path, directed=False, sep=',')
  File "/data/quocpbc/anaconda3/lib/python3.8/site-packages/csrgraph/graph.py", line 523, in read_edgelist
    G = methods._edgelist_to_graph(
  File "/data/quocpbc/anaconda3/lib/python3.8/site-packages/csrgraph/methods.py", line 31, in _edgelist_to_graph
    new_src[1:] = np.cumsum(np.bincount(src, minlength=nnodes))
ValueError: could not broadcast input array from shape (2147483649) into shape (4790294)
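A note on reading this error: np.bincount returns an array of length max(value) + 1 whenever that exceeds minlength, so a shape of 2147483649 means some source ID equals 2**31 even though only 4790294 nodes were counted. The sketch below mirrors that length rule in pure Python and adds a contiguity check; both helper names are hypothetical.

```python
def bincount_length(values, minlength):
    """Length of the array that np.bincount(values, minlength=minlength) returns."""
    return max(max(values) + 1, minlength)


def ids_are_contiguous(src, dst):
    """True when the node IDs are exactly 0..n-1 for n distinct nodes."""
    ids = set(src) | set(dst)
    return ids == set(range(len(ids)))


n_nodes = 4
good_src, good_dst = [0, 1, 2], [1, 2, 3]
bad_src, bad_dst = [0, 1, 2**31], [1, 2, 3]   # one ID far outside range(n_nodes)

good_len = bincount_length(good_src, n_nodes)
bad_len = bincount_length(bad_src, n_nodes)
```

The DtypeWarning about mixed types in column 1 may be related, since a column that is partly text and partly numeric cannot be mapped to integer IDs consistently; passing an explicit dtype when reading the file is worth trying.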

some problem on the accuracy

Hi,

I am testing the code with the BlogCatalog dataset (downloaded from the OpenNE repository on GitHub).
Additionally, I have compared the multi-class classification results with those of the code in OpenNE.
In my test, if I use 10 percent of the data as training data, the result of your work is
{'micro': 0.25313039723661485, 'macro': 0.12076017464146425};

With the same setup, the OpenNE code achieves
{'micro': 0.2903713298791019, 'macro': 0.1674684546080052};

I am very confused about this, because both of you use gensim. My simple guess is that the problem lies in the node sampling.

I haven't dug deeper yet, but your code really inspired me, since it can be used on a huge number of nodes. A good spark in graph processing.

Best regards,
Tade

Continue fitting process

Hi,
Can I update node embeddings given an already trained model? I want to fit a model, then update the network periodically and update the node embeddings without starting from zero.

Node2Vec Segmentation Fault

Hi,

Thanks for solving previous issue #19. However, now I am receiving segmentation fault error on running
from nodevectors import Node2Vec
g2v = Node2Vec()
g2v.fit(G)

Additionally, when I pip3 install CSRGraph and nodevectors, installation completes, but when I import them, I get No module found error.

Tuning model

Is there a way to pass parameters (e.g., epoch=100) in the command line?

For example:
g2v = Node2Vec()
g2v.fit(GCC, walklen=30, epochs=100 )

Thanks!

Dealing with unseen nodes

Hi! Thanks for your library!

I'm using it to vectorize network graph - graph of IPs communicated with each other. What do you think might be an approach when dealing with new previously unseen IPs (nodes)?
It seems like there are no other options than retrain n2v model from scratch.

In my case skipping them is not an option, and I don't see how I can use tricks from NLP like using synonymous to the unseen word.

I would be grateful for any thoughts or suggestions.

Cheers,
Alex
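One heuristic that avoids a full retrain is to approximate the new node's vector by averaging the vectors of its already-embedded neighbors. This is a generic workaround, not something the library provides, and the result is only a rough stand-in until the model is refit. A minimal sketch in plain Python (`mean_neighbor_embedding` is a hypothetical helper, and the embeddings are assumed to be available as a dict of lists):

```python
def mean_neighbor_embedding(neighbors, embeddings):
    """Approximate an unseen node's vector as the mean of its known neighbors.

    neighbors:  iterable of node ids adjacent to the new node
    embeddings: dict mapping known node id -> list of floats
    Returns None when no neighbor has an embedding yet.
    """
    known = [embeddings[n] for n in neighbors if n in embeddings]
    if not known:
        return None
    dim = len(known[0])
    # Component-wise average over all known neighbor vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

For an IP graph, `neighbors` would be the already-seen IPs the new address talked to. Neighbors that are themselves unseen are simply skipped.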

Has node2vec implementation been updated to use skip-gram as default?

Hi!

Related to #40

I was wondering if node2vec now uses skip-gram by default (I cannot see it anywhere in the source code, but I am sure I am missing it!)

If it doesn't, does the following code set sg=1?

n2v = Node2Vec(n_components=32, walklen=80, epochs=100, keep_walks=True, w2vparams={'sg':1}) 
n2v.fit(nx_graph)

I want to be sure this is correct: when I set {'sg': 50} (a deliberately silly value meant to provoke an error), no error is thrown. So I wonder whether w2vparams={'sg':1} is actually selecting skip-gram instead of CBOW, or whether I am doing something incorrectly. Any advice (or the right way to do it) is appreciated :)

Secondly: instead of saving the embeddings and then loading them as KeyedVectors with word2vec, is there a way to convert the fitted object (n2v above) directly to a gensim Word2Vec object?

Thank you!
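gensim documents `sg` as 1 for skip-gram and otherwise CBOW, so an out-of-range value such as 50 not raising is not by itself evidence that the setting is ignored. If you want a silly value to fail loudly, you could validate the dict yourself before handing it over. A sketch, where `check_w2vparams` is a hypothetical helper and not part of nodevectors or gensim:

```python
def check_w2vparams(w2vparams):
    """Return w2vparams unchanged, or raise if 'sg' is not 0 or 1."""
    # Treat a missing 'sg' as 0 here, matching gensim's documented default.
    sg = w2vparams.get("sg", 0)
    if sg not in (0, 1):
        raise ValueError(f"sg must be 0 (CBOW) or 1 (skip-gram), got {sg!r}")
    return w2vparams


# Intended use (requires nodevectors, so it is left commented out here):
#   n2v = Node2Vec(n_components=32, walklen=80, epochs=100,
#                  w2vparams=check_w2vparams({'sg': 1}))
```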

ProNE multithread

Is ProNE multithreaded? If so, is there a way to control the number of threads, as there is for the Node2Vec implementation?

Old parameter shows up in Word2Vec call

size=self.n_components,

This line refers to the old size parameter of gensim's Word2Vec. The parameter was renamed to vector_size in gensim 4 (ref).

Getting this error:

    129 # Train gensim word2vec model on random walks
--> 130 self.model = gensim.models.Word2Vec(
    131     sentences=self.walks,
    132     size=self.n_components,
    133     **self.w2vparams)
    134 if not self.keep_walks:
    135     del self.walks

TypeError: __init__() got an unexpected keyword argument 'size'

When running with gensim==4.3.2
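Until the call is updated, one possible workaround is to choose the keyword name from the signature of whatever `Word2Vec` is installed, so the same code runs on both old and new gensim. A sketch using only the standard library (`dimension_kwarg` is a hypothetical helper, not part of nodevectors):

```python
import inspect


def dimension_kwarg(word2vec_cls, n_components):
    """Return {'vector_size': n} or {'size': n}, whichever the callable accepts."""
    params = inspect.signature(word2vec_cls).parameters
    name = "vector_size" if "vector_size" in params else "size"
    return {name: n_components}


# Intended use (requires gensim, so it is left commented out here):
#   import gensim
#   model = gensim.models.Word2Vec(
#       sentences=walks,
#       **dimension_kwarg(gensim.models.Word2Vec, 32),
#       **w2vparams)
```

Note that gensim 4 also renamed `iter` to `epochs`, so a `w2vparams` dict written for gensim 3 may need the same treatment. Pinning `gensim<4` is the simpler alternative if nothing else in the environment needs the newer version.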

TypeError: 'NoneType' object is not subscriptable (Node2Vec)

Hi,

Thanks for this great module. I have a large sparse CSR graph (10GB) and I want to learn node embeddings for it using Node2Vec. However, I keep getting this error:
TypeError: 'NoneType' object is not subscriptable

Here is a toy script that reproduces the error on my machine:

from scipy.sparse import csr_matrix
import numpy as np
import nodevectors
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 1, 1, 1, 1, 1])
G = csr_matrix((data, (row, col)), shape=(3, 3))
n2v_model = nodevectors.Node2Vec()
n2v_model.fit(G)

Isn't Node2Vec() supposed to work directly with a csr_matrix? I even tried converting the CSR matrix to a CSRGraph, but I still get the same error. Any help would be great.

import csrgraph as cg
G = cg.csrgraph(G)
n2v_model = nodevectors.Node2Vec()
n2v_model.fit(G)

TypeError: 'NoneType' object is not subscriptable
