Code Monkey home page Code Monkey logo

bear's Introduction

Bear

Python implementation of BEAR and SlashBurn. This is an experimental repo for PPR (personalized pagerank), but well, it does work in blazing speed.


Requirements

  • scipy
  • numpy
  • networkx (optional)
  • tensorflow (optional)

I used tensorflow for GPU acceleration of matrix computations. Why not pytorch or theano? PR it.


Documentation

All PPR implementation assumes that given graph has no sink (irreducible), which means normalized-tranposed google matrix should be a valid stochastic matrix. Note that any of approximation scheme can produce a non-stochastic vectors. You should normalize it if necessary.

Any of PPR class defined in ppr.py has 4(5) methods:

  • __init__(self, jump_prob=0.05, *args, **kwargs): jump_prob is a common option for all algorithms denoting a jumping probability for PPR, while other options are specified for each algorithm.
  • preprocess(self, filename): Reads matrix directly from csv file, then proceed to preprocessing step defined in each algorithm. The csv file must be csv format of graph edges, sorted in row-first order (see data/small.csv for sample). Note that we can compute the H matrix while reading!
  • query(self, q): Computes PPR for given query vector q. It should be a 1-d numpy array of dimension n. Returns pagerank vector.
  • save(self, filename): Just pickle dumping.
  • load(self, filename): Just pickle loading. Not fully implemented (PR it).

Four implemented PPR algorithms are:

  • PPRNetworkx: Computes PPR using networkx. Requires installation of the library. Note that this is the slowest iterative method (since networkx is implemented for general purpose).
  • PPRIterative: Computes PPR using iterative method. Simplest baseline. No preprocessing, but slow in query-time.
  • PPRLUDecomposition: Computes PPR using LU decomposition after ordering nodes with degree of nodes.
  • PPRBear: Computes PPR using BEAR with SlashBurn. I do know that this PPR prefix is lame.

Usage

Assume graph is store in some csv file named small.csv with row-first order (must be):

0,1
1,0
1,2
2,1
2,3
...

(This example data can be found in data/small.csv.) You can compute a simple personalized pagerank via following code.

import numpy as np
from ppr import PPRBear as Bear
bear = Bear()
bear.preprocess('data/small.csv')
r = bear.query(np.ones(15)/15)
print(r.sum())  # 1.0

Simple benchmark code is located in benchmark.py. But well, you should try these algorithms on graphs with millions of nodes.

Note that there are also many useful functions in utils.py! Specially, full implementation of Strongly Connected Components (Kosaraju's algorithm) and SlashBurn (Kang's algorithm) is in there.

from utils import scc
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 1), (3, 4), (4, 3)]
ccs = scc(nodes, edges)
print(ccs)  # [[3, 4], [1, 2]]
import numpy as np
from scipy.sparse import coo_matrix
from utils import verbose_matrix, reorder_matrix, slashburn
A = np.array([[1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 0, 1]], dtype=np.int32)
A = coo_matrix(A)
print(verbose_matrix(A))
"""
1 1
1 1 1 1
  1 1
  1   1
"""
perm, wing = slashburn(A)
print(wing)  # 1
A = reorder_matrix(A, perm)
print(verbose_matrix(A))
"""
1     1
  1   1
    1 1
1 1 1 1
"""

Using with Tensorflow

They're located in ppr_tf.py:

  • PPRIterativeTF
  • PPRLUDecompositionTF
  • PPRBearTF

Due to the initialization of tensorflow variables, preprocessing steps are merged into __init__. See benchmark.py for more detailed usage. Currently, since tensorflow does not support various sparse matrix manipulations like LU decomposition solver, only sparse-dense multiplication is used. Note that my GPU is GeForce GTX 1080. Results of some experiments are located in profile.txt. You can download some large datasets in here.

bear's People

Contributors

theeluwin avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.