Code Monkey home page Code Monkey logo

receipt's Introduction

Parallel Bipartite Network Peeling (PBNG) Framework

Parallel Tip and Wing Decomposition to mine hierarchical dense structures in bipartite graphs

Introduction

PBNG implements a two-phased peeling for scalable parallel tip and wing decomposition on shared-memory multi core servers. It partitions the vertices/edges into smaller subsets and creates independent tasks to process the partitions concurrently. It can decompose the largest open bipartite datasets in few minutes, orders of magnitude faster than the baselines.

Graphical Illustration of PBNG
Graphical Illustration of two-phased peeling in PBNG

Compile

make

It generates the following executables:

  1. decomposeParWing - parallel edge peeling (for wing decomposition)
  2. decomposeParTip - parallel vertex peeling (for tip decomposition)

Baselines:

  1. decomposeSeqWing - sequential edge peeling (for wing decomposition)
  2. decomposeSeqTip - sequential vertex peeling (for tip decomposition)
  3. decomposePCWing - progressive compression approach for wing decomposition

Baseline References:

  1. Sequential Tip/Wing Decomposition algorithm
  2. Progressive Compression Wing Decomposition

Prerequisites

Boost Sort Parallel Library

Run

Wing Decomposition

numactl -l ./decomposeParWing -i <inputFile> -o <outputFile> -t <# threads> -p <# partitions to create> 

Tip Decomposition

numactl -l ./decomposeParTip -i <inputFile> -o <outputFile> -t <# threads> -p <# partitions to create> -s <peelSide>

Arguments

  1. -i : text file containing input graph in COO format (edge list)
  2. -o : output file where tip/wing numbers will be written (optional)
  3. -t : numeric value specifying number of threads to use for decomposition (optional, default = 1)
  4. -p : numeric value specifying number of partitions to create (optional, recommended for tip decomposition = 150, for wing decomposition = 400)
  5. -s : enum. Use "-s 0" to peel vertex set U (LHS in input file) and "-s 1" to peel set "V" (RHS in input file) (optional, default = 0)

Options -t and -p are not required for sequential decomposition baselines.

Input

The input file should represent graph in an edge list format where each line is a tuple of two integers as shown below:

u v

This indicates that there is an edge between vertices u and v.

The vertices in left column constitute the U set and those in right column constitute set V.

An example input file is given in the datasets directory.
PBNG has been tested extensively on large bipartite graphs from KOBLENZ collection and Network Data Repository.

Output

Wing Decomposition

The output is written in the specified file in a text format.
Every line in the output file is a tuple of three integers as shown below:

u v t

where u and v are the vertices of an edge and t is it's wing number

Tip Decomposition

The output is written in the specified file in a text format.
Every line in the output file is a tuple of two integers as shown below:

u t

where u is the vertex id and t is it's tip number

Paper

Please refer to the paper RECEIPT: REfine CoarsE-grained IndePendent Tasks for Parallel Tip decomposition of Bipartite Graphs for details, and cite if you use the code.

receipt's People

Contributors

kartiklakhotia avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

receipt's Issues

decomposeParWing: Interpretation of Output unclear

Hi,

thanks for sharing this interesting repository. I was experimenting with decomposeParWing but I am not sure what to make of the output.

Using the provided example dataset, I run the following command:

numactl -l ./decomposeParWing -i datasets/github.txt -o github_result.txt -t 8

And I get the following output:

12, 0, 934 
12, 2, 934 
12, 3, 816 
12, 4, 912 
12, 5, 934 
12, 6, 816 
12, 10, 816 
12, 11, 640 
14, 0, 1014 
14, 1, 1014 
...
177376, 15292, 0 
177377, 36949, 0 
177378, 94353, 0 
177379, 722, 0 
177380, 2513, 0 
177381, 668, 0 
177382, 44882, 0 
177383, 50589, 0 
177384, 887, 0 
177385, 52598, 0

There are two things that I do not understand, and maybe you could help me to clarify them.

First, the v values (or IDs) of the output start at 0, although the input data does not contain any zero IDs for v. Should I just add +1 to all of the v values?

Second, I noticed that the IDs of u in the input data only range until 56519. However, the output data goes to 177385. How to interpret these "new" IDs for u?

Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.