Code Monkey home page Code Monkey logo

subdue's Introduction

Subdue Graph Miner

The Subdue graph miner discovers highly-compressing patterns in an input graph. Compression is measured by the reduction in graph size after replacing each instance of the pattern with a single node. This version of Subdue can also find temporal patterns, i.e., the instances of the pattern also match in terms of the arrival order of the nodes and edges that comprise the pattern.

Starting with version 1.2, Subdue uses a polynomially-bounded, approximate graph isomorphism algorithm to improve efficiency while maintaining reasonable accuracy.

Subdue is designed to accept input graphs in the format produced by the Graph Stream Generator.

Author: Dr. Larry Holder, School of Electrical Engineering and Computer Science, Washington State University, email: [email protected].

Support: This material is based upon work supported by the National Science Foundation under Grant No. 1646640.

Running Subdue

CLI

Subdue is run using the following command-line format (requires Python 3):

python Subdue.py [options] <inputfile>

The options, input file format and output are described below.

networkx

Alternatively, if you work with networkx, you can execute Subdue.nx_subdue directly on the graph. See function's documentation for details. Simple usage is as follows:

from Subdue import nx_subdue
out = nx_subdue(graph, **params)

Options

The following options are available in Subdue.

--beam <n>

Number of patterns to retain after each expansion of previous patterns; based on their compression value. Default is 4.

--iterations <n>

Number of iterations of Subdue's discovery process. If more than 1, Subdue compresses the graph with the best pattern and then runs again using the compressed graph. If 0, then Subdue runs until no more compression (i.e., set to |E|). Default is 1.

--limit <n>

Number of patterns considered in each iteration of Subdue. A value of 0 implies |E|/2. Default is 0.

--maxsize <n>

Maximum size (#edges) of a pattern. A value of 0 implies |E|/2. Default is 0.

--minsize <n>

Minimum size (#edges) of a pattern. Default is 1.

--numbest <n>

Number of best patterns to report at end. Default is 3.

--overlap <overlap_type>

Controls how instances of a pattern may overlap. Possible overlap_type values are: none, vertex, edge. Overlap of "none" means no part of two instances of a pattern can overlap. Overlap of "vertex" means that any number of vertices can overlap, but no edges. Overlap of "edge" means that edges and vertices can overlap, but the two instances cannot be identical. Default is "none".

--prune

If enabled, Subdue removes any pattern whose value is worse than its parent pattern. Disabled by default.

--temporal

If enabled, Subdue discovers temporal patterns, i.e., patterns whose instances are not only isomorphic, but also match in terms of their vertex and edge arrival order. Disabled by default (i.e., static patterns that ignore timestamps).

--valuebased

If enabled, then all patterns with the top beam values are retained during the discovery process. Disabled by default.

--writecompressed

If enabled, Subdue writes the compressed graph after each iteration i to the file outputFileName-compressed-i.json, where outputFileName is the same as the input file name, but with .json removed if present. Disabled by default.

--writeinstances

If enabled, Subdue writes instances of the best pattern at iteration i as one graph to file outputFileName-instances-i.json. Disabled by default.

--writepattern

If enabled, Subdue writes the best pattern at iteration i to file outputFileName-pattern-i.json. Disabled by default.

Input File

The input file represents a graph in JSON format. An example is in the file inputgraph.json, and a visualization of this graph is in inputgraph.png. The input graph contains a JSON array of vertices and edges, as described below.

Vertex

A vertex is a JSON object with name "vertex" and whose value is a JSON object with the following properties.

  • id: A string identifier that uniquely identifies this vertex.
  • attributes: A JSON object of name/value pairs, where both the name and value are strings. For two vertices to match, all of their attributes must match.
  • timestamp: An integer value (as a string) representing the time at which this vertex first appeared in the graph. Timestamps are only used if Subdue is run with the --temporal option.

Edge

An edge is a JSON object with name "edge" and whose value is a JSON object with the following properties.

  • id: A string identifier that uniquely identifies this edge.
  • source: Vertex identifier string for the edge's source vertex.
  • target: Vertex identifier string for the edge's target vertex.
  • directed: Whether the edge is directed from source to target ("true"), or undirected ("false").
  • attributes: A JSON object of name/value pairs, where both the name and value are strings. For two edges to match, all of their attributes must match.
  • timestamp: An integer value (as a string) representing the time at which this edge first appeared in the graph. Timestamps are only used if Subdue is run with the --temporal option.

Output

Subdue outputs the top patterns according to their compression value along with their instances in the input graph. The file output.txt contains the output produced by Subdue on inputgraph.json using default options.

Questions?

Contact: Dr. Larry Holder, School of Electrical Engineering and Computer Science, Washington State University, email: [email protected].

subdue's People

Contributors

demetrio92 avatar holderlb avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

subdue's Issues

The program runs a bit slow

Hi Dr. Holder,

I entered 8,000 nodes with 13,000 edges. I found that the program was running very slowly. I still had no results after running for one night. Is there any way to solve it?

Thank you for your code sharing!

Having trouble with installation

Hello, I was delighted to come across this implementation of the Subdue method. However, I am having trouble with the installation process. I've tried to install the package in two different virtual environments, using pip, and I get the same error each time that there seems to be a file missing. Do you have recommendations as to how the package should be installed?

global name 'UniqueValues' is not defined

Hi Dr. Holder,

When executing the script, I face the following error

Traceback (most recent call last):
  File "Subdue.py", line 156, in <module>
    main()
  File "Subdue.py", line 153, in main
    Subdue(parameters, graph)
  File "Subdue.py", line 107, in Subdue
    patternList = DiscoverPatterns(parameters, graph)
  File "Subdue.py", line 54, in DiscoverPatterns
    Pattern.PatternListInsert(extendedPattern, childPatternList, parameters.beamWidth, parameters.valueBased)
  File "/home/rishi/Downloads/PythonSubdue/Pattern.py", line 220, in PatternListInsert
    uniqueValues = UniqueValues(patternList)
NameError: global name 'UniqueValues' is not defined

I could not find the function "UniqueValues". Let me know the possible fix.

Directed networkx graphs

Hello,
Thank you for this great and maintained project.
While trying to use the algorithm on networkx directed graph (by running nx_subdue),
I've noticed that the internal presentation of the graph is undirected (i.e. the 'directed' property of the edges is 'False').
In the previous version it caused wrong pattern results. In the last Subdue version (with overlap parameter) it doesn't influence the results anymore, as far as I saw, but I'm afraid it can influence other things.

I'm adding a link to the problematic line, which I changed in my fork:
https://github.com/bar-ta/Subdue/blob/master/src/Graph.py#L93

It seems like we use a different version of networkx maybe. I'm using the last version: networkx==2.5. Which one do you use?
Thank you,
Bar.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.