
prep's Introduction

PReP

This repository provides (1) a reference implementation of PReP as described in the paper:

PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks
Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui, and Jiawei Han.
In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017.

and (2) the domain labels for venues used in the DBLP experiment of the paper.

Basic Usage

Input

The input HIN should be provided together with the meta-paths of interest. Note that our implementation further assumes that (1) each meta-path of interest has at least one path instance in the given HIN, and (2) the given HIN has no dangling nodes, i.e., nodes that are not attached to any path instance under any of the given meta-paths.
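The two assumptions above can be checked before training. The following is only a minimal sketch and not part of the released code: the function name check_assumptions is hypothetical, path counts are assumed to be held in a dictionary keyed by (node_pair_id, meta_path_id), node pairs are assumed to be a list of (node1_id, node2_id) tuples, and ids are assumed to be 0-based. Nodes that appear in no node pair at all are invisible to this check.

```python
def check_assumptions(entries, num_path_type, pairs):
    """Return (empty_meta_paths, dangling_nodes) for an in-memory HIN summary.

    entries: dict mapping (node_pair_id, meta_path_id) -> path_count (assumed layout)
    num_path_type: number of meta-paths of interest
    pairs: list where pairs[i] == (node1_id, node2_id) for node pair i (assumed layout)
    """
    used_paths = set()
    attached_nodes = set()
    for (pair_id, path_id), count in entries.items():
        if count > 0:
            used_paths.add(path_id)
            # Both endpoints of this pair are attached to a path instance.
            attached_nodes.update(pairs[pair_id])

    all_nodes = {node for pair in pairs for node in pair}
    empty_meta_paths = sorted(set(range(num_path_type)) - used_paths)
    dangling_nodes = sorted(all_nodes - attached_nodes)
    return empty_meta_paths, dangling_nodes


# Hypothetical toy data: meta-path 2 has no instance, nodes 3 and 4 are dangling.
toy_entries = {(0, 0): 4, (1, 1): 2}
toy_pairs = [(0, 1), (0, 2), (3, 4)]
empty_meta_paths, dangling_nodes = check_assumptions(toy_entries, 3, toy_pairs)
```

If either returned list is non-empty, the input violates one of the assumptions and should be filtered before it is passed to the training script.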

There are three required input files.

  1. Matrix file contains the matrix of path counts between node pairs under each meta-path, stored in a DOK-based (dictionary of keys) sparse matrix format. The first line specifies the shape of the matrix:

     num_node_pair num_path_type
    

    The following lines are all the nonzero entries in the matrix, in the format of:

     node_pair_id meta_path_id path_count
    
  2. Pair2node file contains the indices of the two nodes that make up each node pair, in the order of node pair indices. That is, the i-th line gives the indices of the nodes that form the node pair with id == i:

     node1_id node2_id
    
  3. Truth file contains the ground truth of each node pair, in the order of node pair indices. Each line should contain a single digit of 1 or 0. (For evaluation only.)
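To make the three formats above concrete, here is a rough sketch that parses them using only the Python standard library. It is not the loader used by src/train.py: the function names are hypothetical, and whitespace-separated fields and 0-based ids are assumptions made for illustration.

```python
import io


def read_matrix(handle):
    """Parse a matrix file into (shape, entries).

    shape is (num_node_pair, num_path_type); entries is a dictionary of keys
    mapping (node_pair_id, meta_path_id) -> path_count.
    """
    header = handle.readline().split()
    shape = (int(header[0]), int(header[1]))
    entries = {}
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        pair_id, path_id = int(fields[0]), int(fields[1])
        entries[(pair_id, path_id)] = float(fields[2])
    return shape, entries


def read_pair2node(handle):
    """Return a list whose i-th element is (node1_id, node2_id) for node pair i."""
    return [tuple(int(tok) for tok in line.split()[:2])
            for line in handle if line.strip()]


def read_truth(handle):
    """Return a list whose i-th element is the 0/1 label of node pair i."""
    return [int(line) for line in handle if line.strip()]


# Hypothetical in-memory files with three node pairs and two meta-paths.
shape, entries = read_matrix(io.StringIO("3 2\n0 0 4\n1 1 2\n2 0 1\n"))
pairs = read_pair2node(io.StringIO("0 1\n0 2\n1 2\n"))
truth = read_truth(io.StringIO("1\n0\n1\n"))
```

With real data, the io.StringIO objects would be replaced by files opened with open(path), for example the files under the data folder.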

Hyperparameters to be specified as part of the input are the number of clusters (num_clus) and, optionally, beta for the PReP model.

Execute

All the commands are executed from the project home directory.

To train the PReP model:
python src/train.py matrix_file pair2node_file output_model_file num_clus [optional: beta]

To evaluate the output PReP model:
python eval/link_pred.py matrix_file pair2node_file model_file truth_file num_clus [optional: beta]

Alternatively, to run a shell script that first trains the model and then evaluates it:
./run.sh matrix_file pair2node_file model_file truth_file num_clus [optional: beta]

Example command

An example dataset can be found under the data folder. It is a subset of the Facebook dataset we used for the evaluation reported in the paper.

To run the previous three commands on the example dataset, execute respectively:
python src/train.py data/matrix.txt data/pair2node.txt data/example.model 15 1.e-4
python eval/link_pred.py data/matrix.txt data/pair2node.txt data/example.model data/truth.txt 15 1.e-4
./run.sh data/matrix.txt data/pair2node.txt data/example.model data/truth.txt 15 1.e-4

Domain Label for Venues

In the DBLP experiment of the paper, each venue is associated with one of the fourteen venue domains, where a venue domain corresponds to a computer science research area as defined in the Wikipedia page: List of computer science conferences.

We hereby release the mapping from venues to their domain labels that we have generated.

In the folder venue_label/, venue_index.txt and domain_index.txt specify the indices of venues and domain labels, respectively; and venue_to_domain.txt provides the mapping from venue indices to domain indices.

Citing

If you find PReP useful for your research, please consider citing the following paper:

@inproceedings{shi2017prep,
author = {Shi, Yu and Chan, Po-Wei and Zhuang, Honglei and Gui, Huan and Han, Jiawei},
 title = {PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks},
 booktitle = {Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 year = {2017},
 organization={ACM}
}

Miscellaneous

Please send any questions you might have about the code and/or the algorithm to [email protected] or [email protected].

Note: This is only a reference implementation of the PReP algorithm and could benefit from several performance enhancement schemes, some of which are discussed in the paper.
