AspEm

This repository provides code and data for the paper:

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks
Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han.
In Proceedings of the 2018 SIAM International Conference on Data Mining, SIAM, 2018.

In particular, it includes (1) a reference implementation of the incompatibility measure, (2) ad-hoc implementations of the single-aspect embedding algorithm for the datasets used in the paper, (3) the IMDb dataset (the full DBLP dataset is excluded from this repository due to its file size), and (4) the class labels used in the DBLP classification tasks.

Basic Usage

Input

  1. The input HIN file should list all edges of the HIN, one edge per line, in the format

     node_1 node_2 edge_weight edge_type
    

    Note that node_1 and node_2 should be in the form

     node_type:node_name
    

    An example input HIN file can be found at data/imdb/imdb.hin.

  2. Additionally, to run the ad-hoc implementation of the single-aspect embedding algorithm on star-schema datasets, one also needs a file listing all center nodes (e.g., data/imdb/movie.node) and a file listing all attribute nodes (e.g., data/imdb/uadg.node).
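The input requirements above can be checked before running anything. The sketch below is not part of the repository: it parses the edge format described in item 1 and tests whether every edge joins one center node to one attribute node, as a star schema requires. It assumes fields are whitespace-separated (so node names contain no spaces) and, since the README does not specify the node-file format, that each node file lists one `node_type:node_name` per line. All function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class Edge(NamedTuple):
    node_1: str  # "node_type:node_name"
    node_2: str
    weight: float
    edge_type: str


def node_type(node: str) -> str:
    # Split on the first colon only, so names containing ':' stay intact.
    ntype, sep, _ = node.partition(":")
    if not sep:
        raise ValueError(f"node without a type prefix: {node!r}")
    return ntype


def parse_hin(lines: Iterable[str]) -> List[Edge]:
    """Parse 'node_1 node_2 edge_weight edge_type' lines (assumed whitespace-separated)."""
    edges = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(parts)}")
        n1, n2, weight, etype = parts
        node_type(n1)  # validate the node_type:node_name form
        node_type(n2)
        edges.append(Edge(n1, n2, float(weight), etype))
    return edges


def load_nodes(lines: Iterable[str]) -> Set[str]:
    """Assumed node-file format: one node_type:node_name per line."""
    return {line.strip() for line in lines if line.strip()}


def non_star_edges(edges: Iterable[Edge], centers: Set[str],
                   attributes: Set[str]) -> List[Edge]:
    """Edges that do not join exactly one center node and one attribute node."""
    return [e for e in edges
            if not ((e.node_1 in centers and e.node_2 in attributes)
                    or (e.node_2 in centers and e.node_1 in attributes))]
```

For example, `parse_hin(open("data/imdb/imdb.hin"))` yields the edge list; an empty result from `non_star_edges` suggests the node files are consistent with the HIN under the assumed format.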

Execute

To measure the incompatibility of all base aspects in an HIN:

$ python src/calc_base_aspect_inconsistency.py --input $input-hin-file --output $base-aspect-inc-file [optional: --sample-rate $sample-rate] 

To aggregate the incompatibility of all base aspects computed in the previous step:

$ python src/agg_aspect_inconsistency.py $base-aspect-inc-file 

As an example, to calculate the incompatibility of each aspect of the IMDb dataset, run the following commands sequentially:

$ python src/calc_base_aspect_inconsistency.py --input data/imdb/imdb.hin --output data/imdb/imdb_base_aspect_inc.csv 
$ python src/agg_aspect_inconsistency.py data/imdb/imdb_base_aspect_inc.csv
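Because the aggregation step reads the file written by the first step, the two commands must run in order. A minimal Python driver for them might look like the following; it only reuses the script paths and flags shown above, assumes it is launched from the repository root, and uses the current interpreter (`sys.executable`) in place of `python`. The helper names are hypothetical, and the meaning of `--sample-rate` is not documented here, so its value is passed through unchanged.

```python
import subprocess
import sys
from typing import List, Optional


def incompatibility_commands(input_hin: str, base_inc_file: str,
                             sample_rate: Optional[float] = None) -> List[List[str]]:
    """Build the two README commands, in the order they must run."""
    step1 = [sys.executable, "src/calc_base_aspect_inconsistency.py",
             "--input", input_hin, "--output", base_inc_file]
    if sample_rate is not None:
        step1 += ["--sample-rate", str(sample_rate)]
    step2 = [sys.executable, "src/agg_aspect_inconsistency.py", base_inc_file]
    return [step1, step2]


def run_pipeline(input_hin: str, base_inc_file: str,
                 sample_rate: Optional[float] = None) -> None:
    # check=True raises CalledProcessError if step 1 fails, so the
    # aggregation never reads a missing or partial file.
    for cmd in incompatibility_commands(input_hin, base_inc_file, sample_rate):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("data/imdb/imdb.hin", "data/imdb/imdb_base_aspect_inc.csv")` would reproduce the IMDb example above.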

To run the ad-hoc implementation of the embedding algorithm, run make in the corresponding source directory under src/, and then execute the resulting binary in its bin/ subdirectory. The argument -types specifies the attribute node types involved in the current aspect, using the following mapping: in IMDb, u for user, a for actor, d for director, g for genre; in DBLP, a for author, p for reference, v for venue, w for term, y for year.
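The letter mapping for -types can be encoded directly, which avoids typos such as using the IMDb code `a` (actor) when meaning DBLP `a` (author). This sketch copies the mapping from the paragraph above; the dataset keys and function name are hypothetical, and it assumes the binary does not care about the order of the letters.

```python
from typing import Iterable

# Letter codes for -types, copied from the mapping described above.
TYPE_CODES = {
    "imdb": {"user": "u", "actor": "a", "director": "d", "genre": "g"},
    "dblp": {"author": "a", "reference": "p", "venue": "v", "term": "w", "year": "y"},
}


def types_arg(dataset: str, attribute_types: Iterable[str]) -> str:
    """Translate attribute type names into the -types letter string."""
    codes = TYPE_CODES[dataset]
    letters = []
    for name in attribute_types:
        if name not in codes:
            raise ValueError(f"unknown attribute type {name!r} for {dataset}")
        letter = codes[name]
        if letter not in letters:  # drop duplicates, keep first-seen order
            letters.append(letter)
    if not letters:
        raise ValueError("an aspect needs at least one attribute type")
    return "".join(letters)
```

For instance, `types_arg("imdb", ["user", "director"])` gives `"ud"`, the value used in the IMDb example.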

As an example, to embed the IMDb network with only attribute node types user and director, execute the following commands sequentially:

$ cd src/emb_imdb/; make; cd ../..
$ ./src/emb_imdb/bin/emb_imdb -types ud -hin data/imdb/imdb.hin -center data/imdb/movie.node -attribute data/imdb/uadg.node -output data/imdb/attribute.emb -output-center data/imdb/center.em
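To embed several candidate aspects of IMDb rather than just user and director, one option is to enumerate every non-empty combination of the attribute letters and build one command per aspect. The sketch below reuses the flags from the example command; the per-aspect output file names and the helper names are hypothetical, and whether every combination is worth embedding is a modeling choice the incompatibility measure is meant to inform.

```python
from itertools import combinations
from typing import List


def all_type_subsets(codes: str) -> List[str]:
    """All non-empty combinations of attribute-type letters, e.g. 'ud'."""
    return ["".join(c) for r in range(1, len(codes) + 1)
            for c in combinations(codes, r)]


def emb_imdb_command(types: str) -> List[str]:
    # Flags copied from the IMDb example; output file names are hypothetical.
    return ["./src/emb_imdb/bin/emb_imdb", "-types", types,
            "-hin", "data/imdb/imdb.hin",
            "-center", "data/imdb/movie.node",
            "-attribute", "data/imdb/uadg.node",
            "-output", f"data/imdb/attribute_{types}.emb",
            "-output-center", f"data/imdb/center_{types}.emb"]
```

With the four IMDb codes `"uadg"` this yields 15 aspects; each command list can be passed to `subprocess.run(..., check=True)` after building the binary.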

Class Labels for DBLP Classification

In the DBLP experiment of the paper, two classification tasks were conducted based on the two class label files in

data/class_label/

Citing

If you find AspEm useful for your research, please consider citing the following paper:

@inproceedings{shi2018aspem,
  author = {Shi, Yu and Gui, Huan and Zhu, Qi and Kaplan, Lance and Han, Jiawei},
  title = {AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks},
  booktitle = {Proceedings of the 2018 SIAM International Conference on Data Mining},
  year = {2018},
  organization = {SIAM}
}

Miscellaneous

Please send any questions about the code and/or the algorithm to [email protected].

Note: This is only a reference implementation of the AspEm algorithm. As discussed in the paper, AspEm is a flexible framework, and one can choose any network embedding algorithm to embed each individual aspect.
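Whatever algorithm embeds each aspect, the per-aspect vectors of a node still have to be combined. One simple option, assumed here rather than taken from this repository, is to L2-normalize each aspect's vector and concatenate them. The sketch works on in-memory dictionaries (node name to vector), since the on-disk embedding format is not described above; all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Embedding = Dict[str, List[float]]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def concat_aspects(aspect_embs: Sequence[Embedding]) -> Embedding:
    """Concatenate each node's normalized vectors across aspects.

    Only nodes present in every aspect are kept (an assumption; other
    choices, such as zero-padding missing aspects, are equally possible).
    """
    if not aspect_embs:
        return {}
    shared = set(aspect_embs[0]).intersection(*aspect_embs[1:])
    out: Embedding = {}
    for node in sorted(shared):
        vec: List[float] = []
        for emb in aspect_embs:
            vec.extend(l2_normalize(emb[node]))
        out[node] = vec
    return out
```

Normalizing first keeps an aspect whose embedder produces large-magnitude vectors from dominating distances in the combined space.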
