
HGAT

An implementation of the EMNLP 2019 paper "Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification" and its extended version "HGAT: Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification" (TOIS 2021).

Thank you for your interest in our work! 😄

Requirements

  • Anaconda3 (python 3.6)
  • PyTorch 1.3.1
  • gensim 3.6.0

Easy Run

cd ./model/code/
python train.py

You can change the dataset by modifying the variable "dataset = 'example'" at the top of train.py, or by passing command-line arguments (see train.py).
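
For instance, to switch from the bundled example data to your own dataset, edit the variable at the top of ./model/code/train.py (the command-line route works the same way; check the argparse definitions in train.py for the exact flag names, which are not reproduced here):

# top of ./model/code/train.py
dataset = 'YourData'   # replace 'example' with the name of your dataset directory under ./model/data/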

Our datasets can be downloaded from Google Drive. Note: some files were accidentally deleted; they have been restored as far as possible and should hopefully still run correctly.

Prepare for your own dataset

The following files are required:

./model/data/YourData/
    ---- YourData.cites                // the adjacencies (edges)
    ---- YourData.content.text         // the features of texts
    ---- YourData.content.entity       // the features of entities
    ---- YourData.content.topic        // the features of topics
    ---- train.map                     // the indices of the training nodes
    ---- vali.map                      // the index of the validation nodes
    ---- test.map                      // the index of the testing nodes

The format of each file is as follows:

  • YourData.cites

    Each line contains an edge: "idx1\tidx2\n", i.e. two node indices separated by a tab. E.g.: "98 13"

  • YourData.content.text

    Each line contains a node: "idx\t[features]\t[category]\n", where [features] is a list of floats delimited by '\t'. E.g.: "59 1.0 0.5 0.751 0.0 0.659 0.0 computers". For multi-label classification, [category] must be a one-hot vector with a space as the delimiter, e.g.: "59 1.0 0.5 0.751 0.0 0.659 0.0 0 1 1 0 1 0".

  • YourData.content.entity

    Similar to .text; just change [category] to "entity". E.g.: "13 0.0 0.0 1.0 0.0 0.0 entity"

  • YourData.content.topic

    Similar to .text; just change [category] to "topic". E.g.: "64 0.10 1.21 8.09 0.10 topic"

  • *.map

    Each line contains an index: "idx\n". E.g.: "98"

You can see an example in ./model/data/example/*.
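
Below is a minimal sketch of producing these files from a toy in-memory corpus. The script name, node indices, features, and labels are all made up for illustration; in practice the features would come from your own TF-IDF / embedding / LDA pipeline (or from the preprocessing scripts described below).

# build_yourdata_files.py -- a hypothetical helper, not part of the repository
import os
import random

data_dir = './model/data/YourData/'
os.makedirs(data_dir, exist_ok=True)

# Toy nodes: text nodes 0-3 (labelled), entity nodes 100-101, topic nodes 200-201.
texts    = {i: ([round(random.random(), 3) for _ in range(6)], 'computers') for i in range(4)}
entities = {i: [round(random.random(), 3) for _ in range(6)] for i in (100, 101)}
topics   = {i: [round(random.random(), 3) for _ in range(4)] for i in (200, 201)}
edges    = [(0, 100), (1, 100), (2, 201), (3, 200)]  # text-entity and text-topic links

# YourData.cites: one tab-separated edge per line.
with open(data_dir + 'YourData.cites', 'w') as f:
    for a, b in edges:
        f.write('{}\t{}\n'.format(a, b))

# YourData.content.text: "idx \t features... \t category".
with open(data_dir + 'YourData.content.text', 'w') as f:
    for idx, (feat, label) in texts.items():
        f.write('\t'.join([str(idx)] + [str(x) for x in feat] + [label]) + '\n')

# YourData.content.entity / .topic: same layout, with the category fixed to the node type.
with open(data_dir + 'YourData.content.entity', 'w') as f:
    for idx, feat in entities.items():
        f.write('\t'.join([str(idx)] + [str(x) for x in feat] + ['entity']) + '\n')

with open(data_dir + 'YourData.content.topic', 'w') as f:
    for idx, feat in topics.items():
        f.write('\t'.join([str(idx)] + [str(x) for x in feat] + ['topic']) + '\n')

# *.map: one labelled text-node index per line for each split.
for name, ids in [('train.map', [0, 1]), ('vali.map', [2]), ('test.map', [3])]:
    with open(data_dir + name, 'w') as f:
        f.writelines('{}\n'.format(i) for i in ids)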


A simple data preprocessing pipeline is also provided. Running it successfully requires a TagMe API token (my personal token is included in tagme.py, but it may become invalid in the future), Wikipedia entity descriptions, and a word2vec model containing entity embeddings. You can prepare these yourself or download our files from Google Drive and unzip them to ./data/ .

Then, you should prepare a data file like ./data/example/example.txt, whose format is: "[idx]\t[category]\t[content]\n".
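
For illustration, a few lines of such a file might look like this (the indices, categories, and texts below are made up):

0	computers	how can I upgrade the graphics card in my laptop
1	sports	the home team won the championship after overtime
2	business	stock markets rallied after the interest rate decision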

Finally, modify the variable "dataset = 'example'" at the top of each of the following scripts and run them in order:

python tagMe.py
python build_network.py
python build_features.py
python build_data.py

Use HGAT as GNN

If you just want to use the HGAT model as a graph neural network, prepare files following the format above:

./model/data/YourData/
    ---- YourData.cites                // the adjacencies (edges)
    ---- YourData.content.*            // the features of each node type, namely node_type1, node_type2, ...
    ---- train.map                     // the indices of the training nodes
    ---- vali.map                      // the indices of the validation nodes
    ---- test.map                      // the indices of the testing nodes

Then change "load_data()" in ./model/code/utils.py:

type_list = [node_type1, node_type2, ...]
type_have_label = node_type
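
For instance, with the bundled example dataset, whose node types are text, topic, and entity and whose labels are attached to the text nodes, the settings would plausibly look like the following (the type names are assumed from the data layout above; check load_data() for the exact strings it expects):

type_list = ['text', 'topic', 'entity']   # one entry per *.content.* file
type_have_label = 'text'                  # the node type that carries class labels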

See the code for more details.

Citation

If you make use of the HGAT model in your research, please cite the following papers in your manuscript:

@article{yang2021hgat,
  author = {Yang, Tianchi and Hu, Linmei and Shi, Chuan and Ji, Houye and Li, Xiaoli and Nie, Liqiang},
  title = {HGAT: Heterogeneous Graph Attention Networks for Semi-Supervised Short Text Classification},
  year = {2021},
  publisher = {Association for Computing Machinery},
  volume = {39},
  number = {3},
  doi = {10.1145/3450352},
  journal = {ACM Transactions on Information Systems},
  month = may,
  articleno = {32},
  numpages = {29},
}

@inproceedings{linmei2019heterogeneous,
  title={Heterogeneous graph attention networks for semi-supervised short text classification},
  author={Linmei, Hu and Yang, Tianchi and Shi, Chuan and Ji, Houye and Li, Xiaoli},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={4823--4832},
  year={2019}
}
