
DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

Dataset

All eight dynamic text-attributed graphs (DyTAGs) provided by DTGB can be downloaded from here.

Data Format

Each graph is stored as three files (a minimal loading sketch follows this list).

  • edge_list.csv: stores each edge of the DyTAG as a tuple (u, v, r, t, l), where u is the id of the source entity, v is the id of the target entity, r is the id of the relation between them, t is the timestamp at which the edge occurs, and l is the label of the edge.
  • entity_text.csv: stores the mapping from entity ids (e.g., u and v) to the text descriptions of the entities.
  • relation_text.csv: stores the mapping from relation ids (e.g., r) to the text descriptions of the relations.
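
The following is a minimal sketch of loading one dataset with pandas, assuming the files sit under DyLink_Datasets/<dataset_name>/ as described in the Usage section; the exact column names inside the released CSVs are not shown here and may differ.

import pandas as pd

# Load one DyTAG; GDELT is used as an example dataset name.
root = "DyLink_Datasets/GDELT"

edges = pd.read_csv(f"{root}/edge_list.csv")              # one row per edge tuple (u, v, r, t, l)
entity_text = pd.read_csv(f"{root}/entity_text.csv")      # entity id -> text description
relation_text = pd.read_csv(f"{root}/relation_text.csv")  # relation id -> text description

print(edges.head())
print(f"{len(entity_text)} entities, {len(relation_text)} relations, {len(edges)} edges")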

Usage

  • After downloading the datasets, uncompress them into the DyLink_Datasets folder.
  • Run get_pretrained_embeddings.py to obtain the Bert-based node and edge text embeddings; they will be saved as e_feat.npy and r_feat.npy, respectively (see the sketch after this list).
  • Run get_LLM_data.ipynb to build the train and test sets for the textual relation generation task; they will be saved as LLM_train.pkl and LLM_test.pkl, respectively.
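
For reference, the sketch below illustrates roughly what the embedding step does with Hugging Face transformers. It is not the actual get_pretrained_embeddings.py: the real script may use a different model, pooling, batching, and output location, and the "text" column name is an assumption about the CSV layout.

import numpy as np
import pandas as pd
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    # Mean-pool the last hidden state of each text (one possible pooling choice).
    vectors = []
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
            vectors.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.stack(vectors)

root = "DyLink_Datasets/GDELT"  # example dataset
entity_text = pd.read_csv(f"{root}/entity_text.csv")
relation_text = pd.read_csv(f"{root}/relation_text.csv")
np.save(f"{root}/e_feat.npy", embed(entity_text["text"].tolist()))    # entity (node) embeddings
np.save(f"{root}/r_feat.npy", embed(relation_text["text"].tolist()))  # relation (edge) embeddings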

Reproduce the Results

Future Link Prediction Task

  • Example of training DyGFormer on the GDELT dataset without text attributes:
python train_link_prediction.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature no
  • Example of training DyGFormer on the GDELT dataset with text attributes:
python train_link_prediction.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature Bert
  • The AP and AUC-ROC metrics on the test set (in both the transductive and inductive settings) will be automatically saved to saved_resuts/DyGFormer/GDELT/DyGFormer_seed0no.json (an illustrative metric sketch follows this list).
  • The best checkpoint will be saved in the saved_resuts/DyGFormer/GDELT/ folder; this checkpoint is reused to reproduce the performance on the Destination Node Retrieval task.
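
For reference, AP and AUC-ROC are standard binary-classification metrics computed over positive edges and sampled negative edges. The snippet below only illustrates the metric definitions with scikit-learn; it is not the benchmark's own evaluation code.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Toy predicted link probabilities for positive and sampled negative edges.
pos_scores = np.array([0.9, 0.8, 0.7, 0.4])
neg_scores = np.array([0.3, 0.5, 0.2, 0.1])

labels = np.concatenate([np.ones_like(pos_scores), np.zeros_like(neg_scores)])
scores = np.concatenate([pos_scores, neg_scores])

print("AP:     ", average_precision_score(labels, scores))
print("AUC-ROC:", roc_auc_score(labels, scores))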

Destination Node Retrieval Task

After obtaining the best checkpoint from the Future Link Prediction task, the Hits@k metrics of the Destination Node Retrieval task can be reproduced by running the following command (an illustrative Hits@k sketch follows the option notes below):

python evaluate_node_retrieval.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --negative_sample_strategy random --num_runs 5 --gpu 0  --use_feature no
  • The negative_sample_strategy hyper-parameter controls the candidate sampling strategy; it can be random or historical.
  • The use_feature hyper-parameter controls whether Bert-based embeddings are used; it can be no or Bert.
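
For reference, Hits@k checks whether the true destination node is ranked within the top k among the scored candidates. The snippet below illustrates the metric only; it is not the script's implementation.

import numpy as np

def hits_at_k(true_score, candidate_scores, k):
    # Rank of the true destination among all candidates (1 = highest score);
    # the query counts as a hit if the true destination lands in the top k.
    rank = 1 + int(np.sum(np.asarray(candidate_scores) > true_score))
    return float(rank <= k)

# Toy example: the true destination scores 0.82 against 100 sampled candidates.
rng = np.random.default_rng(0)
candidates = rng.uniform(0.0, 1.0, size=100)
for k in (1, 3, 10):
    print(f"Hits@{k}: {hits_at_k(0.82, candidates, k)}")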

Edge Classification Task

  • Example of training DyGFormer on the GDELT dataset without text attributes:
python train_edge_classification.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature no
  • The Precision, Recall, and F1-score metrics on the test set will be automatically saved to saved_resuts/DyGFormer/GDELT/edge_classification_DyGFormer_seed0no.json (see the metric sketch below).
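
For reference, the reported metrics are standard multi-class precision, recall, and F1-score; a minimal scikit-learn illustration is below (the averaging scheme used by the benchmark may differ).

from sklearn.metrics import precision_recall_fscore_support

# Toy edge labels vs. predictions for illustration.
y_true = [0, 2, 1, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")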

Textual Relation Generation Task

After obtaining the LLM_train.pkl and LLM_test.pkl files, you can directly reproduce the performance of the original (non-fine-tuned) LLMs by running:

python LLM_eval.py -config_path=LLM_configs/vicuna_7b_qlora_uncensored.yaml -model=raw
  • You can change the LLM through the config_path hyper-parameter.
  • The generated text will be saved in s_his_o_des_his_result_vicuna7b.pkl.

Then, to compute the BERTScore metrics, change the file path in LLM_metric.py accordingly and run the command below (an illustrative BERTScore sketch follows it):

python LLM_metric.py
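
For reference, BERTScore compares each generated relation text against the ground-truth text using contextual embeddings. The sketch below uses the bert-score package with placeholder strings, since the structure of the saved .pkl files is not documented here; LLM_metric.py itself may compute the score differently.

from bert_score import score

# Placeholder candidate/reference pairs; in practice these would come from the
# generated-result .pkl file and the ground-truth relation texts.
candidates = ["express intent to cooperate on trade"]
references = ["express intent to engage in trade cooperation"]

P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore  P={P.mean().item():.4f}  R={R.mean().item():.4f}  F1={F1.mean().item():.4f}")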

If you want to fine-tune the LLMs, you should run:

python LLM_train.py LLM_configs/vicuna_7b_qlora_uncensored.yaml

and then reproduce the performance of the fine-tuned LLMs by running:

python LLM_eval.py -config_path=LLM_configs/vicuna_7b_qlora_uncensored.yaml -model=lora

Contact

For any questions or suggestions, please open an issue or contact us at [email protected].

Acknowledgments

The code and model implementations are based on the DyGLib project. Thanks for their great contributions!

Reference

@article{zhang2024dtgb,
  title={DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs},
  author={Zhang, Jiasheng and Chen, Jialin and Yang, Menglin and Feng, Aosong and Liang, Shuang and Shao, Jie and Ying, Rex},
  journal={arXiv preprint arXiv:2406.12072},
  year={2024}
}
