🎇AdaKGC

👋 News!

Code for the paper Schema-adaptable Knowledge Graph Construction.
Our work has been accepted to Findings of EMNLP 2023.

🎉 Quick Links

👋 News!
🎉 Quick Links
🎈 Requirements
🪄 Model
🎏 Datasets
⚾ Running
🎰 Inference
🏳‍🌈 Acknowledgment
🚩 Papers for the Project & How to Cite

🎈 Requirements

To run the code, install the following requirements:

conda create -n adakgc python=3.8
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

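🪄 Model

The tokenizer of our model comes from UIE and the remaining components come from T5, so the checkpoint is a hybrid. Download it from the link provided here and make sure you use this model: hf_models/mix

🎏 Datasets

For details on how the datasets were constructed, see Data Construction.

You can find the datasets via the following Google Drive link.

Datasets: ACE05, Few-NERD, NYT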

⚾ Running

mkdir hf_models
cd hf_models
git lfs install
git clone https://huggingface.co/google/t5-v1_1-base
cd ..

mkdir output           # */AdaKGC/output

Named entity recognition task

# Current path:  */AdaKGC
mode=H
data_name=Few-NERD
task=entity
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/Few-NERD.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema  --use_prompt=True --init_prompt=True

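model: name or path of the pretrained model.

data: path to the dataset.

output: path where the fine-tuned checkpoint is saved. The final output path is generated automatically, e.g. AdaKGC/output/ace05_event_H_e30_lr1e-4_b14_n0.

config: default configuration file, located in the config/prompt_conf directory. Each task has its own configuration.

mode: dataset mode (H, V, M or R).

device: CUDA_VISIBLE_DEVICES.

batch: batch size.

(For the full list of command-line arguments, see the bash scripts and Python files.)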
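
The task commands in this section differ only in data_name and in the configuration file that is passed. As a minimal sketch, the Python snippet below assembles the same argument list from the shell variables used in the commands, which makes explicit how mode and ratio enter the data and output paths. The helper name build_finetune_command is hypothetical and not part of the repository; the flags are copied from the commands in this README.

```python
import shlex


def build_finetune_command(data_name, mode="H", ratio=0.8, device=0):
    """Assemble the fine_prompt.bash call shown in this README.

    Hypothetical helper for illustration; it only formats strings and
    does not check that the paths exist.
    """
    data_dir = f"data/{data_name}_{mode}"
    return [
        "bash", "scripts/fine_prompt.bash",
        "--model=hf_models/mix",
        f"--data={data_dir}/iter_1",
        f"--output=output/{data_name}_{mode}_{ratio}",
        f"--config=config/prompt_conf/{data_name}.ini",
        f"--device={device}",
        f"--negative_ratio={ratio}",
        f"--record2={data_dir}/iter_7/record.schema",
        "--use_prompt=True",
        "--init_prompt=True",
    ]


if __name__ == "__main__":
    # Print one command line per task: entity, relation, event.
    for name in ("Few-NERD", "NYT", "ace05_event"):
        print(shlex.join(build_finetune_command(name)))
```

The list form can be handed to subprocess.run unchanged, which avoids shell quoting issues if a path contains spaces.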

Relation extraction task

mode=H
data_name=NYT
task=relation
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/NYT.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema  --use_prompt=True --init_prompt=True

Event extraction task

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/ace05_event.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema  --use_prompt=True --init_prompt=True

🎰 Inference

Inference on a single dataset only (e.g. data/ace05_event_H/iter_1)

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
python3 inference.py --dataname=data/${data_name}/${data_name}_${mode}/iter_2 --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512

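dataname: path to the dataset to predict on (ace05_event, NYT or Few-NERD).

model: path to the model obtained from training (the output of the training stage).

t5_path: the base T5 model (the model of the training stage).

task: task type (entity, relation or event).

cuda: CUDA_VISIBLE_DEVICES.

mode: dataset mode (H, V, M or R).

use_ssi, use_prompt, prompt_len and prompt_dim must match the values used during training. They can be viewed and set in the corresponding configuration file, e.g. config/prompt_conf/ace05_event.ini.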
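
One way to guard against a mismatch between training and inference settings is to compare them before launching inference. The sketch below assumes the configuration file consists of shell-style name=value lines (optionally prefixed with export) and that it uses the key names use_ssi, use_prompt, prompt_len and prompt_dim; both are assumptions, so check the actual file before relying on it. The function names are hypothetical.

```python
def parse_shell_style_config(text):
    """Parse `name=value` or `export name=value` lines into a dict.

    The file format is an assumption, not taken from the repository.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("export "):
            line = line[len("export "):].strip()
        if "=" not in line:
            continue  # skip blank or unrelated lines
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'\"")
    return settings


def mismatched_settings(train_settings, inference_args,
                        keys=("use_ssi", "use_prompt", "prompt_len", "prompt_dim")):
    """Return the keys whose training and inference values differ."""
    return [k for k in keys
            if str(train_settings.get(k)) != str(inference_args.get(k))]


if __name__ == "__main__":
    # Illustrative content only; the real file may use other keys or values.
    sample = """
    export use_ssi=True
    export use_prompt=True
    export prompt_len=80
    export prompt_dim=512
    """
    trained = parse_shell_style_config(sample)
    wanted = {"use_ssi": True, "use_prompt": True,
              "prompt_len": 80, "prompt_dim": 512}
    print(mismatched_settings(trained, wanted))
```

Values are compared as strings so that 80 from the command line and "80" from the file count as equal.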

Automatic inference over all iteration datasets (i.e. data/iter_1/ace05_event_H ~ data/iter_7/ace05_event_H)

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
python3 inference_mul.py --dataname=data/${data_name}/${data_name}_${mode} --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512

use_ssi, use_prompt, prompt_len and prompt_dim must match the values used during training.
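
For orientation, the snippet below lists the per-iteration directories that correspond to the --dataname value in the command above, assuming the iterations are numbered iter_1 through iter_7 as in the training commands. How inference_mul.py actually enumerates them is not shown in this README, so treat this as a sketch; iteration_dirs is a hypothetical helper.

```python
from pathlib import PurePosixPath


def iteration_dirs(data_name, mode, first=1, last=7, root="data"):
    """List iteration directories under root/data_name/data_name_mode.

    Mirrors the --dataname layout used in the inference commands; the
    range 1..7 is an assumption based on the paths in this README.
    """
    base = PurePosixPath(root) / data_name / f"{data_name}_{mode}"
    return [str(base / f"iter_{i}") for i in range(first, last + 1)]


if __name__ == "__main__":
    for path in iteration_dirs("ace05_event", "H"):
        print(path)
```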

The complete process, including fine-tuning and inference (in "scripts/run.bash"):

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
bash scripts/run_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/ace05_event.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True
python3 inference_mul.py --dataname=data/${data_name}/${data_name}_${mode} --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512

Metric	Definition	F1 in the logs
ent-(P/R/F1)	Micro-F1 score of entities (Entity Type, Entity Span)	spot-F1
rel-strict-(P/R/F1)	Micro-F1 score of relations in strict mode (Relation Type, Arg1 Span, Arg1 Type, Arg2 Span, Arg2 Type)	asoc-F1 for relations, spot-F1 for entities
evt-trigger-(P/R/F1)	Micro-F1 score of event triggers (Event Type, Trigger Span)	spot-F1
evt-role-(P/R/F1)	Micro-F1 score of event roles (Event Type, Arg Role, Arg Span)	asoc-F1

overall-F1 is the sum of spot-F1 and asoc-F1, so it can exceed 100.
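
The arithmetic behind these numbers can be sketched as follows. The snippet computes micro precision, recall and F1 from true-positive, predicted and gold counts using the standard formulas, then sums spot-F1 and asoc-F1 as described above. The counts are taken from the NYT test log quoted in this document; the function name prf is hypothetical, and the repository's scorer may differ in details such as rounding.

```python
def prf(tp, pred, gold):
    """Return micro precision, recall and F1 as percentages."""
    p = 100.0 * tp / pred if pred else 0.0
    r = 100.0 * tp / gold if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # Counts copied from the reported NYT test log (spot-* and asoc-* lines).
    _, _, spot_f1 = prf(tp=10343, pred=11001, gold=10833)
    _, _, asoc_f1 = prf(tp=4049, pred=4403, gold=4322)
    overall_f1 = spot_f1 + asoc_f1  # a sum of two percentages, so up to 200
    print(round(spot_f1, 4), round(asoc_f1, 4), round(overall_f1, 4))
```

Because overall-F1 adds two scores that each range from 0 to 100, a value such as 187.5559 indicates two high individual F1 scores rather than an error.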

🏳‍🌈 Acknowledgment

Part of our code is borrowed from UIE and UnifiedSKG, many thanks.

🚩 Papers for the Project & How to Cite

If you use or extend our work, please cite the paper as follows:

@article{DBLP:journals/corr/abs-2305-08703,
  author       = {Hongbin Ye and
                  Honghao Gui and
                  Xin Xu and
                  Huajun Chen and
                  Ningyu Zhang},
  title        = {Schema-adaptable Knowledge Graph Construction},
  journal      = {CoRR},
  volume       = {abs/2305.08703},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.08703},
  doi          = {10.48550/arXiv.2305.08703},
  eprinttype    = {arXiv},
  eprint       = {2305.08703},
  timestamp    = {Wed, 17 May 2023 15:47:36 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2305-08703.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Abnormal F1 values after running the code

I set up the complete development environment and tested the following task:
Task name: Relation Extraction Task
Commands:
1. . config/prompt_conf/NYT_H.ini
2. bash scripts/run_finetune.bash --model=hf_models/t5-v1_1-base --data=data/NYT_H/iter_1 --output=output/NYT --mode=H --device=0 --batch=8 (batch size modified)
Final test results:
12/25/2023 20:29:01 - INFO - main - ***** Test results *****
12/25/2023 20:29:01 - INFO - main - test_asoc-F1 = 92.8138
12/25/2023 20:29:01 - INFO - main - test_asoc-P = 91.96
12/25/2023 20:29:01 - INFO - main - test_asoc-R = 93.6835
12/25/2023 20:29:01 - INFO - main - test_asoc-gold = 4322.0
12/25/2023 20:29:01 - INFO - main - test_asoc-pred = 4403.0
12/25/2023 20:29:01 - INFO - main - test_asoc-tp = 4049.0
12/25/2023 20:29:01 - INFO - main - test_fixed = 1
12/25/2023 20:29:01 - INFO - main - test_gen_len = 32.7888
12/25/2023 20:29:01 - INFO - main - test_gold_tree = 10833
12/25/2023 20:29:01 - INFO - main - test_gold_tree add_bracket = 1
12/25/2023 20:29:01 - INFO - main - test_loss = 0.0562
12/25/2023 20:29:01 - INFO - main - test_ordered-record-F1 = 90.2995
12/25/2023 20:29:01 - INFO - main - test_ordered-record-P = 89.61
12/25/2023 20:29:01 - INFO - main - test_ordered-record-R = 90.9997
12/25/2023 20:29:01 - INFO - main - test_ordered-record-gold = 10833.0
12/25/2023 20:29:01 - INFO - main - test_ordered-record-pred = 11001.0
12/25/2023 20:29:01 - INFO - main - test_ordered-record-tp = 9858.0
12/25/2023 20:29:01 - INFO - main - test_overall-F1 = 187.5559
12/25/2023 20:29:01 - INFO - main - test_pred_tree = 11002
12/25/2023 20:29:01 - INFO - main - test_record-F1 = 91.2613
12/25/2023 20:29:01 - INFO - main - test_record-P = 90.5645
12/25/2023 20:29:01 - INFO - main - test_record-R = 91.969
12/25/2023 20:29:01 - INFO - main - test_record-gold = 10833.0
12/25/2023 20:29:01 - INFO - main - test_record-pred = 11001.0
12/25/2023 20:29:01 - INFO - main - test_record-tp = 9963.0
12/25/2023 20:29:01 - INFO - main - test_runtime = 233.7369
12/25/2023 20:29:01 - INFO - main - test_samples_per_second = 21.392
12/25/2023 20:29:01 - INFO - main - test_spot-F1 = 94.7421
12/25/2023 20:29:01 - INFO - main - test_spot-P = 94.0187
12/25/2023 20:29:01 - INFO - main - test_spot-R = 95.4768
12/25/2023 20:29:01 - INFO - main - test_spot-gold = 10833.0
12/25/2023 20:29:01 - INFO - main - test_spot-pred = 11001.0
12/25/2023 20:29:01 - INFO - main - test_spot-tp = 10343.0
12/25/2023 20:29:01 - INFO - main - test_steps_per_second = 0.672
12/25/2023 20:29:01 - INFO - main - test_well-formed = 5000
Which of these values is the F1 reported in the paper? Also, why does test_overall-F1 = 187.5559 exceed 100? Thanks in advance for your reply!
