beyonderxx / instructuie

Universal information extraction with instruction learning

License: MIT License

Dockerfile 0.24% Shell 3.91% Python 95.84%

instructuie's Introduction

InstructUIE

This repo releases our implementation of the InstructUIE model.
It is built on the pretrained Flan-T5 model and fine-tuned on our data.

Requirements

Our main experiments and analysis were conducted in the following environment:

CUDA (11.3)
cuDNN (8.2.0.53)
PyTorch (1.10.0)
Transformers (4.26.1)
DeepSpeed (0.7.7)

You can install the required libraries by running

bash setup.sh

Data

Our models are trained and evaluated on IE INSTRUCTIONS. You can download the data from Baidu NetDisk or Google Drive.

Training

A sample script for training the InstructUIE model in our paper can be found at scripts/train_flan-t5.sh. You can run it as follows:

bash ./scripts/train_flan-t5.sh

Released Checkpoints

We have released our 11B UIE model; click here to download it.

Evaluation

A sample script for evaluating the InstructUIE model in our paper can be found at scripts/eval_flan-t5.sh. You can run it as follows:

bash ./scripts/eval_flan-t5.sh

The decoded results are saved to predict_eval_predictions.jsonl in your output directory. To calculate F1 from predict_eval_predictions.jsonl, run:

python calculate_f1.py
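
For orientation, the sketch below shows one common way to score outputs in the "type1: word1; type2: word2" format quoted in the issues further down: parse each string into a set of (type, word) pairs, then compute micro-averaged precision, recall, and F1 by exact match. It is not the repository's calculate_f1.py; the function names, the lowercasing, and the exact-match criterion are all assumptions.

```python
def parse_pairs(output: str) -> set:
    """Parse 'type1: word1; type2: word2' into a set of (type, word) pairs."""
    pairs = set()
    for item in output.split(";"):
        if ":" not in item:
            continue  # skip empty or malformed items
        etype, word = item.split(":", 1)
        pairs.add((etype.strip().lower(), word.strip().lower()))
    return pairs


def micro_f1(predictions, references):
    """Micro-averaged precision, recall and F1 over exact (type, word) matches."""
    tp = n_pred = n_gold = 0
    for pred, gold in zip(predictions, references):
        p, g = parse_pairs(pred), parse_pairs(gold)
        tp += len(p & g)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Illustrative strings only, not taken from the released data.
preds = ["person: alexa; location: washington", "organization: acme"]
golds = ["person: alexa; location: seattle", "organization: acme"]
print(micro_f1(preds, golds))
```

Because pairs are stored in sets, a repeated prediction of the same entity counts once; a scorer that should penalize duplicates would need a multiset instead.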

Citation

@article{wang2023instructuie,
  title={InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction},
  author={Wang, Xiao and Zhou, Weikang and Zu, Can and Xia, Han and Chen, Tianze and Zhang, Yuansen and Zheng, Rui and Ye, Junjie and Zhang, Qi and Gui, Tao and others},
  journal={arXiv preprint arXiv:2304.08085},
  year={2023}
}


instructuie's Issues

The specific format of the NER task

For the NER task, which of the prompt formats in Table 8 did you use for training?

Zero-shot NER experiment settings

Hi, there. Thanks for the excellent work!
Since the NER datasets in zero-shot settings are also included in the IE INSTRUCTIONS dataset, I'm wondering if there are special instructions for zero-shot experiments?
Do we have to remove all the seven NER datasets and train a new model to get fair results?

Thanks very much!

What does sample_15000 mean in the dataset?

The paper says 10,000 examples are sampled per dataset.
What do CoNLL 2003_sample_15000 and Ontonotes_sample_30000 in the dataset folder mean?

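How do I run run_uie.py in demo mode?

I used the same configuration as eval_flan-t5.sh and changed do_predict to do_demo. First I got an error that the trainer has no _gen_kwargs attribute. After I set _gen_kwargs manually, I got another error that the model and the input are not on the same device. Is this because DeepSpeed failed to initialize the model in do_demo mode?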

Where can we find the InstructUIE model besides Hugging Face?

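Question about evaluator.py

Your work is great!
src/evaluation/evaluator.py#L717
# Because the actual format later diverged from the one in the original table, the tests below may not pass; they serve only as usage examples

for rel in self._format(json_data['Instance']['ground_truth']).split(';'): # FIXME: the field names may have changed
KeyError: 'Instance'

Could you give an example of the actual format?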

BuildConfig does not work

BuilderConfig UIEConfig(name='default', version=0.0.0, data_dir=None, data_files=None, description='Default config for NaturalInstructions') doesn't have a 'task_config_dir' key.

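How do smaller models perform?

I really appreciate this paper and the open-source code.
One question: have the authors tried models with fewer parameters, such as Flan-T5-3B or GPT-2? Is the performance much worse?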

Chinese field

Are there any Chinese instructions for the open-source LLM?

DatasetGenerationError: An error occurred while generating the dataset

Were the paper's experimental results on NER and the other tasks all obtained by full-parameter fine-tuning of flan-t5-11b?

The mismatch between InstructUIE datasets and UIE datasets

EE datasets in InstructUIE such as ACE05 and CASIE have fewer examples than the same datasets in UIE. Moreover, these datasets have no offset annotations in InstructUIE. What causes these differences?

Dataset statistics in your paper

Dataset statistics in the UIE paper

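Input length limit

Must the model's maximum input length be fixed at 512, or can it be increased? Is the model suitable for document-level extraction tasks?

Looking forward to your reply!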

Zero-shot for EE datasets?

Hello, thank you for your contribution! I'd like to ask you two questions:

May I ask why there is no zero-shot experiment on event extraction datasets?
Can the model be modified to the few-shot setting? If so, what should I pay attention to?

Looking forward to your reply!

the format of the dataset

Can you disclose the format of the dataset that the current code can run in? I would like to run my own dataset on your model.

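Question about Event Trigger F1 and Event Argument F1 in the paper

I noticed that the tasks listed in InstructUIE/configs/multi_task_configs/test_tasks.json are EEA and EET, with no EE. So are the results in the paper obtained in two steps? That is, is the EET result the Event Trigger F1 and the EEA result the Event Argument F1?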

Instructions in the released dataset

Hi there!
Great paper and thanks for releasing the dataset. However, may I please ask where I can find the expert written instructions for the dataset? In the drive link I just found the json-ified IE datasets. Thank You in advance!

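Mixed-precision training (fp16)

When training the 3B and 11B models, do I need to specify the --fp16 argument? As far as I can see, the training scripts in the code do not use mixed-precision training:

scripts/train_flan-t5.sh
scripts/train_uie_instruct_multi_node.sh

When reproducing the results, do I need to specify --fp16 or --bf16?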

How can it be adapted to support Chinese?

run eval_flan-t5.sh with do_demo

Please enter your input to the model, or enter 'quit' to exit: Please enter your input to the model, or enter 'quit' to exit: 06/09/2023 18:02:35 - INFO - __main__ - Serving the model as a demo...
Please enter your input to the model, or enter 'quit' to exit: Please enter your input to the model, or enter 'quit' to exit:
Traceback (most recent call last):
  File "src/run_uie.py", line 560, in <module>
    main()
  File "src/run_uie.py", line 553, in main
    _, preds, _ = trainer.prediction_step(model, inputs=inputs, prediction_loss_only=False)
  File "/home/nlp/ding/InstructUIE/src/uie_trainer.py", line 263, in prediction_step
    gen_kwargs = self._gen_kwargs
AttributeError: 'UIETrainer' object has no attribute '_gen_kwargs'
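
The traceback shows prediction_step reading self._gen_kwargs before anything has set it. In Hugging Face's Seq2SeqTrainer (at least around the Transformers 4.26 release listed in the requirements), that attribute is assigned inside evaluate() and predict(), so calling prediction_step directly skips the assignment. The toy class below reproduces the pattern without any dependencies. ToyTrainer and its generation arguments are hypothetical stand-ins, not the repository's UIETrainer.

```python
class ToyTrainer:
    """Hypothetical stand-in that mimics the attribute pattern in the traceback."""

    def predict(self, inputs, **gen_kwargs):
        # The public entry point stores the generation arguments first ...
        self._gen_kwargs = gen_kwargs
        return self.prediction_step(inputs)

    def prediction_step(self, inputs):
        # ... and the step reads them back, as in uie_trainer.py line 263.
        gen_kwargs = self._gen_kwargs
        return f"generate({inputs!r}, {gen_kwargs})"


trainer = ToyTrainer()

try:
    trainer.prediction_step("some text")  # called directly, as in demo mode
    failed = False
except AttributeError:
    failed = True

# Workaround described in the earlier demo-mode issue: set the attribute by hand.
trainer._gen_kwargs = {"max_length": 50}  # illustrative value, an assumption
result = trainer.prediction_step("some text")
print(failed, result)
```

As the earlier demo-mode issue reports, setting the attribute only gets past this error; the device-mismatch error that follows is a separate problem.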

How much GPU memory do I need for fine-tuning and inference?

How should the parameters be set for evaluating Flan-T5 11B?

As in the title.

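How should the parameters be set for 8 A100 80GB GPUs?

Hello, and thank you for your work. I would like to ask how the original paper set the parameters for fine-tuning flan-t5-11B on 8 A100 80GB GPUs, for example which DeepSpeed mode was chosen, the batch_size, and so on.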

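Does this method support Chinese tasks?

Hi, I would like to ask whether this work supports Chinese, since the training datasets all seem to be in English. This is my first time looking at work in this area, so I am quite new to it. Thank you!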

Want to get clarity on how to give the input for zero-shot NER.

I have a few questions:

1. Here is the list of prompts given in the paper for NER:
a. Please list all entity words in the text that fit the category. Output format is "type1: word1; type2: word2".
b. Please find all the entity words associated with the category in the given text. Output format is "type1: word1; type2: word2".
c. Please tell me all the entity words in the text that belong to a given category. Output format is "type1: word1; type2: word2".
d. Please list all entity words in the text that fit the category. Output format is word1, word2.
e. Given options, please tell me the categories of all the listed entity words. Output format is "type1: word1; type2: word2".
Are these prompts dataset-specific? If not, does that mean that every prompt gives the best results on every dataset? If they are dataset-specific, could you shed some light on which prompt to use for which dataset?
2. What is the exact format of the input to the model for zero-shot NER? Say I want to use the prompt p1 = "Please list all entity words in the text that fit the category. Output format is "type1: word1; type2: word2".", the sentence s = "My name is alexa and I live in washigton", and the options O = "location, person, organization, miscellaneous". Then the input would be """p1 + \n + 'Option: ' + O + \n + 'Text: '+s""". Right?
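
The input layout proposed in the question can be written out as a small helper. This is only a sketch of the format the question describes (the instruction, then an "Option:" line, then a "Text:" line). The source does not confirm that it matches what the model was trained on, and build_ner_input is a hypothetical name.

```python
def build_ner_input(instruction: str, options: list, sentence: str) -> str:
    """Join instruction, label options and text in the order the question proposes."""
    return f"{instruction}\nOption: {', '.join(options)}\nText: {sentence}"


p1 = ('Please list all entity words in the text that fit the category. '
      'Output format is "type1: word1; type2: word2".')
options = ["location", "person", "organization", "miscellaneous"]
s = "My name is alexa and I live in washigton"  # sentence copied from the question

print(build_ner_input(p1, options, s))
```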

When will the dataset and the checkpoint files be released?

Dataset download fails during training. Should data_dir be the path to the downloaded ie_instruction data?

Traceback (most recent call last):
  File "src/run_uie.py", line 560, in <module>
    main()
  File "src/run_uie.py", line 296, in main
    raw_datasets = load_dataset(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/load.py", line 1694, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/builder.py", line 595, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/builder.py", line 683, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/builder.py", line 1075, in _prepare_split
    for key, record in utils.tqdm(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/root/.cache/huggingface/modules/datasets_modules/datasets/uie_dataset/f3e8d02f5ffb4e66435bbe181a28a4403a3ee701bbd8a110780c5e90beb86581/uie_dataset.py", line 661, in _generate_examples
    assert os.path.exists(ds_path)
AssertionError
[2023-09-06 10:52:42,767] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1069
[2023-09-06 10:52:42,767] [ERROR] [launch.py:324:sigkill_handler] ['/root/miniconda3/envs/instruct-uie/bin/python', '-u', 'src/run_uie.py', '--local_rank=0', '--do_train', '--do_predict', '--predict_with_generate', '--model_name_or_path', '/root/InstructUIE-master/model_cache/t5-base', '--data_dir', '/root/InstructUIE-master/data/IE_INSTRUCTIONS/RE/ADE_corpus/train.json', '--task_config_dir', '/root/InstructUIE-master/configs/multi_task_configs', '--instruction_file', '/root/InstructUIE-master/configs/instruction_config.json', '--instruction_strategy', 'single', '--output_dir', 'output/t5-re-single', '--input_record_file', 'flan-t5.record', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '16', '--gradient_accumulation_steps', '8', '--learning_rate', '5e-03', '--num_train_epochs', '5', '--deepspeed', 'configs/ds_configs/stage0.config', '--run_name', 't5-base-mult-mi-experiment', '--max_source_length', '512', '--max_target_length', '50', '--generation_max_length', '50', '--max_num_instances_per_task', '10000', '--max_num_instances_per_eval_task', '200', '--add_task_name', 'False', '--add_dataset_name', 'False', '--num_examples', '0', '--overwrite_output_dir', '--overwrite_cache', '--lr_scheduler_type', 'constant', '--warmup_steps', '0', '--logging_strategy', 'steps', '--logging_steps', '500', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '2000'] exits with return code = 1
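
The assertion fails at os.path.exists(ds_path) inside the dataset script, and the logged command passes a single file (.../RE/ADE_corpus/train.json) as --data_dir. One plausible reading, not confirmed anywhere in the source, is that the loader joins data_dir with a task and a dataset name, so data_dir would need to be the IE_INSTRUCTIONS root. The sketch below is a pre-flight check under that assumption. The data_dir/task/dataset layout and the function name are hypothetical.

```python
import os
import tempfile


def missing_dataset_dirs(data_dir, datasets):
    """Return the expected data_dir/task/dataset paths that do not exist."""
    missing = []
    for task, name in datasets:
        ds_path = os.path.join(data_dir, task, name)
        if not os.path.isdir(ds_path):
            missing.append(ds_path)
    return missing


# Build a throwaway tree that mirrors the path seen in the log.
root = tempfile.mkdtemp()
ade_dir = os.path.join(root, "IE_INSTRUCTIONS", "RE", "ADE_corpus")
os.makedirs(ade_dir)
train_file = os.path.join(ade_dir, "train.json")
with open(train_file, "w") as handle:
    handle.write("[]")

datasets = [("RE", "ADE_corpus")]
# Passing the dataset root finds the directory ...
ok = missing_dataset_dirs(os.path.join(root, "IE_INSTRUCTIONS"), datasets)
# ... while passing the train.json file itself, as in the log, does not.
bad = missing_dataset_dirs(train_file, datasets)
print(ok, bad)
```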

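How are the auxiliary tasks added?

Hello, I see that Section 2.2.2 of the paper proposes using auxiliary tasks for finer-grained training, but this is not reflected in the code. How is it done?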

Performance of GENIA-Evt in paper missing

The tables for Event Trigger F1 and Event Argument F1 are missing the results for GENIA-Evt.

a. Event Trigger F1

Dataset	UIE	USM	Bert-base	Ours
ACE2005	73.36	72.41	72.5	77.13
CASIE	69.33	71.73	68.98	67.80
PHEE	-	-	-	70.14
Avg	-	-	-	71.69

b. Event Argument F1

Dataset	UIE	USM	Bert-base	Ours
ACE2005	54.79	55.83	59.9	72.94
CASIE	61.30	63.26	60.37	63.53
PHEE	-	-	-	62.91
Avg	-	-	-	66.46
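
A quick arithmetic check on the "Ours" column shows that each Avg row is the plain mean of the three listed datasets, which is consistent with GENIA-Evt not being included. The numbers are copied from the two tables; only the variable and function names are new.

```python
trigger_f1 = {"ACE2005": 77.13, "CASIE": 67.80, "PHEE": 70.14}
argument_f1 = {"ACE2005": 72.94, "CASIE": 63.53, "PHEE": 62.91}


def mean(scores):
    """Unweighted mean of per-dataset scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(mean(trigger_f1))   # compare with the Avg row of table a (71.69)
print(mean(argument_f1))  # compare with the Avg row of table b (66.46)
```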

Ref. https://arxiv.org/abs/2304.08085

Which directory should this program be run from?

How to debug in DeepSpeed mode

Why can't pdb be used together with DeepSpeed?
How do I debug when DeepSpeed is in use?
If I do not want to use DeepSpeed, what needs to be changed to turn it off?

Dataset

Hello, could you please publish the IE INSTRUCTIONS dataset?

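Are there no experimental results for ACE2004 (NER) in the paper?

Hello, is InstructUIE fine-tuned with all 11B parameters? Did you use a parameter-efficient fine-tuning tool such as LoRA? Also, what hardware is needed to reproduce this project? I currently have four 3090s and am not sure whether that is enough.

The paper describes a zero-shot setting in which instructions are provided only for a subset of tasks during training, and the model is evaluated on unseen tasks without additional fine-tuning. What does "training" mean here? My understanding was that zero-shot might involve no training stage at all (perhaps I know too little about it).

Finally, do we train three types of models (NER, RE, and EE) jointly on multiple datasets, so that each model can predict on any dataset and is evaluated the same way? Or is a separate model trained and evaluated for each dataset?

Looking forward to your reply, thank you!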

When can we expect the checkpoint to be released?

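I ran out of GPU memory when fine-tuning on four Titan GPUs. I swapped flan-t5 for glm-6b (fp16) and reduced the batch size parameters, but it still does not fit. Which other parameters can I adjust without affecting performance much?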

Evaluation results of the released checkpoint differ considerably from the paper?

Results I obtained by evaluating the released checkpoint myself:
ACE 2004 0.799446
ACE 2005_sample_15000 0.823431
AnatEM 0.906356
bc2gm 0.853468
bc4chemd 0.903051
bc5cdr 0.896358
Broad Tweet Corpus 0.807502
CoNLL 2003_sample_15000 0.920924
FabNER 0.778983
FindVehicle 0.862235
GENIA_NER 0.760505
HarveyNER 0.881229
mit-movie 0.891786
mit-restaurant 0.825387
MultiNERD 0.905533
ncbi 0.894619
Ontonotes_sample_30000 0.907458
PolyglotNER 0.581986
TweetNER7_sample_15000 0.654393
WikiANN en 0.609424
WikiNeural 0.877284
Average 0.825779

The Average reported in the paper is 0.85, a gap of about 3 points.

Bug in data preprocessing may cause wrong evaluation results: when a sentence contains multiple events, the output data contains only one event and misses the others.

Nice work, and thanks for publishing your code and datasets! I found a problem in your data preprocessing that may cause wrong evaluation results: when a sentence contains multiple events, the output data contains only one event and misses the others.

For example, in the file "EE/ACE05/text.json", the parsed result produced by your code is as follows:
{
  "sentence": "Welch also wants details on Jane Beasley Welch ' s salary , benefits , retirement plan and other compensation paid to her .",
  "events": [
    {
      "trigger": "retirement",
      "type": "end position",
      "pos": [],
      "arguments": [
        {
          "name": "Jane Beasley Welch",
          "role": "person",
          "pos": []
        }
      ]
    }
  ]
}

However, in the ACE05 dataset, this sentence contains two events. The correct parsing result for the sentence is as follows:
{
  "event_mentions": [
    {
      "event_type": "Personnel:End-Position",
      "id": "APW_ENG_20030325.0786-10-EV0",
      "trigger": {
        "start": 13,
        "end": 14,
        "text": "retirement"
      },
      "arguments": [
        {
          "entity_id": "APW_ENG_20030325.0786-10-E1",
          "text": "Jane Beasley Welch",
          "role": "Person"
        }
      ]
    },
    {
      "event_type": "Transaction:Transfer-Money",
      "id": "APW_ENG_20030325.0786-10-EV1",
      "trigger": {
        "start": 18,
        "end": 19,
        "text": "paid"
      },
      "arguments": [
        {
          "entity_id": "APW_ENG_20030325.0786-10-E",
          "text": "Jane Beasley Welch",
          "role": "Recipient"
        }
      ]
    }
  ]
}

If your files differ from the original datasets, the evaluation results in your paper may not be directly comparable to the baselines. Please check this problem. Thanks very much.
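
One way to keep every event when flattening is to loop over all entries of event_mentions instead of taking a single one. The sketch below converts the ACE05-style record from this issue into the simplified layout of the first example. The field names come from the two examples; the type normalization (dropping the "Personnel:" prefix, lowercasing, replacing hyphens with spaces) is inferred from the single pair "Personnel:End-Position" / "end position" and is an assumption, as is the function name. It is not the repository's preprocessing code.

```python
def convert_record(sentence, record):
    """Convert an ACE05-style record into the simplified layout, keeping all events."""
    events = []
    for mention in record["event_mentions"]:  # iterate over every event
        # Assumed normalization: "Personnel:End-Position" -> "end position".
        subtype = mention["event_type"].split(":")[-1]
        events.append({
            "trigger": mention["trigger"]["text"],
            "type": subtype.replace("-", " ").lower(),
            "pos": [],
            "arguments": [
                {"name": arg["text"], "role": arg["role"].lower(), "pos": []}
                for arg in mention["arguments"]
            ],
        })
    return {"sentence": sentence, "events": events}


record = {
    "event_mentions": [
        {"event_type": "Personnel:End-Position",
         "trigger": {"start": 13, "end": 14, "text": "retirement"},
         "arguments": [{"text": "Jane Beasley Welch", "role": "Person"}]},
        {"event_type": "Transaction:Transfer-Money",
         "trigger": {"start": 18, "end": 19, "text": "paid"},
         "arguments": [{"text": "Jane Beasley Welch", "role": "Recipient"}]},
    ]
}
sentence = ("Welch also wants details on Jane Beasley Welch ' s salary , benefits , "
            "retirement plan and other compensation paid to her .")
converted = convert_record(sentence, record)
print([event["trigger"] for event in converted["events"]])
```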