
freedomintelligence / evaluation-of-chatgpt-on-information-extraction

An evaluation of ChatGPT on information extraction tasks, including Named Entity Recognition (NER), Relation Extraction (RE), Event Extraction (EE), and Aspect-based Sentiment Analysis (ABSA).

Home Page: https://drive.google.com/drive/folders/1vvmXnWRUu_4y9lI89Xh3SkrfBIrGt3RL?usp=sharing

Evaluation-of-ChatGPT-on-Information-Extraction

Abstract

ChatGPT has stimulated a research boom in the field of large language models. In this paper, we assess the capabilities of ChatGPT from four perspectives: Performance, Evaluation Criteria, Robustness, and Error Types. Specifically, we first evaluate ChatGPT's performance on 17 datasets with 14 IE sub-tasks under the zero-shot, few-shot, and chain-of-thought scenarios, and find a huge performance gap between ChatGPT and SOTA results. Next, we rethink this gap and propose a soft-matching strategy for evaluation to reflect ChatGPT's performance more accurately. Then, we analyze the robustness of ChatGPT on 14 IE sub-tasks, and find that: 1) ChatGPT rarely outputs invalid responses; 2) irrelevant context and long-tail target types greatly affect ChatGPT's performance; 3) ChatGPT does not understand the subject-object relationships in the RE task well. Finally, we analyze the errors of ChatGPT, and find that "unannotated spans" is the most dominant error type. This raises concerns about the quality of annotated data, and indicates the possibility of annotating data with ChatGPT. The data and code are released on GitHub.
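
As a rough illustration of the soft-matching idea mentioned in the abstract, the sketch below maps a predicted span onto the most similar annotated span when their string similarity clears a threshold. This is not the authors' implementation: the function name `soft_match`, the use of `difflib.SequenceMatcher`, and the threshold of 0.5 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def soft_match(predicted: str, gold_spans: list[str], threshold: float = 0.5) -> str:
    """Return the most similar gold span if it clears the threshold.

    Hypothetical helper: the similarity measure and threshold are
    illustrative assumptions, not values taken from the paper or the code.
    """
    best_span, best_score = predicted, 0.0
    for gold in gold_spans:
        # Case-insensitive character-level similarity in [0, 1].
        score = SequenceMatcher(None, predicted.lower(), gold.lower()).ratio()
        if score > best_score:
            best_span, best_score = gold, score
    # Keep the original prediction when nothing is similar enough.
    return best_span if best_score >= threshold else predicted


gold = ["New York", "Boston"]
near_miss = soft_match("the New York", gold)  # boundary differs slightly
unrelated = soft_match("Paris", gold)         # no similar gold span
```

Under exact matching, "the New York" would count as an error; under this kind of soft matching it is aligned to the annotated span "New York" before scoring, while an unrelated prediction such as "Paris" is left unchanged.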

Datasets, processed data, output result files

All datasets, processed data, and output result files are available on Google Drive (see the Home Page link above), except the raw ACE04, ACE05, and TACRED datasets (for copyright reasons).

Download all the files, unzip them, and place them in the corresponding directories.

Test with API

bash ./scripts/absa/eval.sh
bash ./scripts/ner/eval.sh
bash ./scripts/re/eval_rc.sh
bash ./scripts/re/eval_triplet.sh
bash ./scripts/ee/eval_trigger.sh
bash ./scripts/ee/eval_argument.sh
bash ./scripts/ee/eval_joint.sh

Before testing, set the --api_key and --result_file arguments in every *.sh script.
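
Editing every script by hand is error-prone, so a small helper can rewrite an argument's value in a script's text. The sketch below is only an illustration: `set_cli_arg` is a hypothetical helper, and it assumes the scripts pass arguments as `--flag value` or `--flag=value`, which has not been checked against the actual *.sh files.

```python
import re


def set_cli_arg(script_text: str, flag: str, value: str) -> str:
    """Replace the value that follows `flag` in a shell script's text.

    Assumes the `--flag value` or `--flag=value` form (an assumption about
    the scripts, not something confirmed from the repository).
    """
    pattern = re.compile(rf"({re.escape(flag)}[ =])\S+")
    # A function replacement avoids backslash-escape surprises in `value`.
    return pattern.sub(lambda m: m.group(1) + value, script_text)


# Made-up script line used only to demonstrate the helper.
example = "python main.py --api_key OLD_KEY --result_file old.json\n"
updated = set_cli_arg(example, "--api_key", "YOUR_KEY")
updated = set_cli_arg(updated, "--result_file", "my_result.json")
```

To apply this to real files, read each script's text, pass it through `set_cli_arg`, and write it back; keeping the API key out of version control is advisable.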

Get Evaluation Metrics

bash ./scripts/absa/report.sh
bash ./scripts/ner/report.sh
bash ./scripts/re/report_rc.sh
bash ./scripts/re/report_triplet.sh
bash ./scripts/ee/report_trigger.sh
bash ./scripts/ee/report_argument.sh
bash ./scripts/ee/report_joint.sh

By default, the metrics are computed from our output result files on Google Drive.
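
For readers unfamiliar with how such metrics are typically computed, the sketch below shows a micro-averaged precision, recall, and F1 over exact-match predictions. It assumes the common formulation in which each sentence has a set of gold items and a set of predicted items; the function name `micro_prf` and the data layout are illustrative assumptions, and the report scripts may compute their metrics differently.

```python
def micro_prf(gold: list[set], pred: list[set]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-sentence item sets."""
    # An item counts as a true positive only if it appears in both sets.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up NER example: (span, type) pairs for two sentences.
gold = [{("Paris", "LOC")}, {("Obama", "PER"), ("Hawaii", "LOC")}]
pred = [{("Paris", "LOC")}, {("Obama", "PER"), ("Honolulu", "LOC")}]
precision, recall, f1 = micro_prf(gold, pred)
```

In this made-up example two of the three predictions are correct and two of the three gold items are found, so precision, recall, and F1 are all 2/3.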

Main results

[Image: main results]

Examples of prompts

[Images: example prompts for the zero-shot, few-shot ICL, and few-shot CoT settings]

Future Work

We will add the results and analysis of GPT-4.

Citation

@article{han2023-chatgpt-IE-evaluation,
  author       = {Ridong Han and
                  Tao Peng and
                  Chaohao Yang and
                  Benyou Wang and
                  Lu Liu and
                  Xiang Wan},
  title        = {Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors},
  journal      = {CoRR},
  volume       = {abs/2305.14450},
  year         = {2023},
  eprinttype   = {arXiv},
  eprint       = {2305.14450},
  url          = {https://doi.org/10.48550/arXiv.2305.14450},
  doi          = {10.48550/ARXIV.2305.14450},
}

Contributors

ridonghan


evaluation-of-chatgpt-on-information-extraction's Issues

Creation of ICL

Hi,
I went through your code and found it interesting. I have a question about how you create the few-shot instances for a test sentence.

Great!

You did a very comprehensive and detailed experiment. Thank you for the open source, which provided a lot of help.

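Inconsistent task definition for document-level relation extraction

Dear authors, thank you for your excellent work, which has made a great contribution to applying ChatGPT to information extraction!

While reading your code recently, I noticed that your experiments on document-level relation extraction datasets (such as DocRED, Re-DocRED, and DWIE) may be inconsistent with the task definition of the original DocRED and related datasets.

Specifically, your document-level relation extraction prompts provide in advance the entity pairs that may hold a relation, whereas under the definition in the DocRED paper (https://aclanthology.org/P19-1074.pdf) these entity pairs cannot be given in advance; instead, all possible entity pairs must be classified. In this setting you may therefore have introduced some data leakage, leading to inflated F1 scores. For example, your results on Re-DocRED are roughly 20%-30% F1, far higher than the roughly 10% F1 reported in a recent comparable paper: https://aclanthology.org/2023.emnlp-main.334.pdf

Please note this inconsistency in the task definition in an updated version of the paper, to avoid causing further confusion in the community.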
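
To make the difference raised in this issue concrete, the sketch below enumerates every ordered entity pair in a document, which is the candidate set a model must classify when no pairs are given in advance. The entity list and the helper name `candidate_pairs` are made up for illustration and are not taken from the repository or from DocRED.

```python
from itertools import permutations


def candidate_pairs(entities: list[str]) -> list[tuple[str, str]]:
    """All ordered (head, tail) pairs of distinct entities in a document."""
    # Order matters because relations are directed: (A, B) differs from (B, A).
    return list(permutations(entities, 2))


# Made-up document entities.
entities = ["Marie Curie", "Warsaw", "Poland"]
pairs = candidate_pairs(entities)

# If only pre-selected pairs were supplied in the prompt, the model would
# see a small subset of these candidates, for example:
given_pairs = [("Marie Curie", "Warsaw"), ("Warsaw", "Poland")]
```

With n entities there are n(n-1) ordered candidates (6 here), most of which hold no relation, so classifying all of them is a harder task than classifying only pairs already known to be related.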

Datasets

Hello, could you upload the NER and RE datasets that were used? Thank you 🙏
