
Zero-shot Faithful Factual Error Correction (ACL 2023)

Kung-Hsiang (Steeve) Huang, Hou Pong (Ken) Chan and Heng Ji

Paper link: ACL

Abstract

Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' ability to identify and correct factual errors, we present a zero-shot framework that formulates questions about input claims, looks for correct answers in the given evidence, and assesses the faithfulness of each correction based on its consistency with the evidence. Our zero-shot framework outperforms fully-supervised approaches, as demonstrated by experiments on the FEVER and SciFact datasets, where our outputs are shown to be more faithful. More importantly, the decomposable nature of our framework inherently provides interpretability. Additionally, to reveal the most suitable metrics for evaluating factual error corrections, we analyze the correlation between commonly used metrics and human judgments in terms of three different dimensions regarding intelligibility and faithfulness.
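The question-answer-verify loop described above can be sketched in a few lines. The snippet below is a minimal illustration only: toy lambdas stand in for the trained components (candidate generation, question generation, question answering, QA-to-claim, and entailment scoring), and none of the names are from the actual implementation.

```python
def correct_claim(claim, evidence, gen_candidates, gen_question,
                  answer, qa_to_claim, entail_score):
    """Zero-shot correction sketch: each component is a pluggable callable."""
    scored = []
    for span in gen_candidates(claim):               # spans that may be wrong
        question = gen_question(claim, span)         # ask about the span
        ans = answer(question, evidence)             # answer from the evidence
        candidate = qa_to_claim(question, ans)       # QA pair -> corrected claim
        scored.append((candidate, entail_score(candidate, evidence)))
    # the most faithful correction is the one best entailed by the evidence
    return max(scored, key=lambda pair: pair[1])[0]

# Toy stand-ins for demonstration only
result = correct_claim(
    claim="Night of the Living Dead is a Spanish film.",
    evidence="Night of the Living Dead is a 1968 American independent horror film.",
    gen_candidates=lambda c: ["Spanish"],
    gen_question=lambda c, s: "What nationality is the film?",
    answer=lambda q, e: "American",
    qa_to_claim=lambda q, a: "Night of the Living Dead is an American film.",
    entail_score=lambda c, e: 1.0 if "American" in c else 0.0,
)
# result == "Night of the Living Dead is an American film."
```

Because each step is an independent component, every intermediate output (question, answer, candidate, score) can be inspected, which is where the interpretability comes from.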

Dependencies

All the required packages are listed in requirements.txt. To install all the dependencies, run

conda create -n zerofec python=3.7
conda activate zerofec
pip install -r requirements.txt

Data

The FEVER and SciFact datasets used in our experiments can be downloaded here.

Components

No training is needed, as our framework corrects factual errors in a zero-shot fashion. Each individual component has been trained on its corresponding sub-task. Please download the components as follows:

  • Candidate Generation: Download spacy models by running python -m spacy download en_core_web_lg and python -m spacy download en_core_sci_md.
  • Question Generation: It's already on HuggingFace Salesforce/mixqg-base.
  • Question Answering: It's already on HuggingFace allenai/unifiedqa-v2-t5-base-1251000. The domain-adapted version is also made available on HuggingFace as khhuang/zerofec-daqa-t5-base.
  • QA-to-Claim: We have made it available on HuggingFace khhuang/zerofec-qa2claim-t5-base.
  • Correction Scoring: Download the DocNLI model from its repo such that the checkpoint is at zerofec/docnli/DocNLI.pretrained.RoBERTA.model.pt. The checkpoints for the domain-adapted models can be found here.
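Taken together, the component setup might look like the following. This is a sketch: the spaCy commands are from the list above, while the DocNLI download itself is elided (fetch the checkpoint from its repo), and the `mv` line assumes the file is in the current directory.

```shell
# Download spaCy models for candidate generation
python -m spacy download en_core_web_lg
python -m spacy download en_core_sci_md

# Place the DocNLI checkpoint (downloaded from its repo) at the expected path
mkdir -p zerofec/docnli
mv DocNLI.pretrained.RoBERTA.model.pt zerofec/docnli/
```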

Direct Use

Example use of ZeroFEC is shown below.

from types import SimpleNamespace
from zerofec.zerofec import ZeroFEC

model_args = {
    'qg_path': 'Salesforce/mixqg-base',
    'qa_model_path': 'allenai/unifiedqa-v2-t5-base-1251000',
    'qa_tokenizer_path': 't5-base',
    'entailment_model_path': 'PATH/TO/DocNLI.pretrained.RoBERTA.model.pt',
    'entailment_tokenizer_path':'roberta-large',
    'qa2s_tokenizer_path': 'khhuang/zerofec-qa2claim-t5-base',
    'qa2s_model_path': 'khhuang/zerofec-qa2claim-t5-base',
    'use_scispacy': False
}

model_args = SimpleNamespace(**model_args)

zerofec = ZeroFEC(model_args)

sample = {
  "input_claim": "Night of the Living Dead is a Spanish comic book.",
  "evidence": "Night of the Living Dead is a 1968 American independent horror film , directed by George A. Romero ..."
}
corrected_claim = zerofec.correct(sample)

The corrected_claim dictionary now contains a key final_answer, which is the final correction, as well as all intermediate outputs that provide interpretability.
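For reference, the returned dictionary has roughly the shape below. Only final_answer is documented above; the other key shown is an assumption for demonstration, and the actual output contains additional intermediate fields.

```python
# Illustrative shape only: "final_answer" is the documented key; any other
# keys here are assumptions, and real outputs carry more intermediate fields.
corrected_claim = {
    "input_claim": "Night of the Living Dead is a Spanish comic book.",
    "final_answer": "Night of the Living Dead is an American horror film.",
}
print(corrected_claim["final_answer"])
```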

Batch processing is supported via:

zerofec.batch_correct(samples)

where samples is a list of dictionaries, each with the same keys as sample above.

For additional information about model_args, please refer to main.py.

To run prediction on the two datasets used in our experiments, use the following commands:

python main.py --input_path PATH/TO/scifact_test.jsonl  --output_path  outputs/scifact_outputs.jsonl
python main.py --input_path PATH/TO/fever_test.jsonl  --output_path  outputs/fever_outputs.jsonl
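The output files are in JSON Lines format, one record per input claim. A generic reader like the one below can be used to inspect them; it assumes only that each line is a JSON object with the keys described in the Direct Use section.

```python
import json

def load_jsonl(path):
    """Read one JSON object per non-empty line from a JSON Lines file."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# e.g. records = load_jsonl("outputs/fever_outputs.jsonl")
#      corrections = [r["final_answer"] for r in records]
```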

Evaluation

Download the metrics by following the instructions in their corresponding repos.

The evaluation scripts are in the evals directory. All evaluations except QAFactEval are run by the script evals.sh; QAFactEval is handled separately because its dependencies differ from those of the other metrics. For all other metrics, you can install the corresponding dependencies in the zerofec virtual environment created at the beginning of this README. For QAFactEval, create a new environment and install QAFactEval's dependencies there.

cd evals
bash evals.sh $OUTPUT_PATH

where $OUTPUT_PATH is the path to the output file from main.py (e.g. outputs/fever_outputs.jsonl). Following a similar procedure, you can evaluate the outputs with QAFactEval using the evals_qafactevals.sh script.
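The QAFactEval step might look like the following. This is a sketch: the environment name and Python version are assumptions, and the dependency-install line is a placeholder for whatever the QAFactEval repo's instructions specify.

```shell
# Separate environment, since QAFactEval's dependencies conflict with the others
conda create -n qafacteval python=3.8
conda activate qafacteval
# install QAFactEval's dependencies per its repo's instructions, then:
cd evals
bash evals_qafactevals.sh $OUTPUT_PATH
```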

Citation

If you find this work useful, please consider citing:

@inproceedings{huang2023zero,
  title     = "Zero-shot Faithful Factual Error Correction",
  author    = "Huang, Kung-Hsiang and Chan, Hou Pong and Ji, Heng",
  year      = "2023",
  month     = jul,
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
  publisher = "Association for Computational Linguistics",
}
