
factcheck-gpt's Introduction

Factcheck-GPT

Fact-checking the Output of Generative Large Language Models in both Annotation and Evaluation.

Code License: Apache 2.0

Table of Contents

  • Pipeline
  • Get Started
  • Dataset
  • Annotation Tool
  • FactBench
  • Baselines
  • Citation

Pipeline

This work defines a framework for document-level fact-checking, spanning decomposition, decontextualisation, and check-worthiness identification, through evidence retrieval and stance detection, to editing the response to fix hallucinated information.

[Figure: the Factcheck-GPT fact-checking pipeline]
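
Conceptually, these stages chain into a single document-level check. The toy sketch below only illustrates that flow; every function in it is a trivial stand-in rather than the repository's implementation (the real entry point is pipeline.check_document, shown under Get Started):

# Toy stand-ins for the pipeline stages described above -- NOT the repository's implementation.

def decompose(doc: str) -> list[str]:
    # Stand-in for decomposition: naive sentence split into candidate claims.
    return [s.strip() for s in doc.split(".") if s.strip()]

def is_checkworthy(claim: str) -> bool:
    # Stand-in for check-worthiness identification: skip obvious opinions.
    return not claim.lower().startswith(("i think", "in my opinion"))

def retrieve_evidence(claim: str) -> list[str]:
    # Stand-in for evidence retrieval (web search in the real pipeline).
    return []

def detect_stance(evidence: list[str], claim: str) -> str:
    # Stand-in for stance detection: support / partial support / refute / irrelevant.
    return "irrelevant" if not evidence else "support"

def check_document_toy(doc: str) -> list[tuple[str, str]]:
    # Chain the stages and report a stance for each check-worthy claim;
    # decontextualisation and response editing are omitted for brevity.
    results = []
    for claim in decompose(doc):
        if is_checkworthy(claim):
            results.append((claim, detect_stance(retrieve_evidence(claim), claim)))
    return results

print(check_document_toy("MBZUAI ranks 19 globally in AI. I think it is a great place."))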

Get Started

Steps to check the factuality of a document. See reproduce_tutorial.ipynb for details on the individual subtasks.

import sys
sys.path.append("./src")  # make the modules under src/ importable
from pipeline import check_document, check_documents

doc = "MBZUAI ranks 19 globally in its areas of specialization – AI in 2023."  # document to check
label, log = check_document(doc, model="gpt-3.5-turbo-0613")  # returns a factuality label and a log
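
To check several documents at once, check_documents (imported above) can presumably be called in a similar way; the snippet below is only a sketch, and its exact signature and return format are assumptions:

# Hypothetical batch usage; the argument names mirror check_document and are assumptions.
docs = [
    "MBZUAI was established in Abu Dhabi.",
    "The Eiffel Tower is located in Berlin.",
]
results = check_documents(docs, model="gpt-3.5-turbo-0613")  # assumed to yield one result per document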

Dataset

We construct a dataset, Factcheck-GPT, with human-annotated factuality labels for 94 ChatGPT responses.

Statistics

Statistics over the three question sources are shown below:

Title

Claim analysis (a loading sketch follows below):

  • Whether raters can determine the factuality of a claim based on the automatically collected evidence (Yes/No).

  • Whether the evidence supports the claim (CP: completely support, PS: partially support, RE: refute, IR: irrelevant).

  • Whether the claim needs to be corrected. NA (17) refers to 16 opinion claims + 1 non-claim.

[Table: claim-level analysis results]
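
As a quick sanity check on the released annotations, the benchmark file can be loaded with pandas. The file name factcheck-GPT-benchmark.jsonl and the response_factuality field appear in the issue discussion further down; treat the exact schema here as an assumption:

import pandas as pd

# Load the human-annotated benchmark (94 ChatGPT responses);
# file and field names may differ in the current release.
df = pd.read_json("factcheck-GPT-benchmark.jsonl", lines=True)

# Distribution of response-level factuality labels (True / False / NA).
print(df["response_factuality"].value_counts(dropna=False))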

Annotation Tool

We release the annotation tool in this repository.

FactBench

We further gather three other human-annotated datasets (FacTool-KB, FELM-WK, HaluEval) used to evaluate the effectiveness of automatic fact-checkers, resulting in FactBench with 4,835 examples.

Baselines

We define five subtasks and report results for Subtask 4 (verification) below. See the notebook reproduce_tutorial.ipynb for details.

Subtasks

  • Subtask 1. Sentence Check-worthiness: Given a sentence, identify whether it contains a factual statement (Yes/No).

  • Subtask 2. Claim Check-worthiness: Given a claim, classify it as factual, opinion, not a claim, or other.

  • Subtask 3. Stance Detection: Given an (evidence passage, claim) pair, judge the stance of the evidence towards the claim: whether it supports, partially supports, refutes, or is irrelevant to the claim.

  • Subtask 4. Claim Verification: Given a claim without gold evidence, determine whether it is factually true or false; if false, revise it (a minimal prompting sketch follows this list).

  • Subtask 5. Edit False Response: Given a list of true claims and the original response, edit the response to correct the factual errors while preserving the linguistic features and style of the original.
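
As an illustration of Subtask 4, the sketch below asks an OpenAI chat model to verify a single claim. It is a minimal sketch only: the prompt wording and output handling are assumptions, not the prompts used in the repository (see reproduce_tutorial.ipynb for the actual implementation).

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def verify_claim(claim: str, model: str = "gpt-3.5-turbo-0613") -> str:
    # Illustrative prompt -- not the repository's prompt.
    prompt = (
        "Determine whether the following claim is factually TRUE or FALSE. "
        "If it is FALSE, provide a corrected version.\n"
        f"Claim: {claim}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(verify_claim("MBZUAI ranks 19 globally in its areas of specialization - AI in 2023."))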

Verification Results

[Table: verification results]

Citation

Factcheck-GPT is described in the following arXiv paper:

@article{Wang2023FactcheckGPTEF,
  title={Factcheck-GPT: End-to-End Fine-Grained Document-Level 
         Fact-Checking and Correction of LLM Output},
  author={Yuxia Wang and 
          Revanth Gangi Reddy and 
          Zain Muhammad Mujahid and 
          Arnav Arora and 
          Aleksandr Rubashevskii and 
          Jiahui Geng and 
          Osama Mohammed Afzal and 
          Liangming Pan and 
          Nadav Borenstein and 
          Aditya Pillai and 
          Isabelle Augenstein and 
          Iryna Gurevych and 
          Preslav Nakov},
  journal={ArXiv},
  year={2023},
  volume={abs/2311.09000},
}


factcheck-gpt's Issues

Human-annotated atomic claims?

Hi, I have some trouble understanding the claims inside Factbench.jsonl and factcheck-gpt-benchmark.jsonl. Are the claims present the human-annotated ones? Same question for "decontext" and "revised-decontext". Are the latter revised in the sense that the factuality is corrected? Are the claims generated by ChatGPT also present somewhere?

Class Balance Presented in Paper Question

Hi! I would like to ask some questions to clarify your paper's results and to check whether you have made any updates since publishing it.

  1. Question:
    You said in the paper:

"How many examples are factually incorrect? 61 examples contain factual errors and 31 examples are factually correct. Specifically, 53 examples contain false claims, and 19 examples contain claims for which annotators could not obtain related evidence from the Internet to prove the correctness of the statement."

Opening factcheck-GPT-benchmark.jsonl (94 records), I can see these class counts:

response_factuality
False    68
True     24
NA        2
Name: count, dtype: int64

Opening subtask4_claim_factuality.jsonl (661 records), I can see these class counts:

 true                   472
 false                  159
 not_enough_evidence     30
 Name: count, dtype: int64

Could you tell me, did you update your benchmark?

  2. Question:
    In Table 5 (Verification results) you report that PRECISION for the constant-TRUE prediction is 0.81. That would mean 81% of the data is labeled TRUE, since a constant classifier's precision equals the positive-class fraction. How did you get this number?

[Screenshot of Table 5 from the paper]

Code for Evaluating Atomic Facts

Hello!
Is it possible to obtain the code you used to calculate the normalized edit distance, n-gram distance, and word overlap between the human-created atomic facts and the ones created by the model?
Thanks!
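
For readers looking for a starting point, below is a minimal sketch of how such similarity metrics are commonly computed (normalized edit distance and word overlap). It is not the code used in the paper, and the exact normalization choices there may differ.

def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(a: str, b: str) -> float:
    # Edit distance divided by the longer string's length (one common normalization).
    return edit_distance(a, b) / max(len(a), len(b), 1)

def word_overlap(a: str, b: str) -> float:
    # Jaccard overlap of word sets (one common definition of word overlap).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

print(normalized_edit_distance("MBZUAI ranks 19 globally", "MBZUAI ranks 19th globally"))
print(word_overlap("MBZUAI ranks 19 globally", "MBZUAI ranks 19th globally"))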
