MetaCheckGPT: SemEval-2024 Task 6 on Hallucination Detection in Large Language Models

The Halu-NLP team, spanning several institutions, developed MetaCheckGPT, a hallucination detection system that took top ranks in SemEval-2024 Task 6. The task, SHROOM (Shared-task on Hallucinations and Related Observable Overgeneration Mistakes), asks participants to identify hallucinations in the output of large language models (LLMs). Our approach is built on a meta-regressor framework and also draws on experiments with transformer-based models and black-box methods. It performed well in both the model-agnostic and model-aware tracks.

Problem Statement

LLMs often produce hallucinated content, which undermines their reliability. The challenge was to develop a method that detects such hallucinations across different tasks, including machine translation, paraphrase generation, and definition modeling.

Key Results

Our solution ranked first in the model-agnostic track and second in the model-aware track. The meta-regressor framework outperformed traditional hallucination detection methods. It combines uncertainty signals from a variety of LLMs, which makes detection more robust.

Methodologies

We introduced a meta-regressor model that evaluates and integrates outputs from multiple LLMs. Each LLM-generated sentence is compared against stochastically generated responses, and a meta-model then evaluates the resulting outputs. Diversity among the base models matters because it lets the framework capture a broader range of hallucination indicators.
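The idea can be illustrated with a minimal sketch. It is not the MetaCheckGPT implementation: the token-overlap scorer stands in for a real base model (which would typically query an LLM), the meta-regressor is a plain linear model fitted by gradient descent, and every function name and number below is a hypothetical chosen for illustration.

```python
import re


def tokens(text):
    """Lowercased word tokens; a crude stand-in for a real similarity model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def inconsistency_score(sentence, samples):
    """Return 1 minus the mean Jaccard overlap between a sentence and sampled responses.

    Assumption: a real base model would judge support with an LLM, not token overlap.
    """
    sent = tokens(sentence)
    if not sent or not samples:
        return 1.0
    overlaps = []
    for sample in samples:
        other = tokens(sample)
        overlaps.append(len(sent & other) / len(sent | other))
    return 1.0 - sum(overlaps) / len(overlaps)


def fit_meta_regressor(features, targets, lr=0.1, epochs=2000):
    """Fit a linear meta-model on per-base-model scores with batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, targets):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Combine base-model scores into one hallucination score, clipped to [0, 1]."""
    raw = bias + sum(w * xi for w, xi in zip(weights, x))
    return min(1.0, max(0.0, raw))


# Toy, made-up training data: each row holds the scores of two hypothetical
# base models for one sentence; each label is the target hallucination score.
base_scores = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.9], [0.2, 0.1], [0.8, 0.3], [0.3, 0.7]]
labels = [0.85, 0.15, 0.80, 0.15, 0.55, 0.50]
weights, bias = fit_meta_regressor(base_scores, labels)

# One base-model score for a new sentence, computed against sampled responses.
score = inconsistency_score(
    "The Eiffel Tower is in Berlin.",
    ["The Eiffel Tower is in Paris.", "The Eiffel Tower stands in Paris, France."],
)
```

In this sketch each base model contributes one feature column, so adding a more diverse base model simply adds a column for the meta-model to weigh.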

Data Sources

For our experiments, we used the datasets provided by the SHROOM organizers, which cover definition modeling, machine translation, and paraphrase generation. The datasets and code for our approach will be available in our GitHub repository, MetaCheckGPT.

Technologies Used

  • PyTorch
  • OpenAI API
  • Google Colab GPU resources

Authors

For further information, feel free to contact us:
