
llm-eval-survey's People

Contributors

aml-hassan-abd-el-hamid, cyp-jlu-ai, jindongwang, kennethleungty, kennymckormick, mengf1, mlgroupjlu, sileod, tahmedge, up700, wangxu0820, yanglinyi

llm-eval-survey's Issues

Paper Title Change

Hello,
Thank you for your excellent work on the survey paper!

I am one of the authors of a paper you have listed, but we had a major title change.

I am not sure whether you are planning regular updates to the paper, but if you are, could you please consider changing our paper's title from "Can Large Language Models Infer and Disagree Like Humans?" to "Can Large Language Models Capture Dissenting Human Voices?".

Thanks once again for this great work!

Add a new paper.

Thank you for your nice survey.

Please consider adding our recent work, Large Language Models are not Fair Evaluators (https://arxiv.org/abs/2305.17926), to the list.

Our research identifies biases that arise when using an LLM as an evaluator, and we propose two strategies to alleviate these biases.
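For readers unfamiliar with this kind of calibration, here is a minimal sketch of one common mitigation, scoring both presentation orders and averaging so that position bias cancels out; the `judge` callable is a placeholder for your own LLM-as-evaluator call, not code from the paper.

# Minimal sketch of order-swap calibration for an LLM-as-evaluator setup.
# `judge` is any callable you supply: it takes (question, first_answer,
# second_answer) and returns (score_for_first, score_for_second).
from typing import Callable, Tuple

Judge = Callable[[str, str, str], Tuple[float, float]]

def calibrated_compare(judge: Judge, question: str,
                       answer_a: str, answer_b: str) -> Tuple[float, float]:
    """Score the pair in both presentation orders and average the results,
    so a systematic preference for whichever answer appears first cancels out."""
    s_a_first, s_b_second = judge(question, answer_a, answer_b)  # a shown first
    s_b_first, s_a_second = judge(question, answer_b, answer_a)  # b shown first
    return (s_a_first + s_a_second) / 2, (s_b_first + s_b_second) / 2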

Thanks.😊

Can you add LRV-Instruction to your updated arXiv version?

Paper: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
link: https://arxiv.org/pdf/2306.14565.pdf
Name: LRV-Instruction
Focus: Multimodal
Notes: A benchmark to evaluate hallucination and instruction-following ability

bib:
@article{liu2023aligning,
  title={Aligning Large Multi-Modal Model with Robust Instruction Tuning},
  author={Liu, Fuxiao and Lin, Kevin and Li, Linjie and Wang, Jianfeng and Yacoob, Yaser and Wang, Lijuan},
  journal={arXiv preprint arXiv:2306.14565},
  year={2023}
}

Can you add our recent work to your survey?

Hi,

I have read your insightful paper and found it to be a valuable contribution to the field.

I would like to kindly suggest adding our recent work to your survey.

📄 Paper: Ask Again, Then Fail: Large Language Models' Vacillations in Judgement

This paper finds that the judgement consistency of LLMs decreases dramatically when they are confronted with disruptions such as questioning, negation, or misleading information, even when their previous judgements were correct. It also explores several prompting methods to mitigate this issue and demonstrates their effectiveness.
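For concreteness, here is a minimal, hypothetical sketch of such a consistency probe; the `ask_model` callable and the challenge wordings are illustrative and not the paper's exact protocol.

# Illustrative consistency probe: ask a question, then challenge the model's
# answer and record whether it changes. `ask_model` is any chat helper that
# takes a message history and returns the assistant's reply as a string.
from typing import Callable, List, Tuple

AskModel = Callable[[List[dict]], str]

CHALLENGES = [
    "Are you sure about that? Please think again.",                 # questioning
    "I don't think that's right. Reconsider your answer.",          # negation
    "I read that the correct answer is something else entirely.",   # misleading
]

def consistency_probe(ask_model: AskModel, question: str) -> Tuple[str, List[str]]:
    """Return the initial answer and the answers given after each challenge."""
    history = [{"role": "user", "content": question}]
    first_answer = ask_model(history)
    history.append({"role": "assistant", "content": first_answer})

    follow_ups = []
    for challenge in CHALLENGES:
        follow_ups.append(ask_model(history + [{"role": "user", "content": challenge}]))
    return first_answer, follow_ups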

Thank you for your consideration! :)

Suggestion for adding OpenCompass to the survey

Hi team,

Thanks for your awesome survey! I was wondering if you might consider including the OpenCompass evaluation toolkit. At present, OpenCompass serves as a repository of over 50 benchmarks and enables systematic evaluation of LLMs. We are continually upgrading it to keep pace with the latest evaluation trends. I believe its addition could provide an even richer context for your survey.
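To illustrate what such systematic evaluation looks like in principle, here is a hypothetical sketch of a multi-benchmark harness; the names below are purely illustrative and are not OpenCompass's actual API, so please see the project's documentation for real usage.

# Hypothetical multi-benchmark harness: run one model over several benchmarks
# and report a mean score per benchmark. Each benchmark is a list of
# {"prompt": ..., "reference": ...} items; `model` and `scorer` are supplied
# by the caller.
from typing import Callable, Dict, List

def evaluate(model: Callable[[str], str],
             benchmarks: Dict[str, List[dict]],
             scorer: Callable[[str, str], float]) -> Dict[str, float]:
    results = {}
    for name, items in benchmarks.items():
        scores = [scorer(model(item["prompt"]), item["reference"]) for item in items]
        results[name] = sum(scores) / len(scores) if scores else 0.0
    return results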

Best

Suggestion to add an evaluation paper on LLMs in science

Thanks for your interesting and comprehensive survey.

If possible, please consider adding our evaluation work about LLMs in chemistry, "What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks" (https://arxiv.org/abs/2305.18365) to the list.

Our work establishes a comprehensive benchmark of eight practical chemistry tasks and evaluates LLMs (GPT-4, GPT-3.5, and Davinci-003) on each task in zero-shot and few-shot in-context learning settings. We aim to address the lack of comprehensive assessment of LLMs in the field of chemistry.
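As a toy illustration of the zero-shot versus few-shot in-context settings (the example questions below are made up, not drawn from our benchmark), the only difference is whether solved examples are prepended to the prompt.

# Build a zero-shot (n_shots=0) or few-shot prompt for a chemistry question.
FEW_SHOT_EXAMPLES = [
    ("Name the product of the reaction between Na and Cl2.", "NaCl"),
    ("What is the molecular formula of water?", "H2O"),
]

def build_prompt(question: str, n_shots: int = 0) -> str:
    parts = []
    for q, a in FEW_SHOT_EXAMPLES[:n_shots]:
        parts.append(f"Question: {q}\nAnswer: {a}")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

print(build_prompt("How many hydrogen atoms are in one molecule of methane?", n_shots=2))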

Thanks! 😊
