Comments (5)
- Thanks for pointing this issue out! Definitely agree that this format looks weird. Most of the OASST instructions don't look like this, so it could be an exporting issue for the few that are affected. We will look more into this (but given that only 12 are affected, it shouldn't change the final win-rates too much).
- Can you clarify what you mean by "use the prompt template as instructions"? The prompt template for each model is provided in the respective models_configs directory.
from alpaca_eval.
This prompt is applied rather often
https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/models_configs/text_davinci_003/prompt.txt
I think davinci 3 does not need any template at all; the input can just be an instruction. The same is true for many other models, and I think this may be hurting the Cohere model in particular.
So the current prompt works well for Davinci003, in the sense that it doesn't make a mistake in understanding the formatting. If you have a suggestion to update the Cohere model template, please submit a PR with the updated config and results and we’d be happy to incorporate it.
Hi @sanderland quick follow-up saying that we went through the Cohere prompt engineering page when making the prompt and we didn't see any information about a special template, which is why we originally used a simple one like davinci-003. Other models have specific prompt templates, e.g. Claude. Let us know if we missed the Cohere prompt template!
Hey @YannDubs
- Command is an instruction finetuned model, and when using it with instructions, it indeed does not need any template.
- I think this is also true for davinci-003 and most instruction finetuned models in general.
- The template you used may induce shorter answers for command in particular.
- The Client.chat method in the SDK will automatically add a template suitable for inducing a more conversational style.
- However, it is not clear what this benchmark is primarily testing (instruct models vs conversational ones being a little different)
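
To make the template-vs-no-template point concrete, here is a minimal sketch of the two ways an instruction could be sent to a model. The template text, the `{instruction}` placeholder, and the `build_prompt` helper are all assumptions for illustration; the actual contents of the `prompt.txt` files in the repo may differ.

```python
from typing import Optional

# Hypothetical template for illustration only; the real prompt.txt may differ.
EXAMPLE_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str, template: Optional[str] = None) -> str:
    """Return the text sent to the model (hypothetical helper).

    With no template, the raw instruction is passed through unchanged,
    which is what is being suggested above for instruction-finetuned models.
    """
    if template is None:
        return instruction
    # str.replace avoids errors if the template contains other braces.
    return template.replace("{instruction}", instruction)


raw_prompt = build_prompt("Summarize the plot of Hamlet.")
wrapped_prompt = build_prompt("Summarize the plot of Hamlet.", EXAMPLE_TEMPLATE)
```

Comparing win-rates for the same model on `raw_prompt`-style and `wrapped_prompt`-style inputs would be one way to check whether the template is actually shortening answers.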