Comments (7)
Also @hank0316 -- if you want to open a PR with that solution we can test it further!
from reward-bench.
@hank0316 nope, not to my knowledge. Most use the tokenizer's implementation.
from reward-bench.
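For anyone following along: a quick way to see whether a model would be evaluated with its own template or a fallback is to check whether its tokenizer ships a chat template at all. This is a minimal sketch assuming the Hugging Face `transformers` tokenizer API; the model name is an illustrative placeholder, not a claim about how the leaderboard is actually run.

```python
# Sketch: check whether a model's tokenizer defines its own chat template.
# Model name is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("OpenAssistant/reward-model-deberta-v3-large-v2")

if tokenizer.chat_template is not None:
    print("Tokenizer ships its own chat template (tokenizer_config.json)")
else:
    print("No built-in chat template; evaluation code must supply a fallback")
```

If `chat_template` is `None`, evaluation code has to fall back to an explicitly configured template (such as the tulu template), which is where mismatches with a model's SFT formatting can creep in.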
Thanks for raising this. Let's take a look. Maybe @ValentinaPy can help (she mentioned some interest in continuing on this project).
from reward-bench.
@natolambert , I have opened the PR. Would you kindly review it? Additionally, I apologize for inadvertently pressing the close button; I am not familiar with adding comments to an issue.
from reward-bench.
Hey @natolambert, I have a question about training a reward model. Do you think it's necessary to incorporate chat templates during data preprocessing for RM training? And if yes, should the template align with those used in SFT?
from reward-bench.
Yes @hank0316 chat templates are important. I think there can be slight differences (e.g. for RM you aren't generating afterwards, iirc), but it should match at a high level.
from reward-bench.
Sure, @natolambert! I appreciate your response and this fantastic benchmark. I'm also curious whether any models on the leaderboard use the tulu chat template for evaluation but their own chat template in SFT, i.e. the template the model was trained with in SFT differs from the one used in this benchmark.
from reward-bench.
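To make the "match at a high level" point concrete, here is a minimal sketch of applying a chat template while preprocessing pairwise preference data for RM training. It assumes a Hugging Face tokenizer with a built-in chat template; the model name and the `prompt`/`chosen`/`rejected` field names are illustrative, not the exact preprocessing used by reward-bench.

```python
# Sketch: format pairwise preference data with a chat template for RM training.
# Assumes a Hugging Face tokenizer that ships a chat template; the model name
# and record fields below are illustrative placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/tulu-2-dpo-7b")

example = {
    "prompt": "What is a reward model?",
    "chosen": "A reward model scores responses so RLHF can rank them.",
    "rejected": "I don't know.",
}

def format_pair(example):
    # Unlike generation-time formatting, no generation prompt is appended:
    # the reward model scores a complete prompt + response pair.
    def render(response):
        return tokenizer.apply_chat_template(
            [
                {"role": "user", "content": example["prompt"]},
                {"role": "assistant", "content": response},
            ],
            tokenize=False,
            add_generation_prompt=False,
        )

    return {
        "text_chosen": render(example["chosen"]),
        "text_rejected": render(example["rejected"]),
    }

print(format_pair(example)["text_chosen"])
```

The key difference from generation-time formatting is `add_generation_prompt=False`: the reward model scores a finished prompt/response pair rather than generating a continuation.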