I want to propose adding a version signature to AlpacaEval a la <a href="https://githu

Possibility of adding a version signature (alpaca_eval issue, open)

mathewhuen commented on June 17, 2024

Possibility of adding a version signature


Comments (3)

YannDubs commented on June 17, 2024

Yeah, I like that! Are you suggesting saving it in the CSV leaderboard? Note that all of "m:win|e:weighted_ae_g4turbo|b:g4turbo|ae:2" is actually equivalent to "AlpacaEval 2" and is thus constant within a CSV leaderboard, but it can be good to have the signature for people to refer to in a paper. I can see how v:0.5.4 would be very useful to know!
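For illustration, here is a minimal sketch of how a signature string like the one quoted above could be assembled and parsed. The helper names `build_signature` and `parse_signature` are hypothetical and not part of alpaca_eval. The `key:value` pairs joined by `|` simply mirror the quoted example, and appending `v:0.5.4` as one more field is an assumption.

```python
def build_signature(fields: dict[str, str]) -> str:
    """Join key/value pairs into a 'k:v|k:v' signature string (hypothetical helper)."""
    for key, value in fields.items():
        # Separators inside keys or values would make the string ambiguous to parse.
        if "|" in key or ":" in key or "|" in value:
            raise ValueError(f"invalid signature field: {key!r}={value!r}")
    return "|".join(f"{key}:{value}" for key, value in fields.items())


def parse_signature(signature: str) -> dict[str, str]:
    """Split a signature string back into its fields (hypothetical helper)."""
    # Split on the first ':' only, so values may themselves contain ':'.
    return dict(part.split(":", 1) for part in signature.split("|"))


# Field names and values are copied from the example in the thread.
signature = build_signature(
    {"m": "win", "e": "weighted_ae_g4turbo", "b": "g4turbo", "ae": "2", "v": "0.5.4"}
)
print(signature)
```

Because dictionaries preserve insertion order, the fields come out in the order they were given, so the same settings always produce the same string.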

Related: I think we should also save the date in the annotations & model_output JSON files, since those change over time and caching means that not all examples were annotated on the same day.


mathewhuen commented on June 17, 2024

Oh I think including the date would definitely be helpful for calculating the leaderboard! Would saving both the date and either the repo version or commit hash to the annotation/model_output JSON files be overkill?

As for the signature, my motivation was strictly for reporting in papers or other research. Exactly like you said, the parameters for the leaderboard are mostly fixed (and any changes could be tracked by the inclusion of dates).

If the leaderboard is the primary goal of AlpacaEval, adding better metadata to the JSON files would probably be a higher priority. What do you think?


YannDubs commented on June 17, 2024

Todo:

- add a signature to the CSV leaderboard so that people can report it in papers and make sure results are comparable.
- print the signature in the terminal when running alpaca_eval ... so that people can quickly check that they are running what they think they are running.
- add the date & repo version to the JSONs (only for new annotations & generations).
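The last item could look roughly like the sketch below, which stamps only records that have no date yet, so cached annotations keep whatever they already carry. The function name `stamp_new_records` and the `date` and `version` field names are assumptions for illustration, not the actual alpaca_eval JSON schema.

```python
from datetime import datetime, timezone


def stamp_new_records(records, version, now=None):
    """Return copies of `records`, adding `date` and `version` to unstamped ones.

    Records that already have a `date` (e.g. loaded from a cache) are left
    unchanged, so each example keeps the day it was actually annotated.
    """
    now = now or datetime.now(timezone.utc)
    today = now.date().isoformat()
    stamped = []
    for record in records:
        record = dict(record)  # copy so the caller's data is not mutated
        if "date" not in record:
            record["date"] = today
            record["version"] = version
        stamped.append(record)
    return stamped


# Hypothetical data: one fresh annotation and one loaded from an older cache.
records = [
    {"instruction": "a"},
    {"instruction": "b", "date": "2024-01-01", "version": "0.5.0"},
]
stamped = stamp_new_records(
    records, "0.5.4", now=datetime(2024, 6, 17, tzinfo=timezone.utc)
)
```

Passing `now` explicitly keeps the function easy to test. In real use it would default to the current UTC time.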

Not sure when I'll have time but should be able to do it this week or next. Feel free to open a PR also if you want to help.

Note:

IMO the leaderboard isn't the primary goal of AE. I think the goal is an evaluation that people can tune hyperparameters against while knowing that it's pretty highly correlated with human preference. But the leaderboard is important for building the trust people need to use it for hparam tuning!

