Comments (1)
At the moment, this isn't possible. However, if you run a task with many subsets (using a config file), the score table should display the average at the task level.
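The following is a minimal sketch for computing a task-level average by hand from per-subset scores. It assumes the scores are available as a flat mapping from hypothetical `task:subset` names to a single float metric. That naming scheme and the helper `task_level_averages` are illustrative only and are not lighteval's API. It uses an unweighted mean over subsets, and the score table's own aggregation may differ.

```python
from collections import defaultdict
from statistics import mean


def task_level_averages(subset_scores: dict[str, float]) -> dict[str, float]:
    """Group scores by the task prefix (text before the first ':') and
    return the unweighted mean of each task's subset scores.

    The 'task:subset' key format is a hypothetical assumption for this sketch.
    """
    grouped: dict[str, list[float]] = defaultdict(list)
    for name, score in subset_scores.items():
        task = name.split(":", 1)[0]
        grouped[task].append(score)
    return {task: mean(scores) for task, scores in grouped.items()}


# Illustrative per-subset scores (made-up names and values).
scores = {
    "mmlu:abstract_algebra": 0.25,
    "mmlu:anatomy": 0.75,
    "arc:easy": 0.875,
}
print(task_level_averages(scores))  # → {'mmlu': 0.5, 'arc': 0.875}
```

An unweighted mean treats every subset equally regardless of how many samples it contains. If subsets differ a lot in size, a sample-weighted mean would give a different number.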
If you want to get results comparable to the Open LLM Leaderboard, you'll need to use lighteval (you can take a look at the differences between the 3 versions here).
Related Issues
- Feature: Checkpointing on task level.
- Expose a few model predictions / gold answers in the logs
- DROP Evaluation with Llama3 (vs. lm-evaluation-harness)
- `Could not initialize the JudgeOpenAI model` and `openi` import error
- Add Sympy equivalence for MATH / GSM8K?
- Version of a task should be configurable.
- Error: `ModuleNotFoundError: No module named 'openai'`.
- Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
- Performance compared to lm-evaluation-harness
- Evaluate EncoderDecoderModels
- Zero scores on cnn-dm benchmark from HELM
- Issue when saving results with fsspec==2023.12.1
- `apply_target_perplexity_metric` pops only the first response
- `LightevalTask.process_results()` is not aligned with `LightevalTask.get_request_type()`
- An apparent bug in drop's dealing with multi-span answer
- Expose `stop_sequence` at command line
- The helm|piqa task is generative but has generation_size=-1.
- ModuleNotFoundError: No module named 'lighteval'
- quantized model not loading
- Different results for gsm8k via lighteval compared to internal pipeline