Comments (9)
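- The mmlu college_chemistry subset really does have only 100 questions. Note that some questions in this subset span multiple lines. The statistics here also show 100 questions.
- Please refer to https://github.com/open-compass/opencompass/blob/main/configs/eval_mmlu_with_zero_retriever_overwritten.py
- My guess is that 445 is the progress bar of one sub-task that is currently running, not the overall progress bar.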
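The point about multi-line questions can be checked with a minimal sketch like the one below. It assumes the subset is a plain CSV file in which a question containing line breaks is wrapped in double quotes; the function name and the file path are hypothetical and not part of OpenCompass. It compares the number of physical lines (what `wc -l` would report) with the number of CSV records (the actual question count).

```python
import csv


def count_lines_and_records(path):
    """Return (physical_lines, csv_records) for a CSV file.

    A quoted field may contain line breaks, so one record (one question)
    can occupy several physical lines.
    """
    # Count raw lines exactly as a line-based tool would see them.
    with open(path, newline="", encoding="utf-8") as f:
        physical_lines = sum(1 for _ in f)
    # Count logical records; csv.reader keeps quoted line breaks inside a field.
    with open(path, newline="", encoding="utf-8") as f:
        csv_records = sum(1 for _ in csv.reader(f))
    return physical_lines, csv_records


if __name__ == "__main__":
    # Hypothetical path; point it at your local copy of the subset.
    lines, records = count_lines_and_records("college_chemistry_test.csv")
    print(f"physical lines: {lines}, questions: {records}")
```

If the two numbers differ, the extra lines come from questions that span multiple lines, and the record count is the one to compare against the evaluation log.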
from opencompass.
We use ceval-val as default:
from opencompass.
Is hendrycksTest-college_chemistry the same as MMLU-college_chemistry?
from opencompass.
Yes, hendrycksTest-college_chemistry is MMLU-college_chemistry.
from opencompass.
On the third point: is 445 because all datasets were run? Didn't I already specify --datasets mmlu_gen ceval_gen?
The contents of the config file configs/eval_test.py are as follows:
from mmengine.config import read_base
from opencompass.models import OpenAI
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    from .datasets.collections.chat_medium import datasets
    from .summarizers.medium import summarizer

# GPT4 needs a special humaneval postprocessor
# from opencompass.datasets.humaneval import humaneval_gpt_postprocess
# for _dataset in datasets:
#     if _dataset['path'] == 'openai_humaneval':
#         _dataset['eval_cfg']['pred_postprocessor']['type'] = humaneval_gpt_postprocess

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
)

models = [
    dict(abbr='deng',
         type=OpenAI, path='deng',
         key='http://127.0.0.1:19201',  # The key will be obtained from $OPENAI_API_KEY, but you can write down your key here as well
         meta_template=api_meta_template,
         query_per_second=1,
         max_out_len=4096, max_seq_len=4096, batch_size=8),
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=1,
        task=dict(type=OpenICLInferTask)),
)
from opencompass.
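The gap is even larger for the ceval dataset. For example, for ceval-college_chemistry the evaluation log contains only 24 questions, while the dataset file college_chemistry_test.csv (in the test directory of the ceval dataset) contains 224 questions. That is a large difference and is definitely not caused by questions spanning multiple lines. I am asking because I am worried that OpenCompass evaluates fewer ceval questions than it should, so the resulting score would not be comparable with the score measured officially by C-Eval (because the number of test questions differs).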
from opencompass.
ceval has two splits: val and test.
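To see how the per-split counts differ for one subject, a rough sketch like the following may help. It assumes a local layout of `<data_root>/<split>/<subject>_<split>.csv` with one header row per file; only `college_chemistry_test.csv` in the `test` directory is mentioned in this thread, so the `val` file name, the `data/ceval` root, and the helper name are assumptions.

```python
import csv
from pathlib import Path


def count_questions_by_split(data_root, subject, splits=("val", "test")):
    """Count questions per split for one subject.

    Assumes <data_root>/<split>/<subject>_<split>.csv with a header row
    (an assumed layout, adjust to your local copy). Missing files map to None.
    """
    counts = {}
    for split in splits:
        csv_path = Path(data_root) / split / f"{subject}_{split}.csv"
        if not csv_path.exists():
            counts[split] = None
            continue
        with open(csv_path, newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        # Subtract the assumed header row; never report a negative count.
        counts[split] = max(len(rows) - 1, 0)
    return counts


if __name__ == "__main__":
    # Hypothetical data root; replace with the directory that holds val/ and test/.
    print(count_questions_by_split("data/ceval", "college_chemistry"))
```

A val count that matches the number of questions in the evaluation log would confirm that the run used the val split rather than test.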
from opencompass.
Problem solved, thanks a lot!
from opencompass.
The problem has been solved, thanks a lot!
from opencompass.