Comments (2)
Hi, we will fix this bug later. In the meantime, you can write a config file like https://github.com/open-compass/opencompass/blob/main/configs/eval_needlebench.py
In your case, it would look like this:
from mmengine.config import read_base

with read_base():
    # only eval original "needle in a haystack test" in needlebench_4k
    from .models.chatglm.hf_chatglm3_6b_32k import models
    from .datasets.needlebench.needlebench_8k.needlebench_single import needlebench_datasets_zh, needlebench_datasets_en
    from .summarizers.needlebench import needlebench_4k_summarizer as summarizer

for m in models:
    m['max_seq_len'] = 32768

datasets = sum([v for k, v in locals().items() if ('datasets' in k)], [])
work_dir = './outputs/needlebench'
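
For anyone puzzled by the `datasets = sum(...)` line in the config: it concatenates every imported list whose variable name contains 'datasets' into one flat list. Here is a minimal standalone sketch of that idiom. The `namespace` dict, the `collect_datasets` helper, and the `abbr` values are made up for illustration and are not part of OpenCompass.

```python
# Hypothetical stand-in for the names a config would have in scope
# after the read_base() imports; the entries are placeholder dicts.
namespace = {
    "models": [{"abbr": "some_model"}],
    "needlebench_datasets_zh": [{"abbr": "zh_a"}, {"abbr": "zh_b"}],
    "needlebench_datasets_en": [{"abbr": "en_a"}],
}


def collect_datasets(scope):
    """Concatenate every list whose variable name contains 'datasets'."""
    # sum(list_of_lists, []) starts from an empty list and appends each
    # matching list in turn, so the result is one flat list.
    return sum([v for k, v in scope.items() if "datasets" in k], [])


datasets = collect_datasets(namespace)
print([d["abbr"] for d in datasets])
```

Note that `models` is skipped because its name does not contain 'datasets', so only the two dataset lists end up in the result.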
Then run the following command:
python run.py the_config_file_path.py
Make sure you have enough GPUs. In your environment, it seems that CUDA is not available:
"'CUDA available': False,"
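
A quick way to check whether PyTorch can see a GPU before launching the run is sketched below. It assumes PyTorch is the framework in use; the `cuda_status` helper is a hypothetical name, and it simply reports no GPUs if PyTorch is not installed.

```python
def cuda_status():
    """Return (available, device_count); (False, 0) if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so report no GPUs.
        return False, 0
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return available, count


available, count = cuda_status()
print(f"CUDA available: {available}, visible GPUs: {count}")
```

If this prints `CUDA available: False`, the usual suspects are a CPU-only PyTorch build, a driver problem, or no GPU exposed to the process.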
Hello @hejunqing, I have already fixed this issue in the latest OpenCompass. Please see this PR for details: #1102. It includes updated guides for conducting the Needle In A Haystack evaluation, including instructions for running it from the command line.