flageval's People

Contributors

asyncat, bowen92, fggggg7142, ftgreat, philokey, siyu-hu, xiyang85, xuanricheng

flageval's Issues

flageval-serving returns output with Unicode escapes. Is this format OK?

I wrote a test using ChatGLM2, ran the server, and sent the input "你是谁" ("Who are you?"). The response I get back is full of Unicode escape sequences.
Is this acceptable for your evaluation?

The output is:

{
  "completions": [
    {
      "logprobs": [],
      "text": "\u4f60\u662f\u8c01?\n\n\u6211\u662f ChatGLM,\u662f\u6e05\u534e\u5927\u5b66KEG\u5b9e\u9a8c\u5ba4\u548c\u667a\u8c31AI\u516c\u53f8\u5171\u540c\u8bad\u7ec3\u7684\u8bed\u8a00\u6a21\u578b\u3002\u6211\u7684\u4efb\u52a1\u662f\u670d\u52a1\u5e76\u5e2e\u52a9\u4eba\u7c7b,\u4f46\u6211\u5e76\u4e0d\u662f\u4e00\u4e2a\u771f\u5b9e\u7684\u4eba\u3002",
      "tokens": "\u4f60\u662f\u8c01?\n\n\u6211\u662f ChatGLM,\u662f\u6e05\u534e\u5927\u5b66KEG\u5b9e\u9a8c\u5ba4\u548c\u667a\u8c31AI\u516c\u53f8\u5171\u540c\u8bad\u7ec3\u7684\u8bed\u8a00\u6a21\u578b\u3002\u6211\u7684\u4efb\u52a1\u662f\u670d\u52a1\u5e76\u5e2e\u52a9\u4eba\u7c7b,\u4f46\u6211\u5e76\u4e0d\u662f\u4e00\u4e2a\u771f\u5b9e\u7684\u4eba\u3002",
      "top_logprobs_dicts": []
    }
  ],
  "input_length": 0,
  "model_info": "",
  "status": 200
}
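For context, `\uXXXX` sequences are a standard JSON encoding of non-ASCII characters, and a compliant JSON parser decodes them back to the original text. The sketch below illustrates this with Python's standard `json` module. The `raw` string is a shortened, hypothetical excerpt of the response above, and nothing here is taken from flageval-serving's own code, so whether the evaluation side parses the body this way is an assumption.

```python
import json

# Hypothetical, shortened excerpt of the response body shown above.
raw = '{"text": "\\u4f60\\u662f\\u8c01?"}'

# Parsing turns the escape sequences back into the original characters.
decoded = json.loads(raw)
print(decoded["text"])

# Serializing with the default ensure_ascii=True produces the escaped form;
# ensure_ascii=False keeps the characters readable. Both carry the same data.
escaped = json.dumps(decoded)
readable = json.dumps(decoded, ensure_ascii=False)
assert json.loads(escaped) == json.loads(readable)
```

If the server is built on a Python JSON serializer, switching to the readable form is usually a matter of passing `ensure_ascii=False` (or the framework's equivalent setting); the payload a client parses is the same either way.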

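Question about the evaluation of the Tongyi Qianwen chat version on the latest leaderboard

    I don't know the details of this evaluation, but for most models the chat version improves on the base version on Chinese objective questions, whereas Qianwen drops off a cliff (0.596 -> 0.070). That result looks anomalous, and it also differs substantially from the OpenCompass evaluation results.
    I'd suggest double-checking the evaluation details. Could there be a problem with the prompt or something similar?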

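Question about the leaderboard

Are the decimal values on the leaderboard scores?

ChatGLM-6B scores 0.212 on the Chinese multiple-choice QA dataset Chinese_MMLU. Can this be read as 21.2 points out of a possible 100? In other words, out of 100 questions, only about 21 were answered correctly?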

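Evaluation methods for text-to-image models

Hello, has there been any progress on evaluation methods for text-to-image models? For example, methods that target only one category of generated image, such as images of people.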

Issue in finder.py

from cached_property import cached_property

Is this wrong? Shouldn't it be:
from functools import cached_property
instead?
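Both imports can be valid: `functools.cached_property` ships with the standard library from Python 3.8 onward, while `cached_property` is a separate third-party package that provides the same decorator for older interpreters. A minimal sketch of an import that works in either environment follows; the `Example` class is hypothetical and is not taken from finder.py.

```python
try:
    # Standard library, available on Python 3.8 and later.
    from functools import cached_property
except ImportError:
    # Third-party backport; needs the cached_property package installed.
    from cached_property import cached_property


class Example:
    """Hypothetical class used only to show the caching behaviour."""

    def __init__(self):
        self.calls = 0

    @cached_property
    def value(self):
        # Runs on first access only; the result is then stored on the instance.
        self.calls += 1
        return 42


e = Example()
assert e.value == 42
assert e.value == 42
assert e.calls == 1
```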

Running evaluate.py raises KeyError: 'text_config'. How do I fix it?

I ran the command from evaluate.md:
python evaluate.py --datasets=cifar10,cifar100 --model_name=AltCLIP-XLMR-L

The dataset and model are downloaded, but there's an error:

File "C:\anaconda3\lib\site-packages\flagai\model\mm\AltCLIP.py", line 83, in __init__
self.text_config = STUDENT_CONFIG_DICT[kwargs['text_config']['model_type']]
KeyError: 'text_config'

According to the source code of class AltCLIPConfig, text_config should be passed in through **kwargs, but in practice nothing is passed. How can I fix this?

How can ResNet50 have 1B parameters?

As we know, ResNet50 has a total of 25,636,712 parameters. Of these, 25,583,592 are trainable and 53,120 are non-trainable. The model has 177 layers.
Check this link for further explanation.
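The figures quoted above can be reproduced by hand. The sketch below assumes the standard ResNet-50 bottleneck layout (a 7x7 stem, four stages of 3, 4, 6 and 3 blocks, a projection shortcut in the first block of each stage, and a 1000-class classifier) and a counting convention in which every convolution has a bias and every batch-norm layer contributes two trainable and two non-trainable values per channel. The function name and its arguments are illustrative, not part of any library.

```python
# (bottleneck width, number of blocks) for the four ResNet-50 stages.
STAGES = [(64, 3), (128, 4), (256, 6), (512, 3)]


def resnet50_param_counts(num_classes=1000, conv_bias=True):
    """Return (trainable, non_trainable) parameter counts for ResNet-50."""
    conv_weights = 0
    channels = 0  # summed conv output channels; one batch norm follows each conv

    def conv(kernel, c_in, c_out):
        nonlocal conv_weights, channels
        conv_weights += kernel * kernel * c_in * c_out
        channels += c_out

    conv(7, 3, 64)  # stem convolution on RGB input
    c_in = 64
    for width, blocks in STAGES:
        c_out = 4 * width
        for block in range(blocks):
            conv(1, c_in, width)
            conv(3, width, width)
            conv(1, width, c_out)
            if block == 0:
                conv(1, c_in, c_out)  # projection shortcut
            c_in = c_out

    fc = c_in * num_classes + num_classes   # final dense layer with bias
    bn_trainable = 2 * channels             # scale and shift per channel
    bn_non_trainable = 2 * channels         # moving mean and variance per channel
    biases = channels if conv_bias else 0
    trainable = conv_weights + biases + bn_trainable + fc
    return trainable, bn_non_trainable


trainable, non_trainable = resnet50_param_counts()
print(trainable, non_trainable, trainable + non_trainable)
# → 25583592 53120 25636712
```

Under any of these conventions the total stays near 25.6 million, roughly 40 times smaller than 1B.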

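Questions about the implementation details of the evaluation

In your Zhihu article you wrote: "We used ImageEval-prompt to evaluate well-known text-to-image models. For each prompt, every model generates 8 images. Annotators rank the 8 images without seeing the prompt and select the three highest-ranked images, and finally annotate whether these three images correctly express the key information in the prompt."

What exactly is done in the last step, "annotate whether these three images correctly express the key information in the prompt"?

For example, take the prompt "a lady in gorgeous clothes sitting on a chair, sketch", whose annotations for color, gender, and facial features are 0, 1, and 2 respectively. Do evaluators only need to judge, from the annotated dimensions and ignoring the prompt, whether the generated image matches the annotation for each dimension (0 = not present, 1 = simple test, 2 = complex test)? Or can evaluators see both the annotations and the prompt, and then go back to the prompt to judge whether the image expresses it accurately? For instance, knowing that gender is annotated as 1, the evaluator would decide for themselves whether the generated image matches the "lady" described in the prompt. If evaluators use the second approach, how do annotation 1 (simple test) and annotation 2 (complex test) differ in the evaluation workflow?

Finally, in the section "Annotation dimension description" you give specific annotations for each sub-dimension. Does every item in the dataset have specific annotations for every sub-dimension that evaluators can refer to? Or does the currently open-sourced dataset already contain everything there is?

Thank you for your reply!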
