Comments (3)
We use the answer field as the GT answer, not raw_answer. It's a common practice.
I strongly advise using the official evaluation code from VQA v2 datasets to obtain a standard evaluation scheme/process.
from prophet.
Thank you very much. I am not familiar with the official evaluation code from the VQA v2 datasets, so I will study it further. Using the code above with the answer field as the GT answer, I did get a score of about 53 on OK-VQA val. But I got 49.01 on A-OKVQA val using the following code. Is my calculation method wrong again?
import json

# Load data
with open('candidates_aokvqa_val.json') as f:
    answer_candidates = json.load(f)
with open('aokvqa_v1p0_val.json') as f:
    val_datasets = json.load(f)

# Compute the score for a predicted answer
def direct_scores(pred_answer, direct_answers):
    acc_num = 0
    cnt = 0
    for answer in direct_answers:
        if pred_answer == answer:
            cnt += 1
    if cnt == 1:
        acc_num = 0.3
    elif cnt == 2:
        acc_num = 0.6
    elif cnt > 2:
        acc_num = 1
    return acc_num

# Calculate the accuracy of the first candidate answer for all samples
acc = 0.0
for single_sample in val_datasets:
    single_sample['DA_candidate'] = [each_answer['answer'] for each_answer in answer_candidates[str(single_sample['question_id'])]]
    score = []
    for i in single_sample['DA_candidate']:
        score.append(direct_scores(i, single_sample['direct_answers']))
    acc += score[0]
print(acc / len(val_datasets))
Looking forward to your reply.
from prophet.
The official evaluation formula of A-OKVQA is a little different from that of OK-VQA and VQA v2. See the official code for details.
P.S. I believe this is a mistake by the authors of A-OKVQA. They claimed that the evaluation of A-OKVQA follows that of OK-VQA, but it seems that they misunderstood the original evaluation formula of OK-VQA. However, we can still use the A-OKVQA dataset to conduct fair comparisons as long as we always use the official code to evaluate the results.
P.P.S. You may find that the implementation from VQA v2 is a little different from your code above, but you can prove that they are mathematically equivalent. In short, both implementations are valid for VQA v2 and OK-VQA.
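To make the comparison above concrete, here is a minimal sketch of three scoring rules side by side. It is not the official code: the function names are made up for illustration, answer normalization (punctuation, articles, and so on) is skipped, the leave-one-out version is my reading of the VQA v2 evaluation script, and the plain min(1, n/3) version is an assumption about how the A-OKVQA direct-answer score is computed. Please check both against the official repositories.

```python
def tiered_score(pred, gt_answers):
    """Tiered rule from the snippet in this thread: 1 match -> 0.3, 2 -> 0.6, 3+ -> 1."""
    n = sum(pred == a for a in gt_answers)
    return {0: 0.0, 1: 0.3, 2: 0.6}.get(n, 1.0)


def leave_one_out_score(pred, gt_answers):
    """VQA v2 style (as I understand it): hold out each annotation in turn,
    score min(1, matches among the rest / 3), then average over all hold-outs."""
    accs = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(pred == a for a in others)
        accs.append(min(1.0, matches / 3))
    return sum(accs) / len(accs)


def simple_score(pred, gt_answers):
    """Plain min(1, matches / 3) with no averaging.
    Assumed here to mirror the A-OKVQA direct-answer metric."""
    n = sum(pred == a for a in gt_answers)
    return min(1.0, n / 3)


if __name__ == "__main__":
    # Ten annotations per question, n of which equal the prediction.
    for n in range(5):
        gts = ["cat"] * n + ["other"] * (10 - n)
        print(
            n,
            tiered_score("cat", gts),
            round(leave_one_out_score("cat", gts), 3),
            round(simple_score("cat", gts), 3),
        )
```

With ten annotations, the tiered and leave-one-out rules in this sketch agree for 1, 2, and 4 or more matches, while a single match scores 1/3 under the plain rule instead of 0.3, which would push A-OKVQA numbers away from a tiered calculation. Exactly 3 matches gives 0.9 under leave-one-out here, so small differences from the tiered rule are still possible, which is one more reason to rely on the official scripts.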
from prophet.