Thank you very much for providing the code. I calculated the accuracy of the first ans

The first candidate answer of your provided candidates_okvqa.json in assets.zip about prophet HOT 3 CLOSED

zhongfansun commented on September 23, 2024

The first candidate answer of your provided candidates_okvqa.json in assets.zip

from prophet.

Comments (3)

ParadoxZW commented on September 23, 2024

We use the answer field as the GT answer, not raw_answer. This is common practice.

I strongly advise using the official evaluation code from the VQA v2 dataset to obtain a standard evaluation process.

zhongfansun commented on September 23, 2024

> We use the answer field as the GT answer, not raw_answer. This is common practice.
>
> I strongly advise using the official evaluation code from the VQA v2 dataset to obtain a standard evaluation process.

Thank you very much. I am not familiar with the official evaluation code from the VQA v2 dataset, so I will study it further. Using the above code with the answer field as the GT answer, I did get an accuracy of about 53 on OK-VQA val. But I got 49.01 on A-OKVQA val using the following code. Is my calculation method wrong again?

import json

# Load the candidate answers and the A-OKVQA validation annotations.
with open('candidates_aokvqa_val.json') as f:
    answer_candidates = json.load(f)
with open('aokvqa_v1p0_val.json') as f:
    val_datasets = json.load(f)

# Score one predicted answer by how many ground-truth direct answers it matches.
def direct_scores(pred_answer, direct_answers):
    cnt = sum(1 for answer in direct_answers if pred_answer == answer)
    if cnt == 1:
        return 0.3
    if cnt == 2:
        return 0.6
    if cnt > 2:
        return 1
    return 0

# Average the score of the first (top-ranked) candidate answer over all samples.
acc = 0.0
for single_sample in val_datasets:
    candidates = answer_candidates[str(single_sample['question_id'])]
    single_sample['DA_candidate'] = [c['answer'] for c in candidates]
    acc += direct_scores(single_sample['DA_candidate'][0], single_sample['direct_answers'])
print(acc / len(val_datasets))

Looking forward to your reply.

ParadoxZW commented on September 23, 2024

The official evaluation formula of A-OKVQA is a little different from that of OK-VQA and VQA v2. See the official code for details.
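To make the difference concrete, here is a minimal sketch. It assumes, from a reading of the official A-OKVQA evaluation script that should be checked before relying on it, that a direct answer is scored as min(1, matches / 3) over the direct_answers list, with no leave-one-out averaging. The function name aokvqa_direct_score and the toy answers are hypothetical.

```python
def aokvqa_direct_score(pred_answer, direct_answers):
    """Assumed A-OKVQA direct-answer score: min(1, matches / 3)."""
    # Count how many ground-truth direct answers equal the prediction.
    num_match = sum(pred_answer == answer for answer in direct_answers)
    return min(1.0, num_match / 3.0)


# Toy example (made-up data): 2 of 10 annotators gave the predicted answer.
example_answers = ['umbrella'] * 2 + ['parasol'] * 8
print(aokvqa_direct_score('umbrella', example_answers))
```

Under this assumption, one or two matches score 1/3 and 2/3 rather than 0.3 and 0.6, so a hand-written lookup table would give a slightly different average from the official script.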

P.S. I believe this is a mistake by the authors of A-OKVQA. They claim that the evaluation of A-OKVQA follows that of OK-VQA, but it seems they misunderstood the original OK-VQA evaluation formula. However, we can still use the A-OKVQA dataset for fair comparisons as long as we always use the official code to evaluate the results.

P.P.S. You may find that the VQA v2 implementation is a little different from your code above, but you can prove that they are mathematically equivalent. In short, both implementations are valid for VQA v2 and OK-VQA.
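The equivalence can be checked numerically. The sketch below assumes 10 ground-truth answers per question, and assumes the VQA v2 script averages min(1, matches / 3) over every subset that leaves one ground-truth answer out. The names leave_one_out_score and lookup_score are hypothetical, and Fraction is used only to avoid floating-point noise.

```python
from fractions import Fraction


def leave_one_out_score(pred_answer, gt_answers):
    """Average min(1, matches / 3) over all subsets that drop one GT answer."""
    scores = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(pred_answer == answer for answer in others)
        scores.append(min(Fraction(1), Fraction(matches, 3)))
    return sum(scores) / len(scores)


def lookup_score(pred_answer, gt_answers):
    """Closed form for 10 GT answers: min(1, 0.3 * matches)."""
    matches = sum(pred_answer == answer for answer in gt_answers)
    return min(Fraction(1), Fraction(3, 10) * matches)


# Compare both forms for every possible match count with 10 annotators.
for n in range(11):
    gt = ['yes'] * n + ['no'] * (10 - n)
    assert leave_one_out_score('yes', gt) == lookup_score('yes', gt)
```

Under these assumptions, match counts of 1, 2, and 3 map to 0.3, 0.6, and 0.9, and 4 or more matches map to 1.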
