
prophet's People

Contributors

bruceisme, mil-vlg, paradoxzw


prophet's Issues

The process of image caption

Is there code for the image captioning process? I found a "captions_okvqa.json" file in the assets, but I could not find the code that generates this file.
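The repo appears to ship only the resulting captions_okvqa.json, not the captioning script. For anyone who wants to regenerate something similar, here is a minimal sketch with BLIP from transformers (an assumption: BLIP is one possible off-the-shelf captioner, not necessarily the one the authors used; the image path is illustrative):

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Illustrative COCO 2014 val image path.
image = Image.open("datasets/coco2014/val2014/COCO_val2014_000000000042.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))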

Could you provide a finetuned model for A-OKVQA dataset?

Hi, I am quite interested in your nice work and happy to see the code has been released!
A finetuned model for OK-VQA has been provided, and it works.
I want to try your method on another dataset, A-OKVQA, but I can't find the finetuned checkpoint.
So could you provide the finetuned model for the A-OKVQA dataset? Thanks!

mcan_530_okvqa.json

After running finetune.sh, the generated JSON file contains 10096 entries, while the mcan_530_okvqa.json you provided has only 5048.
Also, when I evaluate, the accuracy is only 50%. What could cause this? Is my pipeline wrong? I am using the pretrained model you provided.

bug

[screenshot of the error]
Have you ever encountered this kind of bug?

The first candidate answer of your provided candidates_okvqa.json in assets.zip

Thank you very much for providing the code. I computed the accuracy of the first candidate answer on the OK-VQA val split using the candidates_okvqa.json from the assets.zip you provided. The code I ran is below. The accuracy turns out to be 47.06 instead of 53. Did I do something wrong?

import json

# load the provided candidate answers and the OK-VQA val annotations
with open('candidates_okvqa.json') as f:
    answer_candidates = json.load(f)
with open('mscoco_val2014_annotations.json') as f:
    val_datasets_annotations = json.load(f)['annotations']

# organize the ~10 raw ground-truth answers per question
val_datasets = []
for val_a in val_datasets_annotations:
    multi_answers = [ans['raw_answer'] for ans in val_a['answers']]
    val_datasets.append({'question_id': val_a['question_id'], 'direct_answers': multi_answers})

# score a predicted answer against the ground-truth answers
# (a 0.3/0.6/1.0 approximation of VQA soft accuracy)
def direct_scores(pred_answer, direct_answers):
    cnt = sum(1 for answer in direct_answers if pred_answer == answer)
    if cnt == 1:
        return 0.3
    elif cnt == 2:
        return 0.6
    elif cnt > 2:
        return 1.0
    return 0.0

# accuracy of the first candidate answer over all samples
acc = 0.0
for single_sample in val_datasets:
    candidates = answer_candidates[str(single_sample['question_id'])]
    single_sample['DA_candidate'] = [each['answer'] for each in candidates]
    scores = [direct_scores(c, single_sample['direct_answers']) for c in single_sample['DA_candidate']]
    acc += scores[0]
print(acc / len(val_datasets))

Looking forward to your reply.
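For reference, the official VQA evaluation differs from the 0.3/0.6/1.0 lookup above in two ways: answers are normalized first (punctuation, articles, number words; see evaluation/ans_punct.py in this repo), and each prediction is scored as min(1, matches/3) averaged over the ten leave-one-out subsets of annotators. A minimal sketch of that formula, without the normalization step:

def vqa_soft_accuracy(pred, gt_answers):
    # Average min(1, matches/3) over the ten leave-one-out subsets
    # of the ~10 human answers, per the official VQA metric.
    scores = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)

Either difference (the normalization especially) could account for a gap like 47.06 vs. 53.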

50GB memory?

"To conduct the following experiments, a machine with at least 1 RTX 3090 GPU, 50GB memory", wherein "50GB memory" refers to Memory for CPU or GPU?

mcan_530_okvqa.json

Hi, could you tell me which part of the code generates mcan_530_okvqa.json? Thanks.

Trained model

Can we use the model you have already trained, instead of training it ourselves with the existing code?


Hello, I have a question about the captioning part

Can this multimodal model be understood as using captions and answer heuristics to turn image content into text, so the two modalities interact through language?
Since the captions come from translating the image with an off-the-shelf captioning model, isn't that external image-to-text model the real key to the multimodal interaction? If so, that external model largely determines the overall quality: if it performs poorly, the noise would be large and calling GPT would be of little use, right?

When training stage 1, pretraining, finetuning, and candidate answer generation all fail with the same error: OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the...

When I run the official stage 1 commands for pretraining, finetuning, and candidate answer generation, they all fail with the same error:
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-large-cased is not the path to a directory containing a file named config.json
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode
I added this line to main.py:
TRANSFORMERS_OFFLINE=1  # supposed to enable running offline
which raised a new error:
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104,'Connection reset by peer'))
How can I solve this?
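One likely cause, offered as an assumption since the full main.py isn't shown: written as a bare Python statement, TRANSFORMERS_OFFLINE=1 only creates a local variable; transformers reads that flag from the process environment. A minimal sketch that sets it properly, assuming bert-large-cased was already downloaded into the local cache on a run with internet access:

import os

# Set the flag in the environment BEFORE transformers is imported;
# a bare `TRANSFORMERS_OFFLINE = 1` statement has no effect on it.
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoTokenizer

# Resolves bert-large-cased purely from the local cache; run this once
# with internet access (and without the flag) first to populate the cache.
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")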

OpenAI's apikey

I can't reach the OpenAI API from my rented server. Is there any workaround? Thank you.

okvqa-stage1-pretrain

When I pretrain on OK-VQA with the MCAN model, it fails with this error:
raise LocalEntryNotFoundError( huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.
During handling of the above exception, another exception occurred:
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-large-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

So do I need to download bert-large-uncased online first, and then run the code offline?
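That is the usual workaround. A minimal sketch (the directory names are placeholders): download the files once on a machine with internet access, save them locally, then load from the local path so no hub lookup happens:

from transformers import AutoModel, AutoTokenizer

# On a machine WITH internet access: fetch once and save to a folder.
AutoTokenizer.from_pretrained("bert-large-uncased").save_pretrained("./bert-large-uncased")
AutoModel.from_pretrained("bert-large-uncased").save_pretrained("./bert-large-uncased")

# On the offline machine: load from the local path; no network needed.
tokenizer = AutoTokenizer.from_pretrained("./bert-large-uncased", local_files_only=True)
model = AutoModel.from_pretrained("./bert-large-uncased", local_files_only=True)

Since load_data.py calls AutoTokenizer.from_pretrained(__C.BERT_VERSION), pointing the BERT_VERSION config at this folder should have the same effect.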

Prerequisites Questions

Dear author, when I ran "conda env create -f environment.yml" an error occurred:
[screenshot of the error]
Is it right to delete the "@v1.0"?
I hope you can help me answer this question, thank you very much.

Naive question on OK-VQA and A-OKVQA evaluation.

Hi @ParadoxZW @MIL-VLG, thanks for your great project.

I am not very familiar with OK-VQA and A-OKVQA evaluation. Here are some naive questions:

  • OK-VQA and A-OKVQA use an open-ended QA setting: each question has ~10 ground-truth answers (some of them identical). Do you use exact match (VQAv2-style, matching at least 3 ground-truth answers) to compute accuracy?
  • Is it common to train on A-OKVQA train+val and run inference on the A-OKVQA test split?

KeyError: 179520 ??

while running command

bash scripts/pretrain.sh \
    --task ok --version okvqa_pretrain_1 --gpu 0

I met this problem:

Traceback (most recent call last):
  File "/root/autodl-fs/prophet-main/main.py", line 35, in <module>
    runner.run()
  File "/root/autodl-fs/prophet-main/prophet/stage1/pretrain.py", line 162, in run
    self.train(train_set, valid_set)
  File "/root/autodl-fs/prophet-main/prophet/stage1/pretrain.py", line 93, in train
    for step, input_tuple in enumerate(dataloader):
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/autodl-fs/prophet-main/prophet/stage1/utils/load_data.py", line 136, in __getitem__
KeyError: 179520

whole project structure:

prophet-main
├── assets
│   ├── answer_aware_examples_okvqa.json
│   ├── answer_dict_aokvqa.json
│   ├── answer_dict_okvqa.json
│   ├── answer_dict_vqav2.json
│   ├── candidates_aokvqa_test.json
│   ├── candidates_aokvqa_val.json
│   ├── candidates_okvqa.json
│   ├── captions_aokvqa.json
│   ├── captions_okvqa.json
│   ├── examples_aokvqa_test.json
│   ├── examples_aokvqa_val.json
│   └── Untitled.ipynb
├── ckpts
│   └── epoch_6.pkl
├── CLIP
│   ├── clip
│   │   ├── bpe_simple_vocab_16e6.txt.gz
│   │   ├── clip.py
│   │   ├── __init__.py
│   │   ├── model.py
│   │   └── simple_tokenizer.py
│   ├── CLIP.png
│   ├── data
│   │   ├── country211.md
│   │   ├── prompts.md
│   │   ├── rendered-sst2.md
│   │   └── yfcc100m.md
│   ├── hubconf.py
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── model-card.md
│   ├── notebooks
│   │   ├── Interacting_with_CLIP.ipynb
│   │   └── Prompt_Engineering_for_ImageNet.ipynb
│   ├── README.md
│   ├── requirements.txt
│   ├── setup.py
│   └── tests
│       └── test_consistency.py
├── configs
│   ├── finetune.yml
│   ├── path_cfgs.py
│   ├── pretrain.yml
│   ├── prompt.yml
│   ├── __pycache__
│   │   ├── path_cfgs.cpython-39.pyc
│   │   ├── task_cfgs.cpython-39.pyc
│   │   └── task_to_split.cpython-39.pyc
│   ├── task_cfgs.py
│   └── task_to_split.py
├── datasets
│   ├── aokvqa
│   │   ├── aokvqa_v1p0_test.json
│   │   ├── aokvqa_v1p0_train.json
│   │   └── aokvqa_v1p0_val.json
│   ├── coco2014
│   │   ├── train2014
│   │   ├── train2014.zip
│   │   └── val2014
│   ├── coco2014_feats
│   │   ├── train2014
│   │   ├── train2014.zip
│   │   ├── val2014
│   │   └── val2014.zip
│   ├── coco2017
│   ├── coco2017_feats
│   ├── datasets.zip
│   ├── okvqa
│   │   ├── mscoco_train2014_annotations.json
│   │   ├── mscoco_val2014_annotations.json
│   │   ├── OpenEnded_mscoco_train2014_questions.json
│   │   └── OpenEnded_mscoco_val2014_questions.json
│   ├── old_data
│   │   ├── coco2014
│   │   └── coco2014_feats
│   ├── Untitled.ipynb
│   └── vqav2
│       ├── v2_mscoco_train2014_annotations.json
│       ├── v2_mscoco_val2014_annotations.json
│       ├── v2_OpenEnded_mscoco_train2014_questions.json
│       ├── v2_OpenEnded_mscoco_val2014_questions.json
│       ├── v2valvg_no_ok_annotations.json
│       ├── v2valvg_no_ok_questions.json
│       ├── vg_annotations.json
│       └── vg_questions.json
├── environment.yml
├── evaluation
│   ├── ans_punct.py
│   ├── aok_utils
│   │   ├── eval_predictions.py
│   │   ├── load_aokvqa.py
│   │   ├── __pycache__
│   │   └── remap_predictions.py
│   ├── aokvqa_evaluate.py
│   ├── okvqa_evaluate.py
│   ├── __pycache__
│   │   ├── ans_punct.cpython-39.pyc
│   │   ├── aokvqa_evaluate.cpython-39.pyc
│   │   └── okvqa_evaluate.cpython-39.pyc
│   └── vqa_utils
│       ├── __pycache__
│       ├── vqaEval.py
│       └── vqa.py
├── LICENSE
├── main.py
├── misc
│   ├── framework.png
│   └── tree.txt
├── outputs
│   ├── ckpts
│   │   ├── okvqa_finetune_1
│   │   ├── okvqa_heuristics_1
│   │   └── okvqa_pretrain_1
│   ├── logs
│   │   ├── okvqa_finetune_1
│   │   └── okvqa_pretrain_1
│   └── results
│       ├── okvqa_finetune_1
│       └── okvqa_heuristics_1
├── preds
├── prophet
│   ├── __init__.py
│   ├── __pycache__
│   │   └── __init__.cpython-39.pyc
│   ├── stage1
│   │   ├── finetune.py
│   │   ├── heuristics.py
│   │   ├── model
│   │   ├── pretrain.py
│   │   ├── __pycache__
│   │   └── utils
│   └── stage2
│       ├── prompt.py
│       └── utils
├── README.md
├── scripts
│   ├── evaluate_file.sh
│   ├── evaluate_model.sh
│   ├── extract_img_feats.sh
│   ├── finetune.sh
│   ├── heuristics_gen.sh
│   ├── pretrain.sh
│   └── prompt.sh
├── --task
├── tools
│   ├── extract_img_feats.py
│   ├── __pycache__
│   │   └── transforms.cpython-39.pyc
│   └── transforms.py
└── Untitled.ipynb
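The KeyError is raised inside __getitem__ in load_data.py when it looks up image id 179520 in the loaded feature dictionary, which suggests the extracted features do not cover every image the questions reference (for example, if extract_img_feats.sh was only run on one split, or a feats zip was partially extracted). A hedged sanity check, assuming one .npz feature file per image whose basename ends with the zero-padded image id (that naming pattern is an assumption; adjust it to the real scheme):

import glob
import json
import os

# Image ids referenced by the OK-VQA questions.
image_ids = set()
for split in ['train2014', 'val2014']:
    with open(f'datasets/okvqa/OpenEnded_mscoco_{split}_questions.json') as f:
        for q in json.load(f)['questions']:
            image_ids.add(q['image_id'])

# Image ids that actually have extracted feature files.
feat_ids = set()
for path in glob.glob('datasets/coco2014_feats/*/*.npz'):
    stem = os.path.splitext(os.path.basename(path))[0]
    feat_ids.add(int(stem.split('_')[-1]))

missing = image_ids - feat_ids
print(f'{len(missing)} referenced images have no features; e.g. {sorted(missing)[:5]}')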

Accuracy does not increase

I trained on a custom dataset. During training, the loss decreased but the accuracy stayed at zero.
[screenshot of the training log]

When training stage 1, pretraining, finetuning, and candidate answer generation all fail with the same error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType. How can I solve this?

Loading common data...
== Total image number: 123287
Traceback (most recent call last):
  File "/root/autodl-tmp/prophet/main.py", line 40, in <module>
    runner.run()
  File "/root/autodl-tmp/prophet/prophet/stage1/pretrain.py", line 160, in run
    common_data = CommonData(self.__C)
  File "/root/autodl-tmp/prophet/prophet/stage1/utils/load_data.py", line 55, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(__C.BERT_VERSION)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1834, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1959, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/prophet/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 213, in __init__
    if not os.path.isfile(vocab_file):
  File "/root/miniconda3/envs/prophet/lib/python3.9/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Replacing GPT-3 with other academic LLMs

Thank you so much for your excellent work!

I have a minor question about the LLM selection. Have you tried other academic LLMs, e.g., LLaMA, in place of GPT-3? Would it make a big performance difference? Thanks!

Best regards

1

1

assets

May I ask whether the files in the assets folder are supposed to be created by me? If they were generated by code, please point me to that code. Thank you.

Are MCAN pretraining and OK-VQA finetuning performed together? Shouldn't MCAN be pretrained first and then finetuned?

At this stage, we train an improved MCAN model through pretraining on VQA v2 and finetuning on the target dataset. Taking OK-VQA as an example, run the pretraining step with:

$ bash scripts/pretrain.sh --task ok --version okvqa_pretrain_1 --gpu 0

Are MCAN pretraining and OK-VQA finetuning performed together? I would expect MCAN to be pretrained first and then finetuned.
However, in the script above the task is "ok": does that mean MCAN pretraining has already finished and this command finetunes on OK-VQA? Or are pretraining and finetuning executed together?
Shouldn't there be a separate script just for MCAN pretraining, which saves a checkpoint (provided for download), followed by finetuning on OK-VQA?

skip step 1 and go directly to step 2

Step 1 takes a long time. You mentioned in the introduction that we can skip step 1 and go directly to step 2 using the answer_aware_examples_okvqa.json and candidates_okvqa.json you provide, right?

Checkpoints Availability

Hello! I was wondering when/if the checkpoints for prophet will be made publicly available? Thanks in advance :)

When I run the stage 2 command, it reports an error connecting to OpenAI. What could be the reason?

Loaded dataset size: 9009, top10 accuracy: 91.81, top1 accuracy: 86.54
Loaded dataset size: 5046, top10 accuracy: 79.83, top1 accuracy: 53.05

Working... 0/5046 0:06:31 <class 'openai.error.APIConnectionError'> Error communicating with OpenAI
retrying...
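An APIConnectionError at 0/5046 usually means api.openai.com is unreachable from the machine (common on rented servers in regions where it is blocked). One workaround, assuming an HTTP proxy that can reach OpenAI is available (the address below is a placeholder): the openai 0.x client sends requests through the requests library, which honors the standard proxy environment variables:

import os

# Placeholder proxy address; replace with a proxy that can reach api.openai.com.
os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
# Quick connectivity test with a tiny completion request.
resp = openai.Completion.create(model="text-davinci-002", prompt="test", max_tokens=5)
print(resp["choices"][0]["text"])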

OpenAI-Api Cost

I'd like to know the cost of using the OpenAI API for the whole pipeline, or for a single test run of the model.
I'm afraid I can't afford the expense of these experiments.
An approximate cost would be enough. I hope I can get an answer, thank you very much.
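For a rough back-of-the-envelope estimate, the arithmetic is: number of test questions × (prompt tokens + completion tokens) × price per token. All constants below are illustrative assumptions (check your actual prompt length and current OpenAI pricing); the 5,046 test questions come from the logs in the issue above:

# Back-of-the-envelope OpenAI API cost estimate; every constant below
# is an assumption for illustration, not an official figure.
num_questions = 5046          # OK-VQA test set size (from the logs above)
tokens_per_prompt = 1200      # assumed: in-context examples + question + caption
tokens_per_completion = 10    # assumed: short answers
price_per_1k_tokens = 0.02    # assumed: GPT-3 davinci-era pricing, USD

cost = num_questions * (tokens_per_prompt + tokens_per_completion) / 1000 * price_per_1k_tokens
print(f"~${cost:.0f} per full OK-VQA test run")  # ~$122 under these assumptions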
