fawazsammani / nlxgpt
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)
Hi,
The "annotations" link is broken. Opening it returns a 404 error ("The requested URL was not found on this server. That's all we know."). What should I do?
Thank you for the response to question 1. Do you mean 2-3 hours or 6 hours is the training time for 30 epochs?
And how is the optimal epoch determined? Is it the epoch where a certain metric (such as B-4) reaches its best value on the validation set, or something else?
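A common answer to the epoch-selection question is exactly the strategy the asker guesses: evaluate each checkpoint on the validation set and keep the one with the best metric (e.g. BLEU-4). A minimal sketch, with made-up placeholder scores:

```python
# Minimal sketch: pick the checkpoint whose validation metric (e.g. BLEU-4)
# is highest. The scores below are made-up placeholders, not real results.
def best_epoch(val_scores):
    """Return (epoch, score) for the highest validation score."""
    epoch = max(val_scores, key=val_scores.get)
    return epoch, val_scores[epoch]

# Hypothetical per-epoch BLEU-4 values on the validation set
scores = {1: 18.2, 2: 21.5, 3: 23.1, 4: 22.8}
print(best_epoch(scores))  # -> (3, 23.1)
```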
The Google Drive link is invalid.
Hi!
I want to run the pretrained VQAX_p model (the same one used in the 'Explanations with Natural Text' Hugging Face demo). I have the project files and dependencies imported into Google Colab, but I am unsure which files/functions I should use to get an explanation for a single image.
How do I get the VCR results stated in the appendix by running the source code? Should I directly run vcr.py and fine-tune with the model pre-trained on the caption dataset?
Hello!
I am attempting to fine-tune the VQA_X model and have some confusion about the data required.
I currently have a dataset of images and captions prepared and formatted similarly to vqaX_test_annot_full.json and vqaX_test_annot_exp.json, with one-to-one image/annotation pairs along with the file path to the JPEG file for each image.
Do I also need to prepare an additional set of data formatted like vqaX_val.json and vqaX_test.json, with answers, explanations, the image_id, and the image name, in order to fine-tune the model, or can I do so with only the dataset mentioned above?
Thanks
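For questions like the one above about data formatting, it can help to see what a single training entry might look like. A hedged sketch follows; the field names (question, answer, explanation, image_id, image_name) are assumptions inferred from the file names, so verify them against the repo's actual vqaX_val.json before relying on this layout.

```python
import json

# Hedged sketch of what a vqaX_*.json-style entry might look like.
# Field names here are assumptions -- check against the repo's own
# vqaX_val.json / vqaX_test.json before building a dataset this way.
sample = {
    "question": "What sport is the man playing?",
    "answer": "tennis",
    "explanation": ["he is holding a racket on a tennis court"],
    "image_id": 123456,
    "image_name": "COCO_val2014_000000123456.jpg",
}
# Keyed by image/question id, as COCO-style annotation files often are
entry = {str(sample["image_id"]): sample}
print(json.dumps(entry, indent=2))
```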
Hi @fawazsammani,
Since the repo provides pretrained models but no pretraining script, I am wondering which splits to use when pretraining on the four datasets mentioned in the paper (i.e., COCO Captions, Flickr30k, VG, and image paragraph captioning). I think this is not well described in the paper. Do I need to split the datasets for pretraining, or can I pretrain the model on the entire datasets without splitting?
Thanks for your excellent work.
For the NLE models on the different datasets (VQA-X, ACT-X, e-SNLI-VE), how many GPUs are required for the pretraining and fine-tuning stages? And how many hours does each stage take?
Hi @fawazsammani,
Thanks for your excellent work. I really appreciate it! However, I failed to reproduce the reported scores on the VQA-X dataset, so I want to check whether I am using the models from the models section correctly.
Are the checkpoints in the models section expected to approximately match the performance reported in the paper? In my experiments, the pretrained vqaX checkpoint only achieves 103 filtered CIDEr. Or should I use different checkpoints to reproduce the results? Could you please give me some instructions for reproducing the scores reported in your paper?
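For context on the "filtered" scores mentioned above: in NLE evaluation, filtered metrics are typically computed only on samples where the predicted answer is correct. The actual metric computation (CIDEr etc.) is done by the cococaption toolkit; the sketch below only illustrates the filtering step, with made-up predictions.

```python
# Sketch of the "filtered" NLG-score idea: explanations are evaluated only
# on samples whose predicted answer matches the ground truth. Data below
# is illustrative; real scoring goes through cococaption.
def filter_predictions(preds, gts):
    """Keep only (id -> explanation) pairs where the answer is correct."""
    kept = {}
    for qid, (answer, explanation) in preds.items():
        if answer == gts[qid]:
            kept[qid] = explanation
    return kept

preds = {"q1": ("tennis", "he holds a racket"),
         "q2": ("soccer", "there is a ball")}
gts = {"q1": "tennis", "q2": "baseball"}
print(filter_predictions(preds, gts))  # only q1 survives the filter
```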
Hi, I was hoping to use this model to explain predictions for a task different from those in the paper, which will require me to prepare explanations myself as well. However, the prepare_data folder referenced in the explain_predict folder is empty. Will instructions for the authors' specific steps in this process be coming soon? Thanks!
Hi @fawazsammani.
First of all thank you once again for providing the tutorial for single image usage.
I was playing around with the model and I am curious about one thing.
Together with the textual explanation, we can also obtain a visual explanation via the attention map.
So, I was wondering whether this visual explanation is computed over the entire output sentence (= answer + explanation).
Is it possible to split the visual attention map into two different images: one that focuses on the classification part (the answer) and another that focuses on the explanation part? The first would highlight the regions important for the answer prediction, and the second would show only the regions most important for the explanation.
Sorry for my lack of knowledge regarding this problem.
Thank you for your time.
Best wishes.
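If per-token attention maps are available at generation time, one plausible way to get the answer/explanation split asked about above is to average the maps over each token span. This is a sketch under that assumption; the token spans, map sizes, and values below are toy illustrations, not the model's actual attention.

```python
# Hedged sketch: split token-wise attention maps into an "answer" map and
# an "explanation" map by averaging over each token span. Toy 2x2 maps.
def mean_map(maps):
    """Element-wise mean of a list of equally sized 2D attention maps."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]

def split_attention(maps, answer_len):
    """One map per generated token; the first answer_len tokens are the answer."""
    return mean_map(maps[:answer_len]), mean_map(maps[answer_len:])

# Two answer tokens and one explanation token, each with a 2x2 map
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 1.0], [0.0, 0.0]],
        [[0.0, 0.0], [1.0, 1.0]]]
ans_map, exp_map = split_attention(maps, answer_len=2)
print(ans_map)  # [[0.5, 0.5], [0.0, 0.0]]
print(exp_map)  # [[0.0, 0.0], [1.0, 1.0]]
```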
Hi @fawazsammani,
Thank you for sharing the code!
I have a small issue when training the model on VQA-X. The training script complains that the file cococaption/annotations/vqaX_test_annot_exp.json
is missing. Indeed, there is no such file in that folder (cf. https://github.com/ruotianluo/coco-caption/tree/ea20010419a955fed9882f9dcc53f2dc1ac65092/annotations). Could you help me with this issue? Thank you.
Below is the full error message:
Evaluation: Finished 1967/1968
loading annotations into memory...
Traceback (most recent call last):
  File "/data/lee/home/Projects/nlxgpt/vqaX.py", line 595, in <module>
    filter_and_get_scores(
  File "/data/lee/home/Projects/nlxgpt/vqaX.py", line 126, in filter_and_get_scores
    coco = COCO(annFileExp)
  File "/data/lee/home/Projects/nlxgpt/cococaption/pycocotools/coco.py", line 76, in __init__
    dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'cococaption/annotations/vqaX_test_annot_exp.json'
Traceback (most recent call last):
  File "/export/home/lee/miniconda3/envs/nlx-gpt/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/export/home/lee/miniconda3/envs/nlx-gpt/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/export/home/lee/miniconda3/envs/nlx-gpt/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/export/home/lee/miniconda3/envs/nlx-gpt/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/export/home/lee/miniconda3/envs/nlx-gpt/bin/python3.9', 'vqaX.py']' returned non-zero exit status 1.
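Because the traceback above only fires at evaluation time (after training work has already been done), one defensive option is to verify the annotation file exists before launching training. A minimal sketch, using the path from the traceback:

```python
import os

# Sketch: fail fast if the evaluation annotation file is missing, so the
# error surfaces before training rather than at the end of an epoch.
def check_annotations(path):
    """Raise early if the annotation file used by evaluation is absent."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; download the vqaX_* annotation files from "
            "the NLX-GPT repo's annotations link before training."
        )
    return path

try:
    check_annotations("cococaption/annotations/vqaX_test_annot_exp.json")
except FileNotFoundError as e:
    print(e)
```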