fawazsammani / nlxgpt
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)
Hi,
The "annotations" link is broken. Opening it returns a 404 error ("The requested URL was not found on this server. That's all we know."). What should I do?
Thank you for the response to question 1. Do you mean 2-3 hours or 6 hours is the training time for 30 epochs?
And how is the optimal epoch determined? Is it the epoch where a certain metric (such as B-4) reaches its best value on the validation set, or something else?
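A common answer to the epoch-selection question is exactly the strategy the asker guesses: evaluate each checkpoint on the validation set and keep the one with the best metric (e.g. BLEU-4). A minimal sketch, with made-up placeholder scores:

```python
# Minimal sketch: pick the checkpoint whose validation metric (e.g. BLEU-4)
# is highest. The scores below are made-up placeholders, not real results.
def best_epoch(val_scores):
    """Return (epoch, score) for the highest validation score."""
    epoch = max(val_scores, key=val_scores.get)
    return epoch, val_scores[epoch]

# Hypothetical per-epoch BLEU-4 values on the validation set
scores = {1: 18.2, 2: 21.5, 3: 23.1, 4: 22.8}
print(best_epoch(scores))  # -> (3, 23.1)
```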
The Google Drive link is invalid.
Hi!
I want to run the pretrained VQAX_p model (the same one used in the 'Explanations with Natural Text' Hugging Face demo). I have the project files and dependencies imported into Google Colab, but I am unsure which files/functions I should use to get an explanation for a single image.
How do I get the VCR results stated in the appendix by running the source code? Should I directly run vcr.py and fine-tune with the model pre-trained on the caption dataset?
Hello!
I am attempting to fine-tune the VQA_X model and have some confusion about the data required.
I currently have a dataset of images and captions prepared and formatted similarly to vqaX_test_annot_full.json and vqaX_test_annot_exp.json, with one-to-one image/annotation pairs along with the file path to the JPEG file for each image.
Do I also need to prepare an additional set of data formatted like vqaX_val.json and vqaX_test.json, with answers, explanations, the image_id, and the image name, in order to fine-tune the model, or can I do so with only the dataset mentioned above?
Thanks
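For questions like the one above about data formatting, it can help to see what a single training entry might look like. A hedged sketch follows; the field names (question, answer, explanation, image_id, image_name) are assumptions inferred from the file names, so verify them against the repo's actual vqaX_val.json before relying on this layout.

```python
import json

# Hedged sketch of what a vqaX_*.json-style entry might look like.
# Field names here are assumptions -- check against the repo's own
# vqaX_val.json / vqaX_test.json before building a dataset this way.
sample = {
    "question": "What sport is the man playing?",
    "answer": "tennis",
    "explanation": ["he is holding a racket on a tennis court"],
    "image_id": 123456,
    "image_name": "COCO_val2014_000000123456.jpg",
}
# Keyed by image/question id, as COCO-style annotation files often are
entry = {str(sample["image_id"]): sample}
print(json.dumps(entry, indent=2))
```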
Hi @fawazsammani,
Since the repo provides pretrained models but no pretraining script, I am wondering which splits to use when pretraining on the four datasets mentioned in the paper (i.e., COCO Captions, Flickr30k, VG, and image paragraph captioning). I think this is not well described in the paper. Do I need to split the datasets for pretraining, or can I pretrain the model on the entire datasets without splitting?
Thanks for your excellent work.
For the NLE models on the different datasets (VQA-X, ACT-X, e-SNLI-VE), how many GPUs are required for the pretraining and fine-tuning stages? And how many hours does each stage take?
Hi @fawazsammani,
Thanks for your excellent work. I really appreciate it! However, I failed to reproduce the reported scores on the VQA-X dataset, so I want to check whether I am using the models from the models section correctly.
Are the checkpoints in the models section expected to approximately match the performance reported in the paper? In my experiments, the pretrained vqaX checkpoint only achieves 103 filtered CIDEr. Or should I use different checkpoints to reproduce the results? Could you please give me some instructions for reproducing the scores reported in your paper?
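For context on the "filtered" scores mentioned above: in NLE evaluation, filtered metrics are typically computed only on samples where the predicted answer is correct. The actual metric computation (CIDEr etc.) is done by the cococaption toolkit; the sketch below only illustrates the filtering step, with made-up predictions.

```python
# Sketch of the "filtered" NLG-score idea: explanations are evaluated only
# on samples whose predicted answer matches the ground truth. Data below
# is illustrative; real scoring goes through cococaption.
def filter_predictions(preds, gts):
    """Keep only (id -> explanation) pairs where the answer is correct."""
    kept = {}
    for qid, (answer, explanation) in preds.items():
        if answer == gts[qid]:
            kept[qid] = explanation
    return kept

preds = {"q1": ("tennis", "he holds a racket"),
         "q2": ("soccer", "there is a ball")}
gts = {"q1": "tennis", "q2": "baseball"}
print(filter_predictions(preds, gts))  # only q1 survives the filter
```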
Hi, I was hoping to use this model to explain predictions for a task different from those in the paper, which will require me to prepare explanations myself as well. However, the prepare_data folder referenced in the explain_predict folder is empty. Will instructions for the authors' specific steps in this process be coming soon? Thanks!
Hi @fawazsammani.
First of all thank you once again for providing the tutorial for single image usage.
I was playing around with the model and I am curious about one thing.
Together with the textual explanation, we can also obtain a visual explanation via the attention map.
So, I was wondering whether this visual explanation is computed over the entire output sentence (= answer + explanation).
Is it possible to split the visual attention map into two different images: one that focuses on the classification part (the answer) and another that focuses on the explanation part? The first would highlight the regions important for the answer prediction, and the second would show only the regions most important for the explanation.
Sorry for my lack of knowledge regarding this problem.
Thank you for your time.
Best wishes.
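If per-token attention maps are available at generation time, one plausible way to get the answer/explanation split asked about above is to average the maps over each token span. This is a sketch under that assumption; the token spans, map sizes, and values below are toy illustrations, not the model's actual attention.

```python
# Hedged sketch: split token-wise attention maps into an "answer" map and
# an "explanation" map by averaging over each token span. Toy 2x2 maps.
def mean_map(maps):
    """Element-wise mean of a list of equally sized 2D attention maps."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]

def split_attention(maps, answer_len):
    """One map per generated token; the first answer_len tokens are the answer."""
    return mean_map(maps[:answer_len]), mean_map(maps[answer_len:])

# Two answer tokens and one explanation token, each with a 2x2 map
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 1.0], [0.0, 0.0]],
        [[0.0, 0.0], [1.0, 1.0]]]
ans_map, exp_map = split_attention(maps, answer_len=2)
print(ans_map)  # [[0.5, 0.5], [0.0, 0.0]]
print(exp_map)  # [[0.0, 0.0], [1.0, 1.0]]
```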
Hi @fawazsammani,
Thank you for sharing the code!
I have a small issue when training the model on VQA-X. The training script complains that the file cococaption/annotations/vqaX_test_annot_exp.json
is missing. Indeed, there is no such file in that folder (cf. https://github.com/ruotianluo/coco-caption/tree/ea20010419a955fed9882f9dcc53f2dc1ac65092/annotations). Could you help me with this issue? Thank you.
Below is the full error message:
Evaluation: Finished 1967/1968
loading annotations into memory...
Traceback (most recent call last):
  File "/data/lee/home/Projects/nlxgpt/vqaX.py", line 595, in <module>
    filter_and_get_scores(
  File "/data/lee/home/Projects/nlxgpt/vqaX.py", line 126, in filter_and_get_scores
    coco = COCO(annFileExp)
  File "/data/lee/home/Projects/nlxgpt/cococaption/pycocotools/coco.py", line 76, in __init__
    dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'cococaption/annotations/vqaX_test_annot_exp.json'
Traceback (most recent call last):
  File "/export/home/lee/miniconda3/envs/nlx-gpt/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/export/home/lee/miniconda3/envs/nlx-gpt/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/export/home/lee/miniconda3/envs/nlx-gpt/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/export/home/lee/miniconda3/envs/nlx-gpt/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/export/home/lee/miniconda3/envs/nlx-gpt/bin/python3.9', 'vqaX.py']' returned non-zero exit status 1.
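Because the traceback above only fires at evaluation time (after training work has already been done), one defensive option is to verify the annotation file exists before launching training. A minimal sketch, using the path from the traceback:

```python
import os

# Sketch: fail fast if the evaluation annotation file is missing, so the
# error surfaces before training rather than at the end of an epoch.
def check_annotations(path):
    """Raise early if the annotation file used by evaluation is absent."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; download the vqaX_* annotation files from "
            "the NLX-GPT repo's annotations link before training."
        )
    return path

try:
    check_annotations("cococaption/annotations/vqaX_test_annot_exp.json")
except FileNotFoundError as e:
    print(e)
```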