Comments (8)
@dschaehi I will provide the splits and pretrain script tonight.
You cannot simply pretrain the model on the whole dataset, because the VQA-X test set is taken from COCO images (and possibly Visual Genome, about 50% of which is COCO), and the e-SNLI-VE test set is taken from Flickr30k. Both COCO and Flickr30k are used for pretraining.
Pretrained VL models have always excluded these test images from the pretraining dataset, because finetuning uses the same dataset, just in a different way. For example, it is clearly wrong to pretrain a VL model with the masked language modelling objective, where the model sees the whole caption (except the randomly chosen masked words), and then later finetune this VL model on the image captioning task: the pretraining step has already seen the test captions that the finetuned model is supposed to predict. In summary, when the same dataset is used for pretraining and finetuning, regardless of the task, the finetuning test set should be excluded from the pretraining dataset.
In our case, the pretraining dataset (image captioning) is completely different from the finetuning dataset (Natural Language Explanations); only the images are shared. Whether it is fair to use the finetuning test images during pretraining is debatable. But the general principle is that the test set should be something the model has never seen before and has no idea about. Essentially, allowing the model to learn about these finetuning NLE test images through a different task (e.g. image captioning) distills knowledge about these images into the pretrained model. Therefore, pretraining with the NLE test images is wrong, and we avoided this.
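The filtering step described above can be sketched as follows. This is a minimal illustration, not the actual NLX-GPT code: the file paths, the flat list-of-dicts annotation layout, and the `image_id` field name are all assumptions about the annotation schema.

```python
import json

# Hypothetical paths -- the real NLX-GPT annotation files may be named differently.
PRETRAIN_CAPTIONS = "annotations/captions_pretrain.json"          # COCO/Flickr30k captions
NLE_TEST_IMAGE_IDS = "annotations/nle_test_image_ids.json"        # VQA-X / e-SNLI-VE test image ids
FILTERED_OUT = "annotations/filtered_captions_pretrain.json"


def filter_pretrain(captions_path, test_ids_path, out_path):
    """Drop every pretraining caption whose image appears in an NLE test split."""
    with open(test_ids_path) as f:
        test_ids = set(json.load(f))
    with open(captions_path) as f:
        captions = json.load(f)
    # Keep only captions of images the finetuning test sets never use.
    kept = [c for c in captions if c["image_id"] not in test_ids]
    with open(out_path, "w") as f:
        json.dump(kept, f)
    return len(kept), len(captions)
```

Running `filter_pretrain(PRETRAIN_CAPTIONS, NLE_TEST_IMAGE_IDS, FILTERED_OUT)` would then produce the leakage-free pretraining annotations.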
Hope this is clear now.
Regards
from nlxgpt.
Hi @fawazsammani, thank you again for your answer!
@dschaehi I will provide the splits and pretrain script tonight.
This is great. Thanks!
In our case, the pretraining dataset (image captioning) is completely different from the finetuning dataset (Natural Language Explanations); only the images are shared. Whether it is fair to use the finetuning test images during pretraining is debatable.
I find this a bit confusing to follow. If I understand correctly, only the images from the fine-tuning datasets are shared with the pre-training datasets, which is OK (but debatable) because they are used for two different tasks, i.e., image captioning vs. NLE?
@dschaehi correct, but we avoid this.
Regards
Hi @fawazsammani, thanks for the clarification so far.
In your first reply to my question, you said you would provide the pre-training script on the evening of the day you replied. If you haven't uploaded the script yet, could you do so soon? It would be very helpful for reproducing the results and for learning more about the details of the pre-training step. Thanks!
Hi again @dschaehi
I'm really sorry, I had forgotten to post it last time.
I'm currently on vacation, and unfortunately I do not have my office computer with me...
I will be back on Friday and will post it then.
However, if you need the pretrained model, it is already available in the Models section. I see no need to train it again and waste computational resources when we already did :)
Regards
Fawaz
Hi @fawazsammani,
Thanks for getting back to this. Please enjoy your vacation first. I am just interested in how such pre-training works in general, as I'd like to develop a new model as well.
Regardless, I think fully reproducible code should contain all the steps: pre-training, hyper-parameter tuning, fine-tuning, random seeds, etc.
Hello @dschaehi ,
Sorry for the delay again. I have now uploaded the pretrain script. The pretrain annotations are also here. As mentioned in the earlier discussion, we use the "filtered" annotations, with the prefix filtered_. The split sizes are also provided and compared in a txt file. I am also uploading the unfiltered annotations in case you need them for a project different from NLE (one that does not share images between pretraining and finetuning). Please also note that for e-SNLI-VE we do not use the pretrained model as initialization for the finetuning, so the complete Flickr30k dataset can be included in the pretraining as well.
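As a sanity check on the filtered_ annotations, one could verify that no NLE test image leaks into the pretraining split. A minimal sketch (the function name and the flat lists of image ids are assumptions, not part of the released code):

```python
def find_leaked_images(pretrain_image_ids, nle_test_image_ids):
    """Return the NLE test images that also appear in the pretraining split.

    An empty result means the filtering removed every NLE test image,
    which is the property the filtered_ annotations are meant to guarantee.
    """
    return set(pretrain_image_ids) & set(nle_test_image_ids)


# Toy example: image 3 appears in both splits, so it is reported as leaked.
assert find_leaked_images([1, 2, 3], [3, 4]) == {3}
assert find_leaked_images([1, 2], [3, 4]) == set()
```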
Feel free to reopen this issue if you have any other doubts.
Regards
Fawaz
Great! Thank you very much!