
Comments (8)

fawazsammani commented on August 11, 2024

@dschaehi I will provide the splits and pretrain script tonight.
You cannot simply pretrain the model on the whole dataset. Because VQA-X test set is taken from COCO images (and possibly Visual Genome in which 50% of it is COCO), and e-SNLI-VE test set is taken from Flickr30k. Both COCO and Flickr are used for pretraining.

Pretrained VL models have always excluded these test images from the pretraining dataset, because finetuning uses the same dataset, just in a different way. For example, it is absolutely wrong to pretrain a VL model with the masked language modelling objective, where the model sees the whole caption (except the randomly chosen masked words), and then later finetune this VL model on the image captioning task: the pretraining step has already seen the test caption that the finetuned model is supposed to predict. In summary, when the same dataset is used for pretraining and finetuning, regardless of the task, the finetuning test set should be excluded from the pretraining dataset.

In our case, the pretraining dataset (image captioning) is completely different from the finetuning dataset (Natural Language Explanations); it's just that the images are shared. Whether or not it is fair to use the finetuning test images during pretraining is debatable. The general principle, though, is that the test set should be something the model has never seen and knows nothing about. Essentially, letting the model learn about these finetuning NLE test images through a different task (e.g. image captioning) distills knowledge about them into the pretrained model. Therefore, pretraining on the NLE test images is wrong, and we avoided it.
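The exclusion step described above can be sketched as follows. This is only an illustration, not the released script: the annotation layout (dicts carrying an `image_id` key) is an assumption modeled on COCO-style caption files.

```python
# Sketch of excluding finetuning test images from the pretraining captions.
# The "image_id" field is an assumed COCO-style layout, not the actual
# NLX-GPT annotation format.

def excluded_image_ids(test_sets):
    """Union of image ids over all finetuning test annotation lists."""
    return set().union(*({ann["image_id"] for ann in s} for s in test_sets))

def filter_pretraining(captions, test_sets):
    """Drop every pretraining caption whose image occurs in any test set."""
    banned = excluded_image_ids(test_sets)
    return [ann for ann in captions if ann["image_id"] not in banned]

# Example: captions for images 1-3; images 2 and 3 appear in the
# (hypothetical) VQA-X and e-SNLI-VE test splits, so only image 1 survives.
captions = [{"image_id": i, "caption": c} for i, c in [(1, "a"), (2, "b"), (3, "c")]]
vqax_test = [{"image_id": 2}]
esnlive_test = [{"image_id": 3}]
clean = filter_pretraining(captions, [vqax_test, esnlive_test])
```

The same id-based filtering applies whether the shared pool is COCO, Visual Genome, or Flickr30k: only the image identity matters, not the task the annotation belongs to.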

Hope this is clear now.
Regards

from nlxgpt.

dschaehi commented on August 11, 2024

Hi @fawazsammani, thank you again for your answer!

@dschaehi I will provide the splits and pretrain script tonight.

This is great. Thanks!

In our case, the pretraining dataset (image captioning) is completely different with the finetuning dataset (Natural Language Explanations), it's just that the images are shared. Whether or not it is fair to use the finetuning test images during pretraining is a debate.

I find this a bit hard to follow. If I understand correctly, only the images from the fine-tuning datasets are shared with the pre-training datasets, which is OK (but debatable) because they are used for two different tasks, i.e., image captioning vs. NLE?


fawazsammani commented on August 11, 2024

@dschaehi correct, but we avoid this.

Regards


dschaehi commented on August 11, 2024

Hi @fawazsammani, thanks for the clarification so far.
In your first reply to my question, you mentioned you would provide the pre-training script that same night. If you haven't uploaded it yet, could you do so soon? It would be very helpful for reproducing the results and for learning more about the details of the pre-training step. Thanks!


fawazsammani commented on August 11, 2024

Hi again @dschaehi
I'm really sorry, I forgot to post it last time.
I'm currently on vacation and unfortunately do not have my office computer with me.
I will be back on Friday and post it right away.

However, if you only need the pretrained model, it is already available in the Models section. I see no need to train it again and waste computational resources when we already have :)

Regards
Fawaz


dschaehi commented on August 11, 2024

Hi @fawazsammani,
Thanks for getting back to this. Please enjoy your vacation first. I am just interested in how such pre-training works in general, as I'd like to develop a new model as well.
Regardless, I think fully reproducible code should contain all the steps: pre-training, hyper-parameter tuning, fine-tuning, random seeds, etc.


fawazsammani commented on August 11, 2024

Hello @dschaehi ,
Sorry again for the delay. I have now uploaded the pretrain script, and the pretrain annotations are also here. As mentioned in the earlier discussion, we use the "filtered" annotations, which carry the prefix filtered_. The split sizes are also provided and compared in a txt file. I am also uploading the unfiltered annotations in case you need them for a project other than NLE (one that does not share images between pretraining and finetuning). Please also note that for e-SNLI-VE we do not use the pretrained model as initialization for finetuning, so the complete Flickr30k dataset can be included in pretraining as well.
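A quick sanity check on any filtered_ annotation file might look like the following sketch (again assuming COCO-style entries with an `image_id` key; this helper is not part of the released code):

```python
def leaked_image_ids(filtered_annotations, test_annotations):
    """Test image ids that still appear in a filtered pretraining split.

    An empty result means the filtered_ split is clean; anything else
    indicates leakage between pretraining and the NLE test set.
    """
    pretrain_ids = {a["image_id"] for a in filtered_annotations}
    test_ids = {a["image_id"] for a in test_annotations}
    return pretrain_ids & test_ids

# A clean split shares no image ids with the test set.
leak = leaked_image_ids([{"image_id": 10}, {"image_id": 11}], [{"image_id": 99}])
```

Running such a check against both the VQA-X and e-SNLI-VE test splits would confirm the split sizes reported in the txt file correspond to a leakage-free pretraining set.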

Feel free to reopen this issue if you have any other questions.

Regards
Fawaz


dschaehi commented on August 11, 2024

Great! Thank you very much!

