
Comments (8)

fawazsammani commented on August 11, 2024

@dschaehi I will provide the splits and pretrain script tonight.
You cannot simply pretrain the model on the whole dataset. Because VQA-X test set is taken from COCO images (and possibly Visual Genome in which 50% of it is COCO), and e-SNLI-VE test set is taken from Flickr30k. Both COCO and Flickr are used for pretraining.

Pretrained VL models have always excluded these test images from the pretraining dataset, because finetuning uses the same dataset, just in a different way. For example, it is absolutely wrong to pretrain a VL model with the masked language modelling objective, where the model sees the whole caption (except the randomly chosen masked words), and then later finetune this VL model on the image captioning task: the pretraining step has already seen the test caption that the finetuned model is supposed to predict. In summary, when the same dataset is used for pretraining and finetuning, regardless of the task, the finetuning test set should be excluded from the pretraining dataset.

In our case, the pretraining dataset (image captioning) is completely different from the finetuning dataset (Natural Language Explanations); it's just that the images are shared. Whether or not it is fair to use the finetuning test images during pretraining is debatable. The general principle, though, is that the test set should be something the model has never seen and knows nothing about. Essentially, letting the model learn about these finetuning NLE test images through a different task (e.g. image captioning) distills knowledge about them into the pretrained model. Therefore, pretraining on the NLE test images is wrong, and we avoided it.
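The exclusion step described above can be sketched as follows. This is only an illustration, not the released script: the annotation layout (dicts carrying an `image_id` key) is an assumption modeled on COCO-style caption files.

```python
# Sketch of excluding finetuning test images from the pretraining captions.
# The "image_id" field is an assumed COCO-style layout, not the actual
# NLX-GPT annotation format.

def excluded_image_ids(test_sets):
    """Union of image ids over all finetuning test annotation lists."""
    return set().union(*({ann["image_id"] for ann in s} for s in test_sets))

def filter_pretraining(captions, test_sets):
    """Drop every pretraining caption whose image occurs in any test set."""
    banned = excluded_image_ids(test_sets)
    return [ann for ann in captions if ann["image_id"] not in banned]

# Example: captions for images 1-3; images 2 and 3 appear in the
# (hypothetical) VQA-X and e-SNLI-VE test splits, so only image 1 survives.
captions = [{"image_id": i, "caption": c} for i, c in [(1, "a"), (2, "b"), (3, "c")]]
vqax_test = [{"image_id": 2}]
esnlive_test = [{"image_id": 3}]
clean = filter_pretraining(captions, [vqax_test, esnlive_test])
```

The same id-based filtering applies whether the shared pool is COCO, Visual Genome, or Flickr30k: only the image identity matters, not the task the annotation belongs to.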

Hope this is clear now.
Regards

from nlxgpt.

dschaehi commented on August 11, 2024

Hi @fawazsammani, thank you again for your answer!

@dschaehi I will provide the splits and pretrain script tonight.

This is great. Thanks!

In our case, the pretraining dataset (image captioning) is completely different with the finetuning dataset (Natural Language Explanations), it's just that the images are shared. Whether or not it is fair to use the finetuning test images during pretraining is a debate.

I find this a bit hard to follow. If I understand correctly, only the images from the fine-tuning datasets are shared with the pre-training datasets, which is OK (but debatable) because they are used for two different tasks, i.e., image captioning vs. NLE?


fawazsammani commented on August 11, 2024

@dschaehi correct, but we avoid this.

Regards


dschaehi commented on August 11, 2024

Hi @fawazsammani, thanks for the clarification so far.
In your first reply to my question, you mentioned you would provide the pre-training script that same night. If you haven't uploaded it yet, could you do so soon? It would be very helpful for reproducing the results and for learning more about the details of the pre-training step. Thanks!


fawazsammani commented on August 11, 2024

Hi again @dschaehi
I'm really sorry, I forgot to post it last time.
I'm currently on vacation and unfortunately do not have my office computer with me.
I will be back on Friday and post it right away.

However, if you only need the pretrained model, it is already available in the Models section. I see no need to train it again and waste computational resources when we already have :)

Regards
Fawaz


dschaehi commented on August 11, 2024

Hi @fawazsammani,
Thanks for getting back to this. Please enjoy your vacation first. I am just interested in how such pre-training works in general, as I'd like to develop a new model as well.
Regardless, I think fully reproducible code should contain all the steps: pre-training, hyper-parameter tuning, fine-tuning, random seeds, etc.


fawazsammani commented on August 11, 2024

Hello @dschaehi ,
Sorry again for the delay. I have now uploaded the pretrain script, and the pretrain annotations are also here. As mentioned in the earlier discussion, we use the "filtered" annotations, which carry the prefix filtered_. The split sizes are also provided and compared in a txt file. I am also uploading the unfiltered annotations in case you need them for a project other than NLE (one that does not share images between pretraining and finetuning). Please also note that for e-SNLI-VE we do not use the pretrained model as initialization for finetuning, so the complete Flickr30k dataset can be included in pretraining as well.
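A quick sanity check on any filtered_ annotation file might look like the following sketch (again assuming COCO-style entries with an `image_id` key; this helper is not part of the released code):

```python
def leaked_image_ids(filtered_annotations, test_annotations):
    """Test image ids that still appear in a filtered pretraining split.

    An empty result means the filtered_ split is clean; anything else
    indicates leakage between pretraining and the NLE test set.
    """
    pretrain_ids = {a["image_id"] for a in filtered_annotations}
    test_ids = {a["image_id"] for a in test_annotations}
    return pretrain_ids & test_ids

# A clean split shares no image ids with the test set.
leak = leaked_image_ids([{"image_id": 10}, {"image_id": 11}], [{"image_id": 99}])
```

Running such a check against both the VQA-X and e-SNLI-VE test splits would confirm the split sizes reported in the txt file correspond to a leakage-free pretraining set.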

Feel free to reopen this issue if you have any other questions.

Regards
Fawaz


dschaehi commented on August 11, 2024

Great! Thank you very much!

