When I train only the transformer mapping network, I find that the dimension of x is (40,

A question about the x dimension. (clip_prefix_caption, closed)

rmokady commented on July 24, 2024

A question about the x dimension.

from clip_prefix_caption.

Comments (7)

rmokady commented on July 24, 2024

Hi @tianjunyu0871,
There are two versions of CLIP (ResNet and ViT).
Their encoding sizes differ: 640 for ResNet and 512 for ViT.
I assume this is your issue.

It should be solvable by using different command-line arguments.

Is it helpful?
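
For readers hitting the same mismatch, here is a minimal sketch of keeping the encoder choice and the expected embedding width in sync. It assumes the two CLIP variants are RN50x4 (640-dimensional) and ViT-B/32 (512-dimensional); the helper names `select_clip` and `check_embedding_shape` are hypothetical and are not taken from the repository.

```python
# Assumed embedding widths for the two CLIP image encoders (not stated in this thread).
CLIP_EMBEDDING_DIM = {"RN50x4": 640, "ViT-B/32": 512}


def select_clip(is_rn):
    """Return (clip_model_type, prefix_dim) for a boolean ResNet switch."""
    clip_model_type = "RN50x4" if is_rn else "ViT-B/32"
    return clip_model_type, CLIP_EMBEDDING_DIM[clip_model_type]


def check_embedding_shape(shape, is_rn):
    """Raise early if stored CLIP embeddings do not match the chosen encoder."""
    _, prefix_dim = select_clip(is_rn)
    if shape[-1] != prefix_dim:
        raise ValueError(f"expected last dimension {prefix_dim}, got {shape[-1]}")
    return True


print(select_clip(is_rn=False))
print(check_embedding_shape((40, 512), is_rn=False))
```

Running a check like this before training surfaces a ResNet/ViT mix-up as a clear message rather than a shape error deep inside the mapping network.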

tianjunyu0871 commented on July 24, 2024

Thanks for your reply.
Does the parameter is_rn stand for ResNet? The following command also includes is_rn. Is that a typo?

In addition, could you share the pretrained MLP weights and the evaluation code? Thank you so much!

rmokady commented on July 24, 2024

Yes, this is an error.
Thank you very much for pointing it out.
I will fix it ASAP.

We use the same evaluation code as the OSCAR repository,
just replacing the JSON files with our JSONs.

We have already shared the MLP weights; see the "Inference Notebooks" section in the README.

tianjunyu0871 commented on July 24, 2024

I tried to modify the prediction code, and the following error occurred while loading the pretrained Transformer weights.

I don't know whether the problem is in my code. Could you share your code for prediction with the Transformer? Thank you very much!

rmokady commented on July 24, 2024

Prediction with the transformer is available in this notebook.

tianjunyu0871 commented on July 24, 2024

I have learned a lot from your work, but I still have a few questions and hope you can answer them.
First question: I tried removing the stop token, but the results were not good. Is there a good way to generate more than one sentence?
Second question: Have you tried different GPT models, such as GPT2-medium or GPT2-large? Is the difference significant?
Third question: What does the prefix_length_clip parameter mean in training?
Looking forward to your reply. Thank you very much!

rmokady commented on July 24, 2024

To generate more than one sentence, you should replace the inference algorithm (e.g. with beam search).
Using a variant of beam search, you can produce several different captions.
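
As an illustration of that suggestion, below is a generic beam search sketch that returns several complete hypotheses instead of one. It is not the repository's inference code: `toy_model` is a hypothetical stand-in for a language model that returns next-token log-probabilities, and the tokens and probabilities are made up for the example.

```python
import math


def beam_search(next_log_probs, start, stop_token, beam_size=3, max_len=10):
    """Return up to beam_size (tokens, log_prob) hypotheses, best first."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for token, logp in next_log_probs(tokens).items():
                candidates.append((tokens + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best beam_size candidates; set aside those that have ended.
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == stop_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_size]


def toy_model(tokens):
    """Hypothetical next-token log-probabilities keyed by the last token."""
    table = {
        "<s>": {"a": math.log(0.6), "the": math.log(0.4)},
        "a": {"dog": math.log(0.5), "cat": math.log(0.5)},
        "the": {"dog": math.log(0.9), "cat": math.log(0.1)},
        "dog": {".": 0.0},
        "cat": {".": 0.0},
    }
    return table[tokens[-1]]


results = beam_search(toy_model, "<s>", ".", beam_size=3)
for tokens, score in results:
    print(" ".join(tokens[1:]), round(math.exp(score), 2))
```

Note that greedy decoding would start with "a" (probability 0.6) and never reach the overall best sequence, which begins with "the"; keeping several beams alive is what yields both better and more varied captions.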

We haven't tried to use different GPT models.

prefix_length_clip controls, for the transformer mapping network, the size (in tokens) of the CLIP embedding, since part of the prefix is a learned constant.
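
One way to read this is as shape bookkeeping. The sketch below assumes the CLIP embedding is projected to prefix_length_clip tokens, concatenated with prefix_length learned constant tokens, and that only the constant positions are kept as the prefix passed to GPT-2. That last step, the function name, and the example sizes are assumptions for illustration and are not confirmed in this thread.

```python
def transformer_mapper_shapes(batch_size, clip_dim, gpt_dim,
                              prefix_length, prefix_length_clip):
    """Return the tensor shapes at each stage of the assumed mapping network."""
    return {
        # One CLIP vector per image.
        "clip_embedding": (batch_size, clip_dim),
        # CLIP vector projected and reshaped into prefix_length_clip tokens.
        "clip_tokens": (batch_size, prefix_length_clip, gpt_dim),
        # Learned constant tokens, shared across the batch.
        "learned_const": (batch_size, prefix_length, gpt_dim),
        # Both groups are concatenated along the token axis.
        "transformer_input": (batch_size, prefix_length_clip + prefix_length, gpt_dim),
        # Assumption: only the constant positions are passed on to GPT-2.
        "prefix_for_gpt": (batch_size, prefix_length, gpt_dim),
    }


# Hypothetical sizes: batch of 40, ViT embeddings (512), GPT-2 small width (768).
shapes = transformer_mapper_shapes(40, 512, 768, prefix_length=10, prefix_length_clip=10)
for name, shape in shapes.items():
    print(name, shape)
```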
