
x-clip's People

Contributors

xmu-xiaoma666


x-clip's Issues

The use of transpose in sentence-frame score

sentence_frame_logits = logit_scale * torch.sum(torch.matmul(sentence_output, frame_features.permute(0, 2, 1)) \

Hi, thank you for the wonderful work!

My question is about the use of .t() in the sentence-frame score. It seems that this transpose turns the original [bs_text, bs_video] into [bs_video, bs_text], which makes this score inconsistent with the other scores. Is my understanding correct?

Thanks! Hope to discuss with you!
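One way to check the axis order is to run the matmul on small dummy tensors with distinct sizes. The sketch below is not the repository's code: the shapes of sentence_output and frame_features are assumptions, and a plain mean over frames stands in for whatever aggregation follows in the truncated line. It relies only on the documented broadcasting behaviour of torch.matmul for a 2-D by 3-D product.

```python
import torch

# Hypothetical sizes, chosen to be distinct so every axis is easy to track.
bs_text, bs_video, num_frames, dim = 3, 5, 4, 8

# Assumed shapes (not confirmed by the issue text).
sentence_output = torch.randn(bs_text, dim)              # [bs_text, dim]
frame_features = torch.randn(bs_video, num_frames, dim)  # [bs_video, num_frames, dim]

# A 2-D @ 3-D matmul broadcasts the 2-D operand over the batch axis of the
# 3-D operand, so the video axis comes first in the result.
raw = torch.matmul(sentence_output, frame_features.permute(0, 2, 1))
print(tuple(raw.shape))  # (bs_video, bs_text, num_frames)

# Stand-in aggregation over frames (assumption: a simple mean).
pooled = raw.mean(dim=-1)    # [bs_video, bs_text]
sentence_frame = pooled.t()  # [bs_text, bs_video]
print(tuple(pooled.shape), tuple(sentence_frame.shape))
```

Under these assumed shapes the product is video-first before the transpose, so .t() would bring it to [bs_text, bs_video]. Whether that matches the actual code depends on the real tensor shapes in the repository.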

SeqTransf & meanP

Dear Author,

I really appreciate and am fascinated by your work, and I am thankful that you released your code.

I know that CLIP4Clip + meanP performs best among the CLIP4Clip variants, ahead of seqTransf, seqLSTM, and tightTransf.

But I found that the .sh scripts in your repository always recommend seqTransf.

Is there any special reason why "sim_header == seqTransf" is the default setting?

I looked at your Table 2 on MSVD, where X-CLIP (ViT-B/32) records an R@1 score of 47.1.
Does that mean X-CLIP with seqTransf is better than the other modes (meanP, tightTransf)?
I cannot find which sim_header produced the scores in that table.

If X-CLIP + seqTransf is recommended anyway,
is there any special reason why seqTransf outperforms meanP, unlike in CLIP4Clip?

Sincerely,

Question about ablation study of the different contrastive modules in Tab. 6

Hi, thanks for your nice, open-sourced work.

I have some questions about the experimental setup in Tab. 6.

  • Are all the experimental results obtained through retraining, or are the results of Exp1-14 obtained only by running inference with Exp15's checkpoint?

In my experiments, there is an obvious drop when I only run inference with Exp15's checkpoint. So I guess all the experimental results were obtained through retraining. I hope you can confirm this.

Poor performance when reproducing the model on ActivityNet

Due to the huge size of the original dataset, I extracted images from the original videos at FPS=1 and trained CLIP4Clip (meanP) on 8 RTX 3090 GPUs. Due to GPU memory constraints, I set gradient_accumulation_steps=2.
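One thing worth checking with this setup: for an in-batch contrastive loss, gradient accumulation is not the same as a larger batch, because each micro-batch only contrasts against its own negatives. The sketch below counts mismatched (text, video) pairs under two assumptions that are not taken from the post: a hypothetical total batch size of 128, and a training script that splits that batch into gradient_accumulation_steps micro-batches. It also ignores any gathering of features across GPUs.

```python
def in_batch_negatives(batch_size: int) -> int:
    """Mismatched (text, video) pairs in one batch_size x batch_size similarity matrix."""
    return batch_size * (batch_size - 1)


def negatives_with_accumulation(total_batch: int, accumulation_steps: int) -> int:
    """Negatives seen per optimizer step when the batch is split into micro-batches."""
    micro_batch = total_batch // accumulation_steps
    return accumulation_steps * in_batch_negatives(micro_batch)


total_batch = 128  # hypothetical value, not from the post

print(in_batch_negatives(total_batch))              # 16256
print(negatives_with_accumulation(total_batch, 2))  # 8064
```

With these assumed numbers, accumulation over 2 steps sees roughly half as many negative pairs per optimizer step, which may be one factor among several behind a lower R@1.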

The captions were downloaded from https://cs.stanford.edu/people/ranjaykrishna/densevid/.

I first tried to reproduce the results of CLIP4Clip (meanP, ViT-B/32) on ActivityNet and got R@1=37.9, which is much worse than the 40.5 reported in Table 5.
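When comparing against a reported number, it can help to confirm that R@1 is computed the same way. Below is a minimal sketch of text-to-video Recall@K from a similarity matrix. It assumes that text i is paired with video i and that ties do not count against the correct video. The function name and the toy matrix are illustrative only and are not taken from the repository.

```python
def recall_at_k(sim, k):
    """Percentage of texts whose paired video ranks in the top k.

    sim[i][j] is the similarity between text i and video j;
    the ground-truth video for text i is assumed to be video i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the correct video = number of videos scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy 3x3 similarity matrix (made-up values).
sim = [
    [0.9, 0.1, 0.2],  # text 0: correct video ranks first
    [0.3, 0.2, 0.8],  # text 1: correct video ranks third
    [0.1, 0.7, 0.6],  # text 2: correct video ranks second
]

print(round(recall_at_k(sim, 1), 2))  # 33.33
print(round(recall_at_k(sim, 2), 2))  # 66.67
print(recall_at_k(sim, 3))            # 100.0
```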

Do the authors have any useful experience with this issue? Thanks very much!

Finetuned model weights?

Hello! I was wondering if you have the final model weights after you finished training the model. I know you initialize with the CLIP weights, but it would be super helpful to have the final model weights as well. Thank you!
