lucas-ventura / covr

Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".

Home Page: https://imagine.enpc.fr/~ventural/covr/

License: MIT License

Python 96.19% Shell 3.81%

composed-image-retrieval composed-video-retrieval

covr's Issues

Discrepancy in CIRR test set

Hi,

I checked the updated results in #15 but cannot reproduce similar numbers on the CIRR test set, although my validation-set numbers are very close.

I used this evaluation script and submitted the resulting files to the server:

python test.py test=cirr model/ckpt=cirr_ft-covr+gt

My numbers on the test set are:

Method	R@1	R@2	R@5	R@10	R@50	Recall_subset @ 1	Recall_subset @ 2	Recall_subset @ 3
Zero-shot	27.566	38.265	52.506	63.494	85.976	71.277	86.217	94.048
Train	38.241	51.205	67.518	78	94.217	77.277	91.036	96.337

My numbers on the validation set are:

Method	R@1	R@5	R@10	R@50	Recall_subset @ 1	Recall_subset @ 2	Recall_subset @ 3
Zero-shot	29.06	55.2	66.13	87.54	72.73	87.39	93.51
Train	41.55	70.08	79.91	94.16	78.51	91.75	95.93

Could you provide your json files for CIRR test?

Wget issue.

Don't know if this is an issue on my end but wget is having this problem when I run it:

bash tools/scripts/download_pretrained_models.sh
Select the model to download:
1) All
2) WebVid-CoVR
3) CIRR
4) FashionIQ
Press Enter for default (All)
Enter your choice (1/2/3/4):
The ckpt_4.ckpt checkpoint already exists in outputs/webvid-covr/blip-large/blip-l-coco/tv-False_loss-hnnce_lr-1e-05/good/.
Do you want to overwrite? [y/N]: y
Downloading ckpt_4.ckpt checkpoint...
wget: unrecognized option '--show-progress'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
Download failed.
The ckpt_5.ckpt checkpoint already exists in outputs/cirr/blip-large/webvid-covr/tv-False_loss-hnnce_lr-0.0001/base/.
Do you want to overwrite? [y/N]: y
Downloading ckpt_5.ckpt checkpoint...
wget: unrecognized option '--show-progress'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
Download failed.
The ckpt_5.ckpt checkpoint already exists in outputs/fashioniq-all/blip-large/webvid-covr/tv-False_loss-hnnce_lr-0.0001/base.
Do you want to overwrite? [y/N]: y
Downloading ckpt_5.ckpt checkpoint...
wget: unrecognized option '--show-progress'
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
Download failed.

I don't know if this is a problem on my end, but removing --show-progress did fix it.
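
A possible way to keep the progress bar where it is available, while still working with wget builds that reject the flag, is to check the output of `wget --help` before building the command. The following is a minimal Python sketch of that idea, not the repository's download script; the helper names `supports_show_progress`, `build_wget_command`, and `read_wget_help` are hypothetical.

```python
import subprocess


def supports_show_progress(help_text: str) -> bool:
    """Return True if the wget help text advertises --show-progress."""
    return "--show-progress" in help_text


def build_wget_command(url: str, output_path: str, help_text: str) -> list:
    """Build a wget command, adding --show-progress only when supported."""
    cmd = ["wget", "-q", url, "-O", output_path]
    if supports_show_progress(help_text):
        # Insert the flag right after the program name.
        cmd.insert(1, "--show-progress")
    return cmd


def read_wget_help() -> str:
    """Return the output of `wget --help`, or "" if wget cannot be run."""
    try:
        result = subprocess.run(
            ["wget", "--help"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""
    # Some builds print their usage text to stderr, so both streams are kept.
    return result.stdout + result.stderr
```

A caller would pass `read_wget_help()` as `help_text` and run the returned list with `subprocess.run`. Simply deleting the flag from the script, as described above, is the simpler fix if the progress bar is not needed.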

Question about the Increase from 1.2M Paired Videos to 1.6M Triplets

Thank you for sharing your research results.
I have a question related to the data generation process.

According to the paper, after going through the "Filtering caption pairs" step, 1.2M paired videos remained, and modifications were created using them. Subsequently, after filtering the video pairs, a total of 1.6M triplets were produced.

The count has increased by 0.4M compared to the paired videos. Could you explain how this happened?
My guess is that the pairs were used bidirectionally to create triplets (1.2M → 2.4M), and then decreased after filtering (2.4M → 1.6M).

Your clarification on this would be greatly appreciated!
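
The guess above can be checked with simple arithmetic. The sketch below only works through the figures quoted in this question; the bidirectional-use hypothesis is the asker's assumption, not a confirmed description of the data pipeline.

```python
# Figures quoted in the question, in millions.
paired_videos = 1.2
final_triplets = 1.6

# Hypothesis from the question: each pair is used in both directions.
candidate_triplets = 2 * paired_videos

# Fraction of candidates that would have to survive the later filtering.
retention = final_triplets / candidate_triplets

print(f"candidates: {candidate_triplets:.1f}M, retention: {retention:.1%}")
```

Under this hypothesis, roughly two thirds of the candidate triplets would need to pass the video-pair filtering step.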

please provide feature file

Hi,
Thanks for your great work!
The dataset is too large for me to download. Could you provide the BLIP feature files?
Any reply would be helpful!

Could you add a requirements file?

It would help speed up setup if you could.

Question about the test of CIRR

When calculating the similarity between a query and images in CIRR, it seems that the entire image database only includes the reference image mentioned in the triplets, rather than considering all the images in the entire test set. This seems to be somewhat problematic.

your paper table4 issue

Obviously, 40.20 is larger than 39.05.

readme file error

To download the annotation files, the README says to run "bash tools/scripts/download_annotations.sh covr".
However, the script is actually named download_annotation.sh,
so the command should be "bash tools/scripts/download_annotation.sh covr",
with no "s" at the end of the script name.

FashionIQ fine-tuned weights

Hi, Thanks for your great work.

Are you planning on releasing fine-tuned checkpoints for FashionIQ?

Does batch size have a significant impact on performance?

hello.

I was so impressed with the amazing CoVR task and results proposed by the authors that I tried to reimplement the code.

The code worked well, but the results are quite lacking compared to the highlighted part of Table 2 below.

My results on Webvid dataset:

{'R1': 39.7868, 'R5': 64.722, 'R10': 74.2869, 'R50': 91.4578, 'R_mean': 59.5986} after 5 epochs.

Authors' results:

I know that batch size affects contrastive learning (NCE, HN-NCE, etc.), but I didn't expect this much of a difference. Have you checked how the recall scores vary with batch size? The paper mentions a batch size of 2048.

I ask because, with the limited GPU memory available to me (at most 48 GB of VRAM on a single A6000), I set the training batch size to 48, which seems to have caused a large performance drop.

Again, thanks for the great research and I look forward to your response.
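
To make the batch-size dependence concrete: with in-batch negatives, each query is contrasted against every other target in the same batch, so a batch of 48 gives 47 negatives per query while a batch of 2048 gives 2047. The sketch below is a plain InfoNCE loss in pure Python, used here only as an illustration; it is not the repository's HN-NCE implementation, and the function name and default temperature are assumptions.

```python
import math


def info_nce(sim, temperature=0.07):
    """Mean InfoNCE loss for a square similarity matrix.

    sim[i][j] is the similarity between query i and target j;
    the matching target for query i sits on the diagonal.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the diagonal entry as the positive.
        total += log_denom - logits[i]
    return total / n
```

Note that accumulating gradients over several small batches does not add negatives to this loss, since each step still only sees the targets in its own batch. Whether this accounts for the gap reported here is not established.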

[Question] Double checking but the Images for toptee, shirt, and dress shouldn't be separate right?

Double checking but the Images for toptee, shirt, and dress shouldn't be separate right?

Also if some of the links don't work will that cause problems or will the data just be less accurate? (Trying to get it to run first before trying to worry about accuracy).

Originally posted by @Agarciahunter in #13 (comment)

The inquiry about test CIRR dataset

Hi, thank you for this wonderful work.

I appreciate you providing the code.

I wonder how to calculate the recall performance on CIRR.

I ran the code, but it seems to save only the list of the top-50 most similar images and does not calculate the recall.

So I checked the annotations of CIRR test-1. However, there is no label for the target image, only 'members'.

How do I calculate the recall performance on CIRR?

Any help would be appreciated.
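
On a split where target labels are available (for example the validation set), Recall@K can be computed directly from ranked candidate lists. The sketch below assumes hypothetical inputs: `ranked_ids` holds one ranked list of candidate IDs per query, and `target_ids` holds the ground-truth target ID for each query. Since the test-1 annotations contain no target labels, as noted above, this cannot be applied to the test set locally; another issue in this list mentions submitting the prediction files to the server instead.

```python
def recall_at_k(ranked_ids, target_ids, ks=(1, 5, 10, 50)):
    """Percentage of queries whose target appears in the top-k ranking."""
    if len(ranked_ids) != len(target_ids) or not target_ids:
        raise ValueError("need one ranking per target and at least one query")
    results = {}
    for k in ks:
        hits = sum(
            1
            for ranking, target in zip(ranked_ids, target_ids)
            if target in ranking[:k]
        )
        results[f"R{k}"] = 100.0 * hits / len(target_ids)
    return results


# Toy example with two queries over a three-item gallery.
rankings = [["a", "b", "c"], ["c", "a", "b"]]
targets = ["b", "b"]
print(recall_at_k(rankings, targets, ks=(1, 2, 3)))
```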

The inquiry about test target video embedding vector

Hi! I'm Cheol-Ho Cho.

I have a question about your work, specifically about the test target video embedding vector.

I read in your paper that the test target video embedding vector is computed as a weighted mean, and that this helps boost performance.

However, the released code does not seem to compute a weighted mean.

Could you explain this in more detail?

Best regards, Cheol-Ho
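
For reference, a weighted mean over per-frame embeddings can be sketched as follows. The function name and the choice to normalise the weights so they sum to one are assumptions made for illustration; how the weights are obtained is described in the paper, and this is not the repository's code.

```python
def weighted_mean_embedding(frame_embeddings, weights):
    """Combine per-frame embeddings into one vector using frame weights."""
    if len(frame_embeddings) != len(weights) or not frame_embeddings:
        raise ValueError("need one weight per frame and at least one frame")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(frame_embeddings[0])
    pooled = [0.0] * dim
    for emb, w in zip(frame_embeddings, weights):
        for d in range(dim):
            # Each frame contributes in proportion to its normalised weight.
            pooled[d] += (w / total) * emb[d]
    return pooled
```

With equal weights this reduces to the ordinary mean over frames, which gives a simple way to compare the two pooling choices.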

readme issues

As I said, the commands "bash tools/scripts/download_annotations.sh cirr" and "bash tools/scripts/download_annotations.sh fiq" also have an extra "s" at the end of the script name.

fashionIQ datasets class issue

In configs/test/fashioniq-dress(shirt or toptee).yaml
there is:
targets: ${paths.work_dir}/annotation/fashion-iq/split.toptee.val.json

However, the split.toptee.val.json file doesn't exist.

Could I get split.toptee.val.json from you?

Running Cost of this repo (memory/GPU/time)

Thanks for your excellent open-source work!
How much time, how many GPUs, and how much memory are needed to reproduce the training process (e.g., 4 × NVIDIA A100 40 GB for 48 h)? Thanks!

What is the search space for a query in test set??

Hi,
I am slightly confused about the test set's search space. Given a query (Image/video and text) from the test set, what is the search space for this query? I am assuming that we are searching over all the possible target videos in the test set. I'd appreciate it if you could confirm this.
Thanks

Dataset download paths not working

Great work! I see that the URL path http://imagine.enpc.fr/~ventural/covr/dataset/webvid8m-covr_paths-test.json is not working. Could you please update it?

Question about training multiple frames and fusion mechanism

Hi, thank you for your great work.

I ran the code and noticed that it only supports single-frame training and is missing the fusion mechanisms (MLP, CA). It would be wonderful if you could provide the full version of those functions.

Thank you for this wonderful work.