mqvr's Introduction

Multi-query Video Retrieval

This repository contains the code for the paper:

@misc{wang2022multiquery,
      title={Multi-query Video Retrieval}, 
      author={Zeyu Wang and Yu Wu and Karthik Narasimhan and Olga Russakovsky},
      year={2022},
      eprint={2201.03639},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Data Preparation

Download the raw videos for MSR-VTT, MSVD and VATEX, and put them into the data/{dataset}/raw_videos folder.
Then run the script data/extract_frames.sh to extract frames from the raw videos.
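
For readers who want to see what the extraction step amounts to, here is a minimal Python sketch of a per-video ffmpeg call. It is not the contents of data/extract_frames.sh, which may use different options. The function names, the optional fps parameter and the zero-based %d.jpg naming are assumptions chosen to match the folder layout shown in this README.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frames_root, fps=None):
    """Return (output_dir, ffmpeg argument list) for one video.

    Hypothetical helper: the repository's own script may differ.
    """
    video_path = Path(video_path)
    # One frame folder per video, named after the video file (e.g. video0.mp4/).
    out_dir = Path(frames_root) / video_path.name
    cmd = ["ffmpeg", "-i", str(video_path)]
    if fps is not None:
        # Assumption: optionally subsample to a fixed frame rate.
        cmd += ["-vf", f"fps={fps}"]
    # Number frames from 0 so the first file is 0.jpg.
    cmd += ["-start_number", "0", str(out_dir / "%d.jpg")]
    return out_dir, cmd


def extract_dataset(dataset_dir, fps=None):
    """Extract frames for every file in {dataset_dir}/raw_videos."""
    dataset_dir = Path(dataset_dir)
    frames_root = dataset_dir / "extracted_frames"
    for video in sorted((dataset_dir / "raw_videos").iterdir()):
        out_dir, cmd = build_ffmpeg_command(video, frames_root, fps)
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)  # requires ffmpeg on PATH
```

Calling extract_dataset("data/msrvtt") would then process every video of one dataset. Building the command separately from running it makes it easy to print and inspect the ffmpeg call first.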

The resulting data folder is structured like this:

├── data
    ├── msrvtt
        ├── msrvtt_train.json
        ├── msrvtt_test.json
        ├── msrvtt_test_varying_query_sample_1-20.json
        ├── raw_videos
            ├── video0.mp4
            ├── ...
        ├── extracted_frames
            ├── video0.mp4
                ├── 0.jpg
                ├── ...
            ├── ...
    ├── msvd
        ├── ...
    ├── vatex
        ├── ...

For the Frozen model, download the pretrained checkpoint provided by the original authors here, and put it into the record/pretrained folder.

Training

Run command: python train.py -c configs/{config_path}

Evaluation

Run command: python evaluate.py -c configs/{config_path}

Acknowledgements

The structure of this repository is based on https://github.com/victoresque/pytorch-template. Some of the code is adapted from https://github.com/m-bain/frozen-in-time and https://github.com/ArrowLuo/CLIP4Clip.

mqvr's Issues

Clip4clip Performance on ViT B/16

Hi,

Since the reported results of Clip4clip are based on ViT B/32, I wonder if you have tested the model's performance using ViT B/16? In my reproduced version, the performance on MSR-VTT gets a significant boost (59.6, 72.7, and 78.8 R@1 on RA, SA, and MF, respectively). However, the Clip4clip model's performance using ViT B/16 on MSVD is slightly lower than the reported results using ViT B/32, which is 48.2, 44.7, and 58.9 R@1 on RA, SA, and MF. Therefore, if you've tested the ViT B/16 performance, maybe I could directly use your reported results as baselines for comparison, in case my reproduction is mistaken.

Thanks!
