TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval

(Oct. 17th, 2022) We have released part of our Docker file under issue 3 to help with reimplementation.

Our paper has been accepted by ECCV 2022. It is also available as arXiv preprint arXiv:2207.07852.

This is the PyTorch code for TS2-Net. The code has been tested on PyTorch 1.7.1.

TS2-Net is a text-video retrieval model based on CLIP. With a token shift transformer and a token selection transformer, our model achieves state-of-the-art (SOTA) performance on MSR-VTT, VATEX, LSMDC, ActivityNet, and DiDeMo. It reaches 54.0 R@1 on the MSR-VTT test set (ViT-B/16, with inverted softmax).

The token shift operation shifts all channels of a token back and forth across adjacent frames, which preserves the complete token representation and enhances the interaction between adjacent frames.
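The shift can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function name `token_shift`, the parameters `n_fwd` and `n_bwd`, the choice of which token positions are shifted, and the zero padding at the first and last frame are all assumptions made for illustration. The point it shows is that a shifted token moves with all of its channels, rather than moving only a subset of channels.

```python
import numpy as np


def token_shift(x, n_fwd=1, n_bwd=1):
    """Shift whole tokens between adjacent frames (illustrative sketch).

    x has shape (T, N, C): frames, tokens per frame, channels.
    Which positions are shifted (n_fwd, n_bwd) is a hypothetical choice.
    """
    out = x.copy()
    # Positions [0, n_fwd): frame t receives the token of frame t-1.
    out[1:, :n_fwd] = x[:-1, :n_fwd]
    out[0, :n_fwd] = 0.0  # assumed zero padding for the first frame
    # Positions [n_fwd, n_fwd + n_bwd): frame t receives the token of frame t+1.
    lo, hi = n_fwd, n_fwd + n_bwd
    out[:-1, lo:hi] = x[1:, lo:hi]
    out[-1, lo:hi] = 0.0  # assumed zero padding for the last frame
    # All remaining token positions are left untouched.
    return out


if __name__ == "__main__":
    frames = np.arange(24, dtype=float).reshape(3, 4, 2)  # T=3, N=4, C=2
    shifted = token_shift(frames)
    print(shifted[1, 0], frames[0, 0])  # token 0 of frame 1 now comes from frame 0
```

Because every channel of a selected token moves together, each position in the output still holds a complete token from some frame, which is the property the paragraph above describes.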

The token selection module selects the tokens that contribute most to the local spatial semantics, to better model the video representation. (Note: in our released code, we manually treat the [CLS] token as the most informative, which we find gives better performance. This means the token selection stage keeps the [CLS] token plus K-1 other tokens.)
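The "[CLS] plus K-1 others" rule from the note can be sketched as follows. This is a hedged illustration only: the function `select_tokens` is hypothetical, the per-token `scores` are taken as given (how the model produces them is not covered here), a hard top-K is used, and it is assumed that [CLS] sits at index 0 and that kept tokens stay in their original order.

```python
import numpy as np


def select_tokens(tokens, scores, k):
    """Keep [CLS] (assumed to be index 0) plus the k-1 best-scoring other tokens.

    tokens: (N, C) token features of one frame; scores: (N,) importance scores.
    Returns the kept tokens, shape (k, C), and their original indices.
    """
    n = tokens.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of tokens")
    # Rank the non-[CLS] tokens by score, highest first (+1 restores the offset).
    order = np.argsort(-scores[1:], kind="stable") + 1
    # [CLS] is always kept, regardless of its own score.
    keep = np.concatenate(([0], np.sort(order[: k - 1])))
    return tokens[keep], keep


if __name__ == "__main__":
    feats = np.arange(10, dtype=float).reshape(5, 2)
    scores = np.array([0.0, 0.9, 0.1, 0.8, 0.3])
    kept, idx = select_tokens(feats, scores, k=3)
    print(idx)  # indices of the kept tokens
```

In the usage example the [CLS] token has the lowest score yet is still kept, which mirrors the manual rule described in the note.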

Requirements

# From CLIP
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas

Text-Video Retrieval Results

For TS2-Net (ViT-B/32), without inverted softmax

| Dataset | R@1 | R@5 | R@10 |
|---|---|---|---|
| MSRVTT-1kA | 47.0 | 74.5 | 83.8 |
| VATEX | 59.1 | 90.0 | 95.2 |
| LSMDC | 23.4 | 42.3 | 50.9 |
| DiDeMo | 41.8 | 71.6 | 82.0 |
| ActivityNet | 41.0 | 73.6 | 84.5 |
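For readers unfamiliar with the metric, R@K is the percentage of text queries whose correct video appears among the top K retrieved results. The sketch below shows one way to compute it from a text-to-video similarity matrix. The function name `recall_at_k` and the assumption that query i matches video i (a one-to-one test set with ground truth on the diagonal) are illustrative and not taken from the released evaluation code.

```python
import numpy as np


def recall_at_k(sim, ks=(1, 5, 10)):
    """Text-to-video R@K in percent.

    sim: (n_text, n_video) similarity matrix; the correct video for
    text i is assumed to be video i.
    """
    sim = np.asarray(sim, dtype=float)
    correct = np.diag(sim)[:, None]
    # Rank of the correct video = number of videos scored strictly higher.
    ranks = (sim > correct).sum(axis=1)
    return {f"R@{k}": 100.0 * float(np.mean(ranks < k)) for k in ks}


if __name__ == "__main__":
    toy = [[0.9, 0.1, 0.2],
           [0.8, 0.3, 0.5],
           [0.1, 0.2, 0.7]]
    print(recall_at_k(toy, ks=(1, 3)))
```

In the toy matrix, queries 0 and 2 rank their own video first while query 1 ranks it last, so two of three queries count toward R@1 and all three toward R@3.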

For TS2-Net (ViT-B/16), without inverted softmax

| Dataset | R@1 | R@5 | R@10 |
|---|---|---|---|
| MSRVTT-1kA | 49.4 | 75.6 | 85.3 |

With inverted softmax

| Dataset | Model | R@1 | R@5 | R@10 |
|---|---|---|---|---|
| MSRVTT-1kA | TS2-Net (ViT-B/32) | 51.1 | 76.9 | 85.6 |
| MSRVTT-1kA | TS2-Net (ViT-B/16) | 54.0 | 79.3 | 87.4 |
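Inverted softmax is a test-time re-scoring step. In its usual form, each candidate video's scores are normalized over all text queries, so a video that scores highly against every query (a "hub") is down-weighted before ranking. The sketch below illustrates that general idea only. The function name `inverted_softmax` and the `temperature` value are assumptions, and the exact variant used to produce the numbers above may differ.

```python
import numpy as np


def inverted_softmax(sim, temperature=100.0):
    """Normalize each video column over all text queries (illustrative sketch).

    sim: (n_text, n_video) similarity matrix. The temperature is a
    hypothetical default, not a value taken from the released code.
    """
    logits = np.asarray(sim, dtype=float) * temperature
    # Subtract the per-column maximum for numerical stability.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    # Softmax over the query axis (axis 0) instead of the video axis.
    return weights / weights.sum(axis=0, keepdims=True)


if __name__ == "__main__":
    # Video 1 is a hub: it scores 0.6 against both text queries.
    sim = np.array([[0.5, 0.6],
                    [0.1, 0.6]])
    rescored = inverted_softmax(sim, temperature=10.0)
    print(sim.argmax(axis=1), rescored.argmax(axis=1))
```

In the toy example, the raw scores send both queries to the hub video, while the re-scored matrix sends query 0 to video 0, because video 0's score is distinctive for that query.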

Data Preparation

Please refer to ArrowLuo/CLIP4Clip for the data annotations. For VATEX, refer to https://eric-xw.github.io/vatex-website/download.html and dataloader_vatex_retrieval.py to prepare the dataset, and get the split file from https://github.com/cshizhe/hgr_v2t.

Usage

  • For MSR-VTT:
sh scripts/run_msrvtt.sh
  • For VATEX:
sh scripts/run_vatex.sh
  • Change DATA_PATH to your own data path before running.
  • To use the ViT-B/16 CLIP model, change --pretrained_clip_name to ViT-B/16.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{liu2022ts2net,
      title={TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval}, 
      author={Yuqi Liu and Pengfei Xiong and Luhui Xu and Shengming Cao and Qin Jin},
      year={2022},
      booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
}

@article{liu2022ts2net_arxiv,
  title={TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval},
  author={Yuqi Liu and Pengfei Xiong and Luhui Xu and Shengming Cao and Qin Jin},
  year={2022},
  journal={arXiv preprint arXiv:2207.07852},
}

Acknowledge
