
video_language_model's Introduction

Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language

A PyTorch implementation of the paper Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language.


We study how to transfer knowledge from image-language models to video-language tasks. Our model is based on BLIP. We have implemented several components proposed in recent work; see models/vit.py for details (e.g. TokenMixBlock, STAdapter).

Suggestion: It is worth trying two or more modules jointly (e.g. temp trans + token mix). I have tried some combinations, and they do improve results.

An overview of different parameter-efficient tuning methods on video-language tasks. We compare our method with four partial fine-tuning methods, Dual-channel Attention (Hong et al. 2022), BitFit (Zaken, Ravfogel, and Goldberg 2021), ST-Adapter (Pan et al. 2022), and Adapter (Houlsby et al. 2019), as well as Temporal Fine-tuning and a full fine-tuning method, ViViT (Arnab et al. 2021).

Usage

Preprocessing: extract video frames using ffmpeg

Edit ffmpeg_frame.py to set the actual video_path (input) and frames_path (output), then run it.
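
As a rough illustration of what this step involves, the sketch below builds an ffmpeg command that samples frames from one video into a directory of JPEG files. It is not the repository's ffmpeg_frame.py: the function names, the default frame rate, and the %05d.jpg naming pattern are assumptions made for illustration. Only the names video_path and frames_path come from the instructions above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frames_path, fps=1):
    """Return the ffmpeg argument list that samples `fps` frames per second."""
    out_pattern = str(Path(frames_path) / "%05d.jpg")  # assumed naming scheme
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video
        "-vf", f"fps={fps}",     # resample to a fixed frame rate
        "-q:v", "2",             # high JPEG quality
        out_pattern,
    ]


def extract_frames(video_path, frames_path, fps=1):
    """Create the output directory and run ffmpeg (which must be on PATH)."""
    Path(frames_path).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, frames_path, fps), check=True)
```

Calling extract_frames once per video, with one output directory per video, would give a frame layout that is easy to point a config at; whether the repository expects exactly this layout should be checked against ffmpeg_frame.py.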

Edit the config for the specific task

Edit xxx.yaml to set the actual pre-trained model path and video path.
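
Since a wrong path in the config usually surfaces only after the distributed job has started, a small pre-flight check can save time. The helper below is a minimal sketch that takes an already-loaded config dictionary and reports which path entries are missing or do not exist on disk. The key names in the example ("pretrained", "video_root") are hypothetical; substitute the keys that actually appear in your YAML file.

```python
import os


def find_missing_paths(config, path_keys):
    """Return the keys in `path_keys` that are absent from `config`
    or whose value is not an existing file or directory."""
    missing = []
    for key in path_keys:
        value = config.get(key)
        if not value or not os.path.exists(str(value)):
            missing.append(key)
    return missing


# Hypothetical keys and values, for illustration only.
example_config = {"pretrained": "/path/to/model.pth", "video_root": "."}
print(find_missing_paths(example_config, ["pretrained", "video_root"]))
```

The dictionary can be produced by any YAML parser, for example PyYAML's yaml.safe_load.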

Video-text captioning

If evaluation fails with errors, you may need to run:

sudo chmod -R 777 [path to pycocoevalcap package]

python -m torch.distributed.run --nproc_per_node=8 train_video_caption.py --config ./configs/caption_msvd.yaml --output_dir output/caption_msvd

Text-video retrieval

python -m torch.distributed.run --nproc_per_node=8 train_video_retrieval.py --config ./configs/retrieval_msrvtt.yaml --output_dir output/retrieval_msrvtt --evaluate

Video-QA

python -m torch.distributed.run --nproc_per_node=8 train_video_vqa.py --config ./configs/videoqa_msrvtt.yaml --output_dir output/videoqa_msrvtt
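
The three launch commands above follow one pattern, so they can be generated from a small table when the number of GPUs differs from 8. The sketch below is an illustrative helper, not part of the repository: the function name and the TASKS table are assumptions, while the script names, config paths, and output directories are copied from the commands above. The --evaluate flag appears only in the retrieval command; whether the other scripts accept it is not confirmed here.

```python
# Script names are copied from the launch commands above.
TASKS = {
    "caption_msvd": "train_video_caption.py",
    "retrieval_msrvtt": "train_video_retrieval.py",
    "videoqa_msrvtt": "train_video_vqa.py",
}


def build_launch_command(task, num_gpus=8, evaluate=False):
    """Return the torch.distributed.run command for one task as an argument list."""
    cmd = [
        "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={num_gpus}",
        TASKS[task],
        "--config", f"./configs/{task}.yaml",
        "--output_dir", f"output/{task}",
    ]
    if evaluate:
        cmd.append("--evaluate")  # flag shown in the retrieval command above
    return cmd


if __name__ == "__main__":
    # Example: the retrieval command with the --evaluate flag on 4 GPUs.
    print(" ".join(build_launch_command("retrieval_msrvtt", num_gpus=4, evaluate=True)))
```

The returned list can be passed directly to subprocess.run, or joined with spaces and pasted into a shell.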

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{liu2022tokenmix,
      title={Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language}, 
      author={Yuqi Liu and Luhui Xu and Pengfei Xiong and Qin Jin},
      year={2023},
      booktitle={Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI)},
}


video_language_model's Issues

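Number of Token Mixing layers

Hello, it was a pleasure to read your paper. The paper mentions that, to keep computation as low as possible, token mixing is applied only in the last layer of the ViT. Have you run experiments that extend it to more layers? Thank you!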
