mvd's Introduction

Masked Video Distillation (CVPR 2023)

Official PyTorch implementation of "Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning".

[Figure: MVD framework]

News

[2023.5.21] Pretrained models have been released in MODEL_ZOO.md.

[2023.4.9] Code of MVD is available now!

[2023.2.28] MVD is accepted by CVPR 2023.

Main Results

Something-Something V2

Method  Pretrain Video Data  Backbone  Teacher  Epoch  Top-1  Top-5  Resolution  #Frames x Clips x Crops  Params
MVD     Kinetics-400         ViT-S     ViT-B    400    70.7   92.6   224         16x2x3                   22M
MVD     Kinetics-400         ViT-S     ViT-L    400    70.9   92.8   224         16x2x3                   22M
MVD     Kinetics-400         ViT-B     ViT-B    400    72.5   93.6   224         16x2x3                   87M
MVD     Kinetics-400         ViT-B     ViT-L    400    73.7   94.0   224         16x2x3                   87M
MVD     Kinetics-400         ViT-L     ViT-L    400    76.1   95.4   224         16x2x3                   305M
MVD     Kinetics-400         ViT-L     ViT-L    800    76.7   95.5   224         16x2x3                   305M
MVD     Kinetics-400         ViT-H     ViT-H    800    77.3   95.7   224         16x2x3                   633M
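
The "#Frames x Clips x Crops" column describes multi-view testing: 16x2x3 reads as 16 frames per clip, 2 temporal clips, and 3 spatial crops, which gives 6 views per video. The sketch below parses that notation and, assuming the final prediction is the plain average of per-view class scores (a common protocol, not confirmed here as the exact MVD evaluation code), combines the views. The function names `parse_views` and `average_view_scores` are illustrative only.

```python
from typing import List, Sequence, Tuple


def parse_views(spec: str) -> Tuple[int, int, int]:
    """Split a '#Frames x Clips x Crops' string such as '16x2x3'."""
    frames, clips, crops = (int(part) for part in spec.lower().split("x"))
    return frames, clips, crops


def average_view_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all views of one video (assumed protocol)."""
    num_views = len(view_scores)
    num_classes = len(view_scores[0])
    return [
        sum(view[c] for view in view_scores) / num_views
        for c in range(num_classes)
    ]


frames, clips, crops = parse_views("16x2x3")
num_views = clips * crops  # 2 clips x 3 crops = 6 views per video
```

With the Kinetics-400 setting 16x5x3, the same arithmetic gives 15 views per video.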

Kinetics-400

Method  Pretrain Video Data  Backbone  Teacher  Epoch  Top-1  Top-5  Resolution  #Frames x Clips x Crops  Params
MVD     Kinetics-400         ViT-S     ViT-B    400    80.6   94.7   224         16x5x3                   22M
MVD     Kinetics-400         ViT-S     ViT-L    400    81.0   94.8   224         16x5x3                   22M
MVD     Kinetics-400         ViT-B     ViT-B    400    82.7   95.4   224         16x5x3                   87M
MVD     Kinetics-400         ViT-B     ViT-L    400    83.4   95.8   224         16x5x3                   87M
MVD     Kinetics-400         ViT-L     ViT-L    400    86.0   96.9   224         16x5x3                   305M
MVD     Kinetics-400         ViT-L     ViT-L    800    86.4   97.0   224         16x5x3                   305M
MVD     Kinetics-400         ViT-H     ViT-H    800    87.3   97.4   224         16x5x3                   633M

AVA v2.2

Method  Pretrain Video Data  Extra Label  Backbone  Teacher  Epoch  mAP   #Frames x Sample Rate  Params
MVD     Kinetics-400         ✗            ViT-B     ViT-B    400    29.3  16x4                    87M
MVD     Kinetics-400         ✓            ViT-B     ViT-B    400    33.6  16x4                    87M
MVD     Kinetics-400         ✗            ViT-B     ViT-L    400    31.1  16x4                    87M
MVD     Kinetics-400         ✓            ViT-B     ViT-L    400    34.2  16x4                    87M
MVD     Kinetics-400         ✗            ViT-L     ViT-L    800    37.7  16x4                    305M
MVD     Kinetics-400         ✓            ViT-L     ViT-L    800    38.7  16x4                    305M
MVD     Kinetics-400         ✗            ViT-H     ViT-H    800    40.1  16x4                    633M
MVD     Kinetics-400         ✓            ViT-H     ViT-H    800    41.1  16x4                    633M

UCF101 & HMDB51

Method  Pretrain Video Data  Backbone  Teacher  Epoch  UCF101 Top-1  HMDB51 Top-1
MVD     Kinetics-400         ViT-B     ViT-B    400    97.0          76.4
MVD     Kinetics-400         ViT-B     ViT-L    400    97.5          79.7

Installation

Please follow the instructions in INSTALL.md.

Data Preparation

Please follow the instructions in DATASET.md for data preparation.

Pre-training

Pre-training instructions are in PRETRAIN.md.

Fine-tuning with pre-trained models

Fine-tuning instructions are in FINETUNE.md.

Model Zoo

We provide pre-trained models in MODEL_ZOO.md.

Acknowledgements

This project is built upon MAE and VideoMAE. Thanks to the contributors of these great codebases.

Citation

If this work is helpful for your research, please consider citing MVD.

@inproceedings{wang2022masked,
  title={Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning},
  author={Wang, Rui and Chen, Dongdong and Wu, Zuxuan and Chen, Yinpeng and Dai, Xiyang and Liu, Mengchen and Yuan, Lu and Jiang, Yu-Gang},
  booktitle={CVPR},
  year={2023}
}

mvd's People

Contributors

xyzforever

mvd's Issues

Pre-trained models

Thank you for making the code publicly available. When will pre-trained models be released?

Inference on our own videos

Hi there,

I was wondering whether you have any scripts for running inference on our own videos. If so, could you please provide them?
If not, could you advise us on how to go about it?

Thanks!
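
One step of such an inference pipeline is choosing which frames of a video to feed to the model. The sketch below is not taken from the MVD code: it assumes 16-frame clips with a fixed sampling stride (the defaults mirror the "16x4" and "16x5x3" settings in the results tables) and spreads the clip start positions evenly over the video. The helper name `clip_frame_indices` is hypothetical.

```python
from typing import List


def clip_frame_indices(total_frames: int, num_frames: int = 16,
                       sample_rate: int = 4, num_clips: int = 5) -> List[List[int]]:
    """Return frame indices for num_clips clips spread evenly over a video."""
    span = num_frames * sample_rate          # frames covered by one clip
    max_start = max(total_frames - span, 0)  # last valid clip start
    clips = []
    for k in range(num_clips):
        # Spread clip starts evenly between 0 and max_start.
        start = 0 if num_clips == 1 else round(k * max_start / (num_clips - 1))
        # Clamp so short videos repeat their last frame instead of overflowing.
        clips.append([
            min(start + i * sample_rate, total_frames - 1)
            for i in range(num_frames)
        ])
    return clips


clips = clip_frame_indices(total_frames=300)
```

The decoded frames at these indices would still need the resizing, cropping, and normalization used during fine-tuning before being passed to a fine-tuned checkpoint.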

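Question about the accuracy reported on SSv2 in the paper

Hello, I noticed that DATASET.md says the test and validation sets used for SSv2 are the same. Is the accuracy in your paper measured the same way as in VideoMAE? The test and val splits that the VideoMAE authors provide in their repo are both actually the official SSv2 validation set (24,777 samples in total), while the official test set is a separate split with 27,157 samples. So I am somewhat confused: do MVD, VideoMAE, and earlier methods all use the same test set in their papers? Do they all use the official SSv2 validation set, or the test set? I look forward to your reply. Thanks!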

About the SSV2 pre-trained model

Hello,
I would like to use your model pre-trained on the SSV2 dataset. Would it be possible to provide it?

How to set the distributed parameters to run on 1 node with 2 GPUs

Could you provide an example of how to set the distributed parameters to run on 1 node with 2 GPUs?
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=${GPUS} \
--master_port ${MASTER_PORT} --nnodes=${NODE_COUNT} \
--node_rank=${RANK} --master_addr=${MASTER_ADDR} \
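
For a single machine with two GPUs, the launcher flags quoted above would typically take the values shown below. This is a minimal sketch rather than the authors' answer: `build_launch_command` is a hypothetical helper, port 29500 and address 127.0.0.1 are assumed values for a single machine, and `your_script.py` is a placeholder for the training script and arguments that the question leaves out.

```python
import shlex
from typing import List


def build_launch_command(script: str, gpus: int = 2,
                         master_port: int = 29500) -> List[str]:
    """Build a torch.distributed.launch command for one node (hypothetical helper)."""
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",      # one process per GPU on this machine
        f"--master_port={master_port}",  # any free port (assumed value)
        "--nnodes=1",                    # a single machine
        "--node_rank=0",                 # the only node has rank 0
        "--master_addr=127.0.0.1",       # the node talks to itself
        script,                          # placeholder for the real training script
    ]


command = build_launch_command("your_script.py")
print("OMP_NUM_THREADS=1 " + shlex.join(command))
```

In shell terms this corresponds to GPUS=2, NODE_COUNT=1, RANK=0, and MASTER_ADDR=127.0.0.1, with MASTER_PORT set to any free port.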
