
【ICCV'2023】What Can Simple Arithmetic Operations Do for Temporal Modeling?

Conference Paper

Wenhao Wu¹,², Yuxin Song², Zhun Sun², Jingdong Wang³, Chang Xu¹, Wanli Ouyang³,¹

¹The University of Sydney, ²Baidu, ³Shanghai AI Lab



This is the official implementation of our ATM (Arithmetic Temporal Module), which explores the potential of four simple arithmetic operations for temporal modeling.

Our best model achieves 89.4% Top-1 accuracy on Kinetics-400, 65.6% on Something-Something V1, and 74.6% on Something-Something V2!
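For readers unfamiliar with the metric, here is a minimal sketch of how Top-1 accuracy is typically computed from per-class scores. The function name `top1_accuracy` and the toy scores are illustrative assumptions, not part of this repository's evaluation code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label.

    Hypothetical helper for illustration; the repository's own evaluation
    code may differ.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = sum(
        1
        for row, label in zip(scores, labels)
        # argmax over the class scores of one sample
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return correct / len(labels)


# Toy example: 4 clips, 3 classes. The last clip is misclassified.
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [1, 0, 2, 1]
print(top1_accuracy(scores, labels))  # 0.75
```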

🔥 I also have other recent video recognition projects that may interest you ✨.

Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao, Wenhao Wu, Zhiheng Li
[Side4Video Code]

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
Accepted by CVPR 2023 | [BIKE Code]

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu, Zhun Sun, Wanli Ouyang
Accepted by AAAI 2023 & IJCV 2023 | [Text4Vis Code]

📣 News

  • Nov 29, 2023: The training code has been released.
  • July 14, 2023: 🎉Our ATM has been accepted to ICCV 2023.

🌈 Overview

The key motivation behind ATM is to explore the potential of simple arithmetic operations to capture auxiliary temporal clues that may be embedded in current video features, without relying on elaborate designs. The ATM can be integrated into both a vanilla CNN backbone (e.g., ResNet) and a Vision Transformer (e.g., ViT) for video action recognition.
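To make the idea concrete, the following is a rough sketch of applying the four basic arithmetic operations element-wise between the features of adjacent frames. It is not the module implemented in this repository. The pairing of neighbouring frames, the names `OPS` and `pairwise_temporal_cue`, the list-of-lists feature layout, and the `EPS` guard for division are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Frame = List[float]  # one feature vector per frame (assumed layout)

EPS = 1e-6  # assumed, simplistic guard against division by zero

# The four basic arithmetic operations, applied to (next, current) values.
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / (b + EPS),
}


def pairwise_temporal_cue(frames: List[Frame], op: str) -> List[Frame]:
    """Apply `op` element-wise between each frame and its successor.

    For T input frames this returns T - 1 vectors, one per adjacent pair.
    Hypothetical helper; the actual ATM may pair and fuse frames differently.
    """
    fn = OPS[op]
    return [
        [fn(nxt_v, cur_v) for cur_v, nxt_v in zip(cur, nxt)]
        for cur, nxt in zip(frames, frames[1:])
    ]


# Toy example: 3 frames with 2-dimensional features.
frames = [[1.0, 2.0], [2.0, 4.0], [4.0, 4.0]]
print(pairwise_temporal_cue(frames, "sub"))  # [[1.0, 2.0], [2.0, 0.0]]
```

In this sketch, subtraction highlights what changed between neighbouring frames, while the other operations combine them in different ways. Such cues would then be fed back into the backbone alongside the original features.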

🚀 Training & Testing

We provide training and testing scripts for Kinetics-400, Sth-Sth V1, and Sth-Sth V2. Please refer to the scripts folder for details. For example, you can run:

# Train the 8-frame ViT-B/32 model on Sth-Sth V1.
sh scripts/ssv1/train_base.sh

# Test the 8-frame ViT-B/32 model on Sth-Sth V1.
sh scripts/ssv1/test_base_f8.sh

📌 BibTeX & Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry😁.

@inproceedings{atm,
  title={What Can Simple Arithmetic Operations Do for Temporal Modeling?},
  author={Wu, Wenhao and Song, Yuxin and Sun, Zhun and Wang, Jingdong and Xu, Chang and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2023}
}

🎗️ Acknowledgement

This repository is built upon portions of VideoMAE, CLIP, and EVA. Thanks to the contributors of these great codebases.

👫 Contact

For any questions, please file an issue or contact Wenhao Wu.
