bat-video-classification's Introduction

Bilinear Attentional Transforms (BAT) for Video Classification

This is the official code for the CVPR 2020 paper "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms", applied to video classification on Kinetics.

Pretrained models

Here we provide some of the pretrained models.

Method        Backbone   Input Frames  Top-1 Acc  Link
C2D           ResNet-50  8             72.0%      GoogleDrive / BaiduYun (Access Code: r0i2)
I3D           ResNet-50  8             72.7%      GoogleDrive / BaiduYun (Access Code: dnwv)
C2D + 2D-BAT  ResNet-50  8             74.6%      GoogleDrive / BaiduYun (Access Code: inb0)
I3D + 2D-BAT  ResNet-50  8             75.1%      GoogleDrive / BaiduYun (Access Code: q8d8)
C2D + 3D-BAT  ResNet-50  8             75.5%      GoogleDrive / BaiduYun (Access Code: rnrg)

Quick start

Requirements

  • Install Lintel
  • pip install -r requirements.txt

Data preparation

  1. Download Kinetics-400 via the official scripts.
  2. Generate the training and validation list files. Each line of a list file gives a video path, its frame count, and its label:
video_path frame_num label
video_path frame_num label
...
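A minimal sketch of writing such a list file is shown here. The helper names `build_list_lines` and `write_list_file` are hypothetical and not part of this repository. The sketch assumes fields are separated by single spaces (as in the example above), that the label is an integer class index, and that frame counts have already been obtained with a video reader of your choice.

```python
from typing import Iterable, List, Tuple

# (video_path, frame_num, label); an integer label is an assumption.
Entry = Tuple[str, int, int]


def build_list_lines(entries: Iterable[Entry]) -> List[str]:
    """Format each entry as 'video_path frame_num label'."""
    lines = []
    for video_path, frame_num, label in entries:
        # A space inside the path would make the line ambiguous
        # under the assumed space-separated format.
        if " " in video_path:
            raise ValueError(f"path contains a space: {video_path!r}")
        if frame_num <= 0:
            raise ValueError(f"frame_num must be positive: {video_path!r}")
        lines.append(f"{video_path} {frame_num} {label}")
    return lines


def write_list_file(entries: Iterable[Entry], out_path: str) -> int:
    """Write the list file and return the number of lines written."""
    lines = build_list_lines(entries)
    with open(out_path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

For example, `write_list_file([("/data/kinetics/train/a.mp4", 300, 0)], "train_list.txt")` would write one line; the path and values here are placeholders.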

Training

To train a model, run main.py with the desired model architecture and other hyperparameters:

python main.py \
    /PATH/TO/TRAIN_LIST \
    /PATH/TO/VAL_LIST \
    --read_mode video \
    --resume /PATH/TO/IMAGENET_PRETRAINED/MODEL --soft_resume \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --num_segments 1 --seq_length 8 --sample_rate 8 \
    --lr 0.01 --lr_steps 40 80 --epochs 100 \
    --eval-freq 5 --save-freq 5 -b 64 -j 48 --dropout 0.5

More training scripts can be found in scripts. The ImageNet-pretrained models can be downloaded from GoogleDrive / BaiduYun (Access Code: 1r48).
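The flags `--lr 0.01 --lr_steps 40 80 --epochs 100` in the training command suggest a step learning-rate schedule. The sketch below illustrates one common form of such a schedule. It assumes the rate is multiplied by 0.1 at each listed epoch; the README does not state the decay factor, and main.py may implement it differently. The function name `step_lr` is hypothetical.

```python
from typing import Sequence


def step_lr(base_lr: float, epoch: int, lr_steps: Sequence[int],
            gamma: float = 0.1) -> float:
    """Scale base_lr by gamma once for every milestone already reached.

    gamma=0.1 is an assumed decay factor, not taken from this repository.
    """
    num_decays = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * (gamma ** num_decays)


if __name__ == "__main__":
    # Print the rate just before and after each assumed milestone.
    for epoch in (0, 39, 40, 79, 80, 99):
        print(epoch, step_lr(0.01, epoch, [40, 80]))
```

Under these assumptions the rate stays at its base value for epochs 0 to 39, drops tenfold at epoch 40, and drops tenfold again at epoch 80.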

Testing

Fully convolutional inference (recommended):

python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 10 --test_crops 3 --seq_length 8 --sample_rate 8 \
    -j 16

10 crops and 25 segments:

python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 25 --seq_length 8 --sample_rate 8 \
    -j 16
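Both test settings evaluate several views of each video (temporal segments times spatial crops) and combine them into one prediction. The sketch below illustrates one common way to do this: average the per-view class scores, then take the top-scoring class. Averaging is an assumption here; test_models.py may aggregate differently. All function names are hypothetical.

```python
from typing import List, Sequence


def num_views(test_segments: int, test_crops: int) -> int:
    """Number of clips scored per video: segments times crops."""
    return test_segments * test_crops


def average_view_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over all views of one video."""
    if not view_scores:
        raise ValueError("need at least one view")
    num_classes = len(view_scores[0])
    totals = [0.0] * num_classes
    for scores in view_scores:
        if len(scores) != num_classes:
            raise ValueError("all views must score the same classes")
        for i, s in enumerate(scores):
            totals[i] += s
    return [t / len(view_scores) for t in totals]


def top1(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Three toy views over two classes.
    views = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
    print(num_views(10, 3), top1(average_view_scores(views)))
```

With `--test_segments 10 --test_crops 3`, this gives 30 views per video; if the second setting uses 10 crops per segment as its heading says, it gives 250.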

Other applications of BAT

Citation

If you find this work or code helpful in your research, please cite:

@InProceedings{Chi_2020_CVPR,
  author = {Chi, Lu and Yuan, Zehuan and Mu, Yadong and Wang, Changhu},
  title = {Non-Local Neural Networks With Grouped Bilinear Attentional Transforms},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}
