AlphAction

AlphAction aims to detect the actions of multiple persons in videos. It is the first open-source project to achieve 30+ mAP (32.4 mAP) with a single model on the AVA dataset.

This project is the official implementation of the paper Asynchronous Interaction Aggregation for Action Detection, authored by Jiajun Tang*, Jin Xia* (equal contribution), Xinzhi Mu, Bo Pang, and Cewu Lu (corresponding author).

Installation

Install this project first; see INSTALL.md for instructions.

Data Preparation

To train or run inference on the AVA dataset, see DATA.md for data preparation instructions.

Model Zoo

config	backbone	structure	mAP	mAP in paper	model
resnet50_4x16f_parallel	ResNet-50	Parallel	29.0	28.9	[link]
resnet50_4x16f_serial	ResNet-50	Serial	29.8	29.6	[link]
resnet50_4x16f_denseserial	ResNet-50	Dense Serial	30.0	29.8	[link]
resnet101_8x8f_denseserial	ResNet-101	Dense Serial	32.4	32.3	[link]

Visual Demo

To run the demo program on a video or webcam, see the demo folder. We selected 15 common categories from the 80 action categories of AVA and provide a practical model that achieves high accuracy (about 70 mAP) on these categories.

Training and Inference

The hyper-parameters of each experiment are controlled by a .yaml config file located in the directory config_files. All of these configuration files assume 8 GPUs. Create a symbolic link at data/output that points to the directory where the output (logs and checkpoints) will be saved. We also recommend creating a directory for model weights and linking it at data/models. Both can be done with the following commands.

mkdir -p /path/to/output
ln -s /path/to/output data/output
mkdir -p /path/to/models
ln -s /path/to/models data/models
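
The same setup can be scripted. The following is a minimal sketch, assuming Python 3 on a POSIX filesystem; the helper name `link_dir` is hypothetical and not part of AlphAction. It mirrors `mkdir -p` followed by `ln -s`, but does nothing if the link already points to the right directory.

```python
import os
from pathlib import Path


def link_dir(target, link):
    """Create `target` (like `mkdir -p`) and symlink `link` to it (like `ln -s`).

    Hypothetical helper for illustration; not part of AlphAction.
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    target = target.resolve()
    link = Path(link)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if link.resolve() == target:
            return link  # already set up, nothing to do
        raise FileExistsError(f"{link} points elsewhere: {os.readlink(link)}")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)
    return link
```

Calling `link_dir("/path/to/output", "data/output")` and `link_dir("/path/to/models", "data/models")` from the repository root would reproduce the four commands, with the placeholder paths replaced by real locations.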

Training

The pre-trained model weights and the training code will be made publicly available later. 😉

Inference

First, download the model weights from the Model Zoo.

To run inference on a single GPU, run the following command. It loads the model from the path specified in MODEL.WEIGHT. Note that VIDEOS_PER_BATCH is a global config; if you hit an out-of-memory (OOM) error, you can override it on the command line, as the command below does.

python test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight" \
TEST.VIDEOS_PER_BATCH 4
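
The trailing arguments in this command are alternating KEY VALUE pairs, where a dotted key such as `TEST.VIDEOS_PER_BATCH` addresses a nested config entry. The sketch below illustrates that pairing in plain Python. It is an illustration only: the function `apply_overrides` and the default values are hypothetical, and this is not the parser that `test_net.py` actually uses.

```python
import ast
import copy


def apply_overrides(config, opts):
    """Return a copy of the nested dict `config` with KEY VALUE overrides applied."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    result = copy.deepcopy(config)
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node[name]  # KeyError if the section does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw)  # e.g. "4" becomes the integer 4
        except (ValueError, SyntaxError):
            value = raw  # keep paths and other plain strings as-is
        node[leaf] = value
    return result


# Made-up defaults, for illustration only.
defaults = {"MODEL": {"WEIGHT": ""}, "TEST": {"VIDEOS_PER_BATCH": 16}}
merged = apply_overrides(
    defaults,
    ["MODEL.WEIGHT", "path/to/model/weight", "TEST.VIDEOS_PER_BATCH", "4"],
)
```

Here `merged["TEST"]["VIDEOS_PER_BATCH"]` holds the overridden value, while `defaults` is left unchanged, which matches the idea that command-line values take precedence over the .yaml file for a single run.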

We use the launch utility torch.distributed.launch to start multiple processes for inference on multiple GPUs. Replace GPU_NUM with the number of GPUs to use. Hyper-parameters in the config file can still be overridden in the same way as in single-GPU inference.

python -m torch.distributed.launch --nproc_per_node=GPU_NUM \
test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight"
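
When inference is driven from a script, the single-GPU and multi-GPU commands in this section differ only in the launcher prefix. The sketch below builds either form as an argument list. The function `inference_command` is a hypothetical helper, not part of AlphAction, and the paths are the same placeholders used in the commands.

```python
import shlex


def inference_command(config_file, weight, gpu_num=1, videos_per_batch=None):
    """Build the argv list for single-GPU or multi-GPU inference."""
    if gpu_num < 1:
        raise ValueError("gpu_num must be at least 1")
    cmd = ["python"]
    if gpu_num > 1:
        # Multi-GPU: one process per GPU via torch.distributed.launch.
        cmd += ["-m", "torch.distributed.launch", f"--nproc_per_node={gpu_num}"]
    cmd += ["test_net.py", "--config-file", config_file, "MODEL.WEIGHT", weight]
    if videos_per_batch is not None:
        # Optional override, useful when the default batch size causes OOM.
        cmd += ["TEST.VIDEOS_PER_BATCH", str(videos_per_batch)]
    return cmd


cmd = inference_command("path/to/config/file.yaml", "path/to/model/weight", gpu_num=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Note that `shlex.join` requires Python 3.8 or later.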

Acknowledgement

We gratefully acknowledge the computing resource support of Huawei Corporation for this project.

Citation

If this project helps you in your research or project, please cite this paper:

@article{tang2020asynchronous,
  title={Asynchronous Interaction Aggregation for Action Detection},
  author={Tang, Jiajun and Xia, Jin and Mu, Xinzhi and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2004.07485},
  year={2020}
}
