AlphAction

AlphAction aims to detect the actions of multiple persons in videos. It is the first open-source project that achieves 30+ mAP (32.4 mAP) with a single model on the AVA dataset.

This project is the official implementation of the paper Asynchronous Interaction Aggregation for Action Detection, authored by Jiajun Tang*, Jin Xia* (equal contribution), Xinzhi Mu, Bo Pang, and Cewu Lu (corresponding author).


Installation

First, install this project; see INSTALL.md for instructions.

Data Preparation

To train or run inference on the AVA dataset, see DATA.md for data preparation instructions.

Model Zoo

config                        backbone     structure      mAP    mAP in paper   model
resnet50_4x16f_parallel       ResNet-50    Parallel       29.0   28.9           [link]
resnet50_4x16f_serial         ResNet-50    Serial         29.8   29.6           [link]
resnet50_4x16f_denseserial    ResNet-50    Dense Serial   30.0   29.8           [link]
resnet101_8x8f_denseserial    ResNet-101   Dense Serial   32.4   32.3           [link]

Visual Demo

To run the demo program on a video or webcam, see the demo folder. We selected 15 common categories from the 80 action categories of AVA and provide a practical model that achieves high accuracy (about 70 mAP) on these categories.

Training and Inference

The hyper-parameters of each experiment are controlled by a .yaml config file located in the directory config_files. All of these configuration files assume 8 GPUs. Create a symbolic link data/output pointing to the directory where the output (logs and checkpoints) will be saved. We also recommend creating a directory models to hold model weights and linking it as data/models. Both can be done with the following commands.

mkdir -p /path/to/output
ln -s /path/to/output data/output
mkdir -p /path/to/models
ln -s /path/to/models data/models
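
The same setup can be scripted so that it is safe to run more than once. The following is a minimal sketch, not part of the project: the helper name link_storage is hypothetical, and it only mirrors the mkdir and ln -s commands above while refusing to overwrite anything that already exists.

```python
import tempfile
from pathlib import Path


def link_storage(target, link):
    """Create `target` if needed and make `link` a symlink to it.

    Hypothetical helper mirroring `mkdir -p` followed by `ln -s`.
    Calling it again with the same arguments is a no-op.
    """
    target = Path(target)
    link = Path(link)
    target.mkdir(parents=True, exist_ok=True)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if link.resolve() == target.resolve():
            return link  # already linked to the right place
        raise FileExistsError(f"{link} already points somewhere else")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target.resolve(), target_is_directory=True)
    return link


if __name__ == "__main__":
    # Demonstrated inside a temporary directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("output", "models"):
            created = link_storage(root / "storage" / name, root / "data" / name)
            print(created, "->", created.resolve())
```

In a real checkout, the two calls would use /path/to/output with data/output and /path/to/models with data/models.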

Training

The pre-trained model weights and the training code will be made publicly available later. 😉

Inference

First, download the model weights from the Model Zoo.

To run inference on a single GPU, you only need to run the following command. It loads the model from the path specified in MODEL.WEIGHT. Note that VIDEOS_PER_BATCH is a global config; if you hit an out-of-memory (OOM) error, you can override it on the command line, as in the command below.

python test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight" \
TEST.VIDEOS_PER_BATCH 4
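
To make the override mechanism concrete, here is a minimal sketch of how trailing KEY VALUE pairs such as MODEL.WEIGHT and TEST.VIDEOS_PER_BATCH could be merged into a nested configuration. It is an illustration only: the DEFAULTS dictionary, its default value of 16, and the function names are assumptions, and the project's own config loader may behave differently.

```python
import ast
from copy import deepcopy

# Hypothetical defaults for illustration; the real values live in the
# project's config code and .yaml files.
DEFAULTS = {
    "MODEL": {"WEIGHT": ""},
    "TEST": {"VIDEOS_PER_BATCH": 16},
}


def parse_value(raw):
    """Interpret a command-line string as a Python literal when possible."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # plain strings such as file paths are kept unchanged


def merge_overrides(cfg, opts):
    """Return a copy of `cfg` with the KEY VALUE pairs in `opts` applied."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    merged = deepcopy(cfg)
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = merged
        for name in parents:
            node = node[name]  # a KeyError here means an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = parse_value(raw)
    return merged


if __name__ == "__main__":
    opts = ["MODEL.WEIGHT", "path/to/model/weight", "TEST.VIDEOS_PER_BATCH", "4"]
    cfg = merge_overrides(DEFAULTS, opts)
    print(cfg)
```

Rejecting unknown keys, as this sketch does, catches a misspelled override early instead of silently ignoring it.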

We use the launch utility torch.distributed.launch to launch multiple processes for inference on multiple GPUs. Replace GPU_NUM with the number of GPUs to use. Hyper-parameters in the config file can still be overridden in the same way as in single-GPU inference.

python -m torch.distributed.launch --nproc_per_node=GPU_NUM \
test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight"
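
For context, torch.distributed.launch starts one process per GPU, sets environment variables such as WORLD_SIZE, and by default passes a local-rank flag to each process. The sketch below shows how a script could read that information using only the standard library. It does not reproduce test_net.py: the function name parse_launch_context and the returned dictionary are assumptions made for illustration. Both spellings of the flag are accepted because the spelling passed by the launcher differs between PyTorch versions.

```python
import argparse
import os


def parse_launch_context(argv, environ):
    """Extract the per-process launch information from argv and the environment.

    Hypothetical helper: it only illustrates what the launcher hands to each
    process, not how the project consumes it.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-file", default="")
    # The launcher appends this flag; accept both known spellings.
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int, default=0)
    # Trailing KEY VALUE config overrides, e.g. MODEL.WEIGHT path/to/weight.
    parser.add_argument("opts", nargs="*", default=[])
    args = parser.parse_args(argv)

    # WORLD_SIZE is set by the launcher; it is absent in a plain single-GPU run.
    world_size = int(environ.get("WORLD_SIZE", "1"))
    return {
        "config_file": args.config_file,
        "local_rank": args.local_rank,
        "world_size": world_size,
        "distributed": world_size > 1,
        "opts": args.opts,
    }


if __name__ == "__main__":
    import sys
    print(parse_launch_context(sys.argv[1:], os.environ))
```

Run directly without the launcher, this sketch reports a world size of 1 and treats the run as non-distributed.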

Acknowledgement

We thankfully acknowledge the computing resource support of Huawei Corporation for this project.

Citation

If this project helps you in your research or project, please cite this paper:

@article{tang2020asynchronous,
  title={Asynchronous Interaction Aggregation for Action Detection},
  author={Tang, Jiajun and Xia, Jin and Mu, Xinzhi and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2004.07485},
  year={2020}
}
