two-stream-pytorch

PyTorch implementation of popular two-stream frameworks for video action recognition

The current release is a PyTorch implementation of "Towards Good Practices for Very Deep Two-Stream ConvNets". Refer to the paper on arXiv for more details.
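A two-stream network scores each video twice: a spatial stream looks at RGB frames and a temporal stream looks at stacked optical flow, and the two score vectors are then combined. The sketch below shows one common way to combine them (late fusion by a weighted average of softmax probabilities). It is an illustration, not code from this repository; the function names and the default weights are hypothetical and would normally be tuned on held-out data.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    scores = np.asarray(scores, dtype=np.float64)
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_two_stream(spatial_logits, temporal_logits,
                    spatial_weight=1.0, temporal_weight=1.0):
    """Late fusion: weighted average of the two streams' class probabilities.

    spatial_logits / temporal_logits: arrays of shape (num_videos, num_classes).
    The weights are hypothetical defaults, not values taken from the paper.
    """
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    if spatial.shape != temporal.shape:
        raise ValueError("both streams must score the same videos and classes")
    total = float(spatial_weight + temporal_weight)
    return (spatial_weight * spatial + temporal_weight * temporal) / total


if __name__ == "__main__":
    fused = fuse_two_stream([[2.0, 1.0, 0.0]], [[0.0, 3.0, 0.0]])
    print(fused.argmax(axis=1))  # the temporal stream's confident vote wins here
```

Averaging probabilities rather than raw logits keeps the two streams on a comparable scale even when one network produces much larger activations than the other.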

In the future, I plan to add PyTorch implementations of the following papers:

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
ECCV 2016

Deep Temporal Linear Encoding Networks
Ali Diba, Vivek Sharma, Luc Van Gool
https://arxiv.org/abs/1611.06678

Hidden Two-Stream Convolutional Networks for Action Recognition
Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
https://arxiv.org/abs/1704.00389

Installation

Tested with PyTorch in the following environment:

OS: Ubuntu 16.04
Python: 3.5
CUDA: 8.0
OpenCV 3
dense_flow

To install dense_flow (branch opencv-3.1) successfully, you will probably need to build OpenCV 3 with opencv_contrib. (With OpenCV 2.4.13, dense_flow installs more easily without opencv_contrib, but you should still run the code in this repository under OpenCV 3 to avoid errors.)
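Because the code is meant to run under OpenCV 3, a quick version check before training can save a confusing failure later. This is a minimal sketch, not part of the repository: the helper names are hypothetical, and treating OpenCV 4+ as "untested" (a warning rather than an error) is an assumption, since only OpenCV 3 is listed above.

```python
import warnings

TESTED_OPENCV_MAJOR = 3  # OpenCV 3 is the version listed as tested above


def opencv_major_version(version_string):
    """Extract the major version from a string such as '3.1.0'."""
    return int(version_string.split(".")[0])


def check_opencv(version_string):
    """Raise for OpenCV 2.x, warn for untested newer releases, return the major version."""
    major = opencv_major_version(version_string)
    if major < TESTED_OPENCV_MAJOR:
        raise RuntimeError(
            "OpenCV {} found; this code expects OpenCV 3".format(version_string))
    if major > TESTED_OPENCV_MAJOR:
        warnings.warn(
            "OpenCV {} has not been tested with this code".format(version_string))
    return major


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable; install OpenCV 3 first")
    else:
        print("OpenCV", cv2.__version__, "-> major version",
              check_opencv(cv2.__version__))
```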

The code also works with Python 2.7.

Data Preparation

Download the UCF101 dataset and extract the videos with:

unrar x UCF101.rar

Convert the videos to frames and extract optical flow:

python build_of.py --src_dir ./UCF-101 --out_dir ./ucf101_frames --df_path <path to dense_flow>
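Optical flow is a real-valued displacement field, so storing it as ordinary images requires quantization: each flow component is clipped to a fixed bound and mapped linearly to 0..255. The sketch below illustrates that idea under stated assumptions: the bound of 20 pixels and the exact rounding are modeled on common practice and are not taken from dense_flow's source, and both function names are hypothetical.

```python
import numpy as np


def flow_to_uint8(flow, bound=20.0):
    """Clip one flow component (in pixels) to [-bound, bound] and map it to 0..255.

    bound=20.0 is an assumed default, not read from dense_flow.
    """
    clipped = np.clip(np.asarray(flow, dtype=np.float32), -bound, bound)
    scaled = (clipped + bound) * (255.0 / (2.0 * bound))
    return np.round(scaled).astype(np.uint8)


def uint8_to_flow(image, bound=20.0):
    """Approximate inverse used when loading: map 0..255 back to [-bound, bound]."""
    return np.asarray(image, dtype=np.float32) * (2.0 * bound / 255.0) - bound


if __name__ == "__main__":
    print(flow_to_uint8([-30.0, -20.0, 0.0, 20.0, 30.0]))
```

Values beyond the bound saturate at 0 or 255, so very fast motion is lost; the trade-off is finer resolution for the small displacements that dominate most clips.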

Build file lists for training and validation:

python build_file_list.py --frame_path ./ucf101_frames --out_list_path ./settings
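A file list pairs each extracted video folder with its frame count and class label. The sketch below shows the idea for UCF101 clip names such as v_ApplyEyeMakeup_g08_c01. The "<video> <num_frames> <label>" line format and all helper names are hypothetical, and the train/validation rule (groups 1 to 7 go to validation) is an assumption modeled on the official split 1; the official split files remain authoritative.

```python
import re

# UCF101 clip names look like v_ApplyEyeMakeup_g08_c01 (class, group, clip).
VIDEO_NAME = re.compile(r"^v_(?P<cls>[^_]+)_g(?P<group>\d+)_c(?P<clip>\d+)$")


def parse_video_name(name):
    """Return (class_name, group_number) for a UCF101 clip name."""
    match = VIDEO_NAME.match(name)
    if match is None:
        raise ValueError("unexpected UCF101 video name: {}".format(name))
    return match.group("cls"), int(match.group("group"))


def build_list_lines(frame_counts, class_index, test_groups=range(1, 8)):
    """Return (train_lines, val_lines) in a hypothetical '<video> <frames> <label>' format.

    frame_counts: dict mapping clip name -> number of extracted frames.
    class_index:  dict mapping class name -> integer label.
    """
    train, val = [], []
    for name in sorted(frame_counts):
        cls, group = parse_video_name(name)
        line = "{} {} {}".format(name, frame_counts[name], class_index[cls])
        (val if group in test_groups else train).append(line)
    return train, val
```

The .format calls (rather than f-strings) keep the sketch compatible with the Python 3.5 and 2.7 versions listed above.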

Training

For the spatial stream (single RGB frame), run:

python main_single_gpu.py DATA_PATH -m rgb -a rgb_resnet152 --new_length=1 \
    --epochs 250 --lr 0.001 --lr_steps 100 200
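The --lr and --lr_steps flags describe a step schedule: training starts at the base learning rate, and the rate drops at each listed epoch. The helper below sketches that rule; the decay factor gamma=0.1 is an assumption, since the command does not show it, and the function name is hypothetical. PyTorch's own torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma) applies the same rule.

```python
def step_lr(base_lr, epoch, lr_steps, gamma=0.1):
    """Learning rate at a given epoch under a step schedule.

    The rate is multiplied by gamma once for every milestone in lr_steps
    that the epoch has reached. gamma=0.1 is an assumed default.
    """
    decays = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * (gamma ** decays)


if __name__ == "__main__":
    # Spatial-stream settings from the command above: --lr 0.001 --lr_steps 100 200
    for epoch in (0, 99, 100, 199, 200, 249):
        print(epoch, step_lr(0.001, epoch, [100, 200]))
```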

For the temporal stream (10 consecutive optical flow images), run:

python main_single_gpu.py DATA_PATH -m flow -a flow_resnet152 \
    --new_length=10 --epochs 350 --lr 0.001 --lr_steps 200 300
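With --new_length=10, each temporal-stream sample is built from 10 consecutive flow frames, and each frame has a horizontal (x) and a vertical (y) component, so the network input presumably has 20 channels. The sketch below stacks such frames into a (2L, H, W) array. The interleaved x_1, y_1, x_2, y_2, ... channel order is an assumption, not confirmed from this repository's data loader, and the function name is hypothetical.

```python
import numpy as np


def stack_flow(flow_x_frames, flow_y_frames):
    """Stack L horizontal and L vertical flow images into a (2L, H, W) array.

    Channels are interleaved per frame: x_1, y_1, x_2, y_2, ...
    (this ordering is an assumption).
    """
    if len(flow_x_frames) != len(flow_y_frames):
        raise ValueError("need the same number of x and y flow frames")
    channels = []
    for fx, fy in zip(flow_x_frames, flow_y_frames):
        channels.append(np.asarray(fx, dtype=np.float32))
        channels.append(np.asarray(fy, dtype=np.float32))
    return np.stack(channels, axis=0)


if __name__ == "__main__":
    length, height, width = 10, 224, 224  # L = --new_length
    xs = [np.zeros((height, width)) for _ in range(length)]
    ys = [np.zeros((height, width)) for _ in range(length)]
    print(stack_flow(xs, ys).shape)  # 2 * L channels
```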

DATA_PATH is the directory containing the RGB frames or optical flow images. Adjust the parameters passed to argparse as needed.

Testing

Go to the "scripts/eval_ucf101_pytorch" folder, then run python spatial_demo.py to obtain the spatial stream result and python temporal_demo.py to obtain the temporal stream result. Update the label files used by these scripts before running them.
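Evaluation at the video level generally means scoring several frames (or flow stacks) per video, averaging those scores, and comparing the averaged prediction with the label. The sketch below shows that aggregation and the resulting accuracy. How many frames spatial_demo.py and temporal_demo.py actually sample, and whether they average logits or probabilities, is not stated here, so treat this as an illustration with hypothetical function names.

```python
import numpy as np


def video_prediction(frame_scores):
    """Average frame-level scores of shape (num_frames, num_classes) into one label."""
    scores = np.asarray(frame_scores, dtype=np.float64)
    return int(np.argmax(scores.mean(axis=0)))


def accuracy(video_scores, labels):
    """Fraction of videos whose averaged prediction matches the label."""
    preds = [video_prediction(scores) for scores in video_scores]
    correct = sum(int(pred == label) for pred, label in zip(preds, labels))
    return float(correct) / len(labels)  # float() keeps Python 2.7 from truncating


if __name__ == "__main__":
    videos = [
        [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],  # mean favors class 0
        [[0.1, 0.9]],                          # single frame, class 1
    ]
    print(accuracy(videos, [0, 0]))
```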

For ResNet152, I obtain 85.60% accuracy for the spatial stream and 85.71% for the temporal stream on split 1 of the UCF101 dataset. The results look promising. Pre-trained models: RGB_ResNet152 Model, Flow_ResNet152 Model.

For VGG16, I obtain 78.5% accuracy for the spatial stream and 80.4% for the temporal stream on split 1 of the UCF101 dataset. The spatial result is close to the number reported in the original paper, but the flow result is about 5% below it. There are several possible reasons: the pre-trained VGG16 model in PyTorch may differ from the Caffe one, or there may be subtle bugs in my VGG16 flow model. Comments are welcome if you find the cause of this performance gap. Pre-trained models: RGB_VGG16 Model, Flow_VGG16 Model.
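One concrete way the PyTorch and Caffe VGG16 models differ is input preprocessing. torchvision's ImageNet models expect RGB input scaled to [0, 1] and normalized by per-channel mean and standard deviation, while the original Caffe VGG16 release expects BGR input in [0, 255] with only the per-channel mean subtracted. The sketch below contrasts the two conventions on an (H, W, 3) RGB image. It only illustrates the difference; whether it explains the flow-stream gap is unverified, and the function names are hypothetical.

```python
import numpy as np

# torchvision ImageNet statistics: RGB order, for images scaled to [0, 1].
TORCHVISION_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
TORCHVISION_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
# Caffe VGG16 per-channel means: BGR order, for images in [0, 255].
CAFFE_VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess_torchvision(rgb_uint8):
    """RGB uint8 (H, W, 3) -> torchvision-style normalized float array."""
    x = rgb_uint8.astype(np.float32) / 255.0
    return (x - TORCHVISION_MEAN) / TORCHVISION_STD


def preprocess_caffe_vgg(rgb_uint8):
    """RGB uint8 (H, W, 3) -> Caffe-VGG-style BGR, mean-subtracted, [0, 255] scale."""
    bgr = rgb_uint8[..., ::-1].astype(np.float32)
    return bgr - CAFFE_VGG_MEAN_BGR


if __name__ == "__main__":
    white = np.full((1, 1, 3), 255, dtype=np.uint8)
    print(preprocess_torchvision(white)[0, 0])  # roughly 2.2 per channel
    print(preprocess_caffe_vgg(white)[0, 0])    # roughly 130 to 150 per channel
```

Weights trained under one convention see inputs that differ in scale by roughly two orders of magnitude (and in channel order) under the other, which is why a model ported between frameworks has to keep its original preprocessing.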

I am currently experimenting with a memory-efficient DenseNet and will release the code in a couple of days. Stay tuned.

Related Projects

TSN: Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Hidden Two-Stream: Hidden Two-Stream Convolutional Networks for Action Recognition
