3D dense residual network for action recognition
Limited by hardware (I only have one GTX 1080 Ti) and network conditions (CN), I did not run further experiments on large datasets, e.g. Kinetics or Sports-1M.
Inspired by "Residual Dense Network for Image Super-Resolution".
Fig. 1: 3D dense residual block
Fig. 2: 3D dense residual network
The parameter count and model size of 3D-DRN are as follows.
parameters | model size |
---|---|
1.5M | 6.3MB |
- OpenCV 3.2
- Keras 2.0.8
- TensorFlow 1.3
step1 -- download the UCF-101 dataset
step2 -- converting videos to images for UCF-101
```shell
python utils/video2img.py --video-path='the path of ucf101' --save-path='the path for saving images'
```
step3 -- generating label txt for converted images
```shell
python utils/make_label_txt.py --image-path='the path of saved images'
```
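A minimal sketch of what such a label script might do, assuming frames are saved as `<image-path>/<ClassName>/<video>/frame.jpg` and each output line pairs a video directory with a class index. The layout and the output format are assumptions for illustration, not the actual behavior of `utils/make_label_txt.py`:

```python
import os

def make_label_txt(image_path, out_file):
    # Assumed layout: <image_path>/<ClassName>/<video_dir>/*.jpg
    # Writes one line per video directory: "<video_dir> <class_id>".
    classes = sorted(os.listdir(image_path))
    with open(out_file, "w") as f:
        for label, cls in enumerate(classes):
            cls_dir = os.path.join(image_path, cls)
            for video in sorted(os.listdir(cls_dir)):
                f.write("%s %d\n" % (os.path.join(cls_dir, video), label))
```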
In C3D, the input dimensions are 128 × 171 × 16 × 3; in this repo they are 128 × 171 × 8 × 3.
During training, three types of input clip length are supported. Check this script for details.
(1) clip length = 16. I take one frame every two frames.
(2) clip length = 24. I take one frame every three frames.
(3) mixed clip lengths. I first choose a clip length of 16 or 24 with 50% probability each, then take one frame every two or three frames accordingly.
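The three sampling schemes above can be sketched as follows. The function name and its arguments mirror the `--clip-length`/`--random-length` training flags but are illustrative, not the repo's actual code; eight frames are kept in every case, matching the 128 × 171 × 8 × 3 input:

```python
import random

def sample_clip_indices(num_frames, clip_length=16, random_length=False):
    # Mixed mode: pick a clip length of 16 or 24 with 50% probability each.
    if random_length:
        clip_length = random.choice([16, 24])
    stride = clip_length // 8          # 16 -> every 2nd frame, 24 -> every 3rd
    # Random temporal offset so the clip fits inside the video.
    start = random.randint(0, max(0, num_frames - clip_length))
    return [start + i * stride for i in range(8)]
```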
Clips are resized to a frame size of 128 × 171. During training, I randomly crop input clips to 112 × 112 × 8 for spatial and temporal jittering, and also horizontally flip them with 50% probability.
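A NumPy sketch of this crop-and-flip augmentation, assuming a clip is an array of shape (frames, height, width, channels) such as (8, 128, 171, 3); the function name is illustrative, not the repo's actual code:

```python
import numpy as np

def spatial_jitter(clip, crop_size=112):
    t, h, w, c = clip.shape
    # Random top-left corner for a crop_size x crop_size spatial crop.
    y = np.random.randint(0, h - crop_size + 1)
    x = np.random.randint(0, w - crop_size + 1)
    out = clip[:, y:y + crop_size, x:x + crop_size, :]
    # Horizontal flip with 50% probability (reverse the width axis).
    if np.random.rand() < 0.5:
        out = out[:, :, ::-1, :]
    return out
```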
```shell
python train_DRN-3D.py --lr=0.01 --batch-size=16 --drop-rate=0.2 --clip-length=16 --random-length=False --image-path='the path of saved images'
```
I use only a single center crop per clip and pass it through the network to obtain the clip prediction. For video predictions, I average the predictions of several clips extracted evenly from the video (no overlap).
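The averaging step can be sketched as follows, assuming each clip produces a softmax probability vector (illustrative only, not the repo's exact code):

```python
import numpy as np

def predict_video(clip_probs):
    # clip_probs: (num_clips, num_classes) softmax outputs, one row per clip.
    # The video score is the mean over clips; the prediction is its argmax.
    video_prob = np.mean(np.asarray(clip_probs), axis=0)
    return int(np.argmax(video_prob))
```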
Evaluate videos (pre-trained weight files are in the 'results' directory):
```shell
python evaluate_video.py
```
Results on UCF101
clip length | clip acc | video acc |
---|---|---|
16 | 58.41% | 62.80% |
24 | 59.47% | 64.16% |
16, 24 mixed | 59.60% | 64.76% |
Fig. 3: clip accuracy (clip length = 16) during training
Fig. 4: clip loss (clip length = 16) during training
```shell
python video_demo.py
```
Extract video features for HMDB51 with the pre-trained model
First, convert videos to images:
```shell
python utils/video2img_hmdb.py
```
Second, generate the label txt:
```shell
python utils/hmdb_label.py
```
Extract video features and evaluate them:
```shell
python evaluate_hmdb.py
```
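One common way to turn per-clip features into a single video descriptor before a linear classifier on HMDB51 is mean pooling followed by L2 normalization. This is a hedged sketch of that idea, not necessarily what `evaluate_hmdb.py` does:

```python
import numpy as np

def video_feature(clip_features):
    # Mean-pool per-clip feature vectors into one fixed-length descriptor,
    # then L2-normalize it (an assumed choice, not confirmed from the repo).
    feat = np.mean(np.asarray(clip_features, dtype=np.float64), axis=0)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat
```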