DRN-3D

3D dense residual network for action recognition

Limited by hardware (I only have one GTX 1080 Ti) and network access (CN), I did not run further experiments on large datasets, e.g. Kinetics, sports-8M.

3D dense residual network

Inspired by Residual Dense Network for Image Super-Resolution

fig1 3D dense residual block

fig2 3D dense residual network
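To make the dense-plus-residual idea concrete, here is a minimal channel-bookkeeping sketch. It follows the block layout described in the RDN paper (densely connected conv layers, a 1x1 fusion conv, then a local residual add) and assumes the 3D block mirrors it with 3D convolutions. The function name and the numbers (32 input channels, growth rate 16, 4 layers) are hypothetical and are not taken from this repo.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Channel counts inside one dense residual block (illustrative only).

    Returns the number of input channels seen by each conv layer, and the
    number of channels entering the final 1x1x1 fusion conv.
    """
    layer_inputs = []
    channels = in_channels
    for _ in range(num_layers):
        layer_inputs.append(channels)
        # Each layer emits `growth_rate` feature maps, which are concatenated
        # onto everything that came before it.
        channels += growth_rate
    fused_in = channels
    return layer_inputs, fused_in


# Hypothetical sizes, chosen only for illustration.
layer_inputs, fused_in = dense_block_channels(in_channels=32, growth_rate=16, num_layers=4)
print(layer_inputs)  # [32, 48, 64, 80]
print(fused_in)      # 96
```

In this layout the fusion conv would map the 96 concatenated channels back to 32, so that the block output can be added to the block input.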

The parameter count and model size of 3D-DRN are as follows.

parameters    model size
1.5M          6.3MB
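As a rough consistency check on these two numbers, the sketch below estimates the weight storage from the parameter count. It assumes the weights are stored as 32-bit floats (4 bytes each); the README does not state this, and the function name is hypothetical.

```python
def estimate_size_mb(num_params, bytes_per_param=4):
    """Approximate weight storage in MB (10**6 bytes), assuming float32 weights."""
    return num_params * bytes_per_param / 1e6


print(estimate_size_mb(1500000))  # 6.0
```

The estimate of about 6 MB is close to the reported 6.3MB. The gap is plausibly explained by rounding in the "1.5M" figure and by file-format overhead.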

Requirements

OpenCV 3.2
Keras 2.0.8
TensorFlow 1.3

Prepare data

step1 -- download the UCF-101 dataset

step2 -- converting videos to images for UCF-101

python utils/video2img.py --video-path='the path of ucf101' --save-path='the path for saving images'

step3 -- generating label txt for converted images

python utils/make_label_txt.py --image-path='the path of saved images'

Training

In C3D, the input dimensions are 128 × 171 × 16 × 3; in this repo they are 128 × 171 × 8 × 3.

During training, three types of input clip length are supported. Check this script for details.

(1) clip length = 16. I take one frame out of every two.


(2) clip length = 24. I take one frame out of every three.


(3) mixed clip lengths. First, I randomly choose a clip length of 16 or 24 with 50% probability each, then take one frame out of every two or three frames correspondingly.
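The three sampling modes above can be sketched as follows. This is a minimal illustration, not the repo's training code: the function names are hypothetical, and it assumes the first frame of each group of two or three is the one kept. Either clip length yields 8 frames, which matches the temporal size of 8 in the input dimensions.

```python
import random


def sample_clip_indices(start, clip_length):
    """Frame indices for one 8-frame input drawn from a clip of 16 or 24 frames."""
    if clip_length not in (16, 24):
        raise ValueError("clip_length must be 16 or 24")
    stride = clip_length // 8  # 2 for length 16, 3 for length 24
    return list(range(start, start + clip_length, stride))


def sample_mixed_clip_indices(start, rng=random):
    """Mode (3): pick 16 or 24 with equal probability, then sample as above."""
    clip_length = rng.choice((16, 24))
    return clip_length, sample_clip_indices(start, clip_length)


print(sample_clip_indices(0, 16))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(sample_clip_indices(0, 24))  # [0, 3, 6, 9, 12, 15, 18, 21]
```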

Clips are resized to a frame size of 128 × 171. During training, I randomly crop input clips into 112 × 112 × 8 crops for spatial and temporal jittering. I also flip them horizontally with 50% probability.
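A minimal sketch of the spatial part of this augmentation is shown below, using plain nested lists so that it runs without any dependencies. It assumes the same crop window and the same flip decision are applied to every frame of a clip, which the README does not state explicitly; the function name is hypothetical.

```python
import random


def random_crop_and_flip(clip, crop_h=112, crop_w=112, rng=random):
    """Randomly crop and maybe flip a clip indexed as clip[frame][row][col].

    One crop window and one flip decision are shared by all frames.
    """
    height, width = len(clip[0]), len(clip[0][0])
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_w)
    flip = rng.random() < 0.5                # horizontal flip with 50% probability
    cropped = []
    for frame in clip:
        rows = [row[left:left + crop_w] for row in frame[top:top + crop_h]]
        if flip:
            rows = [row[::-1] for row in rows]
        cropped.append(rows)
    return cropped


# An 8-frame dummy clip of size 128 x 171 (single channel for brevity).
clip = [[[0] * 171 for _ in range(128)] for _ in range(8)]
out = random_crop_and_flip(clip)
print(len(out), len(out[0]), len(out[0][0]))  # 8 112 112
```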

python train_DRN-3D.py --lr=0.01 --batch-size=16 --drop-rate=0.2 --clip-length=16 --random-length=False --image-path='the path of saved images'

Results

I use only a single center crop per clip and pass it through the network to make the clip prediction. For video predictions, I average the predictions of several clips extracted evenly from the video (no overlap).
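The video-level prediction described above can be sketched as follows. This is an illustration under stated assumptions rather than the contents of evaluate_video.py: the function names are hypothetical, clips are placed at the start of equal-length segments, and each clip prediction is a list of class probabilities.

```python
def even_clip_starts(num_frames, clip_length, num_clips):
    """Start indices of up to `num_clips` non-overlapping, evenly spread clips."""
    num_clips = min(num_clips, num_frames // clip_length)
    if num_clips == 0:
        return []
    # Split the video into equal segments and take one clip from the start of each.
    segment = num_frames // num_clips
    return [i * segment for i in range(num_clips)]


def average_predictions(clip_probs):
    """Element-wise mean of per-clip class probabilities."""
    num_clips = len(clip_probs)
    num_classes = len(clip_probs[0])
    return [sum(p[c] for p in clip_probs) / num_clips for c in range(num_classes)]


def predict_video(clip_probs):
    """Index of the class with the highest averaged probability."""
    mean = average_predictions(clip_probs)
    return max(range(len(mean)), key=mean.__getitem__)


print(even_clip_starts(160, 16, 5))              # [0, 32, 64, 96, 128]
print(predict_video([[0.6, 0.4], [0.2, 0.8]]))   # 1
```

Averaging probabilities before taking the argmax lets a confident clip outweigh an uncertain one, which is consistent with the video accuracy in the results being higher than the clip accuracy.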

Evaluate video (pre-trained weight files are in the 'results' directory)

python evaluate_video.py

Results on UCF101

clip length     clip acc    video acc
16              58.41%      62.80%
24              59.47%      64.16%
16, 24 mixed    59.60%      64.76%

fig3 clip acc of length=16 during training

fig4 clip loss of length=16 during training

Predict a video frame by frame and display the result

python video_demo.py


Extract video features for HMDB51 with the pre-trained model

First, convert the videos to images

python utils/video2img_hmdb.py

Second, generate the label txt

python utils/hmdb_label.py

Finally, extract the video features and evaluate them

python evaluate_hmdb.py

The accuracy on HMDB51 is 56%.


Reference

RDN

DenseNet

keras-resnet

C3D

c3d-keras

Contributors

tianzhongsong
