activity-recognition-with-cnn-and-rnn's Introduction

LRCN and Temporal CNN for Activity Recognition

Chih-Yao Ma*, Min-Hung Chen*

(* equal contribution)


Abstract

This is an ongoing project.

We examine and implement several leading deep learning techniques for Human Activity Recognition (video classification), while proposing and investigating a novel convolution on temporally-constructed feature vectors.

Our proposed model classifies videos into human activity categories and gives a confidence score for each prediction. Features extracted from both the spatial and temporal networks are integrated by an RNN to make a prediction for each image frame. The class prediction for each video is made by voting across several selected video frames.
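The voting step can be illustrated with a minimal sketch. It assumes simple majority voting over per-frame labels; the exact voting rule and frame selection used in this project are not described here, and the function name `vote_video_label` and the sample labels are hypothetical.

```python
from collections import Counter


def vote_video_label(frame_predictions):
    """Return the most common per-frame label and its share of the votes.

    Assumption: plain majority voting; the project may weight or select
    frames differently.
    """
    if not frame_predictions:
        raise ValueError("need at least one frame prediction")
    counts = Counter(frame_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(frame_predictions)


# Hypothetical per-frame predictions for one video (UCF101 class names).
frames = ["Basketball", "Basketball", "BasketballDunk", "Basketball", "Biking"]
label, share = vote_video_label(frames)
print(label, share)  # Basketball 0.6
```

The vote share gives a rough video-level agreement measure, which is not the same thing as the per-prediction confidence score produced by the model.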

How do we tackle the activity recognition problem?

CNN as baseline, CNN + RNN (LRCN), Temporal CNN
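To make the idea of convolving along time concrete, here is a minimal sketch. It is not the architecture of our Temporal CNN, which this README does not spell out; it only assumes that per-frame feature vectors are stacked in temporal order and that a 1-D kernel slides over the time axis. The function name `temporal_conv` and the toy numbers are hypothetical.

```python
def temporal_conv(features, kernel):
    """Slide a 1-D kernel along the time axis of a T x D feature matrix.

    features: list of T per-frame feature vectors, each of length D.
    kernel:   list of K weights applied across K consecutive frames.
    Returns a (T - K + 1) x D matrix ("valid" cross-correlation, applied
    to each feature dimension independently).
    """
    num_frames, width = len(features), len(kernel)
    dim = len(features[0])
    output = []
    for t in range(num_frames - width + 1):
        row = [
            sum(kernel[k] * features[t + k][d] for k in range(width))
            for d in range(dim)
        ]
        output.append(row)
    return output


# Toy example: 4 frames, 2-dimensional features, a difference kernel.
toy_features = [[1, 0], [2, 1], [3, 2], [4, 3]]
print(temporal_conv(toy_features, [1, 0, -1]))
```

A real model would learn many such kernels and mix feature dimensions; this sketch shows only the temporal sliding-window operation.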


Demo

The demo video on YouTube shows the top-3 predictions of our LRCN and Temporal CNN models. The text at the top is the ground truth, the three text labels listed for each method are its predictions, and the bar next to each prediction shows how confident the model is in it.
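A top-3 readout like the one in the demo can be sketched as follows. This assumes the confidence bars correspond to softmax probabilities over the class scores, which is a common choice but is not confirmed here; the helper names and the sample scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical raw scores for four UCF101 classes.
labels = ["Basketball", "Biking", "Diving", "Typing"]
scores = [2.0, 0.5, 1.0, -1.0]
for name, prob in top_k(scores, labels):
    print(f"{name}: {prob:.2f}")
```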


Dataset

We currently use the UCF101 dataset for our project. This dataset has 13,320 videos from 101 action categories.

UCF101 Dataset

In the near future, we will move on to the Sports-1M dataset to see how our performance changes.

SPORTS-1M Dataset


Installation

Our work is currently implemented in Torch, and depends on the following packages: torch/torch7, torch/nn, torch/nngraph, torch/image, cudnn ...

If you are on Ubuntu, please follow the instructions here to install Torch. For a more comprehensive installation guide, please check Torch installation.

$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd ~/torch; bash install-deps;
$ ./install.sh
$ source ~/.bashrc

You will also need to install some of the packages we used from LuaRocks. LuaRocks should already be installed with your Torch.

$ luarocks install torch
$ luarocks install pl
$ luarocks install trepl
$ luarocks install image
$ luarocks install nn
$ luarocks install dok
$ luarocks install gnuplot
$ luarocks install qtlua
$ luarocks install sys
$ luarocks install xlua
$ luarocks install optim

If you would like to use CUDA on your NVIDIA graphics card, you will need to install the CUDA toolkit and some additional packages. To install CUDA 7.5 on Ubuntu:

# cd to the directory containing the downloaded file, then
$ sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
$ sudo apt-get update
# install cuda using apt-get
$ sudo apt-get install cuda

Add the following lines to your ~/.bashrc file:

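export CUDA_HOME=/usr/local/cuda-7.5
# Prepend instead of overwriting, so existing library paths (e.g. Torch's) are kept
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

export PATH=${CUDA_HOME}/bin:${PATH}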

Remember to source your bashrc file afterwards.

$ source ~/.bashrc

In order to use CUDA with Torch, you will need to install some additional packages.

$ luarocks install cutorch
$ luarocks install cunn

Because we use the pre-trained ResNet model, you need to install the cuDNN package properly. First, download the package from NVIDIA (registration is required).

Then follow these instructions:

# cd to the directory containing the downloaded file, then
$ tar -xzvf cudnn-7.5-linux-x64-v5.0-ga.tgz
$ cd cuda
$ sudo cp lib64/* /usr/local/cuda/lib64/
$ sudo cp include/cudnn.h /usr/local/cuda/include/
$ luarocks install cudnn

Usage

We provide three different methods to train the models for activity recognition: CNN, CNN with RNN, and Temporal CNN.

Inputs

Our models take the feature vectors generated by the first CNN as input for training. You can generate the features using our code under "/CNN_Spatial/", or download the feature vectors we generated (please refer to the Dropbox link below). We followed the first training/testing split from UCF-101. If you would like to compare with our results, please use the same training and testing lists, as the choice of split strongly affects overall performance.
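If you need to read the split lists yourself, the following sketch shows one way to do it. It assumes the usual UCF101 list format, where each line holds a `ClassName/video_file.avi` path, optionally followed by a class index (as in the training lists). The function name `parse_split_lines` is hypothetical and is not part of our code.

```python
def parse_split_lines(lines):
    """Parse UCF101-style split lines into (class_name, video_file) pairs.

    Assumed line format: "ClassName/v_ClassName_gXX_cYY.avi [class_index]".
    Any trailing class index is ignored; blank lines are skipped.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path = line.split()[0]
        class_name, video_file = path.split("/", 1)
        entries.append((class_name, video_file))
    return entries


# Sample lines in the assumed format (training lists carry a class index).
sample = [
    "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1",
    "Basketball/v_Basketball_g01_c01.avi",
    "",
]
print(parse_split_lines(sample))
```

In practice you would pass the lines of the first-split training and testing list files to this function, so that your videos are partitioned the same way as ours.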

CNN with RNN

We use the RNN library provided by Element-Research. Simply install it by:

$ luarocks install rnn

After you download the feature vectors, please modify the code in ./RNN/data.lua to point to the directory where you put your feature vector files.

To start the training process, go to ./RNN and simply execute:

$ th RNN_LSTM.lua

The training and testing performance will be plotted, and the results will be saved into log files. The learning rate and best testing accuracy will be reported each epoch if there is any update.
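The "report only when there is an update" behavior can be sketched as below. This is an illustration in Python rather than the Lua code in ./RNN, and it assumes an update means a strictly higher testing accuracy than any earlier epoch; the function name and the accuracy values are hypothetical.

```python
def report_improvements(test_accuracies):
    """Return (epoch, accuracy) pairs for epochs that set a new best accuracy.

    Assumption: an "update" is a strictly higher testing accuracy than in
    all previous epochs. Epochs are numbered from 1.
    """
    best = float("-inf")
    updates = []
    for epoch, accuracy in enumerate(test_accuracies, start=1):
        if accuracy > best:
            best = accuracy
            updates.append((epoch, accuracy))
    return updates


# Hypothetical testing accuracies over five epochs.
history = [0.50, 0.48, 0.60, 0.60, 0.70]
for epoch, accuracy in report_improvements(history):
    print(f"epoch {epoch}: new best testing accuracy {accuracy:.2f}")
```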

Temporal CNN

To start the training process, go to ./TCNN and simply execute:

$ qlua run.lua -r 15e-5

For more details, please refer to the readme file in the folder ./TCNN/.

You also need to modify the code in ./TCNN/data.lua to point to the directory where you put your feature vector files.

The training and testing performance will be plotted, and the results will be saved into log files. The best testing accuracy will be reported each epoch if there is any update.


Acknowledgment

This work began as a class project for the deep learning class at Georgia Tech in Spring 2016. We teamed up with Hao Yan and Casey Battaglino on this class project; they have been a great help and provided valuable discussions along the way.

This is an ongoing project. Please contact us if you have any questions.

Chih-Yao Ma at [email protected] or [LinkedIn]

Min-Hung Chen at [email protected]

Last updated: 06/27/2016
