
actionrecognition's Introduction

Action Recognition

This project aims to accurately recognize a user's action in a series of video frames by combining convolutional neural networks (CNNs) with long short-term memory (LSTM) networks.

Project Overview

  • This project explores prominent action recognition models on the UCF-101 dataset

  • The performance of the different models is compared and an analysis of the experiment results is provided

File Structure of the Repo

rnn_practice: Practices on RNN models and LSTMs with online tutorials and other useful resources

data: Training and testing data. (NOTE: please don't add large data files to this repo; add them to .gitignore instead)

models: Defining the architecture of models

utils: Utility scripts for dataset preparation, input pre-processing and other helper functions

train_CNN: Training CNN models. The program loads corresponding models, sets the training parameters and initializes network training

process_CNN: Processing videos with CNN models. The CNN component is pre-trained and kept fixed while the LSTM cells are trained, so we can use it to pre-process the frames of each video once and store the intermediate results to feed into the LSTMs later. This significantly improves the training efficiency of the LRCN model

train_RNN: Training the LRCN model

predict: Calculating the overall testing accuracy on the entire testing set
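
The caching idea described for process_CNN can be sketched as follows. This is a minimal illustration, not the repo's actual code: `extract_features` is a hypothetical stand-in for the frozen CNN (it only averages pixels), and the function names, file layout and frame sizes are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def extract_features(frames):
    """Hypothetical stand-in for the frozen CNN.

    Maps frames of shape (T, H, W, 3) to one feature vector per frame.
    Here it is just the per-channel mean, giving shape (T, 3).
    """
    return frames.mean(axis=(1, 2))


def cache_features(videos, out_dir):
    """Run the fixed extractor once per video and save the result as .npy."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, frames in videos.items():
        path = os.path.join(out_dir, name + ".npy")
        np.save(path, extract_features(frames))
        paths[name] = path
    return paths


def load_features(path):
    """Load cached features; the sequence model trains on these directly."""
    return np.load(path)


# Usage with small random "videos" (10 frames of 32x32 RGB each).
rng = np.random.default_rng(0)
videos = {
    "clip_a": rng.random((10, 32, 32, 3)),
    "clip_b": rng.random((10, 32, 32, 3)),
}
cache_dir = tempfile.mkdtemp()
paths = cache_features(videos, cache_dir)
features = load_features(paths["clip_a"])
```

Because the extractor never changes during LSTM training, each video passes through it once instead of once per epoch, which is where the efficiency gain comes from.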

Models Description

  • Fine-tuned ResNet50, trained solely on single-frame image data. Each frame of a video is treated as an image for training and testing, which acts as a natural form of data augmentation. The ResNet50 comes from Keras, with weights pre-trained on ImageNet. ./models/finetuned_resnet.py

  • LRCN (a CNN feature extractor, here the fine-tuned ResNet50, followed by LSTMs). The input of LRCN is a sequence of frames uniformly extracted from each video. The fine-tuned ResNet directly reuses the result of the first model in this list ([1]) without extra training (cf. Long-term Recurrent Convolutional Networks).

    Produce intermediate data using ./process_CNN.py and then train and predict with ./models/RNN.py

  • Simple CNN model trained on stacked optical flow data (one stacked optical flow is generated from each video and used as the input of the network). ./models/temporal_CNN.py

  • Two-stream model, which combines the second and third models in this list ([2] and [3]) with an extra fusion layer that outputs the final result. Models [3] and [4] follow the two-stream paper listed under Citations. ./models/two_stream.py
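
The fusion step of the two-stream model can be pictured with a small sketch. It assumes the fusion layer is a single dense layer with softmax applied to the concatenated class scores of the two streams; the actual layer in ./models/two_stream.py may differ. The weights, the three-class setup and the function names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fuse(spatial_probs, temporal_probs, weights, bias):
    """Concatenate both streams' class scores and apply a dense + softmax layer."""
    joint = np.concatenate([spatial_probs, temporal_probs])
    return softmax(joint @ weights + bias)


num_classes = 3  # illustrative only; UCF-101 has 101 classes

# Hypothetical fusion weights: two stacked identity blocks, so the layer
# starts out by summing the two streams' scores class by class.
weights = np.vstack([np.eye(num_classes), np.eye(num_classes)])
bias = np.zeros(num_classes)

spatial = np.array([0.7, 0.2, 0.1])   # scores from the single-frame stream
temporal = np.array([0.2, 0.5, 0.3])  # scores from the optical-flow stream
fused = fuse(spatial, temporal, weights, bias)
predicted_class = int(np.argmax(fused))
```

In a trained model the weights would be learned, letting the network decide how much to trust each stream per class.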

Citations

If you use this code or ideas from the paper for your research, please cite the following papers:

@inproceedings{lrcn2014,
   Author = {Jeff Donahue and Lisa Anne Hendricks and Sergio Guadarrama
             and Marcus Rohrbach and Subhashini Venugopalan and Kate Saenko
             and Trevor Darrell},
   Title = {Long-term Recurrent Convolutional Networks
            for Visual Recognition and Description},
   Year  = {2015},
   Booktitle = {CVPR}
}
@article{DBLP:journals/corr/SimonyanZ14,
  author    = {Karen Simonyan and
               Andrew Zisserman},
  title     = {Two-Stream Convolutional Networks for Action Recognition in Videos},
  journal   = {CoRR},
  volume    = {abs/1406.2199},
  year      = {2014},
  url       = {http://arxiv.org/abs/1406.2199},
  archivePrefix = {arXiv},
  eprint    = {1406.2199},
  timestamp = {Mon, 13 Aug 2018 16:47:39 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/SimonyanZ14},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

actionrecognition's People

Contributors

bk202, changanvr, davidnazw, multimodalitiesfor3dscenes, woodfrog

actionrecognition's Issues

data converted

Hello,
Could you tell me how the image data (216*216*3) is converted into temporal data (216*216*18) in UCF_utils.py? I do not understand. It also fails with "ValueError: could not broadcast input array from shape (10,216,216,3) into shape (216,216,18)".
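
A possible way to read the two shapes, as a sketch only: it assumes the 18 channels are 9 optical flow fields with 2 components (horizontal and vertical) each, which is the stacking scheme of the two-stream paper but is not confirmed for UCF_utils.py. The function name is hypothetical. The second half reproduces the reported error by writing a clip of 10 RGB frames into an 18-channel slot.

```python
import numpy as np


def stack_flows(flows):
    """Stack (H, W, 2) flow fields along the channel axis.

    Returns an array of shape (H, W, 2 * len(flows)).
    """
    return np.concatenate(flows, axis=-1)


# Assumption: 9 flow fields x 2 components = 18 channels.
flows = [np.zeros((216, 216, 2), dtype=np.float32) for _ in range(9)]
stacked = stack_flows(flows)  # shape (216, 216, 18)

# Reproducing the error: an RGB clip has an extra leading frame axis
# and cannot be broadcast into a single 18-channel slot.
rgb_clip = np.zeros((10, 216, 216, 3), dtype=np.float32)
slot = np.empty((216, 216, 18), dtype=np.float32)
try:
    slot[...] = rgb_clip
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
```

Under that assumption, the error would mean the generator received raw RGB frames where pre-computed optical flow data was expected.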

too many files missing

Hello, I am having difficulty running the program. I am running train_CNN.py / train_RNN.py

Traceback (most recent call last):
  File "train_CNN.py", line 70, in <module>
    fit_model(model, train_data, test_data, weights_dir, input_shape)
  File "train_CNN.py", line 45, in fit_model
    callbacks=[checkpointer, tensorboard, earlystopping]
  File "C:\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Python36\lib\site-packages\keras\engine\training.py", line 2192, in fit_generator
    generator_output = next(output_generator)
  File "C:\Python36\lib\site-packages\keras\utils\data_utils.py", line 793, in get
    six.reraise(value.__class__, value, value.__traceback__)
  File "C:\Python36\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Python36\lib\site-packages\keras\utils\data_utils.py", line 658, in data_generator_task
    generator_output = next(self._generator)
  File "C:\Users\jigne\Desktop\BE PROJECT FILES\ActionRecognition\utils\UCF_utils.py", line 60, in image_from_sequence_generator
    batch_video, batch_label = next(video_gen)
  File "C:\Users\jigne\Desktop\BE PROJECT FILES\ActionRecognition\utils\UCF_utils.py", line 39, in sequence_generator
    raise FileExistsError('Too many file missing')
FileExistsError: Too many file missing

optical flow images

Can you tell me how you feed in the optical flow images that you obtained from the video frames?
I am just confused.

instructions on implementing the project

Could you please write some instructions on how to run this project (train, validate, test)? I am new to this field and I wonder where to start.
Thank you

optical flow src dir?

Hello,
Could you please tell me what this directory is:
src_dir = '/home/changan/ActionRecognition/data/UCF-Preprocessed-OF'
Please clarify what you have placed in this folder.
Thanks

About data preprocessing

Hi, I am interested in your repo ActionRecognition and am trying to run the code.
However, your 'experiment.md' file says the following about preprocessing:

3. Segment number of frames into equal size blocks(frame number/sequence
length L). Randomly select one frame from each block to compose L length
video clip

Does this mean that if a video has 35 frames and L is 5,
it is divided into frames 1-5, 6-10, ..., 31-35, which is 7 blocks, and one
random frame is taken from each block, from first to last, to make a 7-frame video?
If my understanding is right, I'd like to know why this is done.
Also, is there any way to get the sequence length L? Is it fixed? Which value did you choose?

P.S. If possible, can you let me know the threshold for discarding videos with too few frames, as mentioned in the description below?

2. Extract video to 5 FPS and down sample resolution for each video
and discard videos with too few frames
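
One possible reading of the quoted step 3, sketched under assumptions: the block size is taken as frame number // sequence length L, so the number of blocks equals L and the clip has L frames. Under this reading, 35 frames with L = 5 give 5 blocks of 7 frames each, not 7 blocks of 5. The function name and the handling of leftover frames are hypothetical, and indices are zero-based.

```python
import random


def sample_clip_indices(num_frames, seq_len, rng=random):
    """Pick one random frame index from each of seq_len equal-size blocks.

    Frames beyond seq_len * block_size (the remainder) are never sampled.
    """
    if num_frames < seq_len:
        raise ValueError("video has fewer frames than the sequence length")
    block_size = num_frames // seq_len
    return [
        rng.randrange(i * block_size, (i + 1) * block_size)
        for i in range(seq_len)
    ]


# 35 frames, L = 5: blocks are frames 0-6, 7-13, 14-20, 21-27, 28-34.
indices = sample_clip_indices(35, 5, random.Random(0))
```

Sampling this way keeps the clip spread over the whole video while still varying the exact frames between epochs.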

File not found error?

Traceback (most recent call last):
  File "C:/Users/sjc/Desktop/ActionRecognition-master/predict.py", line 136, in <module>
    predict_two_stream18_test()
  File "C:/Users/sjc/Desktop/ActionRecognition-master/predict.py", line 63, in predict_two_stream18_test
    x, y = next(generator)
  File "C:\Users\sjc\Desktop\ActionRecognition-master\utils\UCF_utils.py", line 188, in two_stream18_generator
    with open(list_dir) as fo:
FileNotFoundError: [Errno 2] No such file or directory: '/home/changan/ActionRecognition/data\ucfTrainTestlist\testlist.txt'

Process finished with exit code 1
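
The path in the error mixes a Linux-style prefix (/home/changan/...) with Windows separators, which suggests a data directory from another machine is still set in the scripts. A minimal sketch of building the split-list path from a local data directory instead; `data_dir`, `split_list_path` and `require_file` are hypothetical names, not identifiers from the repo.

```python
from pathlib import Path


def split_list_path(data_dir, list_name="testlist.txt"):
    """Build the split-list path with the separators of the current OS."""
    return Path(data_dir) / "ucfTrainTestlist" / list_name


def require_file(path):
    """Fail early with a readable message if the list file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected a split list at {path}; check the data directory setting"
        )
    return path


# Assumption: the dataset lives in a "data" folder next to the scripts.
data_dir = Path.cwd() / "data"
list_path = split_list_path(data_dir)
```

Calling `require_file(list_path)` before creating a generator would surface a wrong data directory immediately rather than deep inside the data pipeline.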
