
MultiMedia_ActionPrediction_Project3

This repository was created as a deliverable for Project 3 of the CS-523 course offered in Spring 2017 at the University of Illinois at Chicago.

Project Members:

  1. Shreyas Kulkarni
  2. Hengbin Li
  3. Kruti Sharma

Future Action Prediction using Deep Multi-Scale Video Prediction

Generative Adversarial Networks (GANs) are widely used for generating images, faces, and so on. The same kind of network can also be applied to predicting the next frame of a video. Here we demonstrate next-action prediction using GANs.

This project implements a generative adversarial network to predict future frames of video, as detailed in "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun. Their official code (in Torch) can be found at: https://github.com/coupriec/VideoPredictionICLR2016. An unofficial TensorFlow implementation by dyelax can be found at: https://github.com/dyelax/Adversarial_Video_Generation, which demonstrates prediction of the next frames on a Ms. Pac-Man video-game dataset.
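As described in the paper, the generator network is trained at several scales with a combination of a reconstruction (Lp) loss, an image gradient difference loss (GDL), and an adversarial loss. The snippet below is only a minimal NumPy sketch of the GDL term to illustrate the idea; the function name and alpha parameter are ours, and this is not the TensorFlow code used in this repository.

    import numpy as np

    def gradient_difference_loss(gen, gt, alpha=1.0):
        """Gradient difference loss (GDL) from Mathieu et al. (ICLR 2016).

        Penalizes the mismatch between the spatial gradients of a generated
        frame and those of the ground-truth frame, which encourages sharper
        predictions than a plain L2 loss. `gen` and `gt` are H x W (x C) arrays.
        """
        # Absolute image gradients along height (rows) and width (columns).
        gen_dy, gt_dy = np.abs(np.diff(gen, axis=0)), np.abs(np.diff(gt, axis=0))
        gen_dx, gt_dx = np.abs(np.diff(gen, axis=1)), np.abs(np.diff(gt, axis=1))
        # Sum of (differences between the two gradient maps) ** alpha.
        return (np.abs(gt_dy - gen_dy) ** alpha).sum() + (np.abs(gt_dx - gen_dx) ** alpha).sum()

    # Toy check: a perfect prediction scores 0, a blurred edge scores higher.
    gt = np.zeros((8, 8)); gt[:, 4:] = 1.0           # sharp vertical edge
    blurry = np.tile(np.linspace(0, 1, 8), (8, 1))   # smeared edge
    print(gradient_difference_loss(gt, gt), gradient_difference_loss(blurry, gt))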

Requirements to run the Jupyter notebook:

  1. Python 3.5 or greater
  2. Jupyter Notebook

Requirements to train/test the adversarial network:

  1. Python 3.5 or greater
  2. TensorFlow >= 1.0
  3. numpy==1.12.0
  4. scikit-image>=0.13.0
  5. scikit-learn>=0.18.1
  6. scipy>=0.19.0
  7. six==1.10.0
  8. pillow>=4.0.0
  9. imageio>=1.5.0

Dataset

The network in this repository is trained on sequences of images of human actions such as walking, sliding, and bending. The human-action videos were collected from http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html. Each video was then converted into frames, which were saved in their respective directory.

The Train, Test, and Clips directories generated for the Human Actions dataset have the following layout:

Data
  Human Actions
    Train
      daria_bend
        daria_bend_frame0.jpg
        daria_bend_frame1.jpg

        ......
      daria_walk
        daria_walk_frame0.jpg
        daria_walk_frame1.jpg
        ......
      ......
      lena_skip2
        lena_skip2_frame0.jpg
        lena_skip2_frame1.jpg
        .....

    Test
      daria_bend
        daria_bend_frame0.jpg
        .....

      denis_bend
        denis_bend_frame0.jpg
        ......

    Clips
      0.npz
      1.npz
      .....

    TestClips
      0.npz
      1.npz
      .....

Clips and TestClips are generated by processing the images in the Train and Test directories, and the network is trained and tested on these generated clips (a rough sketch of the idea is shown below). To run the repository, please follow the instructions in the next section.
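Conceptually, each .npz clip is a small stack of consecutive frames from one action folder: the history frames fed to the generator plus the frame(s) it must predict. The sketch below shows one way such a clip could be built with NumPy, imageio, and scikit-image; the paths, clip size, and array layout are illustrative assumptions, not the exact format written by generate_clips.py.

    import glob, os
    import numpy as np
    import imageio
    from skimage.transform import resize

    def make_clip(frame_dir, start, num_frames=5, size=(32, 32)):
        """Stack `num_frames` consecutive frames from one action folder into a
        single array (history frames plus the frame(s) to be predicted)."""
        paths = sorted(glob.glob(os.path.join(frame_dir, '*.jpg')))[start:start + num_frames]
        # resize() also rescales pixel values to floats in [0, 1].
        frames = [resize(imageio.imread(p), size) for p in paths]
        return np.stack(frames, axis=-1)

    # Illustrative only -- folder names and layout are assumptions.
    clip = make_clip('data/Human_Actions/Train/daria_bend', start=0)
    np.savez_compressed('data/Human_Actions/Clips/0.npz', clip=clip)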

How to Train/Test:

  1. Clone or download the repository to your local machine.

  2. Copy the entire repository folder into the root folder of your Jupyter notebook.

  3. If you want to train on the Human Actions Dataset, the videos can be downloaded from http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html

  4. Once the videos are downloaded, run the following command on each directory (e.g. walk, slide, etc.) to convert the videos into frames: python convert_video_to_jpg.py --v=skip

  5. The above command takes as input a folder containing videos and generates frames for each video in its own subfolder. For example, if the skip folder has the videos daria_skip.avi, denis_skip.avi, etc., then this script creates folders daria_skip, denis_skip, etc. inside skip, each containing the images extracted from the corresponding video (a sketch of this frame-extraction step is shown after these instructions).

  6. Repeat Step 4 for every folder containing videos.

  7. Once the frames are generated, transfer all of the individual folders into the data/Human_Actions/Train directory. Copy roughly 20% of the image frames from the Train directory to the Test directory.

  8. Once the images are generated and present in the Train and Test directories (following the folder structure shown above), run generate_clips.py to process the images into the .npz clip format:

         python generate_clips.py --t=data/Human_Actions/Train/ --c=data/Human_Actions/Clips/ --n=1000
    
  9. The above command runs on the training data set, generates clips (.npz files), and stores them in data/Human_Actions/Clips. Here n=1000, which generates 1000 clips; for good training and testing, at least 100,000+ clips should be generated. The same command must be run for the Test directory:

         python generate_clips.py --t=data/Human_Actions/Test/ --c=data/Human_Actions/Clips/ --n=1000
    
  10. Once the clips are generated for both the Train and Test data sets, we can train the network and test it simultaneously. Test runs are performed every 5000 iterations, and the resulting images are stored in data/Human_Actions/Save/Images/Default/Test along with their step numbers. The differently scaled images (both ground truth and generated) are also saved every 1000 iterations, in data/Human_Actions/Save/Images/Default/Step_1000 and so on.

        Train: python main_prediction.py
    
        Test: python main_prediction.py --test_dir=data/Human_Actions/Test --recursions=1 --test_only
    

(Note: a Test or Train directory must contain sub-folders, each with at least 5 images. The train directory is also initialized in constants.py as TRAIN_DIR = '')

  1. There is also a Jupyter notebook, Project3UI.ipynb, that can be used to perform all of the above steps interactively. The notebook contains all the commands in sequence: first produce the images from the videos, then process the clips, and finally train and test the network.
  2. To run the Jupyter notebook, configure the Train and Test directories for your dataset under data. All directories must be placed in the Code folder in order to run any of the Python files.
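For reference, the sketch below illustrates the kind of frame extraction performed in Steps 4-5: it reads a video with imageio (assuming the ffmpeg backend is available for .avi files) and writes every frame as a JPEG into a folder named after the video. It is an illustration of the idea, not the exact code in convert_video_to_jpg.py.

    import os
    import imageio

    def video_to_frames(video_path, out_root):
        """Write every frame of `video_path` as a JPEG into a sub-folder of
        `out_root` named after the video (e.g. skip/daria_skip/...jpg)."""
        name = os.path.splitext(os.path.basename(video_path))[0]   # e.g. 'daria_skip'
        out_dir = os.path.join(out_root, name)
        os.makedirs(out_dir, exist_ok=True)
        for i, frame in enumerate(imageio.get_reader(video_path)):
            imageio.imwrite(os.path.join(out_dir, '%s_frame%d.jpg' % (name, i)), frame)

    # Example: skip/daria_skip.avi becomes skip/daria_skip/daria_skip_frame0.jpg, ...
    video_to_frames('skip/daria_skip.avi', 'skip')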

Outputs:

Some sample outputs are kept in the folder Code/outputs. Each sub-folder contains 4 input images and a varying number of predictions. For example, bend_1 contains 4 input images and 1 prediction, while bend_4 contains 4 input images and 4 predictions.

YouTube video demo of the project:

This demo shows how to use the Jupyter notebook for an interactive session running this project: https://youtu.be/tpr44-G5MbU

Report

The project report, Future Action Prediction using Deep Multi-Scale Video Prediction.pdf, is present in the root of the repository.

Results from our Trained and Tested Network:

[Image: Original | Ground Truth | Generated]

[Image: Original | Ground Truth | Generated]
