Learning to Generate Long-term Future via Hierarchical Prediction

This is the code for the ICML 2017 paper Learning to Generate Long-term Future via Hierarchical Prediction by Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee.

Please follow the instructions to run the code.

Requirements

This code works with:

  • Linux
  • NVIDIA Titan X GPU
  • TensorFlow 1.3.0

Installing Dependencies (Anaconda installation is recommended)

FFmpeg also needs to be installed to generate GIF videos. If using Anaconda, ffmpeg can be installed as follows:

  • conda install -c menpo ffmpeg=3.1.3
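
To confirm the installation, here is a hedged sketch of checking ffmpeg and of stitching frames into a GIF; the frame filename pattern is illustrative only, not the one produced by this code:

    # Check that ffmpeg is on the PATH
    ffmpeg -version

    # Illustrative only: stitch numbered PNG frames into a GIF at 10 fps
    ffmpeg -framerate 10 -i frame_%03d.png output.gif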

Data download and preprocessing

Penn Action:

  • Download data:
    Download from Penn Action and extract into ./datasets/PennAction/
  • Download poses estimated with the hourglass network:
    ./datasets/PennAction/download_hourglass.sh
  • Preprocess (see the sketch after this list for the full sequence):
    python ./datasets/PennAction/preprocess.py
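
A minimal end-to-end sketch of the Penn Action preparation, assuming the dataset archive was downloaded manually; the archive name Penn_Action.tar.gz is an assumption, so use whatever file the Penn Action page actually provides:

    # Assumption: the Penn Action archive was downloaded as Penn_Action.tar.gz
    mkdir -p ./datasets/PennAction/
    tar -xzf Penn_Action.tar.gz -C ./datasets/PennAction/

    # Fetch the hourglass pose estimates, then preprocess
    bash ./datasets/PennAction/download_hourglass.sh
    python ./datasets/PennAction/preprocess.py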

Human 3.6M:

  • Download and convert:
    Download from Human 3.6M into ./datasets/Human3.6M/, then convert the pose CDF files into .mat files by running the MATLAB script in ./datasets/Human3.6M/:
    cdf2mat.m
  • Download poses estimated with the hourglass network:
    ./datasets/Human3.6M/download_hourglass.sh
  • Preprocess (see the sketch after this list for the full sequence):
    python ./datasets/Human3.6M/preprocess.py
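
The same sequence for Human 3.6M; the non-interactive MATLAB invocation is an assumption about how cdf2mat.m might be run from the shell:

    # Convert the pose CDF files to .mat (assumes MATLAB is on the PATH)
    cd ./datasets/Human3.6M/
    matlab -nodisplay -nosplash -r "cdf2mat; exit"
    cd ../..

    # Fetch the hourglass pose estimates, then preprocess (run from the repo root)
    bash ./datasets/Human3.6M/download_hourglass.sh
    python ./datasets/Human3.6M/preprocess.py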

Download pre-trained perceptual models for feature loss

./perceptual_models/download.sh

Penn Action training/testing

Training LSTM (can run in parallel with image generator training):

CUDA_VISIBLE_DEVICES=GPU_ID python lstm_src/train_det_rnn_penn.py --gpu=GPU_ID

Training Image Generator (can run in parallel with LSTM training):

CUDA_VISIBLE_DEVICES=GPU_ID python imggen_src/train_penn.py --gpu=GPU_ID

Predict future poses from the models trained with the above commands:

CUDA_VISIBLE_DEVICES=GPU_ID python lstm_src/test_det_rnn_penn.py --gpu=GPU_ID --prefix=PENNACTION_DET_LSTM_num_class=8_learning_rate=0.001_image_size=128_batch_size=256_lm_size=13_fut_step=32_num_layer=1_lstm_units=1024_seen_step=10_input_size=26_keep_prob=1.0 --steps=64

Predict video from networks trained with the above commands:

CUDA_VISIBLE_DEVICES=GPU_ID python imggen_src/test_penn.py --gpu=GPU_ID --imggen_prefix=PENNACTION_ANALOGY_imgsize=128_layer=3_alpha=1.0_beta=1.0_gamma=1.0_lr=0.0001 --lstm_prefix=PENNACTION_DET_LSTM_num_class=8_learning_rate=0.001_image_size=128_batch_size=256_lm_size=13_fut_step=32_num_layer=1_lstm_units=1024_seen_step=10_input_size=26_keep_prob=1.0
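
For example, the full Penn Action pipeline on GPU 0; these are exactly the commands above with GPU_ID set to 0, and the two training jobs can run concurrently if placed on separate GPUs:

    # Train the pose LSTM and the image generator (these can run in parallel on different GPUs)
    CUDA_VISIBLE_DEVICES=0 python lstm_src/train_det_rnn_penn.py --gpu=0
    CUDA_VISIBLE_DEVICES=0 python imggen_src/train_penn.py --gpu=0

    # Predict future poses, then render the predicted video
    CUDA_VISIBLE_DEVICES=0 python lstm_src/test_det_rnn_penn.py --gpu=0 \
      --prefix=PENNACTION_DET_LSTM_num_class=8_learning_rate=0.001_image_size=128_batch_size=256_lm_size=13_fut_step=32_num_layer=1_lstm_units=1024_seen_step=10_input_size=26_keep_prob=1.0 \
      --steps=64
    CUDA_VISIBLE_DEVICES=0 python imggen_src/test_penn.py --gpu=0 \
      --imggen_prefix=PENNACTION_ANALOGY_imgsize=128_layer=3_alpha=1.0_beta=1.0_gamma=1.0_lr=0.0001 \
      --lstm_prefix=PENNACTION_DET_LSTM_num_class=8_learning_rate=0.001_image_size=128_batch_size=256_lm_size=13_fut_step=32_num_layer=1_lstm_units=1024_seen_step=10_input_size=26_keep_prob=1.0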

Resulting images and videos will be located at:

./results/images/PENNACTION_ANALOGY_imgsize=128_layer=3_alpha=1.0_beta=1.0_gamma=1.0_lr=0.0001/

Human 3.6M training/testing

Training LSTM (can run in parallel with image generator training):

CUDA_VISIBLE_DEVICES=GPU_ID python lstm_src/train_det_rnn_h36m.py --gpu=GPU_ID

Training Image Generator (can run in parallel with LSTM training):

CUDA_VISIBLE_DEVICES=GPU_ID python imggen_src/train_h36m.py --gpu=GPU_ID

Predict future poses from the models trained with the above commands:

CUDA_VISIBLE_DEVICES=GPU_ID python lstm_src/test_det_rnn_h36m.py --gpu=GPU_ID --prefix=HUMAN3.6M_DET_LSTM_fskip=4_keep_prob=1.0_image_size=128_batch_size=256_lm_size=32_fut_step=32_num_layer=1_lstm_units=1024_seen_step=10_input_size=64_learning_rate=0.001 --steps=128

Predict video from networks trained with the above commands:

CUDA_VISIBLE_DEVICES=GPU_ID python imggen_src/test_h36m.py --gpu=GPU_ID --imggen_prefix=HUMAN3.6M_ANALOGY_imgsize=128_layer=3_alpha=1.0_beta=1.0_gamma=1.0_lr=0.0001 --lstm_prefix=HUMAN3.6M_DET_LSTM_fskip=4_keep_prob=1.0_image_size=128_batch_size=256_lm_size=32_fut_step=32_num_layer=1_lstm_units=1024_seen_step=10_input_size=64_learning_rate=0.001
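
Since the checkpoint prefixes are long, it can be convenient to hold them in shell variables; below is a sketch of the two Human 3.6M test commands above with GPU_ID set to 0, where the variable names are just for readability:

    # The exact checkpoint prefixes from the commands above
    LSTM_PREFIX="HUMAN3.6M_DET_LSTM_fskip=4_keep_prob=1.0_image_size=128_batch_size=256_lm_size=32_fut_step=32_num_layer=1_lstm_units=1024_seen_step=10_input_size=64_learning_rate=0.001"
    IMGGEN_PREFIX="HUMAN3.6M_ANALOGY_imgsize=128_layer=3_alpha=1.0_beta=1.0_gamma=1.0_lr=0.0001"

    # Predict future poses, then render the predicted video (GPU 0 as an example)
    CUDA_VISIBLE_DEVICES=0 python lstm_src/test_det_rnn_h36m.py --gpu=0 --prefix="$LSTM_PREFIX" --steps=128
    CUDA_VISIBLE_DEVICES=0 python imggen_src/test_h36m.py --gpu=0 --imggen_prefix="$IMGGEN_PREFIX" --lstm_prefix="$LSTM_PREFIX"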

Resulting images and videos will be located at:

./results/images/HUMAN3.6M_ANALOGY_imgsize=128_layer=3_alpha=1.0_beta=1.0_gamma=1.0_lr=0.0001/

Citation

If you find this useful, please cite our work as follows:

@inproceedings{villegas17hierchvid,
  title={{Learning to Generate Long-term Future via Hierarchical Prediction}},
  author={Villegas, Ruben and Yang, Jimei and Zou, Yuliang and Sohn, Sungryull and Lin, Xunyu and Lee, Honglak},
  booktitle={ICML},
  year={2017}
}

Please contact "[email protected]" if you have any questions.
