Pseudo-3D Residual Networks

By Zhaofan Qiu, Ting Yao, Tao Mei.

Microsoft Research Asia (MSRA).

Table of Contents

  1. Introduction
  2. Citation
  3. Implementation
  4. Models
  5. Other Implementation
  6. Contact

Introduction

This repository contains the P3D ResNet models described in the paper "Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks" (http://openaccess.thecvf.com/content_iccv_2017/html/Qiu_Learning_Spatio-Temporal_Representation_ICCV_2017_paper.html). These models were used in the ActivityNet 2017 challenge, winning 1st place in the Dense-Captioning Events in Videos task and 2nd place in the Temporal Action Proposals task.

Citation

If you use these models in your research, please cite:

@inproceedings{qiu2017learning,
  title={Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks},
  author={Qiu, Zhaofan and Yao, Ting and Mei, Tao},
  booktitle={ICCV},
  year={2017}
}

Implementation

  1. We implemented the P3D ResNet with our modified Caffe on the Windows platform. To make the model easy to use, we provide the additional layers used in the network here, so you can add them to your own Caffe branch or to the Caffe master branch to support P3D ResNet.
  2. In the P3D ResNet, all blobs are 5D (num, channels, length, height, width). Some layers in early versions of Caffe support only 4D blobs because they call Blob::num(), Blob::channels(), Blob::height() and Blob::width(). You may need to replace these calls with Blob::shape(i).
  3. When training the network, we use the cuDNN implementations of conv_layer, relu_layer and bn_layer to speed up training and reduce memory usage.
  4. P3D ResNet variants for ResNeXt/DenseNet/SENet with P3D convolution, and a lighter-weight P3D ResNet, are planned. Our custom Caffe and the training/fine-tuning setting files will be made public soon.
  5. The mean value is [104, 117, 123] for each frame and 128 for each optical flow image. For TVL1 optical flow, we merge the x- and y-direction grey-level flow images into a single two-channel image.
  6. For both frames and optical flow, we train the network with a 160x160 input resolution, as described in our paper. When applying this pre-trained model or using it as a feature extractor, a larger input resolution may give better performance.
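Item 2 comes down to index arithmetic that must loop over every axis instead of assuming four fixed ones. The sketch below is a plain-Python stand-in, not Caffe code: `blob_count` and `blob_offset` are hypothetical helpers that mirror what Caffe's Blob::count() and Blob::offset() compute for a row-major blob, using the size of each axis (the role of Blob::shape(i)) rather than num()/channels()/height()/width(). The clip length of 16 frames in the example shape is an assumption for illustration only.

```python
from functools import reduce
from operator import mul


def blob_count(shape):
    """Total number of elements in a blob with the given shape."""
    return reduce(mul, shape, 1)


def blob_offset(shape, indices):
    """Row-major flat offset of `indices` in a blob of `shape`.

    Works for any number of axes, e.g. a 5D P3D blob laid out as
    (num, channels, length, height, width). Missing trailing indices
    are treated as 0.
    """
    if len(indices) > len(shape):
        raise ValueError("more indices than axes")
    offset = 0
    for axis, dim in enumerate(shape):
        idx = indices[axis] if axis < len(indices) else 0
        if not 0 <= idx < dim:
            raise IndexError(f"index {idx} out of range for axis {axis} (size {dim})")
        # Same recurrence for every axis: no hard-coded 4D assumption.
        offset = offset * dim + idx
    return offset


# Hypothetical 5D shape: 2 clips, 3 channels, 16 frames (assumed), 160x160 pixels.
shape = (2, 3, 16, 160, 160)
print(blob_count(shape))                          # 2457600
print(blob_offset(shape, (1, 2, 15, 159, 159)))   # 2457599 (last element)
```

A 4D-only layer that computes ((n * channels() + c) * height() + h) * width() + w silently drops the length axis; the loop above handles 4D and 5D blobs with the same code.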
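The input preparation in item 5 can be sketched as follows. This is a hypothetical, dependency-free illustration, not the authors' pipeline: `preprocess_frame`, `merge_flow` and `stack_clip` are made-up names, images are nested lists indexed [row][col], and the frame mean is assumed to be in BGR order (Caffe's usual convention), which this README does not state. Resizing to 160x160 or larger is not shown.

```python
FRAME_MEAN = (104, 117, 123)  # per-channel frame mean (channel order assumed BGR)
FLOW_MEAN = 128               # mean for each grey-level optical flow image


def preprocess_frame(frame):
    """Convert an HxWx3 frame (nested lists) to a 3xHxW mean-subtracted array."""
    return [
        [[pixel[c] - FRAME_MEAN[c] for pixel in row] for row in frame]
        for c in range(3)
    ]


def merge_flow(flow_x, flow_y):
    """Stack x- and y-direction HxW grey-level flow images into one 2xHxW
    array, subtracting the flow mean from every value."""
    if len(flow_x) != len(flow_y) or any(
        len(a) != len(b) for a, b in zip(flow_x, flow_y)
    ):
        raise ValueError("flow_x and flow_y must have the same size")
    return [[[v - FLOW_MEAN for v in row] for row in img] for img in (flow_x, flow_y)]


def stack_clip(images):
    """Stack L preprocessed CxHxW images into a CxLxHxW clip, matching the
    (channels, length, height, width) part of a P3D input blob."""
    channels = len(images[0])
    return [[img[c] for img in images] for c in range(channels)]


frame = [[(104, 117, 123), (110, 120, 130)]]   # tiny 1x2 frame
print(preprocess_frame(frame))                # [[[0, 6]], [[0, 3]], [[0, 7]]]
print(merge_flow([[128, 255]], [[0, 128]]))   # [[[0, 127]], [[-128, 0]]]
```

In practice this would be done with NumPy arrays or Caffe's data layers; nested lists only keep the sketch self-contained.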

Models

  1. P3D ResNet trained on Sports-1M dataset:

  2. P3D ResNet trained on Kinetics dataset:

Other Implementation

  1. P3D-Pytorch by qijiezhao

Contact

If you have any questions, please feel free to contact me at [email protected].
