MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Official implementation of the CVPR 2022 paper "MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video".

Note: This repository contains the core code of our work. It is built on VideoPose3D, where some of the fundamental code can be found. We are also organizing the code and preparing to submit it to MMPose as soon as possible.

Visualization of our method and ground truth on Human3.6M

Environment

The code was developed and tested under the following environment:

  • Ubuntu 18.04
  • Python 3.6.10
  • PyTorch 1.8.1
  • CUDA 10.2

You can create the environment as follows:

conda env create -f requirements.yml
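After creating the environment, a quick check can confirm that the installed versions match the ones listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `report_mismatches` are hypothetical, and the script assumes PyTorch is importable as `torch`.

```python
import importlib
import sys

# Versions listed in the Environment section of this README.
EXPECTED = {"python": "3.6.10", "torch": "1.8.1", "cuda": "10.2"}


def installed_versions():
    """Return the versions found locally; a value is None when unavailable."""
    found = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return found
    # Drop local build suffixes such as "+cu102".
    found["torch"] = str(torch.__version__).split("+")[0]
    found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found


def report_mismatches(found, expected=EXPECTED):
    """List human-readable differences between found and expected versions."""
    return [
        "{}: expected {}, found {}".format(name, want, found.get(name))
        for name, want in expected.items()
        if found.get(name) != want
    ]


if __name__ == "__main__":
    lines = report_mismatches(installed_versions())
    for line in lines or ["environment matches the README"]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; the list above only records the configuration the authors used.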

Dataset

The Human3.6M and HumanEva dataset setup follows VideoPose3D. Please refer to it to set up the Human3.6M dataset (under the ./data directory).

The MPI-INF-3DHP dataset setup follows MMPose. Please refer to it to set up the MPI-INF-3DHP dataset (also under the ./data directory).
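Before running anything, it can help to confirm that the prepared files are where the scripts expect them. The sketch below is an illustration only: the file names are an assumption based on the VideoPose3D naming convention for Human3.6M with CPN fine-tuned detections, and `missing_dataset_files` is a hypothetical helper, not part of this repository. Adjust the names if your preprocessing produced different ones.

```python
from pathlib import Path

# Assumed file names (VideoPose3D convention for Human3.6M); not confirmed
# by this README. Change them to match your own preprocessing output.
ASSUMED_FILES = (
    "data_3d_h36m.npz",
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",
)


def missing_dataset_files(data_dir="data", names=ASSUMED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All assumed dataset files were found.")
```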

Evaluation

Run the command below to evaluate with 243-frame input:

python run.py -k cpn_ft_h36m_dbb -c <checkpoint_path> --evaluate <checkpoint_file> -f 243 -s 243
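To make the frame arguments in the command above concrete, the sketch below shows how a video would be cut into windows, assuming that -f is the number of frames per input window and -s is the step between window starts. Both meanings are assumptions here, and `window_bounds` is a hypothetical helper rather than the repository's own chunking code; in particular, the actual code may pad the final short window instead of truncating it.

```python
def window_bounds(num_frames, window=243, stride=243):
    """Return (start, end) frame index pairs, with end exclusive.

    With stride equal to window, the windows do not overlap. The last
    window is simply cut short here; padding it is left to the caller.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        (start, min(start + window, num_frames))
        for start in range(0, num_frames, stride)
    ]


if __name__ == "__main__":
    # A 500-frame clip yields two full windows and one partial window.
    print(window_bounds(500))
```

With equal window and stride, each frame is covered exactly once, which fits a seq2seq model that outputs one 3D pose per input frame.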

Training from scratch

To train with 243-frame input on two GPUs:

python run.py -k cpn_ft_h36m_dbb -f 243 -s 243 -l log/run -c checkpoint -gpu 0,1

If you want to replace the attention module with a more efficient attention design, please refer to rela.py, routing_transformer.py, and linearattention.py. These efficient designs come from previous works.

Visualization

Please refer to https://github.com/facebookresearch/VideoPose3D#visualization.

Acknowledgement

Thanks to the following baselines, on which our code is built:

  • VideoPose3D
  • SimpleBaseline

Citation

@InProceedings{Zhang_2022_CVPR,
    author    = {Zhang, Jinlu and Tu, Zhigang and Yang, Jianyu and Chen, Yujin and Yuan, Junsong},
    title     = {MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {13232-13242}
}
