
GnTCN


Introduction

This repository contains the code and models for the following paper.

Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos
Yu Cheng, Bo Wang, Bo Yang, Robby T. Tan
AAAI Conference on Artificial Intelligence, AAAI 2021.

Updates

  • 06/07/2021 evaluation code (PCK_abs camera-centric) and pre-trained model for MuPoTS dataset tested and released
  • 04/30/2021 evaluation code (PCK person-centric), pre-trained model, and estimated 2D joints for MuPoTS dataset released

Installation

Dependencies

PyTorch >= 1.3
Python >= 3.6

Create an environment.

conda create -n gntcn python=3.6
conda activate gntcn

Install PyTorch (tested on PyTorch 1.3 - 1.7) following the official PyTorch installation instructions, choosing the build that matches your OS and installed GPU driver. For example, on Linux with CUDA 11.0:

conda install pytorch torchvision cudatoolkit=11.0 -c pytorch
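Optionally, you can verify the installation before running the GPU evaluation scripts. This is only a sanity-check snippet, not part of the released code:

import torch
print(torch.__version__)          # tested range is 1.3 - 1.7
print(torch.cuda.is_available())  # True if the CUDA build matches your driver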

Install opencv-python, torchsul, tqdm, and scipy to run the evaluation code:

pip install opencv-python
pip install --upgrade torchsul 
pip install tqdm
pip install scipy

Pre-trained Model

Download the pre-trained model and the processed human keypoint files (H36M and MuPoTS) here, and unzip the downloaded zip file into this project's directory. After unzipping, you should see two folders and one pkl file: ./ckpts, ./mupots, and points_eval.pkl.

Directory

Copy the two folders and the pkl file to the root directory of the project; you should then see the following directory structure.

${GnTCN_ROOT}
|-- ckpts
|-- models
|-- mupots
|-- util
|-- points_eval.pkl
|-- calculate_mupots_depth.py
|-- other python code, LICENSE, and README files
...
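Before running any evaluation, you can optionally confirm that the unpacked files are in the expected places. This is a minimal hypothetical helper, not one of the released scripts:

# check_setup.py (hypothetical): verify the downloaded data is in place
import os

expected = ["ckpts", "mupots", "points_eval.pkl"]
missing = [p for p in expected if not os.path.exists(p)]
print("All files in place." if not missing else "Missing: " + ", ".join(missing))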

Usage

MuPoTS dataset evaluation

The MuPoTS eval set is needed to perform the evaluation. It is available on the MuPoTS dataset website: download the mupots-3d-eval.zip file, unzip it, and run get_mupots-3d.sh to download the dataset. After the download is complete, MultiPersonTestSet.zip (5.6 GB) is available. Unzip it and move the MultiPersonTestSet folder to the root directory of the project to perform evaluation on the MuPoTS test set. You should now see the following directory structure.

${GnTCN_ROOT}
|-- ckpts
|-- models
|-- MultiPersonTestSet <-- Newly added MuPoTS eval set
|-- mupots
|-- util
|-- points_eval.pkl
|-- calculate_mupots_depth.py
|-- other python code, LICENSE, and README files
...

3D human pose estimation evaluation on MuPoTS eval set

The following is a snapshot of Table 3 in the paper, which shows the quantitative evaluation results on MuPoTS-3D. To reproduce the PCK and PCK_abs results in the table, please follow the instructions in the next section.

[Snapshot of Table 3: MuPoTS evaluation results]

Run evaluation on MuPoTS dataset with estimated 2D joints as input

To keep this repository simple and small, the 2D pose estimator is not included (HRNet was used as the 2D pose estimator, as mentioned in the paper). Therefore, the estimated 2D points are provided in the data package to make it easy to reproduce the results reported in our paper. To evaluate person-centric 3D human pose estimation:

python calculate_mupots_detect.py
python eval_mupots.py

After running the above code, the following PCK (person-centric, pelvis-based origin) value is expected, which matches the PCK of 87.5% reported in Table 3 of the paper.

...
PCK_MEAN: 0.8764509703036868
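For reference, PCK counts the percentage of 3D joints whose distance to the ground truth is below a threshold (150 mm on MuPoTS). The evaluation scripts above compute it for you; the sketch below only illustrates the metric and is not the repository's implementation:

import numpy as np

def pck(pred, gt, thresh_mm=150.0):
    # pred, gt: (num_joints, 3) arrays in millimetres
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float((dist < thresh_mm).mean())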

To evaluate camera-centric (i.e., camera coordinates) 3D human pose estimation:

python calculate_mupots_detect.py
python calculate_mupots_depth.py
python eval_mupots_dep.py

After running the above code, the following PCK_abs (camera-centric) value is expected, which matches the PCK_abs of 45.7% reported in Table 3 of the paper.

...
PCK_MEAN: 0.45785827181758376

Run evaluation on MuPoTS dataset with 2D Ground-truth joints as input

The ground-truth 2D joints are included in the data package as well to demonstrate the upper-bound performance of the model: the 2D ground-truth keypoints are used as input to simulate the case where there is no error in the 2D pose estimation. To evaluate with a GPU:

python calculate_mupots_gt.py
python eval_mupots.py

After running the above code, the following PCK (person-centric, pelvis-based origin) value is expected.

...
PCK_MEAN: 0.8985102807603582

Human3.6M dataset evaluation

Run evaluation on Human3.6M dataset with 2D Ground-truth joints as input

Similar to the MuPoTS evaluation above with 2D ground-truth keypoints, the following evaluation code takes the 2D ground-truth joints of Human3.6M as input to show how the proposed method performs when there is no error in the 2D pose estimator. Please note that the MPJPE value from this evaluation is lower than the one reported in the paper, because the result in Table 5 of the paper was calculated from estimated 2D keypoints (i.e., with errors), not from the ground truth.

If a GPU is available and PyTorch is installed successfully, the GPU evaluation code can be used:

python eval_gt_h36m.py

After running the above code, the following MPJPE value is expected.

...
MPJPE: 0.0180
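MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joints. The snippet below is only an illustrative sketch of the metric in the units of its inputs, not the code used by eval_gt_h36m.py:

import numpy as np

def mpjpe(pred, gt):
    # pred, gt: (num_frames, num_joints, 3) arrays in the same units
    return float(np.linalg.norm(pred - gt, axis=-1).mean())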

If a GPU is not available or PyTorch is not installed successfully, the CPU evaluation code can be used:

python eval_gt_h36m_cpu.py

The result is the same as with the GPU evaluation code.

Testing on wild videos

Please note that we did not include 2D pose estimator code in this repository, to keep it simple. Please use an off-the-shelf 2D pose estimation method to obtain 2D joints first, and then use the code from this repository to infer 3D human poses on your test videos (the TCN takes multiple frames as input); a sketch of how per-frame 2D joints can be grouped into temporal windows is given below. In particular, as stated in the paper, we use the original implementation of HRNet as the 2D pose estimator and extract PAFs from OpenPose.
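As a rough illustration only (a hypothetical helper, not part of this repository), per-frame 2D joints from your 2D estimator can be arranged into fixed-length temporal windows before being passed to a temporal model; the actual window length and normalization used by GnTCN should be taken from the released code and the paper:

import numpy as np

def make_windows(joints_2d, window=9):
    # joints_2d: (T, num_joints, 2) per-frame 2D keypoints
    # returns (T, window, num_joints, 2): one edge-padded, centred window per frame
    half = window // 2
    padded = np.pad(joints_2d, ((half, half), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[t:t + window] for t in range(joints_2d.shape[0])])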

License

The code is released under the MIT license. See LICENSE for details.

Citation

If this work is useful for your research, please cite our paper.

@article{Cheng_Wang_Yang_Tan_2021,
  title={Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos}, 
  author={Cheng, Yu and Wang, Bo and Yang, Bo and Tan, Robby T.}, 
  journal={Proceedings of the AAAI Conference on Artificial Intelligence}, 
  year={2021}, 
  month={May}, 
  volume={35}, 
  number={2}, 
  pages={1157-1165}
}


Issues

Questions about "points_eval.pkl"

Thank you again for sharing your amazing work. I'd like to know whether it's possible to share the points_eval.pkl file. More specifically, I'd like to know the format of p2d and p3d (i.e., p2d, p3d = dataset[i]). Is there any preprocessing applied to the 2D joint data (e.g., normalizing to 0-1, making it relative to the root joint, etc.)?

I have been trying to use GnTCN on a custom image these days, but there are some parts that still need your guidance or clarification. Let me know if it's OK to email you to discuss more instead of posting issues here.

Thank you so much for your help and time.

How to apply to videos in the wild?

Hi, thanks for your great work. I have just entered the field of 3D human pose estimation, and I admire your work.
Your paper estimates Z/f using a weak perspective model, while SMAP estimates Zw/f, where Z is the original depth, and f and w are the focal length and the image width, both in pixels. Which is better? I think your estimate is in real space and theirs is in a normalized space.

[1] SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation

About Network Architecture

I wonder if you can share the details of the network architecture of the GCNs and TCNs; it's hard to understand the implementation from the code alone.

How to train the GCN and TCN network?

Thanks for your awesome work. I really want to know how the provided pre-trained model was obtained.
I would be very glad if you could help me.

About the TCN part

Thanks for making your amazing work publicly available!
I am a newcomer to this topic (human pose estimation), so I'm not familiar with reading the code, and I have a question about the TCN part.
Even after reading your paper thoroughly, I am not quite sure about the TCN part of this model architecture.

On page 5 of your paper, there is equation (8) which handles the importance of the output from Root-TCN and velocity-TCN.
I wonder where I can find this part in the code.

Thanks!

Regarding variable names and Preparation of the pickle file for custom data

Thank you for the amazing work. I would like to gain a bit more clarity on the variable names used in the code and what they represent physically. In particular, I would like to know what p2d refers to: on checking, its dimensions were about (500, 17, 2), so what does the 500 (or the number in that position, the bsize variable) represent?

Furthermore, if I would like to test this pipeline on a custom dataset (a video stream), how should I prepare the corresponding points_eval.pkl file? In other words, what is the format of the points_eval.pkl file for a given video stream?

Your help would be highly appreciated.

Thank You

affb,affpts

Dear authors, thank you for sharing such great work with the public. While trying to write training code for other datasets, I don't know how to generate the affb and affpts matrices. In your paper, I found that affpts can be calculated from confidence heatmaps and affb from PAFs. However, it is not clear where you generate affb and affpts. Can you provide related code or some tools for generating affb and affpts?

3D coordinate for human pose

Thank you very much for sharing your great work.
I want to infer 3D human poses from a new image or video given as input, using GnTCN. Is that possible with this repo?
I have limited knowledge of this technology; sorry in advance for the basic question.

Root-TCN Details

Thanks for sharing your amazing work.
I read in your paper that you are using the Root-TCN model to get the absolute poses, and I have some questions about that:

1. I tried to load the pre-trained model from the ./ckpts/model_root folder using the Discriminator2D class as follows:

roottcn = networktcn.Discriminator2D()
M.Saver(roottcn).restore('./ckpts/model_root/')

However, I'm getting an error (screenshot not reproduced here).

Is there a problem in the network, or have I loaded it incorrectly? Also, I would like to know the size of the input: by visualizing the .pth model I see a size of (1024, 34, 3), and according to the paper the input is the 2D poses of each person, but the Discriminator2D class doesn't take any parameters. Could you please help me with this issue?

2. My second question is about the data format used to train this model to estimate Z/f. Did you use the intrinsic parameters and poses given in Human3.6M for that?

Thank you in advance for your time.

The confidence scores of the part affinity field for bone-GCN

Thank you for sharing such amazing work. I have gone through the code and I have a question about how to get the confidence scores of the PAF for bone-GCN.

In the current eval_gt_h36m.py, I understand that we assume the confidence score of each joint is 1 in this case, so we get affpts = torch.ones(bsize, 17, 17).cuda() / 17. If we want to test our own custom images, we should use the real confidence scores of each joint from OpenPose.

However, when I looked at the output data from OpenPose, I couldn't find where to get the confidence scores of the PAF. I assume this part, affb = torch.ones(bsize, 16, 16).cuda() / 16, is where we should put the actual scores. It would be great if you could share details about how to generate the confidence scores of the PAF.

Thank you so much for your time and help.

Pre-trained Model

Hi,

Can you share the pre-trained model again? The Dropbox link says that the files have been deleted. Thank you!

2 different checkpoints for GCN

Thank you for sharing your work. I have one question:

  1. There are two checkpoints for the GCN, model_gcn and model_gcnwild. What is the difference between the two? I notice that model_gcn is used for the H36M evaluation and model_gcnwild is used for MuPoTS. Is it because H36M and MuPoTS have different keypoint definitions?

velocity-tcn

Thanks for your work. I am trying to understand where the velocity-TCN is used in the code, e.g. for MuPoTS. I can only see the joint-TCN and root-TCN in the code. Can you point me in the right direction?

PCK_abs threshold

Hello,
First of all, thank you for publishing your code.

I am having some trouble understanding how absolute coordinates are evaluated on MuPoTS.
What is the threshold value used when computing PCK_abs?
I noticed that the threshold for PCK_rel is 150 mm and the one for AP root is 250 mm, but I am not sure about PCK_abs.

Thanks in advance for your response and your time.
