
metrabs's Introduction

MeTRAbs Absolute 3D Human Pose Estimator


This repository contains code for the following paper:

MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation
by István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe
IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM), Selected Best Works From Automated Face and Gesture Recognition 2020.

The repository has since been updated to the improved version employed in the following paper:

Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats
by István Sárándi, Alexander Hermans, Bastian Leibe
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023.

News

  • [2023-08-02] Major codebase refactoring, models as described in our WACV'23 paper, several components factored out into separate repos, PyTorch support for inference, and more.
  • [2021-12-03] Added new backbones, including the ResNet family from ResNet-18 to ResNet-152
  • [2021-10-19] Released new best-performing models based on EfficientNetV2 and super fast ones using MobileNetV3, simplified API, multiple skeleton conventions, support for radial/tangential distortion, improved antialiasing, plausibility filtering and other new features.
  • [2021-10-19] Full codebase migrated to TensorFlow 2 and Keras
  • [2020-11-19] Oral presentation at the IEEE Conference on Automatic Face and Gesture Recognition (FG'20) (Talk Video and Slides)
  • [2020-11-16] Training and evaluation code now released along with dataset pre-processing scripts! Code and models upgraded to TensorFlow 2.
  • [2020-10-06] Journal paper accepted for publication in the IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM), Best of FG Special Issue
  • [2020-08-23] Short presentation at ECCV2020's 3DPW workshop (slides)
  • [2020-08-06] Our method has won the 3DPW Challenge

Inference Code

We release standalone TensorFlow models (SavedModel) to allow easy application in downstream research. After loading the model, you can run inference in a single line of Python without having this codebase as a dependency. Try it in action in Google Colab.

Gist of Usage

import tensorflow as tf
import tensorflow_hub as tfhub

model = tfhub.load('https://bit.ly/metrabs_l')
image = tf.image.decode_jpeg(tf.io.read_file('img/test_image_3dpw.jpg'))
pred = model.detect_poses(image)
pred['boxes'], pred['poses2d'], pred['poses3d']

See also the demos folder for more examples.

NOTE: The models can only be used for non-commercial purposes due to the licensing of the used training datasets.

Alternatively, you can try the experimental PyTorch version:

wget -O - https://bit.ly/metrabs_l_pt | tar -xzvf -
python -m metrabs_pytorch.scripts.demo_image --model-dir metrabs_eff2l_384px_800k_28ds_pytorch --image img/test_image_3dpw.jpg

Demos

  • ./demo.py to auto-download the model, predict on a sample image and display the result with Matplotlib or PoseViz (if installed).
  • ./demo_video.py filepath-or-url-to-video.mp4 to run inference on a video.

Documentation

Feature Summary

  • Several skeleton conventions supported through the keyword argument skeleton (e.g. COCO, SMPL, H36M)
  • Multi-image (batched) and single-image predictions both supported
  • Advanced, parallelized cropping logic behind the scenes
    • Anti-aliasing through image pyramid and supersampling, gamma-correct rescaling.
    • GPU-accelerated undistortion of pinhole perspective (homography) and radial/tangential lens distortions
  • Estimates returned in 3D world space (when calibration is provided) and 2D pixel space
  • Built-in, configurable test-time augmentation (TTA) with rotation, flip and brightness (keyword argument num_aug sets the number of TTA crops per detection)
  • Automatic suppression of implausible poses and non-max suppression on the 3D pose level (can be turned off)
  • Multiple backbones with different speed-accuracy trade-off (EfficientNetV2, MobileNetV3)
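
A minimal sketch of how several of these options can be combined in one call. The argument names come from the feature list above and the issue reports below; the specific values, and the assumption that detect_poses accepts an intrinsic_matrix argument like the batched variants do, are illustrative only:

import tensorflow as tf
import tensorflow_hub as tfhub

model = tfhub.load('https://bit.ly/metrabs_l')
image = tf.image.decode_jpeg(tf.io.read_file('img/test_image_3dpw.jpg'))

# Illustrative intrinsics for a 1920x1080 camera; with a calibrated camera,
# pass the real matrix so poses3d comes back in metric space.
intrinsics = tf.constant([[1900.0, 0.0, 960.0],
                          [0.0, 1900.0, 540.0],
                          [0.0, 0.0, 1.0]])

pred = model.detect_poses(
    image,
    intrinsic_matrix=intrinsics,       # assumption: same argument as in the batched API
    skeleton='smpl+head_30',           # one of the supported skeleton conventions
    num_aug=5,                         # number of test-time-augmentation crops per detection
    suppress_implausible_poses=True)   # plausibility filtering and 3D pose NMS
pred['boxes'], pred['poses2d'], pred['poses3d']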

Training and Evaluation

See the docs directory.

BibTeX

If you find this work useful in your research, please cite it as:

@article{sarandi2021metrabs,
  title={{MeTRAbs:} Metric-Scale Truncation-Robust Heatmaps for Absolute 3{D} Human Pose Estimation},
  author={S\'ar\'andi, Istv\'an and Linder, Timm and Arras, Kai O. and Leibe, Bastian},
  journal={IEEE Transactions on Biometrics, Behavior, and Identity Science},
  year={2021},
  volume={3},
  number={1},
  pages={16-30},
  doi={10.1109/TBIOM.2020.3037257}
}

The above paper is an extended journal version of the FG'2020 conference paper:

@inproceedings{Sarandi20FG,
  title={Metric-Scale Truncation-Robust Heatmaps for 3{D} Human Pose Estimation},
  author={S\'ar\'andi, Istv\'an and Linder, Timm and Arras, Kai O. and Leibe, Bastian},
  booktitle={IEEE International Conference on Automatic Face and Gesture Recognition},
  pages={677-684},
  year={2020}
}

The newer large-scale models correspond to the WACV'23 paper:

@inproceedings{Sarandi2023dozens,
    author = {S\'ar\'andi, Istv\'an and Hermans, Alexander and Leibe, Bastian},
    title = {Learning {3D} Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats},
    booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year = {2023}
} 

Contact

Code in this repository was written by István Sárándi (RWTH Aachen University) unless indicated otherwise.

Got any questions or feedback? Drop a mail to [email protected]!


metrabs's Issues

run_yolo.sh

I have installed the cuDNN runtime version, but I don't have a cudnn folder anywhere on my Ubuntu machine, which the Makefile for darknet requires.
tensorflow-gpu works fine on my machine, which means cuDNN is functioning.

Q1: Is there any alternative way to run YOLO for dataset preparation?

The human z-axis is unstable

Hi @isarandi,
The human z-axis is unstable. When the person squats slightly, the predicted z position drifts away from the lens, and the y-axis is offset at the same time. It looks normal in the front view, but in the side view the pose drifts forward/backward and jumps up and down. Standing feet should always touch the ground; with the z-axis drift, the feet appear to float above or sink into the ground, which looks abnormal. Is there a way to correct this and make the prediction more accurate? Thank you!

yolov4

python -m scripts.video_inference --gt-assoc --dataset=3dpw --detector-path=./yolov4 --model-path=models/metrabs_multiperson_smpl --crops=5 --output-dir=./3dpw_predictions
python -m scripts.eval_3dpw --pred-path=./3dpw_predictions

I obtained the YOLOv4 model with
" python save_model.py --weights ./data/yolov4.weights --output ./checkpoints/yolov4-416 --input_size 416 --model yolov4"
but I run into a bug.

Can you provide the model used for ./yolov4? Thanks.

get smpl pose

I have run it; the results are 3D keypoints and a human box, and I noticed SMPL mentioned in the code.
So:
Q1: I want to obtain the SMPL pose from the results. How can I do that? The visualized result in the project looks like SMPL.
I'm checking the project code; can you give me some advice or point me to a Python module for this?

Test time augmentation

Hi @isarandi! I have almost managed to create a real-time inference script for your model. At the moment I am working on the test-time augmentation, but I can't understand two things:

  1. For each bounding box, you do a test time augmentation. How do you combine the resulting skeletons?
  2. I saw that in filter_poses you used a bone_dataset.joint_info to filter implausible poses. Where can I find it?

Thank you!
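
A generic way to fuse the per-augmentation skeletons is a plain (or robust) average; the packaged models expose an average_aug flag for this purpose. A minimal NumPy sketch, assuming the augmented predictions have already been mapped back to a common frame and stacked as [num_aug, num_joints, 3] (this is not necessarily the repository's exact fusion rule):

import numpy as np

def combine_tta_poses(aug_poses, robust=False):
    # aug_poses: [num_aug, num_joints, 3] poses predicted from augmented crops,
    # with flips/rotations already undone so they live in a common frame.
    if robust:
        # coordinate-wise median is less sensitive to one bad augmentation
        return np.median(aug_poses, axis=0)
    return np.mean(aug_poses, axis=0)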

running inference with `max_detections` parameter results in failure

On some inputs (not all), using the max_detections parameter results in a failure:
(re-running inference without max_detections specified works just fine)

using parameters:

  model:metrabs_eff2l_y4_360
  fov:55
  batch:64
  maxpeople:4
  skeleton:smpl+head_30
  augmentations:6
  average:1
  suppress:1
  minconfidence:0.1
  iou:0.7

actual inference call:

  result = model.detect_poses(tensor,
    default_fov_degrees=args.fov,
    internal_batch_size=args.batch,
    num_aug=args.augmentations,
    average_aug=bool(args.average),
    skeleton=args.skeleton,
    detector_threshold=args.minconfidence,
    detector_nms_iou_threshold=args.iou,
    max_detections=args.maxpeople,
    antialias_factor=1,
    suppress_implausible_poses=bool(args.suppress)
  )

error log

  File "/home/vlado/.local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/vlado/.local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,

tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'cond_2/TopKV2' defined at (most recent call last):
    File "/home/vlado/dev/motioniq/process/process.py", line 51, in loadModel
      model = tf.saved_model.load(args.model)
Node: 'cond_2/TopKV2'
1 root error(s) found.
  (0) INVALID_ARGUMENT:  input must have at least k columns. Had 3, needed 4
         [[{{node cond_2/TopKV2}}]]
         [[StatefulPartitionedCall/StatefulPartitionedCall/cond_3/else/_4381/cond_3/cond_7/then/_11169/cond_3/cond_7/map/while/loop_body_control/_18889/_1451]]
0 successful operations.
0 derived errors ignored. [Op:__inference_restored_function_body_457118]

P.S. I love your models - extremely well made!

Errors in dataset_preparation

I think you need to change the if-statement in run_yolo.sh (in your darknet repository) for the code in your README.md to run, from

if [[ -f $IMG_PATHS_FILE ]]; then
    image_paths=$(cat "$IMG_PATHS_FILE")
else
    image_paths=$(find "$IMG_ROOT" -name '*.jpg')
fi

to

image_paths=$(find "$IMG_ROOT" -name '*.jpg')

`suppress_implausible_poses` cannot be used with external bbox detector

Hi @isarandi! I just noticed that when using an external 2D bbox detector, i.e. directly calling the tf.function estimate_poses_batched() on a saved model as opposed to detect_poses_batched(), it is not possible to enable plausibility filtering of poses.

The reason is that, whilst the private Pose3dEstimator._estimate_poses_batched() method does expose the parameter suppress_implausible_poses, the public tf.function estimate_poses_batched() does not expose the parameter and has it hard-coded internally to False.

I assume this cannot be easily fixed without re-training and re-exporting the saved models, but I just wanted to document it here for the sake of completeness.
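
For reference, a minimal sketch contrasting the two entry points, assuming images is a batch of uint8 image tensors, ragged_boxes is a per-image RaggedTensor of detections from an external detector, and intrinsics holds the camera matrices (the model directory and values are illustrative):

import tensorflow as tf

model = tf.saved_model.load('metrabs_eff2l_y4')  # illustrative model directory

# Built-in detector path: plausibility filtering can be toggled here.
pred = model.detect_poses_batched(images, suppress_implausible_poses=True)

# External-detector path: as described in this issue, suppress_implausible_poses
# is not exposed by the exported estimate_poses_batched() and is hard-coded to False.
pred = model.estimate_poses_batched(images, boxes=ragged_boxes,
                                    intrinsic_matrix=intrinsics)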

A question in the processing script of h36m~~~

The code at lines 75-77 in extract_frames_and_boxes_h36m.py may have a mistake:

if not is_image_readable(dst_path):
      print(dst_path)
      imageio.imwrite(dst_path, frame, quality=95)

The function description says: save every 5th and 64th frame from a video as images.
Maybe not is_image_readable should be changed to is_image_readable?

How can I train on a custom dataset?

Hi, I have custom 3D pose estimation datasets and I want to train your model on them.

Unfortunately, I don't know how to plug in my dataset.
I have images, annotations, and camera intrinsic and extrinsic parameters.

Details:

There is a folder for each body and gender, and that folder contains the images.
The image size is 1920 x 1080.

Each annotation file has labels for 24 joints with 3-dimensional coordinates.

To put it simply, as in object detection, there are images and corresponding annotation files.
So I would like you to show me a sample that handles a custom dataset.

Ultimately, what I want to know is

the input tensor shapes used during training.

For example, YOLOv4 needs images and annotations;
in detail, its annotation file format looks like this:

  1 0.716797 0.395833 0.216406 0.147222
  0 0.687109 0.379167 0.255469 0.158333
  1 0.420312 0.395833 0.140625 0.166667

or, as for a CNN:
tensor: BATCH, WIDTH, HEIGHT, CHANNELS
tensor: BATCH, num_classes

Would you please explain the training details about 3DPW challenges?

Hi @isarandi, congratulations on winning the 3DPW challenge!
I am very interested in your work, but I found that some details, such as the upper-body crop, are missing. What exactly does the upper-body crop mean? Cropping the body above the left/right hips? If so, I wonder why you crop only the upper body; why not crop the lower body or crop the body randomly?

About the predicted skeleton and bone length

Hi, thank you for the excellent work. I would like to ask some questions:

  1. I've run MeTRAbs on some videos and noticed that the resulting skeletons have identical bone lengths across different people. Could you explain the processing in more detail (e.g. is there any post-processing or normalization step applied to the resulting skeletons)?
  2. I ran the model on my video using demo_video_batched.py (metrabs_eff2l_y4, batch size 16, image size 1920x1080), but the program crashed after predicting roughly 2000 frames. Maybe there is a memory leak in the detector? (My setup is an Nvidia 2080 Ti with 11 GB of VRAM and 48 GB of RAM.)

Is it possible to convert the saved model to a TFLite model?

I downloaded the single-person model from here:

https://omnomnom.vision.rwth-aachen.de/data/metrabs/metrabs_singleperson_smpl.zip 

and wanted to convert the SavedModel to a TFLite model with the TFLite converter:

model = tf.saved_model.load("./metrabs_singleperson_smpl/")
converter = tf.lite.TFLiteConverter.from_concrete_functions(model.__call__.concrete_functions)
tfmodel = converter.convert()

Then it crashed, and the log shows:

InvalidArgumentError: Input 2 of node StatefulPartitionedCall was passed float from unknown:0 incompatible with expected resource.

Did I miss something during the conversion?
Thanks for your good work!

Dataset with different set and sequence of keypoints

MPII has 16 keypoints and COCO has 17. Twelve of them are common to both, and they are indexed differently.
And you have used so many different datasets.

So my questions are:
1- How do you handle the different datasets?
2- Do most of the datasets provide the 12 2D keypoints for shoulders, elbows, wrists, hips, knees and ankles?
3- Is it very important to account for subtle differences in how different datasets label a certain keypoint? For example, a shoulder from COCO might be slightly off compared to a shoulder from MPII.

My approach to question 1 is as follows (see the sketch after this issue):
1- Create a JSON file for every dataset in COCO format and only keep the 12 keypoints that are common to most datasets (pairs of shoulders, elbows, wrists, hips, knees, ankles). Create extra keypoints like neck and head-center at this stage; the neck/head-center creation code will be specific to each dataset. We then have a standard JSON file for every dataset with 14 keypoints.
2- The rest of the code (data loading, data transformations) stays generic.
3- The network output will be fixed (say, 14 heatmaps and 28 PAFs).
I would like to get your opinion and tips, as you have already handled a lot of datasets.

PS: I am currently focusing on 2D keypoints.
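
A minimal sketch of the remapping idea in step 1, assuming the standard 17-keypoint COCO ordering; the MPII index list and the head-center synthesis are dataset-specific and omitted here:

import numpy as np

# Standard COCO order: 0 nose, 1-2 eyes, 3-4 ears, 5-6 shoulders, 7-8 elbows,
# 9-10 wrists, 11-12 hips, 13-14 knees, 15-16 ankles.
COCO_TO_COMMON12 = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

def coco_to_common(coco_kp):
    # coco_kp: [17, 2] or [17, 3] keypoint array for one person
    common = coco_kp[COCO_TO_COMMON12]          # the 12 shared limb/torso joints
    neck = 0.5 * (coco_kp[5] + coco_kp[6])      # synthesize neck as shoulder midpoint
    return np.concatenate([common, neck[None]], axis=0)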

Questions about batch size and learning rate

Hi,

Firstly, Thanks for your great work!
After studying the code, I am wondering about the batch size and learning rate settings in main.py:
(1) Why is batch_size divided by 2 in line 70?
(2) Why is lr_schedule divided by sqrt(n_replicas) in line 174?

Failed to load model

Hi, Thanks for sharing your nice work.

I am trying to run demo.py for inference; however, it fails to load the model:

WARNING:absl:Importing a function (__inference_blocks_34_layer_call_and_return_conditional_losses_131630) with ops with custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_blocks_29_layer_call_and_return_conditional_losses_167514) with ops with custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_blocks_29_layer_call_and_return_conditional_losses_167514) with ops with custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_blocks_29_layer_call_and_return_conditional_losses_167514) with ops with custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_blocks_16_layer_call_and_return_conditional_losses_103281) with ops with custom gradients. Will likely fail if a gradient is requested.
Traceback (most recent call last):
File "demo.py", line 118, in
main()
File "/home/anaconda3/envs/metrabs/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 551, in _recreate
raise ValueError("Unknown SavedObject type: %r" % kind)
ValueError: Unknown SavedObject type: None

The environment is:

cuda 11.0
cudnn 8.0.5
python 3.8
tensorflow-gpu: 2.4

Could you please give me a hint on how to solve this problem?

Customize the joints coming from the SMPL model

Hello,
Thank you very much for sharing this great project. I have done a lot of testing (inference only). I would like to customize the joints that you extract from the SMPL model and add new ones. Can you please tell me where to look in the code?

Function "BoundedPool" may cause the data-process dead sometime

Preparing the 'h36m' training dataset always causes the process to die,
so I debugged it and found that in "h36m.py", the code

 pool.apply_async(
                     make_efficient_example, (ex, new_image_relpath, further_expansion_factor),
                     callback=examples_container.append)

suddenly overloads the CPU, and the GPU server machine loses responsiveness at the same time.

Even when I set

pool = util.BoundedPool(None, 6) #120

But that doesn't help.
My GPU server's information is:

cat /proc/cpuinfo |more
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb00001b
cpu MHz         : 2201.000
cache size      : 56320 KB
physical id     : 0
siblings        : 44
core id         : 0
cpu cores       : 22
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20

So I disabled the apply_async call and used

make_efficient_example(ex, new_image_relpath, further_expansion_factor)

Then everything works, except that the data-processing stage is very slow: more than twelve hours.

Update (8.9, 20:40):
Once the images are processed, multiprocessing can be enabled again; the data is then loaded much faster and does not crash when running the training phase a second time.

Which dataset is used in the demo video?

I have noticed that in the demo video there are 3D keypoints for the eyes and ears.
But the 3D joints in the datasets under dataset_preparation have no such additional eye/ear keypoints.
Could you please tell me which dataset is used in the demo video?
Thank you very much ^_^

Training Questions

Hi :)
I am pleased with the results of demo_with_just_an_image() in your demo.py.

Q1. Do I run the command below to achieve the same performance as the metrabs_multiperson_smpl_combined model?

./main.py
--train --dataset=muco-17-150k --dataset2d=mpii-yolo --scale-recovery=metrabs
--epochs=24 --seed=1 --background-aug-prob=0.7 --occlude-aug-prob=0.3 \
--stride-test=32 --logdir=muco/metrabs_univ_seed1 --universal-skeleton

Q2. Do we only need MuCo and MPII as training data?

Q3. Where is the mpii_process_multiperson_train_set.patch for get_much.sh?

Q4. The code of the original MeTRo uses a variety of training sets (https://github.com/isarandi/metro-pose3d).
However, the training code of this repo uses only the h36m or mpi-inf-3dhp datasets.
Is the performance still good?

ex)  ./main.py  --train --dataset=mpi-inf-3dhp --train-on=trainval --epochs=27 --seed=1  --background-aug-prob=0.7 --universal-skeleton --logdir=3dhp/metro_univ_seed1

About the "many" dataset and "roundrobin_sizes"

Hi, thank you for sharing this great project.

I would like to compare the performance of my trained model with your models uploaded in MODELS.md. So,

Q1. How many examples (cropped single persons) from each dataset are included in your "many" training dataset (Human3.6M + MuCo-3DHP + CMU-Panoptic + SAIL-VOS + SURREAL + AIST-Dance++)? I would like to know the composition of the number of examples.

Q2. What value of "roundrobin_sizes" did you use when training your MeTRAbs models on the "many" dataset? Is it the same as the value set here (main.py)?

Can load only metrabs_mob3l_y4t

Hi, thank you for the nice work.
However, I can only load the metrabs_mob3l_y4t model. If I try to load the metrabs_eff2l_y4 or metrabs_eff2l_y4_360 models, I get a lot of the following warning:
WARNING - 2021-11-10 08:58:09,498 - function_deserialization - Importing a function (__inference_efficientnetv2-l_layer_call_and_return_conditional_losses_247692) with ops with custom gradients. Will likely fail if a gradient is requested.
and then the following error:
ValueError: Unknown SavedObject type: None
I am not used to TensorFlow, so maybe I am getting something wrong.

running colab with own person boxes

Hi,

I am using the Colab notebook to run 2D pose estimation on images. For an input image, I would like to pass in person boxes along with the image, without having to pass any intrinsics. I have tried the two models multiperson_smpl and multiperson_smpl_combined, but neither lets me pass in just those two inputs: image and person boxes.

Can you direct me to a model/function that can do that?

Finetuning the public models

Hi, very cool repo!

I am trying to fine-tune one of your public models (e.g. metrabs_mob3l_y4t) on a new dataset.
Since these are the packaged multi-person models, I extracted the crop model itself.

However, when using the checkpoint system implemented in main.py, it seems the weights are not restored. From log_detailed.txt:

WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).variables.0
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).variables.1
...
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).variables.265

Can you point me in the right direction?

Metrabs TensorRT

Hey István,

congrats on these great results and thanks for providing an easy-to-use way to run your models, exceptional work :)
I really like the result I get and just like everyone else in the issues, I would like to run it in real-time.
My approach was to squeeze out some speed-ups using TensorRT and its new tf-trt capability. At least for the resnet-style models, I'd expect a speed-up on the order of 10x. According to Nvidia the same should hold true for efficientnet-type models.

A tensorflow SavedModel can directly be optimized and converted into a TensorRT model using just a few lines of code:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir='models/eff2s_y4_short_sig')
converter.convert()
converter.save('models/eff2s_y4_trt')

In order for this conversion to know what to do, a default signature needs to be defined.
This can be achieved with the following:

import tensorflow as tf
model_folder = 'models/metrabs_eff2s_y4/'
out_fold='models/eff2s_y4_short_sig'
model = tf.saved_model.load(model_folder)

@tf.function()
def my_predict(my_prediction_inputs, **kwargs):
    prediction = model.detect_poses(my_prediction_inputs)
    return {"prediction": prediction['poses3d']}

my_signatures = my_predict.get_concrete_function(
   my_prediction_inputs=tf.TensorSpec([None,None, 3], dtype=tf.dtypes.uint8, name="image"))

tf.saved_model.save(model, out_fold, signatures=my_signatures)

(coincidentally, this might be a solution to the tensorflow-lite question in the issues? I haven't tried it, but just a hunch.)

Unfortunately, the conversion segfaults :D I know that this is rather an issue on Nvidia's side, but maybe we can still get this to work. I suspect that the augmentations you perform on the model in the Packaging Model section of your readme might be throwing tf-trt off.
Next, I tried to investigate this issue a little further by trying to look under the hood of the packaged SavedModel. I used
tensorflow's import_pb_to_tensorboard.py and tried to inspect the result in tensorboard.

$ python import_pb_to_tensorboard.py --model_dir models/eff2s_y4_short_sig/saved_model.pb --log_dir log
$ tensorboard --logdir log

Unfortunately again, tensorboard was not capable of displaying the computation graph and I suspect the reason is again the usage of tf.functions, but I am not sure.

What I would like to try is to convert one of your trained metrabs-models into TensorRT and take a look at the speed-up. Would it be possible for you to share a checkpoint file? or the un-augmented SavedModel as exported here: https://github.com/isarandi/metrabs/blob/master/src/main.py#L242 ? Maybe for metrabs_eff2l_y4, metrabs_eff2s_y4, metrabs_rn152_y4, and metrabs_rn18_y4 to see and compare how backbone and depth affect the inference time?

Possible ways to speed up ?

I ran this in Google Colab and the results were really great. It took about 16 s for inference with the box provided. I was wondering whether there are ways to speed this up significantly for single-person pose estimation, even for CPU-optimized inference.

Model Speed

Hey,

this is an amazing pose estimation model; it works really well and is easy to use. I tried the new models and ran into some trouble. My goal is to run your model at 30 fps on 4 cameras. Currently I'm at ~22-25 fps with batching and the old model (metrabs_multiperson_smpl). So I tried the new ones, but they are much slower than metrabs_multiperson_smpl; even the fastest, less accurate one is barely faster than the old one. Additionally, the TensorFlow load time (model = tf.saved_model.load(path)) is very high: with the new models the system needs 18-62 seconds to be ready.

| Images | metrabs_multiperson_smpl | metrabs_mob3l_y4 | metrabs_eff2s_y4 | metrabs_mob3l_y4t |
|---|---|---|---|---|
| Load Model | 2.5 s | 15 s | 29 s | 52 s |
| First Use | 6 s | 3 s | 6.5 s | 10 s |
| 1 | 23 ms | 28 ms | 85 ms | 250 ms |
| 2 | 30 ms | 28 ms | 86 ms | 250 ms |
| 4 | 40 ms | 37 ms | 90 ms | 260 ms |
| 8 | 76 ms | 58 ms | 113 ms | 300 ms |
| 16 | 154 ms | 116 ms | 220 ms | 600 ms |

Measured call: pred = model.estimate_poses_batched(images, boxes=ragged_boxes, intrinsic_matrix=intrinsics)

1. So my first question is: is this normal? With these values, "metrabs_multiperson_smpl" is the best model for real-time applications, but this model is "outdated" because its functions do not match the new code and API.

2. Questions about the model "metrabs_multiperson_smpl":

The paper states that the model can handle up to 511 crops per second with a stride of 32 and a batch size of 8. Was this model built with a stride of 4?

If that's the case, I have to build the model again myself with a larger stride. Can you provide any checkpoints or models with a larger stride, trained on the same dataset and backbone as metrabs_multiperson_smpl? If not, on which datasets was this model trained and how many epochs did it need?
If I have to train it again anyway, I can also use the new API for this model.

My System:
AMD Ryzen 5800X (8x3,8-4,7Ghz)
RTX 3080
32 GB RAM

Training code

Thank you for all of your hard work!

I'd like to convert MeTRAbs from TensorFlow to PyTorch.

So I'd like to use the MuPoTS dataset to debug your training code.

However, the 'camera intrinsics.json' file is missing from the MuPoTS dataset.

Where can I find this file?

Only Depth Estimation portion of network

Hello,
First, I would like to say great work. Testing it on some custom images, the model performs quite nicely and has some features I didn't expect to end up using. Furthermore, I have a question regarding the depth estimation only. I am currently building a pipeline that uses YOLOv5 as the detector and OpenPose as the pose estimator, and I was wondering whether it is possible to extract only the depth of the keypoints. Essentially, can I extract only the depth-estimation portion of the network, feed it the keypoints I obtain from OpenPose, and get back the depths of these keypoints?

Thank you,
Anand

InvalidArgumentError: Invalid PNG data, size 706833 [Op:DecodeJpeg]

Hi, thanks for this model.

I wanted to test it on my own pictures, but I ran into a problem when calling the YOLO model to get the 2D keypoints. Can you help?
Background: I convert depth maps to fake-RGB pictures and want to check the accuracy. Can you give me some suggestions?

How to get point positions from the PoseViz space

I noticed that intrinsic_matrix is a float32 Tensor of shape [3, 3], the camera's intrinsic matrix, and that if it is left at the default value, the intrinsic matrix is determined from the default_fov_degrees argument (see API.md).
I run https://github.com/isarandi/metrabs/blob/master/demo_video_batched.py with the default args in model.detect_poses_batched().
So:
Q1: Since no intrinsic_matrix argument is given, is the result pred['poses3d'] the 3D position in the camera coordinate system rather than the 3D world coordinate system?
Q2: The poses3d result is shown in the PoseViz space; I see camera = poseviz.Camera.from_fov(55, frame_batch.shape[1:3]) and the default default_fov_degrees=55 is used.
How can I get the 3D world positions of the points from fov_degrees=55 in the PoseViz space?
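
For reference, a minimal sketch of how a pinhole intrinsic matrix can be built from a field-of-view value, assuming the FOV refers to the larger image side (the exact convention used by the packaged models and PoseViz should be checked against API.md):

import numpy as np

def intrinsics_from_fov(fov_degrees, image_height, image_width):
    # Focal length in pixels so that the larger image side spans fov_degrees.
    f = 0.5 * max(image_height, image_width) / np.tan(0.5 * np.radians(fov_degrees))
    return np.array([[f, 0.0, image_width / 2.0],
                     [0.0, f, image_height / 2.0],
                     [0.0, 0.0, 1.0]], dtype=np.float32)

K = intrinsics_from_fov(55, 1080, 1920)  # illustrative 1920x1080 frame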

Meaning of Z

Excuse me, when I ran the demo I found that the Z value was negative. What does this mean?
Can absolute depth be negative?

Low GPU utilization

My GPU is a 2080 Ti. When I run the code, it shows low GPU utilization. Are there any methods to increase it and thereby raise the fps?

No independent tracking of the hand joints

First, I would like to say great work. The hand/palm joint simply follows the wrist and has no motion tracking of its own. Are there any options or modifications that can be turned on to change this?
Is there any plan for an accelerated version based on TensorRT? Thank you!

The demo problem

Hello, I tried to run demo.py, but the output is as follows:

The X11 connection broke: No error (code 0)
XIO: fatal IO error 22 (Invalid argument) on X server "localhost:10.0"
after 400 requests (400 known processed) with 0 events remaining.

And I found it happened after this line:
viz = poseviz.PoseViz(joint_names, joint_edges)

How can I solve this?
My operating system is Ubuntu 18.04.5 LTS.

Results with 3DPW train

Hi, thanks for the nice project. Do you by any chance have results or a model trained on 3DPW as well?

GradCam

I'd love to try to implement Grad-CAM on this project; however, the pretrained models seem to be packaged with the plain TensorFlow SavedModel format instead of the Keras format. Do you have checkpoints available for the evaluation or training code, for fine-tuning / custom implementations?

no run_person_detector_3dhp.sh scripts

Hello, Mr.Sarandi.
I am an undergraduate student in Korea.
I would like to use your MeTRAbs because it seems to be the best 3d pose reconstruction method.

I want to train on the 3DHP dataset.
However, the run_person_detector_3dhp.sh script file required for training has not been uploaded,

so I downloaded the darknet repository you uploaded and entered the following command:
darknet/run_yolo.sh --image-paths-file 3dhp_images_for_detection.txt --out-path "$DATA_ROOT/3dhp/yolov3_person_detections.pkl"
However, there is no --image-paths-file argument, only --image-root.

I think that after you uploaded the darknet repository,
you added the --image-paths-file functionality separately.

I want to train with the 3DHP data, but I can't proceed because this command is blocked.
Could you check it for me?

Thanks

Individual preprocessing of each phase and different performance when running with TensorRT

Hi! Thank you for your great work.
Since I am trying to use it in a robotics project, I need to run it in real time.
As done in issue #26, I extracted the backbone and the heads from the eff2l model, and I run everything as follows:

  • YoloV4 with TensorRT
  • BackBone with TensorRT
  • Heads with Tensorflow

Now I have to combine all the pieces, but I can't figure out from the code how to preprocess the backbone input. I have tried the following (see the sketch after this issue):

  • Crop the person's bounding box from the original frame and resize it to 256x256 (as written in the paper, but this gives the worst results)
  • Set all pixels outside the box to 0 (medium results)
  • Extract the bounding box, pad it along the two sides of the shorter dimension so that the person is centered, then resize to 256x256 (best results)

Since I am getting worse results than with the original model (the skeleton is less accurate and noisier), I wanted to know whether this happens because of the conversion to ONNX or because I am doing something wrong.
Thank you again!
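
A minimal sketch of the third preprocessing variant above (crop the box, zero-pad to a square so the person stays centered, then resize to 256x256), written with TensorFlow ops; the [x, y, width, height] pixel box format is an assumption and clipping to the image border is omitted for brevity:

import tensorflow as tf

def crop_pad_square_resize(image, box, out_size=256):
    # image: [H, W, 3] uint8 tensor; box: [x, y, w, h] in pixels (assumed format)
    x, y, w, h = [int(round(float(v))) for v in box]
    crop = image[y:y + h, x:x + w]
    side = max(w, h)
    off_y = (side - h) // 2
    off_x = (side - w) // 2
    # Zero-pad to a square so the person stays centered before resizing.
    square = tf.image.pad_to_bounding_box(crop, off_y, off_x, side, side)
    return tf.image.resize(square, (out_size, out_size))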

no build yet, install_dependencies.sh stalls on linux

I'd really like to see it working, but I have failed to build it on Linux so far.
Starting off clean with

~/Documents/metrabs$ ./install_dependencies.sh

.............................

gcc -pthread -B /home/jenscave/miniconda3/envs/metrabs/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jenscave/miniconda3/envs/metrabs/lib/python3.8/site-packages/numpy/core/include -I../common -I/home/jenscave/miniconda3/envs/metrabs/include/python3.8 -c pycocotools/_mask.c -o build/temp.linux-x86_64-3.8/pycocotools/_mask.o -Wno-cpp -Wno-unused-function -std=c99
gcc: error: pycocotools/_mask.c: No such file or directory
error: command 'gcc' failed with exit status 1
make: *** [Makefile:3: all] Error 1
