pixelwiseregression's Introduction

Pixel-wise Regression for 3D hand pose estimation

PyTorch release of our paper:
Pixel-wise Regression: 3D Hand Pose Estimation via Spatial-form Representation and Differentiable Decoder
Xingyuan Zhang, Fuhai Zhang

If you find this repository useful, please cite our paper:

@ARTICLE{zhang2022srnet,  
    author={Zhang, Xingyuan and Zhang, Fuhai},  
    journal={IEEE Transactions on Multimedia},   
    title={Differentiable Spatial Regression: A Novel Method for 3D Hand Pose Estimation},   
    year={2022},  
    volume={24},  
    number={},  
    pages={166-176},  
    doi={10.1109/TMM.2020.3047552}
}

Update: The paper has been accepted at TMM! The title was changed as suggested by one of the reviewers. Please consider citing the new version. I have not uploaded the new version to arXiv since I am not sure whether that is allowed; if you know it is OK to do so, please contact me and I will gladly update it.

Setup

conda env create -f env.yml
conda activate pixelwise

Dataset

All datasets should be placed in the ./Data folder. After placing the datasets correctly, run python check_dataset.py --dataset <dataset_name> to build the data files used for training.

NYU

  1. Download the dataset from the website.
  2. Unzip the files to ./Data and rename the folder as NYU.

MSRA

  1. Download the dataset from Dropbox.
  2. Unzip the files to ./Data and rename the folder as MSRA.

ICVL

  1. Download the dataset from here.
  2. Extract Training.tar.gz and Testing.tar.gz to ./Data/ICVL/Training and ./Data/ICVL/Testing respectively.

HAND17

  1. Ask for permission from the website and download the dataset.
  2. Download the center files from the GitHub release, and put them in ./Data/HAND17/.
  3. Extract frame.zip and images.zip to ./Data/HAND17/. You should end up with a folder structure like the one below:
HAND17/
  |
  |-- hands17_center_train.txt
  |
  |-- hands17_center_test.txt
  |
  |-- training/
  |     |
  |     |-- images/
  |     |
  |     |-- Training_Annotation.txt
  |
  |-- frame/
  |     |
  |     |-- images/
  |     |
  |     |-- BoundingBox.txt

Train

Run python train.py --dataset <dataset_name>, where dataset_name can be chosen from NYU, ICVL and HAND17.

For the MSRA dataset, run python train_msra.py --subject <subject_id>.
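
For convenience, a minimal sketch (my own, not part of the repo) that launches one training run per held-out subject, assuming the standard MSRA leave-one-subject-out protocol with subjects numbered 0-8:

# Launch one train_msra.py run per held-out MSRA subject (assumes 9 subjects, 0-8).
import subprocess

for subject in range(9):
    subprocess.run(["python", "train_msra.py", "--subject", str(subject)], check=True)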

Test

Run python test.py --dataset <dataset_name>.

For the MSRA dataset, run python test_msra.py --subject <subject_id>.

Results

Results and pretrained models are available in the GitHub release. These pretrained models are under a CC BY 4.0 license.

pixelwiseregression's Issues

MSRA

Hello, this is my first experience with 3D hand pose estimation. I would like to know what the difference is between MSRA and the other datasets; is it the different subjects? Also, how can I visualize the MSRA results? I ran python test_samples.py --dataset MSRA but it failed. Finally, I'd like to know how to visualize the metrics (1. mean 3D error, 2. fraction of frames within distance) in the code.
Thank you very much.
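
For reference, a minimal sketch (not the repository's code) of the two metrics mentioned above, assuming the predicted and ground-truth joints are available as (N, J, 3) arrays in millimetres; the function names and thresholds are placeholders:

# Mean 3D error and fraction-of-frames-within-distance, sketched with NumPy.
# pred, gt: (N, J, 3) arrays of predicted / ground-truth joint coordinates in mm.
import numpy as np

def mean_3d_error(pred, gt):
    # Average per-joint Euclidean distance over all frames and joints (mm).
    return np.linalg.norm(pred - gt, axis=-1).mean()

def fraction_within_distance(pred, gt, thresholds):
    # Fraction of frames whose worst (maximum) per-joint error is below each threshold.
    worst = np.linalg.norm(pred - gt, axis=-1).max(axis=1)
    return np.array([(worst < t).mean() for t in thresholds])

# Example success-rate curve:
# import matplotlib.pyplot as plt
# thresholds = np.arange(0, 81, 1)
# plt.plot(thresholds, fraction_within_distance(pred, gt, thresholds))
# plt.xlabel("distance threshold (mm)"); plt.ylabel("fraction of frames"); plt.show()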

Poor results when inferring

Hello,

I am currently trying to implement a program that uses Pixelwise Regression to estimate the pose of at least two hands on a depth video stream (one frame at a time).
I am using the Stereolabs ZED Mini camera.
Since Pixelwise Regression can only estimate the pose of one hand per frame, I begin by using MediaPipe Hands (I know it is overkill, I may change it later) to locate the hands and crop them from the frame. Then I resize the cropped hands to 128x128.
Finally, I use this code:

    def estimate(self, img):
        # Assumes: import torch as tr; from torchvision.transforms import Resize
        # img: the 128x128 depth crop as a torch tensor

        # Downscale the crop to the model's label resolution and build the hand mask
        label_img = Resize(size=[self.label_size, self.label_size])(img)
        label_img = tr.reshape(label_img, (1, 1, self.label_size, self.label_size))
        mask = tr.where(label_img > 0, 1.0, 0.0)

        img = img.to(self.device, non_blocking=True)
        label_img = label_img.to(self.device, non_blocking=True)
        mask = mask.to(self.device, non_blocking=True)

        # Keep only the last stage's output (heatmaps, depth maps, UVD joint coordinates)
        self.heatmaps, self.depthmaps, hands_uvd = self.model(img, label_img, mask)[-1]
        hands_uvd = hands_uvd.detach().cpu().numpy()
        self.hands_uvd = hands_uvd

        return hands_uvd

This gives the result shown in the attached image.

After looking into the values of img, label_img and mask when executing test_samples.py, I got the feeling that, contrary to mine, those matrices are normalized, which could be the cause of my poor results.
Is my feeling right, and if so, can you explain how I can apply the same treatment to my matrices?

P.S. : I tested with both HAND17 and MSRA pretrained models.

Thank you for your work.
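
For reference, a common normalization scheme for depth crops in hand pose pipelines is to express depth relative to the hand centre and a fixed cube size, so that values fall roughly in [-1, 1]. Whether this exactly matches the preprocessing in this repository is an assumption on my part; the 300 mm cube size, the function name and the background convention below are placeholders.

# A sketch of a common depth-crop normalization, NOT necessarily this repo's exact preprocessing.
# depth_crop: (H, W) tensor in millimetres; center_depth: depth of the hand centre in mm.
import torch

def normalize_crop(depth_crop, center_depth, cube_size_mm=300.0):
    half = cube_size_mm / 2.0
    norm = (depth_crop - center_depth) / half                # roughly [-1, 1] inside the hand cube
    norm = torch.clamp(norm, -1.0, 1.0)
    norm = torch.where(depth_crop == 0, torch.ones_like(norm), norm)  # missing depth -> far plane
    return norm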

Why is the heatmap generated like this?

When I look at the source code, it is difficult to understand why it is designed this way. Could you please explain it? Thank you very much~
""""
min_d = max(du + dv - 1, 0)
max_d = min(du, dv)
d = (max_d + min_d) / 2
b = du - d
c = dv - d
a = 1 + d - du - dv

heatmap[low_v, low_u] = a
heatmap[low_v, low_u + 1] = b
heatmap[low_v + 1, low_u] = c
heatmap[low_v + 1, low_u + 1] = d"""
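
The four values a, b, c and d are the weights of a 2x2 "soft one-hot" placed on the four pixels surrounding the sub-pixel joint location (low_u + du, low_v + dv). By construction they sum to 1, and choosing d as the midpoint of its feasible range [max(du + dv - 1, 0), min(du, dv)] keeps all four weights non-negative; the weighted centroid of the four pixels then recovers (du, dv) exactly, which is what makes the representation decodable without quantization error by a differentiable read-out. A small numeric check (my own sketch, not the repo's code):

# Numeric check of the 2x2 weight construction: the weights sum to 1,
# are non-negative, and their centroid recovers the sub-pixel offset (du, dv).
import random

for _ in range(10000):
    du, dv = random.random(), random.random()
    min_d = max(du + dv - 1, 0)
    max_d = min(du, dv)
    d = (max_d + min_d) / 2
    b = du - d
    c = dv - d
    a = 1 + d - du - dv
    assert abs(a + b + c + d - 1) < 1e-9
    assert min(a, b, c, d) >= -1e-9
    assert abs((b + d) - du) < 1e-9    # centroid u offset
    assert abs((c + d) - dv) < 1e-9    # centroid v offset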

Pre-trained models

Hi there,
How can I get the pre-trained models for MSRA and ICVL?

Also, the paper mentions that 10 epochs were used (with data augmentation), but the training code has 50 epochs set as the default. Which one is correct?

Thanks

Environment

Hello, thank you for your amazing implementation.
Was this PixelwiseRegression repo developed on Windows 10?

a naive question about error

Hello, thanks for your work. In the paper you said, "Note that, we do not use XYZ coordinates as output like some models did, because the UVD coordinates are more direct information from depth images without a transformation influenced by the intrinsic parameters of the camera." So, when you compute the 3D error, did you use UVD coordinates or XYZ coordinates?
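
For reference, UVD (pixel coordinates plus depth) and XYZ camera coordinates are related through the camera intrinsics under a pinhole model. Which of the two the reported 3D error uses is exactly the question here, so the sketch below only shows the standard conversion; fx, fy, cx and cy are placeholder intrinsics, not values from the paper.

# Standard pinhole-model conversion from UVD to XYZ; intrinsics are placeholders.
import numpy as np

def uvd_to_xyz(uvd, fx, fy, cx, cy):
    # uvd: (..., 3) array of (u, v, depth in mm); returns XYZ in mm.
    u, v, d = uvd[..., 0], uvd[..., 1], uvd[..., 2]
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=-1)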

X_center_test.txt file

Hi there, can you please suggest how I can generate the X_center_test.txt file to run inference on a custom 2D hand image?
