pixelwiseregression's Introduction

Pixel-wise Regression for 3D hand pose estimation

PyTorch release of our paper:
Pixel-wise Regression: 3D Hand Pose Estimation via Spatial-form Representation and Differentiable Decoder
Xingyuan Zhang, Fuhai Zhang

If you find this repository useful, please cite our paper:

@ARTICLE{zhang2022srnet,  
    author={Zhang, Xingyuan and Zhang, Fuhai},  
    journal={IEEE Transactions on Multimedia},   
    title={Differentiable Spatial Regression: A Novel Method for 3D Hand Pose Estimation},   
    year={2022},  
    volume={24},  
    number={},  
    pages={166-176},  
    doi={10.1109/TMM.2020.3047552}
}

Update: The paper has been accepted at TMM! The title was changed as suggested by one of the reviewers. Please consider citing the new version. I have not uploaded the new version to arXiv since I am not sure whether that is allowed; if you know it is OK to do so, please contact me and I will gladly update it.

Setup

conda env create -f env.yml
conda activate pixelwise

Dataset

All datasets should be placed in the ./Data folder. After placing the datasets correctly, run python check_dataset.py --dataset <dataset_name> to build the data files used for training.

NYU

  1. Download the dataset from the website.
  2. Unzip the files to ./Data and rename the folder as NYU.

MSRA

  1. Download the dataset from Dropbox.
  2. Unzip the files to ./Data and rename the folder as MSRA.

ICVL

  1. Download the dataset from here.
  2. Extract Training.tar.gz and Testing.tar.gz to ./Data/ICVL/Training and ./Data/ICVL/Testing respectively.

HAND17

  1. Ask for permission from the website and download the dataset.
  2. Download the center files from the GitHub release, and put them in ./Data/HAND17/.
  3. Extract frame.zip and images.zip to ./Data/HAND17/. You should end up with a folder structure like the one below:
HAND17/
  |
  |-- hands17_center_train.txt
  |
  |-- hands17_center_test.txt
  |
  |-- training/
  |     |
  |     |-- images/
  |     |
  |     |-- Training_Annotation.txt
  |
  |-- frame/
  |     |
  |     |-- images/
  |     |
  |     |-- BoundingBox.txt

Train

Run python train.py --dataset <dataset_name>, where dataset_name can be chosen from NYU, ICVL and HAND17.

For the MSRA dataset, run python train_msra.py --subject <subject_id>.
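
For convenience, a minimal sketch (my own, not part of the repo) that launches one training run per held-out subject, assuming the standard MSRA leave-one-subject-out protocol with subjects numbered 0-8:

# Launch one train_msra.py run per held-out MSRA subject (assumes 9 subjects, 0-8).
import subprocess

for subject in range(9):
    subprocess.run(["python", "train_msra.py", "--subject", str(subject)], check=True)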

Test

Run python test.py --dataset <dataset_name>.

For the MSRA dataset, run python test_msra.py --subject <subject_id>.

Results

Results and pretrained models are available in the GitHub release. These pretrained models are under a CC BY 4.0 license.

pixelwiseregression's Issues

MSRA

Hello, this is my first experience with 3D hand pose estimation. I would like to know what the difference is between MSRA and the other datasets; is it the different subjects? Also, how can I visualize the MSRA results? I ran python test_samples.py --dataset MSRA but it failed. Finally, I'd like to know how to visualize the metrics (1. mean 3D error, 2. fraction of frames within distance) in the code.
Thank you very much.
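
For reference, a minimal sketch (not the repository's code) of the two metrics mentioned above, assuming the predicted and ground-truth joints are available as (N, J, 3) arrays in millimetres; the function names and thresholds are placeholders:

# Mean 3D error and fraction-of-frames-within-distance, sketched with NumPy.
# pred, gt: (N, J, 3) arrays of predicted / ground-truth joint coordinates in mm.
import numpy as np

def mean_3d_error(pred, gt):
    # Average per-joint Euclidean distance over all frames and joints (mm).
    return np.linalg.norm(pred - gt, axis=-1).mean()

def fraction_within_distance(pred, gt, thresholds):
    # Fraction of frames whose worst (maximum) per-joint error is below each threshold.
    worst = np.linalg.norm(pred - gt, axis=-1).max(axis=1)
    return np.array([(worst < t).mean() for t in thresholds])

# Example success-rate curve:
# import matplotlib.pyplot as plt
# thresholds = np.arange(0, 81, 1)
# plt.plot(thresholds, fraction_within_distance(pred, gt, thresholds))
# plt.xlabel("distance threshold (mm)"); plt.ylabel("fraction of frames"); plt.show()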

Poor results when inferring

Hello,

I am currently trying to implement a program that uses Pixelwise Regression to estimate the pose of at least two hands on a depth video stream (one frame at a time).
I am using the Stereolabs ZED Mini camera.
Since Pixelwise Regression can only estimate the pose of one hand per frame, I begin by using MediaPipe Hands (I know it is overkill, I may change it later) to locate the hands and crop them from the frame. Then I resize the cropped hands to 128x128.
Finally, I use this code:

    def estimate(self, img):
        # Assumes: import torch as tr; from torchvision.transforms import Resize
        # img: the 128x128 depth crop as a torch tensor

        # Downscale the crop to the model's label resolution and build the hand mask
        label_img = Resize(size=[self.label_size, self.label_size])(img)
        label_img = tr.reshape(label_img, (1, 1, self.label_size, self.label_size))
        mask = tr.where(label_img > 0, 1.0, 0.0)

        img = img.to(self.device, non_blocking=True)
        label_img = label_img.to(self.device, non_blocking=True)
        mask = mask.to(self.device, non_blocking=True)

        # Keep only the last stage's output (heatmaps, depth maps, UVD joint coordinates)
        self.heatmaps, self.depthmaps, hands_uvd = self.model(img, label_img, mask)[-1]
        hands_uvd = hands_uvd.detach().cpu().numpy()
        self.hands_uvd = hands_uvd

        return hands_uvd

This gives the result shown in the attached image.

After looking into the values of img, label_img and mask when executing test_samples.py, I got the feeling that, contrary to mine, those matrices are normalized, which could be the cause of my poor results.
Is my feeling right, and if so, can you explain how I can apply the same treatment to my matrices?

P.S. : I tested with both HAND17 and MSRA pretrained models.

Thank you for your work.
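
For reference, a common normalization scheme for depth crops in hand pose pipelines is to express depth relative to the hand centre and a fixed cube size, so that values fall roughly in [-1, 1]. Whether this exactly matches the preprocessing in this repository is an assumption on my part; the 300 mm cube size, the function name and the background convention below are placeholders.

# A sketch of a common depth-crop normalization, NOT necessarily this repo's exact preprocessing.
# depth_crop: (H, W) tensor in millimetres; center_depth: depth of the hand centre in mm.
import torch

def normalize_crop(depth_crop, center_depth, cube_size_mm=300.0):
    half = cube_size_mm / 2.0
    norm = (depth_crop - center_depth) / half                # roughly [-1, 1] inside the hand cube
    norm = torch.clamp(norm, -1.0, 1.0)
    norm = torch.where(depth_crop == 0, torch.ones_like(norm), norm)  # missing depth -> far plane
    return norm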

Why is the heatmap generated like this?

When I look at the source code, it is difficult to understand why it is designed this way. Could you please explain it? Thank you very much~
""""
min_d = max(du + dv - 1, 0)
max_d = min(du, dv)
d = (max_d + min_d) / 2
b = du - d
c = dv - d
a = 1 + d - du - dv

heatmap[low_v, low_u] = a
heatmap[low_v, low_u + 1] = b
heatmap[low_v + 1, low_u] = c
heatmap[low_v + 1, low_u + 1] = d"""
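
The four values a, b, c and d are the weights of a 2x2 "soft one-hot" placed on the four pixels surrounding the sub-pixel joint location (low_u + du, low_v + dv). By construction they sum to 1, and choosing d as the midpoint of its feasible range [max(du + dv - 1, 0), min(du, dv)] keeps all four weights non-negative; the weighted centroid of the four pixels then recovers (du, dv) exactly, which is what makes the representation decodable without quantization error by a differentiable read-out. A small numeric check (my own sketch, not the repo's code):

# Numeric check of the 2x2 weight construction: the weights sum to 1,
# are non-negative, and their centroid recovers the sub-pixel offset (du, dv).
import random

for _ in range(10000):
    du, dv = random.random(), random.random()
    min_d = max(du + dv - 1, 0)
    max_d = min(du, dv)
    d = (max_d + min_d) / 2
    b = du - d
    c = dv - d
    a = 1 + d - du - dv
    assert abs(a + b + c + d - 1) < 1e-9
    assert min(a, b, c, d) >= -1e-9
    assert abs((b + d) - du) < 1e-9    # centroid u offset
    assert abs((c + d) - dv) < 1e-9    # centroid v offset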

Pre-trained models

Hi there,
How can I get the pre-trained models for MSRA and ICVL?

Also, the paper mentions that 10 epochs were used (with data augmentation), but the training code has 50 epochs set as the default. Which one is correct?

Thanks

Environment

Hello, thank you for your amazing implementation.
Was this PixelwiseRegression repo developed on Windows 10?

a naive question about error

Hello, thanks for your work. In the paper you said, "Note that, we do not use XYZ coordinates as output like some models did, because the UVD coordinates are more direct information from depth images without a transformation influenced by the intrinsic parameters of the camera." So, when you compute the 3D error, did you use UVD coordinates or XYZ coordinates?
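
For reference, UVD (pixel coordinates plus depth) and XYZ camera coordinates are related through the camera intrinsics under a pinhole model. Which of the two the reported 3D error uses is exactly the question here, so the sketch below only shows the standard conversion; fx, fy, cx and cy are placeholder intrinsics, not values from the paper.

# Standard pinhole-model conversion from UVD to XYZ; intrinsics are placeholders.
import numpy as np

def uvd_to_xyz(uvd, fx, fy, cx, cy):
    # uvd: (..., 3) array of (u, v, depth in mm); returns XYZ in mm.
    u, v, d = uvd[..., 0], uvd[..., 1], uvd[..., 2]
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=-1)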

X_center_test.txt file

Hi there, can you please suggest how I can generate the X_center_test.txt file to run inference on a custom 2D hand image?
