
hand3d's Issues

How can I specify the threshold?

Hi! I have a problem: when there is no hand in the image, the script still returns weird coordinates. I think this could be resolved by setting a threshold on the result. Can you please tell me how?
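(A minimal post-processing sketch, not from the repo: one common approach is to threshold the peak of the HandSegNet scoremap before trusting the keypoints. hand_scoremap_v below is assumed to be the (H, W, 2) segmentation output fetched in run.py, and the 0.5 threshold is an assumed starting point to tune:)

    import numpy as np

    def hand_present(hand_scoremap_v, threshold=0.5):
        # Softmax over the two classes (background, hand), then check whether
        # the peak hand probability anywhere in the map exceeds the threshold.
        e = np.exp(hand_scoremap_v - hand_scoremap_v.max(axis=-1, keepdims=True))
        prob = e / e.sum(axis=-1, keepdims=True)
        return float(prob[..., 1].max()) > threshold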

Run.py error

Hi,

I extracted the data in the root folder, and I am getting the following error:

    File "run.py", line 47, in <module>
      keypoints_scoremap_tf, keypoint_coord3d_tf = net.inference(image_tf, hand_side_tf, evaluation)
    File "/home/alex/dev/projects/hand3d-master/net.py", line 37, in inference
      hand_mask = single_obj_scoremap(hand_scoremap)
    File "/home/alex/dev/projects/hand3d-master/utils.py", line 246, in single_obj_scoremap
      max_loc = find_max_location(scoremap_fg)
    File "/home/alex/dev/projects/hand3d-master/utils.py", line 228, in find_max_location
      xy_loc.append(tf.concat(0, [x_loc, y_loc]))
    File "/home/alex/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1062, in concat
      ).assert_is_compatible_with(tensor_shape.scalar())
    File "/home/alex/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 737, in assert_is_compatible_with
      raise ValueError("Shapes %s and %s are incompatible" % (self, other))
    ValueError: Shapes (2, 1) and () are incompatible
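(A likely cause, assuming a TensorFlow >= 1.0 installation: tf.concat changed its argument order from (axis, values) to (values, axis) in TF 1.0, so the call in utils.py passes the tensor list where a scalar axis is expected, which matches the "Shapes (2, 1) and ()" message. A minimal local fix would be:)

    # utils.py, find_max_location: swap the tf.concat arguments for TF >= 1.0
    xy_loc.append(tf.concat([x_loc, y_loc], 0))  # was: tf.concat(0, [x_loc, y_loc])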

Can I use training_lifting.py for my custom dataset?

I used another hand detector and keypoint detector. After that, I want to use your training_lifting.py to lift 2D coordinates to 3D. So, can I use only your PosePrior network for 3D pose estimation? How can I do that?

Also, in your PoseNet network, what kind of output is keypoints_map? For example, is it 21 keypoints for one image, or for all images? Can you show me a keypoints_map result as an example? (A rendering sketch follows below.)
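(If you want to feed your own 2D detections into the lifting stage, here is a minimal sketch, not from the repo, of rendering 21-channel Gaussian scoremaps — one 21-channel map per image, matching PoseNet's per-image output format. make_scoremaps is a hypothetical helper, and the map size and sigma are assumed values:)

    import numpy as np

    def make_scoremaps(keypoints_uv, size=(32, 32), sigma=1.5):
        # keypoints_uv: (21, 2) array of (u, v) coordinates within the crop.
        h, w = size
        ys, xs = np.mgrid[0:h, 0:w]
        maps = np.zeros((h, w, 21), np.float32)
        for k, (u, v) in enumerate(keypoints_uv):
            # One Gaussian blob per keypoint channel, centered at (u, v).
            maps[..., k] = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))
        return maps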

is there a custom layer?

I would like to start porting it to Caffe (my goal is to implement just the forward pass, not training).

I have read your paper roughly, and I wonder whether there are custom layers that are not included in the existing TF or Caffe layer sets.

Also, I wonder whether HandSegNet can be replaced with an SSD hand detector, since as I understand it HandSegNet is only there to detect hands, after which the hand-cropped patches from the original image (not from the feature map of HandSegNet's last layer) are passed to PoseNet.

A question about handsegnet

Hey, I read the HandSegNet part of your code, but afterwards I have a question. In your paper you say:
"Our HandSegNet is a smaller version of the network from Wei et al. [19] trained on our hand pose dataset."
But in your code, I think you only did something like the first half of the method in "Fully Convolutional Networks for Semantic Segmentation" (FCN); the hand masks are not the same as heatmaps, at least in my opinion. Is my understanding right?

Why are there 42 keypoints?

Hi, thanks for sharing your code.
I'm confused that there are 42 keypoints per frame even though the corresponding image in the 'color' directory only shows one hand.
In fact, I'm creating my own data, but how could I get 42 points from an image containing only one hand?

Translate normalized 3D coordinates to raw image

Hi! I have run your demo code, which provides both 2D and normalized 3D coordinates. The former can easily be translated to pixel coordinates and overlaid on the original image. Is there any way to do the same for the 3D coordinates, i.e., translate them to pixel scale so the x, y values can be overlaid on the original image?
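(A minimal sketch, assuming you can supply a camera intrinsic matrix K plus an absolute root position and the metric bone length used for normalization — the network only predicts root-relative, scale-normalized coordinates, so all three are inputs you must provide yourself:)

    import numpy as np

    def project_to_image(coords3d_norm, root_xyz, bone_length, K):
        # coords3d_norm: (21, 3) normalized 3D keypoints; K: (3, 3) intrinsics.
        # Undo the scale/translation normalization, then pinhole-project.
        coords3d = coords3d_norm * bone_length + root_xyz
        uvw = coords3d @ K.T
        return uvw[:, :2] / uvw[:, 2:3]  # (21, 2) pixel coordinates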

A question about training on STB

Hello!
I'm trying to achieve the same results that you describe in your paper for the PoseNet stage when adding the STB dataset. However, the results are far from what you achieved, and I cannot find the reason why. I was hoping you could enlighten me on this step.

After training with the RHD dataset using the pipeline you've published in posenet_training.py, I load BinaryDbReaderSTB with the following parameters:

dataset = BinaryDbReaderSTB(mode='training', batch_size=train_para['BATCH_SIZE'], shuffle=True, coord_uv_noise=True, hand_crop=True, crop_center_noise=True, use_wrist_coord=True)

And proceed to run the session passing the tensors:

_, loss_v = sess.run([train_op, loss])

The BinaryDbReaderSTB class was not modified and I've processed the data using the scripts you provided.

I then proceed to evaluate the training, using:

dataset = BinaryDbReaderSTB(mode='evaluation', shuffle=False, use_wrist_coord=True)

When executing with USE_RETRAINED=False, the metrics are as expected:
Average mean EPE: 18.581 pixels
However, when using my model trained with RHD+STB, the lowest mean EPE I got was ~40 pixels. Could you please point me to what I am forgetting?

I tried several ideas, such as different epoch combinations, tweaking the LR decay, and different configurations of the data loader, but to no effect.

Thank you for your attention

How to get bounding box?

I would like to compute a bounding box and visualize it on the full-size input image. Is there an option to achieve that?
Thank you!
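(Not a built-in option as far as this thread shows, but a hedged sketch: derive a box from the detected 2D keypoints in the crop and map them back through the crop center and scale that run.py already fetches. bbox_from_keypoints is a hypothetical helper, and crop_size=256 assumes the default crop resolution:)

    import numpy as np

    def bbox_from_keypoints(coords_hw_crop, center_v, scale_v, crop_size=256):
        # Map keypoints from crop coordinates back to the original image,
        # then take their extent as the bounding box.
        coords_hw = (coords_hw_crop - crop_size // 2) / scale_v + center_v
        y0, x0 = coords_hw.min(axis=0)
        y1, x1 = coords_hw.max(axis=0)
        return int(x0), int(y0), int(x1), int(y1)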

Joint coordinate is not right.

run.py contains code like this:

hand_scoremap_v, image_crop_v, scale_v, center_v, \
keypoints_scoremap_v, keypoint_coord3d_v = sess.run([hand_scoremap_tf, image_crop_tf, scale_tf, center_tf, keypoints_scoremap_tf, keypoint_coord3d_tf], feed_dict={image_tf: image_v})

The input image is ./data/img.png, but I get keypoint_coord3d_v like this:

[[  1.44893011e-06   2.47310300e-06   8.19431716e-06]
 [  1.90374464e-01  -2.14477921e+00  -1.78279579e-01]
 [ -3.34568620e-02  -1.62475693e+00   4.05245125e-02]
 [ -3.15526843e-01  -1.11640537e+00   3.22329104e-01]
 [ -5.08553386e-01  -3.76516193e-01   5.07125676e-01]
 [ -3.13133001e-01  -1.18032885e+00  -1.30266607e+00]
 [  1.37096226e-01  -1.25639629e+00  -1.34058976e+00]
 [  5.93820870e-01  -1.23831999e+00  -1.13210297e+00]
 [  5.98206878e-01  -9.30948436e-01  -4.23936307e-01]
 [ -4.06365603e-01  -5.10450840e-01  -9.82764661e-01]
 [ -1.59013331e-01  -7.13905573e-01  -1.39288712e+00]
 [  5.09743333e-01  -8.13335598e-01  -1.47191596e+00]
 [  7.26554811e-01  -4.88295704e-01  -6.81376576e-01]
 [ -3.66737843e-01  -9.66296196e-02  -9.01745081e-01]
 [ -2.10868478e-01  -2.65226990e-01  -1.37259626e+00]
 [  3.51663291e-01  -2.92807043e-01  -1.53445566e+00]
 [  6.96920574e-01  -4.46554348e-02  -8.14203858e-01]
 [ -3.76463950e-01   2.41669282e-01  -1.09918964e+00]
 [ -1.30924404e-01   1.84271917e-01  -1.45671225e+00]
 [  2.94114619e-01   2.42203340e-01  -1.51309526e+00]
 [  4.97256935e-01   4.71493840e-01  -8.21017146e-01]]

The distance between the wrist and the thumb tip (keypoint_coord3d_v[0] and keypoint_coord3d_v[1]) is 2.160583, a weird number. What is the unit? Neither millimeters nor meters is correct.
Does it need a conversion?

Thanks
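(For reference: per the paper, the 3D predictions are scale-normalized so that a reference bone has unit length, so the values are unitless multiples of that bone length rather than millimeters or meters; recovering metric coordinates requires a known metric bone length. The distance quoted above is simply:)

    import numpy as np
    d = np.linalg.norm(keypoint_coord3d_v[1] - keypoint_coord3d_v[0])  # ~2.160583, in bone-length units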


Training procedure

Hi @zimmerm
I have read your paper, and in the supplementary material you describe the training procedure for each section.

  • Does this mean you trained each section separately, not end to end?

  • For example, for training the pose prior net, did you use the ground-truth heatmaps as input to the network to get the canonical pose & rotation matrix?

  • For training PoseNet, did you use the ground-truth bounding box around the hand?

  • In other words, you did not train end to end; you trained each module separately (HandSegNet, PoseNet, PosePrior) with ground-truth inputs (not predictions from other modules as input), right?

How to get the weight folder?

Hi!
Thanks for your great work!
I am confused about how you produced the weights folder, which I downloaded directly from the data linked in the README. When I run run.py, it shows "Loaded 102 variables from weights/posenet3d-rhd-stb-slr-finetuned.pickle" and "Loaded 37 variables from weights/handsegnet-rhd.pickle". How did you generate those pickle files? After I finish training, I cannot find any model saved in that pickle format.
Thank you
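(A hypothetical conversion sketch, not part of the repo: training saves standard TF checkpoints, so one way to obtain a {variable_name: array} pickle like the shipped ones is to dump a checkpoint yourself. The checkpoint path below is made up, and the variable-name keys must match what the repo's loader expects:)

    import pickle
    import tensorflow as tf

    with tf.Session() as sess:
        # Restore a trained checkpoint (hypothetical path).
        saver = tf.train.import_meta_graph('snapshots_posenet/model-40000.meta')
        saver.restore(sess, 'snapshots_posenet/model-40000')
        # Collect every variable's value into a name->ndarray dict.
        weights = {v.name: sess.run(v) for v in tf.global_variables()}

    with open('weights/my_posenet.pickle', 'wb') as f:
        pickle.dump(weights, f, protocol=2)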

Is there a trained caffe model?

Thanks for your work.
I have trained it with TensorFlow successfully. Do you have a trained Caffe model?
If you do not plan to convert the model to Caffe, do you have any suggestions on what I should pay attention to when training in Caffe?

Visible meaning in RHD

Hi @zimmerm ,

In the RHD dataset, what does "visible" mean? Does it mean the joint is occluded or not,
or does it mean the existence of the hand (joint) in the image?

pretrained model

Hi @zimmerm,

Thanks for your paper.
There is no pretrained model available; could you please provide one?

How can I use only './weights/posenet3d-rhd-stb-slr-finetuned.pickle'?

When I use only the model './weights/posenet3d-rhd-stb-slr-finetuned.pickle',
I get the error "ValueError: cannot reshape array of size 1049600 into shape (2562,512)".
Hmm, 2562 x 512 is 1311744, while 1049600 = 2050 x 512. What should I do?
It seems to be related to the shape of PosePrior/fc_rel0/weights: 2050 x 512 = 1049600.
I think I should call _inference_pose3d(), since the PosePrior and ViewPointNet parameters are contained in the pickle and were already loaded.

I don't use HandSegNet, as shown below:
def inference(self, image, hand_side, evaluation):

    # detect keypoints in 2D
    keypoints_scoremap = self.inference_pose2d(image) 
    keypoints_scoremap = keypoints_scoremap[-1] 
    # estimate most likely 3D pose
    keypoint_coord3d = self._inference_pose3d(keypoints_scoremap, hand_side, evaluation) 
    # upsample keypoint scoremap 
    s = image.get_shape().as_list()
    keypoints_scoremap = tf.image.resize_images(keypoints_scoremap, (s[1], s[2])) 
    return image, keypoints_scoremap

Can anybody give me any hint? Please help me ;/

Detecting both hands

Hi,
Thanks for the great work!
Can this network detect both hands at a time?
I couldn't find code in run.py that can estimate the pose of both hands simultaneously,
so I manually masked one side of the image when both hands are present.
Would this be the only option for handling two hands, or is there a more convenient way?
Thank you

The split of STB when training on it.

Hello, thank you for your great work.

I have some questions about the STB dataset. The dataset has left & right RGB images. I saw the description in your paper (3000 frames for evaluation & 15000 frames for training), but if I follow that setting, it produces 3000 * 2 images for evaluation & 15000 * 2 for training (because of the left and right views). So I want to make sure: is this (3000 * 2, 15000 * 2) correct?

Hoping for your answer. Best wishes.

Converting to Java (Android)

Hello!

I'm working on an academic project (bachelor's) where I need to estimate a hand's skeleton in 3D!
Your work looks very interesting, and I would like to use it as the backend for estimating hand skeletons.

I see it is built on top of TensorFlow, so how difficult would it be to export to Java, e.g. TensorFlow Lite? I guess it still calls some of the native C++ functions.

Best regards,
Christian!
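(A rough export sketch, assuming TF 1.x: freeze the inference graph into a .pb that the TensorFlow Android/Java API can load. The output node name below is hypothetical; use the actual output tensor names from run.py:)

    import tensorflow as tf
    from tensorflow.python.framework import graph_util

    # ... after building the network and loading the pickled weights in `sess`:
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['PosePrior/xyz_output'])  # hypothetical node name
    with tf.gfile.GFile('hand3d_frozen.pb', 'wb') as f:
        f.write(frozen.SerializeToString())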

results on grey images?

I tried running it on grey (channel-averaged) images and got nonsense results compared to the color versions (my use case has only grey images). Does this mean the model is fine-tuned for Caucasian hue and won't work on grey images, nor on people of color?

About uv coordinates in STB

In create_db.m, I think you should also add "anno_uv_l = anno_uv_l(1:2, :);" in order to get a 2x21 coordinate matrix for each sample, just like anno_uv_r. If the third dimension indicates the visibility of each keypoint, shouldn't anno_uv_r be of the same shape?

ValueError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].

I saved a frozen model, then converted it to a transformed model optimized for inference using TensorFlow TransformGraph. But when I now try to inspect the .pb file created by TransformGraph, I get the following error:

Traceback (most recent call last):
  File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 427, in import_graph_def
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../tf-coreml/utils/inspect_pb.py", line 58, in <module>
    inspect(sys.argv[1], sys.argv[2])
  File "../tf-coreml/utils/inspect_pb.py", line 12, in inspect
    tf.import_graph_def(graph_def)
  File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 431, in import_graph_def
    raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].

Updated weight files

    File "hand3d-master/nets/ColorHandPose3DNetwork.py", line 52, in __init__
      assert os.path.exists(file_name), "File not found."
    AssertionError: File not found.

Missing weight files: ./weights/handsegnet-rhd.pickle, ./weights/posenet3d-rhd-stb-slr-finetuned.pickle

Discrepancy in index bone indices

According to the STB dataset home page, the indices of the index-finger joints are 13 to 16.

However, in BinaryDbReaderSTB.py, index_root_bone_length is computed from the 11th and 12th joint indices at this line.

Am I missing something?

Video Usage

Can this model be used to analyze a video? Also, is the snapshot linked in the README equal in performance to what I would achieve by training the model myself?

Forward pass?

Hello,
What do the forward-pass weights contain? Were they trained on the whole RHD dataset for HandSegNet and PoseNet? Since it says "minimal example", should I download the dataset and retrain everything?

Can a pre-trained model be provided

Hello,
Thanks for your sharing.
I am working in an environment with only four 1080 Ti GPUs, and I am not sure whether the model can be trained while other tasks are running.
Could lmb-freiburg provide a pre-trained model?
Thanks

Processing Frames(images) in run.py is very slow on GPU (G3.4xLarge EC2 Instance)

Hello Team,

I tried to run run.py on various images, including the ones you provide in the "data" folder, but the execution time per image is high: processing an image and getting the result takes close to 6 seconds per image.

I tried both tensorflow and tensorflow-gpu on a G3 AWS instance (graphics card details below), but had no luck reducing the execution time.

Graphics card details:

00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:1e.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)

(My main goal is to use run.py for live webcam video, so please help me reduce the execution time.)

Please find installed packages:

absl-py==0.7.1
bleach==1.5.0
cycler==0.10.0
html5lib==0.9999999
Markdown==3.0.1
matplotlib==1.5.3
numpy==1.16.2
Pillow==5.4.1
pkg-resources==0.0.0
protobuf==3.7.0
pyparsing==2.3.1
python-dateutil==2.8.0
pytz==2018.9
scipy==0.18.1
six==1.12.0
tensorflow-gpu==1.5.0
tensorflow-tensorboard==1.5.1
Werkzeug==0.15.1

Code where I am checking execution time in run.py:


        t = time.time()
        print("Initial taken : {:.3f}".format(time.time() - t))

        hand_scoremap_v, image_crop_v, scale_v, center_v,\
        keypoints_scoremap_v, keypoint_coord3d_v = sess.run([hand_scoremap_tf, image_crop_tf, scale_tf, center_tf,
                                                             keypoints_scoremap_tf, keypoint_coord3d_tf],
                                                            feed_dict={image_tf: image_v})

        print("time taken by network : {:.3f}".format(time.time() - t))

Result for the images below:

image_list.append('./data/img3.png')
image_list.append('./data/img4.png')
image_list.append('./data/img5.png')
Initial taken : 0.000
time taken by network : 5.537
Initial taken : 0.000
time taken by network : 5.338
Initial taken : 0.000
time taken by network : 5.340
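(A hedged benchmarking sketch: the first sess.run calls often include one-off graph optimization and cuDNN autotuning, so timing them overstates steady-state latency. fetches below stands in for the tensor list already built in run.py:)

    import time

    fetches = [hand_scoremap_tf, image_crop_tf, scale_tf, center_tf,
               keypoints_scoremap_tf, keypoint_coord3d_tf]

    sess.run(fetches, feed_dict={image_tf: image_v})  # warm-up pass, not timed

    t0 = time.time()
    n = 10
    for _ in range(n):
        sess.run(fetches, feed_dict={image_tf: image_v})
    print("steady-state per image: {:.3f}s".format((time.time() - t0) / n))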

Could you provide the weight files generated by pickle dump with protocol 2?

I downloaded the data files weights_HandSegNet.pickle and weights_Pose3D.pickle.
I encountered the error "ValueError: unsupported pickle protocol: 3" when using Python 2.7.
TensorFlow is installed on my Python 2.7, and I know that reinstalling TensorFlow on Python 3.5 would probably solve the issue.
If you could provide the weight files generated by pickle dump with protocol 2, it would help me a lot.
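(A hypothetical workaround, assuming access to any Python 3 interpreter: load the shipped pickles there and re-save them with protocol 2 so they open under Python 2.7:)

    import pickle

    for name in ['weights_HandSegNet.pickle', 'weights_Pose3D.pickle']:
        with open(name, 'rb') as f:
            data = pickle.load(f)
        # Protocol 2 is the highest pickle protocol Python 2.7 can read.
        with open(name.replace('.pickle', '_p2.pickle'), 'wb') as f:
            pickle.dump(data, f, protocol=2)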

Thanks

Problem during training

When running "python training_handsegnet.py",
it reported this error:

"tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./weights/cpm-model-mpii: Not found: ./weights; No such file or directory."

From the paper, some design ideas come from OpenPose. Could you guide me to where the pre-trained model required for training, e.g., cpm-model-mpii, can be downloaded?

Thanks & Regards!
Neo

how fast is it?

Can I get any inference-speed measurements? Would it be possible to run it on a TX2 in real time?

Questions about quantitative comparison

Hello, thank you very much for your outstanding work!

My question is: the GT root node of the RHD dataset in your paper is at the position of the wrist, but the root node of the predicted hand pose is at the center of the palm.
So, how do you ensure the fairness of the quantitative comparison?
