
hand3d's Issues

How can I specify the threshold?

Hi! I have a problem: when there is no hand in the image, the script still returns weird coordinates. I think this could be resolved by setting a threshold on the result. Can you please tell me how?
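(A minimal post-processing sketch, not from the repo: one common approach is to threshold the peak of the HandSegNet scoremap before trusting the keypoints. hand_scoremap_v below is assumed to be the (H, W, 2) segmentation output fetched in run.py, and the 0.5 threshold is an assumed starting point to tune:)

    import numpy as np

    def hand_present(hand_scoremap_v, threshold=0.5):
        # Softmax over the two classes (background, hand), then check whether
        # the peak hand probability anywhere in the map exceeds the threshold.
        e = np.exp(hand_scoremap_v - hand_scoremap_v.max(axis=-1, keepdims=True))
        prob = e / e.sum(axis=-1, keepdims=True)
        return float(prob[..., 1].max()) > threshold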

Run.py error

Hi,

I extracted the data in the root folder, and I am getting the following error:

    File "run.py", line 47, in <module>
      keypoints_scoremap_tf, keypoint_coord3d_tf = net.inference(image_tf, hand_side_tf, evaluation)
    File "/home/alex/dev/projects/hand3d-master/net.py", line 37, in inference
      hand_mask = single_obj_scoremap(hand_scoremap)
    File "/home/alex/dev/projects/hand3d-master/utils.py", line 246, in single_obj_scoremap
      max_loc = find_max_location(scoremap_fg)
    File "/home/alex/dev/projects/hand3d-master/utils.py", line 228, in find_max_location
      xy_loc.append(tf.concat(0, [x_loc, y_loc]))
    File "/home/alex/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1062, in concat
      ).assert_is_compatible_with(tensor_shape.scalar())
    File "/home/alex/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 737, in assert_is_compatible_with
      raise ValueError("Shapes %s and %s are incompatible" % (self, other))
    ValueError: Shapes (2, 1) and () are incompatible
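(A likely cause, assuming a TensorFlow >= 1.0 installation: tf.concat changed its argument order from (axis, values) to (values, axis) in TF 1.0, so the call in utils.py passes the tensor list where a scalar axis is expected, which matches the "Shapes (2, 1) and ()" message. A minimal local fix would be:)

    # utils.py, find_max_location: swap the tf.concat arguments for TF >= 1.0
    xy_loc.append(tf.concat([x_loc, y_loc], 0))  # was: tf.concat(0, [x_loc, y_loc])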

Can I use training_lifting.py for my custom dataset?

I used another hand detector and keypoint detector. After that, I want to use your training_lifting.py to lift 2D coordinates to 3D. So, can I use only your PosePrior network for 3D pose estimation? How can I do that?

Also, in your PoseNet network, what kind of output is keypoints_map? For example, is it 21 keypoints for one image, or for all images? Can you show me a keypoints_map result as an example? (A rendering sketch follows below.)
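(If you want to feed your own 2D detections into the lifting stage, here is a minimal sketch, not from the repo, of rendering 21-channel Gaussian scoremaps — one 21-channel map per image, matching PoseNet's per-image output format. make_scoremaps is a hypothetical helper, and the map size and sigma are assumed values:)

    import numpy as np

    def make_scoremaps(keypoints_uv, size=(32, 32), sigma=1.5):
        # keypoints_uv: (21, 2) array of (u, v) coordinates within the crop.
        h, w = size
        ys, xs = np.mgrid[0:h, 0:w]
        maps = np.zeros((h, w, 21), np.float32)
        for k, (u, v) in enumerate(keypoints_uv):
            # One Gaussian blob per keypoint channel, centered at (u, v).
            maps[..., k] = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))
        return maps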

is there a custom layer?

I would like to start porting it to Caffe (my goal is to implement just the forward pass, not training).

I have read your paper roughly, and I wonder whether there are custom layers that are not included in the existing TF or Caffe layer sets.

Also, I wonder whether HandSegNet can be replaced with an SSD hand detector, since as I understand it HandSegNet is only there to detect hands, after which the hand-cropped patches from the original image (not from the feature map of HandSegNet's last layer) are passed to PoseNet.

A question about handsegnet

Hey, I read the HandSegNet part of your code, but afterwards I have a question. In your paper you say:
"Our HandSegNet is a smaller version of the network from Wei et al. [19] trained on our hand pose dataset."
But in your code, I think you only did something like the first half of the method in "Fully Convolutional Networks for Semantic Segmentation" (FCN); the hand masks are not the same as heatmaps, at least in my opinion. Is my understanding right?

Why are there 42 keypoints?

Hi, thanks for sharing your code.
I'm confused that there are 42 keypoints per frame even though the corresponding image in the 'color' directory only shows one hand.
In fact, I'm creating my own data, but how could I get 42 points from an image containing only one hand?

Translate normalized 3D coordinates to raw image

Hi! I have run your demo code, which provides both 2D and normalized 3D coordinates. The former can easily be translated to pixel coordinates and overlaid on the original image. Is there any way to do the same for the 3D coordinates, i.e., translate them to pixel scale so the x, y values can be overlaid on the original image?
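(A minimal sketch, assuming you can supply a camera intrinsic matrix K plus an absolute root position and the metric bone length used for normalization — the network only predicts root-relative, scale-normalized coordinates, so all three are inputs you must provide yourself:)

    import numpy as np

    def project_to_image(coords3d_norm, root_xyz, bone_length, K):
        # coords3d_norm: (21, 3) normalized 3D keypoints; K: (3, 3) intrinsics.
        # Undo the scale/translation normalization, then pinhole-project.
        coords3d = coords3d_norm * bone_length + root_xyz
        uvw = coords3d @ K.T
        return uvw[:, :2] / uvw[:, 2:3]  # (21, 2) pixel coordinates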

A question about training on STB

Hello!
I'm trying to achieve the same results that you describe in your paper for the PoseNet stage when adding the STB dataset. However, the results are far from what you achieved, and I cannot find the reason why. I was hoping you could enlighten me on this step.

After training with the RHD dataset using the pipeline you've published in posenet_training.py, I load BinaryDbReaderSTB with the following parameters:

dataset = BinaryDbReaderSTB(mode='training', batch_size=train_para['BATCH_SIZE'], shuffle=True, coord_uv_noise=True, hand_crop=True, crop_center_noise=True, use_wrist_coord=True)

And proceed to run the session passing the tensors:

_, loss_v = sess.run([train_op, loss])

The BinaryDbReaderSTB class was not modified and I've processed the data using the scripts you provided.

I then proceed to evaluate the training, using:

dataset = BinaryDbReaderSTB(mode='evaluation', shuffle=False, use_wrist_coord=True)

When executing with USE_RETRAINED=False, the metrics are as expected:
Average mean EPE: 18.581 pixels
However, when using my model trained with RHD+STB, the lowest mean EPE I got was ~40 pixels. Could you please point me to what I am forgetting?

I tried several ideas, such as different epoch combinations, tweaking the LR decay, and different configurations of the data loader, but to no effect.

Thank you for your attention

How to get bounding box?

I would like to compute a bounding box and visualize it on the full-size input image. Is there an option to achieve that?
Thank you!
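(Not a built-in option as far as this thread shows, but a hedged sketch: derive a box from the detected 2D keypoints in the crop and map them back through the crop center and scale that run.py already fetches. bbox_from_keypoints is a hypothetical helper, and crop_size=256 assumes the default crop resolution:)

    import numpy as np

    def bbox_from_keypoints(coords_hw_crop, center_v, scale_v, crop_size=256):
        # Map keypoints from crop coordinates back to the original image,
        # then take their extent as the bounding box.
        coords_hw = (coords_hw_crop - crop_size // 2) / scale_v + center_v
        y0, x0 = coords_hw.min(axis=0)
        y1, x1 = coords_hw.max(axis=0)
        return int(x0), int(y0), int(x1), int(y1)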

Joint coordinate is not right.

run.py contains code like this:

hand_scoremap_v, image_crop_v, scale_v, center_v, \
keypoints_scoremap_v, keypoint_coord3d_v = sess.run([hand_scoremap_tf, image_crop_tf, scale_tf, center_tf, keypoints_scoremap_tf, keypoint_coord3d_tf], feed_dict={image_tf: image_v})

The input image is ./data/img.png, but I get keypoint_coord3d_v like this:

[[  1.44893011e-06   2.47310300e-06   8.19431716e-06]
 [  1.90374464e-01  -2.14477921e+00  -1.78279579e-01]
 [ -3.34568620e-02  -1.62475693e+00   4.05245125e-02]
 [ -3.15526843e-01  -1.11640537e+00   3.22329104e-01]
 [ -5.08553386e-01  -3.76516193e-01   5.07125676e-01]
 [ -3.13133001e-01  -1.18032885e+00  -1.30266607e+00]
 [  1.37096226e-01  -1.25639629e+00  -1.34058976e+00]
 [  5.93820870e-01  -1.23831999e+00  -1.13210297e+00]
 [  5.98206878e-01  -9.30948436e-01  -4.23936307e-01]
 [ -4.06365603e-01  -5.10450840e-01  -9.82764661e-01]
 [ -1.59013331e-01  -7.13905573e-01  -1.39288712e+00]
 [  5.09743333e-01  -8.13335598e-01  -1.47191596e+00]
 [  7.26554811e-01  -4.88295704e-01  -6.81376576e-01]
 [ -3.66737843e-01  -9.66296196e-02  -9.01745081e-01]
 [ -2.10868478e-01  -2.65226990e-01  -1.37259626e+00]
 [  3.51663291e-01  -2.92807043e-01  -1.53445566e+00]
 [  6.96920574e-01  -4.46554348e-02  -8.14203858e-01]
 [ -3.76463950e-01   2.41669282e-01  -1.09918964e+00]
 [ -1.30924404e-01   1.84271917e-01  -1.45671225e+00]
 [  2.94114619e-01   2.42203340e-01  -1.51309526e+00]
 [  4.97256935e-01   4.71493840e-01  -8.21017146e-01]]

The distance between the wrist and the thumb tip (keypoint_coord3d_v[0] and keypoint_coord3d_v[1]) is 2.160583, a weird number. What is the unit? Neither millimeters nor meters is correct.
Does it need a conversion?

Thanks
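(For reference: per the paper, the 3D predictions are scale-normalized so that a reference bone has unit length, so the values are unitless multiples of that bone length rather than millimeters or meters; recovering metric coordinates requires a known metric bone length. The distance quoted above is simply:)

    import numpy as np
    d = np.linalg.norm(keypoint_coord3d_v[1] - keypoint_coord3d_v[0])  # ~2.160583, in bone-length units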


Training procedure

Hi @zimmerm
I have read your paper, and in the supplementary material you describe the training procedure for each section.

  • Does this mean you trained each section separately, not end to end?

  • For example, for training the pose prior net, did you use the ground-truth heatmaps as input to the network to get the canonical pose & rotation matrix?

  • For training PoseNet, did you use the ground-truth bounding box around the hand?

  • In other words, you did not train end to end; you trained each module separately (HandSegNet, PoseNet, PosePrior) with ground-truth inputs (not predictions from other modules as input), right?

How to get the weight folder?

Hi!
Thanks for your great work!
I am confused about how you produced the weights folder, which I downloaded directly from the data linked in the README. When I run run.py, it shows "Loaded 102 variables from weights/posenet3d-rhd-stb-slr-finetuned.pickle" and "Loaded 37 variables from weights/handsegnet-rhd.pickle". How did you generate those pickle files? After I finish training, I cannot find any model saved in that pickle format.
Thank you
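(A hypothetical conversion sketch, not part of the repo: training saves standard TF checkpoints, so one way to obtain a {variable_name: array} pickle like the shipped ones is to dump a checkpoint yourself. The checkpoint path below is made up, and the variable-name keys must match what the repo's loader expects:)

    import pickle
    import tensorflow as tf

    with tf.Session() as sess:
        # Restore a trained checkpoint (hypothetical path).
        saver = tf.train.import_meta_graph('snapshots_posenet/model-40000.meta')
        saver.restore(sess, 'snapshots_posenet/model-40000')
        # Collect every variable's value into a name->ndarray dict.
        weights = {v.name: sess.run(v) for v in tf.global_variables()}

    with open('weights/my_posenet.pickle', 'wb') as f:
        pickle.dump(weights, f, protocol=2)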

Is there a trained caffe model?

Thanks for your work.
I have trained it with TensorFlow successfully. Do you have a trained Caffe model?
If you do not plan to convert the model to Caffe, do you have any suggestions on what I should pay attention to when training in Caffe?

Visible meaning in RHD

Hi @zimmerm ,

In the RHD dataset, what does "visible" mean? Does it mean the joint is occluded or not,
or does it mean the existence of the hand (joint) in the image?

pretrained model

Hi @zimmerm,

Thanks for your paper.
There is no pretrained model available; could you please provide one?

How can I use only './weights/posenet3d-rhd-stb-slr-finetuned.pickle'?

When I use only the model './weights/posenet3d-rhd-stb-slr-finetuned.pickle',
I get the error "ValueError: cannot reshape array of size 1049600 into shape (2562,512)".
Hmm, 2562 x 512 is 1311744, while 1049600 = 2050 x 512. What should I do?
It seems to be related to the shape of PosePrior/fc_rel0/weights: 2050 x 512 = 1049600.
I think I should call _inference_pose3d(), since the PosePrior and ViewPointNet parameters are contained in the pickle and were already loaded.

I don't use HandSegNet, as shown below:
def inference(self, image, hand_side, evaluation):

    # detect keypoints in 2D
    keypoints_scoremap = self.inference_pose2d(image) 
    keypoints_scoremap = keypoints_scoremap[-1] 
    # estimate most likely 3D pose
    keypoint_coord3d = self._inference_pose3d(keypoints_scoremap, hand_side, evaluation) 
    # upsample keypoint scoremap 
    s = image.get_shape().as_list()
    keypoints_scoremap = tf.image.resize_images(keypoints_scoremap, (s[1], s[2])) 
    return image, keypoints_scoremap

Can anybody give me any hint? Please help me ;/

Detecting both hands

Hi,
Thanks for the great work!
Can this network detect both hands at a time?
I couldn't find code in run.py that can estimate the pose of both hands simultaneously,
so I manually masked one side of the image when both hands are present.
Would this be the only option for handling two hands, or is there a more convenient way?
Thank you

The split of STB when training on it.

Hello, thank you for your great work.

I have some questions about the STB dataset. The dataset has left & right RGB images. I saw the description in your paper (3000 frames for evaluation & 15000 frames for training), but if I follow that setting, it produces 3000 * 2 images for evaluation & 15000 * 2 for training (because of the left and right views). So I want to make sure: is this (3000 * 2, 15000 * 2) correct?

Hoping for your answer. Best wishes.

Converting to Java (Android)

Hello!

I'm working on an academic project (bachelor's) where I need to estimate a hand's skeleton in 3D!
Your work looks very interesting, and I would like to use it as the backend for estimating hand skeletons.

I see it is built on top of TensorFlow, so how difficult would it be to export to Java, e.g. TensorFlow Lite? I guess it still calls some of the native C++ functions.

Best regards,
Christian!
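(A rough export sketch, assuming TF 1.x: freeze the inference graph into a .pb that the TensorFlow Android/Java API can load. The output node name below is hypothetical; use the actual output tensor names from run.py:)

    import tensorflow as tf
    from tensorflow.python.framework import graph_util

    # ... after building the network and loading the pickled weights in `sess`:
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['PosePrior/xyz_output'])  # hypothetical node name
    with tf.gfile.GFile('hand3d_frozen.pb', 'wb') as f:
        f.write(frozen.SerializeToString())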

results on grey images?

I tried running it on grey (channel-averaged) images and got nonsense results compared to the color versions (my use case has only grey images). Does this mean the model is fine-tuned for Caucasian hue and won't work on grey images, nor on people of color?

About uv coordinates in STB

In create_db.m, I think you should also add "anno_uv_l = anno_uv_l(1:2, :);" in order to get a 2x21 coordinate matrix for each sample, just like anno_uv_r. If the third dimension indicates the visibility of each keypoint, shouldn't anno_uv_r be of the same shape?

ValueError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].

I saved a frozen model, then converted it to a transformed model optimized for inference using TensorFlow TransformGraph. But when I now try to inspect the .pb file created by TransformGraph, I get the following error:

Traceback (most recent call last):
  File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 427, in import_graph_def
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../tf-coreml/utils/inspect_pb.py", line 58, in <module>
    inspect(sys.argv[1], sys.argv[2])
  File "../tf-coreml/utils/inspect_pb.py", line 12, in inspect
    tf.import_graph_def(graph_def)
  File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/jyoti/code/hand3d/venv-latest/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 431, in import_graph_def
    raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 0 for 'import/single_obj_scoremap/Slice' (op: 'Slice') with input shapes: [4], [], [1].

Updated weight files

    File "hand3d-master/nets/ColorHandPose3DNetwork.py", line 52, in __init__
      assert os.path.exists(file_name), "File not found."
    AssertionError: File not found.

Missing weight files: ./weights/handsegnet-rhd.pickle, ./weights/posenet3d-rhd-stb-slr-finetuned.pickle

Discrepancy in index bone indices

According to the STB dataset home page, the indices of the index-finger joints are 13 to 16.

However, in BinaryDbReaderSTB.py, index_root_bone_length is computed from the 11th and 12th joint indices at this line.

Am I missing something?

Video Usage

Can this model be used to analyze a video? Also, is the snapshot linked in the README equal in performance to what I would achieve by training the model myself?

Forward pass?

Hello,
What do the forward-pass weights contain? Were they trained on the whole RHD dataset for HandSegNet and PoseNet? Since it says "minimal example", should I download the dataset and retrain everything?

Can a pre-trained model be provided

Hello,
Thanks for your sharing.
I am working in an environment with only four 1080 Ti GPUs, and I am not sure whether the model can be trained while other tasks are running.
Could lmb-freiburg provide a pre-trained model?
Thanks

Processing Frames(images) in run.py is very slow on GPU (G3.4xLarge EC2 Instance)

Hello Team,

I tried to run run.py on various images, including the ones you provide in the "data" folder, but the execution time per image is high: processing an image and getting the result takes close to 6 seconds per image.

I tried both tensorflow and tensorflow-gpu on a G3 AWS instance (graphics card details below), but had no luck reducing the execution time.

Graphics card details:

00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:1e.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)

(My main goal is to use run.py for live webcam video, so please help me reduce the execution time.)

Please find installed packages:

absl-py==0.7.1
bleach==1.5.0
cycler==0.10.0
html5lib==0.9999999
Markdown==3.0.1
matplotlib==1.5.3
numpy==1.16.2
Pillow==5.4.1
pkg-resources==0.0.0
protobuf==3.7.0
pyparsing==2.3.1
python-dateutil==2.8.0
pytz==2018.9
scipy==0.18.1
six==1.12.0
tensorflow-gpu==1.5.0
tensorflow-tensorboard==1.5.1
Werkzeug==0.15.1

Code where I am checking execution time in run.py:


        t = time.time()
        print("Initial taken : {:.3f}".format(time.time() - t))

        hand_scoremap_v, image_crop_v, scale_v, center_v,\
        keypoints_scoremap_v, keypoint_coord3d_v = sess.run([hand_scoremap_tf, image_crop_tf, scale_tf, center_tf,
                                                             keypoints_scoremap_tf, keypoint_coord3d_tf],
                                                            feed_dict={image_tf: image_v})

        print("time taken by network : {:.3f}".format(time.time() - t))

Result for the images below:

image_list.append('./data/img3.png')
image_list.append('./data/img4.png')
image_list.append('./data/img5.png')
Initial taken : 0.000
time taken by network : 5.537
Initial taken : 0.000
time taken by network : 5.338
Initial taken : 0.000
time taken by network : 5.340
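(A hedged benchmarking sketch: the first sess.run calls often include one-off graph optimization and cuDNN autotuning, so timing them overstates steady-state latency. fetches below stands in for the tensor list already built in run.py:)

    import time

    fetches = [hand_scoremap_tf, image_crop_tf, scale_tf, center_tf,
               keypoints_scoremap_tf, keypoint_coord3d_tf]

    sess.run(fetches, feed_dict={image_tf: image_v})  # warm-up pass, not timed

    t0 = time.time()
    n = 10
    for _ in range(n):
        sess.run(fetches, feed_dict={image_tf: image_v})
    print("steady-state per image: {:.3f}s".format((time.time() - t0) / n))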

Could you provide the weight files generated by pickle dump with protocol 2?

I downloaded the data files weights_HandSegNet.pickle and weights_Pose3D.pickle.
I encountered the error "ValueError: unsupported pickle protocol: 3" when using Python 2.7.
TensorFlow is installed on my Python 2.7, and I know that reinstalling TensorFlow on Python 3.5 would probably solve the issue.
If you could provide the weight files generated by pickle dump with protocol 2, it would help me a lot.
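(A hypothetical workaround, assuming access to any Python 3 interpreter: load the shipped pickles there and re-save them with protocol 2 so they open under Python 2.7:)

    import pickle

    for name in ['weights_HandSegNet.pickle', 'weights_Pose3D.pickle']:
        with open(name, 'rb') as f:
            data = pickle.load(f)
        # Protocol 2 is the highest pickle protocol Python 2.7 can read.
        with open(name.replace('.pickle', '_p2.pickle'), 'wb') as f:
            pickle.dump(data, f, protocol=2)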

Thanks

Problem during training

When running "python training_handsegnet.py",
it reported this error:

"tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./weights/cpm-model-mpii: Not found: ./weights; No such file or directory."

From the paper, some design ideas come from OpenPose. Could you guide me to where the pre-trained model required for training, e.g., cpm-model-mpii, can be downloaded?

Thanks & Regards!
Neo

how fast is it?

Can I get any inference-speed measurements? Would it be possible to run it on a TX2 in real time?

Questions about quantitative comparison

Hello, thank you very much for your outstanding work!

My question is: the GT root node of the RHD dataset in your paper is at the position of the wrist, but the root node of the predicted hand pose is at the center of the palm.
So, how do you ensure the fairness of the quantitative comparison?
