snap-research / arielai_youtube_3d_hands
A dataset for 3D hand reconstruction in the wild.
License: Other
Thanks for sharing the great dataset and related work.
I have some doubts about the 3D mesh format. In load_db.py, the function just plots the projected mesh points using the mesh vertices' x and y, e.g. plt.plot(vertices[:, 0], vertices[:, 1], 'o', color='green', markersize=1) in line 56, and the paper also indicates that
"meshes in the image coordinate system is better than pretraining in the canonical frame and estimating camera parameters."
So, does the annotation JSON store the vertices in (u, v, d) format, where d is a scaled depth, instead of (x, y, z) in the camera coordinate system? If so, does the network output follow the same (u, v, d) format? And could you please share the method you use to normalize the ground-truth meshes during training? (A sketch of the kind of normalization I mean is below.)
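For context, this is the kind of normalization I had in mind (purely my own assumption about a common scheme, not necessarily what the paper does): express (u, v) relative to the hand crop and make the depth root-relative and range-normalized.

```python
import numpy as np

def normalize_uvd(vertices_uvd, bbox, crop_size=224, root_idx=0):
    """vertices_uvd: (N, 3) vertices as (u, v, d); bbox: (x_min, y_min, x_max, y_max) of the hand crop."""
    u, v, d = vertices_uvd[:, 0], vertices_uvd[:, 1], vertices_uvd[:, 2]
    x_min, y_min, x_max, y_max = bbox
    scale = crop_size / max(x_max - x_min, y_max - y_min)
    u = (u - x_min) * scale                                 # pixels in the resized crop
    v = (v - y_min) * scale
    d = (d - d[root_idx]) / (d.max() - d.min() + 1e-8)      # root-relative, range-normalized depth
    return np.stack([u, v, d], axis=1)
```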
Thank you again!
Hi, the link to the dataset request form (https://forms.gle/U385D7b7Qfrig9NR9) is not available for me on Chrome. Please verify it again, thank you!
I'm currently working with your dataset.
In the annotation files, only some of the frames in each video are annotated, and the vertex coordinates are stored as 2.5D coordinates (x, y in image space).
So I'm wondering whether I can get a complete annotation file that addresses both issues (annotations for all frames, and 3D coordinates for the mesh vertices or MANO parameters).
I have followed the guidelines provided on GitHub and submitted the dataset request form as instructed. However, I have not received a response regarding my request. It seems the email address is not valid now. Could you please provide an update on the status of my request or any additional information on the process?
Should I fit the MANO model to the provided mesh vertices and extract the resulting 3D joint locations? Or do you already have this data? Thanks!
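In case it helps, here is one way I imagine extracting joints without a full fit (my own assumption: the annotated meshes follow the standard MANO topology with 778 vertices, so the sparse joint regressor shipped with the official MANO model file can be applied directly; fingertips, if needed, are usually taken from fixed vertex indices):

```python
import pickle
import numpy as np

# MANO_RIGHT.pkl comes from https://mano.is.tue.mpg.de/ (not part of this repo);
# unpickling it requires scipy and chumpy to be installed.
with open("MANO_RIGHT.pkl", "rb") as f:
    mano = pickle.load(f, encoding="latin1")

J_regressor = np.asarray(mano["J_regressor"].todense())  # (16, 778) sparse regressor

def vertices_to_joints(vertices):
    """vertices: (778, 3) mesh vertices -> (16, 3) 3D joint locations."""
    return J_regressor @ vertices
```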
As per the title.
Hi, when I run your program "python download_images.py --vid VIDEO_ID", the error "regex_search: could not find match for (?:v=|/)([0-9A-Za-z_-]{11})" occurs. It seems to be related to pytube, but after I modified it according to the suggestions on GitHub (pytube/pytube#312 (comment)), it still doesn't work, so I cannot get the dataset. On the other hand, "python download_images.py --set train" seems to work. The pytube version I use is 9.6.4. Can you give me some help? Thank you very much!
Hello, I am very interested in your work. However, the dataset request link appears to be invalid. Could you please share the dataset link?
I hope for your reply.
best
Qiu
Hi, I had some questions regarding the iterative fitting performed to create the dataset. After reading the paper, my understanding is that it is split into two separate parts: first the camera parameters and hand orientation are optimized, and then the remaining parameters (pose and shape). I have the following questions:
1. In Section 3 you explain that you optimize the pose, shape, camera translation, and camera scaling. Specifically for the camera parameters, you explain that you initialize them similarly to SMPLify-X. For clarity, does this mean you are estimating the extrinsic parameters (R, t), or just the camera translation (t)? In SMPLify-X the camera translation is initialized under the assumption that the person is standing upright, and similar triangles are used to estimate the depth. Is the equivalent done in your case, but with just the palm joints (the non-MCP joints and the wrist)? (A rough sketch of the initialization I have in mind follows these questions.)
2. Is the camera translation equivalent to the translation of the mesh away from the camera? That is, is T_delta how you translate the mesh away from the camera, and s how you scale to world coordinates? If so, why treat it as a camera translation rather than a mesh translation, and does the difference even matter?
3. How do you initialize the camera's intrinsic parameters? Are the camera center and focal length assumed to be known? How can this give a good estimate for unconstrained in-the-wild images, or are you using a weak-perspective camera model?
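For reference, here is the similar-triangles initialization I have in mind for question 1 (my own understanding of the SMPLify-style scheme, not necessarily what the authors do; the joint selection and focal length are placeholders):

```python
import numpy as np

def init_camera_depth(joints_3d, joints_2d, focal_length=5000.0):
    """Similar-triangles initialization of the camera depth (z translation).

    joints_3d: (K, 3) model joints of the rest-pose hand (e.g. the palm joints)
    joints_2d: (K, 2) corresponding detected keypoints, in pixels
    Assumes the selected joints are roughly fronto-parallel to the camera.
    """
    d3 = np.linalg.norm(joints_3d[:, None] - joints_3d[None, :], axis=-1)  # metric distances
    d2 = np.linalg.norm(joints_2d[:, None] - joints_2d[None, :], axis=-1)  # pixel distances
    valid = d2 > 1e-6
    return focal_length * np.mean(d3[valid] / d2[valid])
```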
Great work on this and I appreciate the help!
Hi,
Thanks for the awesome work. I have another healthcare dataset that has a large domain shift compared to the datasets available in the wild.
I am wondering how to generate the GT meshes for my video to train the network as you did.
Are the steps the following?
1) Detect the 2D keypoints using OpenPose and crop the hand image.
2) Use a GitHub repo like https://hassony2.github.io/obman.html to estimate the shape and pose parameters from the RGB images.
3) Pass these to MANO to get an initial GT mesh, which will not be good.
4) Keep varying the shape and pose parameters manually, with the above as the initial estimates, until we are satisfied? (Or perhaps replace the manual tuning with a small optimization loop; see the sketch after this list.)
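To make step 4 concrete, here is a rough sketch of what I imagine instead of manual tuning (my own assumption, not the authors' exact pipeline; it uses the manopth MANO layer from https://github.com/hassony2/manopth, which is not part of this repo, and the pose/shape from step 2 would replace the zero initialization):

```python
import torch
from manopth.manolayer import ManoLayer

mano = ManoLayer(mano_root="mano/models", use_pca=False)   # needs the MANO model files

pose = torch.zeros(1, 48, requires_grad=True)    # global rotation + per-joint axis-angle
shape = torch.zeros(1, 10, requires_grad=True)   # MANO shape coefficients
scale = torch.ones(1, requires_grad=True)        # weak-perspective scale
trans = torch.zeros(1, 2, requires_grad=True)    # 2D translation in the image

keypoints_2d = torch.zeros(1, 21, 2)             # replace with OpenPose hand keypoints (pixels)
optimizer = torch.optim.Adam([pose, shape, scale, trans], lr=1e-2)

for step in range(500):
    optimizer.zero_grad()
    verts, joints = mano(pose, shape)              # mesh vertices and 21 joints (mm)
    joints_2d = scale * joints[..., :2] + trans    # weak-perspective projection to pixels
    loss = ((joints_2d - keypoints_2d) ** 2).mean()
    loss.backward()
    optimizer.step()
```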
Is this how it should be done, or am I missing something obvious?
I am new to graphics and any help will be greatly appreciated.
Thanks a lot
I think this repo is no longer maintained by the authors, but I hope somebody can help me.
I saw this issue, but I still cannot understand the definition of the mesh vertices in the annotation file.
(x, y) are clearly in image pixel space, but what is the definition of the z coordinate? What unit is it in (mm? cm?)?
It seems the z coordinates are normalized, but how?
Thanks for sharing the great YouTube 3D dataset.
Where do the camera parameters come in when projecting the YouTube 3D vertices from 3D to the pixel plane?
As per the title.
The viz_sample function in load_db.py accepts an optional faces parameter; how do I obtain the faces to pass to the function?
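One possible way to get them (my assumption: the annotated meshes follow the standard MANO topology with 778 vertices, so the triangle list from the official MANO model file can be reused; the model file is not part of this repo):

```python
import pickle
import numpy as np

# MANO_RIGHT.pkl comes from https://mano.is.tue.mpg.de/ and needs scipy/chumpy
# installed to unpickle.
with open("MANO_RIGHT.pkl", "rb") as f:
    mano = pickle.load(f, encoding="latin1")

faces = np.asarray(mano["f"], dtype=np.int64)   # (1538, 3) vertex indices per triangle
# e.g. viz_sample(..., faces=faces)             # hypothetical call matching load_db.py
```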
Nice work. I found that some of the video clips fail to download:
kmtmR5nC0S4
B-0aiNk9bXk
2wtgc5Pl8bA
d8LtOm2cZpk
rzZadl9uy8I
SCJYNApRo08
cheers
yangang
In the JSON file there are two keys, "images" and "annotations". I found that the 'image_id' values in "annotations" are not consistent with "images"; that is, not all entries in "images" are actually referenced, so the JSON file could be processed to drop the unused entries.
Another problem is that not all hands in the given frames have annotations; most frames annotate only one hand. Would it be possible to provide labels for all hands in all frames?
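A quick way to see the mismatch (a small sketch assuming the COCO-style layout described above; the file name is just a placeholder):

```python
import json

with open("youtube_train.json") as f:   # placeholder annotation file name
    db = json.load(f)

image_ids = {img["id"] for img in db["images"]}
annotated = {ann["image_id"] for ann in db["annotations"]}

print("images listed:          ", len(image_ids))
print("images with annotation: ", len(annotated & image_ids))
print("images never referenced:", len(image_ids - annotated))
```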
Hello, great project. I submitted the Dataset Request Form for academic research about a week ago but did not receive any email. I hope for your reply. Email: [email protected]
Hi, thanks for kindly sharing the YouTube 3D dataset.
Regarding the data format you provide, the annotations do not contain hand bounding box information, so I am curious how you cropped the hand region from the original YouTube videos. Did you just use OpenPose to detect keypoints and take the tightest bounding box enclosing all hand joints, or did you use a hand detection algorithm to crop the hand region?
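For what it's worth, the heuristic I have in mind looks like the sketch below (my assumption, not necessarily the authors' method): take the tightest box around the detected 2D keypoints and pad it before cropping.

```python
import numpy as np

def hand_bbox_from_keypoints(keypoints_2d, margin=0.2):
    """keypoints_2d: (21, 2) OpenPose hand keypoints in pixels.

    Returns (x_min, y_min, x_max, y_max) of a padded crop box around the hand.
    """
    x_min, y_min = keypoints_2d.min(axis=0)
    x_max, y_max = keypoints_2d.max(axis=0)
    pad = margin * max(x_max - x_min, y_max - y_min)
    return x_min - pad, y_min - pad, x_max + pad, y_max + pad
```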