
gta-im-dataset's People

Contributors: hangg7, zhec


gta-im-dataset's Issues

Which 8/2 scenes are used for training?

Hi,
In the paper you mention:
"GTA-IM: We train and test our model on our collected dataset as described in Section 4. We split 8 scenes for training and 2 scenes for evaluation. We choose 21 out of 98 human joints provided from the dataset. We convert both the 3D path and the 3D pose into the camera coordinate frame for both training and evaluation."

  • The dataset comes in FPS-5 and FPS-30 versions; which one did you use to produce the results?
  • If it's FPS-5, there are about 12 scenes; did you use k splits to select the 8/2 scenes?

Thanks

Blurry images

Some images are very blurry; I suspect there was some error in data generation.
For example, see the last ~100 images in FPS-5/2020-06-09-16-53-33

joints_2d are not exact

Hi,
I just downloaded the dataset and ran some first inspections. On the very first frame I checked, I noticed that the joints_2d are a bit off. The sequence I am referring to is FPS-30/2020-05-20-21-13-13/

It looks to me like the projection matrix is wrong, so I computed the joints_2d from the 3D positions myself, but got the same result.
Here's a visualization of the keypoints.
[image: keypoint visualization]

Is there a way to find out which frames are annotated correctly and which ones are not? Or even better, can the annotations be fixed?
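For reference, the projection in question is the standard pinhole model: multiply camera-frame 3D joints by the intrinsic matrix and divide by depth. A minimal sketch follows; the intrinsic values are placeholders for illustration, not the dataset's actual calibration.

```python
import numpy as np

def project_joints(joints_3d, K):
    """Project (N, 3) camera-frame joints to (N, 2) pixel coordinates."""
    p = joints_3d @ K.T          # apply intrinsics
    return p[:, :2] / p[:, 2:3]  # perspective divide

# Placeholder intrinsics (focal length and principal point are made up).
K = np.array([[1158.0, 0.0, 960.0],
              [0.0, 1158.0, 540.0],
              [0.0, 0.0, 1.0]])

joints_3d = np.array([[0.0, 0.0, 2.0]])  # a point 2 m straight ahead
print(project_joints(joints_3d, K))      # lands at the principal point
```

If the projected points from this model disagree with the stored joints_2d, the discrepancy is in the annotations or the stored camera parameters, not in the projection step.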

Question about the predicted future motion

Hi,
I see that you predict the future 3D human poses in the current frame's camera coordinates. However, the predicted future 3D motion results shown in your paper are in the game/world coordinates, so transforming between the two requires the future camera positions. In practice, this transformation matrix (the camera position information) is unavailable. How, then, can your method predict future human motion in the 3D scene and produce the visualizations shown in your paper?

Looking forward to your reply.
Dean.
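The coordinate change the question refers to can be sketched as follows: given a pose in a camera's frame and that camera's world-from-camera extrinsics (R, t), mapping into the world/game frame is a rigid transform. The R and t values below are illustrative placeholders, not values from the dataset.

```python
import numpy as np

def cam_to_world(joints_cam, R, t):
    """Map (N, 3) camera-frame joints into the world frame.

    R: (3, 3) world-from-camera rotation; t: (3,) camera position in world.
    """
    return joints_cam @ R.T + t

R = np.eye(3)                  # identity rotation, for the sketch
t = np.array([1.0, 0.0, 0.0])  # camera placed 1 m along world x
print(cam_to_world(np.zeros((1, 3)), R, t))  # camera origin -> [1, 0, 0]
```

The issue's point is exactly that future (R, t) for each predicted frame is what a real deployment would not have.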

cut off images

There seem to be some samples with broken image content. The following is found in FPS-5/2020-06-21-19-42-55
[image: sample with cut-off content]

What is the torso point?

Hi,
In the paper you mention: "Next, PathNet learns to plan a 3D path towards each goal – the 3D location sequence of the human center (torso)". I'm trying to figure out which joint in your dataset is used as the torso point.


LIMBS = [
    (0, 1),  # head_center -> neck
    (1, 2),  # neck -> right_clavicle
    (2, 3),  # right_clavicle -> right_shoulder
    (3, 4),  # right_shoulder -> right_elbow
    (4, 5),  # right_elbow -> right_wrist
    (1, 6),  # neck -> left_clavicle
    (6, 7),  # left_clavicle -> left_shoulder
    (7, 8),  # left_shoulder -> left_elbow
    (8, 9),  # left_elbow -> left_wrist
    (1, 10),  # neck -> spine0
    (10, 11),  # spine0 -> spine1
    (11, 12),  # spine1 -> spine2
    (12, 13),  # spine2 -> spine3
    (13, 14),  # spine3 -> spine4
    (14, 15),  # spine4 -> right_hip
    (15, 16),  # right_hip -> right_knee
    (16, 17),  # right_knee -> right_ankle
    (14, 18),  # spine4 -> left_hip
    (18, 19),  # left_hip -> left_knee
    (19, 20)  # left_knee -> left_ankle
]
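The 21-joint indexing above can be sanity-checked. This quick script (mine, not from the repo) confirms the limb list links all 21 joints with exactly 20 edges, i.e. a single connected skeleton with no joint left out:

```python
# The limb list from the repo's visualization code (21-joint subset).
LIMBS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (6, 7), (7, 8),
         (8, 9), (1, 10), (10, 11), (11, 12), (12, 13), (13, 14), (14, 15),
         (15, 16), (16, 17), (14, 18), (18, 19), (19, 20)]

joints = {j for limb in LIMBS for j in limb}
assert joints == set(range(21))  # every joint index 0..20 appears
assert len(LIMBS) == 20          # a tree on 21 nodes has 20 edges
print("LIMBS forms a valid 21-joint skeleton")
```

Which of these joints serves as the "torso" center is exactly what the issue asks; the spine indices (10-14) are the plausible candidates, but that is a guess, not an answer from the authors.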

The official code of the introduced method

Hi Zhe,
Recently, I need to use the method introduced in your paper 'Long-term Human Motion Prediction with Scene Context' as one of the baseline models and compare its performance with ours. Since I could not find its official code, could you share it with us?

Thank you very much!

Could you share the data generation code

Could you share the data generation code and the procedure to automatically produce the raw captures and the "info_frames.pickle" files? I am very interested in this and have been struggling with the procedure for a long time. Thanks!

The Training Code

Will you release your training code? How long does your model take to train? Could I reproduce your experiments on a single 1080 Ti GPU?

The background of these videos

Hi
Nice Work!
I want to know if you can provide background pictures for each video (without the person).
Thanks!
Jingbo

Do you also provide camera pose for each frame?

Hi,
I'm working on a view-synthesis related problem, and I am hopeful your dataset might help. Can you please confirm whether you also provide the pose (rotation and translation) of the camera for each frame?

Basically, I'm trying to warp a frame to the view of its next frame using camera pose, camera intrinsic matrix and depth map.

Also, it would be good if you could share a sample so that we can try it first; if it proves useful, we will write you a mail and purchase GTA V.
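The warp described above (pixel plus depth in one frame, mapped into the next view) can be sketched like this, under the assumption that per-frame intrinsics K and world-from-camera poses (R, t) are available. All numeric values here are placeholders.

```python
import numpy as np

def warp_pixel(u, v, depth, K, R1, t1, R2, t2):
    """Map pixel (u, v) with metric depth in frame 1 to pixel coords in frame 2.

    (R1, t1) and (R2, t2) are world-from-camera poses of the two frames.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-project the pixel
    p_cam1 = ray * depth                            # 3D point in camera 1
    p_world = R1 @ p_cam1 + t1                      # lift into the world frame
    p_cam2 = R2.T @ (p_world - t2)                  # drop into camera 2
    uvw = K @ p_cam2                                # re-project
    return uvw[:2] / uvw[2]

K = np.array([[1000.0, 0.0, 960.0],   # placeholder intrinsics
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
I, z = np.eye(3), np.zeros(3)
# Sanity check: identical cameras map a pixel to itself.
print(warp_pixel(100.0, 200.0, 3.0, K, I, z, I, z))
```

This is a per-pixel sketch; a real warp would vectorize it over the whole depth map and handle occlusions.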

Obtaining depth maps from raw data

Hey there,

I currently have some raw depth maps (.raw files) from the GTA game engine and I'm curious, how did you get the depth images that you have in samples?

I was able to convert them to (0, 255) color images with the following function, but still can't get the depth in meters like you have.

import numpy as np

def convert_depth(img_path):
    """Normalize a raw uint32 GTA depth buffer to a (0, 255) grayscale image."""
    abs_min = 1008334389  # empirically found clipping bounds
    # sky = 964405378
    abs_max = 1067424357
    x = np.fromfile(img_path, dtype='uint32')[4:]  # skip the 16-byte header
    x = np.clip(x, abs_min, abs_max)
    x = (x - abs_min) / (abs_max - abs_min)
    x = np.uint8(x * 255)
    return x.reshape(1080, 1920)

Thanks!
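One guess, not confirmed by the repo authors: the 32-bit values in the .raw buffer may actually be float32 depth written by the game's rendering pipeline, so reinterpreting the bits (rather than treating them as integers) could be the missing step before any conversion to meters. A minimal sketch of the reinterpretation:

```python
import numpy as np

def reinterpret_depth(raw_u32):
    """View uint32 pixel data as the float32 bits it may really encode."""
    return raw_u32.view(np.float32)

raw = np.array([0x3F000000], dtype=np.uint32)  # bit pattern of float32 0.5
print(reinterpret_depth(raw))                  # -> [0.5]
```

If the values are indeed normalized float depth, recovering metric depth would additionally require linearizing with the camera's near/far clip planes, which this sketch does not attempt.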
