
segmentation-driven-pose's Introduction

Overview

Please find the latest version at WDR-Pose.

This repository contains the code for the paper Segmentation-driven 6D Object Pose Estimation, Yinlin Hu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann, CVPR 2019. [Paper]

The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions.

In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. We then use a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences, from which a reliable pose estimate can be obtained. We outperform the state-of-the-art on the challenging Occluded-LINEMOD and YCB-Video datasets, which is evidence that our approach deals well with multiple poorly-textured objects occluding each other. Furthermore, it relies on a simple enough architecture to achieve real-time performance.

Figure 1: Overall workflow of our method. Our architecture has two streams: one for object segmentation and the other for regressing 2D keypoint locations. These two streams share a common encoder, but the decoders are separate. Each one produces a tensor whose spatial resolution defines an SxS grid over the image. The segmentation stream predicts the label of the object observed at each grid location. The regression stream predicts the 2D keypoint locations for that object.
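A minimal PyTorch sketch of this two-stream layout (the layer sizes and heads below are illustrative assumptions, not the paper's exact architecture):

import torch.nn as nn

class TwoStreamNet(nn.Module):
    """Shared encoder with separate segmentation and regression decoders,
    as in Figure 1. All channel sizes here are illustrative only."""
    def __init__(self, num_classes, num_keypoints):
        super().__init__()
        self.encoder = nn.Sequential(  # downsamples the image to the SxS grid
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation stream: one object label (plus background) per grid cell.
        self.seg_head = nn.Conv2d(128, num_classes + 1, 1)
        # Regression stream: per cell, a 2D offset and a confidence per keypoint.
        self.reg_head = nn.Conv2d(128, num_keypoints * 3, 1)

    def forward(self, x):
        features = self.encoder(x)  # shared representation
        return self.seg_head(features), self.reg_head(features)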

Figure 2: Occluded-LINEMOD results. In each column, we show, from top to bottom: the foreground segmentation mask, all 2D reprojection candidates, the selected 2D reprojections, and the final pose results. Our method generates accurate pose estimates, even in the presence of large occlusions. Furthermore, it can process multiple objects in real time.
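The last step shown above, going from the selected 2D reprojections to the final pose, is standard RANSAC-based PnP (the repository calls cv2.solvePnPRansac in utils.py). A minimal sketch:

import cv2
import numpy as np

def pose_from_correspondences(points_3d, points_2d, K):
    """Recover the rotation R and translation t from fused 3D-to-2D
    correspondences with RANSAC-based PnP.

    points_3d: (N, 3) keypoints in the object frame.
    points_2d: (N, 2) their selected 2D predictions in the image.
    K:         (3, 3) camera intrinsic matrix.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64), points_2d.astype(np.float64),
        K.astype(np.float64), distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec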

How to Use

Step 1

Download the datasets.

Occluded-LINEMOD: https://hci.iwr.uni-heidelberg.de/vislearn/iccv2015-occlusion-challenge/

YCB-Video: https://rse-lab.cs.washington.edu/projects/posecnn/

Step 2

Download the pretrained model.

Occluded-LINEMOD: https://1drv.ms/u/s!ApOY_gOHw8hLbbdmVZgnqk30I5A

YCB-Video: https://1drv.ms/u/s!ApOY_gOHw8hLbLl4i8CAXD6LGuU

Download the models and put them into the ./model directory.

Due to commercial restrictions, we can only provide the inference code. However, it is straightforward to implement the training part from the paper and this repository.

Step 3

Prepare the input file list using gen_filelist.py.
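If you want a starting point, a minimal sketch of building such a list follows (the directory layout is a hypothetical example; check gen_filelist.py for the exact format it produces):

import glob
import os

def write_filelist(image_dir, out_path="filelist.txt", pattern="*.png"):
    """Write one image path per line, a common file-list format.
    The directory layout is a placeholder; adapt it to your dataset."""
    paths = sorted(glob.glob(os.path.join(image_dir, pattern)))
    with open(out_path, "w") as f:
        f.write("\n".join(paths) + "\n")

write_filelist("data/Occluded-LINEMOD/images")  # hypothetical path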

Step 4

Run test.py and explore the results.

Citing

@inproceedings{hu2019segpose,
  title={Segmentation-driven 6D Object Pose Estimation},
  author={Yinlin Hu and Joachim Hugonot and Pascal Fua and Mathieu Salzmann},
  booktitle={CVPR},
  year={2019}
}

segmentation-driven-pose's People

Contributors

yinlinhu


segmentation-driven-pose's Issues

Training data in YCB-video image

Hi,

How many training samples were generated from the 80K synthetic images in the YCB dataset? Were all 80K synthetic images used, or only a part of them?

./model directory

Download and put them into ./model directory.

Can you explain in detail please?

Coordinate system convention of the ground truth poses

Hi,
One more question related to the original LineMOD. The ground-truth poses in the Occluded-LineMOD dataset follow the OpenGL coordinate-system convention (the camera viewing direction is the negative Z-axis), as mentioned in section 2.2 of this document. But I am not sure about the coordinate-system convention used in the original LineMOD dataset. For example, one of the transformation files looks like
-13.0792
-7.83575
104.177
The values are in centimetres.
so that's ~1 meter along the positive z-axis.
This makes me think that the poses are annotated in the OpenCV coordinate-system convention. But my rendering pipeline does not work correctly under this assumption. Any idea how the ground-truth poses in the original LineMOD dataset are defined?
Also, what is this file in your repo for?

Sorry that this issue is not about the segmentation-driven-pose repo but rather a general question.
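As an aside, converting a pose between the two conventions amounts to a 180-degree rotation of the camera frame about its X axis, i.e. negating the Y and Z rows. A minimal sketch, assuming R is a 3x3 rotation and t a 3-vector:

import numpy as np

# Negating the camera Y and Z axes maps between the OpenGL convention
# (camera looks down -Z) and the OpenCV convention (camera looks down +Z).
FLIP_YZ = np.diag([1.0, -1.0, -1.0])

def flip_convention(R, t):
    """Convert an object-to-camera pose between OpenGL and OpenCV conventions."""
    return FLIP_YZ @ R, FLIP_YZ @ t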

No Fusion

Thanks for sharing the code.
I'm confused by the terms No Fusion and Oracle. Could you explain them?

Suppose we are finding the 8 corners of a cup.

  1. Is it right that No Fusion means predicting the 8 corners based only on the center grid cell of the object segmentation mask, without using the confidence score and confidence loss,
    whereas Fusion means predicting the 8 corners based on the object segmentation mask (the center grid cell plus the other grid cells in the mask), using the highest confidence scores and the confidence loss?

  2. "In paper Oracle results obtained by selecting the best predicted 2D location for each 3D keypoints using the ground truth 2D reprojections.'
    Could you explained detailed about how to selecting the best predicted 2D location?? Is that means selecting the closest location between predicted 2D keypoints and ground truth 2d projections??

  3. Is it right that the oracle 3D keypoints are the cup's 8 ground-truth corner points (its 3D bounding box)?
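For what it's worth, here is how the oracle in point 2 reads to me: for each keypoint, keep the candidate closest to the ground-truth reprojection. A sketch with hypothetical shapes, not the authors' code:

import numpy as np

def oracle_select(pred_kpts, gt_kpts):
    """For each 3D keypoint, pick the predicted 2D location closest to its
    ground-truth 2D reprojection.

    pred_kpts: (K, N, 2) array, N candidate 2D predictions per keypoint.
    gt_kpts:   (K, 2) array of ground-truth 2D reprojections.
    Returns:   (K, 2) array of selected predictions.
    """
    dists = np.linalg.norm(pred_kpts - gt_kpts[:, None, :], axis=-1)  # (K, N)
    best = dists.argmin(axis=1)
    return pred_kpts[np.arange(len(gt_kpts)), best]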

gen_filelist.py.

Prepare the input file list using gen_filelist.py.

How do I prepare this file? Can you explain it in detail, please?

How is data/YCB-Video/YCB_vertex.npy generated

Hi,
We are trying to render the point-cloud model into the 2D picture. The predicted 6D pose worked fine with the .ply generated from YCB_vertex.npy, but the result was strange with the .ply file from the official website of the YCB dataset. Could you explain how you generated YCB_vertex.npy? We have attached a screenshot of the two .ply files opened in MeshLab; it can be seen that they are different. Thank you!
[Screenshot: the two .ply files side by side in MeshLab]

SolvePnPRansac - Assertion failed on matrix.cpp line 2355

Hello,

Would you kindly share the versions of OpenCV and PyTorch used to run 'python test.py'? I am getting an assertion failure at matrix.cpp line 2355 after the solvePnPRansac call at utils.py:204.

Thank you very much for your attention
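Not an answer to the version question, but a frequent cause of that assertion is mismatched shapes or dtypes in the inputs to solvePnPRansac. A sanity-check wrapper (a sketch, not the repository's code):

import cv2
import numpy as np

def checked_solve_pnp_ransac(object_points, image_points, K, dist_coeffs=None):
    """Normalize shapes and dtypes before cv2.solvePnPRansac; mismatches here
    are a common trigger for the matrix.cpp assertion."""
    obj = np.ascontiguousarray(object_points, dtype=np.float64).reshape(-1, 1, 3)
    img = np.ascontiguousarray(image_points, dtype=np.float64).reshape(-1, 1, 2)
    assert len(obj) == len(img) >= 4, "need at least 4 matched 3D-2D points"
    return cv2.solvePnPRansac(obj, img, np.asarray(K, dtype=np.float64), dist_coeffs)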

About hyperparameters

Hello,
I am trying to implement this work according to the paper, but I don't know the values of some hyperparameters.
Can you provide the value of 'tau' used in the confidence loss and the values of 'beta' and 'gamma' used in the regression loss?
Thank you very much :)

LINEMOD_bbox.npy

Hello, I want to know how to get the corner coordinates of the 3D bounding box, and why its shape is (8, 8, 3). Thanks
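For reference, the 8 corners are just the corners of the model's axis-aligned 3D bounding box; the leading 8 in (8, 8, 3) plausibly indexes the 8 Occluded-LINEMOD objects (my assumption, since the repo does not document it). A minimal sketch for one object:

import numpy as np

def bbox_corners(vertices):
    """Return the 8 corners of the axis-aligned 3D bounding box of a model.

    vertices: (N, 3) array of model vertices.
    Returns:  (8, 3) array of corner coordinates.
    """
    mn, mx = vertices.min(axis=0), vertices.max(axis=0)
    return np.array([[x, y, z] for x in (mn[0], mx[0])
                               for y in (mn[1], mx[1])
                               for z in (mn[2], mx[2])])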

about generate mask

Hi, sorry, I have a problem with mask generation. I follow this procedure to generate a mask:

  1. Load the object vertices from LINEMOD_vertex.npy (which you provide).
  2. Correct the vertices using Transform_RT_to_OccLINEMOD_meshes.npy (i.e. R*vertices + t).
  3. Project those 3D vertices onto the 2D image plane using the pose Rt and linemod_k (see the sketch at the end of this issue).
  4. Extract the projected area as the mask.

I do the above, but the resulting mask looks strange:
[Screenshot: the generated mask]

I want the cat's mask.
Can you help me figure out what the problem is?
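For comparison, step 3 of the procedure above is typically implemented as below (a sketch assuming R and t map model points into the camera frame and K is the intrinsic matrix, e.g. linemod_k):

import numpy as np

def project_vertices(vertices, R, t, K):
    """Project 3D model vertices into the image plane (pinhole model).

    vertices: (N, 3) model vertices.
    R, t:     3x3 rotation and 3-vector translation of the object pose.
    K:        3x3 camera intrinsic matrix.
    Returns:  (N, 2) pixel coordinates.
    """
    cam = vertices @ R.T + t        # transform into the camera frame
    uv = cam @ K.T                  # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]   # perspective divide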

About Occluded-LINEMOD training data

Hi,
Do you generate synthetic data by rendering the provided models, or do you only use the central objects in the LINEMOD scenes?

For each epoch, how many training images do you use? The 20k samples mentioned in the paper?

Thanks.

how to generate YCB_bbox.npy

Hi Dr. Hu,

We are very interested in your nice work on segmentation-driven-pose. Could we ask you a question?

From the issues in this repo, we saw that someone had asked this question, but we could not understand your answer. #11
How can we generate these two files:
segmentation-driven-pose/data/YCB-Video/YCB_bbox.npy and YCB_vertex.npy? Thanks a lot.
