
segmentation-driven-pose's Introduction

Overview

Please find the latest version at WDR-Pose.

This repository contains the code for the paper Segmentation-driven 6D Object Pose Estimation, Yinlin Hu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann, CVPR 2019. [Paper]

The most recent trend in estimating the 6D pose of rigid objects has been to train deep networks to either directly regress the pose from the image or to predict the 2D locations of 3D keypoints, from which the pose can be obtained using a PnP algorithm. In both cases, the object is treated as a global entity, and a single pose estimate is computed. As a consequence, the resulting techniques can be vulnerable to large occlusions.

In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. We then use a predicted measure of confidence to combine these pose candidates into a robust set of 3D-to-2D correspondences, from which a reliable pose estimate can be obtained. We outperform the state-of-the-art on the challenging Occluded-LINEMOD and YCB-Video datasets, which is evidence that our approach deals well with multiple poorly-textured objects occluding each other. Furthermore, it relies on a simple enough architecture to achieve real-time performance.

Figure 1: Overall workflow of our method. Our architecture has two streams: one for object segmentation and the other for regressing 2D keypoint locations. These two streams share a common encoder, but the decoders are separate. Each one produces a tensor whose spatial resolution defines an SxS grid over the image. The segmentation stream predicts the label of the object observed at each grid location. The regression stream predicts the 2D keypoint locations for that object.
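A minimal PyTorch sketch of this two-stream layout (the layer sizes and heads below are illustrative assumptions, not the paper's exact architecture):

import torch.nn as nn

class TwoStreamNet(nn.Module):
    """Shared encoder with separate segmentation and regression decoders,
    as in Figure 1. All channel sizes here are illustrative only."""
    def __init__(self, num_classes, num_keypoints):
        super().__init__()
        self.encoder = nn.Sequential(  # downsamples the image to the SxS grid
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation stream: one object label (plus background) per grid cell.
        self.seg_head = nn.Conv2d(128, num_classes + 1, 1)
        # Regression stream: per cell, a 2D offset and a confidence per keypoint.
        self.reg_head = nn.Conv2d(128, num_keypoints * 3, 1)

    def forward(self, x):
        features = self.encoder(x)  # shared representation
        return self.seg_head(features), self.reg_head(features)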

Figure 2: Occluded-LINEMOD results. In each column, we show, from top to bottom: the foreground segmentation mask, all 2D reprojection candidates, the selected 2D reprojections, and the final pose results. Our method generates accurate pose estimates, even in the presence of large occlusions. Furthermore, it can process multiple objects in real time.
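The last step shown above, going from the selected 2D reprojections to the final pose, is standard RANSAC-based PnP (the repository calls cv2.solvePnPRansac in utils.py). A minimal sketch:

import cv2
import numpy as np

def pose_from_correspondences(points_3d, points_2d, K):
    """Recover the rotation R and translation t from fused 3D-to-2D
    correspondences with RANSAC-based PnP.

    points_3d: (N, 3) keypoints in the object frame.
    points_2d: (N, 2) their selected 2D predictions in the image.
    K:         (3, 3) camera intrinsic matrix.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64), points_2d.astype(np.float64),
        K.astype(np.float64), distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec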

How to Use

Step 1

Download the datasets.

Occluded-LINEMOD: https://hci.iwr.uni-heidelberg.de/vislearn/iccv2015-occlusion-challenge/

YCB-Video: https://rse-lab.cs.washington.edu/projects/posecnn/

Step 2

Download the pretrained model.

Occluded-LINEMOD: https://1drv.ms/u/s!ApOY_gOHw8hLbbdmVZgnqk30I5A

YCB-Video: https://1drv.ms/u/s!ApOY_gOHw8hLbLl4i8CAXD6LGuU

Download the models and put them into the ./model directory.

Due to commercial restrictions, we can only provide the inference code. However, it is straightforward to implement the training part from the paper and this repository.

Step 3

Prepare the input file list using gen_filelist.py.
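If you want a starting point, a minimal sketch of building such a list follows (the directory layout is a hypothetical example; check gen_filelist.py for the exact format it produces):

import glob
import os

def write_filelist(image_dir, out_path="filelist.txt", pattern="*.png"):
    """Write one image path per line, a common file-list format.
    The directory layout is a placeholder; adapt it to your dataset."""
    paths = sorted(glob.glob(os.path.join(image_dir, pattern)))
    with open(out_path, "w") as f:
        f.write("\n".join(paths) + "\n")

write_filelist("data/Occluded-LINEMOD/images")  # hypothetical path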

Step 4

Run test.py and explore the results.

Citing

@inproceedings{hu2019segpose,
  title={Segmentation-driven 6D Object Pose Estimation},
  author={Yinlin Hu and Joachim Hugonot and Pascal Fua and Mathieu Salzmann},
  booktitle={CVPR},
  year={2019}
}

segmentation-driven-pose's People

Contributors

yinlinhu


segmentation-driven-pose's Issues

Training data in YCB-video image

Hi,

How many training samples were generated from the 80K synthetic images in the YCB dataset? Were all 80K synthetic images used, or only a part of them?

./model directory

Download and put them into ./model directory.

Can you explain in detail please?

Coordinate system convention of the ground truth poses

Hi,
One more question related to the original LineMOD. The ground-truth poses in the Occluded-LineMOD dataset follow the OpenGL coordinate-system convention (the camera viewing direction is the negative Z-axis), as mentioned in section 2.2 of this document. But I am not sure about the coordinate-system convention used in the original LineMOD dataset. For example, one of the transformation files looks like
-13.0792
-7.83575
104.177
The values are in centimetres.
so that's ~1 meter along the positive z-axis.
This makes me think that the poses are annotated in the OpenCV coordinate-system convention. But my rendering pipeline does not work correctly under this assumption. Any idea how the ground-truth poses in the original LineMOD dataset are defined?
Also, what is this file in your repo for?

Sorry that this issue is not about the segmentation-driven-pose repo but rather a general question.
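As an aside, converting a pose between the two conventions amounts to a 180-degree rotation of the camera frame about its X axis, i.e. negating the Y and Z rows. A minimal sketch, assuming R is a 3x3 rotation and t a 3-vector:

import numpy as np

# Negating the camera Y and Z axes maps between the OpenGL convention
# (camera looks down -Z) and the OpenCV convention (camera looks down +Z).
FLIP_YZ = np.diag([1.0, -1.0, -1.0])

def flip_convention(R, t):
    """Convert an object-to-camera pose between OpenGL and OpenCV conventions."""
    return FLIP_YZ @ R, FLIP_YZ @ t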

No Fusion

Thanks for sharing the code.
I'm confused by the terms No Fusion and Oracle. Could you explain them?

Suppose we are finding the 8 corners of a cup.

  1. Is it right that No Fusion means predicting the 8 corners based only on the center grid cell of the object segmentation mask, without using the confidence score and confidence loss,
    whereas Fusion means predicting the 8 corners based on the object segmentation mask (the center grid cell plus the other grid cells in the mask), using the highest confidence scores and the confidence loss?

  2. "In paper Oracle results obtained by selecting the best predicted 2D location for each 3D keypoints using the ground truth 2D reprojections.'
    Could you explained detailed about how to selecting the best predicted 2D location?? Is that means selecting the closest location between predicted 2D keypoints and ground truth 2d projections??

  3. Is it right that the oracle 3D keypoints are the cup's 8 ground-truth corner points (its 3D bounding box)?
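For what it's worth, here is how the oracle in point 2 reads to me: for each keypoint, keep the candidate closest to the ground-truth reprojection. A sketch with hypothetical shapes, not the authors' code:

import numpy as np

def oracle_select(pred_kpts, gt_kpts):
    """For each 3D keypoint, pick the predicted 2D location closest to its
    ground-truth 2D reprojection.

    pred_kpts: (K, N, 2) array, N candidate 2D predictions per keypoint.
    gt_kpts:   (K, 2) array of ground-truth 2D reprojections.
    Returns:   (K, 2) array of selected predictions.
    """
    dists = np.linalg.norm(pred_kpts - gt_kpts[:, None, :], axis=-1)  # (K, N)
    best = dists.argmin(axis=1)
    return pred_kpts[np.arange(len(gt_kpts)), best]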

gen_filelist.py.

Prepare the input file list using gen_filelist.py.

How do I prepare this file? Can you explain it in detail, please?

How is data/YCB-Video/YCB_vertex.npy generated

Hi,
We are trying to render the point-cloud model into the 2D picture. The predicted 6D pose worked fine with the .ply generated from YCB_vertex.npy, but the result was strange with the .ply file from the official website of the YCB dataset. Could you explain how you generated YCB_vertex.npy? We have attached a screenshot of the two .ply files opened in MeshLab; it can be seen that they are different. Thank you!
[Screenshot: the two .ply files side by side in MeshLab]

SolvePnPRansac - Assertion failed on matrix.cpp line 2355

Hello,

Would you kindly share the versions of OpenCV and PyTorch used to run 'python test.py'? I am getting an assertion failure at matrix.cpp line 2355 after the solvePnPRansac call at utils.py:204.

Thank you very much for your attention
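Not an answer to the version question, but a frequent cause of that assertion is mismatched shapes or dtypes in the inputs to solvePnPRansac. A sanity-check wrapper (a sketch, not the repository's code):

import cv2
import numpy as np

def checked_solve_pnp_ransac(object_points, image_points, K, dist_coeffs=None):
    """Normalize shapes and dtypes before cv2.solvePnPRansac; mismatches here
    are a common trigger for the matrix.cpp assertion."""
    obj = np.ascontiguousarray(object_points, dtype=np.float64).reshape(-1, 1, 3)
    img = np.ascontiguousarray(image_points, dtype=np.float64).reshape(-1, 1, 2)
    assert len(obj) == len(img) >= 4, "need at least 4 matched 3D-2D points"
    return cv2.solvePnPRansac(obj, img, np.asarray(K, dtype=np.float64), dist_coeffs)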

About hyperparameters

Hello,
I am trying to implement this work according to the paper, but I don't know the values of some hyperparameters.
Can you provide the value of 'tau' used in the confidence loss and the values of 'beta' and 'gamma' used in the regression loss?
Thank you very much :)

LINEMOD_bbox.npy

Hello, I want to know how to get the corner coordinates of the 3D bounding box, and why its shape is (8, 8, 3). Thanks
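For reference, the 8 corners are just the corners of the model's axis-aligned 3D bounding box; the leading 8 in (8, 8, 3) plausibly indexes the 8 Occluded-LINEMOD objects (my assumption, since the repo does not document it). A minimal sketch for one object:

import numpy as np

def bbox_corners(vertices):
    """Return the 8 corners of the axis-aligned 3D bounding box of a model.

    vertices: (N, 3) array of model vertices.
    Returns:  (8, 3) array of corner coordinates.
    """
    mn, mx = vertices.min(axis=0), vertices.max(axis=0)
    return np.array([[x, y, z] for x in (mn[0], mx[0])
                               for y in (mn[1], mx[1])
                               for z in (mn[2], mx[2])])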

about generate mask

Hi, sorry, I have a problem with mask generation. I follow this procedure to generate a mask:

  1. Load the object vertices from LINEMOD_vertex.npy (which you provide).
  2. Correct the vertices using Transform_RT_to_OccLINEMOD_meshes.npy (i.e. R*vertices + t).
  3. Project those 3D vertices onto the 2D image plane using the pose Rt and linemod_k (see the sketch at the end of this issue).
  4. Extract the projected area as the mask.

I do the above, but the resulting mask looks strange:
[Screenshot: the generated mask]

I want the cat's mask.
Can you help me figure out what the problem is?
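For comparison, step 3 of the procedure above is typically implemented as below (a sketch assuming R and t map model points into the camera frame and K is the intrinsic matrix, e.g. linemod_k):

import numpy as np

def project_vertices(vertices, R, t, K):
    """Project 3D model vertices into the image plane (pinhole model).

    vertices: (N, 3) model vertices.
    R, t:     3x3 rotation and 3-vector translation of the object pose.
    K:        3x3 camera intrinsic matrix.
    Returns:  (N, 2) pixel coordinates.
    """
    cam = vertices @ R.T + t        # transform into the camera frame
    uv = cam @ K.T                  # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]   # perspective divide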

About Occluded-LINEMOD training data

Hi,
Do you generate synthetic data by rendering the provided models, or do you only use the central objects in the LINEMOD scenes?

For each epoch, how many training images do you use? The 20k samples mentioned in the paper?

Thanks.

how to generate YCB_bbox.npy

Hi Dr. Hu,

We are very interested in your nice work on segmentation-driven-pose. Could we ask you a question?

From the issues in this repo, we saw that someone had asked this question, but we could not understand your answer. #11
How can we generate these two files:
segmentation-driven-pose/data/YCB-Video/YCB_bbox.npy and YCB_vertex.npy? Thanks a lot.
