liepose-diffusion's Introduction

Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)

CVPR 2024

Tsu-Ching Hsiao  Hao-Wei Chen  Hsuan-Kung Yang  Chun-Yi Lee 
Elsa Lab, National Tsing Hua University

| arXiv |

Official re-implementation in JAX.

Abstract

Addressing pose ambiguity in 6D object pose estimation from single RGB images presents a significant challenge, particularly due to object symmetries or occlusions. In response, we introduce a novel score-based diffusion method applied to the SE(3) group, marking the first application of diffusion models to SE(3) within the image domain, specifically tailored for pose estimation tasks. Extensive evaluations demonstrate the method's efficacy in handling pose ambiguity, mitigating perspective-induced ambiguity, and showcasing the robustness of our surrogate Stein score formulation on SE(3). This formulation not only improves the convergence of the denoising process but also enhances computational efficiency. Thus, we pioneer a promising strategy for 6D object pose estimation.

Updates

  • 2024/05/14: Code released.

Installation

Requirements

Ensure your system meets the following requirements:

  • Linux (only tested on Ubuntu 20.04)
  • nvidia-docker
  • CUDA 12.2 or higher

Setup

  1. Clone this repo:
git clone git@github.com:Ending2015a/liepose-diffusion.git
  2. Download the datasets. This downloads the T-LESS and VOC2012 datasets.
cd liepose-diffusion
make download

NOTE: If the datasets do not download correctly, download them manually from the links in the Datasets section.

  3. Build the docker image and start the container:
make build
make run
# inside the docker
cd /workspace
  4. Now you are ready to run the experiments. See Experiments.

Experiments

SYMSOL

SO(3)

python3 -m liepose.exp.symsol.run

The result is located at logs/experiments/symsol-score-flat/.../inference_400000/summary.json.

SYMSOL-T

SE(3)

python3 -m liepose.exp.symsolt.run "lie_type=[se3]"

R3SO(3)

python3 -m liepose.exp.symsolt.run "lie_type=[r3so3]"

The result is located at logs/experiments/symsolt-score-flat/.../inference_800000/summary.json. The ... depends on the given parameters, e.g. lie_type=se3+repr_type=tan+....

T-LESS

SE(3)

python3 -m liepose.exp.bop.run "lie_type=[se3]"

R3SO(3)

python3 -m liepose.exp.bop.run "lie_type=[r3so3]"

The result is located at logs/experiments/bop-tless-score-flat/.../inference_400000/summary.json. The ... depends on the given parameters, e.g. lie_type=se3+repr_type=tan+....
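Because the `...` part of each result path depends on the run parameters, one way to locate the file without typing the full path is a recursive glob. The following is a minimal sketch and not part of the repository: the helper names `find_summaries` and `load_final_metrics` are hypothetical, and it assumes only the directory layout described above and the `final_metrics` key that appears in `summary.json`.

```python
import glob
import json
import os


def find_summaries(log_root="logs/experiments",
                   experiment="symsolt-score-flat",
                   step=800000):
    """Return every summary.json found for one experiment and inference step.

    The "**" wildcard stands in for the parameter-dependent directories,
    so the elided part of the path never has to be spelled out.
    """
    pattern = os.path.join(
        log_root, experiment, "**", f"inference_{step}", "summary.json"
    )
    return sorted(glob.glob(pattern, recursive=True))


def load_final_metrics(path):
    """Read one summary.json and return its "final_metrics" dictionary."""
    with open(path) as f:
        return json.load(f)["final_metrics"]


if __name__ == "__main__":
    for path in find_summaries():
        metrics = load_final_metrics(path)
        # .get() is used because the available keys depend on the experiment.
        print(path, metrics.get("rot(deg)"))
```

For the SYMSOL or T-LESS runs, pass the matching experiment name and step, e.g. `find_summaries(experiment="bop-tless-score-flat", step=400000)`.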

Metrics

SYMSOL and SYMSOL-T

The summary.json file has the following format:

{
  "final_metrics": {
    "rot": 0.007605713326483965,
    "rot(deg)": 0.4357752501964569,
    "rot_2": 99.59599999999999,
    "rot_5": 99.88,
    "rot_10": 99.94,
    "rot_id0": 0.007878238335251808,
    "rot(deg)_id0": 0.45138978958129883,
    "rot_2_id0": 99.98,
    "rot_5_id0": 100.0,
    ...
  },
  ...
}
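Keys ending in `_id` followed by a number hold per-shape values, while the remaining keys are aggregated over all shapes. As a rough sketch of how such a file could be post-processed (the function name `split_metrics` is hypothetical and not part of the repository), the snippet below separates the two groups. The sample values are copied from the excerpt above.

```python
import json
import re
from collections import defaultdict

# Matches keys such as "rot_2_id0": metric name "rot_2", shape ID 0.
_ID_SUFFIX = re.compile(r"^(?P<name>.+)_id(?P<shape_id>\d+)$")


def split_metrics(final_metrics):
    """Split a final_metrics dict into (overall, per_shape).

    per_shape maps an integer shape ID to {metric name: value}.
    """
    overall = {}
    per_shape = defaultdict(dict)
    for key, value in final_metrics.items():
        match = _ID_SUFFIX.match(key)
        if match:
            per_shape[int(match["shape_id"])][match["name"]] = value
        else:
            overall[key] = value
    return overall, dict(per_shape)


example = json.loads("""
{
  "final_metrics": {
    "rot": 0.007605713326483965,
    "rot(deg)": 0.4357752501964569,
    "rot_2": 99.59599999999999,
    "rot_id0": 0.007878238335251808,
    "rot(deg)_id0": 0.45138978958129883,
    "rot_2_id0": 99.98
  }
}
""")

overall, per_shape = split_metrics(example["final_metrics"])
print("overall:", overall)
print("shape 0:", per_shape[0])
```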

The metrics and the shape IDs are listed below:

| Metrics | Meaning |
|---|---|
| rot | average rotation error in radians |
| rot(deg) | average rotation error in degrees |
| rot_2 | percentage (%) of samples with rotation error less than 2 degrees |
| rot_5 | percentage (%) of samples with rotation error less than 5 degrees |
| rot_10 | percentage (%) of samples with rotation error less than 10 degrees |
| tran | average translation error (distance) |
| tran_0.02 | percentage (%) of samples with translation error less than 0.02 |
| tran_0.05 | percentage (%) of samples with translation error less than 0.05 |
| tran_0.1 | percentage (%) of samples with translation error less than 0.1 |
| add | average distance between two point clouds (ADD) |
| add_0.02 | percentage (%) of samples with average distance less than 0.02 |
| add_0.05 | percentage (%) of samples with average distance less than 0.05 |
| add_0.1 | percentage (%) of samples with average distance less than 0.1 |
| geo | average geodesic distance on SE(3) |
| geo_0.02 | percentage (%) of samples with geodesic distance less than 0.02 |
| geo_0.05 | percentage (%) of samples with geodesic distance less than 0.05 |
| geo_0.1 | percentage (%) of samples with geodesic distance less than 0.1 |
| {metric}_id* | the metrics for each shape, e.g. rot_id0, tran_id2, ... |

| ID | Shape |
|---|---|
| 0 | tetrahedron |
| 1 | cube |
| 2 | icosahedron |
| 3 | cone |
| 4 | cylinder |
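To make the relationship between the rotation metrics concrete, here is a small sketch that aggregates per-sample rotation errors into `rot`, `rot(deg)`, and the thresholded percentages. It illustrates the definitions in the table and is not the repository's evaluation code: the function name `summarize_rotation_errors` is hypothetical, and how the per-sample error is obtained (including any handling of object symmetry) is outside its scope.

```python
import math


def summarize_rotation_errors(errors_rad, thresholds_deg=(2, 5, 10)):
    """Aggregate per-sample rotation errors (in radians) into summary metrics."""
    if not errors_rad:
        raise ValueError("need at least one sample")
    n = len(errors_rad)
    mean_rad = sum(errors_rad) / n
    summary = {
        "rot": mean_rad,                      # average error in radians
        "rot(deg)": math.degrees(mean_rad),   # the same average in degrees
    }
    errors_deg = [math.degrees(e) for e in errors_rad]
    for t in thresholds_deg:
        # Strictly "less than" the threshold, as stated in the table.
        hits = sum(1 for e in errors_deg if e < t)
        summary[f"rot_{t}"] = 100.0 * hits / n
    return summary


if __name__ == "__main__":
    # Four made-up errors of 1, 3, 7 and 20 degrees, for illustration only.
    sample = [math.radians(d) for d in (1, 3, 7, 20)]
    print(summarize_rotation_errors(sample))
```

The `tran_*`, `add_*`, and `geo_*` percentages follow the same pattern with their own thresholds.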

Datasets

We use the following datasets in our experiments:

| Dataset | Download |
|---|---|
| SYMSOL | tfds *1 |
| SYMSOL-T | gdrive *2 |
| T-LESS | gdrive |
| VOC2012 | gdrive |

*1 SYMSOL is downloaded automatically via tensorflow-datasets during the first run.

*2 We also provide scripts for synthesizing the SYMSOL-T dataset.

# You need to enable the screen/display before `make run`
export DISPLAY=:0
make run

# Make your own SYMSOL-T (25000 samples) = (5 shapes) * (5k per shape)
python3 -m liepose.data.symsolt.synth --path "dataset/symsolt/my-symsolt-5k" "num_samples=25000"

For more configs, see the script.

Customized Datasets

TBA.

We will announce our workflow to annotate your custom dataset in another repo Ending2015a/custom-bop-tool.

Multi-GPU Training

By default, a single GPU is used for training because CUDA_VISIBLE_DEVICES='0' is set inside the Makefile. To enable multi-GPU training, set CUDA_VISIBLE_DEVICES to multiple devices; the framework automatically switches to parallel training mode. For example, export CUDA_VISIBLE_DEVICES='0,1,2' uses 3 devices for training, and each device receives batch_size divided by 3.
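The batch split can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not the framework's code: the helper names are hypothetical, the batch size of 96 is an arbitrary example, and the requirement that the batch size be divisible by the device count is an assumption made here for clarity.

```python
import os


def visible_devices(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device IDs (strings).

    Returns an empty list when the variable is unset or empty; in that
    case CUDA itself decides which devices are visible.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [d.strip() for d in raw.split(",") if d.strip()]


def per_device_batch_size(batch_size, num_devices):
    """Return the share of the global batch that each device processes."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    if batch_size % num_devices != 0:
        # Assumption of this sketch: the batch must split evenly.
        raise ValueError(
            f"batch_size={batch_size} is not divisible by {num_devices} devices"
        )
    return batch_size // num_devices


if __name__ == "__main__":
    devices = visible_devices({"CUDA_VISIBLE_DEVICES": "0,1,2"})
    print(len(devices), per_device_batch_size(96, len(devices)))
```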

Citation

@inproceedings{hsiao2024confronting,
    title={Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)},
    author={Hsiao, Tsu-Ching and Chen, Hao-Wei and Yang, Hsuan-Kung and Lee, Chun-Yi},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={352--362},
    year={2024}
}

liepose-diffusion's Issues

About sampling poses

Dear author,

First of all, thanks for your amazing work. I have some questions regarding the inference phase and SE(3) diffusion.

  1. In the inference phase, do you sample only one pose? Given the nature of generative models, it seems possible that a sampled pose might fall outside the target distribution (i.e., be an outlier). Have you encountered issues with this? Or has the model been trained well enough that such concerns are unnecessary? If I missed something, please let me know!

  2. As proposed in [1], since the 6D representation is continuous in Euclidean space, it seems we could apply standard diffusion training directly (here, representing SO(3) with the 6D representation). Would there be any significant differences when training the model this way? Would it perform better to train the model on the more complex SE(3)? I would like to hear your opinion!

Thank you.

[1] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
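For readers unfamiliar with the 6D representation from [1] mentioned in the question, the sketch below shows the usual way to map a 6D vector to a rotation matrix with Gram-Schmidt orthogonalization. It is an illustration of that representation only, written in plain Python with a hypothetical function name; it is not taken from this repository and says nothing about how the authors' SE(3) diffusion is implemented.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(rep):
    """Map a 6D vector (a1, a2) to a 3x3 rotation matrix (list of rows).

    Gram-Schmidt: b1 = normalize(a1), b2 = normalize(a2 - (b1 . a2) b1),
    b3 = b1 x b2. The columns of the result are b1, b2, b3.
    """
    a1, a2 = list(rep[:3]), list(rep[3:6])
    b1 = _normalize(a1)
    proj = _dot(b1, a2)
    b2 = _normalize([y - proj * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


if __name__ == "__main__":
    for row in rotation_from_6d([1.0, 2.0, 3.0, -1.0, 0.5, 2.0]):
        print(row)
```

Any 6D vector whose two halves are linearly independent maps to a valid rotation, which is why the representation can be regressed or diffused in ordinary Euclidean space.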

Some questions

This is an outstanding paper.
How can I be successful just like you?😁😁

If you have any tips, please let me know.
