
shapo's Introduction

ShAPO 🎩: Implicit Representations for Multi-Object Shape, Appearance and Pose Optimization

License: MIT

This repository is the PyTorch implementation of our paper:

ShAPO: Implicit Representations for Multi-Object Shape, Appearance and Pose Optimization
Muhammad Zubair Irshad, Sergey Zakharov, Rares Ambrus, Thomas Kollar, Zsolt Kira, Adrien Gaidon
European Conference on Computer Vision (ECCV), 2022

[Project Page] [arXiv] [PDF] [Video] [Poster]

Explore CenterSnap in Colab

Previous ICRA'22 work:

CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation
Muhammad Zubair Irshad, Thomas Kollar, Michael Laskey, Kevin Stone, Zsolt Kira
International Conference on Robotics and Automation (ICRA), 2022

[Project Page] [arXiv] [PDF] [Video] [Poster]

Citation

If you find this repository useful, please consider citing:

@inproceedings{irshad2022shapo,
  title={ShAPO: Implicit Representations for Multi-Object Shape, Appearance and Pose Optimization},
  author={Muhammad Zubair Irshad and Sergey Zakharov and Rares Ambrus and Thomas Kollar and Zsolt Kira and Adrien Gaidon},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022},
  url={https://arxiv.org/abs/2207.13691},
}

@inproceedings{irshad2022centersnap,
  title={CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation},
  author={Muhammad Zubair Irshad and Thomas Kollar and Michael Laskey and Kevin Stone and Zsolt Kira},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2022},
  url={https://arxiv.org/abs/2203.01929},
}

Contents

🤝 Google Colab

If you want to experiment with ShAPO, we have written a Colab notebook. It's quite comprehensive and easy to set up. It walks through the following experiments / ShAPO properties:

  • Single-shot inference
    • Visualize peak and depth outputs
    • Decode shapes with predicted textures
    • Project 3D point clouds and 3D bounding boxes onto the 2D image (see the projection sketch below)
  • Shape, Appearance and Pose Optimization
    • Core optimization loop
    • Visualizing the optimized 3D output (i.e. textured asset creation)
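
The projection step above boils down to a standard pinhole-camera projection. A minimal sketch of that idea in Python; the function name and example points are illustrative (not the notebook's actual API), and the intrinsics shown are the values commonly used for the NOCS Real camera:

import numpy as np

def project_points(points_cam, K):
    # Project Nx3 camera-frame points to pixel coordinates with pinhole intrinsics K (3x3).
    uvw = (K @ points_cam.T).T       # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth

# Intrinsics commonly used for the NOCS Real camera.
K = np.array([[591.0125,   0.0,      322.525],
              [  0.0,     590.16775, 244.11084],
              [  0.0,       0.0,       1.0]])
corners_cam = np.array([[ 0.1,  0.1, 1.0],
                        [-0.1,  0.1, 1.0],
                        [ 0.1, -0.1, 1.0]])  # example 3D points 1 m in front of the camera
print(project_points(corners_cam, K))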

💻 Environment

Create a Python 3.8 virtual environment and install the requirements:

cd $ShAPO_Repo
conda create -y --prefix ./env python=3.8
conda activate ./env/
./env/bin/python -m pip install --upgrade pip
./env/bin/python -m pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

The code was built and tested on CUDA 10.2.
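
After installing, a quick sanity check (a generic snippet, nothing repo-specific) confirms that the installed PyTorch build sees CUDA:

import torch

print(torch.__version__)          # should match the version pinned in requirements.txt
print(torch.version.cuda)         # CUDA version the wheel was built against (e.g. 10.2)
print(torch.cuda.is_available())  # True if a compatible GPU and driver are found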

📊 Dataset

Download camera_train, camera_val, real_train, real_test, the ground-truth annotations, camera_composed_depth, the mesh models and eval_results provided by NOCS, as well as the NOCS preprocessed data.
Also download the sdf_rgb_pretrained_weights. Unzip and organize these files in $ShAPO_Repo/data as follows:

data
├── CAMERA
│   ├── train
│   └── val
├── Real
│   ├── train
│   └── test
├── camera_full_depths
│   ├── train
│   └── val
├── gts
│   ├── val
│   └── real_test
├── results
│   ├── camera
│   ├── mrcnn_results
│   ├── nocs_results
│   └── real
├── sdf_rgb_pretrained
│   ├── LatentCodes
│   ├── Reconstructions
│   ├── ModelParameters
│   ├── OptimizerParameters
│   └── rgb_net_weights
└── obj_models
    ├── train
    ├── val
    ├── real_train
    ├── real_test
    ├── camera_train.pkl
    ├── camera_val.pkl
    ├── real_train.pkl
    ├── real_test.pkl
    └── mug_meta.pkl
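
Before generating data, it can save time to verify that the unzipped files match the tree above. A minimal sketch, assuming it is run from $ShAPO_Repo:

from pathlib import Path

data_root = Path("data")  # adjust if your data lives elsewhere
expected = [
    "CAMERA/train", "CAMERA/val",
    "Real/train", "Real/test",
    "camera_full_depths/train", "camera_full_depths/val",
    "gts", "results", "sdf_rgb_pretrained", "obj_models",
]
missing = [p for p in expected if not (data_root / p).exists()]
print("missing:", missing if missing else "none")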

Create image lists

./runner.sh prepare_data/generate_training_data.py --data_dir /home/ubuntu/shapo/data/nocs_data/

Now run the distributed script to collect the data locally; this takes a few hours. The data will be saved under data/NOCS_data.

Note: The script uses multiple GPUs and runs 8 workers per GPU on a 16 GB GPU. Change the worker_per_gpu variable depending on your GPU memory.

python prepare_data/distributed_generate_data.py --data_dir /home/ubuntu/shapoplusplus/data/nocs_data --type camera_train

--type: choose from 'camera_train', 'camera_val', 'real_train', 'real_val'
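
To generate all four splits in sequence, one option is a small driver script. This is just a convenience wrapper around the exact command shown above; the data path is an example, adjust it to your setup:

import subprocess

DATA_DIR = "/home/ubuntu/shapo/data/nocs_data"  # adjust to your path
for split in ["camera_train", "camera_val", "real_train", "real_val"]:
    subprocess.run(
        ["python", "prepare_data/distributed_generate_data.py",
         "--data_dir", DATA_DIR, "--type", split],
        check=True,  # stop if any split fails
    )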

✨ Training and Inference

ShAPO is a two-stage approach. First, a single-shot network predicts 3D shape, pose and size codes along with segmentation masks in a per-pixel manner. Second, test-time optimization jointly refines the shape, appearance and pose codes given a single-view RGB-D observation of a new instance.

  1. Train on NOCS Synthetic (requires 13GB GPU memory):
./runner.sh net_train.py @configs/net_config.txt

Note that runner.sh is equivalent to using python to run the script; additionally, it sets up the PYTHONPATH and the ShAPO environment path automatically. Also note that this part of the code is similar to CenterSnap: here we predict implicit shapes as an SDF MLP instead of point clouds, and in this stage we additionally predict an appearance embedding and object masks.
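
For intuition, decoding an SDF latent code into an explicit mesh generally means querying the SDF MLP on a dense 3D grid and extracting the zero level set with marching cubes. The sketch below illustrates the idea only; sdf_net and latent_code are stand-ins (a network mapping [latent, xyz] to a signed distance, and a 1×D code), not the repository's actual decoder interface:

import numpy as np
import torch
from skimage.measure import marching_cubes

def decode_sdf_to_mesh(sdf_net, latent_code, resolution=64, bound=1.0):
    # Sample a dense grid in [-bound, bound]^3 and query the SDF at every point.
    lin = np.linspace(-bound, bound, resolution, dtype=np.float32)
    grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1).reshape(-1, 3)
    pts = torch.from_numpy(grid)
    with torch.no_grad():
        z = latent_code.expand(pts.shape[0], -1)   # tile the (1, D) latent per query point
        sdf = sdf_net(torch.cat([z, pts], dim=1))  # signed distance per point
    volume = sdf.reshape(resolution, resolution, resolution).numpy()
    verts, faces, _, _ = marching_cubes(volume, level=0.0)  # zero level set = surface
    verts = verts / (resolution - 1) * 2.0 * bound - bound  # grid indices -> world coords
    return verts, faces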

  2. Finetune on NOCS Real Train (note that good results can be obtained after finetuning on the Real train set for only a few epochs, i.e. 1-5):
./runner.sh net_train.py @configs/net_config_real_resume.txt --checkpoint /path/to/best/checkpoint
  3. Inference on a NOCS Real Test subset

Download a small Real test subset from here, our shape and texture decoder pretrained checkpoints from here, and the ShAPO checkpoints pretrained on the real dataset from here. Unzip and organize these files in $ShAPO_Repo/data as follows:

test_data
├── Real
│   └── test
├── ckpts
└── sdf_rgb_pretrained
    ├── LatentCodes
    ├── Reconstructions
    ├── ModelParameters
    ├── OptimizerParameters
    └── rgb_net_weights

Now run the inference script to visualize the single-shot predictions as follows:

./runner.sh inference/inference_real.py @configs/net_config.txt --test_data_dir path_to_nocs_test_subset --checkpoint checkpoint_path_here

You should see the visualizations saved in results/ShAPO_real. Change --output_path in *config.txt to save them to a different folder.

  4. Optimization

This is the core optimization script: it updates the latent shape and appearance codes, along with the 6D pose and size, to better fit the unseen single-view RGB-D observation. For a quick run of the core optimization loop along with visualization, see this notebook here.

./runner.sh opt/optimize.py @configs/net_config.txt --data_dir /path/to/test_data_dir/ --checkpoint checkpoint_path_here
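
Conceptually, the optimization keeps the decoder frozen and treats the latent codes and pose as the only free variables, updated by gradient descent to fit the observed depth points. A heavily simplified sketch of that loop follows; the loss and function names are illustrative, not the script's actual implementation, and the pose is reduced to a translation for brevity:

import torch

def optimize_codes(decode_points, shape_code, translation, observed_pts, steps=200, lr=1e-3):
    # Refine a latent shape code and a 3D translation so that the decoded
    # point cloud fits the observed, back-projected depth points.
    shape_code = shape_code.clone().requires_grad_(True)
    translation = translation.clone().requires_grad_(True)
    opt = torch.optim.Adam([shape_code, translation], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = decode_points(shape_code) + translation  # decoder stays frozen
        # one-sided Chamfer distance: each observed point to its nearest predicted point
        loss = torch.cdist(observed_pts, pred).min(dim=1).values.mean()
        loss.backward()
        opt.step()
    return shape_code.detach(), translation.detach()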

πŸ“ FAQ

Please see the FAQs from CenterSnap here.

Acknowledgments

  • This code is built upon the implementation from CenterSnap

Related Work

Licenses

  • This repository is released under the CC BY-NC 4.0 license.

shapo's People

Contributors

zubair-irshad


shapo's Issues

Dataset preprocessing

Hi, I'm trying to train ShAPO on my own synthetic dataset. Beforehand, I wanted to reproduce your results.

I tried to generate a dataset file and succeeded for the train set. However, it seems some files needed to generate the validation set are missing: in the attached sdf_rgb_pretrained.tar.xz, I cannot find the "Reconstructions" directory.
Screenshot from 2024-01-15 21-44-07

Because of this, the generation script raises the following error when I run the command below.
Screenshot from 2024-01-15 21-48-07
Screenshot from 2024-01-15 21-49-03

Here is my data structure. I followed the guidelines and was able to reproduce the training set, but not the validation set.
Screenshot from 2024-01-15 21-50-08

Thanks a lot in advance.

Support for real-time camera images? ROS integration

Hi,

Thanks for this amazing work.

I wonder whether you provide support or tutorials on how to integrate with robot cameras in real time, like the example you showed using the HSR robot.

I could also contribute to this matter.

I can't find those 4 files:

I can't find those 4 files; could you please update the link for the dataset?
├── camera_val.pkl
├── real_train.pkl
├── real_test.pkl
└── mug_meta.pkl

Creating custom dataset for shapo

Hello,

First of all, thanks for your contribution!

I am trying to create, from scratch, a custom dataset of objects which are NOT present in the ShapeNet database. To do so, I am imitating the dataset structure you shared. I am using BlenderProc to create the synthetic data, and I have also downloaded and analysed the dataset you used for training.

I have several questions regarding the creation of the different files in the dataset:

  1. How to create a depth image exactly like in the CAMERA dataset (e.g. CAMERA/train/0000_depth.png), so that it looks like this:

image

Is there a script in the repository to create this depth image? I did not understand how the depth information is encoded in an RGB image.

  2. How to create a depth image exactly like in the camera_full_depth folder (e.g. camera_full_depth/train/0000/0000_composed.png):

image

  3. How to create bbox.txt for each object file in obj_models?

  4. How to generate camera_train.pkl and camera_val.pkl in obj_models?

  5. Why is mug_meta.pkl present in the obj_models folder?

  6. How to create norm.txt and norm_vertices.txt in obj_models/real_train?

  7. How to generate the 'results' folder and all the files in it?

  8. In the 'sdf_rgb_pretrained' folder, how to generate the 'LatentCodes' folder and the all_train_ids.json inside it?

  9. How to generate the .pkl files in the 'gts' folder?

  10. Do we need to store the 6D poses of all objects in the scene somewhere as annotations for all the images?

Thank you in advance for your answers.

About pretrained models for inference

Hello.
First, thank you for the answers in #7.
I was able to run the 'inference' and 'optimize' code with the uploaded pretrained weights.
For further evaluation, I have a few questions.

I think that the sdf_latentcode, sdf_rgb_net_weights, and trained weights (shapo_real.ckpt) are needed.

  1. Can you explain the meaning of the weights I listed above?

  2. If I run the 'train' phase as you describe above, I think I can only produce the trained weights (shapo_real.ckpt). Is that right? If so, where do the sdf_latentcode and sdf_rgb_net_weights come from?

  3. To run the inference code on all of the NOCS data (Real test set, Real train set), can I use all the pretrained weights listed above?

  4. To run the inference code on other datasets, can I use all the pretrained weights listed above? If not, which weights do I have to change?

I really appreciate your project!

Error in distributed_generate_data

Hello, when I try to run python prepare_data/distributed_generate_data.py with the camera_val or real_val type, I get the following error in the generated worker file:

0%| | 0/459 [00:00<?, ?it/s]
0%| | 0/459 [00:00<?, ?it/s]
Traceback (most recent call last):
File "prepare_data/distributed_worker.py", line 45, in
main(args)
File "prepare_data/distributed_worker.py", line 28, in main
annotate_test_data(args.data_dir, 'Real', 'test', args.start, args.end)
File "D:\shapo\prepare_data\generate_training_data.py", line 451, in annotate_test_data
img_path.split('/')[-2], img_path.split('/')[-1]))
IndexError: list index out of range

Do you know how I can solve this?
Thank you in advance.

Note: When I run it with camera_train and real_train I don't have any problems.

Problems with running code

Hello.
Thanks for the nice work!
I tried to run your ShAPO code, but I got stuck.
Can you share solutions to the problems described below?

  1. As mentioned in #6, there is no auto_encoder_model folder or sdf_rgb_pretrained/rgb_latent folder at the provided download link. Can you update the link?

  2. When I try to run the distributed script,
    python prepare_data/distributed_generate_data.py --data_dir /home/ubuntu/shapoplusplus/data/nocs_data --type camera_train
    I get an error message like this:
    FileNotFoundError: [Errno 2] No such file or directory: '/data/sdf_rgb_pretrained/rgb_net_weights/reconstructor.pt'
    because there is no rgb_net_weights/reconstructor.pt in $ShAPO_Repo/data.
    How can I solve this?

  3. After downloading and organizing a small Real test subset, pretrained checkpoints, etc.,
    the directory structure looks like this:
    test_data
    ├── ckpts
    │   └── shapo_real.ckpt
    ├── Real
    │   ├── test
    │   │   ├── scene_1
    │   │   │   ├── 0006_color.png
    │   │   │   └── 0006_depth.png
    │   │   ├── scene_2
    │   │   │   ├── 0049_color.png
    │   │   │   └── 0049_depth.png
    │   │   └── scene_3
    │   │       ├── 0003_color.png
    │   │       └── 0003_depth.png
    │   └── test_list_subset.txt
    └── sdf_rgb_pretrained
        ├── all_train_ids.json
        ├── LatentCodes
        │   ├── 1000.pth
        │   ├── 100.pth
        │   ├── 2000.pth
        │   ├── 500.pth
        │   ├── all_train_ids.json
        │   └── latest.pth
        ├── Logs.pth
        ├── ModelParameters
        │   ├── 1000.pth
        │   ├── 100.pth
        │   ├── 2000.pth
        │   ├── 500.pth
        │   └── latest.pth
        ├── OptimizerParameters
        │   ├── 1000.pth
        │   ├── 100.pth
        │   ├── 2000.pth
        │   ├── 500.pth
        │   └── latest.pth
        ├── rgb_net_weights
        │   ├── feats_colors.pt
        │   └── reconstructor.pt
        └── specs.json

This means there is no sdf_rgb_pretrained/Reconstructions or sdf_rgb_pretrained/rgb_latent in $ShAPO_Repo/test_data.

Am I missing something?

  4. Also, is there a way to run inference with the pre-trained weights without training?

Thanks!

Regarding the evaluation of your pre-trained model

I have run into a problem evaluating the pre-trained ShAPO model you provide in the repo here.
I could not find an evaluation script in your ShAPO repository, and I found a similar issue in your CenterSnap repo here. The author of that issue describes problems finding the predicted class labels and sizes; in one of your replies you provide this helper function and ask them to use the mask_rcnn results from the object-deformnet repository.

I have done all the things you asked in that issue. However, I used your pre-trained ShAPO model (without post-optimization) for evaluation, instead of training one of my own from scratch, and I cannot reproduce the numbers you report in the ShAPO paper (assuming your pre-trained model performs as well as CenterSnap's numbers). I therefore have the following questions:

  1. Is the pre-trained ShAPO you provide not an optimal one but an intermediate one, and is that why I cannot reproduce the numbers (without post-optimization)? Moreover, does using your pretrained ShAPO model without post-optimization give numbers similar to CenterSnap's?

  2. How can one determine the f_size in the result['pred_scales'] = f_size statement you wrote in that issue? I calculate f_size from the point cloud predicted from the shape latents, using this line of code from object-deformnet. As I understand it, this f_size is important in calculating the 3D IoU numbers you report in the ShAPO paper.

  3. To resolve this confusion, is there a possibility that you could share the evaluation script you used to generate the numbers with the compute_mAP function, as you mentioned in that GitHub issue?

Thank you,
Sandeep

Use Linemod dataset

Hi,

Do you think it would be possible to run ShAPO on the LineMOD dataset if I follow the tips from the issues related to using custom datasets?

Thank you!

Dataset download

Hi, in the Dataset section there is no link to download the zip for auto_encoder_model, and in the downloaded zip of sdf_rgb_pretrained there was no rgb_latent folder.
Thanks for the work!

Problem with boto3

Hi, when running net_train I get the following error: botocore.exceptions.NoCredentialsError: Unable to locate credentials.
I would appreciate if you could help me.
Thank you!

Can't find .pickle.zstd files to train

When I train on the CAMERA dataset, I can't find any files matching "*.pickle.zstd", so train_ds.list() = [].
Can you tell me what's wrong?

Running the evaluation code

Hello authors, thank you for the contribution.
I am currently building a simple simulation of the evaluation code so I can understand how the evaluation process works. Right now, I have created a couple of dummy points and I can get IoUs and IoU matches. When I tried to run this code:

def compute_ap_and_acc(pred_matches, gt_matches):

It gives me this error message:

errorshAPO

I don't know how to explain this problem. Could you help me with it? For your information, my two parameters look like this:

errorshapo2

Please let me know if you can help me with this. I am looking forward to hearing back from you. Thank you!

How can I get the GT rotation matrix and translation vector of each image?

Is there a way to get the ground-truth rotation matrix and translation vector for each image? I have a RealSense D435 camera and a Kinect DK v2, and I noticed we need these for custom model training. When I record data from my D435 camera, I get the same rotation matrix and translation vector for every image, even though I change the camera's angle and position or move objects.
