
differentiable-blocksworld's Introduction


Differentiable Blocks World:
Qualitative 3D Decomposition by Rendering Primitives

Tom Monnier, Jake Austin, Angjoo Kanazawa, Alexei Efros, Mathieu Aubry

teaser.gif

Official PyTorch implementation of Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives (to appear in NeurIPS 2023). Check out our webpage for video results!

This repository contains:

  • scripts to download and load datasets
  • configs to optimize the models from scratch
  • evaluation pipelines to reproduce quantitative results
  • guidelines to run the model on a new scene
If you find this code useful, don't forget to star the repo ⭐ and cite the paper 👇
@inproceedings{monnier2023dbw,
  title={{Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives}},
  author={Monnier, Tom and Austin, Jake and Kanazawa, Angjoo and Efros, Alexei A. and Aubry, Mathieu},
  booktitle={{NeurIPS}},
  year={2023},
}

Installation 👷

1. Create conda environment 🔧

conda env create -f environment.yml
conda activate dbw
Optional live monitoring 📉 Some monitoring routines are implemented; you can use them by specifying your visdom port in the config file. You will need to install visdom from source beforehand:
git clone https://github.com/facebookresearch/visdom
cd visdom && pip install -e .
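
To check that monitoring will actually show up, you can verify that a visdom server is reachable on the port you put in the config before launching a run. The snippet below is a minimal sketch, assuming the default visdom port 8097 stands in for yours:

from visdom import Visdom

# The server must already be running, e.g. via: python -m visdom.server -port 8097
viz = Visdom(port=8097)
assert viz.check_connection(), "no visdom server reachable on this port"
viz.text("dbw monitoring test")  # dummy panel to confirm the dashboard is live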
Optional Nerfstudio dataloading 🚜 If you want to load data processed by Nerfstudio (e.g., for a custom scene), you will need to install nerfstudio as described here. In general, executing the following lines should do the job:
pip install ninja==1.10.2.3 git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install nerfstudio==0.1.15

2. Download datasets ⬇️

bash scripts/download_data.sh

This command will download one of the following sets of scenes presented in the paper:

  • DTU: paper / dataset (1.86GB, pre-processing conventions come from IDR, big thanks to the team!)
  • BlendedMVS: paper / dataset (115MB, thanks to the VolSDF team for hosting the dataset)
  • Nerfstudio: paper / repo / dataset (2.67GB, images and Nerfacto models for the 2 scenes in the paper)

It may happen that gdown hangs; if so, download the file manually and move it to the datasets folder.

How to use 🚀

1. Run models from scratch 🏃

inp.png rec_edges.gif rec_hard.gif rec_traj.gif

To launch a training from scratch, run:

cuda=gpu_id config=filename.yml tag=run_tag ./scripts/pipeline.sh

where gpu_id is a GPU device ID, filename.yml is a config from the configs folder (e.g., dtu/scanXX.yml), and run_tag is a tag for the experiment.

Results are saved at runs/${DATASET}/${DATE}_${run_tag} where DATASET is the dataset name specified in filename.yml and DATE is the current date in mmdd format.

Available configs 🔆
  • dtu/*.yml for each DTU scene
  • bmvs/*.yml for each BlendedMVS scene
  • nerfstudio/*.yml for each Nerfstudio scene

NB: to run on Nerfstudio scenes, you need to install the nerfstudio library (see the installation section).

Computational cost 💰

Optimization takes roughly 4 hours on a single GPU.

2. Reproduce quantitative results on DTU 📊

dtu_table.png

Our model is evaluated at the end of each run: scores are written to dtu_scores.tsv for the official Chamfer evaluation and to final_scores.tsv for training losses, transparencies and image rendering metrics. To reproduce our results on a single DTU scene, run the following command, which launches 5 sequential runs with different seeds (the auto score corresponds to the run with the minimal training loss):

cuda=gpu_id config=dtu/scanXX.yml tag=default_scanXX ./scripts/multi_pipeline.sh
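
For illustration, here is a minimal sketch (not the repo's evaluation code) of how the auto run could be picked by hand from the per-seed outputs, i.e. by keeping the run with the lowest training loss; the runs/ pattern and the train_loss column name are assumptions to adapt to the actual folder layout and TSV headers:

import glob
import pandas as pd

runs = []
for path in glob.glob("runs/DTU/*_default_scanXX/final_scores.tsv"):  # hypothetical pattern
    scores = pd.read_csv(path, sep="\t")
    runs.append((path, float(scores["train_loss"].iloc[-1])))  # column name is an assumption

best_path, best_loss = min(runs, key=lambda item: item[1])
print(f"auto run: {best_path} (training loss {best_loss:.4f})")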
Get numbers for EMS and MBF baselines 📋

For completeness, we provide scripts for processing data and evaluating the following baselines:

  • EMS: run scripts/ems_pproc.sh, then apply EMS using the official repo, then run scripts/ems_eval.sh to evaluate the 3D decomposition
  • MBF: run scripts/mbf_pproc.sh, then apply MBF using the official repo, then run scripts/mbf_eval.sh to evaluate the 3D decomposition

Do not forget to update the paths of the baseline repos in src/utils/path.py. Results will also be computed with the preprocessing step that removes the ground from the 3D input.
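
As a rough illustration, the update in src/utils/path.py might look like the following; the variable names EMS_PATH and MBF_PATH are hypothetical (use whatever names the file actually defines), and the paths are placeholders for your local clones:

from pathlib import Path

# Hypothetical constants pointing to local clones of the official baseline repos.
EMS_PATH = Path("/path/to/EMS")
MBF_PATH = Path("/path/to/MBF")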

3. Train on a custom scene 🔮

If you want to run our model on a custom scene, we recommend using the Nerfstudio framework and its guidelines to process your multi-view images, obtain the cameras, and check their quality by optimizing their default 3D model. The resulting data and output model should be moved to the datasets/nerfstudio folder in the same format as the other Nerfstudio scenes (you can also use symlinks).

Then, you can add the model path to the custom Nerfstudio dataloader (src/datasets/nerfstudio.py), create a new config from one of our nerfstudio configs, and run the model. One thing that is specific to each scene is the initialization of R_world and T_world, which can be roughly estimated by visual comparison in plotly or Blender using the pseudo ground-truth point cloud.
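
For example, here is a minimal plotly sketch to eyeball that initialization; the point cloud path is hypothetical, and the assumption that R_world and T_world act as a rotation plus translation on world points is ours, so adjust the values by visual inspection:

import numpy as np
import plotly.graph_objects as go
import trimesh

# Load the pseudo ground-truth point cloud (hypothetical path under datasets/nerfstudio).
cloud = trimesh.load("datasets/nerfstudio/my_scene/point_cloud.ply")
pts = np.asarray(cloud.vertices)

# Candidate world transform: start from identity / zero and tweak until the scene looks upright.
R_world = np.eye(3)
T_world = np.zeros(3)
pts_t = pts @ R_world.T + T_world

fig = go.Figure(go.Scatter3d(x=pts_t[:, 0], y=pts_t[:, 1], z=pts_t[:, 2],
                             mode="markers", marker=dict(size=1)))
fig.show()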

Further information 📚

If you like this project, check out related works from our group.

differentiable-blocksworld's People

Contributors

monniert


differentiable-blocksworld's Issues

About exporting results to mesh

Hi, thank you for sharing your great work!
I notice that your method can successfully decompose scenes into primitive 3D meshes, which is really cool. Since I only found logging code for image/video rendering, I would like to know whether the inference results can be exported as mesh files. If so, could you also share these mesh files with us?
Thank you very much!

The position of reconstructed mesh in quali_eval

Hi, thanks for your nice work and concise repo!

I'm working on a project that needs to build on the reconstructed mesh generated by dbw, and I found that the position and scale of the mesh differ noticeably from the points computed by COLMAP.
I'm wondering whether the mesh is not placed at its real position in world coordinates, but is instead always placed at the origin?

Question about the design of the model

Hi, thanks for the awesome work!
After reading the paper and code, I have one question about the model design. In Section 3, you mention:

Note that compared to recent advances in neural volumetric representations [50, 45, 76], we do not use any neural network and directly optimize meshes, which are straightforward to use in computer graphic pipelines.

I was curious about the choice not to use any neural networks: since the model is already light to train, I thought that attaching a small MLP after the primitives could substantially increase the quality of the rendered images, as in other NeRF-type models. Although I understand that attaching an MLP would make the model less practical, in the sense that it could no longer be used directly in computer graphics pipelines, I think it would make the model competitive with other existing MVS models. Have you ever conducted any experiments with neural networks?

Thanks!

CPU and CUDA device mismatch error

On line 422 of dbw.py, val_blocks is created on the CPU while self.get_opacities() is on the GPU, so an error is raised when they are combined.

My personal fix is to change line 422 from

val_blocks = torch.linspace(0, 1, self.n_blocks + 1)[1:]

to

val_blocks = torch.linspace(0, 1, self.n_blocks + 1)[1:].to(self.bkg.device)

and line 429 from

values = torch.cat([torch.zeros(NFE), val_blocks.repeat_interleave(self.BNF)])

to

values = torch.cat([torch.zeros(NFE).to(self.bkg.device), val_blocks.repeat_interleave(self.BNF)])

which solves the problem.

I am not sure whether this is a common issue or something specific to my setup, so I am just noting it here.

Alpha compositing differentiable rendering

Hi!

First of all thank you for the great work and codebase! 😃

I wanted to ask about the design choice of adding a transparency value for the primitives to the differentiable rendering process. Specifically, in the paper you mention that it behaves better during optimization than the standard differentiable rendering pipeline. What do you mean by better behavior? How much worse are the results when directly using the standard PyTorch3D renderer, and are there any examples you could share showing the difference? Finally, do you have any intuition about why this happens?

Best,
Konstantinos

Upload model and create a demo on Hugging Face

Hi!

Very cool work! It would be nice to have the model checkpoints on the Hugging Face Hub.

Some of the benefits of sharing your models through the Hub would be:

  • versioning, commit history and diffs
  • repos provide useful metadata about their tasks, languages, metrics, etc., which makes them discoverable
  • multiple features from TensorBoard visualizations, PapersWithCode integration, and more
  • wider reach of your work to the ecosystem

Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested.

You can also create a research demo.

Please let us know if you would be interested and if you have any questions.

DTU Evaluation Metric

Thank you for your awesome work. I have a few questions about the DTU-CD evaluation step.

I noticed that the distance evaluated in the code compares the reconstructed blocks, excluding the background and floor, against the ground-truth point cloud file, while the GT point cloud file includes the background. Could you kindly clarify whether you exclusively use the mean_d2s metric in your paper, or whether you take the average of mean_d2s and mean_s2d? If it's the latter, how do you handle the environmental point cloud in the referenced ground-truth file?
