VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion

VoRTX is a deep learning model for 3D reconstruction from posed RGB images, using transformers for multi-view fusion.

[Figure: VoRTX inputs and outputs]

Setup

Tested on Ubuntu 20.04.

Dependencies

conda create -n vortx python=3.9 -y
conda activate vortx

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch

pip install \
  pytorch-lightning==1.5 \
  scikit-image==0.18 \
  numba \
  pillow \
  wandb \
  tqdm \
  open3d \
  pyrender \
  ray \
  trimesh \
  pyyaml \
  matplotlib \
  black \
  pycuda \
  opencv-python \
  imageio

sudo apt install libsparsehash-dev
pip install git+https://github.com/mit-han-lab/[email protected] 

pip install -e .
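
Before moving on, it can help to confirm that the packages above are importable from the active environment. The snippet below is a minimal sketch, not part of the repository. It assumes the usual import names for the listed packages (for example scikit-image imports as skimage, pillow as PIL, pyyaml as yaml, opencv-python as cv2). It only checks that each module can be found; it does not test CUDA or GPU access.

```python
import importlib.util

# Assumed import names for the packages installed above. Several differ
# from their pip/conda names: scikit-image -> skimage, pillow -> PIL,
# pyyaml -> yaml, opencv-python -> cv2.
REQUIRED_MODULES = [
    "torch", "torchvision", "pytorch_lightning", "skimage", "numba", "PIL",
    "wandb", "tqdm", "open3d", "pyrender", "ray", "trimesh", "yaml",
    "matplotlib", "black", "pycuda", "cv2", "imageio", "torchsparse",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path.

    find_spec locates a module without importing it, so this is cheap but
    says nothing about whether the module works (e.g. CUDA availability).
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```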

Config

cp example-config.yml config.yml

Edit the paths in config.yml so that they point to your data directories.
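
A quick way to catch a wrong path before training is to load config.yml and check that the data directories exist. The sketch below is hypothetical and not part of the repository. It assumes that scannet_dir and tsdf_dir (the two keys set in the Data section) are top-level keys in config.yml, and it uses PyYAML, which is in the dependency list.

```python
from pathlib import Path


def check_config_paths(config, keys=("scannet_dir", "tsdf_dir")):
    """Return {key: problem} for each key that is absent or not an existing directory.

    Assumes (hypothetically) that the data paths are top-level keys in config.yml.
    An empty dict means every checked path exists.
    """
    problems = {}
    for key in keys:
        value = config.get(key)
        if value is None:
            problems[key] = "missing from config"
        elif not Path(value).expanduser().is_dir():
            problems[key] = f"not a directory: {value}"
    return problems


if __name__ == "__main__":
    import yaml  # PyYAML, installed above as pyyaml

    with open("config.yml") as f:
        cfg = yaml.safe_load(f)
    print(check_config_paths(cfg) or "All data paths exist.")
```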

Data

Download and extract the ScanNet data using the scripts provided by the ScanNet authors.

To format ScanNet for VoRTX:

python tools/preprocess_scannet.py --src path/to/scannet_src --dst path/to/new/scannet_dst

In config.yml, set scannet_dir to the value of --dst.

To generate the ground-truth TSDF:

python tools/generate_gt.py --data_path path/to/scannet_src --save_name TSDF_OUTPUT_DIR
# For the test split
python tools/generate_gt.py --test --data_path path/to/scannet_src --save_name TSDF_OUTPUT_DIR

In config.yml, set tsdf_dir to the value of TSDF_OUTPUT_DIR.
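
For readers unfamiliar with TSDFs, the sketch below shows the standard idea behind TSDF fusion for a single voxel along one camera ray: compute a truncated signed distance from each depth observation and fuse observations with a running weighted average. It is an illustration under our own assumptions, not code from tools/generate_gt.py. Real implementations project every voxel into every depth map using camera intrinsics and poses; the function names here are hypothetical.

```python
def tsdf_observation(voxel_depth, surface_depth, trunc):
    """Truncated signed distance of one voxel for one depth observation.

    Positive values lie in front of the observed surface, negative values
    behind it. Returns None when the voxel is more than `trunc` behind the
    surface (occluded), in which case the observation is skipped.
    """
    sdf = surface_depth - voxel_depth  # > 0 in front of the surface
    if sdf < -trunc:
        return None
    return min(1.0, sdf / trunc)  # normalized to [-1, 1]


def fuse(tsdf, weight, observation, obs_weight=1.0):
    """Fold one observation into the voxel's running weighted average."""
    if observation is None:
        return tsdf, weight
    new_weight = weight + obs_weight
    return (tsdf * weight + observation * obs_weight) / new_weight, new_weight


if __name__ == "__main__":
    # One voxel at depth 1.0 seen in two frames whose depth maps place the
    # surface at 1.25 and 1.5 (same units), with truncation distance 0.5.
    tsdf, weight = 0.0, 0.0
    for surface in (1.25, 1.5):
        tsdf, weight = fuse(tsdf, weight, tsdf_observation(1.0, surface, 0.5))
    print(tsdf, weight)
```

The weighted average lets noisy per-frame depth errors cancel out as more frames observe the same voxel.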

Training

python scripts/train.py --config config.yml

Parameters can be adjusted in config.yml. Set attn_heads=0 to use direct averaging instead of transformers.
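
The difference between the two fusion modes can be illustrated for a single voxel. With direct averaging, every view contributes equally (weight 1/n); with attention, each view gets a softmax weight computed from its features. The toy sketch below uses single-head scaled dot-product attention with no learned projections. It is only meant to show how attention weights replace the uniform average; it is not the VoRTX transformer, which has learned parameters and is configured through attn_heads and related settings in config.yml.

```python
import math


def mean_fusion(view_feats):
    """Fuse per-view feature vectors of one voxel by direct averaging."""
    n = len(view_feats)
    return [sum(f[i] for f in view_feats) / n for i in range(len(view_feats[0]))]


def attention_fusion(view_feats, query):
    """Fuse per-view features with single-head scaled dot-product attention.

    Keys and values are the raw view features (no learned projections), so
    this only shows how softmax weights replace the uniform 1/n weights.
    Returns the fused vector and the per-view weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, f)) / math.sqrt(d) for f in view_feats]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * f[i] for w, f in zip(weights, view_feats)) for i in range(d)]
    return fused, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]  # two views, 2-D features
    print(mean_fusion(feats))
    print(attention_fusion(feats, [10.0, 0.0]))  # favors the first view
```

When all attention scores are equal (for example a zero query), the softmax weights are uniform and attention fusion reduces to averaging.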

Inference

Pretrained weights can be downloaded here.

python scripts/inference.py \
  --ckpt path/to/checkpoint.ckpt \
  --split [train / val / test] \
  --outputdir path/to/desired_output_directory \
  --n-imgs 60 \
  --config config.yml \
  --cropsize 96

Because the view selection process is randomized, the memory required for a given scene can vary from run to run. With n-imgs=60 and 24 GB of VRAM, some test scenes can cause out-of-memory (OOM) errors; these are resolved by changing the random seed or by reducing n-imgs or cropsize.
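
To compare settings before a run, a rough back-of-envelope estimate can help. The sketch below rests on a hypothetical assumption: that the memory for per-view voxel features grows linearly with n-imgs and with the number of voxels in a cubic crop of side cropsize. Fixed costs such as model weights are ignored, so treat the result as a relative indicator only.

```python
def relative_feature_memory(n_imgs, cropsize, ref_n_imgs=60, ref_cropsize=96):
    """Ratio of n_imgs * cropsize**3 to a reference setting.

    Hypothetical model: per-view voxel features scale with the number of
    views times the number of voxels in a cubic crop.
    """
    return (n_imgs * cropsize**3) / (ref_n_imgs * ref_cropsize**3)


if __name__ == "__main__":
    for n, c in [(60, 96), (60, 64), (30, 96)]:
        print(f"n-imgs={n}, cropsize={c}: {relative_feature_memory(n, c):.3f}x")
```

Under this model, dropping cropsize from 96 to 64 shrinks the crop to (64/96)^3 = 8/27, about 30% of the voxels, which is a much larger saving than halving n-imgs.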

Inference results using the provided pretrained weights, n-imgs=60, and cropsize=64 are available here. The reduced test-time crop size gave a slightly higher F-score of 0.656 (the paper reports 0.641).

Evaluation

python scripts/evaluate.py \
  --results-dir path/to/inference_output_directory \
  --split [train / val / test] \
  --config config.yml
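
The F-score reported above is a common 3D reconstruction metric: precision is the fraction of predicted points within a distance threshold of the ground truth, recall is the fraction of ground-truth points within the threshold of the prediction, and the F-score is their harmonic mean. The sketch below computes it for small point lists by brute force. The 5 cm default threshold and the point-cloud inputs are our assumptions; we have not checked the exact threshold or point sampling used by scripts/evaluate.py, and real evaluations sample points from meshes and use a spatial index instead of brute force.

```python
import math


def _nearest_dist(p, cloud):
    """Euclidean distance from point p to its nearest neighbor in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def fscore(pred, gt, threshold=0.05):
    """Return (precision, recall, F-score) between two point clouds (in meters).

    precision: fraction of predicted points within `threshold` of the ground truth
    recall:    fraction of ground-truth points within `threshold` of the prediction
    """
    precision = sum(_nearest_dist(p, gt) < threshold for p in pred) / len(pred)
    recall = sum(_nearest_dist(g, pred) < threshold for g in gt) / len(gt)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.01), (5.0, 0.0, 0.0)]
    print(fscore(pred, gt))
```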

Citation

@inproceedings{stier2021vortx,
  title={{VoRTX}: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion},
  author={Stier, Noah and Rich, Alexander and Sen, Pradeep and H{\"o}llerer, Tobias},
  booktitle={2021 International Conference on 3D Vision (3DV)},
  pages={320--330},
  year={2021},
  organization={IEEE}
}
