
Active Stereo Without Pattern Projector (ICCV 2023)

Stereo-Depth Fusion through Virtual Pattern Projection (Journal Extension)


🚨 This repository contains download links to our code and trained deep stereo models for our works "Active Stereo Without Pattern Projector" (ICCV 2023) and "Stereo-Depth Fusion through Virtual Pattern Projection" (journal extension).

by Luca Bartolomei¹,², Matteo Poggi², Fabio Tosi², Andrea Conti², and Stefano Mattoccia¹,²

¹Advanced Research Center on Electronic Systems (ARCES), ²University of Bologna

Active Stereo Without Pattern Projector (ICCV 2023)

Project Page | Paper | Supplementary | Poster

Stereo-Depth Fusion through Virtual Pattern Projection (Journal Extension)

Project Page | Paper

Note: 🚧 This repository is currently under active development. We are working to add and refine features and documentation; we apologize for any inconvenience caused by incomplete or missing elements and appreciate your patience.

📑 Table of Contents

  • Introduction
  • Watch Our Research Video!
  • Pretrained Models
  • Code
  • Setup Instructions
  • Datasets
  • Test
  • Qualitative Results
  • Contacts
  • Acknowledgements

🎬 Introduction

This paper proposes a novel framework integrating the principles of active stereo in standard passive camera systems without a physical pattern projector. We virtually project a pattern over the left and right images according to the sparse measurements obtained from a depth sensor.

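To make the idea concrete, below is a minimal, illustrative sketch of virtual pattern projection: wherever a sparse depth seed gives a disparity, the same random patch is alpha-blended at corresponding coordinates of the two rectified views. All names and defaults here are ours for illustration, not the repository's API; the actual code implements further strategies (histogram-based coloring, adaptive patches, occlusion handling).

import numpy as np

def virtually_project(left, right, seeds, wsize=3, alpha=0.4, rng=None):
    # left, right: HxWx3 uint8 rectified stereo pair
    # seeds: HxW float disparity map, 0 where no depth seed is available
    rng = np.random.default_rng() if rng is None else rng
    out_l = left.astype(np.float32).copy()
    out_r = right.astype(np.float32).copy()
    h, w = seeds.shape
    half = wsize // 2
    ys, xs = np.nonzero(seeds > 0)
    for y, x in zip(ys, xs):
        xr = int(round(x - seeds[y, x]))  # corresponding column in the right view
        if min(x, xr, y) < half or max(x, xr) >= w - half or y >= h - half:
            continue  # patch would fall outside one of the two images
        color = rng.integers(0, 256, size=3).astype(np.float32)  # random pattern color
        # alpha-blend the same colored patch onto both views at corresponding points
        out_l[y-half:y+half+1, x-half:x+half+1] = (
            (1 - alpha) * out_l[y-half:y+half+1, x-half:x+half+1] + alpha * color
        )
        out_r[y-half:y+half+1, xr-half:xr+half+1] = (
            (1 - alpha) * out_r[y-half:y+half+1, xr-half:xr+half+1] + alpha * color
        )
    return out_l.astype(np.uint8), out_r.astype(np.uint8)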

Contributions:

  • Even with meager amounts of sparse depth seeds (e.g., 1% of the whole image), our approach outperforms state-of-the-art sensor fusion methods based on handcrafted algorithms and deep networks by a large margin.

  • When dealing with deep networks trained on synthetic data, it dramatically improves accuracy and shows a compelling ability to tackle domain shift issues, even without additional training or fine-tuning.

  • By dispensing with a physical pattern projector, our solution works under sunlight, both indoors and outdoors, at close and long ranges, with no additional processing cost for the original stereo matcher.

Extension Contributions:

  • Acting before any processing occurs, it can be seamlessly deployed with any stereo algorithm or deep network without modifications and benefit from future progress in the field.

  • Moreover, in contrast to active stereo systems, using a depth sensor in place of a pattern projector has several advantages:

    • It is more effective even in the specific application domain of projector-based systems, and potentially less expensive;
    • It does not require additional hardware (e.g., extra RGB or IR cameras), since depth estimation is performed in the same target visual spectrum;
    • The virtual projection paradigm can be tailored on the fly to the image content and is agnostic to dynamic objects and ego-motion.

πŸ–‹οΈ If you find this code useful in your research, please cite:

@InProceedings{Bartolomei_2023_ICCV,
    author    = {Bartolomei, Luca and Poggi, Matteo and Tosi, Fabio and Conti, Andrea and Mattoccia, Stefano},
    title     = {Active Stereo Without Pattern Projector},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {18470-18482}
}
@misc{bartolomei2024stereodepth,
      title={Stereo-Depth Fusion through Virtual Pattern Projection}, 
      author={Luca Bartolomei and Matteo Poggi and Fabio Tosi and Andrea Conti and Stefano Mattoccia},
      year={2024},
      eprint={2406.04345},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🎥 Watch Our Research Video!

Watch the video

📥 Pretrained Models

Here you can download the weights of the RAFT-Stereo and PSMNet architectures.

  • Vanilla Models: these models are pretrained on Sceneflow vanilla images and Middlebury vanilla images
    • PSMNet vanilla models: psmnet/sceneflow/psmnet.tar, psmnet/middlebury/psmnet.tar
    • RAFT-Stereo vanilla models (raft-stereo/sceneflow/raftstereo.pth and raft-stereo/middlebury/raftstereo.pth) are copies from the authors' drive
  • Fine-tuned Models: starting from the vanilla models, these models (*-vpp-ft.tar) are fine-tuned in the same domain but with virtually projected images
  • Models trained from scratch: these models (*-vpp-tr.tar) are trained from scratch using virtually projected images

To use these weights, please follow these steps:

  1. Install the gdown Python package: pip install gdown
  2. Download all weights from our drive: gdown --folder https://drive.google.com/drive/folders/1GqcY-Z-gtWHqDVMx-31uxrPzprM38UJl?usp=drive_link
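Once downloaded, the checkpoints can be quickly sanity-checked with PyTorch. This is a minimal sketch; the internal layout of the .tar checkpoints (e.g., a "state_dict" key) is an assumption and may differ:

import torch

# Hypothetical local paths mirroring the drive folder layout above
raft_weights = torch.load("raft-stereo/middlebury/raftstereo.pth", map_location="cpu")
psm_ckpt = torch.load("psmnet/middlebury/psmnet.tar", map_location="cpu")

# PSMNet-style .tar checkpoints often wrap the weights in a dictionary;
# the "state_dict" key below is an assumption, not a documented guarantee
if isinstance(psm_ckpt, dict) and "state_dict" in psm_ckpt:
    psm_weights = psm_ckpt["state_dict"]
else:
    psm_weights = psm_ckpt
print(f"RAFT-Stereo tensors: {len(raft_weights)}, PSMNet tensors: {len(psm_weights)}")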

πŸ“ Code

The Test section provides scripts to evaluate disparity estimation models on datasets like KITTI, Middlebury, and ETH3D. It helps assess the accuracy of the models and saves predicted disparity maps.

Please refer to each section for detailed instructions on setup and execution.

Warning:

  • Please be aware that we will not be releasing the training code for deep stereo models. The provided code focuses on evaluation and demonstration purposes only.
  • With the latest updates in PyTorch, slight variations in the quantitative results compared to the numbers reported in the paper may occur.

πŸ› οΈ Setup Instructions

  1. Dependencies: Ensure that you have installed all the necessary dependencies. The list of dependencies can be found in the ./requirements.txt file.
  2. Build rSGM:
  • First, initialize and update the git submodules: git submodule init; git submodule update
  • Go to ./thirdparty/stereo-vision/reconstruction/base/rSGM/
  • Build and install pyrSGM package: python setup.py build_ext --inplace install
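After building, a short import check can confirm that the extension is visible to Python. The import name below is taken from the package name above; if your build exposes a different module name, adjust accordingly:

# Verify that the pyrSGM extension built and installed correctly
import pyrSGM
print("pyrSGM loaded from", pyrSGM.__file__)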

💾 Datasets

We used seven datasets for training and evaluation.

Middlebury

Midd-14: We used the MiddEval3 training split for evaluation and fine-tuning purposes.

$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/submit3/zip/MiddEval3-data-F.zip
$ wget https://vision.middlebury.edu/stereo/submit3/zip/MiddEval3-GT0-F.zip
$ unzip \*.zip

After that, you will get a data structure as follows:

MiddEval3
├── TrainingF
│    ├── Adirondack
│    │    ├── im0.png
│    │    └── ...
│    ...
│    └── Vintage
│         └── ...
└── TestF
     └── ...

Midd-A: We used the Scenes2014 additional split for evaluation and grid-search purposes.

$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Backpack-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Bicycle1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Cable-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Classroom1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Couch-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Flowers-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Mask-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Shopvac-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sticks-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Storage-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sword1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sword2-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Umbrella-perfect.zip
$ unzip \*.zip

After that, you will get a data structure as follows:

middlebury2014
├── Backpack-perfect
│    ├── im0.png
│    └── ...
...
└── Umbrella-perfect
     └── ...

Midd-21: We used the Scenes2021 split for evaluation purposes.

$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/data/scenes2021/zip/all.zip
$ unzip all.zip
$ mv data/* .

After that, you will get a data structure as follows:

middlebury2021
├── artroom1
│    ├── im0.png
│    └── ...
...
└── traproom2
     └── ...

Note that additional datasets are available at the official website.

KITTI142

We derived our KITTI142 validation split from KITTI141 (adding frame 000124). You can download it from our drive using this script:

$ cd PATH_TO_DOWNLOAD
$ gdown --fuzzy https://drive.google.com/file/d/1A14EMqcGLDhH3nTHTVFpSP2P7We0SY-C/view?usp=drive_link
$ unzip kitti142.zip

After that, you will get a data structure as follows:

kitti142
├── image_2
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
├── image_3
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
├── lidar_disp_2
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
├── disp_occ
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
...
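For reference, KITTI ground-truth disparities are stored as 16-bit PNGs scaled by a factor of 256, with zero marking invalid pixels. Below is a minimal reader sketch; we assume lidar_disp_2 follows the same encoding as disp_occ:

import numpy as np
from PIL import Image

def read_kitti_disparity(path):
    # 16-bit PNG, disparity = pixel / 256, zero means no ground truth
    disp = np.asarray(Image.open(path), dtype=np.float32) / 256.0
    return disp, disp > 0

disp, valid = read_kitti_disparity("kitti142/disp_occ/000002_10.png")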

Note that additional information is available at the official website.

ETH3D

You can download ETH3D dataset following this script:

$ cd PATH_TO_DOWNLOAD
$ wget https://www.eth3d.net/data/two_view_training.7z
$ wget https://www.eth3d.net/data/two_view_training_gt.7z
$ p7zip -d *.7z

After that, you will get a data structure as follows:

eth3d
├── delivery_area_1l
│    ├── im0.png
│    └── ...
...
└── terrains_2s
     └── ...

Note that the script erases 7z files. Further details are available at the official website.

DSEC

We provide preprocessed DSEC testing splits Day, Afternoon and Night:

$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1etkvdntDfMdwvx_NP0_QJcUcsogLXYK7?usp=drive_link
$ cd dsec
$ unzip -o \*.zip
$ cd ..
$ mv dsec/* .
$ rmdir dsec

After that, you will get a data structure as follows:

dsec
├── afternoon
│    ├── left
│    │    ├── 000000.png
│    │    ...
│    └── ...
...
└── night
     └── ...

We extracted the splits using only data from the official website, relying on Faster-LIO to de-skew raw LiDAR scans and on Open3D to perform ICP registration.

M3ED

We provide preprocessed M3ED testing splits Outdoor Day, Outdoor Night and Indoor:

$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1n-7H11ZfbPcR9_F0Ri2CcTJS2WWQlfCo?usp=drive_link
$ cd m3ed
$ unzip -o \*.zip
$ cd ..
$ mv m3ed/* .
$ rmdir m3ed

After that, you will get a data structure as follows:

m3ed
├── indoor
│    ├── left
│    │    ├── 000000.png
│    │    ...
│    └── ...
...
└── night
     └── ...

We extracted the splits using only data from the official website.

M3ED Active

We provide preprocessed M3ED Active testing splits Passive and Active:

$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1fv6f2mQUPW8MwSsGy1f0dEHOZCS4sk2-?usp=drive_link
$ cd m3ed_active
$ unzip -o \*.zip
$ cd ..
$ mv m3ed_active/* .
$ rmdir m3ed_active

After that, you will get a data structure as follows:

m3ed_active
├── passive
│    ├── left
│    │    ├── 000000.png
│    │    ...
│    └── ...
└── active
     └── ...

We extracted the splits using only data from the official website.

SIMSTEREO

You can download the SIMSTEREO dataset here.

After that, you will get a data structure as follows:

simstereo
├── test
│    ├── nirColormanaged
│    │    ├── abstract_bowls_1_left.jpg
│    │    ├── abstract_bowls_1_right.jpg
│    │    ...
│    ├── rgbColormanaged
│    │    ├── abstract_bowls_1_left.jpg
│    │    ├── abstract_bowls_1_right.jpg
│    │    ...
│    └── pfmDisp
│         ├── abstract_bowls_1_left.pfm
│         ├── abstract_bowls_1_right.pfm
│         ...
└── training
     └── ...
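The pfmDisp ground-truth files are stored in the PFM format. Below is a minimal, self-contained PFM reader sketch (not the repository's own loader):

import re
import numpy as np

def read_pfm(path):
    # Minimal PFM parser: header, dimensions, scale/endianness, raw floats
    with open(path, "rb") as f:
        header = f.readline().decode().rstrip()  # "PF" (color) or "Pf" (grayscale)
        channels = 3 if header == "PF" else 1
        w, h = map(int, re.findall(r"\d+", f.readline().decode()))
        scale = float(f.readline().decode().rstrip())
        endian = "<" if scale < 0 else ">"       # negative scale = little-endian
        data = np.fromfile(f, endian + "f").reshape(h, w, channels).squeeze()
    return np.flipud(data)                       # PFM rows run bottom-to-top

gt = read_pfm("simstereo/test/pfmDisp/abstract_bowls_1_left.pfm")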

🚀 Test

This code snippet allows you to evaluate the disparity maps on various datasets, including KITTI (142 split), Middlebury (Training, Additional, 2021), ETH3D, DSEC, M3ED, and SIMSTEREO. By executing the provided script, you can assess the accuracy of disparity estimation models on these datasets.

To run the test.py script with the correct arguments, follow the instructions below:

  1. Run the test:

    • Open a terminal or command prompt.
    • Navigate to the directory containing the test.py script.
  2. Execute the command: Run the following command, replacing the placeholders with the actual values for your images and model:

    # Parameters to reproduce Active Stereo Without Pattern Projector (ICCV 2023)
    CUDA_VISIBLE_DEVICES=0 python test.py  --datapath <path_to_dataset> --dataset <dataset_type> --stereomodel <model_name> \
     --loadstereomodel <path_to_pretrained_model> --maxdisp 192 \
     --vpp --outdir <save_dmap_dir> --wsize 3 --guideperc 0.05 --blending 0.4 --iscale <input_image_scale> \
     --maskocc
    # Parameters to reproduce Stereo-Depth Fusion through Virtual Pattern Projection (Journal Extension)
    CUDA_VISIBLE_DEVICES=0 python test.py  --datapath <path_to_dataset> --dataset <dataset_type> --stereomodel <model_name> \
     --loadstereomodel <path_to_pretrained_model> --maxdisp 192 \
     --vpp --outdir <save_dmap_dir> --wsize 7 --guideperc 0.05 --blending 0.4 --iscale <input_image_scale> \
     --maskocc --bilateralpatch --bilateral_spatial_variance 1 --bilateral_color_variance 2 --bilateral_threshold 0.001 --rsgm_subpixel

Replace the placeholders (<path_to_dataset>, <dataset_type>, <model_name>, <path_to_pretrained_model>, <save_dmap_dir>, <input_image_scale>) with the actual values for your setup.

The available arguments are:

  • --maxdisp: Maximum disparity range for PSMNet and rSGM (default 192).
  • --stereomodel: Stereo model type. Options: raft-stereo, psmnet, rsgm
  • --normalize: Normalize RAFT-Stereo input between [-1,1] instead of [0,1] (Only for official weights)
  • --datapath: Specify the dataset path.
  • --dataset: Specify dataset type. Options: kitti_stereo142, middlebury_add, middlebury2021, middlebury, eth3d, simstereo, simstereoir, dsec, m3ed
  • --outdir: Output directory to save the disparity maps.
  • --loadstereomodel: Path to the pretrained model file.
  • --iscale: Rescale input images before applying VPP and stereo matching. The original size is restored before evaluation. Example: --iscale 1 is full resolution, --iscale 2 is half resolution.
  • --guideperc: Simulate depth seeds using a certain percentage of randomly sampled GT points. Valid only if raw depth seeds do not exist (a sampling sketch is shown after this list).
  • --vpp: Apply virtual patterns to stereo images
  • --colormethod: Virtual patterning strategy. Options: rnd (i.e., random strategy) and maxDistance (i.e., histogram-based strategy)
  • --uniform_color: Uniform patch strategy
  • --wsize: Pattern patch size (e.g., 1, 3, 5, 7, ...)
  • --wsizeAgg_x: Histogram-based search window width
  • --wsizeAgg_y: Histogram-based search window height
  • --blending: Alpha-blending between the original images and the virtual pattern
  • --maskocc: Use proposed occlusion handling
  • --discard_occ: Use occlusion point discard strategy
  • --guided: Apply Guided Stereo Matching strategy
  • --bilateralpatch: Use adaptive patch based on bilateral filter
  • --bilateral_spatial_variance: Spatial variance of the adaptive patch
  • --bilateral_color_variance: Color variance of the adaptive patch
  • --bilateral_threshold: Adaptive patch classification threshold
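As referenced in the --guideperc entry above, sparse depth seeds can be simulated by randomly retaining a fraction of valid ground-truth points. A minimal sketch of that sampling (our naming, not the repository's):

import numpy as np

def sample_guide(gt_disp, guideperc=0.05, rng=None):
    # Keep roughly a guideperc fraction of the valid GT disparities as simulated seeds
    rng = np.random.default_rng() if rng is None else rng
    keep = (gt_disp > 0) & (rng.random(gt_disp.shape) < guideperc)
    return np.where(keep, gt_disp, 0.0)

The resulting sparse map plays the role of the depth seeds that drive the virtual projection sketched in the Introduction.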

For more details, please refer to the test.py script.

🎨 Qualitative Results

In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.


Performance against competitors. VPP generally reaches near-optimal performance with a meagre 1% seed density and, except for a few cases in the -tr configurations at higher densities, achieves much lower error rates.


VPP with off-the-shelf networks. We collect the results yielded by VPP applied to several off-the-shelf stereo models, running the weights provided by the authors. Again, with rare exceptions, VPP markedly boosts the accuracy of any model, whether trained on synthetic or real data.


Qualitative comparison on KITTI (top) and Middlebury (bottom). From left to right: vanilla left images and disparity maps from the PSMNet model; left images enhanced by our virtual projection and disparity maps from the vanilla PSMNet model; and (rightmost) from the VPP fine-tuned PSMNet model.


Fine-details preservation. We can appreciate how our virtual pattern greatly enhances the quality of the disparity maps without introducing noticeable artifacts around thin structures, despite the pattern being applied on patches.

βœ‰οΈ Contacts

For questions, please send an email to [email protected]

πŸ™ Acknowledgements

We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which we have utilized in our work:

  • PSMNet, RAFT-Stereo, and rSGM, whose code has been instrumental in our stereo matching experiments.

We also deeply appreciate the authors of the competing research papers for providing code and model weights, which greatly aided accurate comparisons.

Patent pending - University of Bologna
