
PyTorch code for the ICCV'23 paper NEO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes.

Home Page: https://zubair-irshad.github.io/projects/neo360.html

License: Other

Languages: Python 100.0%
Topics: 3d, 3d-vision, artificial-intelligence, autonomous-driving, autonomous-vehicles, computer-vision, convolutional-neural-networks, deep-learning, differentiable-rendering, implicit-neural-representation

neo-360's Introduction

NEO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

This repository is the PyTorch implementation of our ICCV 2023 paper.

NEO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

Muhammad Zubair Irshad · Sergey Zakharov · Katherine Liu · Vitor Guizilini · Thomas Kollar · Zsolt Kira · Rares Ambrus
International Conference on Computer Vision (ICCV), 2023

Georgia Institute of Technology   |   Toyota Research Institute

Citation

If you find this repository or our NERDS 360 dataset useful, please consider citing:

@inproceedings{irshad2023neo360,
  title={NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes},
  author={Muhammad Zubair Irshad and Sergey Zakharov and Katherine Liu and Vitor Guizilini and Thomas Kollar and Adrien Gaidon and Zsolt Kira and Rares Ambrus},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2023},
  url={https://arxiv.org/abs/2308.12967},
}

🌇 Environment

Create a Python 3.7 conda environment and install the requirements:

cd neo-360   # root of the cloned NeO-360 repository
conda create -n neo360 python=3.7
conda activate neo360
pip install --upgrade pip
pip install -r requirements.txt
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
export neo360_rootdir=$PWD

The code was built and tested with CUDA 11.3.
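As a quick, optional sanity check (not part of the original setup instructions), you can confirm that the pinned CUDA 11.3 PyTorch build is active and sees your GPU:

```python
# Optional sanity check: confirm the pinned PyTorch/CUDA build is usable.
import torch

print("torch:", torch.__version__)               # expect 1.11.0+cu113
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```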

⛳ Dataset

NERDS 360 Multi-View dataset for Outdoor Scenes

NeRDS 360: "NeRF for Reconstruction, Decomposition and Scene Synthesis of 360° Outdoor Scenes" is a dataset comprising 75 unbounded scenes with full multi-view annotations and diverse scenes for generalizable NeRF training and evaluation.

Download the dataset:

Extract the data under the data directory, or provide a symlink under the project's data directory. The directory structure should look like this: $neo360_rootdir/data/PDMultiObjv6

COLMAP poses can be used for techniques such as 3D Gaussian Splatting. Please see our overfitting experiments for more details.

Visualizing the dataset:

To plot accumulated point clouds, multi-view camera annotations, and bounding box annotations as shown in the visualization below, run the following command. This visualization script is adapted from NeRF++.

Zoom in on the visualization to see the objects. Red cameras are source-view cameras; green cameras denote evaluation-view cameras.

python visualize/visualize_nerds360.py --base_dir PDMultiObjv6/train/SF_GrantAndCalifornia10

Semantic Labels

Semantic labels for our dataset are defined in utils/semantic_labels.py. For instance, car corresponds to id 5, road to id 24, and so on.
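As an illustration of how these labels can be used, here is a minimal sketch that extracts a per-class mask from one of the semantic segmentation PNGs. The file path is hypothetical, and only the car (5) and road (24) ids are taken from the text above; consult utils/semantic_labels.py for the full mapping.

```python
# Minimal sketch: build a boolean mask for one semantic class from a PNG.
# The path below is a hypothetical example; see utils/semantic_labels.py
# for the full id-to-class mapping (e.g., car = 5, road = 24).
import numpy as np
from PIL import Image

CAR_ID = 5
seg = np.array(Image.open("semantic_segmentation_2d/example_frame.png"))

car_mask = (seg == CAR_ID)
print("car pixels:", int(car_mask.sum()))
```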

You could also run the following to just visualize the poses in a unit sphere after normalization:

python visualize/visualize_poses.py --base_dir PDMultiObjv6/train/SF_GrantAndCalifornia10

🔖 Dataloaders

We provide two convenient PyTorch dataloaders: (a) single-scene overfitting, i.e., settings like those used to test MipNeRF-360, and (b) generalizable evaluation, i.e., the few-shot setting introduced in our paper. Each dataloader includes a convenient read_poses function; use it to see how we load poses and the corresponding images, so that you can use our NERDS 360 dataset with any NeRF implementation.

a. Our dataloader for single-scene overfitting is provided in datasets/nerds360.py. It directly outputs RGB/pose pairs in the same format as NeRF/MipNeRF-360 and can be used with any NeRF implementation, e.g., nerf-studio, nerf-factory, or nerf-pl.

b. Our dataloader for generalizable training is provided in datasets/nerds360_ae.py, where ae denotes auto-encoder-style training. This dataloader is used for the few-shot setting proposed in our NeO 360 paper. During every training iteration, we randomly select 3 source views and 1 target view, and sample 1000 rays from the target view for decoding the radiance field (see the sketch below). This is the same dataloader used for the training runs described later in this README.
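For intuition, the sampling pattern described in (b) can be sketched as follows. This is an illustrative stand-in, not the repository's datasets/nerds360_ae.py; image and pose loading are stubbed out and all names are hypothetical.

```python
# Illustrative sketch of the few-shot sampling described above:
# 3 random source views + 1 target view, with 1000 rays sampled from the target.
# NOT the repository's nerds360_ae dataloader; loading is stubbed out.
import random
import torch
from torch.utils.data import Dataset


class FewShotSceneDataset(Dataset):
    def __init__(self, images, poses, num_src=3, rays_per_target=1000):
        self.images = images              # list of (H, W, 3) float tensors
        self.poses = poses                # list of (4, 4) camera-to-world matrices
        self.num_src = num_src
        self.rays_per_target = rays_per_target

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Randomly pick 3 source views and 1 target view.
        views = random.sample(range(len(self.images)), self.num_src + 1)
        src_idx, tgt_idx = views[:-1], views[-1]

        tgt_img = self.images[tgt_idx]
        h, w, _ = tgt_img.shape

        # Sample 1000 random pixels (rays) from the target view.
        ray_ids = torch.randint(0, h * w, (self.rays_per_target,))
        rgb_gt = tgt_img.reshape(-1, 3)[ray_ids]

        return {
            "src_images": torch.stack([self.images[i] for i in src_idx]),
            "src_poses": torch.stack([self.poses[i] for i in src_idx]),
            "tgt_pose": self.poses[tgt_idx],
            "ray_ids": ray_ids,
            "rgb_gt": rgb_gt,
        }
```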

Note: Our NeRDS 360 dataset also provides depth maps, NOCS maps, instance segmentation, semantic segmentation, and 3D bounding box annotations. None of these annotations are used during training or inference; we only use RGB images. However, these annotations can be used for other computer vision tasks to push the state of the art on unbounded outdoor tasks such as instance segmentation. All of the other annotations are easy to load and are provided as .png files.

Additionally, to output a convenient transforms.json file for a scene in the original NeRF blender format, run the following:

python convert_to_nerf_blender.py --base_dir PDMultiObjv6/train/SF_GrantAndCalifornia10

Note that creating a transforms.json file is not required for running our codebase; the script is merely a convenience for working with other NeRF architectures.
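For reference, the blender format stores a camera_angle_x field and a list of frames, each with a file_path and a 4x4 transform_matrix. Below is a minimal sketch for reading such a file back; the exact fields written by convert_to_nerf_blender.py may differ slightly.

```python
# Minimal sketch: read a NeRF blender-format transforms.json and recover
# per-frame camera-to-world matrices. Field names follow the standard format;
# the exact output of convert_to_nerf_blender.py may differ slightly.
import json
import numpy as np

with open("transforms.json") as f:
    meta = json.load(f)

fov_x = meta["camera_angle_x"]                   # horizontal field of view (radians)
for frame in meta["frames"]:
    c2w = np.array(frame["transform_matrix"])    # (4, 4) camera-to-world
    print(frame["file_path"], c2w[:3, 3])        # camera center
```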

✨ Inference

Download the validation split from here. These scenes were never seen during training and are used for visualization with a pretrained model.

Download the pretrained checkpoint from here.

Extract them under the project folder, so that the directory structure contains data and ckpts.

Run the following script for visualization, i.e., 360° rendering from just 3 or 5 source views given as input:

python run.py --dataset_name nerds360_ae --exp_type triplanar_nocs_fusion_conv_scene --exp_name multi_map_tp_CONV_scene --encoder_type resnet --batch_size 1 --img_wh 320 240 --eval_mode vis_only --render_name 5viewtest_novelobj30_SF0_360_LPIPS --ckpt_path finetune_lpips_epoch=30.ckpt --root_dir $neo360_rootdir/data/neo360_valsplit/test_novelobj

This produces renderings like those in the last column shown below. 10 of the 100 rendered views, randomly sampled, are shown in the second figure below:

For evaluation, i.e., logging PSNR, LPIPS, and SSIM metrics as reported in the paper, run the following script:

python run.py --dataset_name nerds360_ae --exp_type triplanar_nocs_fusion_conv_scene --exp_name multi_map_tp_CONV_scene --encoder_type resnet --batch_size 1 --img_wh 320 240 --eval_mode full_eval --render_name 5viewtest_novelobj30_SF0_360_LPIPS --ckpt_path finetune_lpips_epoch=30.ckpt --root_dir $neo360_rootdir/data/neo360_valsplit/test_novelobj

The current script evaluates the scenes one by one. Note that the 3 or 5 source views are not part of the 100 rendered 360° views; they are chosen randomly from the upper hemisphere (currently hardcoded).
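If you want to compute the same metrics on your own renderings outside of run.py, the sketch below uses skimage and the lpips package; these are common choices for NeRF evaluation, not necessarily the exact implementations used in this codebase.

```python
# Minimal sketch: PSNR / SSIM / LPIPS between a rendering and ground truth.
# Uses skimage >= 0.19 and the `lpips` package; NOT the codebase's eval code.
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="vgg")  # VGG backbone, commonly used for NeRF eval


def compute_metrics(pred, gt):
    """pred, gt: float numpy arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)

    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_tensor(pred), to_tensor(gt)).item()
    return psnr, ssim, lp
```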

If your checkpoint was finetuned with the additional LPIPS loss, the render name must contain LPIPS in order to load the LPIPS weights during rendering. See our model and code for further clarification.

📉 Generalizable Training

We train on the full NERDS 360 dataset in two stages. First, we train using a photometric loss (i.e., MSE) plus an auxiliary distortion loss for 30-50 epochs. We then finetune with an additional LPIPS loss for a few epochs to improve the visual fidelity of the results (see the loss sketch below).
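Conceptually, the two-stage objective can be sketched as below. This is illustrative only, not the repository's training code, and the lambda weights are hypothetical placeholders.

```python
# Illustrative sketch of the two-stage objective described above.
# NOT the repository's training code; the lambda weights are placeholders.
import torch.nn.functional as F
import lpips

lpips_fn = lpips.LPIPS(net="vgg")


def training_loss(pred_rgb, gt_rgb, distortion,
                  pred_img=None, gt_img=None, finetune_lpips=False,
                  lambda_dist=0.01, lambda_lpips=0.1):
    """pred_rgb, gt_rgb: (N, 3) sampled ray colors; distortion: scalar aux term;
    pred_img, gt_img: (1, 3, H, W) images in [-1, 1] for the LPIPS stage."""
    # Stage 1: photometric (MSE) loss + auxiliary distortion loss.
    loss = F.mse_loss(pred_rgb, gt_rgb) + lambda_dist * distortion

    # Stage 2: add a perceptual LPIPS term when finetuning.
    if finetune_lpips:
        loss = loss + lambda_lpips * lpips_fn(pred_img, gt_img).mean()
    return loss
```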

All our experiments were performed on 8 Nvidia A100 GPUs. Please refer to our paper for more details.

For stage 1 training, please run:

python run.py --dataset_name nerds360_ae --root_dir $neo360_rootdir/data/PDMultiObjv6/train/ --exp_type triplanar_nocs_fusion_conv_scene --exp_name multi_map_tp_CONV_scene --encoder_type resnet --batch_size 1 --img_wh 320 240 --num_gpus 8

For stage 2 finetuning with the additional LPIPS loss, please specify the checkpoint from which to finetune and add the finetune_lpips flag, as in the command below:

python run.py --dataset_name nerds360_ae --root_dir $neo360_rootdir/data/PDMultiObjv6/train --exp_type triplanar_nocs_fusion_conv_scene --exp_name multi_map_tp_CONV_scene --encoder_type resnet --batch_size 1 --img_wh 320 240 --num_gpus 8 --ckpt_path epoch=29.ckpt --finetune_lpips

At the end of a training run, checkpoints are stored under the ckpts/$exp_name directory, with stage 1 checkpoints labelled epoch=aa.ckpt and finetuning checkpoints labelled finetune_lpips_epoch=aa.ckpt.

We also provide an is_optimize flag to finetune on the few-shot source images of a new domain. Please refer to our paper for more details on what this flag does and whether it is useful for your case.

We also provide a home-grown implementation of PixelNeRF built on top of NeRF-Factory and PyTorch Lightning. If you use either of these, you might find it helpful. Just use exp_type pixelnerf and our generalizable dataset nerds360_ae to run PixelNeRF training and evaluation with the scripts mentioned above :)

📊 Overfitting Training Runs

Overfitting NeRFs

While our proposed technique is a generalizable method that works in a few-shot setting, for ease of reproducibility and to push the state of the art on single-scene novel view synthesis of unbounded scenes, we provide scripts to overfit to single scenes given many images. We provide NeRF and MipNeRF-360 baselines from NeRF-Factory with our newly proposed NeRDS 360 dataset.

To overfit to a single scene using vanilla NeRF on the NERDS 360 dataset, simply run:

python run.py --dataset_name nerds360 --root_dir $neo360_rootdir/data/PD_v6_test/test_novel_objs/SF_GrantAndCalifornia6 --exp_type vanilla --exp_name overfitting_test_vanilla_2 --img_wh 320 240 --num_gpus 7

For evaluation, run:

python run.py --dataset_name nerds360 --root_dir $neo360_rootdir/data/PD_v6_test/test_novel_objs/SF_GrantAndCalifornia6 --exp_type vanilla --exp_name overfitting_test_vanilla_2 --img_wh 320 240 --num_gpus 1 --eval_mode vis_only --render_name vanilla_eval

You'll see results as below. We achieve a test-set PSNR of 24.75 and SSIM of 0.78 for this scene.


To overfit to a single scene using MipNeRF-360 on the NERDS 360 dataset, simply run:

python run.py --dataset_name nerds360 --root_dir $neo360_rootdir/data/PD_v6_test/test_novel_objs/SF_GrantAndCalifornia6 --exp_type mipnerf360 --exp_name overfitting_test_mipnerf360_2 --img_wh 320 240 --num_gpus 7

Overfitting Gaussian Splatting

You can use the convert_to_nerf_blender.py script, as shown above, to create a transforms.json file in NeRF's blender format for training Gaussian Splatting. One could also run COLMAP on our provided images and then train Gaussian Splatting using nerfstudio. We benchmarked overfitting Gaussian Splatting on the NERDS 360 dataset and achieved a PSNR of around 31 by training for just a few minutes on a single GPU, producing the output below.
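As a rough sketch of the nerfstudio route mentioned above (the exact commands depend on your nerfstudio version and are not part of this repository; in recent releases, splatfacto is nerfstudio's Gaussian Splatting method):

ns-process-data images --data <path to scene RGB images> --output-dir <processed output dir>
ns-train splatfacto --data <processed output dir>

The first command runs COLMAP on the images; alternatively, the transforms.json produced by convert_to_nerf_blender.py can serve as the pose source, as described above.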

📌 FAQ

  1. The compute requirements are very high. How do I run NeO 360 on low-memory GPUs?

Please see this thread on things to try to reduce compute load.

  2. Can I use the NERDS 360 dataset with my own NeRF implementation or other architectures?

Absolutely. You can use NERDS 360 with Gaussian Splatting or any other NeRF implementation in both the overfitting (i.e., many-views) and few-shot (i.e., few-views) scenarios. Please see Dataloaders and use any of our released dataloaders with your implementation. Note that a wrapper might have to be written on top of our dataloader to try Gaussian Splatting with our dataset, since it requires full images and not just rays.

Acknowledgments

This code is built upon the implementations of nerf-factory, NeRF++, and PixelNeRF, with the distortion loss and unbounded scene contraction adopted from MipNeRF-360. Kudos to all the authors for their great work and for releasing their code. Thanks also to the original NeRF implementation and the PyTorch implementation nerf_pl for additional inspiration during this project.

Licenses

This repository and the NERDS 360 dataset are released under the CC BY-NC 4.0 license.

neo-360's People

Contributors

eltociear, zubair-irshad


neo-360's Issues

Real world results from KITTI-360 evaluation

Hello,

Thank you for your work on the paper and for publishing your code. I want to ask whether there is already a pre-trained model for KITTI-360 or a pipeline set up for it. I am currently working on setting up this dataset to obtain occupancy (density field) results from the input-view setup in the dataset. Since I could see the KITTI-360 experiment in Fig. 14 of the additional qualitative results in the paper, I hope the pre-trained model is available.

Would you share some insights regarding the training setup for KITTI-360? :)

Thank you!

Dataset License?

Congrats on the remarkable work. The repo mentions MIT License. Is the dataset also published under the same?

Why does the tri-plane matter, instead of a 3D feature grid?

Hello, NeO-360 is a good paper, but I still have a question. If the tri-plane is removed from your pipeline and the 3D sample point is projected directly into the 3D feature grid, without the 3D CNN and tri-plane, will the performance of NeO-360 degrade? In your ablation studies I see the importance of the 3D feature grid but not of the tri-plane. Could you explain why the tri-plane matters?

Axes convention

Hi again!

Could you share the axes convention you used for the world and the cameras?

For example:
- the x-axis is pointing to the camera's right
- the y-axis is pointing to the camera's forward
- the z-axis is pointing to the camera's upward

Have a good day!

Cameras in ground or objects

Hello !

Just to let you and future users know that sometimes (very rarely) a camera may be partially inside the ground or an object (e.g., a trash bin).

Example with PDMultiObjv6/train/SF_VanNessAveAndTurkSt8/train/instance_segmentation_2d/midsize_sedan_04_emg_01-006.png


Intrinsics missing

Dear authors,

Thank you very much for sharing your dataset. It seems to me that the intrinsics are missing?

Segmentation classes

Hey,

Where can I find the correspondence between semantic indices and classes? It looks like 5 corresponds to 'car'; what about the others?
Thank you in advance,

How to handle multi-view images?

Thanks for your work! I have a question: in the paper, a single image generates the three planes, and at inference time the residual feature fr can be obtained from the training image. But when I have multi-view images, how do I use them to generate the three planes? And when I want to render a novel view, how do I get fr?

Code Release of Neo-360

Hi! This is great work! I have recently been investigating efficient generalizable NeRFs. My research is highly related to NeO-360; I really appreciate this work and would like to build my research on it! When do you plan to release the full code?
