
zeronvs's Introduction

ZeroNVS

This is the official code release for ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image.

teaser image

What is in this repository: 3D SDS distillation code, evaluation code, trained models

In this repository, we currently provide code to reproduce our main evaluations and to run ZeroNVS to distill NeRFs from your own images. This includes scripts to reproduce the main metrics on the DTU and Mip-NeRF 360 datasets.

How do I train my own diffusion models?

Check out the companion repository, https://github.com/kylesargent/zeronvs_diffusion.

Acknowledgement

This codebase builds heavily on existing codebases for 3D-aware diffusion model training and 3D SDS distillation, namely Zero-1-to-3 and threestudio. If you use ZeroNVS, please consider also citing these great contributions.

Requirements

The code has been tested on an A100 GPU with 40GB of memory.

To get the code:

git clone https://github.com/kylesargent/zeronvs.git
cd zeronvs

To set up the environment, use the following sequence of commands. The exact setup that works for you may be platform-dependent. Note: it is normal for tiny-cuda-nn to take a long time to install.

conda create -n zeronvs python=3.8 pip
conda activate zeronvs

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

pip install -r requirements-zeronvs.txt
pip install nerfacc -f https://nerfacc-bucket.s3.us-west-2.amazonaws.com/whl/torch-2.0.0_cu118.html

Finally, be sure to initialize and pull the code in the zeronvs_diffusion submodule.

cd zeronvs_diffusion
git submodule init
git submodule update

cd zero123
pip install -e .
cd ..

cd ..
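
As a quick sanity check (not part of the official setup), the following Python snippet should run without errors inside the zeronvs environment once everything above is installed:

import torch
import tinycudann  # bindings installed from tiny-cuda-nn above; slow to build, fast to import
import nerfacc

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("tiny-cuda-nn and nerfacc imported successfully")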

Data and models

Since we have experimented with a variety of datasets in ZeroNVS, the codebase consumes a few different data formats.

To download all the relevant data and models, you can run the following commands within the zeronvs conda environment:

gdown --fuzzy https://drive.google.com/file/d/1q0oMpp2Vy09-0LA-JXpo_ZoX2PH5j8oP/view?usp=sharing
gdown --fuzzy https://drive.google.com/file/d/1aTSmJa8Oo2qCc2Ce2kT90MHEA6UTSBKj/view?usp=drive_link
gdown --fuzzy https://drive.google.com/file/d/17WEMfs2HABJcdf4JmuIM3ti0uz37lSZg/view?usp=sharing

unzip dtu_dataset.zip

MipNeRF360 dataset

You can download it here. Be sure to set the appropriate path in resources.py.

DTU dataset

Download it here (hosted by the PixelNeRF authors). Be sure to unzip it and then set the appropriate path in resources.py.
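
The exact contents of resources.py depend on the release, so the following is only a hypothetical sketch of the kind of edit meant here; the variable names and paths are placeholders, not the real ones (check the actual file):

# resources.py (hypothetical variable names and placeholder paths)
DTU_PATH = "/data/dtu_dataset"          # unzipped DTU download hosted by the PixelNeRF authors
MIPNERF360_PATH = "/data/mipnerf360"    # downloaded Mip-NeRF 360 scenes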

Your own images

Store them as 256x256 PNG images and pass them to launch_inference.sh (details below).
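
If your source image has a different size or format, a small preprocessing step along these lines (using Pillow; the file names are placeholders) produces the expected 256x256 PNG input:

from PIL import Image

img = Image.open("my_photo.jpg").convert("RGB")
# Center-crop to a square first so the resize does not distort the aspect ratio.
side = min(img.size)
left, top = (img.width - side) // 2, (img.height - side) // 2
img = img.crop((left, top, left + side, top + side)).resize((256, 256), Image.LANCZOS)
img.save("my_photo_256x256.png")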

Models

We release our main model, trained with our $\mathbf{M}_{\mathrm{6DoF+1,~viewer}}$ parameterization on CO3D, RealEstate10K, and ACID. You can download it here. We use this one model for all our main results.

Inference

Evaluation is performed by distilling a NeRF for each of the scenes in the dataset. DTU has 15 scenes and the Mip-NeRF 360 dataset has 7 scenes. Since NeRF distillation takes roughly 3 hours per scene, running the full evaluation can take quite some time, especially if you only have one GPU.

Note that you can still achieve good performance with much faster configuration options, for instance a reduced resolution, batch size, or number of training steps, or some combination of these. The code as-is is intended to reproduce the results from the paper.

After downloading the data and models, you can run the evaluations via either launch_eval_dtu.sh or launch_eval_mipnerf360.sh. The metrics for each scene will be saved in metrics.json files, which you must average to get the final performance.
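
The averaging is left to you; a short script along the following lines can do it, though the glob pattern and the key names are assumptions about where the per-scene metrics.json files end up and what they contain, so adjust them to your output layout:

import glob
import json

import numpy as np

# Assumed output layout: one metrics.json per distilled scene somewhere under outputs/.
paths = glob.glob("outputs/**/metrics.json", recursive=True)
metrics = []
for p in paths:
    with open(p) as f:
        metrics.append(json.load(f))
for key in ("ssim", "psnr", "lpips"):
    values = [m[key] for m in metrics if key in m]
    print(f"{key}: mean {np.mean(values):.4f} over {len(values)} scenes")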

We provide the expected performance for individual scenes in the tables below. Note that there is some randomness inherent in SDS distillation, so you may not get exactly these numbers (though the performance should be quite close, especially on average).

DTU (expected performance)

ssim    psnr     lpips   scene_uid  manual_gt_to_pred_scale
0.6094  13.2329  0.2988  8          1.2
0.1739   8.4278  0.5783  21         1.4
0.6311  14.1864  0.2332  30         1.5
0.2992   8.9569  0.5117  31         1.4
0.3862  14.0490  0.3611  34         1.4
0.3495  12.6771  0.4659  38         1.3
0.4612  12.2447  0.3729  40         1.2
0.4657  12.5998  0.3794  41         1.3
0.3690  11.2410  0.4441  45         1.4
0.4456  17.0177  0.4322  55         1.2
0.5724  12.6056  0.2639  63         1.5
0.5384  12.1564  0.2725  82         1.5
0.5434  16.0902  0.3811  103        1.5
0.6353  19.5588  0.3490  110        1.3
0.5529  18.2336  0.3613  114        1.3

Mip-NeRF 360 (expected performance)

ssim    psnr     lpips   scene_uid  manual_gt_to_pred_scale
0.1707  13.1840  0.6536  bicycle    1.0
0.3164  13.1137  0.6122  bonsai     1.0
0.2473  12.2189  0.6823  counter    0.9
0.2070  15.2817  0.5366  garden     1.0
0.2540  13.2983  0.6245  kitchen    0.9
0.3431  11.8591  0.5928  room       0.9
0.1396  13.1240  0.6717  stump      1.1

Running on your own images

Use the script launch_inference.sh. You will need to specify the image path, field-of-view, camera elevation, and content scale. These don't need to be exact, but badly wrong values will cause convergence failure.
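
If you are unsure about the field of view, it can be derived from the focal length (in pixels) via the standard pinhole relation; the helper below is purely illustrative and not part of the released scripts. For a complete set of configuration values, see the launch.py invocation in the "Help for running ZeroNVS on custom images" issue further down.

import math

def fovy_deg(focal_px: float, height_px: int = 256) -> float:
    """Vertical field of view in degrees for a pinhole camera."""
    return math.degrees(2.0 * math.atan(0.5 * height_px / focal_px))

# With a 256-pixel-tall image, a focal length of ~260 px gives roughly the
# 52.55-degree field of view used in the example invocation below.
print(fovy_deg(focal_px=260.0))  # ~52.4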

Citation

If you use ZeroNVS, please cite via:

@article{zeronvs,
    author  = {Sargent, Kyle and Li, Zizhang and Shah, Tanmay and Herrmann, Charles and Yu, Hong-Xing and Zhang, Yunzhi and Chan, Eric Ryan and Lagun, Dmitry and Fei-Fei, Li and Sun, Deqing and Wu, Jiajun},
    title   = {{ZeroNVS}: Zero-Shot 360-Degree View Synthesis from a Single Real Image},
    journal = {arXiv preprint arXiv:2310.17994},
    year    = {2023}
}

zeronvs's Issues

360 image?

Hi! Thank you for making this project available! Amazing work.

Would it be possible to extract a 360-degree image from the result? E.g., an environment HDR?

Thanks!

Help for running ZeroNVS on custom images

Thanks for your contribution! For evaluation purposes, it would help a lot if you could add a small example of how to run launch_inference.sh on new images. I tried to run ZeroNVS on the following image (resized to a 256x256 PNG):

(attached image: 2008_001546_256x256.png)

But the output is completely wrong and the program eventually crashes because no mesh could be reconstructed.

(attached video: it0-test.mp4)

I launched launch.py in the following way (NOTE I replaced --train with --export):

python launch.py --config configs/zero123_scene.yaml --export --gpu 0 \
    system.guidance.cond_image_path=/home/user/Documents/projects/zeronvs/test_data/2008_001546_256x256.png \
    data.image_path=/home/user/Documents/projects/zeronvs/test_data/2008_001546_256x256.png \
    system.guidance.pretrained_model_name_or_path=zeronvs.ckpt \
    system.guidance.pretrained_config=zeronvs_config.yaml \
    data.view_synthesis=null \
    data.default_elevation_deg=0 \
    data.default_fovy_deg=52.55 \
    data.random_camera.fovy_range="[52.55,52.55]" \
    data.random_camera.eval_fovy_deg=52.55 \
    system.loss.lambda_opaque=0.0 \
    system.background.color='[0.5,0.5,0.5]' \
    system.background.random_aug=true \
    system.background.random_aug_prob=1.0 \
    system.guidance.guidance_scale=9.5 \
    system.renderer.near_plane=0.5 \
    system.renderer.far_plane=1000.0 \
    system.guidance.precomputed_scale=0.7 \
    system.guidance.use_anisotropic_schedule=true \
    system.guidance.anisotropic_offset=1000

Let me know if you need further information. Thanks in advance!

That's great work! But I cannot reproduce the results on my device.

Thanks for open-sourcing this; it's really great work!

I evaluated zeronvs.ckpt on my desktop using the parameters that you provide. Many of the outputs are unsatisfactory, as shown below. I would like to know whether I am doing something wrong when running it. All the settings are the same as yours. Neither Linux nor Windows produces satisfactory results.

Is this method unstable, or is something not yet released?

Thanks for your assistance in advance.

How do I turn SDS anchoring on?

I realize that SDS anchoring is turned off by default.
Is there any way to turn it on simply by controlling the configuration?

Cannot reproduce test result

The code is updated to the latest version. When tested on the image "motorcycle.png", no visible output was produced. No errors or warnings were present except a resource limitation.

(attached video: it0-test.mp4)

out of memory

I want to run the model on my 3090, but there is not enough memory. How can I adjust the relevant parameters to make it work?

Cannot reproduce some results

Thanks for your great work! I encountered a problem when trying to reproduce the result for motorcycle.png. I used all the default hyperparameters but got the following outputs:
(attached image)
Unfortunately, I'm new to diffusion models. Could you give me some suggestions? Thanks a lot!

About the focus_point_fn

Hi! Sorry to bother you. Could you provide a reference or tutorial on how you compute the focus point? I still cannot understand the code here:

import numpy as np

def focus_point_fn(poses: np.ndarray) -> np.ndarray:
    # Viewing direction (camera z-axis) and origin of each pose.
    directions, origins = poses[:, :3, 2:3], poses[:, :3, 3:4]
    # Project onto the plane orthogonal to each viewing direction.
    m = np.eye(3) - directions * np.transpose(directions, [0, 2, 1])
    mt_m = np.transpose(m, [0, 2, 1]) @ m
    # Normal equations: the point minimizing summed squared distance to all optical axes.
    focus_pt = np.linalg.inv(mt_m.mean(0)) @ (mt_m @ origins).mean(0)[:, 0]
    return focus_pt
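
For reference, this reads as a closed-form least-squares solve (one interpretation of the snippet, not an official derivation): each camera contributes an optical axis, the line through its origin $\mathbf{o}_i$ along its viewing direction $\mathbf{d}_i$, and the focus point is the 3D point minimizing the summed squared distance to those axes. With the projector $\mathbf{P}_i = \mathbf{I} - \mathbf{d}_i \mathbf{d}_i^\top$, the normal equations give

$$\mathbf{p}^\star = \Big(\textstyle\sum_i \mathbf{P}_i^\top \mathbf{P}_i\Big)^{-1} \sum_i \mathbf{P}_i^\top \mathbf{P}_i \, \mathbf{o}_i,$$

which is what the code evaluates, using means instead of sums (this leaves the solution unchanged).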

SDS anchoring by default turned off?

Hi,

Is SDS anchoring by default turned off? I saw n_aux_c2w = 0 in zero123_scene.yaml, which I guess means no anchor cameras. So if I run launch_eval_mipnerf360.sh directly, it's standard SDS without anchoring?

Thanks for the great work!

Clip loss?

Did the CLIP loss used in SDS anchoring improve your results? If so, which scale did you use?
Thanks:)

Cuda out of memory during inference

Hello, my device is a 3090. I tried to run inference on my own image and CUDA ran out of memory. I checked the past issue but the solution doesn't work. Is there a way to solve this? Thanks!

Generate novel views with 6dof+1 parameterization

Hi Kyle,

Thanks for your fantastic work! I'd like to know how to generate novel views with your model trained with the $\mathbf{M}_{\mathrm{6DoF+1,~viewer}}$ parameterization. I tried to use the code in the "zeronvs_diffusion" submodule but found it only uses a 3DoF parameterization. Could you please provide some instructions?

Thanks in advance,
Rundong
