
nerf-sr's Introduction

NeRF-SR: High-Quality Neural Radiance Fields using Supersampling

This is the official implementation of our ACM MM 2022 paper NeRF-SR: High-Quality Neural Radiance Fields using Supersampling. Pull requests and issues are welcome.

Abstract: We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF) that predicts per-point density and color with a multi-layer perceptron. While producing images at arbitrary scales, NeRF struggles with resolutions that go beyond observed images. Our key insight is that NeRF benefits from 3D consistency, which means an observed pixel absorbs information from nearby views. We first exploit it by a supersampling strategy that shoots multiple rays at each image pixel, which further enforces multi-view constraint at a sub-pixel level. Then, we show that NeRF-SR can further boost the performance of supersampling by a refinement network that leverages the estimated depth at hand to hallucinate details from related patches on an HR reference image. Experiment results demonstrate that NeRF-SR generates high-quality results for novel view synthesis at HR on both synthetic and real-world datasets.

Note: There is an error in the paper: for LLFF dataset training, the input resolution is 252x189, but the paper says it is 504x378.
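For intuition, here is a minimal sketch of the supersampling supervision described in the abstract (not the repository's exact code): each observed LR pixel is covered by an s x s grid of sub-pixel rays, the rendered sub-pixel colors are averaged, and the average is supervised by the observed LR pixel color. The render_rays function below is a hypothetical stand-in for any NeRF-style renderer, and the (N, s*s, 8) ray layout is an assumed grouping of sub-pixel rays per LR pixel.

import torch
import torch.nn.functional as F

def supersampling_loss(render_rays, rays, lr_rgbs):
    # rays:    (N, s*s, 8) sub-pixel rays grouped per observed LR pixel (assumed layout)
    # lr_rgbs: (N, 3)      observed LR pixel colors
    n, ss, _ = rays.shape
    pred = render_rays(rays.reshape(n * ss, -1))  # render every sub-pixel ray -> (N*s*s, 3)
    pred = pred.reshape(n, ss, 3).mean(dim=1)     # average the sub-pixel colors back to pixel level
    return F.mse_loss(pred, lr_rgbs)              # compare the average against the observed LR color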

Requirements

The codebase is tested on

  • Python 3.6.9 (should be compatible with Python 3.7+)
  • PyTorch 1.8.1
  • GeForce 1080Ti, 2080Ti, RTX 3090

Create a virtual environment and then run:

pip install -r requirements.txt

Dataset

In our paper, we use the same datasets as in NeRF (the synthetic Blender scenes and the real-world LLFF scenes).

However, our method is compatible with any dataset that NeRF can be trained on. Feel free to try it out.

Render the pretrained model

We provide pretrained models in Google Drive.

For supersampling, first download the pretrained models and put them under the checkpoints/nerf-sr/${name} directory, then run:

bash scripts/test_llff_downX.sh

or

bash scripts/test_blender_downX.sh

For the ${name} parameter, you can use the value already set in the scripts. You can also rename the checkpoint directory to your preference, but then you have to change the script accordingly.

For refinement, run:

bash scripts/test_llff_refine.sh

Train a new NeRF-SR model

Please check the configuration in the scripts. You can always modify it to your desired model config (especially the dataset path and input/output resolutions).

Supersampling

bash scripts/train_llff_downX.sh

to train a 504x378 NeRF with 252x189 inputs, or

bash scripts/train_blender_downX.sh
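Conceptually, the dataset loader groups the full-resolution image into s x s sub-pixel blocks, one block per LR pixel, so that the rays of a block can be supervised together. Below is a minimal sketch of that grouping, assuming an einops-style rearrange like the one used in data/llff_downX_dataset.py:

import torch
import einops

s = 2                                  # downscale factor
img = torch.rand(378, 504, 3)          # full-resolution image (H, W, 3); H and W must be divisible by s
blocks = einops.rearrange(img, '(h s1) (w s2) c -> (h w) (s1 s2) c', s1=s, s2=s)
# blocks: (H/s * W/s, s*s, 3), i.e. the s*s ground-truth colors associated with each LR pixel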

Refinement

After supersampling and before refinement, depth warping has to be performed to find relevant patches. Run:

python warp.py

to create *.loc files. An example of *.loc files can be found in the provided fern checkpoints (in the 30_val_vis folder), which can be used directly for refinement.
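Conceptually, the warping lifts a pixel of a target view to 3D using the depth estimated by the supersampled NeRF and reprojects it into the HR reference image to locate the corresponding patch. The snippet below is only an illustrative sketch of that projection, not warp.py's exact logic, and it ignores the camera axis-convention details that the real code has to handle; all names are hypothetical.

import numpy as np

def warp_pixel(uv, depth, K_tgt, c2w_tgt, w2c_ref, K_ref):
    # uv: (u, v) pixel in the target view; depth: NeRF's depth estimate at that pixel
    # K_*: 3x3 intrinsics; c2w_tgt: 4x4 target camera-to-world; w2c_ref: 4x4 reference world-to-camera
    x_cam = np.linalg.inv(K_tgt) @ np.array([uv[0], uv[1], 1.0]) * depth  # back-project into the target camera frame
    x_world = c2w_tgt @ np.append(x_cam, 1.0)                             # move to world coordinates
    x_ref = (w2c_ref @ x_world)[:3]                                       # move to the reference camera frame
    u = K_ref @ (x_ref / x_ref[2])                                        # perspective projection
    return u[:2]                                                          # pixel location in the HR reference image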

After that, you can train the refinement model:

bash scripts/train_llff_refine.sh

Baseline Models

To replicate the results of the baseline models, first train a vanilla NeRF using the command:

bash scripts/train_llff.sh

or

bash scripts/train_blender.sh

For vanilla-NeRF, simply test the trained NeRF at high resolution using bash scripts/test_llff.sh or bash scripts/test_blender.sh (change img_wh to your desired resolution). For NeRF-Bi, NeRF-Liif and NeRF-Swin, you need to super-resolve the test images with the corresponding model (a minimal upscaling sketch follows the list below). The pretrained models for NeRF-Liif and NeRF-Swin can be found below:

  • NeRF-Liif: We used the RDN-LIIF pretrained model. The download link can be found in the official LIIF repo.
  • NeRF-Swin: We used the "Real-world image SR" setting of SwinIR and the pretrained SwinIR-M model. Click to download the x2 and x4 model.
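For NeRF-Bi, the rendered LR test images are presumably upscaled to the target resolution with a conventional resampler before evaluation; the sketch below uses bicubic resampling via Pillow (file names and the output resolution are placeholders). The learned baselines, NeRF-Liif and NeRF-Swin, replace this step with their respective super-resolution networks.

from PIL import Image

lr = Image.open('results/000.png')           # hypothetical LR image rendered by the trained NeRF
hr = lr.resize((504, 378), Image.BICUBIC)    # upscale to the target HR resolution (width, height)
hr.save('results/000_up.png')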

Citation

If you find our paper or code useful, please cite:

@inproceedings{wang2022nerf,
  title={NeRF-SR: High-Quality Neural Radiance Fields using Supersampling},
  author={Wang, Chen and Wu, Xian and Guo, Yuan-Chen and Zhang, Song-Hai and Tai, Yu-Wing and Hu, Shi-Min},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={6445--6454},
  year={2022}
}

Credit

Our code borrows from nerf_pl and pytorch-CycleGAN-and-pix2pix.

nerf-sr's Issues

FileNotFoundError: [Errno 2] No such file or directory

Thanks for your well-organized code. When I run warp.py, I get the following error:
File "warp.py", line 189, in
ds = LLFFDataset(root_dir, result_dir, width, height)
File "warp.py", line 31, in init
self.read_meta()
File "warp.py", line 112, in read_meta
nerf_depth = np.load(os.path.join(self.result_dir, '{}-fine-depth-ori.npz'.format(i)))['arr_0']
File "/home/xqq/anaconda3/envs/nerfsf/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/nerf-sr/llff-room-378x504-ni64-dp-ds2/30_val_vis/20-fine-depth-ori.npz'

I can't find the file.

Question about table

Hello, thank you very much for your work. Is the PSNR value in the table obtained by averaging the fine_psnr_ori_full values obtained from training on each scene? The average I calculated is slightly different from the one in the paper.

Hard coded warning

Hi,

I tried to render the model by running "bash scripts/test_llff_downX.sh", but I got a hard-coded warning and the testing was interrupted.

I tried both the pre-trained model you shared and my own trained model. In both cases, the test did not work because of this warning. Can you tell me how to solve this? Thank you!

question about pretrained model

Hi,
I'm very interested in your work! But when I use the command bash scripts/test_llff_refine.sh to see the effect of the refinement model, it tells me that I need a folder named 30_test_vis, which your pretrained model does not provide. It seems that I need to retrain the model before refinement. Could you tell me how I can see the result using only the pretrained model you provide?
Thanks a lot !

Question about table 1 and table 2

Hi, I'm very interested in your paper. I'd like to ask which specific scenes you used to produce the results (e.g. lego, chairs, drums), or whether you just averaged the results over all scenes. Thanks a lot!

Question about your paper.

Thanks for your nice work!
I think you should compare with Mip-NeRF, because both methods address the same problem.

How to continue training?

Hi,

The work looks nice! I did some training myself; however, I could not resume the training after losing the connection to Colab. I found that there are options for "--continue_train" and for setting "load_epoch" to resume from a specific epoch. I added these lines to the train_llff_downX script, but it does not work.

Can you tell me how to resume the training? Thank you!

Question about Supersampling

The supersampling method described in the paper divides a pixel into sub-pixels for training. I'm a little confused about this method and how it is implemented in the code.

First, the downscale variable in the code downsamples the image to the specified resolution. For example, if the input image resolution is 504×378 and downscale=2, the resolution of the downsampled image is 252×189. My question is why it is downsampled: is it to train with the 252×189 images as the low-resolution inputs and the 504×378 images as GT?

At the same time, I do not know whether my understanding of the following code is correct:

self.all_rays = torch.cat(self.all_rays, 0) #(61*h/X*w/X,X*X,8)
self.all_rgbs = torch.cat(self.all_rgbs, 0) #(61*h/X*w/X,3)
self.all_rgbs_ori = torch.cat(self.all_rgbs_ori, 0)#(61*h/X*w/X,X*X,3)

It appears that the 252×189 resolution is used as the low-resolution input, and the 504×378 resolution as GT. At the same time, the 504×378 image is divided into multiple s×s patches, which should correspond to the supersampling method.

In general, I do not understand how each pixel of the input low-resolution image is divided into s×s sub-pixels, or whether the s×s patches are taken from the input low-resolution image or from the GT image. How are the divided sub-pixel rays and their corresponding colors obtained? Can you point out the specific code?
Thank you very much for your contribution. My understanding may be wrong; I hope you can answer my doubts.

Best regards!

Questions about training the llff_downX script

In the llff_downX script, it says: if downscale=4, change batchsize=128.

But when I do, I get an error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py38/lib/python3.8/site-packages/einops/einops.py", line 412, in reduce
    return _apply_recipe(recipe, tensor, reduction_type=reduction)
  File "/root/miniconda3/envs/py38/lib/python3.8/site-packages/einops/einops.py", line 235, in _apply_recipe
    _reconstruct_from_shape(recipe, backend.shape(tensor))
  File "/root/miniconda3/envs/py38/lib/python3.8/site-packages/einops/einops.py", line 200, in _reconstruct_from_shape_uncached
    raise EinopsError("Shape mismatch, can't divide axis of length {} in chunks of {}".format(
einops.EinopsError: Shape mismatch, can't divide axis of length 378 in chunks of 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 163, in <module>
    run_dp()
  File "train.py", line 152, in run_dp
    main(None)
  File "train.py", line 38, in main
    dataset = create_dataset(opt, mode=opt.train_split, shuffle=True)  # create a dataset given opt.dataset_mode and other options
  File "/root/autodl-tmp/data/__init__.py", line 78, in create_dataset
    data_loader = CustomDatasetDataLoader(opt, mode, shuffle)
  File "/root/autodl-tmp/data/__init__.py", line 105, in __init__
    self.dataset = dataset_class(opt, mode)
  File "/root/autodl-tmp/data/llff_downX_dataset.py", line 193, in __init__
    self.read_meta()
  File "/root/autodl-tmp/data/llff_downX_dataset.py", line 329, in read_meta
    img = einops.rearrange(img, '(h s1) (w s2) c -> (h w) (s1 s2) c', 
  File "/root/miniconda3/envs/py38/lib/python3.8/site-packages/einops/einops.py", line 483, in rearrange
    return reduce(cast(Tensor, tensor), pattern, reduction='rearrange', **axes_lengths)
  File "/root/miniconda3/envs/py38/lib/python3.8/site-packages/einops/einops.py", line 420, in reduce
    raise EinopsError(message + '\n {}'.format(e))
einops.EinopsError:  Error while processing rearrange-reduction pattern "(h s1) (w s2) c -> (h w) (s1 s2) c".
 Input tensor shape: torch.Size([378, 504, 3]). Additional info: {'s1': 4, 's2': 4}.
 Shape mismatch, can't divide axis of length 378 in chunks of 4

What should I do to make it work?
