facebookresearch / nsvf
Open source code for the paper of Neural Sparse Voxel Fields.
License: MIT License
Describe the bug
I am trying to run the NeRF implementation using the same code base. However, I am getting a memory overrun issue. What should the configuration settings be?
To Reproduce
Steps to reproduce the behavior:
Configuration used for training:
export DATASET="/nitthilan/data/NSVF/Synthetic_NSVF/Spaceship/"
export SAVE="./spaceship_nerf_ckpt/"
export TRAIN_DIM="50x50"
export TRAIN_VIEWS="0..100"
export VALID_DIM="1x1"
export VALID_VIEWS="0..100"
export PRUNE_EVERY=2500
export VIEW_PER_BATCH=1
export PIXEL_PER_VIEW=2048
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u train.py ${DATASET}
--user-dir fairnr
--task single_object_rendering
--train-views ${TRAIN_VIEWS} --view-resolution ${TRAIN_DIM}
--max-sentences 1 --view-per-batch ${VIEW_PER_BATCH} --pixel-per-view ${PIXEL_PER_VIEW}
--no-preload
--sampling-on-mask 1.0 --no-sampling-at-reader
--valid-views ${VALID_VIEWS} --valid-view-resolution ${VALID_DIM}
--valid-view-per-batch 1
--transparent-background "1.0,1.0,1.0" --background-stop-gradient
--arch nerf_base
--color-weight 128.0 --alpha-weight 1.0
--optimizer "adam" --adam-betas "(0.9, 0.999)"
--lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000
--criterion "srn_loss" --clip-norm 0.0
--num-workers 0
--seed 2
--save-interval-updates 500 --max-update 150000
--virtual-epoch-steps 500 --save-interval 1
--half-voxel-size-at "5000,25000,75000"
--reduce-step-size-at "5000,25000,75000"
--pruning-every-steps ${PRUNE_EVERY}
--keep-interval-updates 5 --keep-last-epochs 5
--log-format simple --log-interval 1
--save-dir ${SAVE}
--raymarching-tolerance 0.01
--tensorboard-logdir ${SAVE}/tensorboard
| tee -a $SAVE/train.log
Expected behavior
If I increase TRAIN_DIM="50x50" to anything larger, the memory consumed exceeds 25 GB. I am running it on a 4-GPU system. Ideally, the same configuration runs with the nsvf_base arch at a resolution of 800x800. Am I missing something?
Running script:
python -u /home/stud/kghasemi/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 0.0.0.0 --port 41219 --file /tmp/kghasemi/GRF/train.py /tmp/kghasemi/data/Synthetic_NSVF/Wineholder --user-dir fairnr --task single_object_rendering --train-views 0..100 --view-resolution 200x200 --max-sentences 1 --view-per-batch 2 --pixel-per-view 64 --no-preload --sampling-on-mask 1.0 --no-sampling-at-reader --disable-validation --valid-view-resolution 200x200 --valid-views 100..200 --valid-view-per-batch 2 --transparent-background 1.0,1.0,1.0 --background-stop-gradient --arch nsvf_base --initial-boundingbox /tmp/kghasemi/data/Synthetic_NSVF/Wineholder/bbox.txt --raymarching-stepsize-ratio 0.125 --use-octree --discrete-regularization --color-weight 128.0 --alpha-weight 1.0 --optimizer adam --adam-betas "(0.9, 0.999)" --lr-scheduler polynomial_decay --total-num-update 150000 --lr 0.001 --clip-norm 0.0 --criterion srn_loss --num-workers 0 --seed 2 --save-interval-updates 500 --max-update 150000 --virtual-epoch-steps 50 --save-interval 1 --half-voxel-size-at 5000,25000,75000 --reduce-step-size-at 5000,25000,75000 --pruning-every-steps 2500 --keep-interval-updates 5 --log-format simple --log-interval 1 --tensorboard-logdir checkpoint/Wineholder/tensorboard/nsvf_basev2 --save-dir checkpoint/Wineholder/nsvf_basev2
When I choose --pixel-per-view 64, I get this error after a few epochs of training on Wineholder:
Traceback (most recent call last):
data_utils.py: get_uv File "/tmp/kghasemi/conda/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/tmp/kghasemi/conda/lib/python3.8/site-packages/fairseq/logging/metrics.py", line 95, in aggregate
yield agg
File "/tmp/kghasemi/conda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/tmp/kghasemi/GRF/fairnr_cli/train.py", line 184, in train
log_output = trainer.train_step(samples)
File "/tmp/kghasemi/conda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/tmp/kghasemi/conda/lib/python3.8/site-packages/fairseq/trainer.py", line 457, in train_step
raise e
File "/tmp/kghasemi/conda/lib/python3.8/site-packages/fairseq/trainer.py", line 425, in train_step
loss, sample_size_i, logging_output = self.task.train_step(
File "/tmp/kghasemi/GRF/fairnr/tasks/neural_rendering.py", line 300, in train_step
return super().train_step(sample, model, criterion, optimizer, update_num, ignore_grad)
File "/tmp/kghasemi/conda/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 351, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "/tmp/kghasemi/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/tmp/kghasemi/GRF/fairnr/criterions/rendering_loss.py", line 42, in forward
net_output = model(**sample)
File "/tmp/kghasemi/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/tmp/kghasemi/GRF/fairnr/models/fairnr_model.py", line 77, in forward
results = self._forward(ray_start, ray_dir, **kwargs)
File "/tmp/kghasemi/GRF/fairnr/models/nsvf.py", line 79, in _forward
samples = self.encoder.ray_sample(intersection_outputs)
File "/tmp/kghasemi/GRF/fairnr/modules/encoder.py", line 355, in ray_sample
sampled_idx, sampled_depth, sampled_dists = uniform_ray_sampling(
File "/tmp/kghasemi/GRF/fairnr/clib/__init__.py", line 192, in forward
pts_idx = pts_idx.reshape(G, -1, P)
RuntimeError: shape '[256, -1, 60]' is invalid for input of size 15240
Process finished with exit code 1
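For reference, the reshape can only succeed when the total number of intersected points is divisible by G * P; here 256 * 60 = 15360, which does not divide 15240. A minimal sketch that reproduces just the shape arithmetic from the error message (nothing NSVF-specific):
import torch

G, P = 256, 60                 # values taken from the error message above
pts_idx = torch.zeros(15240)   # total element count reported in the error

# -1 cannot be inferred because 15240 is not a multiple of 256 * 60 = 15360,
# so this raises the same RuntimeError as in the traceback.
try:
    pts_idx.reshape(G, -1, P)
except RuntimeError as e:
    print(e)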
Hello! Thank you for open-sourcing this amazing work.
I have attempted to reproduce your results on a real, inward-facing dataset and struggled to achieve photo-realistic results. After a thorough process of elimination, my educated guess is that the problem lies in NSVF's robustness to slightly perturbed camera poses. Running on the same COLMAP camera poses, we observed that the original NeRF model is significantly more robust to camera pose inaccuracies.
To reproduce: Run colmap on synthetic Lego to get non-ground-truth camera poses. Train model with original configurations.
Here is an example:
Run colmap on synthetic Lego dataset, with exhaustive matching.
Now we compare training results on three sets of camera poses: 1) ground truth, 2) ground truth + additive Gaussian noise (std=0.01), 3) COLMAP camera poses in NSVF's coordinate-frame convention. We can see that the COLMAP camera poses do not produce a desirable result.
Here is a snapshot of the loss function (pink = ground-truth poses, blue = COLMAP). Note that training on COLMAP camera poses does not converge to the same photorealistic results regardless of training time.
Here is another example on a real, captured, inward-facing dataset:
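For concreteness, condition (2) above (ground truth plus additive Gaussian noise) can be produced with a small sketch like the following (my own code, assuming 4x4 camera-to-world poses stored as numpy arrays):
import numpy as np

def perturb_pose(c2w, std=0.01, seed=0):
    # additive Gaussian noise on the rotation block and translation column
    rng = np.random.default_rng(seed)
    noisy = c2w.copy()
    noisy[:3, :4] += rng.normal(scale=std, size=(3, 4))
    return noisy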
Replace _ext_src_root = "fairnr/clib"
by _ext_src_root = os.path.abspath("fairnr/clib")
in setup.py
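For clarity, the suggested change amounts to the following (a sketch; it assumes os is imported near the top of setup.py):
import os

# use an absolute path so the compiled CUDA extension sources are found
# regardless of the directory the editable install is invoked from
_ext_src_root = os.path.abspath("fairnr/clib")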
Is your feature request related to a problem? Please describe.
Would it be possible to share the code used to generate the pose.txt files for the different datasets?
Also if possible the script to generate the rendered images from 3D models?
I reviewed the related issues here, but no correct way has been found yet.
Can you please provide your code to get the values?
I'm struggling to reproduce your results with the provided images (especially for real, large objects using COLMAP, not the small, synthetic ones from Blender).
I have always failed to get reasonable rendering results; I only see very blurry images.
For example, in the Wineholder case,
COLMAP produces a reasonable result, so I can trust it.
But the bbox.txt computed using "https://github.com/yxie20/lego_nsvf/tree/master/poses" is:
-1.7405309358237018 0.8937903721375522 0.8837398050743352 2.0868807956960875 3.9037096850320445 3.3750371294069716 0.3239107072481569
and your provided bbox.txt is
-0.5884649 -0.44142154 -0.12279411 0.78653513 0.43357844 0.8772059 0.125.
It's totally different!
The camera intrinsics are somewhat similar:
888.0 0.0 400.0 0.0
0.0 888.0 400.0 0.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0
VS
875.000000 400.000000 400.000000 0.
0. 0. 0.
0.
1.
800 800
Thank you!
Is your feature request related to a problem? Please describe.
I'm always frustrated when I can't try PyTorch/CUDA/Python things under Windows.
Describe the solution you'd like
What I want to happen: a smooth install on Windows.
Describe alternatives you've considered
Any alternative solution: a working way via conda or WSL Ubuntu.
Additional context
WSL Ubuntu on Windows 10 has problems with CUDA.
conda on Windows 10 has problems with VC++ 14 compilation, mostly on torchsearchsorted.
Did you compare the depth accuracy with NeRF? In my opinion your sampling strategy samples points closer to where the object is located, so intuitively your depth should be better. Is this correct?
I have tried to run this project's code on 4x RTX 2080 Ti (11 GB).
The original args, "--view-per-batch 4 --pixel-per-view 2048", cause a CUDA OOM error within just 2 iterations,
so I reduced the batch size to "--view-per-batch 4 --pixel-per-view 128", which works well for the first 5000 iterations,
and "--view-per-batch 2 --pixel-per-view 128" works well for the first 25000 iterations.
Both eventually cause an OOM error at the voxel-split step (just a guess). So I checked the memory-management code and did not find anything that releases PyTorch's unused cache, i.e. something like:
torch.cuda.empty_cache()
So I added this call to NSVFModel.clean_caches in fairnr/models/nsvf.py:
def clean_caches(self, reset=False):
    self.encoder.clean_runtime_caches()
    if reset:
        self.encoder.reset_runtime_caches()
    torch.cuda.empty_cache()  # release PyTorch's cached memory after the model has finished
This really helps me get through more split steps (but still not past the later splits, e.g. after 75000 iterations).
Before adding this line:
CUDA memory use: 4000 MB -> (voxel split) 8000 MB -> (voxel split) OOM error
After adding this line:
CUDA memory use: 4000 MB -> (voxel split) 6800 MB -> (voxel split) 9900 MB -> (voxel split) OOM error
I have not found any negative effect on the results yet.
I also tried other ways to solve the OOM problem, such as adding "--fp16" to turn on fp16 mode via the apex module (which is said to reduce memory use thanks to float16),
but this just causes the error I posted in issue #33.
If you are interested in running this code on other CUDA devices (especially ones without as much GPU memory as a 32 GB V100), this line and the bug report may be useful to you.
Thanks for replying.
Hi,
I'm trying to run the train_wineholder.sh
script on my machine. It works fine for the first 500 iterations, but immediately after the 500th iteration, it pauses, eventually throwing the following error related to tensors existing on different devices:
Start of training:
2020-11-16 10:31:25 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:14275
2020-11-16 10:31:25 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:14275
2020-11-16 10:31:27 | INFO | fairseq.distributed_utils | initialized host lightbox-desktop as rank 0
2020-11-16 10:31:27 | INFO | fairseq.distributed_utils | initialized host lightbox-desktop as rank 1
2020-11-16 10:31:27 | INFO | fairnr_cli.train | Namespace(L1=False, adam_betas='(0.9, 0.999)', adam_eps=1e-08, all_gather_list_size=16384, alpha_weight=1.0, arch='nsvf_base', background_depth=5.0, background_stop_gradient=True, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_suffix='', chunk_size=64, clip_norm=0.0, color_weight=128.0, cpu=False, criterion='srn_loss', curriculum=0, data='/code/nsvf/datasets/Synthetic_NSVF/Wineholder', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', density_embed_dim=128, depth_weight=0.0, depth_weight_decay=None, deterministic_step=False, device_id=0, disable_validation=False, discrete_regularization=True, distributed_backend='nccl', distributed_init_method='tcp://localhost:14275', distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=2, distributed_wrapper='DDP', empty_cache_freq=0, end_learning_rate=0.0, eval_lpips=False, fast_stat_sync=False, feature_embed_dim=256, feature_layers=1, find_unused_parameters=False, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, half_voxel_size_at='5000,25000,75000', initial_boundingbox='/code/nsvf/datasets/Synthetic_NSVF/Wineholder/bbox.txt', inputs_to_density='emb:6:32', inputs_to_texture='feat:0:256, ray:4', keep_best_checkpoints=-1, keep_interval_updates=5, keep_last_epochs=-1, load_depth=False, load_mask=False, localsgd_frequency=3, log_format='simple', log_interval=1, lr=[0.001], lr_scheduler='polynomial_decay', max_epoch=0, max_hits=60, max_sentences=1, max_sentences_valid=1, max_tokens=None, max_tokens_valid=None, max_update=150000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_color=-1, min_loss_scale=0.0001, min_lr=-1, model_parallel_size=1, no_background_loss=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_load_binary=False, no_preload=True, no_progress_bar=False, no_sampling_at_reader=True, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=2, num_workers=0, object_id_path=None, optimizer='adam', optimizer_overrides='{}', output_valid=None, patience=-1, pixel_per_view=2048.0, power=1.0, profile=False, pruning_every_steps=2500, pruning_rerun_train_set=False, pruning_th=0.5, pruning_with_train_stats=False, quantization_config_path=None, raymarching_stepsize=0.01, raymarching_stepsize_ratio=0.125, raymarching_tolerance=0, reduce_step_size_at='5000,25000,75000', rendering_args=None, rendering_every_steps=None, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sampling_at_center=1.0, sampling_on_bbox=False, sampling_on_mask=1.0, sampling_patch_size=1, sampling_skipping_size=1, save_dir='/code/nsvf/checkpoints/Wineholder/nsvf_basev1', save_interval=1, save_interval_updates=500, scoring='bleu', seed=2, sentence_avg=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_time_hours=0, subsample_valid=-1, task='single_object_rendering', tensorboard_logdir='/code/nsvf/checkpoints/Wineholder/tensorboard/nsvf_basev1', test_views='0', texture_embed_dim=256, texture_layers=3, threshold_loss_scale=None, tokenizer=None, total_num_update=150000, tpu=False, train_subset='train', train_views='0..100', 
transparent_background='1.0,1.0,1.0', update_freq=[1], use_bmuf=False, use_octree=True, use_old_adam=False, user_dir='fairnr', valid_chunk_size=64, valid_subset='valid', valid_view_per_batch=1, valid_view_resolution='800x800', valid_views='100..200', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, vgg_level=2, vgg_weight=0.0, view_per_batch=2, view_resolution='800x800', virtual_epoch_steps=5000, voxel_embed_dim=32, voxel_path=None, voxel_size=0.25, warmup_updates=0, weight_decay=0.0)
2020-11-16 10:31:27 | INFO | fairnr_cli.train | NSVFModel(
(reader): ImageReader()
(encoder): SparseVoxelEncoder(
(values): Embedding(1170, 32)
)
(field): RaidanceField(
(bg_color): BackgroundField()
(den_filters): ModuleDict(
(emb): NeRFPosEmbLinear(Cat(32, Sinusoidal (in=32, out=384, angular=False)))
)
(tex_filters): ModuleDict(
(feat): Identity()
(ray): NeRFPosEmbLinear(Sinusoidal (in=3, out=24, angular=True))
)
(feature_field): ImplicitField(
(net): Sequential(
(0): FCLayer(
(net): Sequential(
(0): Linear(in_features=416, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
(1): FCLayer(
(net): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
(2): FCLayer(
(net): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
)
)
(predictor): SignedDistanceField(
(hidden_layer): FCLayer(
(net): Sequential(
(0): Linear(in_features=256, out_features=128, bias=True)
(1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
(output_layer): Linear(in_features=128, out_features=1, bias=True)
)
(renderer): TextureField(
(net): Sequential(
(0): FCLayer(
(net): Sequential(
(0): Linear(in_features=280, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
(1): FCLayer(
(net): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
(2): FCLayer(
(net): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
(3): FCLayer(
(net): Sequential(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): ReLU(inplace=True)
)
)
(4): Linear(in_features=256, out_features=3, bias=True)
)
)
)
(raymarcher): VolumeRenderer()
)
2020-11-16 10:31:27 | INFO | fairnr_cli.train | model nsvf_base, criterion SRNLossCriterion
2020-11-16 10:31:27 | INFO | fairnr_cli.train | num. model params: 582737 (num. trained: 582724)
2020-11-16 10:31:27 | INFO | fairseq.utils | ***********************CUDA enviroments for all 2 workers***********************
2020-11-16 10:31:27 | INFO | fairseq.utils | rank 0: capabilities = 6.1 ; total memory = 7.929 GB ; name = GeForce GTX 1080
2020-11-16 10:31:27 | INFO | fairseq.utils | rank 1: capabilities = 6.1 ; total memory = 7.921 GB ; name = GeForce GTX 1080
2020-11-16 10:31:27 | INFO | fairseq.utils | ***********************CUDA enviroments for all 2 workers***********************
2020-11-16 10:31:27 | INFO | fairnr_cli.train | training on 2 GPUs
2020-11-16 10:31:27 | INFO | fairnr_cli.train | max tokens per GPU = None and max sentences per GPU = 1
2020-11-16 10:31:27 | INFO | fairseq.trainer | no existing checkpoint found /code/nsvf/checkpoints/Wineholder/nsvf_basev1/checkpoint_last.pt
2020-11-16 10:31:27 | INFO | fairseq.trainer | loading train data for epoch 1
/code/nsvf/env/lib/python3.8/site-packages/torch/nn/parallel/distributed.py:397: UserWarning: The `check_reduction` argument in `DistributedDataParallel` module is deprecated. Please avoid using it.
warnings.warn(
/code/nsvf/env/lib/python3.8/site-packages/torch/nn/parallel/distributed.py:397: UserWarning: The `check_reduction` argument in `DistributedDataParallel` module is deprecated. Please avoid using it.
warnings.warn(
Building EasyOctree done. total #nodes = 1881, terminal #nodes = 864 (time taken 0.254881 s)
Building EasyOctree done. total #nodes = 1881, terminal #nodes = 864 (time taken 0.260331 s)
/code/nsvf/env/lib/python3.8/site-packages/fairseq/utils.py:304: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
warnings.warn(
/code/nsvf/env/lib/python3.8/site-packages/fairseq/utils.py:304: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
Then at iter 500:
2020-11-16 10:41:12 | INFO | train_inner | epoch 001: 500 / 5000 loss=23.507, vox=0.125, stp=0.016, tvo=0.127, asf=68.201, ash=68.201, nvo=864, color=0.182, alpha=0.228, wps=2.1, ups=1.07, wpb=2, bsz=2, num_updates=500, lr=0.000996667, gnorm=171.876, train_wall=1, wall=464
Traceback (most recent call last):
File "train.py", line 20, in <module>
cli_main()
File "/code/nsvf/fairnr_cli/train.py", line 353, in cli_main
torch.multiprocessing.spawn(
File "/code/nsvf/env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/code/nsvf/env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/code/nsvf/env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/code/nsvf/env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/code/nsvf/fairnr_cli/train.py", line 338, in distributed_main
main(args, init_distributed=True)
File "/code/nsvf/fairnr_cli/train.py", line 104, in main
should_end_training = train(args, trainer, task, epoch_itr)
File "/media/lightbox/Extra/anaconda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/code/nsvf/fairnr_cli/train.py", line 204, in train
valid_losses = validate_and_save(args, trainer, task, epoch_itr, valid_subsets)
File "/code/nsvf/fairnr_cli/train.py", line 245, in validate_and_save
valid_losses = validate(args, trainer, task, epoch_itr, valid_subsets)
File "/code/nsvf/fairnr_cli/train.py", line 302, in validate
trainer.valid_step(sample)
File "/media/lightbox/Extra/anaconda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/code/nsvf/env/lib/python3.8/site-packages/fairseq/trainer.py", line 631, in valid_step
raise e
File "/code/nsvf/env/lib/python3.8/site-packages/fairseq/trainer.py", line 615, in valid_step
_loss, sample_size, logging_output = self.task.valid_step(
File "/code/nsvf/fairnr/tasks/neural_rendering.py", line 306, in valid_step
images = model.visualize(sample, shape=0, view=0)
File "/code/nsvf/env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/code/nsvf/fairnr/models/fairnr_model.py", line 126, in visualize
images = {
File "/code/nsvf/fairnr/models/fairnr_model.py", line 127, in <dictcomp>
tag: recover_image(width=width, **images[tag])
File "/code/nsvf/fairnr/data/data_utils.py", line 264, in recover_image
img = ((img - min_val) / (max_val - min_val)).clamp(min=0, max=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
The only change I've made to the training script is reducing --view-per-batch to 1. Do you have any idea what the issue might be?
I'm running this on Ubuntu 20.04 with two GTX GeForce 1080 GPUs, CUDA version 10.1. Let me know if I can provide any further info at this time! Thanks so much!
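For what it's worth, this kind of mismatch is usually fixed by moving the offending scalars onto the image tensor's device before the arithmetic. A hypothetical patch in the spirit of recover_image (my sketch, not the authors' fix):
import torch

def normalize_image(img, min_val, max_val):
    # keep the bounds on the same device/dtype as the image tensor
    min_val = torch.as_tensor(min_val, device=img.device, dtype=img.dtype)
    max_val = torch.as_tensor(max_val, device=img.device, dtype=img.dtype)
    return ((img - min_val) / (max_val - min_val)).clamp(min=0, max=1)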
Hi:
The script reported "CUDA kernel failed : invalid device function
void svo_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, const int*, int*, float*, float*) at L:384 in fairnr/clib/src/intersect_gpu.cu" when I try to train a network. The detailed information is shown below:
2020-10-21 12:31:48 | INFO | fairnr_cli.train | model nsvf_base, criterion SRNLossCriterion
2020-10-21 12:31:48 | INFO | fairnr_cli.train | num. model params: 574865 (num. trained: 574852)
2020-10-21 12:31:50 | INFO | fairseq.utils | CUDA enviroments for all 1 workers
2020-10-21 12:31:50 | INFO | fairseq.utils | rank 0: capabilities = 7.5 ; total memory = 10.761 GB ; name = GeForce RTX 2080 Ti
2020-10-21 12:31:50 | INFO | fairseq.utils | CUDA enviroments for all 1 workers
2020-10-21 12:31:50 | INFO | fairnr_cli.train | training on 1 GPUs
2020-10-21 12:31:50 | INFO | fairnr_cli.train | max tokens per GPU = None and max sentences per GPU = 1
2020-10-21 12:31:50 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/robot/checkpoint_last.pt
2020-10-21 12:31:50 | INFO | fairseq.trainer | loading train data for epoch 1
2020-10-21 12:31:50 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
Building EasyOctree done. total #nodes = 1505, terminal #nodes = 660 (time taken 0.214959 s)
CUDA kernel failed : invalid device function
void svo_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, const int*, int*, float*, float*) at L:384 in fairnr/clib/src/intersect_gpu.cu
Do you have any idea how to solve such a problem?
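"invalid device function" usually means the fairnr/clib CUDA extension was compiled for a different GPU architecture than the one it runs on (here an RTX 2080 Ti, compute capability 7.5). A quick diagnostic using only standard PyTorch calls (not part of NSVF):
import torch

# print each visible GPU and its compute capability; compare this against
# the architectures the extension was actually compiled for
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"cuda:{i} {torch.cuda.get_device_name(i)} sm_{major}{minor}")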
I would like to run NSVF as a container with a dockerfile.
I started building my own but am having trouble building the project inside an nvidia docker container as I'm using WSL2 and the current build doesn't come with NVCC or debug tools to check the output.
I think the build scripts and Dockerfiles should match the build workflow for Alicevision Meshroom. https://github.com/alicevision/meshroom/tree/develop/docker
I still need to write the build-ubuntu.sh script, but any help or info on building this cleanly would be appreciated.
ARG UBUNTU_VERSION
ARG CUDA_VERSION
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# args declared before FROM must be re-declared to be visible after it
ARG COMMAND
COPY . /NVSF/
WORKDIR /NVSF
RUN apt-get update \
&& apt-get install -y python3-pip python3 git \
&& cd /usr/local/bin \
&& ln -s /usr/bin/python3 python \
&& pip3 install --upgrade pip \
&& pip3 install torch \
&& pip3 install -r /NVSF/requirements.txt
CMD /bin/bash ${COMMAND}
The results are impressive.
How can I get camera poses and intrinsics for my custom photos?
I have already tried exporting the poses from Blender myself over the last two weeks.
But I found some issues in the process:
First, when I used code similar to the script shared by the NeRF team to export the camera poses, I found that all rays miss the voxels in the NSVF training code (causing an error), which means all rays go in the wrong direction. I dug into the code to figure out why, and found that the default camera direction in Blender points along the negative z-axis, which conflicts with the NSVF code: the NSVF training code sets the default depth to 1 when computing ray directions (in Blender the imaging depth is -1 because of the negative z-axis), so all ray directions point the opposite way and miss all the voxels.
Second, I compared the pose data for the same model between NeRF and NSVF, and found that the second and third columns of the cam2world matrices differ (multiplying those two columns of the NSVF poses by -1 turns them into the NeRF poses); this may be further evidence of the depth-setting issue mentioned above.
After I fixed the depth-setting bug, the training code ran normally, but compared with the official data shared by NSVF my results are not good enough, and during validation all images are somehow misaligned (leading to a very high validation loss). I have spent days trying to debug and understand this, and I still do not get good results in any of my attempts.
So, could you share the Blender export scripts (or those for another 3D application)? A rough sketch of the column flip I mean is given below.
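As a rough sketch of the column flip described above (my own reading of the convention difference, assuming 4x4 camera-to-world matrices; not code from this repo):
import numpy as np

def flip_pose_convention(c2w):
    # multiply the 2nd/3rd columns (camera y/z axes) of a cam2world matrix by -1,
    # converting between the two camera conventions described above
    out = c2w.copy()
    out[:3, 1] *= -1  # flip camera y axis
    out[:3, 2] *= -1  # flip camera z axis (view direction)
    return out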
Hi, can you please add a Google Colab? Thanks.
Describe the bug
I have tried your example script for evaluation:
DATA="Wineholder"
RES="800x800"
ARCH="nsvf_base"
SUFFIX="v1"
DATASET=/private/home/jgu/data/shapenet/release/Synthetic_NSVF/${DATA}
SAVE=/checkpoint/jgu/space/neuralrendering/new_release/$DATA
MODEL=$ARCH$SUFFIX
MODEL_PATH=$SAVE/$MODEL/checkpoint_last.pt
# start validating a trained model with target images.
# CUDA_VISIBLE_DEVICES=0 \
python validate.py ${DATASET} \
--user-dir fairnr \
--valid-views "200..400" \
--valid-view-resolution "800x800" \
--no-preload \
--task single_object_rendering \
--max-sentences 1 \
--valid-view-per-batch 1 \
--path ${MODEL_PATH} \
--model-overrides '{"chunk_size":1024,"raymarching_tolerance":0.01,"tensorboard_logdir":"","eval_lpips":True}' \
Using this script I could run an evaluation with my trained model. However, the reconstructed images are missing from TensorBoard. I noticed that you are passing tensorboard_logdir as an empty string in the --model-overrides option. I tried populating it, and I also added the --results-path option. Here you can find my script:
MODEL="nsvf_basev1"
MODEL_PATH=checkpoint/${DATA}/${MODEL}/checkpoint_best.pt
SAVE=eval/${DATA}/${MODEL}
NAME="${DATA}_${MODEL}"
TB_LOG=checkpoint/${DATA}/tensorboard/${MODEL}
python validate.py ${DATASET} \
--user-dir fairnr \
--valid-views "200..205" \
--valid-view-resolution "800x800" \
--no-preload \
--task single_object_rendering \
--max-sentences 1 \
--valid-view-per-batch 1 \
--path ${MODEL_PATH} \
--model-overrides "{'chunk_size' : 512,'raymarching_tolerance' : 0.01, 'tensorboard_logdir' : '${TB_LOG}', 'eval_lpips' : True}" \
--results-path ${SAVE} \
However, when running the script, I get this error:
| valid on 'valid' subset: 0%| | 0/200 [00:00<?, ?it/s]Building EasyOctree done. total #nodes = 11371, terminal #nodes = 5026 (time taken 1.524386 s)
Traceback (most recent call last):
File "validate.py", line 11, in <module>
cli_main()
File "/tmp/kghasemi/NSVF/fairnr_cli/validate.py", line 153, in cli_main
distributed_utils.call_main(args, main, override_args=override_args)
File "/tmp/kghasemi/conda/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 189, in call_main
main(args, **kwargs)
File "/tmp/kghasemi/NSVF/fairnr_cli/validate.py", line 114, in main
_loss, _sample_size, log_output = task.valid_step(sample, model, criterion)
File "/tmp/kghasemi/NSVF/fairnr/tasks/neural_rendering.py", line 308, in valid_step
write_images(self.writer, images, self._num_updates['step'])
KeyError: 'step'
I searched this part of the code but could not figure out what the issue was:
def valid_step(self, sample, model, criterion):
    loss, sample_size, logging_output = super().valid_step(sample, model, criterion)
    model.add_eval_scores(logging_output, sample, model.cache, criterion, outdir=self.output_valid)
    if self.writer is not None:
        images = model.visualize(sample, shape=0, view=0)
        if images is not None:
            write_images(self.writer, images, self._num_updates['step'])
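One possible workaround, assuming the 'step' counter simply is not populated when validate.py is run standalone, is to fall back to 0 (a hypothetical patch, not the authors' fix):
if self.writer is not None:
    images = model.visualize(sample, shape=0, view=0)
    if images is not None:
        # fall back to step 0 if the update counter was never set during standalone validation
        write_images(self.writer, images, self._num_updates.get('step', 0))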
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The evaluation should not produce any errors, and the reconstructed images should appear in TensorBoard.
Desktop (please complete the following information):
Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-123-generic x86_64)
Describe the bug
I get an exception when trying to run render.py on the synthetic wine-holder example.
To Reproduce
Run the wine-holder example using the command in the README:
DATASET=/opt/Synthetic_NSVF/Wineholder
SAVE=/opt/output
python -u train.py ${DATASET} \
--user-dir fairnr \
--task single_object_rendering \
--train-views "0..100" --view-resolution "800x800" \
--max-sentences 1 --view-per-batch 4 --pixel-per-view 2048 \
--no-preload \
--sampling-on-mask 1.0 --no-sampling-at-reader \
--valid-views "100..200" --valid-view-resolution "400x400" \
--valid-view-per-batch 1 \
--transparent-background "1.0,1.0,1.0" --background-stop-gradient \
--arch nsvf_base \
--initial-boundingbox ${DATASET}/bbox.txt \
--use-octree \
--raymarching-stepsize-ratio 0.125 \
--discrete-regularization \
--color-weight 128.0 --alpha-weight 1.0 \
--optimizer "adam" --adam-betas "(0.9, 0.999)" \
--lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000 \
--criterion "srn_loss" --clip-norm 0.0 \
--num-workers 0 \
--seed 2 \
--save-interval-updates 500 --max-update 150000 \
--virtual-epoch-steps 5000 --save-interval 1 \
--half-voxel-size-at "5000,25000,75000" \
--reduce-step-size-at "5000,25000,75000" \
--pruning-every-steps 2500 \
--keep-interval-updates 5 --keep-last-epochs 5 \
--log-format simple --log-interval 1 \
--save-dir ${SAVE} \
--tensorboard-logdir ${SAVE}/tensorboard \
| tee -a $SAVE/train.log
I was able to run for 15 checkpoints before it hung due to memory errors. I am able to export a ply with extract.py and it looks good.
Try to render from given camera poses with the command in the README:
MODEL_PATH=${SAVE}/checkpoint_best.pt
python render.py ${DATASET} \
--user-dir fairnr \
--task single_object_rendering \
--path ${MODEL_PATH} \
--model-overrides '{"chunk_size":512,"raymarching_tolerance":0.01}' \
--render-save-fps 24 \
--render-resolution "800x800" \
--render-camera-poses ${DATASET}/pose \
--render-views "200..400" \
--render-output ${SAVE}/output \
--render-output-types "color" "depth" "voxel" "normal" --render-combine-output \
--log-format "simple"
Results in this exception:
2021-01-08 21:46:18 | INFO | fairnr.renderer | rendering starts. fairnr BaseModel
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Building EasyOctree done. total #nodes = 10097, terminal #nodes = 4457 (time taken 2.363832 s)
Building EasyOctree done. total #nodes = 10097, terminal #nodes = 4457 (time taken 2.387693 s)
Building EasyOctree done. total #nodes = 10097, terminal #nodes = 4457 (time taken 2.516335 s)
Traceback (most recent call last):
File "render.py", line 11, in <module>
cli_main()
File "/src/NSVF/fairnr_cli/render_multigpu.py", line 137, in cli_main
distributed_utils.call_main(args, main)
File "/venv/lib/python3.7/site-packages/fairseq/distributed_utils.py", line 174, in call_main
args.distributed_world_size,
File "/venv/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/venv/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/usr/lib/python3.7/multiprocessing/queues.py", line 105, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/venv/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/venv/lib/python3.7/site-packages/fairseq/distributed_utils.py", line 156, in distributed_main
main(args, **kwargs)
File "/src/NSVF/fairnr_cli/render_multigpu.py", line 35, in main
return _main(args, sys.stdout)
File "/src/NSVF/fairnr_cli/render_multigpu.py", line 108, in _main
for i, sample in enumerate(t):
File "/venv/lib/python3.7/site-packages/fairseq/logging/progress_bar.py", line 245, in __iter__
for i, obj in enumerate(self.iterable, start=self.n):
File "/venv/lib/python3.7/site-packages/fairseq/data/iterators.py", line 61, in __iter__
for x in self.iterable:
File "/venv/lib/python3.7/site-packages/fairseq/data/iterators.py", line 517, in __next__
raise item
File "/venv/lib/python3.7/site-packages/fairseq/data/iterators.py", line 454, in run
item = next(self._source_iter)
File "/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
idx, data = self._get_data()
File "/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 808, in _get_data
success, data = self._try_get_data()
File "/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 774, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 152108) exited unexpectedly
I tried running with the other checkpoints but got the same thing. Is there a parameter I can tweak to avoid these memory issues?
Desktop (please complete the following information):
Hi Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt,
As discussed over email, we converted NeRF's NDC coordinates to work with your code and ran it on a scene from NeRF, "trex", with minor changes to the launch script. Is this the result that you expect?
Here is trex on NSVF.
https://drive.google.com/file/d/1EoUnrAEbN6vnnXy7WpWCLqhjex8AByrL/view?fbclid=IwAR27zUKFwvS3gATUMUfQH8Ij_n4EaYJpep94ejFRr2aLbIB9lBFL_FssUaE
Here is our trex dataset.
https://drive.google.com/drive/folders/1bKfaNuafR46qfdcxSiAlFyi3Ox7yIkxO?usp=sharing
Here is bash script to train.
https://gist.github.com/pureexe/52486aa8f62dff0a9a8770b5861e7deb
Here is bash script to render.
https://gist.github.com/pureexe/e9cda0b0adf671a84a1711f6e3e19c4a
Kind regards,
Pakkapon
Is your feature request related to a problem? Please describe.
We want to have a qualitative comparison in our work.
Describe the solution you'd like
We hope the official test-set results on Synthetic-NeRF, Synthetic-NSVF, BlendedMVS, and Tanks and Temples can be released for qualitative comparisons in future studies.
I have run the model on the NSVF datasets. The rendered views look good. But why is the training loss always far larger than the validation loss? Here is the parameter script.
python -u train.py ${Dataset}
--user-dir fairnr
--task single_object_rendering
--train-views "0..100" --view-resolution "800x800"
--max-sentences 1 --view-per-batch 1 --pixel-per-view 512
--no-preload
--sampling-on-mask 1.0 --no-sampling-at-reader
--valid-views "100..200" --valid-view-resolution "800x800"
--valid-view-per-batch 1
--transparent-background "1.0,1.0,1.0" --background-stop-gradient
--arch nsvf_base
--initial-boundingbox ${SOURCEDIR}/nsvf/Spaceship/bbox.txt
--use-octree
--raymarching-stepsize-ratio 0.125
--discrete-regularization
--color-weight 128.0 --alpha-weight 1.0
--optimizer "adam" --adam-betas "(0.9, 0.999)"
--lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000
--criterion "srn_loss" --clip-norm 0.0
--num-workers 0
--seed 2
--save-interval-updates 500 --max-update 150000
--virtual-epoch-steps 5000 --save-interval 1
--half-voxel-size-at "5000,25000,75000"
--reduce-step-size-at "5000,25000,75000"
--pruning-every-steps 2500
--keep-interval-updates 5 --keep-last-epochs 5
--log-format simple --log-interval 1
--save-dir ${Result}/${MODEL}
--tensorboard-logdir ${Result}/tensorboard/${MODEL}
| tee -a ${Result}/train.log
I set up the whole virtual environment on a cluster; the GPU is an NVIDIA V100. But I receive this error:
CUDA kernel failed : no kernel image is available for execution on the device
void aabb_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, int*, float*, float*) at L:371 in fairnr/clib/src/intersect_gpu.cu
CUDA kernel failed : no kernel image is available for execution on the device
void aabb_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, int*, float*, float*) at L:371 in fairnr/clib/src/intersect_gpu.cu
Traceback (most recent call last):
File "train.py", line 20, in <module>
cli_main()
File "/ibex/scratch/lir0b/NSVF/fairnr_cli/train.py", line 356, in cli_main
nprocs=torch.cuda.device_count(),
File "/home/lir0b/.conda/envs/nsvf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/lir0b/.conda/envs/nsvf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 112, in join
(error_index, exitcode)
Exception: process 0 terminated with exit code 255
scripts are:
#!/bin/bash
#SBATCH -N 1
#SBATCH --partition=batch
#SBATCH -J table11_d4
#SBATCH -o table11_d4.%J.out
#SBATCH -e table11_d4.%J.err
#SBATCH [email protected]
#SBATCH --mail-type=ALL
#SBATCH --time=23:00:00
#SBATCH --mem=40G
#SBATCH --gres=gpu:v100:2
#SBATCH --cpus-per-task=6
#SBATCH --constraint=[gpu]
#run the application:
#DATASET=./Synthetic_NSVF/gun
#SAVE=./exp/gun
#export PATH=$PATH:/ibex/scratch/lir0b/NSVF/fairnr/clib
#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ibex/scratch/lir0b/NSVF/fairnr/clib
#export LD_LIBRARY_PATH=/ibex/scratch/lir0b/NSVF:$LD_LIBRARY_PATH
#conda activate nsvf
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ibex/scratch/lir0b/NSVF/fairnr/clib:/ibex/scratch/lir0b/NSVF
export PATH=$PATH:/ibex/scratch/lir0b/NSVF/fairnr/clib:/ibex/scratch/lir0b/NSVF
pip install --editable /ibex/scratch/lir0b/NSVF
python setup.py build_ext --inplace
python -u train.py ./Synthetic_NSVF/table11_d4 \
--user-dir fairnr \
--task single_object_rendering \
--train-views "1..35" --view-resolution "384x512" \
--max-sentences 1 --view-per-batch 4 --pixel-per-view 2048 \
--no-preload \
--sampling-on-mask 1.0 --no-sampling-at-reader \
--valid-views "35..44" --valid-view-resolution "384x512" \
--valid-view-per-batch 1 \
--transparent-background "1.0,1.0,1.0" --background-stop-gradient \
--arch nsvf_base \
--initial-boundingbox ./Synthetic_NSVF/table11_d4/bbox.txt \
--raymarching-stepsize-ratio 0.125 \
--discrete-regularization \
--color-weight 128.0 --alpha-weight 1.0 \
--optimizer "adam" --adam-betas "(0.9, 0.999)" \
--lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000 \
--criterion "srn_loss" --clip-norm 0.0 \
--num-workers 0 \
--seed 2 \
--save-interval-updates 500 --max-update 150000 \
--virtual-epoch-steps 5000 --save-interval 1 \
--half-voxel-size-at "5000,25000,75000" \
--reduce-step-size-at "5000,25000,75000" \
--pruning-every-steps 2500 \
--keep-interval-updates 5 --keep-last-epochs 5 \
--log-format simple --log-interval 1 \
--save-dir ./exp/table11_d4 \
--tensorboard-logdir ./exp/table11_d4/tensorboard \
| tee -a ./exp/table11_d4/train.log
To Reproduce
Steps to reproduce the behavior:
Expected behavior
normal training
Screenshots
CUDA kernel failed : no kernel image is available for execution on the device
void svo_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, const int*, int*, float*, float*) at L:384 in fairnr/clib/src/intersect_gpu.cu
CUDA kernel failed : no kernel image is available for execution on the device
void svo_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, const int*, int*, float*, float*) at L:384 in fairnr/clib/src/intersect_gpu.cu
CUDA kernel failed : no kernel image is available for execution on the device
void svo_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, const int*, int*, float*, float*) at L:384 in fairnr/clib/src/intersect_gpu.cu
CUDA kernel failed : no kernel image is available for execution on the device
void svo_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, const int*, int*, float*, float*) at L:384 in fairnr/clib/src/intersect_gpu.cu
Traceback (most recent call last):
File "train.py", line 20, in <module>
cli_main()
File "/ibex/scratch/lir0b/NSVF/fairnr_cli/train.py", line 356, in cli_main
nprocs=torch.cuda.device_count(),
File "/home/lir0b/.conda/envs/nsvf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/lir0b/.conda/envs/nsvf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 112, in join
(error_index, exitcode)
Exception: process 1 terminated with exit code 255
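In case it helps: "no kernel image is available" typically means the extension binaries were not built for the V100's sm_70 architecture. One thing to try (my suggestion, not an official fix) is pinning the architecture list when rebuilding the extension:
import os
import subprocess

# TORCH_CUDA_ARCH_LIST is read by torch.utils.cpp_extension at build time;
# 7.0 is the compute capability of the V100
env = dict(os.environ, TORCH_CUDA_ARCH_LIST="7.0")
subprocess.run(["python", "setup.py", "build_ext", "--inplace"], check=True, env=env)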
Describe the bug
When we run the code on the Bike dataset, it exits unexpectedly: the DataLoader process gets terminated.
To Reproduce
Command I used for running:
python -u train.py ${DATASET} --user-dir fairnr --task single_object_rendering --train-views "0..100" --view-resolution "800x800" --max-sentences 1 --view-per-batch 2 --pixel-per-view 2048 --no-preload --sampling-on-mask 1.0 --no-sampling-at-reader --valid-views "100..200" --valid-view-resolution "400x400" --valid-view-per-batch 1 --transparent-background "1.0,1.0,1.0" --background-stop-gradient --arch nsvf_base --initial-boundingbox ${DATASET}/bbox.txt --raymarching-stepsize-ratio 0.125 --discrete-regularization --color-weight 128.0 --alpha-weight 1.0 --optimizer "adam" --adam-betas "(0.9, 0.999)" --lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000 --criterion "srn_loss" --clip-norm 0.0 --num-workers 2 --seed 2 --save-interval-updates 500 --max-update 150000 --virtual-epoch-steps 5000 --save-interval 1 --half-voxel-size-at "5000,25000,75000" --reduce-step-size-at "5000,25000,75000" --pruning-every-steps 2500 --keep-interval-updates 5 --keep-last-epochs 5 --log-format simple --log-interval 1 --save-dir ${SAVE} --tensorboard-logdir ${SAVE}/tensorboard | tee -a $SAVE/train.log
Probably this is something minor.
Regards,
K. J. Nitthilan
I'm having trouble running the training scripts after installing the repo.
I've followed the steps:
pip install -r requirements.txt
pip install --editable ./
And then tried running train_wineholder.sh (I've also tried python train.py without arguments).
I get the following error:
Traceback (most recent call last):
File "train.py", line 7, in <module>
from fairnr_cli.train import cli_main
File "/home/david/repos/NSVF/fairnr_cli/train.py", line 26, in <module>
from fairnr import ResetTrainerException
File "/home/david/repos/NSVF/fairnr/__init__.py", line 11, in <module>
from . import data, tasks, models, modules, criterions
File "/home/david/repos/NSVF/fairnr/data/__init__.py", line 6, in <module>
from .shape_dataset import (
File "/home/david/repos/NSVF/fairnr/data/shape_dataset.py", line 14, in <module>
from . import data_utils, geometry, trajectory
File "/home/david/repos/NSVF/fairnr/data/geometry.py", line 12, in <module>
from fairnr.clib._ext import build_octree
File "/home/david/repos/NSVF/fairnr/clib/__init__.py", line 28, in <module>
import fairnr.clib._ext as _ext
AttributeError: module 'fairnr' has no attribute 'clib'
I didn't have any trouble running pip install --editable ./ since I get the following output:
Obtaining file:///home/david/repos/NSVF
Installing collected packages: fairnr
Running setup.py develop for fairnr
Successfully installed fairnr-0.0.0
But looking at the output from running the training scripts, it seems like there's a build issue with fairnr.clib; does anyone know why? I'm running this on Ubuntu 18.04 with CUDA 10.2. (I also tried CUDA 10.0 and get the same error.)
Can you please add sample script examples of these to your docs:
Hi, I am trying to train on the ScanNet dataset with a voxel map as input, but I encounter an "out of bound" error when running the build_octree function. I was wondering, do you have an example of how we should train with a voxel map as input? Thanks.
raymarching_tolerance - Is this field used only during rendering? Can it be used during training too, i.e., to reduce the number of steps over which the sigma value is accumulated?
Is the BackgroundField used during training? I do not see it being used. How is the background color taken care of during the training process?
Hi, thanks for sharing the great work! I wonder if you can provide a rough estimate of how many samples are used to evaluate the MLP inside each sparse voxel, and how many sparse voxels a ray typically intersect with? Thank you!
Hi,
I'm trying to run NSVF on my own training set but the end result (even after 2500 iterations) is just blank, white images. I am assuming there is something wrong with either the way I've formatted my camera extrinsics or the training parameters. I was wondering if anyone had any insights here. For context, the scene is inward-facing (taken from our scanner, which has 27 DSLR cameras positioned around the object in a hemispherical arrangement). Here is what one of the images from our dataset looks like:
Our extrinsics (and intrinsics) are taken directly from OpenCV's camera calibration routine, so they follow "OpenCV-style matrix conventions," which I believe is what is used by NSVF. I invert each extrinsic matrix to convert it to a "camera-to-world" matrix. I wasn't able to find any additional information about matrix formatting in the README, but does this sound correct? An example of one of our poses is:
-5.104647941864430827e-01 -8.207758808463482686e-03 -8.598594807243422622e-01 7.951983207748324345e-01
6.956707447815645429e-01 -5.917033288830939597e-01 -4.073443082255215897e-01 1.741095097774456368e+00
-5.054383332823660924e-01 -8.061140138243517717e-01 3.077536156810128376e-01 6.316381418571048556e+01
0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00
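For reference, the inversion described above, assuming a standard OpenCV world-to-camera extrinsic (x_cam = R @ x_world + t), would look like this (my own sketch, not NSVF code):
import numpy as np

def extrinsic_to_c2w(R, t):
    # invert [R | t] analytically: the rotation transposes, the camera center is -R^T t
    c2w = np.eye(4)
    c2w[:3, :3] = R.T
    c2w[:3, 3] = -R.T @ t
    return c2w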
And we use a bounding box with the following dimensions:
bbox = np.array([[
    -15, 4.0, -15,  # min corner
    15, 25, 15,     # max corner
    1.0             # initial voxel size
]])
Our training script is basically copy+pasted from the train_jade.sh example, but I will paste it below for reference.
I'm wondering if there is something obvious that I've missed in pre-processing our dataset and/or the training script that would cause NSVF to produce completely blank frames. I've tried a number of adjustments, but I'd love to hear any suggestions anyone might have.
Thanks so much for making this project available!
DATA="clamp_alpha"
RES="500x750"
ARCH="nsvf_base"
SUFFIX="v1"
DATASET=/home/ec2-user/SageMaker/${DATA}
SAVE=/home/ec2-user/SageMaker/checkpoints/$DATA
MODEL=$ARCH$SUFFIX
mkdir -p $SAVE/$MODEL
# start training locally
python train.py ${DATASET} \
--user-dir fairnr \
--task single_object_rendering \
--train-views "0..100" \
--view-resolution $RES \
--max-sentences 1 \
--view-per-batch 1 \
--pixel-per-view 128 \
--no-preload \
--sampling-on-mask 1.0 --no-sampling-at-reader \
--valid-view-resolution $RES \
--valid-views "100..200" \
--valid-view-per-batch 1 \
--transparent-background "1.0,1.0,1.0" \
--background-stop-gradient \
--arch $ARCH \
--initial-boundingbox ${DATASET}/bbox.txt \
--raymarching-stepsize-ratio 0.125 \
--use-octree \
--discrete-regularization \
--color-weight 128.0 \
--alpha-weight 1.0 \
--optimizer "adam" \
--adam-betas "(0.9, 0.999)" \
--lr-scheduler "polynomial_decay" \
--total-num-update 150000 \
--lr 0.001 \
--clip-norm 0.0 \
--criterion "srn_loss" \
--num-workers 0 \
--seed 2 \
--save-interval-updates 500 --max-update 150000 \
--virtual-epoch-steps 5000 --save-interval 1 \
--half-voxel-size-at "5000,25000,75000" \
--reduce-step-size-at "5000,25000,75000" \
--pruning-every-steps 2500 \
--keep-interval-updates 5 \
--log-format simple --log-interval 1 \
--tensorboard-logdir ${SAVE}/tensorboard/${MODEL} \
--save-dir ${SAVE}/${MODEL}
I've failed to train large objects like Barn. Can you share the training .sh file with us?
Hi, thanks for sharing this great work.
The code for the Maria sequence is missing from the repo. Could you please release the code and the data for the Maria sequence?
Thank you very much.
First of all, thanks for your excellent work.
I'm working on my own dataset for free-viewpoint rendering and am interested in your work. Since the RGB, pose, and intrinsics data are prepared, how can I get the bbox data in the same style as your datasets? (A guessed sketch of how such a file could be generated follows below.)
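I am not aware of an official script, but judging from the bbox.txt files shipped with the datasets (min corner, max corner, and initial voxel size on a single line), one plausible way to generate it from a rough point cloud of the scene is the following (my guess, not the authors' tool; the margin and voxel count are arbitrary choices):
import numpy as np

def write_bbox(points, path, margin=0.05, n_voxels=64):
    # points: (N, 3) rough scene geometry, e.g. a filtered COLMAP sparse cloud
    lo = points.min(axis=0) - margin
    hi = points.max(axis=0) + margin
    voxel_size = (hi - lo).max() / n_voxels  # last value on the bbox.txt line
    row = np.concatenate([lo, hi, [voxel_size]])
    np.savetxt(path, row[None], fmt="%.6f")  # xmin ymin zmin xmax ymax zmax voxel_size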
Describe the bug
I am trying to run your project. When running pip install --editable ./ or python setup.py build_ext --inplace I am getting the following output:
Obtaining file:///tmp/kghasemi/NSVF
Installing collected packages: fairnr
Running setup.py develop for fairnr
ERROR: Command errored out with exit status 1:
command: /tmp/kghasemi/conda/envs/nsvf-env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/kghasemi/NSVF/setup.py'"'"'; __file__='"'"'/tmp/kghasemi/NSVF/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
cwd: /tmp/kghasemi/NSVF/
Complete output (23 lines):
running develop
running egg_info
writing fairnr.egg-info/PKG-INFO
writing dependency_links to fairnr.egg-info/dependency_links.txt
writing entry points to fairnr.egg-info/entry_points.txt
writing top-level names to fairnr.egg-info/top_level.txt
reading manifest file 'fairnr.egg-info/SOURCES.txt'
writing manifest file 'fairnr.egg-info/SOURCES.txt'
running build_ext
building 'fairnr.clib._ext' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/fairnr
creating build/temp.linux-x86_64-3.7/fairnr/clib
creating build/temp.linux-x86_64-3.7/fairnr/clib/src
gcc -pthread -B /tmp/kghasemi/conda/envs/nsvf-env/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/kghasemi/conda/envs/nsvf-env/lib/python3.7/site-packages/torch/include -I/tmp/kghasemi/conda/envs/nsvf-env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/tmp/kghasemi/conda/envs/nsvf-env/lib/python3.7/site-packages/torch/include/TH -I/tmp/kghasemi/conda/envs/nsvf-env/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.1/include -I/tmp/kghasemi/conda/envs/nsvf-env/include/python3.7m -c fairnr/clib/src/octree.cpp -o build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from fairnr/clib/src/../include/utils.h:7:0,
from fairnr/clib/src/octree.cpp:7:
/tmp/kghasemi/conda/envs/nsvf-env/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7:10: fatal error: cublas_v2.h: No such file or directory
#include <cublas_v2.h>
^~~~~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /tmp/kghasemi/conda/envs/nsvf-env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/kghasemi/NSVF/setup.py'"'"'; __file__='"'"'/tmp/kghasemi/NSVF/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
To Reproduce
Steps to reproduce the behavior:
1. git clone https://github.com/facebookresearch/NSVF.git
2. cd NSVF
3. conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
4. pip install -r requirements.txt
5. pip install --editable ./
If you don't execute step 3, you will get this error:
ERROR: Command errored out with exit status 1:
command: /tmp/kghasemi/conda/envs/nsvf-env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-09c9r6sr/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-09c9r6sr/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-cn8dm7pt
cwd: /tmp/pip-req-build-09c9r6sr/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-09c9r6sr/setup.py", line 2, in <module>
from torch.utils.cpp_extension import BuildExtension, CUDA_HOME
ModuleNotFoundError: No module named 'torch'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Expected behavior
When I execute find /usr/local/ -name cublas_v2.h, I get /usr/local/cuda-10.2/targets/x86_64-linux/include/cublas_v2.h, which means the header exists.
Therefore I should not get this error.
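One thing worth checking (my suggestion): the compile command above uses -I/usr/local/cuda-10.1/include, while the header you found lives under cuda-10.2. The include path comes from the CUDA_HOME that torch.utils.cpp_extension detects, so verifying it, and exporting CUDA_HOME=/usr/local/cuda-10.2 before rebuilding if it points elsewhere, may help:
# quick check of which CUDA installation the build will pick up
from torch.utils.cpp_extension import CUDA_HOME
print(CUDA_HOME)  # if this is /usr/local/cuda-10.1 while cublas_v2.h lives under cuda-10.2,
                  # export CUDA_HOME=/usr/local/cuda-10.2 and re-run `pip install --editable ./`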
Desktop (please complete the following information):
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
Env
cudatoolkit 10.1.243 h6bb024c_0
pytorch 1.4.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
lpips-pytorch latest pypi_0 pypi
pytorch 1.4.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
torchvision 0.5.0 py37_cu101 pytorch
ipython 7.19.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
opencv-python 4.2.0.32 pypi_0 pypi
python 3.7.9 h7579374_0
python-dateutil 2.8.1 pypi_0 pypi
Could you please provide the conda environment file you are using?
Describe the bug
Hi! I noticed you have implemented a VGG loss and patch sampling for NeRF; could you elaborate on how to use it?
Also, did you adopt this loss in your paper?
To Reproduce
add
--vgg-weight 1e-4 \
--sampling-patch-size 48
in the jade training config.
Expected behavior
This doesn't work right now; all_results is None after setting this.
I have been thinking about this method of lightfield synthesis and came to the same conclusion about sparse voxel grids with Taichi (though I haven't managed to implement it).
My original thought was to take the outermost voxels of each object, unwrap the faces, and differentiably render with a GAN the vector displacement / depth / raw light and PBR textures for each voxel. You could tag voxels with object detection (PV-RCNN or similar).
Also, being able to define +inf as a skydome to paint on with spherical-harmonics differentiable rendering would be required for breaking down the lighting components, along with a store of light sources past a threshold for lighting estimation.
Additionally a global frame or time step would be handy for making a camera rig that can record voxel video with n cameras and global shutter sync. Being able to update only specific voxels as needed if changed beyond a threshold will help with sparse computation.
Ideally the output of this pipeline could be a voxelized lightfield projection with material onto a standard cubified obj or alembic sequence.
I have already installed the nvidia/apex module in my environment (which your project README says is optional).
When I try to add the arg "--fp16" to the train script:
python -u train.py ${DATASET} \
... \
--fp16 \
... \
--tensorboard-logdir ${SAVE}/tensorboard \
| tee -a $SAVE/train.log
Some errors occur; the main error report is about c10::Error:
...
terminate called after throwing an instance of 'c10::Error'
...
Something similar to fairseq issue #1683 (closed with no response).
I tried to find ways to solve this, such as adding the arg "--ddp-backend=no_c10d", but this just causes the same error.
I haven't read all of the project's main code, but I guess you are more familiar with this problem, so I am posting this issue.
Thanks for replying.
By the way: training without "--fp16" always works fine, and the environment almost exactly matches the requirements file in the README.
Describe the bug
There are build failures:
$ pip install --editable ./
Obtaining file:///home/user/dev/NSVF
Installing collected packages: fairnr
Running setup.py develop for fairnr
ERROR: Command errored out with exit status 1:
command: /home/user/miniconda3/envs/python37/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/user/dev/NSVF/setup.py'"'"'; __file__='"'"'/home/user/dev/NSVF/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
cwd: /home/user/dev/NSVF/
Complete output (120 lines):
running develop
running egg_info
writing fairnr.egg-info/PKG-INFO
writing dependency_links to fairnr.egg-info/dependency_links.txt
writing entry points to fairnr.egg-info/entry_points.txt
writing top-level names to fairnr.egg-info/top_level.txt
reading manifest file 'fairnr.egg-info/SOURCES.txt'
writing manifest file 'fairnr.egg-info/SOURCES.txt'
running build_ext
building 'fairnr.clib._ext' extension
Emitting ninja build file /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
[1/6] c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/intersect.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o
c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/intersect.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/home/user/dev/NSVF/fairnr/clib/src/intersect.cpp:6:10: fatal error: intersect.h: No such file or directory
6 | #include "intersect.h"
| ^~~~~~~~~~~~~
compilation terminated.
[2/6] c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/octree.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o
c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/octree.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/home/user/dev/NSVF/fairnr/clib/src/octree.cpp:6:10: fatal error: octree.h: No such file or directory
6 | #include "octree.h"
| ^~~~~~~~~~
compilation terminated.
[3/6] c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/binding.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/binding.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/binding.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/binding.o
c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/binding.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/binding.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/binding.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/home/user/dev/NSVF/fairnr/clib/src/binding.cpp:6:10: fatal error: intersect.h: No such file or directory
6 | #include "intersect.h"
| ^~~~~~~~~~~~~
compilation terminated.
[4/6] c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/sample.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample.o
c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/sample.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/home/user/dev/NSVF/fairnr/clib/src/sample.cpp:6:10: fatal error: sample.h: No such file or directory
6 | #include "sample.h"
| ^~~~~~~~~~
compilation terminated.
[5/6] /usr/local/cuda-11.0/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample_gpu.o.d -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/sample_gpu.cu -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
FAILED: /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample_gpu.o
/usr/local/cuda-11.0/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample_gpu.o.d -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/sample_gpu.cu -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/sample_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
/home/user/dev/NSVF/fairnr/clib/src/sample_gpu.cu:11:10: fatal error: cuda_utils.h: No such file or directory
11 | #include "cuda_utils.h"
| ^~~~~~~~~~~~~~
compilation terminated.
[6/6] /usr/local/cuda-11.0/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect_gpu.o.d -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/intersect_gpu.cu -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
FAILED: /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect_gpu.o
/usr/local/cuda-11.0/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect_gpu.o.d -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/intersect_gpu.cu -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
/home/user/dev/NSVF/fairnr/clib/src/intersect_gpu.cu:11:10: fatal error: cuda_utils.h: No such file or directory
11 | #include "cuda_utils.h"
| ^~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1673, in _run_ninja_build
env=env)
File "/home/user/miniconda3/envs/python37/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/user/dev/NSVF/setup.py", line 35, in <module>
'fairnr-train = fairseq_cli.train:cli_main'
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/setuptools/command/develop.py", line 136, in install_for_development
self.run_command('build_ext')
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
build_ext.build_extensions(self)
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/home/user/miniconda3/envs/python37/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 538, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1359, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
----------------------------------------
ERROR: Command errored out with exit status 1: /home/user/miniconda3/envs/python37/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/user/dev/NSVF/setup.py'"'"'; __file__='"'"'/home/user/dev/NSVF/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
What is peculiar is that copy-pasting individual build commands, such as the following one, compiles correctly on its own.
$ c++ -MMD -MF /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o.d -pthread -B /home/user/miniconda3/envs/python37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/home/user/miniconda3/envs/python37/include/python3.7m -c -c /home/user/dev/NSVF/fairnr/clib/src/intersect.cpp -o /home/user/dev/NSVF/build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/ATen/Parallel.h:140,
[...]
from /home/user/dev/NSVF/fairnr/clib/src/intersect.cpp:6:
/home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/ATen/ParallelOpenMP.h:83: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
83 | #pragma omp parallel for if ((end - begin) >= grain_size)
|
In file included from /home/user/miniconda3/envs/python37/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
[...]
from /home/user/dev/NSVF/fairnr/clib/src/intersect.cpp:6:
/home/user/dev/NSVF/fairnr/clib/src/intersect.cpp: In function ‘std::tuple<at::Tensor, at::Tensor, at::Tensor> ball_intersect(at::Tensor, at::Tensor, at::Tensor, float, int)’:
fairnr/clib/include/utils.h:12:24: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
12 | TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor"); \
[...]
The individual build command finishes with warnings but without errors.
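One possible explanation (my assumption, not confirmed by the maintainers): torch's ninja backend compiles from the build/temp.* directory, so the relative flag -Ifairnr/clib/include no longer resolves there, while the same command pasted into a shell at the repository root does find the headers. A rough sketch of how the extension declaration in setup.py could point at an absolute include directory instead (names and structure assumed, not verified against the repo):
import glob
import os
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# absolute path so the headers are found regardless of ninja's working directory
_include = os.path.join(os.path.dirname(os.path.abspath(__file__)), "fairnr", "clib", "include")

setup(
    name="fairnr",
    ext_modules=[
        CUDAExtension(
            name="fairnr.clib._ext",
            sources=glob.glob("fairnr/clib/src/*.cpp") + glob.glob("fairnr/clib/src/*.cu"),
            extra_compile_args={
                "cxx": ["-O2", "-I" + _include],
                "nvcc": ["-O2", "-I" + _include],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)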
To Reproduce
Following the README, at the installation step, running either pip install --editable ./ or python setup.py build_ext --inplace results in the build failures detailed above.
Expected behavior
Build should pass.
Desktop (please complete the following information):
Ubuntu 18, CUDA 11.3, g++ version 9.3.0.
Hello! Can you provide the .blend files for Synthetic-NSVF? Thank you and stay safe!
Describe the bug
I've been able to get NSVF to successfully run locally on the supplied datasets, but I'm running into errors when using my own custom dataset. I converted my dataset to match the format described in the README and set the bounding box dimensions to the minima and maxima of the translation coordinates among all extrinsic camera matrices (pose matrices). I calculated the voxel size according to the volume of these boundaries.
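For reference, here is a minimal sketch of the bounding-box setup described above (hypothetical helper code; it assumes 4x4 camera-to-world pose files and the bbox.txt layout of min corner, max corner, and initial voxel size described in the README):
import glob
import numpy as np

# load 4x4 camera-to-world pose matrices and collect the camera centers
poses = [np.loadtxt(f) for f in sorted(glob.glob("pose/*.txt"))]
centers = np.stack([p[:3, 3] for p in poses])

box_min = centers.min(axis=0)
box_max = centers.max(axis=0)

# derive an initial voxel size from the box volume (assumption: roughly 64 voxels per axis)
voxel_size = np.cbrt(np.prod(box_max - box_min)) / 64.0

row = np.concatenate([box_min, box_max, [voxel_size]])
np.savetxt("bbox.txt", row[None], fmt="%.6f")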
Running train.py with the supplied arguments results in the following error:
File "train.py", line 20, in <module>
cli_main()
File "/home/ubuntu/CSM/NSVF/fairnr_cli/train.py", line 373, in cli_main
main(args)
File "/home/ubuntu/CSM/NSVF/fairnr_cli/train.py", line 104, in main
should_end_training = train(args, trainer, task, epoch_itr)
File "/usr/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/ubuntu/CSM/NSVF/fairnr_cli/train.py", line 181, in train
log_output = trainer.train_step(samples)
File "/usr/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/ubuntu/.local/lib/python3.7/site-packages/fairseq/trainer.py", line 431, in train_step
ignore_grad=is_dummy_batch,
File "/home/ubuntu/CSM/NSVF/fairnr/tasks/neural_rendering.py", line 300, in train_step
return super().train_step(sample, model, criterion, optimizer, update_num, ignore_grad)
File "/home/ubuntu/.local/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 351, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "/home/ubuntu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/CSM/NSVF/fairnr/criterions/rendering_loss.py", line 42, in forward
net_output = model(**sample)
File "/home/ubuntu/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/CSM/NSVF/fairnr/models/fairnr_model.py", line 77, in forward
results = self._forward(ray_start, ray_dir, **kwargs)
File "/home/ubuntu/CSM/NSVF/fairnr/models/nsvf.py", line 91, in _forward
all_results['missed'] = fill_in((fullsize, ), hits, all_results['missed'], 1.0).view(S, V, P)
File "/home/ubuntu/CSM/NSVF/fairnr/data/geometry.py", line 306, in fill_in
output = input.new_ones(*shape) * initial
AttributeError: 'NoneType' object has no attribute 'new_ones'
I investigated further and determined that this means no rays are intersecting with the voxels during training. I assume that this is due to an issue with my bounding box setup. To recreate this, here is my dataset.
Is there an issue with the way I'm using my camera pose matrices to set my bounding box dimensions? Are there other steps I could take to ensure that at least some intersections are occurring between the rays and voxels?
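One way I would sanity-check this (a hedged debugging sketch, not part of fairnr) is to cast each camera's central viewing ray and test it against the bounding box from bbox.txt with a standard slab test; if none of these rays hit the box, the bbox or the pose convention is probably off:
import glob
import numpy as np

def ray_aabb_hit(origin, direction, box_min, box_max, eps=1e-9):
    # slab test: returns True if the ray intersects the axis-aligned box
    inv = 1.0 / np.where(np.abs(direction) < eps, eps, direction)
    t0 = (box_min - origin) * inv
    t1 = (box_max - origin) * inv
    t_near = np.minimum(t0, t1).max()
    t_far = np.maximum(t0, t1).min()
    return t_far >= max(t_near, 0.0)

bbox = np.loadtxt("bbox.txt").ravel()
box_min, box_max = bbox[:3], bbox[3:6]
for f in sorted(glob.glob("pose/*.txt")):
    pose = np.loadtxt(f)        # 4x4 camera-to-world matrix
    origin = pose[:3, 3]
    forward = pose[:3, 2]       # assumption: camera looks along +z of its own frame; flip the sign for a -z convention
    print(f, "hits bbox:", ray_aabb_hit(origin, forward, box_min, box_max))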
Hello authors, may I know where to download pre-trained models?
(I see that README.md mentions that the pre-trained models are MIT-licensed.)
I am using --valid-view-per-batch 2 for the validation phase (to match the training phase's --view-per-batch 2). Debugging your code, I can confirm that this value (--valid-view-per-batch) is received and set correctly, but it still processes only 1 view in the validation phase:
The training phase works fine:
Is this a bug in the code, or did I misunderstand?
I don't really understand how to get the results of a dynamic scene.
What's the meaning of 'hypernetwork to encode all the 200 frames'?
![Screenshot from 2021-06-04 19-21-10](https://user-images.githubusercontent.com/41947948/120793925-1980aa80-c56a-11eb-8da8-6ab349152f03.png)
I would like to know the specific procedure. Thanks a lot!
Describe the bug
I am trying to run this code for a custom dataset. The dataset is as follows.
https://drive.google.com/drive/folders/1p6L5FVMrGzz3wdOBZwlo7ZCjS-AfsDfY?usp=sharing
And I use the following command to run it:
export DATASET="../neural_sparse_voxel_field/data/pixologic/brian_pape_1/nsvf/"
export SAVE="./brian_pape_1_ckpt/"
mkdir brian_pape_1_ckpt
rm -rf brian_pape_1_ckpt/*
CUDA_VISIBLE_DEVICES=1,2,4,5,6,7,8 python -u train.py ${DATASET} \
--user-dir fairnr \
--task single_object_rendering \
--train-views "0..15" --view-resolution "562x750" \
--max-sentences 1 --view-per-batch 1 --pixel-per-view 2048 \
--no-preload \
--sampling-on-mask 1.0 --no-sampling-at-reader \
--valid-views "0..8" --valid-view-resolution "281x375" \
--valid-view-per-batch 1 \
--transparent-background "0.0,0.0,0.0" --background-stop-gradient \
--arch nsvf_base \
--initial-boundingbox ${DATASET}/bbox.txt \
--use-octree \
--raymarching-stepsize-ratio 0.125 \
--discrete-regularization \
--color-weight 128.0 --alpha-weight 1.0 \
--optimizer "adam" --adam-betas "(0.9, 0.999)" \
--lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000 \
--criterion "srn_loss" --clip-norm 0.0 \
--num-workers 0 \
--seed 2 \
--save-interval-updates 200 --max-update 150000 \
--virtual-epoch-steps 500 --save-interval 1 \
--half-voxel-size-at "500,2500,7500" \
--reduce-step-size-at "500,2500,7500" \
--pruning-every-steps 250 \
--keep-interval-updates 5 --keep-last-epochs 5 \
--log-format simple --log-interval 1 \
--save-dir ${SAVE} \
--tensorboard-logdir ${SAVE}/tensorboard \
| tee -a $SAVE/train.log
The above error occurs when the first stage of pruning happens. It looks like the number of nodes after pruning is zero. I am not sure how this can happen. Any ideas where I should look for debugging?
The loss gradually decreases to around 10-30, the validation PSNR is around 19, and SSIM is 0.73, so I assumed at least some voxels should have good values. This was not the case with
--transparent-background "1.0,1.0,1.0": there the SSIM is stuck around 0.73 and rises only slowly, while the training loss stays around 40-50.
Any ideas or directions for debugging this?
The system easily runs out of memory during the validation phase, since it uses all the pixels of each view. Is there a way to limit the number of sampled pixels during validation (the same as training does with --pixel-per-view)?
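As a generic illustration of the idea (not fairnr's actual API), evaluating a validation view in fixed-size pixel chunks keeps peak memory bounded:
import torch

def render_in_chunks(model_fn, rays, chunk=2048):
    # rays: (P, ...) per-pixel inputs for one view; returns concatenated per-pixel outputs
    outputs = []
    with torch.no_grad():
        for i in range(0, rays.shape[0], chunk):
            outputs.append(model_fn(rays[i:i + chunk]))
    return torch.cat(outputs, dim=0)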
I tried to train with the provided data (Wineholder) on 8 V100 GPUs. It takes around 10GB per GPU, and with the default training command/configuration it took almost 6-7 days to finish training. Is this normal, or is there something I can do to speed it up? I also tried fp16 training with apex, but it is not easy to run (many errors).
Please help me.
Thank you
Describe the bug
When training with the arg --distributed-no-spawn, the program gets stuck.
Without --distributed-no-spawn, the program produces a strange "OUT OF MEMORY" error, even though there is free GPU space.
To Reproduce
Append the line '--distributed-no-spawn' in train_wineholder.sh as follows:
# just for debugging
DATA="Wineholder"
RES="800x800"
ARCH="nsvf_base"
SUFFIX="v1"
DATASET=/xxx/NSVF/data/Synthetic_NSVF/${DATA}
SAVE=/xxx/NSVF/$DATA
MODEL=$ARCH$SUFFIX
mkdir -p $SAVE/$MODEL
CUDA_VISIBLE_DEVICES="4,7"
# start training locally
python train.py ${DATASET} \
--user-dir fairnr \
--task single_object_rendering \
--train-views "0..100" \
--view-resolution $RES \
--max-sentences 1 \
--view-per-batch 2 \
--pixel-per-view 2048 \
--no-preload \
--sampling-on-mask 1.0 --no-sampling-at-reader \
--valid-view-resolution $RES \
--valid-views "100..200" \
--valid-view-per-batch 1 \
--transparent-background "1.0,1.0,1.0" \
--background-stop-gradient \
--arch $ARCH \
--initial-boundingbox ${DATASET}/bbox.txt \
--raymarching-stepsize-ratio 0.125 \
--use-octree \
--discrete-regularization \
--color-weight 128.0 \
--alpha-weight 1.0 \
--optimizer "adam" \
--adam-betas "(0.9, 0.999)" \
--lr-scheduler "polynomial_decay" \
--total-num-update 150000 \
--lr 0.001 \
--clip-norm 0.0 \
--criterion "srn_loss" \
--seed 2 \
--save-interval-updates 500 --max-update 150000 \
--virtual-epoch-steps 5000 --save-interval 1 \
--half-voxel-size-at "5000,25000,75000" \
--reduce-step-size-at "5000,25000,75000" \
--pruning-every-steps 2500 \
--keep-interval-updates 5 \
--log-format simple --log-interval 1 \
--tensorboard-logdir ${SAVE}/tensorboard/${MODEL} \
--save-dir ${SAVE}/${MODEL} \
--device-id 4 \
--distributed-no-spawn
When running it, the program gets stuck. Without --distributed-no-spawn, it logs the following:
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 7): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 4): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 8): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 3): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 2): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 5): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 6): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 6
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | distributed init (rank 9): tcp://localhost:14705
2021-03-23 00:46:08 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 9
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 7
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 4
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 8
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 3
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 2
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 5
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 1
2021-03-23 00:46:09 | INFO | fairseq.distributed_utils | initialized host ubuntu as rank 0
Traceback (most recent call last):
File "train.py", line 20, in <module>
cli_main()
File "/home/lsy/NSVF/fairnr_cli/train.py", line 356, in cli_main
nprocs=torch.cuda.device_count(),
File "/home/lsy/anaconda3/envs/NSVF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/lsy/anaconda3/envs/NSVF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/lsy/anaconda3/envs/NSVF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/lsy/anaconda3/envs/NSVF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/lsy/NSVF/fairnr_cli/train.py", line 338, in distributed_main
main(args, init_distributed=True)
File "/home/lsy/NSVF/fairnr_cli/train.py", line 50, in main
args.distributed_rank = distributed_utils.distributed_init(args)
File "/home/lsy/NSVF/3rd/fairseq-stable/fairseq/distributed_utils.py", line 107, in distributed_init
dist.all_reduce(torch.zeros(1).cuda())
RuntimeError: CUDA error: out of memory
This is strange since there is no message like "Tried to allocate 2.0 GiB". Moreover, nvidia-smi shows there is free space on GPUs 4 and 7:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26 Driver Version: 430.26 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:04:00.0 Off | N/A |
| 26% 45C P2 79W / 250W | 8119MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:05:00.0 Off | N/A |
| 30% 50C P2 72W / 250W | 8119MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN Xp Off | 00000000:06:00.0 Off | N/A |
| 28% 47C P2 73W / 250W | 8103MiB / 12196MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN Xp Off | 00000000:07:00.0 Off | N/A |
| 27% 47C P2 78W / 250W | 8135MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 TITAN Xp Off | 00000000:08:00.0 Off | N/A |
| 23% 28C P8 8W / 250W | 10MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 TITAN Xp Off | 00000000:0B:00.0 Off | N/A |
| 32% 53C P2 118W / 250W | 8334MiB / 12196MiB | 83% Default |
+-------------------------------+----------------------+----------------------+
| 6 TITAN Xp Off | 00000000:0C:00.0 Off | N/A |
| 36% 60C P2 146W / 250W | 8846MiB / 12196MiB | 81% Default |
+-------------------------------+----------------------+----------------------+
| 7 TITAN Xp Off | 00000000:0D:00.0 Off | N/A |
| 23% 27C P8 8W / 250W | 10MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 8 TITAN Xp Off | 00000000:0E:00.0 Off | N/A |
| 39% 63C P2 194W / 250W | 12080MiB / 12196MiB | 49% Default |
+-------------------------------+----------------------+----------------------+
| 9 TITAN Xp Off | 00000000:0F:00.0 Off | N/A |
| 29% 49C P2 77W / 250W | 8105MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
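One thing worth double-checking (my guess from the log above, where ten ranks 0-9 are initialized even though CUDA_VISIBLE_DEVICES="4,7" appears in the script without export) is whether the variable actually reaches the training process; a tiny diagnostic sketch:
import os
import torch

# if CUDA_VISIBLE_DEVICES="4,7" is assigned in the shell script without `export`
# (or without prefixing the python command), torch still sees all GPUs and
# torch.multiprocessing spawns one worker per device.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible device count =", torch.cuda.device_count())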
Desktop (please complete the following information):
THANKS!
Hi
When I am training the network I get the following import error. It is probably a minor issue. Am I missing some step?
root@684ba01c705d:/nitthilan/source_code/multiview_rendering/NSVF# python -u train.py ${DATASET} --user-dir fairnr --task single_object_rendering --train-views "0..100" --view-resolution "800x800" --max-sentences 1 --view-per-batch 4 --pixel-per-view 2048 --no-preload --sampling-on-mask 1.0 --no-sampling-at-reader --valid-views "100..200" --valid-view-resolution "400x400" --valid-view-per-batch 1 --transparent-background "1.0,1.0,1.0" --background-stop-gradient --arch nsvf_base --initial-boundingbox ${DATASET}/bbox.txt --use-octree --raymarching-stepsize-ratio 0.125 --discrete-regularization --color-weight 128.0 --alpha-weight 1.0 --optimizer "adam" --adam-betas "(0.9, 0.999)" --lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000 --criterion "srn_loss" --clip-norm 0.0 --num-workers 0 --seed 2 --save-interval-updates 500 --max-update 150000 --virtual-epoch-steps 5000 --save-interval 1 --half-voxel-size-at "5000,25000,75000" --reduce-step-size-at "5000,25000,75000" --pruning-every-steps 2500 --keep-interval-updates 5 --keep-last-epochs 5 --log-format simple --log-interval 1 --save-dir ${SAVE} --tensorboard-logdir ${SAVE}/tensorboard | tee -a $SAVE/train.log
Traceback (most recent call last):
File "train.py", line 7, in <module>
from fairnr_cli.train import cli_main
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr_cli/train.py", line 26, in <module>
from fairnr import ResetTrainerException
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/__init__.py", line 11, in <module>
from . import data, tasks, models, modules, criterions, clib
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/models/__init__.py", line 15, in <module>
module = importlib.import_module('fairnr.models.' + model_name)
File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/models/multi_nsvf.py", line 15, in <module>
from fairnr.models.nsvf import NSVFModel, base_architecture
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/models/nsvf.py", line 24, in <module>
from fairnr.models.fairnr_model import BaseModel
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/models/fairnr_model.py", line 22, in <module>
from fairnr.modules.encoder import get_encoder
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/modules/__init__.py", line 15, in <module>
module = importlib.import_module('fairnr.modules.' + model_name)
File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/modules/encoder.py", line 23, in <module>
from fairnr.clib import (
File "/nitthilan/source_code/multiview_rendering/NSVF/fairnr/clib/__init__.py", line 28, in <module>
import fairnr.clib._ext as _ext
AttributeError: module 'fairnr' has no attribute 'clib'
root@684ba01c705d:/nitthilan/source_code/multiview_rendering/NSVF#
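In my experience this AttributeError usually surfaces when the compiled extension is missing; a hedged check (hypothetical debugging step, assuming the install commands from the README) is to import the extension module directly and look at the underlying error:
import importlib

# fairnr.clib requires the C++/CUDA extension built by
# `pip install --editable ./` or `python setup.py build_ext --inplace`
try:
    importlib.import_module("fairnr.clib._ext")
    print("fairnr.clib._ext is available")
except Exception as e:  # ImportError / AttributeError if the extension was never built
    print("extension not built:", e)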
Describe the bug
If using PyTorch 1.4 and CUDA 10.2, the RTX 3090 is not supported (the RTX 3090 uses compute capability 8.6).
If trying PyTorch 1.7.1 and CUDA 11.0, fairnr does not compile
(ValueError: Unknown CUDA arch (8.6) or GPU not supported).
To Reproduce
Steps to reproduce the behavior:
1 - Install all recommended dependencies as explained in README.md
Expected behavior
As NSVF is about fast rendering, it would be great to have it working on Nvidia's latest flagship GPU.
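A small diagnostic sketch (assuming a recent PyTorch where these helpers exist) to compare the GPU's compute capability with the architectures the installed PyTorch binaries were built for:
import torch

print("device capability:", torch.cuda.get_device_capability(0))  # RTX 3090 reports (8, 6)
print("compiled arch list:", torch.cuda.get_arch_list())          # needs an sm_86 / compute_86 entry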
Desktop (please complete the following information):