
Comments (7)

ashawkey avatar ashawkey commented on July 17, 2024

Hi, could you provide more details, e.g. a minimal reproducible code example?
From the screenshot, I can only guess that you seem to be trying to backward through composite_rays, which should only be used at inference?
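
(For reference, a minimal, hedged sketch of that distinction; the model and tensors below are stand-ins, not the torch-ngp API. The point is only that inference-time compositing should run under torch.no_grad(), so that backward() only ever sees the differentiable training path.)

import torch

model = torch.nn.Linear(3, 3)          # stand-in for the NeRF renderer
rays = torch.rand(4096, 3)             # stand-in for sampled ray inputs
gt = torch.rand(4096, 3)               # stand-in for ground-truth colours

# inference: no graph is built, so nothing tries to backward through
# inference-only ops such as composite_rays
with torch.no_grad():
    preds = model(rays)

# training: use the differentiable path and call backward on the loss
preds = model(rays)
loss = torch.nn.functional.mse_loss(preds, gt)
loss.backward()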

Dengzhi-USTC avatar Dengzhi-USTC commented on July 17, 2024

[screenshot: Selection_303]
I just ran the above script:
[screenshot: Selection_304]

ashawkey avatar ashawkey commented on July 17, 2024

@Dengzhi-USTC Are you using the latest version? I cannot reproduce the error...

$ python main_nerf.py data/fox/ --workspace trial_nerf --fp16 --ff --cuda_ray                                                                                       
Namespace(path='data/fox/', test=False, workspace='trial_nerf', seed=0, num_rays=4096, cuda_ray=True, num_steps=128, upsample_steps=128, max_ray_batch=4096, fp16=True, ff=True, tcnn=False, mode='colmap',
 preload=False, bound=2, scale=0.33, gui=False, W=800, H=800, radius=5, fovy=90, max_spp=64)                                                                                                               
NeRFNetwork(                                                                                                                                                                                               
  (encoder): HashEncoder: input_dim=3 num_levels=16 level_dim=2 base_resolution=16 per_level_scale=1.4472692374403782 params=(6328829, 2)                                                                  
  (sigma_net): FFMLP: input_dim=32 output_dim=16 hidden_dim=64 num_layers=2 activation=0                                                                                                                   
  (encoder_dir): SHEncoder: input_dim=3 degree=4                                                                                                                                                           
  (color_net): FFMLP: input_dim=32 output_dim=3 hidden_dim=64 num_layers=3 activation=0                                                                                                                    
)                                                                                                                                                                                                          
[INFO] Trainer: ngp | 2022-03-16_10-23-47 | cuda:0 | fp16 | trial_nerf                                                                                                                                     
[INFO] #parameters: 12676090                                                                                                                                                                               
[INFO] Loading latest checkpoint ...                                                                                                                                                                       
[WARN] No checkpoint found, model randomly initialized.                                                                                                                                                    
==> Start Training Epoch 1, lr=0.010000 ...                                                                                                                                                                
loss=0.0157 (0.0188): : 100% 49/49 [00:01<00:00, 40.71it/s]                                                                                                                                                
==> Finished Epoch 1.                                                                                                                                                                                      
==> Start Training Epoch 2, lr=0.010000 ...                                                                                                                                                                
loss=0.0207 (0.0228): : 100% 49/49 [00:01<00:00, 41.23it/s]                                                                                                                                                
==> Finished Epoch 2.                                                                                                                                                                                      
==> Start Training Epoch 3, lr=0.010000 ...                                                                                                                                                                
loss=0.0126 (0.0134): : 100% 49/49 [00:02<00:00, 21.19it/s]                                                                                                                                                
==> Finished Epoch 3.                                                                                                                                                                                      
==> Start Training Epoch 4, lr=0.010000 ...                                                                                                                                                                
loss=0.0071 (0.0076): : 100% 49/49 [00:02<00:00, 22.08it/s]                                                                                                                                                
==> Finished Epoch 4.                                                                                                                                                                                      
==> Start Training Epoch 5, lr=0.010000 ...                                                                                                                                                                
loss=0.0040 (0.0059): : 100% 49/49 [00:02<00:00, 21.86it/s]                                                                                                                                                
==> Finished Epoch 5.                                                                                                                                                                                      
==> Start Training Epoch 6, lr=0.010000 ...                                                                                                                                                                
loss=0.0045 (0.0046): : 100% 49/49 [00:02<00:00, 21.84it/s]                                                                                                                                                
==> Finished Epoch 6.                                                                                                                                                                                      
==> Start Training Epoch 7, lr=0.010000 ...                                                                                                                                                                
loss=0.0035 (0.0048): : 100% 49/49 [00:02<00:00, 21.85it/s]                                                                                                                                                
==> Finished Epoch 7.                                                                                                                                                                                      
==> Start Training Epoch 8, lr=0.010000 ...                                                                                                                                                                
loss=0.0034 (0.0042): : 100% 49/49 [00:02<00:00, 22.42it/s]                                                                                                                                                
==> Finished Epoch 8.                                                                                                                                                                                      
==> Start Training Epoch 9, lr=0.010000 ...                                                                                                                                                                
loss=0.0051 (0.0032): : 100% 49/49 [00:02<00:00, 21.91it/s]                                                                                                                                                
==> Finished Epoch 9.
==> Start Training Epoch 10, lr=0.010000 ...
loss=0.0017 (0.0029): : 100% 49/49 [00:02<00:00, 22.36it/s]
==> Finished Epoch 10.
++> Evaluate at epoch 10 ...
loss=0.0023 (0.0023): : 100% 1/1 [00:00<00:00,  1.18it/s]
PSNR = 22.404175
++> Evaluate epoch 10 Finished.
[INFO] New best result: None --> 0.0023169806227087975
==> Start Training Epoch 11, lr=0.010000 ...
loss=0.0022 (0.0027): : 100% 49/49 [00:02<00:00, 22.20it/s]

Dengzhi-USTC avatar Dengzhi-USTC commented on July 17, 2024

Yes, I downloaded the latest version.
I tried this on both a V100 and a 3090.
The inference part also produces the same problem.
[screenshot: Selection_313]
[screenshot: Selection_314]

Dengzhi-USTC avatar Dengzhi-USTC commented on July 17, 2024

OK, I solved the problem: the version of my torch was 1.8.

Karbo123 avatar Karbo123 commented on July 17, 2024

@ashawkey Hi, I've found a simple workaround for this strange phenomenon.
It seems to be caused by not explicitly returning a value in class _composite_rays(Function).
[screenshot]
I think it is better to manually add a return value (for example, return tuple()) to those functions that don't return anything, even if the return value is never used. The same change should be applied to class _compact_rays(Function). I suspect this is because an older PyTorch version has this bug, which is fixed in a later version.
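
A minimal sketch of the workaround with a toy Function (not the real _composite_rays signature): forward writes into its output buffers in place, so it has nothing meaningful to return, and falling off the end with an implicit None is what trips older PyTorch builds; the explicit empty return avoids that.

import torch
from torch.autograd import Function

class _composite_rays(Function):
    # toy stand-in: the real forward calls the compiled CUDA kernel in
    # torch-ngp's raymarching extension and writes into its image/depth
    # buffers in place, so there is no meaningful tensor to return
    @staticmethod
    def forward(ctx, sigmas, rgbs, image):
        image.add_(rgbs * sigmas)   # placeholder for the in-place kernel call
        return tuple()              # explicit return instead of implicit None

sigmas = torch.rand(8, 1)
rgbs = torch.rand(8, 3)
image = torch.zeros(8, 3)
_composite_rays.apply(sigmas, rgbs, image)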


By the way, another suggestion: it would be great if you could remove all the indexing='ij' arguments from the meshgrid calls so that the code is compatible with older PyTorch versions (note that 'ij' is already the default, so it can simply be removed).

ashawkey avatar ashawkey commented on July 17, 2024

@Karbo123 Thanks for the solution! I will add it soon. As for the reason for using indexing='ij': PyTorch warns that the default behaviour will be changed in later versions. I'll write a version condition to better support older versions.
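
(A sketch of one way such a condition could look; the helper name is illustrative. The indexing keyword only exists since PyTorch 1.10, and 'ij' matches the behaviour of older versions.)

import torch
from packaging import version

def custom_meshgrid(*tensors):
    # older PyTorch: no `indexing` kwarg, behaviour is already 'ij'
    if version.parse(torch.__version__) < version.parse('1.10'):
        return torch.meshgrid(*tensors)
    # newer PyTorch: pass indexing='ij' explicitly to silence the
    # future-behaviour warning and keep the old semantics
    return torch.meshgrid(*tensors, indexing='ij')

xs, ys = custom_meshgrid(torch.arange(4), torch.arange(3))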
