Comments (6)
Hi, indeed, the 7.8 FPS includes image saving, so it is not accurate. After commenting out these lines, I ran again with --test on my 2080Ti:
https://github.com/ashawkey/torch-ngp/blob/main/nerf/utils.py#L848-L864
So on average it's 200/11 = 18.18 FPS, which is close to your number.
As for why the speeds differ, I noticed a few possible improvements during implementation:
- (I'm not so familiar with CUDA, so this might be wrong.) I pass the PackedTensor directly into the kernel, so I don't need to do the `locate` like here; this at least improves readability, and probably some speed? I'm not sure. Here I also use in-place addition instead of passing additional variables.
- I keep track of alive ray indices instead of making additional buffers, and use atomicAdd to track them here.
- Probably the most effective: when marching rays at test time, we allocate a small number of samples for each ray each time and march the rays only that far (the variable `step` in your code). However, not all rays need that many samples. Take `step=1` (the initial step) for example: we march 1 sample further for each ray, but many rays don't hit anything, so we can exclude the samples that hit nothing from network evaluation, like
Lines 80 to 87 in ebd4539
In your code, you evaluate every sample, many of which are useless. Taking the first step as an example, there are 800x800x1 samples, but if we mask out the samples that hit nothing, in my experiment only ~200k samples need to be evaluated, and that leads to a speedup.
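The masking idea in the last point can be sketched in PyTorch. This is a sketch of the technique only, not the actual CUDA kernel; `render_step`, `hit_mask`, and the tensor shapes are hypothetical names, and the mask would come from an occupancy-grid test:

```python
import torch

def render_step(model, xyzs, dirs, hit_mask):
    """Evaluate the network only on samples that hit occupied space.

    xyzs, dirs: (N, 3) sample positions/directions for this march step.
    hit_mask: (N,) boolean, True where the sample lies in occupied space.
    """
    N = xyzs.shape[0]
    # Empty-space samples keep sigma = 0 (fully transparent), so
    # compositing is unaffected by skipping them.
    sigmas = torch.zeros(N, device=xyzs.device)
    rgbs = torch.zeros(N, 3, device=xyzs.device)
    if hit_mask.any():
        # The network sees only the useful samples
        # (e.g. ~200k instead of 800x800 on the first step).
        s, c = model(xyzs[hit_mask], dirs[hit_mask])
        sigmas[hit_mask] = s
        rgbs[hit_mask] = c
    return sigmas, rgbs
```

The saving comes entirely from shrinking the batch the MLP sees; the scatter back into the full-size buffers is cheap by comparison.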
I can only think of these small differences, but I'm unsure whether they really account for the gap.
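Point 2 above (keeping alive indices rather than extra buffers) is essentially stream compaction. A hedged PyTorch analogue, with hypothetical names, where boolean indexing stands in for the CUDA counter that atomicAdd would increment:

```python
import torch

def compact_alive(rays_alive, transmittance, T_threshold=1e-2):
    """Drop terminated rays from the alive-index list (sketch).

    rays_alive: (M,) long tensor of ray indices still being marched.
    transmittance: (N_rays,) accumulated transmittance per ray.
    """
    # A ray stays alive while its transmittance is above the threshold;
    # in the CUDA version each surviving ray writes its index at a slot
    # obtained via atomicAdd on a shared counter.
    keep = transmittance[rays_alive] > T_threshold
    return rays_alive[keep]
```

Each march iteration would then launch kernels only over the compacted index list, so terminated rays cost nothing.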
from ngp_pl.
Yes, I'm doing so! It seems the T_threshold also matters:

| T_threshold | FPS |
| --- | --- |
| 1e-2 | 33.59 |
| 1e-4 | 28.41 |
| 1e-2, w/o masking (point 3) | 33.30 |
| 1e-4, w/o masking (point 3) | 28.10 |

(The results are from a TITAN RTX.)
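For context, `T_threshold` controls early ray termination: a ray stops marching once its accumulated transmittance falls below the threshold, so a stricter value like 1e-4 keeps rays alive longer and lowers FPS. A minimal single-ray compositing sketch (hypothetical shapes; the real kernels do this in CUDA over packed rays):

```python
import torch

def composite(sigmas, deltas, rgbs, T_threshold=1e-2):
    """Front-to-back volume rendering of one ray with early termination.

    sigmas, deltas: (S,) densities and step sizes along the ray.
    rgbs: (S, 3) per-sample colors.
    """
    color = torch.zeros(3)
    T = 1.0  # accumulated transmittance
    for sigma, delta, rgb in zip(sigmas, deltas, rgbs):
        alpha = 1.0 - torch.exp(-sigma * delta)
        color += T * alpha * rgb
        T *= 1.0 - alpha
        if T < T_threshold:  # ray is effectively opaque: stop marching
            break
    return color
```

With a dense first sample (large sigma), the loop exits after one step; with a stricter threshold, more samples per ray must be evaluated before the exit condition triggers.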
By "w/o masking", I mean I modified the code to:

```python
sigmas, rgbs = model(xyzs, dirs)
sigmas = sigmas.contiguous().float()
rgbs = rgbs.contiguous().float()
```
Some quick experiments on torch-ngp also show that points 2 and 3 may not be very effective. I'm not very sure about the difference between PackedTensor and data_ptr for point 1, but the indexing overhead should be small.
Also, this notebook only measures rendering time, but the data loading strategy also matters: ngp_pl preprocesses images into rays, while torch-ngp generates rays on the fly (a time-memory trade-off).
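The trade-off can be illustrated with a toy ray-direction helper (hypothetical name, simple pinhole camera): precomputing pays the memory for an H x W x 3 buffer per image once, while on-the-fly generation recomputes it every batch.

```python
import numpy as np

def get_ray_dirs(H, W, focal):
    """Camera-space ray directions for an H x W pinhole image (sketch)."""
    j, i = np.mgrid[:H, :W].astype(np.float32)  # pixel row/column grids
    return np.stack([(i - W / 2) / focal,
                     (j - H / 2) / focal,
                     np.ones_like(i)], axis=-1)  # (H, W, 3)

# ngp_pl style: call once per image, store, reuse every epoch (more memory).
# torch-ngp style: call inside the training loop per batch (more compute).
```

For large datasets the stored-ray buffers can dominate memory, which is why on-the-fly generation can be the right call despite the per-batch cost.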
Finally, thanks for the information and help! It seems it is still difficult to match native instant-ngp's speed while keeping enough flexibility.
P.S. There is a small naming inconsistency in the released ckpts: it should be epoch=19_slim.ckpt to match the notebook.
I just noticed a big bottleneck in data loading; I fixed it, and now it trains 1.5x faster. I will re-benchmark everything.
Thanks for the detailed explanation! I'll experiment on them later.
A notebook test.ipynb and a pretrained model are uploaded; you can use them to evaluate and measure the FPS.
Have you considered using volrend to improve inference speed?
https://github.com/sxyu/volrend
That is, would the output need to be converted into the PlenOctree format?
https://github.com/sxyu/plenoctree