
Comments (6)

kwea123 commented on June 15, 2024

Hi, indeed, the 7.8 includes saving, so it's not accurate.

After commenting out these lines, I ran again with --test on my 2080 Ti:
https://github.com/ashawkey/torch-ngp/blob/main/nerf/utils.py#L848-L864

And this is the result:
[screenshot: timing log showing 200 frames rendered in 11 s]

So on average it's 200/11 ≈ 18.18 FPS, which is close to your number.

As for why the speed differs, I noticed a few possible improvements during my implementation:

  1. (I'm not so familiar with CUDA, so this might be wrong.) I pass the PackedTensor directly into the kernel, so I don't need to do the locate like here. This at least improves readability, and possibly some speed, though I'm not sure. I also use in-place addition here instead of passing additional variables.
  2. I keep track of the alive ray indices here, instead of allocating additional buffers and using atomicAdd to track them.
  3. Probably the most effective: when marching rays at test time, we allocate a small number of samples per ray each iteration and march the rays only that far (the variable step in your code). However, not all rays need that many samples. Take step=1 (the initial step) for example: we march one sample further for each ray, but many rays don't hit anything, so we can exclude the samples that hit nothing from the network evaluation, like:
    valid_mask = ~torch.all(dirs == 0, dim=1)   # all-zero direction means the sample hit nothing
    if valid_mask.sum() == 0: break             # every ray has terminated
    sigmas = torch.zeros(len(xyzs), device=device)
    rgbs = torch.zeros(len(xyzs), 3, device=device)
    _sigmas, _rgbs = model(xyzs[valid_mask], dirs[valid_mask])  # evaluate only valid samples
    sigmas[valid_mask] = _sigmas.float()
    rgbs[valid_mask] = _rgbs.float()

    In your code, you evaluate every sample, many of which are useless. Taking the first step as an example, there are 800x800x1 samples, but if we mask out the samples that hit nothing, only ~200k samples need to be evaluated in my experiment, which leads to a speedup.

These are the only small differences I can think of, but I'm not sure whether they really account for the gap.
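The counting argument behind point 3 can be sanity-checked with a minimal numpy sketch (numpy stands in for the CUDA/PyTorch code here, and the ~25% hit rate is made up; the real ratio depends on the scene):

```python
import numpy as np

# Toy version of the point-3 mask: samples whose direction is all-zero
# hit nothing and can be skipped before the network evaluation.
rng = np.random.default_rng(0)
n_samples = 1000
dirs = np.zeros((n_samples, 3), dtype=np.float32)
hit = rng.random(n_samples) < 0.25               # pretend ~25% of samples hit occupied space
dirs[hit] = rng.standard_normal((int(hit.sum()), 3))

valid_mask = ~np.all(dirs == 0, axis=1)          # same test as the torch code above
evaluated_with_mask = int(valid_mask.sum())      # only these go through the network
evaluated_without_mask = n_samples               # baseline: evaluate every marched sample
```

Only the masked fraction is fed to the (expensive) network; the rest keep their zero-initialized sigma/rgb.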

from ngp_pl.

ashawkey commented on June 15, 2024

Yes, I'm doing so! It seems the T_threshold also matters:

  • 1e-2: 33.59 FPS
  • 1e-4: 28.41 FPS
  • 1e-2 w/o masking (point 3): 33.30 FPS
  • 1e-4 w/o masking (point 3): 28.10 FPS

(The results are from a TITAN RTX)

By "w/o masking" I mean I modified the code to:

        sigmas, rgbs = model(xyzs, dirs)  # evaluate every sample, no valid_mask
        sigmas = sigmas.contiguous().float()
        rgbs = rgbs.contiguous().float()
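For intuition on why the looser threshold is faster: a ray terminates once its accumulated transmittance falls below T_threshold, so 1e-2 stops rays earlier than 1e-4. A toy sketch (the constant per-sample alpha is made up for illustration, not taken from either repo):

```python
# A ray stops marching once its transmittance T drops below T_threshold.
def steps_until_stop(alpha, T_threshold, max_steps=1024):
    T = 1.0
    for step in range(1, max_steps + 1):
        T *= 1.0 - alpha        # standard volume-rendering transmittance update
        if T < T_threshold:
            return step         # ray terminates here
    return max_steps
```

With alpha = 0.1, a 1e-2 threshold stops a ray after 44 samples while 1e-4 needs 88, which matches the direction of the FPS gap above.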

Some quick experiments on torch-ngp also show that points 2 and 3 may not be very effective.
I'm not very sure about the difference between PackedTensor and data_ptr for point 1, but the indexing overhead should be small.

Also, this notebook only measures rendering time, but the data-loading strategy also matters:
ngp_pl preprocesses images into rays, while torch-ngp generates rays on the fly (a time-memory trade-off).
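That trade-off can be sketched in a few lines (make_ray_dirs is illustrative, not either repo's actual API; it assumes a pinhole camera with focal length 1):

```python
import numpy as np

# Precompute rays once and keep them resident (ngp_pl style) vs.
# regenerate only the sampled rays each batch (torch-ngp style).
def make_ray_dirs(H, W):
    i, j = np.meshgrid(np.arange(W), np.arange(H), indexing="xy")
    dirs = np.stack([i - W / 2, j - H / 2, np.ones_like(i)], axis=-1)
    return dirs.reshape(-1, 3).astype(np.float32)

H = W = 4
all_dirs = make_ray_dirs(H, W)            # precomputed: H*W rays stay in memory
batch = np.array([0, 5, 10])              # pixels sampled this iteration
precomputed = all_dirs[batch]             # cheap indexing at train time
on_the_fly = make_ray_dirs(H, W)[batch]   # recomputed per batch, no big buffer
```

Same rays either way; one spends memory to save per-batch compute, the other the reverse.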

Finally, thanks for the information and help! It still seems difficult to match native instant-ngp's speed while keeping enough flexibility.

P.S. There is a small naming inconsistency in the released ckpts; it should be epoch=19_slim.ckpt to match the notebook.


kwea123 commented on June 15, 2024

I just noticed a big bottleneck in data loading. I fixed it, and now training is 1.5x faster. I will re-benchmark everything.


ashawkey commented on June 15, 2024

Thanks for the detailed explanation! I'll experiment on them later.


kwea123 commented on June 15, 2024

A notebook (test.ipynb) and a pretrained model are uploaded; you can use them to evaluate and measure the FPS.


SSground commented on June 15, 2024

Have you considered using volrend to improve inference speed?
https://github.com/sxyu/volrend
That is, would the output need to be converted into the PlenOctree format?
https://github.com/sxyu/plenoctree

