
Comments (7)

FangjinhuaWang commented on June 1, 2024

During sparse reconstruction, we render depth and rgb at a 'virtual' viewpoint located near a given view. During full view reconstruction, since we evaluate the depth map metrics and view synthesis (in the supplementary), we render depth and rgb at the reference view and do not use the gt rgb of the reference view as input (i.e., source views only). These are all discussed in the paper.


FangjinhuaWang commented on June 1, 2024

The training is similar to IBRNet and SparseNeuS: only neighboring views (i.e. 'source views') are used to render color and depth at a query viewpoint (i.e. the 'reference view').
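
A minimal sketch of the training setup described above, with hypothetical names (`model`, the batch keys, `render`) standing in for VolRecon's actual API:

```python
import torch
import torch.nn.functional as F

def training_step(model, batch):
    # Hypothetical sketch, not VolRecon's actual code: only neighboring
    # 'source views' are fed to the network, which renders color and
    # depth at the query 'reference view'.
    src_imgs = batch["src_imgs"]      # source-view images
    src_poses = batch["src_poses"]    # source-view camera poses
    ref_pose = batch["ref_pose"]      # query viewpoint to render at
    ref_rgb_gt = batch["ref_rgb"]     # ground truth used only for the loss
    ref_depth_gt = batch["ref_depth"]

    # The reference image itself is never an input to the network.
    rgb, depth = model.render(src_imgs, src_poses, ref_pose)

    loss = F.l1_loss(rgb, ref_rgb_gt)
    valid = ref_depth_gt > 0          # supervise only where gt depth exists
    loss = loss + F.l1_loss(depth[valid], ref_depth_gt[valid])
    return loss
```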


GuangyuWang99 commented on June 1, 2024

Sorry for the misunderstanding! After carefully checking the code again, I find that during training the definitions of 'reference view' and 'source views' follow the convention of MVSNet (i.e. 'pair.txt'). When performing geometry reconstruction, the reference view and source views are jointly taken as inputs (Line 249 & 250 in 'dtu_test_sparse.py'), the same as in MVS methods. However, the reference view should be removed if we want to test novel-view synthesis performance, in a way similar to IBRNet.
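
To make the two protocols concrete, here is a hypothetical illustration of the view selection (not the actual logic at those lines in 'dtu_test_sparse.py'):

```python
def select_input_views(ref_id, src_ids, task):
    # Illustrative only. 'geometry' follows the MVS convention, where the
    # reference view and its source views are jointly taken as inputs;
    # 'nvs' holds out the reference view, as in IBRNet.
    if task == "geometry":
        return [ref_id] + src_ids
    if task == "nvs":
        return src_ids          # reference view must not leak into inputs
    raise ValueError(f"unknown task: {task}")
```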


SeaBird-Go commented on June 1, 2024

@FangjinhuaWang, sorry to ask questions in this closed issue again. I am also confused about the views. I want to know the exact meanings of sparse reconstruction and full view reconstruction.

From the paper, I understand that sparse reconstruction means we only use very few views (3 in the paper) to reconstruct the mesh. But the mesh may not be a complete 360-degree mesh, right?

And from Sec. 4.2, I guess full view reconstruction means you use all of the 49 depth maps and then fuse them into a point cloud to compute the metrics. But in the novel view synthesis experiments (in the supp.), you mentioned that you only use 4 input views for rendering, with the same dataset settings as full view reconstruction. So what is the meaning of the 4 input views?


FangjinhuaWang commented on June 1, 2024

Let's say the 49 viewpoints are I0, I1, ..., I48. In full view reconstruction, we render rgb and depth at each viewpoint. When rendering each viewpoint, we choose 4 source views as input. For example, when rendering I0, we may use the four known images I1, I2, I3 and I4. When rendering I10, we may use another set of known images, e.g. I8, I9, I11, I12. In our experiments, we use the four views with the highest view selection scores.
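
A sketch of this selection step, assuming the standard MVSNet 'pair.txt' layout (a header line with the number of viewpoints, then for each viewpoint its id on one line followed by a line of "num_candidates id0 score0 id1 score1 ..."); the function name and num_src parameter are illustrative:

```python
def read_pairs(pair_path, num_src=4):
    # Pick the num_src source views with the highest view selection
    # scores for each reference view, assuming the MVSNet pair.txt format.
    pairs = {}
    with open(pair_path) as f:
        num_views = int(f.readline())
        for _ in range(num_views):
            ref_id = int(f.readline())
            tokens = f.readline().split()
            # tokens[0] is the candidate count; the rest alternate
            # between source-view id and its view selection score.
            candidates = [(int(tokens[i]), float(tokens[i + 1]))
                          for i in range(1, len(tokens), 2)]
            candidates.sort(key=lambda c: c[1], reverse=True)
            pairs[ref_id] = [vid for vid, _ in candidates[:num_src]]
    return pairs
```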


SeaBird-Go commented on June 1, 2024

Thanks for your quick reply! Now I understand full view reconstruction very well.

BTW, during sparse view reconstruction you only use 3 input views and fit the SRDF to infer the rendered rgb and depth maps.
So I want to ask why you need to define a virtual rendering viewpoint by shifting the original camera coordinate frame by d = 25mm along its x-axis. Is it just to validate whether the learned model can adapt to different viewpoints?


FangjinhuaWang commented on June 1, 2024

If we render at a given viewpoint I_0, then the projected 2D features in this viewpoint will always be the same for all samples along a ray. Since the pipeline is similar to novel view synthesis, we need to render at 'novel viewpoints'. The offset d = 25mm is chosen somewhat arbitrarily and is reasonable for forming a stereo rig; if d is too large, there will be heavy occlusion. You can adjust this value, render more virtual views, and then fuse all of the depth maps.
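
A minimal sketch of building such a virtual viewpoint, assuming a 4x4 camera-to-world pose matrix and DTU's millimeter world units (both assumptions); the function name is hypothetical:

```python
import numpy as np

def shift_camera_along_x(c2w, d=25.0):
    # Translate the camera center by d (here 25, assuming millimeter
    # world units as in DTU) along the camera's own x-axis, forming a
    # small stereo rig with the original view. Intrinsics are unchanged.
    c2w_virtual = c2w.copy()
    x_axis_world = c2w[:3, 0]                # camera x-axis in world coords
    c2w_virtual[:3, 3] = c2w[:3, 3] + d * x_axis_world
    return c2w_virtual
```

Calling it with several offsets (e.g. +d and -d) yields additional virtual views whose rendered depth maps can all be fused, as suggested above.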

