fabiotosi92 / nerf-supervised-deep-stereo
A novel paradigm for collecting and generating stereo training data using neural rendering
Home Page: https://nerfstereo.github.io/
License: MIT License
Thank you very much for your work!
I'd like to ask a question about the evaluation on the 3nerf dataset. When I evaluate on 100 random images with baseline 0.50, the results seem relatively poor:
EPE: 2.5572
bad 1.0: 41.63%
bad 2.0: 19.63%
bad 3.0: 12.59%
The results on 100 random images with baseline 0.10 are much better:
EPE: 0.3576
bad 1.0: 3.93%
bad 2.0: 1.70%
bad 3.0: 1.06%
Should I do some disparity preprocessing steps before evaluation to obtain good results? Should some additional preprocessing steps be considered while training?
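When comparing numbers across baselines, it may help to first confirm that the metrics are computed the same way in both runs. The following is a minimal sketch of EPE and bad-N over valid pixels. The function name `stereo_metrics` and the rule that non-positive ground truth marks invalid pixels are assumptions made for illustration; they are not taken from the repository's test.py.

```python
import numpy as np

def stereo_metrics(pred, gt, valid=None, thresholds=(1.0, 2.0, 3.0)):
    """Return EPE and bad-N percentages computed over valid pixels only."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: non-positive ground truth marks an invalid pixel.
        valid = gt > 0
    err = np.abs(pred - gt)[valid]
    metrics = {"EPE": float(err.mean())}
    for t in thresholds:
        # Percentage of valid pixels whose absolute error exceeds t pixels.
        metrics[f"bad {t}"] = float((err > t).mean() * 100.0)
    return metrics

# Tiny synthetic example: one pixel has no ground truth and is ignored.
gt = np.array([[10.0, 20.0], [0.0, 40.0]])
pred = np.array([[10.5, 22.5], [5.0, 40.0]])
result = stereo_metrics(pred, gt)
```

Because disparity is proportional to the baseline for a fixed scene, the 0.50 split contains larger disparities than the 0.10 split, so larger absolute pixel errors there may not by themselves indicate a preprocessing mistake.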
Thank you for sharing such a great project.
I have a question about the uploaded video: what conditions must the video meet? If I need to generate video data of portraits, how should I shoot the video to get good results?
What kind of video will cause failure?
Thank you very much for the great work!
I just want to ask about the model's predictions. When running raft-stereo, the predicted values are negative, whereas the dataset's disparities are positive. Does that mean that we additionally need to preprocess the input disparities as in raft-stereo (flow = np.stack([-disp, np.zeros_like(disp)], axis=-1)....)?
Could you share the code for generating a disparity map from a depth map?
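The snippet quoted in the question stores disparity as the negated x-component of a flow field, which would explain negative predictions. The sketch below shows the two conversions under that assumption. The helper names `disparity_to_flow` and `flow_to_disparity` are hypothetical, and whether a given checkpoint follows this sign convention should be checked against its training code.

```python
import numpy as np

def disparity_to_flow(disp):
    """Positive disparity -> flow with negative x and zero y (convention quoted above)."""
    return np.stack([-disp, np.zeros_like(disp)], axis=-1)

def flow_to_disparity(flow_x):
    """Negate the predicted x-flow to recover a positive disparity map."""
    return -flow_x

disp = np.array([[1.0, 2.0], [3.0, 4.0]])
flow = disparity_to_flow(disp)
recovered = flow_to_disparity(flow[..., 0])
```

Under this convention, comparing the raw network output against positive ground-truth disparities without flipping the sign would inflate every error metric.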
Hey team, great work!
Will you share the pipeline for creating stereo images from a single image? Let me know if it is already uploaded and I am missing something.
Hi, can you provide me with a code snippet for capturing the depth? I'm having trouble with instant-ngp. Help, @fabiotosi92!
Thank you for NeRF-Supervised-Deep-Stereo! I like it very much.
I have a question.
I have left.jpg, right.jpg, disp0.pfm, and calib.txt.
Using colmap, I generated: poses\colmap_sparse colmap_text colmap.db
Running python test.py ... generates .../outdir/0.jpg and 0.npy.
Running python demo.py ... generates .../outdir/disparity_map.png.
How can I now generate the center view for my stereo pair scene, given a baseline?
Which code and which Python packages are needed?
Thank you for your contribution.
I have a question to ask you.
Is this code wrong? Why are unrelated args passed to the RAFTStereo and PSMNet models?
def load_pretrained_model(args):
    print('Load pretrained model')
    model = None
    if args.model == 'raft-stereo':
        model = RAFTStereo(args)
    elif args.model == 'psmnet':
        model = PSMNet(args.maxdisp)
    else:
        print('Invalid model selected.')
        exit()
The download is interrupted when it is about halfway through. I've tried several times with the same result, so I suspect there is a problem with the data source.
Hello, I have a question about using NGP to render depth. It seems that NeRF renders depth from the distance of the sampling points along the ray, that is, the distance from (x, y, z) to the origin (0, 0, 0), rather than the z-depth. So is it correct to convert the depth rendered by NeRF to disparity here? Is there a problem?
Hi, an error occurred when I used the generated transforms_left.json to render the corresponding left-view images:
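If the rendered value is indeed the distance along each ray, it can be converted to z-depth with the pinhole model before applying the disparity formula. The sketch below assumes a pinhole camera with intrinsics fx, fy, cx, cy in pixels and integer pixel coordinates (no half-pixel offset). The function name is hypothetical, and whether Instant-NGP's depth output is a ray distance or already a z-depth is not confirmed in this thread.

```python
import numpy as np

def ray_distance_to_z_depth(dist, fx, fy, cx, cy):
    """Convert per-pixel distance along the ray into depth along the optical axis."""
    h, w = dist.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Ray direction through pixel (u, v) is (x, y, 1) in camera coordinates.
    x = (u - cx) / fx
    y = (v - cy) / fy
    # A point at z-depth z lies at distance z * ||(x, y, 1)|| from the camera.
    return dist / np.sqrt(x ** 2 + y ** 2 + 1.0)

# Check on a fronto-parallel plane at z = 2: every pixel should map back to 2.
h, w, fx, fy, cx, cy = 4, 6, 5.0, 5.0, 2.5, 1.5
u, v = np.meshgrid(np.arange(w), np.arange(h))
plane_dist = 2.0 * np.sqrt(((u - cx) / fx) ** 2 + ((v - cy) / fy) ** 2 + 1.0)
z = ray_distance_to_z_depth(plane_dist, fx, fy, cx, cy)
```

The difference between the two quantities grows toward the image corners, so skipping this step would bias disparities mostly at the borders.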
C:\ProgramData\anaconda3\python.exe D:/instant-ngp/scripts/run.py
16:23:24 SUCCESS Initialized CUDA 11.6. Active GPU is #0: NVIDIA GeForce RTX 3060 Laptop GPU [86]
16:23:24 INFO Loading NeRF dataset from
16:23:24 INFO ..\data\nerf\test_3\transforms.json
16:23:24 PROGRESS [ ] 0% ( 0/103) 0s/inf
16:23:24 PROGRESS [] 1% ( 1/103) 0s/2s
16:23:24 PROGRESS [] 2% ( 2/103) 0s/1s
16:23:24 PROGRESS [] 3% ( 3/103) 0s/0s
16:23:24 PROGRESS [] 4% ( 4/103) 0s/0s
16:23:24 PROGRESS [] 5% ( 5/103) 0s/0s
16:23:24 PROGRESS [] 6% ( 6/103) 0s/0s
16:23:24 PROGRESS [] 7% ( 7/103) 0s/0s
16:23:24 PROGRESS [] 8% ( 8/103) 0s/0s
16:23:24 PROGRESS [] 9% ( 9/103) 0s/0s
16:23:24 PROGRESS [] 10% ( 10/103) 0s/0s
16:23:24 PROGRESS [] 11% ( 11/103) 0s/0s
16:23:24 PROGRESS [] 12% ( 12/103) 0s/0s
16:23:24 PROGRESS [] 13% ( 13/103) 0s/0s
16:23:24 PROGRESS [] 14% ( 14/103) 0s/0s
16:23:24 PROGRESS [] 15% ( 15/103) 0s/0s
16:23:24 PROGRESS [] 16% ( 16/103) 0s/0s
16:23:24 PROGRESS [] 17% ( 18/103) 0s/0s
16:23:24 PROGRESS [] 17% ( 17/103) 0s/0s
16:23:24 PROGRESS [] 18% ( 19/103) 0s/0s
16:23:24 PROGRESS [] 19% ( 20/103) 0s/0s
16:23:24 PROGRESS [] 20% ( 21/103) 0s/0s
16:23:24 PROGRESS [] 21% ( 22/103) 0s/0s
16:23:24 PROGRESS [] 22% ( 23/103) 0s/0s
16:23:24 PROGRESS [] 23% ( 24/103) 0s/0s
16:23:24 PROGRESS [] 24% ( 25/103) 0s/0s
16:23:24 PROGRESS [] 25% ( 26/103) 0s/0s
16:23:24 PROGRESS [] 26% ( 27/103) 0s/0s
16:23:24 PROGRESS [] 27% ( 28/103) 0s/0s
16:23:24 PROGRESS [] 28% ( 29/103) 0s/0s
16:23:24 PROGRESS [] 29% ( 30/103) 0s/0s
16:23:24 PROGRESS [] 30% ( 31/103) 0s/0s
16:23:24 PROGRESS [] 31% ( 32/103) 0s/0s
16:23:24 PROGRESS [] 32% ( 33/103) 0s/0s
16:23:24 PROGRESS [] 33% ( 34/103) 0s/0s
16:23:24 PROGRESS [] 34% ( 35/103) 0s/0s
16:23:24 PROGRESS [] 35% ( 36/103) 0s/0s
16:23:24 PROGRESS [] 36% ( 37/103) 0s/0s
16:23:24 PROGRESS [] 37% ( 38/103) 0s/0s
16:23:24 PROGRESS [] 38% ( 39/103) 0s/0s
16:23:24 PROGRESS [] 39% ( 40/103) 0s/0s
16:23:24 PROGRESS [] 40% ( 41/103) 0s/0s
16:23:24 PROGRESS [] 41% ( 42/103) 0s/0s
16:23:24 PROGRESS [] 42% ( 43/103) 0s/0s
16:23:24 PROGRESS [] 43% ( 44/103) 0s/0s
16:23:24 PROGRESS [] 44% ( 45/103) 0s/0s
16:23:24 PROGRESS [] 45% ( 46/103) 0s/0s
16:23:24 PROGRESS [] 46% ( 47/103) 0s/0s
16:23:24 PROGRESS [] 47% ( 48/103) 0s/0s
16:23:24 PROGRESS [] 48% ( 49/103) 0s/0s
16:23:24 PROGRESS [] 49% ( 50/103) 0s/0s
16:23:24 PROGRESS [] 50% ( 51/103) 0s/0s
16:23:24 PROGRESS [] 50% ( 52/103) 0s/0s
16:23:24 PROGRESS [] 51% ( 53/103) 0s/0s
16:23:24 PROGRESS [] 52% ( 54/103) 0s/0s
16:23:24 PROGRESS [] 53% ( 55/103) 0s/0s
16:23:24 PROGRESS [] 54% ( 56/103) 0s/0s
16:23:24 PROGRESS [] 55% ( 57/103) 0s/0s
16:23:24 PROGRESS [] 56% ( 58/103) 0s/0s
16:23:24 PROGRESS [] 57% ( 59/103) 0s/0s
16:23:24 PROGRESS [] 58% ( 60/103) 0s/0s
16:23:24 PROGRESS [] 59% ( 61/103) 0s/0s
16:23:24 PROGRESS [] 60% ( 62/103) 0s/0s
16:23:24 PROGRESS [] 61% ( 63/103) 0s/0s
16:23:24 PROGRESS [] 62% ( 64/103) 0s/0s
16:23:24 PROGRESS [] 63% ( 65/103) 0s/0s
16:23:24 PROGRESS [] 64% ( 66/103) 0s/0s
16:23:24 PROGRESS [] 65% ( 67/103) 0s/0s
16:23:24 PROGRESS [] 66% ( 68/103) 0s/0s
16:23:24 PROGRESS [] 67% ( 69/103) 0s/0s
16:23:24 PROGRESS [] 68% ( 70/103) 0s/0s
16:23:24 PROGRESS [] 69% ( 71/103) 0s/0s
16:23:24 PROGRESS [] 70% ( 72/103) 0s/0s
16:23:24 PROGRESS [] 71% ( 73/103) 0s/0s
16:23:24 PROGRESS [] 72% ( 74/103) 0s/0s
16:23:24 PROGRESS [] 73% ( 75/103) 0s/0s
16:23:24 PROGRESS [] 74% ( 76/103) 0s/0s
16:23:24 PROGRESS [] 75% ( 77/103) 0s/0s
16:23:24 PROGRESS [] 76% ( 78/103) 0s/0s
16:23:24 PROGRESS [] 77% ( 79/103) 0s/0s
16:23:24 PROGRESS [] 78% ( 80/103) 0s/0s
16:23:24 PROGRESS [] 79% ( 81/103) 0s/0s
16:23:24 PROGRESS [] 80% ( 82/103) 0s/0s
16:23:24 PROGRESS [] 81% ( 83/103) 0s/0s
16:23:24 PROGRESS [] 82% ( 84/103) 0s/0s
16:23:24 PROGRESS [] 83% ( 85/103) 0s/0s
16:23:24 PROGRESS [] 83% ( 86/103) 0s/0s
16:23:24 PROGRESS [] 84% ( 87/103) 0s/0s
16:23:24 PROGRESS [] 85% ( 88/103) 0s/0s
16:23:24 PROGRESS [] 86% ( 89/103) 0s/0s
16:23:24 PROGRESS [] 87% ( 90/103) 0s/0s
16:23:24 PROGRESS [] 88% ( 91/103) 0s/0s
16:23:24 PROGRESS [] 89% ( 92/103) 0s/0s
16:23:24 PROGRESS [] 90% ( 93/103) 0s/0s
16:23:24 PROGRESS [] 91% ( 94/103) 0s/0s
16:23:24 PROGRESS [] 92% ( 95/103) 0s/0s
16:23:24 PROGRESS [] 93% ( 96/103) 0s/0s
16:23:24 PROGRESS [] 94% ( 97/103) 0s/0s
16:23:24 PROGRESS [] 95% ( 98/103) 0s/0s
16:23:24 PROGRESS [] 96% ( 99/103) 0s/0s
16:23:24 PROGRESS [] 97% (100/103) 0s/0s
16:23:24 PROGRESS [] 98% (101/103) 0s/0s
16:23:24 PROGRESS [] 99% (102/103) 0s/0s
16:23:24 PROGRESS [] 100% (103/103) 0s/0s
16:23:24 SUCCESS Loaded 103 images after 0s
16:23:24 INFO cam_aabb=[min=[0.676668,0.907796,0.517987], max=[1.76123,1.74558,0.722369]]
16:23:24 INFO Loading network snapshot from: ..\data\nerf\test_3\test_3.ingp
16:23:25 INFO GridEncoding: Nmin=16 b=2.97199 F=4 T=2^19 L=8
16:23:25 INFO Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
16:23:25 INFO Color model: 3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
16:23:25 INFO total_encoding_params=13041664 total_network_params=10240
Screenshot transforms from ../data/nerf/test_3/output/left_transforms/test_3_transform_left.json
range(0, 103)
Traceback (most recent call last):
  File "D:\instant-ngp\scripts\run.py", line 396, in <module>
    cam_matrix = f.get("transform_matrix", f["transform_matrix_start"])
                                           ~^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'transform_matrix_start'
How can this problem be solved? Could you please give some advice?
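The traceback points at `f.get("transform_matrix", f["transform_matrix_start"])`. In Python the default argument of `dict.get` is evaluated before the call, so this line raises KeyError whenever "transform_matrix_start" is absent, even if "transform_matrix" is present. The sketch below reproduces that behavior and shows one possible local workaround; `get_cam_matrix` is a hypothetical helper, not the maintainers' fix.

```python
# A frame entry that only carries "transform_matrix", as in a typical transforms file.
frame = {
    "transform_matrix": [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

# The default value is computed eagerly, so the missing key raises immediately.
try:
    frame.get("transform_matrix", frame["transform_matrix_start"])
    raised = False
except KeyError:
    raised = True

def get_cam_matrix(f):
    """Look up the fallback key only when the primary key is missing."""
    if "transform_matrix" in f:
        return f["transform_matrix"]
    return f["transform_matrix_start"]

cam_matrix = get_cam_matrix(frame)
```

If the generated JSON frames do contain "transform_matrix", a lazy lookup like this avoids the error; if they contain neither key, the file itself needs to be regenerated.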
Thank you for your great work
How is colmap used in the data generation process? Is the original colmap code used? Is every frame of the video used?
Hello, I used the following steps to render some datasets, but the epipolar lines of the rendered images are not horizontal. Can you help me find the problem?
Are there any issues with the above steps, or are any necessary steps missing, that would cause the epipolar lines not to be horizontal?
Thanks for your great work. I would like to know how to calculate AO; can you provide the formula, please? The formula provided in your paper seems to be wrong: the confidence map obtained with it is almost all 1.
Thank you for this great work. However, I encountered a problem while building my own dataset. I can use your code to generate a new transforms.json file and train Instant NGP successfully. But I'm wondering how to export RGB and AO/depth images from the trained Instant NGP. Do I need to use transforms.json and transforms_left.json separately to generate different training results for nerf? By the way, I am using the GUI on Windows.
Thanks for the excellent work and contribution!
I have a little question about preparing my own dataset for stereo training.
As you mentioned in the Supplementary Material, "As a pre-processing step, we adjust the rendered disparity maps generated by Instant-NGP by fitting a scale-shift pair of values for each triplet", could you please provide the code/script for the disparity compensation optimization operation? Looking forward to your reply! Thanks again!
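Until the authors share their script, a scale-shift pair can be fitted with ordinary least squares. The sketch below is only one plausible reading of the quoted sentence: the function name `fit_scale_shift`, the choice of reference disparities, and the use of a plain least-squares objective are all assumptions, and the paper's actual procedure may differ.

```python
import numpy as np

def fit_scale_shift(src, ref, mask=None):
    """Fit ref ~= scale * src + shift in the least-squares sense over masked pixels."""
    src = np.asarray(src, dtype=np.float64).ravel()
    ref = np.asarray(ref, dtype=np.float64).ravel()
    if mask is None:
        mask = np.isfinite(src) & np.isfinite(ref)
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([src[mask], np.ones(int(mask.sum()))], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, ref[mask], rcond=None)
    return float(scale), float(shift)

# Synthetic check: the rendered map differs from the reference by a known affine map.
rendered = np.linspace(1.0, 50.0, 200).reshape(10, 20)
reference = 1.5 * rendered + 0.25
scale, shift = fit_scale_shift(rendered, reference)
adjusted = scale * rendered + shift
```

In practice the mask would restrict the fit to pixels where the reference is trusted, since outliers pull a plain least-squares fit noticeably.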
I am trying to run demo.py. I have followed the instructions, but I get lots of errors, like this:
  File "/home/ai/Documents/nerfstereo/demo.py", line 15
    from models.raft-stereo import RAFTStereo
                    ^
SyntaxError: invalid syntax
So I renamed the module to raft_stereo and updated the import. Then I got a new error:
  File "/home/ai/Documents/nerfstereo/demo.py", line 114, in <module>
    main()
  File "/home/ai/Documents/nerfstereo/demo.py", line 85, in main
    model = load_pretrained_model(args)
  File "/home/ai/Documents/nerfstereo/demo.py", line 21, in load_pretrained_model
    model = RAFTStereo(args)
  File "/home/ai/Documents/nerfstereo/models/raft_stereo.py", line 27, in __init__
    context_dims = args.hidden_dims
AttributeError: 'Namespace' object has no attribute 'hidden_dims'
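The SyntaxError arises because a hyphen is not allowed in a module name inside an `import` statement; Python parses it as a minus sign. Renaming the file, as done above, is the usual fix. If renaming is not an option, `importlib.import_module` accepts the name as a plain string. The sketch below builds a throwaway package to demonstrate this; the package name `demo_models` and its contents are hypothetical stand-ins, not the repository's files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary package containing a module whose file name has a hyphen.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_models"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "raft-stereo.py").write_text("class RAFTStereo:\n    pass\n")
sys.path.insert(0, str(tmp))

# "from demo_models.raft-stereo import RAFTStereo" would be a SyntaxError,
# but the string-based import resolves the file name as given.
module = importlib.import_module("demo_models.raft-stereo")
RAFTStereo = getattr(module, "RAFTStereo")
```

The second error is a separate problem: the argument namespace passed to the model lacks a `hidden_dims` attribute that the model's constructor reads.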
Hi, Could you please provide me with the correct formula for calculating AO?
Originally posted by @Liyunfengabc in #24 (comment)
Hi, I get a 4-channel image (H×W×4) when I set the depth render mode:
testbed.render_mode = ngp.Depth
How can I convert this image (shift = 0.2) into the corresponding disparity map using the formula disparity = baseline * focal / depth, given that the shift (baseline) may not be in the same scale as the depth obtained above?
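The formula in the question can be applied directly once the depth and the baseline are expressed in the same scene units and the focal length is in pixels. The sketch below assumes a single-channel depth array has already been extracted from the 4-channel render; the function name, the `eps` threshold, and the choice to mark invalid depth with zero disparity are assumptions for illustration.

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline, eps=1e-6):
    """disparity = baseline * focal / depth, with zero where depth is invalid."""
    depth = np.asarray(depth, dtype=np.float64)
    disp = np.zeros_like(depth)
    # Assumption: depth values at or below eps are treated as invalid.
    valid = depth > eps
    disp[valid] = baseline * focal_px / depth[valid]
    return disp

# Example with a baseline of 0.2 scene units and a focal length of 1000 pixels.
depth = np.array([[2.0, 4.0], [0.0, 8.0]])
disp = depth_to_disparity(depth, focal_px=1000.0, baseline=0.2)
```

If the renderer rescales depth relative to the coordinates used in the transforms file, that factor would have to be undone first; this thread does not state whether such a rescaling applies.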
I have generated some datasets using a similar method, but I am unsure how to determine if the generated datasets have any issues. To address this, I have used the trilateral photometric loss mentioned in the paper to evaluate the generated datasets. Is this approach reasonable? Or, when you generate datasets, do you have specific metrics to evaluate the generated datasets, or do you only consider their performance as datasets? I am concerned that after creating the complete dataset, I may find that the results are not satisfactory, indicating a problem in a previous step. How do you avoid this issue in the process of dataset creation?
Hello, I downloaded the training set and got a total of 79,584 triplets. However, according to Section 4.1 of your paper, the number should be 65,148. What is the reason for the difference? Thank you.
Here is the training file I used: train.txt
Thanks for your wonderful work! However, I cannot download the dataset from the link https://amsacta.unibo.it/id/eprint/7218/
Thanks for your great work.
Can you provide the focal lengths in a separate file?
Downloading the stereo dataset takes a long time, and downloading the RAW dataset will take just as long. I only need the focal length to generate the corresponding depth.
This would also be helpful for others.
Thanks for your help!
Hi Authors!
Really great work!
I have a query regarding the creation of stereo pairs and disparity maps from a custom dataset. I am using nerfstudio to create my NeRFs. How can I now begin to extract the stereo pairs?
How are the camera parameters calculated?
Thank you for your outstanding work. I would like to know if the mentioned loss functions in the paper are used in stereo matching networks like PSMNet. If I need to train on my own, do I simply need to replace the loss function in the backbone with the loss functions from this repository?
I noticed in the supplementary material of the paper that the disparity obtained from nerf has some discrepancy compared to raft stereo and sgm. They corrected this error by training a scale and bias.
I trained the dataset you provided using the triplet loss mentioned in the paper. I tried different learning rates and epochs, but most of the results didn't show much improvement (I compared them with cre stereo).
I would like to know if the datasets provided by you are already rectified?
Thank you very much.
Thank you very much for sharing your excellent work.
We are working on implementing code for training stereo networks. According to your paper, the augmentation procedure described in RAFT-Stereo is used for training. We notice there is an augmentation function named eraser_transform in RAFT-Stereo, which erases random regions in the right image.
def eraser_transform(self, img1, img2):
    ht, wd = img1.shape[:2]
    if np.random.rand() < self.eraser_aug_prob:
        mean_color = np.mean(img2.reshape(-1, 3), axis=0)
        for _ in range(np.random.randint(1, 3)):
            x0 = np.random.randint(0, wd)
            y0 = np.random.randint(0, ht)
            dx = np.random.randint(50, 100)
            dy = np.random.randint(50, 100)
            img2[y0:y0+dy, x0:x0+dx, :] = mean_color
    return img1, img2
We are not sure whether this function conflicts with the Triplet Photometric Loss in your paper, which backward-warps the right/left image. So our question is:
It will also be very helpful if you could share the full augmentation code, thank you.
Hi there,
Firstly, I want to express my appreciation for the excellent work you've been doing.
I've been following the discussions around the reconstruction scales in colmap. I've noticed that the reconstruction in Instant-NGP, along with the rendered depth, may involve an arbitrary scale, leading to potential variations in depth scales across different scenes. This becomes particularly pronounced when considering scenes of similar physical size but reconstructed with different scales.
My main query revolves around the selection of three virtual baselines (b = 0.5, 0.3, 0.1 units) for data generation across all scenes, as mentioned in your paper. Considering that scenes, such as A and B, may have distinct reconstruction scales in colmap, resulting in different depth ranges, I'm curious about the reasoning behind using the same baselines for all scenes. Given the potential disparity in depth range caused by different reconstruction scales, how does the uniform application of baselines account for this variability?
I appreciate your time and insights into this matter.
Thank you in advance!
I would like to test your wonderful paper with some scenes of my own, and I wonder how long training took and which GPU you trained on. You trained on 0.5 Mpix images; did you also try higher resolutions?
Hello, I've uploaded my code for training RAFT-Stereo with NeRF supervision: https://github.com/husheng12345/Unofficial-NeRF-Supervised-Deep-Stereo.
Despite my best efforts to replicate the experimental setup as delineated in the paper, there exists a discrepancy between the model obtained from my training scripts and the provided pretrained weights.
Model | KITTI-15 (>3px All) | Midd-T Full (>2px All)
---|---|---
Official pretrained weights | 5.41 | 16.38
Trained with my scripts | 6.06 | 22.36
Would you be able to offer some guidance on which of my training hyperparameters might not be appropriately set? Thank you.
Thank you for your contribution.
I have a question to ask you.
Can you give me the folder structure expected by the Setup Instructions? This step is confusing and causes problems when running the code.
Thank you again for your contribution and I look forward to your reply.
Hello, I've found a potential bug regarding the storage method of AO maps.
The range of AO is [0, 1]; it is multiplied by 65536 and then saved as a uint16 PNG image. However, when AO equals exactly 1, 65536 exceeds the maximum value of uint16 (65535). This results in AO being incorrectly stored as 0.
Here's a visualization example; the white areas in the second image represent where AO=0.
(0005/Q/AO/IMG_20220818_180012.png)
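One way to avoid this kind of wraparound when writing such maps is to scale by the largest representable value rather than by 65536. The sketch below illustrates the idea with hypothetical helpers `encode_ao` and `decode_ao`; it is not the dataset's actual export code, and files that were already written with the wrapped value cannot be repaired this way.

```python
import numpy as np

UINT16_MAX = np.iinfo(np.uint16).max  # 65535, the largest value a uint16 can hold

def encode_ao(ao):
    """Map AO in [0, 1] to uint16 so that AO = 1 becomes 65535 instead of overflowing."""
    ao = np.clip(np.asarray(ao, dtype=np.float64), 0.0, 1.0)
    return np.round(ao * UINT16_MAX).astype(np.uint16)

def decode_ao(stored):
    """Inverse mapping from stored uint16 values back to [0, 1]."""
    return np.asarray(stored, dtype=np.float64) / UINT16_MAX

ao = np.array([0.0, 0.5, 1.0])
encoded = encode_ao(ao)
decoded = decode_ao(encoded)
```

With this scaling the quantization step is 1/65535, so the round trip reproduces AO to within about 1e-5.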
Thank you for sharing your nice work!
Inspired by your work, I finetuned the model to apply this to another domain. Unfortunately, fine-tuning failed. In order to check whether it is a domain problem, we fine-tuned the model on the provided NeRF-stereo triplet dataset, but failed as well. (Detection of texture rather than object boundary)
Because there is no code for the training, I used the same hyperparameter and augmentation procedures as RAFT-stereo, as written in your paper. Is there anything else to note for training? If you have any tips for training, please give me some advice.
I downloaded a portion of the dataset, specifically v1_part1, and I used the following code to read the disparity maps from it.
transform = transforms.ToTensor()
for file in depth_files:
    if file.endswith('.png'):
        depth_image = Image.open(os.path.join(depth_folder, file))
        depth_image = transform(depth_image).squeeze().to(device)
        print(depth_image.shape)
        print(depth_image.max(), depth_image.min())
Some print results are:
torch.Size([522, 1160])
tensor(16301, dtype=torch.int32) tensor(1907, dtype=torch.int32)
torch.Size([522, 1160])
tensor(9634, dtype=torch.int32) tensor(2407, dtype=torch.int32)
torch.Size([522, 1160])
tensor(11203, dtype=torch.int32) tensor(2654, dtype=torch.int32)
The disparity values stored in PNG files range from over 1000 to over 10000, and I am confused about such values. Aren't the values in PNG supposed to be within the range of 0-255 or 0-1? Also, are these values representing disparity? Why are they so large?
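Values in the thousands are what a 16-bit PNG looks like when read without rescaling: such files hold integers up to 65535, and disparity datasets commonly store disparity multiplied by a fixed factor to keep sub-pixel precision. The sketch below decodes under that assumption. The function name and the factor 64.0 are placeholders for illustration only; the real factor should be taken from the dataset's documentation.

```python
import numpy as np

# Assumption: placeholder scale factor, to be replaced by the documented value.
HYPOTHETICAL_SCALE = 64.0

def decode_disparity(raw, scale):
    """Convert raw 16-bit PNG values into floating-point disparities in pixels."""
    return np.asarray(raw, dtype=np.float32) / np.float32(scale)

# Raw minimum and maximum taken from the first printed result above.
raw = np.array([1907, 16301], dtype=np.uint16)
disp = decode_disparity(raw, HYPOTHETICAL_SCALE)
```

With the placeholder factor, the raw range 1907 to 16301 would correspond to roughly 30 to 255 pixels, which is a plausible disparity range for images 1160 pixels wide.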
Hello, I tested the pretrained RAFT-Stereo model using test.py. Here are the results I got:
KITTI-15 All
EPE: 1.4704
bad 1.0: 26.43%
bad 2.0: 9.43%
bad 3.0: 5.56%
Midd-T F All
EPE: 9.1773
bad 1.0: 26.56%
bad 2.0: 18.44%
bad 3.0: 15.57%
I notice these results are slightly different from results reported in Table 6 in your paper. I wonder if there is something wrong with my code. I can upload the full test code I used if necessary. Thank you.
@fabiotosi92 Hello, this project is really great! I downloaded stereo_dataset_v1_part14 and parsed the dataset following your instructions, but found that the disparity map is not exactly right. Could you please double-check the dataset?
Thanks for your great work.
When will the code and dataset be released? I am very excited to see your great work.