fabiotosi92 / nerf-supervised-deep-stereo

A novel paradigm for collecting and generating stereo training data using neural rendering

Home Page: https://nerfstereo.github.io/

License: MIT License

Python 94.34% Shell 5.66%

cvpr2023 deep-stereo deep-stereo-network nerf stereo-matching

nerf-supervised-deep-stereo's Issues

Metrics results

Thank you very much for your work!

I have a question about the evaluation on the 3nerf dataset. When I evaluate on 100 random images with baseline 0.50, the results are relatively poor:

EPE: 2.5572
bad 1.0: 41.63%
bad 2.0: 19.63%
bad 3.0: 12.59%

On 100 random images with baseline 0.10, the results are much better:
EPE: 0.3576
bad 1.0: 3.93%
bad 2.0: 1.70%
bad 3.0: 1.06%

Should I do some disparity preprocessing steps before evaluation to obtain good results? Should some additional preprocessing steps be considered while training?
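For reference when comparing numbers like these, here is a minimal sketch of how EPE and bad-τ rates are commonly computed. The function name `stereo_metrics` and the validity rule (finite, positive ground truth) are assumptions for illustration, not the repository's evaluation code.

```python
import numpy as np

def stereo_metrics(pred, gt, valid=None, thresholds=(1.0, 2.0, 3.0)):
    """End-point error and bad-pixel rates between two disparity maps.

    pred, gt: arrays of the same shape, in pixels.
    valid:    optional boolean mask; by default (an assumption) pixels with
              finite, positive ground truth are evaluated.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = np.isfinite(gt) & (gt > 0)
    err = np.abs(pred - gt)[valid]
    metrics = {"EPE": float(err.mean())}
    for t in thresholds:
        # Percentage of valid pixels whose absolute error exceeds t pixels.
        metrics[f"bad {t}"] = float((err > t).mean() * 100.0)
    return metrics
```

Both maps must use the same sign convention and the same resolution before they are compared; a mismatch in either typically inflates every metric at once.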

What should be paid attention to in video capture?

Thank you for sharing such a great project.
I have a question about the uploaded video: what conditions must the video meet? If I need to generate video data of portraits, how should I shoot the video for best results?
What kind of video will cause failure?

I cannot download the dataset from China; the download speed is below 50 KB/s. Do you have any plans to upload the dataset to Google Drive?

model's predictions

Thank you very much for the great work!
I just want to ask about the model's predictions. While running raft-stereo the resulting values are negative while the datasets' disparities are positive. Does that mean that we additionally need to preprocess input disparities as in raft-stereo (flow = np.stack([-disp, np.zeros_like(disp)], axis=-1)....)?
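The sign relationship in the quoted line can be sketched as a pair of helpers. This assumes the network output follows the same convention as that line (horizontal flow equal to minus the disparity); whether the repository's scripts already flip the sign is not confirmed here, and both function names are hypothetical.

```python
import numpy as np

def disparity_to_flow(disp):
    """Positive disparity (H, W) -> 2-channel flow (H, W, 2).

    Mirrors the line quoted in the question: x-flow is -disp, y-flow is zero.
    """
    disp = np.asarray(disp, dtype=np.float32)
    return np.stack([-disp, np.zeros_like(disp)], axis=-1)

def flow_to_disparity(flow_x):
    """Horizontal flow predicted under that convention -> positive disparity."""
    return -np.asarray(flow_x, dtype=np.float32)
```

Under this assumption, negating the prediction once before computing metrics is enough; the ground-truth disparities stay positive.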

The generation of disparity map

Could you share the code for generating the disparity map from the depth map?
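Until the authors share their script, the standard pinhole relation disparity = focal * baseline / depth can be sketched as below. This is not the repository's code; `depth_to_disparity` and its parameter names are assumptions, and the depth is assumed to be z-depth expressed in the same units as the baseline.

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline, min_depth=1e-6):
    """Convert a z-depth map to disparity in pixels for a rectified pair.

    depth:    (H, W) z-depth, same units as `baseline`.
    focal_px: focal length in pixels.
    Returns (disparity, valid_mask); invalid pixels get disparity 0.
    """
    depth = np.asarray(depth, dtype=np.float64)
    valid = np.isfinite(depth) & (depth > min_depth)
    disp = np.zeros_like(depth)
    disp[valid] = focal_px * baseline / depth[valid]
    return disp, valid
```

For example, a depth of 2.0 units with a focal length of 1000 px and a baseline of 0.1 units gives a disparity of 50 px.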

Stereo image generation

Hey team, great work!

Will you share the pipeline for stereo image creation from single image? Let me know if it is uploaded and I am missing something

About instant-ngp depth

Hi, can you provide me with a code snippet to capture the depth? I'm having trouble with instant-ngp! Help, @fabiotosi92

How to generate the center view for my stereo pair scene?

Thank you for NeRF-Supervised-Deep-Stereo! I like it very much!

I have a question, though.
I have left.jpg, right.jpg, disp0.pfm and calib.txt,
and I used colmap to generate poses\colmap_sparse, colmap_text and colmap.db.

Running python test.py ... generates .../outdir/0.jpg and 0.npy.
Running python demo.py ... generates .../outdir/disparity_map.png.

Now, how do I generate the center view for my stereo pair scene for a given baseline?
What code and which Python packages do I need?

About input args for RAFTStereo and PSMNet models

Thank you for your contribution.
I have a question to ask you.
Is this code wrong? Why did you input irrelevant args to the RAFTStereo and PSMNet models?
    def load_pretrained_model(args):
        print('Load pretrained model')
        model = None
        if args.model == 'raft-stereo':
            model = RAFTStereo(args)
        elif args.model == 'psmnet':
            model = PSMNet(args.maxdisp)
        else:
            print('Invalid model selected.')
            exit()

I can't download the dataset. Is there a problem with the data source?

When I'm halfway through the download, the download is interrupted. I've tried it a few times, and it's all like this. I'm guessing it seems like there's a problem with the data source.

about the depth of NERF

Hello, I have a question about using NGP to render depth. It seems that NeRF renders depth from the distance of the sampling points along the ray, i.e. the distance from (x, y, z) to the origin (0, 0, 0), rather than the z-depth. So is it correct to convert the NeRF-rendered depth to disparity here? Is there a problem?
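If the rendered value really is the Euclidean distance along each ray (an assumption; the thread does not confirm what instant-ngp outputs), it can be converted to z-depth with the pinhole intrinsics. The sketch below uses hypothetical names and assumes integer pixel coordinates with no lens distortion.

```python
import numpy as np

def ray_distance_to_z_depth(dist, fx, fy, cx, cy):
    """Convert per-pixel ray length to z-depth under a pinhole model.

    A pixel (u, v) looks along the direction ((u - cx) / fx, (v - cy) / fy, 1)
    in camera coordinates, so z = distance / ||direction||.
    """
    dist = np.asarray(dist, dtype=np.float64)
    h, w = dist.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = np.sqrt(x ** 2 + y ** 2 + 1.0)
    return dist / norm
```

The two quantities coincide at the principal point and diverge towards the image corners, so the difference matters most for wide fields of view.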

[Instant NGP render error]

Hi, an error occurred when I used the generated transforms_left.json to render the corresponding left-view images:

C:\ProgramData\anaconda3\python.exe D:/instant-ngp/scripts/run.py
16:23:24 SUCCESS  Initialized CUDA 11.6. Active GPU is #0: NVIDIA GeForce RTX 3060 Laptop GPU [86]
16:23:24 INFO     Loading NeRF dataset from
16:23:24 INFO       ..\data\nerf\test_3\transforms.json
16:23:24 PROGRESS []   0% (  0/103)  0s/inf
16:23:24 PROGRESS []   1% (  1/103) 0s/2s
16:23:24 PROGRESS []   2% (  2/103) 0s/1s
16:23:24 PROGRESS []   3% (  3/103) 0s/0s
16:23:24 PROGRESS []   4% (  4/103) 0s/0s
16:23:24 PROGRESS []   5% (  5/103) 0s/0s
16:23:24 PROGRESS []   6% (  6/103) 0s/0s
16:23:24 PROGRESS []   7% (  7/103) 0s/0s
16:23:24 PROGRESS []   8% (  8/103) 0s/0s
16:23:24 PROGRESS []   9% (  9/103) 0s/0s
16:23:24 PROGRESS []  10% ( 10/103) 0s/0s
16:23:24 PROGRESS []  11% ( 11/103) 0s/0s
16:23:24 PROGRESS []  12% ( 12/103) 0s/0s
16:23:24 PROGRESS []  13% ( 13/103) 0s/0s
16:23:24 PROGRESS []  14% ( 14/103) 0s/0s
16:23:24 PROGRESS []  15% ( 15/103) 0s/0s
16:23:24 PROGRESS []  16% ( 16/103) 0s/0s
16:23:24 PROGRESS []  17% ( 18/103) 0s/0s
16:23:24 PROGRESS []  17% ( 17/103) 0s/0s
16:23:24 PROGRESS []  18% ( 19/103) 0s/0s
16:23:24 PROGRESS []  19% ( 20/103) 0s/0s
16:23:24 PROGRESS []  20% ( 21/103) 0s/0s
16:23:24 PROGRESS []  21% ( 22/103) 0s/0s
16:23:24 PROGRESS []  22% ( 23/103) 0s/0s
16:23:24 PROGRESS []  23% ( 24/103) 0s/0s
16:23:24 PROGRESS []  24% ( 25/103) 0s/0s
16:23:24 PROGRESS []  25% ( 26/103) 0s/0s
16:23:24 PROGRESS []  26% ( 27/103) 0s/0s
16:23:24 PROGRESS []  27% ( 28/103) 0s/0s
16:23:24 PROGRESS []  28% ( 29/103) 0s/0s
16:23:24 PROGRESS []  29% ( 30/103) 0s/0s
16:23:24 PROGRESS []  30% ( 31/103) 0s/0s
16:23:24 PROGRESS []  31% ( 32/103) 0s/0s
16:23:24 PROGRESS []  32% ( 33/103) 0s/0s
16:23:24 PROGRESS []  33% ( 34/103) 0s/0s
16:23:24 PROGRESS []  34% ( 35/103) 0s/0s
16:23:24 PROGRESS []  35% ( 36/103) 0s/0s
16:23:24 PROGRESS []  36% ( 37/103) 0s/0s
16:23:24 PROGRESS []  37% ( 38/103) 0s/0s
16:23:24 PROGRESS []  38% ( 39/103) 0s/0s
16:23:24 PROGRESS []  39% ( 40/103) 0s/0s
16:23:24 PROGRESS []  40% ( 41/103) 0s/0s
16:23:24 PROGRESS []  41% ( 42/103) 0s/0s
16:23:24 PROGRESS []  42% ( 43/103) 0s/0s
16:23:24 PROGRESS []  43% ( 44/103) 0s/0s
16:23:24 PROGRESS []  44% ( 45/103) 0s/0s
16:23:24 PROGRESS []  45% ( 46/103) 0s/0s
16:23:24 PROGRESS []  46% ( 47/103) 0s/0s
16:23:24 PROGRESS []  47% ( 48/103) 0s/0s
16:23:24 PROGRESS []  48% ( 49/103) 0s/0s
16:23:24 PROGRESS []  49% ( 50/103) 0s/0s
16:23:24 PROGRESS []  50% ( 51/103) 0s/0s
16:23:24 PROGRESS []  50% ( 52/103) 0s/0s
16:23:24 PROGRESS []  51% ( 53/103) 0s/0s
16:23:24 PROGRESS []  52% ( 54/103) 0s/0s
16:23:24 PROGRESS []  53% ( 55/103) 0s/0s
16:23:24 PROGRESS []  54% ( 56/103) 0s/0s
16:23:24 PROGRESS []  55% ( 57/103) 0s/0s
16:23:24 PROGRESS []  56% ( 58/103) 0s/0s
16:23:24 PROGRESS []  57% ( 59/103) 0s/0s
16:23:24 PROGRESS []  58% ( 60/103) 0s/0s
16:23:24 PROGRESS []  59% ( 61/103) 0s/0s
16:23:24 PROGRESS []  60% ( 62/103) 0s/0s
16:23:24 PROGRESS []  61% ( 63/103) 0s/0s
16:23:24 PROGRESS []  62% ( 64/103) 0s/0s
16:23:24 PROGRESS []  63% ( 65/103) 0s/0s
16:23:24 PROGRESS []  64% ( 66/103) 0s/0s
16:23:24 PROGRESS []  65% ( 67/103) 0s/0s
16:23:24 PROGRESS []  66% ( 68/103) 0s/0s
16:23:24 PROGRESS []  67% ( 69/103) 0s/0s
16:23:24 PROGRESS []  68% ( 70/103) 0s/0s
16:23:24 PROGRESS []  69% ( 71/103) 0s/0s
16:23:24 PROGRESS []  70% ( 72/103) 0s/0s
16:23:24 PROGRESS []  71% ( 73/103) 0s/0s
16:23:24 PROGRESS []  72% ( 74/103) 0s/0s
16:23:24 PROGRESS []  73% ( 75/103) 0s/0s
16:23:24 PROGRESS []  74% ( 76/103) 0s/0s
16:23:24 PROGRESS []  75% ( 77/103) 0s/0s
16:23:24 PROGRESS []  76% ( 78/103) 0s/0s
16:23:24 PROGRESS []  77% ( 79/103) 0s/0s
16:23:24 PROGRESS []  78% ( 80/103) 0s/0s
16:23:24 PROGRESS []  79% ( 81/103) 0s/0s
16:23:24 PROGRESS []  80% ( 82/103) 0s/0s
16:23:24 PROGRESS []  81% ( 83/103) 0s/0s
16:23:24 PROGRESS []  82% ( 84/103) 0s/0s
16:23:24 PROGRESS []  83% ( 85/103) 0s/0s
16:23:24 PROGRESS []  83% ( 86/103) 0s/0s
16:23:24 PROGRESS []  84% ( 87/103) 0s/0s
16:23:24 PROGRESS []  85% ( 88/103) 0s/0s
16:23:24 PROGRESS []  86% ( 89/103) 0s/0s
16:23:24 PROGRESS []  87% ( 90/103) 0s/0s
16:23:24 PROGRESS []  88% ( 91/103) 0s/0s
16:23:24 PROGRESS []  89% ( 92/103) 0s/0s
16:23:24 PROGRESS []  90% ( 93/103) 0s/0s
16:23:24 PROGRESS []  91% ( 94/103) 0s/0s
16:23:24 PROGRESS []  92% ( 95/103) 0s/0s
16:23:24 PROGRESS []  93% ( 96/103) 0s/0s
16:23:24 PROGRESS []  94% ( 97/103) 0s/0s
16:23:24 PROGRESS []  95% ( 98/103) 0s/0s
16:23:24 PROGRESS []  96% ( 99/103) 0s/0s
16:23:24 PROGRESS []  97% (100/103) 0s/0s
16:23:24 PROGRESS []  98% (101/103) 0s/0s
16:23:24 PROGRESS []  99% (102/103) 0s/0s
16:23:24 PROGRESS [] 100% (103/103) 0s/0s
16:23:24 SUCCESS  Loaded 103 images after 0s
16:23:24 INFO       cam_aabb=[min=[0.676668,0.907796,0.517987], max=[1.76123,1.74558,0.722369]]
16:23:24 INFO     Loading network snapshot from: ..\data\nerf\test_3\test_3.ingp
16:23:25 INFO     GridEncoding:  Nmin=16 b=2.97199 F=4 T=2^19 L=8
16:23:25 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
16:23:25 INFO     Color model:   3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
16:23:25 INFO       total_encoding_params=13041664 total_network_params=10240
Screenshot transforms from  ../data/nerf/test_3/output/left_transforms/test_3_transform_left.json
range(0, 103)
Traceback (most recent call last):
  File "D:\instant-ngp\scripts\run.py", line 396, in <module>
    cam_matrix = f.get("transform_matrix", f["transform_matrix_start"])
                                           ~^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'transform_matrix_start'

How did you solve this problem? Could you please give some advice?
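One observation about the traceback: the failing line passes `f["transform_matrix_start"]` as the default argument of `dict.get`. Python evaluates arguments before making the call, so this raises `KeyError` whenever `transform_matrix_start` is absent, even if `transform_matrix` is present. The sketch below reproduces that with a made-up frame entry (the `file_path` value and the helper name are hypothetical). Whether patching run.py locally or adding the missing key to the JSON is the right fix for a given instant-ngp version is not confirmed here.

```python
def get_cam_matrix(frame):
    """Return a frame's pose, preferring 'transform_matrix'.

    The fallback key is only looked up when it is actually needed.
    """
    if "transform_matrix" in frame:
        return frame["transform_matrix"]
    return frame["transform_matrix_start"]

# Hypothetical frame entry that has 'transform_matrix' only.
frame = {
    "file_path": "images/0001.jpg",
    "transform_matrix": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
}

# Same pattern as the traceback: the default is evaluated first and raises.
try:
    frame.get("transform_matrix", frame["transform_matrix_start"])
    raised = False
except KeyError:
    raised = True

print(raised)                       # the eager lookup failed
print(get_cam_matrix(frame)[0])     # the lazy lookup succeeds
```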

about the process of data production

Thank you for your great work
In the process of data production, how is colmap used? Is the original colmap code used? Is every frame of the video used?

The generated stereo matching image is not ideal

Hello, I used the following steps to render some datasets, but the epipolar lines of the rendered pairs are not horizontal. Can you help me see what the problem is?

1. Use colmap to estimate the poses of the collected images, and use LLFF to convert the poses to poses_bounds.
2. Run mips360 to obtain the scene weights.
3. Use the get_translated_matrix method in the repository to obtain the left and right target poses_bounds.
4. Combine the weight file from step 2 with left_poses_bounds and right_poses_bounds to render the AO, depth and images.

Are there any issues with the above steps, or are any necessary steps missing, that would cause the epipolar lines not to be horizontal?
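For context, a rectified pair is obtained when the virtual camera keeps the same rotation and is translated only along its own x (right) axis. The sketch below shows this for a 4x4 camera-to-world matrix; it is an illustration with hypothetical names, not the repository's method. It assumes column 0 of the rotation is the camera's right axis. As far as I know, LLFF-style poses store the columns in a different order ([down, right, backwards]), in which case the right axis would be column 1; this is worth verifying, since translating along the wrong axis produces exactly the non-horizontal epipolar lines described.

```python
import numpy as np

def shift_pose_along_x(c2w, offset, x_col=0):
    """Translate a camera-to-world pose along the camera's own right axis.

    c2w:   (4, 4) camera-to-world matrix.
    x_col: column of the rotation holding the right axis (assumed 0 here).
    The rotation is left untouched, so the new view stays rectified
    with respect to the original one.
    """
    c2w = np.asarray(c2w, dtype=np.float64)
    shifted = c2w.copy()
    shifted[:3, 3] += offset * c2w[:3, x_col]
    return shifted

def make_left_right_poses(c2w_center, baseline, x_col=0):
    """Left and right poses at -baseline and +baseline from the center view."""
    left = shift_pose_along_x(c2w_center, -baseline, x_col)
    right = shift_pose_along_x(c2w_center, +baseline, x_col)
    return left, right
```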

How to calculate AO?

Thanks for your great work, I would like to know how to calculate AO, can you provide the formula please? The formula provided in your paper is wrong. The confidence map obtained by the formula provided in the paper is almost all 1.

How to export shape/AO/depth image from trained instant-ngp

Thank you for this great work. However, I encountered a problem while building my own dataset. I can use your code to generate a new transforms.json file and train Instant NGP successfully. But I'm wondering how to export RGB and AO/depth images from the trained Instant NGP. Do I need to use transforms.json and transforms_left.json separately to generate different training results for nerf? By the way, I am using the GUI on Windows.

About custom dataset preparation

Thanks for the excellent work and contribution!
I have a little question about preparing my own dataset for stereo training.
As you mentioned in the Supplementary Material, "As a pre-processing step, we adjust the rendered disparity maps generated by Instant-NGP by fitting a scale-shift pair of values for each triplet", could you please provide the code/script for the disparity compensation optimization? Looking forward to your reply! Thanks again!
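The quoted sentence does not say what the scale-shift pair is fitted against or with which objective. As one plausible reading only, the sketch below fits `ref ≈ scale * src + shift` by ordinary least squares, where `ref` is some reference disparity (its source is an assumption) and `src` is the rendered one. The function name is hypothetical.

```python
import numpy as np

def fit_scale_shift(src, ref, valid=None):
    """Least-squares fit of ref ≈ scale * src + shift over valid pixels."""
    src = np.asarray(src, dtype=np.float64).ravel()
    ref = np.asarray(ref, dtype=np.float64).ravel()
    if valid is None:
        valid = np.isfinite(src) & np.isfinite(ref)
    else:
        valid = np.asarray(valid, dtype=bool).ravel()
    # Design matrix [src, 1] so the solution vector is (scale, shift).
    A = np.stack([src[valid], np.ones(int(valid.sum()))], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, ref[valid], rcond=None)
    return float(scale), float(shift)
```

A robust variant (for example, fitting on a masked subset of confident pixels) may behave better in practice, but that goes beyond what the quoted text states.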

Python problems

I am trying to run demo.py. I have followed the instructions but I get lots of errors, like this:
File "/home/ai/Documents/nerfstereo/demo.py", line 15
from models.raft-stereo import RAFTStereo
^
SyntaxError: invalid syntax

So I renamed the file and the import to raft_stereo.

Then I get a new error:

File "/home/ai/Documents/nerfstereo/demo.py", line 114, in <module>
main()
File "/home/ai/Documents/nerfstereo/demo.py", line 85, in main
model = load_pretrained_model(args)
File "/home/ai/Documents/nerfstereo/demo.py", line 21, in load_pretrained_model
model = RAFTStereo(args)
File "/home/ai/Documents/nerfstereo/models/raft_stereo.py", line 27, in __init__
context_dims = args.hidden_dims
AttributeError: 'Namespace' object has no attribute 'hidden_dims'
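On the first error: a hyphen is not valid in a Python `import` statement, but `importlib.import_module` accepts any module name as a string, so a file such as raft-stereo.py can be loaded without renaming it. The sketch below demonstrates this on a throwaway file; the helper name and the stub class are made up, and it does not address the missing `hidden_dims` argument from the second traceback.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def import_by_filename(module_dir, module_name):
    """Import a module whose file name is not a valid Python identifier."""
    sys.path.insert(0, str(module_dir))
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(module_dir))

# Demo with a temporary stand-in for models/raft-stereo.py.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "raft-stereo.py").write_text("class RAFTStereo:\n    pass\n")
    module = import_by_filename(tmp, "raft-stereo")
    RAFTStereo = module.RAFTStereo

print(RAFTStereo.__name__)
```

Renaming the file, as done in the report, is equally valid; the point is only that the syntax error comes from the import statement, not from the module itself.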

[Positive Feedback]

Thanks for the great work and help! I have roughly succeeded in running the whole pipeline of your paper on a custom dataset, as the picture below shows.

Hi, Could you please provide me with the correct formula for calculating AO?

Originally posted by @Liyunfengabc in #24 (comment)

[About instant-ngp depth map generation]

Hi, I get a 4-channel image (H×W×4) when I set the render mode to depth:

testbed.render_mode = ngp.Depth

How do I convert this image (shift=0.2) into the corresponding disparity map using the formula disparity = baseline * focal / depth, given that the range of the shift (baseline) may not match that of the depth obtained above?

Is there any method to determine if the generated dataset is qualified?

I have generated some datasets using a similar method, but I am unsure how to determine whether they have any issues. To check them, I have used the triplet photometric loss mentioned in the paper as an evaluation measure. Is this approach reasonable? Or, when you generate datasets, do you use specific metrics to evaluate them, or do you only consider the performance of networks trained on them? I am concerned that after creating the complete dataset, I may find that the results are not satisfactory, indicating a problem in a previous step. How do you avoid this issue in the process of dataset creation?
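A simple per-pair sanity check along these lines is to backward-warp the right image with the rendered disparity and measure how well it reproduces the left image. The sketch below uses a plain L1 error with linear interpolation and no occlusion handling; it is a simplified stand-in, not the paper's loss, and the function names are hypothetical.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Backward-warp the right image into the left view.

    For a rectified pair with positive disparity, left(x, y) ~ right(x - d, y).
    right: (H, W, C) float array, disp: (H, W) disparity in pixels.
    Returns the warped image and a mask of pixels sampled inside the image.
    """
    h, w = disp.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disp  # sample positions
    valid = (xs >= 0) & (xs <= w - 1)
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    alpha = (xs - x0)[..., None]          # linear interpolation weight
    rows = np.arange(h)[:, None]
    warped = (1.0 - alpha) * right[rows, x0] + alpha * right[rows, x1]
    return warped, valid

def photometric_error(left, right, disp):
    """Mean absolute color difference between left and the warped right image."""
    warped, valid = warp_right_to_left(right, disp)
    diff = np.abs(left - warped).mean(axis=-1)
    return float(diff[valid].mean())
```

Pairs whose error is far above the rest of the dataset are good candidates for manual inspection; occluded regions will always contribute some error, so the value is only meaningful in relative terms.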

Question about the number of training samples

Hello, I downloaded the training set and counted a total of 79,584 triplets. However, Section 4.1 of your paper gives the number as 65,148. What is the reason for the difference? Thank you.
Here is the training file I used: train.txt

Can not download the dataset.

Thanks for your wonderful work! But I cannot download the dataset from the link https://amsacta.unibo.it/id/eprint/7218/

Focal length of each scene.

Thanks for your great work.

Can you provide the focal lengths in a separate file?

Downloading the stereo dataset takes a long time, and downloading the RAW dataset will take just as long. I only need the focal length to generate the corresponding depth.

This also will be helpful for others.

Thanks for your help!

Extracting stereo pairs from custom dataset.

Hi Authors!
Really great work!
I have a query regarding creation of the stereo pairs and disparity from custom dataset. I am using nerfstudio to create my nerfs. How can I now begin to extract the stereo pairs?

How to calculate the camera parameters?

Some questions regarding training.

Thank you for your outstanding work. I would like to know if the mentioned loss functions in the paper are used in stereo matching networks like PSMNet. If I need to train on my own, do I simply need to replace the loss function in the backbone with the loss functions from this repository?

How to render a gif/video from left.jpg + right.jpg

Thank you for NeRF-Supervised-Deep-Stereo! I like it very much!

I have a question, though: I have left.jpg and right.jpg,
and I used generate_stereo_pair_matrix.py to generate transform_left.json from transform.json.
How can I generate a gif/video like the one below?

Is the provided dataset already rectified?

I noticed in the supplementary material of the paper that the disparity obtained from nerf has some discrepancy compared to raft stereo and sgm. They corrected this error by training a scale and bias.
I trained the dataset you provided using the triplet loss mentioned in the paper. I tried different learning rates and epochs, but most of the results didn't show much improvement (I compared them with cre stereo).
I would like to know if the datasets provided by you are already rectified?
Thank you very much.

Question about 'eraser_transform' augmentation

Thank you very much for sharing your excellent work.
We are working on implementing code for training stereo networks. According to your paper, the augmentation procedure described in RAFT-Stereo is used for training. We notice there is an augmentation function named eraser_transform in RAFT-Stereo, which erases random regions in the right image.

    def eraser_transform(self, img1, img2):
        ht, wd = img1.shape[:2]
        if np.random.rand() < self.eraser_aug_prob:
            mean_color = np.mean(img2.reshape(-1, 3), axis=0)
            for _ in range(np.random.randint(1, 3)):
                x0 = np.random.randint(0, wd)
                y0 = np.random.randint(0, ht)
                dx = np.random.randint(50, 100)
                dy = np.random.randint(50, 100)
                img2[y0:y0+dy, x0:x0+dx, :] = mean_color

        return img1, img2

We are not sure whether this function conflicts with the Triplet Photometric Loss in your paper, which backward-warps the right/left images. So our questions are:

Do you use eraser_transform when training RAFT-Stereo?
If so, is this function applied to all three images (img0, img1 and img2), or just some of them?

It would also be very helpful if you could share the full augmentation code. Thank you.

depth rendering

I use ngp.Depth to render depth and ngp.AO to render AO, but I don't know whether this method yields absolute depth directly. Do you know how to convert the rendered depth to absolute depth? Many thanks.

Regarding Depth Range Consistency in Different Scenes

Hi there,

Firstly, I want to express my appreciation for the excellent work you've been doing.

I've been following the discussions around the reconstruction scales in COLMAP. I've noticed that the reconstruction in Instant-NGP, along with the rendered depth, may involve an arbitrary scale, leading to potential variations in depth scale across different scenes. This becomes particularly pronounced for scenes of similar physical size that are reconstructed with different scales.

My main question concerns the selection of three virtual baselines (b = 0.5, 0.3, 0.1 units) for data generation across all scenes, as mentioned in your paper. Considering that two scenes, say A and B, may have distinct reconstruction scales in COLMAP, resulting in different depth ranges, I'm curious about the reasoning behind using the same baselines for all scenes. Given the potential difference in depth range caused by different reconstruction scales, how does the uniform application of baselines account for this variability?

I appreciate your time and insights into this matter.

Thank you in advance!
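
The concern raised above can be written out with the standard rectified-stereo relation. This is only a sketch: the symbols f (focal length in pixels), b (baseline), z (depth) and s (a hypothetical rescaling of the reconstruction) are notation introduced here, not taken from the paper.

```latex
d = \frac{f\, b}{z}
\qquad\Longrightarrow\qquad
d' = \frac{f\, b}{s\, z} = \frac{d}{s}
```

If the same physical scene is reconstructed at a scale s times larger, every depth becomes s z while b stays fixed in reconstruction units, so each disparity shrinks by a factor of s. A fixed baseline therefore yields different disparity ranges for differently scaled scenes.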

GPU RAM size and processing speed needed?

I would like to test the method from your wonderful paper on some scenes of my own. Which GPU did you train on, and how long did training take? You trained at 0.5 Mpix; did you also try higher resolutions?

The formula for depth rendering

Thank you for your wonderful work!
How should Formula 7 in the paper be understood?
Should σ(i) in the formula be changed to t(i), with δi = t(i+1) - t(i)?

Here t(i) represents the distance from the camera origin to the point.
Looking forward to your reply, thank you!
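
For reference, the usual NeRF-style expected depth weights each sample distance t(i) by the probability that the ray terminates there, where that probability is derived from the density σ(i) and the spacing δi. The snippet below is a minimal sketch of this standard formulation. It is not claimed to be identical to Formula 7 in the paper, and the function name `render_depth` is hypothetical.

```python
import numpy as np

def render_depth(sigma, t):
    """Expected ray depth from densities sigma[i] sampled at distances t[i].

    Uses delta_i = t[i+1] - t[i]; the last sample has no following interval
    and is ignored in this sketch.
    """
    sigma = np.asarray(sigma, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    delta = np.diff(t)                                  # interval lengths
    alpha = 1.0 - np.exp(-sigma[:-1] * delta)           # opacity of each interval
    # Transmittance: probability of reaching sample i without being absorbed earlier.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    depth = float(np.sum(weights * t[:-1]))
    return depth, weights

# A ray that hits a fully opaque surface at distance 2.
depth, weights = render_depth(sigma=[0.0, 1e6, 0.0, 0.0], t=[1.0, 2.0, 3.0, 4.0])
print(depth, weights)
```

In this formulation σ(i) enters only through the weights, while t(i) is the quantity being averaged.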

An unofficial implementation of NeRF-Supervised Deep Stereo

Hello, I've uploaded my code for training RAFT-Stereo with NeRF supervision: https://github.com/husheng12345/Unofficial-NeRF-Supervised-Deep-Stereo.
Despite my best efforts to replicate the experimental setup as delineated in the paper, there exists a discrepancy between the model obtained from my training scripts and the provided pretrained weights.

Model                         KITTI-15 (>3px All)   Midd-T Full (>2px All)
Official pretrained weights   5.41                  16.38
Trained with my scripts       6.06                  22.36

Would you be able to offer some guidance on which of my training hyperparameters might not be appropriately set? Thank you.

The structure of the folders in Setup Instructions

Thank you for your contribution.
I have a question to ask you.
Could you describe the folder structure expected by the Setup Instructions? This step is confusing and causes problems when running the code.
Thank you again for your contribution and I look forward to your reply.

A potential bug related to using uint16 to store AO maps

Hello, I've found a potential bug regarding the storage method of AO maps.
The range of AO is [0,1]. It is multiplied by 65536 and then saved as a uint16 PNG image. However, when AO equals exactly 1, the result 65536 exceeds the maximum uint16 value (65535), so AO is incorrectly stored as 0.
Here's a visualization example; the white areas in the second image represent where AO=0.
(0005/Q/AO/IMG_20220818_180012.png)
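
The wrap-around described above can be reproduced in a few lines. This is a minimal sketch and not the repository's code: both function names are hypothetical, and the naive version goes through an integer cast so that the wrap to 0 is deterministic in NumPy.

```python
import numpy as np

def encode_ao_naive(ao):
    """Hypothetical reproduction of the bug: scale by 65536, keep the low 16 bits."""
    scaled = (np.asarray(ao, dtype=np.float64) * 65536).astype(np.uint32)
    return scaled.astype(np.uint16)  # 65536 does not fit in 16 bits and wraps to 0

def encode_ao_safe(ao):
    """One possible fix: scale by 65535 so that AO = 1 maps to the largest uint16."""
    clipped = np.clip(np.asarray(ao, dtype=np.float64), 0.0, 1.0)
    return np.round(clipped * 65535).astype(np.uint16)

ao = np.array([0.0, 0.25, 1.0])
print(encode_ao_naive(ao))  # the last entry wraps to 0
print(encode_ao_safe(ao))   # the last entry is 65535
```

When reading maps that were already written with the naive encoding, a stored 0 is ambiguous between AO = 0 and AO = 1.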

Request tips for model training

Thank you for sharing your nice work!

Inspired by your work, I fine-tuned the model to apply it to another domain. Unfortunately, fine-tuning failed. To check whether this is a domain problem, I also fine-tuned the model on the provided NeRF-stereo triplet dataset, but that failed as well. (The model detects texture rather than object boundaries.)

Because no training code is available, I used the same hyperparameters and augmentation procedures as RAFT-Stereo, as described in your paper. Is there anything else to note for training? If you have any tips for training, please give me some advice.

Some questions about disparity maps.

I downloaded a portion of the dataset, specifically v1_part1, and I used the following code to read the disparity maps from it.

    import os
    from PIL import Image
    from torchvision import transforms

    # depth_folder, depth_files and device are defined elsewhere
    transform = transforms.ToTensor()

    for file in depth_files:
        if file.endswith('.png'):
            depth_image = Image.open(os.path.join(depth_folder, file))
            depth_image = transform(depth_image).squeeze().to(device)
            print(depth_image.shape)
            print(depth_image.max(), depth_image.min())

Some print results are:

    torch.Size([522, 1160]) tensor(16301, dtype=torch.int32) tensor(1907, dtype=torch.int32)
    torch.Size([522, 1160]) tensor(9634, dtype=torch.int32) tensor(2407, dtype=torch.int32)
    torch.Size([522, 1160]) tensor(11203, dtype=torch.int32) tensor(2654, dtype=torch.int32)

The disparity values stored in PNG files range from over 1000 to over 10000, and I am confused about such values. Aren't the values in PNG supposed to be within the range of 0-255 or 0-1? Also, are these values representing disparity? Why are they so large?
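
A note on the value range: PNG supports 16 bits per channel, so stored values can reach 65535, and the int32 dtype in the printed output suggests that ToTensor left the values unscaled. Datasets often store disparity as fixed-point integers in 16-bit PNGs and recover the floating-point value by dividing by a constant. The snippet below is a minimal sketch of that decoding. The function name `decode_disparity` is hypothetical and the scale of 64.0 is only a placeholder; the actual factor must be taken from the dataset documentation.

```python
import numpy as np

def decode_disparity(raw, scale):
    """Convert fixed-point integer disparities to float pixels.

    raw   : integer array as read from a 16-bit PNG
    scale : dataset-specific divisor (an assumption here; check the dataset docs)
    """
    raw = np.asarray(raw)
    if not np.issubdtype(raw.dtype, np.integer):
        raise TypeError("expected integer values as stored in the PNG")
    return raw.astype(np.float32) / np.float32(scale)

# Synthetic values in the range reported above; 64.0 is a placeholder scale.
raw = np.array([[1907, 16301]], dtype=np.uint16)
print(decode_disparity(raw, scale=64.0))
```

With a divisor of that order, raw values in the thousands correspond to disparities of tens to hundreds of pixels, which is a plausible range for images 1160 pixels wide.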

Question about test results on KITTI and Middlebury

Hello, I tested the pretrained RAFT-Stereo model using test.py. Here are the results I get:

KITTI-15 All
EPE: 1.4704
bad 1.0: 26.43%
bad 2.0: 9.43%
bad 3.0: 5.56%

Midd-T F All
EPE: 9.1773
bad 1.0: 26.56%
bad 2.0: 18.44%
bad 3.0: 15.57%

I notice these results are slightly different from results reported in Table 6 in your paper. I wonder if there is something wrong with my code. I can upload the full test code I used if necessary. Thank you.
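
When comparing against published numbers, it helps to pin down exactly how the metrics are defined. The snippet below is a minimal sketch of the usual definitions: EPE as the mean absolute disparity error, and bad-τ as the percentage of valid pixels whose error exceeds τ. The function name `disparity_metrics` and the rule that zero marks invalid ground truth are assumptions; the masking used in the repository's test.py may differ.

```python
import numpy as np

def disparity_metrics(pred, gt, valid=None, thresholds=(1.0, 2.0, 3.0)):
    """Compute EPE and bad-tau percentages over valid ground-truth pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = gt > 0  # assumption: zero marks missing ground truth
    err = np.abs(pred - gt)[valid]
    metrics = {"EPE": float(err.mean())}
    for tau in thresholds:
        metrics[f"bad {tau}"] = float((err > tau).mean() * 100.0)
    return metrics

gt = np.array([[10.0, 20.0], [0.0, 40.0]])
pred = np.array([[10.5, 23.0], [5.0, 40.0]])
print(disparity_metrics(pred, gt))
```

Small differences in the validity mask (for example occluded versus all pixels) or in the evaluation resolution can change these numbers.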