gangweix / igev
[CVPR 2023] Iterative Geometry Encoding Volume for Stereo Matching and Multi-View Stereo
License: MIT License
For training and inference on the Middlebury dataset at full (F) resolution, what should the disparity range be set to?
@gangweiX Hello, I would like to ask: how should I feed binocular fisheye images into the model to predict depth? Is it feasible to train the model directly on fisheye images, or do I need to map between the fisheye model and the pinhole model first?
Hi
Many thanks for your work!
It seems that the max_disp parameter, which defaults to 192, cannot be changed.
Changing this value causes a shape mismatch.
I figured out that the shape mismatch can be avoided by changing this line to:
gwc_volume = build_gwc_volume(match_left, match_right, self.args.max_disp//4, 8)
However, IGEV-Stereo is still unable to match disparities > 192 pixels.
Do I need to retrain the network with this change?
Best regards
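A minimal sketch of the shape coupling behind the mismatch above, assuming (as the `//4` in the call suggests) that features are extracted at 1/4 resolution; the helper name and the exact output layout are my assumptions, not the repo's internals:

```python
def gwc_volume_shape(b, groups, max_disp, h, w):
    # Hypothetical shape helper: with features at 1/4 resolution, the
    # group-wise correlation volume holds max_disp // 4 disparity hypotheses.
    return (b, groups, max_disp // 4, h // 4, w // 4)

# Default search range vs. an enlarged one: the disparity axis changes,
# which is why downstream aggregation layers hit a shape mismatch
# unless the volume is rebuilt from self.args.max_disp (and retrained).
assert gwc_volume_shape(1, 8, 192, 384, 512) == (1, 8, 48, 96, 128)
assert gwc_volume_shape(1, 8, 256, 384, 512) == (1, 8, 64, 96, 128)
```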
Hi! Thank you for the great works!
I was just wondering whether the resolution of the input training data affects the outcome of the model.
If so, how high a resolution would be good for performance?
thank you!
Hi, thank you for your great work!
I have one question about results on ETH3D:
What is the difference between the model sceneflow.pth and eth3d.pth? Did you fine-tune on training set of ETH3D?
Dear Dr. Xu:
You did a fantastic job on this project! Thank you for sharing your work.
I would like to ask you a question regarding ETH3D. Your evaluation score (3.6) on the ETH3D training set is more modest than RAFT-Stereo's (3.2), yet your method outperforms RAFT-Stereo on the ETH3D online benchmark (you rank 4th while RAFT-Stereo ranks 62nd). Could you please explain why the performance is better there?
Thank you very much!
Thank you for sharing the code for your great work!
I ran into this problem when I tried to run demo_imgs.py:
self.act1 = model.act1
AttributeError: 'EfficientNetFeatures' object has no attribute 'act1'
I look forward to your reply.
Hello, I tried converting the IGEV-Stereo model to an ONNX model so I could call it from C++, but the conversion failed. Do you have any related code or documentation?
Hello, I would like to ask what changes are needed in data loading, data format conversion, and data preprocessing when training on my own dataset. Many thanks if you can advise.
Hi, thanks for the great work!
There appears to be a small bug on line 137 of ./core/igev_mvs.py, which reads:
view_weight_sum += view_weight_sum + view_weight.unsqueeze(1)
I'm wondering whether this line of code should be written as:
view_weight_sum = view_weight_sum + view_weight.unsqueeze(1)
or
view_weight_sum += view_weight.unsqueeze(1)
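For what it's worth, a two-line check in plain Python (values chosen arbitrarily) shows how the buggy accumulator diverges from the intended one:

```python
# Buggy form: s += s + w expands to s = s + (s + w), so the accumulator
# is doubled each step instead of just adding w once.
s, w = 1, 2
s += s + w   # s = 1 + (1 + 2) = 4
t = 1
t += w       # intended accumulation: t = 1 + 2 = 3
assert s == 4 and t == 3
```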
Hello, I would like to ask: for these two datasets, were the metrics in your paper obtained directly with the SceneFlow pretrained model, rather than being results after mixed fine-tuning? Thank you for your guidance.
Thanks for the nice work. Is there a pre-trained model / demo for the MVS network?
During the forward pass I hit an error at this line: https://github.com/gangweiX/IGEV/blob/main/IGEV-Stereo/core/utils/utils.py#L68
assert torch.unique(ygrid).numel() == 1 and H == 1 # This is a stereo problem
AssertionError
This assert was triggered, and I see the author annotated it with "this is a stereo problem". What does this mean, and how can I fix it?
I want to run the network on grayscale images (single channel).
I get this error while running the network on gray images:
Traceback (most recent call last):
File "demo_imgs.py", line 100, in
demo(args)
File "demo_imgs.py", line 50, in demo
image1 = load_image(imfile1)
File "demo_imgs.py", line 29, in load_image
img = torch.from_numpy(img).permute(2, 0, 1).float()
RuntimeError: number of dims don't match in permute
I tried copying the same gray values into all 3 channels, but the results are not very good.
I see that ETH3D is a grayscale dataset, so I also tried the shared ETH3D checkpoint.
But I still get the above error.
Can you please share what changes are needed to adapt the network to gray images?
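One workaround for the crash itself, as a minimal sketch assuming the loader hands you a NumPy array (the helper name is made up): replicate the single channel to three before the channel permute, so the RGB-pretrained backbone accepts the input.

```python
import numpy as np

def to_three_channels(img):
    # img: H x W grayscale or H x W x 3 color array.
    # Stacking the gray channel three times makes the channel permute
    # (here np.transpose(2, 0, 1)) valid and matches the RGB input layout.
    if img.ndim == 2:
        img = np.stack([img] * 3, axis=-1)
    return np.transpose(img, (2, 0, 1)).astype(np.float32)

gray = np.zeros((4, 5), dtype=np.uint8)
assert to_three_channels(gray).shape == (3, 4, 5)
```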
The DTU dataset downloaded from the public website has a different directory structure compared to the one in the project's dtu_yao.py file, especially the structure of the Cameras_1 directory. How should I adjust it?
Thanks for your amazing work.
evaluate_stereo.py on line 192:
epe = torch.sum((flow_pr - flow_gt)**2, dim=0).sqrt()
However, by definition,
epe = ((flow_pr - flow_gt)**2).sqrt()
In addition, the D1 metric is also affected by this change in how EPE is computed.
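A toy NumPy check (values arbitrary) of how the two expressions differ when the error tensor has more than one entry along dim 0:

```python
import numpy as np

diff = np.array([[3.0], [4.0]])                # a 2-component error, shape (2, 1)

epe_summed = np.sqrt((diff ** 2).sum(axis=0))  # vector norm over dim 0 -> [5.0]
epe_elementwise = np.sqrt(diff ** 2)           # per-component abs -> [[3.], [4.]]

assert epe_summed[0] == 5.0
assert (epe_elementwise == np.abs(diff)).all()
# With a single-channel disparity map the two coincide, so the choice only
# matters when dim 0 holds more than one component.
```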
Hi,thank you very much for your great work.
I tested the generalization performance of the checkpoint you provided on Middlebury and ETH3D, but the results differ from Table 7.
Is it because the provided checkpoint was trained without data augmentation?
I would like to know whether I misunderstood or misused something. Thanks.
The checkpoint used for testing is sceneflow.pth.
The results on the Middlebury (half) training set are:
All:
'EPE': [7.572738313674927], 'D1': [0.27418503363927205], 'Thres1': [0.4556864986817042], 'Thres2': [0.3513654222091039], 'Thres3': [0.3013058652480443]
Noc:
avg_test_scalars_nonocc {'EPE': [6.537938332557678], 'D1': [0.2406569098432859], 'Thres1': [0.41864077051480614], 'Thres2': [0.3155737062295278], 'Thres3': [0.2668188696106275]}
The results on the ETH3D training set are:
All:
'EPE': [0.9156075097896434], 'D1': [0.06998444140157921], 'Thres1': [0.18718870177313132], 'Thres2': [0.10728459610362295], 'Thres3': [0.06998444140157921]
Noc:
'EPE': [0.9049188627137078], 'D1': [0.06913529045548919], 'Thres1': [0.1841263070150658], 'Thres2': [0.10554461269768783], 'Thres3': [0.06913529045548919]
Some qualitative results about the checkpoint:
BTW, I am also curious how the results of RAFT-Stereo in Table 7 were obtained.
Please provide an online example.
Maybe on https://huggingface.co
Or https://replicate.com/
Hello, I would like to ask about the difference between the two provided KITTI checkpoints, kitti2012 and kitti2015.
The paper says fine-tuning was done on the mixed KITTI 2012 and 2015 training sets, so the two datasets should share a single checkpoint.
I hope you can provide some training details for these two checkpoints. Thanks.
The paper does not seem to mention fine-tuning on these two datasets, but training on Scene Flow alone does not reach the paper's results, and the released pretrained models include ETH3D and Middlebury checkpoints, so I assume fine-tuning on these two datasets is required. For how many iterations should I fine-tune?
Using IGEV, I need to fine-tune the model on my own dataset. What steps should I follow to execute the fine-tuning, given that there is no fine-tuning script available?
I used
Hi, I noticed this repo didn't have a license. Can it be used commercially?
Hello, while training on the KITTI dataset I noticed that the validation set being logged is FlyingThings, even though I have already changed the dataset to KITTI in evaluate_stereo.py. I cannot find the cause; could you please help me with this?
First, thank you for sharing this great research.
I have a question regarding MVS training.
When training MVS, despite having GT depth, is there a reason to convert it to the disparity level for the loss?
When I was training, I noticed that, borrowing the RAFT structure, applying the loss at the depth level does not work, but applying it at the disparity level does.
Do you have any idea why this is the case?
Thanks again.
Hello, author. I'm training on the SceneFlow dataset with IGEV, and the EPE keeps increasing, reaching more than 4000. Could you please provide guidance on how to solve this issue?
Thank you for sharing the code for your great work!
Can you provide precomputed results on KITTI 2015? (the results you submit to KITTI)
Hello, thank you for open-sourcing the code.
Regarding the network architecture, I would like to ask how you improved the model's generalization ability. I used other network models to predict on images I took myself and the results were very poor, but the results from your IGEV network are quite decent. I would like to know how you achieved such strong generalization. Thanks.
I tried the two approaches below, but neither works well; the colors are abnormal in some places:
plt.imsave(file_stem, disp, cmap='jet')
cv2.imwrite(file_stem, cv2.applyColorMap(cv2.convertScaleAbs(disp, alpha=0.01),cv2.COLORMAP_JET))
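One likely cause, sketched below with a made-up helper: applyColorMap expects uint8 values in [0, 255], and convertScaleAbs with a fixed alpha clips disparities above 255 / alpha, so normalizing each map to the full range first usually removes the abnormal colors:

```python
import numpy as np

def normalize_disp_u8(disp):
    # Rescale a float disparity map to the full 0-255 uint8 range so a
    # colormap (e.g. cv2.applyColorMap with cv2.COLORMAP_JET) covers all
    # values instead of clipping the large ones.
    d = disp.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-6)
    return (d * 255.0).astype(np.uint8)

# usage (assumption):
# cv2.imwrite(file_stem, cv2.applyColorMap(normalize_disp_u8(disp), cv2.COLORMAP_JET))
```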
Hi,
Thank you for sharing this amazing work! I am wondering how you generated Figure 2b (shown in the screenshot below) in the paper. Did you first generate the all-pairs correlations using the following function? If so, how did you go from there to the disparity map?
IGEV/IGEV-Stereo/core/geometry.py, line 62 (commit ea4d55b)
Thank you in advance for your help with my question :)
Fine-tuning from the pretrained model is not expensive, only a few hours. But a single training run on the synthetic dataset takes at least 4 days.
Did you consider any ways to reduce the cost of experiments when designing them?
I saw in your answers to other questions that you did not try training directly on KITTI. So I would like to know whether you tried to reduce the training cost; if each experiment takes nearly a week before it can be evaluated, it is hard to make progress.
Perhaps you split off a small portion of the dataset as a benchmark and only ran the full set at the end? If so, I would appreciate details on how it was split and on the benchmark parameters.
Or perhaps you first trained for fewer than 200K iterations as a baseline? If so, I would also like to hear the details.
These are just my guesses, since I can hardly imagine experimenting on an 8-GPU server or several servers combined (poverty limits my imagination).
Hello, how can I obtain a depth map from the disparity? Thanks!
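For rectified stereo the standard pinhole relation depth = f * B / d applies; a minimal sketch (the epsilon guard is my addition):

```python
def disparity_to_depth(disp, focal_px, baseline_m, eps=1e-6):
    # depth = f * B / d: focal length in pixels times baseline in meters,
    # divided by disparity in pixels; eps avoids division by zero where
    # the predicted disparity is 0.
    return focal_px * baseline_m / (disp + eps)

# e.g. a 700 px focal length, 0.5 m baseline, 10 px disparity -> ~35 m
```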
Would you advise extending IGEV-Stereo for the special case of a trifocal tensor establishing the geometric projection between three views, or would IGEV-MVS be a more natural choice? In the latter case, are three images enough for inference? In your paper I see the number of input images is N=5 for training.
To reduce the size of the model, I want to retrain the network with a lower resolution of the disparity field.
I have already done this with the RAFT-Stereo network. However, when changing the corresponding parameter n_downsample from 2 to 3, the update_block throws a size-mismatch error in this line.
It seems the cnet changes resolution correctly, while the feature network keeps the original resolution.
Can you supply a bugfix, or describe how you would change the network to allow different downsampling values?
Many thanks in advance!
Nice work!
I know that the basic framework of IGEV-Stereo is based on RaftStereo.
May I ask which method or repo IGEV-MVS is based on?
@gangweiX Hello, when I run the demo image test, the following error is reported: (error screenshot not included)
After I initialize the region flagged in the error, another error appears: (error screenshot not included)
How should I resolve this?
Hi!
Thank you for your amazing work!
I tested the model's performance on the Scene Flow dataset. In the original paper the EPE should equal 0.47, but when I used the pre-trained model for testing I only got an EPE of 0.66. Is the released SceneFlow pre-trained model the one trained with data augmentation, intended for fine-tuning on other datasets and generalization tests,
while the model with an EPE of 0.47 is used only for Scene Flow evaluation? If so, would you mind releasing the SceneFlow-only pretrained weights? I want to compare against my model in occluded regions, and I cannot reproduce the 0.47 performance.
Thank you very much.
I usually swap the two images (and rotate them 180°) and run the model a second time to get a stereo match from right to left. This lets me perform a consistency check between the two disparity maps. However, it is inefficient, since some modules perform the same calculations for both directions (e.g. the feature network). What would be the best place to split the model into two paths, where each path performs one stereo-matching direction?
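The usual trick for that second pass, sketched with NumPy arrays (helper names are made up; H x W images assumed): flip both images horizontally and swap them, then flip the resulting disparity back.

```python
import numpy as np

def make_right_to_left_pair(img_left, img_right):
    # Feeding a left-disparity network the horizontally flipped, swapped
    # pair yields the right image's disparity (after flipping the output).
    return img_right[:, ::-1].copy(), img_left[:, ::-1].copy()

def unflip_disparity(disp_flipped):
    # Undo the horizontal flip on the network output.
    return disp_flipped[:, ::-1].copy()
```

This only restates the swap-and-flip idea from the question; it does not remove the duplicated feature-extraction cost, which would require splitting the model itself.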
Dear Dr. Xu:
Thank you so much for answering my question.
Best!