the-learning-and-vision-atelier-lava / pam
[TPAMI 2020] Parallax Attention for Unsupervised Stereo Correspondence Learning
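Hello, thank you for your work and the open-source code; I have benefited a lot from it.
I have a question I would like to ask:
According to the code in train.py,
mask_left = ((disp_left > 0) & (disp_left < 192)).float()
mask_right = ((disp_right > 0) & (disp_right < 192)).float()
and
loss_PAM_P = loss_pam_photometric(img_left, img_right, att, valid_mask, [mask_left, mask_right])
the information in disp_left and disp_right is used during training. Does this conflict with unsupervised training?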
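Hello! Thank you very much for your outstanding work! I see that test.py has no interface for testing a single image pair; could you provide one when you have time? Also, can PAM output disparity maps for image pairs of arbitrary resolution?
Hello, thank you very much for your work. How well does this model generalize? Does it give good results in real-world scenes? If I want to train on my own real data, do I still need to label it?
I have already installed CUDA and the related components, and I have confirmed that the default processor is the GPU. How can I solve this?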
lr = cfg.lr * (cfg.gamma ** -(epoch // cfg.n_steps))
Since cfg.gamma = 0.1 and the exponent is negative, won't the lr become larger during training?
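The effect of the sign in the exponent can be checked with a few lines of plain Python. This is a minimal sketch, not the repository's training code: the function name lr_at_epoch, the flag negate_exponent, and the values base_lr = 1e-3 and n_steps = 10 are assumptions chosen for illustration; only gamma = 0.1 and the formula itself come from the question above.

```python
def lr_at_epoch(base_lr, gamma, n_steps, epoch, negate_exponent):
    """Step schedule: base_lr * gamma ** (+/- (epoch // n_steps))."""
    steps = epoch // n_steps
    # negate_exponent=True reproduces the quoted formula; False is the usual decay.
    exponent = -steps if negate_exponent else steps
    return base_lr * (gamma ** exponent)


if __name__ == "__main__":
    base_lr, gamma, n_steps = 1e-3, 0.1, 10  # illustrative values, not from the repo
    for epoch in (0, 10, 20):
        quoted = lr_at_epoch(base_lr, gamma, n_steps, epoch, negate_exponent=True)
        decayed = lr_at_epoch(base_lr, gamma, n_steps, epoch, negate_exponent=False)
        print(f"epoch {epoch:2d}: quoted formula {quoted:.6g}, positive exponent {decayed:.6g}")
```

With gamma below 1, the quoted formula multiplies the rate by 1 / gamma at every step boundary, while a positive exponent shrinks it by gamma.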
Hi, thank you very much for your great work!
But there may be a small mistake in your metric code.
According to the KITTI benchmark, 'D1' is the percentage of stereo disparity outliers in the first frame.
Outliers are defined as pixels whose disparity error is larger than max(3 px, 0.05 × d_truth), where d_truth denotes the ground-truth disparity.
I guess your 'D3' refers to 'bad3', which is the percentage of "bad" pixels whose error is greater than 3 px.
If I am wrong, please tell me.
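The difference between the two definitions can be made concrete with a small plain-Python sketch. It follows the thresholds stated above (error > 3 px for bad3, error > max(3 px, 0.05 × d_truth) for D1). The function name bad3_and_d1, the flat-list inputs, and the rule that a ground-truth value of 0 marks an invalid pixel are assumptions for illustration, not the repository's metric code.

```python
def bad3_and_d1(pred, gt):
    """Return (bad3, d1) as fractions of the pixels that have valid ground truth.

    pred, gt: flat sequences of disparities in pixels.
    Assumption: gt <= 0 means "no ground truth" and the pixel is skipped.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no pixels with valid ground truth")
    # bad3: absolute error above a fixed 3 px threshold.
    bad3 = sum(abs(p - g) > 3.0 for p, g in valid)
    # D1: absolute error above both 3 px and 5% of the ground-truth disparity.
    d1 = sum(abs(p - g) > max(3.0, 0.05 * g) for p, g in valid)
    return bad3 / len(valid), d1 / len(valid)


if __name__ == "__main__":
    pred = [10.0, 104.0, 50.0, 20.0]
    gt = [10.5, 100.0, 40.0, 0.0]  # last pixel has no ground truth
    bad3, d1 = bad3_and_d1(pred, gt)
    print(f"bad3 = {bad3:.3f}, D1 = {d1:.3f}")
```

In this example the second pixel has a 4 px error on a 100 px disparity, so it counts as bad3 but not as a D1 outlier, because 5% of 100 px is 5 px.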
@LongguangWang
Mr. Wang,
I am very interested in your work and, given your excellent disparity results, would like to build on your code for unsupervised stereo matching. When will the code be released? Thank you very much!
Hello, may I ask whether the code can run without CUDA?
parser.add_argument('--datapath', default='D:/LongguangWang/Data/SceneFlow', help='data path')
Thanks
Thank you for your excellent work! Do you provide performance comparisons with other unsupervised (or self-supervised) depth estimation methods such as MonoDepthV2?
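Hello, sorry to bother you when you are busy! I am working on a project that I hope can build on your work, and I have two questions:
1. If the offset between the left image and the right image is not horizontal, for example when a person looks at an object with a tilted head, is the algorithm still applicable? As I understand it, the epipolar line is horizontal, and the algorithm multiplies Q and K row by row, which ensures that attention is computed only along the epipolar line.
2. If the disparity between the left image and the right image is small, for example a shift of only 2 pixels, is the algorithm applicable? Thank you!
What is the difference between setting max_disp to 192 and setting it to 0?
Hello, thank you for your work and for sharing the code. The paper is about unsupervised stereo matching, so why are mask_left and mask_right computed from the ground-truth disparity and then used in the loss computation? I am quite confused. Thank you for your reply.
Hello, may I ask whether the model can be trained on the KITTI dataset? In train.py I only see train_set = SceneFlowDatset(), but you also provide KITTIDataset(). Can I replace SceneFlowDatset with KITTIDataset for training?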
Mr. Wang,
I'm very interested in your work, but I have run into some problems when training on the SceneFlow dataset.
When debugging with two GPUs:
if the batch size is set to 1, I get "TypeError: forward() missing 2 required positional arguments: 'x_left' and 'x_right'";
if the batch size is set to 2, I get "RuntimeError: CUDA error: out of memory".
How can I solve this?
@LongguangWang
Mr Wang,
File "/home/PAM-master/PASMnet/models/modules.py", line 130, in forward
cost_right2left = torch.tril(cost_right2left)
RuntimeError: invalid argument 1: expected a matrix at /opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THC/generic/THCTensorMathPairwise.cu:174
I'm very interested in your work. The error above occurs while training on the SceneFlow dataset. Could you tell me what causes it and how to fix it?
Thank you for your wonderful work and code.
Could you help me understand the code below?
https://github.com/LongguangWang/PAM/blob/7200dd201047ceb2d98eeea8215b773d11fff8d4/PASMnet/models/modules.py#L199
Why not directly use the sum of all disparity candidates weighted by the parallax-attention map?
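For reference, the direct approach described in the question can be sketched in plain Python for a single pixel. This only illustrates the weighted sum over disparity candidates; it is not the code at the linked line. The function name expected_disparity and the convention that left column i matching right column j gives disparity i - j are assumptions for illustration.

```python
def expected_disparity(att_row, i):
    """Attention-weighted sum of disparity candidates for left column i.

    att_row[j] is the (assumed normalized) attention weight from left column i
    to right column j on the same image row; the candidate disparity is i - j.
    """
    return sum(w * (i - j) for j, w in enumerate(att_row))


if __name__ == "__main__":
    # Hypothetical attention row of width 8 for left column i = 5:
    # half of the weight on right column 2, half on right column 3.
    att_row = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
    print(expected_disparity(att_row, i=5))
```

A plain weighted sum like this treats every pixel as if it had a reliable match in the other view, which may be one reason an implementation would handle some pixels, such as occluded ones, differently.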