gallenszl / CFNet
CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching (CVPR 2021)
License: MIT License
Hi,
Thank you for sharing this interesting work.
I am just wondering whether you have cross-domain generalization results for CFNet trained without the asymmetric chromatic augmentation and asymmetric occlusion?
Thank you :)
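For readers unfamiliar with the term, here is a rough sketch of what asymmetric chromatic augmentation does; the jitter parameters below are illustrative, not the paper's values:

import torchvision.transforms as T

# Asymmetric chromatic augmentation: each view gets an independently sampled
# color jitter, so left and right receive different photometric changes.
jitter = T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)
left_aug = jitter(left_img)    # parameters are re-sampled on every call,
right_aug = jitter(right_img)  # so the two views are transformed differently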
In your paper, you mention that "we switch the activation function to Mish and prolong the pre-training process in the SceneFlow dataset for another 15 epochs". So, should the learning rate change during those additional 15 epochs?
Hello, thanks for the good work.
Just a question about the cross-domain generalization evaluation of PSMNet in Table 3.
In Table 3, the KITTI 2015 D1_all of PSMNet trained on the Scene Flow dataset is 16.3, while we got 28.7, which is far from the number reported in your paper. The pre-trained model from GitHub also scores about 28.
I am wondering about the reason for this.
Thanks.
Hello, thanks for your nice work. I tested the finetuning_model on some Middlebury images; however, in some cases the performance is not satisfying. Do you know the reason?
Below is the code I used for testing.
from __future__ import print_function, division
import argparse
import os
import glob
import sys
import time

import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
from PIL import Image
from matplotlib import pyplot as plt

from datasets import __datasets__
from models import __models__

cudnn.benchmark = False

parser = argparse.ArgumentParser(description='Cascade and Fused Cost Volume for Robust Stereo Matching (CFNet)')
parser.add_argument('--model', default='cfnet', help='select a model structure', choices=__models__.keys())
parser.add_argument('--maxdisp', type=int, default=256, help='maximum disparity')
parser.add_argument('--dataset', default='kitti', help='dataset name', choices=__datasets__.keys())
parser.add_argument('--loadckpt', default='/home/jucic/my_code/CFNet/finetuning_model', help='load the weights from a specific checkpoint')
args = parser.parse_args()

model = __models__[args.model](args.maxdisp)
model = nn.DataParallel(model)
model.cuda()
model.eval()

print("loading model {}".format(args.loadckpt))
state_dict = torch.load(args.loadckpt)
model.load_state_dict(state_dict['model'])


def save_pfm(file, image, scale=1):
    # write a float32 numpy array to a PFM file
    if image.dtype.name != 'float32':
        raise Exception('Image dtype must be float32.')
    if len(image.shape) == 3 and image.shape[2] == 3:  # color image
        color = True
    elif len(image.shape) == 2 or (len(image.shape) == 3 and image.shape[2] == 1):  # greyscale
        color = False
    else:
        raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')
    file.write('PF\n' if color else 'Pf\n')
    file.write('%d %d\n' % (image.shape[1], image.shape[0]))
    endian = image.dtype.byteorder
    if endian == '<' or (endian == '=' and sys.byteorder == 'little'):
        scale = -scale
    file.write('%f\n' % scale)
    image.tofile(file)


def test():
    data_path = '/home/jucic/my_code/RAFT_Opti/topdownshelfframe'
    with torch.no_grad():
        left_images = sorted(glob.glob(os.path.join(data_path, 'left/*.png')) +
                             glob.glob(os.path.join(data_path, 'left/*.jpg')))
        right_images = sorted(glob.glob(os.path.join(data_path, 'right/*.png')) +
                              glob.glob(os.path.join(data_path, 'right/*.jpg')))
        count = 1
        for imfile1, imfile2 in zip(left_images, right_images):
            image1 = np.array(Image.open(imfile1).convert('RGB'))
            image2 = np.array(Image.open(imfile2).convert('RGB'))
            # halve the resolution, then pad up to a multiple of 32
            height = image1.shape[0] / 2
            width = image1.shape[1] / 2
            height = int(height + (((height // 32) + 1) * 32 - height) % 32)
            width = int(width + (((width // 32) + 1) * 32 - width) % 32)
            image1 = cv2.resize(image1, (width, height))
            image2 = cv2.resize(image2, (width, height))
            image1 = image1 / 255.0
            image2 = image2 / 255.0
            image1 = torch.from_numpy(image1).permute(2, 0, 1)[None].float()
            image2 = torch.from_numpy(image2).permute(2, 0, 1)[None].float()
            print(image1.shape)
            begin = time.time()
            disp_ests, pred3_s3, pred_s4 = model(image1.cuda(), image2.cuda())
            print("{}ms elapsed by cfnet".format((time.time() - begin) * 1000))
            result_folder = os.path.join('/home/jucic/my_code/CFNet', 'result_topdownstereo')
            if not os.path.isdir(result_folder):
                os.mkdir(result_folder)
            plt.imsave("{}/{}.png".format(result_folder, str(count).zfill(7)),
                       disp_ests[-1].cpu().numpy().squeeze())
            count += 1


if __name__ == '__main__':
    test()
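One hedged guess about the unsatisfying results: the script above feeds raw images scaled to [0, 1], while the repository's data loaders normalize with ImageNet statistics through get_transform(). Assuming that helper lives in datasets/data_io.py as in related codebases, matching the training preprocessing would look roughly like this:

from datasets.data_io import get_transform  # assumed location of the repo's transform helper

processed = get_transform()  # ToTensor followed by ImageNet mean/std normalization
image1 = processed(Image.open(imfile1).convert('RGB'))[None].cuda()
image2 = processed(Image.open(imfile2).convert('RGB'))[None].cuda()
disp_ests, pred3_s3, pred_s4 = model(image1, image2)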
Your paper is great and the method is efficient. I want to make further improvements on top of your work, but I have a question: in your paper, HITNet's inference time is 0.015 s, yet I couldn't find official code for HITNet. How can I test its inference time on my own GPUs? Can you give some guidance? Thanks!
Could the author share a download link for the Middlebury dataset? The 2014 version seems to contain 31 pairs, but the paper mentions 28 pairs. Could you clarify? Many thanks.
Hi,
Thank you for the fantastic work. I am just wondering whether the results reported for Middlebury in Table 3 cover the non-occluded regions or the occluded ones as well?
Thank you.
I tried to train the CFNet model using the code from GitHub, replacing the Mish activation function with ReLU for the first 20 epochs and then switching back to Mish for another 15 epochs, just as the paper describes. But the performance of my trained model is far from that of the pretrained model provided in the repository. So what's wrong with my training? Is there any parameter that should be modified? I used ./scripts/sceneflow.sh on two V100 GPUs.
Hi, I am getting weird output images when warping the right image with the disparity map obtained from the pre-trained model. I learned from the code that the disparity map is with respect to the left image, hence I tried warping the right image with it. Below is the warping code I used.
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from skimage import io


def depth_read(filename):
    # loads disparity map D from a 16-bit PNG file and returns it as a numpy array
    depth_png = np.array(Image.open(filename), dtype=np.int64)
    # make sure we have a proper 16bit depth map here, not 8bit!
    # assert(np.max(depth_png) > 255)
    depth = depth_png.astype(np.float32) / 256.0
    depth = depth / depth.shape[1]  # normalize disparity by image width
    # depth[depth_png == 0] = -1.
    return depth


img = io.imread(<rightimg_filepath>)     # right image
disp = depth_read(<disparity_filepath>)  # disparity map with respect to the left image
print(img.shape, disp.shape)             # (375, 1242, 3), (375, 1242)
img = torch.from_numpy(img.transpose(2, 0, 1)).float().unsqueeze(0) / 255.0
disp = torch.from_numpy(disp).float().unsqueeze(0).unsqueeze(0)
print(img.shape, disp.shape)             # (1, 3, 375, 1242), (1, 1, 375, 1242)


def apply_disparity(img, disp):  # returns a warped output
    batch_size, _, height, width = img.size()
    # Original coordinates of pixels, normalized to [0, 1]
    x_base = torch.linspace(0, 1, width).repeat(batch_size, height, 1).type_as(img)
    y_base = torch.linspace(0, 1, height).repeat(batch_size, width, 1).transpose(1, 2).type_as(img)
    # Apply shift in X direction
    x_shifts = disp[:, 0, :, :]  # disparity is passed in NCHW format with 1 channel
    flow_field = torch.stack((x_base + x_shifts, y_base), dim=3)
    # In grid_sample coordinates are assumed to be between -1 and 1
    output = F.grid_sample(img, 2 * flow_field - 1, mode='bilinear',
                           padding_mode='zeros', align_corners=True)
    return output


output = (apply_disparity(img, -disp) * 255.0).detach()[0, :, :, :].cpu().numpy().transpose(1, 2, 0)
output.shape  # (375, 1242, 3)
The disparity maps were obtained from both the sceneflow_checkpoint and the finetuned_model checkpoint. I warped the same image with these two disparity maps but got the same irregular output either way. I have used the above warping code many times and I don't think there is any problem with it; I believe the problem is with the disparity map itself. Can someone help me figure out what could possibly have gone wrong?
Below is the input right image -
https://i.stack.imgur.com/aZia5.jpg
Below is the output warped right (also the estimated left) image I got -
https://i.stack.imgur.com/tHCGo.jpg
Hi.
I am thinking of applying your method to my own custom dataset.
So, I added the following code to save_disp.py's main, with reference to datasets/sceneflow_dataset.py.
# test one sample
# @make_nograd_func
# def test_sample(sample):
#     model.eval()
#     disp_ests, pred1_s3_up, pred2_s4 = model(sample['left'].cuda(), sample['right'].cuda())
#     return disp_ests[-1]

@make_nograd_func
def test_sample(left, right):
    model.eval()
    disp_ests, pred1_s3_up, pred2_s4 = model(left.cuda(), right.cuda())
    return disp_ests[-1]


if __name__ == '__main__':
    left_img = Image.open("/media/A/left/0.png").convert("RGB")
    right_img = Image.open("/media/A/right/0.png").convert("RGB")
    w, h = left_img.size
    crop_w, crop_h = 950, 512
    left_img = left_img.crop((w - crop_w, h - crop_h, w, h))
    right_img = right_img.crop((w - crop_w, h - crop_h, w, h))
    processed = get_transform()
    left_img = processed(left_img)
    right_img = processed(right_img)
    test_sample(left_img, right_img)
Then I get the following error.
Mish activation loaded...
Mish activation loaded...
Mish activation loaded...
Mish activation loaded...
Mish activation loaded...
File "/home/ubuntu/Apps/CFNet/models/cfnet.py", line 136, in forward
x = self.firstconv(x)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 3, 3, 3], but got 3-dimensional input of size [3, 512, 950] instead
Probably this is due to a wrongly shaped input to the network.
How can I generate a disparity image from a custom dataset?
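One likely fix, judging from the traceback: the first conv layer received a 3-dimensional input of size [3, 512, 950], so the batch dimension is missing. A minimal sketch of adding it before the forward pass:

left_img = processed(left_img).unsqueeze(0)    # [3, H, W] -> [1, 3, H, W]
right_img = processed(right_img).unsqueeze(0)
test_sample(left_img, right_img)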
Hi, it looks like your model has multiple levels of output; may I ask which one I should use for inference?
Thanks!
For cfnet.py, the first error:

def generate_search_range(self, sample_count, input_min_disparity, input_max_disparity):
    """
    Description: Generates the disparity search range.
    Returns:
        :min_disparity: Lower bound of disparity search range
        :max_disparity: Upper bound of disparity search range.
    """
    min_disparity = torch.clamp(input_min_disparity - torch.clamp(
        (sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp)
    max_disparity = torch.clamp(input_max_disparity + torch.clamp(
        sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp)
    return min_disparity, max_disparity

The clamp bound should match the stage's resolution instead (presumably because the range is generated on the 1/4- and 1/2-resolution cost volumes), i.e. it should be

min_disparity = torch.clamp(input_min_disparity - torch.clamp(
    (sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp // 4 - 1)
max_disparity = torch.clamp(input_max_disparity + torch.clamp(
    sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp // 4 - 1)

or

min_disparity = torch.clamp(input_min_disparity - torch.clamp(
    (sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp // 2 - 1)
max_disparity = torch.clamp(input_max_disparity + torch.clamp(
    sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp // 2 - 1)
Second error: at line 643 of cfnet.py, it should be

predmid_s2 = F.upsample(predmid_s2 * 2, [left.size()[2], left.size()[3]], mode='bilinear', align_corners=True)

not

predmid_s2 = F.upsample(predmid_s2 * 4, [left.size()[2], left.size()[3]], mode='bilinear', align_corners=True)
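For context, a brief sketch of why the multiplier must match the upsampling ratio (the shapes below are hypothetical, not from the repository): disparity is measured in pixels, so doubling the width doubles every disparity value.

import torch
import torch.nn.functional as F

left = torch.zeros(1, 3, 256, 512)            # full-resolution left image (hypothetical)
predmid_s2 = torch.rand(1, 1, 128, 256) * 64  # disparity predicted at 1/2 resolution

# Going from W/2 = 256 to W = 512 doubles every pixel offset, hence * 2 (not * 4).
pred_full = F.interpolate(predmid_s2 * 2, size=(left.size(2), left.size(3)),
                          mode='bilinear', align_corners=True)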
During training or robust fine-tuning, there is a problem like this:
RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
Does anybody know how to solve it?
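In case it helps, this error usually means some torch.autograd.Function in the code (or a dependency) still defines forward/backward as instance methods, which newer PyTorch rejects. A minimal sketch of the new-style pattern, using a Mish-like function purely as an illustration:

import torch
import torch.nn.functional as F

class MishFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.tanh(F.softplus(x))

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        tsp = torch.tanh(F.softplus(x))
        # d/dx [x * tanh(softplus(x))] = tanh(sp) + x * (1 - tanh(sp)^2) * sigmoid(x)
        return grad_output * (tsp + x * (1 - tsp * tsp) * torch.sigmoid(x))

# call through .apply instead of instantiating the Function
y = MishFunction.apply(torch.randn(4, requires_grad=True))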
When will you release the code? Thank you for your excellent work.
Can this model be trained in an unsupervised as well as a supervised manner?
Thank you
@gallenszl hi, nice work! So when will you release the code?
Can you provide the model trained on KITTI 2015? I could not reproduce the results in the paper.
Why are the values of gamma_s3, gamma_s2, beta_s3, and beta_s2 all zeros in your provided model weights?
If they are all zeros, does that mean they are not functional?
Hello, we tested your model on the Scene Flow dataset, but the EPE is only 0.97. Is there something wrong with my use of the code, or is the EPE really just 0.97? Thank you very much.
Hi,
Thank you for your work; I find it really useful and I am trying to embed it in a real-time environment for a test.
In order to do that, I want to export the model to TensorFlow Lite, so that I can make small changes (like quantization) more efficiently than with PyTorch.
To export the model to TFLite, I first exported it to ONNX, and now I'm trying to export it from ONNX to TF with the onnx-tf library.
I am using opset_version=11, the lowest version compatible with all the PyTorch operations in CFNet.
However, I faced many problems on this journey. First I had a dimension problem with the conversion to ONNX, so I decided to use a fixed input size for the images (512 x 768). I tested the results with the Middlebury SDK (I resized the input images, ran my ONNX model, and then resized the resulting disparity maps), and these results are quite good.
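For reference, a minimal sketch of the fixed-size ONNX export described above; the checkpoint path, its layout, and the dummy shapes are assumptions:

import torch
from models import __models__

model = __models__['cfnet'](256).cuda().eval()
state = torch.load('finetuning_model')  # assumed checkpoint path
# strip the 'module.' prefix if the weights were saved from nn.DataParallel
weights = {k.replace('module.', '', 1): v for k, v in state['model'].items()}
model.load_state_dict(weights)

# fixed 512 x 768 inputs, opset 11 (lowest covering CFNet's ops)
left = torch.randn(1, 3, 512, 768).cuda()
right = torch.randn(1, 3, 512, 768).cuda()
torch.onnx.export(model, (left, right), 'cfnet.onnx', opset_version=11)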
Then to export to TF, I first had an issue with an unsupported operation :
RuntimeError: Resize coordinate_transformation_mode=pytorch_half_pixel is not supported in Tensorflow.
I tried adding "align_corners=True" inside the upsample functions in the model code, and it solved the problem.
Right now I am facing another issue, and I haven't found any way to solve it. Here are the logs:
Traceback (most recent call last):
  File "/path/scripts/../export_TF.py", line 17, in <module>
    tf_rep.export_graph("%s/%s.pb" % (args.outdir, input_model_name))
  File "/path/onnx-tensorflow/onnx_tf/backend_rep.py", line 143, in export_graph
    signatures=self.tf_module.__call__.get_concrete_function(
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 785, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3130, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3831, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "/path/onnx-tensorflow/onnx_tf/backend_tf_module.py", line 99, in __call__ *
        output_ops = self.backend._onnx_node_to_tensorflow_op(onnx_node,
    File "/path/onnx-tensorflow/onnx_tf/backend.py", line 347, in _onnx_node_to_tensorflow_op *
        return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
    File "/path/onnx-tensorflow/onnx_tf/handlers/handler.py", line 58, in handle *
        cls.args_check(node, **kwargs)
    File "/path/onnx-tensorflow/onnx_tf/handlers/backend/resize.py", line 68, in args_check *
        x_shape = x.get_shape().as_list()

    ValueError: as_list() is not defined on an unknown TensorShape.
Using netron.app, I found that node 10454 seemed to have a dimension problem (it corresponds to the upsample operation at line 660 of cfnet.py), so I tried to hardcode all the dimensions with my input size:

pred1_s2 = F.upsample(pred1_s2 * 2, [512, 768], mode='bilinear', align_corners=True)

but it didn't resolve my problem at all, and I really don't have any idea how to solve it.
My TF version is 2.8.0
Did you already try (and succeed) to export the model to TensorFlow, and if so, how did you do it?
If not, do you have any idea how I could solve this problem?
Thank you.
Hello, thank you very much for your excellent work. May I ask when you will open-source the code? We would like to run ablation experiments on your work to verify the versatility of our method on an excellent baseline like yours.
I want to evaluate the accuracy of self-trained checkpoints on KITTI by running robust_test.py, but I get the following error:

loading model /home/rc/20220410StereoMatching/CFNet/checkpoints/sceneflow/pretrained/checkpoint_000009.ckpt
start at epoch 0
downscale epochs: [300], downscale rate: 10.0
setting learning rate to 0.001
Traceback (most recent call last):
  File "robust_test.py", line 335, in <module>
    train()
  File "robust_test.py", line 163, in train
    loss, scalar_outputs, image_outputs = test_sample(sample, compute_metrics=do_summary)
  File "/home/rc/20220410StereoMatching/CFNet/utils/experiment.py", line 30, in wrapper
    ret = func(*f_args, **f_kwargs)
  File "robust_test.py", line 280, in test_sample
    imgL, imgR, disp_gt = sample['left'], sample['right'], sample['disparity']
KeyError: 'disparity'
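A guess at the cause: the loader for the chosen file list returns no 'disparity' entry, which happens when the list points at a benchmark test split without ground truth. One illustrative guard (the key name follows the traceback above):

# sketch: fail early, with a clearer message, when the split has no ground truth
disp_gt = sample.get('disparity')
if disp_gt is None:
    raise KeyError('this split has no ground-truth disparity; '
                   'point robust_test.py at a training/validation list instead')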
Sorry for bothering you. Your algorithm is very impressive and helpful, but I could not see clearly how to arrange the datasets described in "Data Preparation". Can you make the download links clearer, or put the datasets you use on Google Drive? Thanks a lot for your kind help!
May I ask whether the disparity distributions of the datasets reported in the paper were computed with a script? Thanks.
Hello, I saw CFNet's inference time is 0.18 s on the KITTI benchmark, but when I tested on the KITTI dataset with a GTX 1080 Ti I measured 0.3 s. What hardware did you test on?
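As an aside, GPU timing is only meaningful if the device is synchronized before reading the clock, and a few warm-up runs are excluded; a minimal sketch of a fair measurement:

import time
import torch

def time_forward(model, left, right, warmup=5, iters=20):
    # warm-up avoids measuring CUDA context creation and cudnn autotuning
    with torch.no_grad():
        for _ in range(warmup):
            model(left, right)
        torch.cuda.synchronize()
        begin = time.time()
        for _ in range(iters):
            model(left, right)
        torch.cuda.synchronize()  # wait for all kernels before stopping the clock
    return (time.time() - begin) / iters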
How can we evaluate the code on just one pair of input images, i.e. one left image and one right image?
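For what it's worth, a minimal single-pair sketch under the usual assumptions (checkpoint path hypothetical; get_transform assumed to live in datasets/data_io.py; dimensions cropped to multiples of 32, which the network's downsampling appears to require):

import torch
import torch.nn as nn
from PIL import Image
from models import __models__
from datasets.data_io import get_transform

model = nn.DataParallel(__models__['cfnet'](256)).cuda().eval()
model.load_state_dict(torch.load('finetuning_model')['model'])  # assumed path

left = Image.open('left.png').convert('RGB')
right = Image.open('right.png').convert('RGB')
w, h = left.size
crop_w, crop_h = (w // 32) * 32, (h // 32) * 32  # keep dims divisible by 32
left = get_transform()(left.crop((w - crop_w, h - crop_h, w, h))).unsqueeze(0).cuda()
right = get_transform()(right.crop((w - crop_w, h - crop_h, w, h))).unsqueeze(0).cuda()

with torch.no_grad():
    disp_ests, pred_s3, pred_s4 = model(left, right)
disp = disp_ests[-1].squeeze().cpu().numpy()  # final full-resolution prediction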
Hello, I heard someone say that there was a problem with this code before. I would like to ask whether the two pretrained models on the web page have been updated.