
newcrfs's Introduction

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

This is the official PyTorch implementation code for NeWCRFs. For technical details, please refer to:

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
CVPR 2022
[Project Page] | [Paper]

  

Output1

Bibtex

If you find this code useful in your research, please cite:

@inproceedings{yuan2022newcrfs,
  title={NeWCRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation},
  author={Yuan, Weihao and Gu, Xiaodong and Dai, Zuozhuo and Zhu, Siyu and Tan, Ping},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={},
  year={2022}
}

Contents

  1. Installation
  2. Datasets
  3. Training
  4. Evaluation
  5. Models
  6. Demo

Installation

conda create -n newcrfs python=3.8
conda activate newcrfs
conda install pytorch=1.10.0 torchvision cudatoolkit=11.1
pip install matplotlib tqdm tensorboardX timm mmcv

Datasets

You can prepare the datasets KITTI and NYUv2 according to here, and then modify the data path in the config files to your dataset locations.

Or you can download the NYUv2 data from here and download the KITTI data from here.

Training

First download the pretrained encoder backbone from here, and then modify the pretrain path in the config files.

Training the NYUv2 model:

python newcrfs/train.py configs/arguments_train_nyu.txt

Training the KITTI model:

python newcrfs/train.py configs/arguments_train_kittieigen.txt

Evaluation

Evaluate the NYUv2 model:

python newcrfs/eval.py configs/arguments_eval_nyu.txt

Evaluate the KITTI model:

python newcrfs/eval.py configs/arguments_eval_kittieigen.txt

Models

Model Abs.Rel. Sqr.Rel RMSE RMSElog a1 a2 a3 SILog
NYUv2 0.0952 0.0443 0.3310 0.1185 0.923 0.992 0.998 9.1023
KITTI_Eigen 0.0520 0.1482 2.0716 0.0780 0.975 0.997 0.999 6.9859
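
Here a1, a2, and a3 denote the threshold accuracies δ < 1.25, δ < 1.25², and δ < 1.25³. For reference, a minimal sketch of how these standard depth metrics are commonly computed over valid ground-truth pixels (not necessarily the exact evaluation code of this repository):

import numpy as np

def compute_depth_metrics(gt, pred):
    # gt, pred: 1-D arrays of valid ground-truth and predicted depths (meters).
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    err = np.log(pred) - np.log(gt)
    silog = np.sqrt(np.mean(err ** 2) - np.mean(err) ** 2) * 100
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3, silog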

Demo

Test images with the indoor model:

python newcrfs/test.py --data_path datasets/test_data --dataset nyu --filenames_file data_splits/test_list.txt --checkpoint_path model_nyu.ckpt --max_depth 10 --save_viz

Play with the live demo from a video or your webcam:

python newcrfs/demo.py --dataset nyu --checkpoint_path model_zoo/model_nyu.ckpt --max_depth 10 --video video.mp4

Output1

Demo video1

Demo video2

Demo video3

Acknowledgements

Thanks to Jin Han Lee for open-sourcing the excellent work BTS. Thanks to Microsoft Research Asia for open-sourcing the excellent work Swin Transformer.

newcrfs's People

Contributors

alibaba-oss

newcrfs's Issues

A question about demo video

Hi! I watched your KITTI demo video on YouTube. What does the right window show, a 3D display rendered from RGB-D? I have converted RGB-D to a point cloud, but the display looks quite different from yours.

about test based on kitti

Hello, I would like to ask what command to use when testing on the KITTI dataset, and what the expected folder structure of the KITTI dataset is.

Model weights on panoramic image training

Hi!

Congratulations on your work!

Will you release the model weights for the panoramic images/training code for it?

Besides, in the paper you mention that you also tried to train the model with 50k images and then fine tune it with the Matterport 3D dataset. Did you obtain those 50k real world images using the Matterport camera?

Thanks in advance!

How to train on my own data

Hi, @weihaosky

Excellent work!

And I'd like to train on my own data for my specific scenario, would you please guide how to organize the data?
I saw the filenames file for NYU training as below. I understand "rgb" is the color image and "depth" is the depth file, but what is the difference between the two depth files, sync_depth_xxx.png vs sync_depth_dense_xxx.png?
(bedroom_0130/rgb_00014.jpg bedroom_0130/sync_depth_00014.png bedroom_0130/dense/sync_depth_dense_00014.png)

Thank you so much!

Synchronized raw NYUv2 data issue

Hi Weihao,

Thank you so much for your excellent work. I have a question about the NYUv2 training dataset you provided in #3. Does it include extrinsic parameters for each image? If not, do you know how to generate/find the extrinsic parameters for this training dataset?

Thank you so much for your help!

About the learning rate

Hi, it seems that the learning rate mentioned in your paper is a constant, but in your training code the learning rate follows a schedule (screenshot attached). Can you explain that? Looking forward to your reply.
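
For context, training loops derived from BTS typically decay the learning rate polynomially per step rather than keeping it constant; a minimal sketch of such a schedule (the exponent 0.9 and the end learning rate are assumptions, not necessarily the values used in this repository):

def poly_lr(base_lr, end_lr, step, total_steps, power=0.9):
    # Polynomial decay from base_lr to end_lr over the whole training run.
    return (base_lr - end_lr) * (1 - step / total_steps) ** power + end_lr

# Hypothetical use inside the training loop:
# for param_group in optimizer.param_groups:
#     param_group['lr'] = poly_lr(1e-4, 1e-5, global_step, num_total_steps)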

About Computation resources

Hi,
It seems you used multiple GPUs for training.
How many GPUs (the "Nvidia GTX 2080 Ti GPUs" mentioned in your paper) did you use?

Thanks.

About the dense directory

The dense directory also contains depth maps, so why are they different from the other depth maps? What are they used for?

What's the point of post-processing?

        if post_process:
            image_flipped = flip_lr(image)
            pred_depth_flipped = model(image_flipped)
            pred_depth = post_process_depth(pred_depth, pred_depth_flipped)

I wonder what this post-processing step means.
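
For context, flip-augmentation post-processing of this kind (popularized by Monodepth and also used in BTS) runs the model on the horizontally flipped image as well and fuses the two predictions. A minimal sketch assuming tensors of shape (B, 1, H, W); the simple averaging below is illustrative and omits the border blending that some implementations add:

import torch

def flip_lr(image):
    # Mirror along the width dimension.
    return torch.flip(image, dims=[3])

def post_process_depth(depth, depth_flipped):
    # Un-flip the flipped prediction and average the two depth estimates.
    return 0.5 * (depth + torch.flip(depth_flipped, dims=[3]))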

eval_kitti

I would like to ask whether the KITTI dataset cannot be used with eval. The result I get looks like this (screenshot attached):

How to finetune? and how to train with images of different dimensions?

Firstly, congratulations on your publication and thanks a lot for open-sourcing the code base. Great work!
Hi! I have a couple of questions:

  1. Can you please guide me on how to fine-tune a pretrained model on a similar dataset having a slightly different distribution? I can see the option for loading a pretrained backbone, but not for fine-tuning the model with pretrained weights.
  2. I get errors like the following when I try to train on images with dimensions other than (640, 480). Do you have any suggestions on how to tackle this issue?

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/train.py", line 321, in main_worker
depth_est = model(image)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/NewCRFDepth.py", line 133, in forward
e2 = self.crf2(feats[2], e3)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 428, in forward
x_out, H, W, x, Wh, Ww = self.crf_layer(x, v, Wh, Ww)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 357, in forward
x = blk(x, v, attn_mask)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 232, in forward
v_windows = window_partition(shifted_v, self.window_size)  # nW*B, window_size, window_size, C
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 40, in window_partition
x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[2, 5, 7, 6, 7, 512]' is invalid for input of size 1541120

Any help would be greatly appreciated.
Thank you! Kalyani
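
For context, shape errors of this kind usually mean that a feature map's height or width is no longer divisible by the attention window size (7) after the encoder's downsampling, so window_partition cannot reshape it. A minimal, hypothetical work-around is to pad the input so that every feature level stays divisible by the window size; padding to a multiple of 7 × 32 = 224 is a conservative choice if the deepest features are at 1/32 resolution (an assumption about this network):

import torch.nn.functional as F

def pad_to_multiple(image, multiple=224):
    # image: (B, C, H, W). Pad on the right/bottom so H and W are multiples of `multiple`.
    _, _, h, w = image.shape
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    return F.pad(image, (0, pad_w, 0, pad_h)), (h, w)

# Hypothetical usage: predict on the padded image, then crop back to the original size.
# padded, (h, w) = pad_to_multiple(image)
# depth = model(padded)[..., :h, :w]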

GPU memory keeps increasing during validation

Hello, while training with your code I noticed that GPU memory usage grows after every validation pass, eventually causing an out-of-memory error that interrupts training. How can this problem be solved?

The role of max_depth

Hello, I ran your framework on my own dataset and saw the max_depth parameter in both the training and testing code. Does this parameter need to be exact, or is a rough value sufficient?

Data preparation

Thank you for your great work!
I followed the README in BTS but the files are not correct:
FileNotFoundError: [Errno 2] No such file or directory: '/home/hyx/dataset/newcrfs/dataset/nyu_depth_v2/official_splits/train/dining_room_0031/rgb_00073.jpg'
because the folder names contain no 00XX numbers and the folders contain only a few images.
How can I download the correct NYU dataset to meet the requirements of your code?
Thank you for your time!

Error when i test

Hi, when I test my own 620*420 JPG, the following error occurs (screenshot attached). Do you know how I can resolve it?

Why is the pretrained model so large (about 1 GB)?

Are monocular depth estimation models all this large, around 1-2 GB? Why aren't they as small as models for other vision tasks? Can monocular depth estimation use a smaller model, or do smaller models simply not perform well enough?

Problem with saving weights during training

In the code you provided, why are weights saved during training for each evaluation index? How should I choose an optimal weight at the end of training? In addition, the saved weight file format is not ".ckpt"; is it necessary to convert it myself?

KITTI Crops

Thanks for releasing the code of your work!

I've realized that when you evaluate on the KITTI dataset you first perform the kb_crop and then the garg_crop. I don't think that's the expected behavior.

When performing the evaluation, on the dataloader you do:

if self.args.do_kb_crop is True:
  height = image.shape[0]
  width = image.shape[1]
  top_margin = int(height - 352)
  left_margin = int((width - 1216) / 2)
  image = image[top_margin:top_margin + 352, left_margin:left_margin + 1216, :]
  if self.mode == 'online_eval' and has_valid_depth:
      depth_gt = depth_gt[top_margin:top_margin + 352, left_margin:left_margin + 1216, :]

So you are cropping both the image and the ground truth. Then, on eval.py, you:

if args.do_kb_crop:
    height, width = gt_depth.shape
    top_margin = int(height - 352)
    left_margin = int((width - 1216) / 2)
    pred_depth_uncropped = np.zeros((height, width), dtype=np.float32)
    pred_depth_uncropped[top_margin:top_margin + 352, left_margin:left_margin + 1216] = pred_depth
    pred_depth = pred_depth_uncropped

But this actually does nothing, as the ground truth depth has already been cropped.

Your evaluation code is based on the work of BTS, but in their code they do not kb_crop the ground truth when evaluating. They crop the input image with both kb_crop and garg_crop, but the ground truth only with garg_crop. That means that, as I understand the code, your evaluation code and BTS's perform different cropping on the ground truth.

Evaluating on the ground truth only with the garg_crop (and inputs with both crops as in BTS) worsens the results.

Am I missing something?

thanks a lot!
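
For reference, the garg_crop mentioned above is the standard Garg/Eigen evaluation crop, which restricts the evaluation mask to a fixed fraction of the image. A minimal sketch of how that mask is commonly built (the crop ratios are the widely used Garg constants; whether they should be applied to the kb-cropped or the uncropped ground truth is exactly what this issue is about):

import numpy as np

def garg_eval_mask(gt_depth, min_depth=1e-3, max_depth=80.0):
    # Valid-depth mask restricted to the standard Garg crop region.
    h, w = gt_depth.shape
    valid = np.logical_and(gt_depth > min_depth, gt_depth < max_depth)
    crop = np.zeros((h, w), dtype=bool)
    crop[int(0.40810811 * h):int(0.99189189 * h),
         int(0.03594771 * w):int(0.96405229 * w)] = True
    return np.logical_and(valid, crop)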

About GPU Memory requirements for training

Hello, I configured the environment with reference to the information in the readme.md file, but I keep getting the error "CUDA out of memory":

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.92 GiB already allocated; 0 bytes free; 5.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

RuntimeError: HIP out of memory. Tried to allocate 54.00 MiB (GPU 0; 15.98 GiB total capacity; 14.39 GiB already allocated; 17179869183.78 GiB free; 14.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
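
For context, the allocator hint in these messages refers to PyTorch's PYTORCH_CUDA_ALLOC_CONF setting, which can be exported before launching training (the 128 MB value below is only an example); reducing the batch size or the training crop size in the config file is usually the more effective fix on a 6 GB GPU.

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128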

FPS rate

Thank you for your paper and implementation!
Could you share information about inference speed, i.e. what FPS you get with your hardware setup?

AttributeError: module 'glm' has no attribute 'vec3'

AttributeError: module 'glm' has no attribute 'vec3'
AttributeError: module 'glm' has no attribute 'mat4'
I run the demo but it cannot continue past the first frame; the error is that glm has no attribute mat4 or vec3.
I tried every version of glm; my OS is Windows 11 with Python 3.9.12.
Hope you can help me, thank you.

Input resolution seems to have a big impact on performance

Hi, I find that the input resolution seems to have a very big impact on the performance.

When I keep the input shape unchanged, e.g. 640 x 480, everything seems fine and I can reproduce the metrics claimed in the paper on the NYU test set.
However, when I change the input resolution, e.g. resize every image to 384 x 288, the RMSE and other metrics degrade notably: the RMSE goes from 0.33 to 0.95. Why does this happen? Requiring the same resolution at inference as at training seems unreasonable.

Thanks

Please provide complete and correct requirements

Hello,

I am trying to reproduce the results of your paper and I wanted first to fiddle with demo.py.
However, I am stuck at the requirements. First of all, the commands you provide do not install pytorch and torchvision in their GPU versions and seem to cause a segfault later on in the process when importing torchvision.
Moreover, demo.py requires scipy, pyside2, opengl, and scikit-image, which are not listed in the requirements.

If I manage to fix my error, I can open a PR with the necessary changes, but I would be extremely grateful if you provided a complete and correct requirements.txt anyway 😄

test on kitti

RuntimeError: shape '[1, 7, 7, 23, 7, 256]' is invalid for input of size 2060800

When I test on the KITTI dataset, I can't get the output depth map; the run reports the error above. Thank you!

about backbone

Hi, thanks for your great work.
I am trying to run this code on my own dataset, but I don't understand what I should do with the backbone link. Could you give me some guidance?

pretrained swin-L model and loaded state dict do not match exactly

Here is the output:

Load encoder backbone from: model_zoo/swin_transformer/swin_large_patch4_window7_224_22k.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: norm.weight, norm.bias, head.weight, head.bias, layers.0.blocks.1.attn_mask, layers.1.blocks.1.attn_mask, layers.2.blocks.1.attn_mask, layers.2.blocks.3.attn_mask, layers.2.blocks.5.attn_mask, layers.2.blocks.7.attn_mask, layers.2.blocks.9.attn_mask, layers.2.blocks.11.attn_mask, layers.2.blocks.13.attn_mask, layers.2.blocks.15.attn_mask, layers.2.blocks.17.attn_mask

missing keys in source state_dict: norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

It seems that the pretrained swin-L model and the source state dict do not match.

UnicodeDecodeError because the code and txt files are not utf-8

Hey! Thank you for your great work!
I found that some of the code and txt files are not UTF-8 encoded, so a
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 8: invalid continuation byte
is raised when we train or eval.
Could you please update so that other researchers can follow the README without any error? Thank you for your time!
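
For context, errors like this typically come from reading the filename lists with the platform's default codec. A minimal, hypothetical work-around until the files are re-saved as UTF-8 is to open them with an explicit encoding and ignore undecodable bytes:

# Hypothetical: read a filenames list regardless of its original encoding.
with open("data_splits/test_list.txt", "r", encoding="utf-8", errors="ignore") as f:
    filenames = [line.strip() for line in f if line.strip()]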

In what units is the result returned?

Sorry for the stupid question, but I can't figure out what units the result is in.

    with torch.no_grad():
        for _, sample in enumerate(tqdm(dataloader.data)):
            image = Variable(sample['image'].cuda())
            # Predict
            depth_est = model(image)

[[3.6550407 3.6550407 3.461664 ... 3.2897868 3.4753633 3.4753633],...

I think it's the distance meters for each pixel from the camera, but I'm not sure.

Eval result

Well, I installed the libraries here and downloaded NYU and KITTI following the README on the front page of BTS. KITTI seems to only have the GT, so I cannot use your eval code to test it. However, the results of running the eval code on the NYU dataset differ greatly from the numbers given in your README and in the paper. My result when running "python newcrfs/eval.py configs/arguments_eval_nyu.txt" is roughly as follows (screenshot attached):

I wonder why, since I checked that the dataset is matched.

Test on custom images?

Could you provide an interface for testing on custom images? Your test.py seems to be prepared only for NYU and KITTI.

inv_depth variable

What does the inv_depth variable in the constructor of NeWCRFs indicate, and where is it used?

About the model used for eval

After every 1000 images a checkpoint is saved, together with the best evaluation metrics. But why are six files saved?
If the purpose is to record the evaluation metrics, it would be more convenient to save them to a log file.
If the purpose is to save the model, why not save a single file? I cannot use six files at once for eval; I can only use one of them.
