
newcrfs's Introduction

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

This is the official PyTorch implementation code for NeWCRFs. For technical details, please refer to:

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
CVPR 2022
[Project Page] | [Paper]

  

Output1

Bibtex

If you find this code useful in your research, please cite:

@inproceedings{yuan2022newcrfs,
  title={NeWCRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation},
  author={Yuan, Weihao and Gu, Xiaodong and Dai, Zuozhuo and Zhu, Siyu and Tan, Ping},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={},
  year={2022}
}

Contents

  1. Installation
  2. Datasets
  3. Training
  4. Evaluation
  5. Models
  6. Demo

Installation

conda create -n newcrfs python=3.8
conda activate newcrfs
conda install pytorch=1.10.0 torchvision cudatoolkit=11.1
pip install matplotlib tqdm tensorboardX timm mmcv

Datasets

You can prepare the datasets KITTI and NYUv2 according to here, and then modify the data path in the config files to your dataset locations.

Or you can download the NYUv2 data from here and download the KITTI data from here.

Training

First download the pretrained encoder backbone from here, and then modify the pretrain path in the config files.

Training the NYUv2 model:

python newcrfs/train.py configs/arguments_train_nyu.txt

Training the KITTI model:

python newcrfs/train.py configs/arguments_train_kittieigen.txt

Evaluation

Evaluate the NYUv2 model:

python newcrfs/eval.py configs/arguments_eval_nyu.txt

Evaluate the KITTI model:

python newcrfs/eval.py configs/arguments_eval_kittieigen.txt

Models

Model Abs.Rel. Sqr.Rel RMSE RMSElog a1 a2 a3 SILog
NYUv2 0.0952 0.0443 0.3310 0.1185 0.923 0.992 0.998 9.1023
KITTI_Eigen 0.0520 0.1482 2.0716 0.0780 0.975 0.997 0.999 6.9859
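
Here a1, a2, and a3 denote the threshold accuracies δ < 1.25, δ < 1.25², and δ < 1.25³. For reference, a minimal sketch of how these standard depth metrics are commonly computed over valid ground-truth pixels (not necessarily the exact evaluation code of this repository):

import numpy as np

def compute_depth_metrics(gt, pred):
    # gt, pred: 1-D arrays of valid ground-truth and predicted depths (meters).
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    err = np.log(pred) - np.log(gt)
    silog = np.sqrt(np.mean(err ** 2) - np.mean(err) ** 2) * 100
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3, silog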

Demo

Test images with the indoor model:

python newcrfs/test.py --data_path datasets/test_data --dataset nyu --filenames_file data_splits/test_list.txt --checkpoint_path model_nyu.ckpt --max_depth 10 --save_viz

Play with the live demo from a video or your webcam:

python newcrfs/demo.py --dataset nyu --checkpoint_path model_zoo/model_nyu.ckpt --max_depth 10 --video video.mp4

Output1

Demo video1

Demo video2

Demo video3

Acknowledgements

Thanks to Jin Han Lee for open-sourcing the excellent work BTS. Thanks to Microsoft Research Asia for open-sourcing the excellent work Swin Transformer.

newcrfs's People

Contributors

alibaba-oss

newcrfs's Issues

A question about demo video

Hi! I watched your KITTI demo video on YouTube. What does the right window show, a 3D display rendered from RGB-D? I have converted RGB-D to a point cloud, but the display looks quite different from yours.

about test based on kitti

Hello, I would like to ask what command to use when testing on the KITTI dataset, and what the expected folder structure of the KITTI dataset is.

Model weights on panoramic image training

Hi!

Congratulations on your work!

Will you release the model weights for the panoramic images/training code for it?

Besides, in the paper you mention that you also tried to train the model with 50k images and then fine tune it with the Matterport 3D dataset. Did you obtain those 50k real world images using the Matterport camera?

Thanks in advance!

How to train on my own data

Hi, @weihaosky

Excellent work!

And I'd like to train on my own data for my specific scenario, would you please guide how to organize the data?
I saw the filenames file for NYU training as below. I understand "rgb" is the color image and "depth" is the depth file, but what is the difference between the two depth files, sync_depth_xxx.png vs sync_depth_dense_xxx.png?
(bedroom_0130/rgb_00014.jpg bedroom_0130/sync_depth_00014.png bedroom_0130/dense/sync_depth_dense_00014.png)

Thank you so much!

Synchronized raw NYUv2 data issue

Hi Weihao,

Thank you so much for your excellent work. I have a question about the NYUv2 training dataset you provided in #3. Does it include extrinsic parameters for each image? If not, do you know how to generate/find the extrinsic parameters for this training dataset?

Thank you so much for your help!

About the learning rate

Hi, it seems that the learning rate mentioned in your paper is a constant, but in your training code the learning rate follows a schedule (screenshot attached). Can you explain that? Looking forward to your reply.
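
For context, training loops derived from BTS typically decay the learning rate polynomially per step rather than keeping it constant; a minimal sketch of such a schedule (the exponent 0.9 and the end learning rate are assumptions, not necessarily the values used in this repository):

def poly_lr(base_lr, end_lr, step, total_steps, power=0.9):
    # Polynomial decay from base_lr to end_lr over the whole training run.
    return (base_lr - end_lr) * (1 - step / total_steps) ** power + end_lr

# Hypothetical use inside the training loop:
# for param_group in optimizer.param_groups:
#     param_group['lr'] = poly_lr(1e-4, 1e-5, global_step, num_total_steps)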

About Computation resources

Hi,
It seems you used multiple GPUs for training.
How many GPUs (the "Nvidia GTX 2080 Ti GPUs" mentioned in your paper) did you use?

Thanks.

About the dense directory

The dense directory also contains depth maps, so why are they different from the other depth maps? What are they used for?

What's the point of post-processing?

        if post_process:
            image_flipped = flip_lr(image)
            pred_depth_flipped = model(image_flipped)
            pred_depth = post_process_depth(pred_depth, pred_depth_flipped)

I wonder what this post-processing step means.
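
For context, flip-augmentation post-processing of this kind (popularized by Monodepth and also used in BTS) runs the model on the horizontally flipped image as well and fuses the two predictions. A minimal sketch assuming tensors of shape (B, 1, H, W); the simple averaging below is illustrative and omits the border blending that some implementations add:

import torch

def flip_lr(image):
    # Mirror along the width dimension.
    return torch.flip(image, dims=[3])

def post_process_depth(depth, depth_flipped):
    # Un-flip the flipped prediction and average the two depth estimates.
    return 0.5 * (depth + torch.flip(depth_flipped, dims=[3]))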

eval_kitti

I would like to ask whether the KITTI dataset cannot be used with eval. The result I get looks like this (screenshot attached):

How to finetune? and how to train with images of different dimensions?

Firstly, congratulations on your publication and thanks a lot for open-sourcing the code base. Great work!
Hi! I have a couple of questions:

  1. Can you please guide me on how to fine-tune a pretrained model on a similar dataset having a slightly different distribution? I can see the option for loading a pretrained backbone, but not for fine-tuning the model with pretrained weights.
  2. I get errors like the following when I try to train on images with dimensions other than (640, 480). Do you have any suggestions on how to tackle this issue?

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/train.py", line 321, in main_worker
depth_est = model(image)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/NewCRFDepth.py", line 133, in forward
e2 = self.crf2(feats[2], e3)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 428, in forward
x_out, H, W, x, Wh, Ww = self.crf_layer(x, v, Wh, Ww)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 357, in forward
x = blk(x, v, attn_mask)
File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 232, in forward
v_windows = window_partition(shifted_v, self.window_size)  # nW*B, window_size, window_size, C
File "/mmfs1/gscratch/sciencehub/kmarathe/models/NeWCRFs_Conveyor/newcrfs/networks/newcrf_layers.py", line 40, in window_partition
x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[2, 5, 7, 6, 7, 512]' is invalid for input of size 1541120

Any help would be greatly appreciated.
Thank you! Kalyani
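
For context, shape errors of this kind usually mean that a feature map's height or width is no longer divisible by the attention window size (7) after the encoder's downsampling, so window_partition cannot reshape it. A minimal, hypothetical work-around is to pad the input so that every feature level stays divisible by the window size; padding to a multiple of 7 × 32 = 224 is a conservative choice if the deepest features are at 1/32 resolution (an assumption about this network):

import torch.nn.functional as F

def pad_to_multiple(image, multiple=224):
    # image: (B, C, H, W). Pad on the right/bottom so H and W are multiples of `multiple`.
    _, _, h, w = image.shape
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    return F.pad(image, (0, pad_w, 0, pad_h)), (h, w)

# Hypothetical usage: predict on the padded image, then crop back to the original size.
# padded, (h, w) = pad_to_multiple(image)
# depth = model(padded)[..., :h, :w]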

GPU memory keeps increasing during validation

Hello, while training with your code I noticed that GPU memory usage grows after every validation pass, eventually causing an out-of-memory error that interrupts training. How can this problem be solved?

The role of max_depth

Hello, I ran your framework on my own dataset and saw the max_depth parameter in both the training and testing code. Does this parameter need to be exact, or is a rough value sufficient?

Data preparation

Thank you for your great work!
I followed the README in BTS but the files are not correct:
FileNotFoundError: [Errno 2] No such file or directory: '/home/hyx/dataset/newcrfs/dataset/nyu_depth_v2/official_splits/train/dining_room_0031/rgb_00073.jpg'
because the folder names contain no 00XX numbers and the folders contain only a few images.
How can I download the correct NYU dataset to meet the requirements of your code?
Thank you for your time!

Error when i test

Hi, when I test my own 620*420 JPG, the following error occurs (screenshot attached). Do you know how I can resolve it?

Why is the pretrained model so large (about 1 GB)?

Are monocular depth estimation models all this large, around 1-2 GB? Why aren't they as small as models for other vision tasks? Can monocular depth estimation use a smaller model, or do smaller models simply not perform well enough?

Problem with saving weights during training

In the code you provided, why are weights saved during training for each evaluation index? How should I choose an optimal weight at the end of training? In addition, the saved weight file format is not ".ckpt"; is it necessary to convert it myself?

KITTI Crops

Thanks for releasing the code of your work!

I've realized that when you evaluate on the KITTI dataset you first perform the kb_crop and then the garg_crop. I don't think that's the expected behavior.

When performing the evaluation, on the dataloader you do:

if self.args.do_kb_crop is True:
  height = image.shape[0]
  width = image.shape[1]
  top_margin = int(height - 352)
  left_margin = int((width - 1216) / 2)
  image = image[top_margin:top_margin + 352, left_margin:left_margin + 1216, :]
  if self.mode == 'online_eval' and has_valid_depth:
      depth_gt = depth_gt[top_margin:top_margin + 352, left_margin:left_margin + 1216, :]

So you are cropping both the image and the ground truth. Then, on eval.py, you:

if args.do_kb_crop:
    height, width = gt_depth.shape
    top_margin = int(height - 352)
    left_margin = int((width - 1216) / 2)
    pred_depth_uncropped = np.zeros((height, width), dtype=np.float32)
    pred_depth_uncropped[top_margin:top_margin + 352, left_margin:left_margin + 1216] = pred_depth
    pred_depth = pred_depth_uncropped

But this actually does nothing, as the ground truth depth has already been cropped.

Your evaluation code is based on the work of BTS, but in their code they do not kb_crop the ground truth when evaluating. They crop the input image with both kb_crop and garg_crop, but the ground truth only with garg_crop. That means that, as I understand the code, your evaluation code and BTS's perform different cropping on the ground truth.

Evaluating on the ground truth only with the garg_crop (and inputs with both crops as in BTS) worsens the results.

Am I missing something?

thanks a lot!
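
For reference, the garg_crop mentioned above is the standard Garg/Eigen evaluation crop, which restricts the evaluation mask to a fixed fraction of the image. A minimal sketch of how that mask is commonly built (the crop ratios are the widely used Garg constants; whether they should be applied to the kb-cropped or the uncropped ground truth is exactly what this issue is about):

import numpy as np

def garg_eval_mask(gt_depth, min_depth=1e-3, max_depth=80.0):
    # Valid-depth mask restricted to the standard Garg crop region.
    h, w = gt_depth.shape
    valid = np.logical_and(gt_depth > min_depth, gt_depth < max_depth)
    crop = np.zeros((h, w), dtype=bool)
    crop[int(0.40810811 * h):int(0.99189189 * h),
         int(0.03594771 * w):int(0.96405229 * w)] = True
    return np.logical_and(valid, crop)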

About GPU Memory requirements for training

Hello, I configured the environment with reference to the information in the readme.md file, but I keep getting the error "CUDA out of memory":

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.92 GiB already allocated; 0 bytes free; 5.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

RuntimeError: HIP out of memory. Tried to allocate 54.00 MiB (GPU 0; 15.98 GiB total capacity; 14.39 GiB already allocated; 17179869183.78 GiB free; 14.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
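
For context, the allocator hint in these messages refers to PyTorch's PYTORCH_CUDA_ALLOC_CONF setting, which can be exported before launching training (the 128 MB value below is only an example); reducing the batch size or the training crop size in the config file is usually the more effective fix on a 6 GB GPU.

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128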

FPS rate

Thank you for your paper and implementation!
Could you share information about inference speed, i.e. what FPS you get with your hardware setup?

AttributeError: module 'glm' has no attribute 'vec3'

AttributeError: module 'glm' has no attribute 'vec3'
AttributeError: module 'glm' has no attribute 'mat4'
I run the demo but it cannot continue past the first frame; the error is that glm has no attribute mat4 or vec3.
I tried every version of glm; my OS is Windows 11 with Python 3.9.12.
Hope you can help me, thank you.

Input resolution seems to have a big impact on performance

Hi, I find that the input resolution seems to have a very big impact on the performance.

When I keep the input shape unchanged, e.g. 640 x 480, everything seems fine and I can reproduce the metrics claimed in the paper on the NYU test set.
However, when I change the input resolution, e.g. resize every image to 384 x 288, the RMSE and other metrics degrade notably: the RMSE goes from 0.33 to 0.95. Why does this happen? Requiring the same resolution at inference as at training seems unreasonable.

Thanks

Please provide complete and correct requirements

Hello,

I am trying to reproduce the results of your paper and I wanted first to fiddle with demo.py.
However, I am stuck at the requirements. First of all, the commands you provide do not install pytorch and torchvision in their GPU versions and seem to cause a segfault later on in the process when importing torchvision.
Moreover, demo.py requires scipy, pyside2, opengl, and scikit-image, which are not listed in the requirements.

If I manage to fix my error, I can open a PR with the necessary changes, but I would be extremely grateful if you provided a complete and correct requirements.txt anyway 😄

test on kitti

RuntimeError: shape '[1, 7, 7, 23, 7, 256]' is invalid for input of size 2060800

When I test on the KITTI dataset, I can't get the output depth map; the run reports the error above. Thank you!

about backbone

Hi, thanks for your great work.
I am trying to run this code on my own dataset, but I don't understand what I should do with the backbone link. Could you give me some guidance?

pretrained swin-L model and loaded state dict do not match exactly

Here is the output:

Load encoder backbone from: model_zoo/swin_transformer/swin_large_patch4_window7_224_22k.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: norm.weight, norm.bias, head.weight, head.bias, layers.0.blocks.1.attn_mask, layers.1.blocks.1.attn_mask, layers.2.blocks.1.attn_mask, layers.2.blocks.3.attn_mask, layers.2.blocks.5.attn_mask, layers.2.blocks.7.attn_mask, layers.2.blocks.9.attn_mask, layers.2.blocks.11.attn_mask, layers.2.blocks.13.attn_mask, layers.2.blocks.15.attn_mask, layers.2.blocks.17.attn_mask

missing keys in source state_dict: norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

It seems that the pretrained swin-L model and the source state dict do not match.

UnicodeDecodeError because the code and txt files are not utf-8

Hey! Thank you for your great work!
I found that some of the code and txt files are not UTF-8 encoded, so a
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 8: invalid continuation byte
is raised when we train or eval.
Could you please update so that other researchers can follow the README without any error? Thank you for your time!
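
For context, errors like this typically come from reading the filename lists with the platform's default codec. A minimal, hypothetical work-around until the files are re-saved as UTF-8 is to open them with an explicit encoding and ignore undecodable bytes:

# Hypothetical: read a filenames list regardless of its original encoding.
with open("data_splits/test_list.txt", "r", encoding="utf-8", errors="ignore") as f:
    filenames = [line.strip() for line in f if line.strip()]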

In what units is the result returned?

Sorry for the stupid question, but I can't figure out what units the result is in.

    with torch.no_grad():
        for _, sample in enumerate(tqdm(dataloader.data)):
            image = Variable(sample['image'].cuda())
            # Predict
            depth_est = model(image)

[[3.6550407 3.6550407 3.461664 ... 3.2897868 3.4753633 3.4753633],...

I think it's the distance meters for each pixel from the camera, but I'm not sure.

Eval result

Well, I installed the libraries here and downloaded NYU and KITTI following the README on the front page of BTS. KITTI seems to only have the GT, so I cannot use your eval code to test it. However, the results of running the eval code on the NYU dataset differ greatly from the numbers given in your README and in the paper. My result when running "python newcrfs/eval.py configs/arguments_eval_nyu.txt" is roughly as follows (screenshot attached):

I wonder why, since I checked that the dataset is matched.

Test on custom images?

Could you provide an interface for testing on custom images? Your test.py seems to be prepared only for NYU and KITTI.

inv_depth variable

What does the inv_depth variable in the constructor of NeWCRFs indicate, and where is it used?

About the model used for eval

After every 1000 images a checkpoint is saved, together with the best evaluation metrics. But why are six files saved?
If the purpose is to record the evaluation metrics, it would be more convenient to save them to a log file.
If the purpose is to save the model, why not save a single file? I cannot use six files at once for eval; I can only use one of them.
