
ddp's Introduction

🎆 DDP: Diffusion Model for Dense Visual Prediction


The official implementation of the paper "DDP: Diffusion Model for Dense Visual Prediction".

This repository contains the official PyTorch implementation of the training & evaluation code and the pretrained models for DDP, which covers:

  • Semantic Segmentation
  • Depth Estimation
  • BEV Map Segmentation
  • Mask Conditioned ControlNet

We use MMSegmentation, Monocular-Depth-Estimation-Toolbox, BEVFusion, and ControlNet as the corresponding codebases. We would like to express our sincere gratitude to the developers of these codebases.

News

In the coming days, we will be updating the corresponding codebases.

Abstract

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design or architecture customization, DDP generalizes easily to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to its specialist counterparts.
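As a rough illustration of the "noise-to-map" paradigm described above, the sketch below shows a generic DDIM-style denoising loop that refines a random Gaussian map into a dense prediction, conditioned on image features. All names here (`image_encoder`, `map_decoder`, `ddim_step`, `num_steps`) are hypothetical placeholders for illustration, not the actual DDP API.

```python
import torch

@torch.no_grad()
def noise_to_map(image, image_encoder, map_decoder, ddim_step, num_steps=3):
    """Hypothetical sketch of a 'noise-to-map' inference loop: start from pure
    Gaussian noise and iteratively denoise it into a dense prediction map,
    conditioned on features of the input image."""
    feats = image_encoder(image)                         # condition: image features
    b, _, h, w = image.shape
    x_t = torch.randn(b, 1, h, w, device=image.device)   # random Gaussian map

    times = torch.linspace(1.0, 0.0, num_steps + 1, device=image.device)
    for i in range(num_steps):
        t_now, t_next = times[i], times[i + 1]
        # predict the clean map from the current noisy map and the image features
        x0_pred = map_decoder(x_t, feats, t_now)
        # one DDIM-style update toward the next (less noisy) timestep
        x_t = ddim_step(x_t, x0_pred, t_now, t_next)
    return x0_pred
```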

Method

[Figure: overview of the DDP framework]

Usage

Please refer to each task folder for more details.

Catalog

  • Depth Estimation checkpoints
  • Depth Estimation code
  • BEVMap checkpoints
  • BEVMap Segmentation code
  • Mask Conditioned ControlNet checkpoints
  • Mask Conditioned ControlNet code
  • Segmentation checkpoints
  • Segmentation code
  • Initialization

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{ji2023ddp,
  title={DDP: Diffusion Model for Dense Visual Prediction},
  author={Ji, Yuanfeng and Chen, Zhe and Xie, Enze and Hong, Lanqing and Liu, Xihui and Liu, Zhaoqiang and Lu, Tong and Li, Zhenguo and Luo, Ping},
  journal={arXiv preprint arXiv:2303.17559},
  year={2023}
}

ddp's People

Contributors

czczup, jiyuanfeng


ddp's Issues

run segmentation code

Thanks for your great work!

In segmention/mmseg/models/deformable_head_with_time.py, the input arguments passed to self.encoder on line 119 include "query", "key", "value", "time", etc. But in the mmcv package (https://github.com/open-mmlab/mmcv/blob/v1.6.2/mmcv/ops/multi_scale_deform_attn.py), MultiScaleDeformableAttention does not accept a "time" argument. Did you implement a custom MultiScaleDeformableAttention?

Your code: segmention/mmseg/models/deformable_head_with_time.py (screenshot omitted)
mmcv's multi_scale_deform_attn.py: https://github.com/open-mmlab/mmcv/blob/v1.6.2/mmcv/ops/multi_scale_deform_attn.py (screenshot omitted)
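For context, one plausible way such a "time" argument could be wired in is a thin wrapper that injects a time embedding into the query before delegating to the stock mmcv attention. This is only a guess at the interface implied by the call site, not the authors' actual implementation; `TimeConditionedMSDeformAttn` and `time_dim` are made-up names.

```python
import torch.nn as nn
from mmcv.ops import MultiScaleDeformableAttention


class TimeConditionedMSDeformAttn(MultiScaleDeformableAttention):
    """Hypothetical wrapper (not the authors' code): add a per-sample time
    embedding to the query, then fall back to the standard mmcv
    MultiScaleDeformableAttention forward."""

    def __init__(self, *args, time_dim=256, **kwargs):
        super().__init__(*args, **kwargs)
        self.time_mlp = nn.Sequential(nn.SiLU(), nn.Linear(time_dim, self.embed_dims))

    def forward(self, query, *args, time=None, **kwargs):
        if time is not None:
            # assumes query is (num_query, bs, embed_dims) and time is (bs, time_dim)
            query = query + self.time_mlp(time).unsqueeze(0)
        return super().forward(query, *args, **kwargs)
```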

Depth code release

Hi @JiYuanFeng

Thanks for sharing your great work!

Do you know when you will release the code and checkpoints for depth estimation?

Thanks in advance,
Sergio

about gpu memory and batch size

I encountered a confusing problem when I tried to train the depth model.
When I executed bash tools/dist_train.sh configs/ddp_kitti/ddp_swinb_22k_w7_kitti_bs2x8_scale01.py 1 with samples_per_gpu=2, the model took the following GPU memory.
However, after I set samples_per_gpu=4, the occupied memory was reduced.
Why would that happen?
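Not an answer from the authors, but one way to narrow this down is to log PyTorch's own memory counters instead of relying on nvidia-smi, which also counts the caching allocator's reserved-but-unused pool and can therefore move in unintuitive ways:

```python
import torch

def log_gpu_memory(tag=""):
    """Print PyTorch's view of GPU memory. nvidia-smi additionally includes
    memory the caching allocator has reserved but not handed out."""
    mib = 1024 ** 2
    print(f"[{tag}] allocated: {torch.cuda.memory_allocated() / mib:.0f} MiB | "
          f"max allocated: {torch.cuda.max_memory_allocated() / mib:.0f} MiB | "
          f"reserved: {torch.cuda.memory_reserved() / mib:.0f} MiB")
```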

Time of open-source code

Hi there, thanks for your wonderful work and for planning to open-source it.

I would like to know whether you have a schedule for open-sourcing the code.

Thanks, :)

Sampling drift issue

Thank you for your great work and sharing the code.

You mentioned the sampling drift problem in your paper. It seems you proposed a 'self-aligned denoising' method to solve it. However, I didn't find it in the released code. Can you tell me where it is? Or is the 'self-aligned denoising' method only used to verify the sampling drift problem and not included in the released code?

Looking forward to your reply.
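For readers unfamiliar with the term: sampling drift roughly refers to training inputs being built by corrupting the ground truth while inference inputs come from corrupting the model's own predictions. The sketch below is only one reading of what a 'self-aligned' training step could look like (building the noisy input from the model's own prediction); the `diffusion` helper with `sample_timesteps`/`q_sample` is a hypothetical placeholder, and this is not the released implementation.

```python
import torch
import torch.nn.functional as F

def training_step(model, image_feats, gt_map, diffusion, self_aligned=False):
    """Unofficial sketch: ordinary vs. 'self-aligned' denoising training.
    In the self-aligned variant the noisy input is built from the model's own
    prediction instead of the ground truth, so training inputs better match
    what the sampler sees at inference."""
    t = diffusion.sample_timesteps(gt_map.shape[0])
    noise = torch.randn_like(gt_map)

    if self_aligned:
        with torch.no_grad():
            # first pass: predict a map from a ground-truth-corrupted input
            x_t = diffusion.q_sample(gt_map, t, noise)
            pred_map = model(x_t, image_feats, t)
        # second pass: corrupt the model's own prediction instead of the gt
        x_t = diffusion.q_sample(pred_map, t, noise)
    else:
        x_t = diffusion.q_sample(gt_map, t, noise)

    pred = model(x_t, image_feats, t)
    return F.mse_loss(pred, gt_map)  # the loss still targets the ground truth
```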

BEV training problem

I first forked BEVFusion and replaced its configs and mmdet3d with DDP's configs and mmdet3d.
When training DDP, I encountered this error:

  • RuntimeError: "ms_deform_attn_forward_cuda" not implemented for 'Half'

I tried replacing auto_fp16 in the forward function with force_fp32, and that works; however, training then takes much longer.
Is there any solution for this error?
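For context, the mmcv decorators involved look roughly like the sketch below: the ms_deform_attn CUDA kernel has no Half implementation, so force_fp32 casts the decorated forward's inputs back to fp32. The class and argument names here are placeholders, not the actual BEVFusion/DDP code.

```python
import torch.nn as nn
from mmcv.runner import force_fp32


class DeformableBEVHead(nn.Module):
    """Placeholder module illustrating the workaround described above."""

    def __init__(self, attn):
        super().__init__()
        self.attn = attn
        # mmcv's fp16 hooks flip this to True when fp16 training is enabled
        self.fp16_enabled = False

    # was @auto_fp16(...) originally; force_fp32 casts 'x' back to fp32 so the
    # deformable-attention CUDA kernel never sees Half tensors
    @force_fp32(apply_to=('x',))
    def forward(self, x, **attn_kwargs):
        return self.attn(x, **attn_kwargs)
```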

Not able to reproduce - segmentation result

Hello,

Thank you very much for sharing the great work! I was trying to reproduce the segmentation result using
bash tools/dist_train.sh configs/ade/ddp_swin_s_2x8_512x512_160k_ade20k.py 1

Unfortunately, I cannot reproduce a similar result (not even close): the mIoU after 16k steps is only 21.04, compared to 37.93 in your log. I cannot figure out why. The environment is more or less similar except for CUDA, which I don't expect to cause such a huge difference. The log is attached: 20240424_065202.log

I would appreciate any help! Thank you!

loss nan

When I run the segmentation code on other public datasets, the loss becomes nan after training for more than 20 epochs. What could be the reason?

Thank you for your reply!
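Not an official answer, but in MMSegmentation-style configs two common first things to try for NaN losses on a new dataset are gradient clipping and a lower learning rate. The values below are illustrative guesses, not recommendations from the authors:

```python
# Illustrative MMSegmentation-style config overrides (values are placeholders)
optimizer = dict(type='AdamW', lr=3e-5, weight_decay=0.01)          # lower LR
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))  # clip grads
```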

Error while trying to train depth estimation model

Hello,

I am getting the following error:

nvrtc: error: invalid value for --gpu-architecture (-arch)

While trying to train the depth estimation model using the following command:

bash tools/dist_train.sh configs/custom/custom_config.py 1

I have one GPU in my system (an RTX 4090). Can you help me with this issue? I did some troubleshooting but couldn't figure out what is causing it.
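Not an official fix, but a quick diagnostic that may help: an RTX 4090 is compute capability 8.9, and nvrtc arch errors of this kind typically mean the installed PyTorch/CUDA build does not know that architecture (CUDA builds older than 11.8 do not ship sm_89 support). The following only prints version information using standard PyTorch calls:

```python
import torch

# Check whether the installed PyTorch/CUDA build supports the GPU's architecture.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_device_capability(0))  # expect (8, 9) on an RTX 4090
print(torch.cuda.get_arch_list())           # look for 'sm_89' (or newer) here
```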

Depth training problem

Hello, thanks for your work! When training the depth model, it raises the error 'ModuleNotFoundError: No module named 'depth.models.depther.regulardepth''. I cannot find the 'regulardepth' module in your code. Could you look into this problem? Thanks.

KITTI depth gt problem

KITTI's depth ground truth is a map with sparse valid values. Do you use it directly as the x0 of the diffusion model during training?
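For reference, a common pattern in KITTI depth estimation (not necessarily what DDP does) is to restrict the loss to pixels that actually have a LiDAR measurement; a minimal sketch:

```python
import torch

def masked_depth_loss(pred, gt, min_depth=1e-3):
    """Common depth-estimation pattern (not necessarily DDP's): KITTI ground
    truth is sparse, so only pixels with a valid measurement contribute."""
    valid = gt > min_depth                     # sparse gt: most pixels are zero
    return torch.abs(pred[valid] - gt[valid]).mean()
```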
