
ddp's Introduction

🎆 DDP: Diffusion Model for Dense Visual Prediction


The official implementation of the paper "DDP: Diffusion Model for Dense Visual Prediction".

This repository contains the official PyTorch implementation of the training & evaluation code and the pretrained models for DDP, which covers:

  • Semantic Segmentation
  • Depth Estimation
  • BEV Map Segmentation
  • Mask Conditioned ControlNet

We use MMSegmentation, Monocular-Depth-Estimation-Toolbox, BEVFusion, and ControlNet as the corresponding codebases. We would like to express our sincere gratitude to the developers of these codebases.

News

In the coming days, we will be updating the corresponding codebases.

Abstract

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design or architecture customization, DDP generalizes easily to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks; without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to its specialist counterparts.
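As a rough illustration of the "noise-to-map" paradigm described above, the sketch below shows a generic DDIM-style denoising loop that refines a random Gaussian map into a dense prediction, conditioned on image features. All names here (`image_encoder`, `map_decoder`, `ddim_step`, `num_steps`) are hypothetical placeholders for illustration, not the actual DDP API.

```python
import torch

@torch.no_grad()
def noise_to_map(image, image_encoder, map_decoder, ddim_step, num_steps=3):
    """Hypothetical sketch of a 'noise-to-map' inference loop: start from pure
    Gaussian noise and iteratively denoise it into a dense prediction map,
    conditioned on features of the input image."""
    feats = image_encoder(image)                         # condition: image features
    b, _, h, w = image.shape
    x_t = torch.randn(b, 1, h, w, device=image.device)   # random Gaussian map

    times = torch.linspace(1.0, 0.0, num_steps + 1, device=image.device)
    for i in range(num_steps):
        t_now, t_next = times[i], times[i + 1]
        # predict the clean map from the current noisy map and the image features
        x0_pred = map_decoder(x_t, feats, t_now)
        # one DDIM-style update toward the next (less noisy) timestep
        x_t = ddim_step(x_t, x0_pred, t_now, t_next)
    return x0_pred
```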

Method

[Figure: overview of the DDP framework]

Usage

Please refer to each task folder for more details.

Catalog

  • Depth Estimation checkpoints
  • Depth Estimation code
  • BEVMap checkpoints
  • BEVMap Segmentation code
  • Mask Conditioned ControlNet checkpoints
  • Mask Conditioned ControlNet code
  • Segmentation checkpoints
  • Segmentation code
  • Initialization

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{ji2023ddp,
  title={DDP: Diffusion Model for Dense Visual Prediction},
  author={Ji, Yuanfeng and Chen, Zhe and Xie, Enze and Hong, Lanqing and Liu, Xihui and Liu, Zhaoqiang and Lu, Tong and Li, Zhenguo and Luo, Ping},
  journal={arXiv preprint arXiv:2303.17559},
  year={2023}
}

ddp's People

Contributors

czczup, jiyuanfeng


ddp's Issues

run segmentation code

Thanks for your great work!

In segmention/mmseg/models/deformable_head_with_time.py, the input arguments passed to self.encoder on line 119 include "query", "key", "value", "time", etc. But in the mmcv package (https://github.com/open-mmlab/mmcv/blob/v1.6.2/mmcv/ops/multi_scale_deform_attn.py), MultiScaleDeformableAttention does not accept a "time" argument. Did you implement a custom MultiScaleDeformableAttention?

Your code: segmention/mmseg/models/deformable_head_with_time.py (screenshot omitted)
mmcv's multi_scale_deform_attn.py: https://github.com/open-mmlab/mmcv/blob/v1.6.2/mmcv/ops/multi_scale_deform_attn.py (screenshot omitted)
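For context, one plausible way such a "time" argument could be wired in is a thin wrapper that injects a time embedding into the query before delegating to the stock mmcv attention. This is only a guess at the interface implied by the call site, not the authors' actual implementation; `TimeConditionedMSDeformAttn` and `time_dim` are made-up names.

```python
import torch.nn as nn
from mmcv.ops import MultiScaleDeformableAttention


class TimeConditionedMSDeformAttn(MultiScaleDeformableAttention):
    """Hypothetical wrapper (not the authors' code): add a per-sample time
    embedding to the query, then fall back to the standard mmcv
    MultiScaleDeformableAttention forward."""

    def __init__(self, *args, time_dim=256, **kwargs):
        super().__init__(*args, **kwargs)
        self.time_mlp = nn.Sequential(nn.SiLU(), nn.Linear(time_dim, self.embed_dims))

    def forward(self, query, *args, time=None, **kwargs):
        if time is not None:
            # assumes query is (num_query, bs, embed_dims) and time is (bs, time_dim)
            query = query + self.time_mlp(time).unsqueeze(0)
        return super().forward(query, *args, **kwargs)
```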

Depth code release

Hi @JiYuanFeng

Thanks for sharing your great work!

Do you know when you will release the code and checkpoints for depth estimation?

Thanks in advance,
Sergio

about gpu memory and batch size

I encountered a confusing problem when I tried to train the depth model.
When I executed bash tools/dist_train.sh configs/ddp_kitti/ddp_swinb_22k_w7_kitti_bs2x8_scale01.py 1 with samples_per_gpu=2, the model took the following GPU memory.
However, after I set samples_per_gpu=4, the occupied memory was reduced.
Why would that happen?
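Not an answer from the authors, but one way to narrow this down is to log PyTorch's own memory counters instead of relying on nvidia-smi, which also counts the caching allocator's reserved-but-unused pool and can therefore move in unintuitive ways:

```python
import torch

def log_gpu_memory(tag=""):
    """Print PyTorch's view of GPU memory. nvidia-smi additionally includes
    memory the caching allocator has reserved but not handed out."""
    mib = 1024 ** 2
    print(f"[{tag}] allocated: {torch.cuda.memory_allocated() / mib:.0f} MiB | "
          f"max allocated: {torch.cuda.max_memory_allocated() / mib:.0f} MiB | "
          f"reserved: {torch.cuda.memory_reserved() / mib:.0f} MiB")
```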

Time of open-source code

Hi there, thanks for your wonderful work and for planning to open-source it.

I would like to know whether you have a schedule for open-sourcing the code.

Thanks, :)

Sampling drift issue

Thank you for your great work and sharing the code.

You mentioned the sampling drift problem in your paper. It seems you proposed a 'self-aligned denoising' method to solve it. However, I didn't find it in the released code. Can you tell me where it is? Or is the 'self-aligned denoising' method only used to verify the sampling drift problem and not included in the released code?

Looking forward to your reply.
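For readers unfamiliar with the term: sampling drift roughly refers to training inputs being built by corrupting the ground truth while inference inputs come from corrupting the model's own predictions. The sketch below is only one reading of what a 'self-aligned' training step could look like (building the noisy input from the model's own prediction); the `diffusion` helper with `sample_timesteps`/`q_sample` is a hypothetical placeholder, and this is not the released implementation.

```python
import torch
import torch.nn.functional as F

def training_step(model, image_feats, gt_map, diffusion, self_aligned=False):
    """Unofficial sketch: ordinary vs. 'self-aligned' denoising training.
    In the self-aligned variant the noisy input is built from the model's own
    prediction instead of the ground truth, so training inputs better match
    what the sampler sees at inference."""
    t = diffusion.sample_timesteps(gt_map.shape[0])
    noise = torch.randn_like(gt_map)

    if self_aligned:
        with torch.no_grad():
            # first pass: predict a map from a ground-truth-corrupted input
            x_t = diffusion.q_sample(gt_map, t, noise)
            pred_map = model(x_t, image_feats, t)
        # second pass: corrupt the model's own prediction instead of the gt
        x_t = diffusion.q_sample(pred_map, t, noise)
    else:
        x_t = diffusion.q_sample(gt_map, t, noise)

    pred = model(x_t, image_feats, t)
    return F.mse_loss(pred, gt_map)  # the loss still targets the ground truth
```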

BEV training problem

I first forked BEVFusion and replaced its configs and mmdet3d with DDP's configs and mmdet3d.
When training DDP, I encountered this error:

  • RuntimeError: "ms_deform_attn_forward_cuda" not implemented for 'Half'

I tried replacing auto_fp16 in the forward function with force_fp32, and that works; however, training then takes much longer.
Is there any solution for this error?
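For context, the mmcv decorators involved look roughly like the sketch below: the ms_deform_attn CUDA kernel has no Half implementation, so force_fp32 casts the decorated forward's inputs back to fp32. The class and argument names here are placeholders, not the actual BEVFusion/DDP code.

```python
import torch.nn as nn
from mmcv.runner import force_fp32


class DeformableBEVHead(nn.Module):
    """Placeholder module illustrating the workaround described above."""

    def __init__(self, attn):
        super().__init__()
        self.attn = attn
        # mmcv's fp16 hooks flip this to True when fp16 training is enabled
        self.fp16_enabled = False

    # was @auto_fp16(...) originally; force_fp32 casts 'x' back to fp32 so the
    # deformable-attention CUDA kernel never sees Half tensors
    @force_fp32(apply_to=('x',))
    def forward(self, x, **attn_kwargs):
        return self.attn(x, **attn_kwargs)
```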

Not able to reproduce - segmentation result

Hello,

Thank you very much for sharing the great work! I was trying to reproduce the segmentation result using
bash tools/dist_train.sh configs/ade/ddp_swin_s_2x8_512x512_160k_ade20k.py 1

Unfortunately, I cannot reproduce a similar result (not even close): the mIoU after 16k steps is only 21.04, compared to 37.93 in your log. I cannot figure out why. The environment is more or less similar except for CUDA, which I don't expect to cause such a huge difference. The log is attached: 20240424_065202.log

I would appreciate any help! Thank you!

loss nan

When I run the segmentation code on other public datasets, the loss becomes nan after training for more than 20 epochs. What could be the reason?

Thank you for your reply!
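Not an official answer, but in MMSegmentation-style configs two common first things to try for NaN losses on a new dataset are gradient clipping and a lower learning rate. The values below are illustrative guesses, not recommendations from the authors:

```python
# Illustrative MMSegmentation-style config overrides (values are placeholders)
optimizer = dict(type='AdamW', lr=3e-5, weight_decay=0.01)          # lower LR
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))  # clip grads
```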

Error while trying to train depth estimation model

Hello,

I am getting the following error:

nvrtc: error: invalid value for --gpu-architecture (-arch)

While trying to train the depth estimation model using the following command:

bash tools/dist_train.sh configs/custom/custom_config.py 1

I have one GPU in my system (an RTX 4090). Can you help me with this issue? I did some troubleshooting but couldn't figure out what is causing it.
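Not an official fix, but a quick diagnostic that may help: an RTX 4090 is compute capability 8.9, and nvrtc arch errors of this kind typically mean the installed PyTorch/CUDA build does not know that architecture (CUDA builds older than 11.8 do not ship sm_89 support). The following only prints version information using standard PyTorch calls:

```python
import torch

# Check whether the installed PyTorch/CUDA build supports the GPU's architecture.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_device_capability(0))  # expect (8, 9) on an RTX 4090
print(torch.cuda.get_arch_list())           # look for 'sm_89' (or newer) here
```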

Depth training problem

Hello, thanks for your work! When training the depth model, it raises the error 'ModuleNotFoundError: No module named 'depth.models.depther.regulardepth''. I cannot find the 'regulardepth' module in your code. Could you look into this problem? Thanks.

KITTI depth gt problem

KITTI's depth ground truth is a map with sparse valid values. Do you use it directly as the x0 of the diffusion model during training?
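For reference, a common pattern in KITTI depth estimation (not necessarily what DDP does) is to restrict the loss to pixels that actually have a LiDAR measurement; a minimal sketch:

```python
import torch

def masked_depth_loss(pred, gt, min_depth=1e-3):
    """Common depth-estimation pattern (not necessarily DDP's): KITTI ground
    truth is sparse, so only pixels with a valid measurement contribute."""
    valid = gt > min_depth                     # sparse gt: most pixels are zero
    return torch.abs(pred[valid] - gt[valid]).mean()
```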
