
DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception

A conditional diffusion probabilistic model for BEV perception.


arXiv

https://arxiv.org/abs/2303.08333

Abstract

BEV perception is of great importance in the field of autonomous driving, serving as the cornerstone of planning, control, and motion prediction. The quality of the BEV feature highly affects the performance of BEV perception. However, taking the noise in camera parameters and LiDAR scans into consideration, we usually obtain a BEV representation contaminated by harmful noise. Diffusion models naturally have the ability to denoise noisy samples toward the ideal data, which motivates us to utilize a diffusion model to obtain a better BEV representation. In this work, we propose an end-to-end framework, named DiffBEV, to exploit the potential of the diffusion model to generate a more comprehensive BEV representation. To the best of our knowledge, we are the first to apply a diffusion model to BEV perception. In practice, we design three types of conditions to guide the training of the diffusion model, which denoises the coarse samples and refines the semantic feature in a progressive way. What's more, a cross-attention module is leveraged to fuse the context of the BEV feature and the semantic content of the conditional diffusion model. DiffBEV achieves 25.9% mIoU on the nuScenes dataset, which is 6.2% higher than the best-performing existing approach. Quantitative and qualitative results on multiple benchmarks demonstrate the effectiveness of DiffBEV in BEV semantic segmentation and 3D object detection tasks.
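To make the fusion step concrete, below is a minimal NumPy sketch of generic scaled dot-product cross-attention, where flattened BEV tokens act as queries over the denoised semantic tokens. The token counts, feature dimension, and the projection-free formulation are illustrative assumptions only; the paper's actual module presumably learns query/key/value projections and operates on real feature maps.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(bev_feat, diff_feat):
    """Scaled dot-product cross-attention: BEV tokens attend to diffusion tokens."""
    d = bev_feat.shape[-1]
    scores = bev_feat @ diff_feat.T / np.sqrt(d)   # (N_bev, N_diff)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ diff_feat                     # (N_bev, d), same shape as queries

rng = np.random.default_rng(0)
bev = rng.standard_normal((400, 64))   # toy flattened BEV feature tokens
diff = rng.standard_normal((400, 64))  # toy denoised semantic tokens
fused = cross_attention_fuse(bev, diff)
print(fused.shape)  # (400, 64)
```

The fused output keeps the query shape, so it can replace or be added back to the BEV feature downstream.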

Dataset

Download Datasets From Official Websites

Extensive experiments are conducted on the nuScenes, KITTI Raw, KITTI Odometry, and KITTI 3D Object benchmarks.

Prepare Depth Maps

Follow the script to generate depth maps for the KITTI datasets. The depth maps of the KITTI datasets are available on Google Drive and Baidu Net Disk. We also provide a script to generate the depth maps for the nuScenes dataset. Replace the dataset path in the script according to your dataset directory.
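The standard recipe behind such depth-map scripts is to project LiDAR points into the image plane. The sketch below assumes the points have already been transformed into the camera frame; it is a generic illustration, not the repository's actual script.

```python
import numpy as np

def lidar_to_depth_map(points_cam, K, height, width):
    """Project LiDAR points (already in the camera frame) into a sparse depth map.

    points_cam: (N, 3) array, z is depth along the optical axis.
    K: 3x3 camera intrinsic matrix.
    """
    z = points_cam[:, 2]
    pts = points_cam[z > 0.1]                      # drop points behind the camera
    uvw = (K @ pts.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, d = u[inside], v[inside], pts[inside, 2]
    depth = np.zeros((height, width), dtype=np.float32)
    order = np.argsort(-d)                         # write far points first,
    depth[v[order], u[order]] = d[order]           # so near points win collisions
    return depth

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts = np.array([[0.0, 0.0, 5.0], [1.0, 0.5, 10.0]])
dm = lidar_to_depth_map(pts, K, 480, 640)
print(dm[240, 320])  # 5.0 (point on the optical axis, 5 m away)
```

Sorting by descending depth before scatter-writing keeps the nearest return whenever several points land on the same pixel.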

Dataset Processing

After downloading these datasets, we need to generate the annotations in BEV. Follow the instructions below to get the corresponding annotations.

nuScenes

Run the script make_nuscenes_labels to get the BEV annotation for the nuScenes benchmark. Please follow here to generate the BEV annotation (ann_bev_dir) for KITTI datasets.

KITTI Datasets

Follow the instruction to get the BEV annotations for KITTI Raw, KITTI Odometry, and KITTI 3D Object datasets.

The datasets' structure is organized as follows.

data
├── nuscenes
    ├── img_dir
        ├── train
        ├── val
    ├── ann_bev_dir
        ├── train
        ├── val
        ├── train_depth
        ├── val_depth
    ├── calib.json
├── kitti_processed
    ├── kitti_raw
        ├── img_dir
            ├── train
            ├── val
        ├── ann_bev_dir
            ├── train
            ├── val
            ├── train_depth
            ├── val_depth
        ├── calib.json
    ├── kitti_odometry
        ├── img_dir
            ├── train
            ├── val
        ├── ann_bev_dir
            ├── train
            ├── val
            ├── train_depth
            ├── val_depth
        ├── calib.json
    ├── kitti_object
        ├── img_dir
            ├── train
            ├── val
        ├── ann_bev_dir
            ├── train
            ├── val
            ├── train_depth
            ├── val_depth
        ├── calib.json
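A quick way to catch layout mistakes before training is to check each dataset root against the tree above. This helper is not part of the repository; the expected paths simply mirror the structure shown.

```python
import os

# Expected per-dataset layout, mirroring the directory tree above.
EXPECTED = [
    "img_dir/train", "img_dir/val",
    "ann_bev_dir/train", "ann_bev_dir/val",
    "ann_bev_dir/train_depth", "ann_bev_dir/val_depth",
    "calib.json",
]

def missing_entries(dataset_root):
    """Return the expected paths that are absent under dataset_root."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(dataset_root, p))]
```

For example, `missing_entries("data/nuscenes")` should return an empty list once the dataset is fully prepared.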

Prepare Calibration Files

The camera parameters of each dataset are written into the corresponding _calib.json file. For each dataset, we upload the _calib.json file to Google Drive and Baidu Net Disk.

Please change the dataset path according to your actual data directory in the nuScenes, KITTI Raw, KITTI Odometry, and KITTI 3D Object dataset configurations, and modify the path of the pretrained model in the model configurations.
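Reading the calibration file might look like the sketch below. Note that the key layout (a camera-name key mapping to a 3x3 intrinsic matrix) is a guess for illustration; inspect the released _calib.json for the real schema.

```python
import json
import numpy as np

def load_intrinsics(calib_path, cam_name):
    """Load a per-camera 3x3 intrinsic matrix from a calib.json file.

    NOTE: the assumed schema ({camera_name: 3x3 list}) is hypothetical;
    check the downloaded _calib.json for the actual structure.
    """
    with open(calib_path) as f:
        calib = json.load(f)
    K = np.asarray(calib[cam_name], dtype=np.float64)
    assert K.shape == (3, 3), f"unexpected intrinsics shape {K.shape}"
    return K
```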

Installation

DiffBEV is tested on:

  • Python 3.7/3.8
  • CUDA 11.1
  • Torch 1.9.1

Please follow the steps below for installation.

  • Create a conda environment for the project.
conda create -n diffbev python=3.7
conda activate diffbev
  • Install PyTorch following the official instructions.
conda install pytorch torchvision -c pytorch
  • Install mmcv
pip install -U openmim
mim install mmcv-full
  • Git clone this repository
git clone https://github.com/JiayuZou2020/DiffBEV.git
  • Install and compile the required packages.
cd DiffBEV
pip install -v -e .
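After installation, a quick sanity check of the prerequisites can save a failed training run later. This helper is not part of the repository; it just reports which packages import successfully.

```python
import importlib

def check_env(packages=("torch", "torchvision", "mmcv", "mmseg")):
    """Report the installed version of each prerequisite, or None if missing."""
    status = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            status[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            status[name] = None
    return status

print(check_env())
```

Any `None` entry means the corresponding install step above needs to be repeated.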

Visualization

vis

Citation

If you find our work helpful for your research, please consider citing it as follows.

@article{zou2023diffbev,
      title={DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception},
      author={Zou, Jiayu and Zhu, Zheng and Ye, Yun and Wang, Xingang},
      journal={arXiv preprint arXiv:2303.08333},
      year={2023}
}

Acknowledgement

Our work is partially based on the following open-sourced projects: mmsegmentation, VPN, PYVA, PON, LSS. Thanks for their contribution to the research community of BEV perception.

diffbev's People

Contributors

jiayuzou2020


diffbev's Issues

Code Release

Hi,

Do you have a timeline for the code release?

Best,

How to train this model properly?

The README file doesn't include the command to train the model, and I don't know where to find the config file.
Please address this. Kind request, thanks!

Train and test time

How long does it take to train DiffBEV? And approximately how long does it take to infer on a single image?

How to train DiffBEV model?

Thanks for open-sourcing your work!
Since you didn't mention anything about how to train the model, and by looking at the uploaded files I couldn't figure out how to train it, I was wondering whether you have uploaded the files related to training DiffBEV yet. Would you please clarify this?

Missing requirements/build.txt! Please upload it, thanks!

Can you help me please?
@JiayuZou2020

in the DiffBEV/setup.py

setup_requires=parse_requirements('requirements/build.txt'),
tests_require=parse_requirements('requirements/tests.txt'),
install_requires=parse_requirements('requirements/runtime.txt'),
extras_require={
    'all': parse_requirements('requirements.txt'),
    'tests': parse_requirements('requirements/tests.txt'),
    'build': parse_requirements('requirements/build.txt'),
    'optional': parse_requirements('requirements/optional.txt'),
},

setup_requires=parse_requirements('requirements/build.txt'),
'build': parse_requirements('requirements/build.txt'),

But in the folder DiffBEV/requirements, build.txt does not exist!

IndexError: list index out of range during installing

Hello, I tried the commands below
cd mmsegmentation
pip install -v -e .

but it seems there is no mmsegmentation folder, and "pip install -v -e ." gives me "IndexError: list index out of range" at line 56 (info['package'] = line.split('#egg=')[1]).

Could anyone let me know how to fix this, please?

Diffusion process

Hello, may I ask whether you have only open-sourced mmseg without releasing the method from your own paper at all, or have I simply not found your method?

pip install -v -e . fails with an error

pip install -v -e .
Using pip 23.3.1 from /home/hsh/anaconda3/envs/diffbev/lib/python3.8/site-packages/pip (python 3.8)
Obtaining file:///home/hsh/2024/DiffBEV
Running command python setup.py egg_info
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<string>", line 34, in <module>
  File "/home/hsh/2024/DiffBEV/setup.py", line 177, in <module>
    'all': parse_requirements('requirements.txt'),
  File "/home/hsh/2024/DiffBEV/setup.py", line 97, in parse_requirements
    packages = list(gen_packages_items())
  File "/home/hsh/2024/DiffBEV/setup.py", line 85, in gen_packages_items
    for info in parse_require_file(require_fpath):
  File "/home/hsh/2024/DiffBEV/setup.py", line 80, in parse_require_file
    for info in parse_line(line):
  File "/home/hsh/2024/DiffBEV/setup.py", line 54, in parse_line
    info['package'] = line.split('#egg=')[1]
IndexError: list index out of range
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
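The traceback points at the root cause: setup.py indexes line.split('#egg=')[1] even for lines that contain no '#egg=' marker (e.g. a plain comment), so the split yields a single element and indexing [1] raises IndexError. A sketch of one possible guard, not the repository's actual code:

```python
def parse_line_safe(line):
    """Defensive requirement-line parser: a sketch of the fix, not the repo's code.

    The original crashes because it evaluates line.split('#egg=')[1] even when
    the '#egg=' marker is absent (e.g. a pure '# comment' line).
    """
    line = line.strip()
    if not line or line.startswith('#'):
        return None                      # blank line or pure comment
    if '#egg=' in line:
        return line.split('#egg=')[1]    # editable VCS requirement -> package name
    return line.split('#')[0].strip()    # plain requirement, trailing comment dropped

print(parse_line_safe('git+https://github.com/foo/bar#egg=bar'))  # bar
print(parse_line_safe('# just a comment'))                        # None
```

Checking for the marker before splitting lets malformed or comment-only lines in the requirements files pass through harmlessly.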

Numbers reported different for LSS

Hi.
Thank you for your work. I had a query about the results on NuScenes.
The number reported for Lift-Splat-Shoot (LSS) on the Car class is 32.06 by the original authors but 27.3 in your work. Drivable area is reported as 72.04 by the original authors and 55.9 in your work.
What are the possible reasons for this?
Can you also specify the train/val split used for nuScenes? Thanks.

Training process

Hi, sorry to bother you again. I'm confused about how to integrate the computationally demanding reverse diffusion process, i.e., the step-by-step denoising process, into the perception training process. Is there any approximation?

Code for detection

Hi, thanks for your great work and the open-source code!
If possible, I would greatly appreciate it if you could provide the code for 3D detection.
