SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention (ECCV 2022)

This is the official repository for SpatialDETR which will be published at ECCV 2022.

scene_26a6b03c8e2f4e6692f174a7074e54ff.mp4

Authors: Simon Doll, Richard Schulz, Lukas Schneider, Viviane Benzin, Markus Enzweiler, Hendrik P.A. Lensch

Abstract

Based on the key idea of DETR this paper introduces an object-centric 3D object detection framework that operates on a limited number of 3D object queries instead of dense bounding box proposals followed by non-maximum suppression. After image feature extraction a decoder-only transformer architecture is trained on a set-based loss. SpatialDETR infers the classification and bounding box estimates based on attention both spatially within each image and across the different views. To fuse the multi-view information in the attention block we introduce a novel geometric positional encoding that incorporates the view ray geometry to explicitly consider the extrinsic and intrinsic camera setup. This way, the spatially-aware cross-view attention exploits arbitrary receptive fields to integrate cross-sensor data and therefore global context. Extensive experiments on the nuScenes benchmark demonstrate the potential of global attention and result in state-of-the-art performance.

If you find this repository useful, please cite

@inproceedings{Doll2022ECCV,
  author = {Doll, Simon and Schulz, Richard and Schneider, Lukas and Benzin, Viviane and Enzweiler Markus and Lensch, Hendrik P.A.},
  title = {SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention},
  booktitle = {European Conference on Computer Vision(ECCV)},
  year = {2022}
}

You can find the Paper here

Setup

To setup the repository and run trainings we refer to getting_started.md

Changelog

06/22

We moved the codebase to the new coordinate conventions of mmdetection3d rc1.0ff
The performancy might slightly vary compared to the original runs on mmdetection3d 0.17 reported in the paper

Experimental results

The baseline models have been trained on 4xV100 GPUs, the submission models on 8xA100 GPUs. For more details we refer to the corresponding configuration / log files. Keep in mind that the performance can vary between runs and that the current codebase uses [email protected]

Config	Logfile	Set	#GPUs	mmdet3d	mAP	ATE	ASE	AOE	AVE	AAE	NDS
query_proj_value_proj.py (baseline)	log / model	val	4	rc1.0	0.315	0.843	0.279	0.497	0.787	0.208	0.396
query_proj_value_proj.py	log	val	4	0.17	0.313	0.850	0.274	0.494	0.814	0.213	0.392
query_center_proj_no_value_proj_shared.py	log	val	8	0.17	0.351	0.772	0.274	0.395	0.847	0.217	0.425
query_center_proj_no_value_proj_shared_cbgs_vovnet_trainval.py	log	test	8	0.17	0.425	0.614	0.253	0.402	0.857	0.131	0.487

Qualitative results

License

See license_infos.md for details.

Acknowledgement

This repo contains the implementations of SpatialDETR. Our implementation is a plugin to MMDetection3D and also uses a fork of DETR3D. Full credits belong to the contributors of those frameworks and we truly thank them for enabling our research!

	OpenMMLab 1.0 branch	OpenMMLab 2.0 branch
MMEngine		0.x
MMCV	1.x	2.x
MMDetection	0.x 、1.x、2.x	3.x
MMAction2	0.x	1.x
MMClassification	0.x	1.x
MMSegmentation	0.x	1.x
MMDetection3D	0.x	1.x
MMEditing	0.x	1.x
MMPose	0.x	1.x
MMDeploy	0.x	1.x
MMTracking	0.x	1.x
MMOCR	0.x	1.x
MMRazor	0.x	1.x
MMSelfSup	0.x	1.x
MMRotate	0.x	1.x
MMYOLO		0.x

cgtuebingen / spatialdetr Goto Github PK

spatialdetr's Introduction

SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention (ECCV 2022)

Abstract

Setup

Changelog

06/22

Experimental results

Qualitative results

License

Acknowledgement

spatialdetr's People

Contributors

Stargazers

Watchers

Forkers

spatialdetr's Issues

Welcome update to OpenMMLab 2.0

Recommend Projects

Recommend Topics

Recommend Org