ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection

We propose a framework for efficient active learning within various 3D object detection techniques and modalities, demonstrating the effectiveness of active learning in reaching comparable detection performance on benchmark datasets at a fraction of the annotation cost. Datasets include roadside infrastructure sensors (top row) and onboard vehicle sensors (bottom row), with LiDAR-only and LiDAR+camera fusion methods, the two dominant strategies in state-of-the-art performance at the safety-critical detection task.

This is the official implementation of our paper:

[IV 2024] ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection [arXiv] [website]

Overview

ActiveAnno3D is the first active learning framework for multi-modal 3D object detection. With this framework you can select data samples for labeling that are of maximum informativeness for training.

In summary:

We explore various continuous training methods and integrate the most efficient method regarding computational demand and detection performance.
We perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets.
We show that we can achieve almost the same performance with PV-RCNN and the entropy-based query strategy when using only half of the training data (77.25 mAP compared to 83.50 mAP) of the TUM Traffic Intersection dataset.
BEVFusion achieved an mAP of 64.31 when using half of the training data and 75.0 mAP when using the complete nuScenes dataset.
We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling and minimize the labeling costs.

Architecture

The generalized active learning flow involves the selection of data from an unlabeled pool according to an acquisition function, which, in the case of uncertainty-driven AL, utilizes the trained model or, in the case of diversity-driven AL, may be independent of the training. This selected data is then annotated by an oracle and aggregated with previously labeled data. Whether or not all data or just the new data is used in the next training step is determined by the choice of training strategy. The variety of possible acquisition and training techniques and unique domain challenges posed by autonomous driving make active learning an opportune environment for innovation toward safe and accurate learning.
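The flow described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the framework's implementation: the names `active_learning_loop`, `acquire_fn`, `oracle_fn`, `train_fn`, and `retrain_on_all` are hypothetical, and `entropy_score` assumes that a frame's uncertainty is the mean Shannon entropy of its per-box class probabilities, which may differ from how the entropy query is computed in this repository.

```python
import numpy as np


def entropy_score(class_probs):
    """Mean Shannon entropy over the predicted boxes of one frame.

    class_probs: array-like of shape (num_boxes, num_classes), rows sum to 1.
    Assumption: frame-level uncertainty is the mean of the per-box entropies.
    """
    probs = np.asarray(class_probs, dtype=float)
    if probs.size == 0:
        return 0.0  # no predicted boxes, nothing to be uncertain about
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    per_box = -np.sum(probs * np.log(probs), axis=1)
    return float(per_box.mean())


def active_learning_loop(pool_ids, acquire_fn, oracle_fn, train_fn,
                         rounds, budget, retrain_on_all=True):
    """Generic pool-based active learning loop (hypothetical interface).

    acquire_fn(model, frame_id) -> float  higher means more informative
    oracle_fn(frame_id)         -> label  stands in for the human annotator
    train_fn(model, labeled)    -> model  one training step on the given data
    retrain_on_all: train on all labeled data (True) or only the new batch.
    """
    labeled = {}
    unlabeled = list(pool_ids)
    model = None
    for _ in range(rounds):
        if not unlabeled:
            break
        # 1. Score every unlabeled frame with the acquisition function.
        scores = {fid: acquire_fn(model, fid) for fid in unlabeled}
        # 2. Select the `budget` highest-scoring frames.
        selected = sorted(unlabeled, key=lambda fid: scores[fid], reverse=True)[:budget]
        # 3. Let the oracle annotate them and aggregate with earlier labels.
        new_labels = {fid: oracle_fn(fid) for fid in selected}
        labeled.update(new_labels)
        unlabeled = [fid for fid in unlabeled if fid not in new_labels]
        # 4. The training strategy decides which data the next step sees.
        train_data = labeled if retrain_on_all else new_labels
        model = train_fn(model, train_data)
    return model, labeled
```

An uncertainty-driven acquisition function would call `entropy_score` on the current model's predictions for a frame, while a diversity-driven one could ignore the `model` argument entirely.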

Installation

For installation using Docker, please refer to the INSTALL.md file.

The code is tested in the following Python environment:

Python 3.7.16
PyTorch 1.10.1
CUDA 11.3.1
spconv-cu113=2.21.12
Open3D 0.16.0
Wandb 0.16.2
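A quick way to compare an existing environment against the versions listed above is sketched below. It is only an illustration: the helper names are hypothetical, the import names `torch`, `open3d`, and `wandb` are assumed to correspond to the listed packages, and spconv and CUDA are left out because their checks depend on how they were installed.

```python
import sys

# Versions from the list above, keyed by import name (assumed mapping).
EXPECTED = {"torch": "1.10.1", "open3d": "0.16.0", "wandb": "0.16.2"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    return None if version is None else str(version)


def check_versions(expected, get_version=installed_version):
    """Return {name: (expected, found, ok)} for every expected package."""
    report = {}
    for name, want in expected.items():
        found = get_version(name)
        # Drop local build suffixes such as "+cu113" before comparing.
        ok = found is not None and found.split("+")[0] == want
        report[name] = (want, found, ok)
    return report


def main():
    """Print the Python version and one status line per expected package."""
    print("python", sys.version.split()[0])
    for name, (want, found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print("%s: expected %s, found %s -> %s" % (name, want, found, status))
```

Calling `main()` inside the container prints one line per package, which makes a version mismatch easy to spot before starting a long training run.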

Our Contributions and Modifications

Incorporate Continuous Training Strategies [link]
Propose Temporal CRB [link]
Propose Class-weighted CRB [link]
Post process Predictions [link]
Develop an interface for the proAnno labeling tool [link]

Tutorial

The dataset, e.g. TUM Traffic Intersection, has to be in the KITTI format. If it is in OpenLABEL format, use our converter:

python tumtraf_converter.py --load_dir /home/user/tumtraf --save_dir /home/user/tumtraf_kitti_format --splits train val test
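To sanity-check converted labels, a line of a KITTI-style label file can be parsed as sketched below. This assumes the standard 15-field KITTI object label layout (type, truncated, occluded, alpha, 2D box, dimensions, location, rotation_y); the exact fields written by `tumtraf_converter.py` may differ, and the names `KittiObject` and `parse_kitti_label_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KittiObject:
    obj_type: str                              # e.g. "Car", "Pedestrian"
    truncated: float                           # 0.0 (fully visible) to 1.0
    occluded: int                              # occlusion state code
    alpha: float                               # observation angle in radians
    bbox: Tuple[float, float, float, float]    # left, top, right, bottom (pixels)
    dimensions: Tuple[float, float, float]     # height, width, length (metres)
    location: Tuple[float, float, float]       # x, y, z in camera coordinates (metres)
    rotation_y: float                          # yaw around the camera Y axis (radians)


def parse_kitti_label_line(line):
    """Parse one whitespace-separated KITTI object label line."""
    parts = line.split()
    if len(parts) < 15:
        raise ValueError("expected at least 15 fields, got %d" % len(parts))
    values = [float(x) for x in parts[1:15]]
    return KittiObject(
        obj_type=parts[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        dimensions=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_y=values[13],
    )
```

Running the parser over every file in the converted label folder is a cheap way to catch malformed lines before training starts.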

To run normal training:

In ./tools/cfgs, add a folder <dataset_models>, then add the .yaml configuration files for your models there.

python ./tools/main/train.py --cfg_file <path-to-yaml-file> --extra_tag normalTraining

If you want to use a pre-trained model, set --pretrained_model to <path-to-your-pretrained-model>.

To run active training:

In ./tools/cfgs, add a folder <active-dataset_models>, then add the .yaml configuration files for your models there.

python ./tools/main/train.py --cfg_file <path-to-yaml-file> --extra_tag <continuous_training_method>
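When several runs are launched from a script, the training command can be assembled programmatically. The sketch below only uses the flags documented above (`--cfg_file`, `--extra_tag`, `--pretrained_model`); the helper `build_train_command` is hypothetical and not part of the repository.

```python
import sys


def build_train_command(cfg_file, extra_tag, pretrained_model=None):
    """Assemble the argument list for ./tools/main/train.py.

    cfg_file: path to the .yaml model configuration
    extra_tag: run tag, e.g. "normalTraining" or a continuous training method
    pretrained_model: optional path to a pre-trained checkpoint
    """
    cmd = [
        sys.executable, "./tools/main/train.py",
        "--cfg_file", cfg_file,
        "--extra_tag", extra_tag,
    ]
    if pretrained_model is not None:
        cmd += ["--pretrained_model", pretrained_model]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failed run raises an error instead of being silently skipped.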

Connection to proAnno

The flask_app.py file should be placed outside the Docker container; any location works.
To access this Flask server from the proAnno tool running on a different machine, use the IP address of the machine where the Flask app is running, followed by the port number.

For example, if the IP address of the workstation is 192.168.1.5 and the Flask server is running on port 5000, then the proAnno tool on another machine accesses the Flask app at 'http://192.168.1.5:5000'.
Run flask_app.py
ActiveAnno3D is ready for the annotator to send a command.
The selected point cloud frames will be saved in ActiveAnno3D under: ./data/proannoV2/currently_annotating, and they have to be copied manually to the proAnno side for annotations.
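The manual copy step can be scripted. The sketch below copies every file from the folder named above into a destination folder of your choice; the helper `copy_selected_frames` and the destination path in the usage note are hypothetical, and the transfer to another machine (for example over a mounted network share) is assumed to be handled separately.

```python
import shutil
from pathlib import Path


def copy_selected_frames(src_dir, dst_dir):
    """Copy all files in src_dir to dst_dir and return the copied file names."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    if not src.is_dir():
        raise FileNotFoundError("source folder not found: %s" % src)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():  # skip sub-folders
            shutil.copy2(str(path), str(dst / path.name))
            copied.append(path.name)
    return copied
```

For example, `copy_selected_frames("./data/proannoV2/currently_annotating", "/path/to/proanno/input")` would mirror the current selection into a placeholder proAnno input folder.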

Evaluation

The graph illustrates the mAP score achieved by the BEVFusion model on the nuScenes dataset relative to the expanding size of the training set in the active learning setting with random and entropy queries separately.

The graph illustrates the mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset relative to the expanding size of the training set in the active learning setting with random and entropy queries separately.

The graph illustrates the mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset relative to the expanding size of the training set in the active learning setting with eight different query strategies.

Qualitative Results

Qualitative results are illustrated by two pairs of images. The left pair is from the TUM Traffic Intersection dataset, and the right pair is from nuScenes. For each pair, the left image shows the predicted labels for each class, with each class represented by a different color. The right image of each pair shows the predictions made by learning on the complete dataset. Both results are quite similar, showing the efficiency of the active learning technique.

Benchmark Results

Labeled Pool		LiDAR-Only (PV-RCNN)		LiDAR+Camera (BEVFusion)
Round	%	Random	Entropy	Random	Entropy
1	10	51.03	54.32 (+3.29)	30.95	31.06 (+0.11)
2	15	61.98	62.24 (+0.26)	34.19	36.39 (+2.20)
3	20	69.84	68.23 (-1.61)	38.00	40.41 (+2.41)
4	25	74.82	72.40 (-2.42)	42.36	42.17 (-0.19)
5	30	77.25	76.56 (-0.69)	44.94	45.57 (+0.63)
6	35	75.40	75.00 (-0.40)	44.74	46.76 (+2.02)
7	40	77.03	75.48 (-1.55)	-	49.24
8	50	79.09	77.25 (-1.84)	-	64.31
SOA (No AL)	100	83.50		75.00

Evaluation of the PV-RCNN (LiDAR-only) and BEVFusion (camera+LiDAR) models using the random sampling baseline and the entropy querying method on the TUM Traffic Intersection dataset and the nuScenes dataset. These results are compared to the respective mAP obtained with 100% of the training data in the original work.
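To put the table in perspective, the share of full-data performance retained at 50% of the labels can be computed directly from the reported numbers. The snippet below is a small worked calculation using only values from the table; the function and variable names are illustrative.

```python
# mAP values taken from the table above (entropy query, 50% labeled pool vs. 100%).
FULL_DATA_MAP = {"PV-RCNN": 83.50, "BEVFusion": 75.00}
HALF_DATA_ENTROPY_MAP = {"PV-RCNN": 77.25, "BEVFusion": 64.31}


def retained_fraction(partial_map, full_map):
    """Fraction of the full-data mAP reached with the reduced labeled pool."""
    return partial_map / full_map


def summarize():
    """Return {model: (absolute gap in mAP, retained share in percent)}."""
    summary = {}
    for model, full in FULL_DATA_MAP.items():
        half = HALF_DATA_ENTROPY_MAP[model]
        gap = round(full - half, 2)
        share = round(100.0 * retained_fraction(half, full), 1)
        summary[model] = (gap, share)
    return summary
```

With these numbers, PV-RCNN retains about 92.5% of its full-data mAP (a gap of 6.25 mAP) and BEVFusion about 85.7% (a gap of 10.69 mAP) when trained on half of the respective dataset.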

Acknowledgements

Our baseline is Active3D, an active learning framework for 3D object detection that proposes the CRB query strategy. CRB assesses the informativeness of point cloud data based on three data characteristics: 3D box class distribution, feature representativeness, and point density distribution.

Citation

If you find our work useful in your research, please cite our work and ⭐ our repository.

@misc{activeanno3d,
      title={ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection}, 
      author={Ghita, Ahmed and Antoniussen, Bjørk and Zimmer, Walter and Greer, Ross and Creß, Christian and Møgelmose, Andreas and Trivedi, Mohan M. and Knoll, Alois C.},
      year={2024},
      eprint={2402.03235},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

The ActiveAnno3D framework is licensed under CC BY-NC-SA 4.0.
