ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection

Figure: We propose a framework for efficient active learning within various 3D object detection techniques and modalities, demonstrating the effectiveness of active learning at reaching comparable detection performance on benchmark datasets at a fraction of the annotation cost. Datasets include roadside infrastructure sensors (top row) and onboard vehicle sensors (bottom row), with LiDAR-only and LiDAR+camera fusion methods, the two dominant strategies in state-of-the-art performance at the safety-critical detection task.

This is the official implementation of our paper:

[IV 2024] ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection [arXiv] [website]

Overview

ActiveAnno3D is the first active learning framework for multi-modal 3D object detection. With this framework you can select the data samples for labeling that are most informative for training.

In summary:

  1. We explore various continuous training methods and integrate the most efficient method regarding computational demand and detection performance.
  2. We perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets.
  3. We show that we can achieve almost the same performance with PV-RCNN and the entropy-based query strategy when using only half of the training data (77.25 mAP compared to 83.50 mAP) of the TUM Traffic Intersection dataset.
  4. BEVFusion achieved an mAP of 64.31 when using half of the training data and 75.0 mAP when using the complete nuScenes dataset.
  5. We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling and minimize the labeling costs.
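The entropy-based query strategy mentioned above can be illustrated with a minimal sketch. It assumes each frame comes with a list of per-box class-probability vectors and that a frame is scored by the mean entropy over its boxes; the function names and the mean aggregation are illustrative assumptions, not the repository's actual implementation.

```python
import math
from typing import Dict, List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def frame_uncertainty(box_probs: Sequence[Sequence[float]]) -> float:
    """Mean entropy over all predicted boxes of a frame (assumed aggregation)."""
    if not box_probs:
        return 0.0
    return sum(predictive_entropy(p) for p in box_probs) / len(box_probs)


def select_by_entropy(
    pool: Dict[str, Sequence[Sequence[float]]], budget: int
) -> List[str]:
    """Return the ids of the `budget` most uncertain frames in the pool."""
    ranked = sorted(pool, key=lambda fid: frame_uncertainty(pool[fid]), reverse=True)
    return ranked[:budget]
```

Frames whose predictions are close to uniform over the classes are ranked first, while frames with confident predictions are left in the unlabeled pool.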

Architecture

Figure: The generalized active learning flow involves the selection of data from an unlabeled pool according to an acquisition function, which, in the case of uncertainty-driven AL, utilizes the trained model or, in the case of diversity-driven AL, may be independent of the training. This selected data is then annotated by an oracle and aggregated with previously labeled data. Whether all data or only the new data is used in the next training step is determined by the choice of training strategy. The variety of possible acquisition and training techniques and the unique domain challenges posed by autonomous driving make active learning an opportune environment for innovation toward safe and accurate learning.
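The flow described in the caption can be sketched as a short loop. This is a generic illustration under stated assumptions: `acquire`, `annotate`, and `train` are hypothetical callables supplied by the caller, and the `train_on_all` flag stands in for the choice of training strategy. It is not the framework's own API.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def run_active_learning(
    unlabeled: Sequence[str],
    labeled: Dict[str, object],
    acquire: Callable[[object, Sequence[str], int], List[str]],
    annotate: Callable[[str], object],
    train: Callable[[object, Dict[str, object]], object],
    model: object,
    rounds: int,
    budget: int,
    train_on_all: bool = True,
) -> Tuple[object, Dict[str, object]]:
    """Run `rounds` rounds of select -> annotate -> aggregate -> train."""
    unlabeled = list(unlabeled)
    labeled = dict(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # 1. Acquisition function picks frames from the unlabeled pool.
        selected = acquire(model, unlabeled, budget)
        # 2. The oracle (human annotator) labels the selected frames.
        new_labels = {fid: annotate(fid) for fid in selected}
        # 3. New labels are aggregated with the previously labeled data.
        labeled.update(new_labels)
        chosen = set(selected)
        unlabeled = [fid for fid in unlabeled if fid not in chosen]
        # 4. Training strategy: all labeled data, or only the new data.
        training_set = labeled if train_on_all else new_labels
        model = train(model, training_set)
    return model, labeled
```

An uncertainty-driven acquisition function would use the `model` argument, whereas a diversity-driven one could ignore it and look only at the pool.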

Installation

For installation using Docker, please refer to the INSTALL.md file.

The code is tested in the following python environment:

Our Contributions and Modifications

  1. Incorporate Continuous Training Strategies [link]
  2. Propose Temporal CRB [link]
  3. Propose Class-weighted CRB [link]
  4. Post process Predictions [link]
  5. Develop an interface for the proAnno labeling tool [link]

Tutorial

The dataset, e.g. TUM Traffic Intersection, has to be in the KITTI format. If it is in OpenLABEL format, use our converter:

python tumtraf_converter.py --load_dir /home/user/tumtraf --save_dir /home/user/tumtraf_kitti_format --splits train val test

To run normal training:

In ./tools/cfgs, add a folder <dataset_models>, then add the .yaml configuration files for your models there.

python ./tools/main/train.py --cfg_file <path-to-yaml-file> --extra_tag normalTraining

If you want to use a pre-trained model, set --pretrained_model to <path-to-your-pretrained-model>.

To run active training:

In ./tools/cfgs, add a folder <active-dataset_models>, then add the .yaml configuration files for your models there.

python ./tools/main/train.py --cfg_file <path-to-yaml-file> --extra_tag <continuous_training_method>
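When several runs are launched from a script, the training commands above can be assembled programmatically. The following is a small convenience sketch; the helper names are hypothetical, and only the script path and the flags `--cfg_file`, `--extra_tag`, and `--pretrained_model` are taken from the commands in this tutorial.

```python
import subprocess
from typing import List, Optional


def build_train_command(
    cfg_file: str, extra_tag: str, pretrained_model: Optional[str] = None
) -> List[str]:
    """Assemble the argument list for ./tools/main/train.py."""
    cmd = [
        "python", "./tools/main/train.py",
        "--cfg_file", cfg_file,
        "--extra_tag", extra_tag,
    ]
    if pretrained_model is not None:
        # Optional: start from an existing checkpoint.
        cmd += ["--pretrained_model", pretrained_model]
    return cmd


def run_training(
    cfg_file: str, extra_tag: str, pretrained_model: Optional[str] = None
) -> None:
    """Launch one training run and raise if the process exits with an error."""
    subprocess.run(build_train_command(cfg_file, extra_tag, pretrained_model), check=True)
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.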

Connection to proAnno

  1. The flask_app.py file should be located anywhere outside the docker container.

  2. To access this flask server from the proAnno tool running on a different machine, we should use the IP address of the machine where the flask app is running, followed by the port number.

    For example, if the IP address of the workstation is 192.168.1.5 and the flask server is running on port 5000, then the proAnno tool on another machine accesses the flask app at 'http://192.168.1.5:5000'.

  3. Run flask_app.py

  4. ActiveAnno3D is ready for the annotator to send a command.

  5. The selected point cloud frames will be saved in ActiveAnno3D under: ./data/proannoV2/currently_annotating, and they have to be copied manually to the proAnno side for annotations.
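The manual copy in step 5 can be scripted. The sketch below assumes the proAnno input directory is reachable from the ActiveAnno3D machine (for example through a mounted share); the destination path and the helper name are hypothetical, and only the source directory comes from the step above.

```python
import shutil
from pathlib import Path
from typing import List


def copy_selected_frames(src_dir: str, dst_dir: str) -> List[str]:
    """Copy every file in src_dir to dst_dir and return the copied file names."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target if it is missing
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst / item.name)  # copy2 keeps file timestamps
            copied.append(item.name)
    return copied
```

A call such as `copy_selected_frames("./data/proannoV2/currently_annotating", "<path-to-proanno-input>")` would then transfer the frames selected in the current round.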

Evaluation

Figure: The graph illustrates the mAP score achieved by the BEVFusion model on the nuScenes dataset relative to the expanding size of the training set in the active learning setting with random and entropy queries separately.

Figure: The graph illustrates the mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset relative to the expanding size of the training set in the active learning setting with random and entropy queries separately.

Figure: The graph illustrates the mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset relative to the expanding size of the training set in the active learning setting with eight different query strategies.

Qualitative Results

Figure: Qualitative results are illustrated by two pairs of images. The left pair is from the TUM Traffic Intersection dataset, and the right pair is from nuScenes. For each pair, the left image shows the predicted labels for each class, with each class represented by a different color. The right image of each pair shows the predictions made by learning on the complete dataset. Both results are quite similar, showing the efficiency of the active learning technique.

Benchmark Results

Labeled Pool           LiDAR-Only (PV-RCNN)        LiDAR+Camera (BEVFusion)
Round          %       Random    Entropy           Random    Entropy
1              10      51.03     54.32 (+3.29)     30.95     31.06 (+0.11)
2              15      61.98     62.24 (+0.26)     34.19     36.39 (+2.20)
3              20      69.84     68.23 (-1.61)     38.00     40.41 (+2.41)
4              25      74.82     72.40 (-2.42)     42.36     42.17 (-0.19)
5              30      77.25     76.56 (-0.69)     44.94     45.57 (+0.63)
6              35      75.40     75.00 (-0.40)     44.74     46.76 (+2.02)
7              40      77.03     75.48 (-1.55)     -         49.24
8              50      79.09     77.25 (-1.84)     -         64.31
SOA (No AL)    100     83.50                       75.00
Evaluation of the PV-RCNN (LiDAR-only) and BEVFusion (camera+LiDAR) models using the random sampling baseline and the entropy querying method on the TUM Traffic Intersection dataset and the nuScenes dataset. Values in parentheses give the difference between entropy and random sampling. These results are compared to the respective results of the original work obtained with 100% of the training data.
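The differences between entropy and random sampling, and the share of full-data performance reached with half of the data, can be re-derived from the table values. The snippet below is a small arithmetic check; the variable and function names are illustrative.

```python
# mAP values copied from the benchmark table (rounds 1 to 8).
pv_random = [51.03, 61.98, 69.84, 74.82, 77.25, 75.40, 77.03, 79.09]
pv_entropy = [54.32, 62.24, 68.23, 72.40, 76.56, 75.00, 75.48, 77.25]


def entropy_gain(random_map, entropy_map):
    """Per-round difference (entropy minus random), rounded to two decimals."""
    return [round(e - r, 2) for r, e in zip(random_map, entropy_map)]


def relative_performance(subset_map: float, full_map: float) -> float:
    """Percentage of the full-data mAP reached with a subset of the data."""
    return 100.0 * subset_map / full_map


pv_gain = entropy_gain(pv_random, pv_entropy)
# PV-RCNN, entropy query, 50% of the data vs. 100% (83.50 mAP).
pv_share = relative_performance(77.25, 83.50)
# BEVFusion, entropy query, 50% of the data vs. 100% (75.00 mAP).
bev_share = relative_performance(64.31, 75.00)

print(pv_gain)
print(f"PV-RCNN: {pv_share:.1f}%  BEVFusion: {bev_share:.1f}%")
```

With these numbers, PV-RCNN reaches roughly 92.5% of its full-data mAP with half of the data, and BEVFusion roughly 85.7%.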

Acknowledgements

Our baseline is Active3D, an active learning framework for 3D object detection that proposes the CRB query strategy. CRB assesses the informativeness of point cloud data based on three data characteristics: 3D box class distribution, feature representativeness, and point density distribution.

Citation

If you find our work useful in your research, please cite our work and ⭐ our repository.

@misc{activeanno3d,
      title={ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection}, 
      author={Ghita, Ahmed and Antoniussen, Bjørk and Zimmer, Walter and Greer, Ross and Creß, Christian and Møgelmose, Andreas and Trivedi, Mohan M. and Knoll, Alois C.},
      year={2024},
      eprint={2402.03235},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

The ActiveAnno3D framework is licensed under CC BY-NC-SA 4.0.
