Bounding Box Regression with Uncertainty for Accurate Object Detection

CVPR 2019

Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang, Carnegie Mellon University & Megvii Inc.

Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clearly as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. Our loss greatly improves the localization accuracy of various architectures with nearly no additional computation. The learned localization variance allows us to merge neighboring bounding boxes during non-maximum suppression (NMS), which further improves the localization performance. On MS-COCO, we boost the Average Precision (AP) of VGG-16 Faster R-CNN from 23.6% to 29.1%. More importantly, for ResNet-50-FPN Mask R-CNN, our method improves the AP and AP90 by 1.8% and 6.2% respectively, which significantly outperforms previous state-of-the-art bounding box refinement methods.
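As a rough illustration of learning a box offset and its variance together, here is a minimal per-coordinate sketch of the loss as formulated in the paper. It is not the code in this repository: the function name `kl_loss` and its argument names are hypothetical, and it assumes the network predicts `alpha = log(sigma^2)` for each coordinate.

```python
import math


def kl_loss(x_e, x_g, alpha):
    """Sketch of the KL regression loss for a single box coordinate.

    x_e:   estimated coordinate offset
    x_g:   ground-truth coordinate offset
    alpha: predicted log-variance, assumed to be log(sigma^2)
    (All names here are illustrative, not taken from the repository.)
    """
    diff = abs(x_g - x_e)
    if diff <= 1.0:
        # Quadratic region: exp(-alpha) / 2 * (x_g - x_e)^2 + alpha / 2
        return 0.5 * math.exp(-alpha) * diff ** 2 + 0.5 * alpha
    # Linear region for large errors, in the style of smooth L1
    return math.exp(-alpha) * (diff - 0.5) + 0.5 * alpha


if __name__ == "__main__":
    # Same localization error, two different predicted variances.
    print(kl_loss(0.0, 2.0, 0.0))
    print(kl_loss(0.0, 2.0, math.log(4.0)))
```

A larger predicted variance scales down the error term and adds an `alpha / 2` penalty, so the network cannot reduce the loss simply by predicting a high variance everywhere.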

Citation

If you find the code useful in your research, please consider citing:

@inproceedings{klloss,
  title={Bounding Box Regression with Uncertainty for Accurate Object Detection},
  author={He, Yihui and Zhu, Chenchen and Wang, Jianren and Savvides, Marios and Zhang, Xiangyu},
  booktitle={2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  organization={IEEE}
}

Installation

Please find installation instructions for Caffe2 and Detectron in INSTALL.md.

When installing cocoapi, please use my fork to get AP80 and AP90 scores.

Testing

Inference without Var Voting (8 GPUs):

python2 tools/test_net.py \
    --cfg configs/e2e_faster_rcnn_R-50-FPN_2x.yaml

You will get:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.385
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.578
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.412
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.209
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.412
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.515
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.499
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.522
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.321
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.553
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.680
 Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 0.533
 Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 0.461
 Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 0.350
 Average Precision  (AP) @[ IoU=0.85      | area=   all | maxDets=100 ] = 0.269
 Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 0.154
 Average Precision  (AP) @[ IoU=0.95      | area=   all | maxDets=100 ] = 0.032

Inference with Var Voting:

python2 tools/test_net.py \
    --cfg configs/e2e_faster_rcnn_R-50-FPN_2x.yaml \
    STD_NMS True

You will get:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.392
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.576
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.425
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.212
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.417
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.324
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.528
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.564
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.594
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.736
 Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 0.536
 Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 0.472
 Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 0.363
 Average Precision  (AP) @[ IoU=0.85      | area=   all | maxDets=100 ] = 0.281
 Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 0.165
 Average Precision  (AP) @[ IoU=0.95      | area=   all | maxDets=100 ] = 0.037
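For intuition about what Var Voting does during NMS, the following is a minimal sketch of the voting step as described in the paper. It is not the repository's implementation: the names `iou` and `var_vote` are hypothetical, the default `sigma_t` is an arbitrary illustrative value, and the sketch assumes each box comes with one predicted variance per coordinate.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def var_vote(selected, boxes, variances, sigma_t=0.05):
    """Refine `selected` from overlapping boxes (illustrative sketch).

    boxes:     candidate boxes, expected to include `selected` itself
    variances: per-coordinate predicted variances, one 4-tuple per box
    sigma_t:   voting temperature (hypothetical default)
    """
    num = [0.0] * 4
    den = [0.0] * 4
    for box, var in zip(boxes, variances):
        overlap = iou(selected, box)
        if overlap <= 0.0:
            continue  # boxes that do not overlap get no vote
        # Closer boxes (higher IoU) receive a larger weight.
        p = math.exp(-((1.0 - overlap) ** 2) / sigma_t)
        for k in range(4):
            w = p / var[k]  # lower variance means a stronger vote
            num[k] += w * box[k]
            den[k] += w
    # Fall back to the original coordinate if nothing voted.
    return tuple(n / d if d > 0 else s for n, d, s in zip(num, den, selected))


if __name__ == "__main__":
    sel = (0.0, 0.0, 10.0, 10.0)
    cands = [sel, (1.0, 0.0, 11.0, 10.0), (100.0, 100.0, 110.0, 110.0)]
    vars_ = [(1.0,) * 4, (1.0,) * 4, (1.0,) * 4]
    print(var_vote(sel, cands, vars_))
```

Each refined coordinate is a weighted average over overlapping neighbors, where a neighbor counts more when it overlaps the selected box strongly and when its predicted variance is low.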

Training

python2 tools/train_net.py \
    --cfg configs/e2e_faster_rcnn_R-50-FPN_2x.yaml

FAQ

Please create a new issue.


Detectron

Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including Mask R-CNN. It is written in Python and powered by the Caffe2 deep learning framework.

At FAIR, Detectron has enabled numerous research projects, including: Feature Pyramid Networks for Object Detection, Mask R-CNN, Detecting and Recognizing Human-Object Interactions, Focal Loss for Dense Object Detection, Non-local Neural Networks, Learning to Segment Every Thing, Data Distillation: Towards Omni-Supervised Learning, DensePose: Dense Human Pose Estimation In The Wild, and Group Normalization.

Example Mask R-CNN output.

Introduction

The goal of Detectron is to provide a high-quality, high-performance codebase for object detection research. It is designed to be flexible in order to support rapid implementation and evaluation of novel research. Detectron includes implementations of the following object detection algorithms:

using the following backbone network architectures:

Additional backbone architectures may be easily implemented. For more details about these models, please see References below.

License

Detectron is released under the Apache 2.0 license. See the NOTICE file for additional details.

Citing Detectron

If you use Detectron in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{Detectron2018,
  author =       {Ross Girshick and Ilija Radosavovic and Georgia Gkioxari and
                  Piotr Doll\'{a}r and Kaiming He},
  title =        {Detectron},
  howpublished = {\url{https://github.com/facebookresearch/detectron}},
  year =         {2018}
}

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Detectron Model Zoo.

Installation

Please find installation instructions for Caffe2 and Detectron in INSTALL.md.

Quick Start: Using Detectron

After installation, please see GETTING_STARTED.md for brief tutorials covering inference and training with Detectron.

Getting Help

To start, please check the troubleshooting section of our installation instructions as well as our FAQ. If you couldn't find help there, try searching our GitHub issues. We intend the issues page to be a forum in which the community collectively troubleshoots problems.

If bugs are found, we appreciate pull requests (including adding Q&A's to FAQ.md and improving our installation instructions and troubleshooting documents). Please see CONTRIBUTING.md for more information about contributing to Detectron.

References
