
tf-faster-rcnn

A Tensorflow implementation of the faster RCNN detection framework by Xinlei Chen ([email protected]). This repository is based on the Python Caffe implementation of faster RCNN available here.

Note: Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling. If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi-official code. For details about the faster RCNN architecture please refer to the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

Detection Performance

So far we have only tested it on the plain VGG16 architecture. Our best performance as of February 2017 (single model on conv5_3, no multi-scale, no multi-stage bounding box regression, no skip connections, no extra input):

  • Train on VOC 2007 trainval and test on VOC 2007 test: 71.2 (mAP).
  • Train on COCO 2014 trainval-minival and test on minival (longer schedule): 29.3 (AP).

Note that:

  • The above numbers are obtained with a different testing scheme that does not filter region proposals with non-maximal suppression (TEST.MODE top); the default and original testing scheme (TEST.MODE nms) results in slightly worse performance (see the report; for COCO it drops 0.3 - 0.4 AP). A configuration sketch follows this list.
  • Since we keep the small proposals (< 16 pixels width/height), our performance is especially good for small objects.
  • For COCO, we find that performance improves with more iterations (350k/490k: 26.9, 600k/790k: 28.3, 900k/1190k: 29.3), and potentially better performance can be achieved with even more iterations. Check out here for the latest models.
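
As an illustration, the two testing schemes can be toggled through the easydict-based configuration this codebase uses; this is a minimal sketch, and the import path model.config is an assumption about the layout, not guaranteed:

# hedged sketch: switching the test-time proposal scheme named above
from model.config import cfg  # assumed import path

cfg.TEST.MODE = 'top'    # scheme used for the numbers above
# cfg.TEST.MODE = 'nms'  # default/original scheme; slightly lower AP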

COCO 2014 minival (900k/1190k):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.293
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.498
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.305
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.124
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.336
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.268
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.393
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.452
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587

COCO 2015 test-dev (900k/1190k):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.296
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.501
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.128
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.324
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.423
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.399
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.408
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.450
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.592

COCO 2015 test-std (900k/1190k):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.294
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.500
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.309
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.118
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.323
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.421
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.398
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.408
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.178
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.452
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.591

Additional Features

Several features not covered in the report have been added to make research life easier:

  • Support for train-and-validation. During training, the validation data is also tested from time to time to monitor the process and check for potential overfitting. Ideally training and validation should be separate, with the model loaded every time it is tested on validation data. However, I have implemented it jointly to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempt is made to overfit on the testing set.
  • Support for resuming training. I try to store as much information as possible when snapshotting, so that training can be resumed properly from the latest snapshot. The meta information includes the current image index, the permutation of images, and the random state of numpy (see the sketch after this list). However, when you resume training the random seed for tensorflow is reset (I am not sure how to save the random state of tensorflow), so results will differ slightly. Note that the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestions/solutions are welcome and much appreciated.
  • Support for visualization. The current implementation summarizes statistics of losses, activations and variables during training and dumps them to a separate folder for tensorboard visualization. The computation graph is also saved for debugging.
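
For instance, the numpy side of the snapshot logic mentioned above can be sketched as follows; this is a minimal illustration under assumed names, not the repository's exact code:

# hedged sketch: snapshot/restore the training meta information described above
import pickle
import numpy as np

def save_meta(path, cur_image_index, permutation):
    meta = {
        'cur': cur_image_index,             # current image index
        'perm': permutation,                # permutation of image indices
        'np_state': np.random.get_state(),  # numpy random state
    }
    with open(path, 'wb') as f:
        pickle.dump(meta, f)

def load_meta(path):
    with open(path, 'rb') as f:
        meta = pickle.load(f)
    np.random.set_state(meta['np_state'])   # resume numpy's RNG exactly
    return meta['cur'], meta['perm']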

Prerequisites

  • A basic Tensorflow installation. r0.12 is fully tested; r0.10+ should in general be fine. While it is not required, for experimenting with the original RoI pooling (which requires modifying tensorflow's C++ code), you can check out my tensorflow fork and look for tf.image.roi_pooling.
  • Python packages you might not have: cython, python-opencv, easydict (similar to py-faster-rcnn).
  • A Docker image containing all of the required dependencies can be found on Docker Hub at mbuckler/tf-faster-rcnn-deps. The Dockerfile used to create this image is in the docker directory of this repo.

Installation

  1. Clone the repository
git clone https://github.com/endernewton/tf-faster-rcnn.git
  2. Update the -arch flag in the setup script to match your GPU (see the fragment after this list)
cd tf-faster-rcnn/lib
vim setup.py
  3. Build the Cython modules
# still in tf-faster-rcnn/lib from the previous step
make clean
make
  4. Download pre-trained models and weights
# return to the repository root
cd ..
# model for both voc and coco using default training scheme
./data/scripts/fetch_faster_rcnn_models.sh
# model for coco using longer training scheme (600k/790k)
./data/scripts/fetch_coco_long_models.sh
# weights for imagenet pretrained model, extracted from released caffe model
./data/scripts/fetch_imagenet_weights.sh

Right now the imagenet weights are used to initialize layers for both training and testing when building the graph, even though for testing the trained tensorflow models are restored afterwards (as sketched below). This step could be removed in a simplified version.
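
As a hedged illustration of the flow described above (the variable and checkpoint path are placeholders, not the repository's exact code):

# sketch: variables are first initialized (ImageNet weights among them),
# then for testing the trained detector checkpoint overwrites them
import tensorflow as tf

w = tf.get_variable('conv1/weights', shape=[3, 3, 3, 64])  # toy stand-in

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # hypothetical checkpoint path; the fetch scripts above provide real ones
    saver.restore(sess, 'output/vgg16/voc_2007_trainval/default/vgg16.ckpt')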

  5. Install the Python COCO API, and create a symbolic link to it within tf-faster-rcnn/data.
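
For reference, the -arch flag from step 2 sits in the nvcc options of lib/setup.py; below is a hedged fragment (illustrative only, the real file has more context), with example values such as sm_52 (Maxwell Titan X) or sm_61 (GTX 1080):

# illustrative fragment of lib/setup.py; pick -arch to match your GPU
extra_compile_args = {
    'gcc': ['-Wno-unused-function'],
    'nvcc': ['-arch=sm_52',            # e.g. sm_61 for a GTX 1080
             '--ptxas-options=-v',
             '-c',
             '--compiler-options',
             "'-fPIC'"],
}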

Setup data

Please follow the instructions of py-faster-rcnn here to set up the VOC and COCO datasets. The steps involve downloading the data and creating softlinks in the data folder, as illustrated below. Since faster RCNN does not rely on pre-computed proposals, it is safe to ignore the steps that set up proposals.
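
As a concrete illustration of the softlink step (paths are assumptions; the link name VOCdevkit2007 follows py-faster-rcnn's convention):

# hedged sketch: link an existing VOCdevkit into data/ (run from the repo root)
import os

voc_devkit = '/path/to/VOCdevkit'  # wherever you extracted VOC 2007
os.symlink(voc_devkit, os.path.join('data', 'VOCdevkit2007'))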

If you find it useful, the data/cache folder created on my side is also shared here.

Testing

  1. Create a folder and softlinks to use the pretrained models
mkdir -p output/vgg16/
# note: symlink targets are resolved relative to the link's directory
ln -s ../../data/faster_rcnn_models/voc_2007_trainval output/vgg16/
ln -s ../../data/faster_rcnn_models/coco_2014_train+coco_2014_valminusminival output/vgg16/
  2. Test
GPU_ID=0
./experiments/scripts/test_vgg16.sh $GPU_ID pascal_voc
./experiments/scripts/test_vgg16.sh $GPU_ID coco

Testing the pretrained models generally needs several GB of memory (4 GB on my side).

Training

  1. (Optional) If you have just tested the models, first remove the links to the pretrained models
rm -v output/vgg16/voc_2007_trainval
rm -v output/vgg16/coco_2014_train+coco_2014_valminusminival
  2. Train (and test, and evaluate)
GPU_ID=0
./experiments/scripts/vgg16.sh $GPU_ID pascal_voc
./experiments/scripts/vgg16.sh $GPU_ID coco
  3. Visualization with Tensorboard
tensorboard --logdir=tensorboard/vgg16/voc_2007_trainval/ --port=7001 &
tensorboard --logdir=tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ --port=7002 &

By default, trained networks are saved under:

output/<network name>/<dataset name>/default/

Test outputs are saved under:

output/<network name>/<dataset name>/default/<network snapshot name>/

Tensorboard information for train and validation is saved under:

tensorboard/<network name>/<dataset name>/default/
tensorboard/<network name>/<dataset name>/default_val/

The default number of training iterations is kept the same as in the original faster RCNN; however, I find it beneficial to train longer for COCO (see the report). Also note that, due to the nondeterministic nature of the current implementation, performance can vary a bit, but in general it should be within 1% of the reported numbers. Solutions are welcome; the seeds one would typically set are sketched below.
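
For completeness, here is a minimal sketch of fixing those seeds; per the note above, even with all of them set the current implementation is not fully deterministic:

# hedged sketch: set the commonly relevant seeds (value is arbitrary)
import random
import numpy as np
import tensorflow as tf

SEED = 3
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)  # graph-level seed in r0.12-era tensorflow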

Citation

If you find this implementation or the analysis conducted in our report helpful, please consider citing:

@article{chen17implementation,
    Author = {Xinlei Chen and Abhinav Gupta},
    Title = {An Implementation of Faster RCNN with Study for Region Sampling},
    Journal = {arXiv preprint arXiv:1702.02138},
    Year = {2017}
}

For convenience, here is the faster RCNN citation:

@inproceedings{renNIPS15fasterrcnn,
    Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
    Title = {Faster {R-CNN}: Towards Real-Time Object Detection
             with Region Proposal Networks},
    Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
    Year = {2015}
}
