Mask-RCNN

A collection of notes and code snippets for Mask-RCNN.

Outline

Basic related concepts
A brief history

Basic concepts

PyTorch

We will use PyTorch to implement our Mask-RCNN, so if you are a newbie, the following tutorials might be helpful to get started.

  • This tutorial describes very well how to build an FCN network in PyTorch and apply it to MNIST: link
  • This is a tutorial on fine-tuning nearly all of the models in PyTorch. This explained ResNet and ResNeXt. Finally, here is the GitHub repository with all of the implemented code.

Selective search

Selective search groups pixels into regions based on their similarity. The similarity metric can be color, texture, size, shape, etc. (e.g. HOG). The grouping is done hierarchically: from pixels to small regions, then by merging small regions into larger ones. This was explained very well in this slide.
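The hierarchical grouping can be sketched as below. This is a toy illustration, not the full selective search algorithm: it assumes a tiny grayscale image given as a list of rows, starts from one region per pixel, and uses the difference in mean intensity as the only similarity measure. The function name `hierarchical_grouping` is made up for this note.

```python
def hierarchical_grouping(image):
    """Greedily merge the most similar adjacent regions.

    Returns the bounding box (top, left, bottom, right) of every region
    that ever existed; these boxes play the role of ROI proposals.
    """
    h, w = len(image), len(image[0])
    # Assumption: start with one region per pixel, identified by r * w + c.
    regions = {r * w + c: {"pixels": {(r, c)}, "total": float(image[r][c])}
               for r in range(h) for c in range(w)}
    # Pairs of adjacent region ids (4-connectivity).
    neighbours = set()
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                neighbours.add(frozenset((r * w + c, r * w + c + 1)))
            if r + 1 < h:
                neighbours.add(frozenset((r * w + c, (r + 1) * w + c)))

    def mean(region):
        return region["total"] / len(region["pixels"])

    def box(region):
        rows = [p[0] for p in region["pixels"]]
        cols = [p[1] for p in region["pixels"]]
        return (min(rows), min(cols), max(rows), max(cols))

    def difference(pair):
        # Toy similarity: a smaller mean-intensity gap means more similar.
        a, b = pair
        return abs(mean(regions[a]) - mean(regions[b]))

    proposals = [box(region) for region in regions.values()]
    next_id = h * w
    while neighbours:
        # Pick the most similar adjacent pair (ties broken by region id).
        a, b = min((tuple(sorted(pair)) for pair in neighbours),
                   key=lambda pair: (difference(pair), pair))
        merged = {"pixels": regions[a]["pixels"] | regions[b]["pixels"],
                  "total": regions[a]["total"] + regions[b]["total"]}
        del regions[a], regions[b]
        regions[next_id] = merged
        # Neighbours of a or b become neighbours of the merged region.
        rewired = set()
        for pair in neighbours:
            if a in pair or b in pair:
                others = pair - {a, b}
                if others:
                    rewired.add(frozenset((next_id, next(iter(others)))))
            else:
                rewired.add(pair)
        neighbours = rewired
        proposals.append(box(merged))
        next_id += 1
    return proposals


toy_image = [[0, 0, 9],
             [0, 0, 9]]
for proposal in hierarchical_grouping(toy_image):
    print(proposal)
```

In this toy image the dark 2 x 2 block and the bright column are each merged into one region before the two are joined, so both show up as proposals alongside the whole image.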

Bounding box regressor

A NN or SVM takes the extracted features as input and regresses the bounding box (BB) around the ROI to make it more precise (i.e. to localize the object more tightly). This blog explained very well what a BBox regressor is and how to do the regression.

Example: bounding box in an R-CNN (picture credit: Bastian Leibe, Ross Girshick).
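To make the regression target concrete, here is a minimal sketch of the box parameterization used in the R-CNN paper: the regressor predicts offsets (tx, ty, tw, th) relative to the proposal rather than absolute coordinates. The function names `encode` and `decode` are made up for this note, boxes are assumed to be (center x, center y, width, height), and the learned regressor itself is not shown.

```python
import math


def encode(proposal, target):
    """Regression targets that move `proposal` onto `target`."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw,          # shift in x, in units of proposal width
            (gy - py) / ph,          # shift in y, in units of proposal height
            math.log(gw / pw),       # log-scale change of width
            math.log(gh / ph))       # log-scale change of height


def decode(proposal, deltas):
    """Apply predicted offsets to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx,
            py + ph * ty,
            pw * math.exp(tw),
            ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)      # ROI, e.g. from selective search
ground_truth = (55.0, 48.0, 30.0, 36.0)  # annotated box
deltas = encode(proposal, ground_truth)  # what the regressor is trained to output
print(deltas)
print(decode(proposal, deltas))          # recovers the ground-truth box (up to rounding)
```

At training time the targets come from `encode`; at test time the predicted offsets are passed through `decode` to tighten the ROI.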

ROI-pooling and ROI-align

In Fast R-CNN, ROIs of different sizes are projected onto the feature map (the last ConvNet layer). To guarantee that the input to the next FC layer has the same size for every ROI, ROI-pooling is used. Briefly, it divides the ROI's part of the feature map (of size w x h) into a grid of fixed size (W x H), so for each ROI a max-pooling operation with kernel size ([w/W] x [h/H], if stride = 1) is applied. This was explained well here.

This operation introduces misalignment because w/W and h/H are rounded. ROI-align uses a direct mapping without rounding instead.

For example, suppose we have an ROI of size (5,7) and a ROI-pooling grid of (2,2). We divide the ROI into 4 sub-ROIs of sizes (2,3), (2,4); (3,3), (3,4), and then take the max-pooled value in each sub-ROI. With ROI-align, we do not need to divide (and round) the ROI into sub-ROIs of different sizes. We interpolate the values at 2.5 and 3.5, so every sub-ROI has the same size and the final bounding box is not misaligned.
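The (5,7) example can be reproduced with a small pure-Python sketch. It is a simplification under stated assumptions: the ROI is a plain list of rows, bin edges are rounded down so the sub-ROIs match the sizes listed above, and the ROI-align part takes a single bilinear sample at the center of each bin (real implementations usually take several samples per bin). All function names are made up for this note.

```python
import math


def bin_edges(size, bins):
    """Integer bin edges after rounding down (the source of misalignment)."""
    return [math.floor(i * size / bins) for i in range(bins + 1)]


def roi_pool(roi, out_h, out_w):
    """Max-pool each rounded sub-ROI; assumes the ROI is at least out_h x out_w."""
    rows = bin_edges(len(roi), out_h)
    cols = bin_edges(len(roi[0]), out_w)
    return [[max(roi[r][c]
                 for r in range(rows[i], rows[i + 1])
                 for c in range(cols[j], cols[j + 1]))
             for j in range(out_w)]
            for i in range(out_h)]


def bilinear(roi, y, x):
    """Value at fractional index (y, x), interpolated from the 4 nearest cells."""
    h, w = len(roi), len(roi[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = roi[y0][x0] * (1 - dx) + roi[y0][x1] * dx
    bottom = roi[y1][x0] * (1 - dx) + roi[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(roi, out_h, out_w):
    """Equal-sized fractional bins, one bilinear sample at each bin center."""
    bin_h = len(roi) / out_h        # 2.5 for a height of 5 and 2 bins
    bin_w = len(roi[0]) / out_w     # 3.5 for a width of 7 and 2 bins
    # Assumption: cell (r, c) has its center at (r + 0.5, c + 0.5), hence the -0.5.
    return [[bilinear(roi, (i + 0.5) * bin_h - 0.5, (j + 0.5) * bin_w - 0.5)
             for j in range(out_w)]
            for i in range(out_h)]


# A 5 x 7 ROI whose values encode their own position (10 * row + column).
roi = [[10 * r + c for c in range(7)] for r in range(5)]
print(bin_edges(5, 2), bin_edges(7, 2))  # [0, 2, 5] [0, 3, 7]
print(roi_pool(roi, 2, 2))
print(roi_align(roi, 2, 2))
```

The printed edges give sub-ROIs of 2 and 3 rows by 3 and 4 columns, i.e. the unequal sizes from the example, while `roi_align` works with bins of exactly 2.5 x 3.5.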

A brief history

Snap comparison

R-CNN

R-CNN workflow

  • Detect ROI using selective search
  • Extract features for each ROI using a CNN (this can be a pre-trained model, e.g. VGG or ResNet)
  • Classify using SVM
  • Tighten/correct the bounding box (ROI) using a Box regressor
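The four steps above can be wired together as in the following sketch. It only shows the control flow: `rcnn_detect` and its four callable parameters are hypothetical names chosen for this note, and the toy lambdas in the usage example stand in for selective search, the CNN, the SVM and the box regressor.

```python
def rcnn_detect(image, propose_regions, extract_features, classify, refine_box):
    """Run the R-CNN steps in order and collect (label, score, box) tuples."""
    detections = []
    for box in propose_regions(image):             # 1. detect ROIs
        features = extract_features(image, box)    # 2. CNN features for the ROI
        label, score = classify(features)          # 3. classify (e.g. SVM)
        if label is None:                          # assumption: None means background
            continue
        refined = refine_box(box, features)        # 4. tighten the bounding box
        detections.append((label, score, refined))
    return detections


# Toy stand-ins, only to show how the pieces connect.
detections = rcnn_detect(
    image="toy image",
    propose_regions=lambda img: [(0, 0, 10, 10), (5, 5, 20, 20)],
    extract_features=lambda img, box: [box[2] - box[0], box[3] - box[1]],
    classify=lambda feats: ("object", 0.9) if feats[0] > 12 else (None, 0.0),
    refine_box=lambda box, feats: box,
)
print(detections)
```

Note that every ROI goes through the CNN separately, so the cost grows with the number of proposals.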

References

https://ardianumam.wordpress.com/2017/12/16/understanding-how-mask-rcnn-works-for-semactic-segmentation/
https://www.youtube.com/watch?v=cSO1nUj495Y&t=90s
Very good explanation (and beautiful blog)
