- Kaggle Profile: @phalanx
- Twitter: @ZFPhalanx
phalanx-hk / eccv2020_paperlist
My summary of papers to read
SOLO is a single-shot instance segmentation approach.
It reformulates instance segmentation, usually handled by two-stage methods (e.g., Mask R-CNN), as two subtasks: predicting the semantic category in a category branch and predicting the instance masks in a mask branch.
First, the input image is divided into an S×S grid. The category branch predicts the semantic class probabilities for each grid cell, and the mask branch predicts the instance mask corresponding to each grid cell. Finally, the category prediction and the corresponding mask are naturally associated through their shared reference grid cell.
The input image is divided into an S×S grid. If the center of an object falls into a grid cell, that grid cell is responsible for predicting the semantic category and the mask of that instance.
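The assignment rule above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `responsible_cell` and the pixel-coordinate convention are assumptions made for the example.

```python
def responsible_cell(center_x, center_y, img_w, img_h, S):
    """Return (row, col) of the S x S grid cell that contains the object center.

    Assumes pixel coordinates with the origin at the top-left corner.
    """
    # Scale the center into grid units and clamp so that a center lying
    # exactly on the right or bottom image border stays inside the grid.
    col = min(int(center_x / img_w * S), S - 1)
    row = min(int(center_y / img_h * S), S - 1)
    return row, col


# An object centered at (300, 100) in a 640 x 480 image with S = 12.
print(responsible_cell(300, 100, 640, 480, 12))  # (2, 5)
```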
We predict semantic class probabilities. The output shape is C×S×S, where C is the number of categories. The design assumes that each cell of the S×S grid belongs to one individual instance. During inference, the C-dimensional output at each cell indicates the class probabilities for the object instance assigned to that cell.
We predict the instance mask corresponding to each grid cell, so there are S^2 predicted masks in total. These masks are encoded along the third dimension of a 3D output tensor with shape H×W×S^2, where the k-th channel is responsible for segmenting the instance at grid cell (i, j), with k = i·S + j.
Finally, a one-to-one correspondence is established between the semantic category and the class-agnostic mask.
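The one-to-one correspondence comes down to an index mapping between grid cell (i, j) and mask channel k. A small sketch, assuming the row-major convention k = i·S + j; the helper names are hypothetical:

```python
def channel_index(i, j, S):
    """Mask channel responsible for grid cell (row i, column j)."""
    return i * S + j


def grid_cell(k, S):
    """Inverse mapping: grid cell (i, j) served by mask channel k."""
    return divmod(k, S)


S = 5
k = channel_index(2, 3, S)
print(k, grid_cell(k, S))  # 13 (2, 3)
```

With this mapping, the class scores at cell (i, j) of the category output and channel k of the mask output describe the same instance.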
The instance mask branch needs position information, because the segmentation masks are conditioned on the grid cells and must be separated into different feature channels. We therefore use CoordConv to give the branch access to position information.
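The idea behind CoordConv is to append two extra channels that hold the x and y coordinates of each pixel, normalized to [-1, 1], before the convolution is applied. The sketch below uses plain nested lists so that it runs without dependencies; the function name and the channel order (x first, then y) are assumptions for illustration.

```python
def add_coord_channels(feature):
    """Append normalized x and y coordinate channels to a feature map.

    feature: list of C channels, each an H x W nested list.
    Returns a new list with C + 2 channels.
    """
    h, w = len(feature[0]), len(feature[0][0])

    def norm(idx, size):
        # Map index 0..size-1 linearly onto [-1, 1].
        return 0.0 if size == 1 else 2.0 * idx / (size - 1) - 1.0

    x_chan = [[norm(x, w) for x in range(w)] for _ in range(h)]
    y_chan = [[norm(y, h) for _ in range(w)] for y in range(h)]
    return feature + [x_chan, y_chan]


feature = [[[0.0] * 3 for _ in range(2)]]  # C=1, H=2, W=3
out = add_coord_channels(feature)
print(len(out))   # 3
print(out[1][0])  # [-1.0, 0.0, 1.0]
```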
We use a confidence threshold of 0.1 to filter out predictions with low category scores. We then select the top 500 scoring masks and feed them into NMS. The threshold used to binarize the predicted soft masks is 0.5.
We also calculate the maskness of each predicted mask, which represents the quality and confidence of the mask prediction. Maskness is the average of the predicted soft mask values. The classification score of each prediction is multiplied by its maskness to give the final confidence score.
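A rough sketch of the scoring steps described above. It assumes that the maskness average is taken over the foreground pixels only (soft mask values above the 0.5 mask threshold); the function names are hypothetical and NMS itself is left out.

```python
def select_candidates(scores, score_thr=0.1, top_k=500):
    """Indices of predictions above the score threshold, best first, at most top_k."""
    kept = [(s, idx) for idx, s in enumerate(scores) if s > score_thr]
    kept.sort(reverse=True)
    return [idx for _, idx in kept[:top_k]]


def maskness(soft_mask, mask_thr=0.5):
    """Mean soft-mask value over foreground pixels (assumption: values > mask_thr)."""
    fg = [p for row in soft_mask for p in row if p > mask_thr]
    return sum(fg) / len(fg) if fg else 0.0


def final_score(cls_score, soft_mask, mask_thr=0.5):
    """Classification score rescaled by the maskness of its mask."""
    return cls_score * maskness(soft_mask, mask_thr)


soft_mask = [[0.9, 0.7], [0.2, 0.1]]
print(select_candidates([0.05, 0.9, 0.3]))  # [1, 2]
print(final_score(0.5, soft_mask))
```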
Results are reported on the COCO test-dev dataset.
SOLO outperforms all previous one-stage methods.
We compare the impact of the grid number on performance with a single output feature map.
The feature map is generated by merging C3, C4, and C5.
S = 12 can achieve 27.2 AP, so single-scale SOLO can be applicable to scenarios where object scales don't vary much.
We also experiment with multi-level prediction using FPN. From P2 to P6, the corresponding grid numbers are [40, 36, 24, 16, 12], respectively. This achieves 35.8 AP.
Adding a single CoordConv improves AP by 3.6. Two or more CoordConvs don't bring a noticeable further improvement.
We compare different mask loss functions.
Focal Loss is better than BCE because most pixels of an instance mask are background.
Dice Loss is better than Focal Loss because it views the pixels as a whole object and can automatically establish the right balance between foreground and background pixels.
BCE and Focal Loss may outperform Dice Loss with hyper-parameter tuning, but Dice Loss is stable and more likely to attain good results without heuristics.
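To illustrate why Dice Loss balances foreground and background without extra weighting, here is a minimal sketch of one common formulation, with squared terms in the denominator. The exact form and the `eps` smoothing term are assumptions for the example, not details confirmed by these notes.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss over flattened masks.

    pred:   predicted soft mask values in [0, 1]
    target: ground-truth mask values (0 or 1)
    """
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    # The Dice coefficient is 1 for a perfect match and 0 for no overlap.
    return 1.0 - 2.0 * inter / (denom + eps)


target = [1, 0, 0, 0, 0, 0, 0, 0]          # a small object on a large background
print(dice_loss([1, 0, 0, 0, 0, 0, 0, 0], target))  # close to 0
print(dice_loss([0, 0, 0, 0, 0, 0, 0, 0], target))  # 1.0
```

Predicting all background gives the maximum loss here even though 7 of the 8 pixels are classified correctly, which is the property that per-pixel BCE lacks.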
In the category branch, we must match the convolutional features with spatial size H×W to S×S.
So we compare three alignment methods: direct bilinear interpolation, average pooling, and region-grid interpolation.
From our observation, there is no noticeable performance gap between these variants.
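As an illustration of the pooling variant, the sketch below reduces a single-channel H×W map to S×S by averaging over adaptive bins. It is a dependency-free approximation of what an adaptive average pooling layer does, written under the assumption that bins are delimited by floor and ceil of the scaled indices; it is not taken from the paper's code.

```python
import math


def adaptive_avg_pool(feature, S):
    """Average-pool an H x W nested list down to S x S."""
    h, w = len(feature), len(feature[0])
    out = []
    for i in range(S):
        # Row range covered by output row i.
        r0, r1 = (i * h) // S, math.ceil((i + 1) * h / S)
        row = []
        for j in range(S):
            # Column range covered by output column j.
            c0, c1 = (j * w) // S, math.ceil((j + 1) * w / S)
            vals = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
print(adaptive_avg_pool(feature, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```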
We use a shorter image side of 512 instead of 800.
Models run on a single V100 GPU.
We perform an error analysis by replacing the predicted masks with ground-truth values.
For each predicted binary mask, we compute its IoU with every ground-truth mask and replace it with the most overlapping ground-truth mask.
Replacing the predicted masks with ground-truth masks increases AP to 68.1%, so there is still ample room for improving the mask branch.
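The replacement step of this error analysis can be sketched as below. Masks are represented as flattened lists of 0/1 values, and the function names are hypothetical.

```python
def mask_iou(a, b):
    """Intersection over union of two flattened binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def replace_with_gt(pred_mask, gt_masks):
    """Return the ground-truth mask that overlaps the prediction the most."""
    return max(gt_masks, key=lambda gt: mask_iou(pred_mask, gt))


pred = [1, 1, 0, 0]
gts = [[0, 0, 1, 1], [1, 1, 1, 0]]
print(replace_with_gt(pred, gts))  # [1, 1, 1, 0]
```

The category scores are left untouched, so the resulting AP isolates how much of the remaining error comes from mask quality.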
With grid number S = 20, the output has 400 channel maps. This prediction is somewhat redundant, since in most cases objects are located sparsely in the image. So we propose the Decoupled head.
As shown in the image below, we replace the output tensor with two output tensors, each corresponding to one of the two axes. Thus, the output space is decreased from H×W×S^2 to H×W×2S.
For an object located at grid location (i, j), the mask prediction of that object is defined as the element-wise multiplication of the two corresponding channel maps.
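A small sketch of the decoupled mask computation. The names `x_maps` and `y_maps` for the two output tensors, and the choice of indexing the x tensor by column j and the y tensor by row i, are assumptions made for illustration.

```python
def decoupled_mask(x_maps, y_maps, i, j):
    """Mask for grid cell (i, j) as the element-wise product of two channel maps.

    x_maps: list of S channel maps (one per grid column), each H x W
    y_maps: list of S channel maps (one per grid row), each H x W
    """
    xj, yi = x_maps[j], y_maps[i]
    return [[a * b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(xj, yi)]


# S = 2 with 1 x 2 maps, kept tiny for readability.
x_maps = [[[0.5, 1.0]], [[0.2, 0.4]]]
y_maps = [[[1.0, 0.5]], [[0.0, 1.0]]]
print(decoupled_mask(x_maps, y_maps, i=1, j=0))  # [[0.0, 1.0]]

S = 20
print(S * S, 2 * S)  # 400 40
```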
Decoupled SOLO improves AP to 38.4, so it is an efficient variant of SOLO with equivalent accuracy.
This paper proposes the Corner Proposal Network (CPN), an anchor-free, two-stage object detection framework.
In the first stage, CPN predicts pairs of corner keypoints as object proposals. In the second stage, it filters out false-positive proposals and assigns a class label to each surviving proposal.
An anchor-free method can detect objects of various scales, and by separating the detection pipeline into proposal extraction and classification, it avoids being confused by a large number of false-positive proposals.
In anchor-based methods, each anchor is associated with a specific position on the image and its size is fixed.
So even though bounding-box regression can change the anchor geometry, these methods have difficulty finding objects with peculiar shapes. An anchor-free method, on the other hand, doesn't assume that objects come from anchors of relatively fixed geometry, so it has better flexibility in locating objects with arbitrary geometry, and thus a higher recall.
We compare anchor-based and anchor-free methods on MS-COCO val. As shown in the image below, anchor-free methods have a higher average recall, especially when the object size is peculiar.
Stage 1: Anchor-free Proposals with Corner Keypoints
We locate an object with a pair of keypoints at its top-left and bottom-right corners. As with CenterNet, we use focal loss to locate the keypoints on the heatmap and an offset loss to learn the offsets caused by resolution reduction.
Finally, we enumerate the pairs of corner keypoints. A pair is valid if the two keypoints belong to the same class and the (x, y) coordinates of the top-left keypoint are smaller than those of the bottom-right keypoint.
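The pairing rule can be written as a short enumeration. This sketch assumes each detected keypoint is given as an `(x, y, cls)` tuple; that representation and the function name are illustrative only.

```python
def enumerate_proposals(top_left, bottom_right):
    """Enumerate valid corner pairs as (x1, y1, x2, y2, cls) proposals.

    top_left, bottom_right: lists of (x, y, cls) keypoints.
    """
    proposals = []
    for x1, y1, c1 in top_left:
        for x2, y2, c2 in bottom_right:
            # Same class, and the top-left corner must lie above and to
            # the left of the bottom-right corner.
            if c1 == c2 and x1 < x2 and y1 < y2:
                proposals.append((x1, y1, x2, y2, c1))
    return proposals


tl = [(10, 10, 0), (50, 50, 1)]
br = [(40, 40, 0), (30, 5, 0), (80, 90, 1)]
print(enumerate_proposals(tl, br))
# [(10, 10, 40, 40, 0), (50, 50, 80, 90, 1)]
```

Every geometrically valid pair becomes a proposal, which is why the second stage is needed to filter out the many false positives.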
Stage 2: Two-step Classification for Filtering Proposals
There are two steps: binary classification of proposals, followed by multi-class classification of proposals.
We propose SegFix, a post-processing scheme that improves the boundaries of segmentation results. Predictions for boundary pixels are unreliable, so we replace them with the predictions of more reliable interior pixels (a simple replacement). SegFix consistently reduces the boundary errors of segmentation. It is also a model-agnostic method, so it can be incorporated into other models.
We extract a feature map with the backbone and send it to the binary boundary branch and the direction branch.
We mask the direction map with the binary boundary map, then apply the offset branch to generate the offset map, which holds the direction from the boundary pixel to the interior for boundary pixels and 0 for interior pixels.
Finally, we refine the coarse label map with the offset map: the label of each boundary pixel is replaced by the label of the interior pixel that its offset points to.
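The final refinement step can be sketched as a simple lookup. The example assumes the offset map stores an integer `(dy, dx)` pair per pixel, with `(0, 0)` for interior pixels, and clamps positions at the image border; these conventions and the function name are assumptions, not details from the notes.

```python
def segfix_refine(label_map, offset_map):
    """Replace each pixel's label with the label at (y + dy, x + dx).

    label_map:  H x W nested list of class labels (coarse prediction)
    offset_map: H x W nested list of (dy, dx) integer offsets
    """
    h, w = len(label_map), len(label_map[0])
    refined = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = offset_map[y][x]
            # Clamp so offsets never index outside the image.
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            row.append(label_map[sy][sx])
        refined.append(row)
    return refined


# The pixel at x=1 sits on a boundary and was mislabeled as class 2;
# its offset points left, toward the interior of the class-1 region.
labels = [[1, 2, 2, 2]]
offsets = [[(0, 0), (0, -1), (0, 0), (0, 0)]]
print(segfix_refine(labels, offsets))  # [[1, 1, 2, 2]]
```

Because the step only reads a label map and an offset map, it can be applied to the output of any segmentation model.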
Boundary F-score on the Cityscapes validation set
Mean IoU on the Cityscapes test set