
eccv2020_paperlist's Introduction

eccv2020_paperlist's People

Contributors

phalanx-hk


eccv2020_paperlist's Issues

[Done] SOLO: Segmenting Objects by Locations

Overview

SOLO is a single-shot instance segmentation approach.
It reformulates two-stage instance segmentation (e.g., Mask R-CNN) as two sub-tasks: predicting the semantic category in a category branch and the masks of instances in a mask branch.
First, the input image is divided into an S×S grid. The category branch predicts a semantic class probability for each grid cell, and the mask branch predicts the instance mask corresponding to each cell. Finally, each category prediction and its corresponding mask are naturally associated by their shared grid cell.
image

Method

The input image is divided into S×S grid cells. If the center of an object falls into a grid cell, that cell is responsible for predicting the object's semantic category and instance mask.
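The center-to-grid assignment can be sketched as follows (a minimal illustration, not the paper's code; the image size and example coordinates are made up):

```python
# Map an object's center to its responsible cell in an S x S grid.
def center_to_grid(cx, cy, img_w, img_h, S):
    """Return (row, col) of the grid cell containing center (cx, cy)."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# Example: a 640x480 image, S = 12, object centered at (320, 240)
row, col = center_to_grid(320, 240, 640, 480, 12)
```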

Semantic category

The category branch predicts semantic class probabilities. The output shape is C×S×S, where C is the number of categories. Each cell of the S×S grid is assumed to belong to one individual instance. During inference, the C-dimensional output at each cell indicates the class probabilities for that object instance.

  • label assignment and loss function
    Given the ground-truth mask, the mask center (cx, cy) is obtained with the same center-sampling strategy as FCOS. The center region is defined as (cx, cy, εw, εh), where w and h are the mask's width and height and ε is a scale factor (0.2 in the paper). Grid cells falling inside this region are labeled positive.
    The loss function is the conventional focal loss.
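The center-region assignment above can be sketched as follows (a simplified illustration with ε = 0.2; it tests grid-cell centers against the scaled region, and all sizes here are made-up examples):

```python
# Grid cells whose centers fall inside the scaled center region
# (cx, cy, eps*w, eps*h) are labeled positive for the category branch.
def positive_cells(cx, cy, w, h, img_w, img_h, S, eps=0.2):
    cells = []
    half_w, half_h = eps * w / 2, eps * h / 2
    for row in range(S):
        for col in range(S):
            # center of this grid cell in image coordinates
            gx = (col + 0.5) * img_w / S
            gy = (row + 0.5) * img_h / S
            if abs(gx - cx) <= half_w and abs(gy - cy) <= half_h:
                cells.append((row, col))
    return cells

# Example: an object as large as a 640x480 image, S = 12
cells = positive_cells(320, 240, 640, 480, 640, 480, 12)
```

Note how even a large object activates only a few central cells, because ε shrinks the positive region.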

Instance mask

The mask branch predicts an instance mask for each grid cell, so there are S² predicted masks in total. These masks are encoded along the third dimension of a 3D output tensor of shape H×W×S², where the k-th channel is responsible for segmenting the instance at grid cell (i, j), with k = i·S + j.
Finally, a one-to-one correspondence is established between each semantic category prediction and its class-agnostic mask.
The mask branch needs access to position information in the image, since the segmentation masks are conditioned on the grid cells and must be separated into different feature channels, while conventional convolutions are spatially invariant. CoordConv is therefore used to inject position information.
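The CoordConv idea can be sketched as follows: append normalized x/y coordinate channels to the feature map so that subsequent convolutions become position-sensitive (a minimal numpy sketch; the feature shape is a made-up example):

```python
import numpy as np

# CoordConv input augmentation: concatenate two coordinate channels,
# each normalized to [-1, 1], onto a (C, H, W) feature map.
def add_coord_channels(feat):
    """feat: array of shape (C, H, W) -> (C + 2, H, W)."""
    C, H, W = feat.shape
    ys = np.linspace(-1.0, 1.0, H).reshape(H, 1).repeat(W, axis=1)
    xs = np.linspace(-1.0, 1.0, W).reshape(1, W).repeat(H, axis=0)
    return np.concatenate([feat, xs[None], ys[None]], axis=0)

out = add_coord_channels(np.zeros((64, 32, 32)))
```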

  • label assignment and loss function
    Positive samples follow the same center-region assignment as the category branch; the choice of mask loss is compared in the Loss function section below.

Inference

We use a confidence threshold of 0.1 to filter out category scores with low confidence, then select the top 500 scoring masks and feed them into NMS. The mask binarization threshold is 0.5.
We also compute a "maskness" score for each predicted mask, which represents the quality and confidence of the mask prediction. Maskness is the average of the predicted soft mask values inside the binarized foreground. The classification score of each prediction is multiplied by its maskness to produce the final confidence score.
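The maskness re-scoring can be sketched as follows (a minimal sketch with mask threshold 0.5; the example scores and mask values are made up):

```python
import numpy as np

# Maskness: average soft-mask value inside the binarized foreground,
# multiplied into the classification score to get the final confidence.
def final_score(cls_score, soft_mask, mask_thr=0.5):
    fg = soft_mask > mask_thr
    if not fg.any():
        return 0.0          # no foreground pixels -> suppress
    maskness = soft_mask[fg].mean()
    return cls_score * maskness

score = final_score(0.9, np.array([[0.9, 0.8], [0.1, 0.2]]))
```

A confident classification paired with a fuzzy mask is thus down-weighted, which is the intended effect.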

Result

Comparison of mask AP with state-of-the-art methods

Evaluated on the COCO test-dev dataset, SOLO outperforms all previous one-stage methods.
image

impact of grid number and FPN

We compare the impact of the grid number on performance with a single output feature map, generated by merging C3, C4, and C5.
S = 12 already achieves 27.2 AP, so single-scale SOLO is applicable to scenarios where object scales do not vary much.
We also experiment with multi-level prediction using FPN. From P2 to P6, the corresponding grid numbers are [40, 36, 24, 16, 12] respectively; this achieves 35.8 AP.
image

CoordConv

Adding a single CoordConv improves AP by 3.6 points; two or more CoordConvs bring no noticeable further improvement.
image

Loss function

Comparison of different mask loss functions.
Focal loss is better than BCE because most pixels of an instance mask are background.
Dice loss is better than focal loss because it views the pixels as a whole object and can automatically establish the right balance between foreground and background pixels.
BCE and focal loss might match dice loss with careful hyper-parameter tuning, but dice loss is stable and more likely to attain good results without heuristics.
image
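The dice loss used here can be written as D = 2·Σ(p·q) / (Σp² + Σq²) for a soft prediction p and binary target q, with loss = 1 − D. A minimal sketch (the ε term is a small constant I add for numerical stability):

```python
import numpy as np

# Dice loss on a soft predicted mask p and a binary target q.
def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    denom = (pred ** 2).sum() + (target ** 2).sum() + eps
    return 1.0 - 2.0 * inter / denom

# A perfect prediction drives the loss to ~0.
perfect = dice_loss(np.array([1.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0]))
```

Because both sums run over the whole mask, the loss is insensitive to the foreground/background pixel ratio, which is the balancing property the text describes.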

Alignment in the category branch

In the category branch, we must match the convolutional features with spatial size H×W to S×S.
image
So we compare three alignment methods: direct bilinear interpolation, average pooling, and region-grid interpolation.
From our observation, there is no noticeable performance gap between these variants.
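The average-pooling variant, for example, can be sketched as follows (a toy sketch that assumes H and W are divisible by S; real implementations would use adaptive pooling):

```python
import numpy as np

# Align an H x W feature map to S x S by averaging non-overlapping blocks.
def avgpool_align(feat, S):
    H, W = feat.shape
    return feat.reshape(S, H // S, S, W // S).mean(axis=(1, 3))

# 4x4 map pooled down to 2x2
aligned = avgpool_align(np.arange(16.0).reshape(4, 4), 2)
```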

SOLO-512

Uses a shorter image size of 512 instead of 800.
Models run on a single V100 GPU.
image

Error Analysis

We perform an error analysis by replacing the predicted masks with ground-truth values.
For each predicted binary mask, we compute IoUs with ground-truth masks, and replace it with the most overlapping ground-truth mask.
Replacing the predicted masks with ground-truth masks increases AP to 68.1%, so there is still ample room for improving the mask branch.
image
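The replacement procedure above can be sketched as follows (a minimal sketch; the example masks are made up):

```python
import numpy as np

# IoU between two binary masks.
def mask_iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Swap each predicted mask for its most-overlapping ground-truth mask,
# giving an upper bound on what a perfect mask branch could achieve.
def replace_with_gt(pred_masks, gt_masks):
    return [gt_masks[int(np.argmax([mask_iou(p, g) for g in gt_masks]))]
            for p in pred_masks]

pred = [np.array([[1, 1], [0, 0]], dtype=bool)]
gts = [np.array([[1, 0], [0, 0]], dtype=bool),
       np.array([[0, 0], [1, 1]], dtype=bool)]
swapped = replace_with_gt(pred, gts)
```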

Decoupled SOLO

With grid number S = 20, the output has 400 channel maps. This prediction is somewhat redundant, since in most cases objects are located sparsely in the image, so a decoupled head is proposed.
As shown in the image below, the single output tensor is replaced with two output tensors, each corresponding to one of the two axes. The output space thus decreases from H×W×S² to H×W×2S.
For an object located at grid cell (i, j), the mask prediction of that object is defined as the element-wise multiplication of the corresponding channel maps from the two tensors.
image
Decoupled SOLO improves AP to 38.4, so it is an efficient variant that matches vanilla SOLO in accuracy.
image
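The decoupled mask reconstruction can be sketched as follows (a minimal sketch; shapes and branch names `x_maps`/`y_maps` are my own labels for the two axis-wise outputs):

```python
import numpy as np

# Decoupled SOLO: the mask for grid cell (i, j) is the element-wise
# product of the i-th map from the Y-branch and the j-th map from the
# X-branch, shrinking the output from S^2 channels to 2S.
def decoupled_mask(x_maps, y_maps, i, j):
    """x_maps, y_maps: arrays of shape (S, H, W)."""
    return x_maps[j] * y_maps[i]

S, H, W = 4, 8, 8
x_maps = np.random.rand(S, H, W)
y_maps = np.random.rand(S, H, W)
mask = decoupled_mask(x_maps, y_maps, 1, 2)
```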

[Done] Corner Proposal Network for Anchor-free, Two-stage Object Detection

Overview

This paper proposes the Corner Proposal Network (CPN), an anchor-free, two-stage object detection framework.
In the first stage, CPN predicts pairs of corner keypoints as object proposals. In the second stage, it filters out false-positive proposals and assigns a class label to each surviving proposal.
The anchor-free design can detect objects at various scales, and separating the detection pipeline into proposal extraction and classification avoids confusion from a large number of false-positive proposals.
image

Method

Why anchor-free?

In anchor-based methods, each anchor is associated with a specific position on the image and has a fixed size.
Even though bounding-box regression can adjust anchor geometry, such methods have difficulty finding objects with peculiar shapes. Anchor-free methods, on the other hand, do not assume that objects come from anchors of relatively fixed geometry, so they have better flexibility in locating objects with arbitrary geometry, and thus higher recall.
We compare anchor-based and anchor-free methods on MS-COCO val. As shown in the image below, the anchor-free method has higher average recall, especially when the object size is peculiar.
image

Corner Proposal Network

Stage 1: Anchor-free Proposals with Corner Keypoints
We locate an object with a pair of keypoints at its top-left and bottom-right corners. As in CenterNet, focal loss is used for locating keypoints on the heatmap, together with an offset loss to learn the offsets caused by resolution reduction.
Finally, we enumerate valid pairs of corner keypoints: the two keypoints must belong to the same class, and the (x, y) coordinates of the top-left keypoint must be smaller than those of the bottom-right keypoint.
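The pairing rule can be sketched as follows (a minimal sketch; corners are represented as hypothetical `(x, y, class)` tuples and the example values are made up):

```python
# Enumerate valid corner pairs: same class, and the top-left corner must
# lie above and to the left of the bottom-right corner.
def valid_pairs(top_lefts, bottom_rights):
    pairs = []
    for (x1, y1, c1) in top_lefts:
        for (x2, y2, c2) in bottom_rights:
            if c1 == c2 and x1 < x2 and y1 < y2:
                pairs.append(((x1, y1), (x2, y2), c1))
    return pairs

pairs = valid_pairs([(10, 10, 0), (50, 50, 1)],
                    [(40, 40, 0), (30, 30, 1)])
```

Only the class-0 pair survives here; the class-1 corners fail the ordering constraint.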

Stage 2: Two-step Classification for Filtering Proposals
There are two steps: proposal binary classification and proposal multi-class classification.

  • proposal binary classification
    Most keypoint proposals are false positives, so they are filtered out first. We apply RoIAlign with a 7×7 kernel to extract features from the box feature map, then a 32×7×7 convolution to obtain a classification score, with 0 for negative samples and 1 for positive samples.
    The loss function is defined as:
    image
  • proposal multi-class classification
    We assign a class label to each surviving proposal. As in the first step, we apply RoIAlign with a 7×7 kernel to the category feature map, then a 256×7×7 convolution to obtain a C-dimensional vector.
    The loss function is defined as:
    image
    Overall loss function is:
    image
    The weights of the individual loss terms are equal.

Result

Evaluation of CPN on the MS-COCO test-dev dataset.
image

[Done] SegFix: Model-Agnostic Boundary Refinement for Segmentation

Overview

We propose SegFix, a post-processing scheme that improves the boundaries of segmentation results. Predictions at boundary pixels are unreliable, so we simply replace them with the predictions of more reliable interior pixels. SegFix consistently reduces boundary errors in segmentation. It is also model-agnostic, so it can be combined with other models.
image

Method

training

We extract a feature map with the backbone and send it to a binary branch and a direction branch.

  • binary branch
    The binary branch predicts a binary boundary map, with 1 for boundary pixels and 0 for interior pixels, and uses binary cross-entropy as the boundary loss.
  • direction branch
    The direction branch predicts a direction map, with each element storing the direction pointing from a boundary pixel to an interior pixel. We use categorical cross-entropy as the direction loss.
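The directions are typically discretized into a small number of classes; a sketch of an 8-way class-to-offset lookup is below (the particular angular convention here is my assumption, not necessarily the paper's):

```python
# Each direction class maps to a unit (dy, dx) offset toward the
# interior; a direction classifier picks one class per boundary pixel.
DIRECTION_OFFSETS = [
    (0, 1), (1, 1), (1, 0), (1, -1),
    (0, -1), (-1, -1), (-1, 0), (-1, 1),
]  # (dy, dx) for direction classes 0..7

def direction_to_offset(cls_idx):
    return DIRECTION_OFFSETS[cls_idx]
```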

testing

We mask the direction map with the binary boundary map, then apply the offset branch to generate an offset map, with a direction from each boundary pixel toward the interior and 0 for interior pixels.
Finally, we refine the coarse label map: each boundary pixel's label is replaced by the label at the position its offset points to.
image
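The refinement step can be sketched as follows (a minimal single-step sketch; the tiny label map and offsets are made-up examples):

```python
import numpy as np

# Each boundary pixel re-reads its label from the interior pixel its
# offset points to; interior pixels keep their coarse labels.
def refine_labels(coarse, offsets, boundary):
    """coarse: (H, W) labels; offsets: (H, W, 2) as (dy, dx);
    boundary: (H, W) bool mask of boundary pixels."""
    H, W = coarse.shape
    refined = coarse.copy()
    for y, x in zip(*np.nonzero(boundary)):
        dy, dx = offsets[y, x]
        ny = np.clip(y + dy, 0, H - 1)
        nx = np.clip(x + dx, 0, W - 1)
        refined[y, x] = coarse[ny, nx]
    return refined

coarse = np.array([[1, 2], [1, 1]])
offsets = np.zeros((2, 2, 2), dtype=int)
offsets[0, 1] = (1, 0)          # boundary pixel (0, 1) points down
boundary = np.array([[False, True], [False, False]])
refined = refine_labels(coarse, offsets, boundary)
```

The boundary pixel that was labeled 2 inherits the interior label 1, which is exactly the "replace by reliable interior prediction" behavior described above.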

Result

semantic segmentation

Boundary F-score on Cityscapes validation dataset
image

mean IoU on Cityscapes test dataset
image

instance segmentation

mask AP on Cityscapes test dataset
image
