phalanx-hk / eccv2020_paperlist

My summary of papers to read

eccv2020_paperlist's Introduction

Hi there 👋

My Social Media 💬

Kaggle Profile: @phalanx
Twitter: @ZFPhalanx

eccv2020_paperlist's Issues

[Done] TIDE: A General Toolbox for Identifying Object Detection Errors

Overview

TIDE is a toolbox for analyzing the sources of error in object detection and instance segmentation. Prediction errors are divided into six types (see the image below). TIDE is applicable across datasets and can be applied directly to output prediction files.

[In progress] MotionSqueeze: Neural Motion Feature Learning for Video Understanding

Overview

Method

Result

[Done] SOLO: Segmenting Objects by Locations

Overview

SOLO is a single-shot instance segmentation approach.
It reformulates two-stage instance segmentation (e.g., Mask R-CNN) as two sub-tasks: predicting the semantic category in the category branch and predicting instance masks in the mask branch.
First, the input image is divided into an SxS grid. The category branch predicts the semantic class probability for each grid cell, and the mask branch predicts the instance mask corresponding to each grid cell. Finally, the category prediction and the corresponding mask are naturally associated by their reference grid cell.

Method

The input image is divided into an SxS grid. If the center of an object falls into a grid cell, that grid cell is responsible for predicting the semantic category and the mask of that instance.

Semantic category

We predict semantic class probabilities. The output shape is CxSxS, where C is the number of categories. Each cell of the SxS grid must belong to one individual instance. During inference, the C-dimensional output at each cell indicates the class probability for that object instance.

label assignment and loss function
Given the ground-truth mask, the mask center (cx, cy) is obtained by the same center sampling method as FCOS. The center region is defined as (cx, cy, εw, εh), where w and h are the mask width and height and ε is a scale factor (0.2 in the journal version).
The loss function is the conventional focal loss.
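
The center-region rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `positive_cells` is hypothetical, and treating every grid cell that the center region touches as positive is one plausible reading of the rule, not necessarily what the official implementation does.

```python
def positive_cells(cx, cy, w, h, img_w, img_h, S, eps=0.2):
    """Return (row, col) grid cells touched by the center region.

    Assumption: the center region is the box centered at (cx, cy)
    with size (eps * w, eps * h), and a cell is positive if the
    region overlaps it.
    """
    # Corners of the center region in image coordinates.
    x0, x1 = cx - eps * w / 2, cx + eps * w / 2
    y0, y1 = cy - eps * h / 2, cy + eps * h / 2

    # Map image coordinates to grid indices and clamp to the grid.
    col0 = max(0, int(x0 * S / img_w))
    col1 = min(S - 1, int(x1 * S / img_w))
    row0 = max(0, int(y0 * S / img_h))
    row1 = min(S - 1, int(y1 * S / img_h))

    return [(r, c) for r in range(row0, row1 + 1)
            for c in range(col0, col1 + 1)]
```

A small object usually activates a single cell, while a large object can activate several neighboring cells, each of which is then trained to predict the same category and mask.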

Instance mask

We predict the instance mask corresponding to each grid cell, so there are S^2 predicted masks in total. We encode these masks along the third dimension of a 3D output tensor. The output shape is HxWxS^2, where the kth channel is responsible for segmenting the instance at grid cell (i, j), with k = i·S + j.
Finally, a one-to-one correspondence is established between the semantic category and the class-agnostic mask.
The instance mask branch needs position information about the image, since the segmentation masks are conditioned on the grid cells and must be separated into different feature channels. So we use CoordConv to provide this position information.
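
As a rough illustration of the CoordConv idea, the sketch below appends two coordinate channels to a feature map before it would be passed to a convolution. It uses nested lists instead of tensors to stay dependency-free; the function name `add_coord_channels` and the normalization of coordinates to [-1, 1] are assumptions made for this example.

```python
def add_coord_channels(feature):
    """Append normalized x and y coordinate channels.

    feature: list of C channels, each an H x W nested list.
    Returns a list of C + 2 channels (x channel, then y channel).
    """
    h = len(feature[0])
    w = len(feature[0][0])

    def norm(i, n):
        # Map index i in [0, n - 1] to [-1, 1]; a single row/column maps to 0.
        return 0.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)

    x_chan = [[norm(j, w) for j in range(w)] for _ in range(h)]
    y_chan = [[norm(i, h) for _ in range(w)] for i in range(h)]
    return feature + [x_chan, y_chan]
```

With these extra channels, a convolution can tell where in the image it is operating, which is what allows different output channels to specialize in different grid cells.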

label assignment and loss function

Inference

We use a confidence threshold of 0.1 to filter out category scores with low confidence. Then we select the top 500 scoring masks and feed them into NMS. The mask threshold is 0.5.
We also calculate the maskness of each predicted mask, which represents the quality and confidence of the mask prediction. Maskness is the average of the predicted soft mask values. The classification score of each prediction is multiplied by its maskness to give the final confidence score.
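
The scoring steps can be sketched as follows. This is a simplified illustration, not the reference implementation: NMS is omitted, the names `maskness` and `score_candidates` are hypothetical, and averaging only over pixels above the mask threshold is an assumption about how the average is taken.

```python
def maskness(soft_mask, mask_thr=0.5):
    """Mean soft-mask value over pixels above mask_thr (assumed foreground)."""
    fg = [p for row in soft_mask for p in row if p > mask_thr]
    return sum(fg) / len(fg) if fg else 0.0


def score_candidates(cands, score_thr=0.1, top_k=500, mask_thr=0.5):
    """cands: list of (category_score, soft_mask) pairs.

    Drops low-confidence candidates, keeps the top_k by category score,
    and rescales each score by the maskness of its mask.
    """
    kept = [(s, m) for s, m in cands if s > score_thr]
    kept.sort(key=lambda c: c[0], reverse=True)
    kept = kept[:top_k]
    return [(s * maskness(m, mask_thr), m) for s, m in kept]
```

The rescored candidates would then go through NMS. A mask with many uncertain pixels ends up with a lower final score than a crisp mask with the same category score.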

Result

Comparison of mask AP with SOTA methods

Evaluated on the COCO test-dev dataset.
SOLO outperforms all previous one-stage methods.

Impact of grid number and FPN

We compare the impact of the grid number on performance with a single output feature map.
The feature map is generated by merging C3, C4, and C5.
S = 12 already achieves 27.2 AP, so single-scale SOLO is applicable to scenarios where object scales do not vary much.
We also experiment with multi-level prediction using FPN. From P2 to P6, the corresponding grid numbers are [40, 36, 24, 16, 12], respectively. This achieves 35.8 AP.

CoordConv

Adding a single CoordConv improves AP (+3.6). Two or more CoordConvs do not bring a noticeable further improvement.

Loss function

We compare different mask loss functions.
Focal Loss is better than BCE because many pixels of an instance mask are background.
Dice Loss is better than Focal Loss because it views the pixels as a whole object and can automatically establish the right balance between foreground and background pixels.
BCE and Focal Loss might beat Dice Loss with hyper-parameter tuning, but Dice Loss is stable and more likely to attain good results without heuristics.
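
To make the "pixels as a whole object" point concrete, here is a minimal Dice loss sketch in plain Python. It assumes the form 1 - 2Σpq / (Σp² + Σq²) with a small `eps` added for numerical safety; the function name and the `eps` value are choices made for this example.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a soft mask and a binary target mask.

    pred, target: H x W nested lists with values in [0, 1].
    """
    p = [v for row in pred for v in row]
    q = [v for row in target for v in row]

    # Overlap term and the two squared-magnitude terms.
    inter = sum(a * b for a, b in zip(p, q))
    denom = sum(a * a for a in p) + sum(b * b for b in q)
    return 1.0 - 2.0 * inter / (denom + eps)
```

Because the loss is a ratio of sums over the whole mask, adding more correctly predicted background pixels (where both values are 0) does not change it, unlike a per-pixel loss such as BCE.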

Alignment in the category branch

In the category branch, we must match the convolutional features of spatial size H×W to S×S.

So we compare three alignment methods: direct bilinear interpolation, adaptive average pooling, and region-grid interpolation.
From our observation, there is no noticeable performance gap between these variants.

SOLO-512

SOLO-512 uses a shorter image side of 512 instead of 800.
Models are run on a single V100 GPU.

Error Analysis

We perform an error analysis by replacing the predicted masks with ground-truth values.
For each predicted binary mask, we compute IoUs with ground-truth masks, and replace it with the most overlapping ground-truth mask.
Replacing the predicted masks with ground-truth masks increases AP to 68.1%, so there is still ample room for improving the mask branch.
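
The replacement step of this analysis can be sketched as below. The helper names `mask_iou` and `replace_with_gt` are hypothetical, and masks are represented as nested lists of 0/1 values to keep the example self-contained.

```python
def mask_iou(a, b):
    """Intersection over union of two binary masks (H x W nested lists)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            if va and vb:
                inter += 1
            if va or vb:
                union += 1
    return inter / union if union else 0.0


def replace_with_gt(pred, gt_masks):
    """Return the ground-truth mask with the highest IoU against pred."""
    return max(gt_masks, key=lambda g: mask_iou(pred, g))
```

Evaluating AP on the masks returned by `replace_with_gt` instead of the predictions gives an upper bound for a perfect mask branch with the category branch left unchanged.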

Decoupled SOLO

If we set the grid number S = 20, the output has 400 channel maps. That prediction is somewhat redundant, since in most cases the objects are located sparsely in the image. So we propose the decoupled head.
As in the image below, we replace the output tensor with two output tensors, each corresponding to one of the two axes. Thus, the output space is decreased from HxWxS^2 to HxWx2S.
For an object located at grid location (i, j), the mask prediction of that object is defined as the element-wise multiplication of two channel maps.

Decoupled SOLO improves AP to 38.4, so it is an efficient variant of SOLO with equivalent accuracy.
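
A minimal sketch of the decoupled mask lookup, assuming that the x-branch channel is indexed by the column j and the y-branch channel by the row i, and that both maps already hold sigmoid-activated values. The function name `decoupled_mask` is hypothetical.

```python
def decoupled_mask(x_maps, y_maps, i, j):
    """Mask for grid cell (i, j) from the two decoupled branches.

    x_maps, y_maps: lists of S channel maps, each an H x W nested list.
    The mask is the element-wise product of x_maps[j] and y_maps[i].
    """
    x = x_maps[j]
    y = y_maps[i]
    return [[xv * yv for xv, yv in zip(x_row, y_row)]
            for x_row, y_row in zip(x, y)]
```

With S = 20, the two branches store 2S = 40 channel maps instead of S^2 = 400, while any of the 400 cell masks can still be recovered on demand by this product.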

[Done] Corner Proposal Network for Anchor-free, Two-stage Object Detection

Overview

This paper proposes the Corner Proposal Network (CPN), an anchor-free, two-stage object detection framework.
In the first stage, CPN predicts pairs of corner keypoints as object proposals. In the second stage, it filters out false-positive proposals and assigns a class label to each surviving proposal.
The anchor-free design can detect objects of various scales, and separating the detection pipeline into proposal extraction and classification keeps the detector from being confused by a large number of false-positive proposals.

Method

Why anchor-free?

In anchor-based methods, each anchor is associated with a specific position in the image and its size is fixed.
So even though bbox regression can change the anchor geometry, these methods have difficulty finding objects with a peculiar shape. Anchor-free methods, on the other hand, do not assume that objects come from anchors of relatively fixed geometry, so they have better flexibility in locating objects with arbitrary geometry, and thus a higher recall.
We compare anchor-based and anchor-free methods on MS-COCO val. As shown in the image below, anchor-free methods have a higher average recall, especially when the object size is peculiar.

Corner Proposal Network

Stage 1: Anchor-free Proposals with Corner Keypoints
We locate an object with a pair of keypoints at its top-left and bottom-right corners. As in CenterNet, we use a focal loss to locate the keypoints on the heatmaps and an offset loss to learn the offsets caused by resolution reduction.
Finally, we enumerate all valid pairs of corner keypoints. A pair is valid when the two keypoints belong to the same class and the (x, y) coordinates of the top-left keypoint are both smaller than those of the bottom-right keypoint.
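
The pairing rule can be written as a short brute-force enumeration. This is only a sketch: the function name `enumerate_proposals` and the `(x, y, cls)` tuple layout are assumptions for illustration, and keypoint scores are left out.

```python
def enumerate_proposals(top_left, bottom_right):
    """Pair corner keypoints into box proposals.

    top_left, bottom_right: lists of (x, y, cls) keypoints.
    Returns a list of (x1, y1, x2, y2, cls) proposals.
    """
    proposals = []
    for x1, y1, c1 in top_left:
        for x2, y2, c2 in bottom_right:
            # Valid pair: same class, and the top-left corner lies
            # strictly above and to the left of the bottom-right corner.
            if c1 == c2 and x1 < x2 and y1 < y2:
                proposals.append((x1, y1, x2, y2, c1))
    return proposals
```

Since every geometrically valid same-class pair becomes a proposal, many of them do not correspond to real objects, which is why the second stage is needed.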

Stage 2: Two-step Classification for Filtering Proposals
There are two steps: proposal binary classification and proposal multi-class classification.

proposal binary classification
Most keypoint proposals are false positives, so we filter them out. We adopt RoIAlign with a 7x7 kernel to extract features from the box feature map, followed by a 32x7x7 convolution that produces a classification score, with 0 for negative samples and 1 for positive samples.
Loss function is defined as:

proposal multi-class classification
We assign a class label to each surviving proposal. As in the first step, we adopt RoIAlign with a 7x7 kernel on the category feature map, followed by a 256x7x7 convolution that produces a C-dimensional vector.
Loss function is defined as:

Overall loss function is:

The weights of all loss terms are equal.

Result

We evaluate CPN on the MS-COCO test-dev dataset.

[Done] SegFix: Model-Agnostic Boundary Refinement for Segmentation

Overview

We propose SegFix, a post-processing scheme that improves the boundaries of segmentation results. Predictions at boundary pixels are unreliable, so we replace them with the predictions of more reliable interior pixels (a plain replacement). SegFix consistently reduces the boundary errors of segmentation. It is also a model-agnostic method, so it can be incorporated into other models.

Method

training

We extract a feature map with the backbone and send it to the binary branch and the direction branch.

binary branch
The binary branch predicts a binary boundary map, with 1 for boundary pixels and 0 for interior pixels, and uses binary cross-entropy loss as the boundary loss.

direction branch
The direction branch predicts a direction map in which each element stores the direction pointing from the boundary pixel to the interior pixel. We use categorical cross-entropy loss as the direction loss.

testing

We mask the direction map with the binary boundary map, then apply the offset branch to generate an offset map, which holds the direction from the boundary pixel to the interior for boundary pixels and 0 for interior pixels.
Finally, we refine the coarse label map using the offset map: each boundary pixel takes the label of the interior pixel that its offset points to.
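
The final replacement step can be sketched as a simple lookup. This is an illustration under assumptions: the function name `refine_labels` is hypothetical, offsets are given as integer `(dy, dx)` pairs, and out-of-range positions are clamped to the image border.

```python
def refine_labels(labels, offsets):
    """Refine a coarse label map with a per-pixel offset map.

    labels:  H x W nested list of class ids (coarse prediction).
    offsets: H x W nested list of (dy, dx) integer offsets,
             (0, 0) for interior pixels.
    """
    h, w = len(labels), len(labels[0])
    refined = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            dy, dx = offsets[y][x]
            # Clamp the source position to stay inside the image.
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            # Always read from the original map, never from refined.
            refined[y][x] = labels[sy][sx]
    return refined
```

Interior pixels have a zero offset and keep their own label, so only the pixels flagged as boundary are changed. Because the function reads only the label map, it works on the output of any segmentation model.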

Result

semantic segmentation

Boundary F-score on Cityscapes validation dataset

mean IOU on Cityscapes test dataset

instance segmentation

mask AP on Cityscapes test dataset
