The objective of this project was to train a fully convolutional network (FCN) to perform semantic segmentation and allow target tracking on a quadcopter. The final average IoU obtained after 22 epochs of training is 0.475.
All files and their contents are described below:
- writeup.md: This writeup, describing the project and its results.
- code/model_training.ipynb: Notebook containing the network architecture and used for training.
- code/data_augmentation.ipynb: Contains code used to perform horizontal flipping on the images and double the dataset size.
- data/weights: Weights of the trained networks.
- code/keras_viz_dependencies.txt: Requirements for keras.utils.vis_utils.plot_model.
- docs/misc/model_architecture.png: Picture of the network architecture.
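The horizontal-flip augmentation used to double the dataset can be sketched with NumPy as follows. The function and argument names are illustrative, not the notebook's actual code; the only assumption is that images and masks are stacked as (N, H, W, …) arrays, so flipping the width axis keeps each image aligned with its mask:

```python
import numpy as np

def augment_with_hflip(images, masks):
    """Double a dataset by appending horizontally flipped copies.

    images: (N, H, W, C) array; masks: (N, H, W) or (N, H, W, K) array.
    Axis 2 is the width axis in both layouts, so the same slice flips both.
    """
    flipped_images = images[:, :, ::-1]  # reverse the width axis
    flipped_masks = masks[:, :, ::-1]
    return (np.concatenate([images, flipped_images], axis=0),
            np.concatenate([masks, flipped_masks], axis=0))
```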
We will define blocks using the following nomenclature: BLOCK(kernelWxkernelH, stride, depth) when relevant. Our network is composed of an assembly of 5 main building blocks:

- encoder blocks: ENCODER(depth) = [ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ]
- decoder blocks: DECODER(concat, depth) = [ BILINEAR_UPSAMPLE(2x2) -> CONCATENATE(concat) -> [ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ] * 2 ]
- downsampling blocks: DOWNSAMPLE = [ MAXPOOL(2x2, 2) ] OR [ ENCODER(2x2, 2) ]
- dropout blocks: DROP = [ SPATIAL_DROPOUT2D ]
- 1x1 convolutional block: 1x1CONV2D(depth) = [ CONV2D(1x1, 1) -> RELU -> BATCHNORM ]
The picture of the final architecture is available at docs/misc/model_architecture.png. This architecture is inspired by SegNet.
The full architecture of the final model, using the previous notation, is as follows:
[INPUT] -> [ENCODER(32) -> DOWNSAMPLE -> DROP] -> [ENCODER(64) -> DOWNSAMPLE -> DROP] -> [ENCODER(128) -> DOWNSAMPLE -> DROP] -> [1x1CONV2D(256)] -> [DECODER(encoder128, 128) -> DROP] -> [DECODER(encoder64, 64) -> DROP] -> [DECODER(encoder32, 32) -> DROP] -> [OUTPUT]
Note that we used MAXPOOL for all downsampling in the final model.
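A minimal Keras sketch of this architecture is given below. The dropout rate, input shape, and the final OUTPUT layer (assumed here to be a 1x1 softmax convolution) are assumptions not stated in the text; only the block structure and depths follow the notation above:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def encoder(x, depth):
    # ENCODER(depth) = SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM
    x = layers.SeparableConv2D(depth, 3, padding='same', activation='relu')(x)
    return layers.BatchNormalization()(x)

def decoder(x, skip, depth):
    # DECODER = BILINEAR_UPSAMPLE(2x2) -> CONCATENATE(skip)
    #           -> [ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ] * 2
    x = layers.UpSampling2D(2, interpolation='bilinear')(x)
    x = layers.Concatenate()([x, skip])
    for _ in range(2):
        x = layers.SeparableConv2D(depth, 3, padding='same',
                                   activation='relu')(x)
        x = layers.BatchNormalization()(x)
    return x

def build_model(input_shape=(160, 160, 3), num_classes=3, drop=0.25):
    inputs = layers.Input(shape=input_shape)
    skips, x = [], inputs
    for depth in (32, 64, 128):
        x = encoder(x, depth)
        skips.append(x)                      # saved before downsampling
        x = layers.MaxPooling2D(2)(x)        # DOWNSAMPLE (MAXPOOL variant)
        x = layers.SpatialDropout2D(drop)(x) # DROP
    # 1x1CONV2D(256) = CONV2D(1x1, 1) -> RELU -> BATCHNORM
    x = layers.Conv2D(256, 1, activation='relu')(x)
    x = layers.BatchNormalization()(x)
    for depth, skip in zip((128, 64, 32), reversed(skips)):
        x = decoder(x, skip, depth)
        x = layers.SpatialDropout2D(drop)(x)
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return Model(inputs, outputs)
```

The output keeps the input resolution, as the three 2x2 downsampling steps are undone by the three bilinear upsampling steps in the decoders.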
Total params: 142,814
Trainable params: 140,958
Non-trainable params: 1,856
Evaluation Set size:

| situation | following | not visible | far away |
|---|---|---|---|
| number of samples | 542 | 270 | 322 |
Average IoU:

| aIoU for \ situation | following | not visible | far away |
|---|---|---|---|
| background | 0.996124 | 0.989919 | 0.997138 |
| people | 0.423597 | 0.793352 | 0.51175 |
| hero | 0.92578 | 0 | 0.310087 |
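The per-class scores above are averages of the intersection-over-union computed on each evaluation image. A minimal NumPy sketch for a single pair of integer label maps (the function name is illustrative; the convention of returning 0 when the class appears in neither map is an assumption):

```python
import numpy as np

def class_iou(pred, truth, cls):
    """Intersection-over-union of one class between two integer label maps."""
    p = (pred == cls)
    t = (truth == cls)
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 0.0  # class absent from both maps (assumed convention)
    return np.logical_and(p, t).sum() / union
```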
Confusion Table:

| confusion \ situation | following | not visible | far away |
|---|---|---|---|
| true positive | 539 | 0 | 155 |
| false positive | 0 | 60 | 2 |
| false negative | 0 | 0 | 146 |
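From these counts one can derive per-situation precision and recall for hero detection. This is a small illustrative script, assuming the frame-level TP/FP/FN counting convention of the project's evaluation code:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts; 0.0 when a denominator is empty."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Counts copied from the confusion table above.
for name, (tp, fp, fn) in {
    'following': (539, 0, 0),
    'not visible': (0, 60, 0),
    'far away': (155, 2, 146),
}.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```

As expected, the model is near-perfect while following, while frames with the hero far away dominate both the false positives and the false negatives.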
Example predictions: