robond-fcn-segmentation's Introduction

Project: Follow Me


The objective of this project is to train a fully convolutional network (FCN) to perform semantic segmentation, so that a quadcopter can track a target. The final average IoU obtained after 22 epochs of training is 0.475.

Structure of the project:

All the files and their contents are described below:


Network Architecture

We define blocks using the following nomenclature: BLOCK(kernelWxkernelH, stride, depth), with arguments omitted when they are not relevant. The network is assembled from 5 main building blocks:

  • encoder blocks : ENCODER(depth)=[ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ]

  • decoder blocks : DECODER(concat, depth)=[ BILINEAR_UPSAMPLE(2x2) -> CONCATENATE(concat) -> [ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ] * 2 ]

  • downsampling blocks : DOWNSAMPLE=[MAXPOOL(2x2, 2)] OR [ENCODER(2x2, 2)]

  • dropout blocks : DROP=[SPATIAL_DROPOUT2D]

  • 1x1 convolutional block : 1x1CONV2D(depth)=[CONV2D(1x1, 1) -> RELU -> BATCHNORM]

The picture of the final architecture is available here. This architecture is inspired by SegNet.

The full architecture of the final model, using the previous notation, is as follows:

[INPUT] -> [ENCODER(32) -> DOWNSAMPLE -> DROP] -> [ENCODER(64) -> DOWNSAMPLE -> DROP] -> [ENCODER(128) -> DOWNSAMPLE -> DROP] -> [1x1CONV2D(256)] -> [DECODER(encoder128, 128) -> DROP] -> [DECODER(encoder64, 64) -> DROP] -> [DECODER(encoder32, 32) -> DROP] -> [OUTPUT]

Note that the final model uses MAXPOOL for all downsampling.
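To make the data flow of the architecture above easier to follow, here is a minimal sketch that traces feature-map shapes through each stage. It is not the project's code. The 160x160x3 input size, the 3 output classes, "same" padding in every convolution, and the function name `trace_shapes` are assumptions made for illustration. The skip connections are taken from each ENCODER output before its DOWNSAMPLE step, which is the only point where the spatial sizes line up with the upsampled decoder input.

```python
def trace_shapes(height=160, width=160, channels=3, num_classes=3):
    """Return a list of (stage_name, (H, W, C)) for the described network.

    Input size and class count are illustrative assumptions; height and
    width must be divisible by 8 (three 2x2 max-pool steps).
    """
    h, w, c = height, width, channels
    shapes = [("input", (h, w, c))]
    skips = []

    # Encoder path: ENCODER(depth) -> DOWNSAMPLE -> DROP
    for depth in (32, 64, 128):
        c = depth                          # 3x3 separable conv, stride 1, same padding
        shapes.append((f"encoder{depth}", (h, w, c)))
        skips.append((h, w, c))            # kept for the decoder's CONCATENATE
        h, w = h // 2, w // 2              # MAXPOOL(2x2, 2); dropout keeps the shape
        shapes.append((f"downsample{depth}", (h, w, c)))

    # Bottleneck: 1x1CONV2D(256)
    c = 256
    shapes.append(("1x1conv256", (h, w, c)))

    # Decoder path: DECODER(skip, depth) -> DROP
    for depth in (128, 64, 32):
        h, w = h * 2, w * 2                # BILINEAR_UPSAMPLE(2x2)
        skip_h, skip_w, skip_c = skips.pop()
        assert (skip_h, skip_w) == (h, w), "skip and upsampled maps must align"
        shapes.append((f"decoder{depth}_concat", (h, w, c + skip_c)))
        c = depth                          # two 3x3 separable convs, both to `depth`
        shapes.append((f"decoder{depth}", (h, w, c)))

    # Output layer: one score per class at full resolution
    shapes.append(("output", (h, w, num_classes)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:20s} {shape}")
```

Under these assumptions the bottleneck sees a 20x20x256 map, and each decoder concatenates its upsampled input with the encoder output of matching resolution.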

Total params: 142,814
Trainable params: 140,958
Non-trainable params: 1,856
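As a sanity check on the parameter totals above, the following sketch counts weights layer by layer. It assumes Keras-style counting (a separable convolution has a depthwise kernel, a pointwise kernel and a bias; batch normalization has two trainable and two non-trainable values per channel), a 3-channel input, and a final 1x1 convolution to 3 classes. None of these details are stated in the text, and the helper names are hypothetical, but under these assumptions the totals match the figures reported above.

```python
def separable_conv_params(c_in, c_out, k=3):
    """Depthwise kxk kernel per input channel + 1x1 pointwise kernel + bias."""
    return k * k * c_in + c_in * c_out + c_out


def conv_params(c_in, c_out, k=1):
    """Regular convolution: kxk kernel over all channel pairs + bias."""
    return k * k * c_in * c_out + c_out


def count_params(in_channels=3, num_classes=3):
    """Return (trainable, non_trainable) for the described architecture."""
    trainable = 0
    bn_channels = 0  # every conv in a block is followed by BATCHNORM

    # Encoders: one separable conv + batch norm each
    c = in_channels
    encoder_depths = (32, 64, 128)
    for depth in encoder_depths:
        trainable += separable_conv_params(c, depth)
        bn_channels += depth
        c = depth

    # 1x1 convolutional block
    trainable += conv_params(c, 256, k=1)
    bn_channels += 256
    c = 256

    # Decoders: concatenate with the matching encoder output, then two
    # separable convs, each followed by batch norm
    for depth, skip_c in zip((128, 64, 32), reversed(encoder_depths)):
        trainable += separable_conv_params(c + skip_c, depth)
        trainable += separable_conv_params(depth, depth)
        bn_channels += 2 * depth
        c = depth

    # Assumed output layer: 1x1 convolution to num_classes
    trainable += conv_params(c, num_classes, k=1)

    trainable += 2 * bn_channels       # batch norm gamma and beta
    non_trainable = 2 * bn_channels    # batch norm moving mean and variance
    return trainable, non_trainable


if __name__ == "__main__":
    trainable, non_trainable = count_params()
    print("Trainable params:", trainable)
    print("Non-trainable params:", non_trainable)
    print("Total params:", trainable + non_trainable)
```

Pooling, upsampling, concatenation and dropout layers carry no weights, so they do not appear in the count.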
Results:

Evaluation Set size:

| situation | following | not visible | far away |
|---|---|---|---|
| number of samples | 542 | 270 | 322 |

Average IoU:

| aIoU for \ situation | following | not visible | far away |
|---|---|---|---|
| background | 0.996124 | 0.989919 | 0.997138 |
| people | 0.423597 | 0.793352 | 0.51175 |
| hero | 0.92578 | 0 | 0.310087 |
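For reference, the intersection over union (IoU) of a single class can be computed from a predicted and a target binary mask as sketched below. This is a plain-Python illustration, not the project's evaluation code. The function name `iou` and the convention of returning 0.0 when both masks are empty are assumptions.

```python
def iou(pred_mask, target_mask):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred_mask, target_mask):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: an empty union (class absent from both masks) scores 0.0
    return intersection / union if union else 0.0


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    print(iou(pred, target))  # 2 shared pixels out of 4 covered pixels
```

When the class is absent from the target but some pixels are predicted, the intersection is empty and the IoU is 0, which is how a class such as the hero can score 0 in a situation where it is not visible.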

Confusion Table:

| confusion \ situation | following | not visible | far away |
|---|---|---|---|
| true positive | 539 | 0 | 155 |
| false positive | 0 | 60 | 2 |
| false negative | 0 | 0 | 146 |
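The text does not say how the headline figure of 0.475 is derived from these tables. One reading that reproduces it, offered here as an assumption, is to average the hero IoU over the "following" and "far away" situations and weight it by the detection rate TP / (TP + FP + FN) summed over all three situations. The sketch below uses only numbers from the tables above; the function name `weighted_score` is hypothetical.

```python
def weighted_score(hero_iou, confusion):
    """Mean hero IoU weighted by the overall detection rate.

    hero_iou:  IoU values for the situations where the hero is present.
    confusion: (true_pos, false_pos, false_neg) tuples, one per situation.
    """
    true_pos = sum(tp for tp, _, _ in confusion)
    false_pos = sum(fp for _, fp, _ in confusion)
    false_neg = sum(fn for _, _, fn in confusion)
    weight = true_pos / (true_pos + false_pos + false_neg)
    mean_iou = sum(hero_iou) / len(hero_iou)
    return mean_iou * weight


if __name__ == "__main__":
    hero_iou = [0.92578, 0.310087]   # following, far away
    confusion = [
        (539, 0, 0),                 # following
        (0, 60, 0),                  # not visible
        (155, 2, 146),               # far away
    ]
    print(round(weighted_score(hero_iou, confusion), 3))
```

With these inputs the weight is 694 / 902 and the mean hero IoU is about 0.618, giving a score that rounds to 0.475.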

Example predictions:

left: input image, middle: target mask, right: output mask.
