The objective of this project was to train a fully convolutional network (FCN) to perform semantic segmentation and allow target tracking on a quadcopter. The final average IoU obtained after 22 epochs of training is 0.475.
All files and their contents are described below:
- writeup.md: This writeup, describing the project and its results.
- code/model_training.ipynb: Notebook containing the network architecture and used for training.
- code/data_augmentation.ipynb: Contains code used to perform horizontal flipping on the images and double the dataset size.
- data/weights: Weights of the trained networks.
- code/keras_viz_dependencies.txt: Requirements for keras.utils.vis_utils.plot_model.
- docs/misc/model_architecture.png: Picture of the network architecture.
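The horizontal-flip augmentation used to double the dataset can be sketched with NumPy as follows. The function and argument names are illustrative, not the notebook's actual code; the only assumption is that images and masks are stacked as (N, H, W, …) arrays, so flipping the width axis keeps each image aligned with its mask:

```python
import numpy as np

def augment_with_hflip(images, masks):
    """Double a dataset by appending horizontally flipped copies.

    images: (N, H, W, C) array; masks: (N, H, W) or (N, H, W, K) array.
    Axis 2 is the width axis in both layouts, so the same slice flips both.
    """
    flipped_images = images[:, :, ::-1]  # reverse the width axis
    flipped_masks = masks[:, :, ::-1]
    return (np.concatenate([images, flipped_images], axis=0),
            np.concatenate([masks, flipped_masks], axis=0))
```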
We will define blocks using the following nomenclature: BLOCK(kernelWxkernelH, stride, depth) when relevant. Our network is composed of an assembly of 5 main building blocks:

- encoder blocks: ENCODER(depth) = [ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ]
- decoder blocks: DECODER(concat, depth) = [ BILINEAR_UPSAMPLE(2x2) -> CONCATENATE(concat) -> [ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ] * 2 ]
- downsampling blocks: DOWNSAMPLE = [ MAXPOOL(2x2, 2) ] OR [ ENCODER(2x2, 2) ]
- dropout blocks: DROP = [ SPATIAL_DROPOUT2D ]
- 1x1 convolutional block: 1x1CONV2D(depth) = [ CONV2D(1x1, 1) -> RELU -> BATCHNORM ]
The picture of the final architecture is available at docs/misc/model_architecture.png. This architecture is inspired by SegNet.
The full architecture of the final model, using the previous notation, is as follows:
[INPUT] -> [ENCODER(32) -> DOWNSAMPLE -> DROP] -> [ENCODER(64) -> DOWNSAMPLE -> DROP] -> [ENCODER(128) -> DOWNSAMPLE -> DROP] -> [1x1CONV2D(256)] -> [DECODER(encoder128, 128) -> DROP] -> [DECODER(encoder64, 64) -> DROP] -> [DECODER(encoder32, 32) -> DROP] -> [OUTPUT]
Note that we used MAXPOOL for all downsampling in the final model.
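A minimal Keras sketch of this architecture is given below. The dropout rate, input shape, and the final OUTPUT layer (assumed here to be a 1x1 softmax convolution) are assumptions not stated in the text; only the block structure and depths follow the notation above:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def encoder(x, depth):
    # ENCODER(depth) = SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM
    x = layers.SeparableConv2D(depth, 3, padding='same', activation='relu')(x)
    return layers.BatchNormalization()(x)

def decoder(x, skip, depth):
    # DECODER = BILINEAR_UPSAMPLE(2x2) -> CONCATENATE(skip)
    #           -> [ SEPARABLE_CONV2D(3x3, 1) -> RELU -> BATCHNORM ] * 2
    x = layers.UpSampling2D(2, interpolation='bilinear')(x)
    x = layers.Concatenate()([x, skip])
    for _ in range(2):
        x = layers.SeparableConv2D(depth, 3, padding='same',
                                   activation='relu')(x)
        x = layers.BatchNormalization()(x)
    return x

def build_model(input_shape=(160, 160, 3), num_classes=3, drop=0.25):
    inputs = layers.Input(shape=input_shape)
    skips, x = [], inputs
    for depth in (32, 64, 128):
        x = encoder(x, depth)
        skips.append(x)                      # saved before downsampling
        x = layers.MaxPooling2D(2)(x)        # DOWNSAMPLE (MAXPOOL variant)
        x = layers.SpatialDropout2D(drop)(x) # DROP
    # 1x1CONV2D(256) = CONV2D(1x1, 1) -> RELU -> BATCHNORM
    x = layers.Conv2D(256, 1, activation='relu')(x)
    x = layers.BatchNormalization()(x)
    for depth, skip in zip((128, 64, 32), reversed(skips)):
        x = decoder(x, skip, depth)
        x = layers.SpatialDropout2D(drop)(x)
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(x)
    return Model(inputs, outputs)
```

The output keeps the input resolution, as the three 2x2 downsampling steps are undone by the three bilinear upsampling steps in the decoders.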
Total params: 142,814
Trainable params: 140,958
Non-trainable params: 1,856
Evaluation Set size:

| situation | following | not visible | far away |
|---|---|---|---|
| number of samples | 542 | 270 | 322 |
Average IoU:

| aIoU for \ situation | following | not visible | far away |
|---|---|---|---|
| background | 0.996124 | 0.989919 | 0.997138 |
| people | 0.423597 | 0.793352 | 0.51175 |
| hero | 0.92578 | 0 | 0.310087 |
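The per-class scores above are averages of the intersection-over-union computed on each evaluation image. A minimal NumPy sketch for a single pair of integer label maps (the function name is illustrative; the convention of returning 0 when the class appears in neither map is an assumption):

```python
import numpy as np

def class_iou(pred, truth, cls):
    """Intersection-over-union of one class between two integer label maps."""
    p = (pred == cls)
    t = (truth == cls)
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 0.0  # class absent from both maps (assumed convention)
    return np.logical_and(p, t).sum() / union
```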
Confusion Table:

| confusion \ situation | following | not visible | far away |
|---|---|---|---|
| true positive | 539 | 0 | 155 |
| false positive | 0 | 60 | 2 |
| false negative | 0 | 0 | 146 |
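From these counts one can derive per-situation precision and recall for hero detection. This is a small illustrative script, assuming the frame-level TP/FP/FN counting convention of the project's evaluation code:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts; 0.0 when a denominator is empty."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Counts copied from the confusion table above.
for name, (tp, fp, fn) in {
    'following': (539, 0, 0),
    'not visible': (0, 60, 0),
    'far away': (155, 2, 146),
}.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```

As expected, the model is near-perfect while following, while frames with the hero far away dominate both the false positives and the false negatives.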
Example predictions: