
Neural Semantic Segmentation

Implementations of neural network papers for semantic segmentation using Keras and TensorFlow.

Figure: segmentation predictions from Tiramisu on a CamVid video stream.

Installation

To install requirements for the project:

python -m pip install -r requirements.txt

Hardware Specification

Results were generated on a machine with 128GB of RAM, an NVIDIA P100 GPU, and an Intel Xeon CPU @ 2.10GHz. All results shown are from the testing dataset.

CamVid

  • 32 classes generalized to 11 classes using the mapping in 11_class.txt (see the sketch below)
    • the mapped data carry 12 labels; the Void class is ignored, leaving 11
  • 960 x 720 frames scaled down by a factor of 2 to 480 x 360
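
A minimal sketch of this preprocessing, assuming the table in 11_class.txt has been parsed into a Python dict (`RAW_TO_GENERAL`, `generalize_labels`, and `downscale` are illustrative names, not part of this repository):

```python
import numpy as np

# Illustrative subset of the 32 -> 11 class mapping; the real table is
# defined in 11_class.txt, and the Void class is ignored during scoring.
RAW_TO_GENERAL = {0: 0, 1: 0, 2: 4}  # ... remaining entries omitted

def generalize_labels(raw_labels):
    """Map an (H, W) array of raw CamVid label ids onto the 11-class space."""
    lut = np.zeros(32, dtype=np.uint8)
    for raw_id, general_id in RAW_TO_GENERAL.items():
        lut[raw_id] = general_id
    return lut[raw_labels]

def downscale(frame):
    """Scale a 960 x 720 frame down by a factor of 2 to 480 x 360."""
    return frame[::2, ::2]
```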

Models

SegNet

Figure: SegNet architecture and pooling indexes.

The following table describes training hyperparameters.

| Crop Size | Epochs | Batch Size | Patience | Optimizer | α    | 𝛃   | α Decay |
|-----------|--------|------------|----------|-----------|------|-----|---------|
| 352 x 480 | 200    | 8          | 50       | SGD       | 1e-3 | 0.9 | 0.95    |
  • batch normalization statistics are computed per batch during training and via a rolling average over input batches for validation and testing
    • the original paper uses static statistics computed over the training data
  • encoder transfer learning from VGG16 trained on ImageNet
  • the best model in terms of validation accuracy is kept as the final model
  • median frequency balancing of class labels (Eigen et al. 2014); a sketch of the weighting follows this list
    • weighted categorical cross-entropy loss function
  • local contrast normalization of inputs (Jarrett et al. 2009)
  • pooling indexes (Badrinarayanan et al. 2015)
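
A minimal sketch of the median frequency balancing used to weight the loss (illustrative code, not the repository's implementation; it assumes every class appears somewhere in the training labels):

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes=11):
    """Class weights per Eigen et al. (2014): w_c = median_freq / freq_c,
    where freq_c is the pixel frequency of class c over the images in
    which c appears."""
    pixels = np.zeros(num_classes)
    totals = np.zeros(num_classes)
    for labels in label_maps:
        for c in range(num_classes):
            count = np.sum(labels == c)
            if count:
                pixels[c] += count
                totals[c] += labels.size
    freq = pixels / totals
    return np.median(freq) / freq
```

The resulting vector is passed as the per-class weights of the weighted categorical cross-entropy loss.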

Quantitative Results

The following table outlines the testing results from SegNet.

| Metric                  | Value    |
|-------------------------|----------|
| Accuracy                | 0.888625 |
| Mean Per Class Accuracy | 0.722078 |
| Mean I/U                | 0.577455 |
| Bicyclist               | 0.435263 |
| Building                | 0.743735 |
| Car                     | 0.700505 |
| Column/Pole             | 0.254089 |
| Fence                   | 0.385431 |
| Pedestrian              | 0.393298 |
| Road                    | 0.895652 |
| Sidewalk                | 0.747693 |
| Sign                    | 0.219355 |
| Sky                     | 0.888208 |
| Vegetation              | 0.688779 |
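
The per-class rows report intersection over union (I/U) for each label, and the Mean I/U row is their average. A minimal numpy sketch of the metric (not the repository's implementation):

```python
import numpy as np

def per_class_iou(confusion):
    """Per-class I/U from a (C, C) confusion matrix whose rows are true
    labels and whose columns are predicted labels."""
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    return intersection / np.maximum(union, 1)

# Mean I/U is the average of the per-class scores:
# mean_iou = per_class_iou(confusion).mean()
```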

Qualitative Results

Bayesian SegNet

Figure: Bayesian SegNet architecture.

The following table describes training hyperparameters.

| Crop Size | Epochs | Batch Size | Patience | Optimizer | α    | 𝛃   | α Decay | Dropout | Samples |
|-----------|--------|------------|----------|-----------|------|-----|---------|---------|---------|
| 352 x 480 | 200    | 8          | 50       | SGD       | 1e-3 | 0.9 | 0.95    | 50%     | 40      |
  • batch normalization statistics are computed per batch during training and via a rolling average over input batches for validation and testing
    • the original paper uses static statistics computed over the training data
  • encoder transfer learning from VGG16 trained on ImageNet
    • note that VGG16 has no dropout by default; transfer from a Bayesian VGG16 model could improve results
  • the best model in terms of validation accuracy is kept as the final model
  • median frequency balancing of class labels (Eigen et al. 2014)
    • weighted categorical cross-entropy loss function
  • local contrast normalization of inputs (Jarrett et al. 2009)
  • pooling indexes (Badrinarayanan et al. 2015)
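
The Dropout and Samples hyperparameters above implement Monte Carlo Dropout inference (Kendall et al. 2015): dropout stays active at test time and the Softmax outputs are averaged over stochastic forward passes. A minimal sketch written against the modern tf.keras API rather than this repository's code:

```python
import numpy as np

def monte_carlo_predict(model, image, num_samples=40):
    """Average Softmax outputs over stochastic forward passes with
    dropout left active at test time."""
    batch = image[np.newaxis]
    # training=True keeps the dropout layers sampling at inference time
    samples = np.stack([
        model(batch, training=True).numpy()[0] for _ in range(num_samples)
    ])
    mean = samples.mean(axis=0)      # averaged class probabilities
    variance = samples.var(axis=0)   # a simple model-uncertainty proxy
    return mean.argmax(axis=-1), variance
```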

Quantitative Results

The following table outlines the testing results from Bayesian SegNet.

| Metric                  | Value    |
|-------------------------|----------|
| Accuracy                | 0.863547 |
| Mean Per Class Accuracy | 0.769486 |
| Mean I/U                | 0.547227 |
| Bicyclist               | 0.407042 |
| Building                | 0.68995  |
| Car                     | 0.678854 |
| Column/Pole             | 0.206012 |
| Fence                   | 0.376584 |
| Pedestrian              | 0.305958 |
| Road                    | 0.88796  |
| Sidewalk                | 0.727901 |
| Sign                    | 0.155895 |
| Sky                     | 0.888182 |
| Vegetation              | 0.69516  |

Qualitative Results

The One Hundred Layers Tiramisu

Figure: the 103 Layers Tiramisu architecture.

The following table describes training hyperparameters.

| Crop Size | Epochs | Batch Size | Patience | Optimizer | α    | α Decay | Dropout |
|-----------|--------|------------|----------|-----------|------|---------|---------|
| 224 x 224 | 200    | 3          | 100      | RMSprop   | 1e-3 | 0.995   | 20%     |
| 352 x 480 | 200    | 1          | 50       | RMSprop   | 1e-4 | 1.000   | 20%     |
  • random horizontal flips of images during training (see the sketch after this list)
    • the paper says vertical, but the reference implementation clearly uses horizontal flips (likely a typo in the paper); horizontal flips also make more sense for street scenes and produce empirically better test results
  • batch normalization statistics computed per batch during training, validation, and testing
  • skip connections between encoder and decoder (Jégou et al. 2016)
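
Because the flip must be applied to the image and its label map in lockstep, the augmentation amounts to a few lines (an illustrative sketch, not the repository's pipeline):

```python
import numpy as np

def random_horizontal_flip(image, labels, rng=np.random):
    """Flip an (H, W, 3) image and its (H, W) label map together
    with probability 0.5."""
    if rng.rand() < 0.5:
        return image[:, ::-1], labels[:, ::-1]
    return image, labels
```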

Quantitative Results

The following table outlines the testing results from the 103 Layers Tiramisu.

| Metric                  | Value    |
|-------------------------|----------|
| Accuracy                | 0.908092 |
| Mean Per Class Accuracy | 0.716523 |
| Mean I/U                | 0.585788 |
| Bicyclist               | 0.34839  |
| Building                | 0.775576 |
| Car                     | 0.689861 |
| Column/Pole             | 0.312897 |
| Fence                   | 0.261254 |
| Pedestrian              | 0.4299   |
| Road                    | 0.918804 |
| Sidewalk                | 0.802591 |
| Sign                    | 0.253895 |
| Sky                     | 0.91806  |
| Vegetation              | 0.732444 |

Qualitative Results

Bayesian One Hundred Layers Tiramisu

Figure: Bayesian Tiramisu.

Aleatoric Uncertainty

Figure: Aleatoric Tiramisu.

The following table describes training hyperparameters.

| Crop Size | Epochs | Batch Size | Patience | Optimizer | α    | α Decay | Dropout |
|-----------|--------|------------|----------|-----------|------|---------|---------|
| 352 x 480 | 100    | 1          | 10       | RMSprop   | 1e-4 | 1.000   | 20%     |
  • the network is split to predict both targets and loss attenuation
    • a custom loss function trains the second head of the network (Kendall et al. 2017); a sketch follows this list
    • our loss function samples through the Softmax function as the paper describes (though this appears to contradict the mathematics the paper presents); without the Softmax, the loss is unstable and goes negative
  • initialized with the trained weights of the original Tiramisu
  • the pre-trained network is frozen while the head that predicts sigma is trained
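
A minimal numpy sketch of that sampled loss (illustrative shapes and names; the repository trains an equivalent loss in Keras, and the number of noise samples here is arbitrary):

```python
import numpy as np

def aleatoric_loss(logits, log_sigma, onehot, num_samples=40, rng=np.random):
    """Loss attenuation per Kendall et al. (2017): corrupt the logits with
    Gaussian noise scaled by the predicted sigma, push each sample through
    the Softmax, and average the probabilities before taking the log."""
    sigma = np.exp(log_sigma)
    mean_prob = 0.0
    for _ in range(num_samples):
        noisy = logits + sigma * rng.standard_normal(logits.shape)
        noisy = noisy - noisy.max(axis=-1, keepdims=True)  # stabilize exp
        exp = np.exp(noisy)
        mean_prob += exp / exp.sum(axis=-1, keepdims=True)
    mean_prob /= num_samples
    # cross-entropy of the averaged probabilities against the true class
    return -np.log((mean_prob * onehot).sum(axis=-1)).mean()
```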

Quantitative Results

The quantitative results are the same as the standard Tiramisu model.

Qualitative Results

Epistemic Uncertainty

  • initialized with the trained weights of the original Tiramisu
  • 50 samples for Monte Carlo Dropout sampling at test time

Quantitative Results

The following table outlines the testing results from Epistemic Tiramisu.

| Metric                  | Value    |
|-------------------------|----------|
| Accuracy                | 0.881144 |
| Mean Per Class Accuracy | 0.59509  |
| Mean I/U                | 0.506473 |
| Bicyclist               | 0.280771 |
| Building                | 0.734256 |
| Car                     | 0.587708 |
| Column/Pole             | 0.124245 |
| Fence                   | 0.164669 |
| Pedestrian              | 0.322883 |
| Road                    | 0.886696 |
| Sidewalk                | 0.724571 |
| Sign                    | 0.165528 |
| Sky                     | 0.88297  |
| Vegetation              | 0.696909 |

Qualitative Results

Wall Clock Inference Time Metrics

The following box plot describes the mean and standard deviation in wall clock execution time of different segmentation models performing inference on images of size 352 x 480 pixels.

Figure: wall clock inference times (deterministic inference).

The following box plot describes the mean and standard deviation in wall clock execution time of different Bayesian segmentation models performing inference on images of size 352 x 480 pixels. Note that in this case inference is probabilistic due to test-time dropout and the Monte Carlo simulation over 50 network samples.

Figure: wall clock inference times (Monte Carlo Dropout).
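
These figures can be reproduced in spirit with a simple timing harness like the sketch below (not the repository's benchmark code; the number of runs is arbitrary):

```python
import time
import numpy as np

def wall_clock_stats(model, image, runs=50):
    """Mean and standard deviation of single-image inference wall time."""
    batch = image[np.newaxis]
    model.predict(batch)  # warm-up pass so one-time setup cost is excluded
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        model.predict(batch)
        times.append(time.perf_counter() - start)
    return np.mean(times), np.std(times)
```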

References

Badrinarayanan V, Kendall A, Cipolla R (2015) SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. ArXiv e-prints.

Eigen D, Fergus R (2014) Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. ArXiv e-prints.

Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? 2009 IEEE 12th International Conference on Computer Vision, 2146–2153.

Jégou S, Drozdzal M, Vazquez D, Romero A, Bengio Y (2016) The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation. ArXiv e-prints.

Kendall A, Badrinarayanan V, Cipolla R (2015) Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding. ArXiv e-prints.

Kendall A, Gal Y (2017) What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? ArXiv e-prints.
