
Semantic Segmentation

This project is about labeling the pixels of a road in images using a Fully Convolutional Network (FCN). The project is written in Python 3.6 with TensorFlow as the DNN framework; monitoring and debugging are done with TensorBoard and Visual Studio Code.

Results

Test results produced by the project's testing function in helper.py

You can see that some results are really good and some are not. There is, of course, much more work to do to improve the results, but I would start with a larger dataset.

[Test result images: test_result_0 .. test_result_5]

Images and label masks fed into the NN, plotted in TensorBoard. Each window shows three images:

  1. Augmented input image (in these examples you can spot brightness reduction, rotations and a slight blur)
  2. Label mask
  3. NN output

[Screenshot: tensorboard_images]

Training loss during one of the training sessions, plotted in TensorBoard

[Screenshot: training_loss]

Project requirements

Make sure you are using Python 3.x

Install the Python dependencies using requirements.txt. If you are using TensorFlow compiled from source, remove the tensorflow line from the requirements first.

pip install --upgrade -r requirements.txt

Or install the Python dependencies manually:
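
The authoritative list is requirements.txt; purely as an educated guess based on the libraries referenced in this README (TensorFlow and OpenCV for cv2 are mentioned, NumPy is assumed), a manual installation would look something like:

pip install tensorflow numpy opencv-python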

Dataset

Download the Kitti Road dataset from here. Extract the dataset into the data folder. This will create the folder data_road with all the training and test images.
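
Assuming the downloaded archive is named data_road.zip (check the actual file name), the extraction could look like:

    mkdir -p data
    unzip data_road.zip -d data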

Running the project

Run python main.py --help to see the project options. The output should look like this:

usage: main.py [-h] [--image_shape IMAGE_SHAPE [IMAGE_SHAPE ...]]
               [--num_classes NUM_CLASSES] [--epochs EPOCHS]
               [--batch_size BATCH_SIZE] [--learning_rate LEARNING_RATE]
               [--data_dir DATA_DIR] [--runs_dir RUNS_DIR]
               [--test_name TEST_NAME] [--chk_path CHK_PATH]
               [--pb_path PB_PATH] [--mode MODE]

optional arguments:
  -h, --help            show this help message and exit
  --image_shape IMAGE_SHAPE [IMAGE_SHAPE ...]
                        Resized image shape which will be used as input for
                        neural net.
  --num_classes NUM_CLASSES
                        Number of classes.
  --epochs EPOCHS       Number of epochs.
  --batch_size BATCH_SIZE
                        Number of batches.
  --learning_rate LEARNING_RATE
                        Optimizer initial learning rate.
  --data_dir DATA_DIR   Data directory path.
  --runs_dir RUNS_DIR   Runs directory path.
  --test_name TEST_NAME
                        Test name, used when create log dir with summaries as
                        prefix
  --chk_path CHK_PATH   Re-save checkpoint path for optimization. If not set
                        then won't save anything.
  --pb_path PB_PATH     Path to optimized FCN model for inferece.
  --mode MODE           Run code in possible modes:
                        --mode train : Will train and save mode. Afterwards test and save results.
                        --mode inference_model : Will re-save checkpoint path for optimization. For this --chk_path must be provided
                        --mode inference_test : Will run inference model on test video. --pb_path must be provided
                        --mode project_test : Will only perform project unit tests.

The descriptions should be clear enough to start working with the code; however, I recommend first running main.py in test mode, which will only perform the project unit tests and, if necessary, download some required files:

python main.py --mode project_test

Training

Run

To train the FCN using custom parameters, run:

python main.py --mode train --test_name MyFirstTest --learning_rate 1e-6 --batch_size 10 --epochs 25 --num_classes 2 --image_shape 160 576 3

Note that --image_shape is not the original image shape but the shape the original will be resized to before being fed to the NN.

Once training is completed, the FCN model will be saved in the ./data/vgg_fcn/ directory, the script will run the helper.py gen_test_output test, and the results will be saved in the ./runs directory.

Monitoring

While training, the terminal will output the loss and some other useful information, but you may want more insight into what is going on. For this I've created TensorFlow summaries that are updated while the model is being trained.

Run tensorboard from any directory by providing the full path to the logdir (created by main.py):

tensorboard --logdir=/full/path/to/logdir/

TensorBoard will then print a link that you need to open in a browser. That's it: you can now check how the loss changes over time, inspect the images that are fed into the NN, and more (just add more summaries).
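
Adding a summary only takes a couple of lines of TF 1.x code. The snippet below is an illustrative sketch, not the exact code in main.py; the tensor names and the loss expression are placeholders:

    import tensorflow as tf

    # Illustrative placeholders; in main.py these would be the real graph tensors.
    input_images = tf.placeholder(tf.float32, [None, 160, 576, 3], name="image_input")
    loss = tf.reduce_mean(tf.square(input_images))  # stand-in for the real cross-entropy loss

    # Summaries picked up by TensorBoard.
    tf.summary.scalar("loss", loss)
    tf.summary.image("augmented_input", input_images, max_outputs=3)

    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter("/full/path/to/logdir", tf.get_default_graph())
    # In the training loop: summary = sess.run(merged, feed_dict=...); writer.add_summary(summary, step)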

Note that summaries are flushed to the logdir every 60 seconds, so you may need to wait a little bit for the loss and images to appear in TensorBoard.

Optimization

Before optimization we must create a .pbtxt model description. Run:

python main.py --mode inference_model --chk_path path-to-chk

Once that is done we can optimize the trained model for inference. Edit all the paths in the optimization script and run bash optimize_for_inference.sh. This script will create new models in protobuf (.pb) format.
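
For reference, the TF 1.x command-line tools that such a script typically wraps look like this; the paths and node names below are placeholders, not the project's actual values:

    python -m tensorflow.python.tools.freeze_graph \
        --input_graph=path/to/model.pbtxt \
        --input_checkpoint=path/to/model.ckpt \
        --output_node_names=logits \
        --output_graph=path/to/frozen.pb

    python -m tensorflow.python.tools.optimize_for_inference \
        --input=path/to/frozen.pb \
        --output=path/to/optimized.pb \
        --frozen_graph=True \
        --input_names=image_input \
        --output_names=logits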

NB! Currently the generated .pbtxt file ends up being huge because we use tf.saved_model.loader.load and all the variables there are constants, so model.pbtxt contains the model description as well as all the constant weights. Because of this I wasn't able to run the optimization; it failed with a memory error (consumed all 64 GB of RAM).

TODO: Restore the model with TF code and all weights as variables, not constants.
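
One way to approach that TODO is to rebuild the graph from the checkpoint's .meta file, restore the weights as variables, and only fold them into constants at export time. A hedged sketch, with the checkpoint path and output node name as placeholders:

    import tensorflow as tf

    with tf.Session() as sess:
        # Rebuild the graph and restore the weights as variables
        # (instead of loading them as constants via tf.saved_model.loader.load).
        saver = tf.train.import_meta_graph("path-to-chk.meta")
        saver.restore(sess, "path-to-chk")

        # Only at export time are the variables folded into constants.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph.as_graph_def(), ["logits"])  # placeholder output node name

        with tf.gfile.GFile("frozen_model.pb", "wb") as f:
            f.write(frozen.SerializeToString())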

Run Inference on Video

TODO: Run the inference model on a video and monitor timing and visual results.

python main.py --mode inference_test --pb_path path-to-optimized-protobuf-model
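
A rough sketch of what that inference loop could look like with OpenCV and a frozen graph; the tensor names, model path and video path are placeholders, not the project's exact ones:

    import cv2
    import numpy as np
    import tensorflow as tf

    # Load the optimized protobuf model (path is a placeholder).
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("optimized_model.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
        image_in = graph.get_tensor_by_name("image_input:0")  # placeholder tensor name
        logits = graph.get_tensor_by_name("logits:0")          # placeholder tensor name

    cap = cv2.VideoCapture("test_video.mp4")  # placeholder video path
    with tf.Session(graph=graph) as sess:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            resized = cv2.resize(frame, (576, 160))  # width x height expected by the net
            out = sess.run(logits, feed_dict={image_in: resized[np.newaxis]})
            # ... colour the road pixels and write/display the frame here
    cap.release()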

Description

[Figures: VGG16 in numbers; VGG16 / FCN-8s net architecture]
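
The network follows the FCN-8s pattern on top of a VGG16 encoder: 1x1 convolutions on the encoder outputs, transposed convolutions for upsampling, and skip connections from the earlier pooling layers. A minimal TF 1.x sketch of that wiring (layer names and kernel sizes are illustrative, not taken from this project's code):

    import tensorflow as tf

    def fcn8s_decoder(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes):
        # 1x1 convolutions reduce each VGG output to num_classes channels.
        l7 = tf.layers.conv2d(vgg_layer7_out, num_classes, 1, padding='same')
        l4 = tf.layers.conv2d(vgg_layer4_out, num_classes, 1, padding='same')
        l3 = tf.layers.conv2d(vgg_layer3_out, num_classes, 1, padding='same')

        # Upsample x2 and add the layer 4 skip connection.
        up1 = tf.layers.conv2d_transpose(l7, num_classes, 4, strides=2, padding='same')
        up1 = tf.add(up1, l4)

        # Upsample x2 again and add the layer 3 skip connection.
        up2 = tf.layers.conv2d_transpose(up1, num_classes, 4, strides=2, padding='same')
        up2 = tf.add(up2, l3)

        # Final x8 upsampling back to the input resolution.
        return tf.layers.conv2d_transpose(up2, num_classes, 16, strides=8, padding='same')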

Input image augmentation

In our case we have a small dataset, so we need to deal with overfitting. One effective way to do that is to augment the images before they reach the NN. For example, flipping an image vertically gives the NN a completely new input; this alone doubles the dataset size. You can check all the augmentation functions and their descriptions in augmentation.py and how they are used in helper.py (a short sketch of two of them follows the list below).

  • random_brightness: randomly adds or subtracts pixel values in the range -50 .. 40, applied to the whole batch
  • random_noise: 50% chance that random Gaussian noise will be applied to the batch
  • random_blur: randomly blurs a single image with cv2.GaussianBlur() in the range 0 .. 5
  • random_flip: 50% chance that a single image and its corresponding mask will be flipped vertically
  • random_shifts: randomly shifts a single image and its corresponding mask up or down and to the left or right. Horizontal shifts -20 .. 20 px, vertical shifts -35 .. 35 px
  • random_rotations: randomly rotates a single image and its corresponding mask in the range -6 .. 6 degrees
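
The actual implementations live in augmentation.py; purely as an illustration of the style (not the project's exact code), two of them might look roughly like this:

    import cv2
    import numpy as np

    def random_brightness(batch, low=-50, high=40):
        # Shift all pixel values of the batch by one random amount, then clip to [0, 255].
        shift = np.random.randint(low, high + 1)
        return np.clip(batch.astype(np.int16) + shift, 0, 255).astype(np.uint8)

    def random_blur(image):
        # Blur a single image with a randomly sized Gaussian kernel
        # (kernel size must be odd; 1 means effectively no blur).
        k = int(np.random.choice([1, 3, 5]))
        return cv2.GaussianBlur(image, (k, k), 0)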
