Lyft perception challenge

This is the writeup for the 7th submission from ymlai87416.

My final ranking is 57 / 155

Background

The goal of this challenge is pixel-wise identification of objects in camera images. In other words, your task is to identify exactly what is in each pixel of an image! More specifically, you'll be identifying cars and the drivable area of the road. The images below are a simulated camera image on the left and a label image on the right, where each different type of object in the image corresponds to a different color.

Scoring criteria

Repository structure

Here is the folder description:

  • data: Contains training images from CARLA.
  • deeplab_pascal: Contains code for training and testing DeepLab.
  • fcn_vgg16: Contains code for training and testing fcn-vgg16.
  • submission: Contains submission.
  • video: For creating video.
  • workspace: Backup of online workspace.

Implementation

What have I done?

This is the 7th submission. In previous submissions, I used FCN8-VGG16 [1] and DeepLab v3+ [2] and obtained the following best scores.

Metric             Previous score   Current score
Final score        79.1547          84.5664
Average F score    0.8587           0.8747
Car F score        0.758            0.8048
Road F score       0.9593           0.9447
FPS                3.289            7.092
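
The numbers in the table can be related to each other with a little arithmetic. The sketch below is an assumption inferred from the table itself, not the official scoring formula: it takes the average F score as the mean of the car and road F scores, and the final score as 100 times that average minus one point for every frame per second below a 10 fps target. The function names and the target_fps parameter are hypothetical.

```python
def average_f(car_f, road_f):
    """Mean of the car and road F scores (assumed definition)."""
    return (car_f + road_f) / 2.0


def estimated_final_score(car_f, road_f, fps, target_fps=10.0):
    """Assumed formula: 100 * average F, minus a penalty for running below target_fps."""
    penalty = max(0.0, target_fps - fps)
    return 100.0 * average_f(car_f, road_f) - penalty


# Values copied from the table above.
previous = estimated_final_score(car_f=0.758, road_f=0.9593, fps=3.289)
current = estimated_final_score(car_f=0.8048, road_f=0.9447, fps=7.092)
print(round(previous, 3), round(current, 3))
```

Under these assumptions both estimates agree with the reported final scores to about two decimal places, which suggests that most of the gain in this submission comes from the higher frame rate and the better car F score.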

Here is one of the results I got from a previous submission.

In this submission, I use the DeepLab v3+ PASCAL VOC model and apply transfer learning to repurpose it for this challenge.

DeepLab v3+

DeepLab v3+ [2] was proposed by Google, and this implementation uses Xception as the backbone. Xception [3], also proposed by Google, is an image classification architecture built on depthwise separable convolutions.

The implementation and the model weights are adopted from the GitHub repo bonlime/keras-deeplab-v3-plus [4].

It is written in Keras. I adopted the model, froze the weights of layers 1 through 356, and trained the rest.
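
A minimal sketch of this kind of layer freezing is shown below. It is not the code from this repository: the helper name freeze_first_layers is hypothetical, and a toy stand-in object replaces the real model so the example runs without Keras installed. It relies only on the fact that a Keras model exposes a layers list whose items have a trainable attribute.

```python
from types import SimpleNamespace


def freeze_first_layers(model, n_frozen):
    """Mark the first n_frozen layers as non-trainable and the rest as trainable."""
    for index, layer in enumerate(model.layers):
        layer.trainable = index >= n_frozen
    return model


# Toy stand-in for a Keras model (assumption: only `.layers` and `.trainable` are needed).
toy_model = SimpleNamespace(
    layers=[SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(5)]
)

freeze_first_layers(toy_model, n_frozen=3)
print([layer.trainable for layer in toy_model.layers])
# prints [False, False, False, True, True]
```

With the real model, the call would be freeze_first_layers(model, 356). In Keras, a change to trainable takes effect once the model is compiled, so the freeze should happen before compiling.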

Bias and Variance

This submission is for proof-of-concept only.

Train, Validate and Test set

The model is trained using 6300+ images; 1000 images are provided by Udacity at this link.

The validation set contains 172 images.

The test set contains 300 images.

Epoch, Regularization

I train the model for 10 epochs. The dropout layer keeps the implementation's default rate of 0.1.

Learning rate

Learning rate = 0.001

Transform the trained Keras model into an inference-optimized form

I use the script provided by the GitHub repo amir-abdi/keras_to_tensorflow [5] to convert my Keras model in h5 format to a frozen TensorFlow model.

I then use optimize_for_inference to further improve the network's inference speed.

Inference rate

I crop the sky and the bottom part of the image to reduce the image size. Resizing reduces accuracy and lowers the frame rate, so I dropped it. I also did some probing on the Tesla K80 card and found that, to achieve 10 fps, the best input size is 192x600, which is 115200 pixels.

With the current configuration of 256x800, each frame is processed in about 0.115 s. The resulting frame rate is around 7 fps.

Inference path

Input image (600 * 800 * 3) => crop image (256 * 800 * 3) => model => predicted label (256 * 800 * 13) => pad image (600 * 800 * 13)
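
The shape flow above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the submission code: the crop offset CROP_TOP is a hypothetical value (the writeup does not say how many rows are cut from the top versus the bottom), dummy_model stands in for the trained network, and the padded rows are filled with zeros.

```python
import numpy as np

FULL_H, FULL_W = 600, 800   # input frame size from the writeup
CROP_H = 256                # cropped height from the writeup
NUM_CLASSES = 13            # channels of the predicted label from the writeup
CROP_TOP = 200              # hypothetical: first row kept after cutting the sky


def crop(image, top=CROP_TOP, height=CROP_H):
    """Keep a horizontal band of `height` rows starting at row `top`."""
    return image[top:top + height, :, :]


def dummy_model(cropped):
    """Stand-in for the network: returns per-pixel scores for NUM_CLASSES classes."""
    h, w, _ = cropped.shape
    scores = np.zeros((h, w, NUM_CLASSES), dtype=np.float32)
    scores[..., 0] = 1.0  # pretend every pixel belongs to class 0
    return scores


def pad_back(scores, top=CROP_TOP, full_height=FULL_H):
    """Zero-pad the prediction so it covers the full frame height again."""
    bottom = full_height - top - scores.shape[0]
    return np.pad(scores, ((top, bottom), (0, 0), (0, 0)), mode="constant")


frame = np.zeros((FULL_H, FULL_W, 3), dtype=np.uint8)
cropped = crop(frame)
scores = dummy_model(cropped)
full = pad_back(scores)
print(cropped.shape, scores.shape, full.shape)
# prints (256, 800, 3) (256, 800, 13) (600, 800, 13)
```

Padding with zeros means the cut-away rows carry no score for any class, so neither cars nor road are predicted there.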

Result

Here is a snapshot of my result. Some of the pedestrian pavement is marked as road, but the cars are segmented much more clearly than in my FCN8-VGG16 implementation.

The trained model is in the release section.

Video result

Validation video: link

Test video: link

Judge test video: link

Reference

[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

[2] Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." arXiv preprint arXiv:1802.02611 (2018).

[3] Chollet, François. "Xception: Deep learning with depthwise separable convolutions." arXiv preprint (2016).

[4] https://github.com/bonlime/keras-deeplab-v3-plus

[5] https://github.com/amir-abdi/keras_to_tensorflow
