MV3D_TF (In progress)

This is an experimental TensorFlow implementation of MV3D, a ConvNet for object detection with Lidar and a monocular camera.

For details about MV3D, please refer to the paper "Multi-View 3D Object Detection Network for Autonomous Driving" by Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia.

Requirements: software

  1. Requirements for TensorFlow 1.0 (see: TensorFlow)

  2. Python packages you might not have: cython, python-opencv, easydict

Requirements: hardware

  1. For training the end-to-end version of Faster R-CNN with VGG16, 3 GB of GPU memory is sufficient (using cuDNN)

Installation

  1. Clone the MV3D_TF repository. The steps below refer to the clone directory as $MV3D.

     # Make sure to clone with --recursive
     git clone --recursive https://github.com/RyannnG/MV3D_TF.git
  2. Build the Cython modules

     cd $MV3D/lib
     make
  3. Download the KITTI object detection dataset.

 % Arrange the KITTI data so that the directory structure looks like this:

 % {kitti_dir}/object/training/image_2
 %                            /image_3
 %                            /calib
 %                            /lidar_bv
 %                            /velodyne

 % {kitti_dir}/object/testing/image_2
 %                           /image_3
 %                           /calib
 %                           /lidar_bv
 %                           /velodyne
  4. Generate the Lidar bird view data

     # Edit kitti_path in tools/read_lidar.py,
     # then generate the data
     python tools/read_lidar.py
  5. Create symlinks for the KITTI dataset

     cd $MV3D/data/KITTI
     ln -s {kitti_dir}/object object
  6. Download the pre-trained ImageNet model ([Google Drive] [Dropbox]) and move it into place

     mv VGG_imagenet.npy $MV3D/data/pretrain_model/VGG_imagenet.npy
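  7. Run the training script

     cd $MV3D
     ./experiments/scripts/mv3d.sh $DEVICE $DEVICE_ID ${.npy/ckpt.meta} kitti_train

DEVICE is either cpu or gpu. The third argument is the path to the initial weights, either a .npy file or a .ckpt.meta checkpoint.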
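
The Lidar bird view step above turns each Velodyne scan into a top-down map. tools/read_lidar.py is the script to use for that; the following is only a minimal sketch of the idea, not the repository's code. It assumes the KITTI scan format (float32 rows of x, y, z, reflectance). The function names, the ranges, the resolution, and the choice of a max-height map plus a point-count map are all illustrative assumptions.

```python
import numpy as np

def load_velodyne(path):
    """Read a KITTI Velodyne scan: float32 rows of (x, y, z, reflectance)."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

def bird_view_maps(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), res=0.1):
    """Rasterize points into a max-height map and a point-count map.

    Rows follow the forward (x) axis and columns the lateral (y) axis.
    Ranges and resolution are hypothetical defaults, in meters.
    """
    points = np.asarray(points, dtype=np.float32)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    # Keep only points inside the rasterized area.
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]

    n_rows = int(round((x_range[1] - x_range[0]) / res))
    n_cols = int(round((y_range[1] - y_range[0]) / res))
    rows = np.minimum(((x - x_range[0]) / res).astype(np.int64), n_rows - 1)
    cols = np.minimum(((y - y_range[0]) / res).astype(np.int64), n_cols - 1)

    # Unbuffered in-place reductions handle several points per cell.
    height = np.full((n_rows, n_cols), -np.inf, dtype=np.float32)
    np.maximum.at(height, (rows, cols), z)
    count = np.zeros((n_rows, n_cols), dtype=np.int64)
    np.add.at(count, (rows, cols), 1)

    height[count == 0] = 0.0  # empty cells get height 0
    return height, count
```

A scan would be passed in as `bird_view_maps(load_velodyne(path))`, where `path` points to one of the .bin files under the velodyne directory.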

Network Structure

Key idea: use the Lidar bird view to generate anchor boxes, then project those boxes onto the image for classification.
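
The projection in the key idea can be illustrated with the KITTI calibration convention, where a Velodyne point is mapped to pixel coordinates through Tr_velo_to_cam, R0_rect and P2 from the calib files. This is a sketch under those assumptions rather than the code used in this repository. The function names are hypothetical, and points behind the camera are not handled.

```python
import numpy as np

def project_velo_to_image(pts_velo, P2, R0_rect, Tr_velo_to_cam):
    """Project Nx3 Velodyne points to Nx2 pixel coordinates.

    P2 is 3x4, R0_rect is 3x3, Tr_velo_to_cam is 3x4 (KITTI calib layout).
    """
    pts_velo = np.asarray(pts_velo, dtype=np.float64)
    n = pts_velo.shape[0]
    pts_h = np.hstack([pts_velo, np.ones((n, 1))])   # Nx4 homogeneous points
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)       # 3xN rectified camera coords
    cam_h = np.vstack([cam, np.ones((1, n))])        # 4xN homogeneous
    img = P2 @ cam_h                                 # 3xN image-plane coords
    return (img[:2] / img[2]).T                      # divide by depth

def corners_to_image_box(corners_velo, P2, R0_rect, Tr_velo_to_cam):
    """Axis-aligned image box (x1, y1, x2, y2) enclosing the projected corners."""
    uv = project_velo_to_image(corners_velo, P2, R0_rect, Tr_velo_to_cam)
    x1, y1 = uv.min(axis=0)
    x2, y2 = uv.max(axis=0)
    return np.array([x1, y1, x2, y2])
```

Taking the enclosing rectangle of the projected corners is one simple way to get an image region for a 3D box; it is a design choice of this sketch.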

[Figure: network structure]

Examples

Image and corresponding Lidar map

Note:

In image:

  • Boxes without regression

In Lidar:

  • white box: without regression (corresponds to the boxes in the image)
  • purple box: with regression

[Figures: example images and corresponding Lidar maps]

Existing Errors

Most of the remaining errors are due to regression error.

[Figure]

(errors in boxes 5, 6, 9)

[Figure]

[Figure]

(errors in boxes 8, 9, 10)

[Figure]

References

Lidar Birds Eye Views

part.2: Didi Udacity Challenge 2017 — Car and pedestrian Detection using Lidar and RGB

Faster_RCNN_TF

Faster R-CNN caffe version

TFFRCNN
