Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency

[ Install | Datasets | Training | Models | Evaluation | Demo | References | License ]

This is the official PyTorch implementation for the system proposed in the paper :

Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency

Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon

AAAI-21 [PDF] [Project]

Unified Visual Odometry : Our holistic visualization of depth and motion estimation from self-supervised monocular training.

If you find our work useful in your research, please consider citing our paper :

@inproceedings{lee2021learning,
  title={Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency},
  author={Lee, Seokju and Im, Sunghoon and Lin, Stephen and Kweon, In So},
  booktitle= {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year={2021}
}

Install

Our code is tested with CUDA 10.2/11.0, Python 3.7.x (conda environment), and PyTorch 1.4.0/1.7.0.

At least 2 GPUs (each 12 GB) are required to train the models with batch_size=4 and maximum_number_of_instances_per_frame=3.

Create a conda environment with the PyTorch library :

conda create -n my_env python=3.7.4 pytorch=1.7.0 torchvision torchaudio cudatoolkit=11.0 -c pytorch
conda activate my_env

Install the prerequisite packages listed in requirements.txt :

pip3 install -r requirements.txt

or manually install the following packages :

opencv-python
imageio
matplotlib
scipy==1.1.0
scikit-image
argparse
tensorboardX
blessings
progressbar2
path
tqdm
pypng
open3d==0.8.0.0

Please install torch-scatter and torch-sparse following this link.

pip3 install torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu110.html
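After installation, it can help to confirm that every package is visible to the interpreter. The following is a minimal sketch using only the standard library; the mapping from pip names to import names reflects the usual import names of these packages, and the helper name missing_packages is hypothetical, not part of this repository.

```python
from importlib.util import find_spec

# pip name -> import name. Several import names differ from the pip names
# (for example, opencv-python is imported as cv2 and pypng as png).
REQUIRED_MODULES = {
    "torch": "torch",
    "opencv-python": "cv2",
    "imageio": "imageio",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "scikit-image": "skimage",
    "tensorboardX": "tensorboardX",
    "blessings": "blessings",
    "progressbar2": "progressbar",
    "path": "path",
    "tqdm": "tqdm",
    "pypng": "png",
    "open3d": "open3d",
    "torch-scatter": "torch_scatter",
    "torch-sparse": "torch_sparse",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be located."""
    return [pip_name for pip_name, module in modules.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages were found.")
```

This only checks that the modules can be located; it does not verify the pinned versions (scipy==1.1.0, open3d==0.8.0.0) or that the CUDA build of PyTorch matches your driver.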

Datasets

We provide our KITTI-VIS and Cityscapes-VIS datasets (download link), which are composed of pre-processed images, auto-annotated instance segmentation, and optical flow.

  • Images are pre-processed with SC-SfMLearner.

  • Instance segmentation is pre-processed with PANet.

  • Optical flow is pre-processed with PWC-Net.

We associate them across frames to perform video instance segmentation, as implemented in datasets/sequence_folders.py.

Please arrange the dataset in the following file structure :

kitti_256 (or cityscapes_256)
    └ image
        └ $SCENE_DIR
    └ segmentation
        └ $SCENE_DIR
    └ flow_f
        └ $SCENE_DIR
    └ flow_b
        └ $SCENE_DIR
    ├ train.txt
    └ val.txt

The training and validation scene lists in train.txt and val.txt can be generated randomly.
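A random split over the file structure above could be produced as follows. This is a sketch under one stated assumption: that train.txt and val.txt each list one $SCENE_DIR name per line. Check datasets/sequence_folders.py for the format the loader actually expects. The function name write_random_split and the val_ratio and seed parameters are hypothetical.

```python
import random
from pathlib import Path

# Modality folders from the file structure above.
SUBDIRS = ("image", "segmentation", "flow_f", "flow_b")


def write_random_split(root, val_ratio=0.1, seed=0):
    """Write train.txt and val.txt under root; return (train, val) scene lists.

    Assumption: each split file lists one scene directory name per line.
    """
    root = Path(root)
    scenes = sorted(p.name for p in (root / "image").iterdir() if p.is_dir())
    # Keep only scenes that are present in every modality folder.
    complete = [s for s in scenes
                if all((root / d / s).is_dir() for d in SUBDIRS)]

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(complete)

    if len(complete) > 1:
        n_val = max(1, round(len(complete) * val_ratio))
        n_val = min(n_val, len(complete) - 1)  # always leave a training scene
    else:
        n_val = 0

    val = sorted(complete[:n_val])
    train = sorted(complete[n_val:])
    (root / "train.txt").write_text("".join(f"{s}\n" for s in train))
    (root / "val.txt").write_text("".join(f"{s}\n" for s in val))
    return train, val
```

For example, write_random_split("kitti_256", val_ratio=0.1) would hold out roughly a tenth of the complete scenes for validation. Scenes missing from any of the four folders are skipped rather than written to either list.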

Training

You can train the models on KITTI-VIS by running :

sh scripts/train_resnet_256_kt.sh

You can train the models on Cityscapes-VIS by running :

sh scripts/train_resnet_256_cs.sh

Please indicate the location of the dataset with $TRAIN_SET.

The hyperparameters (batch size, learning rate, loss weight, etc.) are defined in each script file and as default arguments in train.py. Please also refer to our main paper.

During training, checkpoints will be saved in checkpoints/.

You can also start a tensorboard session by running :

tensorboard --logdir=checkpoints/ --port 8080 --bind_all

and visualize the training progress by opening http://localhost:8080 in your browser.

For convenience, we provide two breakpoints (supported with pdb), commented as BREAKPOINT in train.py. Each breakpoint marks an important stage in projecting the objects.

BREAKPOINT-1 : Breakpoint after the 1st projection with camera motion. Visualize ego-warped images.
BREAKPOINT-2 : Breakpoint after the 2nd projection with each object motion. Visualize fully-warped images and motion fields.

You can visualize the intermediate outputs with the commented code, which makes the code easier to debug.
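The two projection stages behind BREAKPOINT-1 and BREAKPOINT-2 can be illustrated for a single pixel. The following is a simplified geometric sketch, not the code in train.py, which works on dense depth maps and instance masks in PyTorch. All function names, the pinhole intrinsics (fx, fy, cx, cy), and the convention that each motion is a rotation matrix plus a translation applied to the 3D point are assumptions made for illustration.

```python
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame."""
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def transform(point, rotation, translation):
    """Apply a rigid motion (rotation, then translation) to a 3D point."""
    rotated = mat_vec(rotation, point)
    return [rotated[i] + translation[i] for i in range(3)]


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point back onto the image plane."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def two_stage_warp(u, v, depth, intrinsics, ego_motion, object_motion):
    """Return the pixel location after each of the two projection stages.

    ego_motion and object_motion are (rotation, translation) pairs.
    """
    point = backproject(u, v, depth, *intrinsics)
    # Stage 1: camera motion only (the ego-warped result).
    point_ego = transform(point, *ego_motion)
    # Stage 2: the object's own motion applied on top (the fully-warped result).
    point_full = transform(point_ego, *object_motion)
    return project(point_ego, *intrinsics), project(point_full, *intrinsics)


if __name__ == "__main__":
    ego = (IDENTITY, [0.0, 0.0, -2.0])   # point moves 2 units closer to the camera
    obj = (IDENTITY, [1.0, 0.0, 0.0])    # object shifts 1 unit to the right
    ego_px, full_px = two_stage_warp(60.0, 50.0, 10.0,
                                     (100.0, 100.0, 50.0, 50.0), ego, obj)
    print("ego-warped pixel:", ego_px)
    print("fully-warped pixel:", full_px)
```

For a static background pixel the object motion is the identity, so both stages give the same location; the two results differ only on pixels that belong to a moving instance.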

Models

We provide KITTI-VIS and Cityscapes-VIS pretrained models (download link).

The architectures are based on the ResNet18 encoder. See models/ for details.

Models trained under three different conditions are released :

KITTI : Trained on KITTI-VIS using ImageNet (ResNet18) pretrained model.
CS : Trained on Cityscapes-VIS using ImageNet (ResNet18) pretrained model. This model is intended only for pretraining and the demo.
CS+KITTI : Pretrained on Cityscapes-VIS, and finetuned on KITTI-VIS.

Evaluation

We evaluate our depth estimation following the KITTI Eigen split. Evaluation requires the KITTI raw dataset, available from the official website. The test scenes are listed in kitti_eval/test_files_eigen.txt.

You can evaluate the models by running :

sh scripts/run_eigen_test.sh

Please indicate the location of the raw dataset with $DATA_ROOT, and the models with $DISP_NET.

Our results are as follows :

Models                                   Abs Rel   Sq Rel   RMSE    RMSE log   Acc 1   Acc 2   Acc 3
ResNet18, 832x256, ImageNet → KITTI      0.112     0.777    4.772   0.191      0.872   0.959   0.982
ResNet18, 832x256, Cityscapes → KITTI    0.109     0.740    4.547   0.184      0.883   0.962   0.983
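For readers unfamiliar with the column names, the sketch below computes these quantities using their standard definitions in the monocular depth literature. It assumes that Acc 1, Acc 2, and Acc 3 denote the usual accuracy thresholds (the fraction of pixels with max(gt/pred, pred/gt) below 1.25, 1.25², and 1.25³). It is not the repository's evaluation script, and it omits any masking, depth capping, or scale alignment that such a script may apply. The function name depth_metrics is hypothetical.

```python
import math


def depth_metrics(gt, pred):
    """Compute standard depth errors for paired positive depth values.

    gt, pred: equal-length sequences of ground-truth and predicted depths.
    """
    pairs = list(zip(gt, pred))
    n = len(pairs)

    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)

    # Threshold accuracy: share of values whose ratio to the ground truth
    # (in either direction) is below 1.25 ** k.
    ratios = [max(g / p, p / g) for g, p in pairs]
    acc = [sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)]

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "acc1": acc[0],
        "acc2": acc[1],
        "acc3": acc[2],
    }
```

Lower is better for the four error columns, and higher is better for the three accuracy columns.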

For convenience, we also provide precomputed depth maps in this link.

Demo

We demonstrate Unified Visual Odometry, which shows the results of depth, ego-motion, and object motion holistically.

You can visualize them by running :

sh scripts/run_demo.sh

Please indicate the location of the image samples with $SCENE. We recommend visualizing Cityscapes scenes, since they contain more dynamic objects than KITTI scenes.

More results are available at this link.

References

License

The source code is released under the MIT license.
