
S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

This is the official PyTorch implementation of the paper S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation, CVPR 2021 (Oral), Xiaotian Chen, Yuwang Wang, Xuejin Chen, and Wenjun Zeng.

Citation

@inproceedings{Chen2021S2R-DepthNet,
    title     = {S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation},
    author    = {Chen, Xiaotian and Wang, Yuwang and Chen, Xuejin and Zeng, Wenjun},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2021}
}

Introduction

Humans can infer the 3D geometry of a scene from a sketch rather than a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes. We are the first to explore the learning of a depth-specific structural representation, which captures the essential features for depth estimation and ignores irrelevant style information. Our S2R-DepthNet (Synthetic to Real DepthNet) generalizes well to unseen real-world data even though it is trained only on synthetic data. S2R-DepthNet consists of:

  • a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components,

  • a Depth-specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization, and

  • a Depth Prediction (DP) module, which predicts depth from the depth-specific representation.

Without access to any real-world images, our method even outperforms state-of-the-art unsupervised domain adaptation methods that use real-world images of the target domain for training. In addition, when using a small amount of labeled real-world data, we achieve state-of-the-art performance under the semi-supervised setting.

The following figure shows the overview of S2R-DepthNet.

[Figure: Overview of S2R-DepthNet]

[Figure: Examples of Depth-specific Structural Representation]
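
To make the pipeline concrete, the sketch below shows how the three modules compose at inference time. The class interfaces and the multiplicative application of the attention map are illustrative assumptions, not the repository's actual API.

import torch
import torch.nn as nn

class S2RDepthNet(nn.Module):
    """Illustrative composition of the three modules (hypothetical interfaces)."""

    def __init__(self, struct_encoder, struct_decoder, dsa_module, depth_net):
        super().__init__()
        self.struct_encoder = struct_encoder  # STE encoder: image -> structure code
        self.struct_decoder = struct_decoder  # STE decoder: code -> structure map
        self.dsa_module = dsa_module          # DSA: attention over the structure map
        self.depth_net = depth_net            # DP: depth from the attended structure

    def forward(self, image):
        code = self.struct_encoder(image)       # disentangle domain-invariant structure from style
        structure = self.struct_decoder(code)   # domain-invariant structural representation
        attention = self.dsa_module(structure)  # suppress depth-irrelevant structures
        return self.depth_net(structure * attention)  # predict depth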

Usage

Dependencies

The code is implemented in PyTorch.

Datasets

The outdoor synthetic dataset is vKITTI, and the outdoor real dataset is KITTI.

TODO

  • Training the Structure Encoder

Pretrained Models

We also provide our trained models for inference (outdoor and indoor scenes): Models Link
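
The checkpoints are standard PyTorch .pth files. Below is a minimal loading sketch; the helper name and the assumption that each file stores a plain state_dict are ours, not guaranteed by the repository.

import torch

def load_module(module, ckpt_path, device="cpu"):
    # Assumes the .pth file stores a plain state_dict; adjust if the repo
    # wraps it, e.g. as {"state_dict": ...}.
    state = torch.load(ckpt_path, map_location=device)
    module.load_state_dict(state)
    return module.to(device).eval()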

Train

As an example, use the following command to train S2R-DepthNet on vKITTI.

Train Structure Decoder

python train.py --syn_dataset VKITTI \
                --syn_root "the path of vKITTI dataset" \
                --syn_train_datafile datasets/vkitti/train.txt \
                --batchSize 32 \
                --loadSize 192 640 \
                --Shared_Struct_Encoder_path "the path of pretrained Struct encoder(.pth)" \
                --trian_stage TrainStructDecoder

Train DSA Module and DP module

python train.py --syn_dataset VKITTI \
                --syn_root "the path of vKITTI dataset" \
                --syn_train_datafile datasets/vkitti/train.txt \
                --batchSize 32 \
                --loadSize 192 640 \
                --Shared_Struct_Encoder_path "the path of pretrained Struct encoder(.pth)" \
                --Struct_Decoder_path "the path of pretrained Structure decoder(.pth)" \
                --trian_stage TrainDSAandDPModule

Evaluation

Use the following command to evaluate the trained S2R-DepthNet on KITTI test data.

python test.py --dataset KITTI \
               --root "the path of kitti dataset" \
               --test_datafile datasets/kitti/test.txt \
               --loadSize 192 640 \
               --Shared_Struct_Encoder_path "the path of pretrained Struct encoder(.pth)" \
               --Struct_Decoder_path "the path of pretrained Structure decoder(.pth)" \
               --DSAModle_path "the path of pretrained DSAModle(.pth)" \
               --DepthNet_path "the path of pretrained DepthNet(.pth)" \
               --out_dir "Path to save results"

Use the following command to evaluate the trained S2R-DepthNet on NYUD-v2 test data.

python test.py --dataset NYUD_V2 \
               --root "the path of NYUD_V2 dataset" \
               --test_datafile datasets/nyudv2/nyu2_test.csv \
               --loadSize 192 256 \
               --Shared_Struct_Encoder_path "the path of pretrained Struct encoder(.pth)" \
               --Struct_Decoder_path "the path of pretrained Structure decoder(.pth)" \
               --DSAModle_path "the path of pretrained DSAModle(.pth)" \
               --DepthNet_path "the path of pretrained DepthNet(.pth)" \
               --out_dir "Path to save results"
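
For reference, depth-estimation papers on KITTI and NYUD-v2 typically report abs rel, sq rel, RMSE, RMSE log, and the delta < 1.25^k accuracies. The sketch below follows that common protocol; it is not necessarily this repository's exact evaluation code.

import numpy as np

def depth_metrics(gt, pred):
    # gt, pred: 1-D arrays of valid ground-truth and predicted depths (meters).
    thresh = np.maximum(gt / pred, pred / gt)
    d1 = (thresh < 1.25).mean()
    d2 = (thresh < 1.25 ** 2).mean()
    d3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                rmse_log=rmse_log, d1=d1, d2=d2, d3=d3)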

Acknowledgement

We borrowed code from GASDA and VisualizationOC.
