
3D-LMNet

This repository contains the source code for the paper 3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image.
Accepted at the British Machine Vision Conference (BMVC 2018).

Citing this work

If you find this work useful in your research, please consider citing:

@inproceedings{mandikal20183dlmnet,
 author = {Mandikal, Priyanka and Navaneet, K L and Agarwal, Mayank and Babu, R Venkatesh},
 booktitle = {Proceedings of the British Machine Vision Conference ({BMVC})},
 title = {{3D-LMNet}: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image},
 year = {2018}
}

Overview

3D-LMNet is a latent embedding matching approach for 3D point cloud reconstruction from a single image. To better incorporate the data prior and generate meaningful reconstructions, we first train a 3D point cloud auto-encoder and then learn a mapping from the 2D image to the corresponding learnt embedding. For a given image, there may exist multiple plausible 3D reconstructions depending on the object view. To tackle this uncertainty in the reconstruction, we predict multiple reconstructions that are consistent with the input view by learning a probabilistic latent space using a view-specific 'diversity loss'. We show that learning a good latent space of 3D objects is essential for the task of single-view 3D reconstruction.
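
For intuition, training proceeds in two stages: first the point cloud auto-encoder is trained, then an image encoder is trained to regress to the (frozen) auto-encoder's latent code. Below is a minimal NumPy sketch of the latent matching objective, assuming an L2 matching loss and made-up variable names; it is purely illustrative and is not the repository's TensorFlow implementation.

# Illustrative NumPy sketch of latent matching (not the repository's TF code).
import numpy as np

def latent_matching_loss(z_img, z_pc):
    # L2 loss between the image encoder's predicted embedding z_img and the
    # frozen point-cloud encoder's embedding z_pc, both of shape (batch, dim).
    return np.mean(np.sum((z_img - z_pc) ** 2, axis=1))

# Stand-in embeddings for a batch of 4 shapes with a 512-D latent space.
z_pc = np.random.randn(4, 512)                 # from the pre-trained point-cloud encoder
z_img = z_pc + 0.1 * np.random.randn(4, 512)   # predicted by the image encoder
print(latent_matching_loss(z_img, z_pc))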

Overview of 3D-LMNet

Dataset

ShapeNet

We train and validate our model on the ShapeNet dataset. We use the rendered images from the dataset provided by 3d-r2n2, which consists of 13 object categories. For generating the ground truth point clouds, we sample points on the corresponding object meshes from ShapeNet (a sketch of this sampling step is given at the end of this subsection). We use the dataset split provided by 3d-r2n2 in all the experiments. Data download links are provided below:
Rendered Images (~12.3 GB): http://cvgl.stanford.edu/data2/ShapeNetRendering.tgz
ShapeNet pointclouds (~2.8 GB): https://drive.google.com/open?id=1cfoe521iTgcB_7-g_98GYAqO553W8Y0g
ShapeNet train/val split: https://drive.google.com/open?id=10FR-2Lbn55POB1y47MJ12euvobi6mgtc

Download each of the folders, extract them and move them into data/shapenet/.
The folder structure should now look like this:
--data/shapenet/
  --ShapeNetRendering/
  --ShapeNet_pointclouds/
  --splits/
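
For reference, the mesh-sampling step mentioned above can be approximated with the third-party trimesh library as below. This is only a hedged sketch and not the authors' preprocessing script; the normalization and alignment conventions of the released point clouds may differ.

# Hedged sketch: sample a point cloud from a mesh surface (not the authors' script).
import numpy as np
import trimesh

def sample_pointcloud(mesh_path, n_points=1024):
    mesh = trimesh.load(mesh_path, force='mesh')
    # Area-weighted uniform sampling on the mesh surface.
    points, _ = trimesh.sample.sample_surface(mesh, n_points)
    # Optional normalization: center and scale to the unit sphere. The released
    # ShapeNet_pointclouds / pix3d_pointclouds files may follow a different convention.
    points = points - points.mean(axis=0)
    points = points / np.max(np.linalg.norm(points, axis=1))
    return points.astype(np.float32)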

Pix3D

We evaluate the generalization capability of our model by testing it on the real-world Pix3D dataset. For the ground truth point clouds, we sample 1024 points on the provided meshes. Data download links are provided below:
Pix3D dataset (~20 GB): Follow the instructions in https://github.com/xingyuansun/pix3d
Pix3D pointclouds (~13 MB): https://drive.google.com/open?id=1RZakyBu9lPbG85SyconBn4sR8r2faInV

Download each of the folders, extract them and move them into data/pix3d/.
The folder structure should now look like this:
--data/pix3d/
  --img_cleaned_input/
  --img/
  --mask/
  --model/
  --pix3d_pointclouds/
  --pix3d.json

Usage

Install TensorFlow. We recommend version 1.3 so that the additional TensorFlow ops can be compiled. The provided code has been tested with Python 2.7, TensorFlow 1.3, and CUDA 8.0. The following steps need to be performed to run the code in this repository:

  1. Clone the repository:
git clone https://github.com/val-iisc/3d-lmnet.git
cd 3d-lmnet
  2. TensorFlow ops for the losses (Chamfer and EMD), as well as for point cloud visualization, need to be compiled. Run the makefile as given below; a quick check that the compiled ops load correctly is sketched after this step. (Note that the nvcc, cudalib, and tensorflow paths inside the makefile need to be updated to point to the locations on your machine):
make
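
After make finishes, you can optionally sanity-check that the compiled shared objects load in Python. The .so paths below are placeholders; substitute the paths actually produced by the makefile on your machine.

# Optional sanity check that the compiled custom TensorFlow ops can be loaded.
# The .so paths are placeholders; use the paths produced by your makefile.
import tensorflow as tf

chamfer_module = tf.load_op_library('/path/to/tf_nndistance_so.so')   # Chamfer op (placeholder path)
emd_module = tf.load_op_library('/path/to/tf_approxmatch_so.so')      # EMD op (placeholder path)
print('Custom ops loaded')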

Training

  • To train the point-cloud auto-encoder, run:
bash scripts/train_ae.sh

Note that the auto-encoder needs to be trained before training either of the latent matching setups.

  • To train the latent matching (lm) setup, run:
bash scripts/train_lm.sh
  • To train the probabilistic latent matching (plm) setup, run the following (a conceptual sketch of the plm sampling is given after this list):
bash scripts/train_plm.sh
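
As background for the plm setup (illustrative only; see the paper for the exact view-specific diversity loss), the image encoder predicts a mean and log-variance over the latent space, and latent codes are drawn via the reparameterization trick, so that different samples decode to different plausible shapes. A minimal NumPy sketch with made-up shapes:

# Illustrative sketch of sampling latent codes in the probabilistic (plm) setup.
# Not the repository's exact formulation; names and shapes are made up.
import numpy as np

def sample_latent(z_mean, z_log_sigma_sq, rng=np.random):
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
    sigma = np.exp(0.5 * z_log_sigma_sq)
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + sigma * eps

# Each sampled z would be decoded by the frozen point-cloud decoder into a
# different, input-consistent reconstruction.
z_mean = np.zeros((1, 512))
z_log_sigma_sq = np.full((1, 512), -2.0)
samples = [sample_latent(z_mean, z_log_sigma_sq) for _ in range(4)]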

Trained Models

Create a folder called 'trained_models' inside the project folder:

mkdir trained_models

Evaluation

Follow the steps detailed above to download the dataset and pre-trained models.

ShapeNet

  • For computing the Chamfer and EMD metrics reported in the paper (all 13 categories), run the following (reference definitions of both metrics are sketched at the end of this subsection):
bash scripts/metrics_shapenet_lm.sh

The computed metrics will be saved inside trained_models/lm/metrics_shapenet/

  • For the plm setup (chair category), run:
bash scripts/metrics_shapenet_plm.sh

The computed metrics will be saved inside trained_models/plm/metrics_shapenet/
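
For reference, the two metrics reported by these scripts can be written down as follows. The evaluation scripts use the compiled TensorFlow ops, and their exact scaling and averaging conventions may differ, so treat this NumPy/SciPy sketch only as a definition, not a drop-in replacement.

# Reference-only definitions of the metrics (the evaluation scripts use the
# compiled TensorFlow ops; conventions here may differ in scaling/averaging).
import numpy as np
from scipy.optimize import linear_sum_assignment

def chamfer(p, q):
    # Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def emd(p, q):
    # Earth Mover's Distance for equal-sized sets via an optimal 1-to-1 assignment.
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    row, col = linear_sum_assignment(d)
    return d[row, col].mean()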

Pix3D

  • For computing the Chamfer and EMD metrics reported in the paper (3 categories) for the real-world Pix3D dataset, run:
bash scripts/metrics_pix3d.sh

The computed metrics will be saved inside trained_models/lm/metrics_pix3d/

Demo

Follow the steps detailed above to download the dataset and pre-trained models.

ShapeNet

  • Run the following to visualize the results for latent matching (lm):
bash scripts/demo_shapenet_lm.sh

You can navigate to the next visualization by pressing 'q', and close the viewer using the back arrow. You can visualize results for different categories by changing the value of the category flag. (If the compiled visualizer gives you trouble, a generic matplotlib-based fallback is sketched at the end of this section.)

  • Run the following to visualize the results for probabilistic latent matching (plm):
bash scripts/demo_shapenet_plm.sh

Pix3D

  • Run the following to visualize the results on the real-world Pix3D dataset:
bash scripts/demo_pix3d.sh

You can navigate to the next visualization by pressing 'q', and close the viewer using the back arrow.
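
If the compiled visualizer gives you trouble, any predicted point cloud available as an (N, 3) NumPy array can also be inspected with a generic matplotlib scatter plot. The file name below is a placeholder; this is not part of the repository's demo scripts.

# Generic fallback viewer for an (N, 3) point cloud (placeholder file name;
# not part of the repository's demo scripts).
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (needed on older matplotlib)

pts = np.load('pred_pointcloud.npy')  # placeholder: any saved (N, 3) array
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=2)
plt.show()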

Sample Results

ShapeNet

Below are a few sample reconstructions from our trained model tested on ShapeNet.
(Figure: 3D-LMNet ShapeNet results)

Pix3D

Below are a few sample reconstructions from our trained model tested on the real-world Pix3D dataset. Note that we mask out the background using the provided masks before passing the images through the network.
(Figure: 3D-LMNet Pix3D results)
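
The masking step mentioned above amounts to an element-wise multiplication of the image with its binary mask. A minimal sketch follows; the file names are placeholders, and the actual demo script may additionally crop and resize.

# Minimal sketch of masking out the background of a Pix3D image before feeding
# it to the network (placeholder file names; the demo script may differ).
import numpy as np
from PIL import Image

img = np.asarray(Image.open('data/pix3d/img/chair/0001.jpg'), dtype=np.float32)
mask = np.asarray(Image.open('data/pix3d/mask/chair/0001.png'), dtype=np.float32)
mask = (mask > 0).astype(np.float32)
if mask.ndim == 2:
    mask = mask[:, :, None]   # broadcast the mask over the RGB channels
masked = img * mask           # background pixels become zero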

3d-lmnet's People

Contributors

klnavaneet, mayankgrwl97, priyankamandikal


3d-lmnet's Issues

Extract gt pointclouds

An inspiring work.
I would like to know the specific process used to extract the ground-truth point clouds from the provided meshes of the Pix3D dataset, in particular the coordinate transformation from the original meshes to the processed point clouds. Did you simply normalize the coordinates of the sampled points, or did you use the camera extrinsics and intrinsics to transform the sampled points to the image plane?
If possible, could you please share the corresponding preprocessing code?
Thanks a lot!

Request for point clouds generation method from meshes

Sorry to bother you again. You previously said that you would provide the ground-truth point clouds, but now, due to memory constraints, you provide the ShapeNet meshes instead. I completely understand that, but could you release the method you use to generate point clouds from the meshes, and explain how to package the rendered images and point clouds together to start training? The instructions here do not seem detailed enough to start training with your dataset.

run code

Hello, when I run bash scripts/train_plm.sh, I get the following error:
Traceback (most recent call last):
  File "train_plm.py", line 168, in <module>
    z_mean, z_log_sigma_sq = image_encoder(img_inp, FLAGS)
  File "/home/user05/.conda/envs/tf/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 442, in __iter__
    "Tensor objects are only iterable when eager execution is "
TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.
Could you tell me how to solve this? Thanks.

The frame the point cloud lies in

Hello, thank you for your excellent work. I have a question about the dataset preparation. Suppose we have access to a depth image and the camera parameters (intrinsics and extrinsics): we back-project the depth image into 3D space using the intrinsic matrix and transform it from the camera frame to the world frame using the extrinsic matrix. Does the sampled ground-truth point cloud lie in the same frame as the point cloud recovered from depth, or does the ground-truth point cloud lie in its own body frame?

Evaluation of PSGN

Hi, I'd like to figure out some details about how you evaluate PSGN.
The prediction (scaled, N x 3) from PSGN is transformed by the following rotation matrix:
(image of the rotation matrix)
Then ICP is applied between the prediction and the ground truth for finer alignment, and the final score is computed between the transformed prediction and the ground truth.
Is my description of the evaluation process correct?

Why are my ShapeNet Chamfer and EMD values so low?

Sorry to bother you. I downloaded your trained model and ran
bash scripts/metrics_shapenet_lm.sh
to evaluate the Chamfer and EMD metrics, but the values seem too low (e.g. airplane: Chamfer 0.001360 and EMD 0.001943). Could you tell me what might cause this? The Pix3D results are normal.
Looking forward to your reply!

Confusions about the result of PSGN

Hi

Several papers on point cloud reconstruction report results on multiple categories. DeformNet (here) reported their results along with an experiment on PSGN: they obtained a PSGN result of about 0.13 (CD), while you report 0.05 (Table 3 in your paper).

My earlier thinking was this: PSGN was originally trained on multiple categories, and DeformNet used PSGN's trained network directly to test a single category, whereas I assume you trained PSGN on a single category and tested it on that same category. That would explain why DeformNet's PSGN result is much higher than yours.

However, Dense 3D Object Reconstruction (here) reports a PSGN result of about 0.028 (CD on airplane), versus your 0.037 (CD on airplane). I have also implemented PSGN several times with several methods and obtained a similar result (0.028).

As far as I can tell, your paper does not explain how you obtained your PSGN results, how you normalized the point clouds, and so on.

These are my confusions; I look forward to your reply.
