
deepv2d's Issues

Dynamic Frames For Inference

Hi Zachary,

I am trying to check the depth performance of DeepV2D with different numbers of frames. If I change the KITTI config to Frames: 2, the network parameters of the motion predictor are mismatched. Do we have to re-train the network under the two-frame setting?

Best

NYU preprocessing

Hi, thanks for your excellent work. If I build the NYU tfrecord myself, should I first preprocess the depth (using the official MATLAB tool) to align it?

Intrinsics file - what do numbers mean?

In all kitti demo sequences there is a file called "intrinsics.txt" with four numbers. What do they mean, why are they necessary and where do you get the values from?

If I followed the code correctly, they refer to fx, fy, cx, cy (see camera.py, line 50), and you need them to reproject 2D points to 3D. This makes sense to me. But how do you get those values? In my understanding, KITTI raw provides the camera intrinsic matrix in the files "calib_cam_to_cam.txt" in lines starting with "K_0". It also provides the projection matrix directly in lines starting with "P_0". But the values specified there differ significantly from the values in the "intrinsics.txt" file. In particular, fx is approximately equal to fy in "calib_cam_to_cam.txt", whereas in the "intrinsics.txt" file the first and second values differ by ~10%?!
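For reference, here is a minimal sketch of how a four-number intrinsics file of this form can be interpreted, assuming the order fx, fy, cx, cy (as suggested by camera.py); the file name and usage below are illustrative only:

import numpy as np

# Assumed order of the four values in intrinsics.txt: fx fy cx cy
fx, fy, cx, cy = np.loadtxt("intrinsics.txt").reshape(-1)[:4]

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Backproject a pixel (u, v) with depth d into a 3D point in the camera frame;
# this is the reprojection the intrinsics are needed for.
def backproject(u, v, d):
    x = (u - cx) / fx
    y = (v - cy) / fy
    return np.array([x * d, y * d, d])

One general fact that may be relevant here: if images are resized non-uniformly before being fed to the network, fx and cx scale with the width ratio while fy and cy scale with the height ratio, so fx and fy can diverge relative to the raw calibration. Whether that is what happens in this pipeline would need to be confirmed against the preprocessing code.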

Problem running KITTI demo

After installing the requirements, entering "python demos/kitti_demo.py --cfg cfgs/kitti.yaml --sequence demo_videos/kitti_demos/032/" yields ...

Traceback (most recent call last):
  File "demos/kitti_demo.py", line 67, in <module>
    main(args)
  File "demos/kitti_demo.py", line 44, in main
    depths = net.forward(data_blob)
  File "lib/deepv2d.py", line 113, in forward
    output = self.sess.run(self.outputs, feed_dict=feed_dict)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3,47,271] = [3, 47, 272] does not index into param shape [4,48,272,64]
	 [[node motion/GatherNd_1 (defined at lib/utils/bilinear_sampler.py:50) ]]

Caused by op 'motion/GatherNd_1', defined at:
  File "demos/kitti_demo.py", line 67, in <module>
    main(args)
  File "demos/kitti_demo.py", line 38, in main
    net = DeepV2D(INPUT_DIMS, cfg)
  File "lib/deepv2d.py", line 38, in __init__
    poses_pred = motion.forward(images[:, 1:], image_star, depth, intrinsics)
  File "lib/networks/motion.py", line 84, in forward
    G = self.flowse3(feat1, feat2, depth1, intrinsics/SC, G=G, reuse=i>0)
  File "lib/networks/motion.py", line 108, in flowse3
    featw = bilinear_sampler.bilinear_sampler(feat2, coords)
  File "lib/utils/bilinear_sampler.py", line 88, in bilinear_sampler
    output = bilinear_sampler_general(imgs, coords)
  File "lib/utils/bilinear_sampler.py", line 50, in bilinear_sampler_general
    img01 = tf.gather_nd(imgs, coords01)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3647, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[3,47,271] = [3, 47, 272] does not index into param shape [4,48,272,64]
	 [[node motion/GatherNd_1 (defined at lib/utils/bilinear_sampler.py:50) ]]

How to load 2-stage checkpoints for evaluation?

Hi @zachteed @heilaw @anewell @jiadeng ,

Thank you for your work! I have a question about the ckpt for demo and evaluation.

When we train the model we get two checkpoints, one for stage_1 and one for stage_2, but I notice we only need to load one ckpt file for evaluation and the demo. How do we obtain this final ckpt file, and could you please explain the relationship between this final ckpt and the two stage checkpoints produced during training?

Thank you so much!

Simultaneous Tracking and Mapping?

Hi, thanks for the great work. I wonder if you can provide a demo code to perform tracking (camera pose estimation) and mapping (depth estimation) simultaneously.

scale in evaluation?

Hi, thanks for sharing the good work.
However, I'm curious about the scale here in evaluation.

From my understanding, DeepV2D is supervised, so it should require no scaling in the depth or pose evaluation.
However, in your evaluation script, both depths and poses are rescaled. Why is that needed?

Another question concerns the scaling factor used when computing trans (cm):

a = np.dot(t1, t2) / np.dot(t2, t2)

Shouldn't it be np.dot(t1, t1) / np.dot(t1, t2)?
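For reference, a = np.dot(t1, t2) / np.dot(t2, t2) is the closed-form least-squares minimizer of ||t1 - a*t2||^2, i.e. it rescales the second vector to best match the first. A quick numerical check in plain NumPy (independent of the repository's evaluation script):

import numpy as np

t1 = np.array([1.0, 2.0, 3.0])        # e.g. ground-truth translation
t2 = np.array([0.4, 0.9, 1.6])        # e.g. predicted translation (wrong scale)

a = np.dot(t1, t2) / np.dot(t2, t2)   # closed-form minimizer of ||t1 - a*t2||^2

# Verify against a brute-force search over candidate scales.
candidates = np.linspace(0.0, 5.0, 100001)
errors = [np.linalg.norm(t1 - s * t2) for s in candidates]
best = candidates[int(np.argmin(errors))]

print(a, best)   # the two values agree up to the grid resolution

Which vector gets rescaled (prediction onto ground truth, or the reverse) changes the closed form, so the intended convention of the evaluation script decides which expression is correct.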

About ScanNet

Thanks for sharing the wonderful work.

I have a question for the usage of the scenes in the ScanNet dataset.
While ScanNet itself provides train/val/test splits, it seems like this paper utilized specific scenes as below.

/local-scratch/scannet/scene0688_00/rgb/frame-000346.color.jpg

I want to double-check whether I correctly understand the author's intentions.

About heavy distortion of video.

Hi @heilaw @anewell @jiadeng @zachteed
Thanks for your work. I notice that if the video is uncalibrated with an unknown focal length, you offer demos/demo_uncalibrated.py, which estimates the focal length during inference. Do you also estimate the distortion parameters k1, k2, p1, p2? I would like to run on uncalibrated video with heavy distortion.

Thanks.
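Not an authoritative answer, but a common workaround when a pipeline only models focal length is to undistort the frames yourself before running inference, assuming k1, k2, p1, p2 can be estimated with a standard calibration tool. A minimal OpenCV sketch (the intrinsics, distortion values, and file name below are placeholders):

import cv2
import numpy as np

# Intrinsics and distortion coefficients from a standard chessboard calibration
# (e.g. cv2.calibrateCamera); the values here are placeholders.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.30, 0.10, 0.001, 0.001, 0.0])  # k1, k2, p1, p2, k3

cap = cv2.VideoCapture("distorted_video.mov")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Remove lens distortion so the downstream model can assume a pinhole camera.
    frames.append(cv2.undistort(frame, K, dist))
cap.release()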

Can't run demo with batch size 8

Hi, I was trying to run the demo
python demos/demo_v2d.py --model=models/scannet.ckpt --sequence=data/demos/scannet_0
but got the following error:

2020-02-27 14:07:27.062479: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2020-02-27 14:07:27.062517: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details
Traceback (most recent call last):   
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
         [[{{node motion/PnP/einsum_1/MatMul}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape
_1)]]
         [[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", 
send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):   
  File "demos/demo_v2d.py", line 82, in <module>
    main(args)
  File "demos/demo_v2d.py", line 64, in main
    depths, poses = deepv2d(images, intrinsics, viz=True, iters=args.n_iters)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 462, in __call__
    self.update_poses(i)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 368, in update_poses
    self.poses, self.intrinsics, self.weights = self.sess.run(outputs, feed_dict=feed_dict)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
         [[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49)  = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job
:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
         [[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", 
send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'motion/PnP/einsum_1/MatMul', defined at:
  File "demos/demo_v2d.py", line 82, in <module>
    main(args)
  File "demos/demo_v2d.py", line 55, in main
    deepv2d = DeepV2D(cfg, args.model, use_fcrn=args.fcrn, is_calibrated=is_calibrated, mode=args.mode)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 68, in __init__
    self._build_motion_graph()
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 129, in _build_motion_graph
    images, depths, intrinsics, edge_inds, init=do_init)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/modules/motion.py", line 282, in forward
    Tij = Tij.keyframe_optim(target, weight, depths, intrinsics)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/geometry/transformation.py", line 364, in keyframe_optim
    J = einsum('...ij,...jk->...ik', jproj, jtran)
  File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py", line 49, in einsum
    out = tf.einsum(equation, *inputs)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 257, in einsum
    axes_to_sum)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 389, in _einsum_reduction
    product = math_ops.matmul(t0, t1)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
    a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
    "BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
         [[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49)  = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
         [[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

My environment is Python 3.6.7 with tensorflow-gpu 1.12.0.
It seems the problem is that the batch size is too big; the demo succeeds when I only use 4 images. Can you help?

LS-OPTIMIZATION LAYER back propagation

In the paper I found the sentence "In the backward pass, the gradients can be found by solving another linear system." in the appendix, under the heading LS-OPTIMIZATION LAYER.
1.) Which linear system is that?
2.) How did you derive equation (16) in the appendix?

Can anyone please help me?
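Not the authors' derivation, but for a least-squares layer that solves H x = b in the forward pass, the standard implicit-function result is dL/db = H^{-T} (dL/dx) and dL/dH = -(dL/db) x^T, i.e. the backward pass solves one more linear system with the same (transposed) matrix. A minimal TensorFlow sketch of that idea, assuming H is symmetric positive definite (an illustration, not the repository's own cholesky solver):

import tensorflow as tf

@tf.custom_gradient
def solve_spd(H, b):
    # Forward pass: solve H x = b via a Cholesky factorization.
    chol = tf.linalg.cholesky(H)
    x = tf.linalg.cholesky_solve(chol, b)

    def grad(dx):
        # Backward pass: another linear system, reusing the same factorization.
        # dL/db = H^{-1} dL/dx (H is symmetric), dL/dH = -(dL/db) x^T.
        db = tf.linalg.cholesky_solve(chol, dx)
        dH = -tf.matmul(db, x, transpose_b=True)
        return dH, db

    return x, grad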

NYU Depth TFRecord

Hi,
Thanks for sharing this great work.
I'm wondering: where does the nyu_train.tfrecords file (https://github.com/princeton-vl/DeepV2D#nyuv2-1) come from?
It seems there are 13776 examples, each with 9 RGB images, 1 depth image and smaller things like intrinsics and poses.
It's about 138GB but NYU Depth V2 is more like 400GB, which surprises me (even though encoding is not the same). Maybe this file was built using NYU Depth V1, which is 90GB? Is this file the one used in the experiments reported in the paper?
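For anyone who wants to verify the example count of a downloaded tfrecord independently, here is a small sketch using the TF1 record iterator; it only counts serialized examples and makes no assumption about the field layout:

import tensorflow as tf

def count_examples(path):
    # Iterate over the serialized records once and count them.
    n = 0
    for _ in tf.python_io.tf_record_iterator(path):
        n += 1
    return n

print(count_examples("nyu_train.tfrecords"))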

tensorflow-gpu version

FYR: Tested ok under tensorflow-gpu==1.14.0

Errors using tensorflow-gpu==1.12.0

module 'tensorflow' has no attribute 'custom_gradient'

Errors using tensorflow-gpu==1.13.1

failed to run optimizer arithmeticoptimizer, stage removestackstridedslicesameaxis node

Testing command

python demos/demo_slam.py --dataset=scannet --n_keyframes=3

Here's my conda environment.yml:

name: py37-deepv2d
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _tflow_select=2.1.0=gpu
  - absl-py=0.7.1=py37_0
  - astor=0.7.1=py37_0
  - blas=1.0=mkl
  - c-ares=1.15.0=h7b6447c_1
  - ca-certificates=2019.5.15=0
  - certifi=2019.3.9=py37_0
  - cudatoolkit=10.0.130=0
  - cudnn=7.6.0=cuda10.0_0
  - cupti=10.0.130=0
  - gast=0.2.2=py37_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - h5py=2.9.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - intel-openmp=2019.4=243
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libprotobuf=3.8.0=hd408876_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - markdown=3.1.1=py37_0
  - mkl=2019.4=243
  - mkl_fft=1.0.12=py37ha843d7b_0
  - mkl_random=1.0.2=py37hd81dba3_0
  - mock=3.0.5=py37_0
  - ncurses=6.1=he6710b0_1
  - numpy=1.16.4=py37h7e9f1db_0
  - numpy-base=1.16.4=py37hde5b4d6_0
  - openssl=1.1.1c=h7b6447c_1
  - pip=19.1.1=py37_0
  - protobuf=3.8.0=py37he6710b0_0
  - python=3.7.3=h0371630_0
  - readline=7.0=h7b6447c_5
  - scipy=1.2.1=py37h7c811a0_0
  - setuptools=41.0.1=py37_0
  - six=1.12.0=py37_0
  - sqlite=3.28.0=h7b6447c_0
  - tensorboard=1.13.1=py37hf484d3e_0
  - tensorflow=1.13.1=gpu_py37hc158e3b_0
  - tensorflow-base=1.13.1=gpu_py37h8d69cac_0
  - tensorflow-estimator=1.13.0=py_0
  - tensorflow-gpu=1.13.1=h0d30ee6_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - werkzeug=0.15.4=py_0
  - wheel=0.33.4=py37_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - attrs==19.1.0
    - backcall==0.1.0
    - bleach==3.1.0
    - cycler==0.10.0
    - decorator==4.4.0
    - defusedxml==0.6.0
    - easydict==1.9
    - entrypoints==0.3
    - google-pasta==0.1.8
    - ipykernel==5.1.1
    - ipython==7.5.0
    - ipython-genutils==0.2.0
    - jedi==0.13.3
    - jinja2==2.10.1
    - jsonschema==3.0.1
    - jupyter-client==5.2.4
    - jupyter-core==4.4.0
    - jupyterlab==0.35.6
    - jupyterlab-server==0.2.0
    - kiwisolver==1.1.0
    - markupsafe==1.1.1
    - matplotlib==3.1.0
    - mistune==0.8.4
    - nbconvert==5.5.0
    - nbformat==4.4.0
    - notebook==5.7.8
    - opencv-python==3.4.5.20
    - pandas==0.24.2
    - pandocfilters==1.4.2
    - parso==0.4.0
    - pexpect==4.7.0
    - pickleshare==0.7.5
    - prometheus-client==0.7.0
    - prompt-toolkit==2.0.9
    - ptyprocess==0.6.0
    - pygments==2.4.2
    - pyparsing==2.4.0
    - pyrsistent==0.15.2
    - python-dateutil==2.8.0
    - pytz==2019.1
    - pyyaml==5.3
    - pyzmq==18.0.1
    - seaborn==0.9.0
    - send2trash==1.5.0
    - terminado==0.8.2
    - testpath==0.4.2
    - toposort==1.5
    - tornado==6.0.2
    - tqdm==4.43.0
    - traitlets==4.3.2
    - vtk==8.1.2
    - wcwidth==0.1.7
    - webencodings==0.5.1
    - wrapt==1.12.0
prefix: /home/yoyee/miniconda3/envs/py37-deepv2d

about jacobian

Thank you for your great work!
I want to know how the following derivation is achieved:
[screenshot of the Jacobian derivation]
Could you please point me to some material where I can learn the relevant background?
It would be very helpful.

demo_slam on custom video?

Hi,
Thanks for the great work. I see from the code that demo_slam.py takes its video sequence from the NYU/KITTI/ScanNet datasets. Is there a way to use demo_slam with my own video of an indoor scene recorded with a smartphone camera?

why scale depth_pred for evaluation?

Hello. I notice you scale both the depth and the pose estimates for evaluation. It's reasonable to scale the pose, as in previous works, but it seems unfair to also scale depth_pred, since the ground-truth depth is used in the loss function. Yours is a supervised depth estimation method, so why do you also scale the estimated depth?

Image reconstruction quality on kitti

Hi! I am trying to reconstruct the image at frame T using the image at frame T+1, but the visualization looks odd.

Here is how I do the reconstruction:
Let D, RgbT, RgbT+1, PoseT, PoseT+1 be the predicted depth (unscaled), the input RGB at frame T, the input RGB at frame T+1, the pose predicted at T, and the pose predicted at T+1.

Then:
1. pts3d = backproject(Depth)
2. pts3d_at_frameT+1 = PoseT+1 * inv(PoseT) * pts3d
3. pts2d_at_frameT+1 = project(pts3d_at_frameT+1)
4. grid sample

However, below is a visualized reconstruction for 2011_10_03_drive_0027_0000000799.png. The first row is the original input, the second row is the reconstructed RGB, and the third row is the flow visualization:
[reconstruction comparison image: original input / reconstructed RGB / flow]

I notice an obvious scale problem in the reconstruction, and it is the same for other sequences. The poses I used come from the depth prediction process (the pose results from the eval_kitti script). Ideally, the car in the left corner should not move, since it is static.
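For comparison, here is a minimal NumPy/OpenCV sketch of the reconstruction procedure described above, assuming the poses are world-to-camera matrices (so points move between frames via G = PoseT1 @ inv(PoseT)); if the stored poses are camera-to-world instead, the relative transform must be inverted, which is a frequent source of exactly this kind of mismatch. All names below are placeholders:

import cv2
import numpy as np

def reconstruct_frame_T(rgb_T1, depth_T, K, pose_T, pose_T1):
    """Warp the image at frame T+1 back to frame T using the depth at T."""
    h, w = depth_T.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # 1. Backproject pixels of frame T into 3D (camera frame of T).
    x = (u - K[0, 2]) / K[0, 0] * depth_T
    y = (v - K[1, 2]) / K[1, 1] * depth_T
    pts_T = np.stack([x, y, depth_T, np.ones_like(depth_T)], axis=-1)   # (h, w, 4)

    # 2. Relative transform from frame T to frame T+1 (world-to-camera convention assumed).
    G = pose_T1 @ np.linalg.inv(pose_T)
    pts_T1 = pts_T @ G.T                                                # (h, w, 4)

    # 3. Project into frame T+1.
    z = np.clip(pts_T1[..., 2], 1e-3, None)
    map_x = (K[0, 0] * pts_T1[..., 0] / z + K[0, 2]).astype(np.float32)
    map_y = (K[1, 1] * pts_T1[..., 1] / z + K[1, 2]).astype(np.float32)

    # 4. Grid-sample frame T+1 at those coordinates.
    return cv2.remap(rgb_T1, map_x, map_y, interpolation=cv2.INTER_LINEAR)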

question about view pooling

Hi, thanks for your code. Although I have gone through it, I still don't understand the meaning of view pooling. In '3D Matching Network with view concatenation', you build a cost volume for each image pair, then stack them all, refine them with a 3D CNN (_hourglass_3d), and output the depth probabilities. I don't see where pooling over the different cost volumes happens; it seems that you just stack all the cost volumes and output the depth map.
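Not the repository's code, but a small illustration of the distinction being asked about: with view pooling, per-view cost volumes are reduced by an average (so the network is agnostic to the number of views), whereas with view concatenation they are stacked along the channel axis before the 3D CNN. The shapes below are placeholders:

import numpy as np

# Per-view matching cost volumes: (num_views, depth_bins, H, W, C)
volumes = np.random.rand(4, 32, 24, 80, 16).astype(np.float32)

# View pooling: average (or max) over the view axis -> one volume,
# independent of how many views were used.
pooled = volumes.mean(axis=0)                            # (32, 24, 80, 16)

# View concatenation: stack views along the channel axis -> the 3D CNN
# sees a fixed number of views baked into its input channels.
concatenated = np.concatenate(list(volumes), axis=-1)    # (32, 24, 80, 64)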

Question about scaling?

Hi @zachteed @heilaw @anewell @jiadeng

I notice that you scale both the depth map and the translation part of the pose matrix by a factor of 0.1 when training on KITTI.
However, in the data streaming scripts for NYU and ScanNet, I only find scaling of the depth map, by factors of 1/5000 and 1/1000 respectively. Could you please explain why the translation does not need to be scaled for NYU and ScanNet?

Thank you so much!
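For context on the two depth factors mentioned above: dividing a 16-bit depth image by such a constant converts stored integer units into meters (1/1000 for millimeter-encoded depth, as in ScanNet; 1/5000 is the factor the NYU loader apparently uses). A minimal decoding sketch, with placeholder file names:

import cv2

# Read 16-bit depth images without converting them to 8-bit.
scannet_depth = cv2.imread("frame-000346.depth.png", cv2.IMREAD_ANYDEPTH)
nyu_depth = cv2.imread("some_nyu_depth.png", cv2.IMREAD_ANYDEPTH)

# Convert stored integer units to meters using the dataset-specific factor.
scannet_depth_m = scannet_depth.astype("float32") / 1000.0   # millimeters -> meters
nyu_depth_m = nyu_depth.astype("float32") / 5000.0           # factor used by the NYU loader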

Problem running demo

Hi, I'm trying to run the demo with both KITTI and NYU, but I'm getting the following error:

Backprojection Op not available: Using python implementation
Traceback (most recent call last):
  File "demos/nyu_demo.py", line 68, in <module>
    main(args)
  File "demos/nyu_demo.py", line 41, in main
    net = DeepV2D(INPUT_DIMS, cfg)
  File "lib/deepv2d.py", line 38, in __init__
    poses_pred = motion.forward(images[:, 1:], image_star, depth, intrinsics)
  File "lib/networks/motion.py", line 84, in forward
    G = self.flowse3(feat1, feat2, depth1, intrinsics/SC, G=G, reuse=i>0)
  File "lib/networks/motion.py", line 107, in flowse3
    coords = camera.camera_transform_project(G, depth, intrinsics)
  File "lib/camera.py", line 87, in camera_transform_project
    X = point_cloud_from_depth(depth, intrinsics)
  File "lib/camera.py", line 72, in point_cloud_from_depth
    X = iproj(pix, depth, kv)
  File "lib/camera.py", line 54, in iproj
    fx, fy, cx, cy = tf.split(kv, [1, 1, 1, 1], axis=-1)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1226, in split
    name=name)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3289, in _split_v
    num_split=num_split, name=name)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2508, in create_op
    set_shapes_for_outputs(ret)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1873, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1823, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 676, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimension size, given by scalar input 2, must be non-negative but is -1 for 'motion/split' (op: 'SplitV') with input shapes: [7,4], [4], [] and with computed input tensors: input[2] = <-1>.

How to inference on a longer video?

Hello Teed,
I'm new to the video-to-depth area; thanks for your excellent work.
I'm using your code to predict the depth maps from "golf.mov". However, I found that you only predict 8 depth maps from a single video, and when I tried to remove this constraint an out-of-memory error occurred.
How can I predict dense depth maps for every frame of a video with your project? I'm looking forward to your reply, thank you very much!
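Not an official feature, but one way to get a depth map for every frame without running out of memory is to run the model on overlapping windows of 8 frames and keep the depth predicted for the frame of interest in each window. A rough sketch, assuming the call interface used in demo_v2d.py (depths, poses = deepv2d(images, intrinsics)); the window logic and the indexing into the returned depths are assumptions and depend on the mode (keyframe vs. global):

import numpy as np

def depth_for_every_frame(deepv2d, frames, intrinsics, window=8):
    """Run inference on sliding windows; frames is a list of HxWx3 images."""
    depth_maps = []
    for i in range(len(frames)):
        # Center a window of `window` frames on frame i (clamped at the sequence ends).
        lo = max(0, min(i - window // 2, len(frames) - window))
        images = np.stack(frames[lo:lo + window], axis=0)

        depths, poses = deepv2d(images, intrinsics)
        # NOTE: which entry of `depths` corresponds to frame i depends on the mode;
        # adjust the indexing below accordingly.
        depth_maps.append(depths[i - lo])
    return depth_maps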

kitti.ckpt does not support global mode

Hi, when I use kitti.ckpt and set --mode=global, the following error arises:

Traceback (most recent call last):
  File "demos/demo_v2d.py", line 84, in <module>
    main(args)
  File "demos/demo_v2d.py", line 66, in main
    depths, poses = deepv2d(images, intrinsics, viz=True, iters=args.n_iters)
  File "deepv2d/deepv2d.py", line 467, in __call__
    self.update_poses(i)
  File "deepv2d/deepv2d.py", line 368, in update_poses
    self.poses, self.intrinsics, self.weights = self.sess.run(outputs, feed_dict=feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1149, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 192, 1088) for Tensor 'Placeholder_1:0', which has shape '(5, 192, 1088)'

The nyu.ckpt works normally in both global and keyframe mode.

What is the problem?

Many thanks.

cuda version error?

Hi, thanks for your work. I encountered a problem that,

2020-09-06 16:43:02.192510: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED

Is this a CUDA version problem? My environment is TF 1.12 with CUDA 9.2.

Blas xGEMMBatched launch failed : a.shape=[7,3,3], b.shape=[7,3,1], m=3, n=1, k=3, batch_size=7

When I run the demo with the GPU, something goes wrong:

Caused by op 'stereo/MatMul', defined at:
  File "demos/demo_v2d.py", line 81, in <module>
    main(args)
  File "demos/demo_v2d.py", line 55, in main
    deepv2d = DeepV2D(cfg, args.model, use_fcrn=args.fcrn, is_calibrated=is_calibrated, mode=args.mode)
  File "Deepv2d/deepv2d.py", line 73, in __init__
    self._build_depth_graph()
  File "Deepv2d/deepv2d.py", line 164, in _build_depth_graph
    depths = self.depth_net.forward(Ts, images, intrinsics, adj_list)
  File "Deepv2d/modules/depth.py", line 187, in forward
    spred = self.stereo_network_avg(poses, images, intrinsics, idx)
  File "Deepv2d/modules/depth.py", line 116, in stereo_network_avg
    volume = operators.backproject_avg(Ts, depths, intrinsics, fmaps, adj_list)
  File "Deepv2d/special_ops/operators.py", line 55, in backproject_avg
    Tii = Ts.gather(ii) * Ts.gather(ii).inv() # this is just a set of id trans.
  File "Deepv2d/geometry/transformation.py", line 146, in inv
    Ginv = se3_matrix_inverse(self.matrix())
  File "Deepv2d/geometry/se3.py", line 203, in se3_matrix_inverse
    t = -tf.matmul(R, t)
  File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
    a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
    "BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[7,3,3], b.shape=[7,3,1], m=3, n=1, k=3, batch_size=7
	 [[node stereo/MatMul (defined at Deepv2d/geometry/se3.py:203) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](stereo/transpose, stereo/strided_slice_1)]]
	 [[{{node Sum/_2107}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3915_Sum", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

My CUDA version is 9.0. What should I do?

ScanNet training

Hi,
in the paper you train on KITTI, NYU and ScanNet for your best results (ScanNet gets the most iterations during stage 1). However, only training scripts for KITTI and NYU are present here.

What is the reason for this? Could this be additionally provided?

Another question: afterwards, you report that stage 2 is trained for another 120k iterations. On which dataset is this? Is it the individual benchmark on which you report results, or do you train on several datasets as in stage 1?

Best regards!

Test on video

Hi,

I'd like to test on a video sequence from TUM, and I'm wondering how you test on a video sequence.
If the poses are unknown, how do you compute them? What batch size do you use to optimize the poses? Which images do you sample to compute the pose for a given frame?

If the poses are known, do you only update the depth for one iteration?

Xuan

cannot load checkpoint

I tried the demo
python3 demos/demo_uncalibrated.py --video=data/demos/golf.mov
but it crashed:

tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key stereo/BatchNorm/moving_stddev not found in checkpoint
Traceback (most recent call last):
  File "/Users/l0stpenguin/Library/Python/3.7/lib/python/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/Users/l0stpenguin/Library/Python/3.7/lib/python/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/Users/l0stpenguin/Library/Python/3.7/lib/python/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key stereo/BatchNorm/moving_stddev not found in checkpoint
         [[{{node save/RestoreV2}}]]

I do not have a GPU machine. Is it possible to run it without a GPU?

Output of Demo scripts

Hi,
Thanks for uploading the code for this research paper.
I am able to run the demo code for NYU successfully; however, the output is a single depth image, and the same goes for the demo_uncalibrated script, where the entire video is provided as input.
Shouldn't the output be multiple depth maps for the different video frames, or something similar, as described in the paper?

Is your method end-2-end ?

Hello,
I have read your paper! Thanks for uploading the code.
However, I would like to ask whether your method can be trained end-to-end.
As I understand it, the Depth module builds a cost volume around the keyframe and then uses a 3D CNN to predict the depth of that keyframe. In the Motion module, images and depths are required as input to predict the relative poses.
If you have N = 5 input images, does that mean you have to run your Depth module N times to get all N depth maps as input to the Motion module?

NYU gt association and pose

Hi, thanks for sharing the code.
However, I have a question about the dataloader for NYUv2.

associations_file = osp.join(scene_dir, 'associations.txt')
camera_file = osp.join(scene_dir, 'pose.txt')

I have already downloaded the NYUv2 raw dataset and want to generate the tfrecords on my own.
However, it seems that the association files and ground-truth pose files are not provided in the official dataset.
Did you generate them in some other way?
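Not necessarily how the authors did it, but an associations file of the TUM RGB-D style (matched RGB/depth timestamps and paths) can be generated by nearest-timestamp matching, given lists of (timestamp, filename) for the two streams. A generic sketch; the exact column layout expected by the repository's loader should be checked against its parsing code:

def associate(rgb_list, depth_list, max_dt=0.02):
    """rgb_list / depth_list: lists of (timestamp, filename); returns matched pairs."""
    pairs = []
    used = set()
    for t_rgb, f_rgb in sorted(rgb_list):
        # Find the closest unused depth frame in time.
        best = min(
            (d for d in depth_list if d[0] not in used),
            key=lambda d: abs(d[0] - t_rgb),
            default=None,
        )
        if best is not None and abs(best[0] - t_rgb) < max_dt:
            used.add(best[0])
            pairs.append((t_rgb, f_rgb, best[0], best[1]))
    return pairs

def write_associations(pairs, path="associations.txt"):
    # One matched pair per line: rgb_timestamp rgb_file depth_timestamp depth_file
    with open(path, "w") as f:
        for t_rgb, f_rgb, t_d, f_d in pairs:
            f.write("%f %s %f %s\n" % (t_rgb, f_rgb, t_d, f_d))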

How to implement a single-view demo?

Hi, I'm trying to implement a single-view NYU evaluation. I noticed that you use 8 frames in your code and estimate the depth map of the first keyframe. When I reduce the number of frames to 1, the predicted depth maps are all NaN values. I also tried duplicating the same frame twice, but the resulting depth is not good.

How can I implement a single-view demo correctly?

How did you evaluate TUM using translational rmse(m/s)

Hi, thank you for your nice work.
I'm wondering how you obtained the results reported in the paper.
I ran the code with

python demos/demo_slam.py --dataset=tum

then extracted the poses from slam.poses.
Then I used evo_rpe for evaluation.
But the metrics from evo_rpe are:

{"title": "RPE w.r.t. translation part (m)\nfor delta = 1 (frames) using consecutive pairs\n(with Sim(3) Umeyama alignment)", "ref_name": "DeepV2D/data/slam/tum/rgbd_dataset_freiburg1_room/groundtruth.txt", "est_name": "DeepV2D/results/tum/poses.tum", "label": "RPE (m)"}

[screenshot of the evo_rpe metrics]
The aligned trajectory also doesn't look right.
May I ask if there's some conversion I missed?
[plot of the aligned trajectory]

Thank you.
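In case it helps others debugging the same thing: a TUM-format trajectory stores one camera pose in the world frame per line as "timestamp tx ty tz qx qy qz qw", so if slam.poses are 4x4 world-to-camera matrices they would need to be inverted before export (whether that inversion is needed here is exactly the open question, so it is left as a flag). A conversion sketch, assuming a recent scipy (Rotation.from_matrix):

import numpy as np
from scipy.spatial.transform import Rotation

def write_tum_trajectory(poses, timestamps, path, invert=True):
    """poses: list of 4x4 matrices; set invert=True if they are world-to-camera."""
    with open(path, "w") as f:
        for t, T in zip(timestamps, poses):
            T = np.linalg.inv(T) if invert else T           # camera-to-world
            q = Rotation.from_matrix(T[:3, :3]).as_quat()   # (qx, qy, qz, qw)
            tx, ty, tz = T[:3, 3]
            f.write("%f %f %f %f %f %f %f %f\n" % (t, tx, ty, tz, q[0], q[1], q[2], q[3]))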

InvalidArgumentError (see above for traceback): Cholesky decomposition was not successful. The input might not be valid.

Caused by op 'motion/PnP_1/Cholesky', defined at: 
  File "demos/demo_uncalibrated.py", line 152, in <module>  
    main(args) 
  File "demos/demo_uncalibrated.py", line 90, in main  
    use_fcrn=True, is_calibrated=False, use_regressor=False)
  File "deepv2d/deepv2d.py", line 68, in __init__ 
    self._build_motion_graph()
  File "deepv2d/deepv2d.py", line 129, in _build_motion_graph    
    images, depths, intrinsics, edge_inds, init=do_init)    
  File "deepv2d/modules/motion.py", line 287, in forward    
    (jj,ii), num_fixed=num_fixed, include_intrinsics=(not self.is_calibrated))  
  File "deepv2d/geometry/transformation.py", line 527, in global_optim
    delta_update = cholesky_solve(H, b) 
  File "deepv2d/geometry/cholesky.py", line 32, in solve    
    x = cholesky_solve(H, b)  
  File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated  
    return _graph_mode_decorator(f, *args, **kwargs)
  File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 132, in 
    result, grad_fn = f(*args)
  File "deepv2d/geometry/cholesky.py", line 9, in cholesky_solve
    chol = tf.linalg.cholesky(H)
  File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_linalg_ops.py", line 709, in 
    "Cholesky", input=input, name=name)
  File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in 
    op_def=op_def)
  File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in 
    return func(*args, **kwargs)
  File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in 
    op_def=op_def)
  File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in 
    self._traceback = tf_stack.extract_stack() 

InvalidArgumentError (see above for traceback): Cholesky decomposition was not successful. The input might not be valid.
    [[node motion/PnP_1/Cholesky (defined at deepv2d/geometry/cholesky.py:9) = Cholesky[T=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/device:CPU:0"](motion/PnP_1/Cast_3)]]
    [[{{node motion/PnP_2/Cast_5/_2999}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5542_motion/PnP_2/Cast_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Questions about intrinsics for ScanNet

Hello. Thanks for the great work.

I am trying to run your evaluation code on ScanNet. I think I am using a newer version of ScanNet where some intrinsics files are located elsewhere, so I got a FileNotFound error here:

https://github.com/princeton-vl/DeepV2D/blob/master/deepv2d/data_stream/scannet.py#L111

Can you let me know what intrinsics (with respect to what image size) should be put here so that I can assign them manually?

Also, I am confused why only the depth intrinsics were used:
https://github.com/princeton-vl/DeepV2D/blob/master/deepv2d/data_stream/scannet.py#L143
while I would guess the network needs the color intrinsics instead.
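If the intrinsics have to be assigned manually, the standard relationship is that focal lengths and the principal point scale linearly with the image resize: fx and cx scale with the width ratio, fy and cy with the height ratio. A small helper sketch; nothing here is ScanNet-specific and the sizes/values are placeholders:

import numpy as np

def rescale_intrinsics(K, orig_size, new_size):
    """K: 3x3 intrinsics valid at orig_size=(W, H); returns K valid at new_size=(W, H)."""
    sx = new_size[0] / float(orig_size[0])
    sy = new_size[1] / float(orig_size[1])
    K = K.copy()
    K[0, 0] *= sx   # fx
    K[0, 2] *= sx   # cx
    K[1, 1] *= sy   # fy
    K[1, 2] *= sy   # cy
    return K

# e.g. color intrinsics given at 1296x968, rescaled to a 640x480 working resolution
K_color = np.array([[1170.0,    0.0, 648.0],
                    [   0.0, 1170.0, 484.0],
                    [   0.0,    0.0,   1.0]])
K_net = rescale_intrinsics(K_color, (1296, 968), (640, 480))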

Question about pose preprocessing

Hi,

I have a question about retrieving the pose data.
As referenced below, after the pose is converted from quaternion to matrix, it follows by an inverse operation. Why is this inverse operation necessary?

pose_mat = pose_vec2mat(pose_vec)
poses.append(np.linalg.inv(pose_mat))

Thanks
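For context on what the inverse does mathematically (not a claim about which convention this dataset uses): inverting a rigid-body pose converts between camera-to-world and world-to-camera, and the SE(3) inverse has the closed form [R^T, -R^T t]. A small sketch:

import numpy as np

def inv_se3(T):
    """Closed-form inverse of a 4x4 rigid-body transform [R | t]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# If pose_mat maps camera coordinates to world coordinates (camera-to-world),
# then inv(pose_mat) maps world points into the camera frame, which is what a
# projection pipeline typically consumes -- a common reason for inverting here.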

Results even better using "default" validation method?

I believe your results might be even slightly better if you use the default validation method:

The paper states that you directly use the 192x1088 output image of the CNN for evaluation. In contrast, other papers first resize the inferred image to the RGB size, crop it and then evaluate it, see https://github.com/nianticlabs/monodepth2/blob/master/evaluate_depth.py#L187

You can do the same if you first pad the output image with 108 pixels to undo the previous cropping and then perform the resizing and cropping. In that case I get absRelErr = 0.0640. I believe the improvement comes from some artifacts at the top of the image, which are simply cropped away with this method.

Note, however, that I skipped some of the 697 images of the Eigen split when one of the four neighboring images was not available. How did you deal with these cases? It is not mentioned in the paper.

NYUD pose

Hi, thanks for your work. The tfrecord is too big to download. Could you share a compressed file containing just the pose information?

Significance of multiplying the translation vector by 0.1(args['scale'])

Hi,

Thank you for sharing the code.

I am not able to understand the significance of multiplying the translation vector by the constant 0.1 (args['scale']) in the kitti.py file when updating trajectory[i][0:3, 3].

Can you explain why you multiply the translation vector by 0.1 (args['scale'])?

        for i in range(len(trajectory)):
            trajectory[i] = np.dot(imu2cam, util.inv_SE3(trajectory[i]))
            trajectory[i][0:3, 3] *= self.args['scale']
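For what it's worth, scaling depth and translation by the same factor leaves the projected pixel coordinates unchanged (projection divides by depth), so a consistent 0.1 on both simply changes the metric units the network works in. A quick numerical check of that invariance (plain NumPy, unrelated to kitti.py itself; the values are placeholders):

import numpy as np

K = np.array([[718.0,   0.0, 607.0],
              [  0.0, 718.0, 185.0],
              [  0.0,   0.0,   1.0]])

X = np.array([2.0, 1.0, 15.0])      # 3D point in the reference camera frame
R = np.eye(3)                        # identity rotation for simplicity
t = np.array([0.5, 0.0, 0.1])        # translation to the second camera

def project(X, t, s=1.0):
    # Scale both the point (i.e. the depth) and the translation by s.
    x = K @ (R @ (s * X) + s * t)
    return x[:2] / x[2]

print(project(X, t, 1.0))            # original scale
print(project(X, t, 0.1))            # everything scaled by 0.1 -> same pixel coordinates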
