nianticlabs / wavelet-monodepth
[CVPR 2021] Monocular depth estimation using wavelets for efficiency
License: Other
Hi,
Thanks for the brilliant work! I notice the inverse DWT you use in the code is IDWT(). However, I couldn't find it in the current documentation of the PyTorch Wavelets library. Which version of PyTorch Wavelets did you use for the paper? And if I want to apply the forward DWT under your code settings, should I use a function like DWT(img)?
Thanks a lot!
Hi,
I have now understood the flow. I can see that "self.is_test" always remains False, so control never enters the "if" branch:
wavelet-monodepth/NYUv2/data.py
Line 133 in 5bc1939
Could you please tell me why you either multiply by 1000 or divide by 1000?
Another point: as I am working with the smaller version of the NYU dataset, i.e. "nyu_depth_v2_labeled.mat", does this divide/multiply step also apply to that dataset?
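A plausible explanation (an assumption, not confirmed by the repo): the factor of 1000 converts between millimetres, as stored in the raw NYUv2 PNG depth maps, and metres, as used at training time:

```python
import numpy as np

# Assumption: the x1000 / /1000 in data.py converts depth between
# millimetres (raw NYUv2 PNGs) and metres (training units).
depth_mm = np.array([[1500, 2300]], dtype=np.uint16)  # hypothetical mm depth map
depth_m = depth_mm.astype(np.float32) / 1000.0        # convert to metres
```

If I recall correctly, the labeled .mat file already stores depth in metres, in which case the scaling step would not apply there, but it is worth double-checking.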
(wavelet-mdp) enb@enb-MS-7D42:/media/enb/d246805e-89c3-40bf-95da-9f0d81ea7b05/home/enb/ZYF/wavelet-monodepth-main$ python KITTI/test_simple.py --image_path /media/enb/d246805e-89c3-40bf-95da-9f0d81ea7b05/home/enb/ZYF/Video_test/photo2/set1/ --model_path pretrain/ours_mobilenet_nyuv2.pth --encoder_type mobilenet
-> Loading model from pretrain/ours_mobilenet_nyuv2.pth
Building network
Encoder... Building mobilenet... Traceback (most recent call last):
File "KITTI/test_simple.py", line 185, in
test_simple(args)
File "KITTI/test_simple.py", line 81, in test_simple
encoder = make_depth_encoder(args)
File "/media/enb/d246805e-89c3-40bf-95da-9f0d81ea7b05/home/enb/ZYF/wavelet-monodepth-main/KITTI/networks/network_constructors.py", line 14, in make_depth_encoder
use_pretrained = opts.weights_init == "pretrained"
AttributeError: 'Namespace' object has no attribute 'weights_init'
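A hypothetical workaround for this traceback, assuming the cause is that make_depth_encoder reads opts.weights_init while test_simple.py's argument parser never defines it (the flag name and choices below mirror typical training-script options and are assumptions):

```python
import argparse

# Sketch: add the missing option with a sensible default so the
# AttributeError in make_depth_encoder is avoided.
parser = argparse.ArgumentParser()
parser.add_argument("--weights_init", type=str, default="pretrained",
                    choices=["pretrained", "scratch"])
args = parser.parse_args([])  # args.weights_init defaults to "pretrained"
```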
Hello,
Thanks for the impressive work! I've cloned the code and set up the environment as required (PyTorch 1.7.1, torchvision 0.8.2). When running the code on KITTI (without depth hints), I encountered the following problems.
There is a warning: 'UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.' Considering I use torch 1.7.1, do I need to change the order in run_epoch.py as the warning suggests?
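For reference, the order recommended since PyTorch 1.1.0 is to step the optimizer first and the scheduler afterwards. A minimal training-loop sketch (the model, optimizer, and hyperparameters here are illustrative, not the repo's):

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(2):
    loss = model(torch.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # optimizer first...
    scheduler.step()   # ...then the LR scheduler
```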
An error shows at line 231 of mono_dataset.py: inputs[("depth_gt", scale)] = self.resizescale
'TypeError: img should be PIL Image. Got <class 'numpy.ndarray'>'
I converted 'depth_gt' to PIL Image format and the problem was solved. I wonder whether this error is specific to my setup, or how you handled it.
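The torchvision resize transforms used at that point expect PIL images and raise exactly this TypeError on numpy arrays; converting the depth map first is one fix. A sketch of the workaround described above (not the repo's own code; the array shape is illustrative):

```python
import numpy as np
from PIL import Image

# Convert a numpy depth map to a PIL image before applying
# PIL-based torchvision transforms.
depth_np = (np.random.rand(480, 640) * 255).astype(np.uint8)
depth_pil = Image.fromarray(depth_np)  # now accepted by the resize transform
```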
There is another small warning when using grid_sample(): 'UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.' Do I need to set the 'align_corners' parameter, or just leave it unchanged?
I wonder how you settled these issues in your implementation. Thank you a lot!
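Passing align_corners explicitly silences this warning. Code written before PyTorch 1.3.0 implicitly used align_corners=True, so that value reproduces the old behaviour (whether this repo relies on the old convention is an assumption; monodepth2-style warping code typically does):

```python
import torch
import torch.nn.functional as F

# Explicit align_corners: True matches pre-1.3.0 behaviour.
feat = torch.randn(1, 1, 4, 4)
grid = torch.zeros(1, 2, 2, 2)  # sampling grid in [-1, 1] coordinates
out = F.grid_sample(feat, grid, align_corners=True)
```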
Hi,
When I run the code on KITTI, I cannot reproduce the scores reported in the paper. I trained with stereo supervision at 1024x320 resolution with wavelet decomposition; the command is shown below:
"train.py --data_path --log_dir --encoder_type resnet --num_layers 50 --width 1024 --height 320 --frame_ids 0 --use_stereo --split eigen_full --num_epochs 300 --use_depth_hints --depth_hint_path --use_wavelets"
I evaluated the model at epoch 20 and got:
abs_rel | sq_rel | rmse   | rmse_log | a1     | a2     | a3
0.1146  | 0.8996 | 4.8552 | 0.2024   | 0.8582 | 0.9519 | 0.9785
which differs from the scores reported in the paper:
0.097   | 0.718  | 4.387  | 0.184    | 0.891  | 0.962  | 0.982
I use PyTorch 1.7.1 (CUDA 10.1), torchvision 0.8.2 (CUDA 10.1), pytorch-wavelets 1.3.0, numpy 1.19.5, opencv 3.4.2, pillow 6.2.1, scikit-learn 0.24.2, on Python 3.7.10. This setup is consistent with what the GitHub repository suggests.
So I wonder how to reproduce the reported scores; is there anything wrong with my settings? Thank you for your advice :D
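For anyone comparing numbers: the metrics named above are the standard eigen-split depth metrics. A sketch of their common definitions (for reference only, not the repo's evaluation code):

```python
import numpy as np

def compute_metrics(gt, pred):
    # Standard monocular-depth metrics: threshold accuracies and errors.
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```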
Hello, what is the purpose of the factor 2 ** 2 when obtaining 'h':
h = (2 ** 2) * self.wave1(x_d1).unsqueeze(1)
and in other examples, including
ll = (2 ** 3) * self.wave1_ll(x_d1)
h = (2 ** 1) * self.wave2(x_d2).unsqueeze(1)
Could you please explain the purpose of these factors? Thank you in advance.
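One plausible reading (an assumption, not the authors' statement): with an orthonormal 2-D Haar transform, each decomposition level scales coefficient magnitudes by a factor of 2, so coefficients predicted by a bounded head at level j would be rescaled by 2 ** j before the inverse transform. The per-level doubling can be seen on a constant image:

```python
import torch
import torch.nn.functional as F

# For a constant image of value c, one level of the orthonormal 2-D Haar
# transform gives an LL band of value 2c: each 2x2 block is summed with
# weight 1/sqrt(2) per dimension, i.e. 4c * 1/2 = 2c.
x = torch.full((1, 1, 8, 8), 3.0)
ll = F.avg_pool2d(x, 2) * 2  # equals the Haar LL band for constant input
```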
Hi,
Here comes a new problem.
When I trained the model at a resolution of 192x640 with the command
train.py --data_path <path> --log_dir <path> --encoder_type resnet --num_layers 50 --width 640 --height 192 --frame_ids 0 --use_stereo --split eigen_full --num_epochs 30 --use_depth_hints --depth_hint_path <path> --use_wavelets
, the training process got stuck: the logging message for the first epoch never appeared, and no other message or error was reported. It's pretty weird because the resolution is not supposed to interfere with the training process. Could you please help debug this? Thank you!
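A silent stall with no error is often a DataLoader worker deadlock rather than a model issue. A quick diagnostic (a generic sketch, not the repo's loader) is to rerun with num_workers=0 so any exception surfaces in the main process:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset at the problematic resolution; num_workers=0 keeps data
# loading in the main process, so hangs/exceptions become visible.
ds = TensorDataset(torch.randn(8, 3, 192, 640))
loader = DataLoader(ds, batch_size=4, num_workers=0)
batch, = next(iter(loader))
```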
Hello, when I use the training command without depth hints, the loss is very small; I'm not sure whether this is normal.
The loss is displayed as 0.0000.
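One thing worth ruling out: a loss that is small but nonzero still prints as 0.0000 if the log format keeps only four decimal places (an assumption about the repo's logging). Printing with scientific notation distinguishes a truly zero loss from a tiny one:

```python
# Hypothetical tiny loss value: rounds to zero at 4 decimal places.
loss = 3.2e-5
formatted = f"{loss:.4f}"  # "0.0000"
precise = f"{loss:.2e}"    # "3.20e-05"
```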