
Comments (8)

MichaelRamamonjisoa commented on June 2, 2024

Hi,

Indeed, your reported results are quite strange. I expect some gap between different runs, but not that much.
There are a few things we need to check:

  • Can you confirm that you are using depth hints?
  • Can you evaluate your epoch 19 result, since the log indices start at 0?
  • Can you run training again, setting --num_epochs 20?

Do you run the following command for evaluation?

python evaluate_depth.py \
  --data_path <your_KITTI_path> \
  --encoder_type resnet --num_layers 50 \
  --width 1024 --height 320 \
  --load_weights_folder <path_to_model> \
  --use_wavelets \
  --eval_stereo \
  --eval_split eigen \
  --post_process

I unfortunately cannot run experiments at the moment.
It would help us debug if you could run the same experiments without wavelets, and compare against the "Depth hints Resnet 50" results.


ruili3 commented on June 2, 2024

Hi,

Thank you for your reply :D
Regarding the things to check, I can answer some of them now:

  • I can confirm that I use depth hints. I set --use_depth_hints and --depth_hint_path in the training command, and I monitor the depth hints loss in TensorBoard. As shown below, the loss curve is quite smooth. Maybe you can help check whether the loss values are reasonable.
    [Image: TensorBoard curve of the depth hints loss]

  • The result is exactly from the epoch indexed 19 during training; I renamed it to 20 in the previous post.

  • Yes, I can run it again with --num_epochs set to 20. Based on my previous experience running Monodepth2 and ManyDepth, training for more than 20 epochs has little influence on the final performance.

When I run the evaluation, the command is the same as the one you suggest above. The code performs stereo evaluation with a fixed scaling factor of 5.4.
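(For clarity, my understanding of that scaling step is roughly the following; this is a monodepth2-style sketch with assumed names, not the exact code:)

  import numpy as np

  STEREO_SCALE_FACTOR = 5.4  # fixed KITTI stereo baseline factor for stereo-trained models

  def scale_pred_depth(pred_depth, gt_depth, eval_stereo=True):
      # Sketch of the scaling applied before computing metrics (names assumed).
      if eval_stereo:
          # Stereo training fixes the metric scale up to the known baseline,
          # so a constant factor is used instead of per-image median scaling.
          return pred_depth * STEREO_SCALE_FACTOR
      # Monocular models are scale-ambiguous: fall back to median scaling.
      return pred_depth * (np.median(gt_depth) / np.median(pred_depth))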

I would like to run DepthHints without wavelets at 1024x320 resolution to see whether the depth-hints baseline already deviates from the expected results in my setting. I'll post the result once training finishes. Thanks a lot for your help with further debugging!


ruili3 commented on June 2, 2024

Hi,

I ran the same experiment without wavelets (1024x320, depth hints, epoch 19), and the scores still fall behind the expected results:

abs_rel | sq_rel | rmse   | rmse_log | a1     | a2     | a3
0.1136  | 0.8743 | 4.8357 | 0.2026   | 0.8599 | 0.9514 | 0.9779

The values for "Depth hints Resnet 50 (1024x320)" in your paper are:

abs_rel | sq_rel | rmse   | rmse_log | a1     | a2     | a3
0.096   | 0.710  | 4.393  | 0.185    | 0.890  | 0.962  | 0.981

It seems the performance of both settings drops simultaneously here.
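(For reference, these columns are the standard KITTI depth metrics; roughly, they are computed as in monodepth2-style evaluation code, sketched below:)

  import numpy as np

  def compute_errors(gt, pred):
      # Standard KITTI depth metrics over valid ground-truth pixels (sketch).
      thresh = np.maximum(gt / pred, pred / gt)
      a1 = (thresh < 1.25).mean()
      a2 = (thresh < 1.25 ** 2).mean()
      a3 = (thresh < 1.25 ** 3).mean()
      rmse = np.sqrt(((gt - pred) ** 2).mean())
      rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())
      abs_rel = np.mean(np.abs(gt - pred) / gt)
      sq_rel = np.mean(((gt - pred) ** 2) / gt)
      return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3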

It's weird, because I cloned the code directly from your repository and ran training and evaluation following the commands in README.md. Could this be caused by the training data or the running environment? I use the .jpg images (as processed by Monodepth2) for training and also use these images to generate the depth hints files. Can you provide more details or share the environment file of your conda environment?


MichaelRamamonjisoa commented on June 2, 2024

Hi,

Thanks a lot for these results!

The good news is that you get the correct main result: wavelets do not change the original performance, since the results are similar with or without wavelets.

I made a few modifications to the original depth-hints code to simplify it before release, so the issue might also come from that, but this needs more investigation, and unfortunately I cannot access GPUs right now to retrain at 1024x320 resolution with ResNet50.

Looking at the original depth-hints, I see that --disparity_smoothness 0 and --scheduler_step_size 5 are added when training with depth hints. These are missing from my training command line, as sketched below.
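For reference, the full training command would then look roughly like this (assuming the entry point is train.py; paths are placeholders and the flag names are the ones from the monodepth2 / depth-hints options, so please double-check them against the repository):

python train.py \
  --data_path <your_KITTI_path> \
  --encoder_type resnet --num_layers 50 \
  --width 1024 --height 320 \
  --use_wavelets \
  --use_depth_hints \
  --depth_hint_path <your_depth_hints_path> \
  --disparity_smoothness 0 \
  --scheduler_step_size 5 \
  --num_epochs 20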

Regarding jpg vs png, it seems that the depth hints are generated from the .jpg format: https://github.com/nianticlabs/depth-hints/blob/aa2ecf7bc88ef2edbd434fbf064ac024cec8e85d/precompute_depth_hints.py#L178, so you should be OK.

I will leave this issue open until we solve it, but let me know if setting these flags changes the final scores. In any case, a performance increase in depth hints should translate directly into a similar performance increase in WaveletMonoDepth.

In the meantime, if you'd like to try using our trained WaveletMonoDepth, I have put a link to trained weights in the README files.


ruili3 commented on June 2, 2024

Hi,

I think I find the reason :D

In L15 of KITTI/networks/network_constructors.py, ResNet pretraining is disabled in your code, so the performance is worse than with a pre-trained backbone:

encoder = encoders.ResnetEncoder(opts.num_layers, False)

After using a pre-trained ResNet 50 with wavelets, I get results similar to those reported in your paper:

abs_rel | sq_rel | rmse   | rmse_log | a1     | a2     | a3
0.0979  | 0.7363 | 4.4214 | 0.1847   | 0.8902 | 0.9629 | 0.9819
You could change this line and the result will be good :D
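For example, something like this (assuming a monodepth2-style opts.weights_init option; otherwise simply hard-code True):

  # enable ImageNet pre-training for the ResNet encoder
  encoder = encoders.ResnetEncoder(opts.num_layers, opts.weights_init == "pretrained")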

Meanwhile, I have a small question regarding the network design in the paper. I wonder why, in L134-L135 of depth_decoder.py,

yh = 2**(scale-1) * self.sigmoid(self.convs[("waveconv", scale, 1)](input_features)).unsqueeze(1) - \

the outputs of the two "waveconv" modules are scaled by 2**(scale-1) according to the current scale, and the final yh is computed as the difference of the two "waveconv" outputs. Should I follow the same practice if I want to build a new network module with a similar multi-level output? Thanks a lot!


MichaelRamamonjisoa commented on June 2, 2024

Hey!

Good spot, thanks a lot for your help with debugging!

I'll put back the pretraining option then.
Regarding your question, the scaling factor is meant to make sure yh stays within the [-2**(scale-1), 2**(scale-1)] range. The IDWT will then output an LL with values in the range [0, 2**(scale-2)].

The "2-layer" design was used to have two separate layers for negative and positive values. You could also use a tanh activation and remove the need for two layers; it will work fine too, although you might see a slight decrease in performance.

From previous experiments, I found that you can also use linear activations for the waveconv layers and let the network learn to predict values in the right range, but I decided to keep these activations to improve training stability.
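To make this concrete, here is a rough self-contained sketch of the idea (simplified PyTorch, not the exact repository code):

  import torch
  import torch.nn as nn

  class WaveCoeffHead(nn.Module):
      # Predicts the high-frequency wavelet coefficients (LH, HL, HH) at one scale (sketch).
      def __init__(self, in_channels, scale, two_layer=True):
          super().__init__()
          self.scale = scale
          self.two_layer = two_layer
          self.conv_pos = nn.Conv2d(in_channels, 3, kernel_size=3, padding=1)  # positive part
          self.conv_neg = nn.Conv2d(in_channels, 3, kernel_size=3, padding=1)  # negative part

      def forward(self, x):
          amp = 2 ** (self.scale - 1)  # keeps yh within [-2**(scale-1), 2**(scale-1)]
          if self.two_layer:
              # Difference of two sigmoid branches: one layer covers positive values,
              # the other covers negative values.
              yh = amp * torch.sigmoid(self.conv_pos(x)) - amp * torch.sigmoid(self.conv_neg(x))
          else:
              # A single tanh branch covers the same range with one layer.
              yh = amp * torch.tanh(self.conv_pos(x))
          # Add the channel dimension expected by pytorch_wavelets' IDWT: (B, 1, 3, H, W).
          return yh.unsqueeze(1)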


ruili3 commented on June 2, 2024

Yes, thanks for your explanation!


MichaelRamamonjisoa commented on June 2, 2024

Fixed in commit 1d9f945. Closing.

