
Comments (3)

benjaminum commented on June 15, 2024

Hi @JiamingSuen,

thanks for checking out our training code!

  1. At the moment we use the hyperparameters as set in the training code.
    There is probably a lot of room for improving these parameters.
    The losses will eventually converge if you train for a very long time, but this does not improve the test performance.

  2. v2 is an attempt to create a version of our network that can be trained easily with TensorFlow.
    It is meant as a basis for future experiments to improve the architecture.
    First steps towards a better architecture are already in blocks.py.
    We share it because we hope it will be useful to other researchers.

  3. As you have probably noticed, the training procedure is quite complex and the training losses can be difficult to understand at first glance.
    One important remaining task is to provide easy-to-use evaluation code to better assess the network performance.

> Thanks for this amazing work!

Thank you!


JiamingSuen commented on June 15, 2024

Thanks for the reply. I tried initializing the weights with tf.contrib.layers.variance_scaling_initializer(factor=2.0), which is the "MSRA initialization" described in this paper, but it didn't help much.

  1. What initialization did you use in the original Caffe implementation?
  2. Is it because the input data is quite noisy? I'm thinking about adding a batch normalization layer; do you think that's a good idea? Or should I just start training with the synthetic dataset? (See the sketch after this list.)
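
For reference, here is a minimal sketch of what I mean in TF 1.x. The conv_block helper, the layer shapes, and the 6-channel stacked-pair placeholder are illustrative assumptions, not the actual DeMoN code:

```python
import tensorflow as tf

def conv_block(x, filters, is_training, use_bn=False):
    """3x3 conv with MSRA initialization, optionally followed by batch norm."""
    # MSRA/He initialization: variance scaling with factor=2.0 (fan-in mode)
    init = tf.contrib.layers.variance_scaling_initializer(factor=2.0)
    x = tf.layers.conv2d(x, filters, kernel_size=3, padding='same',
                         kernel_initializer=init)
    if use_bn:
        # Batch norm before the nonlinearity. Note that tf.layers batch norm
        # puts its moving-average updates into the UPDATE_OPS collection,
        # which must be run alongside the train op.
        x = tf.layers.batch_normalization(x, training=is_training)
    return tf.nn.relu(x)

# Hypothetical input: an image pair stacked along the channel axis.
images = tf.placeholder(tf.float32, [None, 192, 256, 6])
is_training = tf.placeholder(tf.bool, [])
features = conv_block(images, 32, is_training, use_bn=True)
```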

I will keep updating my progress here.


TheCrazyT commented on June 15, 2024

I'm asking myself the same thing... I thought the total loss should go down after a while.
But it does not really look good (160k+ iterations):
https://tensorboard.dev/experiment/aay2ZG8aRUaZM1EwML3jPA/#scalars&run=0_flow1%2Ftrainlogs&_smoothingWeight=0.989

Edit:
I guess I would need a total loss that does not include the *_sig losses (and instead includes the *_sig_unscaled losses) to get a nice-looking graph; a sketch of what I mean is below.
At least I now understand why the total loss does not decrease much while training itself actually does improve.
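
Something like this is what I have in mind: a separate monitoring scalar built only from the unscaled losses. This is a minimal sketch; the "_sig_unscaled" suffix and the use of the LOSSES collection are assumptions about how the losses are registered, not the actual training code:

```python
import tensorflow as tf

# Sketch: build a monitoring-only "total loss" from the unscaled losses,
# assuming each loss tensor was added to the LOSSES collection with a
# name ending in "_sig_unscaled" (naming is an assumption here).
all_losses = tf.get_collection(tf.GraphKeys.LOSSES)
unscaled = [l for l in all_losses if l.name.endswith('_sig_unscaled:0')]

# This scalar is only summarized for TensorBoard, never optimized; the
# optimizer still minimizes the original total loss with the *_sig terms.
monitor_total = tf.add_n(unscaled, name='total_loss_unscaled')
tf.summary.scalar('total_loss_unscaled', monitor_total)
```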

