
Comments (3)

benjaminum commented on June 15, 2024

Hi @JiamingSuen,

thanks for checking out our training code!

  1. At the moment we use the hyperparameters as set in the training code.
    There is probably a lot of room for improving these parameters.
    The losses will eventually converge if you train for a very long time, but this does not improve the test performance.

  2. v2 is an attempt to create a version of our network that can be trained easily with TensorFlow.
    It is meant as a basis for future experiments to improve the architecture.
    First steps towards a better architecture are already in blocks.py.
    We share it because we hope it will be useful to other researchers.

  3. As you have probably noticed, the training procedure is quite complex and the training losses can be difficult to understand at first glance.
    One important remaining task is to provide easy-to-use evaluation code to better assess the network performance.

> Thanks for this amazing work!

Thank you!


JiamingSuen commented on June 15, 2024

Thanks for the reply. I tried initializing the weights with tf.contrib.layers.variance_scaling_initializer(factor=2.0), which is the "MSRA initialization" described in this paper, but it didn't help much.

  1. What initialization did you use in the original Caffe implementation?
  2. Is it because the input data is quite noisy? I'm thinking about adding a batch normalization layer; do you think that's a good idea? Or should I just start training with the synthetic dataset? (See the sketch after this list.)
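
For reference, here is a minimal sketch of what I mean in TF 1.x. The conv_block helper, the layer shapes, and the 6-channel stacked-pair placeholder are illustrative assumptions, not the actual DeMoN code:

```python
import tensorflow as tf

def conv_block(x, filters, is_training, use_bn=False):
    """3x3 conv with MSRA initialization, optionally followed by batch norm."""
    # MSRA/He initialization: variance scaling with factor=2.0 (fan-in mode)
    init = tf.contrib.layers.variance_scaling_initializer(factor=2.0)
    x = tf.layers.conv2d(x, filters, kernel_size=3, padding='same',
                         kernel_initializer=init)
    if use_bn:
        # Batch norm before the nonlinearity. Note that tf.layers batch norm
        # puts its moving-average updates into the UPDATE_OPS collection,
        # which must be run alongside the train op.
        x = tf.layers.batch_normalization(x, training=is_training)
    return tf.nn.relu(x)

# Hypothetical input: an image pair stacked along the channel axis.
images = tf.placeholder(tf.float32, [None, 192, 256, 6])
is_training = tf.placeholder(tf.bool, [])
features = conv_block(images, 32, is_training, use_bn=True)
```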

I will keep updating my progress here.


TheCrazyT commented on June 15, 2024

I'm asking myself the same thing... I thought the total loss should go down after a while.
But it does not really look good (160k+ iterations):
https://tensorboard.dev/experiment/aay2ZG8aRUaZM1EwML3jPA/#scalars&run=0_flow1%2Ftrainlogs&_smoothingWeight=0.989

Edit:
I guess I would need a total loss that does not include the *_sig losses (and instead includes the *_sig_unscaled losses) to get a nice-looking graph; a sketch of what I mean is below.
At least I now understand why the total loss does not decrease much while training itself actually does improve.
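
Something like this is what I have in mind: a separate monitoring scalar built only from the unscaled losses. This is a minimal sketch; the "_sig_unscaled" suffix and the use of the LOSSES collection are assumptions about how the losses are registered, not the actual training code:

```python
import tensorflow as tf

# Sketch: build a monitoring-only "total loss" from the unscaled losses,
# assuming each loss tensor was added to the LOSSES collection with a
# name ending in "_sig_unscaled" (naming is an assumption here).
all_losses = tf.get_collection(tf.GraphKeys.LOSSES)
unscaled = [l for l in all_losses if l.name.endswith('_sig_unscaled:0')]

# This scalar is only summarized for TensorBoard, never optimized; the
# optimizer still minimizes the original total loss with the *_sig terms.
monitor_total = tf.add_n(unscaled, name='total_loss_unscaled')
tf.summary.scalar('total_loss_unscaled', monitor_total)
```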

