
sgdr's People

Contributors

loshchil


sgdr's Issues

Learning rate on pretrained model

If a model is already pretrained to a good state and I want to apply your method, how do I set a reasonable learning rate? If the initial learning rate is too large, the restarts may push the model too far out of the optimized region.
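Concretely, within each restart period the paper's schedule is eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)). Below is a small sketch of what I have in mind for fine-tuning: start the restarts from a much smaller eta_max than the from-scratch value. The numbers are only my guesses, not values from the paper.

import math

def sgdr_lr(t_cur, t_i, eta_min, eta_max):
    # Cosine annealing within one restart period (Eq. 5 of the SGDR paper):
    # eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t_cur / t_i))
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_i))

# Hypothetical fine-tuning setup: a small eta_max so the first cycle
# cannot push a well-trained model far out of its optimized region.
eta_max, eta_min, t_i = 0.005, 1e-6, 10.0   # illustrative values only
for epoch in range(10):
    print(epoch, sgdr_lr(epoch, t_i, eta_min, eta_max))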

some problem

First, from your paper, the WRN model used is 28-10, so n should be set to 4, but in this code n is 5. Is this just a mistake, or is this the setting that produced the results in your paper?
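For the arithmetic behind this: the WRN paper defines the depth as d = 6n + 4, where n is the number of residual blocks per group, so WRN-28-10 gives n = (28 - 4) / 6 = 4. A quick sanity check under that formula is below; whether this repo's n follows a different counting convention is exactly the question.

def wrn_n_from_depth(depth):
    # WRN paper (Zagoruyko & Komodakis): depth d = 6*n + 4,
    # where n is the number of residual blocks per group.
    assert (depth - 4) % 6 == 0, "WRN depth must satisfy d = 6n + 4"
    return (depth - 4) // 6

print(wrn_n_from_depth(28))   # -> 4, i.e. WRN-28-10 corresponds to n = 4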

[L2 Regularization] Queries

Hi,

The implementation shared in this repo uses Lasagne's L2 regularization with a factor of 0.0005 for all the experiments. I assume this is inspired by the original WRN paper by Zagoruyko et al. Your SGDR paper (https://arxiv.org/abs/1608.03983) also states that the weight decay value should be 0.0005.

SGDR/SGDR_WRNs.py

Lines 251 to 252 in 5269a61

sh_reg_fac = theano.shared(lasagne.utils.floatX(reg_fac))
l2_penalty = lasagne.regularization.regularize_layer_params(all_layers, lasagne.regularization.l2) * sh_reg_fac

As per your follow-up paper, Decoupled Weight Decay Regularization (https://arxiv.org/abs/1711.05101), for SGD the L2 regularization factor should be rescaled by the learning rate.
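To make that rescaling concrete, here is a minimal sketch (plain SGD, no momentum; the names are mine) of the equivalence stated in that paper: L2 regularization with factor lambda_l2 takes the same step as decoupled weight decay with factor lambda_decay exactly when lambda_l2 = lambda_decay / lr.

import numpy as np

def sgd_step_l2(theta, grad, lr, lambda_l2):
    # L2 enters through the gradient of the regularized loss:
    return theta - lr * (grad + lambda_l2 * theta)

def sgd_step_decoupled(theta, grad, lr, lambda_decay):
    # Decay is applied to the weights directly, decoupled from the gradient:
    return theta - lr * grad - lambda_decay * theta

theta = np.array([1.0, -2.0])
grad = np.array([0.1, 0.3])
lr, lambda_decay = 0.05, 0.001
same = np.allclose(sgd_step_l2(theta, grad, lr, lambda_decay / lr),
                   sgd_step_decoupled(theta, grad, lr, lambda_decay))
print(same)  # True: a fixed L2 factor implies a decay that scales with lr

So a fixed L2 factor of 0.0005 implies an effective weight decay that changes whenever the cosine schedule changes the learning rate, which is the behaviour the follow-up paper decouples.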

Could you please clarify this point? (My question would be in terms of a TensorFlow-based implementation.)

Let me know if I can help with any additional details on the internal TF implementation.

Thanks in advance.
