I have seen a performance boost after switching from Adam to AdaBound. After tuning my model, I found that a learning-rate range of 2e-4 to 2e-2 works best. I am now interested in fine-tuning the model on a new dataset, but I have found that switching to tf.train.GradientDescentOptimizer with a 1e-3 learning rate causes a slow divergence.
In the PyTorch implementation, the authors decay AdaBound's learning rate with a scheduler as follows:
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=150, gamma=0.1, last_epoch=start_epoch)
https://github.com/Luolc/AdaBound/blob/master/demos/cifar10/main.py
I'm unclear how to reduce AdaBound's learning rate in TensorFlow without switching to a different optimizer (SGD) with a fixed learning rate, which seems to destabilize training.
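To make the question concrete, here is a minimal sketch of the kind of decay I'm after: reproducing StepLR's schedule (step_size=150, gamma=0.1) in plain Python, then feeding the resulting value to the optimizer each step. The TF 1.x placeholder pattern in the comments is an assumption about how a TF AdaBound port would accept a learning rate; `AdaBoundOptimizer` is a hypothetical name, not a confirmed API.

```python
def step_lr(base_lr, epoch, step_size=150, gamma=0.1):
    """Learning rate after `epoch` epochs, matching
    torch.optim.lr_scheduler.StepLR(step_size=150, gamma=0.1)."""
    return base_lr * gamma ** (epoch // step_size)

# With base_lr=2e-2 this gives 2e-2 for epochs 0-149, 2e-3 for 150-299, etc.
#
# In TF 1.x the value could then be fed per step via a placeholder, e.g.
# (hypothetical AdaBound port, names assumed):
#   lr = tf.placeholder(tf.float32, shape=[])
#   train_op = AdaBoundOptimizer(learning_rate=lr).minimize(loss)
#   sess.run(train_op, feed_dict={lr: step_lr(2e-2, epoch)})
```

This keeps the same optimizer (and its internal state) throughout training, only varying the learning-rate input, rather than swapping to SGD mid-run.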