
AdaBound-Tensorflow's Introduction

👉 Research Scientist at NAVER AI Lab


AdaBound-Tensorflow's People

Contributors

huyu398, joeyearsley, taki0112


AdaBound-Tensorflow's Issues

About the TPU version (the _resource_apply_dense function)

Hello,
Thanks for your amazing implementation of the AdaBound optimizer in TensorFlow.
I am running some projects on TPU and want to try AdaBound there, but I found a data type incompatibility issue when calling your AdaBoundOptimizer.

Specifically, when the _resource_apply_dense function is called, it throws an error: the variable var there has type tf.float32 while the variable grad has type tf.resource, which causes an error at https://github.com/taki0112/AdaBound-Tensorflow/blob/master/AdaBound.py#L132.

I am not familiar with the optimizer internals or the TPU optimization mechanism, so I could not fix this. Do you have any idea?

Thanks
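For context, here is a minimal NumPy sketch of the dense update that _resource_apply_dense performs, following the AdaBound paper (the names base_lr, final_lr, gamma come from the paper and may not match this repo's variables). It also illustrates the constraint the reported mismatch violates: var, grad, m and v must all share one float dtype.

```python
import numpy as np

def adabound_dense_step(var, grad, m, v, t, base_lr=1e-3, final_lr=0.1,
                        beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    """One dense AdaBound update (sketch, not this repo's exact code).

    var, grad, m, v must be arrays of the same float dtype -- the
    tf.resource/tf.float32 mix reported above breaks this assumption.
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment, as in Adam
    # Bias-corrected base step size.
    step = base_lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Dynamic bounds that both converge to final_lr as t grows.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    # Clip the per-parameter rate into [lower, upper] before applying.
    clipped_lr = np.clip(step / (np.sqrt(v) + eps), lower, upper)
    return var - clipped_lr * m, m, v
```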

Does not work for sparse embedding models

Thanks for your great work.
I found that it does not work for large-scale models with huge sparse embedding parameters, because the apply_sparse function may not be as efficient as TensorFlow's AdamOptimizer.
So, do you have a plan to re-implement a sparse version of AdaBound?

I compared this version with TensorFlow's AdamOptimizer.
I think the key efficiency difference is that TF's AdamOptimizer calls training_ops.sparse_apply_adam(), a fused op that accelerates the update of sparse parameters.
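To illustrate the difference being described: a fused sparse kernel only touches the embedding rows that actually received gradients, while a dense apply rewrites every row of the table. A deliberately simplified sketch (plain SGD instead of Adam, NumPy instead of the fused training_ops kernel):

```python
import numpy as np

def sparse_apply_sgd(embedding, grad_values, grad_indices, lr=0.1):
    """Update only the rows named in grad_indices; all other rows of a
    (possibly huge) embedding table are left untouched. Fused kernels
    like training_ops.sparse_apply_adam work on the same principle,
    with Adam's moment updates done row-wise inside the op.
    """
    # subtract.at handles duplicate indices correctly (accumulates).
    np.subtract.at(embedding, grad_indices, lr * grad_values)
    return embedding
```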

Question: How to get the learning rate

Hi, your work helps me a lot, and I have a question about your code!

I would like to know how to get the learning rate in your code. I don't understand the AdaBound algorithm or the logic behind the Keras optimizer workflow very well, so your help would be very useful in guiding me!

Thanks,
Hugo
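One way to think about this question: AdaBound has no single scalar learning rate to read out. Each parameter gets the Adam-style step clipped into a band that narrows toward final_lr over training. A hedged sketch of that band, with names taken from the AdaBound paper (they may differ from this repo's variables):

```python
def adabound_bounds(t, final_lr=0.1, gamma=1e-3):
    """Lower/upper clipping bounds at optimizer step t.

    Both bounds converge to final_lr as t grows, so late in training
    every parameter is effectively trained at a rate close to final_lr
    (SGD-like behaviour); early on, the band is wide (Adam-like).
    """
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    return lower, upper
```

So to inspect the rate actually applied at step t, you would clip the per-parameter value base_lr / (sqrt(v) + eps) into this band, rather than reading a single learning_rate variable.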

Trouble finetuning by switching to SGD

I have seen a performance boost switching from Adam to AdaBound. I have tuned my model and found that a learning-rate range of 2e-4 to 2e-2 works best. I am interested in fine-tuning my model on a new dataset, but have found that switching to tf.train.GradientDescentOptimizer with a 1e-3 learning rate causes a slow divergence.

In the pytorch implementation, the authors were able to get their AdaBound to work with learning rate decay as follows:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=150, gamma=0.1, last_epoch=start_epoch)
https://github.com/Luolc/AdaBound/blob/master/demos/cifar10/main.py

I'm unclear how to reduce the learning rate of AdaBound in TensorFlow without switching to a new optimizer (SGD) with a fixed learning rate, which seems to mess things up.
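One option, instead of switching to a separate SGD optimizer: decay AdaBound's base learning rate on the same schedule the PyTorch demo uses. The schedule itself is trivial to reproduce in plain Python; assuming AdaBoundOptimizer accepts a tensor or placeholder as its learning rate (worth checking in AdaBound.py), you could feed this value in each epoch:

```python
def steplr(base_lr, epoch, step_size=150, gamma=0.1):
    """Same schedule as torch.optim.lr_scheduler.StepLR: multiply the
    base rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)
```

For example, build the optimizer with a tf.placeholder learning rate and feed steplr(base_lr, epoch) at each session run. Note this only decays the Adam-side base rate; whether final_lr should be decayed in lockstep is a separate question that the PyTorch demo does not address.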
