keras-radam's Introduction

Keras RAdam


[中文|English]

Unofficial implementation of RAdam in Keras.

Install

pip install keras-rectified-adam

External Link

Usage

from tensorflow import keras
import numpy as np
from keras_radam import RAdam

# Build toy model with RAdam optimizer
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_shape=(17,), units=3))
model.compile(RAdam(), loss='mse')

# Generate toy data
x = np.random.standard_normal((4096 * 30, 17))
w = np.random.standard_normal((17, 3))
y = np.dot(x, w)

# Fit
model.fit(x, y, epochs=5)

Use Warmup

from keras_radam import RAdam

RAdam(total_steps=10000, warmup_proportion=0.1, min_lr=1e-5)
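
A rough sketch (not from the README) of how total_steps is typically chosen: it is the total number of optimizer steps, so it can be derived from the dataset size, batch size, and epoch count. The numbers below are hypothetical.

from keras_radam import RAdam

# Hypothetical values: adjust to your own dataset and training schedule.
samples, batch_size, epochs = 4096 * 30, 32, 5
total_steps = (samples // batch_size) * epochs

RAdam(total_steps=total_steps, warmup_proportion=0.1, min_lr=1e-5)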

keras-radam's People

Contributors

cyberzhg, innat, mstyura, overlordgolddragon, romainbrault, timvink


keras-radam's Issues

No module name math_ops

import keras
from keras_radam import RAdam
from tensorflow.python.keras.optimizers import Adadelta
import tensorflow as tf
import tensorflow.keras.backend as K
import pandas as pd

ImportError Traceback (most recent call last)

in ()
1 import keras
----> 2 from keras_radam import RAdam
3 from tensorflow.python.keras.optimizers import Adadelta
4 import tensorflow as tf
5 import tensorflow.keras.backend as K

2 frames

/usr/local/lib/python3.6/dist-packages/keras_radam/__init__.py in <module>()
----> 1 from .selection import *
2
3 __version__ = '0.17.0'

/usr/local/lib/python3.6/dist-packages/keras_radam/selection.py in <module>()
5
6 if TF_KERAS:
----> 7 from .optimizer_v2 import RAdam
8 else:
9 from .optimizers import RAdam

/usr/local/lib/python3.6/dist-packages/keras_radam/optimizer_v2.py in <module>()
1 import tensorflow as tf
2 from tensorflow.python.keras.optimizer_v2.optimizer_v2 import OptimizerV2
----> 3 from tensorflow.python import ops, math_ops, state_ops, control_flow_ops
4 from tensorflow.python.keras import backend as K
5

ImportError: cannot import name 'math_ops'
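
For reference, a hedged sketch of the kind of fix involved (not the package's actual patch): in recent TensorFlow releases these submodules are not re-exported from tensorflow.python directly, so the failing import in keras_radam/optimizer_v2.py would need to point at their actual locations.

# Hedged sketch of the corrected imports the traceback is missing:
from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops, state_ops, control_flow_ops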

Typo in README.md

Describe the Bug

The README example states:

RAdam(total_step=10000, warmup_proportion=0.1, min_lr=1e-5)

Should be total_steps:

-RAdam(total_step=10000, warmup_proportion=0.1, min_lr=1e-5)
+RAdam(total_steps=10000, warmup_proportion=0.1, min_lr=1e-5)

ModuleNotFoundError: No module named 'tensorflow.python.keras.optimizer_v2'

Hi, thank you for releasing the package.
I'm using TF 1.12.0 and tf.keras 2.1.6-tf.
When I ran the code below:

import os
os.environ['TF_KERAS'] = '1'
from tensorflow import keras
import numpy as np
from keras_radam import RAdam

model = keras.models.Sequential()
model.add(keras.layers.Dense(input_shape=(17,), units=3))
model.compile(RAdam(), loss='mse')

x = np.random.standard_normal((4096 * 30, 17))
w = np.random.standard_normal((17, 3))
y = np.dot(x, w)

model.fit(x, y, epochs=5)

I get: "ModuleNotFoundError: No module named 'tensorflow.python.keras.optimizer_v2'"

Ranger Optimizer extension

Ranger (https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d, https://arxiv.org/abs/1907.08610v1) is a new optimizer that reports state-of-the-art optimization performance for deep networks. The interesting thing is that Ranger uses RAdam as the base optimizer.
Since you have the best library for a RAdam implementation in TensorFlow and Keras, may I request that you consider extending your work to include Lookahead as well?
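
A minimal sketch of the Lookahead idea on top of RAdam, to illustrate the request (this is not a feature of this repo; LookaheadTrainer and its parameters are hypothetical): keep a copy of "slow" weights and, every k fast steps, interpolate them toward the fast weights and sync the model back.

import tensorflow as tf
from keras_radam import RAdam  # assumes TF_KERAS=1 so the OptimizerV2 variant is used

class LookaheadTrainer:
    def __init__(self, variables, k=5, alpha=0.5):
        self.fast_optimizer = RAdam()
        self.slow_weights = [tf.Variable(v) for v in variables]  # copies of the current weights
        self.k, self.alpha, self.step = k, alpha, 0

    def apply(self, grads, variables):
        # Fast update with RAdam.
        self.fast_optimizer.apply_gradients(zip(grads, variables))
        self.step += 1
        if self.step % self.k == 0:
            # Slow update: interpolate toward the fast weights, then reset the fast weights.
            for slow, fast in zip(self.slow_weights, variables):
                slow.assign_add(self.alpha * (fast - slow))
                fast.assign(slow)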

TypeError: Unexpected keyword argument passed to optimizer: lr

Python = 3.6
Keras = 2.2.4
Tensorflow = 1.15

Hi Guys,

I tried to load my YOLOv3 model. It runs without a GPU, but this error occurred:

food_model = load_model(config['predict']['foodnet_model'], custom_objects={"tf": tf, "RAdam":RAdam})
File "C:\Users\User\Anaconda3\envs\no-gpu\lib\site-packages\keras\engine\saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "C:\Users\User\Anaconda3\envs\no-gpu\lib\site-packages\keras\engine\saving.py", line 299, in _deserialize_model
custom_objects=custom_objects)
File "C:\Users\User\Anaconda3\envs\no-gpu\lib\site-packages\keras\optimizers.py", line 768, in deserialize
printable_module_name='optimizer')
File "C:\Users\User\Anaconda3\envs\no-gpu\lib\site-packages\keras\utils\generic_utils.py", line 147, in deserialize_keras_object
return cls.from_config(config['config'])
File "C:\Users\User\Anaconda3\envs\no-gpu\lib\site-packages\keras\optimizers.py", line 154, in from_config
return cls(**config)
File "C:\Users\User\Anaconda3\envs\no-gpu\lib\site-packages\keras_radam\optimizers.py", line 34, in init
super(RAdam, self).init(**kwargs)
File "C:\Users\User\Anaconda3\envs\no-gpu\lib\site-packages\keras\optimizers.py", line 79, in init
'passed to optimizer: ' + str(k))
TypeError: Unexpected keyword argument passed to optimizer: lr
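
A hedged workaround sketch (config, the model path, and the loss below are placeholders from the report): loading with compile=False skips deserializing the stored optimizer config that carries the legacy lr keyword, after which the model can be recompiled with a fresh RAdam.

import tensorflow as tf
from keras.models import load_model
from keras_radam import RAdam

# Hypothetical workaround: ignore the stored optimizer config (which contains the
# old `lr` keyword) and recompile manually.
food_model = load_model(config['predict']['foodnet_model'],
                        custom_objects={'tf': tf, 'RAdam': RAdam},
                        compile=False)
food_model.compile(optimizer=RAdam(), loss='mse')  # placeholder loss; use the original training loss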

ValueError: ('Could not interpret optimizer identifier:', <keras_radam.optimizers.RAdam object at 0x7fd0dab35358>)

Does this implementation work with TF 2.0?
I've tried the optimizer with this notebook, but after running it the following error occurs:
ValueError: ('Could not interpret optimizer identifier:', <keras_radam.optimizers.RAdam object at 0x7fd0dab35358>)

The modification is shown below:

tf.keras.backend.clear_session()
tf.random.set_seed(51)
np.random.seed(51)
train_set = windowed_dataset(x_train, window_size=60, batch_size=100, shuffle_buffer=shuffle_buffer_size)
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=60, kernel_size=5,
                           strides=1, padding="causal",
                           activation="relu",
                           input_shape=[None, 1]),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.LSTM(60, return_sequences=True),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 400)
])

model.compile(loss=tf.keras.losses.Huber(),
              optimizer=RAdam(),
              metrics=["mae"])
history = model.fit(train_set, epochs=250)

EDIT:
I've set the environment variable TF_KERAS to 1:

import os
os.environ['TF_KERAS'] = '1'

keras ReduceLROnPlateau callback expects lr property

Describe the Bug
The Keras callback ReduceLROnPlateau expects the optimizer to have a property called "lr" holding the learning rate. RAdam does not appear to have this. (Also, the parameter is called learning_rate, where normally it would be just lr.)
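
A hedged workaround sketch (not part of this package): alias learning_rate as lr on a subclass so callbacks that look up optimizer.lr can still find the variable.

from keras_radam import RAdam

class RAdamWithLRAlias(RAdam):
    # Hypothetical subclass: ReduceLROnPlateau reads/writes optimizer.lr via
    # K.get_value / K.set_value, so exposing the learning_rate variable is enough.
    @property
    def lr(self):
        return self.learning_rate

    @lr.setter
    def lr(self, value):
        self.learning_rate = value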

Usage for Tensorflow 2.0 minimize with var_list

Since the documentation for usage of OptimizerV2.minimize() is lacking, it is hard to realize how to use it, as the function does not query trainable_variables automatically anymore and requires var_list to be given.

My DNN currently uses TF 1.14. I would prefer to see RAdam working there before migrating all my code (production demands); otherwise I would give up on using it or any other adaptive-learning-rate scheme.
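
For reference, a hedged usage sketch under TF 2.x eager semantics (the model and data below are toy placeholders): minimize takes a loss callable and an explicit var_list.

import os
os.environ['TF_KERAS'] = '1'  # select the OptimizerV2 variant before importing keras_radam

import numpy as np
import tensorflow as tf
from keras_radam import RAdam

# Toy data and model, just to make the sketch self-contained.
model = tf.keras.Sequential([tf.keras.layers.Dense(3, input_shape=(17,))])
features = np.random.standard_normal((32, 17)).astype('float32')
labels = np.random.standard_normal((32, 3)).astype('float32')

optimizer = RAdam()

def loss_fn():
    return tf.keras.losses.MeanSquaredError()(labels, model(features))

# OptimizerV2.minimize does not collect trainable variables automatically anymore;
# they must be passed explicitly via var_list.
optimizer.minimize(loss_fn, var_list=model.trainable_variables)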

epsilon not compatible with Adam

In this implementation epsilon is used inside the sqrt, but in Adam it is used outside. To switch from Adam to RAdam one should therefore use the square of the epsilon value used before; failing to do so may result in bad performance. Please implement epsilon the same way as in Adam, or at least clearly note the difference.
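
To make the difference concrete, the two update rules the report compares can be written as (with \hat{m}_t, \hat{v}_t the bias-corrected moments and \alpha the learning rate):

\text{Adam:}\quad \theta_{t+1} = \theta_t - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\qquad
\text{this implementation:}\quad \theta_{t+1} = \theta_t - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t + \epsilon}}

As the report suggests, roughly matching Adam's damping therefore requires passing the square of Adam's epsilon here.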

amsgrad parameter

Hi @CyberZHG and TY for sharing this !
Have you run some experiments with amsgrad=True?
If so, have you noticed a significant improvement compared to RAdam + warmup alone?
Best regards

Maintenance

Are you going to keep this repository up to date, or will you focus only on the version in tensorflow_addons?

The keras backend support

Hi, thank you for this amazing package!

It runs perfectly when using Keras itself.

I ran into a backend issue when trying to use tf.keras.

Here is my problem:

Describe the Bug

When I ran the demo code provided in the README with the environment variable TF_KERAS=1, I got this error:

 from keras_radam import RAdam
  File "/home/flydsc/.local/lib/python3.6/site-packages/keras_radam/__init__.py", line 1, in <module>
    from .selection import *
  File "/home/flydsc/.local/lib/python3.6/site-packages/keras_radam/selection.py", line 7, in <module>
    from .optimizer_v2 import RAdam
  File "/home/flydsc/.local/lib/python3.6/site-packages/keras_radam/optimizer_v2.py", line 5, in <module>
    from tensorflow.keras import backend_config
ImportError: cannot import name 'backend_config'

Version Info

I was using TensorFlow 1.13 and found that there is no backend_config under the keras folder any more.

Instead, there is a file called backend which I suppose serves a similar function to backend_config.
You can find it at: https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/keras/backend.py
In this case, if I modify the code (keras_radam/optimizer_v2.py) to

from tensorflow.python.keras import backend as backend_config

the demo code runs perfectly.

Minimal Codes To Reproduce

import tensorflow as tf
import numpy as np
from keras_radam import RAdam


model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(input_shape=(17,), units=3))
model.compile(RAdam(), loss='mse')


x = np.random.standard_normal((4096 * 30, 17))
w = np.random.standard_normal((17, 3))
y = np.dot(x, w)


model.fit(x, y, epochs=5)

Do I need to tune learning rates?

Thank you so much for your great implementation.
Do I need to add a callback like ReduceLROnPlateau? Can I combine RAdam and AdamW (Adam with weight decay)? How about using RAdam with a one-cycle policy?

Any other parameter needed when warmup?

When using RAdam(total_steps=10000, warmup_proportion=0.1, min_lr=1e-5) in Keras, should any other parameters be set? Will RAdam work normally if other parameters, such as decay and weight_decay, are not passed?

Warmup causes NAN

I use RAdam in a Keras implementation of Mask R-CNN, but after warmup completes the loss becomes NaN. If SGD is used, without warmup, the loss is normal. I only do this during heads-layer training. Usage is as follows (learning_rate is 0.001); any reply will be appreciated:

if warmup:
    optimizer_use = RAdam(learning_rate=1e-5, total_steps=all_steps,
                          warmup_proportion=0.05, min_lr=learning_rate)
else:
    optimizer_use = RAdam()

self.keras_model.compile(optimizer=optimizer_use, loss=[None] * len(self.keras_model.outputs))

Unknown optimizer: RAdam

I'm currently using Keras/TF 1.12.
The model is trained with the RAdam optimizer and saved.
When I load the model again with keras.models.load_model(x),
I get:

  File "src/models/eval_model.py", line 105, in <module>
    model = keras.models.load_model(args.model_filepath)
  File "keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "keras/engine/saving.py", line 299, in _deserialize_model
    custom_objects=custom_objects)
  File "keras/optimizers.py", line 768, in deserialize
    printable_module_name='optimizer')
  File "keras/utils/generic_utils.py", line 138, in deserialize_keras_object
    ': ' + class_name)
ValueError: Unknown optimizer: RAdam

I did an import: from keras_radam import RAdam
What am I doing wrong?
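
A hedged sketch of the usual fix (the file name below is a placeholder): pass RAdam through custom_objects so Keras can resolve the optimizer class name during deserialization.

from keras.models import load_model
from keras_radam import RAdam

# 'model.h5' is a placeholder path; registering RAdam lets deserialization resolve the name.
model = load_model('model.h5', custom_objects={'RAdam': RAdam})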

ModuleNotFoundError: No module named 'keras.legacy'

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-25-4849a56a6cd9> in <module>()
      1 get_ipython().system('pip install keras-adamw')
----> 2 from keras_adamw import AdamW

1 frames
/usr/local/lib/python3.6/dist-packages/keras_adamw/optimizers.py in <module>()
      1 import numpy as np
      2 from keras import backend as K
----> 3 from keras.legacy import interfaces
      4 from keras.optimizers import Optimizer
      5 from .utils import _init_weight_decays, _apply_weight_decays, _check_args

ModuleNotFoundError: No module named 'keras.legacy'

NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

Numerical instability with higher learning rates & larger networks

Describe the Bug
Numerical instability with learning rate 0.1 on CIFAR-10 with a simple VGG7 architecture, using TF 2.0 Keras.
The problem does not occur with learning rate 0.001.

When using a simple dummy architecture with 1 layer, this problem does not exist.

Training Output
Using RAdam on VGG7:
"Epoch 1, Loss: nan, Accuracy: 10.010000228881836, Test Loss: nan, Test Accuracy: 10.0"

Using RAdam on dummy 1 layer:
"Epoch 1, Loss: 13058.6318359375, Accuracy: 10.107999801635742, Test Loss: 2.3037562370300293, Test Accuracy: 10.0"

When using Adam:
"Epoch 1, Loss: 14431383552.0, Accuracy: 9.960000038146973, Test Loss: 2.3032336235046387, Test Accuracy: 10.0"

Version Info
tensorflow-gpu 2.0.0-beta1

Minimal Codes To Reproduce

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.datasets import cifar10
from keras_radam.optimizer_v2 import RAdam
from tensorflow.keras import Model
import numpy as np
from tensorflow.keras.layers import Dense, Flatten, Conv2D, BatchNormalization, MaxPooling2D, AvgPool2D, ReLU

import argparse
import sys

# parse CLI
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--epochs", type=int, default=150)
parser.add_argument("--learning_rate", type=float, default=0.1)
parser.add_argument("--opt", type=str, default='radam')
parser.add_argument("--cfg", type=int, default=7)
args = parser.parse_args(sys.argv[1:])

cfgs = {
    7: [128, 128, 'M', 256, 256, 'M', 512, 512, 'M'],
    1: [128, 'M']
}


class VGG(Model):
    def __init__(self, cfg=7, batch_norm=True):
        print(cfg)
        super(VGG, self).__init__()
        self.mylayers = []
        for c in cfgs[cfg]:
            if c == 'M':
                self.mylayers.append(MaxPooling2D(strides=(2,2)))
            else:
                self.mylayers.append(Conv2D(c, (3,3), padding='same'))
                if batch_norm and len(self.mylayers) > 1:
                    self.mylayers.append(BatchNormalization())
                self.mylayers.append(ReLU())
        self.mylayers.append(Flatten())
        self.mylayers.append(Dense(10))
        self.mylayers.append(tf.keras.layers.Softmax())

    def _get_layers(self):
        weights=[]
        for layer in self.layers:
            weights.append(layer.get_weights())
        return weights


    def call(self, x):
        for layer in self.mylayers:
            x = layer(x)
        return x

def make_model(**kwargs):
    a = tf.keras.layers.Input(shape=(32,32,3))
    model = VGG(**kwargs)
    b=model(a)
    return tf.keras.models.Model(inputs=a, outputs=b)


#create the model
model = make_model(cfg=args.cfg)

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Convert class vectors to binary class matrices.
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# create dataset objects
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(args.batch_size)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).shuffle(1000).batch(args.batch_size)

# declare loss functions
loss_object = tf.keras.losses.CategoricalCrossentropy()

if args.opt == 'adam':
    optimizer = tf.keras.optimizers.Adam(learning_rate=args.learning_rate)
elif args.opt == 'radam':
    optimizer = RAdam(learning_rate=args.learning_rate)

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')

# declare training step
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)

# declare test step
@tf.function
def test_step(images, labels):
    predictions = model(images)
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)

for epoch in range(args.epochs):
    for images, labels in train_ds:
        train_step(images, labels)

    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)

    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch+1,
                        train_loss.result(),
                        train_accuracy.result()*100,
                        test_loss.result(),
                        test_accuracy.result()*100))

    # Reset the metrics for the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

Cannot start training with TensorFlow 2.0 and distribute.MirroredStrategy

Describe the Bug
Cannot start training with TensorFlow 2.0 and distribute.MirroredStrategy.

Version Info
TensorFlow 2.0beta1
Python 3.6.8

  • [yes] I'm using the latest version

Minimal Codes To Reproduce

strategy = tf.distribute.MirroredStrategy(devices=FLAGS.compute_devices,
                                          cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
    optimizer = RAdam(learning_rate=1e-3)
    model.compile(optimizer=optimizer, ... run_eagerly=False)
    model.fit(train_dataset)


AttributeError: 'RAdam' object has no attribute 'apply_gradients'

Describe the Bug

The optimizer has a different API from other optimizers in TF.Keras so when we try to use it as a drop-in replacement for tf.keras.optimizers.Adam, it crashes

Version Info

  • yes, I'm using the latest version

Minimal Codes To Reproduce

import tensorflow as tf
import os
os.environ['TF_KERAS'] = '1'
from keras_radam import RAdam
optimizer = RAdam()
inputs = get_task_inputs(xxxxxx)
with tf.GradientTape() as tape:
     y_pred = model(inputs)
     losses = loss_fn(y_true, y_pred)
gradients = tape.gradient(losses, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

Traceback (most recent call last):
File "neuromax.py", line 67, in
results = [agent.train() for _ in range(MAX_LOOPS)]
File "neuromax.py", line 67, in
results = [agent.train() for _ in range(MAX_LOOPS)]
File "/home/bion/hax/neuromax/nature/agent.py", line 289, in train
for episode_number in range(EPISODES_PER_PRACTICE_SESSION)]
File "/home/bion/hax/neuromax/nature/agent.py", line 289, in
for episode_number in range(EPISODES_PER_PRACTICE_SESSION)]
File "/home/bion/hax/neuromax/nature/agent.py", line 288, in
for task_key, task_dict in self.tasks.items()]
File "/home/bion/hax/neuromax/nurture/clevr/clevr.py", line 56, in run_clevr_task
agent.train_op(task_id, inputs, loss_fn, y_true, priors)
File "/home/bion/hax/neuromax/nature/agent.py", line 257, in train_op
self.optimizer.apply_gradients(gradients_and_variables)
AttributeError: 'RAdam' object has no attribute 'apply_gradients'

Could not interpret optimizer identifier

I am currently running tensorflow==2.0.0rc0

When I run:

import tensorflow.keras as keras
import numpy as np
from keras_radam import RAdam

# Build toy model with RAdam optimizer
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_shape=(17,), units=3))
model.compile(RAdam(), loss='mse')

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/porgull/Desktop/Projects/rfw-keras/env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/porgull/Desktop/Projects/rfw-keras/env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 256, in compile
    self.optimizer = optimizers.get(optimizer)
  File "/home/porgull/Desktop/Projects/rfw-keras/env/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizers.py", line 848, in get
    raise ValueError('Could not interpret optimizer identifier:', identifier)
ValueError: ('Could not interpret optimizer identifier:', <keras_radam.optimizers.RAdam object at 0x7fc9656b8610>)
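
A hedged sketch of the usual remedy, based on the selection logic shown earlier: set TF_KERAS before importing keras_radam so the OptimizerV2 variant is used with tf.keras.

import os
os.environ['TF_KERAS'] = '1'  # must be set before keras_radam is imported

import tensorflow.keras as keras
from keras_radam import RAdam  # now resolves to the OptimizerV2 implementation

model = keras.models.Sequential()
model.add(keras.layers.Dense(input_shape=(17,), units=3))
model.compile(RAdam(), loss='mse')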

RAdam in Tensorflow

Are you going to implement RAdam for plain TensorFlow, similar to tf.train.AdamOptimizer? Thanks.

__init__() missing 1 required positional argument: 'name'

I'm using TF 2.0 Beta and got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-1ddda9576b8d> in <module>
----> 1 model.compile(optimizer=RAdam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

<ipython-input-13-6411f32d9ded> in __init__(self, learning_rate, beta_1, beta_2, epsilon, decay, weight_decay, **kwargs)
     18                  epsilon=None, decay=0., weight_decay=0., **kwargs):
     19         learning_rate = kwargs.pop('lr', None) or learning_rate
---> 20         super(RAdam, self).__init__(**kwargs)
     21         with K.name_scope(self.__class__.__name__):
     22             self.iterations = K.variable(0, dtype='int64', name='iterations')

TypeError: __init__() missing 1 required positional argument: 'name'

Thanks for your work!
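
For context, a hedged sketch of what the error points at (the class and names below are hypothetical): the tf.keras OptimizerV2 base class requires a name argument, so a subclass's __init__ has to forward one.

import tensorflow as tf

class MyRAdam(tf.keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.001, name='MyRAdam', **kwargs):
        # OptimizerV2.__init__ requires `name`; forwarding it avoids the TypeError above.
        super(MyRAdam, self).__init__(name=name, **kwargs)
        self._set_hyper('learning_rate', learning_rate)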

Issue about the dtype

Hi! Thanks for your implementation! I would like to use your repo in my own code, but I find that the dtypes conflict. My code is written in tf.float64; when I use your code, it reports an error that an op has type float64 that does not match type float32. Is there any way to solve this problem?
Thanks again for your elegant implementation!

min_lr isn't set properly

in __init__:
this line

self.min_lr = K.variable(lr, name='min_lr')

needs to be fixed to:

self.min_lr = K.variable(min_lr, name='min_lr')

Very slow implementation

Describe the Bug
After doing some tests with your TensorFlow implementation in training.py, I realized that your code is unexpectedly slow. I tested RAdam on a ResNet32 trained on CIFAR-10 with the following configuration: Nvidia RTX 2080 Ti, TensorFlow 1.15, CUDA 10.0, cuDNN 7.6.4.
Your RAdam implementation achieved 2510 training steps per minute.
The TensorFlow Adam implementation achieved 3088 training steps per minute.
I then reimplemented RAdam in a very basic way and achieved 2840 training steps per minute.
Unfortunately, I have no idea why your implementation is that slow.

My basic RAdam implementation for reference:

import tensorflow as tf

class RAdamOptimizer(tf.train.Optimizer):

    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, use_locking=True):
        super().__init__(use_locking, "RAdam")
        self.learning_rate = learning_rate

        self.beta_1 = float(beta1)
        self.beta_2 = float(beta2)
        self.epsilon = float(epsilon)
        self.roh_inf = 2.0 / (1.0 - beta2) - 1.0
        print("roh_inf=", self.roh_inf)

    def apply_gradients(self, grad_var_tuples, global_step, name="train_optimizer"):
        self._train_vars = [x[1] for x in grad_var_tuples]
        self._grads = [x[0] for x in grad_var_tuples]
        return self._initialize_train_ops(global_step)

    def _initialize_train_ops(self, global_step, name="train_optimizer"):
        if global_step is None:
            self._global_step = tf.Variable(1.0, trainable=False, name="global_step", dtype=tf.float32)
        else:
            self._global_step = global_step
        self._increase_global_step_op = tf.assign(self._global_step, self._global_step + 1)
        time_step = tf.cast(self._global_step, dtype=tf.float32)
        with tf.variable_scope("RAdam_Variables"):
            self._m_hat_ops = []
            self._v_vars_ops = []
            for grad, var in zip(self._grads, self._train_vars):
                new_var_1 = tf.Variable(tf.zeros(var.shape), trainable=False, name=grad.name[0:-2] + "_m")
                new_var_2 = tf.Variable(tf.zeros(var.shape), trainable=False, name=grad.name[0:-2] + "_v")

                m_op = new_var_1.assign(self.beta_1 * new_var_1 + (1.0 - self.beta_1) * grad,
                                        use_locking=self._use_locking)
                v_op = new_var_2.assign(self.beta_2 * new_var_2 + (1.0 - self.beta_2) * tf.multiply(grad, grad),
                                        use_locking=self._use_locking)
                m_hat_op = m_op / (1.0 - tf.pow(self.beta_1, time_step))
                self._m_hat_ops.append(m_hat_op)
                self._v_vars_ops.append(v_op)

        with tf.name_scope("Weight_Update_Operators"):
            self.weight_vars_assign_ops = []
            roh_t = self.roh_inf - 2.0 * time_step * tf.pow(self.beta_2, time_step) / (
                        1.0 - tf.pow(self.beta_2, time_step))
            r_t = tf.sqrt(((roh_t - 4.0) * (roh_t - 2.0) * self.roh_inf) / ((self.roh_inf - 4.0) * (
                        self.roh_inf - 2.0) * roh_t))  # r_t is nan if roh_t < 4 -> wanted behavior

            def roh_t_greater_4(m_hat_op, v_op):
                v_hat_op = tf.sqrt(v_op / (1 - tf.pow(self.beta_2, time_step)))
                update = -self.learning_rate * r_t * m_hat_op / (v_hat_op + self.epsilon)
                return update

            def roh_t_se_4(m_hat_op):
                update = -self.learning_rate * m_hat_op
                return update

            for weight_matrix, m_hat_op, v_op in zip(self._train_vars, self._m_hat_ops, self._v_vars_ops):
                update = tf.cond(tf.greater(roh_t, 5), lambda: roh_t_greater_4(m_hat_op, v_op),
                                 lambda: roh_t_se_4(m_hat_op))
                ass_op = tf.assign_add(weight_matrix, update, use_locking=self._use_locking)
                self.weight_vars_assign_ops.append(ass_op)
            with tf.control_dependencies([self._increase_global_step_op]):
                a = tf.group(self.weight_vars_assign_ops)
            return a

    def minimize(self, loss_tensor, global_step=None):
        self._train_vars = tf.trainable_variables()
        self._grads = tf.gradients(loss_tensor, self._train_vars, colocate_gradients_with_ops=True)
        return self._initialize_train_ops(global_step)

To use Warmup

I think to use Warmup, the code should be:

from keras_radam import RAdam

RAdam(total_steps=10000, warmup_proportion=0.1, min_lr=1e-5)

with a "s" in "total_steps".
