GAAF-Keras

This is a Keras implementation of "Gradient Acceleration in Activation Functions (GAAF)". This repository includes ResNet_v1 and ResNet_v2 implementations for testing and comparing GAAF against the original activation functions.

Gradient Acceleration in Activation Functions

"Gradient Acceleration in Activation Functions" proposes a new technique for activation functions, gradient acceleration in activation functions (GAAF), that accelerates gradients so that they flow even in the saturation area. As a result, the input to the activation function can climb onto the saturation area, which makes the network more robust because the model converges on a flat region.

GAAF

Experimental Results in the Paper

Prerequisites

  • Python 3.x
  • Keras (Tensorflow backend)

Prepare Data set

This repository uses the CIFAR-10 dataset. When you run the training script, the dataset is downloaded automatically.

Change Activation Function

This repository supports two GAAF variants: GAAF_relu and GAAF_tanh. GAAF_relu uses a shifted sigmoid function as its shape function, and you can set the shift parameter. GAAF_tanh uses a modified Gaussian function with a peak at y=1.
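As a rough illustration of how a shape function can gate a gradient-acceleration term, the sketch below builds a GAAF-style activation in plain Python. It is not the repository's code: the names sawtooth_term, shifted_sigmoid and gaaf, the constant k, and the sign convention of the shift are all assumptions made for this example. The overall form base(x) + g(x) * s(x) is one reading of the GAAF idea and should be checked against the paper.

```python
import math


def sawtooth_term(x, k=100.0):
    """Zero-centred sawtooth: slope 1 almost everywhere, magnitude at most 0.5 / k.

    k is a hypothetical frequency constant chosen for this sketch.
    """
    mut = x * k
    return (mut - math.floor(mut) - 0.5) / k


def shifted_sigmoid(x, shift=0.0):
    """Sigmoid centred at `shift` (assumed sign convention), computed stably."""
    z = x - shift
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(x):
    return max(0.0, x)


def gaaf(x, base, shape, k=100.0):
    """Base activation plus the sawtooth term scaled by the shape function."""
    return base(x) + sawtooth_term(x, k) * shape(x)


if __name__ == "__main__":
    shape = lambda v: shifted_sigmoid(v, shift=-1.0)
    for x in (-3.0, -1.0, 0.0, 2.0):
        print(x, relu(x), gaaf(x, relu, shape))
```

Because the sawtooth term is bounded by 0.5 / k, the output stays close to the base activation, while its slope of 1 contributes roughly s(x) to the gradient wherever the shape function is non-zero.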

To change the activation function, set activation in main.py to one of GAAF_relu, GAAF_tanh, or relu.

Supported CNN Models

You can train and test with the base CNN models listed below.

  • ResNet_v1 (e.g. ResNet20, ResNet32, ResNet44, ResNet56, ResNet110, ResNet164, ResNet1001)
  • ResNet_v2 (e.g. ResNet20, ResNet56, ResNet110, ResNet164, ResNet1001)
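The depths above are easier to pick with a quick check. The sketch below assumes the repository follows the convention of the Keras CIFAR-10 ResNet example, where a v1 depth has the form 6n+2 and a v2 depth has the form 9n+2. The helper name blocks_per_stage is hypothetical and is not part of this repository.

```python
def blocks_per_stage(depth, version):
    """Return n such that depth = 6n + 2 (version 1) or 9n + 2 (version 2).

    Assumption: the depth rules are those of the Keras CIFAR-10 ResNet example.
    """
    step = {1: 6, 2: 9}[version]
    if depth < step + 2 or (depth - 2) % step != 0:
        raise ValueError(
            "depth %d does not fit %dn+2 for ResNet v%d" % (depth, step, version)
        )
    return (depth - 2) // step


if __name__ == "__main__":
    for version, depth in ((1, 20), (1, 56), (1, 164), (2, 164), (2, 1001)):
        print("v%d depth=%d -> n=%d" % (version, depth, blocks_per_stage(depth, version)))
```

Under this assumption, 1001 fits only the 9n+2 rule, so it may be worth checking how the repository's resnet_v1 handles depth=1001.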

Train a Model

You can simply train a model with main.py.

  1. Set depth for ResNet model
    • e.g. depth=20
  2. Define a model you want to train.
    • e.g. model = resnet_v1.resnet_v1(input_shape=input_shape, depth=depth, activation=activation)
  3. Set other parameters such as batch_size, epochs, data_augmentation, and so on.
  4. Run the main.py file
    • e.g. python main.py

Test Results

I ran some experiments on ResNet20_v1, replacing the original activation (relu) with GAAF_relu. The results are listed below.

| num      | data    | backbone    | activation         | shift   | steps | acc    | batch_size | optimizer | lr    |
|----------|---------|-------------|--------------------|---------|-------|--------|------------|-----------|-------|
| baseline | cifar10 | resnet20_v1 | relu               | -       | 200   | 0.8084 | 128        | adam      | 0.001 |
| ex1      | cifar10 | resnet20_v1 | GAAF_relu          | 5       | 200   | 0.6517 | 128        | adam      | 0.001 |
| ex2      | cifar10 | resnet20_v1 | GAAF_relu          | 4       | 200   | 0.6924 | 128        | adam      | 0.001 |
| ex3      | cifar10 | resnet20_v1 | GAAF_relu          | 3       | 200   | 0.7042 | 128        | adam      | 0.001 |
| ex4      | cifar10 | resnet20_v1 | GAAF_relu          | 2       | 200   | 0.7441 | 128        | adam      | 0.001 |
| ex5      | cifar10 | resnet20_v1 | GAAF_relu          | 1       | 200   | 0.78   | 128        | adam      | 0.001 |
| ex6      | cifar10 | resnet20_v1 | GAAF_relu          | 0       | 200   | 0.7886 | 128        | adam      | 0.001 |
| ex7      | cifar10 | resnet20_v1 | GAAF_relu          | -0.5    | 200   | 0.7945 | 128        | adam      | 0.001 |
| ex8      | cifar10 | resnet20_v1 | GAAF_relu          | -1      | 200   | 0.7948 | 128        | adam      | 0.001 |
| ex9      | cifar10 | resnet20_v1 | GAAF_relu          | -2      | 200   | 0.78   | 128        | adam      | 0.001 |
| ex10     | cifar10 | resnet20_v1 | GAAF_relu          | -3      | 200   | 0.7768 | 128        | adam      | 0.001 |
| ex11     | cifar10 | resnet20_v1 | GAAF_relu          | -4      | 200   | 0.7733 | 128        | adam      | 0.001 |
| ex12     | cifar10 | resnet20_v1 | GAAF_relu          | no s(x) | 200   | 0.6603 | 128        | adam      | 0.001 |
| ex13     | cifar10 | resnet20_v1 | GAAF_relu(K.round) | -1      | 200   | 0.8054 | 128        | adam      | 0.001 |
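The shift sweep (ex1 to ex11) can be summarized with a few lines of arithmetic. The snippet below only restates the accuracies from the table; the variable names are chosen for this example.

```python
# Accuracies copied from the table above (ex1 to ex11, floor-based GAAF_relu).
BASELINE_ACC = 0.8084
gaaf_relu_acc = {
    5: 0.6517, 4: 0.6924, 3: 0.7042, 2: 0.7441, 1: 0.78, 0: 0.7886,
    -0.5: 0.7945, -1: 0.7948, -2: 0.78, -3: 0.7768, -4: 0.7733,
}

# Shift with the highest accuracy, and how far it remains below the baseline.
best_shift = max(gaaf_relu_acc, key=gaaf_relu_acc.get)
gap = BASELINE_ACC - gaaf_relu_acc[best_shift]

if __name__ == "__main__":
    print("best shift:", best_shift)
    print("gap to baseline:", round(gap, 4))
```

Accuracy rises as the shift decreases from 5 to -1 and then falls again, so the sweep has a single peak near shift=-1.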

GAAF_relu did not improve on the baseline in any experiment. Before running the experiments, I expected shift=4 to perform best, because the shifted sigmoid with that setting is very similar to the shape function suggested in the GAAF paper. However, shift=-1 gave the best accuracy among the shift settings (0.7948), which is still lower than the baseline (0.8084). Interestingly, when I replaced mut-tf.floor(mut) with K.abs(mut-K.round(mut)), accuracy rose to 0.8054, the best GAAF_relu result, but still below the baseline. (The Keras backend does not have a floor operation, so I tried K.round instead.)
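The two expressions compared above are not the same waveform, which may be part of why the results differ. The sketch below evaluates both in plain Python, with math.floor and the built-in round standing in for tf.floor and K.round. The function names are hypothetical and the code is not taken from the repository.

```python
import math


def floor_fraction(mut):
    """mut - floor(mut): a sawtooth in [0, 1) that rises with slope 1."""
    return mut - math.floor(mut)


def round_distance(mut):
    """abs(mut - round(mut)): a triangle wave in [0, 0.5]."""
    return abs(mut - round(mut))


if __name__ == "__main__":
    for mut in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(mut, floor_fraction(mut), round_distance(mut))
```

The floor-based term climbs with slope 1 across each unit interval and jumps back at the integers. The round-based term climbs on the first half of each interval and descends on the second, so its slope alternates between +1 and -1.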

If there is any implementation error in this repository, please let me know.

Related Works

Reference

Author

Byung Soo Ko / [email protected]
