SMate: Synthetic Minority Adversarial Technique

Team

  • Pablo Rodriguez Bertorello, Computer Science, Stanford University
  • Liang Ping Koh, Statistics, Stanford University

Abstract

In important prediction scenarios, data-sets are naturally imbalanced; in cancer detection, for instance, only a small minority of people exhibit the disease. This poses a significant classification challenge to machine learning algorithms: data imbalance can lower performance for the class of interest, e.g. classifying with high precision that a person has cancer. When training data is abundant, one possible approach is to down-sample the majority class, thus restoring balance. Another prevalent approach is weighting, which accelerates learning on minority-class training examples. Synthesis is a major alternative: examples of the minority class are generated and added to the training set to overcome the class imbalance. The Synthetic Minority Over-sampling Technique (SMOTE) is widely applied, but it was not developed for image data. Instead, this research applies Generative Adversarial Networks (GANs), which generate image examples drawn from the minority-class distribution. The novel SMate approach leverages GAN minority-class image generators, which benefit from Transfer Learning from majority-class image generators. Consequently, SMate outperforms SMOTE for imbalanced image data-sets.
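
For comparison, the conventional rebalancing baselines mentioned above can be sketched in a few lines. The snippet below is purely illustrative (it is not part of the SMate library) and assumes scikit-learn and imbalanced-learn are installed; the dataset and variable names are hypothetical.

```python
# Illustrative only -- conventional rebalancing baselines, not part of SMate.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from imblearn.over_sampling import SMOTE

# Hypothetical imbalanced dataset: flattened images, 95% majority / 5% minority.
X = np.random.rand(1000, 32 * 32 * 3)
y = np.array([0] * 950 + [1] * 50)

# Baseline 1: class weighting -- the minority class receives a larger weight,
# which most classifiers accept via a class_weight / sample_weight argument.
weights = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
print(dict(zip(np.unique(y), weights)))  # e.g. {0: ~0.53, 1: ~10.0}

# Baseline 2: SMOTE -- synthesizes minority examples by interpolating between
# nearest neighbors in (pixel) feature space, which is its weakness on images.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y_res))  # [950 950] -- classes are now balanced
```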

Poster

(poster image)

Code

A library is published, composed of several classes implemented for plug-and-play experimentation with different GAN architectures:

  • Main: searches for hyper-parameters including GAN architecture, optimizer, learning rate, and batch size. For experiment reproducibility, it sets random seeds
  • Data: pre-configured data-sets include CIFAR10 and MNIST. It includes methods to select the classes of interest from a given dataset, and performs data normalization and augmentation
  • Augmentation: Flip, Crop, GaussianBlur, ContrastNormalization, AdditiveGaussianNoise, Multiply, Affine
  • Architecture: pre-loaded GAN architectures include Brownlee and Atienza. Adding new architectures takes a few minutes, by simply specifying the corresponding Sequence file
  • Loss: different loss functions can be used as needed; examples provided include GAN, WGAN, LSGAN, DRAGAN, and HINGE
  • Adversarial: instantiates a Discriminator model containing the D network, and a stacked Generator model containing the G network followed by the same D network. The learning rate is set lower for the Discriminator, to ease training convergence (see the sketch after this list)
  • Trainer: concurrently trains the GAN's Generator and Discriminator, tracks metrics, and saves logs. Depending on configuration, it dynamically adapts the learning rate and triggers early termination of training
  • Sequence: can be instantiated to build either a Generator or a Discriminator model. It is able to transfer learning from a pre-existing model, and it can adapt during training by swapping its optimizer and learning rate
  • Optimizer: a factory that includes Adam and RMSProp, where only a few lines of code are required to add additional optimizers
  • Configuration: folders used for models, logs, generated images, data-sets, and sampled data-sets, as well as training settings such as the number of steps and thresholds for adaptation and termination
  • Util: utility methods for folder creation, plotting single images and image grids, and logging
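
To make the Adversarial and Optimizer classes above concrete, here is a minimal sketch of the standard Keras composition they describe: the Discriminator is compiled on its own with a lower learning rate, and the stacked model chains G into a frozen D. All names and default values below are illustrative assumptions, not the library's actual API.

```python
# Illustrative sketch (hypothetical names, not the library's API), assuming tf.keras.
from tensorflow.keras import models, optimizers

def make_optimizer(name, learning_rate):
    """Tiny optimizer factory, in the spirit of the Optimizer class."""
    factory = {"adam": optimizers.Adam, "rmsprop": optimizers.RMSprop}
    return factory[name](learning_rate=learning_rate)

def build_adversarial(generator, discriminator, d_lr=1e-4, g_lr=4e-4):
    """Compile D alone, then a stacked G->D model in which D is frozen.

    The Discriminator learning rate (d_lr) is deliberately lower than the
    Generator's (g_lr), one common trick to ease GAN training convergence.
    """
    discriminator.compile(optimizer=make_optimizer("adam", d_lr),
                          loss="binary_crossentropy")

    discriminator.trainable = False           # freeze D inside the stacked model
    stacked = models.Sequential([generator, discriminator])
    stacked.compile(optimizer=make_optimizer("adam", g_lr),
                    loss="binary_crossentropy")
    return discriminator, stacked
```

Because Keras captures the trainable flag at compile time, the Discriminator still updates its weights when trained through its own compiled model, while remaining frozen inside the stacked Generator model.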

How To:

To train a GAN to generate minority-class examples, start with:

onenow_gan_main_generator_train.ipynb

The performance of the GAN, as well as that of prior-art methods, is evaluated in:

Classifier.ipynb
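
Since overall accuracy is misleading on imbalanced data, the comparison presumably hinges on per-class metrics such as minority-class precision and recall. Below is a minimal, illustrative evaluation sketch using scikit-learn; it is not the notebook's actual code, and the labels shown are placeholders.

```python
# Illustrative only -- per-class metrics matter more than accuracy on imbalanced data.
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder test labels and predictions (minority class = 1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["majority", "minority"]))
```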

Key helper classes:

- onenow_gan_factory_adversarial.py abstracts a Generative Adversarial Network
- onenow_gan_factory_sequential.py abstracts a Generator or Discriminator within a GAN
- onenow_gan_factory_optimizer.py is a factory of optimizers

The library works plug-and-play with different GAN architectures, found under /src/architecture
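
The core SMate idea, transferring weights learned on the abundant majority class into the minority-class generator, roughly follows the pattern sketched below. The generator architecture, file names, and training steps shown are hypothetical placeholders; in the library, transfer from a pre-existing model is handled by the Sequence class.

```python
# Rough sketch of the transfer-learning step behind SMate (hypothetical names and
# architecture, not the library's API). Both generators share the same structure,
# so majority-class weights can initialize the minority-class generator.
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    """Stand-in for any plug-and-play architecture under /src/architecture."""
    return models.Sequential([
        layers.Dense(7 * 7 * 64, activation="relu", input_shape=(latent_dim,)),
        layers.Reshape((7, 7, 64)),
        layers.Conv2DTranspose(1, kernel_size=4, strides=4,
                               padding="same", activation="tanh"),
    ])

# 1. Pre-train a generator (inside its GAN) on the abundant majority class,
#    then persist its weights.
majority_generator = build_generator()
# ... adversarial training on majority-class images goes here ...
majority_generator.save_weights("majority_generator.weights.h5")

# 2. Initialize the minority-class generator from those weights, then fine-tune
#    it adversarially on the scarce minority-class images.
minority_generator = build_generator()
minority_generator.load_weights("majority_generator.weights.h5")
# ... adversarial fine-tuning on minority-class images goes here ...
```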
