
Digit Creation with GANs

Generative Adversarial Networks (GANs) were introduced by Ian J. Goodfellow in 2014. They can be used to generate complex colour images, such as faces. Images created by GANs can look so real that it is practically impossible to distinguish between real and fake images. However, generating complex images at this level of realism requires a large amount of resources to train the network.

GAN - Generative Adversarial Network

In a GAN, two neural networks, a generator and a discriminator, are trained simultaneously through an adversarial process. The generator (the artist) learns to create images that look real, while the discriminator (the critic) learns to detect fake images. The two competing models try to beat each other; the goal is to train the generator to outperform the discriminator.

How does a GAN work?

Training a GAN consists of two parts:

  1. While the generator remains idle, the discriminator is trained on real images for a number of epochs to check that it correctly classifies them as real. In the same phase, the discriminator is also trained on fake images (generated by the generator) to check that it classifies them as fake.
  2. While the discriminator is idle, the generator is trained, using the discriminator's output to improve the generated images. These two steps are repeated for a large number of epochs. The resulting fake images are examined manually to judge whether they look real. If they do, training is stopped; if not, the two steps are repeated until the fake images appear real. This process is depicted in Figure 1.

Figure 1 - GAN working schematic

Source: Sarang (2021)

Model Architecture

The Generator

The generator schematic is shown in Figure 2. The generator takes a random noise vector of a given dimension; in this example a dimension of 100 is used. From this vector an image of 64x64x3 is generated. The image is upscaled through a series of convolutional layers, each followed by batch normalisation and a leaky ReLU activation. The leaky ReLU suffers from neither the dying-ReLU nor the vanishing-gradient problem. Strides are used in each convolutional layer to help keep training stable.

Figure 2 - Generator architecture

Source: Sarang (2021)

The Discriminator

The discriminator schematic is shown in Figure 3. The discriminator downsamples the given image through a series of convolutional layers and evaluates whether it is real or fake.

Figure 3 - Discriminator schematic

Source: Sarang (2021)

Model Implementation

Defining the Generator

The purpose of the generator is to create images containing the digit 5 which look similar to the images in the training dataset. A Keras sequential model is used to create the generator, shown in Figure 4. A summary of the generator model is shown in Figure 5.

Figure 4 - Generator architecture


Figure 5 - Generator model summary

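As a rough guide, the sketch below shows how a DCGAN-style generator of this kind can be built as a Keras Sequential model. The layer sizes are illustrative assumptions, not the repo's exact code: it takes a 100-dimensional noise vector and produces a 28x28x1 MNIST-style image, whereas the architecture described above produces 64x64x3 images.

```python
import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM = 100  # dimension of the random noise vector (assumption)

def build_generator():
    """DCGAN-style generator: noise vector -> 28x28x1 image (illustrative sizes)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(NOISE_DIM,)),

        # Project the noise vector and reshape it into a small feature map.
        layers.Dense(7 * 7 * 256, use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),

        # Upsample with strided transposed convolutions: 7x7 -> 14x14 -> 28x28.
        # Each layer is followed by batch normalisation and a leaky ReLU.
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # tanh keeps the output pixel values in [-1, 1].
        layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same',
                               use_bias=False, activation='tanh'),
    ])
    return model

generator = build_generator()
generator.summary()  # compare against the model summary in Figure 5
```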

Testing the Generator

Test output from the generator is shown in Figure 6.

Figure 6 - Test Generator output

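A minimal sketch of how the (as yet untrained) generator can be exercised to produce a test image like the one in Figure 6. The names `generator` and `NOISE_DIM` come from the sketch above and are assumptions, not the repo's actual identifiers.

```python
import matplotlib.pyplot as plt

# Feed a single random noise vector through the untrained generator.
noise = tf.random.normal([1, NOISE_DIM])
generated_image = generator(noise, training=False)

# Before training, the output is essentially noise.
plt.imshow(generated_image[0, :, :, 0], cmap='gray')
plt.axis('off')
plt.show()
```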

Defining the Discriminator

The discriminator model uses just two convolutional layers. The output of the last convolutional layer has shape (batch size, height, width, filters). The Flatten layer flattens this output so that it can be fed into the final Dense layer of the network.

Figure 7 - Discriminator model architecture


Figure 8 - Discriminator model summary

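A possible Keras Sequential definition matching this description (two strided convolutional layers, a Flatten layer and a final Dense layer). The filter counts, kernel sizes and the 28x28x1 input shape are illustrative assumptions and may differ from the model summarised in Figure 8.

```python
def build_discriminator():
    """Discriminator: 28x28x1 image -> single real/fake score (a raw logit)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),

        # Two strided convolutional layers progressively downsample the image.
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(),

        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(),

        # Flatten the (batch, height, width, filters) output and score it.
        layers.Flatten(),
        layers.Dense(1),
    ])
    return model

discriminator = build_discriminator()
discriminator.summary()  # compare against the model summary in Figure 8
```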

Testing the Discriminator

The discriminator can be tested by feeding it the image generated earlier. The discriminator gives a negative value if the image is fake and a positive value if it is real.

Figure 9 - Discriminator test output


The decision value is -0.0014415; the negative value indicates that the image is classified as fake.
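
Continuing the sketch above (and assuming the `generator`, `discriminator` and `generated_image` names defined there), the test looks roughly like this:

```python
# Score the image produced by the untrained generator.
decision = discriminator(generated_image, training=False)
print(decision.numpy())  # negative -> judged fake, positive -> judged real
```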

Defining Loss Functions

Keras’s binary cross-entropy is used as the loss function, since there are two classes: 1 for a real image and 0 for a fake one. The generator loss measures how well the generator tricks the discriminator: if the generator performs well, the discriminator classifies the fake image as real, returning a decision of 1. The generator loss therefore compares the discriminator's decisions on the generated images with an array of ones. The discriminator loss has two parts: the decisions on real images are compared with an array of ones, and the decisions on fake images are compared with an array of zeros. The total discriminator loss is the sum of these two losses.
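
A sketch of these two loss functions using `tf.keras.losses.BinaryCrossentropy`. `from_logits=True` assumes, as in the discriminator sketch above, that the final Dense layer outputs a raw score rather than a sigmoid probability; the function names are assumptions.

```python
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(fake_output):
    # The generator wants the discriminator to label its images as real (1).
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    # Real images are compared against ones, fake images against zeros.
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss
```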

Model Training

The generator and discriminator models are trained together in a series of steps. Gradient tape is used for automatic differentiation of both the generator and the discriminator.

At each step, a batch of images is passed to the training function. The discriminator produces outputs for both the training images and the generated images; the output for the training images is treated as the real output and the output for the generated images as the fake output. The generator loss is calculated from the fake output, and the discriminator loss from both the real and fake outputs. Gradient tape is then used to compute the gradients of both losses, and the resulting gradients are applied to the corresponding models.
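
A sketch of a single training step along these lines, using `tf.GradientTape`. The optimizers, batch size and function names are assumptions rather than the repo's exact code; the models and loss functions are the ones sketched earlier.

```python
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

BATCH_SIZE = 256  # assumed batch size

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, NOISE_DIM])

    # Record operations on both models so gradients can be taken for each.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)            # real batch
        fake_output = discriminator(generated_images, training=True)  # fake batch

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # Compute and apply the gradients for each model separately.
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
```

An outer loop then calls `train_step` on every batch of real images for the desired number of epochs, periodically generating sample images so their quality can be inspected manually.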

Figure 10 - Images for the digit 5 generated by the GAN

The GAN produces acceptable output after just 20-30 epochs, with better quality above 70 epochs.

Summary

Generative Adversarial Networks (GANs) provide a method for imitating given images. GANs consist of two networks, a generator and a discriminator, which are trained simultaneously in an adversarial process. In this repo a GAN was constructed and trained on handwritten digits, alphabets and anime characters. Training a GAN can require huge resources, but the results can be impressive. GANs have been applied successfully in many areas, from generating images for large datasets to creating celebrity faces and generating emojis from photos.

Acknowledgements

Sarang, Poornachandra (2021). Artificial Neural Networks with TensorFlow 2: ANN Architecture Machine Learning Projects.
