
Self-Supervised Face Generation using Panel Context Information (SSuperGAN)

The model generates the masked face of a character given the preceding sequence of comic panels.

Notes:

This repository is not yet complete!

Datasets:

The entire panel dataset is processed with a cartoon face detector model (which can be found here), using the mixed_r50 weights, a confidence threshold of 0.55, and an NMS threshold of 0.2. The following statistics are computed from the detector outputs; a small filtering sketch follows the list.

  • Total files: 1229664
  • Total files with found faces: 684885
  • Total faces: 1063804
  • Faces above 64px: 309079 with min(width, height) >= 64 / 521089 with max(width, height) >= 64
  • Faces above 128px: 75111 with min(width, height) >= 128 / 158988 with max(width, height) >= 128
  • Faces above 256px: 13214 with min(width, height) >= 256 / 27471 with max(width, height) >= 256
  • Panel Height: mean=510.0328 / median=475 / mode=445
  • Panel Width: mean=508.4944 / median=460 / mode=460
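
A minimal sketch of the size buckets reported above, assuming each detection is an (x1, y1, x2, y2) pixel box; the detector call itself is omitted:

```python
# Check whether a detected face falls into one of the size buckets above.
def face_size_ok(box, min_side=64, use_min=True):
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    side = min(width, height) if use_min else max(width, height)
    return side >= min_side

print(face_size_ok((10, 10, 90, 140)))   # min(80, 130) >= 64 -> True
print(face_size_ok((10, 10, 60, 140)))   # min(50, 130) >= 64 -> False
```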

Model Architecture

(Figure: overall model architecture)

Results

Visual Results

(Figure: Result 1)

(Figure: Result 2)

Metric Results

Work in progress.

Pretrained Models and Links

  • Face detection (Siamese) on iCartoonDataface (~86% test accuracy): link
  • Google Sheet for recording Experiment Results

Modules

USING GOLDEN AGE DATA

  • To run this module, a 'golden_age_config.yaml' file should be created under configs.
  • Example Config:
# For directly face generation task
faces_path: /userfiles/comics_grp/golden_age/faces_128/
face_train_test_ratio: 0.9

# For panel face reconstruction task
panel_path: /datasets/COMICS/raw_panel_images/
sequence_path: /userfiles/comics_grp/golden_age/panel_face_areas.json
annot_path: /userfiles/comics_grp/golden_age/face_annots/
mask_val: 1
mask_all: False
return_mask: False
return_mask_coordinates: False
train_test_ratio: 0.95
train_mode: True
panel_dim: 
    - 300
    - 300
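
A minimal sketch of reading such a config with PyYAML; the repository may wire this up through its own loader, so treat the snippet as an illustration only:

```python
import yaml

# Load the example config shown above (path assumes the 'configs' folder).
with open("configs/golden_age_config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["faces_path"])   # '/userfiles/comics_grp/golden_age/faces_128/'
print(cfg["panel_dim"])    # [300, 300]
```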

USING PLAIN SSUPERVAE MODULE

  • To train the PlainSSuperVAE network, specify the following parameters in the ssupervae_config.yaml file under the configs folder. To use the LSTM structure, set the use_lstm flag to True (a minimal sketch of such a sequence encoder follows the config below).
# Encoder Parameters
backbone: "efficientnet-b5"
embed_dim: 256
latent_dim: 256 
use_lstm: False

# Plain Encoder Parameters
seq_size: 3

# LSTM Encoder Parameters
lstm_hidden: 256
lstm_dropout: 0
lstm_bidirectional: False
fc_hidden_dims: []
fc_dropout: 0
num_lstm_layers: 1
masked_first: True

# Decoder Parameters
decoder_channels:
    - 64
    - 128
    - 256
    - 512
image_dim: 64

# Training Parameters
batch_size: 4
train_epochs: 100
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
g_clip: 100
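
The LSTM parameters above describe a sequence encoder over per-panel embeddings. A minimal sketch of such an encoder (class and attribute names are illustrative, not taken from the repository; the mu/logvar heads assume a VAE-style latent):

```python
import torch
import torch.nn as nn

class SeqEncoder(nn.Module):
    def __init__(self, embed_dim=256, lstm_hidden=256, num_lstm_layers=1,
                 lstm_dropout=0.0, lstm_bidirectional=False, latent_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, lstm_hidden, num_layers=num_lstm_layers,
                            dropout=lstm_dropout, bidirectional=lstm_bidirectional,
                            batch_first=True)
        out_dim = lstm_hidden * (2 if lstm_bidirectional else 1)
        self.to_mu = nn.Linear(out_dim, latent_dim)       # latent mean head
        self.to_logvar = nn.Linear(out_dim, latent_dim)   # latent log-variance head

    def forward(self, panel_embeddings):   # (batch, seq_size, embed_dim)
        out, _ = self.lstm(panel_embeddings)
        summary = out[:, -1]                # hidden state after the last panel
        return self.to_mu(summary), self.to_logvar(summary)

mu, logvar = SeqEncoder()(torch.randn(4, 3, 256))   # 4 sequences of 3 panels
```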

VAE MODULE

  • To run this module, a 'vae_config.yaml' file should be created under configs.
  • Example Config:
num_training_samples: 30000
num_test_samples: 10240
test_samples_range:
    - 10240
    - 10640
image_dim: 64
batch_size: 64
train_epochs: 100
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
latent_dim_z: 256
g_clip: 100
channels:
    - 64
    - 128
    - 256
    - 512
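
A minimal sketch of how training hyperparameters like these typically map onto a PyTorch Adam optimizer with gradient-norm clipping; the model below is a stand-in, not the repository's VAE:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)                                   # stand-in for the VAE
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002,
                             betas=(0.5, 0.999), weight_decay=0.000025)

loss = model(torch.randn(64, 16)).pow(2).mean()            # dummy loss
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=100)   # g_clip
optimizer.step()
```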

INTRO VAE MODULE

  • To run this module, an 'intro_vae_config.yaml' file should be created under configs.
  • Example Config:
face_image_folder_train_path: /home/gsoykan20/Desktop/ffhq_thumbnails/thumbnails128x128/
face_image_folder_test_path: /home/gsoykan20/Desktop/ffhq_thumbnails/thumbnails128x128/
num_training_samples: 100
test_samples_range:
    - 10240
    - 10640
image_dim: 64
batch_size: 32
train_epochs: 200
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
latent_dim_z: 256
g_clip: 100
channels:
    - 64
    - 128
    - 256
    - 512

# See the paper for the meaning of these parameters: https://arxiv.org/abs/1807.06358
adversarial_alpha: 0.25
ae_beta: 5
adversarial_margin: 110
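
As a hedged reading of the IntroVAE paper linked above (not notation from this repository), adversarial_alpha, ae_beta, and adversarial_margin correspond to the weights α and β and the margin m in the encoder and generator objectives:

```latex
% Sketch of the IntroVAE objectives (Huang et al., 2018); z_r and z_p are the
% latents of the reconstructed and prior-sampled images, [.]^+ = max(0, .).
L_E = L_{REG}(z) + \alpha \sum_{s \in \{r,p\}} \big[\, m - L_{REG}(z_s) \,\big]^{+} + \beta\, L_{AE}(x, x_r)
L_G = \alpha \sum_{s \in \{r,p\}} L_{REG}(z_s) + \beta\, L_{AE}(x, x_r)
```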

DCGAN MODULE

  • To run this module, a 'dcgan_config.yaml' file should be created under configs.
  • Example Config:
# For directly face generation task
dataroot : "data/celeba"
# Number of workers for dataloader
workers : 4

# Batch size during training
batch_size : 128

# Spatial size of training images. All images will be resized to this
#   size using a transformer.
image_size : 64

# Number of channels in the training images. For color images this is 3
nc : 3

# Size of z latent vector (i.e. size of generator input)
nz : 100

# Size of feature maps in generator
ngf : 64

# Size of feature maps in discriminator
ndf : 64

# Number of training epochs
num_epochs : 150

# Learning rate for optimizers
lr : 0.0002

# Beta1 hyperparam for Adam optimizers
beta1 : 0.5

# Number of GPUs available. Use 0 for CPU mode.
ngpu : 1

# Dataset path
#dataset_path : "/userfiles/ckoksal20/img_align_dataset"
dataset_path : "/kuacc/users/ckoksal20/img_align_dataset"
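
These parameters mirror the standard 64x64 DCGAN from the PyTorch tutorial; below is a minimal sketch of that generator, not code taken from this repository:

```python
import torch
import torch.nn as nn

nz, ngf, nc = 100, 64, 3

netG = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),       # 1x1 -> 4x4
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # 4x4 -> 8x8
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # 8x8 -> 16x16
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # 16x16 -> 32x32
    nn.BatchNorm2d(ngf), nn.ReLU(True),
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),           # 32x32 -> 64x64
    nn.Tanh(),
)

fake = netG(torch.randn(16, nz, 1, 1))   # -> (16, nc, 64, 64)
```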

SSuper_DCGAN MODULE

  • To run this module, an 'ssuper_dcgan_config.yaml' file should be created under configs.
  • Example Config:
# Encoder Parameters
backbone: "efficientnet-b5"
embed_dim: 256
latent_dim: 100 
use_lstm: False

# Plain Encoder Parameters
seq_size: 3

# LSTM Encoder Parameters
lstm_hidden: 256
lstm_dropout: 0
fc_hidden_dims: []
fc_dropout: 0
num_lstm_layers: 1
masked_first: True


# Training Parameters
batch_size: 8
train_epochs: 300
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
g_clip: 100


# Spatial size of training images. All images will be resized to this
#   size using a transformer.
image_dim : 64
# Number of channels in the training images. For color images this is 3
nc : 3
# Size of z latent vector (i.e. size of generator input)
nz : 100
# Size of feature maps in generator
ngf : 64
# Size of feature maps in discriminator
ndf : 64
# Number of GPUs available. Use 0 for CPU mode.
ngpu : 1

Context Attention MODULE

  • To run this module, a 'vae_context_attn_config.yaml' file should be created under configs.
  • Example Config:
# Encoder Parameters
backbone: "efficientnet-b5"
seq_size: 3
embed_dim: 256

# Decoder Parameters
latent_dim: 256
decoder_channels:
    - 64
    - 128
    - 256
    - 512
image_dim: 64

# Training Parameters
batch_size: 1
train_epochs: 100
lr: 0.0001
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.9
g_clip: 100

# contextual attention related
compute_g_loss: True
coarse_l1_alpha: 1.2
l1_loss_alpha: 1.2
ae_loss_alpha: 1.2

global_wgan_loss_alpha: 1.
gan_loss_alpha: 0.001
wgan_gp_lambda: 10

netG:
  input_dim: 3
  ngf: 16

netD:
  input_dim: 3
  ndf: 32
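
A minimal sketch of how loss weights like these are combined in a two-stage (coarse + refined) inpainting generator; tensor names are illustrative and the exact terms in this repository may differ (ae_loss_alpha would weight a similar reconstruction term on the unmasked region):

```python
import torch
import torch.nn.functional as F

target = torch.rand(2, 3, 64, 64)
coarse_out = torch.rand(2, 3, 64, 64)    # first-stage (coarse) output
refined_out = torch.rand(2, 3, 64, 64)   # contextual-attention refinement
critic_fake = torch.randn(2, 1)          # critic scores for the refined output

coarse_l1_alpha, l1_loss_alpha = 1.2, 1.2
gan_loss_alpha, global_wgan_loss_alpha = 0.001, 1.0

g_loss = (coarse_l1_alpha * F.l1_loss(coarse_out, target)
          + l1_loss_alpha * F.l1_loss(refined_out, target)
          + gan_loss_alpha * global_wgan_loss_alpha * (-critic_fake.mean()))
```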

Global & Local Discrimination MODULE

  • To run this module, a 'global_local_disc_config.yaml' file should be created under configs.
  • Example Config:
global_wgan_loss_alpha: 1.
gan_loss_alpha: 0.001
wgan_gp_lambda: 10
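
A minimal sketch of the WGAN-GP gradient penalty that wgan_gp_lambda weights; the critic below is a stand-in module, not the repository's discriminators:

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
real = torch.randn(8, 3, 64, 64)
fake = torch.randn(8, 3, 64, 64)

# Interpolate between real and fake samples and push gradient norms toward 1.
eps = torch.rand(real.size(0), 1, 1, 1)
interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
grads, = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)
gp = ((grads.view(grads.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()

wgan_gp_lambda = 10
d_loss = critic(fake).mean() - critic(real).mean() + wgan_gp_lambda * gp
```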

SSUPER_MSGGAN Module

  • To run this module, an 'ssuper_msggan_config.yaml' file should be created under configs.
  • Example Config:
# Encoder Parameters
backbone: "efficientnet-b5"
embed_dim: 256
latent_dim: 512 
use_lstm: False

# Plain Encoder Parameters
seq_size: 3

# LSTM Encoder Parameters
lstm_hidden: 256
lstm_dropout: 0
fc_hidden_dims: []
fc_dropout: 0
num_lstm_layers: 1
masked_first: True

image_dim : 64

# Training Parameters
batch_size: 4
train_epochs: 100
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
g_clip: 100


depth : 5
use_eql : False
use_ema : False
ema_decay : 0.999

g_lr : 0.003

d_lr : 0.001

loss_function : "relativistic-hinge"
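
When use_ema is enabled, a shadow copy of the generator is usually tracked with an exponential moving average of its weights; a minimal sketch (names are illustrative, not from the repository):

```python
import copy
import torch
import torch.nn as nn

netG = nn.Linear(512, 3 * 64 * 64)   # stand-in for the MSG-GAN generator
netG_ema = copy.deepcopy(netG)       # shadow copy used for evaluation/sampling

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):   # decay = ema_decay
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)

update_ema(netG_ema, netG)           # call after every generator update
```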

Project Based Configuration

Check and update 'configs/base_config' for global configuration parameters such as the base project directory.

Contributors

gsoykan, barisbatuhan, caghankoksal
