
inkn'hue: Enhancing Manga Colorization from Multiple Priors with Alignment Multi-Encoder VAE



Examples of original grayscale inputs (top-left) and rough color inputs (bottom-left). Final colorizations (right) are our multi-encoder VAE outputs blended with the rough color inputs in CIELAB color space.

This repository contains the official PyTorch implementation of:

inkn'hue: Enhancing Manga Colorization from Multiple Priors with Alignment Multi-Encoder VAE

Abstract: Manga, a form of Japanese comics and distinct visual storytelling, has captivated readers worldwide. Traditionally presented in black and white, manga's appeal lies in its ability to convey complex narratives and emotions through intricate line art and shading. Yet, the desire to experience manga in vibrant colors has sparked the pursuit of manga colorization, a task of paramount significance for artists. However, existing methods, originally designed for line art and sketches, face challenges when applied to manga. These methods often fall short in achieving the desired results, leading to the need for specialized manga-specific solutions. Existing approaches frequently rely on a single training step or extensive manual artist intervention, which can yield less satisfactory outcomes. To address these challenges, we propose a specialized framework for manga colorization. Leveraging established models for shading and vibrant coloring, our approach aligns both using a multi-encoder VAE. This structured workflow ensures clear and colorful results, with the option to incorporate reference images and manual hints.

Prerequisites

  • opencv-contrib-python>=4.1.0.25
  • tensorflow>=2.12.0 (for batch Style2Paints inference)
  • gradio>=3.20.1
  • scikit-learn>=0.23.1
  • scikit-image>=0.14.5
  • tqdm
  • numpy<1.24
  • torch>=2.0.1
  • torchvision>=0.15.2
  • rich
  • matplotlib
  • einops
  • Pillow
  • accelerate
  • omegaconf
  • wandb
  • huggingface-hub
  • taming-transformers

Setup

  1. Clone this repository
git clone https://github.com/rossiyareich/inknhue.git
cd inknhue
  2. Set up the conda environment with Python 3.10 and cudatoolkit>=11.8
conda env create -f environment.yaml
conda activate inknhue
  3. Download the model parameters from Hugging Face (or fetch them with huggingface_hub, as sketched below)
rm -r models
git lfs install
git clone https://huggingface.co/rossiyareich/inknhue models
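
Alternatively, the same weights can be fetched with huggingface_hub (already listed in the prerequisites). A minimal sketch, assuming the repo id from the link above and the same models target directory:

from huggingface_hub import snapshot_download

# Download every file from the model repository into ./models,
# mirroring the git-lfs clone above
snapshot_download(repo_id="rossiyareich/inknhue", local_dir="models")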

Inference using Gradio

From the project root folder, run

python app.py

Then access the app locally in a browser (Gradio serves at http://127.0.0.1:7860 by default)

You'll need your original black-and-white manga page and a Style2Paints V4.5 colorized version as the model inputs. The model performs best with blended_smoothed_careless priors. Learn more about Style2Paints here: lllyasviel/style2paints
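
Since both inputs depict the same page, they presumably need matching resolutions. A minimal pre-check sketch (not part of the repo's API), with hypothetical file names:

from PIL import Image

# Hypothetical file names; substitute your own pages
gray = Image.open("page_bw.png").convert("L")       # original b&w manga page
rough = Image.open("page_s2p.png").convert("RGB")   # Style2Paints colorization

# Resize the rough color input to the grayscale page's resolution so the
# two inputs stay aligned pixel-for-pixel
if rough.size != gray.size:
    rough = rough.resize(gray.size, Image.LANCZOS)
    rough.save("page_s2p_resized.png")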

Pipeline


Expanded overview of the framework. Our pipeline utilizes trained parameters from related works including Style2Paints (shown in green), manga-colorization-v2 (shown as "Shading generator"), and Tag2Pix extractor (shown as "SEResNeXt LFE" (Local Feature Extractor)). The framework aligns results from the shading generator (shaded grayscale) and Style2Paints (rough-colored) using an alignment variational autoencoder and an auxiliary alignment encoder (shown in violet). The input consists of the original manga pages (bottom-leftmost), along with the color hints and/or reference images (top-leftmost) that are to be used as local and global color hints, respectively. The outputs from the last-stage model are then interpolated with the rough-colored outputs (shown in red) based on a user-defined interpolation value to produce the most appealing final colorized results (top-rightmost).


Overview of the stages of our colorization framework. Starting with the original image (1), the shading model generates a shaded grayscale version (3). Alongside this, the colorization model produces an initial rough-colored version (4) guided by additional cues provided by user-inputted color hints and/or a reference image (2). The combination model combines both the shaded (3) and rough-colored (4) stages, interpolating colors from the latter to produce the final colorization result (5).
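
To make the final interpolation step concrete, here is a minimal sketch of blending a model output with the rough-colored output in CIELAB space. The function name, file names, and the choice to interpolate all three channels are illustrative assumptions; the repository's own post-processing may differ in detail.

import numpy as np
from skimage import color, io

def blend_lab(model_rgb, rough_rgb, t):
    """Linearly interpolate two RGB images in CIELAB space.

    t = 0.0 keeps the model output; t = 1.0 keeps the rough color input.
    """
    lab_model = color.rgb2lab(model_rgb / 255.0)
    lab_rough = color.rgb2lab(rough_rgb / 255.0)
    lab = (1.0 - t) * lab_model + t * lab_rough
    rgb = color.lab2rgb(lab)  # float RGB in [0, 1]
    return (np.clip(rgb, 0.0, 1.0) * 255.0).astype(np.uint8)

# Hypothetical file names; t is the user-defined interpolation value
final = blend_lab(io.imread("model_output.png"), io.imread("rough_color.png"), t=0.3)
io.imsave("final.png", final)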

Architecture


Architectural diagram of the alignment multi-encoder variational autoencoder. The number of feature dimensions of the output is depicted at the top, and the input resolution is indicated at the bottom, of each subnetwork block.
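
For intuition only, a heavily simplified PyTorch sketch of the multi-encoder idea: separate encoders for the shaded grayscale and rough-colored inputs produce latents that are fused and decoded by one shared decoder. All channel counts, depths, and the concatenation-based fusion are illustrative assumptions, not the architecture from the diagram.

import torch
import torch.nn as nn

class MultiEncoderVAE(nn.Module):
    """Toy two-encoder VAE; illustrative only, not the paper's architecture."""

    def __init__(self, z_ch=8):
        super().__init__()
        def enc(in_ch):  # tiny conv encoder producing mean and log-variance
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 2 * z_ch, 4, stride=2, padding=1),
            )
        self.enc_shaded = enc(1)   # shaded grayscale input
        self.enc_rough = enc(3)    # rough color input
        self.dec = nn.Sequential(  # single shared decoder over fused latents
            nn.ConvTranspose2d(2 * z_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    @staticmethod
    def reparameterize(stats):
        mu, logvar = stats.chunk(2, dim=1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, shaded, rough):
        z1 = self.reparameterize(self.enc_shaded(shaded))
        z2 = self.reparameterize(self.enc_rough(rough))
        return self.dec(torch.cat([z1, z2], dim=1))  # fused latent -> RGB

model = MultiEncoderVAE()
out = model(torch.randn(1, 1, 256, 256), torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256])

A real VAE would also train with a KL regularizer on each latent; that term is omitted here for brevity.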

Results


Significance of the generator. Details lost in the Style2Paints process are restored, and more accurate shading is achieved.


Significance of post-processing. Without post-processing, the generator may desaturate the output or overcorrect for inaccurate colors.


Qualitative comparison. Additional outputs from each model stage and comparisons to manga-colorization-v2 are shown.


Additional colorization results are shown.

References

BibTeX

If you use our work in your research, please cite our arXiv preprint:

@misc{jiramahapokee2023inknhue,
      title={inkn'hue: Enhancing Manga Colorization from Multiple Priors with Alignment Multi-Encoder VAE}, 
      author={Tawin Jiramahapokee},
      year={2023},
      eprint={2311.01804},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
