Font Interpolation

We aim to address the challenge of limited fonts for low-resource languages, which results in reduced Scene Text Recognition accuracy. Given two font families, our focus is on creating multiple fonts by interpolating between them. These newly generated fonts can be utilized to create synthetic datasets for model training, thereby improving OCR accuracy on real-world data. This work only targets Latin font generation.

This repository contains the code for:

Generating images from TTF Fonts.
Performing Font interpolation using the DiffAE model.
Generating TTF fonts from interpolated images.
Generating synthetic datasets from interpolated fonts.
Training and benchmarking OCR model on the generated datasets.

Requirements:

Create a conda envrionment with the given requirement.txt file.

Instructions:

Generate images from TTF fonts:
- Run font2image.py script to extract images from the ttf font.
Font Interpolation:
- Various off-the-shelf GAN & diffusion models have been explored for Font Interpolation, some mentioned in the Literature section.
- We opted for Diffusion autoencoder (DiffAE) as the baseline due to its promising results in general image interpolation.
- The architecture was slightly modified by adding an MLP for font classification, aiming to improve the image latent space of the font. The model was trained on a synthetic dataset generated using Synthtiger (1M images with 5 fonts). The OCR accuracy results below are from this model. However, we also experimented with training the model solely on font images from all English fonts (~200K images), observing overfitting.
- Checkpoints: /home/data/submit/WORKING/devesh/Font-Interpolation/diffae_checkpoints
Training:
1. Run /diffae/run_fonts_ddim.py to train the model
2. Note that we only need to train the encoder, we don't train latent DPM.
Inference:
1. Run /diffae/interpolate.ipynb for interpolation
2. Run /diffae/interpolate_pullback.ipynb for interpolating through parallel transport, as mentioned in this paper(https://arxiv.org/abs/2307.12868)
3. For running interpolation on all the pairs, run /diffae/prepare_fonts_nc2.py to build pairs of fonts to be interpolated, run /diffae/batch_intp.py to perform interpolation.
Generating TTF fonts from interpolated images:
1. Each font interpolation yields 10 images (2 original and 8 interpolated). To select a character image from these, we utilize an English OCR model. Among the 3rd to 7th interpolated images, we choose the one that the model correctly predicts with the lowest confidence. This approach ensures both correctness and variability.
2. /img2ttf/place_font_on_template.py script will automatically place the font images at appropriate places in the given template(this greatly saves time to create fonts manually 😊). You also need to download checkpoint of [PARSeq] (https://github.com/baudm/parseq) OCR model.
3. Upload this template to https://www.calligraphr.com/ for generating ttf fonts.
Generating synthetic datasets from interpolated fonts: We use synthtiger to generate Synthetic Data from the given fonts
Training and benchmarking OCR model on the generated datasets: We use [PARSeq] (https://github.com/baudm/parseq) model to train and benchmark OCR results. Results of OCR accuracy with model trained on 1M synthetic data generated using 10 random real fonts vs 10 + 45 addional interpolated fonts.

Fonts Word Accuracy Character Accuracy

10 random original fonts (1M) 70.56 87.12

10 original + 45 Interpolated fonts (1+1M) 81.45 92.19

Fonts	Word Accuracy	Character Accuracy
10 random original fonts (1M)	70.56	87.12
10 original + 45 Interpolated fonts (1+1M)	81.45	92.19

Limitations and Difficulties in the Current Work:

The model architecture has limited novelty. Additionally, in terms of qualitative results, we don't achieve the same level as mentioned in the existing works (Link).
The OCR results are satisfactory, but we are comparing results after training the model on datasets of 1M images vs 2M images. It's suggested to generate 2M images with the original 10 fonts and then compare accuracy.
The Pullback and parallel transport utilized in interpolate_pullback.ipynb isn't working as expected 😔.
Difficulty in Indic Language Font Interpolation:
Generating fonts from images of Devanagari characters presents a challenge, primarily due to the presence of Matras or Half Characters. These special characters require precise positioning in the template while creating TTF font. For further details, please refer to: Microsoft Typography - Devanagari Script Development.

Future Work:

Debug interpolate_pullback.ipynb.
Explore Bezier curve direction for interpolation in curve space instead of image space.
Explore Indic languages, if not Devanagari, then perhaps some simpler but low-resource language.

Literature for Font Interpolation

https://www.lucasfonts.com/learn/interpolation-theory- Talks about interpolation between the fonts belonging to the same family. Focuses on generating fonts with different stroke widths.
Learning a Manifold of Fonts - (SIGGRAPH 2014) https://dl.acm.org/doi/pdf/10.1145/2601097.2601212
Attribute2Font: Creating Fonts You Want From Attributes- (SIGGRAPH 2022) https://dl.acm.org/doi/pdf/10.1145/3386569.3392456 (https://github.com/hologerry/Attr2Font)
Font Style Interpolation with Diffusion Models (https://arxiv.org/html/2402.14311v1)
SKELETON ENHANCED GAN-BASED MODEL FOR BRUSH HANDWRITING FONT GENERATION(https://arxiv.org/pdf/2204.10484.pdf)
Neural Transformation Fields for Arbitrary-Styled Font Generation (https://openaccess.thecvf.com/content/CVPR2023/papers/Fu_Neural_Transformation_Fields_for_Arbitrary-Styled_Font_Generation_CVPR_2023_paper.pdf)

shashankkrishnav / font-interpolation-iitd Goto Github PK