
vq-font's Introduction

Few shot font generation via transferring similarity guided global and quantization local styles (ICCV 2023)

Official PyTorch implementation of "Few shot font generation via transferring similarity guided global and quantization local styles" by Wei Pan, Anna Zhu*, Xinyu Zhou, Brian Kenji Iwana, and Shilin Li.

Our method is based on vector quantization, so we named our few-shot font generation (FFG) method VQ-Font.

The paper can be found at ./Paper_IMG/ | Arxiv | CVF.

Abstract

Automatic few-shot font generation (AFFG), aiming at generating new fonts with only a few glyph references, reduces the labor cost of manually designing fonts. However, the traditional AFFG paradigm of style-content disentanglement cannot capture the diverse local details of different fonts. So, many component-based approaches are proposed to tackle this problem. The issue with component-based approaches is that they usually require special pre-defined glyph components, e.g., strokes and radicals, which is infeasible for AFFG of different languages. In this paper, we present a novel font generation approach by aggregating styles from character similarity-guided global features and stylized component-level representations. We calculate the similarity scores of the target character and the referenced samples by measuring the distance along the corresponding channels from the content features, and assigning them as the weights for aggregating the global style features. To better capture the local styles, a cross-attention-based style transfer module is adopted to transfer the styles of reference glyphs to the components, where the components are self-learned discrete latent codes through vector quantization without manual definition. With these designs, our AFFG method could obtain a complete set of component-level style representations, and also control the global glyph characteristics. The experimental results reflect the effectiveness and generalization of the proposed method on different linguistic scripts, and also show its superiority when compared with other state-of-the-art methods.

The model receives several style reference characters (from the target style) and content characters (from the source font) to generate style-transformed characters.

Usage

Dependencies

python >= 3.7
torch >= 1.12.0
torchvision >= 0.13.0
sconf >= 0.2.5
lmdb >= 1.2.1

Data Preparation

Images and Characters

  1. Collect a set of '.ttf' (TrueType) or '.otf' (OpenType) files to generate the images for training, and divide them into a source content font, a training set, and a test set. To better learn different styles, the fonts in the training set should be varied and diverse. The fonts we used in our paper can be found here.

  2. Specify the characters to be generated (including training characters and test characters); e.g., the first-level Chinese character table contains 3500 Chinese characters.

trian_val_3500: {乙、十、丁、厂、七、卜、人、入、儿、匕、...、etc}
train_3000: {天、成、在、麻、...、etc}
val_500: {熊、湖、战、...、etc}

  3. Convert the characters from step 2 into Unicode code points and save them in JSON format. A character can be converted to its Unicode hex string with hex(ord(ch))[2:].upper(); examples can be found in ./meta/, and a minimal conversion sketch follows the listings below.

trian_val_all_characters: ["4E00", "4E01", "9576", "501F", ...]
train_unis: ["4E00", "4E01", ...]
val_unis: ["9576", "501F", ...]

  4. After that, draw all the font images via ./datasets/font2image.py. Each image is named 'character + .png', e.g. '阿.png'. Organize the directory structure as below, where train_3000.png stands for the images drawn from train_unis: ["4E00", "4E01", ...] (a hypothetical rendering sketch follows the directory tree).

Font Directory
|--| content
|  --| kaiti4train_VAE
|    --| train_3000.png
|    --| ...
|  --| kaiti4val_VAE
|    --| val_500.png
|    --| ...
|  --| kaiti4train_FFG
|    --| trian_val_3500.png
|    --| ...
|--| train
|  --| train_font1
|  --| train_font2
|    --| trian_val_3500.png
|    --| ...
|  --| ...
|--| val
|  --| val_font1
|  --| val_font2
|    --| trian_val_3500.png
|    --| ...
|  --| ...
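
For reference, below is a minimal, hypothetical sketch of rendering one character from a '.ttf' file with Pillow; the actual ./datasets/font2image.py may differ in image size, centering, and naming:

    from PIL import Image, ImageDraw, ImageFont

    def draw_char(ttf_path, ch, img_size=128, out_path=None):
        # Render a single character as a black glyph on a white background.
        font = ImageFont.truetype(ttf_path, size=int(img_size * 0.8))
        img = Image.new("L", (img_size, img_size), color=255)
        draw = ImageDraw.Draw(img)
        # Center the glyph using its bounding box.
        left, top, right, bottom = draw.textbbox((0, 0), ch, font=font)
        x = (img_size - (right - left)) // 2 - left
        y = (img_size - (bottom - top)) // 2 - top
        draw.text((x, y), ch, fill=0, font=font)
        img.save(out_path or f"{ch}.png")   # e.g. '阿.png'

    draw_char("path/to/font.ttf", "阿")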

Build meta files and lmdb environment

Run the script ./build_trainset.sh:

 python3 ./build_dataset/build_meta4train.py \
 --saving_dir ./results/your_task_name/ \
 --content_font path\to\all_content \
 --train_font_dir path\to\training_font \
 --val_font_dir path\to\validation_font \
 --seen_unis_file path\to\train_unis.json \
 --unseen_unis_file path\to\val_unis.json 

Training

The training process is divided into two stages: 1) pre-training the content encoder and the codebook via VQ-VAE, and 2) training the few-shot font generation model via a GAN.

Pre-train VQ-VAE

When pre-training the VQ-VAE, the reconstructed characters come from train_unis rendered in the content font. The training process can be found in ./model/VQ-VAE.ipynb.
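
For intuition, the core quantization step in a VQ-VAE is a nearest-neighbour codebook lookup with a straight-through gradient. The following is a self-contained sketch of that standard layer, not the notebook's exact module:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VectorQuantizer(nn.Module):
        # Standard VQ-VAE codebook lookup (sketch).
        def __init__(self, num_codes=256, code_dim=64, beta=0.25):
            super().__init__()
            self.codebook = nn.Embedding(num_codes, code_dim)
            self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
            self.beta = beta

        def forward(self, z):  # z: [B, C, H, W] with C == code_dim
            B, C, H, W = z.shape
            flat = z.permute(0, 2, 3, 1).reshape(-1, C)        # [B*H*W, C]
            dists = torch.cdist(flat, self.codebook.weight)    # [B*H*W, K]
            idx = dists.argmin(dim=1)                          # nearest code per location
            q = self.codebook(idx).view(B, H, W, C).permute(0, 3, 1, 2)
            # Codebook loss + commitment loss, then straight-through estimator.
            loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
            q = z + (q - z).detach()
            return q, loss, idx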

Then use the pre-trained content encoder to calculate the similarity between all training and test characters, and store the scores as a dictionary (see the sketch after the example below):

{'4E07': {'4E01': 0.2143, '4E03': 0.2374, ...}, '4E08': {'4E01': 0.1137, '4E03': 0.1020, ...}, ...}
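
As a rough illustration only: the paper measures the distance along content-feature channels, while the sketch below approximates that with cosine similarity over flattened content features:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def build_sim_dict(content_encoder, imgs_by_uni, device="cuda"):
        # imgs_by_uni maps a Unicode hex string to a [1, 1, H, W] glyph tensor.
        feats = {}
        for uni, img in imgs_by_uni.items():
            f = content_encoder(img.to(device))   # [1, C, h, w] content feature
            feats[uni] = f.flatten(1)             # [1, C*h*w]
        sim_dict = {}
        for uni_a, fa in feats.items():
            sim_dict[uni_a] = {
                uni_b: round(F.cosine_similarity(fa, fb).item(), 4)
                for uni_b, fb in feats.items() if uni_b != uni_a
            }
        return sim_dict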

Few shot font generation

Modify the configuration in ./cfgs/custom.yaml.

Keys

  • work_dir: the root directory for saved results (keep it the same as saving_dir above).
  • data_path: path to the data lmdb environment (saving_dir/lmdb).
  • data_meta: path to the train meta file (saving_dir/meta).
  • content_font: the name of the font to use as the source font.
  • all_content_char_json: the JSON file that stores all train and val characters.
  • The remaining values are training hyperparameters; a minimal example config follows.
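
A minimal sketch of ./cfgs/custom.yaml using only the keys listed above; the source-font name and the trailing hyperparameters are placeholders, so check the shipped config for the real names and values:

    work_dir: ./results/your_task_name/
    data_path: ./results/your_task_name/lmdb
    data_meta: ./results/your_task_name/meta
    content_font: kaiti4train_FFG   # placeholder source-font name
    all_content_char_json: ./meta/trian_val_all_characters.json
    # remaining keys are training hyperparameters (names illustrative)
    batch_size: 16
    iter: 200000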

Run scripts

  • python3 train.py task_name cfgs/custom.yaml
      #--resume \path\to\your\pretrain_model.pdparams
    

Test

Run scripts

  • python3 inference.py ./cfgs/custom.yaml \
    --weight \path\to\saved_model.pdparams \
    --content_font \path\to\content_imgs \
    --img_path \path\to\test_imgs \
    --saving_root ./infer_res
    

Citation

If you find the code or paper helpful, please consider citing our paper.

@InProceedings{Pan_2023_ICCV,
    author    = {Pan, Wei and Zhu, Anna and Zhou, Xinyu and Iwana, Brian Kenji and Li, Shilin},
    title     = {Few Shot Font Generation Via Transferring Similarity Guided Global Style and Quantization Local Style},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2023},
    pages     = {19506-19516}
}

Acknowledgements

Our approach is inspired by FS-Font, and our code is modified from LF-Font and VQ-VAE.


vq-font's Issues

About multi-process data loading in stage-two training: can't pickle Environment objects

When n_workers is set greater than 0 in the cfg, the terminal reports an error:
File "train.py", line 211, in train
trainer.train(trn_loader, st_step, cfg["iter"], component_objects, chars_sim_dict)
File "F:\tools\VQ-Font-main\trainer\combined_trainer.py", line 40, in train
content_imgs, trg_unis, style_sample_index, trg_sample_index, ref_unis) in cyclize(loader):
File "F:\tools\VQ-Font-main\datasets\datautils.py", line 12, in cyclize
for x in loader:
File "D:\anaconda3\envs\vq\lib\site-packages\torch\utils\data\dataloader.py", line 438, in iter
File "D:\anaconda3\envs\vq\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in init
self._popen = self._Popen(self)
return _default_context.get_context().Process._Popen(process_obj)
File "D:\anaconda3\envs\vq\lib\multiprocessing\context.py", line 322, in _Popen
TypeError: can't pickle Environment objects

To avoid multi-process data loading, I set n_workers to 0. Then a new error appeared:
Traceback (most recent call last):
File "F:/tools/VQ-Font-main/train.py", line 222, in
main()
File "F:/tools/VQ-Font-main/train.py", line 218, in main
train(args, cfg)
File "F:/tools/VQ-Font-main/train.py", line 127, in train
data_meta = load_json(cfg.data_meta) # load train.json
KeyboardInterrupt

I'd like to ask: when using multi-process data loading, how can I identify the Environment object? Is it the env that appears in the code? And how should I deal with the new error that appears after disabling multi-process loading?

Environment: Windows 11; Python 3.7
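
lmdb Environment handles cannot be pickled, so a common workaround (independent of this repository) is to open the environment lazily inside each worker process rather than in the dataset constructor; a minimal sketch:

    import lmdb
    from torch.utils.data import Dataset

    class LmdbDataset(Dataset):
        # Open the lmdb environment lazily so worker processes never pickle it.
        def __init__(self, lmdb_path, keys):
            self.lmdb_path = lmdb_path
            self.keys = keys          # list of byte-string keys
            self.env = None           # created on first access in each worker

        def __getitem__(self, index):
            if self.env is None:
                self.env = lmdb.open(self.lmdb_path, readonly=True, lock=False,
                                     readahead=False, meminit=False)
            with self.env.begin(write=False) as txn:
                return txn.get(self.keys[index])

        def __len__(self):
            return len(self.keys)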

Question about the training images

[attached image: 0009950-comparable_ufuu_ 2 (1)]
Hi, I'd like to ask what the first and second rows of this image represent, and why the generated glyphs in my third row have exactly the same content and style as the second row. My image layout is as shown above.

Some problems in the code

Thanks for your work. I also plan to work on few-shot font generation. When reproducing your model I found a small issue in the code: torch.concat should be changed to torch.cat, otherwise the code throws an error.

Training problem

Hi, where is the null value coming from here, and how can I fix it?
[attached screenshot]

About Figure 3

Hi, how was the visualization in Figure 3 produced? Could you explain it in detail?

Model training questions

Are the training hyperparameters in the code the final ones used for the model? If not, do you have the parameters that give the best final generation results?
Also, at roughly how many iterations does the L1 loss normally converge during training, and to roughly what value?

Evaluation question

The images saved during training are large grids stitched from many glyphs. How can I obtain individual images so that I can compute the model evaluation metrics?

Problem preparing the lmdb package

I'm not sure which step of preparing the lmdb package went wrong:
My stage-two data folder is 'second', which contains content, train, and val folders with fonts different from stage one; the numbers of images are in the ratio 2:8:2.

I ran build_trainset:
python3 build_meta4train.py
--saving_dir F:\python_project\VQ-Font-main\VQ-Font-main\results\your_task_name
--content_font F:\python_project\VQ-Font-main\VQ-Font-main\Font_Directory\second\content
--train_font_dir F:\python_project\VQ-Font-main\VQ-Font-main\Font_Directory\second\train
--val_font_dir F:\python_project\VQ-Font-main\VQ-Font-main\Font_Directory\second\val
--seen_unis_file F:\python_project\VQ-Font-main\VQ-Font-main\meta\train_unis.json
--unseen_unis_file F:\python_project\VQ-Font-main\VQ-Font-main\meta\val_unis.json
These correspond to: the lmdb saving location / stage-two content images / stage-two train images / stage-two val images / the JSON file for the characters of the train fonts / the JSON file for the characters of the val fonts.

It fails with:
F:\python_project\VQ-Font-main\VQ-Font-main\Font_Directory\second\train
num of fonts: 2
ttf_path_list: 3
100%|███████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "build_meta4train.py", line 213, in
build_meta4train_lmdb(args)
File "build_meta4train.py", line 139, in build_meta4train_lmdb
valid_dict = save_lmdb(lmdb_path, out_dict)
File "build_meta4train.py", line 26, in save_lmdb
env = lmdb.open(env_path, map_size=1024 ** 4)
lmdb.Error: F:\python_project\VQ-Font-main\VQ-Font-main\results\your_task_name\lmdb: insufficient disk space.

A CSDN post suggested reducing map_size, but after the change it fails with:
F:\python_project\VQ-Font-main\VQ-Font-main\Font_Directory\second\train
num of fonts: 2
ttf_path_list: 3
100%|███████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "build_meta4train.py", line 213, in
build_meta4train_lmdb(args)
File "build_meta4train.py", line 139, in build_meta4train_lmdb
valid_dict = save_lmdb(lmdb_path, out_dict)
File "build_meta4train.py", line 46, in save_lmdb
char_img = Image.fromarray(char_img)
File "F:\python\lib\site-packages\PIL\Image.py", line 3078, in fromarray
arr = obj.__array_interface__
AttributeError: 'NoneType' object has no attribute 'array_interface'

After running my script I did obtain an lmdb package, but running the stage-two train.py raises:

Traceback (most recent call last):
File "F:\python_project\VQ-Font-main\VQ-Font-main\train.py", line 221, in
main()
File "F:\python_project\VQ-Font-main\VQ-Font-main\train.py", line 217, in main
train(args, cfg)
File "F:\python_project\VQ-Font-main\VQ-Font-main\train.py", line 140, in train
drop_last=True)
File "F:\python_project\VQ-Font-main\VQ-Font-main\datasets_init_.py", line 24, in get_comb_trn_loader
collate_fn=dset.collate_fn, **kwargs)
File "F:\python\lib\site-packages\torch\utils\data\dataloader.py", line 347, in init
sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
File "F:\python\lib\site-packages\torch\utils\data\sampler.py", line 106, in init
if not isinstance(self.num_samples, int) or self.num_samples <= 0:
File "F:\python\lib\site-packages\torch\utils\data\sampler.py", line 114, in num_samples
return len(self.data_source)
File "F:\python_project\VQ-Font-main\VQ-Font-main\datasets\dataset_transformer.py", line 116, in len
return sum([len(v) for v in self.avails.values()])
AttributeError: 'list' object has no attribute 'values'
This shows that my train JSON file contains a list rather than a dict.

It must be a problem with how I prepared the lmdb package, but I really can't tell where, and my changes haven't fixed it. Any help would be appreciated, thanks!

TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not NoneType

The following problem occurs during training:

Traceback (most recent call last):
  File "train.py", line 220, in <module>
    main()
  File "train.py", line 216, in main
    train(args, cfg)
  File "train.py", line 209, in train
    trainer.train(trn_loader, st_step, cfg["iter"], component_objects, chars_sim_dict)
  File "/app/trainer/combined_trainer.py", line 78, in train
    out_1, style_components_1 = self.gen.read_decode(trg_style_ids, trg_sample_index,
  File "/app/model/generator.py", line 69, in read_decode
    reference_feats = self.memory.read_chars(target_style_ids, trg_sample_index, reduction=reduction)
  File "/app/model/memory.py", line 90, in read_chars
    sc_feat = read_char(style_id, sample_index, reduction)
  File "/app/model/memory.py", line 49, in read_char
    comp_feat = self.read_point(style_id, sample_index, reduction)
  File "/app/model/memory.py", line 41, in read_point
    return torch.stack(sc_feats)
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not NoneType

About stage-two training

Hi, I'm reproducing the stage-two training and don't fully understand some of the outputs. What do sfsu, sfuu, ufsu, and ufuu in the output PNGs mean? Also, the third row doesn't look very good at the moment; is that normal early in training (at 25000 steps, for example)?
[attached screenshot]

Problems with stage-two training

[attached screenshot]
I built the dataset following the documentation and believe it meets the stated requirements, but I keep running into the problem below and don't understand why.

Traceback (most recent call last):
File "train.py", line 220, in
main()
File "train.py", line 216, in main
train(args, cfg)
File "train.py", line 209, in train
trainer.train(trn_loader, st_step, cfg["iter"], component_objects, chars_sim_dict)
File "/root/autodl-tmp/VQ-Font/trainer/combined_trainer.py", line 79, in train
out_1, style_components_1 = self.gen.read_decode(trg_style_ids, trg_sample_index,
File "/root/autodl-tmp/VQ-Font/model/generator.py", line 75, in read_decode
content_feats = self.content_encoder(content_imgs) # target content images [B,C,H,W]
File "/root/miniconda3/envs/VQ/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/autodl-tmp/VQ-Font/model/content_encoder.py", line 17, in forward
out = self.net(x)
File "/root/miniconda3/envs/VQ/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/VQ/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/root/miniconda3/envs/VQ/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/autodl-tmp/VQ-Font/model/modules/blocks.py", line 103, in forward
x = self.conv(self.pad(x))
File "/root/miniconda3/envs/VQ/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/VQ/lib/python3.8/site-packages/torch/nn/modules/padding.py", line 178, in forward
return F.pad(input, self.padding, 'reflect')
RuntimeError: Padding size should be less than the corresponding input dimension, but got: padding (1, 1) at dimension 1 of input [1024, 1, 128]

Dataset Preparation

I am confused about dataset preparation. In Data Preparation, you mention the structure shown in the image below.

[attached screenshot of the directory structure]

But in one of the issues, you have mentioned the following structure. Can you help me with this?

[attached screenshot of the structure from the issue]

About the training of the VQ-VAE.

In VQ-Font/model/VQ-VAE.ipynb.

for i in xrange(num_training_updates):
    data = next(iter(train_loader))
    train_data_variance = torch.var(data)
    # print(train_data_variance)
    # show(make_grid(data.cpu().data))
    # break
    data = data - 0.5  # normalize to [-0.5, 0.5]
    data = data.to(device)
    optimizer.zero_grad()

The code normalizes the data to [-0.5, 0.5]. However, the last layer of the VQ-VAE decoder is a sigmoid. Is this a mistake?

Test results differ greatly from training results

Hi, the images I generate with inference differ greatly from the images shown during training: the final generation results during training are good, but the results from inference.py are poor. How can I fix this? The fonts I used for inference are the same ones that appear in training.

Trying to Generate Korean Characters

I am trying to generate Korean characters using your model. For Chinese everything is fine, but when I try to generate Korean characters, I keep getting an error:

File "/media/hdd1/Irfan/VQ-Font-korean/datasets/dataset_transformer.py", line 96, in getitem
content_imgs = torch.stack([self.env_get(self.env, self.content_font, uni, self.transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/hdd1/Irfan/VQ-Font-korean/datasets/dataset_transformer.py", line 96, in
content_imgs = torch.stack([self.env_get(self.env, self.content_font, uni, self.transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/hdd1/Irfan/VQ-Font-korean/train.py", line 126, in
env_get = lambda env, x, y, transform: transform(read_data_from_lmdb(env, f'{x}_{y}')['img'])
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
TypeError: 'NoneType' object is not subscriptable

When I tried to debug, it appeared that no matter what font I chose, the content images weren't loading; other images, e.g. style images, load fine.

Following is the output of build_trainset.sh

build_meta4train_lmdb done!
all_style_fonts: 7
train_style_fonts: 5
val_style_fonts: 2
seen_unicodes: 2000
unseen_unicodes: 350

Any idea what I am doing wrong?

Which files need to be named like train_3000.png?

Regarding step 4 ("Organize the directory structure as below, where train_3000.png stands for the images drawn from train_unis: ["4E00", "4E01", ...]"): does this mean that all the rendered images of every font need to be renamed to the form train_1.png, train_100.png, train_1000.png, and so on? I don't quite understand; any guidance would be appreciated.

How to change kshot?

Is it necessary to adjust other parameters when changing kshot? When I try to change the kshot value, I encounter the following error:

Traceback (most recent call last):
File "train.py", line 220, in
main()
File "train.py", line 216, in main
train(args, cfg)
File "train.py", line 209, in train
trainer.train(trn_loader, st_step, cfg["iter"], component_objects, chars_sim_dict)
File "/disk3/lyf/VQ-Font/trainer/combined_trainer.py", line 161, in train
bs_component_embeddings, chars_sim_dict)
File "/disk3/lyf/VQ-Font/evaluator.py", line 50, in cp_validation
phase=phase, reduction=reduction)
File "/home/mac418/miniconda3/envs/VQ_Font/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/disk3/lyf/VQ-Font/evaluator.py", line 18, in decorated
ret = val_fn(self, gen, *args, **kwargs)
File "/disk3/lyf/VQ-Font/evaluator.py", line 57, in comparable_val_saveimg
reduction=reduction)
File "/home/mac418/miniconda3/envs/VQ_Font/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/disk3/lyf/VQ-Font/evaluator.py", line 18, in decorated
ret = val_fn(self, gen, *args, **kwargs)
File "/disk3/lyf/VQ-Font/evaluator.py", line 75, in infer_loader
reduction=reduction)
File "/disk3/lyf/VQ-Font/model/generator.py", line 192, in infer
style_components = self.Get_style_components(learned_components, reference_feats) # [B,N,C]
File "/home/mac418/miniconda3/envs/VQ_Font/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/disk3/lyf/VQ-Font/model/Component_Attention_Module.py", line 73, in forward
value_reference_matrix)
File "/disk3/lyf/VQ-Font/model/Component_Attention_Module.py", line 54, in cross_attention
scores = torch.matmul(key, query) # [b m khw n]
RuntimeError: The size of tensor a (24) must match the size of tensor b (16) at non-singleton dimension 0
