
SRNet

Update (15th January 2022): Paths to download the data files have been updated.

Update (27th August 2020) :

A bug related to variable image sizes has been fixed. You can now train with variable image sizes, which improves generations significantly.

Training is now significantly faster. Pull all changes and train as usual.

Update (26th July 2020) :

  • Pre-trained weights have been uploaded. Please refer to the Pre-trained weights section for usage.

  • The latest commit makes a few modifications to the model. Pull all changes before using the pre-trained weights.


This repository presents SRNet (Liang Wu et al.), a neural network that tackles the problem of text editing in images. It marks the inception of an area of research that could automate advanced editing mechanisms in the future.

SRNet is a twin-discriminator generative adversarial network that can edit text in any image while maintaining the context of the background, font style, and color. The demo below showcases one such use case: movie poster editing.

Left: source; Right: modified


Architecture changes

This implementation of SRNet introduces two main changes.

  1. Training: The original SRNet suffers from training instability, which the generator loss masks. The imbalance affects skeleton (t_sk) generation the most: the generator produces a run of bad t_sk outputs and, instead of recovering, grows steadily worse until it reaches mode collapse. The culprit is the min-max loss. A textbook remedy is to keep the discriminator ahead of the generator, and that is what this implementation does.

  2. Generator: To accommodate a design constraint in the original net, I have added three extra convolution layers to the decoder_net.

Incorporating these changes improved t_sk generations dramatically and increased stability. However, this also increased training time by ~15%.
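The "discriminator ahead of the generator" schedule in point 1 can be sketched as follows. This is a minimal illustration only; the function names in the comments are placeholders, not this repository's actual API.

```python
# Minimal sketch of the stabilization schedule described above: run the
# discriminator k times for every generator update so that D stays ahead
# of G. The names below are placeholders, not this repository's API.

def make_schedule(g_steps, d_steps_per_g=2):
    """Return the sequence of update types for g_steps generator
    iterations, with d_steps_per_g discriminator updates before each."""
    schedule = []
    for _ in range(g_steps):
        schedule.extend(["D"] * d_steps_per_g)  # discriminator first
        schedule.append("G")                    # then one generator step
    return schedule

# In a training loop you would dispatch on each entry, e.g.:
#   for step in make_schedule(num_iters):
#       if step == "D": train_discriminator(batch)
#       else:           train_generator(batch)
print(make_schedule(2))  # ['D', 'D', 'G', 'D', 'D', 'G']
```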


Usage

A virtual environment is the most convenient way to set up the model for training or inference. You can use virtualenv for this. The rest of this guide assumes you are working in one.

  • Clone this repository:
    $ git clone https://github.com/Niwhskal/SRNet.git
    
    $ cd SRNet
  • Install requirements (make sure you are on Python 3.x):
    $ pip3 install -r requirements.txt

Data setup

This repository provides a bash script that circumvents the process of synthesizing the data manually, as the original implementation requires. The default configuration parameters set up a dataset sufficient to train a robust model.

  • Grant execute permission to the bash script:
    $ chmod +x data_script.sh
  • Setup training data by executing:
    $ ./data_script.sh

The bash script downloads background data and a word list, then runs a data-generator script that synthesizes training data. Finally, it modifies paths to enable straightforward training. A detailed description of data synthesis is provided by youdao-ai in his original datagenerator repository.

If you wish to synthesize data with different fonts, you could do so easily by adding custom .ttf files to the fonts directory before running datagen.py. Examine the flow of data_script.sh and change it accordingly.

Training

  • Once data is setup, you can immediately begin training:
    $ python3 train.py

If you wish to resume training or use a checkpoint, update its path and run train.py.
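A typical PyTorch resume pattern looks like the sketch below. This is a generic illustration: the key names inside the checkpoint dictionary are assumptions and may differ from the ones train.py actually writes.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical resume pattern. The key names in the checkpoint dict
# ("model", "optimizer", "step") are illustrative assumptions and may
# differ from what train.py actually saves.
model = nn.Linear(4, 2)                      # stand-in for the generator
optimizer = torch.optim.Adam(model.parameters())

ckpt_path = os.path.join(tempfile.mkdtemp(), "demo.ckpt")
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": 1000}, ckpt_path)

# On resume: restore weights, optimizer state, and the iteration counter.
ckpt = torch.load(ckpt_path)
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_step = ckpt["step"]
print(start_step)  # 1000
```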

If you are interested in experimenting, modify the hyperparameters in cfg.py.

Prediction

To run inference you need to provide pairs of inputs: the source image (i_s) and the custom text rendered in grayscale on a plain background (i_t); examples can be found in SRNet/custom_feed/labels. Place all such pairs in a folder.

  • Inference can be carried out by running:
    $ python3 predict.py --input_dir *data_dir* --save_dir *destination_dir* --checkpoint *path_to_ckpt*

Pre-trained weights

You can download my pre-trained weights here

Some results from the example directory:

Source Result

Demo

Code for the demo was hastily written and is quite slow. If anyone is interested in trying it out or would like to contribute, open an issue, submit a pull request, or send me an email at [email protected]; I can host it for you.

References

  • Editing Text in the Wild: An innovative idea of using GANs in an unorthodox manner.

  • Youdao-ai's original repository: The original tensorflow implementation which helped me understand the paper from a different perspective. Also, credit to youdao for the data synthesis code. If anyone is interested in understanding the way data is synthesized for training, examine his repository.

  • SynthText project: This work provides the background dataset that is instrumental for data synthesis.

  • Streamlit docs: One of the best libraries to build and publish apps. Severely underrated.


srnet's Issues

About training

Hi, I am a beginner and have some questions about training.
I am trying to train my own model on Chinese text, but the results are poor: the source font is not erased and the target font is not generated in o_f.jpg.
Did I feed in too many kinds of Chinese fonts during training, or are the input background images too complex (some inputs are hard to make out)?
I used 50K images as input and trained for 50K iterations.

Demo code

Please release or upload your demo code. I am interested in trying it and would like to contribute.

No sigmoid in discriminator output and too much gradient clip

It seems there is no sigmoid activation on the discriminator output before the BCE loss. Also, clamping the gradient to [-0.01, 0.01] seems too aggressive: training got stuck with the loss not decreasing. After changing the range to [-1, 1], the loss started decreasing again.

About the training results

Hi, Niwhskal.
Following your suggestion, I used your pre-trained model and trained for about 150k iterations, but the final results are not very good, especially in the background inpainting module, whose loss stays at about 4.612. Below are some training results; can you give some effective suggestions? Thank you very much.
[result image]

Error in prediction

Hello sir,

I am running prediction using the command provided in the Readme, but I am getting this error:

[error screenshot]

Please guide me on how to solve it.

Multi-GPU operation error

Your work is excellent. I changed cfg.py to use multiple GPUs, but training still runs on a single GPU. Is this normal?

question on style transfer & other application situation

Q1: I am confused about the style transfer. What does the style refer to? For example, can the font size, font style, and text color be transferred in the fusion module?

Q2: Can this model be applied to images with many text lines rather than a single line?

evaluation.py

Can you upload the evaluation metric code? This is very important to me. Thank you; I look forward to your reply.

about training process

Thank you for the amazing work. I encountered some problems during the training process.

  • How many iterations did it take for training to converge in your runs?
    In fact, I cannot get a correct generation result (I did not change the hyperparameters).
  • Besides, I found the training process is slower than the tensorflow version.
    Do you have any clue about this?

About the training process

Hello, I really appreciate your re-implementation for text editing. I followed your instructions to run the project and only changed the batch size to 4. I found the loss converges really slowly and the results are bad. After 200k iterations, the G loss is 19, D_fus loss is 4.8, and D_bg loss is 5.5071. Is this normal, or is something wrong?

Regarding rendering custom text as target image

How do you generate the target rendered images for the source images in Fig. 3 of your paper? More specifically, can you share examples of source and target images for the cases where you are editing multiple text fields within a single image?

Question about demo

Hi, I wanted to know if it is possible to release the demo shown in the readme online, for testing purposes.

Cannot overfit on a single training image?

Hi, @Niwhskal:
I used only one training image to try to overfit the model (to verify the training process), but found it cannot even fit this single image. My data looks like this:

[images attached for i_s, i_t, mask_t, t_b, t_f, t_sk, t_t]

After training for 500,000 iterations, I ran predict.py but got a result like this:
[result image]

Is there something wrong with my data? Or is it simply wrong to train this model on a single image?
