
SRNet

Update (15th January 2022): Paths to download the data files have been updated.

Update (27th August 2020) :

A bug related to variable image sizes has been fixed. You can now train with variable image sizes, which improves generations significantly.

Training is now significantly faster. Pull all changes and train as usual.

Update (26th July 2020) :

  • Pre-trained weights have been uploaded. Please refer to the Pre-trained weights section for usage.

  • The latest commit makes a few modifications to the model. Pull all changes before using the pre-trained weights.


This repository presents SRNet (Liang Wu et al.), a neural network that tackles the problem of text editing in images. It marks the inception of an area of research that could automate advanced editing mechanisms in the future.

SRNet is a twin-discriminator generative adversarial network that can edit text in any image while maintaining the context of the background, font style, and color. The demo below showcases one such use case: movie poster editing.

Left: source; Right: modified


Architecture changes

This implementation of SRNet introduces two main changes.

  1. Training: The original SRNet suffers from training instability, which the generator loss masks. The imbalance affects skeleton (t_sk) generation the most: the generator produces a run of bad t_sk outputs and, instead of recovering, grows steadily worse until it reaches mode collapse. The culprit is the min-max loss. A textbook remedy is to keep the discriminator ahead of the generator, and that is what this implementation does.

  2. Generator: To accommodate a design constraint in the original net, I have added three extra convolution layers to the decoder_net.

Incorporating these changes improved t_sk generations dramatically and increased stability. However, this also increased training time by ~15%.
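The "discriminator ahead of the generator" schedule in point 1 can be sketched as follows. This is a minimal illustration only; the function names in the comments are placeholders, not this repository's actual API.

```python
# Minimal sketch of the stabilization schedule described above: run the
# discriminator k times for every generator update so that D stays ahead
# of G. The names below are placeholders, not this repository's API.

def make_schedule(g_steps, d_steps_per_g=2):
    """Return the sequence of update types for g_steps generator
    iterations, with d_steps_per_g discriminator updates before each."""
    schedule = []
    for _ in range(g_steps):
        schedule.extend(["D"] * d_steps_per_g)  # discriminator first
        schedule.append("G")                    # then one generator step
    return schedule

# In a training loop you would dispatch on each entry, e.g.:
#   for step in make_schedule(num_iters):
#       if step == "D": train_discriminator(batch)
#       else:           train_generator(batch)
print(make_schedule(2))  # ['D', 'D', 'G', 'D', 'D', 'G']
```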


Usage

A virtual environment is the most convenient way to set up the model for training or inference. You can use virtualenv for this. The rest of this guide assumes you are working in one.

  • Clone this repository:
    $ git clone https://github.com/Niwhskal/SRNet.git
    
    $ cd SRNet
  • Install requirements (make sure you are on Python 3.x):
    $ pip3 install -r requirements.txt

Data setup

This repository provides a bash script that circumvents the process of synthesizing the data manually, as the original implementation requires. The default configuration parameters set up a dataset sufficient to train a robust model.

  • Grant execute permission to the bash script:
    $ chmod +x data_script.sh
  • Setup training data by executing:
    $ ./data_script.sh

The bash script downloads background data and a word list, then runs a data-generator script that synthesizes training data. Finally, it modifies paths to enable straightforward training. A detailed description of data synthesis is provided by youdao-ai in his original datagenerator repository.

If you wish to synthesize data with different fonts, you could do so easily by adding custom .ttf files to the fonts directory before running datagen.py. Examine the flow of data_script.sh and change it accordingly.

Training

  • Once data is setup, you can immediately begin training:
    $ python3 train.py

If you wish to resume training or use a checkpoint, update its path and run train.py.
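A typical PyTorch resume pattern looks like the sketch below. This is a generic illustration: the key names inside the checkpoint dictionary are assumptions and may differ from the ones train.py actually writes.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical resume pattern. The key names in the checkpoint dict
# ("model", "optimizer", "step") are illustrative assumptions and may
# differ from what train.py actually saves.
model = nn.Linear(4, 2)                      # stand-in for the generator
optimizer = torch.optim.Adam(model.parameters())

ckpt_path = os.path.join(tempfile.mkdtemp(), "demo.ckpt")
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": 1000}, ckpt_path)

# On resume: restore weights, optimizer state, and the iteration counter.
ckpt = torch.load(ckpt_path)
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_step = ckpt["step"]
print(start_step)  # 1000
```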

If you are interested in experimenting, modify the hyperparameters in cfg.py.

Prediction

To run inference you need to provide pairs of inputs: the source image (i_s) and the custom text rendered in grayscale on a plain background (i_t); examples can be found in SRNet/custom_feed/labels. Place all such pairs in a folder.

  • Inference can be carried out by running:
    $ python3 predict.py --input_dir *data_dir* --save_dir *destination_dir* --checkpoint *path_to_ckpt*

Pre-trained weights

You can download my pre-trained weights here

Some results from the example directory:

Source Result

Demo

Code for the demo was hastily written and is quite slow. If anyone is interested in trying it out or would like to contribute, open an issue, submit a pull request, or send me an email at [email protected]; I can host it for you.

References

  • Editing Text in the Wild: An innovative idea of using GANs in an unorthodox manner.

  • Youdao-ai's original repository: The original tensorflow implementation which helped me understand the paper from a different perspective. Also, credit to youdao for the data synthesis code. If anyone is interested in understanding the way data is synthesized for training, examine his repository.

  • SynthText project: This work provides the background dataset that is instrumental for data synthesis.

  • Streamlit docs: One of the best libraries to build and publish apps. Severely underrated.


srnet's Issues

About training

Hi, I am a beginner and have some questions about training.
I am trying to train my own model on Chinese text, but the results are poor: the source font is not erased and the target font is not generated in o_f.jpg.
Did I feed in too many kinds of Chinese fonts during training, or are the input background images too complex (some inputs are hard to make out)?
I used 50K images as input and trained for 50K iterations.

Demo code

Please release or upload your demo code. I am interested in trying it and would like to contribute.

No sigmoid in discriminator output and too much gradient clip

It seems there is no sigmoid activation on the discriminator output before the BCE loss. Also, clamping the gradient to [-0.01, 0.01] seems too aggressive: training got stuck with the loss not decreasing. After changing the range to [-1, 1], the loss started decreasing again.

About the training results

Hi, Niwhskal.
Following your suggestion, I used your pre-trained model and trained for about 150k iterations, but the final results are not very good, especially in the background inpainting module, whose loss stays at about 4.612. Below are some training results; can you give some effective suggestions? Thank you very much.
[result image]

Error in prediction

Hello sir,

I am running prediction using the command provided in the Readme, but I am getting this error:

[error screenshot]

Please guide me on how to solve it.

Multi-GPU operation error

Your work is excellent. I changed cfg.py to use multiple GPUs, but training still runs on a single GPU. Is this normal?

question on style transfer & other application situation

Q1: I am confused about the style transfer. What does the style refer to? For example, can the font size, font style, and text color be transferred in the fusion module?

Q2: Can this model be applied to images with many text lines rather than a single line?

evaluation.py

Can you upload the evaluation metric code? This is very important to me. Thank you; I look forward to your reply.

about training process

Thank you for the amazing work. I encountered some problems during the training process.

  • How many iterations did it take for training to converge in your runs?
    In fact, I cannot get a correct generation result (I did not change the hyperparameters).
  • Besides, I found the training process is slower than the tensorflow version.
    Do you have any clue about this?

About the training process

Hello, I really appreciate your re-implementation for text editing. I followed your instructions to run the project and only changed the batch size to 4. I found the loss converges really slowly and the results are bad. After 200k iterations, the G loss is 19, D_fus loss is 4.8, and D_bg loss is 5.5071. Is this normal, or is something wrong?

Regarding rendering custom text as target image

How do you generate the target rendered images for the source images in Fig. 3 of your paper? More specifically, can you share examples of source and target images for the cases where you are editing multiple text fields within a single image?

Question about demo

Hi, I wanted to know if it is possible to release the demo shown in the readme online, for testing purposes.

Cannot overfit on a single training image?

Hi, @Niwhskal:
I used only one training image to try to overfit the model (to verify the training process), but found it cannot even fit this single image. My data looks like this:

[images attached for i_s, i_t, mask_t, t_b, t_f, t_sk, t_t]

After training for 500,000 iterations, I ran predict.py but got a result like this:
[result image]

Is there something wrong with my data? Or is it simply wrong to train this model on a single image?
