Code Monkey home page Code Monkey logo

simplehtr's Introduction

Handwritten Text Recognition with TensorFlow

  • Update 2021: more robust model, faster dataloader, Python3 only
  • Update 2020: code is compatible with TF2

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words as shown in the illustration below. 3/4 of the words from the validation-set are correctly recognized, and the character error rate is around 10%.

htr

Run demo

Download the model trained on the IAM dataset. Put the contents of the downloaded file model.zip into the model directory of the repository. Afterwards, go to the src directory and run python main.py. The input image and the expected output is shown below.

test

> python main.py
Init with stored values from ../model/snapshot-39
Recognized: "Hello"
Probability: 0.42098119854927063

Command line arguments

  • --train: train the NN on 95% of the dataset samples and validate on the remaining 5%
  • --validate: validate the trained NN
  • --decoder: select from CTC decoders "bestpath", "beamsearch", and "wordbeamsearch". Defaults to "bestpath". For option "wordbeamsearch" see details below
  • --batch_size: batch size
  • --data_dir: directory containing IAM dataset (with subdirectories img and gt)
  • --fast: use LMDB to load images (faster than loading image files from disk)
  • --dump: dumps the output of the NN to CSV file(s) saved in the dump folder. Can be used as input for the CTCDecoder

If neither --train nor --validate is specified, the NN infers the text from the test image (data/test.png).

Integrate word beam search decoding

It is possible to use the word beam search decoder [4] instead of the two decoders shipped with TF. Words are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized. The following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.

decoder_comparison

Follow these instructions to integrate word beam search decoding:

  1. Clone repository CTCWordBeamSearch
  2. Compile custom TF operation (follow instructions given in README)
  3. Copy binary TFWordBeamSearch.so from the CTCWordBeamSearch repository to the src directory of the SimpleHTR repository

Word beam search can now be enabled by setting the corresponding command line argument. The dictionary is created (in training and validation mode) by using all words contained in the IAM dataset (i.e. also including words from validation set) and is saved into the file data/corpus.txt. Further, the (manually created) list of word-characters can be found in the file model/wordCharList.txt. Beam width is set to 50 to conform with the beam width of vanilla beam search decoding.

Train model with IAM dataset

Follow these instructions to get the IAM dataset [5]:

  • Register for free at this website
  • Download words/words.tgz
  • Download ascii/words.txt
  • Create a directory for the dataset on your disk, and create two subdirectories: img and gt
  • Put words.txt into the gt directory
  • Put the content (directories a01, a02, ...) of words.tgz into the img directory

Start the training

  • Delete files from model directory if you want to train from scratch
  • Go to the src directory and execute python main.py --train --data_dir path/to/IAM
  • Training stops after a fixed number of epochs without improvement

Fast image loading

Loading and decoding the png image files from the disk is the bottleneck even when using only a small GPU. The database LMDB is used to speed up image loading:

  • Go to the src directory and run createLMDB.py --data_dir path/to/IAM with the IAM data directory specified
  • A subfolder lmdb is created in the IAM data directory containing the LMDB files
  • When training the model, add the command line option --fast

Using the --fast option and a GTX 1050 Ti training takes around 3h with a batch size of 500.

Information about model

Overview

The model [1] is a stripped-down version of the HTR system I implemented for my thesis [2][3]. What remains is what I think is the bare minimum to recognize text with an acceptable accuracy. It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer. The illustration below gives an overview of the NN (green: operations, pink: data flowing through NN) and here follows a short description:

  • The input image is a gray-value image and has a size of 128x32
  • 5 CNN layers map the input image to a feature sequence of size 32x256
  • 2 LSTM layers with 256 units propagate information through the sequence and map the sequence to a matrix of size 32x80. Each matrix-element represents a score for one of the 80 characters at one of the 32 time-steps
  • The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)

nn_overview

Analyze model

Run python analyze.py with the following arguments to analyze the image file data/analyze.png with the ground-truth text "are":

  • --relevance: compute the pixel relevance for the correct prediction
  • --invariance: check if the model is invariant to horizontal translations of the text
  • No argument provided: show the results

Results are shown in the plots below. For more information see this article.

analyze

FAQ

  • I get the error message "... TFWordBeamSearch.so: cannot open shared object file: No such file or directory": if you want to use word beam search decoding, you have to compile the custom TF operation from source
  • Where can I find the file words.txt of the IAM dataset: it is located in the subfolder ascii on the IAM website
  • I want to recognize text of line (or sentence) images: this is not possible with the provided model. The size of the input image is too small. For more information read this article or have a look at the lamhoangtung/LineHTR repository
  • I get an error when running the script more than once from an interactive Python session: do not call function main() in file main.py from an interactive session, as the TF computation graph is created multiple times when calling main() multiple times. Run the script by executing python main.py instead

References

[1] Build a Handwritten Text Recognition System using TensorFlow

[2] Scheidl - Handwritten Text Recognition in Historical Documents

[3] Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

[4] Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm

[5] Marti - The IAM-database: an English sentence database for offline handwriting recognition

simplehtr's People

Contributors

githubharald avatar n1snt avatar chazzz avatar reed-jones avatar

Stargazers

Louis Stanko avatar

Forkers

tejasmohanayyar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.