
handwritten-text-recognition-for-apache-mxnet's Introduction

Handwritten Text Recognition (OCR) with MXNet Gluon

Local Setup

git clone https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet --recursive

You need to install SCLITE for WER evaluation. You can follow the bash script below from this folder:

cd ..
git clone https://github.com/usnistgov/SCTK
cd SCTK
export CXXFLAGS="-std=c++11" && make config
make all
make check
make install
make doc
cd -

You also need hnswlib:

pip install pybind11 numpy setuptools
cd ..
git clone https://github.com/nmslib/hnswlib
cd hnswlib/python_bindings
python setup.py install
cd ../..
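A quick way to confirm the hnswlib bindings built correctly is a small smoke test (a generic hnswlib example, not code from this repo):

import numpy as np
import hnswlib

# Build a tiny cosine-similarity index and query it
index = hnswlib.Index(space='cosine', dim=8)
index.init_index(max_elements=100, ef_construction=100, M=16)
index.add_items(np.random.rand(10, 8))
labels, distances = index.knn_query(np.random.rand(1, 8), k=3)
print(labels, distances)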

if "AssertionError: Please enter credentials for the IAM dataset in credentials.json or as arguments" occurs rename credentials.json.example and to credentials.json with your username and password.

Overview

The pipeline is composed of 3 steps:

  • handwritten area detection
  • line detection
  • handwritten text recognition

The entire inference pipeline can be found in the 0_handwriting_ocr.ipynb notebook. See the pretrained models section below for the pretrained models.

A recorded talk detailing the approach is available on YouTube. [video]

The corresponding slides are available on SlideShare. [slides]

Pretrained models:

You can get the models by running python get_models.py.

Sample results

The greedy, lexicon search, and beam search outputs present similar and reasonable predictions for the selected examples. Figure 6 presents some interesting examples. The first line of Figure 6 shows cases where the lexicon search algorithm provided fixes that corrected the words. In the top example, “tovely” (as it was written) was corrected to “lovely” and “woved” was corrected to “waved”. In addition, the beam search output corrected “a” into “all”, but it missed a space between “lovely” and “things”. In the second example, “selt” was converted to “salt” by the lexicon search output; however, “selt” was erroneously converted to “self” by the beam search output, so in this example beam search performed worse. In the third example, none of the three methods provided comprehensible results. Finally, in the fourth example, the lexicon search algorithm incorrectly converted “forhim” into “forum”, whereas the beam search algorithm correctly identified “for him”.

Dataset:

  • To use test_iam_dataset.ipynb, create credentials.json from credentials.json.example and edit the appropriate fields. The username and password can be obtained from http://www.fki.inf.unibe.ch/DBs/iamDB/iLogin/index.php.

  • It is recommended to use an instance with 32GB+ RAM and 100GB of disk space; a GPU is also recommended. A p3.2xlarge is the recommended starter instance on AWS for this project.

Appendix

1) Handwritten area

Model architecture

Results

2) Line Detection

Model architecture

Results

3) Handwritten text recognition

Model architecture

Results

handwritten-text-recognition-for-apache-mxnet's People

Contributors

ehsanmok, jalvathi, jb-delafosse, jonomon, sethuramanio, simoncorstonoliver, thomasdelteil

handwritten-text-recognition-for-apache-mxnet's Issues

Is k-fold cross-validation happening?

Hi,

First, let me appreciate your great work in sharing this handwriting model as open source!

I understand that you are splitting the dataset into a fixed validation set for validation.

Would it be more efficient to implement k-fold cross-validation, so that we can increase the amount of data used for training?

One more question: is it okay to use the entire dataset for testing, or is it necessary to test the model only on data not seen during training?

Thanks,
Anand.

Invalid NDArray file format

I get this error while running the 0_handwriting_ocr notebook: "src/ndarray/ndarray.cc:1851: Check failed: fi->Read(data): Invalid NDArray file format" when loading the handwriting_line8.params file. Could you please help me with this issue? I have mxnet 1.6.0 and gluonnlp 0.9.1. Thank you in advance.

Kernel dying

Kernel dies while downloading/processing the IAM dataset

Kernel Restarting
The kernel for projects/DocByte/handwriting_recog/amazon_HWR/handwritten-text-recognition-for-apache-mxnet/0_handwriting_ocr.ipynb appears to have died. It will restart automatically.

The crash happens when running this line:
test_ds = IAMDataset("form_original", train=False)

Links in the README file are broken

The links in the README file still refer to files on the old repo's master branch, which are unavailable.

Please update the README with the latest changes.

Context set doesn't check for presence of GPU in 0_handwriting_ocr.ipynb

When setting the context in the denoising section of 0_handwriting_ocr.ipynb, the code doesn't check for the presence of GPUs and therefore fails when no GPUs are available.

Based on other context-setting code in the example, I would suggest that this line...

ctx_nlp = mx.gpu(3)

...should read:

ctx_nlp = mx.gpu(3) if mx.context.num_gpus() > 0 else mx.cpu()

Train on IAMDataset "Word" Crashes the code

Hello, I have a need to train the model with words. Can you please help me?

I tried updating the code with max_seq_len = 96 as "jonomon" suggested, but it crashes with this error:
DeferredInitializationError: Parameter 'cnnbilstm0_hybridsequential1_hybridsequential0_encoderlayer0_lstm0_l0_i2h_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

Gradient of Parameter `ssd1_batchnorm0_beta` on context gpu(0) has not been updated by backward since last `step`

I'm trying to execute the code as it is, but I don't know why I'm getting this error:
"UserWarning: Gradient of Parameter ssd1_batchnorm0_beta on context gpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient" on trainer.step(stepsize)

AssertionError: Please enter credentials for the IAM dataset in credentials.json or as arguments


AssertionError Traceback (most recent call last)
in ()
----> 1 test_ds = IAMDataset("form_original", train=False)

/content/iam_dataset.py in init(self, parse_method, credentials, root, train, output_data, output_parse_method, output_form_text_as_array)
174 self._credentials = (credentials["XXX"], credentials["XXX"])
175 else:
--> 176 assert False, "Please enter credentials for the IAM dataset in credentials.json or as arguments"
177 else:
178 self._credentials = credentials

AssertionError: Please enter credentials for the IAM dataset in credentials.json or as arguments

I see this error when I call IAMDataset("form_original", train=False). I have already registered and have a username and password, but I do not know how to put them into credentials.json. Please help me solve this problem, thank you very much!
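Two ways to resolve this, based on the README and the traceback above: either rename credentials.json.example to credentials.json and fill in your username and password, or pass the credentials directly to the constructor. A minimal sketch of the second option (the (username, password) tuple form is an assumption drawn from line 178 of iam_dataset.py, which assigns the argument straight to self._credentials):

from ocr.utils.iam_dataset import IAMDataset

# Assumption: credentials are accepted as a (username, password) tuple
test_ds = IAMDataset("form_original",
                     credentials=("your-username", "your-password"),
                     train=False)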

How to make predictions from pre-trained models?

Good work, thanks.
I am using the pre-trained models to get text from images. When going through the code on how to do this, I learned that my test images' format has to match what the IAMDataset class in ocr.utils.iam_dataset outputs.
So, how do I modify the IAMDataset class in ocr.utils.iam_dataset so that an input image matches the test dataframe format this class outputs, or how do I get the dataframe for images other than the ones in the IAM dataset? I couldn't understand this class completely, so if anyone has worked on this, please help me solve it.
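A minimal sketch of one possible approach (assumptions: you run it inside 0_handwriting_ocr.ipynb, where paragraph_segmentation_transform and form_size are defined, and the models expect a single-channel grayscale array like the IAMDataset "form" output):

import cv2

# Load your own page as grayscale, matching the single-channel arrays IAMDataset produces
image = cv2.imread("my_handwritten_page.jpg", cv2.IMREAD_GRAYSCALE)

# Resize it the same way the notebook does for IAM forms
resized_image = paragraph_segmentation_transform(image, form_size)

# resized_image can then be fed to the paragraph segmentation network exactly as in the
# notebook; no modification of the IAMDataset class should be needed for inference.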

regions of text not being detected properly

Here is my input image in color:
[input paragraph image]
1. The detection produced by the pre-trained models can be seen below, when the above image is converted 0. from RGB to grayscale and 1. from BGR to grayscale (this happens at the paragraph segmentation step of the 0_handwriting_ocr.ipynb notebook; I have made no changes to the code in that notebook):
[detection output image]
2. Because of this improper region detection, areas that actually have text are being cropped out.
Also, when I try a form size other than the one in the code, form_size = (1120, 800) (because my images have a smaller aspect ratio), the machine crashes. What is causing this, and how can I prevent it?
3. Presumably because of the improper detection above, or maybe because line/word segmentation is not happening properly, here is the word segmentation:
[word segmentation image]
and here is the line segmentation:
[line segmentation image]
How do I fix these?

ConnectionResetError at word segmentation

Hi, I already mentioned my problem elsewhere, but I didn't find an issue describing what I am experiencing at the moment. When I start 2_line_word_segmentation.ipynb I get the following error:

ConnectionResetErrorTraceback (most recent call last)
<ipython-input-13-fbd64d2ad138> in <module>
      3     cls_metric = mx.metric.Accuracy()
      4     box_metric = mx.metric.MAE()
----> 5     train_loss = run_epoch(e, net, train_data, trainer, log_dir, print_name="train", is_train=True, update_metric=False)
      6     test_loss = run_epoch(e, net, test_data, trainer, log_dir, print_name="test", is_train=False, update_metric=True)
      7     if test_loss < best_test_loss:

<ipython-input-6-6b90c6f2ae19> in run_epoch(e, network, dataloader, trainer, log_dir, print_name, is_train, update_metric)
     32 
     33     total_losses = [0 for ctx_i in ctx]
---> 34     for i, (X, Y) in enumerate(dataloader):
     35         X = gluon.utils.split_and_load(X, ctx)
     36         Y = gluon.utils.split_and_load(Y, ctx)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in __next__(self)
    503         try:
    504             if self._dataset is None:
--> 505                 batch = pickle.loads(ret.get(self._timeout))
    506             else:
    507                 batch = ret.get(self._timeout)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in rebuild_ndarray(pid, fd, shape, dtype)
     59             fd = multiprocessing.reduction.rebuild_handle(fd)
     60         else:
---> 61             fd = fd.detach()
     62         return nd.NDArray(nd.ndarray._new_from_shared_mem(pid, fd, shape, dtype))
     63 

/usr/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
     55         def detach(self):
     56             '''Get the fd.  This should only be called once.'''
---> 57             with _resource_sharer.get_connection(self._id) as conn:
     58                 return reduction.recv_handle(conn)
     59 

/usr/lib/python3.6/multiprocessing/resource_sharer.py in get_connection(ident)
     85         from .connection import Client
     86         address, key = ident
---> 87         c = Client(address, authkey=process.current_process().authkey)
     88         c.send((key, os.getpid()))
     89         return c

/usr/lib/python3.6/multiprocessing/connection.py in Client(address, family, authkey)
    491 
    492     if authkey is not None:
--> 493         answer_challenge(c, authkey)
    494         deliver_challenge(c, authkey)
    495 

/usr/lib/python3.6/multiprocessing/connection.py in answer_challenge(connection, authkey)
    730     import hmac
    731     assert isinstance(authkey, bytes)
--> 732     message = connection.recv_bytes(256)         # reject large message
    733     assert message[:len(CHALLENGE)] == CHALLENGE, 'message = %r' % message
    734     message = message[len(CHALLENGE):]

/usr/lib/python3.6/multiprocessing/connection.py in recv_bytes(self, maxlength)
    214         if maxlength is not None and maxlength < 0:
    215             raise ValueError("negative maxlength")
--> 216         buf = self._recv_bytes(maxlength)
    217         if buf is None:
    218             self._bad_message_length()

/usr/lib/python3.6/multiprocessing/connection.py in _recv_bytes(self, maxsize)
    405 
    406     def _recv_bytes(self, maxsize=None):
--> 407         buf = self._recv(4)
    408         size, = struct.unpack("!i", buf.getvalue())
    409         if maxsize is not None and size > maxsize:

/usr/lib/python3.6/multiprocessing/connection.py in _recv(self, size, read)
    377         remaining = size
    378         while remaining > 0:
--> 379             chunk = read(handle, remaining)
    380             n = len(chunk)
    381             if n == 0:

ConnectionResetError: [Errno 104] Connection reset by peer

I am using a Docker image on a Linux system. Can you help me get the notebook to run?

Downloading gbw dataset crashes machine

The code below, from the 'Denoising text output' section of 0_handwriting_ocr.ipynb, crashes the system for an unknown reason when run on a machine with 35.35 GB RAM and 107.77 GB disk space (a Google Colab TPU session).

ctx_nlp = mx.gpu(3)
language_model, vocab = nlp.model.big_rnn_lm_2048_512(dataset_name='gbw', pretrained=True, ctx=ctx_nlp)
moses_tokenizer = nlp.data.SacreMosesTokenizer()
moses_detokenizer = nlp.data.SacreMosesDetokenizer()

How do I download this dataset without crashing the machine? Also, I don't want to download it again next time, so can I save this dataset somewhere?

Questions about train and evaluation

Thanks for your great work. I am a rookie at handwriting recognition and have some questions about training and evaluation.

  1. This repo uses SCLITE for WER evaluation. I found that SCLITE ignores the spaces between words when it evaluates the words of one line, but other methods, such as https://github.com/githubharald/SimpleHTR/blob/master/src/main.py#L81 and https://github.com/jpuigcerver/xer/blob/master/xer#L116, do not. Which is the usual criterion?

  2. Why 100.0 - float(er)? I think it should be float(er):

    for line in output_file.readlines():
        match = re.match(match_tar, line.decode('utf-8'), re.M|re.I)
        if match:
            # I think there are matching problems
            number = match.group(1)  # --> match.group().split()[4]
            er = match.group(2)      # --> match.group().split()[-3]
    assert number != None and er != None, "Error in parsing output."
    return float(number), 100.0 - float(er)  # return float(number), float(er)

  3. It's the average CER over all lines, not the global CER:

    # https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/0_handwriting_ocr.ipynb
    def get_qualitative_results_lines(denoise_func):
        sclite.clear()
        test_ds_line = IAMDataset("line", train=False)
        for i in tqdm(range(1, len(test_ds_line))):
            # ....
            sclite.add_text([decoded_text], [actual_text])
        cer, er = sclite.get_cer()
        print("Mean CER = {}".format(cer))
        return cer

  4. The pretrained model handwriting_line8.params works well, but I can't train such a good model myself.

    # https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/handwriting_line_recognition.py#L30
    # Best results:
    # python handwriting_line_recognition.py --epochs 251 -n handwriting_line.params -g 0 -l 0.0001 -x 0.1 -y 0.1 -j 0.15 -k 0.15 -p 0.75 -o 2 -a 128

Looking forward to your reply. Thanks a lot.
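On points 2 and 3, one way to see the difference between a mean per-line CER and a global CER is to compute both directly. A minimal sketch using the leven package (which this repo already depends on); this is not the SCLITE implementation:

from leven import levenshtein

def global_cer(predictions, references):
    # total edit distance divided by total reference length
    total_dist = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_len = sum(len(r) for r in references)
    return total_dist / total_len

def mean_line_cer(predictions, references):
    # average of the per-line CERs, which weights short and long lines equally
    cers = [levenshtein(p, r) / len(r) for p, r in zip(predictions, references)]
    return sum(cers) / len(cers)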

FileNotFound

Hello guys, I have what I think is a simple problem: when I launch test_iam_dataset I get this error:

FileNotFoundError: [Errno 2] File /home/roo/sf_workspace/Image Médecine douce/handwritting model/handwritting notebook/ocr/utils/../../dataset/iamdataset/subject/trainset.txt does not exist: '/home/roo/sf_workspace/Image Médecine douce/handwritting model/handwritting notebook/ocr/utils/../../dataset/iamdataset/subject/trainset.txt'

I don't know what kind of file this is.

If someone has an idea, thanks a lot!

is 50% dropout a good value to set?

Hi,

I can see that the dropout percentage set in the handwriting_line_recognition.py script is 50%. Is dropping half of the nodes a good suggestion?

self.p_dropout = 0.5

Please advise why 50% dropout is set here.

Thanks,
Anand.

Mxnet package gives error post installation

getting the error "OSError: [WinError 126] The specified module could not be found" in Windows Server 2016. What is the dll file missing?

Installed using pip.

Python: 3.7.4

Unable to run the test_iam_dataset.ipynb

Traceback (most recent call last):
  File "iam_dataset.py", line 23, in <module>
    from .expand_bounding_box import expand_bounding_box
SystemError: Parent module '' not loaded, cannot perform relative import

Python version = 3.5.2
OS = Windows
Editor = PyCharm

Is the default learning rate 0.0001 good, or is 0.00001 better?

Hi,

I was trying to tweak the learning_rate and dropout parameters for the handwriting_line_recognition.py model.

Since there is not much change in the loss when changing the dropout parameter (20%, 35%, 50%), I'm keeping the default one.

But changing the learning rate from 0.0001 to 0.00001 gives a huge increase in the stability of the model, as plotted below (the training loss is roughly equal to the test loss).

plotted graph image: https://prnt.sc/rv6lzm

graph_label notations:

lr-e5 => learning_rate = 0.00001
lr-e4 => learning_rate = 0.0001

-> The bottom two lines are the train and test loss curves for the 0.0001 learning rate, and all the lines above are plotted for 0.00001. We can see that the bottom two lines are not stable, whereas the other lines are very stable (the training loss is roughly equal to the test loss).

Since lr 0.00001 is better than 0.0001, can we make 0.00001 the default, or would we face any other problem if we use this new learning rate?

Please advise.

Thanks,
Anand.

MXNetError: [11:26:33] C:\Jenkins\workspace\mxnet-tag\mxnet\src\ndarray\ndarray.cc:1279: GPU is not enabled

When executing the following commands:

ctx_nlp = mx.gpu(3)
language_model, vocab = nlp.model.big_rnn_lm_2048_512(dataset_name='gbw', pretrained=True, ctx=ctx_nlp)
moses_tokenizer = nlp.data.SacreMosesTokenizer()
moses_detokenizer = nlp.data.SacreMosesDetokenizer()

I got a download of some compressed files as a result, then the download stopped, and when I run that line again I get a GPU error. What can I do to find the files that were being downloaded, and where should I place them?
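A minimal workaround sketch (assumption: the download is cached under ~/.mxnet by default, so re-running the cell picks up or re-fetches the cached files; the context fallback follows the pattern used elsewhere in the notebook):

import mxnet as mx
import gluonnlp as nlp

# mx.gpu(3) assumes at least 4 GPUs; fall back to the first GPU or the CPU instead
ctx_nlp = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
language_model, vocab = nlp.model.big_rnn_lm_2048_512(dataset_name='gbw',
                                                      pretrained=True,
                                                      ctx=ctx_nlp)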

AssertionError: Shape of params are incompatible

Hi again,

I am a bit confused about this error, which happens in the 4_text_denoising notebook. I did every step before it, but something does not fit with the dimensions. Can you explain why this is happening?

AssertionErrorTraceback (most recent call last)
<ipython-input-31-2a0e848a57c4> in <module>
      1 model_path = 'models/denoiser2.params'
      2 if (os.path.isfile(model_path)):
----> 3     net.load_parameters(model_path, ctx=ctx)
      4     print("Loaded parameters")
      5     best_test_loss = evaluate(net, val_data_ft)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/block.py in load_parameters(self, filename, ctx, allow_missing, ignore_extra, cast_dtype, dtype_source)
    553                         name, filename, _brief_print_list(self._params.keys())))
    554             if name in params:
--> 555                 params[name]._load_init(loaded[name], ctx, cast_dtype=cast_dtype, dtype_source=dtype_source)
    556 
    557     def load_params(self, filename, ctx=None, allow_missing=False,

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py in _load_init(self, data, ctx, cast_dtype, dtype_source)
    280                     "Failed loading Parameter '%s' from saved params: " \
    281                     "shape incompatible expected %s vs saved %s"%(
--> 282                         self.name, str(self.shape), str(data.shape))
    283             self.shape = tuple(i if i != unknown_dim_size else j
    284                                for i, j in zip(self.shape, data.shape))

AssertionError: Failed loading Parameter 'transformer_enc_const' from saved params: shape incompatible expected (150, 512) vs saved (150, 256)

Train on IAMDataset "Word" Crashes the code

The 3_handwriting_recognition.py works fine with IAMDataset("line", output_data="text", train=True) but crashes when using the word IAMDataset. Specifically, doing this crashes the code.

train_ds = IAMDataset("word", output_data="text", train=True)
print("Number of training samples: {}".format(len(train_ds)))

test_ds = IAMDataset("word", output_data="text", train=False)
print("Number of testing samples: {}".format(len(test_ds)))

Gives:
mxnet.base.MXNetError: Shape inconsistent, Provided = [13320192], inferred shape=(8863744,)

Retrain the model with additional dataset

Hello, how can I retrain the model with a new dataset? I looked at the XML files for the bounding box information, but it looks different.

What are the preparation steps for retraining the model?

Please provide information!

Thanks
Dinesh

How to use it for lines or words

I want to use the same code to generate text from a line containing a few words. What changes should I look at, given that the code is made for paragraph text generation?

Issue with resizing the image (`resize_image()` function) from `ocr.utils.iam_dataset.py` for paragraph segmentation!

During paragraph segmentation in "0_handwriting_ocr.ipynb", paragraph_segmentation_transform(image, form_size) is called to paragraph-segment the image, which in turn calls the resize_image() function from ocr.utils.iam_dataset.py to resize the image (the images I have passed are not from the IAM dataset; I have passed my own images into the images array for text recognition). The error occurs at line 72 of that file:

color = image[0][0]
if color < 230:
    color = 230

The problem occurs because image[0][0] is an array, not a single value. How do I fix this and proceed further?
Here is a screenshot of the error:
[screenshot of error]
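A minimal sketch of a likely fix (assumption: the image was read in colour, e.g. with cv2.imread, so each pixel is a 3-value array; converting to grayscale first makes image[0][0] a single intensity value, which is what resize_image expects):

import cv2

image = cv2.imread("my_page.jpg")               # BGR, shape (H, W, 3)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # shape (H, W), single channel
segmented = paragraph_segmentation_transform(gray, (1120, 800))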

How to do incremental learning with the current model?

Thanks for this wonderful piece of work from your team! It is really very helpful for the student community.

I used the model and trained it on a dataset. It is working really well, with close to 85% accuracy.

Now I have a new dataset. I do not want to train from scratch, but rather use the current weights and train only on the new dataset. How can we do that?

Thanks!
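A minimal sketch of fine-tuning rather than training from scratch (Network, new_train_data and loss_fn are placeholders for whatever you already use; this is not the repo's exact training script):

import mxnet as mx
from mxnet import gluon, autograd

ctx = mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()
net = Network()                                                   # same architecture as before
net.load_parameters("models/handwriting_line8.params", ctx=ctx)   # start from the current weights

trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 0.00001})
for epoch in range(10):
    for x, y in new_train_data:                                   # DataLoader over the new data only
        x, y = x.as_in_context(ctx), y.as_in_context(ctx)
        with autograd.record():
            loss = loss_fn(net(x), y)
        loss.backward()
        trainer.step(x.shape[0])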

Unable to run evaluate_cer

Hi, I just ran the evaluate_cer.
And I got this message: 'ParagraphSegmentationNet' is not defined

Any ideas?

Regarding GPU Integration issue

When I execute "ctx = mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()", the output of ctx is only cpu() and the output of "mx.context.num_gpus()" is 0, even though I have enabled the GPU. I am using MXNet 1.4.0 and leven==1.0.4.
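One common cause (an assumption, not confirmed in this thread) is that the CPU-only mxnet wheel is installed instead of a CUDA build such as mxnet-cu100; a quick check:

import mxnet as mx

print(mx.__version__)           # e.g. 1.4.0
print(mx.context.num_gpus())    # 0 means MXNet was not built with CUDA or no GPU is visible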

Help regarding running this code on Google Colab

I have uploaded the dataset to Drive and now I don't understand what to change in iam_dataset.py so that my code works. I made some changes and ran this code on my local computer without any problem, but training takes forever and the kernel dies in the middle of it. I am totally new to Google Colab and also not that good at Python. If anyone has tried this code on Google Colab, please share your iam_dataset.py code with me. I've been stuck with it for 4 days and I'm kind of clueless about what to search for on Google.
That's my email: [email protected]

Duration of training 4_text_denoising

Hi, just a short question. I started 4_text_denoising and I have a single GPU (1x RTX 2080 Ti). It has already been running for over 50 hours. Is that normal?
At the moment I can't see the progress in the notebook anymore because I restarted the browser...

Question: How long does 4_text_denoising normally take?

Thanks for the help.
