License: MIT License

Text Recognition on Cross Domain Datasets

Improved CRNN, ASTER, and DAN on different text domains: scene text, handwritten, document, Chinese/English, even ancient books.


Update🙂🙂

| Date | Description |
| --- | --- |
| 7/30 | Checkpoint for CRNN on the IAM dataset has been released. You can test your English handwriting now. |
| 7/31 | Checkpoint for CRNN on CASIA-HWDB2.x has been released. You can test your Chinese handwriting now. |
| 8/3 | New algorithm! ASTER is reimplemented here, and a checkpoint for scene text recognition is released. |
| 8/5 | Checkpoint for ASTER on the IAM dataset has been released. It's much more accurate than CRNN thanks to the attention model's implicit semantic information. You should not miss it 😃 |
| 8/8 | New algorithm! DAN (Decoupled Attention Network) is reimplemented. Checkpoints for both scene text and the IAM dataset are released. |
| 8/11 | New algorithm! ACE (Aggregation Cross-Entropy). It's a new loss function for the text recognition task, like CTC and attention. |
| 8/17 | Retrained ACE and DAN; added a powerful augmentation tool. |
| 9/7 | Training SRN, and more to come. |

1. Welcome!😃😃

Now I'm focusing on a project to build a general OCR system that can recognize different text domains: from scene text, handwriting, and documents to Chinese, English, and even ancient books like the Confucian classics. So far I don't have a clear idea of how to do it, but let's just take it step by step. This repository is suitable for beginners who are interested in text recognition (I am a beginner too 😂).


2. Contents👨‍💻👨‍💻

| Part | Description |
| --- | --- |
| Datasets | Multiple datasets in LMDB form |
| Algorithms | CRNN, ASTER, DAN, ACE |
| How to use | Use |
| Checkpoints | CheckPoints |

3. Datasets

3.1 Scene Text Recognition

3.1.1 Training Sets (Synthetic)

| Dataset | Description | Examples | BaiduNetdisk link |
| --- | --- | --- | --- |
| SynthText | 9 million synthetic text instance images from a set of 90k common English words. Words are rendered onto natural images with random transformations. | SynthText | Scene text datasets(提取码:emco) |
| MJSynth | 6 million synthetic text instances, generated in a similar fashion to SynthText. | MJText | Scene text datasets(提取码:emco) |

3.1.2 Evaluation Sets (Real; only test sets are provided)

| Dataset | Description | Examples | BaiduNetdisk link |
| --- | --- | --- | --- |
| IIIT5K-Words (IIIT5K) | 3,000 test image instances, taken from street scenes and from originally-digital images. | IIIT5K | Scene text datasets(提取码:emco) |
| Street View Text (SVT) | 647 test image instances. Some images are severely corrupted by noise, blur, and low resolution. | SVT | Scene text datasets(提取码:emco) |
| StreetViewText-Perspective (SVT-P) | 639 test image instances, specifically designed to evaluate perspective-distorted text recognition. Built from the original SVT dataset by selecting images at the same addresses on Google Street View but with different view angles, so most text instances are heavily distorted by the non-frontal view angle. | SVTP | Scene text datasets(提取码:emco) |
| ICDAR 2003 (IC03) | 867 test image instances. | IC03 | Scene text datasets(提取码:mfir) |
| ICDAR 2013 (IC13) | 1,015 test image instances. | IC13 | Scene text datasets(提取码:emco) |
| ICDAR 2015 (IC15) | 2,077 test image instances. As the text images were taken by Google Glass without ensuring image quality, most of the text is very small, blurred, and multi-oriented. | IC15 | Scene text datasets(提取码:emco) |
| CUTE80 (CUTE) | 288 test image instances, focused on curved text recognition. Most images in CUTE have a complex background, perspective distortion, and poor resolution. | CUTE | Scene text datasets(提取码:emco) |

3.2 Handwritten

| Dataset | Description | Examples | BaiduNetdisk link |
| --- | --- | --- | --- |
| IAM | The IAM dataset is based on handwritten English text copied from the LOB corpus. It contains 747 documents (6,482 lines) in the training set, 116 documents (976 lines) in the validation set, and 336 documents (2,915 lines) in the testing set. | IAM | IAM_line_level(提取码:u2a3) |
| CASIA-HWDB2.x | CASIA-HWDB is a large-scale Chinese handwriting database. | CASIA | HWDB2.x(提取码:ozqu) |

4. Algorithms

4.1 CRNN

4.1.1 On Scene Text

  • I reimplemented the most classic and widely deployed algorithm, CRNN. The original backbone is replaced by a modified ResNet, and the results below are trained on MJ + ST.

| # | IIIT5K | SVT | IC03 | IC13 | IC15 | SVTP | CUTE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CRNN (reimplemented) | 91.2 | 84.4 | 90.8 | 88.0 | 73.1 | 71.8 | 77.4 |
| CRNN (original) | 78.2 | 80.8 | 89.4 | 86.7 | - | - | - |

  • Some recognition results:

| Image | GT | Prediction |
| --- | --- | --- |
| 1 | I am so sorry | 'iamsosory' |
| 2 | I still love you | 'istilloveyou' |
| 3 | Can we begin again | 'canwebeginagain' |

  • Note that we only predict 0-9 and a-z; no upper case or punctuation. If you want to predict them, you can modify the code.
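Since the released checkpoint only covers 0-9 and a-z, labels must be normalized to that charset, and the CTC output must be decoded by collapsing repeats and removing blanks. A minimal sketch of both steps (the charset ordering and blank index 0 are assumptions, not the repository's exact code):

```python
# Assumed label ordering: digits first, then lowercase letters (class 0 = CTC blank).
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"

def normalize_label(text: str) -> str:
    """Lowercase the label and drop anything outside the 36-character set."""
    return "".join(c for c in text.lower() if c in CHARSET)

def ctc_greedy_decode(indices, blank=0):
    """CTC best-path decoding: collapse consecutive repeats, then drop blanks."""
    out, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank:
            out.append(CHARSET[idx - 1])  # shift by 1 because class 0 is blank
        prev = idx
    return "".join(out)
```

For example, `normalize_label("Can we begin again")` gives `"canwebeginagain"`, matching the GT format in the table above.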

4.1.2 On Handwritten

  • Experiments are conducted on the IAM dataset and CASIA-HWDB.

| Dataset | Word Accuracy |
| --- | --- |
| IAM (line level) | 67.2 |
| CASIA-HWDB2.0-2.2 | 88.6 |

  • Some recognition results:

| Image | GT | Prediction |
| --- | --- | --- |
| 1 | Just Somebody I Can Kiss | 'Just Somebody I can kiss' |
| 2 | Just something I can turn to | 'Just something I can turn to' |
| 3 | 昨夜西风凋碧树,独上西楼,望尽天涯路。 | '昨夜西风调瑟树,独上西楼。望尽天涯路' |
| 4 | 衣带渐宽终不悔,为伊消得人憔悴 | '衣带渐宽终不海,为伸消得人憔悴' |
| 5 | 众里寻他千百度,蓦然回首,那人却在灯火阑珊处 | '众里寻他千百度,暮然回首,那人却在灯火闻班然' |
| 6 | 你好,** | '你好,**' |
| 7 | 欢迎来到重庆 | '欢迎来到重庆' |

  • Chinese handwriting suffers from an imbalanced character distribution, so rare characters are sometimes hard to recognize.
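The Word Accuracy figures above are, presumably, exact-match accuracy over whole predicted strings; a minimal sketch of that metric (the repository may normalize predictions differently before comparing):

```python
def word_accuracy(predictions, ground_truths):
    """Fraction of samples whose prediction matches the ground truth exactly."""
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)
```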

4.2 ASTER

4.2.1 On Scene Text

  • ASTER is a classic text recognition algorithm with a TPS rectification network and an attention decoder.

| # | IIIT5K | SVT | IC03 | IC13 | IC15 | SVTP | CUTE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ASTER (reimplemented) | 92.9 | 88.1 | 91.2 | 88.6 | 75.9 | 78.3 | 78.5 |
| ASTER (original) | 91.93 | 88.76 | 93.49 | 89.75 | # | 74.11 | 73.26 |

  • Some recognition results:

| Image and Rectified Image | GT | Prediction |
| --- | --- | --- |
| 1 | COLLEGE | 'COLLEGE' |
| 2 | FOOTBALL | 'FOOTBALL' |
| 3 | BURTON | 'BURTON' |
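The attention decoder that gives ASTER its edge attends over the encoder's frame features at every output step. A single additive-attention step can be sketched in NumPy as follows (purely illustrative; the weight names Wh, Ws, v are hypothetical stand-ins for learned parameters, not names from this repository):

```python
import numpy as np

def attention_step(H, s, Wh, Ws, v):
    """One decoding step: H is the (T, D) encoder feature sequence and s the
    (D,) previous decoder state. Returns attention weights and the context."""
    scores = np.tanh(H @ Wh + s @ Ws) @ v           # (T,) alignment scores
    scores = scores - scores.max()                  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
    context = alpha @ H                             # (D,) weighted feature sum
    return alpha, context
```

The context vector is then fed, together with the previous character embedding, into the decoder to predict the next character; this step-by-step alignment is what carries the implicit semantic information mentioned in the update log.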

4.2.2 On Handwritten

  • Experiments are conducted on the IAM dataset and CASIA-HWDB.

| Dataset | Word Accuracy |
| --- | --- |
| IAM (line level) | 69.8 |
| CASIA-HWDB2.0-2.2 | The model fails to converge; I am still training |

  • Some recognition results:

| Image | GT | Prediction |
| --- | --- | --- |
| 1 | Coldplay is my favorate band | 'Coldplay is my favorate band' |
| 2 | Night gathers and now my watch begins | 'Night gathers and now my watch begins' |
| 3 | You konw nothing John Snow | 'You konw nothing John snow' |


4.3 DAN

4.3.1 On Scene Text

| # | IIIT5K | SVT | IC03 | IC13 | IC15 | SVTP | CUTE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DAN1D (reimplemented) | 91.2 | 83.8 | 89.4 | 88.7 | 72.1 | 70.2 | 74.7 |
| DAN1D (original) | 93.3 | 88.4 | 95.2 | 94.2 | 71.8 | 76.8 | 80.6 |

4.3.2 On Handwritten

  • Experiments are conducted on the IAM dataset and CASIA-HWDB.

| Dataset | Word Accuracy |
| --- | --- |
| IAM (line level) | 74.0 |
| CASIA-HWDB2.0-2.2 | |

  • Some recognition results:

| Image | Prediction |
| --- | --- |
| 1 | 'I have seen things you people would not believe lift' |
| 2 | 'Attack ships on fire off the shoulder of Orien' |
| 3 | 'I have watch bearans gitter in the does near the Tarhouser' |
| 4 | 'All those moments will be lost in time' |
| 5 | 'like tears in the rain' |

4.4 ACE

4.4.1 On Scene Text

  • ACE is a simple yet effective loss function. However, there is still a large gap compared with CTC and attention.

| # | IIIT5K | SVT | IC03 | IC13 | IC15 | SVTP | CUTE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ACE (reimplemented) | 84.8 | 76.7 | 84.0 | 82.6 | 65.3 | 64.8 | 68.8 |
| ACE (original) | 82.3 | 82.6 | 92.1 | 89.7 | # | # | # |
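ACE supervises only aggregate character counts: the per-timestep probabilities are averaged over time and compared, via cross-entropy, against the label's normalized character-count distribution (blanks fill the remaining timesteps). A minimal NumPy sketch of the idea (not the repository's implementation):

```python
import numpy as np

def ace_loss(probs, counts):
    """ACE loss. probs: (T, C) per-timestep softmax outputs, class 0 = blank;
    counts: (C,) character counts of the label, with counts[0] chosen so that
    counts.sum() == T (blanks pad out the sequence length)."""
    T = probs.shape[0]
    aggregate = probs.mean(axis=0)        # predicted per-class frequencies
    target = counts / T                   # normalized count distribution
    return -np.sum(target * np.log(aggregate + 1e-10))
```

Because only counts are matched, no frame-to-character alignment is learned, which keeps the loss cheap but also helps explain the accuracy gap to CTC and attention seen in the table above.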

5. How to use

  • It's easy to start the training process. First, download the required datasets.
  • Check the directory layout:
  scripts--
     ACE--
        CASIA_HWDB--
            train.sh
            test.sh
            inference.sh
        iam_dataset--
            train.sh
            test.sh
            inference.sh
        scene_text--
            train.sh
            test.sh
            inference.sh
     ASTER--
        ...
     CRNN--
        ...
     DAN--
        ...
  • Say you want to train ACE on scene text: change the training and testing dataset paths in scripts/ACE/scene_text/train.sh (the first two lines).
  • Run:
bash scripts/ACE/scene_text/train.sh
  • To test accuracy, follow the same steps as for training. You also need to set the resume parameter in the .sh file; it points to the checkpoint.
  • Run:
bash scripts/ACE/scene_text/test.sh
  • To test a single image, change the image path and the resume path in the corresponding .sh file.
  • Then run:
bash scripts/ACE/scene_text/inference.sh

6. CheckPoints

CRNN

CRNN on Scene Text

CRNN on STR, Checkpoints(提取码:axf7)

CRNN on IAM dataset

CRNN on IAM, Checkpoints(提取码:3ajw)

CRNN on CASIA_HWDB dataset

CRNN on CASIA_HWDB, Checkpoints(提取码:ujpy)

ASTER

ASTER on Scene Text

ASTER on STR, Checkpoints(提取码:mcc9)

ASTER on IAM dataset

ASTER on IAM, Checkpoints(提取码:mqqm)

DAN

DAN on Scene Text

DAN on IAM dataset

DAN on IAM, Checkpoints(提取码:h7vp)

Email 📫

[email protected]

Contributors

mountchicken

Issues

Problem with spaces

Hello, I got your code running, but I have a small problem: neither the results nor the GT contain any spaces, even though the dataset I used includes spaces. Could you tell me where the problem is? Looking forward to your reply, many thanks.

Error running CASIA_HWDB inference

Isn't the argument `--alphabets casia_360cc \` in /CASIA_HWDB/inference.sh wrong? It should probably be set to the full label set; as written there are only 12 labels. Below is the error I get:

.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ModelBuilder:
size mismatch for decoder.weight: copying a param with shape torch.Size([6018, 512]) from checkpoint, the shape in current model is torch.Size([12, 512]).
size mismatch for decoder.bias: copying a param with shape torch.Size([6018]) from checkpoint, the shape in current model is torch.Size([12]).

Problem when exporting the model to ONNX

I trained on a handwritten Chinese dataset using your ResNet_CRNN_IAM model, but ran into a problem when exporting the model to ONNX. I see you define two methods in the network, forward and inference (your "inferrence" is presumably a typo). When I pass dummy_input = torch.randn(1, 3, 192, 2048), I get:

x, rec_targets, rec_lengths = input_dict['images'],
IndexError: too many indices for tensor of dimension 4

This seems related to forward taking an input_dict dictionary as its input parameter, but when I replaced dummy_input with that dictionary's parameters, export still failed. How should I modify this?
