jiangxiluning / master-tf


MASTER

License: MIT License

Python 100.00%

ocr ocr-recognition transformer deep-learning cv scene-text-recognition

master-tf's Introduction

MASTER-TensorFlow

TensorFlow reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This project differs from our original implementation, which builds on FastOCR, the company's private codebase. A PyTorch reimplementation is available in the MASTER-pytorch repository, and its performance is almost identical. (PS. Logo inspired by Master Oogway in Kung Fu Panda)

News

2021/07: MASTER-mmocr, reimplementation of MASTER by mmocr. @Jiaquan Ye
2021/07: TableMASTER-mmocr, 2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing Task B based on MASTER. @Jiaquan Ye
2021/07: The talk can be found here (in Chinese).
2021/05: Savior, which aims to provide a simple, lightweight, fast-to-integrate, pipelined deployment framework for RPA, now integrates MASTER for captcha recognition. @Tao Luo
2021/04: The slides can be found here.

Honors based on MASTER

1st place (2021/05) solution to ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask I: Table structure reconstruction)
1st place (2021/05) solution to ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask II: Table content reconstruction)
2nd place (2021/05) solution to ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table recognition
1st place (2020/10) solution to ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard (task2)
2nd and 5th places (2020/10) in The 5th China Innovation Challenge on Handwritten Mathematical Expression Recognition
4th place (2019/08) of ICDAR 2017 Robust Reading Challenge on COCO-Text (task2)
More will be released

Introduction

MASTER is a self-attention based scene text recognizer that (1) encodes not only the input-output attention but also self-attention, which captures feature-feature and target-target relationships inside the encoder and decoder, (2) learns an intermediate representation that is more powerful and more robust to spatial distortion, and (3) offers better training and evaluation efficiency. The overall architecture is shown below.
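To make the "feature-feature" relationship above concrete, here is a minimal sketch of generic scaled dot-product self-attention in plain Python. It is not the multi-aspect global context attention used in MASTER and not code from this repo; the function names and the toy feature list are assumptions chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Generic attention: each query attends over all keys.

    queries, keys, values are lists of equal-width float vectors.
    Returns (outputs, weights); each row of weights sums to 1.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Self-attention: the same (toy) feature vectors act as queries, keys, and values.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention_weights = scaled_dot_product_attention(features, features, features)
```

Using the same sequence for queries, keys, and values gives self-attention; using decoder states as queries and encoder features as keys and values gives the input-output attention mentioned above.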

This repo contains the following features.

Multi-gpu Training
Greedy Decoding
Single image inference
Eval iiit5k
Convert Checkpoint to SavedModel format
Refactored code to be more TensorFlow-style and more consistent with graph mode
Support for TensorFlow Serving
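Greedy decoding, one of the features listed above, can be sketched as follows. This is a framework-free illustration rather than the repo's implementation: `step_fn`, `sos_id`, `eos_id`, and the toy scoring function are hypothetical names introduced here.

```python
def greedy_decode(step_fn, sos_id, eos_id, max_len):
    """Pick the highest-scoring token at every step until EOS or max_len.

    step_fn(tokens) is a hypothetical callable that returns one score per
    vocabulary entry for the next token, given the tokens decoded so far.
    """
    tokens = [sos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start-of-sequence token


# Toy stand-in for a decoder: it always "predicts" the sequence 3, 4, then EOS.
SOS, EOS = 1, 2
_target = [3, 4, EOS]


def toy_step(tokens):
    position = len(tokens) - 1
    scores = [0.0] * 5
    scores[_target[position] if position < len(_target) else EOS] = 1.0
    return scores


decoded = greedy_decode(toy_step, SOS, EOS, max_len=10)
```

Greedy decoding keeps a single hypothesis, so it is cheap but cannot recover from an early wrong choice the way beam search can.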

Preparation

Installing tensorflow-gpu with conda is highly recommended.

Python 3.7 is preferred.

pip install -r requirements.txt

Dataset

I use Clovaai's MJ training split for training.

Please check src/dataset/benchmark_data_generator.py for details.

The evaluation datasets are real scene text datasets. You can download them directly from here.

Training

# training from scratch
python train.py -c [your_config].yaml

# resume training from last checkpoint
python train.py -c [your_config].yaml -r

# finetune with some checkpoint
python train.py -c [your_config].yaml -f [checkpoint]

Eval

Because I changed how the GCB block is used, the released weights may not be compatible with HEAD. If you want to test the model, please use https://github.com/jiangxiluning/MASTER-TF/commit/85f9217af8697e41aefe5121e580efa0d6d04d92

Currently, you can download the checkpoint from here with code o6g9, or from Google Drive. This checkpoint was trained on MJ and selected for its best performance on the IIIT5K dataset. Below is the comparison between the PyTorch version and the TensorFlow version.

Framework	Dataset	Word Accuracy	Training Details
PyTorch	MJ	85.05%	3 V100, 4 epochs, Batch Size: 3 * 128
TensorFlow	MJ	85.53%	2 2080ti, 4 epochs, Batch Size: 2 * 50
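For readers unfamiliar with the metric in the table, word accuracy is the fraction of images whose whole predicted string matches the ground truth. The sketch below is an illustration only: the function name is hypothetical, and case-insensitive matching is an assumption, not a statement about how eval_iiit5k.py compares strings.

```python
def word_accuracy(predictions, ground_truths, case_sensitive=False):
    """Fraction of predictions that match the ground truth exactly.

    Case-insensitive comparison is an assumption made for this sketch.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    correct = 0
    for predicted, expected in zip(predictions, ground_truths):
        if not case_sensitive:
            predicted, expected = predicted.lower(), expected.lower()
        correct += predicted == expected
    return correct / len(ground_truths)


# Two of the three toy predictions match their labels.
accuracy = word_accuracy(["hello", "World", "cat"], ["hello", "world", "dog"])
```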

Please download the checkpoint and model config from here with code o6g9 and unzip it, and you can get this metric by running:

python eval_iiit5k.py --ckpt [checkpoint file] --cfg [model config] -o [output dir] -i [iiit5k lmdb test dataset]

The checkpoint file argument should be ${where you unzip}/backup/512_8_3_3_2048_2048_0.2_0_Adam_mj_my/checkpoints/OCRTransformer-Best

Tensorflow Serving

TensorFlow Serving requires the SavedModel format. I provide test cases that show how to convert a checkpoint to SavedModel and how to use it.

pytest -s tests/test_units::test_savedModel  # see the test case test_savedModel in tests/test_units
pytest -s tests/test_units::test_loadModel  # calls decode to run inference and return the predicted transcript and logits

Citations

If you find MASTER useful please cite our paper:

@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

Thanks to the authors and their repo:


master-tf's Issues

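The step and epoch computed in train_net.py are always wrong

During training, the count is always one step short and the epoch only completes during the next one, so the epoch number is wrong as well (the logger output is shown in the figure below). What could be causing this?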

PyTorch implementation

Where can I find your PyTorch implementation?

What's the difference between transformer_tf and transformer.py?

About inference time

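Hello, I tested your network on the same hardware (V100), but the FPS is only about 4 (averaged_infer_time: 203.730ms), which is far from the 9.22ms reported in the paper. Is something wrong?
In addition, to compare with deep-text-recognition, I changed the network input size to 32*100 and trained for 4 epochs on MJ+ST/lower_case. The accuracy is as follows:
accuracy: IIIT5k_3000: 0.845 SVT: 0.804 IC03_860: 0.910 IC03_867: 0.908 IC13_857: 0.896 IC13_1015: 0.891 IC15_1811: 0.675 IC15_2077: 0.595 SVTP: 0.693 CUTE80: 0.667
total_accuracy: 0.779
This accuracy seems slightly lower than that of TPS-BLSTM-Attn, SAR, and another transformer-based network (https://arxiv.org/pdf/1906.05708.pdf), so the advantage of the network is not apparent. How can I reach the accuracy reported in the paper?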
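The figures in this report can be cross-checked with a short calculation. In the sketch below, the sample counts for SVT (647), SVTP (645), and CUTE80 (288) are assumptions based on the commonly used benchmark sizes; the remaining counts come from the dataset names in the report. Under those assumptions, a sample-weighted average reproduces the reported total_accuracy, and the reported latency corresponds to roughly 4.9 images per second if each inference processes one image.

```python
# (sample count, reported accuracy) per benchmark.
# Counts for SVT, SVTP and CUTE80 are assumed; the others are in the dataset names.
results = {
    "IIIT5k_3000": (3000, 0.845),
    "SVT": (647, 0.804),
    "IC03_860": (860, 0.910),
    "IC03_867": (867, 0.908),
    "IC13_857": (857, 0.896),
    "IC13_1015": (1015, 0.891),
    "IC15_1811": (1811, 0.675),
    "IC15_2077": (2077, 0.595),
    "SVTP": (645, 0.693),
    "CUTE80": (288, 0.667),
}

total_samples = sum(count for count, _ in results.values())
# Weight each benchmark's accuracy by its number of samples.
total_accuracy = sum(count * acc for count, acc in results.values()) / total_samples

# Throughput implied by the reported latency, assuming one image per inference call.
averaged_infer_time_ms = 203.730
frames_per_second = 1000.0 / averaged_infer_time_ms

print(f"total_accuracy: {total_accuracy:.3f}")
print(f"fps: {frames_per_second:.2f}")
```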

Write OCR results to a log file

Hi, thank you very much for your earlier replies. How can I write the recognition results to a .log file in the following format: "image_path predicted_labels confidence score"? Thank you, and I look forward to your help.
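One way to produce the requested format is sketched below. It assumes the recognition results have already been collected as (image_path, predicted_label, confidence) tuples; how to obtain them from the model is outside this sketch, and the function name is hypothetical.

```python
def write_recognition_log(results, log_path):
    """Write one "image_path predicted_label confidence" line per result.

    results is an iterable of (image_path, predicted_label, confidence)
    tuples; this structure is an assumption made for the sketch.
    Returns the number of lines written.
    """
    written = 0
    with open(log_path, "w", encoding="utf-8") as log_file:
        for image_path, predicted_label, confidence in results:
            log_file.write(f"{image_path} {predicted_label} {confidence:.4f}\n")
            written += 1
    return written
```

For example, write_recognition_log([("img/001.jpg", "coffee", 0.9876)], "recognition.log") writes the single line "img/001.jpg coffee 0.9876". Labels that contain spaces would need a different separator, such as a tab.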

Upload checkpoint to Google Drive

Thanks for your work.
Can you upload your checkpoint to Google Drive or another cloud service? I can't download it from Baidu.

English character or Chinese word recognition?

Which one?

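How does this model compare with CRNN on long text recognition? Which performs better?

I read earlier that attention-based methods recognize long text less well than CRNN, specifically for document-style text rather than natural scene text. How does this model do?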

raise FileNotFoundError('LMDB file is not found. {}'.format(lmdb_path)) FileNotFoundError: LMDB file is not found. \data_lmdb_release\training\MJ\MJ_train

Hello, I appreciate your research, when I run the training, I get the following error. Please help, thanks.
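The path in the error begins with a backslash, so it is resolved from the drive root rather than from the project folder, which may be why the dataset is not found. A quick check like the sketch below can confirm what the configured path actually points to before training starts. It relies on the fact that an LMDB environment opened as a directory stores its data in a file named data.mdb; the helper name is hypothetical and not part of this repo.

```python
from pathlib import Path


def check_lmdb_dir(lmdb_path):
    """Return the resolved LMDB directory, or raise a descriptive error.

    Hypothetical helper: verifies that the directory exists and contains
    the data.mdb file that an LMDB environment directory holds.
    """
    path = Path(lmdb_path).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"LMDB directory does not exist: {path}")
    if not (path / "data.mdb").is_file():
        raise FileNotFoundError(f"No data.mdb found in: {path}")
    return path
```

Calling check_lmdb_dir with the path from the training config prints nothing on success and otherwise reports the fully resolved location that was searched.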

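Is h set to 1?

@jiangxiluning Looking at the code, h is set to 1. Is there a practical trick behind this choice?

Test accuracy problem

Hello, I tested on the IIIT5K dataset with the weights you provided, but the Word Accuracy is only 0.41. Is something wrong?

How do I run inference on a single image with this code?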

The training phase converges quickly (acc>0.95), but the validate result is very bad (acc<0.3)

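Hello, I ran experiments on my own dataset (Chinese and English, real scenes, 160k samples) and found that training converges quickly but the validation results are poor. Have you run into this? My guess is that this architecture and input scheme are equivalent to setting teaching_forcing = 1, which easily leads to overfitting.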

Hello, when I initialized the model, I only saw the decoder, not the GCAttention part

Can you help explain this?

Is the model input size fixed?

Hello, is the model's input size fixed? When testing, I found that recognition fails when I use longer text boxes.