
master-tf's Introduction

MASTER-TensorFlow

TensorFlow reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This project differs from our original implementation, which builds on FastOCR, the company's private codebase. A PyTorch reimplementation is available in the MASTER-pytorch repository, and its performance is almost identical. (PS: the logo is inspired by Master Oogway in Kung Fu Panda.)

News

  • 2021/07: MASTER-mmocr, a reimplementation of MASTER based on mmocr. @Jiaquan Ye
  • 2021/07: TableMASTER-mmocr, the 2nd-place solution of the ICDAR 2021 Competition on Scientific Literature Parsing Task B, based on MASTER. @Jiaquan Ye
  • 2021/07: A talk (in Chinese) can be found here.
  • 2021/05: Savior, which aims to provide a simple, lightweight, fast-to-integrate, pipelined deployment framework for RPA, now integrates MASTER for captcha recognition. @Tao Luo
  • 2021/04: Slides can be found here.

Honors based on MASTER

Introduction

MASTER is a self-attention based scene text recognizer that (1) encodes not only input-output attention but also self-attention, capturing feature-feature and target-target relationships inside the encoder and decoder, (2) learns an intermediate representation that is more powerful and more robust to spatial distortion, and (3) offers better training and evaluation efficiency. The overall architecture is shown below.
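Separately from the architecture figure, the feature-feature self-attention mentioned in point (1) can be illustrated with a generic scaled dot-product sketch. This is not MASTER's multi-aspect block or the code in this repo: the function names are hypothetical, and the query, key, and value projections are assumed to be the identity to keep the example short.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Generic scaled dot-product self-attention (illustrative only).

    x is a list of n feature vectors of length d. Assumption: Q = K = V = x,
    i.e. no learned projections, unlike a real transformer layer.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    out = []
    for q in x:
        # Similarity of this feature to every feature, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        weights = softmax(scores)
        # Each output is a weighted average of all feature vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


features = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(features)
```

Each output row mixes information from every input position, which is what lets the model relate distant features to one another.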

This repo contains the following features.

  • Multi-gpu Training
  • Greedy Decoding
  • Single image inference
  • Eval iiit5k
  • Convert Checkpoint to SavedModel format
  • Refactored code to be more TensorFlow-style and more consistent with graph mode
  • TensorFlow Serving support
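As a rough illustration of the greedy decoding feature listed above, the sketch below picks the most probable token at each step until an end-of-sequence token appears. It is a minimal sketch and not the repo's implementation: `greedy_decode`, `toy_step`, and the token ids are hypothetical, and the real model would supply the per-step probabilities.

```python
def greedy_decode(step_fn, sos_id, eos_id, max_len):
    """Greedy decoding: always take the argmax token.

    step_fn(tokens) is assumed to return a list of probabilities over the
    vocabulary for the next token, given the tokens decoded so far.
    """
    tokens = [sos_id]
    for _ in range(max_len):
        probs = step_fn(tokens)
        next_id = max(range(len(probs)), key=probs.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start-of-sequence token


# Toy vocabulary (hypothetical): 0 = <sos>, 1 = <eos>, 2 = "a", 3 = "b".
def toy_step(tokens):
    """Stand-in for a model: emits "a", then "b", then <eos>."""
    script = [2, 3, 1]
    next_id = script[min(len(tokens) - 1, len(script) - 1)]
    probs = [0.0] * 4
    probs[next_id] = 1.0
    return probs


decoded = greedy_decode(toy_step, sos_id=0, eos_id=1, max_len=10)
```

Greedy decoding is cheap because it keeps a single hypothesis, at the cost of possibly missing a higher-probability sequence that beam search would find.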

Preparation

Installing tensorflow-gpu with conda is highly recommended.

Python3.7 is preferred.

pip install -r requirements.txt

Dataset

I use Clovaai's MJ training split for training.

Please check src/dataset/benchmark_data_generator.py for details.

The evaluation datasets are real scene text datasets. You can download them directly from here.

Training

# training from scratch
python train.py -c [your_config].yaml

# resume training from last checkpoint
python train.py -c [your_config].yaml -r

# finetune with some checkpoint
python train.py -c [your_config].yaml -f [checkpoint]

Eval

Because I changed how the GCB block is used, the released weights may not be compatible with HEAD. If you want to test the model, please use https://github.com/jiangxiluning/MASTER-TF/commit/85f9217af8697e41aefe5121e580efa0d6d04d92

Currently, you can download the checkpoint from here with code o6g9, or from Google Drive. This checkpoint was trained on MJ and selected for the best performance on the IIIT5K dataset. Below is the comparison between the PyTorch version and the TensorFlow version.

| Framework | Dataset | Word Accuracy | Training Details |
|---|---|---|---|
| PyTorch | MJ | 85.05% | 3 V100, 4 epochs, batch size 3 * 128 |
| TensorFlow | MJ | 85.53% | 2 2080ti, 4 epochs, batch size 2 * 50 |

Please download the checkpoint and model config from here with code o6g9 and unzip them. You can then reproduce this metric by running:

python eval_iiit5k.py --ckpt [checkpoint file] --cfg [model config] -o [output dir] -i [iiit5k lmdb test dataset]

The checkpoint file argument should be ${where you unzip}/backup/512_8_3_3_2048_2048_0.2_0_Adam_mj_my/checkpoints/OCRTransformer-Best
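For readers unfamiliar with the metric in the table above, word accuracy is the fraction of images whose predicted transcript matches the ground truth exactly. The sketch below is a hedged illustration, not the logic of eval_iiit5k.py: the function name is hypothetical, and the case-insensitive comparison is an assumption that may not match the repo's evaluation settings.

```python
def word_accuracy(predictions, labels, case_sensitive=False):
    """Fraction of predictions that exactly match their labels.

    Assumption: comparison is case-insensitive by default; the actual
    evaluation script may normalize text differently.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for pred, label in zip(predictions, labels):
        if not case_sensitive:
            pred, label = pred.lower(), label.lower()
        if pred == label:
            correct += 1
    return correct / len(labels)


preds = ["Hello", "world", "tesr"]
truth = ["hello", "world", "test"]
acc = word_accuracy(preds, truth)
```

A single wrong character makes the whole word count as an error, so this metric is stricter than character-level accuracy.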

Tensorflow Serving

For TensorFlow Serving, you should use the SavedModel format. I provide test cases that show how to convert a checkpoint to a SavedModel and how to use it.

pytest -s tests/test_units::test_savedModel  #check the test case test_savedModel in tests/test_units
pytest -s tests/test_units::test_loadModel  # call decode to inference and get predicted transcript and logits out.

Citations

If you find MASTER useful, please cite our paper:

@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

Thanks to the authors and their repo:

master-tf's People

Contributors

dependabot[bot], harmonicahappy, jiangxiluning, meicsu199345, wenwenyu

master-tf's Issues

About inference time

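Hello, I tested your network on a V100 as well, but I only get around 4 FPS (averaged_infer_time: 203.730ms), which is far from the 9.22 ms reported in the paper. Is something wrong?
In addition, to compare against deep-text-recognition, I changed the network input size to 32*100 and trained for 4 epochs on MJ+ST/lower_case. The accuracy is as follows:
accuracy: IIIT5k_3000: 0.845 SVT: 0.804 IC03_860: 0.910 IC03_867: 0.908 IC13_857: 0.896 IC13_1015: 0.891 IC15_1811: 0.675 IC15_2077: 0.595 SVTP: 0.693 CUTE80: 0.667
total_accuracy: 0.779
This accuracy seems slightly lower than that of TPS-BLSTM-Attn, SAR, and another transformer-based network (https://arxiv.org/pdf/1906.05708.pdf), so the network's advantage is not apparent. How can I reproduce the accuracy reported in the paper?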

Write OCR results to a log file

Hi, thank you very much for your earlier replies. How can I write the recognition results to a .log file in the following format: "image_path predicted_labels confidence score"? Thank you, and I look forward to your help.
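One possible way to produce such a file is sketched below. It is only an illustration under stated assumptions: the function names are hypothetical, the confidence is assumed to be the product of the per-character probabilities of the chosen tokens, and how to obtain those probabilities from this repo's decode output is not shown.

```python
def sequence_confidence(step_probs):
    """Assumed confidence: product of per-character probabilities."""
    if not step_probs:
        return 0.0
    confidence = 1.0
    for p in step_probs:
        confidence *= p
    return confidence


def write_results(log_path, results, sep=" "):
    """Write one "image_path predicted_label confidence" line per result.

    results is an iterable of (image_path, predicted_text, confidence).
    """
    with open(log_path, "w", encoding="utf-8") as f:
        for image_path, text, confidence in results:
            f.write(f"{image_path}{sep}{text}{sep}{confidence:.4f}\n")
```

For example, `write_results("ocr.log", [("a.png", "hello", sequence_confidence([0.9, 0.8]))])` writes a single line for `a.png`. If image paths or transcripts can contain spaces, passing `sep="\t"` keeps the columns unambiguous.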

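Test accuracy problem

Hello, I used the weights you provided to test on the IIIT5K dataset, but the word accuracy is only 0.41. Is something wrong?

Is the model input size fixed?

Hello, is the model's input size fixed? During testing I found that recognition fails when I use longer text boxes.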

A possible bug in the Decoder of transformer_tf

In the Decoder in transformer.py, the call function is as follows:

def call(self, x, memory, src_mask, tgt_mask, training=False):
    T = tf.shape(x)[1]
    x = x + self.decoder_pe[:, :T]

    for layer in self.layers:
        x = layer(x, memory, src_mask, tgt_mask, training=training)
    return self.norm(x)

Here x should already be an embedding,
but in the Decoder in transformer_tf.py, the call function is as follows:

def call(self, x, enc_output, training,
         look_ahead_mask, padding_mask):

    seq_len = tf.shape(x)[1]
    attention_weights = {}

    x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]

    x = self.dropout(x, training=training)

    for i in range(self.num_layers):
        x, block1, block2 = self.dec_layers[i](x, enc_output, training,
                                               look_ahead_mask, padding_mask)

        attention_weights['decoder_layer{}_block1'.format(i+1)] = block1
        attention_weights['decoder_layer{}_block2'.format(i+1)] = block2

    # x.shape == (batch_size, target_seq_len, d_model)
    return x, attention_weights

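Here x is clearly an int32 label.
The two implementations differ. After reading the rest of the code, I am convinced that the call function of the Decoder in transformer_tf.py should omit the initial step that applies the embedding to x.
Which version produced the experimental results in the paper?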
