lightweight-OCR

Description

Project for the lightweight OCR contest. [url]

The best accuracy (score) is 0.7254, ranking 36th.

We tried different algorithms such as CRNN, RARE, and STAR-Net, and found that CRNN works particularly well for Chinese character recognition. Given this result, we adopted CRNN and focused on (1) obtaining a pruned (large-sparse) model from a large model and (2) designing a compact (small-dense) model.

Large-sparse Model

Pruning is a common method to derive a large-sparse network; it usually consists of pre-training a model, applying a pruning strategy, and fine-tuning. PaddleSlim supports L1-norm, L2-norm, and FPGM-based pruning strategies. We also adopted EagleEye, which uses adaptive batch normalization to evaluate candidate sub-networks quickly and accurately. (See work/prune.py)
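
The core of this adaptive-BN evaluation is to refresh the batch-normalization statistics of a pruned candidate on a few training batches before scoring it. A rough sketch of the idea is shown below; the model, data loader, and `eval_fn` are hypothetical placeholders, not this repo's actual code.

```python
import paddle

def adaptive_bn_eval(candidate, train_loader, eval_fn, num_batches=30):
    """Score a pruned candidate with adaptive BN (EagleEye-style):
    refresh BN running statistics on a few training batches, then
    evaluate on the validation set without any fine-tuning."""
    candidate.train()                          # BN layers update running stats
    with paddle.no_grad():                     # no gradients, no weight updates
        for step, (images, _labels) in enumerate(train_loader):
            if step >= num_batches:
                break
            candidate(images)                  # forward pass only
    candidate.eval()
    return eval_fn(candidate)                  # e.g. validation accuracy
```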

We randomly split the original training dataset into two subsets: 80% for training and 20% for validation.
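
A minimal sketch of such a split, assuming a PaddleOCR-style label file with one `image_path\tlabel` line per sample (the file names below are hypothetical):

```python
import random

# Shuffle the label lines once, then cut them 80/20 into train/validation lists.
with open("data/train_list.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

random.seed(0)
random.shuffle(lines)
cut = int(0.8 * len(lines))

with open("data/new_train_list.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:cut])
with open("data/new_val_list.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[cut:])
```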

We used ResNet-18 as the backbone and trained a recognition model from scratch on the new training set as a baseline. After training, this large model achieved an accuracy of 0.7587 on the validation set.

Run work/prune.py to get the sensitivities of each layer.
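
Conceptually, the sensitivity scan prunes one layer at a time at several ratios and records the resulting accuracy; a rough sketch is below, with `build_pruned_model` and `eval_fn` as hypothetical stand-ins for the PaddleSlim pruning and evaluation calls used in work/prune.py.

```python
def layer_sensitivity(build_pruned_model, eval_fn, layer_names,
                      ratios=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Prune each layer in isolation at several ratios and record the
    validation accuracy, so fragile layers can be pruned less aggressively."""
    sensitivities = {}
    for name in layer_names:
        sensitivities[name] = {}
        for ratio in ratios:
            pruned = build_pruned_model(name, ratio)   # prune only this layer
            sensitivities[name][ratio] = eval_fn(pruned)
    return sensitivities
```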

After pruning 90% of the FLOPs and fine-tuning, we get a large-sparse model with an accuracy of 0.7177 and an allocated size of 8.9MB. Squeeze-and-Excitation (SE) further improves the accuracy of the large-sparse model from 0.7177 to 0.7202 but increases the allocated size to 9.8MB.
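
For reference, a minimal SE block could look like the sketch below; this is the generic formulation, not necessarily the exact module used in this repo (the reduction ratio is an assumption).

```python
import paddle.nn as nn
import paddle.nn.functional as F

class SEBlock(nn.Layer):
    """Squeeze-and-Excitation: globally pool each channel, then rescale
    the channels with a small two-layer gating network."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2D(1)
        self.fc1 = nn.Conv2D(channels, channels // reduction, 1)
        self.fc2 = nn.Conv2D(channels // reduction, channels, 1)

    def forward(self, x):
        s = self.pool(x)              # squeeze: N x C x 1 x 1
        s = F.relu(self.fc1(s))       # excitation bottleneck
        s = F.sigmoid(self.fc2(s))    # per-channel gates in (0, 1)
        return x * s                  # rescale the input features
```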

Overall, these results indicate that pruning a large model (ResNet-18) to obtain an ultra-lightweight model is challenging. In our case, pruning 90% of the FLOPs caused the accuracy to drop sharply from 0.7587 to 0.7177. With the allocated size limited to 10MB, it is not easy to introduce additional modules such as SE into the network to improve the accuracy further.

Small-dense Model

In recent years, there has been an increasing amount of literature on designing small-dense models for an optimal trade-off between accuracy and efficiency. PaddleOCR uses MobileNetV3 as the backbone of its ultra-lightweight models for Chinese character recognition. In this project, we trained MobileNetV3-small-1.0 as our baseline, which reached an accuracy of 0.6836.

Coordinate Attention

The SE module is integrated into MobileNetV3. Some researchers proposed Coordinate Attention (CA) as a replacement for SE with better performance. We also tried replacing the SE module with a CA module, but the accuracy dropped from 0.6836 to 0.6786. However, the CA module reduces the parameters and the allocated size of the model from 5.87MB to 4.49MB and from 6.9MB to 5.9MB, respectively.
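
A rough sketch of a CA module following the published formulation (not this repo's exact code; the reduction ratio and the use of ReLU in place of h-swish are assumptions):

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class CoordAtt(nn.Layer):
    """Coordinate Attention: pool along height and width separately, mix the
    two directional descriptors with a shared 1x1 conv, then split them back
    into two attention maps that rescale the input."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2D((None, 1))   # N x C x H x 1
        self.pool_w = nn.AdaptiveAvgPool2D((1, None))   # N x C x 1 x W
        self.conv1 = nn.Conv2D(channels, mid, 1)
        self.bn1 = nn.BatchNorm2D(mid)
        self.conv_h = nn.Conv2D(mid, channels, 1)
        self.conv_w = nn.Conv2D(mid, channels, 1)

    def forward(self, x):
        h, w = x.shape[2], x.shape[3]
        x_h = self.pool_h(x)                               # N x C x H x 1
        x_w = self.pool_w(x).transpose([0, 1, 3, 2])       # N x C x W x 1
        y = F.relu(self.bn1(self.conv1(paddle.concat([x_h, x_w], axis=2))))
        y_h, y_w = paddle.split(y, [h, w], axis=2)
        a_h = F.sigmoid(self.conv_h(y_h))                          # N x C x H x 1
        a_w = F.sigmoid(self.conv_w(y_w.transpose([0, 1, 3, 2])))  # N x C x 1 x W
        return x * a_h * a_w
```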

Meta-ACON

MobileNetV3 introduces a new, fast, and quantization-friendly nonlinearity, the h-swish function. Recently, a novel activation function called ACON, which explicitly learns to ACtivate the neurons Or Not, was proposed. The ACON-C function contains three learnable parameters p1, p2, and beta, while Meta-ACON-C builds a small network to learn beta. We tried replacing H-Swish and ReLU with Meta-ACON-C and found that it is slower in back-propagation. The accuracy is 0.6787, and the allocated size is 10.5MB. Further experiments are needed to assess the practicality of Meta-ACON-C.
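
A minimal sketch of Meta-ACON-C under the published formulation (the reduction ratio and parameter initialization here are assumptions, not this repo's exact code):

```python
import paddle.nn as nn
import paddle.nn.functional as F

class MetaAconC(nn.Layer):
    """Meta-ACON-C: f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x,
    with p1 and p2 learnable per channel and beta predicted per channel by a
    small two-layer 1x1-conv network."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(4, channels // reduction)
        self.p1 = self.create_parameter([1, channels, 1, 1])
        self.p2 = self.create_parameter([1, channels, 1, 1])
        self.fc1 = nn.Conv2D(channels, mid, 1)
        self.fc2 = nn.Conv2D(mid, channels, 1)

    def forward(self, x):
        # beta is generated from the channel-wise mean of the input features.
        beta = F.sigmoid(self.fc2(self.fc1(x.mean(axis=[2, 3], keepdim=True))))
        dpx = (self.p1 - self.p2) * x
        return dpx * F.sigmoid(beta * dpx) + self.p2 * x
```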

FPN

Inspired by Feature Pyramid Networks (FPN) and Dynamic Feature Pyramid Networks (DyFPN), we designed four FPN variants to aggregate multi-scale feature information in the recognition model.

FPN-A, FPN-B

We use $[\,\cdot\,,\,\cdot\,]$ to denote concatenation here. Given a list of input features $\{C_i\}$ with different scales, the aggregated features $\{P_i\}$ are computed as

$$P_i = [\,f_i(C_i),\ u(P_{i+1})\,],$$

where $i$ denotes the level of the pyramid, $f_i$ denotes a convolution operation with a level-dependent stride, and $u$ denotes the resizing operation, i.e. upsampling with a scale factor of (2, 1).

Finally, we sum the aggregated features to get the output:

$$P_{\text{out}} = \sum_i P_i.$$

In FPN-A, the input features come from the first feature map in each stage, whereas in FPN-B, the input features come from the last feature map in each stage.

FPN-A achieved an accuracy of 0.7161 with an allocated size of 4.8MB, whereas FPN-B achieved an accuracy of 0.7248 with an allocated size of 7.5MB. The structures of FPN-A and FPN-B are shown below.
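
A rough sketch of this style of aggregation is given below; the channel numbers, the number of levels, and the 1x1 fusion convolution are illustrative assumptions rather than the exact FPN-A/FPN-B definition.

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class SimpleRecFPN(nn.Layer):
    """Top-down aggregation in the spirit of FPN-A/FPN-B: each level
    concatenates a convolved backbone feature with the upsampled feature
    from the coarser level, and all levels are finally summed."""

    def __init__(self, in_channels=(24, 48, 96), out_channels=96):
        super().__init__()
        self.lateral = nn.LayerList(
            [nn.Conv2D(c, out_channels, 3, padding=1) for c in in_channels])
        self.fuse = nn.LayerList(
            [nn.Conv2D(2 * out_channels, out_channels, 1)
             for _ in range(len(in_channels) - 1)])

    def forward(self, feats):
        # feats: backbone features ordered from fine to coarse; adjacent levels
        # are assumed to differ by 2x in height and share the same width,
        # matching the (2, 1) upsampling factor.
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        p = laterals[-1]                                   # coarsest level
        outputs = [p]
        for i in range(len(laterals) - 2, -1, -1):
            up = F.interpolate(p, scale_factor=(2, 1), mode="nearest")
            p = self.fuse[i](paddle.concat([laterals[i], up], axis=1))
            outputs.append(p)
        # Bring every level to the finest resolution and sum them.
        target = outputs[-1].shape[2:]
        return sum(F.interpolate(o, size=target, mode="nearest") for o in outputs)
```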

FPN-C

We redefine the aggregated features and the output as

$$P_i = [\,g_i(C_i),\ u(P_{i+1})\,], \qquad P_{\text{out}} = \sum_i P_i,$$

where $g_i$ is a convolution operation with level-dependent output channels. Compared to FPN-B, FPN-C improves the accuracy to 0.7290 with an allocated size of 7.3MB. The structure of FPN-C is shown below.

FPN-D

Inspired by DyFPN, the convolution $g_i$ is augmented with convolution operations of three different kernel sizes:

$$g_i(C_i) = \sum_{k \in \{k_1,\, k_2,\, k_3\}} \mathrm{Conv}_{k \times k}(C_i).$$

FPN-D reached the highest accuracy, 0.7319, with an allocated size of 8.5MB. The structure of FPN-D is shown below.

Implementation Details

Details of the different models are given in the table in the Performance section.

We use the Adam optimizer with beta1 = 0.9 and beta2 = 0.999 to train all models for 200 epochs, setting the learning rate to 0.001 and the regularization factor to 1e-5, and adopting a cosine learning rate schedule.
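
A minimal sketch of this optimizer setup in PaddlePaddle is shown below; the model and step count are placeholders (L2 weight decay is assumed for the regularizer), and the real values live in the work/configs/*.yml files consumed by PaddleOCR's tools/train.py.

```python
import paddle

model = paddle.nn.Linear(10, 10)        # stand-in for the recognition model
steps_per_epoch, epochs = 1000, 200     # placeholder iteration counts

lr = paddle.optimizer.lr.CosineAnnealingDecay(
    learning_rate=0.001, T_max=steps_per_epoch * epochs)
optimizer = paddle.optimizer.Adam(
    learning_rate=lr,
    beta1=0.9,
    beta2=0.999,
    weight_decay=paddle.regularizer.L2Decay(1e-5),
    parameters=model.parameters())
```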

Conclusion

This project was undertaken to obtain a lightweight model through model pruning and compact model design. One of the more significant findings to emerge from this project is that Feature Pyramid Networks (FPN) output aggregated features and help to improve recognition accuracy. Our final model reaches a training accuracy of 0.9670625 but a test accuracy of 0.735, which indicates overfitting. Therefore, the project was limited in several ways. First, the training hyperparameters are not optimal; adjusting the regularization factor might alleviate overfitting. Second, the data is not preprocessed well; data cleaning and data augmentation, e.g., removing dirty data and generating more data, could improve accuracy. Third, the neck and head of the model can be further optimized. Inspired by PlugNet, we also tried plugging in a super-resolution unit to address low-quality text recognition but did not obtain a notable improvement.

Overall Directory Structure

The overall directory structure of lightweight-OCR is as follows:

lightweight-OCR   
├── 1833844.ipynb
├── data
│   ├── data87683
│   └── data87685
├── output
├── PaddleOCR
├── PaddleSlim
├── README.md
└── work
    ├── configs
    ├── label_dict.txt
    ├── label.txt
    ├── ppocr
    │   └── modeling
    │       ├── backbones
    │       └── necks
    └── tools
        ├── egaleeye_prune.py
        ├── export_pruned_model.py
        ├── infer
        ├── model_summary.py
        ├── prune.py
        └── train.py

Installation

Requirements:

  • Python 3.7.10
  • CUDA 10.1
  • PaddleOCR-release/2.1
  • PaddleSlim-release/2.0.0
  • PaddlePaddle-2.0.2

Instruction

Train:

Run the command:

python PaddleOCR/tools/train.py -c work/configs/rec_mobilev3_small_1_train.yml

Eval:

Run the command:

python PaddleOCR/tools/eval.py -c work/configs/rec_mobilev3_small_1_train.yml -o Global.checkpoints=./output/rec_mobilev3_small_1.0/best_accuracy

Test:

Run the following command to export the inference model:

python PaddleOCR/tools/export_model.py -c work/configs/rec_mobilev3_small_1_train.yml -o Global.checkpoints=./output/rec_mobilev3_small_1.0/best_accuracy Global.save_inference_dir=./output/rec_mobilev3_small_1.0/

The inference model will be exported to output/rec_mobilev3_small_1.0/inference.

Change 'rec_model_dir' and run the command:

python PaddleOCR/tools/infer/predict_rec.py --image_dir=./data/test_images/A榜测试数据集/TestAImages/ --rec_char_dict_path=./work/label_dict.txt --rec_model_dir=./output/rec_mobilev3_small_1.0/

The result file will be saved as output/%Y-%m-%d-%H-%M-%S.log.

Performance

| Algorithm | Backbone | Neck | Trick | Score | Model Size | Model Link |
| --- | --- | --- | --- | --- | --- | --- |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | None | 0.6836 | 6.9MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | SE -> CA | 0.6786 | 5.9MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | ReLU, H-Swish -> MetaAconC | 0.6787 | 10.5MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-A | 0.7161 | 4.8MB | link |
| CRNN | MobileNetV3-large-0.5 | 96BiGRU | FPN-A, 200 epochs -> 500 epochs | 0.7243 | 7.6MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-A, MaxPool -> BlurPool | 0.7145 | 4.8MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-B | 0.7248 | 7.5MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-C | 0.7290 | 7.3MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-D | 0.7319 | 8.5MB | link |
| CRNN | ResNet18 | 64BiGRU | Prune 90% FLOPs | 0.7177 | 8.9MB | link |
| CRNN | ResNet18SE | 64BiGRU | Prune 91% FLOPs | 0.7202 | 9.8MB | link |
| CRNN | ResNet18 | 48BiGRU | Prune 90% FLOPs | 0.7076 | 8.9MB | link |
| CRNN | ResNet18SE | 48BiGRU | Prune 90% FLOPs | 0.7087 | 9.1MB | link |
| RARE | MobileNetV3-small-0.5 | 32BiGRU | Remove TPS | 0.4329 | 9.2MB | link |
| STAR-Net | MobileNetV3-small-1.0 | 48BiGRU | FPN-B | 0.7093 | 15.4MB | link |
| CRNN | MobileNetV3-large-0.5 | 72BiGRU | FPN-D | 0.735 | 9.3MB | link |

Contributors

yuksing12, diegowongsiu
