lightweight-OCR

Description

Project for the lightweight OCR contest. [url]

The best accuracy (score) is 0.7254, ranking 36th.

We tried different algorithms such as CRNN, RARE, and STAR-Net, and found that CRNN works particularly well for Chinese character recognition. Given this result, we adopted CRNN and focused on (1) obtaining a pruned (large-sparse) model from a large model and (2) designing a compact (small-dense) model.

Large-sparse Model

Pruning is a common method to derive a large-sparse network; it usually consists of pre-training a model, applying a pruning strategy, and fine-tuning. PaddleSlim supports L1-norm, L2-norm, and FPGM-based pruning strategies. We also adopted EagleEye, which uses adaptive batch normalization to evaluate candidate sub-networks quickly and accurately. (See work/prune.py)
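
The core of this adaptive-BN evaluation is to refresh the batch-normalization statistics of a pruned candidate on a few training batches before scoring it. A rough sketch of the idea is shown below; the model, data loader, and `eval_fn` are hypothetical placeholders, not this repo's actual code.

```python
import paddle

def adaptive_bn_eval(candidate, train_loader, eval_fn, num_batches=30):
    """Score a pruned candidate with adaptive BN (EagleEye-style):
    refresh BN running statistics on a few training batches, then
    evaluate on the validation set without any fine-tuning."""
    candidate.train()                          # BN layers update running stats
    with paddle.no_grad():                     # no gradients, no weight updates
        for step, (images, _labels) in enumerate(train_loader):
            if step >= num_batches:
                break
            candidate(images)                  # forward pass only
    candidate.eval()
    return eval_fn(candidate)                  # e.g. validation accuracy
```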

We randomly split the original training dataset into two subsets: 80% for training and 20% for validation.
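
A minimal sketch of such a split, assuming a PaddleOCR-style label file with one `image_path\tlabel` line per sample (the file names below are hypothetical):

```python
import random

# Shuffle the label lines once, then cut them 80/20 into train/validation lists.
with open("data/train_list.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

random.seed(0)
random.shuffle(lines)
cut = int(0.8 * len(lines))

with open("data/new_train_list.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:cut])
with open("data/new_val_list.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[cut:])
```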

We used ResNet-18 as the backbone and trained a recognition model from scratch on the new training set as a baseline. After training, this large model achieved an accuracy of 0.7587 on the validation set.

Run work/prune.py to get the sensitivities of each layer.
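
Conceptually, the sensitivity scan prunes one layer at a time at several ratios and records the resulting accuracy; a rough sketch is below, with `build_pruned_model` and `eval_fn` as hypothetical stand-ins for the PaddleSlim pruning and evaluation calls used in work/prune.py.

```python
def layer_sensitivity(build_pruned_model, eval_fn, layer_names,
                      ratios=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Prune each layer in isolation at several ratios and record the
    validation accuracy, so fragile layers can be pruned less aggressively."""
    sensitivities = {}
    for name in layer_names:
        sensitivities[name] = {}
        for ratio in ratios:
            pruned = build_pruned_model(name, ratio)   # prune only this layer
            sensitivities[name][ratio] = eval_fn(pruned)
    return sensitivities
```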

After pruning 90% of the FLOPs and fine-tuning, we get a large-sparse model with an accuracy of 0.7177 and an allocated size of 8.9MB. Squeeze-and-Excitation (SE) further improves the accuracy of the large-sparse model from 0.7177 to 0.7202 but increases the allocated size to 9.8MB.
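
For reference, a minimal SE block could look like the sketch below; this is the generic formulation, not necessarily the exact module used in this repo (the reduction ratio is an assumption).

```python
import paddle.nn as nn
import paddle.nn.functional as F

class SEBlock(nn.Layer):
    """Squeeze-and-Excitation: globally pool each channel, then rescale
    the channels with a small two-layer gating network."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2D(1)
        self.fc1 = nn.Conv2D(channels, channels // reduction, 1)
        self.fc2 = nn.Conv2D(channels // reduction, channels, 1)

    def forward(self, x):
        s = self.pool(x)              # squeeze: N x C x 1 x 1
        s = F.relu(self.fc1(s))       # excitation bottleneck
        s = F.sigmoid(self.fc2(s))    # per-channel gates in (0, 1)
        return x * s                  # rescale the input features
```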

Overall, these results indicate that pruning a large model (ResNet-18) to obtain an ultra-lightweight model is challenging. In our case, pruning 90% of the FLOPs caused the accuracy to drop sharply from 0.7587 to 0.7177. With the allocated size limited to 10MB, it is not easy to introduce additional modules such as SE into the network to improve the accuracy further.

Small-dense Model

In recent years, there has been an increasing amount of literature on designing small-dense models for an optimal trade-off between accuracy and efficiency. PaddleOCR uses MobileNetV3 as the backbone of its ultra-lightweight models for Chinese character recognition. In this project, we trained MobileNetV3-small-1.0 as our baseline, which reached an accuracy of 0.6836.

Coordinate Attention

The SE module is integrated into MobileNetV3. Some researchers proposed Coordinate Attention (CA) as a replacement for SE with better performance. We also tried replacing the SE module with a CA module, but the accuracy dropped from 0.6836 to 0.6786. However, the CA module reduces the parameters and the allocated size of the model from 5.87MB to 4.49MB and from 6.9MB to 5.9MB, respectively.
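
A rough sketch of a CA module following the published formulation (not this repo's exact code; the reduction ratio and the use of ReLU in place of h-swish are assumptions):

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class CoordAtt(nn.Layer):
    """Coordinate Attention: pool along height and width separately, mix the
    two directional descriptors with a shared 1x1 conv, then split them back
    into two attention maps that rescale the input."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2D((None, 1))   # N x C x H x 1
        self.pool_w = nn.AdaptiveAvgPool2D((1, None))   # N x C x 1 x W
        self.conv1 = nn.Conv2D(channels, mid, 1)
        self.bn1 = nn.BatchNorm2D(mid)
        self.conv_h = nn.Conv2D(mid, channels, 1)
        self.conv_w = nn.Conv2D(mid, channels, 1)

    def forward(self, x):
        h, w = x.shape[2], x.shape[3]
        x_h = self.pool_h(x)                               # N x C x H x 1
        x_w = self.pool_w(x).transpose([0, 1, 3, 2])       # N x C x W x 1
        y = F.relu(self.bn1(self.conv1(paddle.concat([x_h, x_w], axis=2))))
        y_h, y_w = paddle.split(y, [h, w], axis=2)
        a_h = F.sigmoid(self.conv_h(y_h))                          # N x C x H x 1
        a_w = F.sigmoid(self.conv_w(y_w.transpose([0, 1, 3, 2])))  # N x C x 1 x W
        return x * a_h * a_w
```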

Meta-ACON

MobileNetV3 introduces a new, fast, and quantization-friendly nonlinearity, the h-swish function. Recently, a novel activation function called ACON, which explicitly learns to ACtivate the neurons Or Not, was proposed. The ACON-C function contains three learnable parameters p1, p2, and beta, while Meta-ACON-C builds a small network to learn beta. We tried replacing H-Swish and ReLU with Meta-ACON-C and found that it is slower in back-propagation. The accuracy is 0.6787, and the allocated size is 10.5MB. Further experiments are needed to assess the practicality of Meta-ACON-C.
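
A minimal sketch of Meta-ACON-C under the published formulation (the reduction ratio and parameter initialization here are assumptions, not this repo's exact code):

```python
import paddle.nn as nn
import paddle.nn.functional as F

class MetaAconC(nn.Layer):
    """Meta-ACON-C: f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x,
    with p1 and p2 learnable per channel and beta predicted per channel by a
    small two-layer 1x1-conv network."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(4, channels // reduction)
        self.p1 = self.create_parameter([1, channels, 1, 1])
        self.p2 = self.create_parameter([1, channels, 1, 1])
        self.fc1 = nn.Conv2D(channels, mid, 1)
        self.fc2 = nn.Conv2D(mid, channels, 1)

    def forward(self, x):
        # beta is generated from the channel-wise mean of the input features.
        beta = F.sigmoid(self.fc2(self.fc1(x.mean(axis=[2, 3], keepdim=True))))
        dpx = (self.p1 - self.p2) * x
        return dpx * F.sigmoid(beta * dpx) + self.p2 * x
```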

FPN

Inspired by Feature Pyramid Networks (FPN) and Dynamic Feature Pyramid Networks (DyFPN), we designed four FPN variants to aggregate multi-scale feature information in the recognition model.

FPN-A, FPN-B

We use $[\,\cdot\,,\,\cdot\,]$ to denote concatenation here. Given a list of input features $\{C_i\}$ with different scales, the aggregated features $\{P_i\}$ are computed as

$$P_i = [\,f_i(C_i),\ u(P_{i+1})\,],$$

where $i$ denotes the level of the pyramid, $f_i$ denotes a convolution operation with a level-dependent stride, and $u$ denotes the resizing operation, i.e. upsampling with a scale factor of (2, 1).

Finally, we sum the aggregated features to get the output:

$$P_{\text{out}} = \sum_i P_i.$$

In FPN-A, the input features come from the first feature map in each stage, whereas in FPN-B, the input features come from the last feature map in each stage.

FPN-A achieved an accuracy of 0.7161 with an allocated size of 4.8MB, whereas FPN-B achieved an accuracy of 0.7248 with an allocated size of 7.5MB. The structures of FPN-A and FPN-B are shown below.
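
A rough sketch of this style of aggregation is given below; the channel numbers, the number of levels, and the 1x1 fusion convolution are illustrative assumptions rather than the exact FPN-A/FPN-B definition.

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class SimpleRecFPN(nn.Layer):
    """Top-down aggregation in the spirit of FPN-A/FPN-B: each level
    concatenates a convolved backbone feature with the upsampled feature
    from the coarser level, and all levels are finally summed."""

    def __init__(self, in_channels=(24, 48, 96), out_channels=96):
        super().__init__()
        self.lateral = nn.LayerList(
            [nn.Conv2D(c, out_channels, 3, padding=1) for c in in_channels])
        self.fuse = nn.LayerList(
            [nn.Conv2D(2 * out_channels, out_channels, 1)
             for _ in range(len(in_channels) - 1)])

    def forward(self, feats):
        # feats: backbone features ordered from fine to coarse; adjacent levels
        # are assumed to differ by 2x in height and share the same width,
        # matching the (2, 1) upsampling factor.
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        p = laterals[-1]                                   # coarsest level
        outputs = [p]
        for i in range(len(laterals) - 2, -1, -1):
            up = F.interpolate(p, scale_factor=(2, 1), mode="nearest")
            p = self.fuse[i](paddle.concat([laterals[i], up], axis=1))
            outputs.append(p)
        # Bring every level to the finest resolution and sum them.
        target = outputs[-1].shape[2:]
        return sum(F.interpolate(o, size=target, mode="nearest") for o in outputs)
```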

FPN-C

We redefine the aggregated features and the output as

$$P_i = [\,g_i(C_i),\ u(P_{i+1})\,], \qquad P_{\text{out}} = \sum_i P_i,$$

where $g_i$ is a convolution operation with level-dependent output channels. Compared to FPN-B, FPN-C improves the accuracy to 0.7290 with an allocated size of 7.3MB. The structure of FPN-C is shown below.

FPN-D

Inspired by DyFPN, the convolution $g_i$ is augmented with convolution operations of three different kernel sizes:

$$g_i(C_i) = \sum_{k \in \{k_1,\, k_2,\, k_3\}} \mathrm{Conv}_{k \times k}(C_i).$$

FPN-D reached the highest accuracy, 0.7319, with an allocated size of 8.5MB. The structure of FPN-D is shown below.

Implementation Details

Details of the different models are given in the table in the Performance section.

We use the Adam optimizer with beta1 = 0.9 and beta2 = 0.999 to train all models for 200 epochs, setting the learning rate to 0.001 and the regularization factor to 1e-5, and adopting a cosine learning rate schedule.
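
A minimal sketch of this optimizer setup in PaddlePaddle is shown below; the model and step count are placeholders (L2 weight decay is assumed for the regularizer), and the real values live in the work/configs/*.yml files consumed by PaddleOCR's tools/train.py.

```python
import paddle

model = paddle.nn.Linear(10, 10)        # stand-in for the recognition model
steps_per_epoch, epochs = 1000, 200     # placeholder iteration counts

lr = paddle.optimizer.lr.CosineAnnealingDecay(
    learning_rate=0.001, T_max=steps_per_epoch * epochs)
optimizer = paddle.optimizer.Adam(
    learning_rate=lr,
    beta1=0.9,
    beta2=0.999,
    weight_decay=paddle.regularizer.L2Decay(1e-5),
    parameters=model.parameters())
```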

Conclusion

This project was undertaken to obtain a lightweight model through model pruning and compact model design. One of the more significant findings to emerge from this project is that Feature Pyramid Networks (FPN) output aggregated features and help to improve recognition accuracy. Our final model reaches a training accuracy of 0.9670625 but a test accuracy of 0.735, which indicates overfitting. Therefore, the project was limited in several ways. First, the training hyperparameters are not optimal; adjusting the regularization factor might alleviate overfitting. Second, the data is not preprocessed well; data cleaning and data augmentation, e.g., removing dirty data and generating more data, could improve accuracy. Third, the neck and head of the model can be further optimized. Inspired by PlugNet, we also tried plugging in a super-resolution unit to address low-quality text recognition but did not obtain a notable improvement.

Overall Directory Structure

The overall directory structure of lightweight-OCR is as follows:

lightweight-OCR   
├── 1833844.ipynb
├── data
│   ├── data87683
│   └── data87685
├── output
├── PaddleOCR
├── PaddleSlim
├── README.md
└── work
    ├── configs
    ├── label_dict.txt
    ├── label.txt
    ├── ppocr
    │   └── modeling
    │       ├── backbones
    │       └── necks
    └── tools
        ├── egaleeye_prune.py
        ├── export_pruned_model.py
        ├── infer
        ├── model_summary.py
        ├── prune.py
        └── train.py

Installation

Requirements:

  • Python 3.7.10
  • CUDA 10.1
  • PaddleOCR-release/2.1
  • PaddleSlim-release/2.0.0
  • PaddlePaddle-2.0.2

Instruction

Train:

Run the command:

python PaddleOCR/tools/train.py -c work/configs/rec_mobilev3_small_1_train.yml

Eval:

Run the command:

python PaddleOCR/tools/eval.py -c work/configs/rec_mobilev3_small_1_train.yml -o Global.checkpoints=./output/rec_mobilev3_small_1.0/best_accuracy

Test:

Run the following command to export the inference model:

python PaddleOCR/tools/export_model.py -c work/configs/rec_mobilev3_small_1_train.yml -o Global.checkpoints=./output/rec_mobilev3_small_1.0/best_accuracy Global.save_inference_dir=./output/rec_mobilev3_small_1.0/

The inference model will be exported to output/rec_mobilev3_small_1.0/inference.

Change 'rec_model_dir' and run the command:

python PaddleOCR/tools/infer/predict_rec.py --image_dir=./data/test_images/A榜测试数据集/TestAImages/ --rec_char_dict_path=./work/label_dict.txt --rec_model_dir=./output/rec_mobilev3_small_1.0/

The result file will be saved as output/%Y-%m-%d-%H-%M-%S.log.

Performance

| Algorithm | Backbone | Neck | Trick | Score | Model Size | Model Link |
| --- | --- | --- | --- | --- | --- | --- |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | None | 0.6836 | 6.9MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | SE -> CA | 0.6786 | 5.9MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | ReLU, H-Swish -> MetaAconC | 0.6787 | 10.5MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-A | 0.7161 | 4.8MB | link |
| CRNN | MobileNetV3-large-0.5 | 96BiGRU | FPN-A, 200 epochs -> 500 epochs | 0.7243 | 7.6MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-A, MaxPool -> BlurPool | 0.7145 | 4.8MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-B | 0.7248 | 7.5MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-C | 0.7290 | 7.3MB | link |
| CRNN | MobileNetV3-small-1.0 | 48BiGRU | FPN-D | 0.7319 | 8.5MB | link |
| CRNN | ResNet18 | 64BiGRU | Prune 90% FLOPs | 0.7177 | 8.9MB | link |
| CRNN | ResNet18SE | 64BiGRU | Prune 91% FLOPs | 0.7202 | 9.8MB | link |
| CRNN | ResNet18 | 48BiGRU | Prune 90% FLOPs | 0.7076 | 8.9MB | link |
| CRNN | ResNet18SE | 48BiGRU | Prune 90% FLOPs | 0.7087 | 9.1MB | link |
| RARE | MobileNetV3-small-0.5 | 32BiGRU | Remove TPS | 0.4329 | 9.2MB | link |
| STAR-Net | MobileNetV3-small-1.0 | 48BiGRU | FPN-B | 0.7093 | 15.4MB | link |
| CRNN | MobileNetV3-large-0.5 | 72BiGRU | FPN-D | 0.735 | 9.3MB | link |

Contributors

yuksing12, diegowongsiu
