
IterVM: Iterative Vision Modeling Module for Scene Text Recognition

The official code of IterNet.

We propose IterVM, an iterative approach for visual feature extraction which can significantly improve scene text recognition accuracy. IterVM repeatedly uses the high-level visual feature extracted at the previous iteration to enhance the multi-level features extracted at the subsequent iteration.

*Figure: overview of the IterVM framework.*
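The iterative scheme described above can be sketched in a few lines: the high-level feature from the previous iteration is fed back to enhance every level of the next extraction pass. This is a toy numpy sketch of the idea only; the layer shapes, the fusion operator (a simple additive projection here), and the iteration count are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Toy stand-ins for the multi-level feature extractor stages.
rng = np.random.default_rng(0)
W1, W2, W3 = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
Wf = rng.standard_normal((8, 8)) * 0.1  # fuses the fed-back high-level feature

def extract(x, high_prev):
    """One iteration: each level is enhanced by the previous high-level feature."""
    f1 = np.tanh(x @ W1 + high_prev @ Wf)   # low-level
    f2 = np.tanh(f1 @ W2 + high_prev @ Wf)  # mid-level
    f3 = np.tanh(f2 @ W3 + high_prev @ Wf)  # high-level feature of this iteration
    return f3

x = rng.standard_normal((4, 8))  # toy "image" features
high = np.zeros((4, 8))          # no feedback at the first iteration
for _ in range(3):               # three refinement iterations
    high = extract(x, high)
```

Each pass reuses the same input `x` but conditions every level on the previous pass's high-level output, which is the feedback loop the figure depicts.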

Runtime Environment

pip install -r requirements.txt

Note: fastai==1.0.60 is required.

Datasets

Training datasets

1. [MJSynth](http://www.robots.ox.ac.uk/~vgg/data/text/) (MJ):
    - Use `tools/create_lmdb_dataset.py` to convert images into an LMDB dataset
    - [LMDB dataset BaiduNetdisk (passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
2. [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST):
    - Use `tools/crop_by_word_bb.py` to crop images from the original [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) dataset, then convert them into an LMDB dataset with `tools/create_lmdb_dataset.py`
    - [LMDB dataset BaiduNetdisk (passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
3. [WikiText103](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip), used only for pre-training the language model:
    - Use `notebooks/prepare_wikitext103.ipynb` to convert the text into CSV format.
    - [CSV dataset BaiduNetdisk (passwd:dk01)](https://pan.baidu.com/s/1yabtnPYDKqhBb_Ie9PGFXA)
Evaluation datasets

Evaluation LMDB datasets can be downloaded from [BaiduNetdisk (passwd:1dbv)](https://pan.baidu.com/s/1RUg3Akwp7n8kZYJ55rU5LQ) or [GoogleDrive](https://drive.google.com/file/d/1dTI0ipu14Q1uuK4s4z32DqbqF3dJPdkk/view?usp=sharing):

1. ICDAR 2013 (IC13)
2. ICDAR 2015 (IC15)
3. IIIT5K Words (IIIT)
4. Street View Text (SVT)
5. Street View Text-Perspective (SVTP)
6. CUTE80 (CUTE)
The structure of the `data` directory is:

```
data
├── charset_36.txt
├── evaluation
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST
├── WikiText-103.csv
└── WikiText-103_eval_d1.csv
```
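A quick way to confirm the datasets landed in the right places is to check the tree above programmatically. This small sketch lists whichever expected paths are missing; the path list is copied verbatim from the tree, and the `data` root is an assumption you can change.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
expected = [
    "charset_36.txt",
    "evaluation/CUTE80", "evaluation/IC13_857", "evaluation/IC15_1811",
    "evaluation/IIIT5k_3000", "evaluation/SVT", "evaluation/SVTP",
    "training/MJ/MJ_train", "training/MJ/MJ_valid", "training/MJ/MJ_test",
    "training/ST",
    "WikiText-103.csv", "WikiText-103_eval_d1.csv",
]

def check_data_dir(root="data"):
    """Return the expected paths that are missing under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]

missing = check_data_dir("data")
if missing:
    print("missing:", *missing, sep="\n  ")
```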

Pretrained Models

Get the pretrained models from GoogleDrive. Performances of the pretrained models are summarized as follows:

| Model   | IC13 | SVT  | IIIT | IC15 | SVTP | CUTE | AVG  |
|---------|------|------|------|------|------|------|------|
| IterNet | 97.9 | 95.1 | 96.9 | 87.7 | 90.9 | 91.3 | 93.8 |
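The AVG column appears to be the sample-weighted mean over the six benchmarks rather than a plain mean (the plain mean of the six scores is about 93.3). The test-set sizes used below follow the common evaluation protocol; IC13 857, IC15 1811, and IIIT 3000 match the directory names in the data tree, while SVT 647, SVTP 645, and CUTE 288 are assumptions.

```python
# Per-benchmark accuracies from the table and (assumed) test-set sizes.
scores = {"IC13": (97.9, 857), "SVT": (95.1, 647), "IIIT": (96.9, 3000),
          "IC15": (87.7, 1811), "SVTP": (90.9, 645), "CUTE": (91.3, 288)}

total = sum(n for _, n in scores.values())
avg = sum(acc * n for acc, n in scores.values()) / total
print(round(avg, 1))  # → 93.8
```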

Training

  1. Pre-train vision model
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/pretrain_vm.yaml
    
  2. Pre-train language model
    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
    
  3. Train IterNet
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/train_iternet.yaml
    

Note:

  • You can set the checkpoint paths for the vision model (vm) and the language model separately to start from specific pretrained models, or set them to `None` to train from scratch.
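As an illustration, the checkpoint settings in `configs/train_iternet.yaml` might look like the fragment below. The field names and paths here follow ABINet-style configs (which this project is based on) and are assumptions; check the actual config files in this repo for the exact keys.

```yaml
model:
  vision:
    checkpoint: workdir/pretrain-vm/best-pretrain-vm.pth   # or null to train from scratch
  language:
    checkpoint: workdir/pretrain-language-model/best-pretrain-language-model.pth
```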

Evaluation

CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_iternet.yaml --phase test --image_only

Additional flags:

  • --checkpoint /path/to/checkpoint set the path of the model to evaluate
  • --test_root /path/to/dataset set the path of the evaluation dataset
  • --model_eval [alignment|vision] select which sub-model to evaluate
  • --image_only disable dumping visualizations of attention masks

Run Demo


python demo.py --config=configs/train_iternet.yaml --input=figures/demo

Additional flags:

  • --config /path/to/config set the path of the configuration file
  • --input /path/to/image-directory set the path of an image directory or a wildcard path, e.g., --input='figs/test/*.png'
  • --checkpoint /path/to/checkpoint set the path of the trained model
  • --cuda [-1|0|1|2|3...] set the CUDA device id; the default -1 stands for CPU
  • --model_eval [alignment|vision] select which sub-model to use
  • --image_only disable dumping visualizations of attention masks

Citation

If you find our method useful for your research, please cite

@inproceedings{chu2022itervm,
  title={IterVM: Iterative Vision Modeling Module for Scene Text Recognition},
  author={Chu, Xiaojie and Wang, Yongtao},
  booktitle={26th International Conference on Pattern Recognition (ICPR)},
  year={2022}
}

License

The project is free for academic research purposes only; commercial use requires authorization. For commercial permission, please contact [email protected].

Acknowledgements

This project is based on ABINet. Thanks for their great work.
