Hyper-Table-OCR

A carefully designed OCR pipeline for universal bordered table recognition and reconstruction.

This pipeline covers image preprocessing, table detection (optional), text OCR, table cell extraction, and table reconstruction.

Are you seeking ideas for your own work? Visit my blog post on Hyper-Table-OCR to see more!

Update on 2021-08-20: Happy to see that Baidu has released PP-Structure, which is more robust thanks to its deep-learning-driven structure prediction, in contrast to the simple matching used in our work.

Demo

Demo Video (In English): YouTube

Hyper Table Recognition: A carefully-designed Table OCR pipeline

Demo Video (In Chinese): Bilibili

Features

  • Flexible modular architecture: by deriving from a predefined abstract class, any module of this pipeline can easily be swapped for your preferred one. See the "Want to contribute?" section below.
  • A simple yet highly legible web interface.
  • A table reconstruction strategy based simply on the coordinates of each cell, including identifying merged cells and building the table structure.
  • More to explore...
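The coordinate-based reconstruction idea can be illustrated with a minimal sketch. It is not the code used in this repo: the names `cluster_edges`, `nearest_index`, and `build_grid`, the axis-aligned `(x1, y1, x2, y2)` box format, and the 5-pixel tolerance are all assumptions made for illustration. Cell edges are clustered into grid lines, and a cell that spans more than one grid interval is treated as merged.

```python
from typing import Dict, List, Tuple

# Assumed cell format for this sketch: axis-aligned (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def cluster_edges(values: List[float], tol: float) -> List[float]:
    """Merge coordinates that lie within `tol` of their neighbour into one grid line."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    # Represent each grid line by the mean of its members.
    return [sum(g) / len(g) for g in groups]


def nearest_index(lines: List[float], v: float) -> int:
    """Index of the grid line closest to coordinate `v`."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - v))


def build_grid(cells: List[Box], tol: float = 5.0) -> List[Dict[str, int]]:
    """Assign each cell a row/column index and a span; span > 1 means merged."""
    xs = cluster_edges([c[0] for c in cells] + [c[2] for c in cells], tol)
    ys = cluster_edges([c[1] for c in cells] + [c[3] for c in cells], tol)
    layout = []
    for x1, y1, x2, y2 in cells:
        col, col_end = nearest_index(xs, x1), nearest_index(xs, x2)
        row, row_end = nearest_index(ys, y1), nearest_index(ys, y2)
        layout.append({
            "row": row,
            "col": col,
            "row_span": row_end - row,
            "col_span": col_end - col,
        })
    return layout


if __name__ == "__main__":
    # A header cell merged across two columns, followed by two ordinary cells.
    demo = [(0, 0, 200, 50), (0, 50, 100, 100), (100, 51, 200, 100)]
    for cell in build_grid(demo):
        print(cell)
```

The tolerance absorbs small detection jitter (for example, 50 versus 51 pixels in the demo) so that nearly aligned edges fall on the same grid line.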

Getting Started

Clone this repo

git clone https://github.com/MrZilinXiao/Hyper-Table-Recognition
cd Hyper-Table-Recognition

Download weights

Download from here: GoogleDrive

MD5: (004fabb8f6112d6d43457c681b435631 models.zip)
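If `md5sum` is not available, the checksum can also be verified with a few lines of Python. This is a small sketch using only the standard library; the helper name `md5_of_file` and the assumption that `models.zip` sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum published above for models.zip.
EXPECTED_MD5 = "004fabb8f6112d6d43457c681b435631"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("models.zip")  # assumed download location
    if archive.exists():
        actual = md5_of_file(archive)
        print("checksum OK" if actual == EXPECTED_MD5 else f"checksum MISMATCH: {actual}")
    else:
        print(f"{archive} not found")
```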

Unzip it and make sure the directory layout matches:

# ~/Hyper-Table-Recognition$ tree -L 1
.
├── models
├── app.py
├── config.yml
├── ...

Install Dependencies

This project is developed and tested on:

  • Ubuntu 18.04
  • RTX 3070 with Driver 455.45.01 & CUDA 11.1 & cuDNN 8.0.4
  • Python 3.8.3
  • PyTorch 1.7.0+cu110
  • Tensorflow 2.5.0
  • PaddlePaddle 2.0.0-rc1
  • mmdetection 2.7.0
  • onnxruntime-gpu 1.6.0

An NVIDIA GPU is required for reasonable inference time. GPUs with less than 6 GB of VRAM may hit out-of-memory errors when loading multiple models; if that happens, you may comment out some models in web/__init__.py.

No version-specific framework features are used in this project, so lower versions of these frameworks should also work. However, at the time of writing (19 Dec 2020), users with RTX 3000 series GPUs may not find prebuilt binaries of Tensorflow, onnxruntime-gpu, mmdetection, or PaddlePaddle via pip or conda.

Some building tutorials for Ubuntu are as follows:

Confirm that all deep learning frameworks are installed via:

python -c "import tensorflow as tf; print(tf.__version__); import torch; print(torch.__version__); import paddle; print(paddle.__version__); import onnxruntime as rt; print(rt.__version__); import mmdet; print(mmdet.__version__)"

Then install other necessary libraries via:

pip install -r requirements.txt

Enjoy!

python app.py

Visit http://127.0.0.1:5000 to see the main page!

Performance

Inference time depends heavily on the following factors:

  • Complexity of table structure
  • Number of OCR blocks
  • Resolution of selected image

A typical inference time is shown in the demo video.

Want to contribute?

Contribute a new cell extractor

In boardered/extractor.py, we define a TraditionalExtractor based on traditional computer vision techniques and a UNetExtractor based on a U-Net pixel-level semantic segmentation model. Feel free to derive from the following abstract class:

from abc import ABC
from typing import List

import numpy as np


class CellExtractor(ABC):
    """
    A unified interface for boardered extractor.
    OpenCV & UNet Extractor can derive from this interface.
    """

    def __init__(self):
        pass

    def get_cells(self, ori_img, table_coords) -> List[np.ndarray]:
        """
        :param ori_img: original image
        :param table_coords: List[np.ndarray], xyxy coord of each table
        :return: List[np.ndarray], [[xyxyxyxy(cell1), xyxyxyxy(cell2)](table1), ...]
        """
        pass
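As a rough illustration of deriving from this interface, here is a minimal sketch. `WholeTableExtractor` is a hypothetical class that does not exist in the repo; it simply reports each table as a single cell. The clockwise corner order (top-left, top-right, bottom-right, bottom-left) is an assumption, so check the existing extractors for the order the pipeline actually expects. The base class is repeated so the snippet runs on its own.

```python
from abc import ABC
from typing import List

import numpy as np


class CellExtractor(ABC):
    """Repeated from the interface above so this sketch is self-contained."""

    def __init__(self):
        pass

    def get_cells(self, ori_img, table_coords) -> List[np.ndarray]:
        pass


class WholeTableExtractor(CellExtractor):
    """Hypothetical extractor: treats every detected table as one big cell."""

    def get_cells(self, ori_img, table_coords) -> List[np.ndarray]:
        results = []
        for x1, y1, x2, y2 in table_coords:
            # Expand the xyxy table box into four corner points (xyxyxyxy).
            # Corner order is an assumption: TL, TR, BR, BL.
            cell = [x1, y1, x2, y1, x2, y2, x1, y2]
            # One array per table, one row per cell: shape (1, 8) here.
            results.append(np.array([cell]))
        return results


if __name__ == "__main__":
    image = np.zeros((100, 200, 3), dtype=np.uint8)  # blank stand-in image
    tables = [np.array([10, 20, 110, 80])]
    print(WholeTableExtractor().get_cells(image, tables))
```

A real extractor would replace the loop body with actual cell detection inside each table region; only the input and output shapes matter to the rest of the pipeline.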

Contribute a new OCR Module

Build a custom OCR handler deriving from OCRHandler, which is located in ocr/__init__.py.

import abc


class OCRHandler(metaclass=abc.ABCMeta):
    """
    Handler for OCR Support
    An abstract class, any OCR implementations may derive from it
    """

    def __init__(self, *kw, **kwargs):
        pass

    def get_result(self, ori_img):
        """
        Interface for OCR inference
        :param ori_img: np.ndarray
        :return: dict, in following format:
        {'sentences': [['麦格尔特杯表格OCR测试表格2', [[85.0, 10.0], [573.0, 30.0], [572.0, 54.0], [84.0, 33.0]], 0.9],...]}
        """
        pass
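A custom handler only needs to return a dict in the format documented above. The following is a minimal sketch with a hypothetical `ConstantOCRHandler`, which is not part of the repo. It returns one fixed sentence whose box covers the whole image, which can be handy as a stub while wiring up a real OCR engine. The base class is repeated so the snippet runs on its own.

```python
import abc


class OCRHandler(metaclass=abc.ABCMeta):
    """Repeated from the interface above so this sketch is self-contained."""

    def __init__(self, *kw, **kwargs):
        pass

    def get_result(self, ori_img):
        pass


class ConstantOCRHandler(OCRHandler):
    """Hypothetical stub: one fixed sentence spanning the entire image."""

    def __init__(self, text="placeholder", confidence=1.0):
        super().__init__()
        self.text = text
        self.confidence = confidence

    def get_result(self, ori_img):
        # ori_img is expected to be an np.ndarray of shape (height, width, ...).
        height, width = ori_img.shape[:2]
        # Four corner points, following the order seen in the sample output:
        # top-left, top-right, bottom-right, bottom-left.
        box = [
            [0.0, 0.0],
            [float(width), 0.0],
            [float(width), float(height)],
            [0.0, float(height)],
        ]
        # Each sentence entry is [text, box, confidence].
        return {"sentences": [[self.text, box, self.confidence]]}
```

A real handler would call its OCR engine inside get_result and convert each detected text line into the same [text, box, confidence] triple.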

Contribute to the process pipeline

The processing pipeline is implemented in WebHandler.pipeline() in web/__init__.py.

Future Plans

  • Speed up inference via async-processing on dual GPUs.

Congratulations! This project earned a GRAND PRIZE (2 out of 72 participants) in the aforementioned competition!

Acknowledgement

  • PaddleOCR: Multilingual, awesome, leading, and practical OCR tools supported by Baidu.
  • ChineseOCR_lite: Super light OCR inference tool kit.
  • CascadeTabNet: An automatic table recognition method for interpretation of tabular data in document images.
  • pytorch-hed: An unofficial implementation of Holistically-Nested Edge Detection using PyTorch.
  • table-detect: Excellent work providing us with the U-Net code and pretrained weight.
