
chinese_ocr's Introduction

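This project uses TensorFlow with Keras or PyTorch to detect text in natural scenes and to perform end-to-end Chinese OCR.

Features

  • Text detection: end-to-end text detection and recognition in Keras (the project ships two models, one in Keras and one in PyTorch).
  • Variable-length OCR recognition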

Environment setup on Ubuntu

Bash
## GPU environment
sh setup-python3-gpu.sh

## CPU environment (Python 3)
sh setup-python3-cpu.sh

## Additional dependencies
apt install graphviz
pip3 install graphviz
pip3 install pydot
pip3 install torch torchvision

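Models

  • The system consists of three networks:
  • 1. Text orientation detection network: Classify (VGG16)
  • 2. Text region detection network: CTPN (CNN+RNN)
  • 3. End-to-end text recognition network: CRNN (CNN+GRU/LSTM+CTC)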
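
The way the three networks fit together can be sketched as follows. This is only an illustration under stated assumptions: `run_pipeline`, `fix_orientation`, `detect_text_boxes`, and `recognize_line` are hypothetical names rather than the repository's API, and the box format `(x_min, y_min, x_max, y_max)` is assumed. The three stages are passed in as callables so the sketch runs without any trained model.

```python
from typing import Callable, List, Tuple

import numpy as np

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def run_pipeline(
    image: np.ndarray,
    fix_orientation: Callable[[np.ndarray], np.ndarray],
    detect_text_boxes: Callable[[np.ndarray], List[Box]],
    recognize_line: Callable[[np.ndarray], str],
) -> List[Tuple[Box, str]]:
    """Chain orientation correction, text detection, and recognition."""
    # Stage 1: the orientation classifier decides how to turn the image upright.
    upright = fix_orientation(image)

    # Stage 2: the detector proposes text regions on the upright image.
    boxes = detect_text_boxes(upright)

    # Stage 3: crop each region and recognize it, top to bottom, left to right.
    results = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        x_min, y_min, x_max, y_max = box
        crop = upright[y_min:y_max, x_min:x_max]
        results.append((box, recognize_line(crop)))
    return results
```

In a real run, the three callables would wrap the VGG16 classifier, CTPN, and CRNN respectively.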

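Text orientation detection: VGG classification

An image classifier built on VGG16 is trained to detect rotations of 0, 90, 180, and 270 degrees.
See angle/predict.py for the code. The model was trained on 8,000 images and reaches 88.23% accuracy.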
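
Once the classifier has predicted one of the four classes, the image can be rotated back before detection. The sketch below shows that step only. It assumes the class order (0, 90, 180, 270) and that the label means "the image has been rotated counter-clockwise by this angle"; neither assumption is taken from angle/predict.py, and `correct_orientation` is a hypothetical helper.

```python
import numpy as np

# Assumed mapping from class index to rotation angle in degrees.
ANGLES = (0, 90, 180, 270)


def correct_orientation(image: np.ndarray, class_index: int) -> np.ndarray:
    """Undo the rotation predicted by the orientation classifier."""
    angle = ANGLES[class_index]
    # np.rot90 rotates counter-clockwise for positive k, so a negative k
    # rotates clockwise and cancels the assumed counter-clockwise rotation.
    return np.rot90(image, k=-(angle // 90))
```

If the classifier uses the opposite convention, flipping the sign of `k` is enough.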

Model download: [BaiduCloud](link: https://pan.baidu.com/s/1Sqbnoeh1lCMmtp64XBaK9w extraction code: n2v4)

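Text region detection: CTPN

Supports both CPU and GPU environments with one-step deployment. For text detection training, see the reference.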

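OCR end-to-end recognition: CRNN

OCR recognition uses end-to-end GRU+CTC, which recognizes variable-length text without first segmenting it into characters.

Training code is provided for both Keras and PyTorch. Once you understand the Keras version, you can switch to the PyTorch version, which is more stable.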
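
CTC is what makes segmentation-free recognition possible: the network emits one label (or a blank) per time step, and decoding collapses repeated labels and drops the blanks. Below is a minimal sketch of greedy (best-path) decoding. It assumes the blank is class 0 and that class i maps to `alphabet[i - 1]`; the repository's own decoding and alphabet layout may differ, and `ctc_greedy_decode` is a hypothetical name.

```python
import numpy as np


def ctc_greedy_decode(probs: np.ndarray, alphabet: str, blank: int = 0) -> str:
    """Decode a (time_steps, num_classes) score matrix with best-path CTC."""
    best_path = probs.argmax(axis=1)  # most likely class at each time step
    chars = []
    previous = blank
    for index in best_path:
        # Emit a character only when it is not blank and not a repeat of
        # the previous time step; a blank in between allows real doubles.
        if index != blank and index != previous:
            chars.append(alphabet[index - 1])
        previous = index
    return "".join(chars)
```

For example, the per-step path `a a <blank> a b` decodes to `aab`: the first two steps collapse into one `a`, and the blank separates it from the second `a`.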

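Usage

Quick start

Run demo.py or pytorch_demo.py (recommended) after setting the path to your test image. To see the CTPN output, edit the end of the draw_boxes function in ./ctpn/ctpn/other.py and add cv2.imwrite('dest_path', img). This gives you both the text region boxes detected by CTPN and the OCR result for the image.

  • Before running the demo, remember to adjust some settings in the scripts (such as the model file paths).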

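Model training

1 Training CTPN

  • Go to ./ctpn/ctpn/train_net.py
  • Download the pretrained VGG weights [VGG_imagenet.npy](link: https://pan.baidu.com/s/1jzrcCr0tX6xAiVoolVRyew extraction code: a5ze ) and point pretrained_model to the downloaded file. Pretrained weights for the whole model are also available: [checkpoint](link: https://pan.baidu.com/s/1oS6_kqHgmcunkooTAXE8GA extraction code: xmjv )
  • The CTPN dataset is also hosted on Baidu Cloud. After downloading and extracting it, set the self.devkit_path parameter of the pascal_voc class in ./ctpn/lib/datasets/pascal_voc.py to the dataset path.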

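2 Training CRNN

  • Keras version: ./train/keras_train/train_batch.py. model_path points to the pretrained weights, and MODEL_PATH points to where trained models are saved. [Keras pretrained weights](link: https://pan.baidu.com/s/14cTCedz1ESnj0mM9ISm__w extraction code: 1kb9)
  • PyTorch version: ./train/pytorch-train/crnn_main.py
parser.add_argument(
    '--crnn',
    help="path to crnn (to continue training)",
    default='path/to/pretrained_weights')  # placeholder: wherever you saved the downloaded pretrained weights
parser.add_argument(
    '--experiment',
    help='Where to store samples and models',
    default='path/to/output_dir')  # placeholder: where trained weights are saved, choose your own

[PyTorch pretrained weights](link: https://pan.baidu.com/s/1kAXKudJLqJbEKfGcJUMVtw extraction code: 9six)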
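
Because both settings are argparse options, they can presumably also be passed on the command line instead of being edited into the file. The sketch below reproduces only the two arguments shown above to illustrate this; `build_parser` and the paths are hypothetical, and the real crnn_main.py defines further options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the two options discussed above (illustrative)."""
    parser = argparse.ArgumentParser(description="CRNN training options (sketch)")
    parser.add_argument(
        "--crnn",
        default=None,  # hypothetical default: start without pretrained weights
        help="path to crnn (to continue training)")
    parser.add_argument(
        "--experiment",
        default=None,  # hypothetical default: no output directory chosen yet
        help="Where to store samples and models")
    return parser


# Equivalent to: python crnn_main.py --crnn weights/pretrained.pth --experiment output/run1
opt = build_parser().parse_args(
    ["--crnn", "weights/pretrained.pth", "--experiment", "output/run1"])
print(opt.crnn, opt.experiment)
```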

Text detection and OCR results

[Figures: CTPN original image 1; CTPN detection 1; CTPN + CRNN result 1]

Because the training data contained only Chinese characters and English letters, many formula structures are not recognized.

If you run into problems while running the code, please get in touch.

Email: [email protected]


chinese_ocr's People

Contributors

pengcao


chinese_ocr's Issues

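Good question

Why can the example images be recognized, while other images in test raise an error during recognition?
Aborted (core dumped)

Hi, which Keras version does this project use?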

Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32)
Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32)
Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32)
Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
2019-11-08 09:57:43.611347: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Tensor_name is : rpn_conv/3x3/biases
Tensor_name is : rpn_cls_score/weights
Tensor_name is : rpn_bbox_pred/biases
Tensor_name is : lstm_o/weights
Tensor_name is : lstm_o/bidirectional_rnn/fw/lstm_cell/bias
Tensor_name is : lstm_o/bidirectional_rnn/bw/lstm_cell/kernel
Tensor_name is : lstm_o/bidirectional_rnn/bw/lstm_cell/bias
Tensor_name is : conv5_3/weights
Tensor_name is : conv5_3/biases
Tensor_name is : lstm_o/biases
Tensor_name is : conv5_2/weights
Tensor_name is : conv2_2/weights
Tensor_name is : conv1_1/weights
Tensor_name is : conv4_2/weights
Tensor_name is : conv2_2/biases
Tensor_name is : conv2_1/biases
Tensor_name is : conv1_2/weights
Tensor_name is : conv4_1/biases
Tensor_name is : conv2_1/weights
Tensor_name is : rpn_cls_score/biases
Tensor_name is : conv1_2/biases
Tensor_name is : rpn_conv/3x3/weights
Tensor_name is : conv3_1/weights
Tensor_name is : conv4_3/weights
Tensor_name is : conv3_2/biases
Tensor_name is : rpn_bbox_pred/weights
Tensor_name is : conv3_2/weights
Tensor_name is : lstm_o/bidirectional_rnn/fw/lstm_cell/kernel
Tensor_name is : conv3_3/biases
Tensor_name is : conv5_2/biases
Tensor_name is : conv5_1/weights
Tensor_name is : conv3_3/weights
Tensor_name is : conv4_1/weights
Tensor_name is : conv1_1/biases
Tensor_name is : conv4_2/biases
Tensor_name is : conv3_1/biases
Tensor_name is : conv4_3/biases
Tensor_name is : conv5_1/biases
load vggnet done
Using TensorFlow backend.
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/zhaoyulu/web/chinese_ocr/model.py", line 16, in <module>
    from ocr.model import predict as ocr
  File "/home/zhaoyulu/web/chinese_ocr/ocr/model.py", line 8, in <module>
    import keras.backend as K
  File "/usr/local/lib/python3.6/dist-packages/keras/__init__.py", line 3, in <module>
    from . import utils
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/__init__.py", line 6, in <module>
    from . import conv_utils
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/conv_utils.py", line 9, in <module>
    from .. import backend as K
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/__init__.py", line 1, in <module>
    from .load_backend import epsilon
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/load_backend.py", line 90, in <module>
    from .tensorflow_backend import *
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 54, in <module>
    get_graph = tf_keras_backend.get_graph
AttributeError: module 'tensorflow.python.keras.backend' has no attribute 'get_graph'

If possible, please list the version numbers of as many packages as you can. Thank you very much.
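
An AttributeError of this kind usually points to a Keras release that expects a different TensorFlow version than the one installed, so the first step is to find out what is actually installed. The sketch below prints the versions of the relevant packages. It relies on `importlib.metadata`, which requires Python 3.8 or later (newer than the Python 3.6 in the traceback); `installed_versions` and the package list are illustrative choices, not something the repository provides.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # the package is not installed
    return versions


# Illustrative package list; extend it with whatever the setup scripts install.
for name, version in installed_versions(
        ["tensorflow", "keras", "torch", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output together with the traceback makes version mismatches much easier to diagnose.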

The program fails to run on Windows 10

D:\QQ\chinese_ocr-master\chinese_ocr-master> python pytorch_demo.py
Using TensorFlow backend.
Traceback (most recent call last):
  File "pytorch_demo.py", line 8, in <module>
    import pytorch_model as model
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\pytorch_model.py", line 12, in <module>
    from ctpn.text_detect import text_detect
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\ctpn\text_detect.py", line 3, in <module>
    from .ctpn.detectors import TextDetector
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\ctpn\ctpn\detectors.py", line 10, in <module>
    from ..lib.fast_rcnn.nms_wrapper import nms
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\ctpn\lib\__init__.py", line 1, in <module>
    from . import fast_rcnn
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\ctpn\lib\fast_rcnn\__init__.py", line 2, in <module>
    from . import nms_wrapper
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\ctpn\lib\fast_rcnn\nms_wrapper.py", line 2, in <module>
    from ..utils.cython_nms import nms as cython_nms
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\ctpn\lib\utils\__init__.py", line 1, in <module>
    from . import bbox
  File "D:\QQ\chinese_ocr-master\chinese_ocr-master\ctpn\lib\utils\bbox.py", line 9
    cimport numpy as np
    ^
SyntaxError: invalid syntax

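About the one_hot function in train_batch.py

Hello, the one_hot function uses a default length of 10. When a label is longer or shorter than 10, the extra positions are all set to 0, and index 0 corresponds to the character ‘. Does this affect the results? Isn't this labeling wrong?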
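
The concern can be made concrete with a small sketch. It is not the repository's one_hot function: `encode_label`, its parameters, and the toy alphabet are hypothetical, and whether padding matters in practice depends on whether the loss is told the true label length. The sketch pads with an index just past the alphabet, so padding can never be confused with the character at index 0.

```python
from typing import Optional

import numpy as np


def encode_label(text: str, alphabet: str, length: int = 10,
                 pad_value: Optional[int] = None) -> np.ndarray:
    """Encode text as a fixed-length array of alphabet indices."""
    if pad_value is None:
        # Reserve the index just past the alphabet for padding.
        pad_value = len(alphabet)
    label = np.full(length, pad_value, dtype=np.int64)
    # Text longer than `length` is truncated in this sketch.
    indices = [alphabet.index(ch) for ch in text[:length]]
    label[:len(indices)] = indices
    return label
```

With the toy alphabet `"‘abc"`, `encode_label("ab", "‘abc", length=4)` yields `[1, 2, 4, 4]`, whereas padding with 0 would yield `[1, 2, 0, 0]`, which is indistinguishable from the text `ab‘‘`.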

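Text orientation recognition

I trained a four-orientation classifier with VGG on 40,000 samples, but the results are poor. What does the text orientation dataset look like?

Hello, running sh setup-python3-cpu.sh failed. Could you please take a look?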

ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-th7tv7ue/grpcio/setup.py'"'"'; __file__='"'"'/tmp/pip-install-th7tv7ue/grpcio/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-th7tv7ue/grpcio/pip-egg-info
cwd: /tmp/pip-install-th7tv7ue/grpcio/
Complete output (2 lines):
Found cython-generated files...
error in grpcio setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
