ycg09 / chinese_ocr

CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

License: Apache License 2.0

Python 97.09% C++ 0.07% Shell 0.30% Cuda 2.53%

chinese_ocr's Introduction

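Introduction

End-to-end detection and recognition of variable-length Chinese text, implemented with TensorFlow and Keras

Text detection: CTPN
Text recognition: DenseNet + CTC

Environment setup

sh setup.sh

Note: in a CPU-only environment, comment out the "for gpu" section and uncomment the "for cpu" section before running.

Demo

Put test images in the test_images directory; detection results are saved to test_result.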

python demo.py

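Model training

CTPN training

See ctpn/README.md

DenseNet + CTC training

1. Data preparation

Dataset: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m)

About 3.64 million images in total, split 99:1 into a training set and a validation set
The data is generated randomly from Chinese corpora (news + classical Chinese), varying font, size, grayscale, blur, perspective, and stretching
Covers 5,990 characters: Chinese characters, English letters, digits, and punctuation
Each sample contains exactly 10 characters, cut at random from sentences in the corpora
All images have a resolution of 280x32

After extracting the images, place them in the train/images directory and put the description files in the train directory

2. Training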

cd train
python train.py

3. Results

val acc	predict time	model size
0.983	8ms	18.9MB

GPU: GTX TITAN X
Keras Backend: Tensorflow

4. Generating your own samples

See SynthText_Chinese_version, TextRecognitionDataGenerator, and text_renderer

Sample results

References

[1] https://github.com/eragonruan/text-detection-ctpn

[2] https://github.com/senlinuc/caffe_ocr

[3] https://github.com/chineseocr/chinese-ocr

[4] https://github.com/xiaomaxiao/keras_ocr

chinese_ocr's Issues

Running demo.py produces no results?

Hi,
I ran demo.py and got the following output:
root@145f47f2e811:/data/chinese_ocr# python demo.py
2018-05-11 06:01:19.437201: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-11 06:01:19.437315: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-11 06:01:19.437474: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-05-11 06:01:19.437667: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-11 06:01:19.437881: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32)
Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32)
Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32)
Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
Loading network VGGnet_test...
Restoring from ./ctpn/checkpoints/VGGnet_fast_rcnn_iter_50000.ckpt...
done
Using TensorFlow backend.
Killed

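No results were produced in the corresponding directory. What is going wrong?

Can the training set only be fixed-length?

Hello, I am training on my own dataset with a modified version of your code, and the model works very well. I would like to know whether the label length of the training samples can be changed. Can the length be arbitrary, or at least different from the current one? I hope you can find the time to answer.

Can the character dictionary be modified?

I removed some characters from the dictionary and added some rare characters, then generated training samples from this dictionary and trained a model. Recognition with the resulting model is completely wrong.

Problem during training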

Not enough time for target transition sequence (required: 38, available: 35)58You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs

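I am not sure whether this is caused by my dataset. My samples have labels of varying length and I resized the images to 280*32; running train.py then failed with the error above.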
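
The message above compares the length the label needs with the number of time steps the network produces. A minimal sketch of that check, assuming the standard CTC rule that a label needs one time step per element plus one extra blank step between identical neighbours. The function names are hypothetical and are not part of this repository.

```python
def ctc_required_steps(label):
    """Minimum number of time steps CTC needs to emit `label`.

    Standard CTC rule: one step per label element, plus one extra blank
    step between every pair of identical adjacent elements.
    """
    repeats = sum(1 for prev, cur in zip(label, label[1:]) if prev == cur)
    return len(label) + repeats


def fits(label, available_steps):
    """True if `label` can be aligned to `available_steps` network outputs."""
    return ctc_required_steps(label) <= available_steps


# Illustrative values mirroring the numbers in the message above.
print(fits(list(range(10)), 35))  # a 10-element label without repeats
print(fits(list(range(38)), 35))  # a 38-element label
```

Running such a check over the description file before training would show which samples have labels too long for the image width.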

CUDA_HOME vs CUDAHOME

setup.py checks the CUDAHOME environment variable, but CUDA_HOME is the more commonly used name (see this link).

Please check for CUDA_HOME as well.
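
A minimal sketch of a lookup that accepts either variable name. This is an illustration of the request, not the code in this repository's setup.py, and `find_cuda_home` is a hypothetical helper.

```python
import os


def find_cuda_home(environ=None):
    """Return the CUDA install path from CUDA_HOME or CUDAHOME, or None."""
    environ = os.environ if environ is None else environ
    # Prefer the more common CUDA_HOME, then fall back to CUDAHOME.
    for name in ("CUDA_HOME", "CUDAHOME"):
        value = environ.get(name)
        if value:
            return value
    return None


print(find_cuda_home())
```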

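How to train on multiple GPUs

How do I set up multi-GPU training? I added model=multi_gpu_model(model, gpus=2), but it fails at runtime.

Running the demo gives garbled output

I would like to ask:
Using the model you provided as a basis, the model I saved myself produces nothing but garbled recognition results. What could be the cause?
Many thanks!

When the model recognizes text, I would like to output the confidence of each individual character. How can I do that? Any pointers would be appreciated, thanks.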
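
One possible approach, sketched under assumptions: take the per-time-step softmax output of the recognition model, decode it greedily, and report the highest probability within each run of identical predictions as that character's confidence. The blank index is passed in explicitly because its position depends on the model (often the last class). The function name is hypothetical, and the score is a heuristic rather than a calibrated probability.

```python
def greedy_decode_with_confidence(probs, blank):
    """Greedy CTC decoding that also reports a confidence per character.

    probs: one row of class probabilities per time step.
    blank: index of the CTC blank class.
    Returns a list of (class_index, confidence) pairs.
    """
    results = []
    previous = None
    for row in probs:
        best = max(range(len(row)), key=lambda k: row[k])
        if best == previous:
            # Same run: keep the highest probability seen in the run.
            if best != blank:
                index, conf = results[-1]
                results[-1] = (index, max(conf, row[best]))
        elif best != blank:
            # A new non-blank run starts a new character.
            results.append((best, row[best]))
        previous = best
    return results


# Toy example with 3 classes, class 2 acting as the blank.
toy = [[0.1, 0.8, 0.1], [0.05, 0.9, 0.05], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]
print(greedy_decode_with_confidence(toy, blank=2))
```

A NumPy prediction array can be passed after converting one sample with `.tolist()`.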

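About the training set dictionary and the training set

I have a question about the training set:
In the txt files of the downloaded training set, every character in an image is described by an int.
Is there a dictionary for looking up which character each int corresponds to?
If I want to add some training images of my own and train on them together with this set, how should I generate the corresponding txt entries?

Thanks~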
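
A minimal sketch of converting between text and index labels, under stated assumptions: the dictionary is an ordered list of characters (for example, one per line of a file), the stored indices are offset by 1, and a description line is the image name followed by space-separated indices. All three are assumptions to verify against the actual dataset files; the helper names are hypothetical.

```python
def encode_text(text, chars, offset=1):
    """Map each character to its position in `chars`, shifted by `offset`.

    Raises KeyError for a character that is not in the dictionary.
    """
    index_of = {c: i for i, c in enumerate(chars)}
    return [index_of[c] + offset for c in text]


def decode_indices(indices, chars, offset=1):
    """Inverse of encode_text: turn stored indices back into a string."""
    return "".join(chars[i - offset] for i in indices)


def make_label_line(image_name, text, chars):
    """Build one description-file line: image name followed by indices."""
    return " ".join([image_name] + [str(i) for i in encode_text(text, chars)])


# Toy dictionary; the real one would be read from the dictionary file.
toy_chars = ["a", "b", "c"]
print(make_label_line("001.jpg", "cab", toy_chars))
```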

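Training question

Hello, did you train on the data directly with this code? I trained on my own data and the results seem worse than yours. Because I used my own data, I changed the dictionary. All of my samples have 8 characters, so I changed maxlength. Is there anything else that needs to be changed?

Accuracy is high, but the loss is fairly large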

Epoch 25/30
267/267 [=======] - 91s 340ms/step - loss: 0.0931 - acc: 0.9965 - val_loss: 0.3158 - val_acc: 0.9878
Epoch 26/30
267/267 [[=======] - 91s 339ms/step - loss: 0.1212 - acc: 0.9946 - val_loss: 0.2125 - val_acc: 0.9844
Epoch 27/30
267/267 [[=======] - 91s 340ms/step - loss: 0.0894 - acc: 0.9964 - val_loss: 0.3993 - val_acc: 0.9878
Epoch 28/30
267/267 [[=======] - 91s 340ms/step - loss: 0.1295 - acc: 0.9952 - val_loss: 0.5801 - val_acc: 0.9826

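When training on my own data, as shown above, accuracy is high but the loss is fairly large. What causes this, and how should I adjust the training?

Correspondence between image text and the dictionary

Hello, I want to rewrite your input pipeline, and I found the following code at line 104 of train/train.py: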
labels[i, :len(str)] =[int(i) - 1 for i in str]
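In the generator function gen(), you do not map each character of the input string to its position in the dictionary (5,990 characters) and feed it in as an index; you simply call int(i). Is my understanding correct? If so, does that mean your 5,990-character dictionary, once UTF-8 encoded, is a string running from u'0001 to u'5990? If I want to switch dictionaries, can I simply change the code to something like the following?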

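# Read the dictionary, one character per line, without the trailing newline
chars = [line.rstrip('\n') for line in open('char_str')]
label = []
for char in text:  # text is the ground-truth string of one image
    if char in chars:
        label.append(chars.index(char) + 1)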

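I have rarely used Keras before and am not familiar with its API. Thanks for any guidance.

English recognition: dataset

I want to recognize English and Chinese at the same time. Is there a suitable dataset for training?

What does "已放弃(吐核)" (Aborted (core dumped)) mean?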

terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc 已放弃(吐核)

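Training on variable-width images

The training samples are all 32*280 images. Can I train on images of varying width? My samples have labels of different lengths, and resizing easily distorts them.

Could you share a smaller dataset? My computer is underpowered and cannot extract such a large archive.

Application question

Hello, I am working on detecting and recognizing fields such as name, license plate number, and address on vehicle insurance policies, along with the corresponding fields on vehicle licenses. I tried the demo and found that some key regions were not located, and recognition accuracy in the located regions was also poor. Can I continue training from the pretrained model with additional relevant samples to improve the results, or is there a better approach? I look forward to your advice. Thank you!

Problem extracting the Synthetic Chinese String Dataset downloaded from Baidu Netdisk

It shows: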
Unexpected end of archive

        796352447                    322723

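Only 322,723 files are recognized in total.

Recognition results from my own trained model are garbled

I used your code and built several datasets of my own, but the output is garbled every time. What is going on, and how can I fix it?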
Recognition Result:

蝇俛蝇俛蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇
蝇蝇蝇蝇蝇健蝇蝇蝇蝇蝇拽蝇蝇蝇蝇蝇蝇蝇健蝇蝇蝇健蝇蝇蝇蝇蝇蝇
蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇膜蝇蝇蝇蝇蝇蝇蝇

dic.txt

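What is this Chinese dic.txt for? Is it used when training the network?

Does a self-trained model raise an error when used directly?

A model I trained myself includes the CTC and loss parameters, and using it directly under densenet/models raises an error: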
ValueError: Dimension 1 in both shapes must be equal, but are 5990 and 17765 for 'Assign_159' (op: 'Assign') with input shapes: [768,5990], [768,17765].

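Could someone advise how to modify the model?

How do I use a trained .h5 model file? Could you write a demo?

Could someone explain how to use a trained .h5 model file? A demo would be appreciated.

Running demo.py fails. What went wrong?

After the demo finishes, a memory leak message is shown. What is the cause?

As shown below, after the recognition results are printed in the terminal, two memory leak messages follow: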

512324198106272975
公民身份号码
swig/python detected a memory leak of type 'int64_t *', no destructor found.
swig/python detected a memory leak of type 'int64_t *', no destructor found.

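Question about data passing between CTPN and DenseNet

I am curious what exactly is passed into DenseNet after an image goes through CTPN. Is it still an image? How does it differ from the original image?

How data_train.txt is generated

How is the data_train.txt needed for training generated? I assume each character is represented by its index in the dictionary. Is there a generation script?

Training data format

Hello, in the training file data_train.txt, must the labels be the index numbers from the character set, or can they be written directly in Chinese, e.g. "001.jpg 湖边南路"? I see that labels are only mapped to the characters in char_std at the decode stage, so where exactly would I need to make the change?

How can spaces be recognized?

Spaces are currently not recognized. For example, recognized English words have no spaces between them and run together. Is there any way to solve this?

Which fonts were used to generate the samples?

A model trained on samples I generated myself fails to recognize text at test time, while the model shipped with the project does recognize it, though with low accuracy. I later found that the fonts were the problem, so I would like to ask which fonts you used.

Error running the sh file

AttributeError: 'MSVCCompiler' object has no attribute 'compiler_so'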

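When can the sample generation code be shared?

Thanks to the author for such an excellent OCR project. However, training on samples I generated myself never reaches the results I want. I have tried different corpora, fonts, and background images, and the results are still worse than the author's original model. I know it is a lot to ask, but could you share the sample generation project?

Trained model produces empty output at test time

I added 100,000 training images to your training set, with the corresponding labels. The training results are as follows: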
Epoch 7/10
773/773 [==============================] - 9231s 12s/step - loss: 0.8825 - acc: 0.9929 - val_loss: 0.8609 - val_acc: 0.9933
Epoch 00007: early stopping

I moved the model from epoch 7 to chinese_ocr/densenet/models/weights_densenet.h5 and tested it, but the output is empty. I tried several images and all of them gave empty output. What is the cause?

modelPath = './models/pretrain_model/keras.h5'

In train.py, what is the purpose of the line modelPath = './models/pretrain_model/keras.h5'?
I do not see a keras.h5 file in ./models/pretrain_model.

vgg vs densenet

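Hi, in the previous version both CTPN and CRNN were based on VGG. Have you measured the running time of DenseNet compared with VGG? Also, how was the accuracy you report calculated? Is it edit distance?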
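
For reference, the two metrics the question distinguishes can be sketched as follows. Which one the reported val acc corresponds to is not stated in the README, so neither function should be read as the author's method; the names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


def exact_match_accuracy(preds, truths):
    """Fraction of samples whose prediction equals the ground truth."""
    return sum(p == t for p, t in zip(preds, truths)) / len(truths)


def char_accuracy(preds, truths):
    """1 - (total edit distance / total ground-truth length)."""
    errors = sum(edit_distance(p, t) for p, t in zip(preds, truths))
    return 1 - errors / sum(len(t) for t in truths)


print(exact_match_accuracy(["ab", "cd"], ["ab", "ce"]))
print(char_accuracy(["ab", "cd"], ["ab", "ce"]))
```

Exact match penalizes a whole line for a single wrong character, so it is usually the lower of the two numbers on the same predictions.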

Was the space character left out of training?

Environment setup problem

Running setup.sh with Cygwin on Windows produces the following error. How can I fix it?

Traceback (most recent call last):
  File "setup_cpu.py", line 57, in <module>
    cmdclass={'build_ext': custom_build_ext},
  File "D:\Program Files\Python\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "D:\Program Files\Python\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "D:\Program Files\Python\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "D:\Program Files\Python\lib\site-packages\Cython\Distutils\build_ext.py", line 164, in run
    _build_ext.build_ext.run(self)
  File "D:\Program Files\Python\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "setup_cpu.py", line 37, in build_extensions
    customize_compiler_for_nvcc(self.compiler)
  File "setup_cpu.py", line 23, in customize_compiler_for_nvcc
    default_compiler_so = self.compiler_so
AttributeError: 'MSVCCompiler' object has no attribute 'compiler_so'
mv: cannot stat 'utils/*': No such file or directory

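Training questions

(1) In densenet.py there is the code dense_cnn(input, 5000), where 5000 is the number of classes. Why does this differ from the number of classes in the provided dataset (5,990)? If I train on my own dataset, should I change this number to my own class count, e.g. dense_cnn(input, 1000) for 1,000 classes?
(2) Does maxlabellength = 10 in train.py mean that a label cannot be longer than 10?
(3) When training on the provided dataset, after how many iterations does acc become nonzero? Mine stays at 0.
My setup: GPU 1070Ti 8G. GPU memory is limited, so I changed batch_size to 80.
Thanks to anyone who knows and can answer.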

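Some questions about DenseNet + CTC

Hello, I have a few questions.
The text regions detected by CTPN have variable width, and the code only resizes the height of a detected region to 32, scaling the width proportionally rather than resizing it to 280. So is the maximum number of characters DenseNet + CTC can predict equal to width/8?
If I add new training data, can I likewise fix only the height at 32 without restricting the width, and can the label length be any value in the interval (0, width/8)?
I look forward to your reply. Thank you!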
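
The arithmetic in the question can be sketched as follows. The downsampling factor of 8 is the figure given in the question and is treated here as an assumption about the network; the function names are hypothetical.

```python
def scaled_width(width, height, target_height=32):
    """Width after resizing to target_height with the aspect ratio kept."""
    return int(width * target_height / height)


def time_steps(width, downsample=8):
    """Number of output columns if the network shrinks width by `downsample`."""
    return width // downsample


def max_label_length(width, height, target_height=32, downsample=8):
    """Upper bound on label length for a region of the given size."""
    return time_steps(scaled_width(width, height, target_height), downsample)


print(max_label_length(280, 32))    # the fixed training size from the README
print(max_label_length(1000, 50))   # a wider detected region
```

This is only an upper bound: a label with identical adjacent characters needs extra time steps for the blanks between them, so the usable length can be shorter.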

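Poor results on runs of consecutive digits

For example, 666.66
500，000，00.00
Almost all of these are recognized incorrectly.
Has anyone else run into this? Or could someone test it?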

demo

After fine-tuning your model, running the new model on the same input gives different output on every run. Why is that?

CTPN installation error

++ python setup.py build_ext --inplace
running build_ext
skipping 'bbox.c' Cython extension (up-to-date)
building 'utils.bbox' extension
creating build
creating build/temp.linux-x86_64-3.6
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
gcc -pthread -B /home/test/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/test/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/home/test/anaconda3/include/python3.6m -c bbox.c -o build/temp.linux-x86_64-3.6/bbox.o -Wno-cpp -Wno-unused-function
creating /home/test/orc_gc/chinese_ocr-master/ctpn/lib/utils/utils
gcc -pthread -shared -B /home/test/anaconda3/compiler_compat -L/home/test/anaconda3/lib -Wl,-rpath=/home/test/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/bbox.o -o /home/test/orc_gc/chinese_ocr-master/ctpn/lib/utils/utils/bbox.cpython-36m-x86_64-linux-gnu.so
/home/test/anaconda3/compiler_compat/ld: cannot find -lpthread
/home/test/anaconda3/compiler_compat/ld: cannot find -lc
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

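Am I missing a required library?

DenseNet + CTC vs. DenseNet + BLSTM + CTC

Hello, which works better: connecting DenseNet directly to CTC, or adding an LSTM before CTC? Has the author tried this?
If I want to add an LSTM, does the training code need to change?

setup.sh: "install" on the first line is missing an "l"

A minor issue, so I will not open a PR.

Software versions required to run train.py

My local software versions are python2.7, keras2.14, and tensorflow-gpu1.3.0, and the demo runs fine. During model training, however, the following error occurs: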
-----------Start training-----------
Epoch 1/10
Traceback (most recent call last):
  File "train.py", line 173, in <module>
    callbacks = [checkpoint, earlystop, changelr, tensorboard])
  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 2212, in fit_generator
    generator_output = next(output_generator)
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 779, in get
    six.reraise(value.__class__, value, value.__traceback__)
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 644, in _data_generator_task
    generator_output = next(self._generator)
  File "train.py", line 108, in gen
    labels[i, :len(str)] =[int(i) - 1 for i in str]
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
swig/python detected a memory leak of type 'int64_t *', no destructor found.
swig/python detected a memory leak of type 'int64_t *', no destructor found.

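I upgraded keras to the latest version 2.16 and also tried the older keras2.08; both report errors in the keras part. Running train.py with python3.4 does not work either.

Can everyone else run train.py normally? Is there a problem with the software versions I am using?
Thank you very much!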
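
One possible explanation under Python 2.7 (the version reported above), offered as an assumption rather than a confirmed diagnosis: in Python 2 the loop variable of a list comprehension leaks into the enclosing scope, and the right-hand side of an assignment is evaluated before the subscript on the left. After `[int(i) - 1 for i in str]` has run, `i` would hold the last label token (a string), so `labels[i, ...]` would be indexed with a string. The sketch below avoids the name clash by using distinct variable names. A plain nested list stands in for the NumPy array, and `fill_labels` is a hypothetical helper, not code from train.py.

```python
def fill_labels(rows, max_len, pad=0):
    """Convert rows of 1-based index tokens into fixed-length 0-based rows.

    rows: list of token lists such as [["12", "7"], ["3"]].
    Returns a list of lists padded to max_len with `pad`.
    """
    labels = [[pad] * max_len for _ in rows]
    for row_idx, tokens in enumerate(rows):
        # `tok` is distinct from `row_idx`, so the row index cannot be
        # overwritten by the comprehension variable under Python 2.
        values = [int(tok) - 1 for tok in tokens]
        labels[row_idx][:len(values)] = values
    return labels


print(fill_labels([["12", "7"], ["3"]], max_len=3))
```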

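How do I train on my own dataset?

Hello, I studied your code and found that at test time the input image size is not restricted, while during training the image size seems to be fixed at 280x32. I now want to train on my own dataset, but my image sizes are not uniform. Where should I start? @YCG09

Problem during training

When I run train.py, I get: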
  File "train.py", line 113, in gen
    labels[i,:len(str)] =[int(i) - 1 for i in str]
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
I checked that the label values are all of int type. How can this problem be solved? Thanks.