oyxhust / cnn-lstm-ctc-text-recognition

CNN and LSTM model for text recognition

Python 100.00%

cnn-lstm-ctc-text-recognition's Introduction

CNN-LSTM-CTC text recognition

I implemented three different models for text recognition. All of them end in a CTC loss layer, so the text images do not need to be segmented into individual characters.
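To illustrate why CTC removes the need for segmentation, here is a minimal sketch of greedy CTC decoding. It assumes the network emits one class index per time step and that index 0 is the blank symbol; the function name `ctc_greedy_decode` is hypothetical and is not taken from this repository.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into the final label sequence.

    Assumption: `blank` (0 here) is the CTC blank index.
    """
    decoded = []
    previous = blank
    for label in frame_labels:
        # Emit a label only if it is not blank and differs from the previous frame.
        if label != blank and label != previous:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Ten frames collapse to four characters; the blank between the two 3s
    # is what allows a repeated character to be recognized twice.
    frames = [0, 3, 3, 0, 3, 5, 5, 0, 0, 7]
    print(ctc_greedy_decode(frames))  # [3, 3, 5, 7]
```

The per-frame predictions never have to line up with character boundaries, which is why the training images need only a label sequence and no per-character positions.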

Disclaimer

This project is based on the official mxnet warpctc example, available here.

Getting started

Build MXNet with Baidu Warp CTC by following the instructions here.

When I followed the official instructions to add Baidu Warp CTC to MXNet, I ran into errors because the latest version of Baidu Warp CTC conflicted with MXNet. Someone has since solved this problem and updated the official mxnet warpctc example. However, if you still have problems, please refer to this issue here.

Generating data

Run generate_data.py in generate_data. Before generating training and test data, remember to change the output path and number in generate_data.py (I will provide a friendlier way to generate training and test data when I have free time).

Train the model

I implemented three different models for text recognition; you can find them in symbol:

LSTM + CTC;
Bidirectional LSTM + CTC;
CNN (a modified model similar to VGG) + Bidirectional LSTM + CTC. Disclaimer: This CNN + LSTM + CTC model is a re-implementation of the original CRNN, which is based on Torch. The official repository is available here. The arXiv paper is available here.

Start training:

LSTM + CTC:

python train_lstm.py

Bidirectional LSTM + CTC:

python train_bi_lstm.py

CNN + Bidirectional LSTM + CTC:

python train_crnn.py

Prediction

You can run prediction with your trained model. I only wrote predictors for model 1 and model 3, but it is easy to write a predictor for model 2 by following these examples.

Please run:

python lstm_predictor.py

python crnn_predictor.py

cnn-lstm-ctc-text-recognition's Issues

ImportError: No module named mxnet

I used your versions of mxnet and warpctc, downloaded from your BaiduYun. I built mxnet and warpctc successfully and enabled warpctc in mxnet. However, when I run train_lstm.py, "import mxnet as mx" fails with ImportError: No module named mxnet. I then realized that I should add the mxnet path to the environment, so I added "export PYTHONPATH=~/mxnet/python" at the end of .bashrc. When I echo $PATH on the command line, the output shows that /home/qian/mxnet/python has been added. However, when I run train_lstm.py again, the problem persists. How can I solve this? Thank you!
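A small diagnostic can help narrow this kind of problem down. The sketch below (Python 3; the helper name `diagnose_import` is hypothetical) reports whether a module can be found by the interpreter that is actually running, and what PYTHONPATH that interpreter sees. Note that Python consults PYTHONPATH and `sys.path`, not PATH, and that changes to .bashrc only take effect in shells started afterwards.

```python
import importlib.util
import os
import sys


def diagnose_import(module_name):
    """Report whether `module_name` is importable and from where."""
    spec = importlib.util.find_spec(module_name)  # None if the module is not found
    return {
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in diagnose_import("mxnet").items():
        print(key, "=", value)
```

If `found` is False while `pythonpath` is empty, the variable was probably not exported into the shell that launched Python; if `executable` points to a different interpreter than expected, the package may have been installed for another Python.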

Problem with mxnet WarpCTC

When I run this program, the following line in lstm.py raises an error. How can I deal with it? Thanks.
sm = mx.sym.WarpCTC(data=pred, label=label, label_length = num_label, input_length = seq_len)

D:\DeepLearning_Demo\CNN_LSTM_CTC\CNN-LSTM-CTC-text-recognition-master>python train_lstm.py
Traceback (most recent call last):
File "train_lstm.py", line 196, in <module>
symbol = sym_gen(SEQ_LENGTH)
File "train_lstm.py", line 187, in sym_gen
num_label = num_label)
File "D:\DeepLearning_Demo\CNN_LSTM_CTC\CNN-LSTM-CTC-text-recognition-master\symbol\lstm.py", line 79, in lstm_unroll
sm = mx.sym.WarpCTC(data=pred, label=label, label_length = num_label, input_length = seq_len)
AttributeError: module 'mxnet.symbol' has no attribute 'WarpCTC'

Why is the input size to warpctc num_classes?

Thank you for your work.
I found that the last layers, which feed warpCTC, are as follows:

hidden_concat = mx.sym.Concat(*hidden_all, dim=0)
pred = mx.sym.FullyConnected(data=hidden_concat, num_hidden=num_classes)

So the input size to warpctc is num_classes. Isn't that a small number?

Why is the implementation of the BLSTM different from what the paper describes?

I tried to train CRNN on Chinese scene text and noticed that it is much slower than the PyTorch version of CRNN.
I checked the code and found that the BLSTM is very different from the author's version. What do you think about it?

With my own image data (410*35), the accuracy stays at 0.0000; num_label == 11 and the number of classes reaches 1000.

DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.

Did the new version of mxnet change these functions?
python train_bi_lstm.py
train_bi_lstm.py:204: DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.
initializer=mx.init.Xavier(factor_type="in", magnitude=2.34))
INFO:root:begin fit
/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/model.py:530: DeprecationWarning: Calling initializer with init(str, NDArray) has been deprecated.please use init(mx.init.InitDesc(...), NDArray) instead.
self.initializer(k, v)
INFO:root:Start training with [gpu(0)]
[17:51:56] src/c_api/c_api_ndarray.cc:133: GPU support is disabled. Compile MXNet with USE_CUDA=1 to enable GPU support.
[17:51:56] /disk/data/mxnet/dmlc-core/include/dmlc/logging.h:304: [17:51:56] src/c_api/c_api_ndarray.cc:390: Operator _zeros is not implemented for GPU.

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f45d3c4b55c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(Z20ImperativeInvokeImplRKN5mxnet7ContextERKN4nnvm9NodeAttrsEPSt6vectorINS_7NDArrayESaIS8_EESB+0x9ac) [0x7f45d4915d8c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x254) [0x7f45d49162a4]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f45d713fadc]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f45d713f40c]
[bt] (5) /data/tensorflow/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48e) [0x7f45d73565fe]
[bt] (6) /data/tensorflow/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x15f9e) [0x7f45d7357f9e]
[bt] (7) python(PyEval_EvalFrameEx+0x98d) [0x5244dd]
[bt] (8) python(PyEval_EvalCodeEx+0x2b1) [0x555551]
[bt] (9) python(PyEval_EvalFrameEx+0x1a10) [0x525560]

Traceback (most recent call last):
File "train_bi_lstm.py", line 222, in <module>
epoch_end_callback = mx.callback.do_checkpoint(prefix, 1))
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/model.py", line 830, in fit
sym_gen=self.sym_gen)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/model.py", line 210, in _train_multi_device
logger=logger)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/executor_manager.py", line 326, in __init__
self.slices, train_data)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/executor_manager.py", line 238, in __init__
input_types=data_types)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/executor_manager.py", line 152, in _bind_exec
arg_arr = nd.zeros(arg_shape[i], ctx, dtype=arg_types[i])
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/ndarray.py", line 1047, in zeros
return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
File "", line 15, in _zeros
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 72, in _imperative_invoke
c_array(ctypes.c_char_p, [c_str(str(val)) for val in vals])))
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/base.py", line 85, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:51:56] src/c_api/c_api_ndarray.cc:390: Operator _zeros is not implemented for GPU.

Which file should I use for text recognition?

mxnet: the squeeze axis in your crnn model

Hi,
I looked into your code. In crnn.py, line 132, wordvec is sliced with squeeze_axis=1. However, your data after flattening should have shape (batch_size, num_filters x reduced_width x reduced_height). Although reduced_height = 1, num_filters is 512 and you use sequence_length = 25. squeeze_axis=1 can only be used when sequence_length equals the second component of the shape. I am a little confused. Thanks for your work, much appreciated!

lstm_predictor.py ImportError: No module named mxnet_predict

After train_lstm.py ran successfully, running lstm_predictor.py failed with the error below. Where does mxnet_predict come from, and how do I install it? I could not find this module in mxnet.
$ python lstm_predictor.py
Traceback (most recent call last):
File "lstm_predictor.py", line 9, in <module>
from mxnet_predict import Predictor
ImportError: No module named mxnet_predict

image_set_path error

When I tried to run train_crnn.py, I got the error below:

"NameError: global name 'image_set_path' is not defined"

Can anyone suggest a fix?

Why is num_label fixed?

Hi.
In your code num_label is fixed, and you pad shorter labels with zeros. Is this necessary? If not, does it affect the training speed?
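For readers unfamiliar with the padding this question refers to, here is a minimal sketch. It assumes labels are lists of positive class indices and that 0 is reserved for padding; the helper `pad_label` is hypothetical and is not the repository's code.

```python
def pad_label(label, num_label, pad_value=0):
    """Right-pad a label sequence with `pad_value` up to a fixed length."""
    if len(label) > num_label:
        raise ValueError("label is longer than num_label")
    return list(label) + [pad_value] * (num_label - len(label))


if __name__ == "__main__":
    # A three-character label padded to num_label = 5.
    print(pad_label([4, 9, 2], 5))  # [4, 9, 2, 0, 0]
```

A fixed length lets every label in a batch be stored in one rectangular array of shape (batch_size, num_label).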

warp-ctc

how to replace mx.sym.WarpCTC with mx.contrib.sym.ctc_loss?

Hi there,
how can I replace mx.sym.WarpCTC with mx.contrib.sym.ctc_loss?
Thanks!
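One possible approach is sketched below. It is an assumption-laden outline, not a tested drop-in change for this repository: it assumes `pred` has shape (seq_len * batch_size, num_classes), as produced by the Concat and FullyConnected layers in the symbol files, that `label` has shape (batch_size, num_label), and that the installed MXNet version provides `mx.contrib.sym.ctc_loss`. The function name `add_ctc_loss` is hypothetical.

```python
def add_ctc_loss(pred, seq_len, num_label, use_warpctc):
    """Attach either the WarpCTC plugin loss or the built-in contrib CTC loss."""
    import mxnet as mx  # imported here so the sketch can be loaded without MXNet

    label = mx.sym.Variable('label')
    if use_warpctc:
        # WarpCTC plugin: labels flattened to one dimension, as in lstm.py.
        label = mx.sym.Reshape(data=label, shape=(-1,))
        label = mx.sym.Cast(data=label, dtype='int32')
        return mx.sym.WarpCTC(data=pred, label=label,
                              label_length=num_label, input_length=seq_len)

    # contrib ctc_loss expects data of shape (seq_len, batch_size, num_classes),
    # so split the first axis of pred into (seq_len, batch_size).
    pred_ctc = mx.sym.Reshape(data=pred, shape=(-4, seq_len, -1, 0))
    loss = mx.sym.MakeLoss(mx.contrib.sym.ctc_loss(data=pred_ctc, label=label))
    # Expose softmax probabilities for the accuracy metric without
    # letting them contribute gradients.
    probs = mx.sym.BlockGrad(mx.sym.softmax(data=pred, axis=-1))
    return mx.sym.Group([probs, loss])
```

With the grouped symbol the network has two outputs, so the metric and predictor code would presumably need to read the first output for predictions.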

AttributeError: module 'mxnet.symbol' has no attribute 'WarpCTC'

Hi everyone,
I am desperate.
I am using Google Colab.
I installed warpctc and mxnet successfully,
but I can't find the config file to link warpctc with mxnet.
The main question: can I do that in Google Colab, or should I use my local PyCharm instead?

network configuration

When I read train_crnn.py, I found that the network configuration differs from the one proposed in the paper. For example, ' relu4_1 = mx.symbol.Activation(data=batchnorm2, act_type="relu", name="relu4_1")' is not used. Is that all right?

Build error for mxnet with warpctc; could you tell me the versions of your mxnet and warpctc?

begin fit
[07:30:38] /home/chang/mxnet/dmlc-core/include/dmlc/logging.h:304: [07:30:38] src/operator/./slice_channel-inl.h:198: Check failed: ishape[real_axis] == static_cast<size_t>(param_.num_outputs) (2400 vs. 80) If squeeze axis is True, the size of the sliced axis must be the same as num_outputs. Input shape=(32,2400), axis=1, num_outputs=80.

Stack trace returned 10 entries:
[bt] (0) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f2acf3787fc]
[bt] (1) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(ZNK5mxnet2op16SliceChannelProp10InferShapeEPSt6vectorIN4nnvm6TShapeESaIS4_EES7_S7+0x4c1) [0x7f2ad01f5c61]
[bt] (2) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x12af4d8) [0x7f2acffbf4d8]
[bt] (3) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x23c4bdd) [0x7f2ad10d4bdd]
[bt] (4) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x23c64d2) [0x7f2ad10d64d2]
[bt] (5) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorISsSaISsEE+0x518) [0x7f2ad10c0c38]
[bt] (6) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm9ApplyPassENS_5GraphERKSs+0x8e) [0x7f2acfe6cbce]
[bt] (7) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm4pass10InferShapeENS_5GraphESt6vectorINS_6TShapeESaIS3_EESs+0x240) [0x7f2acfe6fa00]
[bt] (8) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(MXSymbolInferShape+0x329) [0x7f2acfe67899]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f2ad9b45adc]

infer_shape error. Arguments:
label: (32, 4)
l0_init_c: (32, 100)
l1_init_h: (32, 100)
l0_init_h: (32, 100)
data: (32, 2400)
l1_init_c: (32, 100)
Traceback (most recent call last):
File "lstm_ocr.py", line 210, in <module>
epoch_end_callback = mx.callback.do_checkpoint(prefix, 1))
File "../../python/mxnet/model.py", line 782, in fit
self._init_params(data.provide_data+data.provide_label)
File "../../python/mxnet/model.py", line 502, in _init_params
arg_shapes, _, aux_shapes = self.symbol.infer_shape(**input_shapes)
File "../../python/mxnet/symbol.py", line 747, in infer_shape
res = self._infer_shape_impl(False, *args, **kwargs)
File "../../python/mxnet/symbol.py", line 871, in _infer_shape_impl
ctypes.byref(complete)))
File "../../python/mxnet/base.py", line 84, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator slicechannel0: [07:30:38] src/operator/./slice_channel-inl.h:198: Check failed: ishape[real_axis] == static_cast<size_t>(param_.num_outputs) (2400 vs. 80) If squeeze axis is True, the size of the sliced axis must be the same as num_outputs. Input shape=(32,2400), axis=1, num_outputs=80.
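The shape rule quoted in this error message can be reproduced in a few lines of plain Python. The sketch below is not MXNet code; `slice_channel_shapes` is a hypothetical helper that mimics the check, and the reshape to (32, 80, 30) is only one possible remedy, assuming the 2400 values per sample are laid out as 80 steps of 30 features.

```python
def slice_channel_shapes(in_shape, axis, num_outputs, squeeze_axis):
    """Return the output shapes of a SliceChannel-like split, or raise."""
    size = in_shape[axis]
    if size % num_outputs != 0:
        raise ValueError("axis size must be divisible by num_outputs")
    if squeeze_axis and size != num_outputs:
        # Same condition as the failed check in the log above.
        raise ValueError("If squeeze axis is True, the size of the sliced "
                         "axis must be the same as num_outputs")
    out = list(in_shape)
    if squeeze_axis:
        del out[axis]                      # the sliced axis disappears
    else:
        out[axis] = size // num_outputs    # the sliced axis shrinks
    return [tuple(out)] * num_outputs


if __name__ == "__main__":
    # Reshaped input: 80 slices, each of shape (32, 30).
    print(slice_channel_shapes((32, 80, 30), 1, 80, True)[0])  # (32, 30)
    # The flat input from the log, (32, 2400), raises ValueError here.
```

Under that assumption, reshaping the data to (batch_size, 80, 30) before slicing would satisfy the check; whether that matches the intended data layout depends on how the images were flattened.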

Multi-GPU run, but index out of bounds in the accuracy function

INFO:root:begin fit
INFO:root:Start training with [gpu(6), gpu(7)]
iter
Traceback (most recent call last):
File "testcrnn.py", line 249, in <module>
epoch_end_callback = mx.callback.do_checkpoint(prefix, 1))
File "../../python/mxnet/model.py", line 811, in fit
sym_gen=self.sym_gen)
File "../../python/mxnet/model.py", line 259, in _train_multi_device
executor_manager.update_metric(eval_metric, data_batch.label)
File "../../python/mxnet/executor_manager.py", line 422, in update_metric
self.curr_execgrp.update_metric(metric, labels)
File "../../python/mxnet/executor_manager.py", line 274, in update_metric
metric.update(labels_slice, texec.outputs)
File "../../python/mxnet/metric.py", line 350, in update
reval = self._feval(label, pred)
File "../../python/mxnet/metric.py", line 379, in feval
return numpy_feval(label, pred)
File "testcrnn.py", line 166, in Accuracy
p.append(np.argmax(pred[k * BATCH_SIZE + i]))
IndexError: index 1600 is out of bounds for axis 0 with size 1600