
cnn-lstm-ctc-text-recognition's Introduction

CNN-LSTM-CTC text recognition

I implement three different models for text recognition. All of them use a CTC loss layer, so the text images do not need to be segmented into individual characters.

Disclaimer

This work is based on the official MXNet warpctc example here.

Getting started

  • Build MXNet with Baidu Warp CTC by following the instructions here.

When I followed the official instructions to add Baidu Warp CTC to MXNet, I got some errors because the latest version of Baidu Warp CTC conflicted with MXNet. Someone has since solved this problem and updated the official MXNet warpctc example. However, if you still have problems, please refer to this issue here.

Generating data

Run generate_data.py in generate_data. When generating training and test data, remember to change the output path and the number of samples in generate_data.py (I will add a friendlier way to generate training and test data when I have free time).
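Until generate_data.py offers such an option itself, one possible approach is to read the output path and sample count from the command line instead of editing the script. The following is only a sketch: the flag names `--output-dir` and `--num` and their defaults are hypothetical and are not part of the current script.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for a data generator."""
    parser = argparse.ArgumentParser(
        description="Generate text images for training or testing.")
    # Both flags and their defaults are assumptions made for illustration.
    parser.add_argument("--output-dir", default="data/train",
                        help="directory to write the generated images to")
    parser.add_argument("--num", type=int, default=1000,
                        help="number of images to generate")
    return parser.parse_args(argv)


# Passing an explicit list stands in for: --output-dir data/test --num 200
args = parse_args(["--output-dir", "data/test", "--num", "200"])
print("Would write %d images to %s" % (args.num, args.output_dir))
```

With options like these, the training and test sets could be produced by two invocations with different arguments rather than by two edits to the script.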

Train the model

I implement three different models for text recognition; you can find them in symbol:

  1. LSTM + CTC;
  2. Bidirectional LSTM + CTC;
  3. CNN (a modified model similar to VGG) + Bidirectional LSTM + CTC. Disclaimer: This CNN + LSTM + CTC model is a re-implementation of the original CRNN, which is based on Torch. The official repository is available here. The arXiv paper is available here.

  • Start training:

LSTM + CTC:

python train_lstm.py

Bidirectional LSTM + CTC:

python train_bi_lstm.py

CNN + Bidirectional LSTM + CTC:

python train_crnn.py

Prediction

You can run prediction with your trained model. I have only written predictors for model 1 and model 3, but it is easy to write a predictor for model 2 by following these examples.

Please run:

python lstm_predictor.py

or

python crnn_predictor.py
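For readers writing their own predictor, the step that turns per-frame network scores into a label sequence is usually CTC best-path (greedy) decoding: take the best class per time step, merge consecutive repeats, then drop blanks. The sketch below is not taken from lstm_predictor.py or crnn_predictor.py; the function name is hypothetical, and it assumes class index 0 is the CTC blank.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding: merge repeated labels, then drop blanks."""
    # Highest-scoring class index for every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded = []
    previous = blank
    for index in best_path:
        # Keep a label only if it is not blank and differs from the previous frame.
        if index != blank and index != previous:
            decoded.append(index)
        previous = index
    return decoded


# Toy input: 6 time steps, 3 classes (class 0 is assumed to be the blank).
scores = [
    [0.10, 0.80, 0.10],  # class 1
    [0.10, 0.70, 0.20],  # class 1 again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # class 1, kept because a blank separates it
    [0.10, 0.20, 0.70],  # class 2
    [0.80, 0.10, 0.10],  # blank
]
print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The blank between the second and fourth frames is what allows the same character to appear twice in a row in the output.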

cnn-lstm-ctc-text-recognition's People

Contributors

oyxhust

cnn-lstm-ctc-text-recognition's Issues

ImportError: No module named mxnet

I used your versions of mxnet and warpctc downloaded from your BaiduYun. I built mxnet and warpctc successfully and enabled warpctc in mxnet. However, when I run train_lstm.py, "import mxnet as mx" fails with ImportError: No module named mxnet. I then realized that I should add the mxnet path to the environment, so I added "export PYTHONPATH=~/mxnet/python" at the end of .bashrc. When I echo $PATH on the command line, the result shows that /home/qian/mxnet/python was added successfully. However, when I run train_lstm.py again, the problem still exists. How can I solve this problem? Thank you!
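One way to narrow this kind of error down is to ask the interpreter that runs train_lstm.py what it actually sees. Python resolves imports through PYTHONPATH and sys.path, not through PATH, so checking $PATH does not confirm the setting. The helper below is a diagnostic sketch with a hypothetical name, and it assumes Python 3 (importlib.util.find_spec is not available on Python 2.7).

```python
import importlib.util
import os
import sys


def describe_module_lookup(module_name):
    """Report where this interpreter would import a top-level module from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
        "sys_path": list(sys.path),
    }


info = describe_module_lookup("mxnet")
print("interpreter:", info["interpreter"])
print("mxnet found:", info["found"], "at", info["origin"])
print("PYTHONPATH :", info["PYTHONPATH"])
```

If PYTHONPATH prints as empty, the shell that launched Python has probably not re-read .bashrc, or the script is being run by a different user or interpreter than the one that was configured.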

Have a problem in mxnet WarpCTC

When I run this program, the following line in lstm.py raises an error. How can I deal with it? Thanks.
sm = mx.sym.WarpCTC(data=pred, label=label, label_length = num_label, input_length = seq_len)

D:\DeepLearning_Demo\CNN_LSTM_CTC\CNN-LSTM-CTC-text-recognition-master>python train_lstm.py
Traceback (most recent call last):
File "train_lstm.py", line 196, in <module>
symbol = sym_gen(SEQ_LENGTH)
File "train_lstm.py", line 187, in sym_gen
num_label = num_label)
File "D:\DeepLearning_Demo\CNN_LSTM_CTC\CNN-LSTM-CTC-text-recognition-master\symbol\lstm.py", line 79, in lstm_unroll
sm = mx.sym.WarpCTC(data=pred, label=label, label_length = num_label, input_length = seq_len)
AttributeError: module 'mxnet.symbol' has no attribute 'WarpCTC'
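This AttributeError generally means that the installed MXNet build does not expose a WarpCTC operator. A quick check is to probe the symbol namespace before building the network. The sketch below uses a hypothetical helper name. The fallback names `contrib.ctc_loss` and `contrib.CTCLoss` are an assumption about MXNet 1.x builds and should be verified against the installed version. As far as I know, that built-in operator does not take the same arguments as WarpCTC, so it is not a drop-in replacement.

```python
from types import SimpleNamespace


def find_ctc_operator(sym_module):
    """Return the name of an available CTC operator, or None if none is found."""
    # WarpCTC is only present when MXNet was compiled with the warpctc plugin.
    if hasattr(sym_module, "WarpCTC"):
        return "WarpCTC"
    # Assumption: some MXNet builds ship a built-in CTC loss under contrib.
    contrib = getattr(sym_module, "contrib", None)
    for name in ("ctc_loss", "CTCLoss"):
        if contrib is not None and hasattr(contrib, name):
            return "contrib." + name
    return None


# Stand-in objects so the sketch runs without MXNet installed.
with_plugin = SimpleNamespace(WarpCTC=object())
without_plugin = SimpleNamespace(contrib=SimpleNamespace(ctc_loss=object()))
print(find_ctc_operator(with_plugin))
print(find_ctc_operator(without_plugin))

# With a real installation (assumes mxnet is importable):
#   import mxnet as mx
#   print(find_ctc_operator(mx.sym))
```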

Why is the input size to warpctc num_classes?

Thank you for your work.
I found that the last layers, which feed the input to WarpCTC, are as follows:

hidden_concat = mx.sym.Concat(*hidden_all, dim=0)
pred = mx.sym.FullyConnected(data=hidden_concat, num_hidden=num_classes)

So the input size to WarpCTC is num_classes. Isn't that a small number?
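Tracing the shapes through these two lines may help here. Concatenating the per-step hidden states along dim 0 stacks all time steps, and the fully connected layer then produces one score per class for every (time step, sample) row, so num_classes is the width of each row rather than the size of the whole input. The sketch below only does the shape arithmetic; the helper names are hypothetical and the concrete numbers (batch 32, 80 steps, 100 hidden units, 11 classes) are illustrative assumptions.

```python
def concat_dim0_shape(shapes):
    """Shape after concatenating 2-D arrays along axis 0."""
    rows = sum(shape[0] for shape in shapes)
    return (rows, shapes[0][1])


def fully_connected_shape(input_shape, num_hidden):
    """Shape after a fully connected layer applied to a 2-D input."""
    return (input_shape[0], num_hidden)


# Illustrative values, not taken from a specific training script.
batch_size, seq_len, lstm_hidden, num_classes = 32, 80, 100, 11

hidden_all = [(batch_size, lstm_hidden)] * seq_len   # one hidden state per step
hidden_concat = concat_dim0_shape(hidden_all)        # (2560, 100)
pred = fully_connected_shape(hidden_concat, num_classes)  # (2560, 11)
print(hidden_concat, pred)
```

Each of the 2560 rows holds the class scores for one time step of one sample, which is the per-frame distribution a CTC loss consumes.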

DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.

Has the new version of mxnet changed these functions?
python train_bi_lstm.py
train_bi_lstm.py:204: DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.
initializer=mx.init.Xavier(factor_type="in", magnitude=2.34))
INFO:root:begin fit
/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/model.py:530: DeprecationWarning: Calling initializer with init(str, NDArray) has been deprecated.please use init(mx.init.InitDesc(...), NDArray) instead.
self.initializer(k, v)
INFO:root:Start training with [gpu(0)]
[17:51:56] src/c_api/c_api_ndarray.cc:133: GPU support is disabled. Compile MXNet with USE_CUDA=1 to enable GPU support.
[17:51:56] /disk/data/mxnet/dmlc-core/include/dmlc/logging.h:304: [17:51:56] src/c_api/c_api_ndarray.cc:390: Operator _zeros is not implemented for GPU.

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f45d3c4b55c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_Z20ImperativeInvokeImplRKN5mxnet7ContextERKN4nnvm9NodeAttrsEPSt6vectorINS_7NDArrayESaIS8_EESB_+0x9ac) [0x7f45d4915d8c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x254) [0x7f45d49162a4]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f45d713fadc]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f45d713f40c]
[bt] (5) /data/tensorflow/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48e) [0x7f45d73565fe]
[bt] (6) /data/tensorflow/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x15f9e) [0x7f45d7357f9e]
[bt] (7) python(PyEval_EvalFrameEx+0x98d) [0x5244dd]
[bt] (8) python(PyEval_EvalCodeEx+0x2b1) [0x555551]
[bt] (9) python(PyEval_EvalFrameEx+0x1a10) [0x525560]

Traceback (most recent call last):
File "train_bi_lstm.py", line 222, in <module>
epoch_end_callback = mx.callback.do_checkpoint(prefix, 1))
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/model.py", line 830, in fit
sym_gen=self.sym_gen)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/model.py", line 210, in _train_multi_device
logger=logger)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/executor_manager.py", line 326, in __init__
self.slices, train_data)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/executor_manager.py", line 238, in __init__
input_types=data_types)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/executor_manager.py", line 152, in _bind_exec
arg_arr = nd.zeros(arg_shape[i], ctx, dtype=arg_types[i])
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/ndarray.py", line 1047, in zeros
return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
File "", line 15, in _zeros
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 72, in _imperative_invoke
c_array(ctypes.c_char_p, [c_str(str(val)) for val in vals])))
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/base.py", line 85, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:51:56] src/c_api/c_api_ndarray.cc:390: Operator _zeros is not implemented for GPU.

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f45d3c4b55c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_Z20ImperativeInvokeImplRKN5mxnet7ContextERKN4nnvm9NodeAttrsEPSt6vectorINS_7NDArrayESaIS8_EESB_+0x9ac) [0x7f45d4915d8c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x254) [0x7f45d49162a4]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f45d713fadc]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f45d713f40c]
[bt] (5) /data/tensorflow/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48e) [0x7f45d73565fe]
[bt] (6) /data/tensorflow/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x15f9e) [0x7f45d7357f9e]
[bt] (7) python(PyEval_EvalFrameEx+0x98d) [0x5244dd]
[bt] (8) python(PyEval_EvalCodeEx+0x2b1) [0x555551]
[bt] (9) python(PyEval_EvalFrameEx+0x1a10) [0x525560]

mxnet the squeeze axis in your crnn model

Hi,
I looked into your code. In crnn.py #132, wordvec is created with squeeze_axis=1. However, your data after flatten should have shape (batch_size, num_filters x reduced_width x reduced_height). Although reduced_height = 1, num_filters is 512 and you use sequence_length = 25. squeeze_axis=1 can only be used when sequence_length equals the second component of the shape, so I am a little confused. Thanks for your work, much appreciated!

lstm_predictor.py ImportError: No module named mxnet_predict

After train_lstm.py ran successfully, running lstm_predictor.py failed with the error below. Where does mxnet_predict come from, and how do I install it? I could not find this module in mxnet.
$ python lstm_predictor.py
Traceback (most recent call last):
File "lstm_predictor.py", line 9, in <module>
from mxnet_predict import Predictor
ImportError: No module named mxnet_predict

image_set_path error

When I try to run train_crnn.py, I get the error below:

"NameError: global name 'image_set_path' is not defined"

Can anyone suggest a fix?

why num_label is fixed?

Hi.
In your code num_label is fixed, and you pad shorter labels with zeros. Is this necessary? If not, does it affect the training speed?
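For context, the padding scheme being asked about can be sketched as follows. Fixed-size batches need every label row to have the same length, so shorter labels are filled with a reserved value and the padding is removed again before comparing with a prediction. This is only an illustration with hypothetical helper names. It assumes that 0 is reserved for padding and that real character indices start at 1.

```python
def pad_label(label, num_label, pad_value=0):
    """Right-pad a label sequence to a fixed length."""
    if len(label) > num_label:
        raise ValueError("label has %d entries but num_label is %d"
                         % (len(label), num_label))
    return list(label) + [pad_value] * (num_label - len(label))


def strip_padding(padded, pad_value=0):
    """Return the label up to (not including) the first padding entry."""
    result = []
    for value in padded:
        if value == pad_value:
            break
        result.append(value)
    return result


padded = pad_label([3, 5], num_label=4)
print(padded)                 # [3, 5, 0, 0]
print(strip_padding(padded))  # [3, 5]
```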

AttributeError: module 'mxnet.symbol' has no attribute 'WarpCTC'

Hi everyone,
Please help, I am desperate.
I am using Google Colab.
I installed WarpCTC successfully and installed mxnet too,
but I can't find the config file to link WarpCTC with mxnet.
The main question: can I do that in Google Colab, or should I use my local PyCharm instead?

network configuration

When I read the code in train_crnn.py, I found that the network configuration is not the same as the one proposed in the paper. For example, ' relu4_1 = mx.symbol.Activation(data=batchnorm2, act_type="relu", name="relu4_1")' is not used. Is that all right?

Build error with mxnet and warpctc: could you tell me the versions of your mxnet and warpctc?

begin fit
[07:30:38] /home/chang/mxnet/dmlc-core/include/dmlc/logging.h:304: [07:30:38] src/operator/./slice_channel-inl.h:198: Check failed: ishape[real_axis] == static_cast<size_t>(param_.num_outputs) (2400 vs. 80) If squeeze axis is True, the size of the sliced axis must be the same as num_outputs. Input shape=(32,2400), axis=1, num_outputs=80.

Stack trace returned 10 entries:
[bt] (0) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f2acf3787fc]
[bt] (1) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNK5mxnet2op16SliceChannelProp10InferShapeEPSt6vectorIN4nnvm6TShapeESaIS4_EES7_S7_+0x4c1) [0x7f2ad01f5c61]
[bt] (2) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x12af4d8) [0x7f2acffbf4d8]
[bt] (3) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x23c4bdd) [0x7f2ad10d4bdd]
[bt] (4) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x23c64d2) [0x7f2ad10d64d2]
[bt] (5) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorISsSaISsEE+0x518) [0x7f2ad10c0c38]
[bt] (6) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm9ApplyPassENS_5GraphERKSs+0x8e) [0x7f2acfe6cbce]
[bt] (7) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm4pass10InferShapeENS_5GraphESt6vectorINS_6TShapeESaIS3_EESs+0x240) [0x7f2acfe6fa00]
[bt] (8) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(MXSymbolInferShape+0x329) [0x7f2acfe67899]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f2ad9b45adc]

infer_shape error. Arguments:
label: (32, 4)
l0_init_c: (32, 100)
l1_init_h: (32, 100)
l0_init_h: (32, 100)
data: (32, 2400)
l1_init_c: (32, 100)
Traceback (most recent call last):
File "lstm_ocr.py", line 210, in <module>
epoch_end_callback = mx.callback.do_checkpoint(prefix, 1))
File "../../python/mxnet/model.py", line 782, in fit
self._init_params(data.provide_data+data.provide_label)
File "../../python/mxnet/model.py", line 502, in _init_params
arg_shapes, _, aux_shapes = self.symbol.infer_shape(**input_shapes)
File "../../python/mxnet/symbol.py", line 747, in infer_shape
res = self._infer_shape_impl(False, *args, **kwargs)
File "../../python/mxnet/symbol.py", line 871, in _infer_shape_impl
ctypes.byref(complete)))
File "../../python/mxnet/base.py", line 84, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator slicechannel0: [07:30:38] src/operator/./slice_channel-inl.h:198: Check failed: ishape[real_axis] == static_cast<size_t>(param_.num_outputs) (2400 vs. 80) If squeeze axis is True, the size of the sliced axis must be the same as num_outputs. Input shape=(32,2400), axis=1, num_outputs=80.

Stack trace returned 10 entries:
[bt] (0) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f2acf3787fc]
[bt] (1) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNK5mxnet2op16SliceChannelProp10InferShapeEPSt6vectorIN4nnvm6TShapeESaIS4_EES7_S7_+0x4c1) [0x7f2ad01f5c61]
[bt] (2) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x12af4d8) [0x7f2acffbf4d8]
[bt] (3) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x23c4bdd) [0x7f2ad10d4bdd]
[bt] (4) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(+0x23c64d2) [0x7f2ad10d64d2]
[bt] (5) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorISsSaISsEE+0x518) [0x7f2ad10c0c38]
[bt] (6) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm9ApplyPassENS_5GraphERKSs+0x8e) [0x7f2acfe6cbce]
[bt] (7) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm4pass10InferShapeENS_5GraphESt6vectorINS_6TShapeESaIS3_EESs+0x240) [0x7f2acfe6fa00]
[bt] (8) /home/chang/mxnet/python/mxnet/../../lib/libmxnet.so(MXSymbolInferShape+0x329) [0x7f2acfe67899]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f2ad9b45adc]
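The check that fails in this log can be reproduced with plain shape arithmetic. The function below is a hypothetical re-statement of the rule quoted in the error message, not MXNet code: with squeeze_axis enabled, the sliced axis must have exactly num_outputs entries, and otherwise it only has to be divisible by num_outputs.

```python
def slice_channel_shapes(input_shape, axis, num_outputs, squeeze_axis):
    """Output shapes of an equal split along `axis`, mirroring the logged check."""
    size = input_shape[axis]
    if squeeze_axis:
        # Squeezing removes the axis, so each slice must have length 1 there.
        if size != num_outputs:
            raise ValueError("sliced axis has size %d but num_outputs is %d"
                             % (size, num_outputs))
        out = input_shape[:axis] + input_shape[axis + 1:]
    else:
        if size % num_outputs != 0:
            raise ValueError("axis size %d is not divisible by %d"
                             % (size, num_outputs))
        out = input_shape[:axis] + (size // num_outputs,) + input_shape[axis + 1:]
    return [out] * num_outputs


# Shapes from the log: data is (32, 2400) and num_outputs is 80.
try:
    slice_channel_shapes((32, 2400), axis=1, num_outputs=80, squeeze_axis=True)
except ValueError as error:
    print("squeeze_axis=True fails:", error)

parts = slice_channel_shapes((32, 2400), axis=1, num_outputs=80, squeeze_axis=False)
print(len(parts), parts[0])  # 80 (32, 30)
```

Since 2400 = 80 x 30, one reading is that each of the 80 steps is meant to carry 30 values, in which case the split would need squeeze_axis turned off or the data reshaped so that the sliced axis has size 80. That is an interpretation of the numbers in the log, not a confirmed fix.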

Multi-GPU run fails with an out-of-bounds index in the Accuracy function

INFO:root:begin fit
INFO:root:Start training with [gpu(6), gpu(7)]
iter
Traceback (most recent call last):
File "testcrnn.py", line 249, in <module>
epoch_end_callback = mx.callback.do_checkpoint(prefix, 1))
File "../../python/mxnet/model.py", line 811, in fit
sym_gen=self.sym_gen)
File "../../python/mxnet/model.py", line 259, in _train_multi_device
executor_manager.update_metric(eval_metric, data_batch.label)
File "../../python/mxnet/executor_manager.py", line 422, in update_metric
self.curr_execgrp.update_metric(metric, labels)
File "../../python/mxnet/executor_manager.py", line 274, in update_metric
metric.update(labels_slice, texec.outputs)
File "../../python/mxnet/metric.py", line 350, in update
reval = self._feval(label, pred)
File "../../python/mxnet/metric.py", line 379, in feval
return numpy_feval(label, pred)
File "testcrnn.py", line 166, in Accuracy
p.append(np.argmax(pred[k * BATCH_SIZE + i]))
IndexError: index 1600 is out of bounds for axis 0 with size 1600
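The traceback shows the metric being updated per executor with a slice of the labels, so a plausible cause is that each GPU passes only its share of the batch while the indexing still uses the global BATCH_SIZE. The sketch below derives the batch size from the labels it receives instead. It is an assumption-laden illustration, not the repository's Accuracy function: the name is hypothetical, the rows of pred are assumed to be time-major (row k * batch + i), and class 0 is assumed to be both the blank and the label padding.

```python
def sequence_accuracy(label, pred, seq_len, blank=0):
    """Fraction of samples whose greedy CTC decoding equals the label."""
    local_batch = len(label)  # per-device batch size, not the global one
    if len(pred) != seq_len * local_batch:
        raise ValueError("expected %d rows in pred, got %d"
                         % (seq_len * local_batch, len(pred)))
    hits = 0
    for i in range(local_batch):
        decoded = []
        previous = blank
        for k in range(seq_len):
            scores = pred[k * local_batch + i]
            index = max(range(len(scores)), key=scores.__getitem__)
            # Merge repeats and drop blanks (best-path decoding).
            if index != blank and index != previous:
                decoded.append(index)
            previous = index
        target = [int(v) for v in label[i] if int(v) != blank]
        if decoded == target:
            hits += 1
    return hits / float(local_batch)


# Toy slice: 2 samples, 3 time steps, 3 classes.
pred = [
    [0, 1, 0], [0, 0, 1],  # step 0: sample 0 -> 1, sample 1 -> 2
    [1, 0, 0], [0, 0, 1],  # step 1: sample 0 -> blank, sample 1 -> 2
    [0, 0, 1], [1, 0, 0],  # step 2: sample 0 -> 2, sample 1 -> blank
]
print(sequence_accuracy([[1, 2], [2, 0]], pred, seq_len=3))  # 1.0
```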
