
stn-ocr's Introduction

STN-OCR: A single Neural Network for Text Detection and Text Recognition

This repository contains the code for the paper: STN-OCR: A single Neural Network for Text Detection and Text Recognition

Please note that we refined our approach and released new source code. You can find the code here.

Please use the new code if you want to experiment with FSNS-like data and our approach. It should also be easy to redo the text recognition experiments with the new code, although we did not release any code for that.

Structure of the repository

The folder datasets contains code related to the datasets used in the paper. datasets/svhn contains several scripts that can be used to create the SVHN-based ground truth files used in our experiments reported in section 4.2; please see the readme in this folder on how to use the scripts. datasets/fsns contains scripts that can be used to first download the FSNS dataset, second extract the images from the downloaded files, and third restructure the contained ground truth files.

The folder mxnet contains all code used for training our networks.

Installation

In order to use the code you will need the following software environment:

  1. Install Python 3 (the code might work with Python 2, too, but this is untested)
  2. It might be a good idea to use a virtualenv
  3. Install all requirements with pip install -r requirements.txt
  4. Clone and install warp-ctc from here
  5. Go into the folder mxnet/metrics/ctc and run python setup.py build_ext --inplace
  6. Clone the MXNet repository
  7. Check out the tag v0.9.3
  8. Add the warp-ctc plugin to the project by enabling it in the file config.mk
  9. Compile MXNet
  10. Install the Python bindings of MXNet
  11. You should be ready to go!
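
Condensed into commands, the whole sequence could look like the following sketch (the repository URLs, paths, and the exact config.mk lines are assumptions on our part; adapt them to your machine):

    # Hedged sketch of the installation sequence; URLs and paths are assumptions.
    pip install -r requirements.txt

    # build warp-ctc
    git clone https://github.com/baidu-research/warp-ctc
    cd warp-ctc && mkdir build && cd build && cmake .. && make
    cd ../..

    # build the CTC metric extension of this repository
    cd stn-ocr/mxnet/metrics/ctc && python setup.py build_ext --inplace
    cd -

    # build MXNet v0.9.3 with the warp-ctc plugin enabled
    git clone --recursive https://github.com/apache/incubator-mxnet mxnet-src
    cd mxnet-src && git checkout v0.9.3
    # enable the plugin in config.mk, e.g. by uncommenting lines like:
    #   WARPCTC_PATH = $(HOME)/warp-ctc
    #   MXNET_PLUGINS += plugin/warpctc/warpctc.mk
    make -j"$(nproc)"
    cd python && pip install -e .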

Training

You can use this code to train models for three different tasks.

SVHN House Number Recognition

The file train_svhn.py is the entry point for training a network using our purpose-built SVHN datasets. As is, the file is ready to train a network capable of finding a single house number placed randomly on an image.

Example image: centered_image

In order to do this, you need to follow these steps:

  1. Download the datasets

  2. Locate the folder generated/centered

  3. Open train.csv and adapt the paths of all images to the paths on your machine (do the same with valid.csv)

  4. Make sure to prepare your environment as described in the installation section

  5. Start the training by issuing the following command:

    python train_svhn.py <path to train.csv> <path to valid.csv> --gpus <gpu id you want to use> --log-dir <where to save the logs> -b <batch size you want to use> --lr 1e-5 --zoom 0.5 --char-map datasets/svhn/svhn_char_map.json

  6. Wait and enjoy.

If you want to do experiments on more challenging images you might need to update some parts of the code in train_svhn.py. The parts you might want to update are located around line 40 in this file. Here you can change the maximum number of house numbers in the image (num_timesteps), the maximum number of characters per house number (labels_per_timestep), the number of RNN layers to use for predicting the localization (num_rnn_layers), and whether or not to use a BLSTM for predicting the localization (use_blstm); see the sketch below.
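
For orientation, the block around line 40 could look like the following sketch (the variable names are taken from the description above; the concrete values are placeholders, not the repository's defaults):

    # Sketch of the configuration block in train_svhn.py; values are placeholders.
    num_timesteps = 1         # max. number of house numbers in the image
    labels_per_timestep = 5   # max. number of characters per house number
    num_rnn_layers = 1        # number of RNN layers predicting the localization
    use_blstm = True          # whether a BLSTM predicts the localization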

A considerably more challenging dataset is contained in the folder medium_two_digits, or medium, in the datasets folder. Example image: 2_digits_more_challenge

If you want to follow our experiments with svhn numbers placed in a regular grid you'll need to do the following:

  1. Download the datasets
  2. Locate the folder generated/easy
  3. Open train.csv and adapt the paths of all images to the paths on your machine (do the same with valid.csv)
  4. Set num_timesteps and labels_per_timestep to 4 in train_svhn.py
  5. Start the training using the following command: python train_svhn.py <path to train.csv> <path to valid.csv> --gpus <gpu id you want to use> --log-dir <where to save the logs> -b <batch size you want to use> --lr 1e-5
  6. If you are lucky it will work ;)

Text Recognition

Following our text recognition experiments might be a little difficult, because we cannot offer the entire dataset used by us. But it is possible to perform the experiments based on the Synth-90k dataset provided by Jaderberg et al. here. After downloading and extracting this file you'll need to adapt the groundtruth file provided with this dataset to fit the format used by our code. Our format is quite simple: you need to create a csv file with tab-separated values, where the first column is the absolute path to the image and the rest of the line are the labels corresponding to this image.
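
A minimal conversion sketch could look like this. It rests on our assumptions, not on facts from this README: that the Synth-90k annotation file lists relative image paths whose file names embed the ground-truth word (as in 77_heretical_35885.jpg), that the char map maps class indices to character codes (as suggested by the reverse mapping built in the evaluation scripts), and that labels are padded with the blank class to a fixed width:

    import csv
    import json
    import os
    import sys

    # hypothetical invocation:
    # python convert_synth90k.py annotation_train.txt char_map.json <image root> train.csv
    annotation_file, char_map_file, image_root, output_file = sys.argv[1:5]
    max_label_length = 23   # assumption: pad/truncate labels to a fixed width
    blank_label = 0         # assumption: class 0 is the blank/padding class

    with open(char_map_file) as f:
        # assumption: the map is class index -> character code
        char_to_class = {chr(int(code)): int(cls) for cls, code in json.load(f).items()}

    with open(annotation_file) as src, open(output_file, 'w', newline='') as dest:
        writer = csv.writer(dest, delimiter='\t')
        for line in src:
            rel_path = line.split()[0]
            # Synth-90k file names embed the word between underscores
            word = os.path.basename(rel_path).split('_')[1]
            labels = [char_to_class.get(char, blank_label) for char in word.lower()]
            labels = labels[:max_label_length]
            labels += [blank_label] * (max_label_length - len(labels))
            writer.writerow([os.path.abspath(os.path.join(image_root, rel_path))] + labels)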

To train the network you can use the train_text_recognition.py script. You can start this script in a similar manner to the train_svhn.py script.

FSNS

In order to redo our experiments on the FSNS dataset you need to perform the following steps:

  1. Download the fsns dataset using the download_fsns.py script located in datasets/fsns

  2. Extract the individual images using the tfrecord_to_image.py script located in datasets/fsns/tfrecord_utils (you will need to install tensorflow for doing that)

  3. Use the transform_gt.py script to transform the original FSNS groundtruth, which is based on a single text line, into a groundtruth containing labels for each word individually. A possible usage of the transform_gt.py script could look like this:

    python transform_gt.py <path to original gt> datasets/fsns/fsns_char_map.json <path to gt that shall be generated>

  4. Because MXNet expects the blank label to be 0 for training with CTC loss, you have to use the swap_classes.py script in datasets/fsns to swap the classes for space and blank in the gt (a sketch of what this swap does follows after this list), by issuing:

    python swap_classes.py <original gt> <swapped gt> 0 133

  5. After performing these steps you should be able to run the training by issuing:

    python train_fsns.py <path to generated train gt> <path to generated validation gt> --char-map datasets/fsns/fsns_char_map.json --blank-label 0
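
For intuition, the class swap of step 4 boils down to something like this sketch (our illustration, not the actual swap_classes.py; we assume the gt follows the path-then-labels tab-separated layout described in this README):

    import csv
    import sys

    # illustrative: python swap_classes.py <original gt> <swapped gt> 0 133
    src_path, dest_path, first, second = sys.argv[1:5]
    swap = {first: second, second: first}
    with open(src_path) as src, open(dest_path, 'w', newline='') as dest:
        writer = csv.writer(dest, delimiter='\t')
        for row in csv.reader(src, delimiter='\t'):
            # keep the image path, exchange the two class labels everywhere else
            writer.writerow([row[0]] + [swap.get(label, label) for label in row[1:]])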

Observing the Training Progress

We've added a nice script that makes it possible to see how well the network performs at every step of the training. This progress is normally plotted to disk for each iteration and can later be used to create animations of the training progress (you can use the create_gif.py and create_video.py scripts located in mxnet/utils for this purpose). Besides this plotting to disk, it is also possible to watch the progress directly while the training is running. In order to do so, you have to do the following:

  1. Start the show_progress.py script in mxnet/utils

  2. Start the training with the following additional command line params:

    --send-bboxes --ip <localhost, or remote ip if you are working on a remote machine> --port <the port the show_progress.py script is running on (default is 1337)>

  3. Enjoy!

This tool is especially helpful in determining whether the network is learning anything or not. We recommend that you always use this tool while training.

Evaluation

If you want to evaluate already trained models you can use the evaluation scripts provided in the mxnet folder. For evaluating a model you need to do the following:

  1. Train or download a model

  2. Choose the correct evaluation script and adapt it, if necessary (take care in case you have been fiddling around with the number of timesteps and the number of RNN layers)

  3. Get the dataset you want to evaluate the model on and adapt the groundtruth file to fit the format expected by our software. The expected format is a csv (tab-separated) file that looks like this (see the example line after this list): <absolute path to image> \t <numerical labels, each label separated from the others by \t>

  4. Run the chosen evaluation script like so:

    python eval_<type>_model.py <path to model dir>/<prefix of model file> <number of epoch to test> <path to evaluation gt> <path to char map>
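
For illustration, a single line of such a ground-truth file could look like this (the path and the numerical labels are made up; the columns are separated by tabs):

    /data/eval/images/0001.png	1	6	8	0	0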

You can use eval_svhn_model.py for evaluating a model trained with CTC on the original SVHN dataset, the eval_text_recognition_model.py script for evaluating a model trained for text recognition, and the eval_fsns_model.py script for evaluating a model trained on the FSNS dataset.

License

This code is licensed under the GPLv3 license. Please see LICENSE.md for further details.

Citation

If you are using this code, please cite the following publication:

@article{bartz2017stn,
  title={STN-OCR: A single Neural Network for Text Detection and Text Recognition},
  author={Bartz, Christian and Yang, Haojin and Meinel, Christoph},
  journal={arXiv preprint arXiv:1707.08831},
  year={2017}
}

A short note on code quality

The code contains a huge number of workarounds around MXNet, as we were not able to find an easier way to do what we wanted. If you know a better way, please let us know; we would like the code to be more understandable than it is now.


stn-ocr's Issues

I read the paper and have a question: what is the order of the labels?

Assume there are N lines in an image (in the order "aaa", "bbb", "ccc", ...), each with a bbox. After the LocalizationNetwork there are N affine transformation matrices (maybe in the order "ccc", "bbb", "aaa"), but how do you decide which is which? If they are not aligned, how can the network be trained?
Or is there a prescribed order, like from top to bottom? And what will happen if the number of bboxes in the image is less or more than N?

test on dataset of my own

I tried to evaluate the FSNS model using my own dataset. It seems that the image size must be 150*150, and there is something wrong with the .csv file, which I created according to your requirements. Can you tell me the requirements for the test images and give me an example of the csv file for FSNS?

train accuracy on svhn dataset doesn't improve

Dear Bartzi,

thank you for the great job you did with STN-OCR. I implemented all the steps you described and launched the training with the train_svhn script, but I observe that after 90 epochs the train accuracy doesn't improve (always around 0.26) and the train loss is around 2.12. I don't know what happened or how to get better performance. Please find below the command line I used:

python3 train_svhn.py ../datasets/svhn/generated/centered/train.csv ../datasets/svhn/generated/centered/valid.csv --gpus 0,1 --log-dir ./logs --save-model-prefix svhn_train_model -b 100 --lr 1e-5 --zoom 0.5 -ci 500 --char-map ../datasets/svhn/svhn_char_map.json

Best Regards,

2 detail questions

@Bartzi
Sorry to bother you again, I have another 2 questions. The first is still about N: because different training images may have different numbers of words or characters, will N change during training? When I looked at the source code, I found that N is set by the num_time_steps param. If N stays the same during training, what should we do if N is larger than the number of words or characters? The second question is about the recognition network: when we get N text regions from the original image after the sampler network, how can we find the corresponding label for each text region during training? For example, if we get the 2 text regions '16' and '18', and we have the 2 labels '16' and '18', how can the label '16' be chosen for the text region '16' instead of '18' during network training?
Awaiting your reply, thanks.

Evaluation fails!

Hello Bartzi
I tried to run the SVHN evaluation and I get this error:
(screenshot of the error omitted)

the command:
python eval_svhn_model.py '/home/hthai/stn-ocr/datasets/svhn/original_svhn/models/model' 0040 '/home/hthai/stn-ocr/datasets/svhn/evaluation/test.csv' '/home/hthai/stn-ocr/datasets/svhn/svhn_char_map.json'

How can I fix it?
Thank you!

StopIteration


Hello dear B:
It seems like it only runs 10 epochs.
How can I resolve this?

plot_log does not match the log file format

The actual log file content has 4 columns, as follows:
2017-08-25 10:54:04,722 Node[0] Epoch[0] Batch [50] Speed: 106.29 samples/sec Accuracy=0.204000 Loss=2.318700
but parse_log_file only processes 3 columns.
Also, event_info = re.search(r'.*-(?P<event_name>.*)=(?P<event_value>.*)', info) does not match the log output; there is no '-' before the event name.
I corrected it by modifying if len(line_splits) == 3 to if len(line_splits) == 4 and erasing the '.*-', but then I got an error when plotting. The error was:

for metric, axe in zip(metrics_to_plot, axes):
TypeError: zip argument #2 must support iteration

(Pdb) print(metrics_to_plot)
['Accuracy']
(Pdb) print(axes)
Axes(0.125,0.11;0.775x0.77)
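
Our guess at the cause of this follow-up error (ours, not from the thread): with a single metric, matplotlib's plt.subplots returns a single Axes object rather than an array, so it cannot be zipped. Passing squeeze=False always yields a 2-D array of axes:

    import matplotlib.pyplot as plt

    metrics_to_plot = ['Accuracy']
    # squeeze=False makes subplots return a 2-D array of Axes even for a
    # single plot, so the zip below works for any number of metrics
    fig, axes = plt.subplots(len(metrics_to_plot), 1, squeeze=False)
    for metric, axe in zip(metrics_to_plot, axes[:, 0]):
        axe.set_title(metric)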

tensorflow.python.framework.errors_impl.DataLossError: truncated record at 285474855

@Bartzi, I ran into a problem when running tfrecord_to_image.py:
python tfrecord_to_image.py /home/HardDisk/research/Computer_Vision/OCR/stn-ocr/stn-ocr/datasets/fsns/fsns_data/train /home/HardDisk/research/Computer_Vision/OCR/stn-ocr/stn-ocr/datasets/fsns/fsns_data/fsns_data_train train

Error information:
Traceback (most recent call last):
  File "tfrecord_to_image.py", line 39, in <module>
    for idx, string_record in enumerate(record_iterator):
  File "/home/bob/stn-ocr-py3-env/lib/python3.4/site-packages/tensorflow/python/lib/io/tf_record.py", line 77, in tf_record_iterator
    reader.GetNext(status)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/bob/stn-ocr-py3-env/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 285474855

opencv error

Excuse me.
When I compiled the mxnet project, this error message appeared:
src/io/image_aug_default.cc:499:26: error: 'CV_BGR2HLS' was not declared in this scope
    cvtColor(res, res, CV_BGR2HLS);
src/io/image_det_aug_default.cc:561:32: error: 'CV_HLS2BGR' was not declared in this scope
    cv::cvtColor(res, res, CV_HLS2BGR);
src/io/image_aug_default.cc:519:26: error: 'CV_HLS2BGR' was not declared in this scope
    cvtColor(res, res, CV_HLS2BGR);
src/io/image_io.cc: In function 'void mxnet::io::ImdecodeImpl(int, bool, void*, size_t, mxnet::NDArray*)':
src/io/image_io.cc:175:28: error: 'CV_BGR2RGB' was not declared in this scope
    cv::cvtColor(dst, dst, CV_BGR2RGB);
Makefile:443: recipe for target 'build/src/io/image_aug_default.o' failed
make: *** [build/src/io/image_aug_default.o] Error 1
make: *** Waiting for unfinished jobs....
Makefile:443: recipe for target 'build/src/io/image_det_aug_default.o' failed
make: *** [build/src/io/image_det_aug_default.o] Error 1
Makefile:443: recipe for target 'build/src/io/image_io.o' failed
make: *** [build/src/io/image_io.o] Error 1
So I want to know the required Python and OpenCV versions, and how I should fix this problem.
Thanks!

Error in running eval_text_recognition.py

Hi,
I'm using the text recognition pretrained model downloaded from the website. I'm getting the following error when running this script. Any idea how to solve this?

python eval_text_recognition_model.py model-0002.params 10000 original_gt.txt model-symbol.json

Traceback (most recent call last):
  File "eval_text_recognition_model.py", line 97, in <module>
    reverse_char_map = {v: k for k, v in char_map.items()}
  File "eval_text_recognition_model.py", line 97, in <dictcomp>
    reverse_char_map = {v: k for k, v in char_map.items()}
TypeError: unhashable type: 'list'

Also, how many epochs should I set for the best results? I couldn't find it in the paper.

Thanks
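
Our guess at the cause (ours, not from the thread): the fourth positional argument of the script should be the char map JSON, but model-symbol.json, the network definition whose parsed JSON contains lists, was passed instead, so building the reverse map fails:

    import json

    # model-symbol.json describes the network graph; its values include
    # lists, which cannot be used as dictionary keys:
    with open('model-symbol.json') as f:
        char_map = json.load(f)
    reverse_char_map = {v: k for k, v in char_map.items()}  # TypeError: unhashable type: 'list'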

compiling on Windows

Hi. I want to know whether the code can run on a Windows platform, since it is not clearly stated that it can't. I tried to run the code, but warp-ctc can't be compiled on Windows. How can I make it work?

Install error

$ python setup.py build_ext --inplace
Warning: Extension name 'ctc_loss' does not match fully qualified name 'metrics.ctc.ctc_loss' of 'ctc_loss.pyx'
running build_ext
building 'ctc_loss' extension
gcc -pthread -B /home/jxf/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/jxf/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/home/jxf/anaconda3/include/python3.6m -c ctc_loss.cpp -o build/temp.linux-x86_64-3.6/ctc_loss.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
ctc_loss.cpp:509:17: fatal error: ctc.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

question about the N grids in paper

Hello, I have a question about the N grids in the paper. The paper says:

The first is the localization network that takes the input image and
predicts N transformation matrices, that are applied to N identical grids, forming N different sampling grids

How can we know the value of N?

train original svhn datasets

Excuse me:
I noticed that your code gives the training steps for the models on the two variant SVHN datasets, but the training steps for the model on the original SVHN dataset are not given. If I want to train the model on the original SVHN dataset, how should it be preprocessed? For example, to what size should the SVHN images be resized?
Looking forward to your reply. Thank you very much!

Shape error in eval_svhn_model.py for SVHN demos.

Hi, I was trying to run your demos but I only managed to make it work for the original_svhn model. I also tried to train one myself, but in the end it raises the same size error.

When I do:

python eval_svhn_model.py ../datasets/svhn/models/original_svhn/models/model 40 ../datasets/svhn/evaluation/test.csv ../datasets/svhn/svhn_char_map.json

Works perfectly.

However, when I try:

python eval_svhn_model.py ../datasets/svhn/models/regular_grid/model 19 ../datasets/svhn/evaluation/test.csv ../datasets/svhn/svhn_char_map.json

It raises the following error. I have tried to pass a different --input-width and --input-height, but it seems that the problem is not there.

[16:54:45] src/nnvm/legacy_json_util.cc:153: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[16:54:45] /home/sorelyss/Documents/test/incubator-mxnet/dmlc-core/include/dmlc/./logging.h:300: [16:54:45] src/ndarray/ndarray.cc:239: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (48,48,3,3) to.shape=(64,64,3,3)

Stack trace returned 25 entries:
[bt] (0) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fbe48041d6c]
[bt] (1) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayEPS0_i+0x437) [0x7fbe48832997]
[bt] (2) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x9d853a) [0x7fbe487cf53a]
[bt] (3) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvoke+0x1034) [0x7fbe48aca674]
[bt] (4) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x1312c) [0x7fbe3b8d512c]
[bt] (5) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x140ed) [0x7fbe3b8d60ed]
[bt] (6) python(PyObject_Call+0x47) [0x5c1797]
[bt] (7) python(PyEval_EvalFrameEx+0x4ec6) [0x53bba6]
[bt] (8) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]
[bt] (9) python() [0x5406df]
[bt] (10) python(PyEval_EvalFrameEx+0x54f0) [0x53c1d0]
[bt] (11) python() [0x5406df]
[bt] (12) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]
[bt] (13) python() [0x540199]
[bt] (14) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]
[bt] (15) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]
[bt] (16) python() [0x540199]
[bt] (17) python(PyEval_EvalCode+0x1f) [0x540e4f]
[bt] (18) python() [0x60c272]
[bt] (19) python(PyRun_FileExFlags+0x9a) [0x60e71a]
[bt] (20) python(PyRun_SimpleFileExFlags+0x1bc) [0x60ef0c]
[bt] (21) python(Py_Main+0x456) [0x63fb26]
[bt] (22) python(main+0xe1) [0x4cfeb1]
[bt] (23) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbe53795830]
[bt] (24) python(_start+0x29) [0x5d6049]

Traceback (most recent call last):
  File "eval_svhn_model.py", line 109, in <module>
    model = get_model(args, data_shape, output_size)
  File "eval_svhn_model.py", line 58, in get_model
    model.set_params(arg_params, aux_params)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/base_module.py", line 557, in set_params
    allow_missing=allow_missing, force_init=force_init)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/module.py", line 261, in init_params
    _impl(name, arr, arg_params)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/module.py", line 251, in _impl
    cache_arr.copyto(arr)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/ndarray.py", line 556, in copyto
    return _internal._copyto(self, out=other)
  File "mxnet/cython/ndarray.pyx", line 167, in ndarray._make_ndarray_function.generic_ndarray_function
  File "mxnet/cython/./base.pyi", line 36, in ndarray.CALL
mxnet.base.MXNetError: b'[16:54:45] src/ndarray/ndarray.cc:239: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (48,48,3,3) to.shape=(64,64,3,3)' (the stack trace embedded in the exception message is identical to the one printed above)

Training does not end.

I have issued the command for training (svhn) as per the instructions. It does not progress at all.
##########################################################################
Command : python train_svhn.py /home/aditya/stn-ocr/generated/centered/train.csv /home/aditya/stn-ocr/generated/centered/valid.csv --log-dir /home/aditya/stn-ocr -b 400 --lr 1e-5

/home/aditya/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
loading data
2018-10-29 13:53:20,201 Node[0] start with arguments Namespace(batch_size=400, blank_label=0, char_map=None, checkpoint_interval=None, eval_image=None, fix_loc=False, gif=False, gpus=None, ip=None, kv_store='local', load_epoch=None, log_dir='/home/aditya/stn-ocr/2018-10-29T13:53:16.415078_training', log_file='/home/aditya/stn-ocr/2018-10-29T13:53:16.415078_training/log', log_level='INFO', log_name='training', lr=1e-05, lr_factor=1, lr_factor_epoch=1, model_prefix=None, num_epochs=10, plot_network_graph=False, port=1337, progressbar=False, save_model_prefix=None, send_bboxes=False, train_file='/home/aditya/stn-ocr/generated/centered/train.csv', val_file='/home/aditya/stn-ocr/generated/centered/valid.csv', video=False, zoom=0.9)
2018-10-29 13:53:20,202 Node[0] EPOCH SIZE: 250
2018-10-29 13:53:20,226 Node[0] Start training with [cpu(0)]

############################################################################

It stops right there. No progress.

no file /train/0.png

Hello dear Bartzi,
FileNotFoundError: [Errno 2] No such file or directory: '/train/0.png'
What am I supposed to do with that?

Training

Can you tell me the exact steps to train the model?
With all the datasets, and to what extent it should be trained, along with the learning rates and so on. Please help me out, brother.

I have 3 questions

Q1: Do we need to use the "eval_text_recognition_model.py" file to perform text recognition?
Q2: Can you provide us with a pre-trained model?
Q3: Is this system capable of recognizing a single text in an image, or a line of text containing multiple characters?

About LSTM in loc-net

  1. Why is an LSTM used in the loc-net?
    I saw in the paper: "This BLSTM is used to generate the hidden states hn, which in turn are used to predict the affine transformation matrices". Why not directly use the flattened feature to predict the affine transformation matrices?
  2. Why is the LSTM input always the same?
    In the code, the flattened feature is copied num_timestep times to form the num_timestep inputs of the LSTM. These features are exactly the same, so why design it this way? And if so, the reverse direction in the BLSTM should be useless.
  3. How to choose the output matrices?
    If there are fewer bboxes than num_timestep, how do I find which affine transformation matrices are the preferred bbox parameters?
    Can you explain this to me? I am a little bit confused about the paper!

Error in installation: /usr/bin/ld: cannot find -lwarpctc

Hello @Bartzi

I followed the installation instructions and got up to step 4 (i.e. install warp-ctc from here).

When I ran python setup.py build_ext --inplace it initially gave me
ctc_loss.cpp:509:17: fatal error: ctc.h: No such file or directory, similar to this.

When I added the include folder from warp-ctc to $CPLUS_INCLUDE_PATH, that was gone.

But now it is giving me this error: /usr/bin/ld: cannot find -lwarpctc

Here is the full stack trace, if that helps.

Can you please give more details about how to install warp-ctc and how to verify it is installed correctly (w.r.t. this project)?

Some details about the environment: Ubuntu 16.04, no GPU, Python 3.5.2. Let me know if any other details would be useful to resolve it.

Thanks.
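
Our guess at a fix (ours, not from the thread; the paths are assumptions): point the compiler and the linker at the warp-ctc build directory before rebuilding the extension:

    # assumed warp-ctc checkout location: $HOME/warp-ctc
    export CPLUS_INCLUDE_PATH=$HOME/warp-ctc/include:$CPLUS_INCLUDE_PATH
    export LIBRARY_PATH=$HOME/warp-ctc/build:$LIBRARY_PATH
    export LD_LIBRARY_PATH=$HOME/warp-ctc/build:$LD_LIBRARY_PATH
    python setup.py build_ext --inplace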

load pretrained model error

Hello, I encountered another problem when loading the pre-trained model, as shown in the following figures:
(screenshots of the error omitted)

When calling python svhn_train.py --model_prefix as provided by you, the corresponding model file can never be found; but when I switched to that directory, I found that the corresponding model file is there, which is strange.
The only difference between my setup and yours is that my mxnet version is 1.0.0 instead of 0.9.3, but I think the model-loading functionality of the two versions should be the same, so it should not cause the error.
In addition, I would like to ask: is there a difference between see-ocr and stn-ocr? Are the two models exactly the same? What is the difference between the two?

Looking forward to your reply!

Cannot train on fsns data set

Hi Bartzi:
I tried to use train_fsns.py to train on the FSNS data set.
I get the following error messages:
If I use the argument --eval-image:
(screenshot of the error omitted)
If I don't use the --eval-image argument and just load train_file and val_file:
(screenshot of the error omitted)

My development environment:
ubuntu 16.04
cuda 8.0
cudnn 5.0
mxnet 0.9.3

Thank you very much.

purpose-built svhn dataset

The purpose-built SVHN dataset link in the readme leads to:

You will shortly be able to see all information about our new paper
"SEE: Towards Semi-Supervised End-to-End Scene Text Recognition".

Where is the dataset itself?

Stop Iteration exception raised while training.

While running the train_fsns.py file on a few samples of the training data, I get a StopIteration exception. It is raised when the lstm_iter.py module is called at line 123 through the statement first_batch = next(iter(val_iter)).
(screenshot of the exception omitted)
