
crnn's Introduction

Convolutional Recurrent Neural Network

This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. For details, please refer to our paper http://arxiv.org/abs/1507.05717.

UPDATE Mar 14, 2017 A Docker file has been added to the project. Thanks to @varun-suresh.

UPDATE May 1, 2017 A PyTorch port has been made by @meijieru.

UPDATE Jun 19, 2017 For an end-to-end text detector+recognizer, check out the CTPN+CRNN implementation by @AKSHAYUBHAT.

Build

The software has only been tested on Ubuntu 14.04 (x64); a CUDA-enabled GPU is required. Before building the project, install the latest versions of Torch7, fblualib and LMDB, following their respective installation instructions. On Ubuntu, LMDB can be installed with apt-get install liblmdb-dev.

Then go to src/ and execute sh build_cpp.sh to build the C++ code. If successful, a file named libcrnn.so will be produced in the src/ directory.

Run demo

A demo program can be found in src/demo.lua. Before running the demo, download a pretrained model from here. Put the downloaded model file crnn_demo_model.t7 into the directory model/crnn_demo/, then launch the demo with:

th demo.lua

The demo reads an example image and recognizes its text content.

Example image: (image omitted)

Expected output:

Loading model...
Model loaded from ../model/crnn_demo/model.t7
Recognized text: available (raw: a-----v--a-i-l-a-bb-l-e---)

Another example image: (image omitted)

Recognized text: shakeshack (raw: ss-h-a--k-e-ssh--aa-c--k--)

Use pretrained model

The pretrained model can be used for lexicon-free and lexicon-based recognition tasks. Refer to the functions recognizeImageLexiconFree and recognizeImageWithLexicion in file utilities.lua for details.
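
As a rough illustration only (the exact call signatures live in src/utilities.lua and src/demo.lua and may differ), lexicon-free recognition of a single image might look like the sketch below. The image path and the returned (text, raw) pair are assumptions, and demo.lua also loads the custom CRNN modules (libcrnn.so, the LSTM layers, etc.) before this point.

    -- Hedged sketch, assumed to be run with th from src/, like demo.lua
    dofile('utilities.lua')

    local model = torch.load('../model/crnn_demo/crnn_demo_model.t7')  -- the downloaded model file
    local img = loadAndResizeImage('../data/demo.png')                 -- hypothetical example image path
    local text, raw = recognizeImageLexiconFree(model, img)            -- signature assumed from the name
    print(string.format('Recognized text: %s (raw: %s)', text, raw))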

Train a new model

Follow these steps to train a new model on your own dataset.

  1. Create a new LMDB dataset. A Python script is provided in tool/create_dataset.py; refer to the function createDataset for details (run pip install lmdb first).
  2. Create a model directory under model/, for example model/foo_model, and create a configuration file config.lua inside it. You can copy model/crnn_demo/config.lua and modify it; a sketch of the configurable fields follows this list.
  3. Go to src/ and execute th main_train.lua ../model/foo_model/. Model snapshots and the log file will be saved into the model directory.
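
For orientation, here is a minimal config.lua sketch. The field names follow the config posted in one of the issues below; the concrete values are placeholders to adapt to your own dataset, and the real model/crnn_demo/config.lua additionally defines createModel(config), so copying that file and editing only these fields is usually the safer route.

    -- Hypothetical config.lua for model/foo_model (all values are placeholders)
    function getConfig()
        local config = {
            nClasses         = 36,        -- size of the label alphabet (excluding the CTC blank)
            maxT             = 26,        -- number of time steps the model outputs
            displayInterval  = 100,
            testInterval     = 1000,
            nTestDisplay     = 15,
            trainBatchSize   = 64,
            valBatchSize     = 100,
            snapshotInterval = 1000,
            maxIterations    = 2000000,
            optimMethod      = optim.adadelta,
            optimConfig      = {},
            trainSetPath     = '../data/foo_train_lmdb/data.mdb',  -- LMDB built by tool/create_dataset.py
            valSetPath       = '../data/foo_val_lmdb/data.mdb',
        }
        return config
    end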

Build using docker

  1. Install docker. Follow the instructions here
  2. Install nvidia-docker - Follow the instructions here
  3. Clone this repo, from this directory run docker build -t crnn_docker .
  4. Once the image is built, the docker can be run using nvidia-docker run -it crnn_docker.

Citation

Please cite the following paper if you use the code or model in your research.

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Acknowledgements

The authors would like to thank the developers of Torch7, TH++, lmdb-lua-ffi and char-rnn.

Please let me know if you encounter any issues.

crnn's People

Contributors

bgshih, linjm, varun-suresh


crnn's Issues

training errors with my own mdb datasets

I just used the create_dataset.py tool to create my own training and validation datasets, but when running training it throws the following exception:
/disk1/deeplearning/torch/install/bin/luajit: bad argument #3 to '?' (torch.*Tensor expected, got nil)
stack traceback:
[C]: at 0x7f900fc2f8b0
[C]: in function '__newindex'
./utilities.lua:33: in function 'str2label'
./DatasetLmdb.lua:82: in function 'allImageLabel'
./training.lua:84: in function 'trainModel'
main_train.lua:51: in main chunk
[C]: in function 'dofile'
...ning/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x004064d0

Can anyone help me fix this error?

thanks

What version does this program work with? (32-bit or 64-bit)

I am trying to use Unicode to learn more than 10,000 characters, but I keep getting an error message saying that memory is insufficient. I tracked memory usage and saw that it stopped at 3.3 GB. Does this program only work in 32-bit, or did I make a mistake?

fblualib version

Hello,

thanks for sharing CRNN!
Unfortunately installing fblualib is a bit of a pain. We're trying to install it on Ubuntu 14.04.
Could you tell me which version of fblualib and dependencies (folly, wangle, thrift, thpp) you are using? I think using exactly the same versions as you did would be the easiest way to get everything running.

Thanks in advance
Harald

Query - Why is it necessary to have images of fixed width

@bgshih In the paper it is mentioned that images are resized to a height of 32 while width is adjusted by keeping the original aspect ratio. But I see that the code resizes all images to a fixed size 32*100.
img = Image.scale(img, 100, 32)[1] in function loadAndResizeImage

Is it necessary to do so when the architecture supports variable-length sequences?

Is the BiRnnJoin implementation right?

I found an interesting point in your implementation of the BLSTM:

    local fwdProj = nn.Linear(nIn, nOut)(fwdX)
    local bwdProj = nn.Linear(nIn, nOut)(bwdX)

    local output = nn.CAddTable()({fwdProj, bwdProj})

I see you add two Linear projections of fwdX and bwdX, but in other implementations the two directions are concatenated and then fully connected to the output layer.
Is there something wrong? Thank you.
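
A minimal sketch of the alternative the question describes (concatenating the two directions before a single linear projection), written in the same nngraph style as the snippet above; fwdX and bwdX are assumed to be the per-timestep forward and backward outputs of size nIn each:

    -- Concatenate-then-project variant (sketch)
    local joined = nn.JoinTable(1, 1)({fwdX, bwdX})   -- feature dimension becomes 2*nIn
    local output = nn.Linear(2 * nIn, nOut)(joined)

    -- The repo instead sums two separate projections:
    --   nn.CAddTable()({nn.Linear(nIn, nOut)(fwdX), nn.Linear(nIn, nOut)(bwdX)})
    -- Summing two nIn->nOut linear maps is equivalent to a single (2*nIn)->nOut map
    -- whose weight matrix is split in half, so the two forms express the same function.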

model:add(cudnn.SpatialMaxPooling(2, 2, 1, 2, 1, 0))?

Your paper mentions that "In the 3rd and the 4th maxpooling layers, we adopt 1 × 2 sized rectangular pooling windows instead of the conventional squared ones". Should I change model:add(cudnn.SpatialMaxPooling(2, 2, 1, 2, 1, 0)) to model:add(cudnn.SpatialMaxPooling(1, 2, 1, 2, 1, 0))? Would that make it better?

Why is there no reply?

In the README you say "Please let me know if you encounter any issues,"
but there has been no reply.

Only 62% accuracy. How can I make it better?

I trained a new model using crnn_demo_model.t7 as a pretrained model. The iteration count is now 30000; the training loss is 0.0005, but the test loss is 4.6, and accuracy has stayed around 62% for a long time.
Is anything wrong with it?
What can I do to make it better?
@bgshih

How to create the user DB file?

What is the type (or format) of the parameter "labelList" in the function createDataset? Are the multiple labels of one image separated by spaces? Does the labelList "1 2 3" correspond to an image that contains the characters "123" in a line?

Compilation failed, what was the environment for your program? Much thanks

@bgshih Dear author

As we are trying to reproduce your work, we found that the program is no longer compatible with the latest folly, fbthrift, thpp, and fblualib.
The solution in #1 no longer works either, because that version is not compatible with the latest Ubuntu 16.04.

If updating your program to the latest thpp, fblualib, folly and fbthrift would require too much effort, could you let us know the environment you used?
The version of: linux (ubuntu 14.04, 15.04, 16.04, etc.?)
torch (torch 7?)
folly (roughly which time period?)
fbthrift (roughly which time period?)
thpp (roughly which time period?)
fblualib (roughly which time period?)
cuda
cudnn

Thank you

Problem with running docker

I was building the docker image with
docker build -t crnn_docker .
but I am getting the following error. I checked that the file is not busy using lsof.
I am running Ubuntu 16.04.


  • echo
  • echo 'Installing TH++'
  • echo
  • cd /tmp/fblualib-build.gMtGOu/thpp/thpp
  • '[' 0 -eq 0 ']'
  • mv /root/thpp_build.sh build.sh
  • chmod +x build.sh
  • ./build.sh
    ./install_all.sh: ./build.sh: /bin/bash: bad interpreter: Text file busy
    The command '/bin/sh -c ./install_all.sh' returned a non-zero code: 126

How to change the code to train on RGB images?

I want to change your code to train on RGB images.
In the code below, I guess the 1 means a grayscale (single-channel) image?
But the width and height of the training image are not indicated there?
If I change the input to RGB, do I change "local nIn = nm[i-1] or 1" to "local nIn = nm[i-1] or 3", is that right?

(screenshot of the code omitted)
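
For what it's worth, a hedged sketch (not a confirmed fix) of the two places that would likely need to change for 3-channel input, based on the convRelu and loadAndResizeImage snippets quoted elsewhere in these issues:

    -- In convRelu: the first conv layer takes 3 input planes instead of 1
    local nIn = nm[i-1] or 3     -- was: nm[i-1] or 1

    -- In loadAndResizeImage: keep all 3 channels instead of converting to luminance
    function loadAndResizeImage(imagePath)
        local img = Image.load(imagePath, 3, 'byte')
        -- img = Image.rgb2y(img)        -- drop the grayscale conversion
        img = Image.scale(img, 100, 32)  -- keep a 3x32x100 tensor (no [1] indexing)
        return img
    end

Other places that assert single-channel input (for example the check in recognizeImageLexiconFree quoted in a later issue) would presumably need matching changes.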

Problems at training OMR dataset

I tried to train crnn on the dataset PitchRec_dataset\OMRB\TrainSet (2345 images) and got "train loss = inf", like this:

[05/23/16 11:18:15] Loading datasets...
[05/23/16 11:18:15] Start training...
[05/23/16 11:18:15] Validating...
[05/23/16 11:18:17] Test loss = 75.251866, accuracy = 0.000000
[05/23/16 11:18:17] dddddddddddddddddddddddddd => d (GT:aadee )
[05/23/16 11:18:17] dd1ddddddddddddddddddddddd => d1d (GT:ghihhi )
[05/23/16 11:18:17] 7ddddddddddddddddddddddddh => 7dh (GT:dcbdddfdfg )
[05/23/16 11:18:17] dd1dddd111111111111111111h => d1d1h (GT:gfggfghhhj )
[05/23/16 11:18:17] dd11dddddddddddddddddddddd => d1d (GT:ihfcba )
[05/23/16 11:18:17] 711ddddddddddddddddddddddh => 71dh (GT:ihgh )
[05/23/16 11:18:17] d11111111111ddddd111111ddh => d1d1dh (GT:fjjejjdjjc )
[05/23/16 11:18:17] 7d1111dddddddddddd11dd111h => 7d1d1d1h (GT:gccbcchgg )
[05/23/16 11:18:17] dddddddddddddddddddddddddh => dh (GT:adeef )
[05/23/16 11:18:17] dddddddddddddddddddddddddh => dh (GT:fgfe )
[05/23/16 11:18:17] 7d11ddddddddddddddddd1111h => 7d1d1h (GT:iffefhg )
[05/23/16 11:18:17] dddddddddddddddddddddddddd => d (GT:edcb )
[05/23/16 11:18:17] dddddddddddddddddddddddd1h => d1h (GT:fhhfhhcee )
[05/23/16 11:18:17] dddddddddddddddddddddddddh => dh (GT:bccddded )
[05/23/16 11:18:17] dd1dddddddddddddd1111ddddh => d1d1dh (GT:chi )
[05/23/16 11:18:47] Iteration 100 - train loss = inf
[05/23/16 11:19:17] Iteration 200 - train loss = inf
[05/23/16 11:19:47] Iteration 300 - train loss = inf
[05/23/16 11:20:17] Iteration 400 - train loss = inf
[05/23/16 11:20:47] Iteration 500 - train loss = inf
[05/23/16 11:21:17] Iteration 600 - train loss = inf
[05/23/16 11:21:45] Iteration 700 - train loss = inf
[05/23/16 11:22:11] Iteration 800 - train loss = inf
[05/23/16 11:22:37] Iteration 900 - train loss = inf
[05/23/16 11:23:04] Iteration 1000 - train loss = inf

My config.lua is as follows:
function getConfig()
    local config = {
        nClasses = 36,
        maxT = 26,
        displayInterval = 100,
        testInterval = 1000,
        nTestDisplay = 15,
        trainBatchSize = 64,
        valBatchSize = 100,
        snapshotInterval = 1000,
        maxIterations = 2000000,
        optimMethod = optim.adadelta,
        optimConfig = {},
        trainSetPath = '../../PitchRec_dataset/LMDB/TrainSet/data.mdb',
        valSetPath = '../../PitchRec_dataset/LMDB/Synthesized/data.mdb',
    }
    return config
end

function createModel(config)
    local nc = config.nClasses
    local nl = nc + 1
    local nt = config.maxT

    local ks = {3, 3, 3, 3, 3, 3, 2}
    local ps = {1, 1, 1, 1, 1, 1, 0}
    local ss = {1, 1, 1, 1, 1, 1, 1}
    local nm = {64, 128, 256, 256, 512, 512, 512}
    local nh = {256, 256}

    function convRelu(i, batchNormalization)
        batchNormalization = batchNormalization or false
        local nIn = nm[i-1] or 1
        local nOut = nm[i]
        local subModel = nn.Sequential()
        local conv = cudnn.SpatialConvolution(nIn, nOut, ks[i], ks[i], ss[i], ss[i], ps[i], ps[i])
        subModel:add(conv)
        if batchNormalization then
            subModel:add(nn.SpatialBatchNormalization(nOut))
        end
        subModel:add(cudnn.ReLU(true))
        return subModel
    end

    function bidirectionalLSTM(nIn, nHidden, nOut, maxT)
        local fwdLstm = nn.LstmLayer(nIn, nHidden, maxT, 0, false)
        local bwdLstm = nn.LstmLayer(nIn, nHidden, maxT, 0, true)
        local ct = nn.ConcatTable():add(fwdLstm):add(bwdLstm)
        local blstm = nn.Sequential():add(ct):add(nn.BiRnnJoin(nHidden, nOut, maxT))
        return blstm
    end

    -- model and criterion
    local model = nn.Sequential()
    model:add(nn.Copy('torch.ByteTensor', 'torch.CudaTensor', false, true))
    model:add(nn.AddConstant(-128.0))
    model:add(nn.MulConstant(1.0 / 128))
    model:add(convRelu(1))
    model:add(cudnn.SpatialMaxPooling(2, 2, 2, 2))       -- 64x16x50
    model:add(convRelu(2))
    model:add(cudnn.SpatialMaxPooling(2, 2, 2, 2))       -- 128x8x25
    model:add(convRelu(3, true))
    model:add(convRelu(4))
    model:add(cudnn.SpatialMaxPooling(2, 2, 1, 2, 1, 0)) -- 256x4x?
    model:add(convRelu(5, true))
    model:add(convRelu(6))
    model:add(cudnn.SpatialMaxPooling(2, 2, 1, 2, 1, 0)) -- 512x2x26
    model:add(convRelu(7, true))                         -- 512x1x26
    model:add(nn.View(512, -1):setNumInputDims(3))       -- 512x26
    model:add(nn.Transpose({2, 3}))                      -- 26x512
    model:add(nn.SplitTable(2, 3))
    model:add(bidirectionalLSTM(512, 256, 256, nt))
    model:add(bidirectionalLSTM(256, 256,  nl, nt))
    model:add(nn.SharedParallelTable(nn.LogSoftMax(), nt))
    model:add(nn.JoinTable(1, 1))
    model:add(nn.View(-1, nl):setNumInputDims(1))
    model:add(nn.Copy('torch.CudaTensor', 'torch.FloatTensor', false, true))
    model:cuda()
    local criterion = nn.CtcCriterion()

    return model, criterion
end

Then I tried removing the 4th and 6th conv layers, following your paper (I did not replace the BiLSTMs with single LSTMs, because I'm new to Torch), but it doesn't work.

Thanks.

Can anyone give an example training image and label?

Hello:
I want to train a new model with more characters than English.
But I don't know what the training images and their labels should look like (example images might be like the ones below).
For a character like "你", how many images should I prepare, and what size and label should they have?

(screenshots omitted)

increasing variable length

Hi, I've been trying to modify this architecture/model to recognize character sequences of up to 40-50 characters, to no avail. I've noticed that maxT = 26 sets the maximum length to 26, but a simple change does not do the trick. Any suggestions on where to make changes?

Set Learning Rate

I cannot find any setting for the learning rate in config.lua, and the current learning rate is not displayed at each iteration during training. Do you have any idea about this?
Thanks!

some errors in create_dataset.py

There are some typos in create_dataset.py.

line 60:
it should be "if cnt % 1000 == 0"

line 66:
When I run the code on Ubuntu 14.04, I get the following error:

Traceback (most recent call last):
File "create_dataset.py", line 83, in
createDataset(outPath, imagePathList, labelList)
File "create_dataset.py", line 67, in createDataset
writeCache(env, cache)
File "create_dataset.py", line 21, in writeCache
txn.put(k, v)
TypeError: expected a readable buffer object

I think that line 66 should be revised to cache['num-samples'] = str(nSamples); then the code works.

Is a PyTorch version possible?

This project is great, but it is not easy to use (the Lua language and some libs...). Would it be possible to port it to PyTorch?

Pretrained model training set contains IIIT5k images too?

I ran the demo code using the pretrained model and I seem to be getting around 86% word-level accuracy on the IIIT5k dataset, whereas the paper reports ~80% accuracy. Were there any other pre/post-processing steps involved in training the provided pretrained model?

Confidence score of OCR

Hi,

Is there any way we can get some kind of confidence score for the OCR performed by CRNN?
If not, can someone suggest a possible way to do it?

Thanks
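
Not an official answer, but since the network ends in a per-frame LogSoftMax (see the model definition quoted in an earlier issue), one crude heuristic is to average the best per-frame log-probability; a hedged sketch:

    -- Crude confidence heuristic (not part of the repo).
    -- `output` is assumed to be the T x (nClasses+1) log-probability matrix
    -- from a forward pass, one row per time step.
    local function confidenceScore(output)
        local maxLogProbs = output:max(2)   -- best log-probability at each time step
        return maxLogProbs:mean()           -- average log-probability; exp() maps it into (0, 1]
    end

A more principled score would be the CTC probability of the decoded string, which would require calling into the CTC code with the decoded label sequence as the target.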

A question about the computation of fvars and bvars in the CTC code

Suppose the given log-softmax output is as follows:
(screenshot of the log-softmax values omitted)

When the CTC code computes fvars:
fvars.at({0, 0}) = input_i.at({0, 0}); if (nSegment > 1) { fvars.at({0, 1}) = input_i.at({0, targetData_i[0]}); }
When I run your code, the output is:
(screenshot "torch_alpha" omitted)

But based on my reading of the code, I think the output should be:
(screenshot "caffe_alpha" omitted)

What I don't understand is:
1) Why are the computed fvars({0,0}) and fvars({0,1}) equal to 0?

Many thanks.

Training on manual dataset : Error : terminate called after throwing an instance of 'std::invalid_argument'

I am trying to run this code with my own dataset. I created the LMDB files using the code snippets provided in the "tools" folder. However, I get the following error when running the code:

[11/10/16 15:34:52]  Loading datasets...
../data/oxford_train_4M_lmdb/data.mdb   
y../data/svt_lmdb/data.mdb

[11/10/16 15:34:53]  Start training...  
[11/10/16 15:34:54]  Validating...
terminate called after throwing an instance of 'std::invalid_argument'
terminate called recursively
Aborted (core dumped)

and sometimes, this,

[11/10/16 15:39:08]  Loading datasets...	
../data/oxford_train_4M_lmdb/data.mdb	
../data/svt_lmdb/data.mdb	
[11/10/16 15:39:10]  Start training...	
[11/10/16 15:39:12]  Validating...	
terminate called after throwing an instance of 'std::invalid_argument'
  what():  index out of range
Aborted (core dumped)

Can someone help me find out what I am doing wrong?

Does the code restrict the input length or not?

function loadAndResizeImage(imagePath)
    local img = Image.load(imagePath, 3, 'byte')
    img = Image.rgb2y(img)
    img = Image.scale(img, 100, 32)[1]
    return img
end

Hello:
In this function the image is scaled to a fixed width and height, which seems to contradict the paper, where one side is fixed and the other is not?

THC_LIBRARY not found

My Torch7 installation is at ${HOME}/torch. After installing fblualib, I tried to make crnn, but the following error occurs. I have no idea what THC_LIBRARY is. Can anyone help me?

mkdir: cannot create directory ‘build’: File exists
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
THC_LIBRARY
linked by target "crnn" in directory /home/fd/Worspace/crnn/src/cpp

-- Configuring incomplete, errors occurred!
See also "/home/fd/Worspace/crnn/src/cpp/build/CMakeFiles/CMakeOutput.log".
make: *** No targets specified and no makefile found. Stop.
cp: cannot stat '*.so': No such file or directory

Then I go to the crnn/src/cpp/build folder and execute cmake .., which gives:

-- The C compiler identification is GNU 5.3.1
-- The CXX compiler identification is GNU 5.3.1
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
THC_LIBRARY
linked by target "crnn" in directory /home/fd/Worspace/crnn/src/cpp

-- Configuring incomplete, errors occurred!
See also "/home/fd/Worspace/crnn/src/cpp/build/CMakeFiles/CMakeOutput.log".

Add STN to crnn

@bgshih crnn is a very nice project, thanks for open-sourcing it. Have you ever added an STN to crnn? I tried adding the STN layer (https://github.com/qassemoquab/stnbhwd) to crnn, but the training loss stays very large. I have already initialized the transform matrix to the identity, but it looks like the STN layer learns nothing during training. Should I try the SGD optimization method instead of adadelta or others?

Training of the provided model

Hi @bgshih,

I wanted to ask how you trained the provided model. Its performance on ICDAR 2013 is pretty good (88.7% accuracy). When I tried to train my own model based on the Jaderberg synthetic model, I only achieved an accuracy of 87%.

Best.

Check the function recognizeImageLexiconFree?

assert(image:dim() == 2 and image:type() == 'torch.ByteTensor',
'Input image should be single-channel byte tensor')

Shouldn't dim() == 2 be dim() == 1, since the image should be single-channel?
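
For context (not an authoritative answer): loadAndResizeImage, as quoted in an earlier issue, returns Image.scale(img, 100, 32)[1], i.e. a 2D HxW byte tensor, so a single-channel image in this codebase does have dim() == 2:

    -- Hedged illustration of what the assert expects
    local img = torch.ByteTensor(32, 100)   -- roughly what loadAndResizeImage returns
    print(img:dim())                        -- prints 2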

Question for CTC decoding

I am not clear why the first position of the time sequence is always predicted as the first character of the given string, like this:

(screenshot omitted)

I think the first label of most sequences should instead be a blank, such as:

(screenshot omitted)

What do you think?

Issues faced when changing Image Height from 32

Hello,
I wanted to tweak the dimensions of the input image. So far I am able to change the image width, but the program fails to run when I change the image height.
Places where I alter the code:

  1. demo.lua "local imgW, imgH = 200, 32"
  2. DatasetLmdb.lua "function DatasetLmdb:nextBatch()"
  3. DatasetLmdb.lua "function DatasetLmdb:allImageLabel(nSampleMax)"

Please advise how to make it work when changing the image height.

P.S.: A sincere thanks for writing such a wonderfully thought-out program.

Number of training steps for results from paper

Hi,

I was wondering how many training steps you trained your model for to reach the accuracy reported in the paper. You mention that it takes about 50 hours to train the model (on a Tesla K40). I wanted to replicate the results, but I ran into a much longer training time (about 260 hours) to complete the 2,000,000 iterations (maxIterations) used in config.lua. I'm using a Tesla K20, but I don't think the difference should be that big.

Compilation error with the current TH++

Hi, I get compilation errors when building the cpp part. I suspect that I have a wrong version of TH++ but there does not seem to be a better candidate in the thpp project history (v1.0 seems too old).

Also, I'm building it on a newer Ubuntu (15.10), but I don't see how this could cause the following compilation errors. I tried both g++ version 4.9 and 5.2. Any hint appreciated!

~/crnn/src$ ./build_cpp.sh -- The C compiler identification is GNU 4.9.3
-- The CXX compiler identification is GNU 4.9.3
[...]
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Configuring done
-- Generating done
-- Build files have been written to: /home/alena/crnn/src/cpp/build
Scanning dependencies of target crnn
[ 50%] Building CXX object CMakeFiles/crnn.dir/init.cpp.o
[100%] Building CXX object CMakeFiles/crnn.dir/ctc.cpp.o
/home/alena/crnn/src/cpp/ctc.cpp: In instantiation of ‘int {anonymous}::forwardBackward(lua_State_) [with T = float; lua_State = lua_State]’:
/home/alena/crnn/src/cpp/ctc.cpp:194:16: required from ‘const luaL_Reg {anonymous}::Registerer::functions_ [3]’
/home/alena/crnn/src/cpp/ctc.cpp:203:44: required from ‘static void {anonymous}::Registerer::registerFunctions(lua_State_) [with T = float; lua_State = lua_State]’
/home/alena/crnn/src/cpp/ctc.cpp:210:24: required from here
/home/alena/crnn/src/cpp/ctc.cpp:22:76: error: conversion from ‘thpp::TensorBase<float, thpp::Storage, thpp::Tensor >::Ptr {aka thpp::TensorPtrthpp::Tensor}’ to non-scalar type ‘const thpp::Tensor’ requested
const thpp::Tensor input = fblualib::luaGetTensorChecked(L, 1);
^
/home/alena/crnn/src/cpp/ctc.cpp:23:78: error: conversion from ‘thpp::TensorBase<int, thpp::Storage, thpp::Tensor >::Ptr {aka thpp::TensorPtrthpp::Tensor}’ to non-scalar type ‘const thpp::Tensor’ requested
const thpp::Tensor targets = fblualib::luaGetTensorChecked(L, 2);
^

Any particular reason for working on Grayscale images? Why not RGB?

Hi,

I have made changes in crnn to make it work with RGB images, but I am curious why the author wrote it to work on grayscale images only. Is it because they achieved higher accuracy on grayscale compared to RGB images?
I will be doing some tests to verify it myself, but just wanted to know if anyone here can share their views/experiences about the same.

Make error

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
THPP_LIBRARY
linked by target "crnn" in directory /home/ce/Documents/crnn/src/cpp

-- Configuring incomplete, errors occurred!
See also "/home/ce/Documents/crnn/src/cpp/build/CMakeFiles/CMakeOutput.log".
make: *** No targets specified and no makefile found. Stop.
cp: cannot stat ‘*.so’: No such file or directory

make error

/home/bbnc/text/crnn/src/cpp/ctc.cpp:153:73: error: invalid initialization of reference of type ‘const thpp::Tensor&’ from expression of type ‘thpp::TensorBase<double, thpp::Storage, thpp::Tensor >::Ptr {aka thpp::TensorPtr<thpp::Tensor >}’
const thpp::Tensor& input = fblualib::luaGetTensorChecked(L, 1);
^
make[2]: *** [CMakeFiles/crnn.dir/ctc.cpp.o] Error 1
make[1]: *** [CMakeFiles/crnn.dir/all] Error 2
make: *** [all] Error 2
cp: cannot stat ‘*.so’: No such file or directory

When I run ./build_cpp.sh I get this error. Does anyone know how to solve it?

CMake Error THC_LIBRARY notfound

There is an error when I execute sh build_cpp.sh: THC_LIBRARY NOTFOUND.
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
THC_LIBRARY
linked by target "crnn" in directory /root/crnn/src/cpp

-- Configuring incomplete, errors occurred!
See also "/root/crnn/src/cpp/build/CMakeFiles/CMakeOutput.log".

I would like to know what THC_LIBRARY is and where I can get it.
