wushilian / crnn_attention_ocr_chinese Goto Github PK

CRNN with attention to do OCR,add Chinese recognition

Python 100.00%

attention ocr crnn seq2seq-attn lstm

crnn_attention_ocr_chinese's Issues

训练结果总是 '的'

您好, 我按照train.txt生成了数据，然后修改了一下代码，但是发现预测结果总是 '的', 是因为训练的次数不够多还是我改的代码有问题？

[+] epoch:0, batch:371, loss:3.7565720081329346, acc:0.0,
 train_decode:['，的的的的的的的的的', '，，的的的的的的的的', '，，的的的的的的的的', '，，的的的的的的的的', '，的的的的的的的的的'], 
 val_decode:['，的的的的的的的的的', '，的的的的的的的的的', '，的的的的的的的的的', '，的的的的的', '，，的的的的的的的的'], 
 val_truth:['管理我们要质疑!当然', '那大仙才能保持人体健', '种色彩因其波长不同而', '命和**共和制的产生', '盟者将按照年营业额的']

config.py 主要是加了一个数据批量读入的功能还把图片尺寸改成280*32了不知道有没有影响

import numpy as np
import cv2
import os

learning_rate = 0.001
momentum = 0.9
START_TOKEN = 0
END_TOKEN = 1
UNK_TOKEN = 2
VOCAB = {'<GO>': 0, '<EOS>': 1, '<UNK>': 2, '<PAD>': 3}  # 分别表示开始，结束，未出现的字符
VOC_IND = {}


# charset='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
def get_class(path):
    """

    :param path:
    :return: 文字到int , int到文字
    """
    f = open(path, 'r', encoding='UTF-8')
    line = f.readline().strip()
    class2int = {}
    int2class = {}
    i = 0
    while line != '':
        class2int[line] = i
        int2class[i] = line
        line = f.readline().strip()
        i = i + 1
    f.close()
    return class2int, int2class


_, charset = get_class('char_std_5990.txt')

for i in range(len(charset)):
    VOCAB[charset[i]] = i + 4
for key in VOCAB:
    VOC_IND[VOCAB[key]] = key

NUM_BATCHES = 0
MAX_LEN_WORD = 20  # 标签的最大长度，以PAD
VOCAB_SIZE = len(VOCAB)
BATCH_SIZE = 320
RNN_UNITS = 256
EPOCH = 10000
IMAGE_WIDTH = 280
IMAGE_HEIGHT = 32
MAXIMUM__DECODE_ITERATIONS = 20
DISPLAY_STEPS = 20
LOGS_PATH = 'log'
CKPT_DIR = 'save_model'
train_dir = '/media/lhj/0C3313300C331330/Images'
val_dir = 'data/data'
is_restore = True


def label2int(label):  # label shape (num,len)
    # seq_len=[]
    target_input = np.ones((len(label), MAX_LEN_WORD), dtype=np.float32) + 2  # 初始化为全为PAD
    target_out = np.ones((len(label), MAX_LEN_WORD), dtype=np.float32) + 2  # 初始化为全为PAD
    for i in range(len(label)):
        # seq_len.append(len(label[i]))
        target_input[i][0] = 0  # 第一个为GO
        for j in range(len(label[i])):
            target_input[i][j + 1] = VOCAB[label[i][j]]
            target_out[i][j] = VOCAB[label[i][j]]
        target_out[i][len(label[i])] = 1
    return target_input, target_out


def int2label(decode_label):
    label = []
    for i in range(decode_label.shape[0]):
        temp = ''
        for j in range(decode_label.shape[1]):
            if VOC_IND[decode_label[i][j]] == '<EOS>':
                break
            elif decode_label[i][j] == 3:
                continue
            else:
                temp += VOC_IND[decode_label[i][j]]
        label.append(temp)
    return label


def batch_iter(data_dir, file):
    """生成批次数据"""
    global NUM_BATCHES
    f = open(file, 'r', encoding='UTF-8')
    lines = f.read().strip().split('\n')

    data_len = len(lines)
    NUM_BATCHES = int((data_len - 1) / BATCH_SIZE) + 1

    print("[+] You have {} batches!".format(NUM_BATCHES))

    for i in range(NUM_BATCHES):
        image = []
        labels = []
        start_id = i * BATCH_SIZE
        end_id = min((i + 1) * BATCH_SIZE, data_len)
        iter_lines = lines[start_id:end_id]
        for line in iter_lines:
            s = line.strip().split(' ')
            label = ''
            image_name = os.path.join(data_dir, s[0])
            im = cv2.imread(image_name, 0)  # /255.#read the gray image
            if im.shape != [32, 280]:
                im = cv2.resize(im, (IMAGE_WIDTH, IMAGE_HEIGHT))
            img = im.swapaxes(0, 1)
            image.append(np.array(img[:, :, np.newaxis]))
            for i in range(len(s) - 1):
                label += charset[int(s[i + 1])]
            labels.append(label)
        yield np.array(image), labels

def cal_acc(pred, label):
    num = 0
    for i in range(len(pred)):
        if pred[i] == label[i]:
            num += 1
    return num * 1.0 / len(pred)

train.py 调了一下代码结构，把批量读入的函数加进去了

from model import *
import config as cfg
import time
import os


init = tf.global_variables_initializer()  # 变量初始化

with tf.name_scope("optimizer") as scope:
    loss, train_decode_result, pred_decode_result = build_network(is_training=True)  # TODO
    optimizer = tf.train.MomentumOptimizer(learning_rate=cfg.learning_rate, momentum=cfg.momentum, use_nesterov=True)
    train_op = optimizer.minimize(loss)


with tf.name_scope('summaries'):
    saver = tf.train.Saver(max_to_keep=5)
    tf.summary.scalar("cost", loss)
    summary_op = tf.summary.merge_all()
    writer = tf.summary.FileWriter(cfg.LOGS_PATH)


with tf.Session() as sess:
    tf.initialize_all_variables().run()
    sess.run(init)
    if cfg.is_restore:
        ckpt = tf.train.latest_checkpoint(cfg.CKPT_DIR)
        if ckpt:
            saver.restore(sess, ckpt)
            print('[*] restore from the checkpoint{0}'.format(ckpt))

    num_batches_per_epoch = cfg.NUM_BATCHES
    for cur_epoch in range(cfg.EPOCH):
        train_dataloader = cfg.batch_iter(cfg.train_dir, 'data/train.txt')
        test_dataloader = cfg.batch_iter(cfg.train_dir, 'data/my_test')

        train_cost = 0
        start_time = time.time()
        batch_time = time.time()
        val_img, val_label = next(test_dataloader)
        # the tracing part
        for cur_batch in range(10249):
            batch_time = time.time()
            batch_inputs, batch_label = next(train_dataloader)
            print("Batch Label: ", batch_label)
            batch_target_in, batch_target_out = cfg.label2int(batch_label)
            sess.run(train_op,
                     feed_dict={image: batch_inputs, train_output: batch_target_in, target_output: batch_target_out,
                                sample_rate: np.min([1., 0.2 * cur_epoch + 0.2])})

            if cur_batch % 1 == 0:
                summary_loss, loss_result = sess.run([summary_op, loss],
                                                     feed_dict={image: batch_inputs, train_output: batch_target_in,
                                                                target_output: batch_target_out,
                                                                sample_rate: np.min([1., 1.])})
                writer.add_summary(summary_loss, cur_epoch * num_batches_per_epoch + cur_batch)
                val_predict = sess.run(pred_decode_result, feed_dict={image: val_img[0:cfg.BATCH_SIZE]})
                train_predict = sess.run(pred_decode_result, feed_dict={image: batch_inputs,
                                                                        train_output: batch_target_in,
                                                                        target_output: batch_target_out,
                                                                        sample_rate: np.min([1., 1.])})
                predit = cfg.int2label(np.argmax(val_predict, axis=2))
                train_pre = cfg.int2label(np.argmax(train_predict, axis=2))
                gt = val_label[0:cfg.BATCH_SIZE]
                acc = cfg.cal_acc(predit, gt)

                print("[+] epoch:{}, batch:{}, loss:{}, acc:{},\n train_decode:{}, \n val_decode:{}, \n val_truth:{}".
                      format(cur_epoch, cur_batch,
                             loss_result, acc,
                             train_pre[0:5],
                             predit[0:5],
                             gt[0:5]))

                if not os.path.exists(cfg.CKPT_DIR):
                    os.makedirs(cfg.CKPT_DIR)
                saver.save(sess, os.path.join(cfg.CKPT_DIR, 'attention_ocr.model'),
                           global_step=cur_epoch * num_batches_per_epoch + cur_batch)

你好，我发现在使用infer.py时，输入图片的数量必须和BATCH_SIZE一样。
1）当BATCH_SIZE为40，我用如下代码预测时：
for img_pd in val_img:
val_predict = sess.run(pred_decode_result,feed_dict={image: np.array([img_pd])})
predit = cfg.int2label(np.argmax(val_predict, axis=2))
print(predit)
会出现这样的错误：ConcatOp : Dimensions of inputs should match: shape[0] = [40,326] vs. shape[1] = [1,256]
也就是输入图片的数量得和BATCH_SIZE一样。

2）当我把BATCH_SIZE改为2，我用如下代码预测时：
for img_pd in val_img:
val_predict = sess.run(pred_decode_result,feed_dict={image: np.array([img_pd，img_pd])})
predit = cfg.int2label(np.argmax(val_predict, axis=2))
print(predit)
虽然能输出结果，但是结果却是没那么准确的。

3）我把BATCH_SIZE改为1之后，在build_network时会报错：0 is not an invariant for the loop.
定位是tf.contrib.seq2seq.dynamic_decode这个函数里面出错了。
所以没办法通过更改BATCH_SIZE为1来实现单张图片预测。

那么，请问一下，如何进行单张图片的预测呢？
期待你的回复，谢谢！

How to increase training rate.

Hi, wu
I found changing this line(in model line 81) loss = tf.reduce_mean(att_loss) to loss = tf.reduce_sum(att_loss) will greatly increase the training speed.
The reason, I thought, is reduce_sum return a bigger loss, which will directly cause that every trainable parameter will be changed greatly.

Brs
SJHB

synthetic Chinese string dataset

I downloaded the synthetic chinese string data from baiduyun, but can not open it.

I want to know is this file corrupted or not.

please give me some advice.

要改成能够识别中文怎么弄呢

请问你这样把宽度固定为120，效果好嘛？会不会让宽高比变形太大

一般都是高度变为32，宽度跟随高度变化的比例变化吧？

模型

楼主，没有开放训练好的模型吗

你好，我训练出来的模型只能识别和训练图片一样长度的文本，是某个参数设置错了吗？

an error

Hi, wu
I confront this error when I run train.py. Do you know how to solved this problem?

Traceback (most recent call last):
File "train.py", line 50, in
sess.run( train_op,feed_dict={image: batch_inputs,train_output: batch_target_in,target_output: batch_target_out,sample_rate:np.min([1.,0.2*cur_e
poch+0.2])})
File "/home/sjhbxs/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/sjhbxs/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/sjhbxs/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/sjhbxs/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: The node 'gradients/decode/decoder/while/BasicDecoderStep/ScheduledEmbeddingTrainingHe
lperSample_1/cond/Gather_1_grad/ExpandDims/StackPush' has inputs from different frames. The input 'gradients/decode/decoder/while/BasicDecoderStep/S
cheduledEmbeddingTrainingHelperSample_1/cond/Gather_1_grad/Size' is in frame ''. The input 'gradients/decode/decoder/while/BasicDecoderStep/Schedule
dEmbeddingTrainingHelperSample_1/cond/Gather_1_grad/ExpandDims/StackPush/Switch' is in frame 'decode/decoder/while/decode/decoder/while/'.

dic.txt

中文的字典的用处是什么

loss,train_decode_result,pred_decode_result=build_network(is_training=True),这个train_decode_result完全没用到啊？

哥们，这个train_decode_result是做什么的，为啥最后没用到呢？

Do I need to modify charset in config.py in order to recog Chinese?

It seems to me that in order to have K Chinese characters recoginized , I have to initialize K logits in the compute graph.With the attention mechanism,there will be K^2 attention nodes in the graph.
Considering there're many Chinese characters(20000+),the compute graph is very large and fails to allocate memory on a dual GTX1080Ti machine (22GB of VRAM)
Is my understanding correct?

你好，这个程序可以训练中文图片并测试中文吗？我看到config.py里面把英文字母转换成数字，中文的该怎么做呢？谢谢

loss下降问题

您好，我用这个代码训练synth90k的英文数据，数据量比较大，训练的时候一开始loss是正常下降的，但是后来经过每个epoch后loss竟然又上升了，这个不适合大数据量吗，无法收敛啊。我还试过小数据集进行训练，loss是正常下降了，但是到0.2就不下降了，不知道您有没有遇到类似情况。

转pb过程中的问题

请问转pb的时候，输出的名字是这个么output_node_names = "target_output"，怎么我转出来的就是84byte？

这是一次性加载所有训练数据进入内存吗？

Train.py get killed by ubuntu

Chinese branch
Killed when reading the dataset(Synthetic Chinese String Dataset)

关于生成.pb文件

谢谢之前的解答，我想问问就是模型保存到了save_model目录下后，怎么将它们转成.pb文件

转成中文识别的错误

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [584,256] rhs shape= [578,256]
[[Node: save/Assign_4 = Assign[T=DT_FLOAT, _class=["loc:@decode/decoder/attention_wrapper/gru_cell/candidate/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](decode/decoder/attention_wrapper/gru_cell/candidate/kernel, save/RestoreV2_4/_81)]]

有MultiGPU版本吗

请问训练出来的模型有转过caffemodel么

您好，请问有没有好的中文数据集

报错 Can not squeeze dim[2]

ValueError: Can not squeeze dim[2], expected a dimension of 1, got 5 for 'encode_features/Squeeze' (op: 'Squeeze') with input shapes: [?,98,5,512].
我输入的图片大小为400*80，cnn特征提取后输入到rnn时会报错，想问一下这个tf.squeeze(net, axis=2)是做了什么操作，卷积后net的维度不是[batch_size, h, w , channel]吗，对第二维做压缩是为什么？

训练结果全部变成<GO><GO><GO>

你好，使用模型训练，项目中的训练集，当loss=0.18，精确度71%，learning_rate由0.001调到0.0001的时候，模型突然所有的预测结果都变成一连串的....
请问一下，这个是什么原因导致的？谢谢啦

How is the acc?

attention模型和CTC模型效果有比较过吗

如题，想请教下您对比过CTC的效果吗，中文上哪个表现好？我的实验结果是Attention对字符尺度变化很敏感，效果比CTC差很多

MAX_LEN_WORD issue

Hello sir,

Nice job!
Here i have a question, i need to recognition long number sequence, the max length might be 20, so i just change the input W to 240(2*120=240) and change the stride of the last pooling layer from 1.2 to 2,2.
however the training will not converge whenever i add more samples(>100), and the acc is always 0.0.
Do you have any advice?

Where do i put the data to run inference

Hi @wushilian
Should i put the inference data in a seperate folder? or is it the same train folder?

could you show the addrress of model trained for us ,thanks?

怎么预测

训练完成后生成attention_ocr.ckpt-239998.data-00000-of-00001，attention_ocr.ckpt-239998.index，attention_ocr.ckpt-239998.meta，checkpoint，怎么用来对字符图片进行预测

Embedding Dimension

model.py 文件里的这一句可能有问题
embeddings = tf.get_variable(name='embed_matrix',shape=[cfg.VOCAB_SIZE, cfg.VOCAB_SIZE])
embeding 被定义成 VOCAB_SIZE * VOCAB_SIZE 的矩阵
如果训练识别中文用 char_std_5990 做字典的话, 这个矩阵就是 5994 * 5994 (还有 , , , 4个token)
还会影响后面 decoder gru 的参数量
这么巨大的参数量会导致模型无法收敛
训练中文需要调整 embedding dimension

Is this trainning process right?

I use the default parameters set in original code, and got the following training process. The "train_decode" and "val_decode" both are null till "epoch:0, batch:1700", but the loss is still decreasing. And I changed the output format as following. So the acc keeps as 0.0, and is there something wrong with the training process?

train_decode: [，，，的的的的的的的] | val_decode: [，，，的的的的的的的] | ground_truth: [职顺带休息网友奥斯摩]
train_decode: [，，的的的的的的的的] | val_decode: [，，的的的的的的的的] | ground_truth: [中距空对空导弹实施攻]
train_decode: [，，的的的的的的的的] | val_decode: [，，，，的的的的的的] | ground_truth: [参见第十四章“无状之]
train_decode: [，，，的的的的的的的] | val_decode: [，，，，的的的的的的] | ground_truth: [面冒个泡，得三公之教]
epoch:0, batch:19300, loss:3.55256605148, acc:0.0
train_decode: [，，，的的的的的的的] | val_decode: [，，，的的的的的的的] | ground_truth: [也许你通过跟人力资源]
train_decode: [，，，的的的的的的的] | val_decode: [，，，的的的的的的的] | ground_truth: [狼或熊则是一个好兆头]
train_decode: [，，的的的的的的的的] | val_decode: [，，，，的的的的的的] | ground_truth: [网来说是一个不小的挑]
train_decode: [，，的的的的的的的的] | val_decode: [，，，，的的的的的的] | ground_truth: [无神论科学家的说法是]
train_decode: [，，的的的的的的的的] | val_decode: [，，，，的的的的的的] | ground_truth: [闻玉帝求又破右胁而出]
epoch:0, batch:19400, loss:3.53783345222, acc:0.0
train_decode: [，，的的的的的的的的] | val_decode: [，，的的的的的的的的] | ground_truth: [机械学院→华北工学院]
train_decode: [，的的的的的的的的的] | val_decode: [，的的的的的的的的的] | ground_truth: [中。同时身体向下弯曲]
train_decode: [，的的的的的的的的的] | val_decode: [，，，，的的的的的的] | ground_truth: [到了德佩罗（Desp]
train_decode: [，，，的的的的的的的] | val_decode: [，，，的的的的的的的] | ground_truth: [这完全是一个主权国家]
train_decode: [，的的的的的的的的的] | val_decode: [，，的的的的的的的的] | ground_truth: [时候他会永远离不开老]

单张图片预测infer.py报错，如何解决？

用infer.py预测的时候，怎么单张图片预测报错, 但是batch_size批量预测没有问题，单张预测的问题如何解决?

请问训练和测试时候，decoer的输入不一样吧？

@wushilian 打扰下。
train_decode_result = train_outputs[0].rnn_output[:, :-1, :]
pred_decode_result = pred_outputs[0].rnn_output

请问这两句话是什么意思啊？哪里可以啊

准确率acc一直是0.0

您好，我用了你的代码，使用自己的数据集，训练的时候loss=0.0几，但是val_Decoder一直是错的，acc也一直是0.0.您知道是什么问题么？

Error

当我执行此代码时会显示错误消息，请有人可以帮助我
when I execute this code an error message is displayed, please someone can help me

acc一直为0.0

您好，我使用的是您推荐的数据集，训练了3个epoch，acc一直为0，您知道是什么问题吗？

trian.py中的acc输出的值是什么的准确率呢？

我看你的代码中，比较的是val_img的输出predict和train_img的target。这两个值不能求准确率吧。

您好，可以提供下您训练好的模型存档吗？就是saved model

不能训练问题

程序在运行 sess.run(tf.global_variables_initializer()) 这句时卡主了，过一会就强制退出来了，请问这是什么原因呢

代码出问题，怎么解决

img = cv2.resize(im, (IMAGE_WIDTH,IMAGE_HEIGHT))

cv2.error: C:\bld\opencv_1498171314629\work\opencv-3.2.0\modules\imgproc\src\imgwarp.cpp:3492: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize

你好，请问怎么预测不是10个字的图片啊

你好，我用你的代码训练出来的模型，只能预测对10个字的图片，但是对多于十个字或者少于十个字的图片该怎么办呢，可以做到预测不定数目字数的图片吗

batchnormolization 没有更新

你好你在训练过程中BN的参数没有更新，导致在测试的时候必须使用和训练过程中一样的batch测试数据才能显示正确结果，如果在测试过程中把is_training改成false会出现错误

run error

Traceback (most recent call last):
File "/home/jxf/CRNN_Attention_OCR-master/train.py", line 24, in
img,label=cfg.read_data(cfg.train_dir)
File "/home/jxf/CRNN_Attention_OCR-master/config.py", line 64, in read_data
img = cv2.resize(im, (IMAGE_WIDTH,IMAGE_HEIGHT))
cv2.error: /io/opencv/modules/imgproc/src/imgwarp.cpp:3483: error: (-215) ssize.width > 0 && ssize.height > 0 in function resize

wushilian / crnn_attention_ocr_chinese Goto Github PK

crnn_attention_ocr_chinese's Issues

Recommend Projects

Recommend Topics

Recommend Org