jiangxiluning / master-tf
MASTER
License: MIT License
Hello, is the model's input size fixed? When testing, I found that recognition fails once I use a longer text box.
In the Decoder of transformer.py, the call function is as follows:
def call(self, x, memory, src_mask, tgt_mask, training=False):
    T = tf.shape(x)[1]
    x = x + self.decoder_pe[:, :T]
    for layer in self.layers:
        x = layer(x, memory, src_mask, tgt_mask, training=training)
    return self.norm(x)
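One likely cause of the length limit mentioned above: `decoder_pe` is a positional-encoding table precomputed up to some maximum length, so the `[:, :T]` slice only works while `T` stays within it. A minimal NumPy sketch of the failure mode (the table size and shapes here are illustrative, not the repository's actual values):

```python
import numpy as np

max_len, d_model = 8, 4
pe = np.zeros((1, max_len, d_model))   # precomputed positional table

def add_pe(x):
    T = x.shape[1]
    # the slice silently returns a SHORTER table when T > max_len,
    # so the addition fails with a broadcast error
    return x + pe[:, :T]

add_pe(np.zeros((1, 8, d_model)))      # OK: T within the table
try:
    add_pe(np.zeros((1, 9, d_model)))  # T > max_len: (1,9,4) + (1,8,4) mismatch
except ValueError:
    print("sequence longer than the positional table")
```

If longer text boxes are needed, the table would have to be built with a larger maximum length at construction time.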
As you can see, x here should already be an embedding.
However, in the Decoder of transformer_tf.py, the call function is as follows:
def call(self, x, enc_output, training,
         look_ahead_mask, padding_mask):
    seq_len = tf.shape(x)[1]
    attention_weights = {}
    x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]
    x = self.dropout(x, training=training)
    for i in range(self.num_layers):
        x, block1, block2 = self.dec_layers[i](x, enc_output, training,
                                               look_ahead_mask, padding_mask)
        attention_weights['decoder_layer{}_block1'.format(i + 1)] = block1
        attention_weights['decoder_layer{}_block2'.format(i + 1)] = block2
    # x.shape == (batch_size, target_seq_len, d_model)
    return x, attention_weights
Here x is clearly an int32 label tensor.
The two implementations differ. After reading the rest of the code, I am confident that the call function of the Decoder in transformer_tf.py should drop the initial embedding lookup on x.
Which version produced the experimental results in the paper?
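The incompatibility described above can be seen directly from dtypes: one decoder expects float embeddings, the other expects int label ids and embeds them itself. A plain NumPy sketch (hypothetical sizes and names, not the repository's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, T = 100, 8, 5
embedding = rng.standard_normal((vocab_size, d_model))

def embed(labels):
    # int label ids -> scaled embeddings, as transformer_tf.py's Decoder does internally
    return embedding[labels] * np.sqrt(d_model)

labels = np.array([[3, 17, 42, 0, 0]])  # (batch, T) int ids
x = embed(labels)                       # (batch, T, d_model) floats

# transformer.py's Decoder consumes the float tensor `x`, while
# transformer_tf.py's Decoder consumes the int `labels` and calls
# self.embedding(x) itself -- feeding `x` to the latter would amount
# to embedding twice (and indexing an embedding table with floats).
assert labels.dtype.kind == 'i' and x.dtype.kind == 'f'
assert x.shape == (1, T, d_model)
```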
Hello, I tested the weights you provided on the IIIT5K dataset, but Word Accuracy is only 0.41. Is something wrong?
Which one?
Where can I find your PyTorch implementation?
I previously read that attention-based methods do worse than CRNN on long-text recognition, specifically for document-style rather than natural-scene text. How does this model fare on that?
Hi, thank you very much for your past replies. I have a question: how can I write the recognition results to a .log file in the format "image_path predicted_labels confidence_score"? Thank you, and I look forward to your help.
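A minimal sketch of such a logger, assuming the inference loop already yields (image_path, predicted_labels, confidence) tuples (these names are illustrative, not the repository's API):

```python
def write_results_log(results, log_path):
    """Write one 'image_path predicted_labels confidence_score' line per image.

    `results` is an iterable of (image_path, predicted_labels, confidence)
    tuples produced by whatever inference loop the model uses.
    """
    with open(log_path, 'w', encoding='utf-8') as f:
        for image_path, labels, confidence in results:
            f.write(f"{image_path} {labels} {confidence:.4f}\n")

# usage with stand-in data:
results = [("imgs/word_1.png", "hello", 0.9731),
           ("imgs/word_2.png", "world", 0.8815)]
write_results_log(results, "recognition.log")
```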
Hello, I ran experiments on my own dataset (Chinese + English, 160k real-scene samples) and found that training converges very quickly but validation results are very poor. Have you seen this? My guess is that this architecture and input scheme effectively set teacher_forcing = 1, which makes overfitting very easy.
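One common mitigation for the always-teacher-forced setup described above is scheduled sampling: at each decoding position, feed the ground-truth token only with some probability and the model's own previous prediction otherwise. A minimal NumPy sketch of the sampling decision (purely illustrative, not code from this repository):

```python
import numpy as np

def choose_inputs(gt_tokens, pred_tokens, teacher_forcing_ratio, rng):
    """Per position, feed the ground-truth token with probability
    `teacher_forcing_ratio`, otherwise the model's previous prediction."""
    use_gt = rng.random(gt_tokens.shape) < teacher_forcing_ratio
    return np.where(use_gt, gt_tokens, pred_tokens)

rng = np.random.default_rng(0)
gt = np.array([1, 2, 3, 4])
pred = np.array([9, 9, 9, 9])

# ratio 1.0 reduces to pure teacher forcing; ratio 0.0 to free running
assert np.array_equal(choose_inputs(gt, pred, 1.0, rng), gt)
assert np.array_equal(choose_inputs(gt, pred, 0.0, rng), pred)
```

The ratio is typically annealed from 1.0 toward a lower value over training.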
As per the title.
Hello, I tested your network on the same hardware (V100), but the FPS is only about 4 (averaged_infer_time: 203.730ms), far from the 9.22ms reported in the paper. Is something wrong?
Also, to compare with deep-text-recognition, I changed the network input size to 32x100 and trained for 4 epochs on MJ+ST with lower-case labels. The accuracy is as follows:
accuracy: IIIT5k_3000: 0.845, SVT: 0.804, IC03_860: 0.910, IC03_867: 0.908, IC13_857: 0.896, IC13_1015: 0.891, IC15_1811: 0.675, IC15_2077: 0.595, SVTP: 0.693, CUTE80: 0.667
total_accuracy: 0.779
This seems a bit lower than TPS-BLSTM-Attn, SAR, and another transformer-based network (https://arxiv.org/pdf/1906.05708.pdf), so the network's advantage is not apparent. How can I reproduce the accuracy reported in the paper?
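On the latency gap above: first-call graph tracing, data loading, and cold caches can dominate naive timings, so it is worth measuring with warm-up iterations excluded. A minimal timing sketch (the inference callable is a stand-in, not the repository's actual entry point):

```python
import time

def average_infer_ms(infer_fn, n_warmup=10, n_runs=100):
    """Average per-call latency in ms, excluding warm-up iterations
    (which absorb one-time costs such as graph tracing)."""
    for _ in range(n_warmup):
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    return (time.perf_counter() - start) * 1000.0 / n_runs

# usage with a stand-in workload:
ms = average_infer_ms(lambda: sum(range(10000)))
assert ms > 0.0
```

Paper latency numbers are also often measured per image at batch size > 1, which alone can explain a several-fold difference from single-image timing.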
Thanks for your work.
Can you upload your checkpoint to Google Drive or another cloud service? I can't download it from Baidu.
@jiangxiluning Looking at the code, h is set to 1. Is there some practical trick behind this?