
master-tf's Issues

Is the model's input size fixed?

Hello, is the model's input size fixed? When testing, I found that recognition fails on longer text boxes.

A possible bug in the Decoder of transformer_tf

In the Decoder of transformer.py, the call function is as follows:

def call(self, x, memory, src_mask, tgt_mask, training=False):
        T = tf.shape(x)[1]
        x = x + self.decoder_pe[:, :T]

        for layer in self.layers:
            x = layer(x, memory, src_mask, tgt_mask, training=training)
        return self.norm(x)

As you can see, x here should already be an embedding.
But in the Decoder of transformer_tf.py, the call function is as follows:

def call(self, x, enc_output, training,
             look_ahead_mask, padding_mask):

        seq_len = tf.shape(x)[1]
        attention_weights = {}

        x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            x, block1, block2 = self.dec_layers[i](x, enc_output, training,
                                                   look_ahead_mask, padding_mask)

            attention_weights['decoder_layer{}_block1'.format(i+1)] = block1
            attention_weights['decoder_layer{}_block2'.format(i+1)] = block2

        # x.shape == (batch_size, target_seq_len, d_model)
        return x, attention_weights

Here, x is clearly an int32 label tensor.
The two implementations differ. After reading the rest of the code, I am confident that the call function of the Decoder in transformer_tf.py should omit the initial embedding lookup on x.
Which version produced the experimental results in the paper?
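For concreteness, a minimal sketch of what the transformer_tf.py call function might look like with the fix proposed above applied (the embedding lookup and sqrt(d_model) scaling dropped, since x is assumed to arrive already embedded). Class structure and attribute names here are illustrative stand-ins, not the repo's exact code:

```python
import tensorflow as tf

class Decoder(tf.keras.layers.Layer):
    """Sketch of the proposed fix: `x` is assumed to already be an embedded
    (batch_size, target_seq_len, d_model) tensor, so the self.embedding
    lookup and sqrt(d_model) scaling from the original call are removed."""

    def __init__(self, num_layers, d_model, dec_layers, pos_encoding, rate=0.1):
        super().__init__()
        self.num_layers = num_layers
        self.d_model = d_model
        self.dec_layers = dec_layers      # list of decoder layers
        self.pos_encoding = pos_encoding  # shape (1, max_len, d_model)
        self.dropout = tf.keras.layers.Dropout(rate)

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        seq_len = tf.shape(x)[1]
        attention_weights = {}

        # x is already an embedding; add the positional encoding directly
        x += self.pos_encoding[:, :seq_len, :]
        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            x, block1, block2 = self.dec_layers[i](x, enc_output, training,
                                                   look_ahead_mask, padding_mask)
            attention_weights['decoder_layer{}_block1'.format(i + 1)] = block1
            attention_weights['decoder_layer{}_block2'.format(i + 1)] = block2

        # x.shape == (batch_size, target_seq_len, d_model)
        return x, attention_weights
```

This matches the transformer.py variant shown earlier, where x is added to the positional encoding without any embedding step.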

Test accuracy issue

Hello, I tested on the IIIT5K dataset using the weights you provided, but Word Accuracy is only 0.41. Is something wrong?

Write OCR results to a log file

Hi, thank you very much for your past replies. How can I output the recognition results to a .log file in the following format: "image_path predicted_labels confidence_score"? Thank you; I look forward to your help.
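For the format described above, a minimal sketch of the output step, assuming an inference loop that yields (image_path, predicted_labels, confidence) triples. The `results` iterable and `write_ocr_log` name are hypothetical placeholders, not the repo's API:

```python
def write_ocr_log(results, log_path):
    """Write one line per image: 'image_path predicted_labels confidence_score'.

    `results` is any iterable of (image_path, predicted_labels, confidence)
    triples produced by the inference loop.
    """
    with open(log_path, 'w') as f:
        for image_path, predicted_labels, confidence in results:
            f.write('{} {} {:.4f}\n'.format(image_path, predicted_labels,
                                            confidence))
```

The confidence score for an attention/transformer decoder is commonly taken as the product (or mean) of the per-step softmax probabilities of the predicted characters.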

About inference time

Hello, I tested your network on the same hardware (a V100), but the FPS is only about 4 (averaged_infer_time: 203.730 ms), far from the 9.22 ms reported in the paper. Is something wrong?
In addition, to compare against deep-text-recognition, I changed the network input size to 32*100 and trained for 4 epochs on MJ+ST/lower_case. The accuracy is as follows:
accuracy: IIIT5k_3000: 0.845 SVT: 0.804 IC03_860: 0.910 IC03_867: 0.908 IC13_857: 0.896 IC13_1015: 0.891 IC15_1811: 0.675 IC15_2077: 0.595 SVTP: 0.693 CUTE80: 0.667
total_accuracy: 0.779
This accuracy seems somewhat lower than TPS-BLSTM-Attn, SAR, and another transformer-based network (https://arxiv.org/pdf/1906.05708.pdf), so the network's advantage is not apparent. How can I reproduce the accuracy reported in the paper?
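On the timing gap: one common source of such discrepancies is measuring latency without excluding warm-up runs (TensorFlow graph tracing, cuDNN autotuning), or timing batch size 1 against a paper's batched number. A minimal, framework-agnostic sketch of the usual measurement protocol (`infer_fn` is a placeholder for the model's inference call, not the repo's API):

```python
import time

def average_infer_ms(infer_fn, batch, warmup=10, runs=100):
    """Average steady-state latency of infer_fn over `runs` timed calls,
    after `warmup` untimed calls to exclude graph tracing and cache effects."""
    for _ in range(warmup):
        infer_fn(batch)
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn(batch)
    return (time.perf_counter() - start) / runs * 1000.0  # milliseconds
```

If the 203 ms figure includes the first (tracing) call or per-image preprocessing, it will overstate the model's steady-state inference time considerably.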
