jiangxiluning / master-tf
MASTER
License: MIT License
Hello, is the model's input size fixed? When testing, I found that recognition fails once I use a longer text box.
In the Decoder of transformer.py, the call function is as follows:
def call(self, x, memory, src_mask, tgt_mask, training=False):
    T = tf.shape(x)[1]
    x = x + self.decoder_pe[:, :T]
    for layer in self.layers:
        x = layer(x, memory, src_mask, tgt_mask, training=training)
    return self.norm(x)
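One likely cause of the length limit mentioned above: `decoder_pe` is a positional-encoding table precomputed up to some maximum length, so the `[:, :T]` slice only works while `T` stays within it. A minimal NumPy sketch of the failure mode (the table size and shapes here are illustrative, not the repository's actual values):

```python
import numpy as np

max_len, d_model = 8, 4
pe = np.zeros((1, max_len, d_model))   # precomputed positional table

def add_pe(x):
    T = x.shape[1]
    # the slice silently returns a SHORTER table when T > max_len,
    # so the addition fails with a broadcast error
    return x + pe[:, :T]

add_pe(np.zeros((1, 8, d_model)))      # OK: T within the table
try:
    add_pe(np.zeros((1, 9, d_model)))  # T > max_len: (1,9,4) + (1,8,4) mismatch
except ValueError:
    print("sequence longer than the positional table")
```

If longer text boxes are needed, the table would have to be built with a larger maximum length at construction time.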
As you can see, x here should already be an embedding.
However, in the Decoder of transformer_tf.py, the call function is as follows:
def call(self, x, enc_output, training,
         look_ahead_mask, padding_mask):
    seq_len = tf.shape(x)[1]
    attention_weights = {}
    x = self.embedding(x)  # (batch_size, target_seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]
    x = self.dropout(x, training=training)
    for i in range(self.num_layers):
        x, block1, block2 = self.dec_layers[i](x, enc_output, training,
                                               look_ahead_mask, padding_mask)
        attention_weights['decoder_layer{}_block1'.format(i + 1)] = block1
        attention_weights['decoder_layer{}_block2'.format(i + 1)] = block2
    # x.shape == (batch_size, target_seq_len, d_model)
    return x, attention_weights
Here x is clearly an int32 label tensor.
The two implementations differ. After reading the rest of the code, I am confident that the call function of the Decoder in transformer_tf.py should drop the initial embedding lookup on x.
Which version produced the experimental results in the paper?
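The incompatibility described above can be seen directly from dtypes: one decoder expects float embeddings, the other expects int label ids and embeds them itself. A plain NumPy sketch (hypothetical sizes and names, not the repository's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, T = 100, 8, 5
embedding = rng.standard_normal((vocab_size, d_model))

def embed(labels):
    # int label ids -> scaled embeddings, as transformer_tf.py's Decoder does internally
    return embedding[labels] * np.sqrt(d_model)

labels = np.array([[3, 17, 42, 0, 0]])  # (batch, T) int ids
x = embed(labels)                       # (batch, T, d_model) floats

# transformer.py's Decoder consumes the float tensor `x`, while
# transformer_tf.py's Decoder consumes the int `labels` and calls
# self.embedding(x) itself -- feeding `x` to the latter would amount
# to embedding twice (and indexing an embedding table with floats).
assert labels.dtype.kind == 'i' and x.dtype.kind == 'f'
assert x.shape == (1, T, d_model)
```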
Hello, I tested the weights you provided on the IIIT5K dataset, but Word Accuracy is only 0.41. Is something wrong?
Which one?
Where can I find your PyTorch implementation?
I previously read that attention-based methods do worse than CRNN on long-text recognition, specifically for document-style rather than natural-scene text. How does this model fare on that?
Hi, thank you very much for your past replies. I have a question: how can I write the recognition results to a .log file in the format "image_path predicted_labels confidence_score"? Thank you, and I look forward to your help.
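A minimal sketch of such a logger, assuming the inference loop already yields (image_path, predicted_labels, confidence) tuples (these names are illustrative, not the repository's API):

```python
def write_results_log(results, log_path):
    """Write one 'image_path predicted_labels confidence_score' line per image.

    `results` is an iterable of (image_path, predicted_labels, confidence)
    tuples produced by whatever inference loop the model uses.
    """
    with open(log_path, 'w', encoding='utf-8') as f:
        for image_path, labels, confidence in results:
            f.write(f"{image_path} {labels} {confidence:.4f}\n")

# usage with stand-in data:
results = [("imgs/word_1.png", "hello", 0.9731),
           ("imgs/word_2.png", "world", 0.8815)]
write_results_log(results, "recognition.log")
```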
Hello, I ran experiments on my own dataset (Chinese + English, 160k real-scene samples) and found that training converges very quickly but validation results are very poor. Have you seen this? My guess is that this architecture and input scheme effectively set teacher_forcing = 1, which makes overfitting very easy.
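One common mitigation for the always-teacher-forced setup described above is scheduled sampling: at each decoding position, feed the ground-truth token only with some probability and the model's own previous prediction otherwise. A minimal NumPy sketch of the sampling decision (purely illustrative, not code from this repository):

```python
import numpy as np

def choose_inputs(gt_tokens, pred_tokens, teacher_forcing_ratio, rng):
    """Per position, feed the ground-truth token with probability
    `teacher_forcing_ratio`, otherwise the model's previous prediction."""
    use_gt = rng.random(gt_tokens.shape) < teacher_forcing_ratio
    return np.where(use_gt, gt_tokens, pred_tokens)

rng = np.random.default_rng(0)
gt = np.array([1, 2, 3, 4])
pred = np.array([9, 9, 9, 9])

# ratio 1.0 reduces to pure teacher forcing; ratio 0.0 to free running
assert np.array_equal(choose_inputs(gt, pred, 1.0, rng), gt)
assert np.array_equal(choose_inputs(gt, pred, 0.0, rng), pred)
```

The ratio is typically annealed from 1.0 toward a lower value over training.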
As per the title.
Hello, I tested your network on the same hardware (V100), but the FPS is only about 4 (averaged_infer_time: 203.730ms), far from the 9.22ms reported in the paper. Is something wrong?
Also, to compare with deep-text-recognition, I changed the network input size to 32x100 and trained for 4 epochs on MJ+ST with lower-case labels. The accuracy is as follows:
accuracy: IIIT5k_3000: 0.845, SVT: 0.804, IC03_860: 0.910, IC03_867: 0.908, IC13_857: 0.896, IC13_1015: 0.891, IC15_1811: 0.675, IC15_2077: 0.595, SVTP: 0.693, CUTE80: 0.667
total_accuracy: 0.779
This seems a bit lower than TPS-BLSTM-Attn, SAR, and another transformer-based network (https://arxiv.org/pdf/1906.05708.pdf), so the network's advantage is not apparent. How can I reproduce the accuracy reported in the paper?
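On the latency gap above: first-call graph tracing, data loading, and cold caches can dominate naive timings, so it is worth measuring with warm-up iterations excluded. A minimal timing sketch (the inference callable is a stand-in, not the repository's actual entry point):

```python
import time

def average_infer_ms(infer_fn, n_warmup=10, n_runs=100):
    """Average per-call latency in ms, excluding warm-up iterations
    (which absorb one-time costs such as graph tracing)."""
    for _ in range(n_warmup):
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    return (time.perf_counter() - start) * 1000.0 / n_runs

# usage with a stand-in workload:
ms = average_infer_ms(lambda: sum(range(10000)))
assert ms > 0.0
```

Paper latency numbers are also often measured per image at batch size > 1, which alone can explain a several-fold difference from single-image timing.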
Thanks for your work.
Can you upload your checkpoint to Google Drive or another cloud service? I can't download it from Baidu.
@jiangxiluning Looking at the code, h is set to 1. Is there some practical trick behind this?