
decoupled-attention-network's People

Contributors

wang-tianwei


decoupled-attention-network's Issues

About irregular text benchmark

I have been confused about the accuracy on irregular datasets for many days. The model reaches 93.3% on IIIT5K, but when I tested on SVT-P, IC15, and other datasets I got terrible results. I don't know the reason; can anybody give me some suggestions?

How to test the pre-trained handwritten model on my test image?

I just want to check the accuracy of the models provided in the repo. I used Python 3. After downloading the models and putting them in the models/hw folder, I ran main.py and got this error:

mkdir: cannot create directory ‘models/hw’: File exists
Traceback (most recent call last):
  File "main.py", line 116, in <module>
    model = load_network()
  File "main.py", line 65, in load_network
    model_cam = cfgs.net_cfgs['CAM'](**cfgs.net_cfgs['CAM_args'])
  File "/home/tft/Videos/Decoupled-attention-network/DAN.py", line 121, in __init__
    ksize_h = 1 if scales[i-1][1] == 1 else ksize[r_h-1]
TypeError: list indices must be integers or slices, not float

Can somebody help me with this?
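
A likely cause, and a minimal sketch of the usual fix (my assumption based on the Python 3 mention above, not a confirmed diagnosis): the repo targets Python 2, where / between two ints is integer division, whereas Python 3's / returns a float, which can no longer be used as a list index.

# hypothetical minimal repro of the Python 2 -> 3 division pitfall
ksize = [3, 5, 7]
r_h = 4 / 2            # Python 2: int 2 -- Python 3: float 2.0
# ksize[r_h - 1]       # TypeError under Python 3: list index is a float
r_h = 4 // 2           # floor division keeps an int on both versions
print(ksize[r_h - 1])  # prints 5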

Questions about Chinese recognition

@Wang-Tianwei Hello, I went through your issue list and would like to ask: between the MORAN_v2 and Decoupled-attention-network repos, which is better suited to a dataset like ICDAR2019 ReCTS? Referring to your earlier answer in #3, which method would work better on this dataset?

bug_report

  • In

self.target_ratio = img_width / float(img_width)

the denominator should be the height:

self.target_ratio = img_width / float(img_height)

  • In the settings, does the image size need to be an integer multiple of 32? Otherwise an error is reported (see the sketch after this list).

'dataset_train_args': {
    'roots': ['path/to/lmdb_ST', 'path/to/lmdb_SK'],
    'img_height': 32,
    'img_width': 128,
    'transform': transforms.Compose([transforms.ToTensor()]),
    'global_state': 'Train',
},

  • Do the image width of 128 and the Feature_Extractor input sizes have to correspond one-to-one?

net_cfgs = {
    'FE': Feature_Extractor,
    'FE_args': {
        'strides': [(1,1), (2,2), (1,1), (2,2), (1,1), (1,1)],
        'compress_layer': False,
        'input_shape': [1, 32, 128],  # C x H x W
    },
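
To see where a divisibility constraint comes from, here is a quick sketch based only on the strides listed above (not on the repo's internals): the overall downsampling factor per axis is the product of the per-stage strides, and the input height and width must be divisible by it for every feature map to have an integer size.

from functools import reduce

strides = [(1, 1), (2, 2), (1, 1), (2, 2), (1, 1), (1, 1)]
down_h = reduce(lambda acc, s: acc * s[0], strides, 1)  # 4
down_w = reduce(lambda acc, s: acc * s[1], strides, 1)  # 4
h, w = 32, 128
print(h % down_h == 0, w % down_w == 0)  # True True
print(h // down_h, w // down_w)          # 8 32 -> final feature map H x W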

Loss does not converge on HWDB 2.0-2.2

Hello, after reproducing the IAM accuracy I wanted to try Chinese handwriting. For CASIA-HWDB 2.0-2.2 I adjusted the number of classes in the config, but found that the loss would not converge. Is there anything else that needs adjusting? Thanks!

exp1_E4_I20000-239703.M0.pth

Hello, I want to know where the file 'exp1_E4_I20000-239703.M0.pth' is; I can't create it. Can anyone answer? Thanks.

Why do you permute in CAM_transposed?

I found these lines in your code:

 # Transpose B-C-H-W to B-W-C-H
  x = x.permute(0, 3, 1, 2).contiguous()

and I printed the shapes of the features through your model like this:

[screenshot: printed feature shapes]

My question is: why do this permutation? It does not seem very meaningful.

batch inference

Hello, I modified the inference code to support batch inference, but the inference time of the RNN part scales linearly with the batch size. Could you help explain this behavior? Thanks!

lenText = nT
nsteps = nT
output = torch.zeros(nB, lenText).int()
output_probs = torch.zeros(nB, lenText).float()

hidden = torch.zeros(nB, self.nchannel).type_as(C.data)
stats = torch.zeros(nB, self.nchannel).type_as(C.data)
prev_emb = self.char_embeddings.index_select(0, torch.zeros(nB).type_as(C.data).long())

for step in range(nsteps):
    hidden, stats = self.rnn(torch.cat((C[step, :, :], prev_emb), dim=1), (hidden, stats))
    step_result = self.generator(hidden)
    step_result = F.softmax(step_result, dim=-1)
    # take the per-sample max; index_select over dim 1 with a batch of indices
    # would return an nB x nB matrix and break the assignments below
    max_prob, max_prob_index = step_result.max(dim=1)
    output[:, step] = max_prob_index
    output_probs[:, step] = max_prob
    prev_emb = self.char_embeddings.index_select(0, max_prob_index.long())

return output, output_probs

94.3% 2D recognition

Hello, could you please tell me how to reproduce the 94.3% recognition accuracy of 2D attention, and what modifications need to be made to the network?

About using sigmoid for the attention maps

Hello, when generating the attention maps you use sigmoid + normalization rather than the conventional softmax. Why is that? Did you run experiments comparing the two? Thanks!
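
For reference, a minimal sketch of the two normalization schemes the question contrasts (the shapes are hypothetical and this is not the repo's code):

import torch
import torch.nn.functional as F

scores = torch.randn(2, 25, 8 * 32)  # B x maxT x (H*W), hypothetical shapes

# conventional softmax over the spatial positions
att_softmax = F.softmax(scores, dim=-1)

# sigmoid followed by normalization, as the question describes
att_sig = torch.sigmoid(scores)
att_sig = att_sig / att_sig.sum(dim=-1, keepdim=True)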

What's the meaning of "bi-directional model"?

In issue #1 (comment),
you mentioned that the paper's performance could be obtained from a bi-directional model.

I have checked the accuracy on various test datasets, and the accuracies were lower than in the paper.
So I want to try the best model configuration.

Is the "bi-directional model" related to the DTD module?
But "pre_lstm" in DTD already uses the "bidirectional=True" flag.
I'm confused about whether we can apply "bidirectional=True" with nn.GRU instead of nn.GRUCell (as in your code).

It looks like nn.GRUCell is the proper choice, because the DTD of DAN has to decode one character at each step (see the sketch below).
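
To illustrate that last point, a minimal sketch (hypothetical sizes, not the repo's code) of why a per-step cell is the natural fit: the input of each step depends on the previous step's prediction, which nn.GRU's whole-sequence interface cannot express.

import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=512, hidden_size=256)
h = torch.zeros(4, 256)
x_t = torch.randn(4, 512)        # first-step input
for t in range(25):
    h = cell(x_t, h)             # one decoding step
    x_t = torch.randn(4, 512)    # stand-in: would be built from the step-t prediction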

Visualization question

How were the attention-map visualizations in Figures 7 and 8 of the paper drawn? Could you provide the code?
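
Pending the authors' code, a minimal overlay sketch (my assumption of the usual recipe, not the paper's implementation): resize each attention map to the input size, colorize it, and blend it onto the image.

import numpy as np
import cv2

def overlay_attention(img, att_map, alpha=0.5):
    # img: H x W x 3 uint8 image; att_map: h x w float map in [0, 1]
    att = cv2.resize(att_map, (img.shape[1], img.shape[0]))
    heat = cv2.applyColorMap((att * 255).astype(np.uint8), cv2.COLORMAP_JET)
    return cv2.addWeighted(img, 1 - alpha, heat, alpha, 0)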

Test and training accuracy stay at 0

With the scene-text model you provided, test accuracy is always zero; training my own model also gives an accuracy of 0. All paths in the config file have been changed, and the environment matches the one you provided. I don't know what else could be wrong. Could you please help?

A beginner question about DTD

Hello, in DTD,
self.char_embeddings = Parameter(torch.randn(nclass, nchannel))
is used several times in the subsequent decoding.
What exactly does it do? Could you briefly explain it? Many thanks.
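
For context, a minimal sketch (hypothetical sizes, not the repo's code) of how such an embedding table is typically used in decoding: each character id selects a learned row, which is fed back as the input of the next step.

import torch
from torch.nn import Parameter

nclass, nchannel = 80, 256
char_embeddings = Parameter(torch.randn(nclass, nchannel))

prev_ids = torch.tensor([0, 17, 42])                  # hypothetical character ids
prev_emb = char_embeddings.index_select(0, prev_ids)  # (3, nchannel), one row per id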

"sample too long" in training

When I set 'roots' in cfgs_scene.py to:
'roots': ['/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/NIPS2014',
          '/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/NIPS2014',
          '/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/CVPR2016',
          '/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/CVPR2016']
it outputs 'sample too long'; however, the network still works. Is there anything wrong?

[screenshot: 'sample too long' log output]

How to train on more than one GPU?

Even if I set CUDA_VISIBLE_DEVICES to 0,1,2,3, this code still runs only on the first GPU.
I added the nn.DataParallel wrapper myself, but got an AssertionError when testing.
So I wonder whether this model can be trained on two or more GPUs natively, or whether it was just my mistake.
Thanks a lot.
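
For reference, a minimal sketch of the usual nn.DataParallel wrapping (an assumption about how one might adapt it here; DAN keeps its sub-networks as separate modules, so each would need its own wrapper):

import torch
import torch.nn as nn

# hypothetical stand-ins for DAN's FE / CAM / DTD sub-networks
model = [nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8)]
model = [nn.DataParallel(m).cuda() for m in model]  # replicate each across visible GPUs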

Why is the accuracy always zero when I use the CVPR and NIPS datasets and the IIIT5K_3000 test set?

When I use the CVPR and NIPS datasets and the IIIT5K_3000 test set, the accuracy is always zero even though the loss is decreasing. I think the reason may be that when the label is read, the code inserts the characters b' at the beginning and ' at the end of the text, e.g. ["b'contributions'", "b'do'"]. But the text itself does not contain these characters, right? I am not sure. Do you know the reason? Thanks a lot.
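
That symptom usually comes from calling str() on bytes under Python 3 (my assumption; the repo targets Python 2). A minimal sketch of the difference and the usual fix:

raw = b'contributions'      # labels read from LMDB come back as bytes
print(str(raw))             # "b'contributions'" -- stray b' and ' end up in the label
print(raw.decode('utf-8'))  # "contributions"    -- decode instead of str()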

Bug in `dataset_hw.py`

Hi, thanks for your impressive work; I am really interested in it.
But when I tested your code with the released model, I found a small bug in dataset_hw.py.
Specifically, there is no locking mechanism for self.idx in the class LineGenerate, which causes an error when running in parallel.
The true testing results may therefore be a little lower than your paper reports.

With num_workers set to 2, as in your code:
[screenshot: test results with num_workers = 2]

With num_workers = 1:
[screenshot: test results with num_workers = 1]

About cfgs_scene.py

Excuse me, I'm confused about some settings in cfgs_scene.py,

especially this one:
'roots': ['path/to/lmdb_ST', 'path/to/lmdb_SK'],

What is the meaning of lmdb_ST and lmdb_SK?
Looking forward to your reply.

Testing works, but training accuracy stays at 0

Hello, I used the SVT test set as my dataset. Testing with the scene model provided by the author gives normal accuracy, and fine-tuning from the author's scene model also improves accuracy normally.
But without the pretrained model, the accuracy stays at 0 and training never converges.
Why is this? Thanks for any reply!

About an "index out of range" error

Hello, after I changed "Test" to "Train",
at DAN.py, line 206,
prev_emb = self.char_embeddings.index_select(0, text[:, i])
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorMath.cpp
But I found that nsteps does match the number of columns of text
(although the value of nsteps changes every time).
Do you know the reason?

Thanks!

About the accuracy of the pretrained model

I downloaded the pretrained model and tested it on the IIIT5k lmdb provided in this repo, but I only get 0.8903 accuracy. Is there something I may have missed?

The results I got:
Accuracy: 0.890333, AR: 0.946031, CER: 0.053969, WER: 0.109667

Where is the attention score or value?

I read your paper and found it very interesting.

I have a theoretical question.
In the general attention mechanism, there is a step that computes attention scores (by various methods) to capture the correlation between contexts (see the generic sketch below).

In DAN, only the alignment process is proposed; there is no step that computes attention scores.
CAM consists of a U-Net-structured network.

I wonder whether we need to put a proper attention mechanism into the CAM or DTD module of DAN?
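
For concreteness, a generic sketch of the score computation the question refers to (standard content-based attention, not anything from the DAN repo; the shapes are hypothetical):

import torch
import torch.nn.functional as F

Q = torch.randn(2, 25, 256)            # decoder queries
K = torch.randn(2, 256, 8 * 32)        # flattened visual features as keys
scores = torch.bmm(Q, K) / 256 ** 0.5  # correlation between each query and each position
att = F.softmax(scores, dim=-1)        # attention weights over spatial positions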

Some notes on parameter tuning

channels=64 performs far worse than channels=512; with channels=512 the attention works very well. However, the CAM module then becomes very large: its parameter count grows from 5M to 96M, and the whole model has about 178M parameters.
With CAM channels=512, I also replaced the backbone with Darknet-light and still got very good results. Replacing the two-layer GRU with a single-layer LSTM improved results considerably as well: remove pre_lstm and swap the GRUCell for an LSTMCell (see the sketch below).

The datasets were LSVT, ReCTS, ICDAR2017 RCTW, ArT, plus some self-generated data.
I hope this issue saves later readers some detours.

Thanks to the author for open-sourcing!
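
A minimal sketch of the swap described above (hypothetical sizes, not code from the repo): an LSTMCell carries a cell state alongside the hidden state, so each decoding step passes and receives the pair.

import torch
import torch.nn as nn

rnn = nn.LSTMCell(input_size=512, hidden_size=256)
hidden = torch.zeros(4, 256)
cell_state = torch.zeros(4, 256)
for t in range(25):
    x_t = torch.randn(4, 512)  # stand-in for the per-step input
    hidden, cell_state = rnn(x_t, (hidden, cell_state))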

RuntimeError: CUDA error: out of memory

The GPU memory is sufficient, and I tried the fixes posted online for 'RuntimeError: CUDA error: out of memory', but none worked. Does anyone know what is going on? Many thanks!
DTD(
  (pre_lstm): LSTM(256, 128, bidirectional=True)
  (rnn): GRUCell(512, 256)
  (generator): Sequential(
    (0): Dropout(p=0.7)
    (1): Linear(in_features=256, out_features=80, bias=True)
  )
)
error
error
preparing all done
Traceback (most recent call last):
  File "main.py", line 134, in <module>
    test_acc_counter])
  File "main.py", line 105, in test
    features = model[0](data)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/sunny/projects/Decoupled-attention-network/DAN.py", line 20, in forward
    features = self.model(input)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/sunny/projects/Decoupled-attention-network/resnet.py", line 108, in forward
    x = self.layer3(x)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/sunny/projects/Decoupled-attention-network/resnet.py", line 30, in forward
    out = self.conv1(x)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
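
One common cause when the OOM appears only at test time (an assumption, not a confirmed diagnosis for this issue): the evaluation loop building an autograd graph. Wrapping inference in torch.no_grad() stops intermediate activations from being kept around.

import torch

net = torch.nn.Linear(16, 4)  # stand-in for the feature extractor
data = torch.randn(8, 16)

with torch.no_grad():         # no autograd graph -> activations are freed immediately
    features = net(data)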
