
decoupled-attention-network's People

Contributors

wang-tianwei


decoupled-attention-network's Issues

About irregular text benchmark

I have been confused about the accuracy on irregular datasets for many days. The model reaches 93.3% on IIIT5K, but when I tested on SVT-P, IC15, and other datasets I got terrible results. I don't know the reason; can anybody give me some suggestions?

How to test the pre-trained handwritten model on my test image?

I just want to check the accuracy of the models provided in the repo. I used Python 3. After downloading the models and putting them in the models/hw folder, I ran main.py and got this error:

mkdir: cannot create directory ‘models/hw’: File exists
Traceback (most recent call last):
  File "main.py", line 116, in <module>
    model = load_network()
  File "main.py", line 65, in load_network
    model_cam = cfgs.net_cfgs['CAM'](**cfgs.net_cfgs['CAM_args'])
  File "/home/tft/Videos/Decoupled-attention-network/DAN.py", line 121, in __init__
    ksize_h = 1 if scales[i-1][1] == 1 else ksize[r_h-1]
TypeError: list indices must be integers or slices, not float

Can somebody help me with this?
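
A likely cause, and a minimal sketch of the usual fix (my assumption based on the Python 3 mention above, not a confirmed diagnosis): the repo targets Python 2, where / between two ints is integer division, whereas Python 3's / returns a float, which can no longer be used as a list index.

# hypothetical minimal repro of the Python 2 -> 3 division pitfall
ksize = [3, 5, 7]
r_h = 4 / 2            # Python 2: int 2 -- Python 3: float 2.0
# ksize[r_h - 1]       # TypeError under Python 3: list index is a float
r_h = 4 // 2           # floor division keeps an int on both versions
print(ksize[r_h - 1])  # prints 5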

Questions about Chinese recognition

@Wang-Tianwei Hello, I went through your issue list and would like to ask: between the MORAN_v2 and Decoupled-attention-network repos, which is better suited to a dataset like ICDAR2019 ReCTS? Referring to your earlier answer in #3, which method would work better on this dataset?

bug_report

  • In

self.target_ratio = img_width / float(img_width)

the denominator should be the height:

self.target_ratio = img_width / float(img_height)

  • In the settings, does the image size need to be an integer multiple of 32? Otherwise an error is reported (see the sketch after this list).

'dataset_train_args': {
    'roots': ['path/to/lmdb_ST', 'path/to/lmdb_SK'],
    'img_height': 32,
    'img_width': 128,
    'transform': transforms.Compose([transforms.ToTensor()]),
    'global_state': 'Train',
},

  • Do the image width of 128 and the Feature_Extractor input sizes have to correspond one-to-one?

net_cfgs = {
    'FE': Feature_Extractor,
    'FE_args': {
        'strides': [(1,1), (2,2), (1,1), (2,2), (1,1), (1,1)],
        'compress_layer': False,
        'input_shape': [1, 32, 128],  # C x H x W
    },
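
To see where a divisibility constraint comes from, here is a quick sketch based only on the strides listed above (not on the repo's internals): the overall downsampling factor per axis is the product of the per-stage strides, and the input height and width must be divisible by it for every feature map to have an integer size.

from functools import reduce

strides = [(1, 1), (2, 2), (1, 1), (2, 2), (1, 1), (1, 1)]
down_h = reduce(lambda acc, s: acc * s[0], strides, 1)  # 4
down_w = reduce(lambda acc, s: acc * s[1], strides, 1)  # 4
h, w = 32, 128
print(h % down_h == 0, w % down_w == 0)  # True True
print(h // down_h, w // down_w)          # 8 32 -> final feature map H x W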

Loss does not converge on HWDB 2.0-2.2

Hello, after reproducing the IAM accuracy I wanted to try Chinese handwriting. For CASIA-HWDB 2.0-2.2 I adjusted the number of classes in the config, but found that the loss would not converge. Is there anything else that needs adjusting? Thanks!

exp1_E4_I20000-239703.M0.pth

Hello, I want to know where the file 'exp1_E4_I20000-239703.M0.pth' is; I can't create it. Can anyone answer? Thanks.

Why do you permute in CAM_transposed?

I found these lines in your code:

 # Transpose B-C-H-W to B-W-C-H
  x = x.permute(0, 3, 1, 2).contiguous()

and I printed the shapes of the features through your model like this:

[screenshot: printed feature shapes]

My question is: why do this permutation? It does not seem very meaningful.

batch inference

Hello, I modified the inference code to support batch inference, but the inference time of the RNN part scales linearly with the batch size. Could you help explain this behavior? Thanks!

lenText = nT
nsteps = nT
output = torch.zeros(nB, lenText).int()
output_probs = torch.zeros(nB, lenText).float()

hidden = torch.zeros(nB, self.nchannel).type_as(C.data)
stats = torch.zeros(nB, self.nchannel).type_as(C.data)
prev_emb = self.char_embeddings.index_select(0, torch.zeros(nB).type_as(C.data).long())

for step in range(nsteps):
    hidden, stats = self.rnn(torch.cat((C[step, :, :], prev_emb), dim=1), (hidden, stats))
    step_result = self.generator(hidden)
    step_result = F.softmax(step_result, dim=-1)
    # take the per-sample max; index_select over dim 1 with a batch of indices
    # would return an nB x nB matrix and break the assignments below
    max_prob, max_prob_index = step_result.max(dim=1)
    output[:, step] = max_prob_index
    output_probs[:, step] = max_prob
    prev_emb = self.char_embeddings.index_select(0, max_prob_index.long())

return output, output_probs

94.3% 2D recognition

Hello, could you please tell me how to reproduce the 94.3% recognition accuracy of 2D attention, and what modifications need to be made to the network?

About using sigmoid for the attention maps

Hello, when generating the attention maps you use sigmoid + normalization rather than the conventional softmax. Why is that? Did you run experiments comparing the two? Thanks!
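
For reference, a minimal sketch of the two normalization schemes the question contrasts (the shapes are hypothetical and this is not the repo's code):

import torch
import torch.nn.functional as F

scores = torch.randn(2, 25, 8 * 32)  # B x maxT x (H*W), hypothetical shapes

# conventional softmax over the spatial positions
att_softmax = F.softmax(scores, dim=-1)

# sigmoid followed by normalization, as the question describes
att_sig = torch.sigmoid(scores)
att_sig = att_sig / att_sig.sum(dim=-1, keepdim=True)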

What's the meaning of "bi-directional model"?

In issue #1 (comment),
you mentioned that the paper's performance could be obtained from a bi-directional model.

I have checked the accuracy on various test datasets, and the accuracies were lower than in the paper.
So I want to try the best model configuration.

Is the "bi-directional model" related to the DTD module?
But "pre_lstm" in DTD already uses the "bidirectional=True" flag.
I'm confused about whether we can apply "bidirectional=True" with nn.GRU instead of nn.GRUCell (as in your code).

It looks like nn.GRUCell is the proper choice, because the DTD of DAN has to decode one character at each step (see the sketch below).
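
To illustrate that last point, a minimal sketch (hypothetical sizes, not the repo's code) of why a per-step cell is the natural fit: the input of each step depends on the previous step's prediction, which nn.GRU's whole-sequence interface cannot express.

import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=512, hidden_size=256)
h = torch.zeros(4, 256)
x_t = torch.randn(4, 512)        # first-step input
for t in range(25):
    h = cell(x_t, h)             # one decoding step
    x_t = torch.randn(4, 512)    # stand-in: would be built from the step-t prediction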

Visualization question

How were the attention-map visualizations in Figures 7 and 8 of the paper drawn? Could you provide the code?
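
Pending the authors' code, a minimal overlay sketch (my assumption of the usual recipe, not the paper's implementation): resize each attention map to the input size, colorize it, and blend it onto the image.

import numpy as np
import cv2

def overlay_attention(img, att_map, alpha=0.5):
    # img: H x W x 3 uint8 image; att_map: h x w float map in [0, 1]
    att = cv2.resize(att_map, (img.shape[1], img.shape[0]))
    heat = cv2.applyColorMap((att * 255).astype(np.uint8), cv2.COLORMAP_JET)
    return cv2.addWeighted(img, 1 - alpha, heat, alpha, 0)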

Test and training accuracy stay at 0

With the scene-text model you provided, test accuracy is always zero; training my own model also gives an accuracy of 0. All paths in the config file have been changed, and the environment matches the one you provided. I don't know what else could be wrong. Could you please help?

A beginner question about DTD

Hello, in DTD,
self.char_embeddings = Parameter(torch.randn(nclass, nchannel))
is used several times in the subsequent decoding.
What exactly does it do? Could you briefly explain it? Many thanks.
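
For context, a minimal sketch (hypothetical sizes, not the repo's code) of how such an embedding table is typically used in decoding: each character id selects a learned row, which is fed back as the input of the next step.

import torch
from torch.nn import Parameter

nclass, nchannel = 80, 256
char_embeddings = Parameter(torch.randn(nclass, nchannel))

prev_ids = torch.tensor([0, 17, 42])                  # hypothetical character ids
prev_emb = char_embeddings.index_select(0, prev_ids)  # (3, nchannel), one row per id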

"sample too long" in training

When I set 'roots' in cfgs_scene.py to:
'roots': ['/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/NIPS2014',
          '/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/NIPS2014',
          '/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/CVPR2016',
          '/home/yxwang/pytorch/DAN_1/dan/data/reg_dataset/CVPR2016']
it outputs 'sample too long'; however, the network still works. Is there anything wrong?

[screenshot: 'sample too long' log output]

How to train on more than one GPU?

Even if I set CUDA_VISIBLE_DEVICES to 0,1,2,3, this code still runs only on the first GPU.
I added the nn.DataParallel wrapper myself, but got an AssertionError when testing.
So I wonder whether this model can be trained on two or more GPUs natively, or whether it was just my mistake.
Thanks a lot.
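
For reference, a minimal sketch of the usual nn.DataParallel wrapping (an assumption about how one might adapt it here; DAN keeps its sub-networks as separate modules, so each would need its own wrapper):

import torch
import torch.nn as nn

# hypothetical stand-ins for DAN's FE / CAM / DTD sub-networks
model = [nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8)]
model = [nn.DataParallel(m).cuda() for m in model]  # replicate each across visible GPUs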

Why is the accuracy always zero when I use the CVPR and NIPS datasets and the IIIT5K_3000 test set?

When I use the CVPR and NIPS datasets and the IIIT5K_3000 test set, the accuracy is always zero even though the loss is decreasing. I think the reason may be that when the label is read, the code inserts the characters b' at the beginning and ' at the end of the text, e.g. ["b'contributions'", "b'do'"]. But the text itself does not contain these characters, right? I am not sure. Do you know the reason? Thanks a lot.
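
That symptom usually comes from calling str() on bytes under Python 3 (my assumption; the repo targets Python 2). A minimal sketch of the difference and the usual fix:

raw = b'contributions'      # labels read from LMDB come back as bytes
print(str(raw))             # "b'contributions'" -- stray b' and ' end up in the label
print(raw.decode('utf-8'))  # "contributions"    -- decode instead of str()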

Bug in `dataset_hw.py`

Hi, thanks for your impressive work; I am really interested in it.
But when I tested your code with the released model, I found a small bug in dataset_hw.py.
Specifically, there is no locking mechanism for self.idx in the class LineGenerate, which causes an error when running in parallel.
The true testing results may therefore be a little lower than your paper reports.

With num_workers set to 2, as in your code:
[screenshot: test results with num_workers = 2]

With num_workers = 1:
[screenshot: test results with num_workers = 1]

About cfgs_scene.py

Excuse me, I'm confused about some settings in cfgs_scene.py,

especially this one:
'roots': ['path/to/lmdb_ST', 'path/to/lmdb_SK'],

What is the meaning of lmdb_ST and lmdb_SK?
Looking forward to your reply.

Testing works, but training accuracy stays at 0

Hello, I used the SVT test set as my dataset. Testing with the scene model provided by the author gives normal accuracy, and fine-tuning from the author's scene model also improves accuracy normally.
But without the pretrained model, the accuracy stays at 0 and training never converges.
Why is this? Thanks for any reply!

About an "index out of range" error

Hello, after I changed "Test" to "Train",
at DAN.py, line 206,
prev_emb = self.char_embeddings.index_select(0, text[:, i])
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorMath.cpp
But I found that nsteps does match the number of columns of text
(although the value of nsteps changes every time).
Do you know the reason?

Thanks!

About the accuracy of the pretrained model

I downloaded the pretrained model and tested it on the IIIT5k lmdb provided in this repo, but I only get 0.8903 accuracy. Is there something I may have missed?

The results I got:
Accuracy: 0.890333, AR: 0.946031, CER: 0.053969, WER: 0.109667

Where is the attention score or value?

I read your paper and found it very interesting.

I have a theoretical question.
In the general attention mechanism, there is a step that computes attention scores (by various methods) to capture the correlation between contexts (see the generic sketch below).

In DAN, only the alignment process is proposed; there is no step that computes attention scores.
CAM consists of a U-Net-structured network.

I wonder whether we need to put a proper attention mechanism into the CAM or DTD module of DAN?
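
For concreteness, a generic sketch of the score computation the question refers to (standard content-based attention, not anything from the DAN repo; the shapes are hypothetical):

import torch
import torch.nn.functional as F

Q = torch.randn(2, 25, 256)            # decoder queries
K = torch.randn(2, 256, 8 * 32)        # flattened visual features as keys
scores = torch.bmm(Q, K) / 256 ** 0.5  # correlation between each query and each position
att = F.softmax(scores, dim=-1)        # attention weights over spatial positions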

Some notes on parameter tuning

channels=64 performs far worse than channels=512; with channels=512 the attention works very well. However, the CAM module then becomes very large: its parameter count grows from 5M to 96M, and the whole model has about 178M parameters.
With CAM channels=512, I also replaced the backbone with Darknet-light and still got very good results. Replacing the two-layer GRU with a single-layer LSTM improved results considerably as well: remove pre_lstm and swap the GRUCell for an LSTMCell (see the sketch below).

The datasets were LSVT, ReCTS, ICDAR2017 RCTW, ArT, plus some self-generated data.
I hope this issue saves later readers some detours.

Thanks to the author for open-sourcing!
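
A minimal sketch of the swap described above (hypothetical sizes, not code from the repo): an LSTMCell carries a cell state alongside the hidden state, so each decoding step passes and receives the pair.

import torch
import torch.nn as nn

rnn = nn.LSTMCell(input_size=512, hidden_size=256)
hidden = torch.zeros(4, 256)
cell_state = torch.zeros(4, 256)
for t in range(25):
    x_t = torch.randn(4, 512)  # stand-in for the per-step input
    hidden, cell_state = rnn(x_t, (hidden, cell_state))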

RuntimeError: CUDA error: out of memory

The GPU memory is sufficient, and I tried the fixes posted online for 'RuntimeError: CUDA error: out of memory', but none worked. Does anyone know what is going on? Many thanks!
DTD(
  (pre_lstm): LSTM(256, 128, bidirectional=True)
  (rnn): GRUCell(512, 256)
  (generator): Sequential(
    (0): Dropout(p=0.7)
    (1): Linear(in_features=256, out_features=80, bias=True)
  )
)
error
error
preparing all done
Traceback (most recent call last):
  File "main.py", line 134, in <module>
    test_acc_counter])
  File "main.py", line 105, in test
    features = model[0](data)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/sunny/projects/Decoupled-attention-network/DAN.py", line 20, in forward
    features = self.model(input)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/sunny/projects/Decoupled-attention-network/resnet.py", line 108, in forward
    x = self.layer3(x)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/sunny/projects/Decoupled-attention-network/resnet.py", line 30, in forward
    out = self.conv1(x)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sun/anaconda3/envs/DAN/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
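
One common cause when the OOM appears only at test time (an assumption, not a confirmed diagnosis for this issue): the evaluation loop building an autograd graph. Wrapping inference in torch.no_grad() stops intermediate activations from being kept around.

import torch

net = torch.nn.Linear(16, 4)  # stand-in for the feature extractor
data = torch.randn(8, 16)

with torch.no_grad():         # no autograd graph -> activations are freed immediately
    features = net(data)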
