fangshancheng / abinet
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
License: Other
1. When pretraining the language model, does the cloze task take images as input, or does it operate directly on one-hot text?
2. Have you tried merging the VM and LM? How did that perform?
3. In the final comparison with SOTA, did you add pseudo-labels as extra training data?
Thanks!
Many thanks for open-sourcing such a great project! Is there code for testing a single image, and how can I run main.py on CPU for evaluation? Thanks again for this excellent open-source project!
Thank you very much for such a great open-source project. How can I build my own dataset and fine-tune the model? Looking forward to your reply!
Hello, I ran the code directly using the setting pretrain_vision_model.yaml, here are the results of the trained model:
Benchmark | Accuracy |
IC13 | 92.6 |
SVT | 87.2 |
IIIT | 88.1 |
IC15 | 78.7 |
SVTP | 81.4 |
CUTE80 | 79.5 |
Average | 85.0 |
It seems that the released pretrained vision model has an average accuracy of about 90%. Could you tell me whether you used
pretrain_vision_model.yaml to pretrain the vision model, and whether any additional tricks or data were involved?
Hello! Thank you for your work and for sharing it! When training the model following the instructions in readme.md, training always terminates automatically at the 10th epoch, even after I changed training-epochs in the yaml. Which setting did I get wrong? This is my first time using the fastai framework, so any guidance would be appreciated. Looking forward to your reply.
Thanks for your great work. I'm facing some trouble on the re-implementation of the vision part of your work, and I'd like to ask for more experimental details for re-implementation if it is possible.
I noticed that SRN uses ResNet-50 as its backbone, while ABINet chooses a much lighter backbone with only 5 residual layers (it looks like a ResNet-18 or even lighter, according to the footnote in your arXiv paper) for feature extraction and still achieves comparable results. Can you provide the detailed structure of your ResNet backbone as well as the mini-U-Net? Also, can you provide the different configurations of your SV (small), MV (medium), and LV (large) vision models?
Are the positional encoding and the order embedding (used as Q in the attention) hard-coded or learned? Do different encoding methods affect the performance much?
Can you provide the detailed parameters of the augmentation methods? And how much does data augmentation affect the performance?
Approximately how long does it take for your model to converge on 4x 1080 Ti GPUs?
Thanks again for your work, looking forward to your reply.
Alternatively, could you tell me which characters the file contains?
Thanks for sharing the source code. Why can't I reproduce the paper's results when running the code on two RTX 3090s with the same datasets? I trained ABINet directly from the provided vision and language models, but it does not seem to converge: the cwr fluctuates a lot, hovering between 77 and 85.
In the self-training process, do you mean to filter out samples whose C < Q, or to keep those samples?
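For context, the self-training criterion being asked about (dropping pseudo-labelled samples whose minimum character confidence C falls below a threshold Q) can be sketched as follows; the function and data names are illustrative, not taken from the ABINet code:

```python
# Hedged sketch of pseudo-label filtering: a sample is kept only if its
# minimum per-character confidence C is at least the threshold Q.
def filter_pseudo_labels(samples, q=0.9):
    """samples: list of (predicted_text, per_char_confidences) pairs."""
    kept = []
    for text, confs in samples:
        c = min(confs)  # C: weakest character in the prediction
        if c >= q:      # keep only confident pseudo-labels
            kept.append((text, confs))
    return kept

samples = [("hello", [0.99, 0.95, 0.97, 0.93, 0.96]),   # C = 0.93, kept
           ("w0rld", [0.99, 0.42, 0.88, 0.91, 0.90])]   # C = 0.42, dropped
print(len(filter_pseudo_labels(samples)))  # 1
```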
How should I understand Figure 6 in your paper? The axes of the left and right plots are identical, but I don't see the difference between the two. Looking forward to your reply, thanks!
Hi, thanks for your great work @FangShancheng.
I'm trying to train ABINet on Vietnamese characters. First, I trained the language model with my own Vietnamese tokens but got a NaN loss. Can you help me fix this?
The code seems to use the dict_36 charset (lowercase letters only), but the MJ and ST datasets contain both uppercase and lowercase labels. Where exactly is the uppercase-to-lowercase conversion handled? Could you point me to the location? Thanks!
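For what it's worth, the usual normalisation onto a 36-character charset (digits plus lowercase letters) can be sketched like this; it illustrates the mapping only, not the actual location in the ABINet code:

```python
import string

# Hedged sketch: fold labels onto a dict_36-style charset (digits + a-z).
CHARSET_36 = string.digits + string.ascii_lowercase

def normalize_label(label):
    label = label.lower()  # map A-Z onto a-z
    # drop any character outside the charset (punctuation etc.)
    return ''.join(c for c in label if c in CHARSET_36)

print(normalize_label("Hello-World"))  # helloworld
```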
Hello, I read your paper and found it very inspiring. While reproducing your code on another dataset, I hit this error:
if not self.is_training: assert idx == idx_new, f'idx {idx} != idx_new {idx_new} during testing.'
AssertionError: idx 3 != idx_new 762 during testing.
Is there anything else I need to modify? Looking forward to your reply.
Hello, was the ABINet compared against the other SOTA methods at the end of the paper obtained by first pretraining the VM and LM and then fine-tuning?
First of all, thanks for sharing your work; the paper was very inspiring.
I recently reproduced your code using the default parameters of the released yaml files, changing nothing but the dataset paths and the batch size. However, the accuracy I get is far below the released ABINet model.
Our reproduction results: IIIT5k 89.8% vs. the published 96.4%; SVT 92.1% vs. 93.2%; IC15 82.2% vs. 85.9%; SVTP 87.1% vs. 89%; CUTE 84.7% vs. 89.2%.
On some datasets the gap is very large, e.g. IIIT5k. What could be the cause? Could it be the number of training epochs? I trained the vision model for the default 8 epochs. The default for the language model is 80 epochs; since that is so large, I stopped after 5 epochs, when the loss already looked roughly stable. For ABINet I trained 3 epochs, by which point the network also seemed stable. Fully running 80 and 10 epochs would be a big undertaking: about 30 days on my two 2080 Ti GPUs, and even the four 1080 Tis mentioned in the paper would need about half a month.
What training parameters did you use? What might I have overlooked that causes such a large accuracy gap?
Thanks again for your work and contribution. Looking forward to your reply.
This message is logged to the terminal:
[2021-10-14 04:53:57,225 main.py:215 INFO train-abinet] ModelConfig(
(0): dataset_case_sensitive = False
(1): dataset_charset_path = data/charset_36.txt
(2): dataset_data_aug = True
(3): dataset_eval_case_sensitive = False
(4): dataset_image_height = 32
(5): dataset_image_width = 128
(6): dataset_max_length = 25
(7): dataset_multiscales = False
(8): dataset_num_workers = 14
(9): dataset_one_hot_y = True
(10): dataset_pin_memory = True
(11): dataset_smooth_factor = 0.1
(12): dataset_smooth_label = False
(13): dataset_test_batch_size = 384
(14): dataset_test_roots = ['data/training/MJ/MJ_train/', 'data/training/MJ/MJ_test/', 'data/training/MJ/MJ_valid/', 'data/training/ST']
(15): dataset_train_batch_size = 384
(16): dataset_train_roots = ['data/training/MJ/MJ_train/', 'data/training/MJ/MJ_test/', 'data/training/MJ/MJ_valid/', 'data/training/ST']
(17): dataset_use_sm = False
(18): global_name = train-abinet
(19): global_phase = train
(20): global_seed = None
(21): global_stage = train-super
(22): global_workdir = workdir/train-abinet
(23): model_alignment_loss_weight = 1.0
(24): model_checkpoint = None
(25): model_ensemble =
(26): model_iter_size = 3
(27): model_language_checkpoint = workdir/pretrain-language-model/pretrain-language-model.pth
(28): model_language_detach = True
(29): model_language_loss_weight = 1.0
(30): model_language_num_layers = 4
(31): model_language_use_self_attn = False
(32): model_name = modules.model_abinet_iter.ABINetIterModel
(33): model_strict = True
(34): model_use_vision = False
(35): model_vision_attention = position
(36): model_vision_backbone = transformer
(37): model_vision_backbone_ln = 3
(38): model_vision_checkpoint = workdir/pretrain-vision-model/best-pretrain-vision-model.pth
(39): model_vision_loss_weight = 1.0
(40): optimizer_args_betas = (0.9, 0.999)
(41): optimizer_bn_wd = False
(42): optimizer_clip_grad = 20
(43): optimizer_lr = 0.0001
(44): optimizer_scheduler_gamma = 0.1
(45): optimizer_scheduler_periods = [6, 4]
(46): optimizer_true_wd = False
(47): optimizer_type = Adam
(48): optimizer_wd = 0.0
(49): training_epochs = 10
(50): training_eval_iters = 3000
(51): training_save_iters = 3000
(52): training_show_iters = 50
(53): training_start_iters = 0
(54): training_stats_iters = 100000
)
[2021-10-14 04:53:57,226 main.py:222 INFO train-abinet] Construct dataset.
[2021-10-14 04:53:57,228 main.py:92 INFO train-abinet] 67199 training items found.
[2021-10-14 04:53:57,228 main.py:94 INFO train-abinet] 67199 valid items found.
[2021-10-14 04:53:57,228 main.py:226 INFO train-abinet] Construct model.
[2021-10-14 04:53:57,488 model_vision.py:37 INFO train-abinet] Read vision model from workdir/pretrain-vision-model/best-pretrain-vision-model.pth.
[2021-10-14 04:53:59,805 model_language.py:38 INFO train-abinet] Read language model from workdir/pretrain-language-model/pretrain-language-model.pth.
[2021-10-14 04:53:59,843 main.py:104 INFO train-abinet] ABINetIterModel(
(vision): BaseVision(
(backbone): ResTranformer(
(resnet): ResNet(
(conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(32, 32, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer5): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0.1)
)
(transformer): TransformerEncoder(
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
)
)
)
(attention): PositionAttention(
(k_encoder): Sequential(
(0): Sequential(
(0): Conv2d(512, 64, kernel_size=(3, 3), stride=(1, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(1): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(2): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(3): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
)
(k_decoder): Sequential(
(0): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(1): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(2): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(3): Sequential(
(0): Upsample(size=(8, 32), mode=nearest)
(1): Conv2d(64, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0)
)
(project): Linear(in_features=512, out_features=512, bias=True)
)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
(language): BCNLanguage(
(proj): Linear(in_features=37, out_features=512, bias=False)
(token_encoder): PositionalEncoding(
(dropout): Dropout(p=0.1)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0)
)
(model): TransformerDecoder(
(layers): ModuleList(
(0): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(1): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(2): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(3): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
)
)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
(alignment): BaseAlignment(
(w_att): Linear(in_features=1024, out_features=512, bias=True)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
)
[2021-10-14 04:53:59,848 main.py:229 INFO train-abinet] Construct learner.
[2021-10-14 04:53:59,962 main.py:233 INFO train-abinet] Start training.
Traceback (most recent call last):
File "/ABINet/dataset.py", line 103, in get
return self._next_image(idx)
File "/ABINet/dataset.py", line 61, in _next_image
next_index = random.randint(0, len(self) - 1)
RecursionError: maximum recursion depth exceeded
[2021-10-14 04:53:59,994 dataset.py:119 INFO train-abinet] Corrupted image is found: MJ_train, 34607, , 0
Fatal Python error: Cannot recover from stack overflow.
Thread 0x00007f7923687700 (most recent call first):
File "/home/user/miniconda/envs/py36/lib/python3.6/selectors.py", line 376 in select
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 911 in wait
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 414 in _poll
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 257 in poll
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/queues.py", line 104 in get
File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 202 in run
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007f797e1ef700 (most recent call first):
File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 235 in _queue_processor
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 864 in run
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 884 in _bootstrap
Current thread 0x00007f7a07cd3700 (most recent call first):
File "/ABINet/dataset.py", line 61 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
[the _next_image/get pair above repeats until the recursion limit]
...
Aborted (core dumped)
Do you know what I should do with my dataset?
Thanks
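Not the author, but the trace above suggests get() and _next_image() call each other recursively whenever a sample is corrupted, which overflows the stack if many samples in a row are unreadable. A bounded iterative retry avoids that; this is only a sketch, with load_sample standing in for the real LMDB read and decode:

```python
import random

# Hedged sketch: replace the mutual recursion (get -> _next_image -> get ...)
# with a loop that retries a bounded number of random indices.
def get_with_retry(load_sample, length, idx, max_tries=1000):
    """Return the first successfully decoded sample, retrying random indices."""
    for _ in range(max_tries):
        sample = load_sample(idx)
        if sample is not None:                # decoded successfully
            return sample
        idx = random.randint(0, length - 1)   # pick another index, no recursion
    raise RuntimeError("too many corrupted samples in a row")

# toy store: index 0 is "corrupted", every other index decodes fine
data = {0: None}
print(get_with_retry(lambda i: data.get(i, "img"), 3, 0))
```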
First, thanks for the author's contribution to this project! Can anyone point me to a demo.py for this project? I can only find the test mode for the test datasets.
I tried to train your model with custom dataset and I got this:
"
[2021-10-11 13:09:35,820 main.py:222 INFO train-abinet] Construct dataset.
Traceback (most recent call last):
File "main.py", line 246, in
main()
File "main.py", line 224, in main
else: data = _get_databaunch(config)
File "main.py", line 80, in _get_databaunch
train_ds = _get_dataset(ImageDataset, config.dataset_train_roots, True, config)
File "main.py", line 47, in _get_dataset
datasets = [ds_type(p, **kwargs) for p in paths]
File "main.py", line 47, in
datasets = [ds_type(p, **kwargs) for p in paths]
File "/content/drive/MyDrive/OCR/ABINet/dataset.py", line 49, in __init__
self.length = int(txn.get('num-samples'.encode()))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
"
And here's how my data is stored:
ABINet
|--data
| |--img
| | |--(here is images)
| |--lmdb
| | |--data.mdb
| | |--lock.mdb
Thanks
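Not the author, but the TypeError above means the 'num-samples' key is missing from the LMDB: dataset.py line 49 reads it to learn the dataset length. ABINet appears to follow the widely used LMDB convention from the deep-text-recognition-benchmark family of repos, which can be sketched like this (a plain dict stands in for the lmdb transaction here; with the real lmdb package you would txn.put the same keys):

```python
# Hedged sketch of the conventional LMDB key layout such loaders expect:
# a 'num-samples' entry plus image-%09d / label-%09d pairs, indices from 1.
def build_store(images_and_labels):
    store = {}
    for i, (img_bytes, label) in enumerate(images_and_labels, start=1):
        store[b'image-%09d' % i] = img_bytes       # raw encoded image bytes
        store[b'label-%09d' % i] = label.encode()  # ground-truth text
    store[b'num-samples'] = str(len(images_and_labels)).encode()
    return store

store = build_store([(b'<png bytes>', 'hello'), (b'<png bytes>', 'world')])
print(int(store[b'num-samples']))  # 2
```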
Hello! Thank you very much for your work and for sharing it. What would be the pros and cons of applying this model to Chinese datasets? If the language model were replaced with a Chinese BERT, would that significantly affect the results? Have you run any experiments along these lines?
Hi, I noticed that the requirements.txt file is no longer valid: some of the pinned versions no longer exist on pip, in particular torch==1.1.0, torchvision==0.3.0, and fastai==1.0.60. Is there a reason these are so old? The current torch version is 1.9.0, torchvision is 0.10.0, and fastai is 2.4. I have some of the code working with these newer versions; I'm working through the remaining parts (currently stuck on a syntax error).
Hello @FangShancheng,
I have a Vietnamese dataset and want to fine-tune your pre-trained model on it. How can I do that?
Thanks for your great work! I have a question: where can I get the WikiText-103.csv and WikiText-103_eval_d1.csv files? Can you provide a link? Thank you!
I see that SRN is compared against with better performance. Will we see an SRN reimplementation here? Thanks.
Hello! What is the point of setting an end token for the position attention? Since position attention is not autoregressive, it does not seem to need an end token to terminate the decoding process.
When training the language model on a Chinese corpus, I get these metrics: epoch 15 iter 28000: eval loss = 2.5787, ccr = 0.7434, cwr = 0.0914, ted = 0.0000, ned = 0.0000, ted/w = 0.0000. At inference time, however, the results are poor. What might be the cause?
input: 我是申华人民共和国公民
output: ['这校中请人民共和谐公司的的']
As the title says: how should I understand Figure 6 in your paper? The axes of the left and right plots are identical, but I don't see the difference between the two. Looking forward to your reply, thanks!
File "main.py", line 10, in
from fastai.callback.general_sched import GeneralScheduler, TrainingPhase
ModuleNotFoundError: No module named 'fastai.callback.general_sched'
[2021-09-06 17:11:08,426 main.py:229 INFO train-abinet] Construct learner.
It always hangs here. I debugged it and found that it is stuck at:
learner = Learner(data, model, silent=True, model_dir='.', true_wd=config.optimizer_true_wd, wd=config.optimizer_wd, bn_wd=config.optimizer_bn_wd, path=config.global_workdir, metrics=metrics, opt_func=partial(opt_type, **config.optimizer_args or dict()), loss_func=MultiLosses(one_hot=config.dataset_one_hot_y))
Is there a way to fix this?
Having read the paper, my understanding of the Position Attention is as follows:
# feature_map: (HW, channels) feature map
# w: (len_seq, channels) position encodings
q = w  # (len_seq, channels); the position encoding is just a trainable parameter, essentially an fc layer
k = unet(feature_map)  # (HW, channels)
v = feature_map  # (HW, channels)
attention_map = softmax(matmul(q, k.transpose()))  # (len_seq, HW)
output = matmul(attention_map, v)  # (len_seq, channels)
Whereas the Parallel Attention in "2D Attentional Irregular Scene Text Recognizer" works like this:
# see Eq. 7 of https://arxiv.org/pdf/1906.05708.pdf
# w1: (channels, channels)
# w2: (len_seq, channels)
q = w2  # (len_seq, channels)
k = tanh(matmul(w1, feature_map.transpose()))  # (channels, HW)
v = feature_map  # (HW, channels)
attention_map = softmax(matmul(q, k))  # (len_seq, HW)
output = matmul(attention_map, v)  # (len_seq, channels)
So is it correct to say that the only difference between Parallel Attention and Position Attention is the transform applied to k?
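The comparison above can be checked shape-by-shape with a small runnable sketch (numpy, random weights; the mini-U-Net and the tanh(W1 f^T) transform are replaced by a placeholder key transform, since only the key path differs between the two variants):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(fmap, q, key_transform):
    k = key_transform(fmap)           # (HW, C): the only step that differs
    attn = softmax(q @ k.T, axis=-1)  # (len_seq, HW), rows sum to 1
    return attn @ fmap                # (len_seq, C)

HW, C, L = 8 * 32, 512, 26
feature_map = np.random.randn(HW, C)
w = np.random.randn(L, C)             # learned positional queries

# placeholder key transform standing in for unet(...) / tanh(W1 f^T)
out = position_attention(feature_map, w, np.tanh)
print(out.shape)  # (26, 512)
```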
The paper mentions that BGF allows the VM and LM to be trained independently. I don't quite understand: without BGF, what would the impact on the LM be? Also, the LM itself could use a pre-trained LM.
Looking forward to your reply, thanks.
( ̄▽ ̄)*
Running python demo.py --config=configs/train_abinet.yaml --input=figs/test
raises NameError: name 'ifnone' is not defined.
How can I fix this?
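Not the author, but demo.py relies on fastai's ifnone helper, which some fastai versions no longer export at the path the script imports from. Defining a local helper with the same semantics may unblock the script; this is a workaround sketch, not an official fix:

```python
# Hedged workaround: fastai's ifnone semantics, defined locally.
def ifnone(a, b):
    """Return `a` unless it is None, in which case return `b`."""
    return b if a is None else a

print(ifnone(None, 3), ifnone(5, 3))  # 3 5
```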
Thanks for releasing your source code.
Unfortunately, I got a poor test result (about 4% below the official results in the paper).
Are there any important training strategies needed to reproduce the results?
Or has anyone else run into the same issue?
How should I understand Figure 6 in your paper? The axes of the left and right plots are identical, but I don't see the difference between the two. Looking forward to your reply, thanks!
I want to adapt the code to train a Chinese model, but I don't know how to prepare a Chinese dataset for the language model.
Thanks for your amazing repo!
I have a text-line dataset and would like to adapt your work to it.
Can the system handle variable-size images (rather than a fixed 32x128) and variable text lengths (according to the max text length in a mini-batch, rather than the fixed dataset_max_length)?
Could you suggest a solution?
Thanks!
Hello, I see that the epoch setting in pretrain_language_model.yaml is 80, and I'm using 4 Titan Xp GPUs to pretrain the language model. However, one epoch takes 8 hours, so 80 epochs would take 640 hours (nearly 27 days). Do I have to train for the full 80 epochs? How can I judge when the language model has converged?
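Not the author, but one common way to decide when to stop, rather than committing to all 80 epochs, is to watch the eval loss for a plateau. A minimal sketch (illustrative only; ABINet itself has no such option built in):

```python
# Hedged sketch: declare convergence when the eval loss has not improved
# by at least min_delta for `patience` consecutive epochs.
def has_converged(eval_losses, patience=3, min_delta=0.01):
    """eval_losses: one eval loss per epoch, oldest first."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])   # best loss seen earlier
    recent_best = min(eval_losses[-patience:])   # best in the last `patience` epochs
    return best_before - recent_best < min_delta

print(has_converged([3.1, 2.9, 2.88, 2.881, 2.879, 2.880]))
```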