fangshancheng / abinet
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
License: Other
1. When pretraining the language model, does the cloze task take images as input, or does it operate directly on one-hot text?
2. Have you tried merging the VM and LM? How did that perform?
3. In the final comparison with SOTA, did you add pseudo-labels as extra training data?
Thanks!
Many thanks for open-sourcing such a great project! Is there code for testing a single image, and how can I run main.py on CPU for evaluation? Thanks again for this excellent open-source project!
Thank you very much for such a great open-source project. How can I build my own dataset and fine-tune the model? Looking forward to your reply!
Hello, I ran the code directly using the setting pretrain_vision_model.yaml, here are the results of the trained model:
Benchmark | Accuracy |
IC13 | 92.6 |
SVT | 87.2 |
IIIT | 88.1 |
IC15 | 78.7 |
SVTP | 81.4 |
CUTE80 | 79.5 |
Average | 85.0 |
It seems that the released pretrained vision model has an average accuracy of about 90%. Could you tell me whether you used
pretrain_vision_model.yaml to pretrain the vision model, and whether any additional tricks or data were involved?
Hello! Thank you for your work and for sharing it! When training the model following the instructions in readme.md, training always terminates automatically at the 10th epoch, even after I changed training-epochs in the yaml. Which setting did I get wrong? This is my first time using the fastai framework, so any guidance would be appreciated. Looking forward to your reply.
Thanks for your great work. I'm facing some trouble on the re-implementation of the vision part of your work, and I'd like to ask for more experimental details for re-implementation if it is possible.
I noticed that SRN uses ResNet-50 as its backbone, while ABINet chooses a much lighter backbone with only 5 residual layers (it looks like a ResNet-18 or even lighter, according to the footnote in your arXiv paper) for feature extraction and still achieves comparable results. Can you provide the detailed structure of your ResNet backbone as well as the mini-U-Net? Also, can you provide the different configurations of your SV (small), MV (medium), and LV (large) vision models?
Are the positional encoding and the order embedding (used as Q in the attention) hard-coded or learned? Do different encoding methods affect the performance much?
Can you provide the detailed parameters of the augmentation methods? And how much does data augmentation affect the performance?
Approximately how long does it take for your model to converge on 4x 1080 Ti GPUs?
Thanks again for your work, looking forward to your reply.
Alternatively, could you tell me which characters the file contains?
Thanks for sharing the source code. Why can't I reproduce the paper's results when running the code on two RTX 3090s with the same datasets? I trained ABINet directly from the provided vision and language models, but it does not seem to converge: the cwr fluctuates a lot, hovering between 77 and 85.
In the self-training process, do you mean to filter out samples whose C < Q, or to keep those samples?
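For context, the self-training criterion being asked about (dropping pseudo-labelled samples whose minimum character confidence C falls below a threshold Q) can be sketched as follows; the function and data names are illustrative, not taken from the ABINet code:

```python
# Hedged sketch of pseudo-label filtering: a sample is kept only if its
# minimum per-character confidence C is at least the threshold Q.
def filter_pseudo_labels(samples, q=0.9):
    """samples: list of (predicted_text, per_char_confidences) pairs."""
    kept = []
    for text, confs in samples:
        c = min(confs)  # C: weakest character in the prediction
        if c >= q:      # keep only confident pseudo-labels
            kept.append((text, confs))
    return kept

samples = [("hello", [0.99, 0.95, 0.97, 0.93, 0.96]),   # C = 0.93, kept
           ("w0rld", [0.99, 0.42, 0.88, 0.91, 0.90])]   # C = 0.42, dropped
print(len(filter_pseudo_labels(samples)))  # 1
```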
How should I understand Figure 6 in your paper? The axes of the left and right plots are identical, but I don't see the difference between the two. Looking forward to your reply, thanks!
Hi, thanks for your great work @FangShancheng.
I'm trying to train ABINet on Vietnamese characters. First, I trained the language model with my own Vietnamese tokens but got a NaN loss. Can you help me fix this?
The code seems to use the dict_36 charset (lowercase letters only), but the MJ and ST datasets contain both uppercase and lowercase labels. Where exactly is the uppercase-to-lowercase conversion handled? Could you point me to the location? Thanks!
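For what it's worth, the usual normalisation onto a 36-character charset (digits plus lowercase letters) can be sketched like this; it illustrates the mapping only, not the actual location in the ABINet code:

```python
import string

# Hedged sketch: fold labels onto a dict_36-style charset (digits + a-z).
CHARSET_36 = string.digits + string.ascii_lowercase

def normalize_label(label):
    label = label.lower()  # map A-Z onto a-z
    # drop any character outside the charset (punctuation etc.)
    return ''.join(c for c in label if c in CHARSET_36)

print(normalize_label("Hello-World"))  # helloworld
```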
Hello, I read your paper and found it very inspiring. While reproducing your code on another dataset, I hit this error:
if not self.is_training: assert idx == idx_new, f'idx {idx} != idx_new {idx_new} during testing.'
AssertionError: idx 3 != idx_new 762 during testing.
Is there anything else I need to modify? Looking forward to your reply.
Hello, was the ABINet compared against the other SOTA methods at the end of the paper obtained by first pretraining the VM and LM and then fine-tuning?
First of all, thanks for sharing your work; the paper was very inspiring.
I recently reproduced your code using the default parameters of the released yaml files, changing nothing but the dataset paths and the batch size. However, the accuracy I get is far below the released ABINet model.
Our reproduction results: IIIT5k 89.8% vs. the published 96.4%; SVT 92.1% vs. 93.2%; IC15 82.2% vs. 85.9%; SVTP 87.1% vs. 89%; CUTE 84.7% vs. 89.2%.
On some datasets the gap is very large, e.g. IIIT5k. What could be the cause? Could it be the number of training epochs? I trained the vision model for the default 8 epochs. The default for the language model is 80 epochs; since that is so large, I stopped after 5 epochs, when the loss already looked roughly stable. For ABINet I trained 3 epochs, by which point the network also seemed stable. Fully running 80 and 10 epochs would be a big undertaking: about 30 days on my two 2080 Ti GPUs, and even the four 1080 Tis mentioned in the paper would need about half a month.
What training parameters did you use? What might I have overlooked that causes such a large accuracy gap?
Thanks again for your work and contribution. Looking forward to your reply.
This message is logged to the terminal:
[2021-10-14 04:53:57,225 main.py:215 INFO train-abinet] ModelConfig(
(0): dataset_case_sensitive = False
(1): dataset_charset_path = data/charset_36.txt
(2): dataset_data_aug = True
(3): dataset_eval_case_sensitive = False
(4): dataset_image_height = 32
(5): dataset_image_width = 128
(6): dataset_max_length = 25
(7): dataset_multiscales = False
(8): dataset_num_workers = 14
(9): dataset_one_hot_y = True
(10): dataset_pin_memory = True
(11): dataset_smooth_factor = 0.1
(12): dataset_smooth_label = False
(13): dataset_test_batch_size = 384
(14): dataset_test_roots = ['data/training/MJ/MJ_train/', 'data/training/MJ/MJ_test/', 'data/training/MJ/MJ_valid/', 'data/training/ST']
(15): dataset_train_batch_size = 384
(16): dataset_train_roots = ['data/training/MJ/MJ_train/', 'data/training/MJ/MJ_test/', 'data/training/MJ/MJ_valid/', 'data/training/ST']
(17): dataset_use_sm = False
(18): global_name = train-abinet
(19): global_phase = train
(20): global_seed = None
(21): global_stage = train-super
(22): global_workdir = workdir/train-abinet
(23): model_alignment_loss_weight = 1.0
(24): model_checkpoint = None
(25): model_ensemble =
(26): model_iter_size = 3
(27): model_language_checkpoint = workdir/pretrain-language-model/pretrain-language-model.pth
(28): model_language_detach = True
(29): model_language_loss_weight = 1.0
(30): model_language_num_layers = 4
(31): model_language_use_self_attn = False
(32): model_name = modules.model_abinet_iter.ABINetIterModel
(33): model_strict = True
(34): model_use_vision = False
(35): model_vision_attention = position
(36): model_vision_backbone = transformer
(37): model_vision_backbone_ln = 3
(38): model_vision_checkpoint = workdir/pretrain-vision-model/best-pretrain-vision-model.pth
(39): model_vision_loss_weight = 1.0
(40): optimizer_args_betas = (0.9, 0.999)
(41): optimizer_bn_wd = False
(42): optimizer_clip_grad = 20
(43): optimizer_lr = 0.0001
(44): optimizer_scheduler_gamma = 0.1
(45): optimizer_scheduler_periods = [6, 4]
(46): optimizer_true_wd = False
(47): optimizer_type = Adam
(48): optimizer_wd = 0.0
(49): training_epochs = 10
(50): training_eval_iters = 3000
(51): training_save_iters = 3000
(52): training_show_iters = 50
(53): training_start_iters = 0
(54): training_stats_iters = 100000
)
[2021-10-14 04:53:57,226 main.py:222 INFO train-abinet] Construct dataset.
[2021-10-14 04:53:57,228 main.py:92 INFO train-abinet] 67199 training items found.
[2021-10-14 04:53:57,228 main.py:94 INFO train-abinet] 67199 valid items found.
[2021-10-14 04:53:57,228 main.py:226 INFO train-abinet] Construct model.
[2021-10-14 04:53:57,488 model_vision.py:37 INFO train-abinet] Read vision model from workdir/pretrain-vision-model/best-pretrain-vision-model.pth.
[2021-10-14 04:53:59,805 model_language.py:38 INFO train-abinet] Read language model from workdir/pretrain-language-model/pretrain-language-model.pth.
[2021-10-14 04:53:59,843 main.py:104 INFO train-abinet] ABINetIterModel(
(vision): BaseVision(
(backbone): ResTranformer(
(resnet): ResNet(
(conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(32, 32, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer5): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0.1)
)
(transformer): TransformerEncoder(
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
)
)
)
(attention): PositionAttention(
(k_encoder): Sequential(
(0): Sequential(
(0): Conv2d(512, 64, kernel_size=(3, 3), stride=(1, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(1): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(2): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(3): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
)
(k_decoder): Sequential(
(0): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(1): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(2): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(3): Sequential(
(0): Upsample(size=(8, 32), mode=nearest)
(1): Conv2d(64, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0)
)
(project): Linear(in_features=512, out_features=512, bias=True)
)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
(language): BCNLanguage(
(proj): Linear(in_features=37, out_features=512, bias=False)
(token_encoder): PositionalEncoding(
(dropout): Dropout(p=0.1)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0)
)
(model): TransformerDecoder(
(layers): ModuleList(
(0): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(1): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(2): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(3): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
)
)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
(alignment): BaseAlignment(
(w_att): Linear(in_features=1024, out_features=512, bias=True)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
)
[2021-10-14 04:53:59,848 main.py:229 INFO train-abinet] Construct learner.
[2021-10-14 04:53:59,962 main.py:233 INFO train-abinet] Start training.
Traceback (most recent call last):
File "/ABINet/dataset.py", line 103, in get
return self._next_image(idx)
File "/ABINet/dataset.py", line 61, in _next_image
next_index = random.randint(0, len(self) - 1)
RecursionError: maximum recursion depth exceeded
[2021-10-14 04:53:59,994 dataset.py:119 INFO train-abinet] Corrupted image is found: MJ_train, 34607, , 0
Fatal Python error: Cannot recover from stack overflow.
Thread 0x00007f7923687700 (most recent call first):
File "/home/user/miniconda/envs/py36/lib/python3.6/selectors.py", line 376 in select
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 911 in wait
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 414 in _poll
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 257 in poll
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/queues.py", line 104 in get
File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 202 in run
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007f797e1ef700 (most recent call first):
File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 235 in _queue_processor
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 864 in run
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 884 in _bootstrap
Current thread 0x00007f7a07cd3700 (most recent call first):
File "/ABINet/dataset.py", line 61 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
[the _next_image/get pair above repeats until the recursion limit]
...
Aborted (core dumped)
Do you know what I should do with my dataset?
Thanks
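Not the author, but the trace above suggests get() and _next_image() call each other recursively whenever a sample is corrupted, which overflows the stack if many samples in a row are unreadable. A bounded iterative retry avoids that; this is only a sketch, with load_sample standing in for the real LMDB read and decode:

```python
import random

# Hedged sketch: replace the mutual recursion (get -> _next_image -> get ...)
# with a loop that retries a bounded number of random indices.
def get_with_retry(load_sample, length, idx, max_tries=1000):
    """Return the first successfully decoded sample, retrying random indices."""
    for _ in range(max_tries):
        sample = load_sample(idx)
        if sample is not None:                # decoded successfully
            return sample
        idx = random.randint(0, length - 1)   # pick another index, no recursion
    raise RuntimeError("too many corrupted samples in a row")

# toy store: index 0 is "corrupted", every other index decodes fine
data = {0: None}
print(get_with_retry(lambda i: data.get(i, "img"), 3, 0))
```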
First, thanks for the author's contribution to this project! Can anyone point me to a demo.py for this project? I can only find the test mode for the test datasets.
I tried to train your model with custom dataset and I got this:
"
[2021-10-11 13:09:35,820 main.py:222 INFO train-abinet] Construct dataset.
Traceback (most recent call last):
File "main.py", line 246, in
main()
File "main.py", line 224, in main
else: data = _get_databaunch(config)
File "main.py", line 80, in _get_databaunch
train_ds = _get_dataset(ImageDataset, config.dataset_train_roots, True, config)
File "main.py", line 47, in _get_dataset
datasets = [ds_type(p, **kwargs) for p in paths]
File "main.py", line 47, in
datasets = [ds_type(p, **kwargs) for p in paths]
File "/content/drive/MyDrive/OCR/ABINet/dataset.py", line 49, in __init__
self.length = int(txn.get('num-samples'.encode()))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
"
And here's how my data is stored:
ABINet
|--data
| |--img
| | |--(here is images)
| |--lmdb
| | |--data.mdb
| | |--lock.mdb
Thanks
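Not the author, but the TypeError above means the 'num-samples' key is missing from the LMDB: dataset.py line 49 reads it to learn the dataset length. ABINet appears to follow the widely used LMDB convention from the deep-text-recognition-benchmark family of repos, which can be sketched like this (a plain dict stands in for the lmdb transaction here; with the real lmdb package you would txn.put the same keys):

```python
# Hedged sketch of the conventional LMDB key layout such loaders expect:
# a 'num-samples' entry plus image-%09d / label-%09d pairs, indices from 1.
def build_store(images_and_labels):
    store = {}
    for i, (img_bytes, label) in enumerate(images_and_labels, start=1):
        store[b'image-%09d' % i] = img_bytes       # raw encoded image bytes
        store[b'label-%09d' % i] = label.encode()  # ground-truth text
    store[b'num-samples'] = str(len(images_and_labels)).encode()
    return store

store = build_store([(b'<png bytes>', 'hello'), (b'<png bytes>', 'world')])
print(int(store[b'num-samples']))  # 2
```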
Hello! Thank you very much for your work and for sharing it. What would be the pros and cons of applying this model to Chinese datasets? If the language model were replaced with a Chinese BERT, would that significantly affect the results? Have you run any experiments along these lines?
Hi, I noticed that the requirements.txt file is no longer valid: some of the pinned versions no longer exist on pip, in particular torch==1.1.0, torchvision==0.3.0, and fastai==1.0.60. Is there a reason these are so old? The current torch version is 1.9.0, torchvision is 0.10.0, and fastai is 2.4. I have some of the code working with these newer versions; I'm working through the remaining parts (currently stuck on a syntax error).
Hello @FangShancheng,
I have a Vietnamese dataset and want to fine-tune your pre-trained model on it. How can I do that?
Thanks for your great work! I have a question: where can I get the WikiText-103.csv and WikiText-103_eval_d1.csv files? Can you provide a link? Thank you!
I see that SRN is compared against with better performance. Will we see an SRN reimplementation here? Thanks.
Hello! What is the point of setting an end token for the position attention? Since position attention is not autoregressive, it does not seem to need an end token to terminate the decoding process.
When training the language model on a Chinese corpus, I get these metrics: epoch 15 iter 28000: eval loss = 2.5787, ccr = 0.7434, cwr = 0.0914, ted = 0.0000, ned = 0.0000, ted/w = 0.0000. At inference time, however, the results are poor. What might be the cause?
input: 我是申华人民共和国公民
output: ['这校中请人民共和谐公司的的']
As the title says: how should I understand Figure 6 in your paper? The axes of the left and right plots are identical, but I don't see the difference between the two. Looking forward to your reply, thanks!
File "main.py", line 10, in
from fastai.callback.general_sched import GeneralScheduler, TrainingPhase
ModuleNotFoundError: No module named 'fastai.callback.general_sched'
[2021-09-06 17:11:08,426 main.py:229 INFO train-abinet] Construct learner.
It always hangs here. I debugged it and found that it is stuck at:
learner = Learner(data, model, silent=True, model_dir='.', true_wd=config.optimizer_true_wd, wd=config.optimizer_wd, bn_wd=config.optimizer_bn_wd, path=config.global_workdir, metrics=metrics, opt_func=partial(opt_type, **config.optimizer_args or dict()), loss_func=MultiLosses(one_hot=config.dataset_one_hot_y))
Is there a way to fix this?
Having read the paper, my understanding of the Position Attention is as follows:
# feature_map: (HW, channels) feature map
# w: (len_seq, channels) position encodings
q = w  # (len_seq, channels); the position encoding is just a trainable parameter, essentially an fc layer
k = unet(feature_map)  # (HW, channels)
v = feature_map  # (HW, channels)
attention_map = softmax(matmul(q, k.transpose()))  # (len_seq, HW)
output = matmul(attention_map, v)  # (len_seq, channels)
Whereas the Parallel Attention in "2D Attentional Irregular Scene Text Recognizer" works like this:
# see Eq. 7 of https://arxiv.org/pdf/1906.05708.pdf
# w1: (channels, channels)
# w2: (len_seq, channels)
q = w2  # (len_seq, channels)
k = tanh(matmul(w1, feature_map.transpose()))  # (channels, HW)
v = feature_map  # (HW, channels)
attention_map = softmax(matmul(q, k))  # (len_seq, HW)
output = matmul(attention_map, v)  # (len_seq, channels)
So is it correct to say that the only difference between Parallel Attention and Position Attention is the transform applied to k?
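The comparison above can be checked shape-by-shape with a small runnable sketch (numpy, random weights; the mini-U-Net and the tanh(W1 f^T) transform are replaced by a placeholder key transform, since only the key path differs between the two variants):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(fmap, q, key_transform):
    k = key_transform(fmap)           # (HW, C): the only step that differs
    attn = softmax(q @ k.T, axis=-1)  # (len_seq, HW), rows sum to 1
    return attn @ fmap                # (len_seq, C)

HW, C, L = 8 * 32, 512, 26
feature_map = np.random.randn(HW, C)
w = np.random.randn(L, C)             # learned positional queries

# placeholder key transform standing in for unet(...) / tanh(W1 f^T)
out = position_attention(feature_map, w, np.tanh)
print(out.shape)  # (26, 512)
```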
The paper mentions that BGF allows the VM and LM to be trained independently. I don't quite understand: without BGF, what would the impact on the LM be? Also, the LM itself could use a pre-trained LM.
Looking forward to your reply, thanks.
( ̄▽ ̄)*
Running python demo.py --config=configs/train_abinet.yaml --input=figs/test
raises NameError: name 'ifnone' is not defined.
How can I fix this?
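Not the author, but demo.py relies on fastai's ifnone helper, which some fastai versions no longer export at the path the script imports from. Defining a local helper with the same semantics may unblock the script; this is a workaround sketch, not an official fix:

```python
# Hedged workaround: fastai's ifnone semantics, defined locally.
def ifnone(a, b):
    """Return `a` unless it is None, in which case return `b`."""
    return b if a is None else a

print(ifnone(None, 3), ifnone(5, 3))  # 3 5
```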
Thanks for releasing your source code.
Unfortunately, I got a poor test result (about 4% below the official results in the paper).
Are there any important training strategies needed to reproduce the results?
Or has anyone else run into the same issue?
How should I understand Figure 6 in your paper? The axes of the left and right plots are identical, but I don't see the difference between the two. Looking forward to your reply, thanks!
I want to adapt the code to train a Chinese model, but I don't know how to prepare a Chinese dataset for the language model.
Thanks for your amazing repo!
I have a text-line dataset and would like to adapt your work to it.
Can the system handle variable-size images (rather than a fixed 32x128) and variable text lengths (according to the max text length in a mini-batch, rather than the fixed dataset_max_length)?
Could you suggest a solution?
Thanks!
Hello, I see that the epoch setting in pretrain_language_model.yaml is 80, and I'm using 4 Titan Xp GPUs to pretrain the language model. However, one epoch takes 8 hours, so 80 epochs would take 640 hours (nearly 27 days). Do I have to train for the full 80 epochs? How can I judge when the language model has converged?
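Not the author, but one common way to decide when to stop, rather than committing to all 80 epochs, is to watch the eval loss for a plateau. A minimal sketch (illustrative only; ABINet itself has no such option built in):

```python
# Hedged sketch: declare convergence when the eval loss has not improved
# by at least min_delta for `patience` consecutive epochs.
def has_converged(eval_losses, patience=3, min_delta=0.01):
    """eval_losses: one eval loss per epoch, oldest first."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])   # best loss seen earlier
    recent_best = min(eval_losses[-patience:])   # best in the last `patience` epochs
    return best_before - recent_best < min_delta

print(has_converged([3.1, 2.9, 2.88, 2.881, 2.879, 2.880]))
```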