
ABINet's Issues

Some questions about the paper

1. When pre-training the language model, did you feed images for the cloze task, or directly the one-hot encoding of the text?
2. Have you tried merging the VM and LM, and how does that perform?
3. In the final comparison with the SOTA, did you use pseudo-labels as additional training data?
Thanks!

Is there code for testing a single image?

Many thanks to the authors for open-sourcing such a great project! Is there any code for testing a single image? Also, how can main.py be run on CPU for evaluation? Thanks again for this awesome open-source project!
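For what it's worth, here is a minimal sketch of single-image CPU inference, under the assumption that you can build the model from the yaml config; build_model_from_config and the checkpoint filename below are placeholders, not the repo's actual API. The key points are the 32x128 resize used by the dataset config and map_location='cpu' when loading the checkpoint.

import torch
from PIL import Image
from torchvision import transforms

device = torch.device('cpu')  # run everything on CPU
preprocess = transforms.Compose([
    transforms.Resize((32, 128)),               # dataset_image_height x dataset_image_width
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3), # normalization scheme is an assumption
])

model = build_model_from_config('configs/train_abinet.yaml')  # hypothetical helper
state = torch.load('workdir/train-abinet/train-abinet.pth', map_location=device)  # placeholder path
model.load_state_dict(state['model'] if isinstance(state, dict) and 'model' in state else state)
model.to(device).eval()

image = preprocess(Image.open('figs/test/example.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    logits = model(image)  # decode with the repo's charset mapper to get the text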

Cannot reproduce the pretrained vision model

Hello, I ran the code directly using the setting pretrain_vision_model.yaml; here are the results of the trained model:

Benchmark | Accuracy |
IC13 | 92.6 |
SVT | 87.2 |
IIIT | 88.1 |
IC15 | 78.7 |
SVTP | 81.4 |
CUTE80 | 79.5 |
Average | 85.0 |

It seems that the released pretrained vision model has an average accuracy of about 90%, so could you please tell me whether you used pretrain_vision_model.yaml to pretrain the vision model, and whether you used any additional tricks or data?

Training terminates automatically at epoch 10

Hello! Thank you for your work and for sharing it! When training the model following readme.md, I find that training always terminates automatically at the 10th epoch. Even after changing training-epochs in the yaml, I still cannot train for more epochs. Which setting did I get wrong? This is my first time using the FastAI framework and I'm not very familiar with it, so any pointers would be appreciated. Looking forward to your reply.
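A hedged guess, based on the train-abinet config dump later on this page: optimizer_scheduler_periods defaults to [6, 4], which sums to exactly 10, and the fastai learner's schedule length may be driven by these periods rather than by training-epochs alone. If so, the periods have to be edited together with the epoch count; the yaml layout below is an assumption reconstructed from the flattened key names in the log.

training:
  epochs: 20
optimizer:
  scheduler:
    periods: [12, 8]  # should sum to the desired epoch count (the default [6, 4] sums to 10)
    gamma: 0.1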

Questions for reimplementing ABINet

Thanks for your great work. I'm having some trouble re-implementing the vision part of your work, and I'd like to ask for more experimental details if possible.

  1. I noticed that SRN uses ResNet50 as its backbone, while ABINet chooses a much more lightweight backbone with only 5 residual blocks (it looks like ResNet18 or even lighter) for feature extraction (according to the footnote in your arXiv paper), yet still achieves comparable results. Can you provide the detailed structure of your ResNet backbone as well as the mini-U-Net structure? Also, can you provide the different configurations of your SV (small vision model), MV (medium vision model) and LV (large vision model)?

  2. Are the positional encoding and the order embedding (used as Q in attention) hard-coded or learned? Do different encoding methods affect the performance much?

  3. Can you provide the detailed parameters of the augmentation methods? How much does performance differ with and without data augmentation?

  4. Approximately how long does it take for your model to reach convergence on 4×1080Ti GPUs?

Thanks again for your work; looking forward to your reply.

Question about reproducing the results

Thanks for sharing the source code. I trained on two 3090 GPUs with the same datasets but could not reproduce the results in the paper. I trained ABINet directly from the provided vision and language models, but it does not seem to converge: the cwr fluctuates widely, hovering between 77 and 85.

About self-training

In the self-training process, do you mean to filter out the samples with C < Q, or to keep them?
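For context, a hedged sketch of the filtering rule as I read the paper: a pseudo-labelled sample is kept only when its minimum character confidence C reaches the threshold Q, i.e. samples with C < Q are filtered out. The function below is illustrative, not the repo's code.

import torch

def keep_pseudo_label(char_probs: torch.Tensor, q: float = 0.9) -> bool:
    """char_probs: (seq_len, num_classes) softmax output for one image."""
    c = char_probs.max(dim=-1).values.min().item()  # minimum per-character confidence C
    return c >= q  # keep if C >= Q; samples with C < Q are discarded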

How should Figure 6 in the paper be understood?

How should Figure 6 in your paper be understood? The two sub-figures have identical x- and y-axes, but I don't understand the difference between them. Looking forward to your reply, thanks!

NaN loss

Hi, thanks for your great work @FangShancheng
I am trying to train ABINet on Vietnamese characters. First, I trained the language model with my own Vietnamese tokens but got a NaN loss. Can you help me fix this?

Question about upper/lowercase letters in the synthetic datasets

The code seems to use the dict_36 charset (lowercase letters only), but the MJ and ST datasets contain labels with both uppercase and lowercase letters. Where exactly is the case conversion handled? Could you point me to the location? Thanks!
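A hedged sketch of where the conversion usually happens in such pipelines: with a 36-character charset (digits plus lowercase), labels are lowercased before being mapped to charset indices whenever case sensitivity is off. The helper below is illustrative, not the repo's code; in this repo, check how dataset.py and utils.py apply dataset_case_sensitive.

def normalize_label(label: str, case_sensitive: bool = False) -> str:
    # dict_36 contains only digits and lowercase letters, so uppercase
    # characters in MJ/ST labels must be folded before charset lookup.
    return label if case_sensitive else label.lower()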

AssertionError

Hello, I read your paper and found it very inspiring. While running your code on a different dataset, I hit this error:
if not self.is_training: assert idx == idx_new, f'idx {idx} != idx_new {idx_new} during testing.'
AssertionError: idx 3 != idx_new 762 during testing.

Do I need to change anything else? Looking forward to your reply.

Accuracy of the reproduced model

First of all, thank you for sharing your work; the paper has been very inspiring.
I recently reproduced your program using the default parameters from the open-source yaml files, changing nothing but the dataset paths and the batch size. However, the accuracy of the resulting models falls far short of the released ABINet models.

Our reproduction results versus the released models:

Benchmark | Ours | Released |
IIIT5k | 89.8 | 96.4 |
SVT | 92.1 | 93.2 |
IC15 | 82.2 | 85.9 |
SVTP | 87.1 | 89.0 |
CUTE | 84.7 | 89.2 |

For some datasets the gap is very large, e.g. IIIT5k. What could be the cause? Could it be the number of training epochs? I trained the vision model for the default 8 epochs. The language model's default is 80 epochs; as that is very large, I stopped after 5 epochs, by which point the loss looked roughly stable. When training ABINet I ran 3 epochs, and the network also looked roughly stable by then. Fully running the 80 and 10 epochs would be a huge undertaking: it would take about 30 days on my two 2080Ti GPUs, and presumably around half a month even on the four 1080Ti GPUs mentioned in the paper.

What parameters did you use for training? What might I have overlooked that causes such a large accuracy gap?
Thanks again for your work and contribution. Looking forward to your reply.

RecursionError: maximum recursion depth exceeded

This message is logged to the terminal:

[2021-10-14 04:53:57,225 main.py:215 INFO train-abinet] ModelConfig(
(0): dataset_case_sensitive = False
(1): dataset_charset_path = data/charset_36.txt
(2): dataset_data_aug = True
(3): dataset_eval_case_sensitive = False
(4): dataset_image_height = 32
(5): dataset_image_width = 128
(6): dataset_max_length = 25
(7): dataset_multiscales = False
(8): dataset_num_workers = 14
(9): dataset_one_hot_y = True
(10): dataset_pin_memory = True
(11): dataset_smooth_factor = 0.1
(12): dataset_smooth_label = False
(13): dataset_test_batch_size = 384
(14): dataset_test_roots = ['data/training/MJ/MJ_train/', 'data/training/MJ/MJ_test/', 'data/training/MJ/MJ_valid/', 'data/training/ST']
(15): dataset_train_batch_size = 384
(16): dataset_train_roots = ['data/training/MJ/MJ_train/', 'data/training/MJ/MJ_test/', 'data/training/MJ/MJ_valid/', 'data/training/ST']
(17): dataset_use_sm = False
(18): global_name = train-abinet
(19): global_phase = train
(20): global_seed = None
(21): global_stage = train-super
(22): global_workdir = workdir/train-abinet
(23): model_alignment_loss_weight = 1.0
(24): model_checkpoint = None
(25): model_ensemble =
(26): model_iter_size = 3
(27): model_language_checkpoint = workdir/pretrain-language-model/pretrain-language-model.pth
(28): model_language_detach = True
(29): model_language_loss_weight = 1.0
(30): model_language_num_layers = 4
(31): model_language_use_self_attn = False
(32): model_name = modules.model_abinet_iter.ABINetIterModel
(33): model_strict = True
(34): model_use_vision = False
(35): model_vision_attention = position
(36): model_vision_backbone = transformer
(37): model_vision_backbone_ln = 3
(38): model_vision_checkpoint = workdir/pretrain-vision-model/best-pretrain-vision-model.pth
(39): model_vision_loss_weight = 1.0
(40): optimizer_args_betas = (0.9, 0.999)
(41): optimizer_bn_wd = False
(42): optimizer_clip_grad = 20
(43): optimizer_lr = 0.0001
(44): optimizer_scheduler_gamma = 0.1
(45): optimizer_scheduler_periods = [6, 4]
(46): optimizer_true_wd = False
(47): optimizer_type = Adam
(48): optimizer_wd = 0.0
(49): training_epochs = 10
(50): training_eval_iters = 3000
(51): training_save_iters = 3000
(52): training_show_iters = 50
(53): training_start_iters = 0
(54): training_stats_iters = 100000
)
[2021-10-14 04:53:57,226 main.py:222 INFO train-abinet] Construct dataset.
[2021-10-14 04:53:57,228 main.py:92 INFO train-abinet] 67199 training items found.
[2021-10-14 04:53:57,228 main.py:94 INFO train-abinet] 67199 valid items found.
[2021-10-14 04:53:57,228 main.py:226 INFO train-abinet] Construct model.
[2021-10-14 04:53:57,488 model_vision.py:37 INFO train-abinet] Read vision model from workdir/pretrain-vision-model/best-pretrain-vision-model.pth.
[2021-10-14 04:53:59,805 model_language.py:38 INFO train-abinet] Read language model from workdir/pretrain-language-model/pretrain-language-model.pth.
[2021-10-14 04:53:59,843 main.py:104 INFO train-abinet] ABINetIterModel(
(vision): BaseVision(
(backbone): ResTranformer(
(resnet): ResNet(
(conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(32, 32, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer5): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0.1)
)
(transformer): TransformerEncoder(
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm1): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1)
(dropout2): Dropout(p=0.1)
)
)
)
)
(attention): PositionAttention(
(k_encoder): Sequential(
(0): Sequential(
(0): Conv2d(512, 64, kernel_size=(3, 3), stride=(1, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(1): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(2): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
(3): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
)
)
(k_decoder): Sequential(
(0): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(1): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(2): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
(3): Sequential(
(0): Upsample(size=(8, 32), mode=nearest)
(1): Conv2d(64, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0)
)
(project): Linear(in_features=512, out_features=512, bias=True)
)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
(language): BCNLanguage(
(proj): Linear(in_features=37, out_features=512, bias=False)
(token_encoder): PositionalEncoding(
(dropout): Dropout(p=0.1)
)
(pos_encoder): PositionalEncoding(
(dropout): Dropout(p=0)
)
(model): TransformerDecoder(
(layers): ModuleList(
(0): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(1): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(2): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
(3): TransformerDecoderLayer(
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(linear1): Linear(in_features=512, out_features=2048, bias=True)
(dropout): Dropout(p=0.1)
(linear2): Linear(in_features=2048, out_features=512, bias=True)
(norm2): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm(torch.Size([512]), eps=1e-05, elementwise_affine=True)
(dropout2): Dropout(p=0.1)
(dropout3): Dropout(p=0.1)
)
)
)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
(alignment): BaseAlignment(
(w_att): Linear(in_features=1024, out_features=512, bias=True)
(cls): Linear(in_features=512, out_features=37, bias=True)
)
)
[2021-10-14 04:53:59,848 main.py:229 INFO train-abinet] Construct learner.
[2021-10-14 04:53:59,962 main.py:233 INFO train-abinet] Start training.
Traceback (most recent call last):
File "/ABINet/dataset.py", line 103, in get
return self._next_image(idx)
File "/ABINet/dataset.py", line 61, in _next_image
next_index = random.randint(0, len(self) - 1)
RecursionError: maximum recursion depth exceeded
[2021-10-14 04:53:59,994 dataset.py:119 INFO train-abinet] Corrupted image is found: MJ_train, 34607, , 0
Fatal Python error: Cannot recover from stack overflow.

Thread 0x00007f7923687700 (most recent call first):
File "/home/user/miniconda/envs/py36/lib/python3.6/selectors.py", line 376 in select
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 911 in wait
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 414 in _poll
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/connection.py", line 257 in poll
File "/home/user/miniconda/envs/py36/lib/python3.6/multiprocessing/queues.py", line 104 in get
File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 202 in run
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007f797e1ef700 (most recent call first):
File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 235 in _queue_processor
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 864 in run
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/user/miniconda/envs/py36/lib/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007f7a07cd3700 (most recent call first):
File "/ABINet/dataset.py", line 61 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 120 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
File "/ABINet/dataset.py", line 62 in _next_image
File "/ABINet/dataset.py", line 103 in get
...
Aborted (core dumped)

Do you know what I should do with my dataset?
Thanks
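A hedged sketch of one way to break the get() <-> _next_image() mutual recursion visible in the stack trace: retry corrupted samples in a bounded loop instead of recursing. Names mirror the traceback; _load_sample is a hypothetical single-read helper, not the repo's actual method.

import random

def get_with_retry(self, idx, max_retries=100):
    for _ in range(max_retries):
        sample = self._load_sample(idx)          # hypothetical: returns None for a corrupted image
        if sample is not None:
            return sample
        idx = random.randint(0, len(self) - 1)   # same random re-draw as _next_image
    raise RuntimeError('Too many corrupted images in a row; check the LMDB data.')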

Where is demo.py?

First, thanks to the authors for their contribution to this project! Can anyone tell me where "demo.py" for this project is? I can only find the test mode for the test datasets.

How to prepare a custom dataset for training?

I tried to train your model with a custom dataset and got this:

"
[2021-10-11 13:09:35,820 main.py:222 INFO train-abinet] Construct dataset.
Traceback (most recent call last):
File "main.py", line 246, in
main()
File "main.py", line 224, in main
else: data = _get_databaunch(config)
File "main.py", line 80, in _get_databaunch
train_ds = _get_dataset(ImageDataset, config.dataset_train_roots, True, config)
File "main.py", line 47, in _get_dataset
datasets = [ds_type(p, **kwargs) for p in paths]
File "main.py", line 47, in
datasets = [ds_type(p, **kwargs) for p in paths]
File "/content/drive/MyDrive/OCR/ABINet/dataset.py", line 49, in init
self.length = int(txn.get('num-samples'.encode()))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
"

And here's how my data is stored:
ABINet
|--data
| |--img
| | |--(here is images)
| |--lmdb
| | |--data.mdb
| | |--lock.mdb

Thanks
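A hedged sketch of building an LMDB in the layout dataset.py expects: it reads the key 'num-samples' (the line that fails above), alongside image-%09d / label-%09d pairs, which is the convention of deep-text-recognition-benchmark-style datasets. Paths and the map size below are placeholders.

import os
import lmdb

def create_lmdb(image_label_pairs, output_dir, map_size=1 << 32):
    """image_label_pairs: iterable of (image_path, label) tuples."""
    os.makedirs(output_dir, exist_ok=True)
    env = lmdb.open(output_dir, map_size=map_size)
    with env.begin(write=True) as txn:
        count = 0
        for image_path, label in image_label_pairs:
            count += 1
            with open(image_path, 'rb') as f:
                txn.put(f'image-{count:09d}'.encode(), f.read())  # raw encoded image bytes
            txn.put(f'label-{count:09d}'.encode(), label.encode())
        # dataset.py line 49 does int(txn.get('num-samples'.encode())); without
        # this key, get() returns None, producing exactly the TypeError above.
        txn.put('num-samples'.encode(), str(count).encode())
    env.close()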

Performance of the model on Chinese datasets

Hello! Many thanks for your work and for sharing it. What would be the pros and cons of applying this model to Chinese datasets? If the language model were replaced with a Chinese BERT, how much would that affect the results? Have you run any experiments along these lines?

Bad requirements.txt

Hi, I noticed that the requirements.txt file is no longer valid: some of the pinned versions can no longer be installed via pip, specifically torch==1.1.0, torchvision==0.3.0, and fastai==1.0.60. Is there a reason these are so old? The current releases are torch 1.9.0, torchvision 0.10.0 and fastai 2.4. I have some of the code working with these newer versions and am working through the rest (I'm at a syntax error right now).

About dataset

Thanks for your great work! I have a question: where can I get the files WikiText-103.csv and WikiText-103_eval_d1.csv? Could you provide a link? Thank you!

About the end token in position attention

Hello! What is the point of setting an end token for position attention? Since position attention is not auto-regressive, it doesn't seem to need an end token to terminate the decoding process?

Language model question

I trained the language model on a Chinese corpus and got these metrics: epoch 15 iter 28000: eval loss = 2.5787, ccr = 0.7434, cwr = 0.0914, ted = 0.0000, ned = 0.0000, ted/w = 0.0000. However, inference quality is poor. What could be the reason?

input: 我是申华人民共和国公民
output:['这校中请人民共和谐公司的的']

Evaluation hangs while loading the vision model

[2021-09-06 17:11:08,426 main.py:229 INFO train-abinet] Construct learner.
It hangs here indefinitely. Debugging shows it is stuck at:
learner = Learner(data, model, silent=True, model_dir='.', true_wd=config.optimizer_true_wd, wd=config.optimizer_wd, bn_wd=config.optimizer_bn_wd, path=config.global_workdir, metrics=metrics, opt_func=partial(opt_type, **config.optimizer_args or dict()), loss_func=MultiLosses(one_hot=config.dataset_one_hot_y))
Is there any way to fix this?

Question about the difference between Parallel Attention and Position Attention

Having read the paper, my understanding of the Position Attention in the paper is as follows:

# feature_map: (HW, channels) feature map
# w: (len_seq, channels) positional encodings
q = w  # (len_seq, channels); the positional encoding is just a trainable parameter, essentially an fc layer
k = unet(feature_map)  # (HW, channels)
v = feature_map  # (HW, channels)

attention_map = softmax(matmul(q, k.transpose()))  # (len_seq, HW)
output = matmul(attention_map, v)  # (len_seq, channels)

Whereas Parallel Attention (2D Attentional Irregular Scene Text Recognizer) works like this:

# see Eq. 7 of https://arxiv.org/pdf/1906.05708.pdf
# w1: (channels, channels)
# w2: (len_seq, channels)
q = w2  # (len_seq, channels)
k = tanh(matmul(w1, feature_map.transpose()))  # (channels, HW)
v = feature_map  # (HW, channels)

attention_map = softmax(matmul(q, k))  # (len_seq, HW)
output = matmul(attention_map, v)  # (len_seq, channels)

So is it correct to say that the only difference between Parallel Attention and Position Attention is the transformation applied to k?
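To make the comparison concrete, here is a minimal runnable PyTorch version of the position-attention computation sketched above, with the mini-U-Net key encoder replaced by an identity placeholder and the batch dimension omitted; the 1/sqrt(channels) scaling is an assumption, not necessarily the repo's exact formula.

import torch
import torch.nn.functional as F

HW, channels, len_seq = 8 * 32, 512, 26

feature_map = torch.randn(HW, channels)                  # flattened visual features
w = torch.nn.Parameter(torch.randn(len_seq, channels))   # learnable position queries
unet = torch.nn.Identity()                               # stand-in for the mini-U-Net key encoder

q = w                                  # (len_seq, channels)
k = unet(feature_map)                  # (HW, channels)
v = feature_map                        # (HW, channels)

attention_map = F.softmax(q @ k.t() / channels ** 0.5, dim=-1)  # (len_seq, HW)
output = attention_map @ v             # (len_seq, channels)
print(output.shape)                    # torch.Size([26, 512])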

About the role of BGF

The paper says BGF (blocking gradient flow) allows the VM and LM to be trained independently. I don't quite understand what the effect on the LM would be without BGF. Moreover, the LM itself could use a pre-trained LM.
Looking forward to your reply, thanks.

NameError: name 'ifnone' is not defined

Running python demo.py --config=configs/train_abinet.yaml --input=figs/test
raises NameError: name 'ifnone' is not defined.
How can this be fixed?
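For reference, ifnone is a small fastai utility (fastai.core.ifnone in fastai 1.x, fastcore.basics.ifnone in the 2.x stack). If the import is missing in your environment, a drop-in definition is equivalent:

def ifnone(a, b):
    """fastai's ifnone helper: return a if it is not None, otherwise b."""
    return a if a is not None else b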

Failed to reproduce results in the paper

Thanks for releasing your source code.
Unfortunately, I got a poor test result (about 4% below the official results in the paper).

Are there any important training strategies needed to reproduce the results?
Or has anyone else run into the same issue?

Question for adapting to textline recognition and variable-size images?

Thanks for your amazing repo!
I have a textline dataset and I would like to adapt your work to it.
Can the system handle variable-size images (not the fixed 32 × 128) and variable text lengths (using the max text length of each mini-batch rather than a fixed dataset_max_length)?
Could you point me to a solution?
Thanks!

Question about pre-training the language model

Hello, I see that the epoch setting in pretrain_language_model.yaml is 80, and I am using 4 Titan XP GPUs to pre-train the language model. However, one epoch takes 8 hours, so 80 epochs would need 640 hours (nearly 27 days). Do I have to train for the full 80 epochs? How can I judge whether the language model has converged?
