
aster.pytorch's People

Contributors

ayumiymk


aster.pytorch's Issues

Training process problems

Hi @ayumiymk ,
I've tested your model and got the same accuracy as you reported. I've also been trying to train a model with the small clean dataset you provided, but I'm not sure whether the data I use for training and testing should be prepared differently.
When I train, I convert the dataset (images plus gt.txt files) to .mdb files with tools/create_svtp_lmdb.py, as you described.
When I want to test, do I still need to convert my data to .mdb files, or can I just provide my images and .txt files?
Thank you so much! I hope you can clarify this for me.
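A minimal sketch of writing a test LMDB, assuming test data is converted the same way as training data and assuming the key layout ('image-%09d', 'label-%09d', 'num-samples') that tools/create_svtp_lmdb.py appears to use; the helper name is illustrative:

import lmdb

def write_test_lmdb(output_path, samples):
    """samples: list of (image_path, transcription) pairs (sketch)."""
    env = lmdb.open(output_path, map_size=1 << 32)   # ~4 GB address space
    with env.begin(write=True) as txn:
        for i, (img_path, label) in enumerate(samples, start=1):
            with open(img_path, 'rb') as f:
                txn.put(('image-%09d' % i).encode(), f.read())   # raw jpg/png bytes
            txn.put(('label-%09d' % i).encode(), label.encode())
        txn.put(b'num-samples', str(len(samples)).encode())
    env.close()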

How to solve this problem

File "/home/water/Downloads/aster.pytorch-master/lib/models/attention_recognition_head.py", line 75, in beam_search
batch_size, l, d = x.size()
ValueError: too many values to unpack (expected 3)

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().

When I execute bash scripts/stn_att_rec.sh, I get this warning:
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Could you tell me how to solve it? @ayumiymk
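The warning itself is harmless, but it can be silenced by re-compacting the RNN weights right before the RNN runs, for example inside the module's forward(). A minimal sketch; the class and attribute names are illustrative, not taken from the repo:

import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, in_dim=512, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, x):
        # re-compact the weights (commonly scattered by DataParallel replication)
        self.rnn.flatten_parameters()
        out, _ = self.rnn(x)
        return out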

A question about training

I built LMDB datasets from SynthText and Synth90k and trained with your code, but training is very slow; GPU utilization sits at 0 much of the time. Have you run into this problem?
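A minimal sketch of one common remedy, assuming the bottleneck is data loading rather than the model: give the DataLoader more worker processes and pinned memory so the GPU is not left idle waiting for batches. The helper name, batch size, and worker count are illustrative:

from torch.utils.data import DataLoader

def make_train_loader(train_dataset, batch_size=512, workers=8):
    # more workers decode images in parallel; pin_memory speeds up host-to-GPU copies
    return DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                      num_workers=workers, pin_memory=True, drop_last=True)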

Error: out of vocabulary

Hi @ayumiymk ,
I tried training on a dataset of Japanese images, but during training I got this error and I don't know why:
"""
い is out of vocabulary.
れ is out of vocabulary.
。 is out of vocabulary.
海 is out of vocabulary.
( is out of vocabulary.
"""
Can you give me some advice?
Thank you so much!
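The default --voc_type character sets appear to cover only digits, English letters, and ASCII symbols, so Japanese characters fall outside them and are reported as out of vocabulary. A hedged sketch of building a character set from your own training labels instead; the label-file format and the special tokens below are assumptions:

def build_vocabulary(label_file, specials=('EOS', 'PADDING', 'UNKNOWN')):
    """Collect every character that occurs in the training labels (sketch).

    Assumes one sample per line in the form "image_path<TAB>transcription".
    """
    chars = set()
    with open(label_file, encoding='utf-8') as f:
        for line in f:
            chars.update(line.rstrip('\n').split('\t', 1)[-1])
    return list(specials) + sorted(chars)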

CUDA out of memory problem

CUDA out of memory.
I encountered the following problem when training the model. I wonder whether the author loads all the data into memory? Is there any other way to solve this? Thank you very much.

Traceback (most recent call last):
  File "main.py", line 229, in <module>
    main(args)
  File "main.py", line 213, in main
    test_dataset=test_dataset)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 59, in train
    output_dict = self._forward(input_dict)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 189, in _forward
    output_dict = self.model(input_dict)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/model_builder.py", line 87, in forward
    rec_pred = self.decoder([encoder_feats, rec_targets, rec_lengths])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 39, in forward
    output, state = self.decoder(x, state, y_prev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 255, in forward
    alpha = self.attention_unit(x, sPrev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 220, in forward
    sumTanh = torch.tanh(sProj + xProj)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 10.91 GiB total capacity; 10.30 GiB already allocated; 5.38 MiB free; 35.66 MiB cached)

Chinese text recognition in natural scenes

Can a dataset of Chinese scene text be used to train this network? Is the way to build the dataset and the dictionary the same as for English? If you already have relevant experimental results, could you share the recognition accuracy?

Recognition of variable-length Chinese text

Regarding variable-length Chinese recognition:

If I use an attention-based method for recognition, must the model be trained on variable-length data?
How is such variable-length training implemented?
Can't the model be trained on fixed-length data and still recognize variable-length text, the way CTC does?

Looking forward to your reply, thanks a lot.
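For what it's worth, attention-based training usually does not require literally variable-length batches: each label is padded to a fixed max_len and terminated with an EOS symbol, so every batch has a fixed shape while the decoder still learns when to stop. A minimal sketch of that encoding; the index and helper names are illustrative:

import torch

def encode_labels(texts, char2id, eos_id, pad_id, max_len=25):
    """Pad every transcription to max_len and append EOS (sketch)."""
    targets = torch.full((len(texts), max_len), pad_id, dtype=torch.long)
    lengths = []
    for i, text in enumerate(texts):
        ids = [char2id[c] for c in text][:max_len - 1] + [eos_id]
        targets[i, :len(ids)] = torch.tensor(ids)
        lengths.append(len(ids))
    return targets, torch.tensor(lengths)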

demo problem

Hello, it seems there is no code for running the model directly on test images; the test code always requires labels. Does the decoding process need label information to work?

Slow in inference

Inference is slow for me: about 0.2 s per image. Breaking it into two steps, the transformation takes 0.16 s and the recognition 0.04 s.
It looks like the transformation stage is quite slow. Is this a problem?

Dear yang:

You have not provided a demo script. Can ASTER be tested on single images? I found that labels and label_lengths need to be fed into the model even during validation. How can I test on images without labels?
Thank you very much.
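A hedged sketch of single-image inference without ground-truth labels: feed dummy rec_targets/rec_lengths of the maximum length so the forward pass runs, and keep only the prediction. The dict keys, image size, and output layout below are assumptions based on how the training code appears to build its inputs, not a documented API:

import torch
from PIL import Image
import torchvision.transforms as T

def recognize(model, image_path, max_len=100, device='cuda'):
    img = Image.open(image_path).convert('RGB')
    x = T.Compose([T.Resize((64, 256)), T.ToTensor()])(img).unsqueeze(0).to(device)
    input_dict = {
        'images': x,
        'rec_targets': torch.zeros(1, max_len, dtype=torch.long, device=device),  # dummy labels
        'rec_lengths': torch.tensor([max_len], device=device),
    }
    model.eval()
    with torch.no_grad():
        output_dict = model(input_dict)   # in eval mode the decoder branch runs beam search
    return output_dict                    # decode the predicted indices with your vocabulary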

about margin

@ayumiymk Thanks for your great work. I have a question: margin in stn_head.py is initialized to "margin=0.01", but in tps_spatial_transformer.py the margin is 0.05. Is that intentional or a mismatch? Thanks.

demo.pth.tar error

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Dear author:
How can I solve this problem? Thanks for your answer.
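For what it's worth, checkpoints named *.pth.tar are usually ordinary PyTorch checkpoints written with torch.save(); the .tar suffix is only a naming convention, so the file is loaded rather than extracted. A minimal sketch, assuming the download itself is not corrupted:

import torch

checkpoint = torch.load('demo.pth.tar', map_location='cpu')
print(checkpoint.keys())   # typically includes 'state_dict' plus training metadata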

What dataset is used to train the pretrained model?

Hello author:
I tried to train ASTER on Synth800K and Synth90K with the same settings as in the TPAMI paper. However, I cannot reproduce the accuracy reported in the paper or achieved by your pretrained model.
Can you tell me which datasets you used to train the pretrained model you provided? Thanks very much!

The provided pretrained model cannot be extracted

Hello author, I downloaded the pretrained model "demo.path.tar" you provided, but I can never extract it; it reports that the data format is corrupted. I wonder whether others have the same problem. Could you upload it again?

arising error

An error arises in dataset.py while executing main.sh:
""" in __getitem__
buf.write(imgbuf)
TypeError: a bytes-like object is required, not 'NoneType' """

Changes needed to recognize a self-made Chinese character dataset. [Statement, not a question]

First of all, many thanks to the author for the open-source work; this is also the friendliest issue tracker I have seen.
Most importantly: when you build the list of Chinese character classes, do not put the raw characters in directly; use the encoded characters, otherwise you will have to do the homework below...

  1. The author has already explained how to build the lmdb.
  2. Add the Chinese characters of your dataset to labelmaps, e.g. ['一','甲',......].
  3. At line 98 of datasets.py, change
    word = self.txn.get(label_key) to word = self.txn.get(label_key).decode().
    This is needed because the labels were encoded when the lmdb was built; it makes no difference for English letters and digits, but it is required for Chinese characters.
  4. In metrics.py, change

if True:
to: if False:
This condition applies _normalize_text, which strips out the Chinese characters; without this change the evaluation metrics are wrong because the Chinese text is not counted.

configuration environment?

Hi, thanks for your great work.
Could you tell me the detailed configuration of your environment, such as the PyTorch and CUDA versions?
It would be great if a Docker image could be provided.
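Until the exact versions are documented, a quick way to report (or compare) an environment when debugging compatibility issues:

import sys
import torch
import torchvision

print('python      :', sys.version.split()[0])
print('pytorch     :', torch.__version__)
print('torchvision :', torchvision.__version__)
print('cuda        :', torch.version.cuda)
print('cudnn       :', torch.backends.cudnn.version())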

About the test data and configuration

Hello, could you provide the data you used for testing?
--synthetic_train_data_dir /data/mkyang/scene_text/recognition/CVPR2016/ /data/mkyang/scene_text/recognition/NIPS2014/
--test_data_dir /data/mkyang/scene_text/recognition/benchmark_lmdbs_new/IIIT5K_3000/ \

What are the requirements for these three directories?

ValueError: too many values to unpack (expected 3)

~/aster.pytorch/lib/models/model_builder.py in forward(self, input_dict)
89 return_dict['losses']['loss_rec'] = loss_rec
90 else:
---> 91 rec_pred, rec_pred_scores = self.decoder.beam_search(encoder_feats, global_args.beam_width, self.eos)
92 rec_pred_ = self.decoder([encoder_feats, rec_targets, rec_lengths])
93 loss_rec = self.rec_crit(rec_pred_, rec_targets, rec_lengths)

~/aster.pytorch/lib/models/attention_recognition_head.py in beam_search(self, x, beam_width, eos)
74
75 # https://github.com/IBM/pytorch-seq2seq/blob/fede87655ddce6c94b38886089e05321dc9802af/seq2seq/models/TopKDecoder.py
---> 76 batch_size, l, d = x.size()
77 # inflated_encoder_feats = _inflate(encoder_feats, beam_width, 0) # ABC --> AABBCC -/-> ABCABC
78 inflated_encoder_feats = x.unsqueeze(1).permute((1,0,2,3)).repeat((beam_width,1,1,1)).permute((1,0,2,3)).contiguous().view(-1, l, d)

ValueError: too many values to unpack (expected 3)
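A hedged diagnostic sketch: the unpack expects a 3-D (batch, length, depth) tensor, so printing the shape of the encoder output right before beam_search shows what is actually being passed; if it turns out to be a 4-D feature map, it can be flattened first. Whether that is the cause here is an assumption, and the helper name is illustrative:

def to_bld(feats):
    """Coerce encoder features to (batch, length, depth) before beam search (sketch)."""
    print('encoder feats shape:', tuple(feats.size()))
    if feats.dim() == 4:                    # e.g. an (N, C, H, W) conv feature map
        n, c, h, w = feats.size()
        feats = feats.view(n, c, h * w).permute(0, 2, 1).contiguous()   # -> (N, H*W, C)
    assert feats.dim() == 3, 'unexpected encoder output shape'
    return feats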

CUDA out of memory problem

CUDA out of memory
I encountered the following problem while training the model. I would like to know whether the author loads all the data into memory, and whether there is any other way to solve it. Thank you very much.

About the testing time consumption

According to the original paper, the recognition time for each image is 20ms. I modified demo.py, let it recognize all the images on the ICDAR2003 test set, and computed the average recognition time.

The following is my modification:

[screenshot of the modified demo.py]

and the output is

average time = 168.360ms

which is much larger than 20 ms. I ran the recognition on a single NVIDIA Tesla V100 GPU. I don't think using PyTorch alone would cause this much extra time (I even excluded the time for reading and preprocessing the images). So I wonder what the problem is.

Thanks very much.
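For a fair comparison with the 20 ms figure, GPU timing usually needs a warm-up pass and explicit synchronization, otherwise one-time initialization and CUDA's asynchronous kernel launches inflate the average. A minimal sketch; model and batch are placeholders for the loaded ASTER model and a preprocessed input:

import time
import torch

def avg_forward_ms(model, batch, warmup=5, iters=50):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):            # warm-up: cudnn autotuning, lazy allocations
            model(batch)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(batch)
        torch.cuda.synchronize()           # wait for all queued kernels to finish
    return (time.time() - start) / iters * 1000.0   # milliseconds per forward pass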

@ayumiymk A question about lexicon search

@ayumiymk I'd like to ask about lexicon search. My application is recognizing printed text on invoices, and I have trained a text recognition model whose accuracy is about 84%. The text on invoices is mostly drug names and treatment names, so I want to build a lexicon and use lexicon search to improve the recognition accuracy. Because the text on an invoice often runs together without punctuation (for example, a drug name sits directly next to a code or a specification), I have to segment the words myself, and after segmentation I use lexicon search to find the matching entry in the lexicon. The problem is that while some misrecognized drug names are corrected by lexicon search, some correctly recognized drug names get changed to wrong ones. My current lexicon search calls the extract() function of the fuzzywuzzy library, which computes the edit distance between the query drug name and the words in the lexicon and returns a score. Did you run into similar problems when using lexicon search?
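One mitigation worth trying, sketched below: only accept a lexicon match when its score clears a threshold, so high-confidence recognitions are left untouched. This uses fuzzywuzzy's process.extractOne; the cutoff value is an assumption to tune on your invoices:

from fuzzywuzzy import process

def lexicon_correct(pred, lexicon, score_cutoff=90):
    # extractOne returns None when no lexicon entry scores above the cutoff
    match = process.extractOne(pred, lexicon, score_cutoff=score_cutoff)
    return match[0] if match is not None else pred   # otherwise keep the raw prediction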

Evaluation is slow

Hi @ayumiymk, thanks for your great work. I find that evaluation takes more time than training for the same number of iterations. A training iteration takes 0.6 s, but an evaluation iteration takes at least 30 s. How can I speed it up?

Hello, a few questions

I am fairly new to OCR. After running your code I have a question: the model does not seem to be able to recognize spaces (are they treated as background?). If I want it to automatically recognize and separate two English words, what should I do? Train with a dataset that contains spaces?

Pretrained model

Hello, downloading the aster-pytorch pretrained model is very slow for me. Could you send me a copy? Thank you.

synthetic_train_data_dir

In stn_att_rec.sh:
CUDA_VISIBLE_DEVICES=0,1 python main.py
--synthetic_train_data_dir /data/mkyang/scene_text/recognition/CVPR2016/ /data/mkyang/scene_text/recognition/NIPS2014/
What is in CVPR2016, and what is in NIPS2014?

lmdb.Error: /share/zhui/reg_dataset/NIPS2014: No such file or directory

When I cloned this project and ran it, I got this error, and I don't yet know how to fix it. I hope you can help me with this issue. Thank you so much!
---***----
Traceback (most recent call last):
  File "main.py", line 230, in <module>
    main(args)
  File "main.py", line 127, in main
    args.height, args.width, args.batch_size, args.workers, True, args.keep_ratio)
  File "main.py", line 39, in get_data
    dataset_list.append(LmdbDataset(data_dir_, voc_type, max_len, num_samples))
  File "/home/wanna/Documents/aster.pytorch/lib/datasets/dataset.py", line 53, in __init__
    self.env = lmdb.open(root, max_readers=32, readonly=True)
lmdb.Error: /share/zhui/reg_dataset/NIPS2014: No such file or directory

Does your code include text detection and localization?

Dear author, first of all thank you very much for open-sourcing such good code and models; I am using them for a study related to VQA. However, your code seems to handle only cropped word images, not full images (for example, a natural-scene image that may contain several pieces of text). Do you have code or a model that addresses this?

original data

Dear @ayumiymk,
Thank you for this repo,

How do I create a .pth file with the original data?
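If the question is how a .pth checkpoint is produced after training on your own data: PyTorch checkpoints are written with torch.save(). A minimal sketch; the exact dict layout this repo's trainer uses is an assumption:

import torch

def save_checkpoint(model, optimizer, epoch, path='checkpoint.pth.tar'):
    torch.save({
        'state_dict': model.state_dict(),     # the weights later loaded for evaluation/demo
        'optimizer': optimizer.state_dict(),
        'epoch': epoch,
    }, path)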

ValueError: => No checkpoint found at '/home/ubuntu/ASTER/logs/'

CUDA_VISIBLE_DEVICES=0,1 python main.py \
  --synthetic_train_data_dir /home/ubuntu/ASTER/lmdb_data/32x32_960/train/ \
  --test_data_dir /home/ubuntu/ASTER/lmdb_data/32x32_960/test/ \
  --batch_size 1024 \
  --workers 8 \
  --height 64 \
  --width 256 \
  --voc_type JAPANESE \
  --arch ResNet_ASTER \
  --logs_dir /home/ubuntu/ASTER/logs/baseline_aster \
  --real_logs_dir /home/ubuntu/ASTER/logs/baseline_aster \
  --max_len 100 \
  --STN_ON \
  --tps_inputsize 32 64 \
  --tps_outputsize 32 100 \
  --tps_margins 0.05 0.05 \
  --stn_activation none \
  --num_control_points 20 \
