ayumiymk / aster.pytorch
ASTER in Pytorch
License: MIT License
Hi @ayumiymk ,
I've tested and got the same accuracy as you reported, and I've been trying to train the model with the small clean dataset you provided. But I'm not sure whether the data I use for training and for testing should be prepared differently.
For training, I convert my dataset (images plus gt.txt files) to .mdb files with tools/create_svtp_lmdb.py, as you suggested.
For testing, do I also need to convert my data to .mdb files, or can I simply point to my images and .txt files?
Thank you so much! I hope you can clarify this problem for me.
File "/home/water/Downloads/aster.pytorch-master/lib/models/attention_recognition_head.py", line 75, in beam_search
batch_size, l, d = x.size()
ValueError: too many values to unpack (expected 3)
This happens when I execute bash scripts/stn_att_rec.sh. I also get this warning:
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Could you tell me how to solve it? @ayumiymk
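The warning itself suggests the usual remedy: call flatten_parameters() inside the RNN module's forward so the weights are compacted once per call. A minimal sketch, not the repo's actual code (the Encoder class and sizes are hypothetical):

```python
import torch
import torch.nn as nn

# After loading a checkpoint or wrapping a model in DataParallel, an nn.LSTM's
# weights can become non-contiguous in memory; calling flatten_parameters()
# at the top of forward() compacts them and silences the UserWarning.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(32, 64, batch_first=True)

    def forward(self, x):
        self.rnn.flatten_parameters()  # compact weights into one contiguous chunk
        out, _ = self.rnn(x)
        return out

enc = Encoder()
y = enc(torch.randn(2, 5, 32))
print(tuple(y.shape))  # (2, 5, 64)
```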
I built SynthText and Synth90k in LMDB format and trained with your code, but training is very slow: GPU utilization stays at 0 much of the time. Have you run into this problem?
Hi @ayumiymk ,
I've tried training with a dataset of Japanese images, but during training I got these errors and I don't know why:
"""
い is out of vocabulary.
れ is out of vocabulary.
。 is out of vocabulary.
海 is out of vocabulary.
( is out of vocabulary.
"""
Can you give me some advice?
Thank you so much!
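Messages like these mean the characters are simply missing from the alphabet selected by --voc_type, so training on Japanese requires extending the vocabulary with every character that occurs in the labels. A stdlib sketch of the check (the names here are hypothetical, not the repo's actual vocabulary helpers):

```python
# Hypothetical illustration of an out-of-vocabulary check: any label character
# not present in the alphabet is reported, so the fix is to extend the
# alphabet with all characters that appear in the training labels.
base_voc = list("0123456789abcdefghijklmnopqrstuvwxyz")

def find_oov(label, voc):
    """Return the characters of `label` that the vocabulary cannot encode."""
    vocset = set(voc)
    return [ch for ch in label if ch not in vocset]

print(find_oov("いれ。海(", base_voc))            # every Japanese char is OOV
extended_voc = base_voc + sorted(set("いれ。海("))  # built from the training labels
print(find_oov("いれ。海(", extended_voc))         # [] once the vocabulary covers them
```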
CUDA out of memory.
I encountered the following problem when training the model. I wonder whether all the data is being loaded into memory at once? Is there any other way to solve this problem? Thank you very much.
Traceback (most recent call last):
  File "main.py", line 229, in <module>
    main(args)
  File "main.py", line 213, in main
    test_dataset=test_dataset)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 59, in train
    output_dict = self._forward(input_dict)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 189, in _forward
    output_dict = self.model(input_dict)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/model_builder.py", line 87, in forward
    rec_pred = self.decoder([encoder_feats, rec_targets, rec_lengths])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 39, in forward
    output, state = self.decoder(x, state, y_prev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 255, in forward
    alpha = self.attention_unit(x, sPrev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 220, in forward
    sumTanh = torch.tanh(sProj + xProj)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 10.91 GiB total capacity; 10.30 GiB already allocated; 5.38 MiB free; 35.66 MiB cached)
Hi ayumiymk,
I have just come across your ASTER framework. I'm wondering whether it is possible to adapt the framework to full text lines, and if so, what I would need to do.
Could you help me with this problem?
Thank you very much!
Text-line example: https://drive.google.com/file/d/16pSR_zknToN1AwLaKig04NTQxctKfarm/view?usp=sharing
Can a Chinese natural-scene text recognition dataset be used to train this network? Is the procedure for building the dataset and the dictionary the same as for English? If you already have experimental results on this, could you share the recognition accuracy?
When I execute bash scripts/main_test_all.sh, I get this issue: ImportError: cannot import name 'SubsetRandomSampler'. Could you help me?
Regarding variable-length Chinese text recognition: with an attention-based method, does the model have to be trained on variable-length data? How would such variable-length training be implemented? Can't it be trained on fixed-length data and still recognize variable-length text, the way CTC does?
Looking forward to your reply. Many thanks!
Hello, how can I output the rectified image?
Hello, it seems there is no code for testing directly on images: the test code always requires labels. Does the decoding process need label information as an aid?
aster.pytorch/lib/models/stn_head.py
Line 90 in 36e73da
Why does img_feat need to be multiplied by 0.1? Thanks!
Is there inference code?
I'm seeing slow inference: about 0.2 s per image. Breaking it into two steps, transformation takes 0.16 s and recognition 0.04 s.
It looks like the transformation stage is quite slow. Is this a known problem?
You have not provided a demo script. Can ASTER be tested on single images? I found that labels and label_lengths must be fed into the model even during validation. How can I test on images without labels?
Thank you very much.
@ayumiymk Thanks for your great work. I have a question, margin in stn_head.py is initialized to "margin=0.01", but in tps_spatial_transformer.py, the margin is 0.05. Is that intentional or a mismatch? Thanks.
I want to build the SVTP LMDB, but the download URL from the original paper, http://www.comp.nus.edu.sg/~phanquyt/, is no longer reachable. Could you share the dataset with me? Thanks!
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Dear author,
how can I solve this problem? Thanks for your answer.
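Those tar errors usually mean the download is truncated or is actually an HTML error page rather than a tar file. A quick shell check before extracting (demo.path.tar is the filename mentioned in this thread; note it may also be a gzip archive that needs tar -xzf):

```shell
f=demo.path.tar   # filename from this thread
if [ -f "$f" ] && tar -tf "$f" >/dev/null 2>&1; then
    echo "looks like a valid tar archive"
else
    # Inspect what was actually downloaded: a tiny size, or `file` reporting
    # "HTML document", means the download itself failed and should be retried.
    file "$f" 2>/dev/null || echo "file not found: $f"
fi
```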
I noticed that the STN needs training. My objective is to use the STN as a standalone module, so what should I do?
Hi,
Thanks for your valuable work. As far as I understand, ASTER comprises a rectification part and a recognition part. I wanted to ask if it is possible to train just the recognition part.
Hello author:
I tried to train ASTER on Synth800K and Synth90K with the same settings as the TPAMI paper. However, I cannot reproduce the accuracy reported in the paper or achieved by your pretrained model.
Can you tell me which datasets you used to train the pretrained model you provided? Thanks very much!
Hello, I tried applying ASTER to the ICDAR ReCTS dataset, but found that it performs rather poorly on Chinese data. Do you have any suggestions for Chinese datasets? Thanks.
Hi, I downloaded the pretrained model "demo.path.tar" you provided, but it cannot be extracted; it reports that the data format is corrupted. I don't know whether others hit the same problem. Could you re-upload it?
An error arises in dataset.py while executing main.sh:
""" in __getitem__
buf.write(imgbuf)
TypeError: a bytes-like object is required, not 'NoneType' """
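That TypeError means txn.get() returned None for some key (a missing or mis-numbered sample in the LMDB), and None was then written into a BytesIO. A stdlib sketch of the guard; the dict stands in for the lmdb transaction and the key format is only assumed from create_svtp_lmdb.py-style tools:

```python
import io

def load_image_buffer(txn_get, key):
    """Sketch of guarding LMDB reads: txn_get stands in for lmdb's txn.get.
    Returns a BytesIO, or None when the key is missing so the caller can
    skip the sample instead of crashing."""
    imgbuf = txn_get(key)
    if imgbuf is None:          # missing key -> skip sample instead of crashing
        return None
    buf = io.BytesIO()
    buf.write(imgbuf)
    buf.seek(0)
    return buf

# Usage with a dict standing in for the lmdb transaction (hypothetical key/value).
fake_db = {b'image-000000001': b'\x89PNG...'}
print(load_image_buffer(fake_db.get, b'image-000000001') is not None)  # True
print(load_image_buffer(fake_db.get, b'image-999999999'))              # None
```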
So do you not use an attention decoder, or do you use an attention mechanism different from ASTER's (L2R)?
First of all, many thanks to the author for the open-source work; this is also the friendliest issue tracker I have seen.
First, and very importantly: when building the list of Chinese character classes, do not put the raw characters in directly; it is better to use the encoded characters, otherwise you will have to do the homework below... sigh.
if True:
change to: if False:
A constraint is applied here: _normalize_text strips out Chinese characters, and without this change the evaluation metrics are wrong because the Chinese characters are not counted.
Hi, thanks for your great work.
Could you share the detailed environment configuration, such as the PyTorch and CUDA versions?
It would be great if a Docker version could be provided.
Hello, could you provide the data you used for testing?
--synthetic_train_data_dir /data/mkyang/scene_text/recognition/CVPR2016/ /data/mkyang/scene_text/recognition/NIPS2014/
--test_data_dir /data/mkyang/scene_text/recognition/benchmark_lmdbs_new/IIIT5K_3000/ \
Are there any requirements for these three data directories?
~/aster.pytorch/lib/models/model_builder.py in forward(self, input_dict)
89 return_dict['losses']['loss_rec'] = loss_rec
90 else:
---> 91 rec_pred, rec_pred_scores = self.decoder.beam_search(encoder_feats, global_args.beam_width, self.eos)
92 rec_pred_ = self.decoder([encoder_feats, rec_targets, rec_lengths])
93 loss_rec = self.rec_crit(rec_pred_, rec_targets, rec_lengths)
~/aster.pytorch/lib/models/attention_recognition_head.py in beam_search(self, x, beam_width, eos)
74
75 # https://github.com/IBM/pytorch-seq2seq/blob/fede87655ddce6c94b38886089e05321dc9802af/seq2seq/models/TopKDecoder.py
---> 76 batch_size, l, d = x.size()
77 # inflated_encoder_feats = _inflate(encoder_feats, beam_width, 0) # ABC --> AABBCC -/-> ABCABC
78 inflated_encoder_feats = x.unsqueeze(1).permute((1,0,2,3)).repeat((beam_width,1,1,1)).permute((1,0,2,3)).contiguous().view(-1, l, d)
ValueError: too many values to unpack (expected 3)
Then what do the aster.pytorch results below refer to, i.e. under which conditions were they obtained? Hoping for an answer!
As in the title: could the author re-upload the file?
Dear @ayumiymk,
Thank you for this repo,
At inference time the script prints out the accuracy; I would also like it to print the actual predicted sequences. Are they stored in the pred_list variable in lib/evaluators.py?
Hi, during training the model keeps trying to allocate new memory. Have you had this problem?
According to the original paper, the recognition time per image is 20 ms. I modified demo.py to recognize all images in the ICDAR2003 test set and computed the average recognition time.
The following is my modification:
and the output is
average time = 168.360ms
which is much larger than 20 ms. I ran the recognition on a single NVIDIA Tesla V100 GPU. I don't think using PyTorch alone would cause this much extra time (I even excluded the time for reading and preprocessing images), so I wonder what the problem is.
Thanks very much.
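One common cause of inflated timings like this is measuring without warm-up (the first iterations include CUDA context and cuDNN setup) and without GPU synchronization. A generic stdlib sketch of a fairer measurement; for a real CUDA model, torch.cuda.synchronize() must be called before each timestamp, otherwise only kernel-launch time is measured:

```python
import time

def avg_time_ms(fn, n_warmup=5, n_runs=50):
    """Average wall-clock time of fn in milliseconds, excluding warm-up runs.
    For CUDA models, insert torch.cuda.synchronize() before each
    time.perf_counter() call so queued kernels are counted."""
    for _ in range(n_warmup):   # warm-up: excluded from the measurement
        fn()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - t0) * 1000.0 / n_runs

# Usage with a dummy workload standing in for model inference.
print(avg_time_ms(lambda: sum(range(10000))))
```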
@ayumiymk A question about lexicon search. My application is recognizing printed text on invoices; I have trained a text recognition model whose accuracy is 84%. The text on invoices consists mainly of drug names and treatment names, so I want to build a lexicon and use lexicon search to improve the recognition accuracy. Because the text on invoices often runs together without punctuation (for example, a drug name directly adjacent to an ID or a specification), I have to do word segmentation myself; after segmentation I use lexicon search to find the matching entry in the lexicon. The problem now is that while some misrecognized drug names are corrected by lexicon search, some correctly recognized ones get "corrected" into errors. My current lexicon search calls fuzzywuzzy's extract() function, which computes the edit distance between the query and the lexicon entries and returns a score. Did you encounter similar problems when using lexicon search?
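One way to reduce the "correct words get corrected into errors" failure mode is a confidence cutoff: substitute a lexicon entry only when the match score is high, and otherwise keep the raw prediction. A stdlib sketch using difflib in place of fuzzywuzzy (the lexicon entries here are hypothetical):

```python
import difflib

# Hypothetical drug/treatment lexicon.
lexicon = ["阿莫西林胶囊", "布洛芬缓释胶囊", "针灸治疗"]

def lexicon_correct(pred, lexicon, cutoff=0.6):
    """Replace an OCR prediction with its nearest lexicon entry only when the
    similarity exceeds `cutoff`; below it, keep the raw prediction so that
    correct but out-of-lexicon strings are not 'corrected' into errors."""
    matches = difflib.get_close_matches(pred, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else pred

print(lexicon_correct("阿莫西材胶囊", lexicon))   # one wrong char -> corrected
print(lexicon_correct("somethingelse", lexicon))  # no close match -> kept as-is
```

Tuning the cutoff trades correction recall against the risk of corrupting already-correct predictions.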
Hi @ayumiymk, thanks for your great work. I find that evaluation takes more time than training for the same number of iterations: one iteration takes 0.6 s during training, but at least 30 s during evaluation. How do I speed it up?
Could you provide a pretrained model? Thank you.
I'm fairly new to OCR, and I have a question after running your code. The model seems unable to recognize spaces (are they treated as background?). If I want it to automatically recognize and separate two English words, what should I do? Train on a dataset that contains spaces?
Hi, the vocab supports only English, not Chinese. What should I do? Thank you.
Hello, downloading the aster.pytorch pretrained model is very slow for me. Could you send me a copy? Thanks.
In stn_att_rec.sh:
CUDA_VISIBLE_DEVICES=0,1 python main.py
--synthetic_train_data_dir /data/mkyang/scene_text/recognition/CVPR2016/ /data/mkyang/scene_text/recognition/NIPS2014/
What is in CVPR2016, and what is in NIPS2014?
When I cloned this project and ran it, I got this error, and I don't know how to fix it. I hope you can help me with this issue. Thank you so much!
Traceback (most recent call last):
  File "main.py", line 230, in <module>
    main(args)
  File "main.py", line 127, in main
    args.height, args.width, args.batch_size, args.workers, True, args.keep_ratio)
  File "main.py", line 39, in get_data
    dataset_list.append(LmdbDataset(data_dir_, voc_type, max_len, num_samples))
  File "/home/wanna/Documents/aster.pytorch/lib/datasets/dataset.py", line 53, in __init__
    self.env = lmdb.open(root, max_readers=32, readonly=True)
lmdb.Error: /share/zhui/reg_dataset/NIPS2014: No such file or directory
Dear author, first of all thank you very much for open-sourcing such good code and models. I am using your code and model in research on VQA. However, it seems your code can only handle cropped word images, not full images (for example a natural-scene image that may contain multiple pieces of text). Do you have code or a model that addresses this?
Dear @ayumiymk,
Thank you for this repo,
How do I create a .pth checkpoint file from the original data? Currently I get:
ValueError: => No checkpoint found at '/home/ubuntu/ASTER/logs/'
CUDA_VISIBLE_DEVICES=0,1 python main.py \
--synthetic_train_data_dir /home/ubuntu/ASTER/lmdb_data/32x32_960/train/ \
--test_data_dir /home/ubuntu/ASTER/lmdb_data/32x32_960/test/ \
--batch_size 1024 \
--workers 8 \
--height 64 \
--width 256 \
--voc_type JAPANESE \
--arch ResNet_ASTER \
--logs_dir /home/ubuntu/ASTER/logs/baseline_aster \
--real_logs_dir /home/ubuntu/ASTER/logs/baseline_aster \
--max_len 100 \
--STN_ON \
--tps_inputsize 32 64 \
--tps_outputsize 32 100 \
--tps_margins 0.05 0.05 \
--stn_activation none \
--num_control_points 20