
aster.pytorch's People

Contributors

ayumiymk


aster.pytorch's Issues

Training process problems

Hi @ayumiymk ,
I've tested your model and got the same accuracy as you reported. I've also been trying to train a model with the small clean dataset you provided, but I'm not sure whether the data I use for training and testing should be prepared differently.
When I train, I convert the dataset (images plus gt.txt files) to .mdb files with tools/create_svtp_lmdb.py, as you described.
When I want to test, do I still need to convert my data to .mdb files, or can I just provide my images and .txt files?
Thank you so much! I hope you can clarify this for me.
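A minimal sketch of writing a test LMDB, assuming test data is converted the same way as training data and assuming the key layout ('image-%09d', 'label-%09d', 'num-samples') that tools/create_svtp_lmdb.py appears to use; the helper name is illustrative:

import lmdb

def write_test_lmdb(output_path, samples):
    """samples: list of (image_path, transcription) pairs (sketch)."""
    env = lmdb.open(output_path, map_size=1 << 32)   # ~4 GB address space
    with env.begin(write=True) as txn:
        for i, (img_path, label) in enumerate(samples, start=1):
            with open(img_path, 'rb') as f:
                txn.put(('image-%09d' % i).encode(), f.read())   # raw jpg/png bytes
            txn.put(('label-%09d' % i).encode(), label.encode())
        txn.put(b'num-samples', str(len(samples)).encode())
    env.close()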

How to solve this problem

File "/home/water/Downloads/aster.pytorch-master/lib/models/attention_recognition_head.py", line 75, in beam_search
batch_size, l, d = x.size()
ValueError: too many values to unpack (expected 3)

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().

When I execute bash scripts/stn_att_rec.sh, I get this warning:
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Could you tell me how to solve it? @ayumiymk
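The warning itself is harmless, but it can be silenced by re-compacting the RNN weights right before the RNN runs, for example inside the module's forward(). A minimal sketch; the class and attribute names are illustrative, not taken from the repo:

import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, in_dim=512, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, x):
        # re-compact the weights (commonly scattered by DataParallel replication)
        self.rnn.flatten_parameters()
        out, _ = self.rnn(x)
        return out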

A question about training

I built LMDB datasets from SynthText and Synth90k and trained with your code, but training is very slow; GPU utilization sits at 0 much of the time. Have you run into this problem?
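A minimal sketch of one common remedy, assuming the bottleneck is data loading rather than the model: give the DataLoader more worker processes and pinned memory so the GPU is not left idle waiting for batches. The helper name, batch size, and worker count are illustrative:

from torch.utils.data import DataLoader

def make_train_loader(train_dataset, batch_size=512, workers=8):
    # more workers decode images in parallel; pin_memory speeds up host-to-GPU copies
    return DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                      num_workers=workers, pin_memory=True, drop_last=True)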

Error: out of vocabulary

Hi @ayumiymk ,
I tried training on a dataset of Japanese images, but during training I got this error and I don't know why:
"""
い is out of vocabulary.
れ is out of vocabulary.
。 is out of vocabulary.
海 is out of vocabulary.
( is out of vocabulary.
"""
Can you give me some advice?
Thank you so much!
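The default --voc_type character sets appear to cover only digits, English letters, and ASCII symbols, so Japanese characters fall outside them and are reported as out of vocabulary. A hedged sketch of building a character set from your own training labels instead; the label-file format and the special tokens below are assumptions:

def build_vocabulary(label_file, specials=('EOS', 'PADDING', 'UNKNOWN')):
    """Collect every character that occurs in the training labels (sketch).

    Assumes one sample per line in the form "image_path<TAB>transcription".
    """
    chars = set()
    with open(label_file, encoding='utf-8') as f:
        for line in f:
            chars.update(line.rstrip('\n').split('\t', 1)[-1])
    return list(specials) + sorted(chars)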

CUDA out of memory problem

CUDA out of memory.
I encountered the following problem when training the model. I wonder whether the author loads all the data into memory? Is there any other way to solve this? Thank you very much.

Traceback (most recent call last):
  File "main.py", line 229, in <module>
    main(args)
  File "main.py", line 213, in main
    test_dataset=test_dataset)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 59, in train
    output_dict = self._forward(input_dict)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/trainers.py", line 189, in _forward
    output_dict = self.model(input_dict)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/model_builder.py", line 87, in forward
    rec_pred = self.decoder([encoder_feats, rec_targets, rec_lengths])
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 39, in forward
    output, state = self.decoder(x, state, y_prev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 255, in forward
    alpha = self.attention_unit(x, sPrev)
  File "/home/zjy/zjy_env/anaconda3/envs/recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home2/zhaojinyuan/zjy-projects/aster-pytorch/lib/models/attention_recognition_head.py", line 220, in forward
    sumTanh = torch.tanh(sProj + xProj)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 10.91 GiB total capacity; 10.30 GiB already allocated; 5.38 MiB free; 35.66 MiB cached)

Chinese text recognition in natural scenes

Can a dataset of Chinese scene text be used to train this network? Is the way to build the dataset and the dictionary the same as for English? If you already have relevant experimental results, could you share the recognition accuracy?

Recognition of variable-length Chinese text

Regarding variable-length Chinese recognition:

If I use an attention-based method for recognition, must the model be trained on variable-length data?
How is such variable-length training implemented?
Can't the model be trained on fixed-length data and still recognize variable-length text, the way CTC does?

Looking forward to your reply, thanks a lot.
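For what it's worth, attention-based training usually does not require literally variable-length batches: each label is padded to a fixed max_len and terminated with an EOS symbol, so every batch has a fixed shape while the decoder still learns when to stop. A minimal sketch of that encoding; the index and helper names are illustrative:

import torch

def encode_labels(texts, char2id, eos_id, pad_id, max_len=25):
    """Pad every transcription to max_len and append EOS (sketch)."""
    targets = torch.full((len(texts), max_len), pad_id, dtype=torch.long)
    lengths = []
    for i, text in enumerate(texts):
        ids = [char2id[c] for c in text][:max_len - 1] + [eos_id]
        targets[i, :len(ids)] = torch.tensor(ids)
        lengths.append(len(ids))
    return targets, torch.tensor(lengths)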

demo problem

Hello, it seems there is no code for running the model directly on test images; the test code always requires labels. Does the decoding process need label information to work?

Slow in inference

Inference is slow for me: about 0.2 s per image. Breaking it into two steps, the transformation takes 0.16 s and the recognition 0.04 s.
It looks like the transformation stage is quite slow. Is this a problem?

Dear yang:

You have not provided a demo script. Can ASTER be tested on single images? I found that labels and label_lengths need to be fed into the model even during validation. How can I test on images without labels?
Thank you very much.
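A hedged sketch of single-image inference without ground-truth labels: feed dummy rec_targets/rec_lengths of the maximum length so the forward pass runs, and keep only the prediction. The dict keys, image size, and output layout below are assumptions based on how the training code appears to build its inputs, not a documented API:

import torch
from PIL import Image
import torchvision.transforms as T

def recognize(model, image_path, max_len=100, device='cuda'):
    img = Image.open(image_path).convert('RGB')
    x = T.Compose([T.Resize((64, 256)), T.ToTensor()])(img).unsqueeze(0).to(device)
    input_dict = {
        'images': x,
        'rec_targets': torch.zeros(1, max_len, dtype=torch.long, device=device),  # dummy labels
        'rec_lengths': torch.tensor([max_len], device=device),
    }
    model.eval()
    with torch.no_grad():
        output_dict = model(input_dict)   # in eval mode the decoder branch runs beam search
    return output_dict                    # decode the predicted indices with your vocabulary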

about margin

@ayumiymk Thanks for your great work. I have a question: margin in stn_head.py is initialized to "margin=0.01", but in tps_spatial_transformer.py the margin is 0.05. Is that intentional or a mismatch? Thanks.

demo.pth.tar error

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Dear author:
How can I solve this problem? Thanks for your answer.
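For what it's worth, checkpoints named *.pth.tar are usually ordinary PyTorch checkpoints written with torch.save(); the .tar suffix is only a naming convention, so the file is loaded rather than extracted. A minimal sketch, assuming the download itself is not corrupted:

import torch

checkpoint = torch.load('demo.pth.tar', map_location='cpu')
print(checkpoint.keys())   # typically includes 'state_dict' plus training metadata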

What dataset is used to train the pretrained model?

Hello author:
I tried to train ASTER on Synth800K and Synth90K with the same settings as in the TPAMI paper. However, I cannot reproduce the accuracy reported in the paper or achieved by your pretrained model.
Can you tell me which datasets you used to train the pretrained model you provided? Thanks very much!

The provided pretrained model cannot be extracted

Hello author, I downloaded the pretrained model "demo.path.tar" you provided, but I can never extract it; it reports that the data format is corrupted. I wonder whether others have the same problem. Could you upload it again?

arising error

An error arises in dataset.py while executing main.sh:
""" in __getitem__
buf.write(imgbuf)
TypeError: a bytes-like object is required, not 'NoneType' """

Changes needed to recognize a self-made Chinese character dataset. [Statement, not a question]

First of all, many thanks to the author for the open-source work; this is also the friendliest issue tracker I have seen.
Most importantly: when you build the list of Chinese character classes, do not put the raw characters in directly; use the encoded characters, otherwise you will have to do the homework below...

  1. The author has already explained how to build the lmdb.
  2. Add the Chinese characters of your dataset to labelmaps, e.g. ['一','甲',......].
  3. At line 98 of datasets.py, change
    word = self.txn.get(label_key) to word = self.txn.get(label_key).decode().
    This is needed because the labels were encoded when the lmdb was built; it makes no difference for English letters and digits, but it is required for Chinese characters.
  4. In metrics.py, change

if True:
to: if False:
This condition applies _normalize_text, which strips out the Chinese characters; without this change the evaluation metrics are wrong because the Chinese text is not counted.

configuration environment?

Hi, thanks for your great work.
Could you tell me the detailed configuration of your environment, such as the PyTorch and CUDA versions?
It would be great if a Docker image could be provided.
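Until the exact versions are documented, a quick way to report (or compare) an environment when debugging compatibility issues:

import sys
import torch
import torchvision

print('python      :', sys.version.split()[0])
print('pytorch     :', torch.__version__)
print('torchvision :', torchvision.__version__)
print('cuda        :', torch.version.cuda)
print('cudnn       :', torch.backends.cudnn.version())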

About the test data and configuration

Hello, could you provide the data you used for testing?
--synthetic_train_data_dir /data/mkyang/scene_text/recognition/CVPR2016/ /data/mkyang/scene_text/recognition/NIPS2014/
--test_data_dir /data/mkyang/scene_text/recognition/benchmark_lmdbs_new/IIIT5K_3000/ \

What are the requirements for these three directories?

ValueError: too many values to unpack (expected 3)

~/aster.pytorch/lib/models/model_builder.py in forward(self, input_dict)
89 return_dict['losses']['loss_rec'] = loss_rec
90 else:
---> 91 rec_pred, rec_pred_scores = self.decoder.beam_search(encoder_feats, global_args.beam_width, self.eos)
92 rec_pred_ = self.decoder([encoder_feats, rec_targets, rec_lengths])
93 loss_rec = self.rec_crit(rec_pred_, rec_targets, rec_lengths)

~/aster.pytorch/lib/models/attention_recognition_head.py in beam_search(self, x, beam_width, eos)
74
75 # https://github.com/IBM/pytorch-seq2seq/blob/fede87655ddce6c94b38886089e05321dc9802af/seq2seq/models/TopKDecoder.py
---> 76 batch_size, l, d = x.size()
77 # inflated_encoder_feats = _inflate(encoder_feats, beam_width, 0) # ABC --> AABBCC -/-> ABCABC
78 inflated_encoder_feats = x.unsqueeze(1).permute((1,0,2,3)).repeat((beam_width,1,1,1)).permute((1,0,2,3)).contiguous().view(-1, l, d)

ValueError: too many values to unpack (expected 3)
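A hedged diagnostic sketch: the unpack expects a 3-D (batch, length, depth) tensor, so printing the shape of the encoder output right before beam_search shows what is actually being passed; if it turns out to be a 4-D feature map, it can be flattened first. Whether that is the cause here is an assumption, and the helper name is illustrative:

def to_bld(feats):
    """Coerce encoder features to (batch, length, depth) before beam search (sketch)."""
    print('encoder feats shape:', tuple(feats.size()))
    if feats.dim() == 4:                    # e.g. an (N, C, H, W) conv feature map
        n, c, h, w = feats.size()
        feats = feats.view(n, c, h * w).permute(0, 2, 1).contiguous()   # -> (N, H*W, C)
    assert feats.dim() == 3, 'unexpected encoder output shape'
    return feats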

CUDA out of memory problem

CUDA out of memory
I encountered the following problem while training the model. I would like to know whether the author loads all the data into memory, and whether there is any other way to solve it. Thank you very much.

About the testing time consumption

According to the original paper, the recognition time for each image is 20ms. I modified demo.py, let it recognize all the images on the ICDAR2003 test set, and computed the average recognition time.

The following is my modification:

[screenshot of the modified demo.py]

and the output is

average time = 168.360ms

which is much larger than 20 ms. I ran the recognition on a single NVIDIA Tesla V100 GPU. I don't think using PyTorch alone would cause this much extra time (I even excluded the time for reading and preprocessing the images). So I wonder what the problem is.

Thanks very much.
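For a fair comparison with the 20 ms figure, GPU timing usually needs a warm-up pass and explicit synchronization, otherwise one-time initialization and CUDA's asynchronous kernel launches inflate the average. A minimal sketch; model and batch are placeholders for the loaded ASTER model and a preprocessed input:

import time
import torch

def avg_forward_ms(model, batch, warmup=5, iters=50):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):            # warm-up: cudnn autotuning, lazy allocations
            model(batch)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(batch)
        torch.cuda.synchronize()           # wait for all queued kernels to finish
    return (time.time() - start) / iters * 1000.0   # milliseconds per forward pass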

@ayumiymk A question about lexicon search

@ayumiymk I'd like to ask about lexicon search. My application is recognizing printed text on invoices, and I have trained a text recognition model whose accuracy is about 84%. The text on invoices is mostly drug names and treatment names, so I want to build a lexicon and use lexicon search to improve the recognition accuracy. Because the text on an invoice often runs together without punctuation (for example, a drug name sits directly next to a code or a specification), I have to segment the words myself, and after segmentation I use lexicon search to find the matching entry in the lexicon. The problem is that while some misrecognized drug names are corrected by lexicon search, some correctly recognized drug names get changed to wrong ones. My current lexicon search calls the extract() function of the fuzzywuzzy library, which computes the edit distance between the query drug name and the words in the lexicon and returns a score. Did you run into similar problems when using lexicon search?
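One mitigation worth trying, sketched below: only accept a lexicon match when its score clears a threshold, so high-confidence recognitions are left untouched. This uses fuzzywuzzy's process.extractOne; the cutoff value is an assumption to tune on your invoices:

from fuzzywuzzy import process

def lexicon_correct(pred, lexicon, score_cutoff=90):
    # extractOne returns None when no lexicon entry scores above the cutoff
    match = process.extractOne(pred, lexicon, score_cutoff=score_cutoff)
    return match[0] if match is not None else pred   # otherwise keep the raw prediction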

Evaluation is slow

Hi @ayumiymk, thanks for your great work. I find that evaluation takes more time than training for the same number of iterations. A training iteration takes 0.6 s, but an evaluation iteration takes at least 30 s. How can I speed it up?

Hello, a few questions

I am fairly new to OCR. After running your code I have a question: the model does not seem to be able to recognize spaces (are they treated as background?). If I want it to automatically recognize and separate two English words, what should I do? Train with a dataset that contains spaces?

Pretrained model

Hello, downloading the aster-pytorch pretrained model is very slow for me. Could you send me a copy? Thank you.

synthetic_train_data_dir

In stn_att_rec.sh:
CUDA_VISIBLE_DEVICES=0,1 python main.py
--synthetic_train_data_dir /data/mkyang/scene_text/recognition/CVPR2016/ /data/mkyang/scene_text/recognition/NIPS2014/
What is in CVPR2016, and what is in NIPS2014?

lmdb.Error: /share/zhui/reg_dataset/NIPS2014: No such file or directory

When I cloned this project and ran it, I got this error, and I don't yet know how to fix it. I hope you can help me with this issue. Thank you so much!
---***----
Traceback (most recent call last):
  File "main.py", line 230, in <module>
    main(args)
  File "main.py", line 127, in main
    args.height, args.width, args.batch_size, args.workers, True, args.keep_ratio)
  File "main.py", line 39, in get_data
    dataset_list.append(LmdbDataset(data_dir_, voc_type, max_len, num_samples))
  File "/home/wanna/Documents/aster.pytorch/lib/datasets/dataset.py", line 53, in __init__
    self.env = lmdb.open(root, max_readers=32, readonly=True)
lmdb.Error: /share/zhui/reg_dataset/NIPS2014: No such file or directory

Does your code include text detection and localization?

Dear author, first of all thank you very much for open-sourcing such good code and models; I am using them for a study related to VQA. However, your code seems to handle only cropped word images, not full images (for example, a natural-scene image that may contain several pieces of text). Do you have code or a model that addresses this?

original data

Dear @ayumiymk,
Thank you for this repo,

How do I create a .pth file with the original data?
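If the question is how a .pth checkpoint is produced after training on your own data: PyTorch checkpoints are written with torch.save(). A minimal sketch; the exact dict layout this repo's trainer uses is an assumption:

import torch

def save_checkpoint(model, optimizer, epoch, path='checkpoint.pth.tar'):
    torch.save({
        'state_dict': model.state_dict(),     # the weights later loaded for evaluation/demo
        'optimizer': optimizer.state_dict(),
        'epoch': epoch,
    }, path)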

ValueError: => No checkpoint found at '/home/ubuntu/ASTER/logs/'

CUDA_VISIBLE_DEVICES=0,1 python main.py \
  --synthetic_train_data_dir /home/ubuntu/ASTER/lmdb_data/32x32_960/train/ \
  --test_data_dir /home/ubuntu/ASTER/lmdb_data/32x32_960/test/ \
  --batch_size 1024 \
  --workers 8 \
  --height 64 \
  --width 256 \
  --voc_type JAPANESE \
  --arch ResNet_ASTER \
  --logs_dir /home/ubuntu/ASTER/logs/baseline_aster \
  --real_logs_dir /home/ubuntu/ASTER/logs/baseline_aster \
  --max_len 100 \
  --STN_ON \
  --tps_inputsize 32 64 \
  --tps_outputsize 32 100 \
  --tps_margins 0.05 0.05 \
  --stn_activation none \
  --num_control_points 20 \
