tongpi / basicocr

344.0 40.0 127.0 97.71 MB

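BasicOCR is a project dedicated to research on text recognition algorithms for natural scenes. It was initiated and is maintained by the Tongpai (佟派) AI team at the Great Wall Digital Big Data Application Technology Research Institute (长城数字大数据应用技术研究院).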

Home Page: https://tongpi.github.io/basicOCR/

License: GNU General Public License v3.0

Python 87.18% Java 8.62% Lua 4.19%

ocr synthtext lstm cnn gan rnn

basicocr's Issues

Modify the demo code so that it works on both CPU and GPU

# coding: utf-8
import torch
from torch.autograd import Variable
import utils
import dataset
import os
from PIL import Image

import models.crnn as crnn

# os.environ["CUDA_VISIBLE_DEVICES"] = "1"
model_path = './data/netCRNN_ch_nc_21_nh_128.pth'
img_path = './data/image33.jpg'
alphabet = u'ACIMRey万下依口哺摄次状璐癌草血运重'
# print(alphabet)
nclass = len(alphabet) + 1  # one extra class for the CTC blank

# Check whether a GPU is available
if torch.cuda.is_available():
    model = crnn.CRNN(32, 1, nclass, 128).cuda()
    pre_model = torch.load(model_path)
else:
    model = crnn.CRNN(32, 1, nclass, 128)
    # Remap tensors saved on a GPU onto the CPU
    pre_model = torch.load(model_path, map_location=lambda storage, loc: storage)

print('loading pretrained model from %s' % model_path)
for k, v in pre_model.items():
    print(k, len(v))
model.load_state_dict(pre_model)

converter = utils.strLabelConverter(alphabet)

transformer = dataset.resizeNormalize((100, 32))
image = Image.open(img_path).convert('L')

# Move the input to the GPU only if one is available
if torch.cuda.is_available():
    image = transformer(image).cuda()
else:
    image = transformer(image)

# Add a batch dimension
image = image.view(1, *image.size())
image = Variable(image)

model.eval()
preds = model(image)

# Take the most likely class at each time step
_, preds = preds.max(2)
preds = preds.squeeze(2)
preds = preds.transpose(1, 0).contiguous().view(-1)

preds_size = Variable(torch.IntTensor([preds.size(0)]))
raw_pred = converter.decode(preds.data, preds_size.data, raw=True)
sim_pred = converter.decode(preds.data, preds_size.data, raw=False)
print('%-20s => %-20s' % (raw_pred.encode('utf8'), sim_pred.encode('utf8')))

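What is the extraction password for the Chinese corpus? (CRNN model)

@YoungMiao, I saw in another issue that you provided a Baidu Cloud download link for the Chinese corpus: http://pan.baidu.com/s/1jHYJeh4, password: fdtk.

After downloading it, I found that the archive needs an extraction password. What is the extraction password? Thanks!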

Could you provide the voc_label_text_xml_for_ssd.py mentioned in Textbox.md?

Thanks~

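Loss is wrong with multi-GPU training; single-GPU training works fine

A question about CRNN: when I train on multiple GPUs, the loss seems off. A multi-GPU model that reaches the same loss value as a single-GPU model predicts very poorly, while the single-GPU model predicts correctly. @wulivicte, have you run into this problem, and how did you solve it? Thank you very much.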

How can I get the probability of the sequence output by CRNN?

Hello,

I'm wondering whether CRNN can also output the probability of each predicted sequence.

For example:

--h-e--l-l-oo- => 'hello' with a probability of 0.89

How can I get that?

I can't find these probabilities in the CTCLoss code, and I don't see where the output probabilities could be printed in CTCLoss(). In __init__.py the CTC class is defined as follows:

class _CTC(Function):
    def forward(self, acts, labels, act_lens, label_lens):
        is_cuda = True if acts.is_cuda else False
        acts = acts.contiguous()
        loss_func = warp_ctc.gpu_ctc if is_cuda else warp_ctc.cpu_ctc
        grads = torch.zeros(acts.size()).type_as(acts)
        minibatch_size = acts.size(1)
        costs = torch.zeros(minibatch_size)
        loss_func(acts,
                  grads,
                  labels,
                  label_lens,
                  act_lens,
                  minibatch_size,
                  costs)
        self.grads = grads
        self.costs = torch.FloatTensor([costs.sum()])
        return self.costs

    def backward(self, grad_output):
        return self.grads, None, None, None


class CTCLoss(Module):
    def __init__(self):
        super(CTCLoss, self).__init__()

    def forward(self, acts, labels, act_lens, label_lens):
        """
        acts: Tensor of (seqLength x batch x outputDim) containing output from network
        labels: 1 dimensional Tensor containing all the targets of the batch in one sequence
        act_lens: Tensor of size (batch) containing size of each output sequence from the network
        label_lens: Tensor of (batch) containing label length of each example
        """
        _assert_no_grad(labels)
        _assert_no_grad(act_lens)
        _assert_no_grad(label_lens)
        return _CTC()(acts, labels, act_lens, label_lens)
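One possible way to get a confidence value, sketched here under stated assumptions: the loss class above only returns costs, so the score would have to come from the network output itself. The sketch assumes the output is a list of T rows of unnormalized per-class scores, with class 0 as the CTC blank and class i mapping to alphabet[i - 1]. The helper names softmax and greedy_decode_with_prob are hypothetical. It computes the probability of the single best alignment path, which is only a lower bound on the full CTC sequence probability (that would sum over all alignments).

```python
import math


def softmax(row):
    """Turn one row of raw scores into probabilities (numerically stable)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode_with_prob(logits, alphabet, blank=0):
    """Greedy CTC decoding plus the probability of the chosen path.

    logits: T rows of per-class scores (assumed layout, see lead-in).
    Returns (decoded_text, best_path_probability).
    """
    path_prob = 1.0
    chars = []
    prev = blank
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        path_prob *= probs[best]
        # Collapse repeats and drop blanks, as CTC decoding does.
        if best != blank and best != prev:
            chars.append(alphabet[best - 1])
        prev = best
    return "".join(chars), path_prob


# Toy scores for the path h, h, blank, e, l, blank, l, o over alphabet "helo".
alphabet = "helo"
path = [1, 1, 0, 2, 3, 0, 3, 4]
logits = [[5.0 if c == k else 0.0 for c in range(5)] for k in path]

text, prob = greedy_decode_with_prob(logits, alphabet)
print(text, prob)
```

With real model output, each row would be one time step of the prediction tensor for a single image; a long sequence multiplies many factors below 1, so comparing log-probabilities per character may be more practical than the raw product.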

Is this RARE?

Hello, is this model RARE? Why is it called CRNN here?

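Questions about environment setup

Hello, I read the TextBoxes experiment notes and have a question about the environment setup: how should it be configured? Should Docker be installed on Ubuntu 14.04? I don't understand this sentence: "On the server, the caffe_ys container was created from the image gds/keras-th-tf-opencv using nvidia-docker. Following Caffe's dependency files, the GPU version was compiled." ╮(╯▽╰)╭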

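How do I fine-tune from the pretrained model?

When I run crnn_main.py, my classes differ from those of the pretrained model, so changes are needed. I modified the Chinese characters in key.py and then changed requires_grad in crnn_main, but I couldn't find where to change the number of classes. Could you tell me where to change it?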
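A hedged sketch of one common approach, not necessarily how this repository handles it: the demo code in the first issue derives the class count as nclass = len(alphabet) + 1, so changing the alphabet changes the size of the output layer, and the pretrained weights for that layer no longer fit. One option is to load only the pretrained entries whose name and shape still match. The function filter_matching, the FakeTensor stand-in, and the layer names below are hypothetical and only illustrate the filtering step.

```python
from collections import namedtuple

# Stand-in for a tensor: only the .shape attribute matters for this sketch.
FakeTensor = namedtuple("FakeTensor", ["shape"])


def filter_matching(pretrained, current):
    """Keep pretrained entries whose name and shape both match the new model."""
    return {
        name: value
        for name, value in pretrained.items()
        if name in current and tuple(value.shape) == tuple(current[name].shape)
    }


# Hypothetical layer names: a 21-class checkpoint versus a new 37-class model.
pretrained = {
    "cnn.conv0.weight": FakeTensor((64, 1, 3, 3)),
    "rnn.1.embedding.weight": FakeTensor((21, 256)),
}
current = {
    "cnn.conv0.weight": FakeTensor((64, 1, 3, 3)),
    "rnn.1.embedding.weight": FakeTensor((37, 256)),
}

kept = filter_matching(pretrained, current)
print(sorted(kept))
```

In PyTorch, pretrained would come from torch.load(model_path) and current from model.state_dict(); the filtered dictionary could then be passed to model.load_state_dict(kept, strict=False), which leaves the mismatched output layer at its fresh initialization so it can be trained on the new classes.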

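DA03: code walkthrough

Digit recognition accuracy is too low

The digits in the images I uploaded were all recognized as English letters. What should I do?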

How to train a new model

Hi,
Can anyone please tell me how to train this model for transfer learning?
My model is working fine, but when I train it I get the following error.

Traceback (most recent call last):
  File "crnn_main.py", line 191, in <module>
    train_iter = iter(train_loader)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 303, in __iter__
    return DataLoaderIter(self)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 143, in __init__
    self.sample_iter = iter(self.sampler)
  File "/home/pranay/crnn.pytorch/dataset.py", line 99, in __iter__
    random_start = random.randint(0, len(self) - self.batch_size)
  File "/usr/lib/python2.7/random.py", line 242, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python2.7/random.py", line 218, in randrange
    raise ValueError, "empty range for randrange() (%d,%d, %d)" % (istart, istop, width)
ValueError: empty range for randrange() (0,-33, -33)
Exception AttributeError: "'DataLoaderIter' object has no attribute 'shutdown'" in <bound method DataLoaderIter.__del__ of <torch.utils.data.dataloader.DataLoaderIter object at 0x7ff08e6da050>> ignored

I run the code with the following command in the terminal:
python crnn_main.py --trainroot="Pranay/train_set/" --valroot="Pranay/validation_set" --alphabet='0123456789abcdefghijklmnopqrstuvwxyz !-%.'"'"',#&$\/[]:()?;'

I just want to check that training runs at all before I train the complete model, so I selected a validation set of 10 images and a training set of 30 images. I know 30 images are not enough to train a useful model, but I only want to see training work on the CPU. I think this error has something to do with the small size of the training data.

When I try to run with more images (150-200), my computer hangs, so I am trying a small subset.

I am training on ICDAR 2015 data.
I generated the .mdb data using an input image path list like the following (this is just a sample; I have more data in both lists):

['../test_images/word_11.png',
'../test_images/word_12.png',
'../test_images/word_13.png',
'../test_images/word_14.png',]

and a label list like:
['Genaxis Theatre',
'[06]',
'62-03',
'Carpark',
]

I generated train.mdb and lock.mdb files in both the train_set and validation_set folders. I changed the alphabet to include the new special characters:
--alphabet='0123456789abcdefghijklmnopqrstuvwxyz !-%.'"'"',#&$/[]:()?;'

Please, can you help me train my model? Any help is really appreciated.

Thanks
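A note on the traceback, offered as a sketch rather than a confirmed diagnosis: the failing call is random.randint(0, len(self) - self.batch_size), and the reported range (0,-33, -33) is what randint(0, -34) produces. That matches 30 training samples with a batch size of 64, since 30 - 64 = -34. The batch size of 64 is an assumption inferred from those numbers, not something stated in the report. The helper random_start below is hypothetical and shows the same computation with an explicit check.

```python
import random


def random_start(n_samples, batch_size):
    """Pick a random batch start index, failing early with a clear message."""
    if batch_size > n_samples:
        raise ValueError(
            "batch size %d exceeds dataset size %d; lower the batch size "
            "or add more samples" % (batch_size, n_samples)
        )
    return random.randint(0, n_samples - batch_size)


# Reproduce the failure: 30 samples with an assumed batch size of 64.
try:
    random.randint(0, 30 - 64)
    reproduced = False
except ValueError:
    reproduced = True

# With a batch size that fits the dataset, a valid start index comes back.
start = random_start(30, 16)
print(reproduced, 0 <= start <= 14)
```

If this reading is right, running with a batch size no larger than the number of training samples (and likewise for the validation set) should get past this particular error.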

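How many Chinese characters does your training cover? (CRNN model)

Sorry to bother you. How many Chinese characters does your CRNN model cover? Are they the commonly used characters?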