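tongpi / basicocr
BasicOCR is a project dedicated to research on text recognition algorithms for natural scenes. It was initiated and is maintained by the Tongpi (佟派) AI team of the Great Wall Digital Big Data Application Technology Research Institute (长城数字大数据应用技术研究院).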
Home Page: https://tongpi.github.io/basicOCR/
License: GNU General Public License v3.0
#coding: utf-8
import torch
from torch.autograd import Variable
import utils
import dataset
import os
from PIL import Image
import models.crnn as crnn

#os.environ["CUDA_VISIBLE_DEVICES"] ="1"
model_path = './data/netCRNN_ch_nc_21_nh_128.pth'
img_path = './data/image33.jpg'
# NOTE: as posted, this line read u''ACIM...' with unbalanced quotes; one stray quote is dropped here.
# If the alphabet really starts with an apostrophe, escape it instead.
alphabet = u'ACIMRey万下依口哺摄次状璐癌草血运重'
#print(alphabet)
nclass = len(alphabet) + 1  # one extra class for the CTC blank

# Build the model and load the checkpoint, on the GPU if one is available
if torch.cuda.is_available():
    model = crnn.CRNN(32, 1, nclass, 128).cuda()
    pre_model = torch.load(model_path)
else:
    model = crnn.CRNN(32, 1, nclass, 128)
    pre_model = torch.load(model_path,map_location=lambda storage, loc: storage)
print('loading pretrained model from %s' % model_path)
# Print each parameter name and the size of its first dimension
for k,v in pre_model.items():
    print(k,len(v))
model.load_state_dict(pre_model)

converter = utils.strLabelConverter(alphabet)
transformer = dataset.resizeNormalize((100, 32))
image = Image.open(img_path).convert('L')
# Move the input to the GPU if one is available
if torch.cuda.is_available():
    image = transformer(image).cuda()
else:
    image = transformer(image)
image = image.view(1, *image.size())  # add a batch dimension
image = Variable(image)

model.eval()
preds = model(image)
# Take the most likely class at every time step
_, preds = preds.max(2)
preds = preds.squeeze(2)
preds = preds.transpose(1, 0).contiguous().view(-1)
preds_size = Variable(torch.IntTensor([preds.size(0)]))
raw_pred = converter.decode(preds.data, preds_size.data, raw=True)
sim_pred = converter.decode(preds.data, preds_size.data, raw=False)
print('%-20s => %-20s' % (raw_pred.encode('utf8'), sim_pred.encode('utf8')))
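For readers unsure what the two decode calls at the end of the script produce, here is a minimal pure-Python sketch of greedy CTC decoding. It is not the repository's `strLabelConverter`; it assumes index 0 is the CTC blank and index i maps to `alphabet[i - 1]`, which matches `nclass = len(alphabet) + 1` above but is otherwise an assumption. The function name `ctc_decode` and the sample data are hypothetical.

```python
def ctc_decode(indices, alphabet, blank=0, raw=False):
    """Map per-time-step class indices to text.

    Assumes `blank` is the CTC blank index and index i (i >= 1)
    corresponds to alphabet[i - 1].
    """
    if raw:
        # Keep every time step and show blanks as '-'
        return ''.join('-' if i == blank else alphabet[i - 1] for i in indices)
    chars = []
    prev = blank
    for i in indices:
        # Emit a character only if it is not blank and differs from the previous step
        if i != blank and i != prev:
            chars.append(alphabet[i - 1])
        prev = i
    return ''.join(chars)


alphabet = 'helo'  # made-up alphabet for illustration
steps = [0, 0, 1, 0, 2, 0, 0, 3, 3, 0, 3, 4, 4, 0]
print(ctc_decode(steps, alphabet, raw=True))  # --h-e--ll-loo-
print(ctc_decode(steps, alphabet))            # hello
```

The blank between the two runs of `l` is what lets the decoder keep a genuine double letter while still collapsing repeated predictions of the same character.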
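@YoungMiao, I saw in another issue that you provided a Baidu Cloud download link for the Chinese corpus: http://pan.baidu.com/s/1jHYJeh4 password: fdtk.
After downloading it, I found that the archive needs an extraction password. What is the extraction password? Thanks!
Thanks~
A question about crnn: when I train with multiple GPUs, the loss seems wrong. Compared with a single-GPU model at the same loss value, the multi-GPU model predicts very poorly, while the single-GPU model predicts correctly. @ wulivicte, have you run into this problem, and how did you solve it? Thank you very much.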
Hello,
I'm wondering whether the CRNN can also output the probability of each decoded sequence.
For example:
--h-e--ll-oo- => 'hello' with a probability of 0.89
How can I get that?
I can't find these probabilities in the CTCLoss code, and I don't see where the output probabilities could be printed in CTCLoss().
In __init__.py the CTC class is defined as follows:
class _CTC(Function):
    def forward(self, acts, labels, act_lens, label_lens):
        is_cuda = True if acts.is_cuda else False
        acts = acts.contiguous()
        loss_func = warp_ctc.gpu_ctc if is_cuda else warp_ctc.cpu_ctc
        grads = torch.zeros(acts.size()).type_as(acts)
        minibatch_size = acts.size(1)
        costs = torch.zeros(minibatch_size)
        # Fills `grads` and the per-example `costs` in place
        loss_func(acts,
                  grads,
                  labels,
                  label_lens,
                  act_lens,
                  minibatch_size,
                  costs)
        self.grads = grads
        # Only the summed cost over the minibatch is returned
        self.costs = torch.FloatTensor([costs.sum()])
        return self.costs

    def backward(self, grad_output):
        return self.grads, None, None, None


class CTCLoss(Module):
    def __init__(self):
        super(CTCLoss, self).__init__()

    def forward(self, acts, labels, act_lens, label_lens):
        """
        acts: Tensor of (seqLength x batch x outputDim) containing output from network
        labels: 1 dimensional Tensor containing all the targets of the batch in one sequence
        act_lens: Tensor of size (batch) containing size of each output sequence from the network
        label_lens: Tensor of (batch) containing label length of each example
        """
        _assert_no_grad(labels)
        _assert_no_grad(act_lens)
        _assert_no_grad(label_lens)
        return _CTC()(acts, labels, act_lens, label_lens)
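As the excerpt shows, the loss returns only a summed cost for the target labels, so a confidence for a predicted string has to come from the network output itself. One common approximation, sketched below in pure Python, is the best-path probability: apply a softmax at each time step, take the most likely class, and multiply those per-step probabilities. This is an assumption-laden sketch, not part of warp-ctc or this repository: it assumes the model emits unnormalized scores per time step, the helper names `softmax` and `best_path` are hypothetical, and the result is only a lower bound on the true sequence probability, because CTC sums over every alignment that collapses to the same string.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one time step's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_path(acts):
    """Return (indices, probability) of the most likely CTC path.

    `acts` is a list of time steps, each a list of unnormalized class scores.
    """
    indices, log_prob = [], 0.0
    for scores in acts:
        probs = softmax(scores)
        k = max(range(len(probs)), key=probs.__getitem__)
        indices.append(k)
        # Sum logs instead of multiplying to avoid underflow on long sequences
        log_prob += math.log(probs[k])
    return indices, math.exp(log_prob)


# Three time steps, three classes (index 0 taken as blank); the scores are made up.
acts = [[0.1, 3.0, 0.2], [2.5, 0.3, 0.1], [0.0, 0.2, 4.0]]
indices, prob = best_path(acts)
print(indices, round(prob, 3))
```

With a PyTorch model, the same idea would be applied to the network output for one image before the `max(2)` step, and the resulting indices can then be collapsed into text by the usual greedy decoding.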
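Hello, is this model RARE? Why is it called CRNN here?
Hello, I was reading the TextBoxes experiment write-up and have a question about the environment setup. How should it be configured? Do I install Docker on Ubuntu 14.04? I don't understand this sentence: "On the server, the caffe_ys container was created with nvidia-docker from the gds/keras-th-tf-opencv image. Install according to caffe's dependency files and compile the GPU version." ╮(╯▽╰)╭
When I run crnn_main.py, my classes differ from those of the pretrained model, so I need to make changes. I edited the Chinese characters in key.py and changed requires_grad in crnn_main, but I could not find where to change the number of classes. Could you tell me where to change it?
The digits in the images I uploaded were all recognized as English letters. What should I do?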
Hi,
Can anyone please tell me how to train this model for transfer learning?
My model is working fine, but when I train it I get the following error.
Traceback (most recent call last):
  File "crnn_main.py", line 191, in <module>
    train_iter = iter(train_loader)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 303, in __iter__
    return DataLoaderIter(self)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 143, in __init__
    self.sample_iter = iter(self.sampler)
  File "/home/pranay/crnn.pytorch/dataset.py", line 99, in __iter__
    random_start = random.randint(0, len(self) - self.batch_size)
  File "/usr/lib/python2.7/random.py", line 242, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python2.7/random.py", line 218, in randrange
    raise ValueError, "empty range for randrange() (%d,%d, %d)" % (istart, istop, width)
ValueError: empty range for randrange() (0,-33, -33)
Exception AttributeError: "'DataLoaderIter' object has no attribute 'shutdown'" in <bound method DataLoaderIter.__del__ of <torch.utils.data.dataloader.DataLoaderIter object at 0x7ff08e6da050>> ignored
I am using the following line on terminal to run the code
python crnn_main.py --trainroot="Pranay/train_set/" --valroot="Pranay/validation_set" --alphabet='0123456789abcdefghijklmnopqrstuvwxyz !-%.'"'"',#&$\/[]:()?;'
I just want to check that training runs at all before I train the complete model, so I selected a validation set of 10 images and a training set of 30 images. I know 30 images are not enough to train a useful model, but I just want to see training work on the CPU. I think this error has something to do with the small size of the training data.
When I try to run with more images (150-200), my computer hangs, which is why I am trying a small subset.
I am training on ICDAR 2015 data.
I generated the .mdb data using an input image path list (this is just a sample; I have more data in both lists)
and a label path list such as
['Genaxis Theatre',
'[06]',
'62-03',
'Carpark',
]
I have generated train.mdb and lock.mdb files in both the train_set and validation_set folders. I have changed the alphabet to take the new special characters into account:
--alphabet='0123456789abcdefghijklmnopqrstuvwxyz !-%.'"'"',#&$/[]:()?;'
Please, can you help me train my model? Any help is really appreciated.
Thanks
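On the `ValueError: empty range for randrange() (0,-33, -33)` in the traceback above: the failing line is `random.randint(0, len(self) - self.batch_size)`, and the `-33` is that upper bound plus one, so the batch size exceeds the number of samples by 34. With 30 training images that would imply a batch size of 64, which is an inference from the numbers and not something stated in the post. The sketch below reproduces only the arithmetic; `pick_start` is a hypothetical helper, not code from dataset.py.

```python
import random


def pick_start(num_samples, batch_size):
    """Mimic the failing line: pick a random start index for one batch."""
    return random.randint(0, num_samples - batch_size)


num_samples = 30  # training images, from the post
batch_size = 64   # assumption, inferred from the -33 in the traceback

try:
    pick_start(num_samples, batch_size)
except ValueError as err:
    # The range is empty because batch_size > num_samples
    print('fails:', err)

# With a batch size no larger than the dataset, the range is valid.
print('ok:', pick_start(num_samples, 10))
```

If this reading is right, running with a batch size no larger than the training set should avoid the error. To the best of my knowledge the upstream crnn.pytorch script exposes this as `--batchSize`, but check the argument parser in your copy of crnn_main.py.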
Sorry to bother you. How many Chinese characters does your crnn model cover? Are they the commonly used characters?