zhang0jhon / AttentionOCR
Scene text recognition
First of all, thank you for sharing the code. Is it possible to run it without Docker?
python test.py
2019-11-13 16:33:41.908568: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-13 16:33:41.950705: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 16:33:41.951266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-13 16:33:41.951360: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951422: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951477: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951534: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951589: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951644: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:42.539829: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 16:33:42.539906: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-11-13 16:33:42.540637: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-13 16:33:42.575270: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-11-13 16:33:42.576026: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c16973a10 executing computations on platform Host. Devices:
2019-11-13 16:33:42.576040: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-11-13 16:33:42.576103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-13 16:33:42.576110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]
2019-11-13 16:33:42.656045: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 16:33:42.656412: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c18c65640 executing computations on platform CUDA. Devices:
2019-11-13 16:33:42.656424: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
Traceback (most recent call last):
File "test.py", line 121, in <module>
test(args)
File "test.py", line 91, in test
model = TextRecognition(args.pb_path, cfg.seq_len+1)
File "test.py", line 23, in __init__
self.init_model()
File "test.py", line 37, in init_model
self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."
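The log above shows TensorFlow trying to load CUDA 10.0 libraries (libcudart.so.10.0 and others) while LD_LIBRARY_PATH points at /usr/local/cuda-9.0/lib64, which is why GPU registration is skipped. A minimal sketch for checking which of those libraries the dynamic loader can find, using only the Python standard library. The helper name check_shared_libs is hypothetical, and the list of names is copied from the log. It does not address the separate KeyError about 'label:0'.

```python
import ctypes

# Library names copied from the "Could not dlopen library" lines in the log.
CUDA_10_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def check_shared_libs(names):
    """Return {name: True/False} depending on whether the loader can open it."""
    status = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # same dlopen mechanism TensorFlow relies on
            status[name] = True
        except OSError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_shared_libs(CUDA_10_LIBS).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If every entry is reported as missing, either a CUDA 10.0 toolkit needs to be on LD_LIBRARY_PATH or a TensorFlow build matching the installed CUDA version is needed.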
For the recognition part, I noticed that it is a simple 'for' loop. I want to improve performance with batch prediction, so I made a small change just to test:
pads = [image_padded, image_padded]
image_padded = np.array(pads)
print("Batch images: ", image_padded.shape)
# Batch images: (2, 299, 299, 3)
texts, probs = self.model.predict(image_padded, self.label_dict)
Then I got the following error:
ValueError: Cannot feed value of shape (2, 299, 299, 3) for Tensor 'image:0', which has shape '(1, 299, 299, 3)'
Why does 'image:0' have shape '(1, 299, 299, 3)' rather than '(?, 299, 299, 3)'? Is the batch size fixed during training? I would really appreciate any suggestions on how to fix this.
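The error indicates that the exported graph's 'image:0' placeholder has a batch dimension fixed at 1, so a (2, 299, 299, 3) array cannot be fed directly. Short of re-exporting the graph with a dynamic batch dimension (whether the repository supports that is not confirmed here), one workaround is to keep a batched call site and feed the images one at a time. A minimal sketch, assuming a hypothetical predict_one callable that wraps the existing single-image prediction and accepts an array of shape (1, 299, 299, 3):

```python
import numpy as np


def predict_batch(predict_one, images):
    """Run a model with a fixed batch size of 1 over a batch by slicing it.

    predict_one is a hypothetical callable taking an array of shape
    (1, H, W, C) and returning the result for that single image.
    """
    images = np.asarray(images)
    results = []
    for i in range(images.shape[0]):
        # images[i:i + 1] keeps the leading batch axis, giving (1, H, W, C).
        results.append(predict_one(images[i:i + 1]))
    return results
```

This only avoids the shape error; it does not give the speed-up of true batching, which would need a graph whose input accepts a variable batch size.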
Hi, do you have any plans to open-source the detection part? I would appreciate it if you could.
W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 358.89MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Ordinary consumer graphics card
tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GT 1030 major: 6 minor: 1 memoryClockRate(GHz): 1.5185 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.63GiB
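Hello, thanks to your patient help I have got the demo in Docker running and can upload images through the web page to view the recognition results. If I want to run a Python file directly to perform detection and recognition on images in a local folder, what should I do? Looking forward to your reply. @zhang0jhon
This is because the computer does not have Docker. Thank you very much!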
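seq_len is the maximum number of characters that can be recognized in a single text line, is that right?
I now want to train a model for short text lines with a maximum seq_len of 16. What else needs to be changed?
Changing it directly to 16 raises a ValueError for a dimension mismatch. The exact error is: cannot feed value of shape (16, 33) for Tensor 'label:0', which has shape (?, 17)
Could you give me some pointers? Many thanks.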
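Reading the error, the 'label:0' placeholder was built with seq_len + 1 = 17 columns, while the fed labels still have 33 columns. This suggests the label arrays were prepared for a longer seq_len than the one the graph now uses. A minimal sketch of re-padding label sequences to the new length. The function name pad_labels and the pad_id value are hypothetical, and the repository's own preprocessing may differ:

```python
import numpy as np


def pad_labels(label_seqs, seq_len, pad_id=0):
    """Pad or truncate each label sequence to seq_len + 1 entries.

    pad_id is an assumed padding index; use whatever the label dictionary
    actually reserves for padding.
    """
    target = seq_len + 1
    out = np.full((len(label_seqs), target), pad_id, dtype=np.int32)
    for row, seq in enumerate(label_seqs):
        seq = list(seq)[:target]  # drop anything beyond the target length
        out[row, :len(seq)] = seq
    return out
```

Sequences longer than the target are truncated here, which would also drop any end-of-sequence token, so filtering out over-long samples may be preferable to truncating them.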
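Suppose the text region has already been located (ignoring the localization method for now). If AttentionOCR is used for recognition, is the text in the image recognized as a whole, or character by character? The CRNN-CTC model I used before recognizes all the text in an image together, but I see that the pictures in your images folder are annotated with a recognition probability for each Chinese character. I hope I have expressed this clearly ^~^
I got an error after uploading an image on the Flask page: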
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/ocr/ocr/flaskapp.py", line 134, in predict_ocr_image
image = detection(img_path, ocr_detection_model, ocr_recognition_model, ocr_label_dict)
File "/ocr/ocr/flaskapp.py", line 242, in detection
r_boxes, polygons, scores = detection_model.predict(bgr_image)
File "/ocr/ocr/text_detection.py", line 60, in predict
r_box, polygon = generate_polygon(mask, box)
File "/ocr/ocr/util.py", line 559, in generate_polygon
contours, hierarchy = cv2.findContours(mask_int,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
ValueError: too many values to unpack (expected 2)
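This ValueError is typical of an OpenCV version difference: cv2.findContours returns three values (image, contours, hierarchy) in OpenCV 3.x and two values (contours, hierarchy) in OpenCV 2.x and 4.x. A minimal sketch of a version-agnostic unpack, written as a hypothetical helper take_contours that operates on the returned tuple so it does not depend on the installed version:

```python
def take_contours(find_contours_result):
    """Return (contours, hierarchy) from any cv2.findContours return value.

    The last two items are always (contours, hierarchy), whether the call
    returned two values or three.
    """
    contours, hierarchy = find_contours_result[-2:]
    return contours, hierarchy


# Intended use at the failing line in util.py (requires OpenCV):
# contours, hierarchy = take_contours(
#     cv2.findContours(mask_int, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE))
```

Alternatively, installing the OpenCV major version that the code was written against would avoid the mismatch without touching the source.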
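Hello, could you provide weights that can be used directly for testing?
Hello,
After running python flaskapp.py and uploading an image, the following error appears during prediction. What causes it?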
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
[[node pool0/MaxPool (defined at /tensor_flow/OCRSpace/ocr/ocr/text_detection.py:29) ]]
Original stack trace for 'pool0/MaxPool':
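How should I modify it? Thanks a lot.
May I ask whether you used other datasets or self-made datasets for text recognition? If so, would you be willing to share them? If sharing is not convenient, could you describe your approach?
Something like this: text_recognition_5435.pb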
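Hello, where are the coordinates of the detected text in the Docker version? What coordinates are output? Also, what does resize mean? Are the coordinates those after resizing? What should I do if I want to output the coordinates of the detected text in the original image? @zhang0jhon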
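If the detector resizes the input before inference, boxes expressed in the resized image can be mapped back to the original by dividing by the scale factor. A minimal sketch, assuming the resize is a uniform scale that brings the longer side down to max_size. This is an assumption about the pipeline, not something confirmed by the repository, and resize_scale and to_original_coords are hypothetical names:

```python
import numpy as np


def resize_scale(height, width, max_size):
    """Scale factor that shrinks the longer side to max_size (never enlarges)."""
    return min(1.0, max_size / float(max(height, width)))


def to_original_coords(points, scale):
    """Map (x, y) points from the resized image back to the original image."""
    return np.asarray(points, dtype=np.float64) / scale
```

For example, a 3200 x 1600 image with max_size=1600 gives a scale of 0.5, so a point at (100, 50) in the resized image corresponds to (200, 100) in the original.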
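I see that the training image input size is 256*256. I wonder how well it would work on long lines if I changed it. Have you tested this on your side? Thanks.
From the code it looks like this can be changed. If it works, I plan to convert my own data and give it a try. Thank you.
Would you mind sharing the speed?
Hello, is there a Docker image available for training?
Can this Docker image run in a CPU-only environment?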
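Hey, thank you for all this work.
I couldn't find the checkpoint folder or the text_recognition_5435.pb model.
I downloaded the data from the competition's official website, but could not find where the program reads it.
Text detection model initialization in the Docker image: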
TextDetection(detection_pb, tf_config, max_size=1600)
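Does this 1600 have to be that large? With such a large input, runtime performance is very poor.
When will the paper for this model be published?
Could you tell me what causes this? This is with tensorflow-gpu==1.14.0.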
Hi,
When I run nvidia-docker run --runtime=nvidia -p 5000:5000 -it zhang0jhon/demo:ocr bash
the following error occurs:
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/1dd9d7b67a05f6c1b95ad52e6ada9b2ff3e9f249c85d214f405feb610c19b569/log.json: no such file or directory): fork/exec /usr/bin/nvidia-container-runtime: no such file or directory: : unknown.
Thank you!
test.py line 99: when defining points, are height and width swapped?
Hello, I see pdf in ALLOWED_EXTENSIONS, but I could not find the corresponding implementation.
I would be really interested in having a public website with the Docker image running, so that we could easily try AttentionOCR on test images.
But in the Docker image, detection and recognition are two separate models.
from utils.np_box_ops import iou as np_iou
ImportError: No module named 'utils'
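Was your AttentionOCR model trained on its own, or jointly with Mask R-CNN? If AttentionOCR is trained on its own, is the data format FSNS-tfrecord? I plan to train with my own data rather than downloading the official ICDAR data.
Can this model run on a single machine with a single GPU?
Hello, what does the file icdar_datasets.npy contain, and what is its format?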
May I ask whether it is possible to fine-tune the pre-trained model with extra data?
Can I add new characters to the file below and then continue training from the existing model? https://github.com/zhang0jhon/AttentionOCR/blob/master/label_dict/icdar_labels.txt
If I use https://github.com/Belval/TextRecognitionDataGenerator to synthesize text, what should the masks, bboxes and points be for that data?
Thanks.
What does this error mean? I have never used Docker at all, which is frustrating...