zhang0jhon / attentionocr

Scene text recognition

Python 100.00%

attentionocr's Issues

Your privacy is in the docker image

python test.py
2019-11-13 16:33:41.908568: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-13 16:33:41.950705: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 16:33:41.951266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-13 16:33:41.951360: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951422: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951477: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951534: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951589: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951644: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:42.539829: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 16:33:42.539906: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-11-13 16:33:42.540637: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-13 16:33:42.575270: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-11-13 16:33:42.576026: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c16973a10 executing computations on platform Host. Devices:
2019-11-13 16:33:42.576040: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-11-13 16:33:42.576103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-13 16:33:42.576110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]
2019-11-13 16:33:42.656045: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 16:33:42.656412: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c18c65640 executing computations on platform CUDA. Devices:
2019-11-13 16:33:42.656424: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
Traceback (most recent call last):
File "test.py", line 121, in <module>
test(args)
File "test.py", line 91, in test
model = TextRecognition(args.pb_path, cfg.seq_len+1)
File "test.py", line 23, in __init__
self.init_model()
File "test.py", line 37, in init_model
self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."
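
The KeyError means the loaded graph contains no operation named `label`. One way to investigate is to list the operation names of the loaded graph (in TensorFlow 1.x, `[op.name for op in sess.graph.get_operations()]`) and look for near matches. The sketch below covers only the matching step in plain Python; `find_similar_ops` and the sample names are hypothetical and are not taken from the repository.

```python
import difflib


def find_similar_ops(wanted, op_names, limit=5):
    """Return op names that contain or closely resemble `wanted`."""
    # Substring hits first (e.g. a scope prefix such as "tower0/label").
    hits = [name for name in op_names if wanted in name]
    # Then fuzzy matches, in case the op was renamed.
    fuzzy = difflib.get_close_matches(wanted, op_names, n=limit, cutoff=0.6)
    for name in fuzzy:
        if name not in hits:
            hits.append(name)
    return hits[:limit]


# Hypothetical op names; in TensorFlow 1.x the real list would come from
# [op.name for op in sess.graph.get_operations()].
sample_ops = ["image", "length", "tower0/label", "sequence_preds"]
print(find_similar_ops("label", sample_ops))
```

If nothing similar turns up, the .pb file may have been exported without a label placeholder at all, in which case test.py and the model file are probably from different versions.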

Recognition model batch predict

For the recognition part, I noticed that it's a simple for loop. I want to improve performance with batch prediction, so I made a small change just to test:

pads = [image_padded, image_padded]
image_padded = np.array(pads)
print("Batch images: ", image_padded.shape)
# Batch images:  (2, 299, 299, 3)

texts, probs = self.model.predict(image_padded, self.label_dict)

Then I got the following error:

ValueError: Cannot feed value of shape (2, 299, 299, 3) for Tensor 'image:0', which has shape '(1, 299, 299, 3)'

Why does 'image:0' have shape '(1, 299, 299, 3)' rather than '(?, 299, 299, 3)'? Is it fixed at training time? I would really appreciate any suggestions on how to fix this.
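
The error suggests that the `image:0` placeholder in the exported graph has a fixed batch dimension of 1, so it accepts only one image per call. Until the graph is exported with a dynamic batch dimension, a workaround is to feed the images in chunks of the size the graph accepts and collect the results. This is a minimal sketch under that assumption; `predict_in_chunks` and `fake_predict` are hypothetical stand-ins, not the repository's API.

```python
def predict_in_chunks(predict_fn, images, graph_batch_size=1):
    """Feed `images` to `predict_fn` in chunks the graph can accept."""
    results = []
    for start in range(0, len(images), graph_batch_size):
        chunk = images[start:start + graph_batch_size]
        # predict_fn is assumed to return one result per image in the chunk.
        results.extend(predict_fn(chunk))
    return results


# Stand-in for the real model call: rejects anything but a single image,
# mimicking a placeholder with shape (1, 299, 299, 3).
def fake_predict(chunk):
    if len(chunk) != 1:
        raise ValueError("graph only accepts a batch of 1")
    return ["text for " + item for item in chunk]


print(predict_in_chunks(fake_predict, ["img_a", "img_b"]))
```

This keeps the calling code working but does not give the speed-up of true batching; that would likely require re-exporting the graph with an unspecified batch dimension.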

Could you open source the detection model training part?

Hi, do you have any plans to open source the detection part? I would appreciate it if you could release it.

How should I configure the Docker demo when it runs out of memory?

W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 358.89MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

I have an ordinary consumer graphics card:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GT 1030 major: 6 minor: 1 memoryClockRate(GHz): 1.5185 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.63GiB

Can the recognition model handle horizontal text?

Does the model work directly on a text image like this?

How can I obtain the training samples for the recognition part of this project?

Hello, how can I run the Python files directly and see the recognition results without going through the web page?

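Hello, thanks to your patient help I now have the demo in Docker running and can upload images through the web page to view the recognition results. What should I do if I want to run a Python file directly to detect and recognize the images in a local folder? Looking forward to your reply. @zhang0jhon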

Could you provide the pb model weights for testing?

I don't have Docker on my computer. Thank you!

Changing the recognizable character length to 16

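seq_len is the maximum number of characters that can be recognized in a single text line, is that right?
I now want to train a model for short text lines, with the maximum seq_len set to 16. What else needs to be modified?
Changing it directly to 16 raises a ValueError for a dimension mismatch. The specific error is: cannot feed value of shape (16,33) for Tensor 'label:0', which has shape (?,17)
I would appreciate your guidance. Many thanks.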
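
The shapes in the error suggest that the graph now expects labels of width 17 (seq_len + 1 with seq_len = 16), while the labels being fed still have width 33 (seq_len + 1 with a previous value of 32). That would mean the prepared label data was generated with the old setting and needs to be regenerated or reshaped. The sketch below shows one way to force a label to the expected width; `fit_label` is a hypothetical helper, and using 0 as the end-of-sequence and padding index is an assumption for illustration.

```python
def fit_label(label_ids, seq_len, eos_id=0):
    """Truncate or pad a label to exactly seq_len + 1 entries.

    The extra slot holds an end-of-sequence marker. eos_id=0 is an
    assumption for illustration, not the repository's actual value.
    """
    width = seq_len + 1
    trimmed = list(label_ids[:seq_len])                 # leave room for EOS
    trimmed.append(eos_id)
    trimmed.extend([eos_id] * (width - len(trimmed)))   # pad to fixed width
    return trimmed


label = fit_label([5, 9, 12], seq_len=16)
print(len(label))
```

Text lines longer than the new seq_len get truncated by this approach, so it may be preferable to filter those samples out of the training set instead.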

Is nvidia-docker cross-platform? Can it be used on Windows?

How much memory is needed to train the model?

A question about recognition

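Suppose the text region has already been located (setting aside the localization method). If AttentionOCR is used for recognition, does it recognize the text in the image as a whole, or does it recognize the characters one by one? The crnn-ctc model I used before recognizes the text in an image all at once, but I see that the images in your images folder are annotated with a recognition probability for each Chinese character. I'm not sure whether I've expressed this clearly ^~^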

A question about the model's receptive field

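As shown in the image above, when the text is small relative to the image, the model outputs a bunch of # characters. Is there any way to handle this scenario?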

Error when uploading an image

After uploading an image on the Flask page, I got this error:

File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/ocr/ocr/flaskapp.py", line 134, in predict_ocr_image
image = detection(img_path, ocr_detection_model, ocr_recognition_model, ocr_label_dict)
File "/ocr/ocr/flaskapp.py", line 242, in detection
r_boxes, polygons, scores = detection_model.predict(bgr_image)
File "/ocr/ocr/text_detection.py", line 60, in predict
r_box, polygon = generate_polygon(mask, box)
File "/ocr/ocr/util.py", line 559, in generate_polygon
contours, hierarchy = cv2.findContours(mask_int,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
ValueError: too many values to unpack (expected 2)
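
`cv2.findContours` returns three values in OpenCV 3.x (image, contours, hierarchy) and two values in OpenCV 2.x and 4.x (contours, hierarchy), so this error typically means the code was written for a two-value version and is running against 3.x. One version-independent pattern is to take the last two elements of the return value. The sketch below uses plain tuples in place of real OpenCV output; `unpack_contours` is a hypothetical helper, not part of the repository.

```python
def unpack_contours(find_contours_result):
    """Return (contours, hierarchy) from any cv2.findContours result."""
    # The last two items are contours and hierarchy in every version;
    # OpenCV 3.x just prepends the image.
    contours, hierarchy = find_contours_result[-2:]
    return contours, hierarchy


# Stand-ins for real return values.
opencv3_style = ("image", ["contour"], "hierarchy")
opencv4_style = (["contour"], "hierarchy")
print(unpack_contours(opencv3_style) == unpack_contours(opencv4_style))
```

In util.py the call could then be wrapped as `unpack_contours(cv2.findContours(mask_int, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE))`. Installing an OpenCV version that matches the code is the other option.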

Hello, could you provide weights that can be used directly for testing?

OK, I see it now, thanks!

Originally posted by @Byronnar in #3 (comment)

Could you share the pb model weights? I don't have Docker. Thanks!

Default MaxPoolingOp only supports NHWC on device type CPU

Hello:
After running python flaskapp.py and uploading an image, the following error appears during prediction. What causes it?
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
[[node pool0/MaxPool (defined at /tensor_flow/OCRSpace/ocr/ocr/text_detection.py:29) ]]

Original stack trace for 'pool0/MaxPool':

How should I fix this? Many thanks.

About data augmentation

For text recognition, did you use other datasets or datasets you built yourself? If so, would you be willing to share them? If not, could you describe your approach?

Hello, the code is excellent. Will the trained text detection model be open-sourced?

Something like this one: text_recognition_5435.pb

How to run on CPU?

About the coordinates of detected text

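Hello, where are the coordinates of the detected text in the Docker version? What coordinates are output? Also, what does resize mean? Are the coordinates relative to the resized image? What should I do if I want to output the coordinates of the detected text in the original image? @zhang0jhon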
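
If the detector resizes the image before prediction and reports boxes in the resized image's coordinates (an assumption; the thread does not confirm this), mapping them back to the original image amounts to dividing by the resize scale. The sketch below assumes an aspect-preserving resize that caps the longer side at `max_size`; both function names are hypothetical.

```python
def resize_scale(width, height, max_size):
    """Scale factor that caps the longer side at max_size (never upscales)."""
    return min(1.0, max_size / float(max(width, height)))


def to_original(points, scale):
    """Map (x, y) points from the resized image back to the original."""
    return [(x / scale, y / scale) for x, y in points]


scale = resize_scale(3200, 2400, max_size=1600)
print(scale, to_original([(100, 50), (800, 600)], scale))
```

If the actual preprocessing pads or crops the image as well, the corresponding offsets would need to be subtracted before dividing.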

Hello, how well does it work on long text lines?

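I see that the training image input size is 256*256. I'm not sure how well it would work on long lines if I changed that. Have you tested this? Thanks.
From the code it looks like this can be changed. If it is feasible, I plan to convert my own data and try it. Thank you.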

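What is the difference between the model_name values 'ocr' and 'ocr_with_normalized_bbox' in the configuration file? If the training samples are long text, does image_size need to change? If I increase it, do any other settings need to be modified? Thanks!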

Hello, what are the exact versions?

Are you using CUDA 10?
Also, do I need to run conda install keras-gpu separately? Without a version number it upgrades tensorflow straight to 2.0.
At first I got this error: No OpKernel was registered to support Op 'NcclAllReduce' with these attrs.
I referred to: https://ask.csdn.net/questions/931786
Then, after setting up a fresh environment: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
Looking forward to your reply, thanks.

speed

Would you mind sharing the speed?

Docker environment used for training

Hello, is there a Docker image available for training?

Can it run on CPU?

Can this Docker image run in a CPU-only environment?

text_recognition_5435.pb model

Hey, thank you for all this work.
I couldn't find the checkpoint folder or the text_recognition_5435.pb model.

Does the data need to be converted to NumPy format? It seems the images in a folder can't be used directly

I downloaded the data from the competition website, but I couldn't find where the program reads it.

Hello, is it possible to change the max_size of TextDetection?

In Docker, the text detection model is initialized with:
TextDetection(detection_pb, tf_config, max_size=1600)

Does this 1600 have to be that large? With such large inputs, runtime performance is very poor.

A question about the paper

When will the paper for this model be published?

Can I modify the 'label_dict/icdar_labels.txt' file to add Chinese characters?

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'MaxBytesInUse' used by node GPUMemoryTracker/MaxBytesInUse

Could you tell me what causes this? I'm using tensorflow-gpu==1.14.0.

docker run failed

Hi:
When I run nvidia-docker run --runtime=nvidia -p 5000:5000 -it zhang0jhon/demo:ocr bash

the following error occurs:
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/1dd9d7b67a05f6c1b95ad52e6ada9b2ff3e9f249c85d214f405feb610c19b569/log.json: no such file or directory): fork/exec /usr/bin/nvidia-container-runtime: no such file or directory: : unknown.
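
The message indicates that Docker tried to execute `/usr/bin/nvidia-container-runtime` and did not find it, which usually points to the NVIDIA container runtime not being installed, or being installed in a different location. A quick way to check from Python is sketched below; it only inspects the filesystem and PATH, and `find_nvidia_runtime` is a hypothetical helper.

```python
import os
import shutil


def find_nvidia_runtime(expected="/usr/bin/nvidia-container-runtime"):
    """Return the runtime's path if it can be found, else None."""
    if os.path.isfile(expected) and os.access(expected, os.X_OK):
        return expected
    # Fall back to searching PATH in case it lives somewhere else.
    return shutil.which("nvidia-container-runtime")


path = find_nvidia_runtime()
print(path if path else "nvidia-container-runtime not found")
```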

Thank you!

Can I add new characters to the file below and then train from the existing model? https://github.com/zhang0jhon/AttentionOCR/blob/master/label_dict/icdar_labels.txt
If I use https://github.com/Belval/TextRecognitionDataGenerator to synthesize text, what should the masks, bboxes and points for the data be?

thanks.
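
On the first question, a minimal sketch of extending a label list while keeping existing indices stable is shown below. It assumes the label file lists one character per entry and that position defines the class index, which is an assumption about the file format; `extend_labels` is a hypothetical helper.

```python
def extend_labels(existing, new_chars):
    """Append unseen characters, keeping existing indices unchanged."""
    merged = list(existing)
    seen = set(existing)
    for ch in new_chars:
        if ch not in seen:
            merged.append(ch)
            seen.add(ch)
    return merged


old = ["a", "b", "c"]          # stand-in for the current label list
new = extend_labels(old, ["b", "汉", "字", "汉"])
print(new.index("c"), len(new))
```

Appending rather than inserting keeps the old class indices valid, but a model trained on the old dictionary has an output layer sized for the old class count, so that layer would most likely need to be resized and retrained.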

flag provided but not defined: --runtime

What does this error mean? I've never used Docker at all, which is frustrating...