mingtzge / 2019-ccf-bdci-ocr-mczj-fake_data_generator
Source code of the simulated data generation solution by team 天晨破晓, first place in the OCR track of the 2019 CCF-BDCI competition
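Hello, and thank you very much for sharing your solution and code. I have a question about generating dataset 1 in the first stage of the "复印无效" (copy invalid) data. choosed_imgs_10_2 holds the images in which the "复印无效" watermark falls on a blank area, but the repository includes only some of those samples, and running python move_watermask_location.py on them yields only a few hundred images. How many samples did choosed_imgs_10_2 contain during the competition? If I want to use more samples, do I need to pick out the images with the watermark on a blank area from the training set myself? Thanks.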
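Hello! I looked at the two ways of generating "复印无效" (copy invalid) data for the preliminary round. first_train takes training images whose watermark lies in a blank area, uses template matching to locate the watermark, and generates a watermark somewhere else in the image; second_finetune generates the watermark directly on ID card templates. My question: the second approach (second_finetune) seems to give clearly better watermarks, so is it still necessary to generate data with first_train? Is it there just to add more training data, or does the first_train data have some advantage over the second_finetune data? Thanks!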
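The first_train idea described in this question (locate the existing watermark by template matching, then pick a different spot for a new one) can be sketched roughly as follows. This is a minimal illustration and not the repository's code: it uses a brute-force sum-of-squared-differences search in pure Python instead of OpenCV, and the function names `match_template_ssd` and `candidate_positions` are hypothetical.

```python
def match_template_ssd(image, template):
    """Return ((row, col), score) of the best-fitting top-left corner.

    image and template are 2D lists of grayscale values. The score is the
    sum of squared differences, so lower means a better match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def candidate_positions(image_shape, template_shape, found):
    """List top-left corners where a same-sized patch avoids the found match."""
    ih, iw = image_shape
    th, tw = template_shape
    fr, fc = found
    positions = []
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Two equally sized boxes overlap when they intersect on both axes.
            overlaps = r < fr + th and fr < r + th and c < fc + tw and fc < c + tw
            if not overlaps:
                positions.append((r, c))
    return positions


# Toy 4x6 "image": a 2x2 patch of 9s (the "watermark") sits at row 1, col 3.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 9, 0],
    [0, 0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [[9, 9], [9, 9]]

found, score = match_template_ssd(image, template)
free = candidate_positions((4, 6), (2, 2), found)
print(found, score, len(free))
```

A real pipeline would also need a match threshold (the traceback later in this thread shows a `tp_threshold` argument) so that images without a clear match are skipped rather than accepted.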
cb@ceo-pc:~/code/ccf_ocr/2/fake_data_generator/chusai_fuyinwuxiao/first_train$ python move_watermask_location.py
0a5f692293f34acfb7fa006b910c2598_1.jpg
Traceback (most recent call last):
  File "move_watermask_location.py", line 149, in <module>
    m_run(img)
  File "move_watermask_location.py", line 101, in m_run
    father_cor, img_father = match_img(img_path, roi_img_path, tp_threshold)
  File "move_watermask_location.py", line 21, in match_img
    img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
cv2.error: OpenCV(4.1.1) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
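The `!_src.empty()` assertion means that `cvtColor` received an empty image. A common cause is that `cv2.imread` could not read the file and returned `None`, for example because a relative path does not resolve from the current working directory. The thread does not confirm that this is what happened here, so the following is only a sketch of a pre-flight check. The helper name `split_readable` and the file names in the demo are hypothetical.

```python
import os
import tempfile


def split_readable(paths):
    """Split paths into (existing files, missing paths).

    This only checks that each path points at a file. A file that exists
    but is corrupt would still make an image reader return nothing.
    """
    ok, missing = [], []
    for p in paths:
        (ok if os.path.isfile(p) else missing).append(p)
    return ok, missing


# Demo with a temporary directory: one file exists, the other does not.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "sample.jpg")
    open(real, "wb").close()
    ghost = os.path.join(tmp, "roi.jpg")
    ok, missing = split_readable([real, ghost])
    ok_names = [os.path.basename(p) for p in ok]
    missing_names = [os.path.basename(p) for p in missing]

print("found:", ok_names, "missing:", missing_names)
```

Running such a check on the image path and the template path before calling the matching function would show whether the failure comes from a missing file.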
(tensorflow) F:\2019-CCF-BDCI-OCR-MCZJ-fake_data_generator-master\chusai_fuyinwuxiao\first_train>python add_rematch_watermask_self.py
00a0d1ba365f44f280a2adc22edf8c5e_0.jpg
00a0d1ba365f44f280a2adc22edf8c5e_1.jpg
../../Train_DataSet_final\00a0d1ba365f44f280a2adc22edf8c5e_1.jpg template failed!!
00a1eec24f304c20ab477e2acf6a73bf_0.jpg
../../Train_DataSet_final\00a1eec24f304c20ab477e2acf6a73bf_0.jpg template failed!!
00a1eec24f304c20ab477e2acf6a73bf_1.jpg
../../Train_DataSet_final\00a1eec24f304c20ab477e2acf6a73bf_1.jpg template failed!!
0a0a3bd703994168b7764b8cbd98d6ef_0.jpg
../../Train_DataSet_final\0a0a3bd703994168b7764b8cbd98d6ef_0.jpg template failed!!
0a0a3bd703994168b7764b8cbd98d6ef_1.jpg
Traceback (most recent call last):
  File "add_rematch_watermask_self.py", line 133, in <module>
    gen_run(img)
  File "add_rematch_watermask_self.py", line 99, in gen_run
    im_after = add_text_to_image(origin_img, u'复印无效', bright_thr, new_pt)
  File "add_rematch_watermask_self.py", line 40, in add_text_to_image
    image_draw.rectangle((p_new, (p_new[0] + 177, p_new[1] + 50)), outline=(0, 0, 0, bright_random), width=4)
TypeError: rectangle() got an unexpected keyword argument 'width'
How can I solve this problem? Thanks.
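The TypeError suggests that the installed Pillow is older than the release that added the `width` argument to `ImageDraw.rectangle` (to the best of my knowledge, Pillow 5.3.0), so upgrading Pillow is the most likely fix. Where upgrading is not possible, a thick outline can be approximated by drawing nested one-pixel rectangles that shrink inward. The sketch below assumes only that `draw` has a `rectangle(xy, outline=...)` method. `draw_thick_rectangle` and `RecordingDraw` are hypothetical names, and the result is not guaranteed to match Pillow's own `width` rendering pixel for pixel.

```python
def draw_thick_rectangle(draw, box, outline, width=1):
    """Emulate a thick outline with nested 1-pixel rectangles."""
    (x0, y0), (x1, y1) = box
    for i in range(width):
        # Stop once the shrinking rectangle would collapse.
        if x0 + i > x1 - i or y0 + i > y1 - i:
            break
        draw.rectangle(((x0 + i, y0 + i), (x1 - i, y1 - i)), outline=outline)


class RecordingDraw:
    """Demo stand-in for an ImageDraw object that records rectangle calls."""

    def __init__(self):
        self.calls = []

    def rectangle(self, xy, outline=None):
        self.calls.append((xy, outline))


# Same box geometry as the failing call in the traceback; alpha 128 is arbitrary.
recorder = RecordingDraw()
p_new = (10, 20)
draw_thick_rectangle(
    recorder,
    (p_new, (p_new[0] + 177, p_new[1] + 50)),
    outline=(0, 0, 0, 128),
    width=4,
)
print(len(recorder.calls), recorder.calls[0], recorder.calls[-1])
```

In the script itself, the real `image_draw` object would be passed in place of `recorder`.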
Running python gen_card.py generates only 20,000 images, not 36,000.
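Where do the front and back background-pattern images of the second-generation ID card come from? And how were the watermark template images used in the competition obtained?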