
trocr-chinese's Introduction

Chinese scene text recognition based on TrOCR (BEiT encoder + RoBERTa decoder).

Original TrOCR repo: https://github.com/microsoft/unilm/tree/master/trocr

Features

  • Single-line/multi-line and horizontal/vertical text recognition
  • Irregular text (seals, formulas, etc.)
  • Export to ONNX
  • Table recognition
  • Model distillation / DML (deep mutual learning)
  • Prompt learning

Environment setup

docker build --network=host -t trocr-chinese:latest .
docker run --gpus all -it -v /tmp/trocr-chinese:/trocr-chinese trocr-chinese:latest bash

Training

Initialize the model for a custom training dataset

Prepare the character set; see cust-data/vocab.txt for the expected format:

vocab.txt
1
2
...
a
b
c

python gen_vocab.py \
       --dataset_path "dataset/cust-data/0/*.txt" \
       --cust_vocab ./cust-data/vocab.txt
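
Conceptually, vocabulary generation just collects every distinct character from the label files and writes one per line; a minimal sketch of that idea (an assumption about gen_vocab.py, not the repo's exact code):

import glob

def build_vocab(dataset_path="dataset/cust-data/0/*.txt",
                cust_vocab="./cust-data/vocab.txt"):
    chars = set()
    for label_file in glob.glob(dataset_path):
        with open(label_file, encoding="utf-8") as f:
            chars.update(f.read().strip())  # every distinct character
    with open(cust_vocab, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(chars)))   # one character per line, as in vocab.txt

build_vocab()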

Initialize the custom-dataset model

Download the pretrained TrOCR model weights:

Baidu Pan: https://pan.baidu.com/s/1rARdfadQlQGKGHa3de82BA (password: 0o65)
Google Drive: https://drive.google.com/drive/folders/1ibOVCHu33asiMUaFT9FzvhFNM4z25cJY?usp=share_link

python init_custdata_model.py \
    --cust_vocab ./cust-data/vocab.txt \
    --pretrain_model ./weights \
    --cust_data_init_weights_path ./cust-data/weights

## cust_vocab: custom vocabulary file
## pretrain_model: pretrained model weights
## cust_data_init_weights_path: where to save the initialized custom-model weights
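
Roughly, this step shrinks the pretrained decoder's vocabulary to the custom character set before saving. A sketch of that idea (assumed behavior; init_custdata_model.py is the authoritative implementation):

from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("./weights")
model = VisionEncoderDecoderModel.from_pretrained("./weights")

with open("./cust-data/vocab.txt", encoding="utf-8") as f:
    vocab = [line.strip() for line in f if line.strip()]

# Resize the decoder's token embeddings (and hence its output/classification
# layer) to the custom character set plus the special tokens.
num_special = len(processor.tokenizer.all_special_tokens)
model.decoder.resize_token_embeddings(len(vocab) + num_special)

model.save_pretrained("./cust-data/weights")
processor.save_pretrained("./cust-data/weights")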

Train the model

Data preparation: lay out the dataset as shown below

dataset/cust-data/0/0.jpg
dataset/cust-data/0/0.txt
...
dataset/cust-data/100/10000.jpg
dataset/cust-data/100/10000.txt
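
Under this layout, each image has a sibling .txt file holding its transcription. A minimal sketch of pairing them (assuming one transcription per file; not the repo's exact loader):

import glob
from PIL import Image

def load_pairs(dataset_path="./dataset/cust-data/*/*.jpg"):
    for img_path in glob.glob(dataset_path):
        txt_path = img_path[:-4] + ".txt"   # 0.jpg -> 0.txt
        with open(txt_path, encoding="utf-8") as f:
            text = f.read().strip()
        yield Image.open(img_path).convert("RGB"), text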

Run training:

python train.py \
       --cust_data_init_weights_path ./cust-data/weights \
       --checkpoint_path ./checkpoint/trocr-custdata \
       --dataset_path "./dataset/cust-data/*/*.jpg" \
       --per_device_train_batch_size 8 \
       --CUDA_VISIBLE_DEVICES 1

Evaluate the model

Copy the trained pytorch_model.bin from checkpoint/trocr-custdata into ./cust-data/weights, then run:

python eval.py \
    --dataset_path "./data/cust-data/test/*/*.jpg" \
    --cust_data_init_weights_path ./cust-data/weights    
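
For reference, CER is the character-level edit distance divided by the label length (train.py computes it via the Hugging Face cer metric; this standalone version just makes the definition concrete):

def cer(pred: str, label: str) -> float:
    # Levenshtein distance with a rolling one-row table.
    m, n = len(pred), len(label)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                        prev + (pred[i - 1] != label[j - 1]))
            prev = cur
    return dp[n] / max(n, 1)

assert abs(cer("蝶恋花", "欧阳修") - 1.0) < 1e-9  # all three characters wrong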

Test the model

## copy the trained pytorch_model.bin into ./cust-data/weights
index=2300  ## pick the best (or the last) checkpoint step
cp ./checkpoint/trocr-custdata/checkpoint-$index/pytorch_model.bin ./cust-data/weights
python app.py --cust_data_init_weights_path ./cust-data/weights --test_img test/test.jpg
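
app.py presumably wraps the standard TrOCR inference loop; a minimal sketch under that assumption:

from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("./cust-data/weights")
model = VisionEncoderDecoderModel.from_pretrained("./cust-data/weights")

image = Image.open("test/test.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)         # autoregressive decoding
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)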

Convert to ONNX

python -m transformers.onnx \
    --model=hand-write \
    --feature=vision2seq-lm \
    hand-write-onnx --atol 1e-4

cp hand-write/vocab.json hand-write-onnx/

python onnx_test.py --model hand-write-onnx --test_img ./img/hand.png
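
A quick sanity check on the exported graph using only the onnxruntime API (the file name model.onnx is an assumption about what the export produces; onnx_test.py handles the full generation loop):

import onnxruntime as ort

sess = ort.InferenceSession("hand-write-onnx/model.onnx",
                            providers=["CPUExecutionProvider"])
print([(i.name, i.shape) for i in sess.get_inputs()])    # expected model inputs
print([(o.name, o.shape) for o in sess.get_outputs()])   # produced outputs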

Pretrained models

| Model | CER (chars) | Acc (text lines) | Download | Training data | Training time (GPU: 3090) |
| --- | --- | --- | --- | --- | --- |
| hand-write (Chinese handwriting) | 0.011 | 0.940 | hand-write (password: punl) | dataset link | 8.5 h (10 epochs) |
| seal-ocr (seal recognition) | 0.006 | 0.956 | to be released after cleanup | - | - |
| im2latex (math formula recognition) | - | - | - | im2latex | - |
| TAL_OCR_TABLE (table recognition) | - | - | - | TAL_OCR_TABLE | - |
| TAL_OCR_MATH (primary-school arithmetic) | - | - | - | TAL_OCR_MATH | - |
| TAL_OCR_CHN (handwritten Chinese) | 0.0455 | 0.674 (label quality is mediocre; e.g. test_64/552.jpg is labeled 蝶恋花 but actually reads 欧阳修) | TAL_OCR_CHN (password: 9kd8) | TAL_OCR_CHN | 0.6 h (20 epochs) |
| HME100K (handwritten formulas) | - | - | - | HME100K | - |

Note: all future models will be open-sourced under this link and can be downloaded freely: https://pan.baidu.com/s/1uSdWQhJPEy2CYoEULoOhRA (password: vwi2)

Using the models

Handwriting recognition


unzip hand-write.zip 
python app.py --cust_data_init_weights_path hand-write --test_img test/hand.png

## output: '醒我的昏迷,偿还我的天真。'

Training tips

  • When data is scarce, use augmentation to synthesize more. With a few hundred thousand samples (augmentation becomes optional, since the pretrained model has already seen plenty of data: receipts, ID documents, printed, handwritten, and photographed text), the model can converge to over 90% accuracy (CER < 0.05).
  • Do not resize training images to 384x384 yourself (the current pretraining resolution is 384x384; this will be improved later). Keep the original images; the model's preprocessing processor handles resizing automatically.
  • To train multi-line recognition, insert a special marker character between text lines, e.g. "1234\n4567\n89990" (see the snippet after this list).
  • Fine-tuning on languages other than Chinese and English may work poorly (it can still converge given enough data and steps), because the model was not pretrained on other languages.
  • When you hit a problem, analyze your data first and then add training tricks to optimize; do not expect the model to solve 100% of cases.
  • This project uses an encoder-decoder architecture, so the model is fairly large. If hardware cost in production is a concern, swap in a lighter encoder (e.g. a CNN such as MobileNet or ResNet) or decoder (e.g. roberta-tiny) and distill into it.
  • If this project does not solve your problem, please choose another one; don't let it spoil your mood!!!
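
Illustration of the multi-line labeling convention from the tips above (the marker character is your choice; "\n" is used here as in the tip):

label = "\n".join(["1234", "4567", "89990"])
print(repr(label))  # '1234\n4567\n89990'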

trocr-chinese's People

Contributors

wenlihaoyu, wenyinlong


trocr-chinese's Issues

About seal recognition

Thank you very much; your work has been a great help to me! I have a few questions about seal recognition:

  1. Is the input to the TrOCR model the whole seal region, or the seal text region after text detection and rectification?
  2. When will the seal recognition model be publicly released?

Looking forward to your reply.

Converting the model to ONNX

Hi, how can the model files in weights.zip be converted to ONNX format?

For the conversion I used the official transformers script convert_graph_to_onnx.py.

The conversion failed; looking at the source, it gets stuck on the token = nlp.tokenizer(xxx,xxx) line.

This is the conversion command: python -m transformers.convert_graph_to_onnx --framework pt --model "./weights" --check-loading --quantize "./onnx/model.onnx"

RuntimeError: CUDA error: device-side assert triggered! Solved!!!

This error occurs because, when the loss is computed, a label id exceeds the number of classes the model outputs.
Running python gen_vocab.py generates a custom vocab.txt (say 100 classes). Running python init_custdata_model.py afterwards initializes the model weights and does change the final classification layer to 100 outputs (print(model) to verify), and the resulting vocab.json also has 100 entries. However, tokenizer.json still holds the original 11318 classes, so in train.py the vocabulary obtained via vocab = processor.tokenizer.get_vocab() still has 11318 entries. A character in your custom vocabulary can therefore receive a label id anywhere in 0-11317 instead of 0-99; the cross-entropy loss then sees a model that outputs 100 classes but a label such as 1000, and the assert fires!
Fix: replace weights/vocab.json with the generated cust-data/weights/vocab.json and rerun python init_custdata_model.py to obtain a correct tokenizer.json!
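
A quick way to catch this mismatch before training (a hypothetical check, not part of the repo):

from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("./cust-data/weights")
model = VisionEncoderDecoderModel.from_pretrained("./cust-data/weights")

num_classes = model.decoder.config.vocab_size
ids = processor.tokenizer("1234").input_ids  # any string from your vocab
assert max(ids) < num_classes, (
    f"label id {max(ids)} >= model classes {num_classes}: "
    "tokenizer.json and vocab.json are out of sync")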

Data augmentation for handwriting training

Was the handwriting model trained with backgrounds added to the data? I noticed that the images in the dataset all have a pure white background. A model trained this way should fail completely on images with backgrounds, yet when I test the released model it can recognize images with background colors. Why is that?

Retraining after modifying the processor

Hi, and thanks for your work.

I noticed that the processor resizes images to 384x384, which performs poorly on long text lines, so after changing the embedding scheme I will probably need to pretrain again. Could you share which datasets were used for the original pretraining, and whether any of them are publicly available?

Thanks

TensorRT deployment

Has anyone converted TrOCR to TensorRT for deployment?

Question about model.generate

Dear author:
Hi, while running app.py I wanted to see how the inference step is executed. Could you tell me which code model.generate actually calls?

onnxruntime conversion error

Command: python -m transformers.onnx --model weights/hand-write --feature=vision2seq-lm onnx/
transformers: 4.30.2
onnxruntime: 1.14.1

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_53' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{3,578,384}, requested shape:{2,578,6,64}

Which pretrained model was used? The official one?

Training fails with the following errors:
./aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [40,0,0], thread: [71,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [40,0,0], thread: [72,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [40,0,0], thread: [73,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [40,0,0], thread: [74,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [40,0,0], thread: [75,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [40,0,0], thread: [76,0,0] Assertion srcIndex < srcSelectDimSize failed.

Pretraining

Hi, is there any reference material on how to pretrain TrOCR?

A question about the dataset layout example you gave

dataset/cust-data/0/0.jpg
dataset/cust-data/0/0.txt
...
dataset/cust-data/100/10000.jpg
dataset/cust-data/100/10000.txt

This is the dataset example you provided. I have already split my handwritten-formula dataset into .jpg and .txt files, placed under train_images and labels folders. My question: you require one level of numbered folders (0-100), right? Inside each folder, must there be exactly one .jpg/.txt pair, or can there be several pairs? Looking forward very much to your reply.

Seal recognition

Hi, your work is really great! Will the seal recognition model be made available for download? Is the data manually labeled real data or synthetic data? I'm researching this task as well and am curious. Looking forward to your reply.

RuntimeError: CUDA error: device-side assert triggered

***** Running training *****
Num examples = 2582
Num Epochs = 10
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 1620
0%| | 0/1620 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [35,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [36,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [37,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [38,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [39,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [40,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [41,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [42,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [43,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [44,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [45,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [46,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [47,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [48,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [49,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [50,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [51,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [52,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [53,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [54,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [62,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "train.py", line 107, in <module>
    trainer.train()
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/trainer.py", line 1422, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/trainer.py", line 2011, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/trainer.py", line 2043, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py", line 499, in forward
    **kwargs_decoder,
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/trocr/modeling_trocr.py", line 958, in forward
    return_dict=return_dict,
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/trocr/modeling_trocr.py", line 651, in forward
    attention_mask, input_shape, inputs_embeds, past_key_values_length
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/trocr/modeling_trocr.py", line 523, in _prepare_decoder_attention_mask
    ).to(self.device)
RuntimeError: CUDA error: device-side assert triggered
0%| | 0/1620 [00:01<?, ?it/s]

What should I do when a custom vocab.txt keeps causing errors?

File "/home/aipf/work/trocr-chinese-main/train.py", line 25, in compute_metrics
cer = cer_metric.compute(predictions=pred_str, references=label_str)
File "/usr/local/python3/lib/python3.8/site-packages/datasets/metric.py", line 438, in compute
output = self._compute(**inputs, **compute_kwargs)
File "/home/aipf/.cache/huggingface/modules/datasets_modules/metrics/cer/d860b205328e261291d8908e4b471f4727e587d43ca2ee29c7fb436bf4f007b5/cer.py", line 150, in _compute
measures = jiwer.compute_measures(
File "/usr/local/python3/lib/python3.8/site-packages/jiwer/measures.py", line 206, in compute_measures
raise ValueError("one or more groundtruths are empty strings")
ValueError: one or more groundtruths are empty strings

Data generation

Thanks for your work, Mr. Wen. How was the recognition training data for this project generated, and how large is the character dictionary?

Using a custom vocab.txt

Hello, and thanks for sharing!
Following your steps I ran into a problem: when using my custom vocab.txt, after running init_custdata_model.py the generated tokenizer.json still contains the original character set rather than my custom one. As a result, processor.tokenizer.get_vocab() returns the old vocabulary, which breaks encoding and decoding during both training and testing.
Looking forward to your reply, and thanks again!

Want to be friends?

Hi! I'm the author of chineseocr_lite, which was originally developed on top of your project. I'm very interested in your work here; shall we connect? 😄

How to train on specific GPUs

Training on a single GPU works but is slow. When I try to use specific GPUs, all GPUs on the machine get used by default. Do you know how to change this? torch.distributed.launch throws errors for me.

How to generate our own pretrained weights?

I see that only pretrained models for Chinese characters are available; the formula and other models have not been uploaded.
So I would like to know how to generate our own pretrained weights.

Broken link

The seal pretrained-model link has expired; could you share it again?

Multi-line text recognition problem


The multi-line recognition result is:

time take: 1.59 s ocr: ['HUXEPPP 621651017111410013000310400 ### 都会议次在发展中的标准的证词 都会吸收']

I'm using the weights model files.

Did I do something wrong somewhere?

Optimizing the encoder or decoder

If I want to optimize the encoder or decoder structure, e.g. use Roberta-tiny as the decoder, where can I find models pretrained on Chinese?

Hello

Which training set was the model you provide trained on?

Identical characters misrecognized in long lines, but recognized correctly after cropping

Hi, thank you very much for your work. I'm recognizing book content, where text lines are mostly wider than 1000 pixels. After training on my own data I found a very strange problem:
The same character within a single text line can get different recognition results, yet each piece is recognized correctly after cropping. For example, one sample contains the keyword 视网膜 three times; the first and last occurrences are misrecognized while the middle one is correct, and if I manually crop the three occurrences apart, all of them are recognized correctly. How can I mitigate this? Is it because the current preprocessing resize to 384*384 is unsuitable for long text, or does context play a large role? Could you offer some guidance? Many thanks!

vocab.txt

Hi, does cust-data/vocab.txt actually correspond to anything during training? I loaded the pretrained model you released and could train directly, so does this dictionary take effect at all?

Questions about Prompt Learning

Hi, could you explain how to use the model distillation / DML (mutual learning) and Prompt Learning code? Thanks!

Printed text recognition

Has anyone used this model for recognition on printed text? My accuracy on handwriting is just under 90%, but after training on printed text I only get around 70%. What could be going on? Anyone working on this is welcome to compare notes.
