Hello, I just began studying OCR correction. Could you tell me how to use natas with a pretrained model?

mikahama commented on September 18, 2024

Hello, I just began studying OCR correction. Could you tell me how to use natas with a pretrained model?

Comments (5)

trongvanhpkt99 commented on September 18, 2024

I want to train a model to correct OCR output in Vietnamese, so first I want to know how to use a pretrained model.

mikahama commented on September 18, 2024

We only have a pretrained model for English at the moment, so it will not work with Vietnamese. Natas calls OpenNMT-py in the background, so you can basically use onmt_translate with your own model, pass it -n_best 10, and filter the results with a dictionary.
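The dictionary-filtering step could look roughly like the Python sketch below. It assumes that onmt_translate with -n_best N writes N candidate lines per input word, in input order, and that each candidate is a sequence of space-separated characters. The helper names (group_n_best, filter_with_dictionary) and the tiny dictionary are hypothetical, made up for illustration; they are not part of natas or OpenNMT-py.

```python
def group_n_best(lines, n_best=10):
    """Split the flat list of output lines into one group of candidates per input word."""
    return [lines[i:i + n_best] for i in range(0, len(lines), n_best)]


def filter_with_dictionary(candidates, dictionary):
    """Keep only candidates that are known words, preserving the model's ranking."""
    kept = []
    for candidate in candidates:
        word = "".join(candidate.split())  # "a w a y" -> "away"
        if word in dictionary and word not in kept:
            kept.append(word)
    return kept


# Hypothetical example data: three candidates for a single input word.
dictionary = {"away", "cat", "ran"}
output_lines = ["a w a y", "a v v a y", "a v a y"]
for group in group_n_best(output_lines, n_best=3):
    print(filter_with_dictionary(group, dictionary))
```

If no candidate survives the filter, keeping the original OCR word unchanged is probably a reasonable fallback.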

trongvanhpkt99 commented on September 18, 2024

Thank you! Can you give me the English pretrained model and tell me how to use it?

mikahama commented on September 18, 2024

This is how to use it from Natas:

import natas
natas.ocr_correct_words(["paft", "friendlhip"])

To use it with OpenNMT, you must first download the model.

Then prepare a text file containing the words you want to OCR post-correct, with one word per line and each word split into space-separated characters.

So if you have the sentence "cat ran avvay", you should produce the following text file, ocr_errors.txt:

c a t
r a n
a v v a y
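A file in this format can be produced with a few lines of Python. This is only a sketch: it splits the sentence on whitespace and does nothing about punctuation or casing, and the function names to_char_lines and write_ocr_input are hypothetical.

```python
from pathlib import Path


def to_char_lines(sentence):
    """Return one line per word, with the word's characters separated by spaces."""
    return [" ".join(word) for word in sentence.split()]


def write_ocr_input(sentence, path="ocr_errors.txt"):
    """Write the character-split words to a file, one word per line."""
    Path(path).write_text("\n".join(to_char_lines(sentence)) + "\n", encoding="utf-8")


for line in to_char_lines("cat ran avvay"):
    print(line)
```

Calling write_ocr_input("cat ran avvay") writes the same three lines to ocr_errors.txt.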

Then you can run onmt_translate -model ocr.pt -src ocr_errors.txt -output ocr_fixed.txt -replace_unk -verbose. This will produce a text file ocr_fixed.txt with the OCR corrections. OpenNMT lets you do all sorts of things in translate, so please refer to their documentation as well.
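To turn ocr_fixed.txt back into ordinary words, the characters on each line have to be joined again. The sketch below assumes the command was run without -n_best, so there is exactly one output line per input line, and that the output uses the same space-separated character format as the input. The names join_chars and read_corrections are hypothetical helpers.

```python
def join_chars(line):
    """Turn a character-split line such as 'a w a y' back into the word 'away'."""
    return "".join(line.split())


def read_corrections(src_path="ocr_errors.txt", out_path="ocr_fixed.txt"):
    """Pair each original OCR word with its corrected form, line by line."""
    with open(src_path, encoding="utf-8") as src:
        originals = [join_chars(line) for line in src.read().splitlines()]
    with open(out_path, encoding="utf-8") as out:
        corrected = [join_chars(line) for line in out.read().splitlines()]
    # zip stops at the shorter file, so a truncated output does not raise an error.
    return list(zip(originals, corrected))
```

After running onmt_translate, read_corrections() returns a list of (original, corrected) pairs that can be used to rebuild the sentence.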

trongvanhpkt99 commented on September 18, 2024

Thank you! I'll try it
