How do I train the model? (figocr issue, closed)

huangfj commented on September 24, 2024

How do I train the model?

from figocr.

Comments (8)

HuangFJ commented on September 24, 2024

Step 1: Prepare the data

First prepare your data, then build an index file for it, say labels.txt:

dataset/123.jpg   123
dataset/2345.jpg  2345
...

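The first column is the image path and the second column is the label; the columns are separated by a tab character ('\t'). Then, in src/model/dataset.py, find this block: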

        # Local dataset
        # root=Path('drive/My Drive/cv/images/dataset/labels.txt')
        # line = 0
        # with open(root) as f:
        #     for l in f:
        #         line += 1
        #         if train:
        #             if line % 20 == 0:
        #                 continue
        #         else:
        #             if line % 20 != 0:
        #                 continue

        #         fn, label = l.strip().split('\t')
        #         fn = root.parent.joinpath('images', fn)
        #         self.items.append((str(fn), label))

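Uncomment this block and change drive/My Drive/cv/images/dataset/labels.txt to the path of your own labels.txt file. As written, every 20th line is held out for validation and the rest is used for training, so the training-to-validation ratio is about 19:1; adjust it to suit your own situation.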
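
The every-20th-line split can be tried outside the repository with a small standalone sketch. The names `split_labels`, `holdout_every`, and `demo` are illustrative assumptions and not part of figocr; only the tab-separated format and the modulo rule come from the block above. With `holdout_every=20`, 19 lines go to training for each line that goes to validation.

```python
from pathlib import Path
import tempfile


def split_labels(labels_path, holdout_every=20):
    """Read a tab-separated index file and split it into train/validation items.

    Hypothetical helper (not part of figocr): every `holdout_every`-th line
    goes to validation, all other lines go to training.
    """
    train_items, val_items = [], []
    with open(labels_path, encoding="utf-8") as f:
        for line_no, raw in enumerate(f, start=1):
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines
            filename, label = raw.split("\t")
            target = val_items if line_no % holdout_every == 0 else train_items
            target.append((filename, label))
    return train_items, val_items


def demo():
    # Build a throwaway labels.txt with 100 made-up entries.
    with tempfile.TemporaryDirectory() as tmp:
        labels = Path(tmp) / "labels.txt"
        lines = [f"dataset/{i}.jpg\t{i}" for i in range(1, 101)]
        labels.write_text("\n".join(lines) + "\n", encoding="utf-8")
        return split_labels(labels)


if __name__ == "__main__":
    train_items, val_items = demo()
    print(len(train_items), len(val_items))  # 95 5
```

Changing `holdout_every` to 10 would hold out every 10th line instead, giving a 9:1 split.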

Step 2: Start training

model = OCRModel()
model.evolution()
model.save_checkpoint('path/to/your/model.pth')

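The training steps above use the default parameters. Besides the arguments of OCRModel's evolution method, which you can set yourself, there are other settings you can change, for example:

You can modify vocabulary in src/model/vocabulary.py to define your own character set (the classes);
You can change the 320 in OCRModel's input_shape if the longest sequence you need to recognize exceeds 20 characters (width = 16 * sequence length);
You can change OCRModel's batch_size.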
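
The width rule in the second item (width = 16 * sequence length) can be turned into a quick calculation. This is a minimal sketch: `required_width` is a hypothetical helper, and the 16-pixel step is taken from the formula quoted in this comment rather than verified against the figocr source.

```python
def required_width(max_sequence_length, pixels_per_char=16):
    """Return the input width needed for a given maximum sequence length.

    Hypothetical helper: applies width = 16 * sequence length from the thread.
    """
    if max_sequence_length < 1:
        raise ValueError("max_sequence_length must be at least 1")
    return pixels_per_char * max_sequence_length


if __name__ == "__main__":
    print(required_width(20))  # 320, the default width mentioned in the thread
    print(required_width(32))  # 512
```

So a dataset whose longest label has 32 characters would need the 320 replaced by 512 under this rule.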

zhangjingling commented on September 24, 2024

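Thank you for your reply!
I still have a question: do I need to create a separate train.py file?
After changing the parameters above, I would add the following code to train.py:

import logging
from model.model import OCRModel
import cv2
import argparse
from pathlib import Path
from model import train
from model import model

if __name__ == '__main__':
    model = OCRModel()
    model.evolution()
    model.save_checkpoint('path/to/your/model.pth')

I see there is a train() function in your model.py. Can I just run that function directly from model.py?
Could you upload your train file? I am getting a lot of errors on my end.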

zhangjingling commented on September 24, 2024

I want to use this code to recognize all the digits in other images; they are printed digits.

zhangjingling commented on September 24, 2024

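Is your dataset made up of single-digit images, similar to MNIST? If I want to recognize digit strings, do I train directly on images that contain several digits, with the digit strings as labels?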

HuangFJ commented on September 24, 2024

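My training data is variable-length, but this library also supports MNIST-style data; the two are not in conflict. For printed text I suggest using Tesseract OCR directly, since it supports training from font files. If you train with this library, you will have trouble supplying that much training data.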

zhangjingling commented on September 24, 2024

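Thanks a lot, I have already trained a model. What I actually want is to recognize variable-length digit strings, so I will run it on my own data first.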

zhangjingling commented on September 24, 2024

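I want to recognize all the numbers in images like this one and simply output the digit string. Which works better: using images like this directly as the dataset with the digit strings as labels, or using a dataset of printed images that contain only 1, 2, 3, 4?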
