Comments (6)

wenlihaoyu commented on June 18, 2024

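Why is it unreasonable? It works as long as your text is not too long (for example, wider than 2048 pixels). Wouldn't padding just be wasteful? And why not choose a different image-side backbone?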

from trocr-chinese.

archwolf118 commented on June 18, 2024

746891300 commented on June 18, 2024

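My understanding is that, given pre-training on a large amount of data, distorted text can be treated as just another font. Once the model has learned it, it can predict accurately.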

wenlihaoyu commented on June 18, 2024

Resizing to a fixed size inevitably distorts the text.

------------------ Original message ------------------
From: lywen @.> Sent: April 9, 2022, 21:19 To: chineseocr/trocr-chinese @.> Cc: archwolf118 @.>, Author @.> Subject: Re: [chineseocr/trocr-chinese] The biggest problem with trocr right now is this 384×384 preprocessing (Issue #3)

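Why would the distortion matter? Pre-training lets the model adapt to this kind of change; in effect, the model learns a spatial mapping.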
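To put rough numbers on the two options discussed above (resizing straight to 384×384 versus keeping the aspect ratio and padding), here is a minimal sketch. The helper names and the 2048×64 text-line crop are hypothetical and chosen only for illustration; this is not code from the project.

```python
def stretch_factor(width, height, target=384):
    """Ratio of horizontal to vertical scaling when resizing to target x target.

    A value below 1 means characters are squeezed horizontally
    relative to their height.
    """
    scale_x = target / width
    scale_y = target / height
    return scale_x / scale_y


def padding_waste(width, height, target=384):
    """Fraction of a target x target canvas that is padding
    when the image is scaled to fit while keeping its aspect ratio."""
    scale = min(target / width, target / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    return 1 - (new_w * new_h) / (target * target)


# Hypothetical 2048x64 text-line crop (an assumption, not data from the thread)
w, h = 2048, 64
print(stretch_factor(w, h))  # 0.03125: characters become 32x narrower relative to their height
print(padding_waste(w, h))   # 0.96875: about 97% of the padded canvas carries no text
```

For a crop this wide, both options are costly: a direct resize squeezes the text heavily, while padding leaves most of the input empty. That is the trade-off being argued about here.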

wenlihaoyu commented on June 18, 2024

No algorithm can do everything. If you feel this method is not good, you can choose another algorithm; don't let this project make you unhappy.

wenlihaoyu commented on June 18, 2024

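Very long text can in fact still be recognized. RoBERTa supports up to 510 characters (excluding the <s> and </s> tokens); it is just that the seq2seq approach will be extremely slow. (If every image in your scenario is wider than 2048 pixels, the CTC approach also needs a very large GPU to train well.) What is being explored here is using transformer-based methods for end-to-end recognition of, for example, curved text, irregular text, and multi-line text.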
