Comments (5)

gwkrsrch commented on September 7, 2024

As a general tip, to train a model for a new language, you need to take care of the token vocabulary and tokenizer. Issue #11 should be useful to you :)
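
A rough way to check whether an existing vocabulary covers a new language is to count the characters in sample text that no token contains. The sketch below is only illustrative: `uncovered_characters` is a hypothetical helper, and the toy vocabulary is an assumption, not the vocabulary used by donut. Real subword tokenizers may also fall back to bytes or an unknown token, so treat the result as a hint rather than a definitive answer.

```python
from collections import Counter
from typing import Iterable


def uncovered_characters(vocab: Iterable[str], texts: Iterable[str]) -> Counter:
    """Count non-space characters in `texts` that appear in no vocabulary token."""
    covered = set()
    for token in vocab:
        covered.update(token)  # every character that occurs inside some token

    missing = Counter()
    for text in texts:
        for ch in text:
            if not ch.isspace() and ch not in covered:
                missing[ch] += 1
    return missing


# Toy vocabulary (assumption): Latin-only tokens, so Arabic letters are uncovered.
toy_vocab = ["▁the", "cat", "s"]
report = uncovered_characters(toy_vocab, ["the cats مرحبا"])
print(sorted(report))
```

If many characters of the target language show up in the report, the tokenizer likely needs new tokens or a replacement vocabulary before training.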

from donut.

Mahmuod1 commented on September 7, 2024

@gwkrsrch, any help, please?

gwkrsrch commented on September 7, 2024

Hi @Mahmuod1, there are several options. You can modify the layout/textbox generation modules to produce the desired RTL layout. Several lines of code would need to change, e.g., in textbox and layouts.
Another option is to generate the data with your own code, based on SynthDoG. The main flow of the preliminary version of SynthDoG is as follows. The first step is to draw text on a paper texture image. The following links would also be helpful to you.

Then, using a perspective transformation (or other transformations), you can embed the synthetic paper into a background. Although the idea is simple, it gives reasonable results. You can further improve the quality of the generated samples with various techniques, but that is optional. Hope this helps :) Feel free to reopen this or open another issue if you have anything new to share.
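
The two steps described above (placing text right to left on the paper, then embedding the paper into a background with a perspective transformation) can be sketched in plain Python. This is a minimal geometric sketch, not SynthDoG's actual code: `layout_rtl_line`, `solve_homography`, and `project` are hypothetical helpers, the coordinates are made up, and the sketch ignores Arabic letter shaping and bidirectional reordering, which a real renderer has to handle.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def layout_rtl_line(word_widths: Sequence[float], box_width: float, space: float) -> List[float]:
    """Return the left x-coordinate of each word, placing words right to left."""
    xs = []
    cursor = box_width  # start at the right edge of the text box
    for width in word_widths:
        cursor -= width
        if cursor < 0:
            break  # the word does not fit on this line
        xs.append(cursor)
        cursor -= space
    return xs


def solve_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Solve for the 3x3 perspective matrix mapping 4 src points onto 4 dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, v])
    n = 8
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [value / pivot_value for value in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    h = [rows[i][n] for i in range(n)] + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(H: List[List[float]], point: Point) -> Point:
    """Map a point on the paper to its position in the background image."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Illustrative values (assumptions): a 200x100 paper mapped onto a skewed quad.
paper_corners = [(0, 0), (200, 0), (200, 100), (0, 100)]
target_quad = [(50, 40), (230, 60), (220, 170), (40, 150)]
H = solve_homography(paper_corners, target_quad)
word_xs = layout_rtl_line([30, 20, 40], box_width=100, space=5)
print(word_xs, project(H, (100, 50)))
```

The same matrix can be used to transform text box corners, so the ground-truth annotations stay aligned with the warped paper.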

Mahmuod1 commented on September 7, 2024

Thanks, @gwkrsrch, for your detailed instructions.
Can you please tell me which parts of the donut model configuration need to change because they are language-specific?
I will be training for document parsing, so can you please tell me what I should pay attention to in the training configuration?

akashlp27 commented on September 7, 2024

Hi @Mahmuod1, @gwkrsrch, were you able to generate images using SynthDoG for RTL languages such as Arabic? Any suggestions would help a lot.
