
Comments (23)

galkahana avatar galkahana commented on May 20, 2024

Hi,
Yes, you should be able to write RTL text using a methodology similar to the one I use in the hummusrenderer project, which is based on hummus.

The following defines a method that computes how bidirectional text should be written when viewed in regular computerized left-to-right order (which is what writeText expects):

var bidi = require('icu-bidi');

function computeTextForItem(inText,inDirection)
{
    var p = bidi.Paragraph(inText,{paraLevel: inDirection == 'rtl' ? bidi.RTL:bidi.LTR});
    return p.writeReordered(bidi.Reordered.KEEP_BASE_COMBINING);
}

The text to pass to writeText should now be computeTextForItem(theText, direction).
The second parameter should be 'rtl' if the text you are writing is meant to imitate a right-to-left paragraph. Normally this shouldn't matter unless the starting characters are numbers.

If you want multiline support, then i suggest you look into the hummusrenderer relevant code, and we can discuss it if you want.

from hummusjs.

hussasad avatar hussasad commented on May 20, 2024

Yes, I used bidi as you suggested above and that worked in reversing the direction, but i still have an issue with the characters not joining as they should for arabic text (like they would for cursive fonts). so if i want to write the text حمّص , in the PDF i get it as ح مّ ص

this happens in hummusrenderer as well. you can notice it if you try the following object

{
    "externals": {
        "fbLogo": "http://pdfrendering.herokuapp.com/profileImage.jpg"
    },
    "pages": [
        {
            "width": 595,
            "height": 842,
            "boxes": [
                {
                    "bottom": 500,
                    "left": 10,
                    "text": {
                        "text": "خمص",
                        "options": {
                            "fontPath": "./resources/fonts/arial.ttf",
                            "size": 40,
                            "color": "pink"
                        }
                    }
                },
                {
                    "bottom": 600,
                    "left": 10,
                    "image": {
                        "external": "fbLogo"
                    }
                }
            ]
        }
    ]
}

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

yes. i kinda figured this would come next. but i'm not sure i know what to do about this.

At this point writeText only knows how to translate unicode characters to their matching glyphs in the font. For Arabic characters there are several glyphs to choose from, but the simplistic selection code always picks the same one, which creates the non-connecting effect.

To overcome this, either something needs to change in hummus so that it chooses the right glyph according to position, or you need to provide the glyph IDs directly.

If you already know the correct glyphs, then I can help you with the hummus commands for placing them. Otherwise this needs some changes in hummus, which I'd be glad for you to add. If not, it'll have to wait till I get to it, and it's a lower priority right now. So I'm sorry here.

Gal.

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

@galkahana i'm interested to patch this, can you show me where to change? Also is it ok to add a dependency? I think the best way to go here is to create a sort of "glyph resolver" library and make Hummus depend on it to pick the correct glyph.

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

@amrnablus That would be lovely!
Actually hummus provides a method for placement of text through glyph IDs, so if you can provide them hummus will take care of the rest.

Here is how it works:

writeText is simply an abstraction over lower level text placement commands. This example shows how to use them (I'll explain below):
https://github.com/galkahana/HummusJS/blob/master/tests/SimpleTextUsageTest.js

This is how to place text using the lower level commands:

        var page = pdfWriter.createPage(0,0,595,842);
        var font = pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/BrushScriptStd.otf');
        var fontK = pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/KozGoPro-Regular.otf');
        pdfWriter.startPageContentContext(page)
            .BT()
            .k(0,0,0,1)
            .Tf(font,1)
            .Tm(30,0,0,30,78.4252,662.8997)
            .Tj('abcd')
            .ET();
        pdfWriter.writePage(page).end();

Instead of calling writeText on the context, you call a series of lower level commands (writeText basically calls these for you):

-- BT - to start text object
-- k - set the color using CMYK values. 0,0,0,1 means black
-- Tf - sets the font, use a regular PDFUsedFont object as you would with writeText
-- Tm - sets the size and position of the text. 30 in the example (given twice) is the text size, and the position is (78.4252,662.8997)
-- Tj - is the command placing the actual text
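
To make the Tm operands above more concrete, here is a minimal sketch of how they could be assembled for plain, unrotated text. textMatrixArgs is a hypothetical helper name, not something hummus provides; the only thing it relies on is that Tm takes six numbers, where the first and fourth scale the text and the last two position it.

```javascript
// Hypothetical helper (not part of hummus): builds the six Tm operands
// for unrotated text of a given size placed at (x, y).
function textMatrixArgs(size, x, y) {
    // Text matrix [a b c d e f]: a and d scale, b and c stay 0 (no rotation
    // or skew), e and f translate to the text position.
    return [size, 0, 0, size, x, y];
}

var args = textMatrixArgs(30, 78.4252, 662.8997);
console.log(args.join(','));
```

Because Tf is called with a size of 1 in the example, the scale in Tm is what ends up determining the visible text size.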

In regular usage Tj accepts a unicode string, as in the example, but you can also pass glyph IDs.
This is shown here:
https://github.com/galkahana/HummusJS/blob/master/tests/SimpleTextUsageTest.js#L79

        var page = pdfWriter.createPage(0,0,595,842);
        var font = pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/arial.ttf');
        pdfWriter.startPageContentContext(page)
            .BT()
            .k(0,0,0,1)
            .Tf(font,1)
            .Tm(30,0,0,30,78.4252,662.8997)
            .Tj([[68,97],[69,98],[70,99],[71,100]])
            .ET();
        pdfWriter.writePage(page).end();

Note that Tj now has an array of Arrays:
[[68,97],[69,98],[70,99],[71,100]]

Each item in the array marks a glyph. The first number is the glyph ID. The second number is the unicode value that matches it. There might be a third value for surrogate unicode values, but that's for CJK characters, so Arabic is in the clear.
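
As a rough illustration of that array format, the sketch below pairs a list of already-known glyph IDs with the characters of a string. toGlyphPairs is a hypothetical helper, not a hummus function, and it assumes one glyph per character and characters that fit in a single UTF-16 unit (so it never produces the third value mentioned above).

```javascript
// Hypothetical helper (not part of hummus): pairs each glyph ID with the
// unicode value of the character at the same index, producing the
// [[glyphId, unicode], ...] shape that Tj is shown accepting above.
function toGlyphPairs(glyphIds, text) {
    if (glyphIds.length !== text.length) {
        throw new Error('expected one glyph ID per character');
    }
    return glyphIds.map(function (glyphId, i) {
        return [glyphId, text.charCodeAt(i)];
    });
}

// The glyph IDs here are the ones from the arial.ttf example above.
var pairs = toGlyphPairs([68, 69, 70, 71], 'abcd');
console.log(JSON.stringify(pairs));
```

The result could then be handed to Tj in place of the literal array; finding the right glyph IDs for each Arabic letter position is the part that still has to be solved.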

So, if you can prepare something that can take a font and text and get the right glyph IDs that would be awesome.
Is the information provided clear?

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

I'm a bit rusty on unicode, so just to confirm the problem:

  1. the client developer passes an arabic text to hummus, something like "ابجد"
  2. what hummus receives is the disconnected code point for each letter, "ا ب ج د"
  3. hummus "forwards" this text to the pdfwriter which ends up printing disconnected letters as it doesn't know how to get the proper glyphs

If that's the case, I'd prefer to fix this at the pdfwriter level by writing a cpp converter that takes the "ا ب ج د" and converts it to "ابجد" with the proper code points. The font should take care of the rest once the proper glyphs are selected.

Can you please confirm?

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

if you wanna do that in PDFWriter all the better. I can point you to the areas in the code that do the translation in there. will that be good?

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

Yeah i'd rather do that. Please point me to the code (and if there are cpp samples).

from hummusjs.

hussasad avatar hussasad commented on May 20, 2024

@galkahana - it would be easier to convert the Arabic unicode characters from the default block to the Arabic Presentation Forms-B block. This way I don't need to figure out the glyphs. Granted, this block is only meant for compatibility with older systems, but it works fine for my case. I just used this library to convert the string before calling writeText. I just had to reverse the output string for it to render correctly.
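
For readers wondering what such a conversion involves, here is a minimal sketch of the idea, not the library mentioned above. shapeArabic and the FORMS table are hypothetical names, the table covers only seven letters, and diacritics (such as shadda) and lam-alef ligatures are not handled, so treat it as an illustration of picking isolated, final, initial, or medial forms rather than a usable shaper.

```javascript
// Presentation Forms-B code points per letter: [isolated, final, initial, medial].
// Letters that only join to the previous letter (alef, dal) have no
// initial or medial form, marked with null.
var FORMS = {
    '\u0627': [0xFE8D, 0xFE8E, null, null],     // alef
    '\u0628': [0xFE8F, 0xFE90, 0xFE91, 0xFE92], // beh
    '\u062C': [0xFE9D, 0xFE9E, 0xFE9F, 0xFEA0], // jeem
    '\u062D': [0xFEA1, 0xFEA2, 0xFEA3, 0xFEA4], // hah
    '\u062F': [0xFEA9, 0xFEAA, null, null],     // dal
    '\u0635': [0xFEB9, 0xFEBA, 0xFEBB, 0xFEBC], // sad
    '\u0645': [0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4]  // meem
};

// Hypothetical helper: replaces each known letter with its contextual form.
// Input and output are both in logical (typing) order.
function shapeArabic(text) {
    var out = '';
    for (var i = 0; i < text.length; i++) {
        var forms = FORMS[text[i]];
        if (!forms) { out += text[i]; continue; } // unknown characters pass through
        var prev = FORMS[text[i - 1]];
        var next = FORMS[text[i + 1]];
        // Joined to the previous letter only if that letter can connect forward.
        var joinsPrev = !!prev && prev[2] !== null;
        // Joined to the next letter only if this letter can connect forward.
        var joinsNext = !!next && forms[2] !== null;
        var index = joinsPrev && joinsNext ? 3 : joinsPrev ? 1 : joinsNext ? 2 : 0;
        out += String.fromCharCode(forms[index]);
    }
    return out;
}

console.log(shapeArabic('\u0627\u0628\u062C\u062F')); // the letters of "ابجد"
```

The output stays in logical order, so it would still need the RTL reordering step (or the plain reversal described above for purely Arabic text) before being passed to writeText.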

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

I'm considering using either fribidi (http://fribidi.org/) or pango (http://www.pango.org/) to do the trick. I'll run some POCs and post my findings here.

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

@hussasad - brilliant! Didn't realize that this was an option. i'll keep that in mind. definitely the easier path.

@amrnablus - note that fribidi should give you RTL ordering, not the glyphs. It's what I'm using in hummus as 'bidi'; you can look at the source to see how to use it, because it does solve part of the deal (the RTL).
As for sources in PDFWriter: the method that translates a string to glyphs is PDFUsedFont::TranslateStringToGlyphs. You can see it here -
https://github.com/galkahana/PDF-Writer/blob/master/PDFWriter/PDFUsedFont.cpp#L75
Currently it simply uses freetype to translate each character to a glyph code, and this is where one can place a helper to change that.
Once this is done, all text commands will be affected, in particular WriteText.
You can find examples of using the C++ code with its matching WriteText in:
https://github.com/galkahana/PDF-Writer/blob/master/PDFWriterTestPlayground/HighLevelContentContext.cpp#L107

Thanks and good luck!
Gal.

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

I got this from the fribidi mailing list:

On 15-12-20 03:55 PM, Amr Shahin wrote:

Hello,
Does fribidi provide a functionality to convert a set of arabic codepoints to
their corresponding form-b representation? If so i would appreciate a quick
guide on how to do it.

It does. See fribidi_shape_arabic(). I don't remember the details.

b

I will try to write some POCs converting a regular string into its form-b format. If that works out fine, I'll try to apply the same to hummus.

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

that's super cool :)

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

@galkahana So i tried to write a simple cpp application that demonstrates the problem, but it's printing gibberish instead of the arabic letters (Not even the disconnected version).
The sample code is in the attachments, would appreciate if you can take a look
P.S: I'm using the latest version of PDF-Writer compiled from source, the command i'm using for my sample is
g++ -std=c++17 -o /tmp/testArabic testArabic.cpp -I PDFWriter Build/PDFWriter/libPDFWriter.a Build/LibJpeg/libLibJpeg.a Build/FreeType/libFreeType.a Build/ZLib/libZlib.a Build/LibTiff/libLibTiff.a
testArabic.cpp.zip

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

Hi,
The strings that you should provide to PDFWriter should be unicode encoded in utf-8. It seems like what you are trying to do is to provide plain ascii encoding, which will not work.

Regards,
Gal.

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

You can use UnicodeString to help you with this.

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

Thanks Gal, it was actually a font issue, i'm using "FreeSerif.otf" now and it's working

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

@galkahana does the same issue exist for Hebrew? I tested the POC code for both Arabic and Farsi and it seems to be working fine.

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

Cool.
Hebrew should be fine with the basic RTL approach of icu-bidi (or any similar algorithm that you are using). Some letters have different ending characters (for instance, מ appearing at the end of a word would be ם), but they are simply created by hitting different keys, so there is no need for automatic replacement.

In terms of your implementation (PR), if I may: if you look into adding this directly to PDFWriter (which I'd love if you could, and thank you a lot), my preference would be for it to be used as part of AbstractContentContext::WriteText (right at its beginning). For the sake of providing some manual override, it would be super nice if you could:

  1. Use a separate class/method for your implementation to do the translation (so I can use it externally in scenarios that don't go through writeText), utf8->utf8.
  2. Add another options struct to WriteText, which by default will use this translation, but will have a single boolean to check, and if "true" (not the default) will bypass the translation and go to the current implementation of writeText.

I'm also good with leaving it as something that someone can call before calling writeText, if you deem this better. I can add the WriteText implementation on top of it (just keep it utf8 std::string to utf8 std::string, please). I'll also take care of integrating this into HummusJS.
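
The shape of that proposal can be sketched in JavaScript as follows. Every name here (UnicodeTextUtils.translate, writeTextTranslated, bypassTranslation) is hypothetical and not part of hummus or PDFWriter, and the translation itself is replaced by a trivial stand-in that only reverses the string so the sketch stays self-contained. The only hummus call assumed is writeText(text, x, y, options) on a content context.

```javascript
// Hypothetical stand-in for the proposed utility class. A real version would
// do bidi reordering and Arabic shaping; this one only reverses code points.
var UnicodeTextUtils = {
    translate: function (text) {
        return Array.from(text).reverse().join('');
    }
};

// Hypothetical wrapper: translates by default, and passes the text through
// untouched when translationOptions.bypassTranslation is true.
function writeTextTranslated(context, text, x, y, textOptions, translationOptions) {
    var bypass = !!(translationOptions && translationOptions.bypassTranslation);
    var finalText = bypass ? text : UnicodeTextUtils.translate(text);
    return context.writeText(finalText, x, y, textOptions);
}

// Usage with a fake context that just records what it was asked to write.
var written = [];
var fakeContext = { writeText: function (text) { written.push(text); return this; } };
writeTextTranslated(fakeContext, 'abc', 10, 500, {});
writeTextTranslated(fakeContext, 'abc', 10, 500, {}, { bypassTranslation: true });
console.log(written.join(' '));
```

Keeping the translation in its own function is what lets callers that place text through Tj, rather than writeText, reuse it.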

If you'll do that i'll thank you very much, and i'm sure others will as well.

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

Thanks, that's exactly what I'm doing, except I didn't really separate the bidi conversion from AbstractContentContext (now that you mention it, that makes much more sense).
Is there a certain place in the codebase where you would place such a utility class, say something like 'UnicodeTextUtils'?

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

Something like UnicodeTextUtils sounds great

from hummusjs.

amrnablus avatar amrnablus commented on May 20, 2024

OK, so the code is done and working fine. Regarding the commit, should I add fribidi as a "sit-in" dependency, the same way you're using LibJpeg, LibTiff, etc.? If so, should it be a git submodule, or should I just clone the repo and copy it to the sources directory?

from hummusjs.

galkahana avatar galkahana commented on May 20, 2024

Cool :).
Good question. I vote for a sit-in, like LibJpeg and LibTiff: copy the relevant sources into a folder, like libtiff etc. Note that they have conditions in the cmake files. If possible, I'd rather you use something similar for the new addition, in terms of a flag that allows building without the bidi functionality. In that case, please also make sure that compilation still works safely.

from hummusjs.
