iherman commented on July 17, 2024

Propose closure: Can be closed; mostly taken care of in #129, although the JSON serialization is still to be done.

from wpub.

llemeurfr commented on July 17, 2024

Two interesting resources on bidi-text and Unicode:
1- http://www.iamcal.com/understanding-bidirectional-text/
2- https://www.w3.org/International/articles/inline-bidi-markup/

llemeurfr commented on July 17, 2024

For bidi text, it appears that we may have to create a JSON "dir" property representing the global text direction that applies to the metadata by default.

Embedded in the text as special characters, "implicit marker characters" (Left-to-Right Mark and Right-to-Left Mark) will help tailor the direction of "neutral" characters (e.g. "!"), and "explicit markers" will describe a local text direction.
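A minimal Python sketch of the two mechanisms described above, assuming a hypothetical top-level "dir" member in the metadata (it is not part of any published manifest format). The code points are the standard Unicode bidi formatting characters; for the "explicit markers" the sketch uses the isolate characters (LRI/RLI/FSI closed by PDI) rather than the older embedding characters (LRE/RLE closed by PDF). The function names are illustrative only.

```python
# Unicode bidirectional formatting characters (see Unicode Standard Annex #9).
LRM = "\u200e"  # LEFT-TO-RIGHT MARK: implicit, invisible, strongly LTR
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: implicit, invisible, strongly RTL
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE: explicit
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE: explicit
FSI = "\u2068"  # FIRST STRONG ISOLATE: explicit, direction of the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the most recent isolate

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str) -> str:
    """Wrap text in an explicit isolate so its direction does not leak into surrounding text."""
    if direction not in _OPENERS:
        raise ValueError(f"unknown direction: {direction!r}")
    return _OPENERS[direction] + text + PDI


def apply_metadata_dir(metadata: dict) -> dict:
    """Apply a hypothetical top-level "dir" member to every string value of a metadata object."""
    direction = metadata.get("dir", "auto")
    return {
        key: isolate(value, direction) if isinstance(value, str) else value
        for key, value in metadata.items()
        if key != "dir"
    }


if __name__ == "__main__":
    print(apply_metadata_dir({"dir": "rtl", "title": "שלום עולם", "pages": 12}))
```

The implicit case needs no wrapping: appending RLM after a trailing neutral, as in "שלום!" + RLM, makes the "!" resolve to right-to-left, so it stays at the left end of the Hebrew run when the string is displayed in a left-to-right context.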

lrosenthol commented on July 17, 2024

It's not just about bidi: you also have the more general problem of language identification.
Consider the case of a book with multiple (localized) titles - how do you encode that information?
Or worse, consider a multi-lingual title?

There is some work in this area for JSON-LD.

HadrienGardeur commented on July 17, 2024

In Readium-2 we already handle that case (multiple localizations for some strings).

Here's how we handle it for title for example:

"title": {
  "fr": "Vingt mille lieues sous les mers",
  "en": "Twenty Thousand Leagues Under the Sea",
  "ja": "海底二万里"
}

Since we're using JSON-LD and include the proper info in our context document, this is correctly understood by JSON-LD clients:

schema:name "Twenty Thousand Leagues Under the Sea"@en, "Vingt mille lieues sous les mers"@fr, "海底二万里"@ja ;
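The language-map form above takes very little code to consume. A minimal Python sketch, assuming the map is a plain JSON object keyed by BCP 47 language tags: `expand_language_map` produces the JSON-LD expanded value objects (`@value`/`@language`) that correspond to the Turtle triples above, and `pick_localized` is a hypothetical helper (not a Readium API) that chooses the string to display for a reader's preferred languages, using simplified primary-subtag matching rather than a full BCP 47 lookup.

```python
from typing import Optional, Union


def expand_language_map(language_map: dict[str, str]) -> list[dict[str, str]]:
    """Turn {"fr": "...", "en": "..."} into JSON-LD expanded value objects."""
    return [{"@value": text, "@language": tag} for tag, text in language_map.items()]


def pick_localized(value: Union[str, dict[str, str]], preferred: list[str]) -> Optional[str]:
    """Pick the string to display for a reader's preferred languages.

    Plain strings are returned unchanged. For a language map, each preferred tag is tried
    exactly, then by its primary subtag ("ja-JP" falls back to "ja"); if nothing matches,
    the first entry of the map is used.
    """
    if isinstance(value, str):
        return value
    if not value:
        return None
    by_tag = {tag.lower(): text for tag, text in value.items()}
    for tag in preferred:
        tag = tag.lower()
        primary = tag.split("-")[0]
        if tag in by_tag:
            return by_tag[tag]
        if primary in by_tag:
            return by_tag[primary]
    return next(iter(value.values()))


title = {
    "fr": "Vingt mille lieues sous les mers",
    "en": "Twenty Thousand Leagues Under the Sea",
    "ja": "海底二万里",
}
print(pick_localized(title, ["ja-JP", "en"]))  # 海底二万里
print(expand_language_map(title)[0])  # {'@value': 'Vingt mille lieues sous les mers', '@language': 'fr'}
```

A production reading system would more likely follow the lookup scheme of RFC 4647, but the shape of the data stays the same.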

lrosenthol commented on July 17, 2024

HadrienGardeur commented on July 17, 2024

Representing multiple languages in a single string is a much bigger issue that we can't tackle on our own.

Unlike what the name of this issue implies, this IMO has nothing to do with JSON:

  • the exact same issue exists today in XML with EPUB
  • ... or most metadata formats

If you can't represent that info in a string, the problem is much bigger than the manifest:

  • how would anyone store that info in a database (usually these fields are UTF-8 strings)?
  • how would you transmit this info in an API?
  • how would dedicated reading systems represent this info in-memory?

I think this falls under the "not our problem to solve" category that Ivan mentioned several times during the F2F.
We can participate in efforts to solve this problem with UTF-8, but we can't and shouldn't try to fix it on our own.

lrosenthol commented on July 17, 2024

HadrienGardeur commented on July 17, 2024

Sure, but who's using HTML to represent strings in a database or an API? Absolutely no one.

lrosenthol commented on July 17, 2024

llemeurfr commented on July 17, 2024

Re. databases: not really, Leonard. @HadrienGardeur is talking about databases (e.g. MySQL), and you are moving on to search engines (e.g. Solr). Some databases (SQL Server, Oracle, DB2) can handle XML fields in their recent versions, but others (MySQL, SQLite) can't. Most professional search engines don't index HTML or XML: the tags are stripped out before indexing. Note also that Elasticsearch imports JSON structures, not XML.
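To illustrate the point about indexing: once markup is stripped, the `lang` and `dir` attributes disappear and only the plain text survives. This Python sketch uses the standard library's `html.parser` as a stand-in for an indexer's tag-stripping step; real search engines use their own analyzers, so treat it as an approximation of what happens, not a description of any particular engine.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep text nodes and drop every tag (and therefore every lang/dir attribute)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def strip_tags(markup: str) -> str:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


title = "<span lang='he' dir='rtl'>שלום</span> <span lang='en' dir='ltr'>world</span>"
print(strip_tags(title))  # שלום world  (language and direction information is gone)
```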

Re. the Web: Web Publications must be adapted to ... the Web, i.e. to browsers, and nowadays browsers don't handle XML perfectly.

We are dealing with property/value tuples in this discussion. If you want to promote mixed content as a core value type, you'll have the whole database/web community up in arms against the idea.

The problem I raised (i18n for metadata values) is currently not handled in EPUB 3. I suppose the publishing industry was not in a hurry to have it resolved until now. So I agree with Hadrien that we should simply explain why a solution to this issue would be valuable, and point to the solutions offered by other W3C WGs.

IMHO, there are two main reasons why we would like proper internationalized metadata values:

  • mix of LTR and RTL words in a string (this is certainly a MUST)
  • proper pronunciation of words by a TTS engine, from a string (IMO it is "good to have" but not mandatory).

murata2makoto commented on July 17, 2024

In the case of the Japanese language, each human-readable text requires two representations: one in Kana only and one in Kanji.

For example, the Japanese National Diet Library uses

<dc:title>
   <rdf:Description>
     <rdf:value>国立国会図書館資料デジタル化の手引</rdf:value>
     <dcndl:transcription>コクリツ コッカイ トショカン シリョウ デジタルカ ノ テビキ</dcndl:transcription>
   </rdf:Description>
</dc:title>

where dcndl:transcription is Kana-only.
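If this two-form title had to travel in a JSON manifest, one option is to keep the value and its Kana-only transcription side by side. A Python sketch using the standard `xml.etree.ElementTree` module; the `<record>` wrapper, the namespace declarations (which the snippet above omits; the DC-NDL namespace URI in particular is an assumption), and the output keys `value`/`transcription` are all hypothetical choices for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Namespace URIs: Dublin Core and RDF are standard; the DC-NDL URI is an assumption.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcndl": "http://ndl.go.jp/dcndl/terms/",
}

# The NDL snippet wrapped in a hypothetical <record> element that declares the namespaces.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcndl="http://ndl.go.jp/dcndl/terms/">
  <dc:title>
    <rdf:Description>
      <rdf:value>国立国会図書館資料デジタル化の手引</rdf:value>
      <dcndl:transcription>コクリツ コッカイ トショカン シリョウ デジタルカ ノ テビキ</dcndl:transcription>
    </rdf:Description>
  </dc:title>
</record>"""


def title_with_transcription(xml_text: str) -> dict:
    """Return the Kanji title and its Kana-only transcription as a JSON-ready dict."""
    root = ET.fromstring(xml_text)
    description = root.find("dc:title/rdf:Description", NS)
    if description is None:
        raise ValueError("no dc:title/rdf:Description element found")
    return {
        "value": description.findtext("rdf:value", default="", namespaces=NS),
        "transcription": description.findtext("dcndl:transcription", default="", namespaces=NS),
    }


print(json.dumps({"title": title_with_transcription(RECORD)}, ensure_ascii=False, indent=2))
```

Keeping the transcription as a sibling field, rather than folding it into the display string, also lets a reading system sort or search on the Kana reading.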

HadrienGardeur commented on July 17, 2024

@murata0204 in the Readium Web Publication Manifest this is supported for most strings. The only place where we can't use it yet is for the description.

murata2makoto commented on July 17, 2024

See Requirements for Language and Direction Metadata in Data Formats.

danielweck commented on July 17, 2024

Thank you Makoto.
Hadrien, what about:

"title": "<span lang='en-US' dir='ltr'>Moby Dick</span>"

?
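One way to compare the two syntaxes: the HTML-in-a-string form can be turned back into structured fields by reading the attributes of the wrapping element. A Python sketch using the standard `html.parser` module; it only looks at the first start tag, and the output keys `value`/`language`/`direction` are hypothetical (JSON-LD 1.1's `@language` and `@direction` keywords play a similar role).

```python
from html.parser import HTMLParser
from typing import Optional


class FirstElementReader(HTMLParser):
    """Collect the text of a markup string and the lang/dir of its first element."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.dir: Optional[str] = None
        self.chunks: list[str] = []
        self._seen_start_tag = False

    def handle_starttag(self, tag, attrs):
        if not self._seen_start_tag:
            self._seen_start_tag = True
            attributes = dict(attrs)  # attribute names arrive lowercased
            self.lang = attributes.get("lang")
            self.dir = attributes.get("dir")

    def handle_data(self, data):
        self.chunks.append(data)


def html_string_to_value(markup: str) -> dict:
    """Turn '<span lang=... dir=...>text</span>' into a flat dict of fields."""
    reader = FirstElementReader()
    reader.feed(markup)
    reader.close()
    result = {"value": "".join(reader.chunks)}
    if reader.lang:
        result["language"] = reader.lang
    if reader.dir:
        result["direction"] = reader.dir
    return result


print(html_string_to_value("<span lang='en-US' dir='ltr'>Moby Dick</span>"))
# {'value': 'Moby Dick', 'language': 'en-US', 'direction': 'ltr'}
```

A string containing several differently tagged spans (the multilingual case raised earlier in the thread) does not fit into this flat result, which is part of why that case is hard to serialize.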

HadrienGardeur commented on July 17, 2024

@danielweck not sure what you mean; as you know, we support both syntaxes in Readium-2.

danielweck commented on July 17, 2024

We have a language-code-to-string mapping, but no dir, right?
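When only a language map is available, a reading system could fall back on guessing the base direction from each key. A Python sketch of that heuristic; the RTL tables are hypothetical and deliberately incomplete (a real implementation would derive them from CLDR data), `with_directions` and its output keys are illustrative, and the tag parsing is a simplification of BCP 47.

```python
# Hypothetical, non-exhaustive tables for illustration only.
RTL_SCRIPTS = {"arab", "hebr", "syrc", "thaa", "nkoo", "adlm"}
RTL_LANGUAGES = {"ar", "he", "fa", "ur", "yi", "ps", "sd", "ug", "dv", "ckb"}


def default_direction(lang_tag: str) -> str:
    """Guess "ltr" or "rtl" from a BCP 47 tag: a script subtag wins, then the language."""
    subtags = lang_tag.lower().replace("_", "-").split("-")
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # a singleton starts an extension or private-use part; stop scanning
        if len(subtag) == 4 and subtag.isalpha():
            return "rtl" if subtag in RTL_SCRIPTS else "ltr"
    return "rtl" if subtags[0] in RTL_LANGUAGES else "ltr"


def with_directions(language_map: dict[str, str]) -> dict[str, dict[str, str]]:
    """Attach a guessed direction to every entry of a language map."""
    return {
        tag: {"value": text, "direction": default_direction(tag)}
        for tag, text in language_map.items()
    }


print(default_direction("ar"), default_direction("az-Arab"), default_direction("en-US"))
# rtl rtl ltr
```

This is only a default: an explicit "dir" on the data, when present, should win over the guess.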

iherman commented on July 17, 2024

Closing per https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-03-12-minutes.html#resolution1
