Comments (18)
Propose closure: Can be closed; mostly taken care of in #129, although the JSON serialization is still to be done.
from wpub.
Two interesting resources on bidi-text and Unicode:
1- http://www.iamcal.com/understanding-bidirectional-text/
2- https://www.w3.org/International/articles/inline-bidi-markup/
from wpub.
For bidi text, it appears that we may have to create a JSON "dir" attribute representing the global text direction applicable to the metadata by default.
Embedded in the text as special characters, "implicit marker characters" (Left-to-Right Mark and Right-to-Left Mark) will help tailor the direction of "neutral" characters (e.g. "!"), and "explicit markers" will describe a local text direction.
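A minimal sketch of how the two mechanisms above could combine, assuming a hypothetical manifest with a top-level "dir" key. The code points are standard Unicode; the `isolate` helper and the manifest shape are illustrative assumptions, not anything agreed in this thread.

```python
# Unicode directional formatting characters (standard code points).
LRM = "\u200E"  # Left-to-Right Mark (implicit marker)
RLM = "\u200F"  # Right-to-Left Mark (implicit marker)
LRI = "\u2066"  # Left-to-Right Isolate (explicit marker)
RLI = "\u2067"  # Right-to-Left Isolate (explicit marker)
PDI = "\u2069"  # Pop Directional Isolate (closes LRI/RLI)


def isolate(text: str, direction: str) -> str:
    """Wrap text in explicit isolate markers for the given direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unsupported direction: {direction!r}")
    opener = LRI if direction == "ltr" else RLI
    return f"{opener}{text}{PDI}"


# Hypothetical manifest fragment: "dir" is the global default direction.
manifest = {"dir": "rtl", "title": "مرحبا!"}

# Apply the global default to a metadata value before handing it to a display layer.
display_title = isolate(manifest["title"], manifest["dir"])

# An implicit marker placed after a neutral character ("!") keeps it attached
# to the preceding RTL run even when the surrounding context is LTR.
pinned = manifest["title"] + RLM
```

The isolate characters are used here rather than the older embedding characters because they do not affect the ordering of the surrounding text; that choice is an assumption of this sketch.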
from wpub.
It's not just about bidi; you also have the more general problem of language identification.
Consider the case of a book with multiple (localized) titles - how do you encode that information?
Or worse, consider a multi-lingual title?
There is some work in this area for JSON-LD.
from wpub.
In Readium-2 we already handle that case (multiple localizations for some strings).
Here's how we handle it for title, for example:
"title": {
  "fr": "Vingt mille lieues sous les mers",
  "en": "Twenty Thousand Leagues Under the Sea",
  "ja": "海底二万里"
}
Since we're using JSON-LD and include the proper info in our context document, this is correctly understood by JSON-LD clients:
schema:name "Twenty Thousand Leagues Under the Sea"@en, "Vingt mille lieues sous les mers"@fr, "海底二万里"@ja ;
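A rough sketch of how a client might consume such a language map. The `pick_localized` function, its fallback order, and the assumption that a value may also be a plain string are all hypothetical; this is not the Readium-2 implementation.

```python
from typing import Mapping, Sequence, Union

# Assumption: a value is either a plain string or a language-tag -> string map.
LocalizedString = Union[str, Mapping[str, str]]


def pick_localized(value: LocalizedString, preferred: Sequence[str]) -> str:
    """Return the best string for the caller's preferred language tags."""
    if isinstance(value, str):
        return value  # plain-string syntax: nothing to choose
    for tag in preferred:
        if tag in value:
            return value[tag]
        # Fall back from a regional tag ("fr-CA") to its primary subtag ("fr").
        primary = tag.split("-")[0]
        if primary in value:
            return value[primary]
    # No preferred language available: use the first entry in the map.
    return next(iter(value.values()))


title = {
    "fr": "Vingt mille lieues sous les mers",
    "en": "Twenty Thousand Leagues Under the Sea",
    "ja": "海底二万里",
}

japanese_title = pick_localized(title, ["ja"])
canadian_french_title = pick_localized(title, ["fr-CA", "en"])
```

The sketch assumes the map is non-empty; an empty map would need its own handling.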
from wpub.
Representing multiple languages in a single string is a much bigger issue that we can't tackle on our own.
Unlike what the name of this issue implies, this IMO has nothing to do with JSON:
- the exact same issue exists today in XML with EPUB
- ... or most metadata formats
If you can't represent that info in a string, the problem is much bigger than the manifest:
- how would anyone store that info in a database (usually these fields are UTF-8 strings)?
- how would you transmit this info in an API?
- how would dedicated reading systems represent this info in-memory?
I think this falls under the "not our problem to solve" category that Ivan mentioned several times during the F2F.
We can participate in efforts to solve this problem with UTF-8, but we can't and shouldn't try to fix it on our own.
from wpub.
Sure, but who's using HTML to represent strings in a database or an API? Absolutely no one.
from wpub.
Re. databases: not really, Leonard. @HadrienGardeur is talking about databases (e.g. MySQL), whereas you have moved on to search engines (e.g. Solr). Some databases (SQL Server, Oracle, DB2) can handle XML fields in their recent versions, but others (MySQL, SQLite) can't. Most professional search engines don't index HTML or XML; the tags are stripped out before indexing. Note also that Elasticsearch imports JSON structures, not XML.
Re. the Web: Web Publications must be adapted to the Web, i.e. to browsers, and nowadays browsers don't handle XML perfectly.
We are dealing with property/value tuples in this discussion. If you want to promote mixed content as a core value type, you'll have the whole database/web community up in arms ("vent debout") against the idea.
The problem I raised (i18n for metadata values) is currently not handled in EPUB 3. I suppose the publishing industry was in no hurry to have it resolved until now. So I agree with Hadrien that we should just explain why a solution to this issue would be valuable and which solutions other W3C WGs offer.
IMHO, there are two main reasons why we would like proper internationalized metadata values:
- mix of ltr and rtl words in a string (it is certainly a MUST)
- proper pronunciation of words by a TTS engine, from a string (IMO it is a "good to have" but not mandatory).
from wpub.
In the case of the Japanese language, each human-readable text requires two representations: one in Kana only and one in Kanji.
For example, the Japanese National Diet Library uses:
<dc:title>
  <rdf:Description>
    <rdf:value>国立国会図書館資料デジタル化の手引</rdf:value>
    <dcndl:transcription>コクリツ コッカイ トショカン シリョウ デジタルカ ノ テビキ</dcndl:transcription>
  </rdf:Description>
</dc:title>
where dcndl:transcription is Kana-only.
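As an illustration, the value/transcription pair can be read out of such a record as follows. This is only a sketch: the snippet above does not declare its namespaces, so the URIs below are assumptions added to make the fragment parseable, and `read_title` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the original snippet does not declare them.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcndl": "http://ndl.go.jp/dcndl/terms/",
}

SNIPPET = """
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:dcndl="http://ndl.go.jp/dcndl/terms/">
  <rdf:Description>
    <rdf:value>国立国会図書館資料デジタル化の手引</rdf:value>
    <dcndl:transcription>コクリツ コッカイ トショカン シリョウ デジタルカ ノ テビキ</dcndl:transcription>
  </rdf:Description>
</dc:title>
"""


def read_title(xml_text: str) -> dict:
    """Extract the display value and its Kana-only transcription."""
    root = ET.fromstring(xml_text.strip())
    description = root.find("rdf:Description", NS)
    return {
        "value": description.findtext("rdf:value", namespaces=NS),
        "transcription": description.findtext("dcndl:transcription", namespaces=NS),
    }


title = read_title(SNIPPET)
```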
from wpub.
@murata0204 in the Readium Web Publication Manifest this is supported for most strings. The only place where we can't use it yet is for the description.
from wpub.
See Requirements for Language and Direction Metadata in Data Formats.
from wpub.
Thank you Makoto.
Hadrien, what about:
"title": "<span lang='en-US' dir='ltr'>Moby Dick</span>"
?
from wpub.
@danielweck not sure what you mean; as you know, we support both syntaxes in Readium-2.
from wpub.
We have a langcode-to-string mapping, but no dir, right?
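When no dir is stored alongside a string, one common fallback is the first-strong heuristic: take the direction of the first strongly directional character. The sketch below is an assumption about how a client could do this, not something Readium-2 is stated to do; `first_strong_direction` is a hypothetical name.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Only neutral or weak characters (digits, punctuation, spaces) were found.
    return None


english_direction = first_strong_direction("Moby Dick")
arabic_direction = first_strong_direction("مرحبا")
```

The heuristic guesses wrong for strings that start with a word in the other script (for example an RTL title that begins with a Latin brand name), which is why an explicit direction is still useful.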
from wpub.
Closing per https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-03-12-minutes.html#resolution1
from wpub.