pub-manifest's Introduction

Publication Manifest

This is the repository of the W3C’s specification on Publication Manifests, developed by the Publishing Working Group. The editors’ draft of the specification can also be read directly.

Contributing to the Repository

Use the standard fork, branch, and pull request workflow to propose changes to the specification. Please make branch names informative—by including the issue or bug number for example.

Editorial changes that improve the readability of the spec or correct spelling or grammatical mistakes are welcome.

Please read CONTRIBUTING.md for information about licensing contributions.

Code of Conduct

W3C functions under a code of conduct.

pub-manifest's People

Contributors

gregoriopellegrino, iherman, jccr, llemeurfr, marisademeglio, mattgarrish, naglis, wareid

pub-manifest's Issues

Specification's version in the manifest

Since we created a first prototype of a Web Publication based on the current specifications, I have wondered how to indicate in the manifest that it refers to the current alpha or beta version of the specifications.

In general, if in a few years we define Web Publication 2.0, how will a User Agent distinguish current Web Publications from future ones?

json-ld reference

We should reference JSON-LD 1.1, as it will be going to REC around the same time, but it's not clear whether our current unnumbered specref reference will auto-update when that specification gets published.

Is there a need for both an authored and a canonical manifest?

At the moment, there is an authored and a canonical manifest, with a separate canonicalization step to transform the authored manifest into the canonical one. The goal is to allow the author to express data more succinctly (eg, use only simple file names instead of complete LinkedResource instances or person names instead of Person structures).

It was raised, in #11, that the price being paid for having this is too high:

For the few people typing these by hand, [...] but for the vast majority of implementers (i.e. CMS's generating these manifests), I think they'd find the consistency of using the canonical representation (and the lack of "overhead" of needing it to be canonicalized every time...) to be a win. (#11 (comment))

Question: do we want to simplify the manifest by removing this extra step and defining the manifest purely in terms of what is currently called the canonical manifest?
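To make the cost of the extra step concrete, here is a minimal sketch of what such a canonicalization step might look like. The property names `readingOrder`, `resources`, and `author` and the `LinkedResource`/`Person` type names come from the discussion above; the exact shape of the canonical output (a `type` array plus a `url` or `name` key) is an assumption for illustration, not the algorithm in the specification.

```python
def expand(value, type_name, key):
    """Expand a bare string into a {type, key} dict; leave dicts untouched."""
    if isinstance(value, str):
        return {"type": [type_name], key: value}
    return value


def as_list(value):
    """Wrap a single value in a list so both forms are handled uniformly."""
    return value if isinstance(value, list) else [value]


def canonicalize(authored):
    """Return a copy of the authored manifest with shorthand values expanded."""
    canonical = dict(authored)
    # Simple file names become LinkedResource-like objects (assumed shape).
    for prop in ("readingOrder", "resources"):
        if prop in canonical:
            canonical[prop] = [
                expand(v, "LinkedResource", "url") for v in as_list(canonical[prop])
            ]
    # Person names become Person-like objects (assumed shape).
    if "author" in canonical:
        canonical["author"] = [
            expand(v, "Person", "name") for v in as_list(canonical["author"])
        ]
    return canonical
```

If the manifest were defined purely in its canonical form, a generator would emit the expanded objects directly and a step like this would disappear.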

Add linked resource property called 'alternative'

LinkedResource needs a property called alternative, which has as its value a URL of a resource that represents the LinkedResource's content in an alternate modality. It's used as follows:

  1. audio book with mp3s in the reading order; a reading order entry's alternative property points to a text file for that audio (see w3c/pwpub#44)

  2. audio book, as above, but with alternative pointing to a synchronized narration document

  3. text book with HTML files in the reading order; alternative points to a synchronized narration document.

The resource pointed to by alternative appears in the resources list and is processed according to mime type.

Reference example: draft of how to incorporate synchronized narration: https://w3c.github.io/sync-media-pub/packaging.html
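As a rough illustration of the proposal, the sketch below models the first use case as a Python dict and checks that every `alternative` target also appears in the resources list, as the proposal requires. The file names and the helper `missing_alternatives` are hypothetical.

```python
# Hypothetical audiobook manifest fragment using the proposed property.
manifest = {
    "readingOrder": [
        {"url": "chapter1.mp3", "encodingFormat": "audio/mpeg",
         "alternative": "chapter1.txt"},
        {"url": "chapter2.mp3", "encodingFormat": "audio/mpeg",
         "alternative": "chapter2.txt"},
    ],
    "resources": [
        {"url": "chapter1.txt", "encodingFormat": "text/plain"},
    ],
}


def missing_alternatives(manifest):
    """List alternative URLs that are not declared in the resources list."""
    listed = {resource["url"] for resource in manifest.get("resources", [])}
    return [
        item["alternative"]
        for item in manifest.get("readingOrder", [])
        if "alternative" in item and item["alternative"] not in listed
    ]
```

Here `chapter2.txt` would be reported, because it is referenced as an alternative but never listed as a resource.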

"Processing the manifest": consider adopting the failure/error terminology

The result of the current algo is either a canonical representation of the manifest or an early termination (which means "the manifest is not valid"). There are also cases where the algo issues a warning but does not terminate.

Most of the other specs we reference (e.g. HTML, URL) instead use a failure/error terminology. A failure is returned when the algorithm terminates early. An error is recoverable.

For instance, in the processing model for images HTML defines parse errors as:

A parse error for algorithms in this section indicates a non-fatal mismatch between input and requirements. User agents are encouraged to expose parse errors somehow.

The URL Standard defines validation error as:

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere.

Note: A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.

I suggest we adopt the same terminology in our spec. The more alignment, the easier it is for readers of the spec.

Also, incidentally, when the spec says:

If the algorithm terminates early, the manifest is not valid.

I think it is worth noting that the opposite is not necessarily true. For instance, when the required type property is missing, a default value of CreativeWork will be assumed. I think this is worth clarifying explicitly.
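The distinction can be sketched as follows: a failure ends processing, while an error is recorded and processing continues. The names `ManifestFailure` and `process` are hypothetical, and the single check shown (the missing type property with its CreativeWork default) is the one mentioned above; this is not the processing algorithm from the spec.

```python
class ManifestFailure(Exception):
    """Processing terminated early: no manifest representation is produced."""


def process(data):
    """Return (manifest, errors); raise ManifestFailure on a fatal problem."""
    if not isinstance(data, dict):
        # Failure: there is nothing sensible to recover from.
        raise ManifestFailure("manifest is not a JSON object")

    manifest = dict(data)
    errors = []
    if "type" not in manifest:
        # Error: recoverable, so report it and apply the default.
        errors.append("required 'type' is missing; assuming CreativeWork")
        manifest["type"] = ["CreativeWork"]
    return manifest, errors
```

Note that the second case completes without a failure even though the input was not valid, which is exactly the asymmetry described above.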

Why do we have C.3 Audiobook

Since we have a specific Audiobook profile, I was surprised to see a manifest example for an audiobook here. Are we going to point to the audiobook profile, or move this over to that spec? If we show an audiobook manifest example here, why do we have a completely separate specification for audiobooks?

Use existing Vocabulary terms from ActivityStreams for Link semantics

Given that ActivityStreams is widely used (thanks to Mastodon, PeerTube, etc.), it would make sense for our Link definitions to build up from theirs, at least in the places where Schema.org's vocabulary is lacking:
https://www.w3.org/TR/activitystreams-vocabulary/#dfn-link

The Web of Things spec also has a nearly identical definition and hopefully they could be convinced to match the ActivityStreams terminology also.

WPUB             ActivityStreams   Web of Things
url              href*             href
encodingFormat   mediaType         mediaType
name             name              -
description      -                 -
rel              rel               rel
-                hreflang          -
-                height            -
-                width             -
-                -                 anchor
-                preview           -
  • ActivityStreams also defines url which has a range of xsd:anyURI (i.e. a "raw" URL in the form of http://example.com/) OR a Link object whereas their href is defined with a range of only xsd:anyURI

At the very least, it would be helpful if the three communities could coordinate on their use of rel and mediaType. I've also found the distinction between url and href to be valuable.

Context files for each community:

must inLanguage be an array?

I was trying out the JSON schema today, and got an error for

"inLanguage": "en";

I suspect that others will be tripped up by this.
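One way to avoid tripping authors up would be to accept both forms and normalize on the processing side. A minimal sketch of that normalization, with the hypothetical helper name `as_array`:

```python
def as_array(value):
    """Normalize a single value or a list of values to a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


manifest = {"inLanguage": "en"}
manifest["inLanguage"] = as_array(manifest.get("inLanguage"))
```

Whether the schema should accept the bare string is the open question; the sketch only shows that the processing cost of doing so is small.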

How Metadata Works in the Publishing World

In much of the EPUB world, the metadata that matters is not inside the EPUB, but outside (in the form of ONIX). The metadata inside EPUBs is often wrong, is difficult to change, and there is very little incentive to make it accurate since it's mostly unused.

In the web world, page metadata directly affects search ranking, Google rich snippets, etc. There is no out-of-band transmission of metadata. There is strong incentive to make it accurate.

How do we avoid the situation with EPUB, where we've spent decades worrying about metadata, continually changing how it's expressed, without really benefiting users?

trimming whitespace

We don't say anything about trimming whitespace in the processing algorithm, or generally in the specification, and JSON/JSON-LD don't define rules (that I've found). The only statement I can find in our spec about trimming is in the definition for non-empty, which just defers to other specifications for the rules.

Adding a step to trim all property names and values is simple enough, but what is the expectation if two properties end up with the same name after trimming:

   "name": "John Doe",
   "name ": "Jane Doe"

Is the second discarded or is it expected that the user agent make a "name" array for its internal representation?

Or do we avoid trimming property names (only do it for values) and the second instance just ends up an unrecognized property?
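The collision can be reproduced with Python's standard `json` module, which hands duplicate-looking pairs to an `object_pairs_hook` in document order. The sketch below implements just one of the possible policies (the first occurrence wins and later collisions are recorded); the function name is hypothetical, and `str.strip()` trims more characters than JSON's own whitespace set, which is a simplification.

```python
import json


def load_trimming_names(text):
    """Parse JSON, trimming property names and string values.

    Returns (data, collisions), where collisions lists names that
    appeared more than once after trimming.
    """
    collisions = []

    def hook(pairs):
        obj = {}
        for name, value in pairs:
            key = name.strip()
            if key in obj:
                # Policy choice for this sketch: keep the first value.
                collisions.append(key)
                continue
            obj[key] = value.strip() if isinstance(value, str) else value
        return obj

    return json.loads(text, object_pairs_hook=hook), collisions
```

Swapping the `continue` branch for code that builds a list would give the "name array" behavior instead, so the spec would need to pick one.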

Editorial error in § 2.7.3.6.2 Item-specific Language

§ 2.7.3.6.2 Item-specific Language
https://www.w3.org/TR/pub-manifest/#manifest-specific-language-and-dir

The last two paragraphs are almost the same:

Once the base direction has been identified, user agents MUST determine the appropriate rendering and display of natural language values according to the Unicode Bidirectional Algorithm [bidi]. This could require wrapping additional markup or Unicode formatting characters around the string prior to display, in order to apply the base direction.

Once the base direction has been identified, user agents MUST determine the appropriate rendering and display of natural language values according to the Unicode Bidirectional Algorithm[bidi]. This could require wrapping additional markup or control characters around the string prior to display, in order to apply the base direction. (See § D. Examples for bidirectional texts.)

URLs shouldn't be required to dereference to a resource

Section 2.1.7.5 URLs says:

URLs MUST dereference to a resource, although user agents are not required to dereference all URLs in the manifest.

This is probably wrong. I don't think anyone can enforce that (it's certainly not testable, at least not in a consistent manner), and 404s are an inherent part of the Web.

Maybe remove that statement?

I18N Self-review

Short i18n review checklist is here. The relevant documents are:

The second document is largely based on the first, and adds comparatively very little; these additions are all irrelevant in terms of internationalization.

Note, also, that the first document was originally developed under the name "Web Publication" in a separate repository, i.e., the earlier issues referred to are in the https://github.com/w3c/wpub repository.

Self test

(Only the relevant "sub" forms below have been copied from the i18n form. Other entries in that form are non-applicable.)

  1. If the spec (or its implementation) contains any natural language text that will be read by a human (this includes error messages or other UI text, JSON strings, etc, etc),

    1. It should be possible to associate a language with any piece of natural language text that will be read by a user.

    2. Where possible, there should be a way to label natural language changes in inline text.

      N/A (Not Applicable)

    3. Consider whether it is useful to express the intended linguistic audience of a resource, in addition to specifying the language used for text processing.

      See the global publication language tag

    4. A language declaration that indicates the text processing language for a range of text must associate a single language value with a specific range of text.

      The language tag can be set both globally and on individual items

    5. Use the HTML lang and XML xml:lang language attributes where appropriate to identify the text processing language, rather than creating a new attribute or mechanism.

      N/A (Not Applicable) (the mechanism is inherited from JSON-LD)

    6. It should be possible to associate a metadata-type language declaration (which indicates the intended use of the resource rather than the language of a specific range of text) with multiple language values.

      There is a separate (global and local) marker for the language used in the manifest, and another one for the language of the publication. The latter is a list with decreasing priority for possibly several languages.
      (At this moment, ie, 27.08.'19, this is not yet in the draft, but it will be; see the relevant WG resolution.)

    7. Attributes that express the language of external resources should not use the HTML lang and XML xml:lang language attributes, but should use a different attribute when they represent metadata (which indicates the intended use of the resource rather than the language of a specific range of text).

      N/A (Not Applicable) (there is no mechanism to set the language of "external" resources; it is up to the individual resources, e.g., HTML files, to do that).

    8. Values for language declarations must use BCP 47.

    9. Refer to BCP 47, not to RFC 5646.

    10. Be specific about what level of conformance you expect for language tags. The word "valid" has special meaning in BCP 47. Generally "well-formed" is a better choice.

    11. Reference BCP47 for language tag matching.

    12. The specification should indicate how to define the default text-processing language for the resource as a whole.

      See the global publication language tag.

    13. Content within the resource should inherit the language of the text-processing declared at the resource level, unless it is specifically overridden.

    14. Consider whether it is necessary to have separate declarations to indicate the text-processing language versus metadata about the expected use of the resource.

      N/A (Not Applicable) (only one language is provided)

    15. If there is only one language declaration for a resource, and it has more than one language tag as a value, it must be possible to identify the default text-processing language for the resource.

      N/A (Not Applicable) (there is only one declaration per resource).

    16. By default, blocks of content should inherit any text-processing language set for the resource as a whole.

      N/A (Not Applicable) (there is no concept of a "block of content").

    17. It should be possible to indicate a change in language for blocks of content where the language changes.

      N/A (Not Applicable) (there is no concept of a "block of content").

    18. It should be possible to indicate language for spans of inline text where the language changes.

      N/A (Not Applicable) (JSON(-LD) operates with single texts only.)

    19. It must be possible to indicate base direction for each individual paragraph-level item of natural language text that will be read by someone.

      N/A (Not Applicable) (there is no concept of a "paragraph level text").

    20. It must be possible to indicate base direction changes for embedded runs of inline bidirectional text for all natural language text that will be read by someone.

      N/A (Not Applicable) (JSON(-LD) operates with single texts only.)

    21. Annotating right-to-left text must require the minimum amount of effort for people who work natively with right-to-left scripts.

      See the definition of inDirection.

    22. Do not assume that direction can be determined from language information.

    23. Values for the default base direction should include left-to-right, right-to-left, and auto.

    24. Provide metadata constructs that can be used to indicate the base direction of any natural language string.

      This can be done globally, but not locally. The issue has been discussed elsewhere (see, e.g., RDF Literals and Base Directions); essentially, this specification is based on JSON-LD and cannot unilaterally add new JSON-LD level structures to the manifest. This means that, in this version, this feature cannot be provided. See the discussion in #354, as well as the text in the specification explaining the situation.

    25. Specify that consumers of strings should use heuristics, preferably based on the Unicode Standard first-strong algorithm, to detect the base direction of a string except where metadata is provided.

      See the discussion in the specification.

    26. Where possible, define a field to indicate the default direction for all strings in a given resource or document.

    27. Do NOT assume that creating a document-level default without the ability to change direction for any string is sufficient.

    28. If metadata is not available due to legacy implementations and cannot otherwise be provided, specifications MAY allow a base direction to be interpolated from available language metadata.

      N/A (Not Applicable)

    29. Specifications MUST NOT require the production or use of paired bidi controls.

  2. If the spec (or its implementation) allows content authors to produce typographically appealing text, either in its own right, or in association with graphics.

    N/A (Not Applicable)

  3. If the spec (or its implementation) allows the user to point into text, creates text fragments, concatenates text, allows the user to select or step through text (using a cursor or other methods), etc.

    N/A (Not Applicable)

  4. If the spec (or its implementation) allows searching or matching of text, including syntax and identifiers

    N/A (Not Applicable)
     

  5. If the spec (or its implementation) sorts text

    N/A (Not Applicable)

  6. If the spec (or its implementation) captures user input

    N/A (Not Applicable)

  7. If the spec (or its implementation) deals with time in any way that will be read by humans and/or crosses time zone boundaries

    The two items that use dates (datePublished and dateModified) are defined as ISO8601 data (which should cover all the points below).

    1. When defining calendar and date systems, be sure to allow for dates prior to the common era, or at least define handling of dates outside the most common range.
    2. When defining time or date data types, ensure that the time zone or relationship to UTC is always defined.
    3. Provide a health warning for conversion of time or date data types that are "floating" to/from incremental types, referring as necessary to the Time Zones WG Note.
    4. Allow for leap seconds in date and time data types.
    5. Use consistent terminology when discussing date and time values. Use 'floating' time for time zone independent values.
    6. Keep separate the definition of time zone from time zone offset.
    7. Use IANA time zone IDs to identify time zones. Do not use offsets or LTO as a proxy for time zone.
    8. Use a separate field to identify time zone.
    9. When defining rules for a "week", allow for culturally specific rules to be applied.
    10. When defining rules for week number of year, allow for culturally specific rules to be applied.
    11. When non-Gregorian calendars are permitted, note that the "month" field can go to 13 (undecimber).
    12. If the spec (or its implementation) allows any character encoding other than UTF-8.
  8. If the spec (or its implementation) defines markup.

    N/A (Not Applicable)

  9. If the spec (or its implementation) deals with names, addresses, time & date formats, etc

    (Only names are relevant in the specification.)

    1. Check whether you really need to store or access given name and family name separately.

      Only a single name field is used.

    2. Avoid placing limits on the length of names, or if you do, make allowance for long strings.

    3. Try to avoid using the labels 'first name' and 'last name' in non-localized contexts.

    4. Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where users can provide part(s) of their name that you need to use for a specific purpose.

      N/A (Not Applicable) (The class used for a Person has been defined by schema.org; this specification just takes it over. schema.org's Person has a number of relevant terms, and this WG does not want to touch that class's definition.)

    5. Allow for users to be asked separately how they would like to be addressed when someone contacts them.

      N/A (Not Applicable) (there is no interaction with the user in the creation of this field).

    6. If parts of a person's name are captured separately, ensure that the separate items can capture all relevant information.

      N/A (Not Applicable)

    7. Be careful about assumptions built into algorithms that pull out the parts of a name automatically.

      N/A (Not Applicable)

    8. Don't assume that a single letter name is an initial.

      N/A (Not Applicable)

    9. Don't require that people supply a family name.

    10. Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.

    11. Don't require names to be entered all in upper case.

    12. Allow the user to enter a name with spaces.

    13. Don't assume that members of the same family will share the same family name.

      N/A (Not Applicable)

    14. It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.

      N/A (Not Applicable)

    15. You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, as separate items.

      All "name" fields are defined as (possibly) arrays of names, and each item can have its language tag set.

  10. If the spec (or its implementation) describes a format or data that is likely to need localization.

    N/A (Not Applicable)

  11. If the spec (or its implementation) makes any reference to or relies on any cultural norms

    N/A (Not Applicable)

Spreads and having control over them

This is based on my gap analysis between EPUB 3.2 and WP: w3c/wpub#176 (comment)

In EPUB, there's a concept of spreads (mostly through the package rendering vocabulary) where two resources can be displayed next to one another, and where we also give the author and UA control over how resources are displayed.

This is mostly used in Fixed Layout publications and is useful for comics, kid books and textbooks among others.

Canonicalization algorithm should incorporate `@base`

The current algorithm is based on the "incoming" base value only. However, the author may use the JSON-LD @base term as part of its context, which would then overwrite the value of base. This is not accounted for in the algorithm.
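A minimal sketch of how the algorithm might account for this, assuming `@base` appears in an object entry of the manifest's `@context` and may itself be relative to the incoming base. The function name and the example URLs are hypothetical; the resolution uses `urllib.parse.urljoin` from the standard library.

```python
from urllib.parse import urljoin


def effective_base(incoming_base, manifest):
    """Return the base URL after applying any @base found in @context."""
    context = manifest.get("@context", [])
    if not isinstance(context, list):
        context = [context]

    base = incoming_base
    for entry in context:
        # Only object entries can carry @base; string entries are context URLs.
        if isinstance(entry, dict) and isinstance(entry.get("@base"), str):
            # A relative @base is resolved against the base seen so far.
            base = urljoin(base, entry["@base"])
    return base


manifest = {
    "@context": ["https://schema.org", {"@base": "https://example.org/book/"}],
    "readingOrder": ["c1.html"],
}
base = effective_base("https://example.com/pub/manifest.json", manifest)
first_item = urljoin(base, manifest["readingOrder"][0])
```

With the incoming base alone, `c1.html` would resolve under `example.com`; with `@base` applied it resolves under `example.org`, which is the discrepancy the issue describes.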

Need for both inDirection and readingProgression

Per the discussion in PR #47, there are questions about these two properties and potential overlaps between them.

  • inDirection claims to specify the text direction for placing menus, etc., while
  • readingProgression claims to specify resource direction for placing menus, etc.

Both define ltr and rtl as their expected values, with readingProgression having a default of ltr and inDirection having no default. Do we need both?

Using rel="publication"

I'd like to re-visit our decision to roll our own rel value (publication) for detecting the WP manifest.

As I've said in the past, I don't think there's any good reason why we can't use manifest instead:

We could also immediately get rid of the first section of the lifecycle as well, and simply reference the WAM section instead.

I really think that this is a straightforward decision, and an easy win (less spec language and monkey patching in our draft).

Handling of invalid values

We state to issue warnings when certain values are determined to be invalid, but that's dodging the issue of what a user agent has to do in these cases.

We can leave it entirely up to the user agent to determine what to do with invalid values, but we might want to look at the cases more closely to avoid implementation ambiguity.

Where the invalidity isn't significant, like dates, we might state, for example, that the reading system should not include the property in its internal representation. Likewise, if there's a default, as with reading progression, that should probably be substituted for the invalid value.
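The two suggested behaviors (drop an invalid date, substitute the default for an invalid reading progression) could look roughly like this. The function name is hypothetical, and `datetime.fromisoformat` is used only as a stand-in date check: it accepts a subset of ISO 8601, so a real implementation would need a stricter or broader parser.

```python
from datetime import datetime

DATE_PROPERTIES = ("datePublished", "dateModified")
ALLOWED_PROGRESSION = {"ltr", "rtl"}
DEFAULT_PROGRESSION = "ltr"


def is_valid_date(value):
    """Rough validity check; fromisoformat covers only part of ISO 8601."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def sanitize(manifest):
    """Return (internal_representation, warnings) for a manifest dict."""
    internal = dict(manifest)
    warnings = []
    for prop in DATE_PROPERTIES:
        if prop in internal and not is_valid_date(internal[prop]):
            # Invalid and not significant: leave it out entirely.
            warnings.append(f"invalid {prop}; property dropped")
            del internal[prop]
    if "readingProgression" in internal:
        if internal["readingProgression"] not in ALLOWED_PROGRESSION:
            # Invalid but a default exists: substitute it.
            warnings.append("invalid readingProgression; using default")
            internal["readingProgression"] = DEFAULT_PROGRESSION
    return internal, warnings
```

Spelling the behavior out per property like this is what would remove the implementation ambiguity; the warnings alone do not.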

Shouldn't reading order be an array

2.3.1 The PublicationManifest Dictionary

required sequence<LinkedResource> readingOrder;
like the resources
sequence<LinkedResource> resources = [];

I would have expected
required sequence<LinkedResource> readingOrder = [];

Letting the author control device orientation

This is based on my gap analysis between EPUB 3.2 and WP: w3c/wpub#176 (comment)

In EPUB, an author can indicate if an entire publication or a given resource should be displayed using a specific device orientation.

This is often used on Fixed Layout publications, where the orientation is tied to the nature of the resource.

[WR] Use of schema.org

  • The spec describes the use of https://schema.org as a required element of @context. This is common practice, but @danbri has expressed frustration for schema.org being part of the execution path of JSON-LD. We recommend that processors cache popular contexts such as schema.org, and you might as well. As the JSON-LD WG is including embedded HTML support, including for contexts, it's possible that in the future, schema.org will not perform content negotiation at https://schema.org for application/ld+json and will instead include an embedded JSON-LD context in a script element which does something like load http://schema.org/docs/jsonldcontext.json through something like the following:

    <html><head><script type="application/ld+json">
      {
        "@context": "http://schema.org/docs/jsonldcontext.json"
      }
    </script></head></html>

    I'm not sure what action you'd take upon something so prospective, particularly given that many specs reference `https://schema.org` (or equivalent), but you should be aware.

  • Some of the examples have invalid JSON syntax, for example the second part of Example 5 is missing a comma (",") after "type" : "Person". The JSON-LD specs have some infrastructure to extract all examples and perform validation of them (both syntactic and semantic), which can find such issues. Of course, this is not simple, but people copy and paste such examples, so it's good that they be validated.

  • In Example 6, the "resources" property describes that "datatypes.svg" is treated as a relative URL, however the term definition for "resources" seems inconsistent with this, as when this example is expanded (playground link) it is interpreted as a value.

  • Example 8 uses ItemList and itemListElement. Note that the values of itemListElement are not actually ordered, and order is specified using itemListOrder (which I suppose defaults to Unordered) or schema:position in a schema:ListItem value. By using ItemList it may give the impression that values of itemListElement are ordered, which they are not. You might want to be clear about this.

  • Note that the specification of text direction in JSON-LD is under active discussion (as @iherman well knows) and there is hope that some solution may be forthcoming.

  • The "links" property is described as having string values interpreted as URLs, but when trying this in the playground, they expand to values, not ids. The term definition for "links" should include "@type": "@id".

  • Not schema.org related, but many properties (e.g., "encodingFormat") do not have a language. If it's possible that a publisher might put, say, "@language": "fr" in the context, this could cause properties that specifically should not have a language to gain one. You might consider adding "@language": null to the term definitions of such properties.
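The syntactic half of the example validation mentioned above is cheap to approximate. The sketch below is not the JSON-LD specs' infrastructure; it simply assumes the examples have already been extracted into a dict of strings (the extraction step and the function name are hypothetical) and reports which ones fail to parse.

```python
import json


def check_examples(examples):
    """Return {name: location and message} for examples that are not valid JSON."""
    problems = {}
    for name, text in examples.items():
        try:
            json.loads(text)
        except json.JSONDecodeError as err:
            problems[name] = f"line {err.lineno}, column {err.colno}: {err.msg}"
    return problems


examples = {
    "valid": '{"type": "Person", "name": "Jane Doe"}',
    # Missing comma after "Person", like the problem reported for Example 5.
    "missing-comma": '{"type": "Person" "name": "Jane Doe"}',
}
problems = check_examples(examples)
```

Semantic checks, such as whether "links" values expand to ids rather than plain values, would need a JSON-LD processor and are out of scope for this sketch.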

Validate expected value types

While we validate some specific values for syntax (e.g., dates), we don’t say anything about what to do if a property doesn’t have its expected value, for example:

   "author": true

would silently slip through the algorithm without warning, even though it is supposed to be a compact localizable string.

We should add a general step to the validation section that says for all properties with a known value type, issue a warning if the value does not match that type.
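A general step of this kind might be sketched as a table of expected types plus one loop. The table below is illustrative only: which JSON types each property accepts is an assumption here, and the real list would have to be derived from the property definitions in the spec.

```python
# Hypothetical mapping from property name to acceptable Python types
# after JSON parsing (str, dict, and list stand for string, object, array).
EXPECTED_TYPES = {
    "author": (str, dict, list),
    "url": (str, list),
    "readingOrder": (list,),
}


def type_warnings(manifest):
    """Return one warning per known property whose value has the wrong type."""
    warnings = []
    for name, expected in EXPECTED_TYPES.items():
        if name in manifest and not isinstance(manifest[name], expected):
            actual = type(manifest[name]).__name__
            warnings.append(f"'{name}' has unexpected type {actual}")
    return warnings
```

With this in place, the boolean author value from the example above would produce a warning instead of slipping through silently.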

Text base direction again — but now with a solution:-)

The issue of base direction has plagued us for a while (see also issue 354 on wpub), but we may have a proper solution for it now. There have been discussions elsewhere to look for a solution, a (failed) attempt to revive the discussion in RDF land and, finally, a breakout session at TPAC. This led the JSON-LD WG to re-open the issue of adding base direction to JSON-LD 1.1. At its F2F meeting at TPAC, the JSON-LD WG accepted a series of resolutions; see the minutes of Thursday and Friday for the details.

The essence is: JSON-LD 1.1 will introduce a new keyword @direction that can be used, essentially, the same way as @language: can be part of the context to denote a global (ie, default) value, and can also be used as part of an individual literal. In our terminology, a LocalizableString in our manifest can have a directional value, if needed, just the same way as we handle language tags.

What I propose is to make changes on the manifest specification to adopt this feature. What it would mean for the manifest is:

The result, I believe, would be a much cleaner format for our manifest, and we can put this issue to rest at last.

There is one caveat, though. This is a JSON-LD 1.1 feature, i.e., we become dependent on JSON-LD 1.1. This, by itself, is not a problem; JSON-LD 1.1 is slightly ahead of us in its advance towards a Rec. However, we have to be careful to use this feature in a way that does not upset JSON-LD 1.0 processors, which are supposed to simply ignore an unknown keyword. Without going into details, this means that we should not create the direction alias for @direction, as we do for, e.g., @language and @value. As a consequence, I would propose that we remove the usage of these aliases and use @language and @value directly; otherwise it looks very inconsistent.

As soon as the JSON-LD 1.1 editor's draft includes the new feature, I will put in a PR to adopt these changes in the manifest document. We can decide, through the PR, whether we agree with these changes.

(This should supersede, ie, close, issue #39, and also make the discussions in w3c/wpub#354 moot. Finally, it should close the gap in the i18n review in #38.)
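To illustrate how a localizable string might carry direction the same way it carries language, here is a small sketch. It uses the unaliased keywords @value, @language, and @direction as proposed above; the helper name and its default-handling behavior are assumptions, not text from either specification.

```python
def to_literal(value, default_language=None, default_direction=None):
    """Normalize a string or literal object, filling in global defaults."""
    if isinstance(value, str):
        literal = {"@value": value}
    else:
        literal = dict(value)
    # Item-level settings win; defaults only fill the gaps.
    if default_language is not None and "@language" not in literal:
        literal["@language"] = default_language
    if default_direction is not None and "@direction" not in literal:
        literal["@direction"] = default_direction
    return literal


plain = to_literal("Moby Dick", "en", "ltr")
explicit = to_literal(
    {"@value": "HTML و CSS", "@language": "ar", "@direction": "rtl"}, "en", "ltr"
)
```

A JSON-LD 1.0 processor would be expected to ignore the unknown @direction keyword, which is why the sketch avoids any alias for it.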

Cc: @r12a @aphillips @mattgarrish @wareid @BigBlueHat @laudrain @llemeurfr @GarthConboy @rdeltour @dauwhe

Review of i18n self-review

I have done the official i18n self-review in issue #38. Before adding the extra label to ping the i18n people, the WG should have a look at the individual items and, if possible, check whether I have made a mistake.

I have left two issues open:

  • It should be possible to associate a metadata-type language declaration (which indicates the intended use of the resource rather than the language of a specific range of text) with multiple language values.
  • Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where users can provide part(s) of their name that you need to use for a specific purpose.

We do not fulfill those, and the question is whether we have proper arguments to stay that way or whether we would add additional features accordingly. The translation, into our world, is:

  • At the moment we have the inLanguage, defined as a single language tag, and that signals the publication's language and the default language used for the metadata terms (title, etc). Do we need the possibility to have an array of languages? What happens with multilingual ebooks, for example?
  • At the moment the name of a person is one or several text terms (i.e., the person's name in English and Japanese). Do we need extra fields (an example is to specify the name used for sorting)? If so, how do we do that, knowing that having universal vocabularies for names can quickly become a very complicated nightmare...

I think, for both cases, the responses should be based on industry practice and business usage...

General issue with schemas

It would be helpful to have a readme file for the schemas. Many people involved in this work have absolutely no experience with JSON schemas. Some basic questions:

  1. What version of the JSON schema spec are we targeting? From my limited searching, this seems to make a significant difference.

  2. Have our schemas been tested with particular validators? What might these be? Are they easy to set up?

  3. Which schema do we actually use? I'm guessing that publication.schema.json is the master, and imports the others.

PING self review

PING Questionnaire for Publication Manifest

The answers below often reference the potential to expose information about a user based on the metadata contained in the publication manifest. It should be noted that the same or similar information could be gathered from a user simply reading a publication online using existing web technologies, so it is not clear that this format introduces any new surfaces for gathering PI, PII, or tracking. In addition to the information contained in this spec, there are other technologies it builds upon which are not covered here, including JSON-LD, HTML, CSS, HTTP, and HTTPS.

2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
As a data format, this specification does not call for any additional data to be exposed to a web site. While a web site could infer information about a user based on the content of the manifest (for example, an author they may be interested in), that would be true of the content of any web page (for example, a fan page in HTML about that author). WebIDL is used to describe the processing model for the content, but it is not intended to be used to expose information via an API.
2.2. Is this specification exposing the minimum amount of information necessary to power the feature?
There are multiple use cases for the content of this manifest. For instance, it could be delivered directly to a consumer, it could be sent to a digital storefront, or it could be used to archive the content. As such, not all data that could be encapsulated by the format will always be required. However, significant effort was put into determining the least amount of information required to make a publication useful, and only that limited set is required. Only information entered by the author is contained in the format, and authors have full control over what information will be added.
2.3. How does this specification deal with personal information or personally-identifiable information or information derived thereof?
Neither PI nor PII is included in the format. Information about the author(s), content, etc. may be included; however, no mechanism is provided by the specification to include identifiable information automatically.

2.4. How does this specification deal with sensitive information?
This specification does not address how sensitive information should be handled. As a data format, no API is proposed to expose data to the web and therefore no mechanism is proposed to protect such distribution. Information about a personal library, reading habits, or other information gleaned from a publication or group of publications should be considered sensitive information. Since this specification does not address transmission of that data, it is up to existing web standards to provide adequate protections (for example, using https instead of http).
2.5. Does this specification introduce new state for an origin that persists across browsing sessions?
This specification does not directly allow browsers to persist state across sessions. While downloaded content could contain state about a user, no mechanism is provided by the specification for a website to access that downloaded content.
2.6. What information from the underlying platform, e.g. configuration data, is exposed by this specification to an origin?
This specification does not expose any data to an origin. But, see 2.8, below.
2.7. Does this specification allow an origin access to sensors on a user’s device?
No.
2.8. What data does this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.
This specification does not expose any additional information to an origin. Note that it may reference other documents (for example, HTML) that could expose data. Since this specification does not alter the processing model for those other formats, it does not introduce any new data exposure.

2.9. Does this specification enable new script execution/loading mechanisms?
No. It does reference documents (via the manifest) which in turn might enable script loading mechanisms, but this is no different than clicking on a link.
2.10. Does this specification allow an origin to access other devices?
No.
2.11. Does this specification allow an origin some measure of control over a user agent’s native UI?
The specification itself does not provide a mechanism for overriding native UI. It is expected that implementations of this specification could allow such control, but such implementations would simply be web apps, which are not defined by this spec.
2.12. What temporary identifiers might this specification create or expose to the web?
No temporary identifiers are created. A web publication itself has a permanent identifier (see https://www.w3.org/TR/pub-manifest/#canonical-identifier), but no mechanism is provided to expose that to external sites.
2.13. How does this specification distinguish between behavior in first-party and third-party contexts?
This specification does not change the processing model of the resources it references, therefore it does not distinguish between first and third parties. It is possible to create a manifest that references third party resources, but the standard processing models for the relevant formats and protocols handle such context switches. For example, a third-party font could be loaded via first party CSS, or the last item in the reading order could be hosted on another site, which will be handled as any other third party resource or page load by a UA.
2.14. How does this specification work in the context of a user agent’s Private Browsing or "incognito" mode?
Since this specification does not alter the UA processing model for documents, it has no impact on private mode.
2.15. Does this specification have a "Security Considerations" and "Privacy Considerations" section?
Yes.
2.16. Does this specification allow downgrading default security characteristics?

Yes.

PING Questionnaire for the Audiobook Profile of Publication Manifest
Please refer to the Publication Manifest questionnaire for a review of that specification. The answers for this specification are largely the same as this profile is intended to refine the manifest requirements of that specification. It does add a non-normative reference to the Lightweight Packaging Format, but does not define that format. It also adds placeholder sections for privacy and security. Otherwise the answers are the same as for the publication manifest.

Name of page that links/embeds manifest

The "primary entry page" was very specific to the idea of a web publication, but the concept lives on to a small degree in the manifest spec where we need to talk about the page that links to the manifest.

It's named the "publication" entry page for now, but we should review if there's a way to write the term out entirely or if there's an even more general name for it.

Minor issue on Identifiers (2.7.1.6.)

The text says:

Identifiers are used to refer to Web Content in a persistent and unambiguous manner

That is probably too restrictive; identifiers can also be used to identify persons (as actually referred to in the definition of entities).

Probably something like:

Identifiers are used to refer to Web Content, Persons, or Organizations in a persistent and unambiguous manner

would be enough for our purposes (without getting into a discussion on what 'identifier' means in general...)

Reliance on type for profiles

I hate to revisit old issues, but given the mixture of properties now in the core I'm wondering how reliable type declarations are as a means of differentiating profiles?

As I understand it, the synchronized media specification uses duration, but duration is defined in Audiobook. So if I want to synchronize some other format, and also want to be honest to schema.org, I'd have to declare it as a type.

But how do we make sense of this? Does the order of type declarations matter?

More problematic is that we don't say anything about this being the means of identifying profiles. Should there also be a registry of reserved types somewhere?

Inheriting (or not) the language tag of a <script> element

(This issue has been noted in the WPUB spec for a while, and was never recorded.)

The current editors' draft says:

If the manifest is embedded in the primary entry page via a script element, and the manifest does not set the global language and/or the base direction (see § 2.6.3.4.1 Global Language and Direction), the lang and the dir attributes of the script element are used as the global language and base direction, respectively.

It must be noted that the JSON-LD 1.1 draft does not have this behavior, and the lang and dir attributes of the <script> element are ignored. We may want to remove this behavior from WPUB as well, to stay in sync.
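The difference between the two behaviours can be sketched as below. This is only an illustration, not the spec's algorithm: it assumes the manifest sets its global values through `inLanguage` and `inDirection` keys and that the script element's attributes are available as a plain dictionary. The function name `resolve_global_language` and the `inherit_from_script` switch are hypothetical.

```python
from typing import Optional, Tuple


def resolve_global_language(
    manifest: dict,
    script_attrs: dict,
    inherit_from_script: bool = True,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (language, direction) for a manifest embedded in a script element.

    inherit_from_script=True follows the quoted editors' draft wording: the
    script element's lang/dir are used only when the manifest does not set
    the value itself. inherit_from_script=False follows the JSON-LD 1.1
    draft, where those attributes are ignored.
    """
    # Assumed key names for the manifest's global language and direction.
    language = manifest.get("inLanguage")
    direction = manifest.get("inDirection")
    if inherit_from_script:
        if language is None:
            language = script_attrs.get("lang")
        if direction is None:
            direction = script_attrs.get("dir")
    return language, direction
```

Dropping the inheritance would amount to always calling this with `inherit_from_script=False`, so a manifest without `inLanguage` would end up with no global language even when the enclosing script element carries a `lang` attribute.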

Minor issue on identifiers: add it to persons and organizations, too...

In areas like scholarly publishing, the precise identification of a person is essential. ORCID is routinely used for that, but there are also others (VIAF, ISNI...). The current text refers to id as the canonical identifier of the publication or for a Person/Organization. It also refers to the generic schema.org identifier property, but the latter is only called out for the publication.

Proposal:

  1. The identifier property should also be called out explicitly for a Person or an Organization (in section 2.7.3.4), i.e., should also be part of the WebIDL specification, with a value of an Array of string values.
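As a rough sketch of what the proposal would mean for a processor, the helper below brings the `identifier` of a Person or Organization object into the proposed array-of-strings shape. Accepting a single string and silently dropping non-string entries are assumptions made for the illustration, the name `normalize_identifiers` is hypothetical, and the ORCID value in the usage note is a placeholder rather than a real record.

```python
def normalize_identifiers(entity: dict) -> dict:
    """Return a copy of a Person/Organization object with identifier as a list of strings."""
    normalized = dict(entity)
    value = entity.get("identifier")
    if value is None:
        # The property stays optional; nothing is added when it is absent.
        return normalized
    if isinstance(value, str):
        # Assumption: a single string is accepted and wrapped in an array.
        value = [value]
    # Assumption: entries that are not strings are dropped.
    normalized["identifier"] = [v for v in value if isinstance(v, str)]
    return normalized
```

For example, `normalize_identifiers({"type": "Person", "name": "Jane Doe", "identifier": "https://orcid.org/0000-0000-0000-0000"})` would yield the same object with `identifier` turned into a one-item array.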

Usage of JSON-LD language maps in WPM?

(This is a spin-off from w3c/wpub#287; raised it separately to follow the discussions better.)

The current setup for localizable strings is to use either a simple string (inheriting the language set via inLanguage, if available, otherwise no language is provided) or an object of the form:

{
    "@value": "The string",
    "@language": "en"
}

When we have multilingual values, at the moment the only option is to use a mix of these in an array:

{
    "name": [
        "The Three Musketeers",
        {
            "@value": "Les Trois Mousquetaires",
            "@language": "fr"
        }
    ]
}

JSON-LD has an alternative notion, i.e., language maps, that would allow a more concise formulation:

{
    "name": {
        "en": "The Three Musketeers",
        "fr": "Les Trois Mousquetaires"
    }
}

Question: should we rely on language maps instead of (or maybe additionally to?) the current "@value"+"@language" approach?
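To make the comparison concrete, here is a minimal sketch of how the current array form could be folded into a language map. It is only an illustration: the function name `to_language_map` is hypothetical, plain strings are assumed to inherit a default language (standing in for `inLanguage`), and strings with no language at all are filed under the `@none` key that JSON-LD 1.1 language maps allow.

```python
def to_language_map(values, default_language=None):
    """Fold a list of strings and {"@value", "@language"} objects into a language map."""
    language_map = {}
    for item in values:
        if isinstance(item, str):
            # A plain string inherits the default language, if there is one.
            text, language = item, default_language
        else:
            text = item["@value"]
            language = item.get("@language", default_language)
        key = language if language is not None else "@none"
        language_map.setdefault(key, []).append(text)
    # A language with a single value maps to a string, otherwise to a list.
    return {k: v[0] if len(v) == 1 else v for k, v in language_map.items()}
```

Applied to the `name` array above with `default_language="en"`, this produces the two-entry map shown in the language map example. One thing the sketch makes visible is that a language map loses the ordering between values in different languages, which the array form keeps.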

Should we use JSON schemas as part of the spec?

(This is a spin-off of #11.)

The manifest defines some sort of a subset of JSON-LD. It may be a good idea to use JSON Schemas to define that subset more formally.

(JSON Schemas is a moving target, so the reference can only be informative, though.)

Inclusion of certifiedBy, conformsTo and certifiersCredential

We specifically added the optional accessibility-report in 2.8.1.1 Accessibility Report.

I would propose also including accessibility-conformsTo, accessibility-certifiedBy, and accessibility-certifiersCredential, since these metadata properties are currently being added to conformant EPUBs in the US. Macmillan Learning, for one, is publishing EPUBs with this information, and being able to include it in a web manifest would, I think, be equally important.
Self-certification is also possible, and I believe Pearson is already doing this as well.

Trim examples

The bulk of our examples include @context and type declarations, even though this information isn't specifically relevant to the examples.

Let's trim these to just the information readers need to pay attention to.

Should the canonical identifier resolve to a preferred version?

Without the web publications underpinnings, there isn't as strong a case for recommending URLs for the canonical identifier at the manifest level.

Should we consider removing this recommendation and leaving it to implementations to decide whether URLs are preferred?

The possibility to add a `type` to `LinkedResource` or `LocalizableString` should be in the spec

At the moment, we talk about LinkedResource and LocalizableString as separate object types. Per JSON-LD it should be possible to add these explicitly to the objects; this should be reflected in the respective definitions (https://w3c.github.io/pub-manifest/#app-linkedResource and https://w3c.github.io/pub-manifest/#dom-localizablestring, respectively), and also reflected in the respective WebIDL.

Obviously, both of these are optional, and it is really for JSON-LD geeks only. But it should be possible...

(I do not believe this issue requires WG Discussion; it is editorial only...)

Should the manifest set minimum property requirements?

The web publications implementation of the manifest defined the required and recommended sets of properties. That didn't come across with part 1.

Should we add a section that sets a similar common base of properties for all manifests regardless of implementation?

If so, should we stop recommending properties and only define what is critical and leave it to implementations to recommend from the rest plus add whatever they need?

Metadata rendering "hints" from epub we should consider for manifests

As discussed on the March 18th, 2019 WP call, here is a list of rendering related metadata that epub supports and has found some traction in the publishing/reading system community. This is not an exhaustive list, it is intended to contain only those settings that seem to have actual use.

Metadata can be specified at the publication level (applies to the entire publication), the item level (applies to a section of the publication), or both.

page-progression-direction controls the direction (left or right) that pages should turn when implementing next page functionality. Used extensively in Japan, but has traction for other languages. Frequent use and has been implemented multiple times. Critical to support, otherwise some content will be broken. May not be needed for scrolled content, or for all UIs. Publication level.

flow-[auto|paginated|scrolled-continuous|scrolled-doc] indicates whether the content is intended to be paginated or scrolled and, if scrolled, whether it is continuous over multiple items. Differs from CSS @page: that describes styling when content is paginated, whereas this specifies whether pagination should occur. Unclear how common this is in practice, though I believe there are some implementations. Both item and publication levels, but unclear if mixed content exists.

layout-[pre-paginated|reflowable] indicates whether an item should be considered a single, high design "page", or whether it is a stream of 1 or more pages. Some overlap with the flow-* properties, above. Widely used and implemented. Both levels, but unclear how common mixed content is.

orientation-[auto|landscape|portrait] hints about the overall aspect ratio of the content. Could be (and is) used to control how a book opens on phones and tablets (auto switches device orientation). Widely used, sometimes correctly. Often coupled with spread-* properties. Both levels.

spread-[auto|both|none|landscape|portrait] indicates when and how synthetic spreads should be generated (that is, when to put pages side-by-side). Widely used and implemented. Both levels.

page-spread-[left|right|center] whether the first (or only) page of an item should appear on the left or right side of (or centered in) the display when showing more than 1 page (that is, in spreads). Widely used and implemented, particularly in pre-paginated content. When missing this can completely break content. Item level.

viewport defines the aspect ratio of pre-paginated content. May also appear in the document content, so may not be needed at the higher level. Both levels.

linear controls whether the item is part of the linear navigation. When true, the item is part of the main publication content; when false, its position indicates where it might appear in a printed publication, but it is not part of the main, linear navigation of the publication. Implementations use this as a hint for how and where to display the content. Item level.
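For the hints listed as applying at both levels, a reading system has to decide which value wins for a given item. The sketch below assumes the item-level value overrides the publication-level one, as EPUB does for its spine overrides; nothing in the list above fixes that rule for manifests. The dictionary shapes and the function name `effective_hint` are hypothetical.

```python
def effective_hint(name, publication_hints, item_hints):
    """Return the hint value that applies to one item, or None if unset at both levels."""
    if name in item_hints:
        # Assumption: an item-level value overrides the publication default.
        return item_hints[name]
    return publication_hints.get(name)


# Hypothetical hint sets: a reflowable, right-to-left publication
# containing one pre-paginated item.
publication = {"layout": "reflowable", "page-progression-direction": "rtl"}
item = {"layout": "pre-paginated"}
```

Here `effective_hint("layout", publication, item)` gives the item's `pre-paginated` value, while `page-progression-direction` falls back to the publication-level `rtl`.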

TAG review of Web Publications

In response to the TAG review request in w3ctag/design-reviews#344 (which originally came from w3c/wpub#384), I wanted to file an issue here (since you requested the filing of a single issue in your repo) with a pointer back to the feedback so far, which is in that issue.

There's a good bit in that issue (most of which I wrote) -- and I don't want to copy it here because I also think it's not quite done -- there were a few other TAG members who wanted to take a look and will hopefully do so soon. However, I wanted to file this in advance of being "fully done" since you suggested that it would be useful to have the feedback prior to your face-to-face meeting next week.

One high level note would be that reading the use cases document made it sound like you were going to do a bunch of things that seem like they might be scary, but reading the actual specification seemed much less scary. I'm not sure whether it's worth going back to the use cases document and saying how the use cases are addressed -- it might depend on how frequently you intend to point people to the use cases document in the future.
