The pub-manifest from w3c

Specification's version in the manifest

Since we created a first prototype of Web Publication based on the current specifications, I wondered how to indicate in the manifest that it refers to the alpha or beta version of the current specifications.

In general, if in a few years we will define Web Publication 2.0, how will a User Agent distinguish current Web Publications from future ones?

json-ld reference

We should reference JSON-LD 1.1 as it will be going to REC around the same time, but it's not clear if our current unnumbered specref reference will be auto-updated when that specification gets published.

Is there a need for both an authored and a canonical manifest?

At the moment, there is an authored and a canonical manifest, with a separate canonicalization step to transform the authored manifest into the canonical one. The goal is to allow the author to express data more succinctly (eg, use only simple file names instead of complete LinkedResource instances or person names instead of Person structures).
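To make the trade-off concrete, here is a minimal sketch of what such a canonicalization step does. It is not the algorithm from the specification: the function name `canonicalize` and the property lists are assumptions made for illustration, and the real algorithm also handles localizable strings, URLs, and defaults.

```python
def canonicalize(authored):
    """Expand shorthand values into full objects (illustrative only)."""
    canonical = dict(authored)

    def as_list(value):
        return value if isinstance(value, list) else [value]

    # A bare string in a resource list becomes a LinkedResource object.
    for key in ("readingOrder", "resources", "links"):
        if key in canonical:
            canonical[key] = [
                {"type": ["LinkedResource"], "url": item} if isinstance(item, str) else item
                for item in as_list(canonical[key])
            ]

    # A bare string for a creator becomes a Person object.
    for key in ("author", "editor"):
        if key in canonical:
            canonical[key] = [
                {"type": ["Person"], "name": item} if isinstance(item, str) else item
                for item in as_list(canonical[key])
            ]
    return canonical


authored = {"author": "Herman Melville", "readingOrder": ["c001.html", "c002.html"]}
canonical = canonicalize(authored)
```

A generator such as a CMS could emit the `canonical` form directly, which is the argument made in #11 for dropping the extra step.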

It was raised, in #11, that the price being paid for having this is too high:

For the few people typing these by hand, [...] but for the vast majority of implementers (i.e. CMS's generating these manifests), I think they'd find the consistency of using the canonical representation (and the lack of "overhead" of needing it to be canonicalized every time...) to be a win. (#11 (comment))

Question: do we want to simplify the manifest by removing this extra step and defining the manifest purely in terms of what is currently called the canonical manifest?

Add linked resource property called 'alternative'

LinkedResource needs a property called alternative, which has as its value a URL of a resource that represents the LinkedResource's content in an alternate modality. It's used as follows:

audio book with mp3s in the reading order; a reading order entry's alternative property points to a text file for that audio (see w3c/pwpub#44)
audio book, as above, but with alternative pointing to a synchronized narration document
text book with HTML files in the reading order; alternative points to a synchronized narration document.

The resource pointed to by alternative appears in the resources list and is processed according to mime type.
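As a rough illustration of that last requirement, the sketch below checks that every alternative is also listed in resources. The function name `unlisted_alternatives` and the file names are hypothetical, and the manifest is assumed to already be in its expanded (object) form.

```python
def unlisted_alternatives(manifest):
    """Return alternative URLs that are not declared in the resources list."""
    listed = {resource["url"] for resource in manifest.get("resources", [])}
    return [
        item["alternative"]
        for item in manifest.get("readingOrder", [])
        if "alternative" in item and item["alternative"] not in listed
    ]


manifest = {
    "readingOrder": [
        {"url": "chapter1.mp3", "encodingFormat": "audio/mpeg", "alternative": "chapter1.txt"},
        {"url": "chapter2.mp3", "encodingFormat": "audio/mpeg", "alternative": "chapter2.txt"},
    ],
    "resources": [{"url": "chapter1.txt", "encodingFormat": "text/plain"}],
}
problems = unlisted_alternatives(manifest)
```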

Reference example: draft of how to incorporate synchronized narration: https://w3c.github.io/sync-media-pub/packaging.html

"Processing the manifest": consider adopting the failure/error terminology

The result of the current algo is either a canonical representation of the manifest or an early termination (which means "the manifest is not valid"). There are also cases where the algo issues a warning but does not terminate.

Most of the other specs we reference (e.g. HTML, URL) instead use a failure/error terminology. A failure is returned when the algorithm terminates early. An error is recoverable.

For instance, in the processing model for images HTML defines parse errors as:

A parse error for algorithms in this section indicates a non-fatal mismatch between input and requirements. User agents are encouraged to expose parse errors somehow.

The URL Standard defines validation error as:

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere.

Note: A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.

I suggest we adopt the same terminology in our spec. The more alignment, the easier it is for spec ~~addicts~~ readers.

Also, incidentally, when the spec says:

If the algorithm terminates early, the manifest is not valid.

I think it is worth noting that the opposite is not necessarily true: a manifest can be invalid even when the algorithm runs to completion. For instance, when the required type property is missing, a default value of CreativeWork will be assumed. I think this is worth clarifying explicitly.
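The distinction can be sketched as follows, assuming a hypothetical `process_manifest` function: a failure stops processing and yields no result, while an error is recorded and processing continues. The names and messages are illustrative, not taken from any specification.

```python
class Failure(Exception):
    """Processing terminated early; no processed manifest is produced."""


def process_manifest(data, errors):
    # Failure: the input cannot be processed at all.
    if not isinstance(data, dict):
        raise Failure("manifest is not a JSON object")

    result = dict(data)
    # Error: recoverable mismatch; report it and apply the default.
    if "type" not in result:
        errors.append("missing 'type'; assuming CreativeWork")
        result["type"] = ["CreativeWork"]
    return result


errors = []
processed = process_manifest({"name": "Moby-Dick"}, errors)
```

Here processing completes even though the input was not valid, which is the case described above: a non-empty `errors` list, not early termination, is what signals the problem.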

Why do we have C.3 Audiobook

Since we have a specific Audiobook profile, I was surprised to see a manifest example for Audiobook here. Are we going to point to the Audiobook profile, or move this over to that spec? If we are showing this manifest example for Audiobooks here, why do we have a completely separate specification for Audiobooks?

Use existing Vocabulary terms from ActivityStreams for Link semantics

Given that ActivityStreams is widely used (thanks to Mastodon, PeerTube, etc), it would make sense for our Link definitions to build up from theirs--at least in the places where Schema.org's vocabulary is lacking:
https://www.w3.org/TR/activitystreams-vocabulary/#dfn-link

The Web of Things spec also has a nearly identical definition and hopefully they could be convinced to match the ActivityStreams terminology also.

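| WPUB | ActivityStreams | Web of Things |
|---|---|---|
| url | href* | href |
| encodingFormat | mediaType | mediaType |
| name | name | |
| description | | |
| rel | rel | rel |
| | hreflang | |
| | height | |
| | width | |
| | | anchor |
| | preview | |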

ActivityStreams also defines url which has a range of xsd:anyURI (i.e. a "raw" URL in the form of http://example.com/) OR a Link object whereas their href is defined with a range of only xsd:anyURI

At the very least, it would be helpful if the three communities could coordinate on their use of rel and mediaType. I've also found the distinction between url and href to be valuable.

Context files for each community:

must inLanguage be an array?

I was trying out the JSON schema today, and got an error for

"inLanguage": "en";

I suspect that others will be tripped up by this.

Additional values for readingProgression

readingProgression defines the values ltr and rtl. Should it also include top-to-bottom and/or bottom-to-top, as raised in #47?

How Metadata Works in the Publishing World

In much of the EPUB world, the metadata that matters is not inside the EPUB, but outside (in the form of ONIX). The metadata inside EPUBs is often wrong, is difficult to change, and there is very little incentive to make it accurate since it's mostly unused.

In the web world, page metadata directly affects search ranking, Google rich snippets, etc. There is no out-of-band transmission of metadata. There is strong incentive to make it accurate.

How do we avoid the situation with EPUB, where we've spent decades worrying about metadata, continually changing how it's expressed, without really benefiting users?

trimming whitespace

We don't say anything about trimming whitespace in the processing algorithm, or generally in the specification, and JSON/JSON-LD don't define rules (that I've found). The only statement I can find in our spec about trimming is in the definition for non-empty, which just defers to other specifications for the rules.

Adding a step to trim all property names and values is simple enough, but what is the expectation if two properties end up with the same name after trimming:

   "name": "John Doe",
   "name ": "Jane Doe"

Is the second discarded or is it expected that the user agent make a "name" array for its internal representation?

Or do we avoid trimming property names (only do it for values) and the second instance just ends up an unrecognized property?
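The collision is easy to reproduce. The sketch below uses the `object_pairs_hook` parameter of Python's `json.loads` to see every member before duplicates are collapsed; the helper name `load_with_trimmed_names` is an assumption. It trims property names and collects all values under the trimmed name, which corresponds to the "make an array" option above, so every value ends up in a list.

```python
import json


def load_with_trimmed_names(text):
    """Parse JSON, trimming property names and keeping colliding values."""
    def hook(pairs):
        merged = {}
        for name, value in pairs:
            # Names that differ only in surrounding whitespace share one key.
            merged.setdefault(name.strip(), []).append(value)
        return merged

    return json.loads(text, object_pairs_hook=hook)


doc = load_with_trimmed_names('{"name": "John Doe", "name ": "Jane Doe"}')
```

Discarding the second value instead would amount to keeping only the first element of each list.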

Editorial error in § 2.7.3.6.2 Item-specific Language

§ 2.7.3.6.2 Item-specific Language
https://www.w3.org/TR/pub-manifest/#manifest-specific-language-and-dir

The last two paragraphs are almost the same:

Once the base direction has been identified, user agents MUST determine the appropriate rendering and display of natural language values according to the Unicode Bidirectional Algorithm [bidi]. This could require wrapping additional markup or Unicode formatting characters around the string prior to display, in order to apply the base direction.

Once the base direction has been identified, user agents MUST determine the appropriate rendering and display of natural language values according to the Unicode Bidirectional Algorithm[bidi]. This could require wrapping additional markup or control characters around the string prior to display, in order to apply the base direction. (See § D. Examples for bidirectional texts.

URLs shouldn't be required to dereference to a resource

Section 2.1.7.5 URLs says:

URLs MUST dereference to a resource, although user agents are not required to dereference all URLs in the manifest.

This is probably wrong: I don't think anyone can enforce that (it's certainly not testable, at least not in a consistent manner), and 404s are an inherent part of the Web.

Maybe remove that statement?

I18N Self-review

Short i18n review checklist is here. The relevant documents are:

The second document is largely based on the first, and adds comparatively very little; these additions are all irrelevant in terms of internationalization.

Note, also, that the first document was originally developed under the name "Web Publication" in a separate repository, i.e., the earlier issues referred to are in the https://github.com/w3c/wpub repository.

Self test

(Only the relevant "sub" forms below have been copied from the i18n form. Other entries in that form are non-applicable.)

Spreads and having control over them

This is based on my gap analysis between EPUB 3.2 and WP: w3c/wpub#176 (comment)

In EPUB, there's a concept of spreads (mostly through the package rendering vocabulary) where two resources can be displayed next to one another, and where we also give the author and UA control over how resources are displayed.

This is mostly used in Fixed Layout publications and is useful for comics, kid books and textbooks among others.

Canonicalization algorithm should incorporate `@base`

The current algorithm is based on the "incoming" base value only. However, the author may use the JSON-LD @base term as part of its context, which would then overwrite the value of base. This is not accounted for in the algorithm.
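One possible way to account for this is sketched below, under the assumption that a `@base` found in an embedded context object is resolved against the incoming base and then replaces it. The function name `effective_base` and the example URLs are hypothetical; a null `@base` is simply ignored here, which is a simplification.

```python
from urllib.parse import urljoin


def effective_base(manifest, incoming_base):
    """Return the base URL after applying any @base set in @context."""
    contexts = manifest.get("@context", [])
    if not isinstance(contexts, list):
        contexts = [contexts]

    base = incoming_base
    for context in contexts:
        # Only embedded context objects can carry @base; URL strings cannot.
        if isinstance(context, dict) and context.get("@base") is not None:
            base = urljoin(base, context["@base"])
    return base


manifest = {
    "@context": [
        "https://schema.org",
        "https://www.w3.org/ns/wp-context",
        {"@base": "https://example.org/book/"},
    ],
    "readingOrder": ["c001.html"],
}
base = effective_base(manifest, "https://example.com/pub/manifest.json")
resolved = urljoin(base, manifest["readingOrder"][0])
```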

Need for both inDirection and readingProgression

Per the discussion in PR #47, there are questions about these two properties and potential overlaps between them.

inDirection claims to specify the text direction for placing menus, etc., while
readingProgression claims to specify resource direction for placing menus, etc.

Both define ltr and rtl as their expected values, with readingProgression having a default of ltr and inDirection having no default. Do we need both?

Using rel="publication"

I'd like to re-visit our decision to roll our own rel value (publication) for detecting the WP manifest.

As I've said in the past, I don't think there's any good reason why we can't use manifest instead:

if we adopt the WAM for our serialization, we'll need to use it anyway
even if we don't adopt the WAM, the definition for manifest is a perfect fit for our use case (https://w3c.github.io/manifest/#link-relation-type-registration) and we can easily identify the WP manifest using a media type

We could also immediately get rid of the first section of the lifecycle as well, and simply reference the WAM section instead.

I really think that this is a straightforward decision, and an easy win (less spec language and monkey patching in our draft).

Handling of invalid values

We state to issue warnings when certain values are determined to be invalid, but that's dodging the issue of what a user agent has to do in these cases.

We can leave it entirely up to the user agent to determine what to do with invalid values, but we might want to look at the cases more closely to avoid implementation ambiguity.

Where the invalidity isn't significant, like dates, we might state, for example, that the reading system should not include the property in its internal representation. Likewise, if there's a default, as with reading progression, that should probably be substituted for the invalid value.
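The two behaviours suggested above (drop the property, or substitute its default) could look roughly like this. The tables `DEFAULTS`, `ALLOWED`, and `DATE_PROPERTIES` and the function `clean` are assumptions for illustration, and the date check relies on Python's `datetime.fromisoformat`, which is stricter than the date syntax the specification allows.

```python
from datetime import datetime

DEFAULTS = {"readingProgression": "ltr"}
ALLOWED = {"readingProgression": {"ltr", "rtl"}}
DATE_PROPERTIES = ("datePublished", "dateModified")


def is_valid_date(value):
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def clean(manifest, warnings):
    """Drop invalid dates; replace invalid enumerated values by defaults."""
    result = dict(manifest)
    for prop in DATE_PROPERTIES:
        if prop in result and not is_valid_date(result[prop]):
            warnings.append(f"invalid {prop}: property dropped")
            del result[prop]
    for prop, allowed in ALLOWED.items():
        value = result.get(prop)
        if prop in result and (not isinstance(value, str) or value not in allowed):
            warnings.append(f"invalid {prop}: using default")
            result[prop] = DEFAULTS[prop]
    return result


warnings = []
cleaned = clean({"datePublished": "not-a-date", "readingProgression": "up"}, warnings)
```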

Shouldn't reading order be an array

2.3.1 The PublicationManifest Dictionary

required sequence readingOrder;
like the resources
sequence resources = [];

I would have expected
required sequence readingOrder = [];

Letting the author control device orientation

This is based on my gap analysis between EPUB 3.2 and WP: w3c/wpub#176 (comment)

In EPUB, an author can indicate if an entire publication or a given resource should be displayed using a specific device orientation.

This is often used on Fixed Layout publications, where the orientation is tied to the nature of the resource.

Media type for Publication Manifest

We still have a Processing the Manifest section, as noted in #60 (comment). As long as that exists--i.e. as long as this specification requires more than JSON and/or JSON-LD processing algorithms--it will need its own media type to signal such processing.

Custom Type Definition Appendix is confusing

LinkedResource is a type meant to be used within the JSON document, but CreatorInfo is not--as I understand its use...and LocalizableString would actually create invalid JSON-LD if used in the document...
https://www.w3.org/TR/pub-manifest/#app-custom-types

We should either clarify their use ("internal representation only") or put them in separate sections or make them more clearly distinct "animals."

manifest processing model, what if null base URL? (related to origin issue)

Issue originally raised in the "opaque origin" conversation:
w3c/wpub#321 (comment)

[WR] Use of schema.org

The spec describes the use of https://schema.org as a required element of @context. This is common practice, but @danbri has expressed frustration for schema.org being part of the execution path of JSON-LD. We recommend that processors cache popular contexts such as schema.org, and you might as well. As the JSON-LD WG is including embedded HTML support, including for contexts, it's possible that in the future, schema.org will not perform content negotiation at https://schema.org for application/ld+json and will instead include an embedded JSON-LD context in a script element which does something like load http://schema.org/docs/jsonldcontext.json through something like the following:
```
<html><head><script type="application/ld+json">
  {
    "@context": "http://schema.org/docs/jsonldcontext.json"
  }
</script></head></html>
```
I'm not sure what action you'd take upon something so prospective, particularly given that many specs reference `https://schema.org` (or equivalent), but you should be aware.
Some of the examples have invalid JSON syntax, for example the second part of Example 5 is missing a comma (",") after "type" : "Person". The JSON-LD specs have some infrastructure to extract all examples and perform validation of them (both syntactic and semantic), which can find such issues. Of course, this is not simple, but people copy and paste such examples, so it's good that they be validated.
In Example 6, the "resources" property describes that "datatypes.svg" is treated as a relative URL, however the term definition for "resources" seems inconsistent with this, as when this example is expanded (playground link) it is interpreted as a value.
Example 8 uses ItemList and itemListElement. Note that the values of itemListElement are not actually ordered, and order is specified using itemListOrder (which I suppose defaults to Unordered) or schema:position in a schema:ListItem value. Using ItemList may give the impression that values of itemListElement are ordered, which they are not. You might want to be clear about this.
Note that the specification of text direction in JSON-LD is under active discussion (as @iherman well knows) and there is hope that some solution may be forthcoming.
The "links" property is described as having string values interpreted as URLs, but when trying this in the playground, they expand to values, not ids. The term definition for "links" should include "@type": "@id".
Not schema.org related, but many properties (e.g., "encodingFormat") do not have a language. If it's possible that a publisher might put, say, "@language": "fr" in the context, this could cause properties that specifically should not have a language to gain one. You might consider adding "@language": null to the term definitions of such properties.

Validate expected value types

While we validate some specific values for syntax (e.g., dates), we don’t say anything about what to do if a property doesn’t have its expected value, for example:

   "author": true

would silently slip through the algorithm without warning, even though it is supposed to be a compact localizable string.

We should add a general step to the validation section that says for all properties with a known value type, issue a warning if the value does not match that type.
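Such a general step might be sketched like this. The `EXPECTED` table is a hypothetical, partial mapping from property names to acceptable JSON value types in the authored manifest; a real implementation would derive it from the property definitions.

```python
# Hypothetical, partial table: property name -> acceptable Python types
# for the parsed JSON value (string, object, or array shorthand forms).
EXPECTED = {
    "name": (str, dict, list),
    "author": (str, dict, list),
    "readingOrder": (str, dict, list),
    "abridged": (bool,),
}


def check_value_types(manifest):
    """Return one warning per known property whose value has the wrong type."""
    warnings = []
    for prop, value in manifest.items():
        expected = EXPECTED.get(prop)
        if expected is None:
            continue  # unknown properties are not checked here
        if not isinstance(value, expected):
            warnings.append(
                f"{prop}: unexpected value of type {type(value).__name__}"
            )
    return warnings


found = check_value_types({"name": "Moby-Dick", "author": True})
```

This only checks the outermost type; checking the members of arrays and objects would need a recursive step.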

Text base direction again — but now with a solution:-)

The issue of base direction has plagued us for a while (see also issue 354 on wpub), but we may have a proper solution for it now. There have been discussions elsewhere to look for a solution, a (failed) attempt to revive the discussion in RDF land and, finally, a breakout session at TPAC. This led the JSON-LD WG to re-open the issue of adding base direction to JSON-LD 1.1. At its F2F meeting at TPAC, the JSON-LD WG accepted a series of resolutions, see the minutes of Thursday and Friday for the details.

The essence is: JSON-LD 1.1 will introduce a new keyword @direction that can be used, essentially, the same way as @language: can be part of the context to denote a global (ie, default) value, and can also be used as part of an individual literal. In our terminology, a LocalizableString in our manifest can have a directional value, if needed, just the same way as we handle language tags.

What I propose is to make changes on the manifest specification to adopt this feature. What it would mean for the manifest is:

the current direction term disappears and is replaced by a @direction keyword usage alongside @language (see 2.5.1)
item specific settings (in 2.5.2) will include @direction
The definition of LocalizableString will be updated
Some additions to the context file to empower all this.

The result, I believe, would be a much cleaner format for our manifest, and we can put this issue to rest at last.

There is one caveat, though. This is a JSON-LD 1.1 feature, i.e., we become dependent on JSON-LD 1.1. This, by itself, is not a problem: JSON-LD 1.1 is slightly ahead of us in its advance towards a Rec. However, we have to be careful to use this feature in a way that does not upset JSON-LD 1.0 processors, which are supposed to simply ignore an unknown keyword. Without going into details, this means that we should not create the direction alias for @direction, as we do for, e.g., @language and @value. As a consequence, I would propose that we remove the usage of these aliases, and use @language and @value directly; otherwise it looks very inconsistent.

As soon as the JSON-LD 1.1 editor's draft includes the new feature, I will put in a PR to adopt these changes in the manifest document. We can decide, through the PR, whether we agree with these changes.

(This should supersede, ie, close, issue #39, and also make the discussions in w3c/wpub#354 moot. Finally, it should close the gap in the i18n review in #38.)

Cc: @r12a @aphillips @mattgarrish @wareid @BigBlueHat @laudrain @llemeurfr @GarthConboy @rdeltour @dauwhe

Review of i18n self-review

I have done the official i18n self-review in issue #38. Before adding the extra label to ping the i18n people, the WG should have a look at the individual items and, if possible, check whether I have made a mistake.

I have left two issues open:

It should be possible to associate a metadata-type language declaration (which indicates the intended use of the resource rather than the language of a specific range of text) with multiple language values.
Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where users can provide part(s) of their name that you need to use for a specific purpose.

We do not fulfill those, and the question is whether we have proper arguments to stay that way or whether we would add additional features accordingly. The translation, into our world, is:

At the moment we have the inLanguage, defined as a single language tag, and that signals the publication's language and the default language used for the metadata terms (title, etc). Do we need the possibility to have an array of languages? What happens with multilingual ebooks, for example?
At the moment the name of a person is one or several text terms (i.e., the person's name in English and Japanese). Do we need extra fields (an example is to specify the name used for sorting)? If so, how do we do that, knowing that having universal vocabularies for names can quickly become a very complicated nightmare...

I think, for both cases, the responses should be based on industry practice and business usage...

General issue with schemas

It would be helpful to have a readme file for the schemas. Many people involved in this work have absolutely no experience with JSON schemas. Some basic questions:

What version of the JSON schema spec are we targeting? From my limited searching, this seems to make a significant difference.
Have our schemas been tested with particular validators? What might these be? Are they easy to set up?
Which schema do we actually use? I'm guessing that publication.schema.json is the master, and imports the others.

PING self review

PING Questionnaire for Publication Manifest

The answers below often reference the potential to expose information about a user based on the metadata contained in the publication manifest. It should be noted that the same or similar information could be gathered from a user simply reading a publication online using existing web technologies, so it is not clear that this format introduces any new surfaces for gathering PI, PII, or tracking. In addition to the information contained in this spec, there are other technologies it builds upon which are not covered here, including JSON-LD, HTML, CSS, HTTP, and HTTPS.

2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
As a data format, this specification does not call for any additional data to be exposed to a web site. While a web site could infer information about a user based on the content of the manifest (for example, author they may be interested in), that would be true of the content of any web page (for example a fan page in html about that author). WebIDL is used to describe the processing model for the content, but it is not intended to be used to expose information via an API.
2.2. Is this specification exposing the minimum amount of information necessary to power the feature?
There are multiple use cases for the content of this manifest. For instance, it could be delivered directly to a consumer, it could be sent to a digital storefront, or it could be used to archive the content. As such, not all data that could be encapsulated by the format will always be required. However, significant effort was put into determining the least amount of information required to make a publication useful, and only that limited set is required. Only information entered by the author is contained in the format, and authors have full control over what information will be added.
2.3. How does this specification deal with personal information or personally-identifiable information or information derived thereof?
Neither PI nor PII is included in the format. Information about the author(s), content, etc may be included, however no mechanism is provided by the specification to include identifiable information automatically.

2.4. How does this specification deal with sensitive information?
This specification does not address how sensitive information should be handled. As a data format, no API is proposed to expose data to the web and therefore no mechanism is proposed to protect such distribution. Information about a personal library, reading habits, or other information gleaned from a publication or group of publications should be considered sensitive information. Since this specification does not address transmission of that data, it is up to existing web standards to provide adequate protections (for example, using https instead of http).
2.5. Does this specification introduce new state for an origin that persists across browsing sessions?
This specification does not directly allow browsers to persist state across sessions. While downloaded content could contain state about a user, no mechanism is provided by the specification for a website to access that downloaded content.
2.6. What information from the underlying platform, e.g. configuration data, is exposed by this specification to an origin?
This specification does not expose any data to an origin. But, see 2.8, below.
2.7. Does this specification allow an origin access to sensors on a user’s device?
No.
2.8. What data does this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.
This specification does not expose any additional information to an origin. Note that it may reference other documents (for example, HTML) that could expose data. Since this specification does not alter the processing model for those other formats, it does not introduce any new data exposure.

2.9. Does this specification enable new script execution/loading mechanisms?
No. It does reference documents (via the manifest) which in turn might enable script loading mechanisms, but this is no different than clicking on a link.
2.10. Does this specification allow an origin to access other devices?
No.
2.11. Does this specification allow an origin some measure of control over a user agent’s native UI?
The specification itself does not provide a mechanism for overriding native UI. It is expected that implementations of this specification could allow such control, but such implementations would simply be web apps, which are not defined by this spec.
2.12. What temporary identifiers might this specification create or expose to the web?
No temporary identifiers are created. A web publication itself has a permanent identifier (see https://www.w3.org/TR/pub-manifest/#canonical-identifier), but no mechanism is provided to expose that to external sites.
2.13. How does this specification distinguish between behavior in first-party and third-party contexts?
This specification does not change the processing model of the resources it references, therefore it does not distinguish between first and third parties. It is possible to create a manifest that references third party resources, but the standard processing models for the relevant formats and protocols handle such context switches. For example, a third-party font could be loaded via first party CSS, or the last item in the reading order could be hosted on another site, which will be handled as any other third party resource or page load by a UA.
2.14. How does this specification work in the context of a user agent’s Private Browsing or "incognito" mode?
Since this specification does not alter the UA processing model for documents, it has no impact on private mode.
2.15. Does this specification have a "Security Considerations" and "Privacy Considerations" section?
Yes.
2.16. Does this specification allow downgrading default security characteristics?

Yes.

PING Questionnaire for the Audiobook Profile of Publication Manifest
Please refer to the Publication Manifest questionnaire for a review of that specification. The answers for this specification are largely the same as this profile is intended to refine the manifest requirements of that specification. It does add a non-normative reference to the Lightweight Packaging Format, but does not define that format. It also adds placeholder sections for privacy and security. Otherwise the answers are the same as for the publication manifest.

Name of page that links/embeds manifest

The "primary entry page" was very specific to the idea of a web publication, but the concept lives on to a small degree in the manifest spec where we need to talk about the page that links to the manifest.

It's named the "publication" entry page for now, but we should review if there's a way to write the term out entirely or if there's an even more general name for it.

Minor issue on Identifiers (2.7.1.6.)

The text says:

Identifiers are used to refer to Web Content in a persistent and unambiguous manner

That is probably too restrictive; identifiers can also be used to identify persons (as actually referred to in the definition of entities).

Probably something like:

Identifiers are used to refer to Web Content, Persons, or Organizations in a persistent and unambiguous manner

would be enough for our purposes (without getting into a discussion on what 'identifier' means in general...)

Reliance on type for profiles

I hate to revisit old issues, but given the mixture of properties now in the core I'm wondering how reliable type declarations are as a means of differentiating profiles?

As I understand it, the synchronized media specification uses duration, but duration is defined in Audiobook. So if I want to synchronize some other format, and also want to be honest to schema.org, I'd have to declare it as a type.

But how do we make sense of this? Does the order of type declarations matter?

More problematic, is that we don't say anything about this being the means of identifying profiles, and should there be a registry of reserved types somewhere?

Inheriting (or not) the language tag of a <script> element

(This issue has been noted in the WPUB spec for a while, and was never recorded.)

The current editors' draft says:

If the manifest is embedded in the primary entry page via a script element, and the manifest does not set the global language and/or the base direction (see § 2.6.3.4.1 Global Language and Direction), the lang and the dir attributes of the script element are used as the global language and base direction, respectively.

It must be noted that the JSON-LD 1.1 draft does not have this behavior, and the lang and dir attributes of the <script> element are ignored. We may want to remove this behavior from WPUB as well, to stay in sync.

Explicit typing for Linked Resource not necessary?

Question: is the inclusion of

... : {
  "type" : "LinkedResource",
  ...
}

required or optional?

Minor issue on identifiers: add it to persons and organizations, too...

In areas like scholarly publishing, the precise identification of a person is essential. ORCID is routinely used for that, but there are also others (VIAF, ISNI...). The current text refers to id as the canonical identifier of the publication or for a Person/Organization. It also refers to the generic schema.org identifier property, but the latter is only called out for the publication.

Proposal:

The identifier property should also be called out explicitly for a Person or an Organization (in section 2.7.3.4), i.e., should also be part of the WebIDL specification, with a value of an Array of string values.

Usage of JSON-LD language maps in WPM?

(This is a spin-off from w3c/wpub#287; raised it separately to follow the discussions better.)

The current setup for localizable strings is to use either a simple string (inheriting the language set via inLanguage, if available, otherwise no language is provided) or an object of the form:

{
	"@value" : "The string",
    "@language": "en"
}

When we have multilingual values, at the moment the only option is to use a mix of these in an array:

{
   "name" : [
  	    "The Three Musketeers",
  	    {
            "@value": "Les Trois Mousquetaires",
            "@language": "fr"
        }
    ]
}

JSON-LD has an alternate notion, i.e., language maps, that would allow a more concise formulation:

{
   "name" : {
  	    "en": "The Three Musketeers",
        "fr": "Les Trois Mousquetaires"
    }
}

Question: should we rely on language maps instead of (or maybe additionally to?) the current "@value"+"@language" approach?
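If language maps were accepted in addition to the current form, a processor could fold them into the existing representation with a small conversion step. The sketch below is one way to do that; the function name `expand_language_map` is an assumption, and special keys such as JSON-LD's `@none` are not handled.

```python
def expand_language_map(language_map):
    """Turn {"en": "..."} into a list of @value/@language objects."""
    expanded = []
    for language, texts in language_map.items():
        # A language map entry may hold one string or a list of strings.
        for text in texts if isinstance(texts, list) else [texts]:
            expanded.append({"@value": text, "@language": language})
    return expanded


names = expand_language_map(
    {"en": "The Three Musketeers", "fr": "Les Trois Mousquetaires"}
)
```

Since the two forms carry the same information, the choice is mostly about authoring convenience versus the number of shapes a processor has to accept.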

Missing mention of multilingual natural metadata value

In Manifest language and direction, there is no mention of the possibility to represent name as an array of localizable strings.

Even if this is well specified in https://www.w3.org/TR/pub-manifest/#CreatorInfo and https://www.w3.org/TR/pub-manifest/#value-array, I believe such an important feature should be illustrated by a sample in the Manifest language and direction section.

Update manifest context?

"WP" is incorporated into the default context URL:

https://www.w3.org/ns/wp-context

Should we consider swapping it for "pub"?

Should we use JSON schemas as part of the spec?

(This is a spin-off of #11.)

The manifest defines some sort of a subset of JSON-LD. It may be a good idea to use JSON Schemas to define that subset more formally.

(JSON Schemas is a moving target, so the reference can only be informative, though.)

Should canonicalization include 'absolutization'?

At the moment, step 11 of the manifest canonicalization resolves all relative URIs against base. This may be a problem for packaged publications, given the value of base there; see w3c/pwpub#45.
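
The effect of resolving against base can be sketched with Python's standard `urllib.parse.urljoin`. The helper name `absolutize` and the example.org URLs are assumptions for illustration only. The point is that the same relative URL yields a different absolute URL for each base, which is why the choice of base matters for packaged content.

```python
from urllib.parse import urljoin


def absolutize(url, base):
    """Resolve a possibly relative URL against the given base URL."""
    return urljoin(base, url)


# Hypothetical bases: the same manifest served from two different locations.
base_a = "https://example.org/book/manifest.json"
base_b = "https://mirror.example.org/packages/1234/manifest.json"

chapter_a = absolutize("chapter1.html", base_a)
chapter_b = absolutize("chapter1.html", base_b)
# An already absolute URL is returned unchanged.
cover = absolutize("https://cdn.example.org/cover.jpg", base_a)
```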

Inclusion of certifiedBy, conformsTo and certifiersCredential

We specifically added the optional accessibility-report in 2.8.1.1 Accessibility Report.

I would propose also including:
accessibility-conformsTo
accessibility-certifiedBy
accessibility-certifiersCredential

These metadata properties are currently being added to conformant EPUBs in the US. Macmillan Learning, for one, is publishing EPUBs with this information, and I would think that being able to include it in a web manifest is equally important.
Self-certification is also possible, and I believe Pearson is already doing this as well.

Trim examples

The bulk of our examples include @context and type declarations, even though this information isn't specifically relevant to the examples.

Let's trim these to just the information readers need to pay attention to.

Should the canonical identifier resolve to a preferred version?

Without the web publications underpinnings, there isn't as strong a case for recommending URLs for the canonical identifier at the manifest level.

Should we consider removing this recommendation and leaving it to implementations to decide whether URLs are preferred?

The possibility to add a `type` to `LinkedResource` or `LocalizableString` should be in the spec

At the moment, we talk about LinkedResource and LocalizableString as separate object types. Per JSON-LD, it should be possible to add these types explicitly to the objects; this should be reflected in the respective definitions (https://w3c.github.io/pub-manifest/#app-linkedResource and https://w3c.github.io/pub-manifest/#dom-localizablestring), and also in the respective WebIDL.

Obviously, both of these are optional, and it is really for JSON-LD geeks only. But it should be possible...

(I do not believe this issue requires WG Discussion; it is editorial only...)

Should the manifest set minimum property requirements?

The web publications implementation of the manifest defined the required and recommended sets of properties. That didn't come across with part 1.

Should we add a section that sets a similar common base of properties for all manifests regardless of implementation?

If so, should we stop recommending properties and only define what is critical and leave it to implementations to recommend from the rest plus add whatever they need?

Update manifest properties vocabulary and namespace?

The custom properties incorporate "WP" into their namespace:

https://www.w3.org/ns/wp#inDirection

Should we also change this to "pub"?

Also, following the URL takes you to the "Web Publication Manifest Vocabulary". We should definitely rename that.

Metadata rendering "hints" from epub we should consider for manifests

As discussed on the March 18th, 2019 WP call, here is a list of rendering-related metadata that EPUB supports and that has found some traction in the publishing/reading system community. The list is not exhaustive; it is intended to contain only those settings that seem to have actual use.

Metadata can be specified at the publication level (applies to the entire publication), the item level (applies to a section of the publication), or both.

page-progression-direction controls the direction (left or right) that pages should turn when implementing next page functionality. Used extensively in Japan, but has traction for other languages. Frequent use and has been implemented multiple times. Critical to support, otherwise some content will be broken. May not be needed for scrolled content, or for all UIs. Publication level.

flow-[auto|paginated|scrolled-continuous|scrolled-doc] indicates whether the content is intended to be paginated or scrolled, and if scrolled, whether it is continuous over multiple items. Differs from CSS @page: that describes styling when content is paginated, whereas this specifies whether pagination should occur. Unclear how common this is in practice, though I believe there are some implementations. Both item and publication levels, but unclear if mixed content exists.

layout-[pre-paginated|reflowable] indicates whether an item should be considered a single, high design "page", or whether it is a stream of 1 or more pages. Some overlap with the flow-* properties, above. Widely used and implemented. Both levels, but unclear how common mixed content is.

orientation-[auto|landscape|portrait] hints about the overall aspect ratio of the content. Could be (and is) used to control how a book opens on phones and tablets (auto switches device orientation). Widely used, sometimes correctly. Often coupled with spread-* properties. Both levels.

spread-[auto|both|none|landscape|portrait] indicates when and how synthetic spreads should be generated (that is, when to put pages side-by-side). Widely used and implemented. Both levels.

page-spread-[left|right|center] indicates whether the first (or only) page of an item should appear on the left or right side of (or centered in) the display when showing more than 1 page (that is, in spreads). Widely used and implemented, particularly in pre-paginated content. When missing, this can completely break content. Item level.

viewport defines the aspect ratio of pre-paginated content. May also appear in the document content, so may not be needed at the higher level. Both levels.

linear controls whether the item is part of the linear navigation. When true, the item is part of the main publication content; when false, its position indicates where it might appear in a printed publication, but it is not part of the main, linear navigation of the publication. Implementations use this as a hint for how and where to display the content. Item level.

Lack of publication type a fatal error?

I noticed that the canonicalization algorithm is expected to terminate if either the context or the type is not set: https://w3c.github.io/pub-manifest/#canon-min-req

This makes sense for the context, since it wouldn't be clear that you have a manifest and not something else, but should the lack of a type be fatal or just a warning? It would be easy enough to default to CreativeWork, for example.
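
The alternative suggested here could look roughly like the sketch below: a missing context stays fatal, while a missing type only produces a warning and falls back to CreativeWork. The function name `check_minimum_requirements`, the warning text and the choice to store the type as a one-element list are assumptions for illustration, not the algorithm in the specification.

```python
def check_minimum_requirements(manifest, default_type="CreativeWork"):
    """Return (manifest copy, warnings); raise only when @context is missing."""
    if "@context" not in manifest:
        # Without a context we cannot tell that this is a manifest at all.
        raise ValueError("missing @context: cannot treat this as a manifest")
    checked = dict(manifest)
    warnings = []
    if not checked.get("type"):
        # Non-fatal: record a warning and fall back to the default type.
        warnings.append(f"missing type: defaulting to {default_type}")
        checked["type"] = [default_type]
    return checked, warnings


checked, warnings = check_minimum_requirements(
    {"@context": ["https://schema.org", "https://www.w3.org/ns/wp-context"]}
)
```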

TAG review of Web Publications

In response to the TAG review request in w3ctag/design-reviews#344 (which originally came from w3c/wpub#384), I wanted to file an issue here (since you requested the filing of a single issue in your repo) with a pointer back to the feedback so far, which is in that issue.

There's a good bit in that issue (most of which I wrote) -- and I don't want to copy it here because I also think it's not quite done -- there were a few other TAG members who wanted to take a look and will hopefully do so soon. However, I wanted to file this in advance of being "fully done" since you suggested that it would be useful to have the feedback prior to your face-to-face meeting next week.

One high level note would be that reading the use cases document made it sound like you were going to do a bunch of things that seem like they might be scary, but reading the actual specification seemed much less scary. I'm not sure whether it's worth going back to the use cases document and saying how the use cases are addressed -- it might depend on how frequently you intend to point people to the use cases document in the future.
