audiobooks's Introduction

Audiobooks profile for Web Publications

This is the repository of the W3C’s specification for the Audiobooks profile of the Publication Manifest, developed by the Publishing Working Group. The editors’ draft of the specification can also be read directly.

Contributing to the Repository

Use the standard fork, branch, and pull request workflow to propose changes to the specification. Please make branch names informative, for example by including the issue or bug number.

Editorial changes that improve the readability of the spec or correct spelling or grammatical mistakes are welcome.

Please read CONTRIBUTING.md for information about licensing contributions.

Issues against this specification must be raised in this repository.

Code of Conduct

W3C functions under a code of conduct.

audiobooks's People

Contributors

bigbluehat, garthconboy, iherman, llemeurfr, marisademeglio, mattgarrish, naglis, wareid

audiobooks's Issues

Add primary entry page to resources when not a publication resource

The primary entry page is a required publication resource, but right now all the publication manifest algorithm does is warn if it is not present in either the reading order or resource list.

Audiobooks could add a step to automatically add the document to the resource list if it's not there to avoid the warning, similar to how pub manifest adds the document to the reading order as a last resort (fyi, it can't for audiobooks, as audiobooks fail earlier if there isn't a reading order).

Or we could just punt on this as an idea for the future if it actually proves to be a real pain to have to list.
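
The extra step suggested above might look roughly like this in Python. It is only a sketch under stated assumptions: `ensure_pep_listed` is a hypothetical helper, not part of any algorithm in the spec; the manifest is assumed to be an already-parsed dict; and URLs are compared as plain strings, whereas a real implementation would compare resolved absolute URLs.

```python
def ensure_pep_listed(manifest, pep_url):
    """Return the manifest, adding pep_url to "resources" if it is listed nowhere.

    Hypothetical helper: the names and the string comparison are assumptions.
    """
    def urls(key):
        # Items may be bare URL strings or LinkedResource-like dicts.
        return {
            item if isinstance(item, str) else item.get("url")
            for item in manifest.get(key, [])
        }

    if pep_url in urls("readingOrder") | urls("resources"):
        return manifest  # already a publication resource, nothing to do

    updated = dict(manifest)  # shallow copy so the input is left untouched
    updated["resources"] = list(manifest.get("resources", [])) + [{"url": pep_url}]
    return updated
```

Returning the original object unchanged when the page is already listed keeps the step idempotent, which seems desirable for something that runs inside a processing algorithm.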

conformance section missing

Need an empty section with id="conformance" in the introduction to get the boilerplate about rfc keywords, etc.

Proposed edits to section 5.11 Accessibility

Right now, section 5.11 makes references to the work of the Sync Media for Publications CG, and acknowledges that work has not yet been stabilized. The references include specific details such as mime type, file extension, and spec name.

I would like to propose that we avoid referencing specifics and instead use this replacement text for Section 5.11:

"
5.11 Accessibility

This section is non-normative.

The history of the audiobook is rooted in the world of accessibility. To make publications fully accessible, content creators should refer to the Synchronized Multimedia for Publications Community Group regarding creating synchronized accessible content and incorporating it into an Audiobook.

Alternatively, a content creator can provide the text equivalent as HTML [html] resources in the resources.
"

Furthermore, I would delete the Editor's Note as well as Example 13 and leave Example 14.

Audiobooks - Add note regarding ZIP compression method 'store' for media files

The ZIP format allows each file in an archive to specify its own compression method. The usual options are Store and Deflate. Store leaves the file uncompressed while Deflate compresses the file.
Many ZIP libraries compress media files such as audio and video with Deflate by default. A deflated file cannot be read from an arbitrary offset inside the archive without first extracting it, so "seeking" directly into the archived file is impossible. The result is that a reader always needs to start from the beginning of the stream, even to start playback 10 seconds from the end.

I would like to suggest that we add a section to the Audiobooks spec regarding compression that recommends, or even requires media files to be 'stored' rather than 'deflated'. This would be a highly logical addition to the format since it should be optimized for streaming I think.

The drawbacks are few. As far as file size is concerned it makes little difference as most media formats already are compressed (mp4, mp3, etc.)
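
To illustrate what this would mean for a packaging tool, here is a minimal sketch using Python's standard `zipfile` module. The function name `write_package` and the extension list are assumptions made for the example; the spec does not define either.

```python
import io
import zipfile

# Illustrative list only; a real tool would more likely decide by media type.
MEDIA_EXTENSIONS = (".mp3", ".m4a", ".mp4")


def write_package(files):
    """Build a ZIP archive in memory from a {name: bytes} mapping.

    Media files are stored uncompressed so they stay seekable inside the
    archive; everything else is deflated as usual.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            is_media = name.lower().endswith(MEDIA_EXTENSIONS)
            method = zipfile.ZIP_STORED if is_media else zipfile.ZIP_DEFLATED
            archive.writestr(name, data, compress_type=method)
    return buffer.getvalue()
```

A reader can check the method per entry with `ZipFile.getinfo(name).compress_type`, which is how a validator could enforce such a rule if it were adopted.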

remove para about authored/canonical manifest

This para after the requirements refers to the old concepts:

These properties do not all have to be serialized in the authored manifest. Refer to each property's definition to determine whether it is required in the manifest or can be compiled into the canonical manifest from other information.

See the note we added instead here: https://w3c.github.io/pub-manifest/#manifest-requirements

I'd avoid having requirements based on the internal representation. It'll just confuse people about what they're supposed to author.

It doesn't seem like title should be required, for example, since there are alternative ways of obtaining it. That might need to be lowered to recommended unless you really don't want anyone using the fallbacks?

Use of "alternate" to reference an item in the resources list

I have a question about the use of alternate - in this thread, @mattgarrish mentioned avoiding IDREF-style constructs.

So we defined alternate as a (list of) URL(s) or LinkedResource instance(s).

In the audiobooks spec’s sync media example, I see that alternate references a URL which can be resolved using the resources list.

I don't see that pub-manifest prohibits this but I also don't see any examples of the resource list being referenced by the reading order in this way.

So my question is if we should rewrite this example to show alternate directly containing a LinkedResource object.

fragment identifiers

It might be necessary to say something about support for fragment identifiers - are they unsupported and stripped for audiobooks, can a user agent interpret them as it chooses, is it required to support some or all of media fragments, etc.

extend processing algorithm

We need to extend the processing algorithm for the new requirements (more required/recommended properties, warn if duration/length not set, etc.)

I can take a look at this if you want.

intro/scope merged?

The paragraphs starting the introduction look like a scope.

At any rate, it would be better to group them under some heading than have them leading into the tangential sections on terminology and conformance.

flatten section 2

It's not really necessary to have all the sections of the specification after the intro under a section called "specification".

I'd recommend just dropping this wrapper section and let each major section within it stand on its own.

Also mention alternate text equivalent along with sync narration

It is really good to see that the editor's draft presents sync narration as the accessible alternative to audio files.
Along with sync narration / sync media, a text equivalent, such as the text in an HTML file, should also be mentioned.
Sync media is a great way to ensure accessibility for people with different kinds of needs. At the same time, it takes human labor to synchronize text with the equivalent audio. There are some automated solutions available for English and other popular European languages, but not for other languages.
So it may be more economical to generate simple text in well-structured HTML.
Therefore it would be good to mention it along with sync narration.

Mixed Media (Audio, Video, HTML)

What are the considerations for alternative media formats (in addition to audio)?

Blackstone has 3 use-cases:

  1. Audio Only
  2. HTML/Text only
  3. Mixed Audio and Text

I would speculate that 'audio only' could be replaced with 'media only', providing support for audio and video. Such 'media only' publications (a podcast, news segment, TV clip, etc.) might be bundled with a small blurb that the content provider maintains separately from the video (hence not embedding it in the video container), and one could speculate that they would also have a defined playback sequence/order.

check pub duration before resources

I noticed just now in step 5 of the processing extensions that the total duration is checked after compiling all the individual durations, but this seems too late since there's no point in calculating the individual durations if you have nothing to compare against.

I would restructure the first list of substeps to check the total duration like this:

1. If data["duration"] is set and is a valid duration value per [ISO8601]:
   a. let ... // same step
   b. for each ... // same step
   c. if resourceDuration does not specify the same total duration as data["duration"], validation error.
2. Otherwise, validation error.
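
The proposed ordering could look roughly like this in Python. This is only a sketch: `check_durations` is a hypothetical name, durations are assumed to have already been parsed from ISO 8601 into seconds (with `None` standing for a missing or invalid value), and the `tolerance` parameter is an addition that is not part of the proposal.

```python
def check_durations(total_seconds, resource_seconds, tolerance=0.0):
    """Return a list of validation errors, checking the total duration first.

    total_seconds: publication duration in seconds, or None if unset/invalid.
    resource_seconds: per-resource durations in seconds; None entries are skipped.
    """
    errors = []

    # Step 2 of the proposal: nothing to compare against, so stop early.
    if total_seconds is None:
        errors.append("publication duration is missing or invalid")
        return errors

    # Steps 1a/1b: only now is it worth summing the individual durations.
    summed = sum(d for d in resource_seconds if d is not None)

    # Step 1c: compare the sum with the declared total.
    if abs(summed - total_seconds) > tolerance:
        errors.append(
            f"resource durations sum to {summed}, expected {total_seconds}"
        )
    return errors
```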

cover is a relation

I noticed it's listed as a required property in 5.5.1, but it's technically a rel value of a LinkedResource.

With pub manifest, the requirements are a subsection of the manifest, so both properties and relations can be covered. We didn't end up listing any required/recommended relationships, but if you move the section, instead of splitting cover out you could just reword the intro paragraph to say "properties and resource relations".

duration check in the reading order item

In the current processing step 4.1 of the validation, a validation error is raised if the duration is not set for a reading order item. This does make sense, but the core text should say (e.g., in §5.5) that the item SHOULD include a duration value. Currently, I believe it does not.

Rules on finding the ToC are not precise enough

Between the lines the spec suggests that finding the ToC follows these steps:

  1. If the PEP is used (see #61 on this, b.t.w.) and the PEP does include a ToC (i.e., an element with role=doc-toc) then take this
  2. Otherwise, use the manifest to locate, as part of one of the Linked Resources, a reference to the ToC (following Pub Manifest §4.8.1.3 and then Pub Manifest §C.3).

However, (1) is not clearly specified in the document. The document(s) today would require adding a linked resource to the manifest that points at the PEP...

I believe that:

  • just as we do with the manifest processing in Audiobook §6, there should be a separate section on how this profile extends the general text
  • again just as in the manifest processing part there should be an 'extension point' defined in Pub Manifest §C.3 for profiles in general.
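
The two-step lookup described above can be sketched as follows. Several things here are assumptions rather than spec text: the function names are hypothetical, the PEP is passed in as an HTML string, the manifest is an already-parsed dict, the ToC resource is assumed to be marked with rel "contents", and the resource list is searched before the reading order.

```python
from html.parser import HTMLParser


class _TocFinder(HTMLParser):
    """Records the first start tag whose role attribute includes doc-toc."""

    def __init__(self):
        super().__init__()
        self.found = None

    def handle_starttag(self, tag, attrs):
        if self.found is None:
            roles = (dict(attrs).get("role") or "").split()
            if "doc-toc" in roles:
                self.found = tag


def has_toc(html_text):
    finder = _TocFinder()
    finder.feed(html_text)
    return finder.found is not None


def locate_toc(manifest, pep_url=None, pep_html=None):
    """Return the URL of the resource holding the ToC, or None."""
    # Step 1: the primary entry page wins if it contains a doc-toc element.
    if pep_html is not None and has_toc(pep_html):
        return pep_url
    # Step 2: otherwise look for a linked resource flagged as the ToC.
    for item in manifest.get("resources", []) + manifest.get("readingOrder", []):
        if isinstance(item, dict):
            rel = item.get("rel", [])
            rels = [rel] if isinstance(rel, str) else rel
            if "contents" in rels:
                return item.get("url")
    return None
```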

Define how UA should deal with erroneous audiobooks

The question came during TPAC 2019.

An audiobook MUST contain audio files only inside its reading order.
But if a UA fetches an audiobook which has a non-audio file in its reading order, should it reject the audiobook or should it skip the faulty file?

More generically, should a UA try to support erroneous audiobooks as much as possible or strictly reject them?
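
To make the two options concrete, here is a small sketch of both behaviours behind one flag. It assumes reading order items are dicts carrying an `encodingFormat` media type; `playable_items` is a hypothetical name, and treating an item without `encodingFormat` as non-audio is a simplification made for the example.

```python
def playable_items(reading_order, strict=False):
    """Filter a reading order down to audio resources.

    strict=False: skip non-audio items (lenient user agent).
    strict=True: reject the whole audiobook by raising ValueError.
    """
    kept = []
    for item in reading_order:
        media_type = item.get("encodingFormat", "") if isinstance(item, dict) else ""
        if media_type.startswith("audio/"):
            kept.append(item)
        elif strict:
            raise ValueError(f"non-audio resource in reading order: {item!r}")
    return kept
```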

context requirement and missing language

Section 2.2 is repeating the "must" from pub manifest. Could this section be done more informatively? For example:

An Audiobook manifest has to start by setting the JSON-LD context [[!pub-manifest]]:

Example 1: The context declaration.

{ 
  "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
	…
}

To add the global language and direction of the manifest metadata, language and direction declaration [[!pub-manifest]] can also be added to the context:
...

Double check on duration

Data validation step 4.1 issues a validation error if duration is not set. However, that check already happened in step 2.

Create a vocabulary of rel attributes for extra resources

There is already a rel="cover" value (https://www.w3.org/TR/wpub/#cover) defined in the spec for referencing a cover from the list of resources. Other values already defined in the WP model are "pagelist" and "contents" (the ToC).

Audiobooks may have useful extra-resources, like a "booklet".

Which are the most useful rel values that should be defined by this group?
Should it be part of the model (i.e. standardized) or expressed as best practices?

only audio in the reading order?

The resource list says that supplemental content must be listed in the resource list, but we don't forbid resources from being listed in both.

It would probably be good to clarify that both cases are true (i.e., must in resources; must not in reading order).

Am I right in assuming that only audio content is allowed in the reading order? The reading order definition mentions it's a progression through the audio resources but doesn't say if it's a must. If so, that might be useful to note in the reading order. We'd probably need to check this during validation of the manifest, although it seems tricky.

Use of "profile" in title

The title probably needs some tweaking, as a digital publication format is being defined that implements a profile of the publication manifest.

As it is, the title sounds like we're just implementing a theoretical manifest that could be used for audiobooks.

Any reason not to just call the spec "Audiobooks"?

Similarly, the first para of the introduction says:

This specification is a profile of the Publication Manifest specification. It will describe the requirements to create an Audiobook.

I'd probably rephrase this to place emphasis on the audiobooks part:

This specification describes the requirements for the creation of Audiobooks, using a profile of the Publication Manifest [[pub-manifest]].

Temporal Media Fragments URI in readingOrder, should be time interval instead of playback start timestamp?

https://www.w3.org/TR/media-frags/#naming-time

If I am not mistaken, a temporal Media Fragments URI such as audio.wav#t=123.45 means "start playback at the given timestamp, and play until the end of the resource is reached".

This in fact corresponds to the TOC processing model ( https://w3c.github.io/audiobooks/#toc-mediafragments ), for example:

https://github.com/w3c/wpub/blob/948442a71610abcc757513dc3313f6ed0e8fd22f/experiments/audiobook/toc-as-json.json#L32

https://github.com/w3c/wpub/blob/948442a71610abcc757513dc3313f6ed0e8fd22f/experiments/audiobook/toc.html#L7

But in the context of readingOrder, shouldn't time intervals be used instead of discrete timestamps, in order to provide both start and end playback boundaries? Otherwise, the playback sequence is not as expected, see for example:

https://w3c.github.io/audiobooks/#example-9-audiobook-reading-order-for-multiple-resources-using-media-fragments

https://w3c.github.io/audiobooks/#example-13-audiobook-with-synchronized-narration
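
For reference, the difference between a start-only fragment and an interval can be shown with a small parser. This is a rough sketch, not a complete Media Fragments implementation: it only handles the `t` dimension with plain NPT seconds (no `hh:mm:ss`, SMPTE, or clock formats), and `temporal_range` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

# Optional "npt:" prefix, optional start, optional ",end"; seconds only.
_NPT = re.compile(r"^(?:npt:)?(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$")


def temporal_range(url):
    """Return (start, end) in seconds, or None if there is no usable t= fragment.

    end is None when the fragment gives only a start time, meaning playback
    continues to the end of the resource.
    """
    fragment = urlsplit(url).fragment
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name == "t":
            match = _NPT.match(value)
            if match and (match.group(1) or match.group(2)):
                start = float(match.group(1) or 0)
                end = float(match.group(2)) if match.group(2) else None
                return start, end
    return None
```

With this, `audio.wav#t=123.45` yields an open-ended range, while `audio.wav#t=10,20` yields both boundaries, which is the form the question above argues the reading order needs.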

TAG review of Audiobooks

Hi,

As requested, the TAG took a look at the Audiobooks explainer & discussed it during our telcon this week. We haven't completed our review but we do have a few pieces of preliminary feedback we'd like to share. We'll edit this issue with additional feedback as our review continues.

How It Works

Given that the bulk of the data in an audiobook consists of (already compressed) media files, and that HTTP responses are themselves often compressed, is zipping up the audiobook really necessary? If the purpose of zipping up the contents of the audiobook is solely bundling, and not compression, did you consider using Bundled HTTP Exchanges instead? Or tar even?

We'll comment on the choice to create a new manifest format on w3ctag/design-reviews#344 as it applies to Web Publications as a whole and not just Audiobooks. See also w3ctag/design-principles#95 and w3c/wpub#32 (comment)

Considered Alternatives

In this section, you list several alternative designs that were considered but ultimately rejected. This section lists what alternatives were considered, but not why they were rejected. We'd really like to see some text explaining the rationale for rejecting each alternative. We also thought of a couple additional alternatives that you may or may not have already considered, which we'd like to see your thoughts on:

  1. Are your key use cases adequately covered by any existing media specifications, such as RFC 8216?
  2. It's not a priori obvious what the difference is between an audiobook and a podcast. (Perhaps it's that you don't know beforehand how many podcast episodes there will be, but Dickens likely didn't know how many installments The Pickwick Papers would end up having either.) Each consists of "audio files, a cover image[,] supplemental material, and some metadata." Did you consider using RSS or Atom instead of a custom JSON manifest format? Does your audiobook format gracefully handle serialized publications when not all installments have been published?

Each of these alternatives are "widely used and implemented technology" that appear (to us, at first blush) to "cover[… your] particular use cases," so you should "consider specifying that technology in preference to inventing something new for the same purpose." (Do not Reinvent the Wheel, one of the HTML Design Principles)

should duration match lengths?

Is there another "should" needed that the duration of the publication should match the sum of all the resources in the reading order?

Or is the duration possibly the sum of other resources, as well?

Audiobook duration, examples

In the audio-duration section of the Audiobook profile, we find Example 5 "Duration of an Audiobook in Hours" and Example 6 = "Duration of an Audiobook in Seconds".

But the duration value in Example 5 manifest sample does not reflect the caption and the Example 6 highlights the use of decimal values.

Proposal:
Example 5 will contain e.g. "duration" : "PT1H23M10S" (caption being "... in hours, minutes and seconds")
Example 6 will contain e.g. "duration" : "PT3633.52S" (caption being "... in seconds with decimal value")

Also, since ISO 8601 is both complex and not freely available, we should be more precise about what the expected format can be (possible use of decimal values, optional use of the Y, M, D parts). Alternatively, we can point to Wikipedia, as done in schema.org/Duration and in the WP spec, but this will be much less precise for implementers.

Note: did you know that, from Wikipedia, "PT20,50s" (with a comma) is ok? wow.
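
As an illustration of how narrow a profile-specific format could be, here is a sketch of a parser that accepts only the time part of a duration (hours, minutes, seconds) with an optional decimal fraction written with either a dot or a comma. The restriction to the `PT…` form, the choice to allow a fraction on any component, and the name `duration_seconds` are assumptions for the example, not what the spec or ISO 8601 mandates.

```python
import re

_NUMBER = r"\d+(?:[.,]\d+)?"
# "PT" followed by at least one of H, M, S components, in that order.
_DURATION = re.compile(
    rf"^PT(?=\d)(?:({_NUMBER})H)?(?:({_NUMBER})M)?(?:({_NUMBER})S)?$"
)


def duration_seconds(value):
    """Convert a time-only duration string to seconds, or return None."""
    match = _DURATION.match(value)
    if not match:
        return None
    hours, minutes, seconds = (
        float(group.replace(",", ".")) if group else 0.0
        for group in match.groups()
    )
    return hours * 3600 + minutes * 60 + seconds
```

For the two proposed example values, `duration_seconds("PT1H23M10S")` gives 4990.0 and `duration_seconds("PT3633.52S")` gives 3633.52; a date-based value such as `P1D` is rejected by this sketch.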

address is canonical identifier?

This sentence could probably use some clarifying:

The address of the primary entry page is also the canonical identifier (i.e., it serves as its unique identifier) for the audiobook when present.

Address here points to the address property in pub manifest, but wouldn't this be the actual location of the entry page (address is supposed to be other addresses that get you to the publication/pep).

But what if a canonical identifier is already specified in the manifest, is this sentence saying it has to be overridden when the pep has an address (when doesn't it?), or only added if an identifier isn't present? In either case, this would require extending the processing algorithm to add this info from the html since we don't harvest the url.

I'm not sure exactly how best to reformulate this, but a quick idea might be to say:

When a canonical identifier is not specified in the manifest, user agents MUST use the address of the primary entry page (i.e., the address serves as the unique identifier for the audiobook).

I just wonder about harvesting when the audiobook isn't web hosted (i.e., in a package). You get an absolute url resolved from some potentially inconsistent base.

bounds of an audiobook

The concept of bounds appears to have been removed from the pub manifest spec, but it's the first thing in the audiobook spec. It's also defined in passing in the packaging spec.

How might we better explain why this concept is here? I naively think it might describe those resources to which the metadata applies. Does this conflict with how JSON-LD and schema work?

required links?

The standard refers to link as one of the recommended terms which is also tested when the manifest is checked. Why is that? Isn't the reading order and the resource list enough?

Integrate audiobooks work with MediaSession work at WICG

MediaSession

This specification aims to enable web pages to specify the media metadata to be displayed in platform UI, and respond to media controls which may come from platform UI or [hardware] media keys, thereby improving the user experience.

Once implemented the JS "glue" provided by this specification would provide the best foundation for building reading/playing systems for both audiobooks and mixed content publications which could be navigated via hardware buttons or OS/system provided UX.

Actions here include:

  • monitoring this specification's progress
  • working for cognitive parity between that spec and ours (see cover vs. artwork)
  • confirming use case coverage (i.e. anything else we'd need from this spec?)
  • explore (with their help perhaps) the creation of mix-media publications (see adFrame example)

Also be sure to check out the MediaSession Explainer for the use cases and scenarios they're currently targeting.

Origin mention in the Privacy and Security section

In an audiobook manifest, all remote resources should share the same origin as the manifest file where possible.
https://www.w3.org/TR/audiobooks/#security-privacy

Not sure we should bring up "origin" without defining it, and not sure we should bring it up unless we plan to describe how (or if/when) UAs would be required to implement the Same-origin Policy, which is currently restricted to how desktop/mobile user-focused browsers work, and not often used as a constraint when dealing with APIs (which lean on other mechanisms of restriction).

proper conforms to URL

If we take that to be the URL of the document, then it should be https://www.w3.org/TR/audiobooks/. Note the / at the end; it is missing in some of the examples (including the one listed in the manifest document).
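
A checker that compares the value exactly would catch the missing slash. The sketch below assumes exact string comparison is the intended behaviour, which is precisely what this issue would need to settle; `conforms_to_audiobooks` is a hypothetical name.

```python
AUDIOBOOKS_PROFILE = "https://www.w3.org/TR/audiobooks/"


def conforms_to_audiobooks(manifest):
    """True if conformsTo (a string or a list) contains the exact profile URL."""
    value = manifest.get("conformsTo", [])
    values = [value] if isinstance(value, str) else list(value)
    return AUDIOBOOKS_PROFILE in values
```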

should absence of type be audiobook?

For the generic manifest, if type isn't specified it gets set to CreativeWork. That'll apply here, too.

If you'd rather default to Audiobook, we can do that but it probably means moving the extension steps higher up in the pub manifest algorithm so the profile defaults get priority.

/cc @iherman

Deep Linking for Packaged Audiobooks

Thanks to Lars for bringing this up.

Do we have a plan for deep linking into items in the audiobooks reading order that are packaged in LPF?

This has implications for annotations and bookmarking.

toc links

In the toc section, it says that when there is supplemental content:

the table of contents SHOULD include a link to all resources

with a pointer to the resource list, but this seems too general. If that is the case, wouldn't it have to link to any css, images, scripts, etc. to avoid warnings?

Is the intention to link to all resources in the reading order and the supplementary resources?

Typo in example 3

In this example, there is a typo in the value for conformsTo. Change https://www.w3/org/TR/audiobooks/ to https://www.w3.org/TR/audiobooks/

toc and fragments

This statement is out of date now:

MUST be the first element in the document — in document tree order [dom] — with that role value.

Since we allow fragment identifiers now, you can reference the toc directly. Placement in the document only matters if you don't specify a fragment or if it's in the PEP.

Maybe:

The table of contents is expressed via an [html] element (typically a nav element) in one of the resources. This element MUST be identified by the role attribute [html] value "doc-toc" [dpub-aria-1.0].

If the table of contents is located in the primary entry page, the table of contents MUST be the first element in the document — in document tree order [dom] — with that role value. Otherwise, the manifest SHOULD identify the resource that contains the structure.

A standard rel value for the primary entry page would be welcome

The audiobook spec states (4.1) that the primary entry page, if present, should be included as a resource (if not one must find it in the default reading order).

Even if the manifest processing model makes no use of this link, it would be good to define a standard rel value for this link, let's say 'start' or 'entry'.

If we don't define it, and if the PEP and the ToC are in different resources, a processor treating a manifest could consider it as a supplemental resource for an audiobook -> display it to the user in a list of supplement content where it does not belong.

Note: If the PEP and ToC are in the same resource, a rel=contents will be sufficient for the processor, which will use this information for fetching the ToC.

Requirements when distributing audiobooks

Context

I’ve recently started a new position as Director of R&D at De Marque, an aggregator and digital distributor connected to major ebook and audiobook retailers.

Since we’re receiving audiobooks from major publishers in Canada, France, Spain and Italy, the audiobook spec is very relevant for us as it might become the standard format that we request from them.

Given the lack of a standard format for distributing audiobooks, it felt relevant to explore what various retailers expect to receive when we deliver them an audiobook and figure out what might be different and/or missing for the proposed spec.

Documentation

Manifest Examples

Apple (XML)
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://apple.com/itunes/importer" version="music5.3">
    <language>en</language>
    <provider>AppleseedBooks</provider>
    <album>
        <album_type>audiobook</album_type>
        <vendor_id>9781106701657</vendor_id>
        <title>Dracula (Unabridged)</title>
        <original_release_date>2009-01-05</original_release_date>
        <label_name>Apple Publishing Group</label_name>
        <genres>
            <genre code="CLASSICS-00"/>
        </genres>
        <copyright_pline>2008 Apple Publishing Group</copyright_pline>
        <copyright_cline>2008 Apple Publishing Group</copyright_cline>
        <artwork_files>
            <file>
                <file_name>cover.jpg</file_name>
                <size>56723</size>
                <checksum type="md5">58a9947e2e5de47bc3039092964ad3a3</checksum>
            </file>
        </artwork_files>
        <description>Dracula is the seminal gothic horror novel of its time as Bram Stoker introduced the world to the legendary vampire Count Dracula. Published in 1897 and told through a series of diary entries and letters, the story journeys into the dark world of Count Dracula through the eyes of several different narrators. The novel explores many themes, the role of women in Victorian culture, conventional and conservative sexuality, immigration, colonialism, post colonialism and folklore. Irish author Abraham "Bram" Stoker (1847 - 1912) was a writer of novels and short stories. He was also the personal assistant of the actor Henry Irving and the business manager of the Lyceum Theatre in London, which Irving owned.</description>
        <products>
            <product>
                <territory>AU</territory>
                <wholesale_price_tier>3</wholesale_price_tier>
                <sales_start_date>2009-01-05</sales_start_date>
                <cleared_for_sale>true</cleared_for_sale>
            </product>
            <product>
                <territory>GB</territory>
                <wholesale_price_tier>3</wholesale_price_tier>
                <sales_start_date>2009-01-05</sales_start_date>
                <cleared_for_sale>true</cleared_for_sale>
            </product>
        </products>
        <artists>
            <artist>
                <artist_name>Bram Stoker</artist_name>
                <apple_id>2683478</apple_id>
                <roles>
                    <role>Author</role>
                </roles>
                <primary>true</primary>
            </artist>
            <artist>
                <artist_name>Christopher Saul</artist_name>
                <apple_id>301336965</apple_id>
                <roles>
                    <role>Narrator</role>
                </roles>
                <primary>false</primary>
            </artist>
        </artists>
        <tracks>
            <track>
                <type>audiobook</type>
                <vendor_id>9781106701657_1</vendor_id>
                <title>Dracula Track 1 (Unabridged)</title>
                <label_name>Apple Publishing Group</label_name>
                <explicit_content>none</explicit_content>
                <track_number>1</track_number>
                <audio_file>
                    <file_name>9781106701657_1.wav</file_name>
                    <size>172149800</size>
                    <checksum type="md5">2e669877c1913f59c6686a86b4d84d1d</checksum>
                </audio_file>
                <audio_language>en</audio_language>
                <preview_start_index>240</preview_start_index>
                <artists>
                    <artist>
                        <artist_name>Bram Stoker</artist_name>
                        <apple_id>2683478</apple_id>
                        <roles>
                            <role>Author</role>
                        </roles>
                        <primary>true</primary>
                    </artist>
                    <artist>
                        <artist_name>Christopher Saul</artist_name>
                        <apple_id>301336965</apple_id>
                        <roles>
                            <role>Narrator</role>
                        </roles>
                        <primary>false</primary>
                    </artist>
                </artists>
                <chapters>
                    <chapter>
                        <chapter_start_time>00:00:00.000</chapter_start_time>
                        <chapter_title>Chapter 1 - Jonathan Harker’s Journal</chapter_title>
                    </chapter>
                    <chapter>
                        <chapter_start_time>02:00:08.567</chapter_start_time>
                        <chapter_title>Chapter 2 - Jonathan Harker’s Journal Continued</chapter_title>
                    </chapter>
                    <chapter>
                        <chapter_start_time>03:59:40.321</chapter_start_time>
                        <chapter_title>Chapter 3 - Jonathan Harker’s Journal Continued</chapter_title>
                    </chapter>
                    <!-- additional chapters here as needed -->
                </chapters>
            </track>
            <!-- additional tracks here as needed -->
        </tracks>
    </album>
</package>
Kobo (JSON)
{
  "manifest_version": 1,
  "file_list": [
    {
      "duration": 15, 
      "media_type": "audio/mpeg", 
      "file_name": "01-somefilename.mp3", 
      "file_order_id": 0
    }, 
    {
      "duration": 60, 
      "media_type": "audio/mpeg", 
      "file_name": "02-anotherfilename.mp3", 
      "file_order_id": 1
    }, 
    {
      "duration": 200, 
      "media_type": "audio/mpeg", 
      "file_name": "doesnt need-to-be-in-filename-order.mp3", 
      "file_order_id": 2
    }, 
    {
      "duration": 30, 
      "media_type": "audio/mpeg", 
      "file_name": "lastchapter.mp3", 
      "file_order_id": 3
    }
  ], 
  "table_of_contents": [
    {
      "title": "Introduction", 
      "file_order_id": 0, 
      "offset": 0
    }, 
    {
      "title": "1. We hear you", 
      "file_order_id": 1, 
      "offset": 0
    }, 
    {
      "title": "2. Another chapter", 
      "file_order_id": 2, 
      "offset": 0
    }, 
    {
      "title": "3. The End", 
      "file_order_id": 3, 
      "offset": 0
    }
  ]
}
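
To show how close the Kobo structure above is to a reading order, here is a rough Python sketch of a mapping. It rests on several assumptions: that Kobo's `duration` and `offset` values are in seconds (the example does not state the unit), that `url`, `encodingFormat` and `duration` are the target property names, and that a flat title/URL list is an acceptable stand-in for a table of contents. The function names are hypothetical.

```python
from urllib.parse import quote


def kobo_to_reading_order(kobo):
    """Map a Kobo-style file_list to reading-order-like items, in file order."""
    files = sorted(kobo["file_list"], key=lambda f: f["file_order_id"])
    return [
        {
            "url": quote(f["file_name"]),  # file names may contain spaces
            "encodingFormat": f["media_type"],
            "duration": f"PT{f['duration']}S",  # assumes seconds
        }
        for f in files
    ]


def kobo_to_flat_toc(kobo):
    """Map a Kobo-style table_of_contents to flat title/URL entries."""
    names = {f["file_order_id"]: quote(f["file_name"]) for f in kobo["file_list"]}
    return [
        {
            "title": entry["title"],
            # The offset becomes a temporal media fragment on the target file.
            "url": f"{names[entry['file_order_id']]}#t={entry['offset']}",
        }
        for entry in kobo["table_of_contents"]
    ]
```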

Notes

  • A number of retailers do not have the concept of a manifest and rely on naming conventions or alpha order in a ZIP/folder instead.
  • For those retailers, the TOC is tied to how an audiobook is broken down into various audio resources (for example Audible requires that: "Each file must contain only one chapter or section").
  • Apple requires content producers to concatenate audio resources into a single file, and only allows multiple files if the audiobook is longer than 23 hours.
  • On the other end of the spectrum, Audible requires each audio resource to be no longer than 120 minutes, while Kobo requires them to be 200 Mb or less.
  • Apple and Kobo support an explicit TOC, defined directly in their manifest with a flat structure where each entry in the TOC is tied to a resource (indirectly through ID/IDref for Kobo).
  • Across retailers, there seems to be a preference for CBR (Constant Bit Rate) over VBR (Variable Bit Rate) for MP3 and M4A/AAC.
  • Whether an audiobook is abridged or unabridged seems to be an important piece of metadata.
  • The same goes for explicit content.
  • Apple can support track/resource level metadata to indicate contributors, while other retailers seem to pull this information from the files themselves.
  • Supplemental materials are explicitly supported by a number of retailers, including Apple (which can support booklets per track/resource).
  • Covers are expected to be square (Apple) or will be converted to a square (Google or Kobo).
  • Samples can either be provided separately (Audible or Kobo) or described through sample-specific metadata (length in % or minutes for Google, timestamp where the sample begins for Apple).

Closing Remarks

  • This issue is most likely incomplete and/or partially incorrect; feel free to chime in (cc @wareid, @GarthConboy, @geoffjukes and others).
  • I wish we had started our work on an interchange format with such a document; IMO it is very helpful for fully understanding the situation in the market.
  • We should probably add support for indicating if an audiobook is abridged/unabridged or if it contains explicit material directly in the specification instead of a best practice document.
  • Our current support for the TOC (HTML with a nested structure) won't work with any of the retailers listed above. I don't know if we want to re-open that box, but a flat structure in JSON would be better aligned with what's currently requested.
  • Resource-level metadata seems to be useful, if not mandatory. This might be something worth exploring in a best practice document.
  • Requirements for how an audiobook is broken into multiple audio resources are all over the place; it's currently impossible to create something that works for everyone.
  • There's a lack of consistency regarding support for samples and supplemental material; we would probably need to explore things a bit more before anything can be done about them.
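
To make the flat-versus-nested TOC point a bit more concrete, here is a small Python sketch that flattens a nested TOC into the kind of flat list the retailers above seem to expect. It assumes the HTML TOC has already been parsed into nested dictionaries; the `title`, `href`, `children` and `level` keys are hypothetical and not defined by the specification.

```python
def flatten_toc(entries, level=1):
    """Flatten nested TOC entries depth-first, keeping document order.

    Each output entry records its original nesting depth in "level",
    so the hierarchy is not lost entirely.
    """
    flat = []
    for entry in entries:
        flat.append({
            "title": entry["title"],
            "href": entry["href"],
            "level": level,
        })
        # Children follow their parent immediately, as in reading order.
        flat.extend(flatten_toc(entry.get("children", []), level + 1))
    return flat


# Hypothetical nested TOC, as it might look after parsing an HTML nav element.
nested = [
    {"title": "Part 1", "href": "part1.mp3", "children": [
        {"title": "Chapter 1", "href": "part1.mp3#t=60"},
        {"title": "Chapter 2", "href": "part1.mp3#t=900"},
    ]},
    {"title": "Part 2", "href": "part2.mp3"},
]

for item in flatten_toc(nested):
    print(item["level"], item["title"], item["href"])
```

A retailer that only accepts one level could ignore `level`, but would then lose the distinction between parts and chapters.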

The fragile position of the Primary Entry Page

Now that the Web Publication spec is "on the side", we have 3 specifications in scope: Publication Manifest, Audiobooks and Lightweight Packaging Format.

Publication Manifest says nothing about the concept of Primary Entry Page, which is logical.

LPF introduces the notion of Primary Entry Page and exposes a processing model for extracting the Manifest from the PEP if included. LPF will be a Note, and its use will not be mandatory for distributing Audiobooks, so we also have to write very clear information about the PEP in the Audiobooks spec.

Audiobooks introduces the notion of Primary Entry Page in a specific section but does not specify that it may embed the Manifest. There is only an illustration of that possibility in https://w3c.github.io/audiobooks/#example-16.

The wording of the spec can give the impression that the presence of a PEP is mandatory for audiobooks. The only thing that contradicts this impression is "The primary entry page should instead, if present, ...". The rule is much clearer in the ToC section ("a table of contents SHOULD be included;").

length not duration at resource level

In 4.4.3 it says:

Duration SHOULD be expressed for the entirety of the audiobook as part of the manifest, and SHOULD be present at the item level in the default reading order.

LinkedResource defines a length property, though, not duration.

I still find this situation weird, especially since the two properties also differ in values. No way we can standardize?
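
To illustrate the mismatch in values, here is a minimal Python sketch converting one form into the other. It assumes that `length` is a (possibly fractional) number of seconds and that `duration` is an ISO 8601 duration string; check the current drafts before relying on either. The `seconds_to_iso8601` helper is a hypothetical name.

```python
def seconds_to_iso8601(length):
    """Convert a length in seconds to an ISO 8601 duration such as PT1H30M.

    Assumption: only hours, minutes and seconds are needed, with at most
    millisecond precision.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    total_ms = round(length * 1000)  # work in integer milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)

    parts = ["PT"]
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    if secs or ms or len(parts) == 1:
        # Drop trailing zeros from the fractional part, e.g. 30.500 -> 30.5
        sec_text = f"{secs}.{ms:03d}".rstrip("0") if ms else str(secs)
        parts.append(f"{sec_text}S")
    return "".join(parts)


print(seconds_to_iso8601(5400))
print(seconds_to_iso8601(90.5))
```

Under these assumptions the conversion is mechanical, which suggests the difference between the two properties is one of notation rather than of information.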
