wpub-ann's Introduction

Web Annotation Extensions for Web Publications

This is the repository of the W3C’s specification on Web Annotation Extensions for Web Publications, developed by the Publishing Working Group. The editors’ draft of the specification can also be read directly.

Contributing to the Repository

Use the standard fork, branch, and pull request workflow to propose changes to the specification. Please make branch names informative, for example by including the issue or bug number.

Editorial changes that improve the readability of the spec or correct spelling or grammatical mistakes are welcome.

Please read CONTRIBUTING.md for information about licensing contributions.

wpub-ann's People

Contributors

iherman, mattgarrish, plehegar, tcole3

wpub-ann's Issues

Use Cases for Span Selector - are there any?

Span Selectors can be used to describe selections that span multiple embedded resources - e.g., multiple resources included in a Web Publication. For instance, a Span Selector can be used to select from the last paragraph of Chapter 2, through all of Chapter 3, and into Chapter 4 up until (but not including) the 5th paragraph.

A Multi Selector could be used to specify the same selection, but would do so as an ordered list of paragraphs and chapters, without the benefit of saying the selection was continuous. So, a Multi Selector might do this by listing: the last paragraph of Chapter 2, Chapter 3, paragraph 1 of Chapter 4, paragraph 2 of Chapter 4, paragraph 3 of Chapter 4, and paragraph 4 of Chapter 4.

A Span Selector is therefore more succinct. Are there additional use cases that would benefit from the availability of a Span Selector? Is the succinctness worth having an additional type of Selector? See also the earlier discussion in #25, when we were considering the functionality of Span Selector as an extension of Range Selector (rather than a separate, new selector type).
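To make the succinctness argument concrete, the following Python sketch expands the two endpoints of a continuous span into the explicit list a Multi Selector would have to carry. It is illustrative only: the publication is assumed to be an ordered list of (chapter, paragraph count) pairs, and the function `expand_span` and its tuple shapes are hypothetical, not part of the draft.

```python
from typing import List, Tuple, Union

# Hypothetical entry shapes for an explicit, Multi-Selector-style list:
#   ("chapter", chapter_id)             -> a whole chapter
#   ("paragraph", chapter_id, para_no)  -> a single paragraph (1-based)
Entry = Union[Tuple[str, int], Tuple[str, int, int]]


def expand_span(chapters: List[Tuple[int, int]],
                start: Tuple[int, int],
                end: Tuple[int, int]) -> List[Entry]:
    """Expand a continuous span into an explicit list of entries.

    chapters: reading order as (chapter_id, paragraph_count) pairs.
    start:    (chapter_id, paragraph) where the span begins (inclusive).
    end:      (chapter_id, paragraph) where the span stops (exclusive).
    """
    ids = [cid for cid, _ in chapters]
    counts = dict(chapters)
    first, last = ids.index(start[0]), ids.index(end[0])
    entries: List[Entry] = []
    for pos in range(first, last + 1):
        cid = ids[pos]
        lo = start[1] if pos == first else 1
        hi = end[1] - 1 if pos == last else counts[cid]
        if lo == 1 and hi == counts[cid]:
            # Fully covered chapter: list it once instead of paragraph by paragraph.
            entries.append(("chapter", cid))
        else:
            entries.extend(("paragraph", cid, n) for n in range(lo, hi + 1))
    return entries
```

Assuming, say, 6 paragraphs in Chapter 2, 4 in Chapter 3 and 9 in Chapter 4, `expand_span([(2, 6), (3, 4), (4, 9)], start=(2, 6), end=(4, 5))` produces the six entries listed above (the last paragraph of Chapter 2, Chapter 3, and paragraphs 1 to 4 of Chapter 4), whereas the Span Selector only needs the two endpoints. The expansion itself relies on knowing the chapter order.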

[Proposal] Iterate the Web Annotation Data Model itself

The Web Annotation Data Model needs iteration...but I'm not sure the Web Publishing WG is the place to do it...yet.

Given #38 and #40, there seems to be a need to iterate on the Web Annotation Data Model specification itself, and not merely extend it.

Much of what is currently in the publ-loc spec is focused on a Web Publication format that itself is not yet defined thoroughly enough to implement, and defining selection strategies for it and stream-related position systems seems premature.

That said, all these things should be explored (as well as addressing #40).

Proposal

Create a new version of the Web Annotation Data Model and Web Annotation Vocabulary specifications which would:

  • address #40
  • work on "contained resource" selection/locating/targeting
  • work on "cross-resource" selection/locating/targeting
  • address any Web Publication specific annotation needs (as WP and PWP become more defined with clearer needs and use cases)

The place for this work to happen at the moment is the Open Annotation Community Group. Obviously, that group is not chartered to publish a Technical Recommendation. So, here are two options for providing that service:

  1. the Web Publishing Working Group (with its notable overlap in membership with the OA CG) upstreams the work done in the OA CG's document(s) and publishes the next iteration of Web Annotation Data Model and Vocabulary as Technical Recommendations under its banner/charter
  2. (alternatively) a new Web Annotation Working Group is chartered (born out of publishing needs...again) and this new Working Group does the document iterations, explores the needed features with the Publishing WG (and other WGs), and publishes any Technical Recommendations.

Option 1 seems the most likely to be effective and achievable, given the overhead of chartering new working groups within the W3C. Potentially, both could be done: Option 1 could start now, simply by beginning the iteration discussions at the Open Annotation CG, and Option 2 (should it be possible) could follow later and pick up from the work of the OA CG (just as the original Web Annotation WG did).

Thoughts?

Is there a need for multi-resource selectors?

The definition is complex, and we have to be sure that the selector really has reasonable use cases. Selection of 2 consecutive resources is covered by the usage of the Embedded Resource Selector (combined with refinement).

(Editorial) Simplifying the model tables

Each model table begins with a line for the type relationship, followed by a specification of the Class. That made sense in the WA model, which refers to RDF, but this document (while referring to the WA document) does not refer to RDF at all. The second line of these tables, and the concept of a "Class", therefore seem superfluous and make things look more complicated than they are.

I propose removing those lines altogether, along with the concept of a Class. The first line of each table already specifies what the value of type must be.

Is Embedded Resource Selector sometimes ambiguous?

E.g., a document contains the same image twice. Just saying {"type": "EmbeddedResourceSelector", "value": "example.org/image"} may thus yield two results. Is there a fix for this, or a refinement that would disambiguate in cases where it matters?

Created as new issue from @Treora comment on issue #24.
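A minimal Python sketch of the ambiguity, assuming (hypothetically) that a resolver matches the selector's value against the `src` attribute of elements in an HTML document; `EmbedFinder` and `resolve_embedded` are illustrative names, and the draft does not define resolution this way.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Match = Tuple[str, Tuple[int, int]]  # (tag name, (line, column))


class EmbedFinder(HTMLParser):
    """Record every element whose src attribute equals the selector's value."""

    def __init__(self, value: str) -> None:
        super().__init__()
        self.value = value
        self.matches: List[Match] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing tags (<img ... />) also end up here via handle_startendtag.
        if dict(attrs).get("src") == self.value:
            self.matches.append((tag, self.getpos()))


def resolve_embedded(html: str, value: str) -> List[Match]:
    """Hypothetical resolver: every matching element is a candidate result."""
    finder = EmbedFinder(value)
    finder.feed(html)
    finder.close()
    return finder.matches
```

For a document containing `<img src="example.org/image">` twice, `resolve_embedded(doc, "example.org/image")` returns two matches, so whatever refinement is chosen (an occurrence number, for instance) would have to carry enough information to pick one of them.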

Use cases for side bias - are there any?

From Section 3.1.9 (Side-Bias) of EPUB 3.1 CFI:
"In some situations, it is important to preserve which side of a location a reference points to. For example, when resolving a location in a dynamically paginated environment, it would make a difference if a location is attached to the content before or after it (e.g., to determine whether to display the verso or recto side at a page break)."

No one has come forward with a use case justifying this feature. If no use case is available in a timely fashion (before the end of November 2017), the feature will be removed from the FPWD. It can always be added back if a use case emerges later.

This issue was split off from issue #9.

frag ids - whither scope

In the Web Anno data model, 'scope' and 'source' are properties of the SpecificResource, or what we are calling the Locator. They are not properties of selectors or states (or positions). But when translated to a fragment, we include scope within selector(...), state(...) and position(...). There are good reasons for this (e.g., it is more succinct than some options and may be more intuitive), but are there unintended negative consequences of doing this? For instance, consider Example 3:

JSON: 
{
	"scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
	"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
	"selector": {
		"type": "CssSelector",
		"value": "#elemid > .elemclass + p"
	}
}

The equivalent frag id we have (Example 24, with line breaks for readability) looks like this (option 1):

https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html#selector(
	type=CssSelector,
	scope=https://dauwhe.github.io/html-first/MobyDick.wpub,
	value=%23elemid%20>%20.elemclass%20+%20p
)

There are other options, though they are perhaps less intuitive or more difficult to explain, e.g. (option 2):

https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html#
	scope=https://dauwhe.github.io/html-first/MobyDick.wpub&
	selector(
		type=CssSelector,
		value=%23elemid%20>%20.elemclass%20+%20p
	)

If we decide we need frag ids and scope (issues #6 and #10), we need to make sure we have consensus on this issue. One advantage of Option 2 is that it avoids the temptation to add scope to a selector, state or position that is used as a refinement. I think having scope on a refinement but not on the top-level (first in the hierarchy) selector, state or position would cause problems.
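To compare the two options side by side, here is a hypothetical Python serializer (`to_fragment` is an illustrative name) that turns the JSON of Example 3 into either fragment form. It assumes flat selectors without `refinedBy`, and it percent-encodes every reserved character in values, so `>` and `+` come out as `%3E` and `%2B` rather than raw as in the examples above.

```python
from urllib.parse import quote


def _enc(value: str) -> str:
    # Percent-encode everything except ':' and '/', so ',', '(', ')', '=', '&'
    # inside a value cannot be mistaken for fragment syntax.
    return quote(str(value), safe=":/")


def to_fragment(locator: dict, option: int = 1) -> str:
    """Serialize a flat {scope, source, selector} locator as a fragment URL.

    option 1: scope is folded into selector(...)
    option 2: scope=...& precedes selector(...)
    """
    sel = locator["selector"]
    pairs = [("type", sel["type"])]
    if option == 1 and "scope" in locator:
        pairs.append(("scope", locator["scope"]))
    pairs += [(k, v) for k, v in sel.items() if k != "type"]
    selector = "selector(" + ",".join(f"{k}={_enc(v)}" for k, v in pairs) + ")"
    prefix = f"scope={_enc(locator['scope'])}&" if option == 2 and "scope" in locator else ""
    return f"{locator['source']}#{prefix}{selector}"


example_3 = {
    "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
    "selector": {"type": "CssSelector", "value": "#elemid > .elemclass + p"},
}
```

Note that in option 1 the position of `scope` inside `selector(...)` is a serialization choice; a refinement serialized the same way would offer the same slot, which is the temptation option 2 avoids.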

Do we need to specify the 'result' of selections?

The Locator draft defines a way to express various types of locations. However, do we need to formally define a data structure (using, e.g., IDL) that should be returned when acting on those locations?

Use cases for Position (not satisfied by Selector) - are there any?

From Section 3.1.4 (Character Offset) of EPUB 3.1 CFI:
"For XML character data, the offset is zero-based and always refers to a position between characters, so 0 means before the first character and a number equal to the total UTF-16 length means after the last character. A character offset value greater than the UTF-16 length of the available text must not be specified."
And from Section 3.1.9 (Side-Bias) of EPUB 3.1 CFI:
"In some situations, it is important to preserve which side of a location a reference points to. For example, when resolving a location in a dynamically paginated environment, it would make a difference if a location is attached to the content before or after it (e.g., to determine whether to display the verso or recto side at a page break)."

Assuming these are real feature requirements, I don't think we have anything precisely equivalent in Web Anno data model. Putting aside for a moment whether something called a fragment identifier can be used to specify a location, how might we be able to address a need for these functionalities?

Regarding the first bit, I do note that in Web Anno we do not specify a meaning for a TextPositionSelector or DataPositionSelector having the same value for both start and end. We do talk about "Position 0 would be immediately before the first character[/byte]". So in this doc could we specify an interpretation that if the document was "abcdefghijklmnopqrstuvwxyz", the start was 4, and the end was 4, we are specifying the location immediately before the character 'e'? For completeness should we specify what to do if end (or start) is greater than the length of the normalized text?
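One possible reading of the proposed interpretation, as a Python sketch: `start == end` denotes a position between characters, and offsets past the end of the (normalized) text are rejected rather than clamped, mirroring the CFI rule quoted above. The function name and return shapes are hypothetical. Python string indices count Unicode code points, which is what the Web Annotation model uses for TextPositionSelector; CFI counts UTF-16 code units, so the two can differ for characters outside the Basic Multilingual Plane.

```python
def resolve_text_position(text: str, start: int, end: int) -> tuple:
    """Hypothetical resolver for a TextPositionSelector over normalized text.

    start == end is read as a collapsed position *between* characters;
    offsets outside 0..len(text) are rejected rather than clamped.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError(f"offsets {start}..{end} fall outside 0..{len(text)}")
    if start == end:
        before = text[start - 1] if start > 0 else None  # character to the left
        after = text[start] if start < len(text) else None  # character to the right
        return ("point", start, before, after)
    return ("range", text[start:end])
```

With the alphabet example, `resolve_text_position(alphabet, 4, 4)` returns `("point", 4, "d", "e")`, i.e. the location immediately before 'e'. Rejecting rather than clamping is only one of the choices this issue asks about.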

Regarding the second bit, side-bias, I have no idea other than to suggest that this is not something a locator or fragment identifier should have to worry about - it's something the consumer of the locator should be responsible for.

Use Cases for Span Selectors and Multi Resource Selectors (Do we need both)

What are the use cases for selections that span multiple resources contained within a single source resource, e.g., selection spans parts of chapter2.html and chapter3.html within the same Web Publication? For these use cases does it matter if the selection is discontinuous or continuous (in some reading order of the Web Publication)?

RDF and JSON-LD

The current draft states: "All references to RDF and JSON-LD have been removed." That is said below the headline "Editorial Changes".

Please re-add support for RDF and JSON-LD.

rename the document

Proposals include Web Annotation for Web Publications, An Extension to Web Annotations, and other things involving the word "Annotations". Please submit recommendations for names by Wednesday 13 December.

Frag ids - source vs. scope

There is a natural inclination to consider the more granular resource as the source and the containing resource (say a Web Pub) the scope. Thus:

JSON: 
{
	"scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
	"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
	"selector": {
		"type": "TextQuoteSelector",
		"exact": "Call me Ishmael"
	}
}

But might this also be correct?

JSON: 
{
	"source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
	"scope": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
	"selector": {
		"type": "TextQuoteSelector",
		"exact": "Call me Ishmael"
	}
}

This serialization says that somewhere in the Web Pub you will find the phrase "Call me Ishmael", which arguably seems correct (intellectually), although it may require reading a number of Web Publication Resources to get there in the end. If, on the other hand, we agree that the second serialization is incorrect, we may want to say so explicitly, especially if we use fragment ids (issue #6), to discourage someone from doing the following:

https://dauwhe.github.io/html-first/MobyDick.wpub#selector(
	type=TextQuoteSelector,
	scope=https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html,
	exact=Call%20me%20Ishmael
)

The temptation, of course, is that this URL makes it particularly easy to collate locators by Web Publication identifier rather than by component file URL. Another option is to interpret the base URL of the fragment as the scope and require an explicit source within the fragment. For example:

https://dauwhe.github.io/html-first/MobyDick.wpub#selector(
	type=TextQuoteSelector,
	source=https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html,
	exact=Call%20me%20Ishmael
)

Either of the last 2 approaches means that the user agent will need to get the resource identified within the locator to actually resolve the locator. Retrieving the base url alone is not sufficient.
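A hypothetical Python parser for the `selector(...)` fragment form illustrates this point: the base URL alone does not say which resource has to be fetched. It assumes every value is percent-encoded (so no raw `,` or `)` occurs inside a value) and that there is no nested `refinedBy`; `parse_locator` is an illustrative name. If the fragment carries `source`, the base URL is read as the scope; otherwise the base URL is the source.

```python
from urllib.parse import unquote, urldefrag


def parse_locator(url: str) -> dict:
    """Parse base#selector(k=v,...) into a {scope, source, selector} dict.

    Assumes percent-encoded values (no raw ',' or ')' inside a value)
    and no nested refinedBy.
    """
    base, frag = urldefrag(url)
    if not (frag.startswith("selector(") and frag.endswith(")")):
        raise ValueError("expected a selector(...) fragment")
    params = {}
    for pair in frag[len("selector("):-1].split(","):
        key, _, value = pair.strip().partition("=")
        params[key] = unquote(value)
    locator = {"selector": params}
    for key in ("scope", "source"):
        if key in params:
            locator[key] = params.pop(key)
    if "source" in locator and "scope" not in locator:
        # Explicit source in the fragment: read the base URL as the scope.
        locator["scope"] = base
    else:
        # Otherwise the base URL is the resource the selector applies to.
        locator.setdefault("source", base)
    return locator
```

Applied to the last example (on one line), it yields the chapter file as `source` and the `.wpub` URL as `scope`, so a user agent still has to fetch `source` to find the quote. Applied to the preceding example, it yields the chapter file as `scope` and the `.wpub` URL as `source`, i.e. the second JSON serialization discussed above.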

Do we need fragment ids?

The document consists of two parts: essentially, a description of the Selector Model as defined by the Web Annotation Data Model, and a reformulation of that data model in the form of fragment IDs. It is not clear at this moment whether standardizing fragment identifiers is necessary, or whether the JSON-based structure fulfills the requirements. If the latter, we can remove the relevant section, and the only possible normative extension in the document is the one described in issue #4.

Semantics for range selections for WPUB

There has to be a formal definition of how a WPUB User Agent should interpret a range selector. To avoid editorial and procedural difficulties, that formal specification should be part of the WPUB document.

Do we need the intermediate selectors for SpanSelector?

The current design requires an explicit list of selectors for the "intermediate" resources. This avoids making the selection dependent on an implicit reading order for a Web Publication. Is that the right choice? Relying on an implicit order would indeed simplify things, but there has been considerable discussion in the WG about whether that is a viable assumption...
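The trade-off can be seen in a small, hypothetical Python helper that derives the intermediate resources from a reading order instead of listing them explicitly: the result is only as stable as the reading order it is given. The helper name and the file names used with it are illustrative assumptions.

```python
from typing import List, Sequence


def intermediate_resources(reading_order: Sequence[str], start: str, end: str) -> List[str]:
    """Hypothetical helper: the resources strictly between start and end,
    derived from an implicit reading order instead of an explicit list."""
    i, j = reading_order.index(start), reading_order.index(end)
    if i > j:
        raise ValueError(f"{start!r} comes after {end!r} in this reading order")
    return list(reading_order[i + 1:j])
```

With the reading order `["c001.html", "c002.html", "c003.html", "c004.html"]`, a span from `c001.html` to `c004.html` has `c002.html` and `c003.html` as intermediates; swapping the middle two resources in the reading order changes the result, which is the dependency an explicit list avoids.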
