wicg / content-index
Explainer and spec for the Content Indexing proposal
Home Page: https://wicg.github.io/content-index/spec/
License: Other
There's some overlap between the Media Feeds API and Content Indexing.
The APIs seem to be solving different problems, though. For one thing, Content Indexing is more geared towards offline pages where the type can be hinted to the browser, whereas Media Feeds targets video media with a more rigid type breakdown that varies from type to type.
I looked a bit into potentially merging the APIs, but there doesn't seem to be a nice way of doing that so that both APIs' goals can be achieved. It also seems to me that this is a bit like CacheStorage and IndexedDB, which are somewhat similar but can co-exist, as they are different tools for different nails.
I'd like to get some further thoughts from people involved, since this is something that will likely come up in the standardization track.
@beccahughes - who's working on the Media Feeds API
@jeffposnick - who's aware of both APIs
What's the point of the id field? Why isn't (the absolutised value of) launchUrl a primary key? If it's not and I'm misunderstanding something, maybe an example should be shown where there are non-unique launchUrl values.
What's the relation between this spec and Web Manifests?
There seems to be an awful lot of overlap, with the major differences being:
Web Manifests collect many resources together (e.g., an entire newspaper website), while Content Index applies to individual resources (e.g., single newspaper articles).
Web Manifests are discovered declaratively, while Content Index is used via scripting. However, both still need scripting to actually install the service worker for offline use. This allows Content Index to be kept in sync with the state of the offline cache, so that only items that are actually available (and useful) offline are shown to the user.
Web Manifests don't have a way to explicitly indicate that they're useful offline; while many will be, there will inevitably be some that strictly need the network (e.g., real-time multiplayer games).
Some sort of compare-and-contrast probably belongs in the spec, as well as notes on why extending Web Manifests (e.g., with an offline boolean field) wouldn't be as good.
I also think that it could lift some design directly from the Web Manifest spec, which has basically the right approach to responsive icons (#2) and, less perfectly but still better, to categories (#7) and i18n (#6).
In other words, should an iframe have the ability to register content?
What about sites that register things that are not actually available offline?
This is a potential spam vector. Browsers should be prepared to deal with malicious websites registering a whole pile of bogus resources to fill up the offline content discovery interface. This isn't a problem with the spec per se since this is in the realm of implementation decisions.
I think something like being able to manually hide all registrations from a particular registrable domain will be needed, as well as whatever automatic anti-spam mechanisms get put in place - probably central registrable domain nolists.
I think this will warrant a note in the security section, if only so that it's on implementers' radars.
The website may have a very good idea of what resources the user will want to access - e.g., the next chapter of the book they're reading, rather than some randomly selected one according to general-purpose browser heuristics.
Giving the website a way to indicate the likely usefulness of registered resources, and suggesting (but of course not requiring) that the browser use that as an input, would likely be a good idea. Note that this should only affect the ordering of resources shown within a single registrable domain.
Concretely, this might look like a real-valued weight field being added.
This is also distinctly optional and not needed for an MVP, but I do think it could be quite useful.
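A minimal sketch of how a browser might consume such a hint, assuming a hypothetical `weight` field on each entry (it is not part of the current draft) and a hypothetical `orderWithinDomain` helper. Entries without a weight are treated as 0 here, and the function is only meant to be called with the entries of a single registrable domain.

```javascript
// Hypothetical: `weight` is not part of the Content Index draft.
// Orders the entries of ONE registrable domain, highest weight first.
function orderWithinDomain(entries) {
  const weightOf = (entry) =>
    Number.isFinite(entry.weight) ? entry.weight : 0;
  // Copy before sorting so the caller's array is left untouched.
  return [...entries].sort((a, b) => weightOf(b) - weightOf(a));
}

const chapters = [
  { id: 'ch-1', title: 'Chapter 1', weight: 0.1 },
  { id: 'ch-7', title: 'Chapter 7 (next up)', weight: 0.9 },
  { id: 'ch-3', title: 'Chapter 3' }, // no weight given, treated as 0
];

console.log(orderWithinDomain(chapters).map((entry) => entry.id));
```

Because the weight is only compared between entries from the same site, a site inflating its own weights would not gain any advantage over other sites.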
It bears explicitly noting that offline content registrations are like history in how they make revealing browsing history easy, rather than like cookies/etc which require work to leak browsing history. These should be cleared whenever history is cleared, not just when a total site data purge happens.
While this isn't exactly new with this spec - websites can track your browsing history and re-display it - it does make this sort of leakage a lot more likely, and in particular it's likely to be shown by the browser, rather than the website itself.
They would have to know which websites work offline or install a PWA to be able to browse through content while offline.
It isn't clear to me where the "or" operates here. Is it: They would have to know which websites
If so, the second point doesn't seem right. Websites can't install themselves onto the homescreen.
there no entry points
Isn't the homescreen icon the entry point?
deleteArticleResources in example 1 references payload.id, but it should just be id.
Could just be my preference, but I find async functions easier to read than promise chains. Eg, for the push listener:
self.addEventListener('push', event => {
  const payload = event.data.json();

  // Fetch & store the article, then register it.
  event.waitUntil(async function() {
    const articlesCache = await caches.open('offline-articles');
    await articlesCache.add(`/article/${payload.id}`);
    await self.registration.index.add({
      id: payload.id,
      title: payload.title,
      description: payload.description,
      category: 'article',
      icons: payload.icons,
      launchUrl: `/article/${payload.id}`,
    });

    // Show a notification if urgent.
  }());
});
I'm finding "display" a little confusing. Since it requires an environment, it feels like this UI can't be shown any time other than during the initial add() call.
If something is displayed for each item added, will this lead to the user being bombarded with UI? Eg, if I download "today's news stories", which comprises 20 stories, will the user suddenly get 20 notification-like things?
It’s RECOMMENDED that the user agent fetch the icons when the content is being registered, and stored to be accessed when needed.
I would make this specific. Spec when the icon should be downloaded, and what should happen if that download fails.
The whole icon-downloading bit could be behind a 'may', but it should be clear how the UA should behave if it chooses to get the icon.
The UI SHOULD provide a way for the user to delete the underlying content exposed by the UI
This kinda sounds like we can guarantee that the underlying content will be deleted. From a UI point of view, a button may be provided which, when activated, runs delete a content index entry for entry.
If either of description’s id, title, description, or launchUrl
Nit: 'any' rather than 'either'.
The content categories seem a bit limiting, especially as they're required. Eg, a book isn't really a "homepage" or an "article". Is a photo gallery an "article"? What about a daily crossword? Etc etc.
What's the reasoning behind requiring a category?
add and getDescriptions don't seem to mirror each other in terms of naming. I'd expect addDescription and getDescriptions, or add and getAll.
Let launchUrl be the result of parsing description’s launchUrl
"parsing" is linking to the wrong thing.
As you mentioned, a new registration may be introduced with a narrower scope, that would receive navigations for content items 'owned' by another registration.
Calling add twice with a ContentDescription with the same ID is racy. You could solve this with some sort of queue, eg https://html.spec.whatwg.org/multipage/infrastructure.html#parallel-queue. Dunno if it matters.
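A rough sketch of the queue idea, assuming a hypothetical `SerialQueue` helper and a plain `Map` standing in for the real index. It only illustrates how chaining operations gives two add calls with the same ID a fixed order; it is not how the HTML spec's parallel queue is defined.

```javascript
// Hypothetical helper: runs async tasks strictly one after another.
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  enqueue(task) {
    // Start the task only after every earlier task has settled.
    const result = this.tail.then(() => task());
    // Keep the chain usable even if this task rejects.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Stand-in for the content index: a Map keyed by id.
const store = new Map();
const queue = new SerialQueue();
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function add(description, simulatedWorkMs) {
  return queue.enqueue(async () => {
    await delay(simulatedWorkMs); // e.g. time spent fetching icons
    store.set(description.id, description);
  });
}

// The first call is slower but still completes first, so the
// second call's description is the one that ends up stored.
const done = Promise.all([
  add({ id: 'foo', title: 'first' }, 20),
  add({ id: 'foo', title: 'second' }, 1),
]).then(() => store.get('foo').title);

done.then((title) => console.log(title)); // "second"
```

Without the queue, the faster second call would finish first and then be overwritten by the slower first call, which is the race described above.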
Right now, the promise returned by add is delayed by calling "display entry", which includes fetching icons. Is that deliberate?
For activating the content index entry, look at https://w3c.github.io/ServiceWorker/#clients-openwindow - it shows how to create a top level browsing context and navigate it.
The delete event provides the ID of the resource, but should it provide the whole content description?
Due to race conditions, it's possible for:
contentdelete event for 'foo' queued.
contentdelete event for 'foo' fires.
The spec draft implies that a new tab should be opened when the UI is activated. Is that the right thing to do? Should this be more flexible to allow for browser-specific implementations?
It should be launchURL, per https://w3ctag.github.io/design-principles/#casing-rules and https://url.spec.whatwg.org/#url-apis-elsewhere
Talking about content, it would be great to have more content metadata to be able to show it properly in the list. For example, created time and updated time would be useful to sort the content after we pull it from the list.
The WordPress Post object would be a good reference for available content metadata.
Some people have raised concerns that the name of the API is confusing. Is there a better name for this?
Offline Metadata has been thrown around.
Currently there is no feedback if the id passed into the delete method doesn't exist in the Content Index.
I'm unsure of what type of exception this should throw, but doing something like:
try {
  // delete() returns a promise, so it must be awaited for catch to see a rejection.
  await registration.index.delete("something");
} catch (e) {
  console.log('Failed to remove content', e.message);
}
would be good programming™️
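Until the behaviour is specified, a page could approximate this feedback itself. The following is a sketch only: `deleteIfPresent` and `createFakeIndex` are hypothetical helpers, and the index is assumed to expose `getAll()` and `delete(id)`; the in-memory fake is there so the example runs outside a browser.

```javascript
// Sketch: reports whether an entry existed before deleting it.
// `index` is anything exposing getAll() and delete(id); in a page this
// would be registration.index (assumed shape).
async function deleteIfPresent(index, id) {
  const entries = await index.getAll();
  const exists = entries.some((entry) => entry.id === id);
  if (exists) {
    await index.delete(id);
  }
  return exists;
}

// In-memory stand-in for the content index.
function createFakeIndex(initialEntries) {
  const entries = new Map(initialEntries.map((entry) => [entry.id, entry]));
  return {
    getAll: async () => [...entries.values()],
    delete: async (id) => {
      entries.delete(id);
    },
  };
}

const fakeIndex = createFakeIndex([{ id: 'article-1', title: 'Hello' }]);

const outcome = (async () => {
  const first = await deleteIfPresent(fakeIndex, 'article-1');
  const second = await deleteIfPresent(fakeIndex, 'something');
  return { first, second };
})();

outcome.then((result) => console.log(result));
```

Note that checking and then deleting is itself not atomic, so this is a workaround rather than a substitute for the method reporting the outcome directly.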
One of the items mentioned in the WICG proposal post is the interaction with Web Packaging. I'm very interested to know more about what this interaction would look like.
Let's start with this scenario:
comic.yoyodyne.example. I have a PWA main site and distribute Web Package bundles with free comics.
At the moment, I have only one path I can think of for this to work. My thought is that the FOO Bundle could have its own index.html that has embedded within it an explicit, hardcoded list of all the other files also in the Web Bundle. FOO Bundle would step through this hardcoded list of content, & index.add() each item. Once done, it could redirect to the main PWA url.
The constraint is that currently Karen or Jane would have to know about the Bundle's unique index.html file, & know how to navigate there to initialize this process. It's also a kind of gross solution anyhow, because the index.html file has to have some JS with the hardcoded list of content that's in the bundle.
What I would love to see would be a way for content to more easily declare itself as indexed. As a secondary objective, HTTP Push has almost the same problem, where the page/sw have no way to know about PUSHed content. There, a similar approach is also hacked together: use SSE or use WebSockets or some such to tell the browser about the content you have just PUSHed to it, so you can fetch() then cache that content. That issue is whatwg/fetch#65.
It would be really lovely to have a way to get content into the content index effectively. I would love for my Comic web app to be able to find out about the Comics it is being sent. Content-Index seems like it could be a breakthrough in enabling that, but there's still an outstanding question to me of how to get Web Package bundles into the content-index.
Thanks for publishing this! Some suggestions.
Intro
Why
Combined with other APIs
await in a loop, which makes for a sequential operation, store the promises in an array and then await for Promise.all(), making it a parallel operation.
There also are a few other things I think would be good to mention:
ServiceWorkerRegistration?
I'm sure you've seen the following document, but just in case:
https://github.com/w3ctag/w3ctag.github.io/blob/master/explainers.md
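To illustrate the Promise.all() suggestion above, here is a small sketch contrasting the two styles. `fakeAdd` is a hypothetical stand-in for an asynchronous call such as index.add(), so the example runs without a service worker.

```javascript
// Stand-in for an async operation such as index.add(); resolves after `ms`.
const fakeAdd = (item, ms = 20) =>
  new Promise((resolve) => setTimeout(() => resolve(item.id), ms));

const items = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];

// Sequential: each call waits for the previous one to finish.
async function addSequentially(list) {
  const ids = [];
  for (const item of list) {
    ids.push(await fakeAdd(item));
  }
  return ids;
}

// Parallel: start every call first, then wait for all of them together.
async function addInParallel(list) {
  const pending = list.map((item) => fakeAdd(item));
  return Promise.all(pending);
}

const comparison = (async () => {
  const sequential = await addSequentially(items);
  const parallel = await addInParallel(items);
  return { sequential, parallel };
})();

comparison.then((result) => console.log(result));
```

Both versions produce the same results in the same order; the parallel one simply does not make each item wait for the one before it.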
If a user decides to delete the content, the browser should fire a contentdeleted event with the ID so developers can clean up the underlying content.
Some thought should be put into preventing malicious websites from re-adding the same content with a new ID within that event.
Would it make sense to have an "app" content category to indicate an offline-enabled web app?
The text fields (title and description) are monolingual and in an unspecified language. Even if multiple languages + browser selection of which one to display would be overkill, the language should be indicable to allow proper display (as matters for, e.g., CJK).
The ImageResource definition moved to its own spec. This spec should point to that definition rather than the one in the Manifest spec.
I'm a bit concerned around the extensibility situation for categories.
First, in the spec itself, a comprehensive listing of categories that are guaranteed to be understood as a baseline would be a good idea.
The list that is there (in the IDL) is undescribed and opaque; what's the meaning of the different values, and when should they be used? E.g., I see audio and article are separate categories; which one does a spoken rendition of a magazine article fall into? That also seems quite different to a piece of music, but both could fall under audio. Maybe this should be using something like schema.org's ontology, taking a combination of all the (understood) categories.
Categories are inextensible without some way to indicate less-preferred-but-more-understood fallbacks. ARIA and some CSS properties have a first-understood-value-wins rule; alternatively, a combination of all the applicable understood values could be used, like RDFa and microformats.
(This point really is a quibble.) Using un-namespaced tokens for category means that extension is risky, as someone else might be using the values you add. If the values were URLs, anyone could mint new values without risking collisions.
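The first-understood-value-wins rule mentioned above could look roughly like this. It is a sketch under assumptions: the ordered `categories` list, the `pickCategory` helper, and the contents of the `UNDERSTOOD` set are all hypothetical and not taken from the spec's IDL.

```javascript
// Categories this hypothetical browser knows how to present (illustrative only).
const UNDERSTOOD = new Set(['homepage', 'article', 'audio']);

// Hypothetical: `categories` is an ordered list, most specific first.
// Returns the first value the browser understands, or null if none match.
function pickCategory(categories, understood = UNDERSTOOD) {
  for (const category of categories) {
    if (understood.has(category)) return category;
  }
  return null;
}

console.log(pickCategory(['audiobook', 'audio', 'article'])); // "audio"
console.log(pickCategory(['crossword'])); // null
```

A site could then list a new, more precise value first and still degrade to a value older browsers already handle.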
Why not directly available from the main thread without having a SW registered (like the Cache API)?
Hello. At the moment, if I have a complex service worker that wants to take notice of content being added, I can either
(1) index.add()
(2) index.getAll() & diff, checking for new content
Since we have this index, & this index already can tell us when content goes away, it would also be nice to know when content is added. This would be a more normalized path than (1) or (2).
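For reference, the getAll()-and-diff workaround amounts to something like the following sketch. `findAddedEntries` is a hypothetical helper, and the two arrays stand in for successive getAll() results.

```javascript
// Returns the entries present in `current` whose id was not in `previous`.
function findAddedEntries(previous, current) {
  const knownIds = new Set(previous.map((entry) => entry.id));
  return current.filter((entry) => !knownIds.has(entry.id));
}

const lastSnapshot = [{ id: 'a' }, { id: 'b' }];
const latest = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];

console.log(findAddedEntries(lastSnapshot, latest)); // [ { id: 'c' } ]
```

The service worker has to keep the previous snapshot around and decide when to re-run the comparison, which is the overhead an "added" event would remove.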
It would give users a better idea of how much space they would get back upon deleting an entry, and makes for better UI.
While crawling Content Index, the following links to other specifications were detected as pointing to non-existing anchors, which should be fixed:
This issue was detected and reported semi-automatically by Strudy based on data collected in webref.
This might be less of a bug against the spec and more of an implementation detail related to Chrome 80's current behavior.
registration.index.add() currently rejects if you pass in an icons[].src value that isn't a valid URL. I'm curious about how this determination is made, as it's making it difficult for me to accomplish something while trying out the Content Indexing API. (Sample code.)
I've got a PWA that handles incoming media sharing requests using the Web Share Target API on Android.
If a user shares an image to my PWA, that image gets saved locally using the Cache Storage API with a cache key URL that doesn't exist on a remote server; requests for that URL will only succeed if intercepted by my service worker's fetch event handler, which will bypass the network and return the cached media resource.
I am calling registration.index.add() inside of the same fetch handler that is responsible for handling the incoming POST request from the Web Share Target API. If I pass in an icons[].src value corresponding to a generic icon URL that exists on the remote server, everything works as expected. However, if I pass in an icons[].src value that refers to the newly-cached image (which, again, is only valid when intercepted by the service worker, and doesn't exist on the remote server), the add() call rejects due to an invalid icon.
I can probably refactor things so that the call to registration.index.add() happens outside of a fetch handler, if that's what's causing the failure. But my bigger question is whether the validity checks for icons are supposed to trigger a service worker's fetch handler at all, because if they don't, I've got a bigger issue to solve.