As currently spec'd, the sec-bikeshed-available-dictionary:
request header is a structured field dictionary that includes the hash type and base-64 encoded hash of the dictionary file.
i.e. sec-bikeshed-available-dictionary: sha-256=:d435Qo+nKZ+gLcUHn7GQtQ72hiBVAgqoLsZnZPiTGPk=:
On the server side, it would be extremely easy to check for and serve delta-encoded resources if the hash was part of the file name, i.e. /app/main.js.sbr.<hash>.
Extracting the hash from the SF value and mapping it to a hex string or other path-safe string can be done, but is maybe a bit more complicated than it needs to be.
Since the length of the hash string varies with the hash type, we can send the hash without having to send the algorithm (we just need to make sure all supported algorithms generate different hash lengths). Additionally, Base64 includes / as one of the characters used when encoding, so it may be cleaner to just use hex encoding. Other higher-but-safe bases could be selected as well, but may complicate tooling.
If we change it to use the base-16 encoded hash and send the raw hash as the value then the server or middle boxes can construct the file name directly by appending the header value to the end of the file path (though some care should be taken to make sure it isn't abused for a path attack and that the value appended only contains valid characters).
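To make the suggestion concrete, here is a rough sketch (with hypothetical names, not spec text) of how a server could map the raw hex hash from the header onto a delta file name while guarding against path abuse:

```python
import re

# Hypothetical server-side helper: map a hex-encoded dictionary hash from
# the request header onto a delta file name, e.g. /app/main.js.sbr.<hash>.
HEX_SHA256 = re.compile(r"[0-9a-f]{64}")  # lowercase hex, fixed length

def delta_path(base_path, hash_value):
    # Accept only a bare hex digest; this rejects '../' traversal and the
    # '/' and '+' characters that raw base64 could inject into a path.
    if HEX_SHA256.fullmatch(hash_value) is None:
        return None
    return f"{base_path}.sbr.{hash_value}"
```

The fixed-length, restricted-alphabet check is what makes the direct append safe.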
Allow for matching against fetch destination (Sec-Fetch-Dest) in addition to the URL pattern.
Maybe an optional match-dest dictionary entry on the Use-As-Dictionary response, and require it be a full match against the specified fetch destinations.
It might well make sense that this is layered on top of the HTTP cache as @pmeenan suggests, but not all implementations have a triple-keyed HTTP cache at this point. Is that a pre-requisite for this feature?
See whatwg/fetch#1035 for some background and further pointers on triple-keyed HTTP cache.
Hey!
Imagine an Edge based deployment of compression dictionaries, where the resources themselves are in a cloud-based storage.
Every time the CI runs, it adds a new resource to the pile, and calculates the diffs between it and N previous versions of that same resource. All of these diffs are stored in the same bucket in the cloud.
Now, whenever a resource is served, it uses a use-as-dictionary value that matches the various resource versions.
What happens when that same resource gets reloaded?
Its matches value definitely matches itself, so the server gets a request whose sec-available-dictionary header carries the resource's own SHA-256 hash. That kind of zero-sized diff does not exist in the cloud storage, because the CI didn't create diffs from the resource to itself. That means the request either fails, or is retried without the dictionary (adding delay).
What's the right way to tackle such a scenario?
I'd love thoughts on the right thing here for the protocol (and developer advice that will be derived from it).
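One possible answer, sketched below with hypothetical names: the server can special-case the self-match (or any missing diff) and fall back to a plain response rather than failing:

```python
# Sketch of server-side logic for the scenario above: if the advertised
# dictionary hash equals the hash of the version being served, or no
# precomputed diff exists for that pair, serve a plain response instead
# of failing. The diff store layout is an assumption for illustration.

def choose_encoding(resource_hash, advertised_hash, diff_store):
    if advertised_hash is None:
        return "identity", None
    if advertised_hash == resource_hash:
        # Client already has this exact version; don't look up a diff
        # the CI never generated.
        return "identity", None
    diff = diff_store.get((advertised_hash, resource_hash))
    if diff is None:
        return "identity", None   # no precomputed delta: serve plain
    return "sbr", diff            # serve the delta-compressed body
```

This avoids both the failed request and the retry round trip, at the cost of one extra hash comparison per request.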
Why is it p and not path? Same for the other fields.
cc @mnot
Zstandard can use both structured and raw content dictionaries (RFC8878 sec. 5). When a buffer is presented to Zstandard to be used as a dictionary it must be instructed how to interpret it. (If a properly formatted dictionary is used as a structured dictionary by the compressor and as a raw content dictionary by the decompressor, or vice versa, the reconstructed output will likely differ from the original content.)
The three options provided by zstd are:
One option is to use the MIME type of the resource being used as a dictionary (as discussed in #44) to signal how it should be interpreted. But simpler might just be to use the auto-interpretation mechanism.
Whatever we choose, the description of the zstd-d content-encoding should be updated to be explicit about this.
For the case of dynamic HTML resources, I can see sites with a low number of returning visitors where it can be beneficial to e.g. reuse the HTML delivered as part of the current page for future versions of the same page, or for similar pages (e.g. reuse the HTML from one product page for another).
But very often, such HTML pages (especially with publishers and e-commerce) are served with very low caching freshness lifetime (if any), to ensure that typos or page errors won't live on in the browser's cache.
At the same time, it'd be great to be able to use these pages as a dictionary for a long while.
So it'd be great to be able to define both Cache-Control max-age and a dictionary TTL, have the browser cache keep the resource around for the duration of the longest amongst the two, but only use it for the case for which it is still fresh.
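A minimal sketch of that dual-freshness idea, assuming a cache entry that tracks both lifetimes separately (all names hypothetical):

```python
# Sketch only: a cache entry that carries both a Cache-Control max-age and
# a separate dictionary TTL. The entry is kept for the longer of the two
# lifetimes, but each use checks its own lifetime independently.

class CachedEntry:
    def __init__(self, stored_at, max_age, dictionary_ttl):
        self.stored_at = stored_at          # seconds since epoch
        self.max_age = max_age              # Cache-Control freshness (s)
        self.dictionary_ttl = dictionary_ttl  # dictionary lifetime (s)

    def _age(self, now):
        return now - self.stored_at

    def fresh_for_reuse(self, now):
        # May be served from cache as a regular response.
        return self._age(now) < self.max_age

    def fresh_as_dictionary(self, now):
        # May be advertised/used as a compression dictionary.
        return self._age(now) < self.dictionary_ttl

    def evictable(self, now):
        # Evict only once BOTH lifetimes have expired.
        return self._age(now) >= max(self.max_age, self.dictionary_ttl)
```

With a short max-age and a long dictionary_ttl, a stale page would no longer be served directly, yet would still be usable as a dictionary.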
The explainer describes that the client and server generate SHA-256 hashes and then use those to coordinate. Is there a specific reason why algorithm agility is not built into the protocol? In simple terms, the ability to migrate to other algorithms as the security environment evolves.
The more I look at this aspect, the more it gets me thinking about whether the design has some overlap with the HTTP digests specification https://httpwg.org/http-extensions/draft-ietf-httpbis-digest-headers.html
The explainer hints at wanting to constrain the size of the sec-bikeshed-available-dictionary field value via:
SHA-256 hashes are long. Their hex representation would be 64 bytes, and we can base64 them to be ~42 (I think). We can't afford to send many hashes for both performance and privacy reasons.
but I wonder how much this really matters in practice.
If we adopted a similar approach that digests use, you could make sec-bikeshed-available-dictionary
be a Structured Fields dictionary that can convey 1 or more hash values alongside their indicated algorithm e.g.
sec-bikeshed-available-dictionary:
sha-256=:d435Qo+nKZ+gLcUHn7GQtQ72hiBVAgqoLsZnZPiTGPk=:,
sha-512=:YMAam51Jz/jOATT6/zvHrLVgOYTGFy1d6GJiOHTohq4yP+pgk4vf2aCs
yRZOtw8MjkM7iw7yZ/WkppmM44T3qg==:
Even if you restrict it to only one hash, you can still benefit from agility by sending the algorithm alongside the value.
This somewhat falls under "open question 2" in the explainer, but I thought it's worth opening an issue to discuss this specific aspect.
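For illustration, a minimal sketch of reading such a multi-hash header into a map of algorithm to raw bytes. This is deliberately simplistic and NOT a full RFC 8941 Structured Fields parser:

```python
import base64

# Toy parser for values like:
#   sha-256=:d435Qo+...=:, sha-512=:YMAa...==:
# Real implementations should use a proper Structured Fields parser.

def parse_available_dictionary(value):
    hashes = {}
    for member in value.split(","):
        name, _, item = member.strip().partition("=")
        # Byte sequences in SF are delimited by colons around base64.
        if item.startswith(":") and item.endswith(":"):
            hashes[name] = base64.b64decode(item[1:-1])
    return hashes
```

The decoded length (32 bytes for sha-256, 64 for sha-512) gives the server a second signal for the algorithm, even before looking at the key name.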
The problem
It is very common for asset paths to include a hash of the asset's content.
cdn.mysite.com/assets/myscript_HASH.js
cdn.mysite.com/assets/HASH/myscript.js
The benefit: assets with no changes between version n and n+1 keep the same hash, and load from the HTTP cache.
The proposed path/scoping rules mentioned in the explainer do not support this type of versioning.
I think this is bad: if only myscript.js/{version}-esque scoping rules are supported, then there's a mutually exclusive choice between cache-friendly hash-based versioning and support for delta dictionaries.
Of course, there is no clear-cut "one is better than the other". Factors such as code-splitting granularity, deployment cadence, and user demographic/perf distribution come to mind.
I think Chromium might not be able to accurately measure the net impact even with an open A/B test origin trial due to selection bias of those who opt-in for the trial.
For a non-trial, there are many other factors that could affect performance over time. Looking at improved CWV for a short window of "before/after" might tell a lie in the long run and we'd never know.
Solution thoughts
I'm only here to complain..
Adding a wildcard anywhere in the path is a problem for the proposed scoping/pathing rules.
Is it ~better if it's only allowed for the slug (last segment)?
Maybe the slug can be a prefix by definition? i.e. /myscript.js implicitly matches both /myscript.js.hash1 and /myscript.js.hash2.
I'm wondering whether we should expose the storage usage for the dictionaries.
Currently Storage API is providing a way to get the storage usage.
For example in Chromium,
JSON.stringify(await navigator.storage.estimate(), null, 2);
returns
{
"quota": 296630877388,
"usage": 75823910,
"usageDetails": {
"caches": 72813056,
"indexedDB": 2877379,
"serviceWorkerRegistrations": 133475
}
}
Note: usageDetails was launched in Chromium. But it is still under spec discussion.
I have two questions:

1. Should we include the dictionary storage in usage?
2. Should we add a dictionaries entry in usageDetails?

All dictionary resources should be readable from the page, so I don't think there is any risk of exposing them. But I'd love to hear other opinions.
To add another layer of defense against cross-origin timing attacks, we should add language along the lines of:
When the server receives a sec-bikeshed-dictionary-available: sha256=:<hash>: request that includes an authority or origin as well as a referer request header, and where the referer is cross-origin, the dictionary may only be used for compression if the response headers include an Access-Control-Allow-Origin: that includes the origin from the referer header.
It could be tweaked to use different sec-* headers to detect the cross-origin nature of the request, but the requirement is to prevent servers from even sending responses using dictionary compression that should be opaque (and opening up the possibility of a timing attack).
Can we make the browser automatically retry the request without the sec-bikeshed-available-dictionary:
header when it failed to read the cached dictionary?
The current explainer says:
In case the browser advertized a dictionary but then fails to successfuly fetch it from its cache and the dictionary was used by the server, the resource request should be terminated
So the browser must check the existence of the cached dictionary on the disk before sending the request to reduce the risk of such failure.
If the automatic retry is allowed, the browser can speculatively send the request with sec-bikeshed-available-dictionary:
header without checking the cached dictionary. I think this is very important for performance.
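The speculative flow being proposed could look roughly like this (all function and field names are hypothetical):

```python
# Sketch of the proposed speculative fetch: send the request with the
# dictionary advertisement immediately, and if the cached dictionary turns
# out to be unreadable after the server already chose dictionary encoding,
# retry once without the header.

def fetch_with_dictionary(send, read_dictionary, url, dict_hash):
    response = send(url, available_dictionary=dict_hash)
    if response["content_encoding"] != "sbr":
        return response                 # dictionary not used; nothing to do
    dictionary = read_dictionary(dict_hash)
    if dictionary is not None:
        return response                 # decode body with the dictionary
    # Cache read failed after the server committed to the dictionary:
    # this is the proposed automatic retry, without the advertisement.
    return send(url, available_dictionary=None)
```

The win is that the disk read for the dictionary can overlap the network round trip instead of blocking the request.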
There is no way to delete registered dictionaries.
I think we should support it using Clear-Site-Data. The Clear-Site-Data spec defines the following types:
"cache"
"cookies"
"storage"
"executionContexts"
"*"
I think Web developers will want to delete dictionaries without deleting other types ("cache", "cookies", "storage"). So we should introduce a new type "dictionaries".
Clear-Site-Data: "dictionaries"
For dictionaries loaded from a Link: header, it could be useful for the request that triggers the dictionary fetch to either specify the scope of the dictionary, or for the allowable path for the dictionary to include the path from the original request; and for a document <link> tag to also provide other path options.
The path restrictions for dictionary use as they are currently written are for providing some level of ownership proof when setting the scope. The request that triggers the dictionary fetch and the document itself are also proof points and could allow for serving the dictionary from a different directory than the resources it is intended to be used with (still needs to be same-origin as the resources).
Brotli and Zstandard both support raw byte streams as well as "optimized" dictionaries. Most of the work to this point has assumed raw byte streams but it would be beneficial to spec what the negotiation for a custom dictionary payload would look like so that backward-compatibility doesn't become a problem.
i.e. If a browser ships without support for extended brotli dictionaries or index-based Zstandard dictionaries and support for both is added at a later time, we need to make sure that older clients will not break by trying to use the new dictionary as a raw byte stream.
This could be done with different content-encodings for the different types of dictionaries but it would be better to not explode the set of encodings if it isn't necessary.
One possibility that comes to mind:

- Define distinct content types for the different dictionary formats: dictionary/raw, dictionary/brotli, etc.
- For the link rel=dictionary mechanism, advertise the supported dictionary types in the Accept: header.
- For the use-as-dictionary response header, add an optional type= field for the type of dictionary that defaults to type=raw.
- Serve dictionaries with the matching content-type response header.
- If the client does not support the type specified in the use-as-dictionary response header then it should not store the dictionary (independent of how it was fetched).
- If the content-type response header is not a recognized dictionary type then it should not be stored as a dictionary.

Since custom dictionaries will only ever make sense to be fetched as stand-alone dictionaries, this should allow for backward-compatibility as new dictionary formats are created.
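A simplified sketch of the storage rules above, collapsing the two checks into one decision for a hypothetical client that only supports raw dictionaries (the recognized-type set here is illustrative, not final):

```python
# Sketch: decide whether a fetched resource may be stored as a dictionary.
# Both sets below are assumptions for illustration.
RECOGNIZED_CONTENT_TYPES = {"dictionary/raw", "dictionary/brotli"}
SUPPORTED_TYPES = {"raw"}  # hypothetical client: raw byte streams only

def should_store(use_as_dictionary_type, content_type):
    if use_as_dictionary_type not in SUPPORTED_TYPES:
        return False  # client can't interpret this dictionary type
    if content_type not in RECOGNIZED_CONTENT_TYPES:
        return False  # response isn't a recognized dictionary payload
    return True
```

An older client with SUPPORTED_TYPES = {"raw"} simply refuses newer formats instead of misinterpreting them as raw byte streams, which is the backward-compatibility property the proposal is after.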
It seems that the proposal allows any subresource to essentially claim dictionary authority for /.
The README.md says:
On a future visit to the site after the application code has changed:

1. The page includes <script src="//static.example.com/app/main.js/125">.
2. The browser matches the /app/main.js/125 request with the /app/main.js path of the previous response that is in cache and requests https://static.example.com/app/main.js/123 with Accept-Encoding: br, gzip, sbr, sec-fetch-mode: cors and sec-bikeshed-available-dictionary: <SHA-256 HASH>.
3. The server responds with Content-Encoding: sbr, Access-Control-Allow-Origin: https://www.example.com, Vary: Accept-Encoding,sec-bikeshed-available-dictionary.
.I believe it should say:
The browser matches the /app/main.js/125 request with the /app/main.js path of the previous response that is in cache and requests https://static.example.com/app/main.js/125 with Accept-Encoding: br, gzip, sbr, sec-fetch-mode: cors and sec-bikeshed-available-dictionary: <SHA-256 HASH>.
If dictionaries end up scoped to a path and use some form of precedence, what are the mechanics for expiring a dictionary with more specificity for a less-specific one?
i.e., assuming dictionaries that cover 2 paths:
A - http://example.com/web/products/
B - http://example.com/
If a client has both dictionaries but a site decides to unify on a single global dictionary (B), how is dictionary A replaced? Some possibilities come to mind:
Content-encoding is the most natural fit for the actual compression but it is likely to also cause adoption problems, at least in the short term.
It's not unusual for the serving path to consider content-encoding to be per-hop instead of end-to-end from the browser to the origin and unless the delta-encoding is being done by the leaf serving node, the sbr
encoding is likely to be stripped out.
```mermaid
sequenceDiagram
    Browser->>CDN: Accept-Encoding: sbr, br, gzip
    CDN->>Origin: Accept-Encoding: gzip
    Origin->>CDN: Content-Encoding: gzip
    CDN->>Browser: Content-Encoding: br
```
If the actual encoding is done using other headers for negotiation but the content-type remains the same, then the compressed resources will be binary data and may cause other issues for middleboxes (i.e. with something like edge workers, they will be expecting to be processing text HTML, CSS or Javascript payloads). That could be workable for a given origin as long as they control the processing along their serving path.
One deployment model where it could work, but which requires explicit support from both origins and CDNs, is:

1. The origin marks dictionary resources with a bikeshed-use-as-dictionary: <path> response header.
2. The browser advertises sec-bikeshed-available-dictionary: and Accept-Encoding: sbr, br, gzip on matching requests.
3. The delta-encoding node responds with Content-Encoding: sbr.
When a browser fetches a cross-origin script (eg: <script src='https://static.example.com/script.js'>
in https://www.example.com/index.html) , it sends a request with the mode set to no-cors
and the credentials set to include
.
The current explainer allows this type of request for both registering as a dictionary and using a registered dictionary for its decompression, as long as the response header contains a valid Access-Control-Allow-Origin
header (*
or https://www.example.com
).
However, if we follow the CORS check step in the Fetch spec, the response must also contain the Access-Control-Allow-Credentials: true header, and the Access-Control-Allow-Origin header must be https://www.example.com. This means that the server must know the origin of the request, even though the request does not include an Origin header. (It may include a Referer header, but the Origin header and Referer header are conceptually different.)
For this reason, I now think supporting no-cors mode requests is problematic.
Maybe we should support only navigate, same-origin, and cors mode requests?
@pmeenan @yoavweiss
Do you have any thoughts?
Hey folks, tracking this proposal as it seems like a huge way to cut down on our CDN traffic and get people loading updated JS bundles faster.
From an implementation perspective, is there a recommended / expected location for the implementation in standard flows? I can see two places to do this:
There are pros and cons to both approaches but wondering if from a spec perspective there is an "ideal" approach here or specifically what was envisaged while writing the spec.
For full context we ship like ~15-20 builds a day which means that solving the "how do we generate these files" is not a trivial problem to solve in either case (dynamic vs build time). But looking to go down the path most trodden.
A lot of build systems produce static resources that are prefixed by a build number and that doesn't work well with a prefix-only match. i.e. /app/123/main.js
We could allow for more flexible path matching with some form of wildcard support but that will complicate the "most-specific" matching logic and the ownership protections.
Using a # for a wildcard (since it is already reserved as a client-side separator), we could allow for exact matching by default, prefix matching with a # at the end, or wildcard matching.
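A quick sketch of what '#'-based matching could look like (exact by default, prefix via a trailing '#', wildcard anywhere else). This is an illustration of the idea, not proposed spec text:

```python
import re

# Sketch: '#' as the wildcard character. Splitting on '#' and joining the
# escaped literal parts with '.*' gives exact matching when there is no
# '#', prefix matching when it is trailing, and wildcard matching elsewhere.

def path_matches(pattern, path):
    regex = ".*".join(re.escape(part) for part in pattern.split("#"))
    return re.fullmatch(regex, path) is not None
```

For example, /app/#/main.js would cover build-number-prefixed paths like /app/123/main.js, which is exactly the case prefix-only matching can't express.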
Some open questions:
Colleagues and I were curious if Chromium had any plans it could share around whatwg/urlpattern#191, as running JS regular expressions in networking doesn't seem like it will fly.
Is the idea to have some kind of safe subset?
Hi folks working on this! If you haven't already, could you please file standards positions for WebKit and Mozilla? It would be great to get a few more eyes on this from potential implementers:
Thanks in advance!!!
One of the things that came up during Chrome's origin trial is that A/B testing the effectiveness of compression dictionaries is difficult (and will become more difficult when it is no longer an origin trial).
There are 2 points in the serving flow where dictionary decisions need to be made:

1. When the use-as-dictionary response header is sent to mark a response as an available dictionary.
2. When a request carries available-dictionary and the server decides if it is going to serve a dictionary-compressed response.

In the case of the origin trial, there is a third gate, which is the setting of the origin trial token that enables the feature (without which the use-as-dictionary response header will be ignored). Outside of the origin trial there is no page-level gate for enabling it, and in both cases, once enabled, there is no way to turn it off for individual users.
For the dynamic use case where the server is running application logic anyway and the response is not coming from a cache, it is possible to use a cookie or some other mechanism to decide if dictionaries should be used, both on the initial request and subsequent requests where the available-dictionary
request can just be ignored.
In the static file use case where resources are served from an edge cache and the cache keys the resources by URL, accept-encoding and available-dictionary, there is no granular way to control user populations. All clients for a resource will get the use-as-dictionary
response header and all clients that advertise a given dictionary would get the dictionary-compressed response. The page does have SOME level of control but it would require using different URLs for the resources for the different populations.
While it would be useful for sites to be able to have granular control over the feature for measuring the effectiveness during roll-out, that level of control is not usually exposed for transport-level features.
available-dictionary request headers).

In Chromium, we are using the MatchPattern() method to process the URL matching.
The MatchPattern() method supports both ? and *. (? matches 0 or 1 character, and * matches 0 or more characters.) Also, the backslash character (\) can be used as an escape character for * and ?.
The current proposal's dictionary URL matching doesn't support \. Also it doesn't support ?.
I think ? is useful, but ? is used in URLs before the URL query string. So I think we should support both ? and \.
Websockets themselves would fail a same-origin check for a dictionary delivered over HTTPS.
Would it be valuable (and safe) to allow for the path matching URL in the dictionary response to specify a wss:// scheme along with a match path (and explicitly restrict dictionaries to https, not just same-origin)? Then the dictionary-setting part of the spec could require that the match path be same-origin (and https) or the equivalent origin if wss was used as a scheme in the match path.
Something like:
1. Only process use-as-dictionary: response headers for requests with a https scheme.
2. Parse the path (or match if we change it) param as a URL. This only matters for wss (and doesn't hurt otherwise, allowing regular URL parsing and classes to be used).
3. Only allow https and wss schemes if the URL is fully-qualified.
4. If the match URL uses a wss scheme, replace it with https when doing the origin comparison.

AFAIK, the actual compression should work fine for data delivered over a websocket as long as the encoding supports streamed compression (which is usually a requirement before adopting a new compression algorithm anyway).
Apologies if there are some specifics that I missed around this, but I'm curious how service workers will interact with this solution. It's clearly at a lower layer with no API for SW, but is it expected that, when using SW to make fetch requests, this process still happens, or should it be skipped? At the moment, typical browser caching layers are skipped with SW networking; for example, responses sending an etag header will not automatically go through the process by which the request gets an if-none-match header, so the SW needs to incorporate that.
Is this defined somewhere?
This is the new foundation we're using for URL matching across the web platform. https://github.com/WICG/urlpattern
Introducing a new type of pattern is counterproductive to our efforts. (I can't find the details in the explainer, but it says "This is parsed as a URL that relative or absolute URLs as well as * wildcard expansion.", and then #42 is also open, I guess.)
It's probably worth calling out that all caches in the serving path will need to support Vary: sec-bikeshed-available-dictionary
so that the cache for a given URL doesn't get polluted with delta-compressed artifacts using different dictionaries.
Full Vary support for arbitrary headers isn't necessarily needed but it will be required for whatever the dictionary request header ends up being.
Not sure if it needs specific mentioning, but this is for the CDNs, Load balancers and web servers at a minimum, depending on what caches are in the path for a given origin.
I'm assuming it also needs to be limited to HTTPS (and maybe only HTTP/2 and 3) to reduce the risk of forward proxies or intercepting man-in-the-middle proxies from causing cache issues.
Hello. Could/should this specification be generalized to support its application to other compression schemes? Currently the README seems exclusively focused on Brotli, but having wider support could help other standards.
For example there is the zstd compression scheme, here:
https://docs.google.com/document/d/1aDyUw4mAzRdLyZyXpVgWvO-eLpc4ERz7I_7VDIPo9Hc/edit . This too could use Dictionary support.
Concern over handling (or lack thereof) of dictionaries was one of the primary concerns cited in mozilla/standards-positions#105 for the defer status against the zstd compression scheme proposal. If this proposal could be generalized a bit, zstd and potentially other compression schemes would have a better chance of moving forward, helping users save CPU and bandwidth on the web.
(i.e. as main.js..sbr)
Did you mean to use e.g. here?
I found that there is no definition of a MIME type for the dictionary itself in the demo.
Maybe application/compression-dictionary?
https://github.com/WICG/compression-dictionary-transport/blob/main/README.md?plain=1#L137
expires - Expiration time in seconds for the dictionary.
In Cookie, Cache-Control, etc., expires is a date format and max-age is a time in seconds, so it seems a bit strange to me to have a time in seconds for expires.
How about making it max-age?
In the current explainer, when the browser detects a link element <link rel=bikeshed-dictionary as=document href="/product/dictionary_v1.dat">
, it fetches the dictionary with sec-fetch-dest: document
header.
However, when the server receives the request, it may be confused whether this is an actual document request for navigation or a dictionary fetch request.
Therefore, I want to recommend introducing an appropriate sec-fetch-dest value to indicate that the request is for a dictionary fetch.
Two possible ideas are:

1. sec-fetch-dest: dictionary plus sec-fetch-dict-dest: document
2. sec-fetch-dest: dictionary-for-document
In Chromium implementation, the "document" destination type is used to detect the main resource request. Therefore, introducing a new destination type is also convenient for Chromium developers.
We should make sure the correct thing is done here, to avoid confused deputy attacks.
(This came up during TPAC 2023 and nobody present was immediately clear on whether this was handled correctly.)