
kv-storage's Introduction

KV Storage

KV storage (short for "key/value storage") was a proposed web API to provide a simple, asynchronous key/value store, layered on top of IndexedDB. Please check out previous revisions of this repository to learn more about it.

This proposal is currently inactive as no browser vendors are investing in it.

kv-storage's People

Contributors

domenic, johnsonjo4531, marcoscaceres, mathiasbynens, ms2ger, trotyl

kv-storage's Issues

Should undefined be special-cased to mean not-present?

Right now the API distinguishes between "not present" and "has undefined", in the same way that JS maps do.

localStorage only allows string values, so passing undefined is the same as passing "undefined".

localForage seems to convert undefined to null, but then store it anyway.

The alternative is to say that setting an item to undefined (and null?) is equivalent to deleting it. So any values set to undefined would be deleted from the keys/values/entries, in particular.

If we did this, we could also get rid of the has() method, which does not exist in localStorage, and encourages error-prone racy code like

if (await als.has("key")) {
  const val = await als.get("key");
  // oops, val might have been deleted between the has and the get; the has didn't buy us anything
}

instead encouraging people to do

const val = await als.get("key");
if (val !== undefined) {
  // good to go
}

Note that this issue becomes moot if we resolve #2 in favor of restricting values to strings.

backingStore API might not be quite right

It occurs to me you can't really use the IndexedDB database manually, once a StorageArea has opened it. (Unless we add explicit close(); see #13.)

As such, the only way to really use backingStore is:

const backingStoreInfo = (new StorageArea("foo")).backingStore;

But this is kind of awkward, creating a StorageArea with no intention of actually using it. Instead, maybe it should just be

const backingStoreInfo = StorageArea.backingStore("foo");

Open connection lazily?

The current API sketch has storage as a built-in StorageArea. Given the current steps, and assuming the module is imported, this would initiate opening the database on page load, which may not be desirable - the page may never end up using storage, or may want to defer accessing storage until after the page is ready.

Counter-argument: it should be up to the UA to schedule database work to not interfere with page load. (Counter-counter-argument: some high profile web applications measure open times and don't want browsers to delay them unnecessarily.)

It might make sense to make the actual DB connection open lazily. This is often done in similarly shaped libraries by having the constructor return immediately, having a private ensure_database() method that returns a promise for the connection, and having all other methods call that to get the connection.
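
A minimal sketch of that lazy-open pattern, assuming a hypothetical LazyStorageArea class (the names are illustrative, not part of the proposal); the constructor does no I/O, and every method funnels through the private ensure-database step:

class LazyStorageArea {
  #dbPromise = null;

  #ensureDatabase() {
    if (this.#dbPromise === null) {
      // The connection is only opened on first use, not at construction time.
      // (upgradeneeded/store-creation handling omitted for brevity.)
      this.#dbPromise = new Promise((resolve, reject) => {
        const request = indexedDB.open("async-local-storage");
        request.onsuccess = () => resolve(request.result);
        request.onerror = () => reject(request.error);
      });
    }
    return this.#dbPromise;
  }

  async get(key) {
    const db = await this.#ensureDatabase();
    // ...perform the read against db...
  }
}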

Expose the methods of async-local-storage as separate exports

I think it would help developers if you also exposed the methods of the storage class as separate exports.

that way I could do something like this:

import {set as save, get as load} from "...."

perhaps even rename/provide aliases from the get go.
I would like to see save|load|list|remove.

I know the last thing is a very bikeshedding sensitive one, sorry ;)

Add methods to encourage atomic operations?

This is definitely beyond the scope of "local storage, but better". But, should we add methods that encourage patterns that aren't as prone to interleaving?

To be concrete, we're worried about patterns such as

if (!(await als.has("key"))) {
  await als.set("key", "value");
  // oops, someone in another tab might have set something in between has returning and set being called
}

Methods that could help with this are

  • setIfAbsent()
  • Making set return the value that was present before the change (or adding a new method that does this)
  • replaceAll() = clear() + a series of sets
  • Versions of set/delete that operate on multiple entries at once

Alternately, we could encourage using web locks in conjunction?

Or, we could add "lightweight" transactions!?! Eek.

My tendency is thinking that this is too much scope creep, and that if you need these kind of atomicity guarantees, IDB with its full-fledged transactions support is your best bet. But I'd love to hear more.
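
For reference, here is how the check-then-set example above could be guarded with the existing Web Locks API, as a sketch rather than a recommendation; the lock name is arbitrary:

// Holding a lock around the read-modify-write prevents another tab that uses
// the same lock name from interleaving between the get and the set.
await navigator.locks.request("als:key", async () => {
  if ((await als.get("key")) === undefined) {
    await als.set("key", "value");
  }
});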

Suggestion: Provide Observer API

IndexedDB Observers seems like a really useful feature -- it would provide a more robust version of postMessage -- if an iframe or tab gets loaded after a postMessage event happened, the iframe can't get what the value was, whereas IndexedDB + IndexedDB Observers could be a really powerful way of passing data to other iframes (or tabs?), regardless of the order they load.

But vendor adoption seems quite slow. Could this api provide similar observers, perhaps tied to a specific key?
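
Purely as an illustration of the suggestion (nothing like this exists in the proposal), a key-scoped observer might look something like:

// Hypothetical API shape; the method name and change-record fields are invented.
storage.observe("session-state", change => {
  console.log("key changed:", change.key, "new value:", change.newValue);
});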

Explicit close() ?

The current proposal relies on garbage collection to reclaim the StorageArea before the connection will be closed. Once #10 is resolved, this should not be observable since actions on areas should never block on upgrades (since the version is fixed).

Therefore I don't think there's a strong reason to complicate the API with close(), but I wanted to raise the issue.

Name may be misleading

Someone brought up that the name "async local storage" may imply that this was an async way to access the same data source as localStorage. But that's not true; they're separate. Maybe there's a better name.

Feel free to Bikeshed here!

Observe "close" event on connection

Implementations fire "close" against the connection when it must be closed abnormally, e.g. in response to the user clearing browsing data.

See https://w3c.github.io/IndexedDB/#closing-connection and search on "forced flag"

A StorageArea should probably listen for this, null out the [[Connection]], and set [[ConnectionError]]. I don't think we make the error details available anywhere useful in this case, so a synthetic DOMException would need to be created.
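
A rough JS analogue of that handling, with illustrative variable names standing in for the spec's internal slots:

// When the connection is closed abnormally (e.g. browsing data cleared),
// forget it and record a synthetic error for subsequent operations.
database.addEventListener("close", () => {
  connection = null;
  connectionError = new DOMException(
    "The backing IndexedDB connection was closed abnormally.", "UnknownError");
});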

Implementing clear() via database deletion?

I was thinking it'd be a nice way to address #8 and part of #13 by having clear() delete the database entirely. I'm not sure the best IDB-ese for how to do this. From what I gather from the spec, you need to close the database before deleting it.

So far I have two ideas:

  • Call database.close(), then when that request succeeds, call indexedDB.deleteDatabase()
  • In the versionchange event for the database, check for cases where the version is transitioning to null, and close the database in that case.
    • This also handles deletion through raw IDB API usage, which is a nice bonus.

Thoughts, @inexorabletash?
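
A minimal sketch of the first idea, assuming a backing database named "async-local-storage:foo"; note that IDBDatabase.close() returns synchronously, and the deletion request simply waits for any other open connections to go away:

connection.close();
const request = indexedDB.deleteDatabase("async-local-storage:foo");
request.onsuccess = () => { /* area cleared; a later call reopens lazily */ };
request.onerror = () => { /* surface the failure through the clear() promise */ };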

Cannot store promise-like values

I'm not sure to what extent this is considered a problem, but unlike with Indexed DB, it seems you cannot store a value like { then: "hi", now: "boo" } due to promises unwrapping all the things.

At the very least this might be worth calling out?

Named areas/versions/upgrades

The steps/code in 3.1 seem to misunderstand how versioning and schemas work in Indexed DB. upgradeneeded only fires if the version changes. So:

new StorageArea('a'); 
// opens 'async-local-storage' at version 1, which didn't previously exist
// runs upgradeneeded, creates store 'a'


new StorageArea('b');
// opens 'async-local-storage' at version 1, which previously existed
// upgradeneeded does not fire

So store 'b' will never be created

Note the easy upgrade path

If you outgrow ALS (e.g. you need real transactions), you can just move to IndexedDB, and all your data's already there!

This would require using predictable database names in the IndexedDB backing.
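
For illustration, opening the backing database directly with raw IndexedDB, assuming the `async-local-storage:${name}` naming described elsewhere in this document:

const request = indexedDB.open("async-local-storage:mydata");
request.onsuccess = () => {
  const db = request.result;
  // From here on, use full IndexedDB transactions against the existing store.
};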

What happens if `async-local-storage:${name}` already exists?

storage = new StorageArea(name)

Creates a new StorageArea that provides an async key/value store view onto an IndexedDB database named `async-local-storage:${name}`.

This does not actually open or create the database yet; that is done lazily when other methods are called. This means that all other methods can reject with database-related exceptions in failure cases.

What happens if it already exists, given that I can create it beforehand using IndexedDB directly?

Atomic compare-and-swap operation?

I recently ran into an issue with race conditions between the main thread and a worker thread clobbering each other's writes using @jakearchibald's idb-keyval, and decided I needed a function like:

let oldValue: any = await idb.swap(
  key: string,
  expect: (oldValue: any) => boolean,
  value: any
);

as a primitive in order to implement a mutex. Unfortunately, even after forking idb-keyval and hacking on it for a bit, I wasn't able to code such a function correctly, AFAICT. 😭

I know that thread-safety is probably beyond the scope of this module (see #5) but I think it would be nice if a future version could add support for atomic operations.
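
For what it's worth, a compare-and-swap can be sketched directly on IndexedDB by keeping the read and the conditional write in one readwrite transaction; the store name here is an assumption:

function swap(db, key, expect, value) {
  return new Promise((resolve, reject) => {
    let oldValue;
    const tx = db.transaction("keyval", "readwrite");
    const store = tx.objectStore("keyval");
    const getRequest = store.get(key);
    getRequest.onsuccess = () => {
      oldValue = getRequest.result;
      // The conditional write shares the transaction with the read, so no
      // other writer can interleave between them.
      if (expect(oldValue)) {
        store.put(value, key);
      }
    };
    tx.oncomplete = () => resolve(oldValue);
    tx.onabort = () => reject(tx.error);
  });
}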

Should keys be restricted to strings? Should values?

Local storage restricts keys and values to strings. Should async local storage do so?

I think we should not restrict values. localForage does not, and specifically touts it as a feature. In general this causes lots of JSON-serialization round-tripping, with the attendant pitfalls.

I'm less sure about the keys. localForage casts any given value to a string key. Maybe we should stick with that.

On the other hand, it's more code to disallow arbitrary keys...

Support or block key ranges?

If the key space is not explicitly restricted (#2) then as written some of the methods would implicitly support key ranges, e.g.

area.get(IDBKeyRange.bound(100, 199));
area.has(IDBKeyRange.bound(1e6, 1e7 - 1));
area.delete(IDBKeyRange.lowerBound(20));

And similarly, as written keys(), values() and entries() would not support ranges (unlike cursors/getAll()) thus limiting the functionality.

IMHO, either key ranges should be explicitly supported everywhere, or explicitly blocked.

Schema checks on open

I'd like the lazy database open process to validate the schema on success. Non-trivial (to me) observation: the schema cannot change while the database connection is open, so the open check guarantees no future errors. Proposed checks below.

Database contains the correct object store:

  • objectStoreNames.contains('store')
  • objectStoreNames.length === 1

The object store has the correct schema:

  • objectStore('store').autoIncrement === false
  • objectStore('store').keyPath === null
  • objectStore('store').indexNames.length === 0

The main goal here is to avoid having to reason about how the API behaves when pointed at an IndexedDB instance that "mostly" matches the needed schema. Allowing other object stores is mostly harmless, while indexes can make things more complicated. I'd rather that this API starts out strict in this respect, and we relax the constraints here if there are demonstrated use cases.

Note that the proposal here doesn't restrict the upgrade path -- apps are free to switch to full IndexedDB and change the database schema. It simply says that this API will refuse to operate on a "custom" database.
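
A sketch of how those checks might run in the lazy-open success handler, assuming the object store is named "store":

function validateSchema(db) {
  const names = db.objectStoreNames;
  if (names.length !== 1 || !names.contains("store")) {
    throw new DOMException("Unexpected object stores", "InvalidStateError");
  }
  const store = db.transaction("store").objectStore("store");
  if (store.autoIncrement !== false ||
      store.keyPath !== null ||
      store.indexNames.length !== 0) {
    throw new DOMException("Unexpected object store schema", "InvalidStateError");
  }
}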

Restrict allowed key types

IndexedDB supports a fairly large set of key types, and comparing between them isn't very intuitive. How about supporting an explicit subset?

Proposal:

  • number
  • string
  • (maybe) exactly one type of typed array; Uint8Array seems the most intuitive

The ordering has to remain the same as in IndexedDB, if we want the option of async iterators over keys / values.

Dates can be serialized to numbers when passed in -- I think this would be consistent with WebIDL.

The main missing element here is Array objects. This does preclude some nice use cases, but I claim those are more advanced. If I'm wrong, we can always expand the set of supported types, whereas it's harder / impossible to narrow it.
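
A hypothetical key check matching the proposed subset (Date-to-number handling omitted):

function isAllowedKey(key) {
  return typeof key === "number" ||
         typeof key === "string" ||
         key instanceof Uint8Array;
}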

Provide solution similar to the storage event

The specification for localStorage includes the storage event, which can be summarised by this note:

When the setItem(), removeItem(), and clear() methods are invoked, events are fired on the Window objects of other Documents that can access the newly stored or removed data, as defined in the sections on the sessionStorage and localStorage attributes.

To my knowledge it is supported by all current browsers, but I'm not sure how widespread its usage is on websites. I've used it to keep bits of data in the UI updated across tabs on a fairly large site, though.

I didn't find a previous mention here, but if this specification is intended to cover the use cases of the original localStorage it should include a similar mechanism.
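
For comparison, this is how the existing storage event is consumed today; a layered equivalent would presumably need to carry similar information:

window.addEventListener("storage", event => {
  // Fired in other same-origin documents when localStorage changes.
  console.log(event.key, event.oldValue, event.newValue, event.storageArea);
});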

Should set() and delete() promises have return values?

Right now set() and delete() return promises for undefined. This is the simplest thing that falls out of IndexedDB semantics.

We could try to adhere closer to JS Map semantics, and have set() return a promise for the value that was set (which in our case is actually a clone of the value that was passed), and have delete() return a promise for a boolean indicating whether or not anything was deleted.

Doing so would require extra operations on the backing database, and doesn't seem that useful. Also, this should be a backward-compatible change to add if someone has a very compelling use case. So, I'm tentatively resolved to leave these promises as undefined. But, I want to leave this issue open to indicate I'm open to changing our position.

Rate limiting

We should have a story for what happens when a user consumes too many resources. Examples:

  • too many pending requests
  • too many open stores / pending open requests
  • a keys() / values() call returning too many objects (assuming we keep some form of convenience APIs that batch all keys/values in a single array)

AFAIK, the typical story is that the implementer (browser) has to figure all this out. There are many good reasons to go this route, like not capping future workloads / machines. At the same time, this approach also leads to bugs and confusion, as apps eventually hit the real limits of available RAM or address space. So, while I imagine this API will follow the established pattern, I'd like us to be deliberate about it -- it's easy to add rate-limiting early on and relax limits over time, and nearly impossible to add limits after an API gets widespread adoption.

As a concrete example, we (Chrome) see a non-trivial amount of out-of-memory crashes due to buggy code that creates a lot of IndexedDB transactions in a loop or code that queues up a lot of requests before waiting for any results.

Strawman proposal:

  • at most 1,000 pending requests
  • at most 100 open stores
  • at most 10,000 results from keys() / values() (we could implement that by passing a limit of 10,001 to getAll() and throwing if we get 10,001 results back)
  • exceeding any limit results in rejected promises
  • (maybe) these limits can be read and changed by calling some method on localStore; changing the limits loses any guarantee that the code won't crash

Note: It is quite possible to stay within these limits and crash the browser. The numbers are meant as a rough separator between "reasonable" workloads and "probably bugs".
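
As a sketch of the keys() idea in the list above, the implementation could ask the backing store for one more result than the cap and reject if it comes back full; the limit constant and the error type are assumptions:

const MAX_RESULTS = 10000;

function limitedGetAllKeys(store) {
  return new Promise((resolve, reject) => {
    // Request one extra result so "exactly at the cap" and "over the cap" can
    // be told apart.
    const request = store.getAllKeys(undefined, MAX_RESULTS + 1);
    request.onsuccess = () => {
      if (request.result.length > MAX_RESULTS) {
        reject(new DOMException("Too many results", "QuotaExceededError"));
      } else {
        resolve(request.result);
      }
    };
    request.onerror = () => reject(request.error);
  });
}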

Cross browser compatibility

So obviously, this will be in Chrome and I'm assuming that any changes made to KV Storage will also ship with any Chrome update.
What about other browsers? Will they be expected to ship the same updates? Could that cause conformance and compatibility issues, for example if a feature is available in Chrome but not in Firefox or IE?

What if the database is broken?

If I, or some stupid third-party script, run code like:

indexedDB.open('async-local-storage', 15);

Am I now locked out of async-local-storage for life? Will it try to defend against this, or simply throw?

Make all methods on async-local-storage curried

I would love to see that the methods exposed support currying.
In this case, I think a sample tells more than I could ever describe in words.

import {storage} from ...
export const saveUser = storage.set('user')
export const loadUser = storage.get('user')

This will be very useful for all the methods in there that need more than one parameter, and it allows for a more functional style of programming.

Thoughts on sessionStorage

This proposal only covers async localStorage at the moment, leaving its less common sibling sessionStorage unmentioned. Nevertheless I occasionally see those who recommend using it for handling temporary data.

I'm curious where you stand on this. Could you see it being added as part of this proposal later on? Do you think it should be a separate layered API? Or has the use case for sessionStorage proven small enough that it may not be justified at all?

Should keys/values/entries be async iterators?

Right now they return promises for arrays. This can be convenient; once you await them, they're just normal arrays.

However, an async iterator would map better to the underlying IndexedDB technology (cursors which advance one by one), and would work a lot better for cases with many entries.

Should we make these return async iterators?

An alternate design would be to have them return promises-for-arrays, but also have keysIter/valuesIter/entriesIter as async-iteration versions.
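
For a feel of the difference, consuming an async-iterator-based entries() would look like this (hypothetical shape, inside an async context):

for await (const [key, value] of storage.entries()) {
  // Entries arrive one cursor step at a time instead of as one big array.
  console.log(key, value);
}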

Consider to make async local storage a proper web API

Since, IMO, the API should be considered as a replacement for localStorage, it should be defined in the same way and be as simple as possible to use.
Even small things, like `import { storage } from "std:async-local-storage";`, make it harder.

Backend of the API could or should be IndexedDB.

Also, if it wasn't a layered API, it wouldn't depend on whatever happens to the layered API concept in general.

Is it OK to use a Map-like API surface, instead of a localStorage-like one?

In particular, the decision to align with Map aligns with recent web platform APIs such as the Cache API, Headers, URLSearchParams, etc. (Although some of those are multimaps, so they have slightly bigger APIs.)

But, does this modernization detract from the "it's just local storage, but better" story?

area.backingStore !== area.backingStore

As specified, https://wicg.github.io/kv-storage/#storagearea-backingstore returns a new object each time. This is a bad thing; indeed, it's something I often correct in other people's specs.

Options:

  • Cache the object the first time it's returned. (Or, equivalently in spec land, create it on StorageArea creation.)
  • Change to a .backingStore() or .getBackingStore() method.
  • Split into individual properties, e.g. area.backingDatabase, area.backingStore (backingObjectStore?), area.backingDatabaseVersion.

Thoughts welcome; not sure which way I'm leaning right now.
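
For clarity, the behaviour in question:

const area = new StorageArea("test");
// As currently specified, a fresh object is returned on every access.
console.log(area.backingStore === area.backingStore); // false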

Make storage a default export

Is there any particular reason a named storage export was chosen over a default export? I would expect the vast majority of module consumers to only use the storage API, and it feels more in line with community modules to expose that as a default export.

It is quite likely that future built-in modules will look to this first module when designing APIs, so I think this is a relevant discussion even if it's a pretty minor issue.
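
Concretely, using the module specifier shown earlier in this document (the exact specifier is not settled):

// Today: named export.
import { storage } from "std:async-local-storage";

// Suggested alternative (would replace the line above): default export.
// import storage from "std:async-local-storage";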

Use Web IDL, while preserving our goals

Especially as I work on #6, I become increasingly convinced that we should use Web IDL to specify KV storage. There's too much by-hand crap going on. Especially once whatwg/webidl#580 happens, which could replace a ton of the by-hand crap I'm doing for #6.

The current spec discusses why it doesn't use Web IDL in https://wicg.github.io/kv-storage/#class-definition-explanation. To overcome those objections, and get feature parity with the post-#6 spec, we'd need to add the following capabilities to Web IDL:

  • A switch that toggles on same-realm-only brand checks
  • The ability to expose values in modules
  • Async iterators, including:
    • The ability to alias [Symbol.asyncIterator] to entries()
    • The ability to include the same-realm-only brand check on a class's async iterator
    • Parity with the spec I'm writing for #6, which seems like it would be hard to do generically with a "yield this value" framework as discussed in whatwg/webidl#580.
  • Optionally, but ideally, the ability to have non-enumerable methods

I consider this blocked on those Web IDL improvements, but I know @littledan was looking into those sorts of things anyway. And I wanted to log this to record my intention, and update the spec to point to this issue.

Connection needs to close in response to versionchange event

Deleting a database (used in clear()) is blocked if there are open connections. If two tabs have the same area open, and clear() is called on one, the other needs to close to allow the delete to proceed.

To handle this, the connection needs a versionchange event handler, and should call connection.close() in response.

I would do this in the "initialize the database promise" algorithm, in the steps added for the success handler, right after the close handler is hooked up, e.g.

Add a simple event listener to database for "versionchange" that performs the steps listed in the description of IDBDatabase's close() method on database.

.. and should probably have a non-normative note explaining why it exists.
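
The JS equivalent of that spec step, assuming `database` is the open connection backing the area:

database.addEventListener("versionchange", () => {
  // Close so that a deleteDatabase() triggered by clear() in another tab can proceed.
  database.close();
});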

Questions re: isolated (or shared) areas

Hi,

Looks great...

I have a couple questions:

  1. The spec proposal mentions in two non-normative sections that StorageArea works for isolated storage areas, but I'm wondering whether the normative section ought to mention how this scoping works (if it doesn't already and I'm just missing something). How is it scoped beyond existing localStorage restrictions?
  2. Could some means be provided to allow a special storage area that requires user permission but which is shared across all domains? I am very eager to see site-agnostic storage mechanisms (ideally for IndexedDB too) so that, as on the desktop, the user's data created by one application need not be privileged over others (though obviously under user control). (This could I think be polyfilled in some manner as a third-party script as a go-between for asking and granting permission, even allowing namespacing and chaining multiple domains together to avoid storage size restrictions, but there'd be the issue of trust of the 3rd party.)

Async parsing like Cloudflare KV API

Cloudflare Workers have a very similar API to persist data from their Workers (service workers running on their CDN with VM isolates).
https://developers.cloudflare.com/workers/kv/api

Their docs say you can pass the "type" of data you are retrieving with ".get" method:

NAMESPACE.get(key, [type])

I'm not sure if they actually do that, but wouldn't it be nice if you could do something like this and get an async JSON.parse out of the box?

storage.get(key, 'json');

I know nothing about browsers' JSON.parse implementations or whether this could be implemented.

set/delete - resolve on success or complete?

Not written yet, but preemptively commenting. There's a difference between:

new Promise((resolve, reject) => {
  const tx = connection.transaction(storeName, 'readwrite');
  const store = tx.objectStore(storeName);
  const r = store.put(value, key);
  r.onsuccess = e => resolve(r.result);
  r.onerror = e => reject(r.error);
});

and

new Promise((resolve, reject) => {
  const tx = connection.transaction(storeName, 'readwrite');
  const store = tx.objectStore(storeName);
  const r = store.put(value, key);
  tx.oncomplete = e => resolve(r.result);
  tx.onabort = e => reject(tx.error);
});

... in that the latter waits until the transaction actually commits (and hits disk in some impls), whereas the former resolves earlier but has no guarantee that the transaction committed. Unless there's a compelling reason not to do the latter, that pattern should be preferred.

Secure contexts only?

While Indexed DB is usable in non-secure contexts, we are likely to offer new native storage APIs only in secure contexts since storage is a "powerful feature". We followed that path for SW's Cache API.

This might be predicated on whether we expose the layered web API concept in all contexts. If that's already been decided in favor of restricting, then this is implicitly resolved. Otherwise... could go either way IMHO.
