
sql.js-httpvfs's Introduction

sql.js-httpvfs

See my blog post for an introduction: https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/

sql.js is a light wrapper around SQLite compiled with Emscripten for use in the browser (client-side).

This repo is a fork of and wrapper around sql.js to provide a read-only HTTP-Range-request based virtual file system for SQLite. It allows hosting an SQLite database on a static file hoster and querying that database from the browser without fully downloading it.

The virtual file system is an emscripten filesystem with some "smart" logic to accelerate fetching, using virtual read heads that speed up when data is fetched sequentially. It could also be useful to other applications; the code is in lazyFile.ts. It might also be useful to implement this lazy fetching as an SQLite VFS, since then SQLite could be compiled with e.g. the WASI SDK without relying on all the emscripten OS emulation.

Note that this whole thing only works well if your database and indexes are structured well.

sql.js-httpvfs also provides a proof-of-concept level implementation of a DOM virtual table that allows interacting (read/write) with the browser DOM directly from within SQLite queries.

Usage

(optional) First, improve your SQLite database:

-- first, add whatever indices you need. Note that here having many and correct indices is even more important than for a normal database.
pragma journal_mode = delete; -- to be able to actually set page size
pragma page_size = 1024; -- trade-off between the number of requests that need to be made and per-request overhead.

insert into ftstable(ftstable) values ('optimize'); -- for every FTS table you have (if you have any)

vacuum; -- reorganize database and apply changed page size

(optional) Second, split the database into chunks and generate a json config using the create_db.sh script. This is needed if your hoster has a maximum file size. It can also be a good idea generally depending on your CDN since it allows selective CDN caching of the chunks your users actually use and reduces cache eviction.

Finally, install sql.js-httpvfs from npm and use it in TypeScript / JS!

Here's an example for people familiar with the JS / TS world. At the bottom of this readme there's a more complete example for those unfamiliar.

import { createDbWorker } from "sql.js-httpvfs"

// sadly there's no good way to package workers and wasm directly so you need a way to get these two URLs from your bundler.
// This is the webpack5 way to create an asset bundle of the worker and wasm:
const workerUrl = new URL(
  "sql.js-httpvfs/dist/sqlite.worker.js",
  import.meta.url,
);
const wasmUrl = new URL(
  "sql.js-httpvfs/dist/sql-wasm.wasm",
  import.meta.url,
);
// the legacy webpack4 way is something like `import wasmUrl from "file-loader!sql.js-httpvfs/dist/sql-wasm.wasm"`.

// the config is either the URL of a JSON config generated by the create_db.sh script, or an inline configuration:
const config = {
  from: "inline",
  config: {
    serverMode: "full", // file is just a plain old full sqlite database
    requestChunkSize: 4096, // the page size of the sqlite database (4096 by default)
    url: "/foo/bar/test.sqlite3" // url to the database (relative or full)
  }
};
// or:
const config = {
  from: "jsonconfig",
  configUrl: "/foo/bar/config.json"
}


let maxBytesToRead = 10 * 1024 * 1024;
const worker = await createDbWorker(
  [config],
  workerUrl.toString(),
  wasmUrl.toString(),
  maxBytesToRead // optional, defaults to Infinity
);
// you can also pass multiple config objects which can then be used as separate database schemas with `ATTACH virtualFilename as schemaname`, where virtualFilename is also set in the config object.


// worker.db is now an SQL.js instance, except that all functions return Promises.

const result = await worker.db.exec(`select * from mytable where id = ?`, [123]);

// worker.worker.bytesRead is a Promise for the number of bytes read by the worker.
// if a request would cause it to exceed maxBytesToRead, that request will throw a SQLite disk I/O error.
console.log(await worker.worker.bytesRead);

// you can reset bytesRead by assigning to it:
worker.worker.bytesRead = 0;
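
For illustration, here's a rough sketch of passing multiple config objects and then combining them with ATTACH (the file names, virtualFilename values, and schema name here are made-up examples, and the exact attach semantics may differ slightly):

const worker = await createDbWorker(
  [
    {
      from: "inline",
      config: {
        serverMode: "full",
        requestChunkSize: 4096,
        url: "/data/main.sqlite3",
        virtualFilename: "main.sqlite3",
      },
    },
    {
      from: "inline",
      config: {
        serverMode: "full",
        requestChunkSize: 4096,
        url: "/data/extra.sqlite3",
        virtualFilename: "extra.sqlite3",
      },
    },
  ],
  workerUrl.toString(),
  wasmUrl.toString()
);
// attach the second virtual file under a schema name, then query across both databases
await worker.db.exec(`ATTACH 'extra.sqlite3' AS extra`);
const rows = await worker.db.exec(`select * from extra.sometable limit 10`);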

Cachebusting

Alongside the url or urlPrefix, the config can take an optional cacheBust property whose value will be appended to URLs as a query parameter. If you set it to a new value whenever you update the database, you can avoid caching-related database corruption.

If using a remote config (from: 'jsonconfig'), don't forget to cachebust that too.
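
As a sketch, an inline config with cacheBust might look like this (the value is an arbitrary example string that you would change on every database update):

const config = {
  from: "inline",
  config: {
    serverMode: "full",
    requestChunkSize: 4096,
    url: "/foo/bar/test.sqlite3",
    cacheBust: "2021-06-01-1" // appended to the URL as a query parameter (example value)
  }
};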

Debugging data fetching

If your query is fetching a lot of data and you're not sure why, try this:

  1. Look at the output of explain query plan select ......

    • SCAN TABLE t1 means the table t1 will have to be downloaded pretty much fully
    • SEARCH TABLE t1 USING INDEX i1 (a=?) means direct index lookups to find a row, then table lookups by rowid
    • SEARCH TABLE t1 USING COVERING INDEX i1 (a) means direct index lookups without a table lookup. This is the fastest.

    You want all the columns in your WHERE clause that significantly reduce the number of results to be part of an index, with the ones reducing the result count the most coming first.

    Another useful technique is to create an index containing exactly the columns you filter by and the columns you select, which SQLite can then read as a COVERING INDEX in a sequential manner (no random access at all!). For example: create index i1 on tbl (filteredby1, filteredby2, selected1, selected2, selected3). This index is perfect for a query that filters by the filteredby1 and filteredby2 columns and selects only the three columns at the back of the index (see the sketch after this list).

  2. You can look at the dbstat virtual table to find out exactly what the pages SQLite reads contain. For example, if you see [xhr of size 1 KiB @ 1484048 KiB] in your logs, that means it's reading page 1484048. You can get the full log of read pages by using worker.getResetAccessedPages(). Check the content of pages with select * from dbstat where pageno = 1484048. Do this in a local sqlite3 shell, not in the browser, because the dbstat vtable reads the whole database.
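
To make the covering-index advice from point 1 concrete, here is a small SQL sketch (table and column names are made up, and the exact explain query plan wording varies between SQLite versions):

-- hypothetical table
create table tbl (filteredby1, filteredby2, selected1, selected2, selected3, other);

-- without a matching index, this needs a full table scan:
explain query plan select selected1, selected2, selected3 from tbl
  where filteredby1 = 'x' and filteredby2 = 'y';
-- SCAN TABLE tbl

create index i1 on tbl (filteredby1, filteredby2, selected1, selected2, selected3);

-- now the same query can be answered from the index alone:
explain query plan select selected1, selected2, selected3 from tbl
  where filteredby1 = 'x' and filteredby2 = 'y';
-- SEARCH TABLE tbl USING COVERING INDEX i1 (filteredby1=? AND filteredby2=?)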

Is this production ready?

Note that this library was mainly written for small personal projects of mine and as a demonstration. I've received requests from many people for applications that are out of the scope of this library for me (which is awesome, and I'm happy to have inspired so many interesting new ideas).

In general it works fine, but I'm not making any effort to support older or weird browsers. If the browser doesn't support WebAssembly and WebWorkers, this won't work. There's also no cache eviction, so the more data is fetched the more RAM it will use. Most of the complicated work is done by SQLite, which is well tested, but the virtual file system part doesn't have any tests.

If you want to build something new that doesn't fit with this library exactly, I'd recommend you look into these discussions and libraries:

  • The general virtual file system discussion here: sql-js/sql.js#447
  • wa-sqlite, which is a much simpler wasm wrapper for SQLite than sql.js and has different VFSes that don't require an Emscripten dependency. sql.js-httpvfs could easily be reimplemented on top of this.
  • absurd-sql, which is an implementation of a pretty efficient VFS that allows persistence / read/write queries by storing the DB in IndexedDB

Inspiration

This project is inspired by several earlier projects.

The original code of lazyFile is based on the emscripten createLazyFile function, though not much of that code remains.

Minimal example from scratch

Here's an example of how to set up a project with sql.js-httpvfs completely from scratch, for people unfamiliar with JavaScript or NPM in general.

First, you will need node and npm. Get them from your system package manager, e.g. apt install nodejs npm.

Then, go to a new directory and add a few dependencies:

mkdir example
cd example
echo '{}' > package.json
npm install --save-dev webpack webpack-cli typescript ts-loader http-server
npm install --save sql.js-httpvfs
npx tsc --init

Edit the generated tsconfig.json file to make it more modern:

...
"target": "es2020",
"module": "es2020",
"moduleResolution": "node",
...

Create a webpack config, a minimal index.html file, and a TypeScript entry point, for example as sketched below:
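
A minimal version of those three files could look like the following sketch (the entry point src/index.ts, the dist/bundle.js output name, and the /example.sqlite3 path are assumptions chosen to match the rest of this example; the actual files live in example/):

// webpack.config.js
const path = require("path");

module.exports = {
  entry: "./src/index.ts",
  module: {
    rules: [{ test: /\.ts$/, use: "ts-loader", exclude: /node_modules/ }],
  },
  resolve: { extensions: [".ts", ".js"] },
  output: {
    filename: "bundle.js",
    path: path.resolve(__dirname, "dist"),
  },
};

<!-- index.html -->
<!doctype html>
<html>
  <body>
    <script src="dist/bundle.js"></script>
  </body>
</html>

// src/index.ts
import { createDbWorker } from "sql.js-httpvfs";

const workerUrl = new URL("sql.js-httpvfs/dist/sqlite.worker.js", import.meta.url);
const wasmUrl = new URL("sql.js-httpvfs/dist/sql-wasm.wasm", import.meta.url);

async function load() {
  const worker = await createDbWorker(
    [{ from: "inline", config: { serverMode: "full", url: "/example.sqlite3", requestChunkSize: 4096 } }],
    workerUrl.toString(),
    wasmUrl.toString()
  );
  const result = await worker.db.exec(`select * from mytable`);
  document.body.textContent = JSON.stringify(result);
}

load();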

Finally, create a database:

sqlite3 example.sqlite3 "create table mytable(foo, bar)"
sqlite3 example.sqlite3 "insert into mytable values ('hello', 'world')"

and build the JS bundle and start a webserver:

./node_modules/.bin/webpack --mode=development
./node_modules/.bin/http-server

Then go to http://localhost:8080

And you should see the output of the query select * from mytable:

[{"foo":"hello","bar":"world"}]

The full code of this example is in example/.

Compiling

To compile this project (only needed if you want to modify the library itself), make sure you have emscripten, then first compile sql.js, then sql.js-httpvfs:

cd sql.js
yarn build
cd ..
yarn build

sql.js-httpvfs's People

Contributors

bakkot, clementtsang, lukeramsden, mmomtchev, nev-r, phiresky, styfle

sql.js-httpvfs's Issues

Throw an error if server ignores RANGE header

The HTTP spec allows a server to completely ignore the Range header if it doesn't support it.

It would be great if a small check were done to make sure the Content-Length of the response matches the requested range, so we don't end up downloading entire databases.

Specifically, I found that Cloudflare usually supports partial ranges for cached files, but suddenly stops supporting them above a certain size limit. This left me scratching my head when I saw the seemingly stalled requests. It was only after checking my network activity that I saw it was downloading 700 MB of data.
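
A rough sketch of the kind of check this issue asks for (not part of the library; the function name and the 1 KiB probe size are made up):

// probe the server with a small Range request and verify that it was honored
async function supportsByteRanges(url: string): Promise<boolean> {
  const resp = await fetch(url, { headers: { Range: "bytes=0-1023" } });
  const len = Number(resp.headers.get("Content-Length"));
  // a server that honors Range answers 206 Partial Content with at most 1024 bytes
  return resp.status === 206 && (Number.isNaN(len) || len <= 1024);
}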

Amazon S3?

Does this work with a distributed static filesystem like Amazon's S3?

help: basic example for the web browser

I'm trying to make a simple web page that loads sql.js-httpvfs, opens the database, and has a text input for entering the SQL query, a button to run the query, and a JS function to perform the query and display the result.

All the material I can find here uses node to build the logic, but I just want plain JS in an HTML page.

Is there a way to do this directly in the browser?

Does this vfs support update, delete and insert?

I have installed the dependency, but it seems that there is only a query() operation. In fact, I have not understood its working principle; I wish there were a more detailed introduction. Right now there seems to be only one example, and it only shows a query.

Integration with SQLite

Any interest in a PR for a patch of SQLite that does this in the SQLite binary itself in the Unix OS file?

Server crashes on starting

Hello,

I am trying to use http-server with node version 12.14.0. When I start the server I get:

$ http-server
Starting up http-server, serving ./

http-server version: 14.1.1

http-server settings:
CORS: disabled
Cache: 3600 seconds
Connection Timeout: 120 seconds
Directory Listings: visible
AutoIndex: visible
Serve GZIP Files: false
Serve Brotli Files: false
Default File Extension: none

Available on:
  http://153.89.58.106:8080
  http://192.168.1.12:8080
  http://127.0.0.1:8080
Hit CTRL-C to stop the server

events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: read EBADF
    at Pipe.onStreamRead (internal/stream_base_commons.js:201:27)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (internal/streams/destroy.js:92:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'EBADF',
  code: 'EBADF',
  syscall: 'read'
}

As seen here, I have tried installing the previous version 13.0.2, but the problem persists:

$ http-server
Starting up http-server, serving ./

http-server settings:
CORS: disabled
Cache: 3600 seconds
Connection Timeout: 120 seconds
Directory Listings: visible
AutoIndex: visible
Serve GZIP Files: false
Serve Brotli Files: false
Default File Extension: none

Available on:
  http://153.89.58.106:8080
  http://192.168.1.12:8080
  http://127.0.0.1:8080
Hit CTRL-C to stop the server

events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: read EBADF
    at Pipe.onStreamRead (internal/stream_base_commons.js:201:27)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (internal/streams/destroy.js:92:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'EBADF',
  code: 'EBADF',
  syscall: 'read'
}

I also tried a port other than 8080, with the same results.

Do you have any idea why this is happening or what I can do?

Issues when used with xstate

So I'm currently rewriting my app using xstate and building with Vite.

For some reason, now that I'm initialising the worker in an xstate machine, I'm getting lots of errors. Strangely, it still seems to function fine though.

Initially I thought maybe there was an issue where the worker is being initialised twice, but I don't think that's happening.

Out of curiosity, is there a way to clean up the worker safely?

Any thoughts on what might be going on here?

[screenshot of console errors]

Edit:

Just figured out it's an issue with the xstate inspector; will investigate further.

Not compatible with Node.js 16+

There was a very controversial PR in Node.js that rendered Webpack-built libraries incompatible if they used named exports:
webpack/webpack#13098

import { createDbWorker } from 'sql.js-httpvfs';
         ^^^^^^^^^^^^^^
SyntaxError: Named export 'createDbWorker' not found. The requested module 'sql.js-httpvfs' is a CommonJS module, which may not support all module.exports as named exports.
CommonJS modules can always be imported via the default export, for example using:

import pkg from 'sql.js-httpvfs';
const { createDbWorker } = pkg;

Alas, there is no solution besides not using named exports - or not using webpack.

Advice on speeding up initial load of a table

Hi!

First of all, thanks for the work on this project. I have some questions about how to optimize its integration into one of my personal projects; I'm not very experienced with SQLite and JS/browser stuff.

I have a project where I'm using your repo, currently sitting at https://lensdatabase.org/ with source at https://github.com/jiwidi/lens_database.

I have an HTML table that displays the results of a "select * from table" query, and some buttons that modify that query in order to filter the table responsively.

Right now, the initial load of the page takes a long time, with tons of prints that look like:
[screenshot of console output]

After the load is done, all subsequent queries are very fast; even a select-all finishes almost instantly. This makes me think there is an initial build/load by the SQLite plugin that needs to be done only once.

What is really happening in that initial step of loading my page? Is there any way to optimize it?

Caching the requests persistently?

Hello,
my issue is more of a question than an issue; however, your repository does not seem to have discussions enabled.

I am running a catalog of electronic components that is intended to be served from GH pages with no backend (https://github.com/yaqwsx/jlcparts). I stumbled upon your project and it seems it might be a suitable upgrade for my project.

I studied the project briefly and I see that you cache the requests in RAM. However, I was wondering if it would be possible to cache them persistently, via either the Web Storage API or the Cache API, so the cache survives a page reload (with an external mechanism for purging the cache). If so, could you point me to where to look in the sources to add persistent caching support?
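
This isn't something the library does, but as a rough sketch of the idea, the browser Cache API could persist fetched chunks roughly like this (function and cache names are made up; note that the Cache API refuses to store partial 206 responses, so the body has to be re-wrapped):

async function fetchRangeCached(url: string, from: number, to: number): Promise<Uint8Array> {
  const cache = await caches.open("sqlite-chunks");
  // cache.match() keys on the URL, not on request headers, so encode the range into a synthetic key
  const key = `${url}?range=${from}-${to}`;
  const hit = await cache.match(key);
  if (hit) return new Uint8Array(await hit.arrayBuffer());
  const resp = await fetch(url, { headers: { Range: `bytes=${from}-${to}` } });
  const body = await resp.arrayBuffer();
  await cache.put(key, new Response(body)); // re-wrap because cache.put() rejects 206 responses
  return new Uint8Array(body);
}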

How do you speed up a "ORDER BY rank" FTS5 query?

Thank you for writing this library. It's really neat!

I'm currently trying to debug a performance issue with querying a FTS5 virtual table. This table is populated with 6 million rows of articles that have title and text columns. The query I'm using on this table looks like: select * from articles where articles match ? order by rank LIMIT 7

Right now, when that query is executed by the sql.js-httpvfs worker, it fires a continuous slew of network requests, taking 20+ seconds to download tens of MB of data.

I found that removing the "order by rank" clause reduces the latency of the query to a few seconds and the amount of data fetched to less than 1 MB. Unfortunately, the results are also much less useful; they're returned in alphabetical order and not in order of relevancy.

Here are the query plan comparisons between the two:

sqlite> explain query plan select title from articles where articles match "bagel" limit 1;
QUERY PLAN
`--SCAN TABLE articles VIRTUAL TABLE INDEX 0:M2
sqlite> explain query plan select title from articles where articles match "bagel" order by rank limit 1;
QUERY PLAN
`--SCAN TABLE articles VIRTUAL TABLE INDEX 32:M2

What can I do to speed this kind of query up without losing the relevancy that "ORDER BY rank" provides?

Gatsby "Cannot read property 'query' of undefined"

I'm trying to use this library inside my project. However, I have not succeeded yet; I am not an expert on node.js.

What did I do?
First, I needed to add a webpack config to Gatsby, so I created a gatsby-node.js file and added the code below to it.

exports.onCreateWebpackConfig = ({ actions }) => {
  actions.setWebpackConfig({
    entry: "./src/index.js",
    module: {
      rules: [
        {
          test: "/.tsx?$/",
          use: "ts-loader",
          exclude: "/node_modules/",
        },
      ],
    },
    output: {
      filename: "bundle.js",
    },
    devServer: {
      publicPath: "/dist",
    },
  });
};

Inside my index.js page, I added the code from the example, just removing the async part:

import { createDbWorker } from "sql.js-httpvfs";

const workerUrl = new URL(
  "sql.js-httpvfs/dist/sqlite.worker.js",
  import.meta.url
);

const wasmUrl = new URL("sql.js-httpvfs/dist/sql-wasm.wasm", import.meta.url);

function load() {
  const worker = createDbWorker(
    [
      {
        from: "inline",
        config: {
          serverMode: "full",
          url: "../myTestDb.sqlite3",
          requestChunkSize: 4096,
        },
      },
    ],
    workerUrl.toString(),
    wasmUrl.toString()
  );

  const result = worker.db.query(`select * from districts`);

  document.body.textContent = JSON.stringify(result);
}

load();

I got this page.

[screenshot of the resulting error page]

No bundle.js file or dist directory gets created.

Error in detecting partial file support

Hey, I'm testing this with http-server, and getting the following error message in console:

server does not support byte serving (`Accept-Ranges: bytes` header missing), or your database is hosted on CORS and the server doesn't mark the accept-ranges header as exposed seen response headers cache-control: max-age=3600
connection: keep-alive
content-length: 169918464
content-type: application/octet-stream; charset=utf-8
date: Tue, 11 May 2021 23:20:46 GMT
etag: W/"23335773-169918464-2021-05-11T22:58:16.861Z"
keep-alive: timeout=5
last-modified: Tue, 11 May 2021 22:58:16 GMT
server: ecstatic-3.3.2

However, it is working fine. I think there is an error in the code checking for partial file support? It would (correctly) report the same error when I used python http.server, and indeed it did not work with that server.

(PS: This is an awesome project <3)

HTTP/2 support

Any chance to support HTTP/2 channels for a possibly faster VFS?

Error: SQLite: file is not a database

I tried to follow your instructions with the example, but it doesn't seem to work for me.

Sorry if I'm missing something obvious here... I'm not an expert with all the tools involved.

Could I please ask if you might have any ideas about how I can work around this error?

Thanks for any help you can provide!

Error

Here's the error I get when I try to open the example/index.html page in Google Chrome:

constructing url database /example.sqlite3
c356a8c1ca29f8b1b394.js:1 filename _example.sqlite3
c356a8c1ca29f8b1b394.js:1 constructing url database /example.sqlite3 filename _example.sqlite3
c356a8c1ca29f8b1b394.js:1 [xhr of size 4 KiB @ 0 KiB]
index.ts:20 Uncaught (in promise) Error: SQLite: file is not a database
    at te.Z.handleError (c356a8c1ca29f8b1b394.js:1)
    at te.Z.exec (c356a8c1ca29f8b1b394.js:1)
    at Object.SplitFileHttpDatabase (c356a8c1ca29f8b1b394.js:1)

When I click to see the code that corresponds to this error, here is what I see:

if (this.db = new n.CustomDatabase(r),
    s) {
  const e = (await this.db.exec("pragma page_size; pragma cache_size=0"))[0].values[0][0];
  e !== s.requestChunkSize && console.warn(`Chunk size does not match page size: pragma page_size = ${e} but chunkSize = ${s.requestChunkSize}`)
}

Steps

Here are all the steps I completed:

git clone https://github.com/phiresky/sql.js-httpvfs.git
cd example/
npm install --save-dev webpack webpack-cli typescript ts-loader http-server
npm install --save sql.js-httpvfs
./node_modules/.bin/webpack --mode=development

I'm using nginx to serve the index.html file. I had to modify the /etc/nginx/nginx.conf file with these 2 additions:

http {
    # ...
    include             /etc/nginx/mime.types;
    types {
        application/wasm wasm; # <-- addition 1
    }
    default_type        application/octet-stream;
    proxy_force_ranges on;     # <-- addition 2
    # ...
}

Versions:

$ npm --version
7.13.0

Google Chrome
Version 90.0.4430.212 (Official Build) (x86_64)

Requirement of `content-length` breaks on Cloudflare Pages

Hello, first of all thanks for this super interesting project! It really blew my mind when I read the blog post :)

I'm still just playing around with it and I've been able to get an example based on Svelte Kit working on Github Pages, but on Cloudflare Pages (which I really prefer) the library breaks because the HEAD request to the sqlite file doesn't return a content-length header, triggering this error:

throw Error("server uses gzip or doesn't have length");

The lack of the header appears to be intentional according to Cloudflare: https://community.cloudflare.com/t/no-content-length-header-when-response-is-compressed/134475

Do you think it would be possible for the library to work without the header, or is it completely impossible/impractical?

Possible serverChunkSize un-awareness of lazyFile's speedup system incompatible with rangemapper db chunking.

I have a hacked-up copy/paste/extraction of the lazyFile from this project in my test lab project. I think it's possible there's a bug upstream here, where the read-head speedup combined with the rangeMapper functionality and split DB chunks can sometimes result in the rangeMapper being called with absolute byte ranges that span server DB chunks. The reasoning is that the read-head speedup can sometimes grow the next read so that it spans DB chunks, and the rangeMapper is then unable to map it to a single URL.

I'm sorry, my stuff is so hacked up that, by all means, if you want to just close this, I'm fine with that; this is one of those downstream-hacking-it-too-far sorts of issues. I did try to fix it, and this commit of mine seems to do it. It will not try to call the rangeMapper with something it could not resolve, as lazyFile is now aware of the serverChunkSize and will not speed up if the next wantedChunk spans into another DB chunk.

Here's the commit:

nelsonjchen/datasette-lite@ccdc574

Without this, it seems to try to get something from the 11MB mark of a 10MB chunk for my project which I've fixed:

https://datasette-lite-lab.mindflakes.com/index.html?url=https://datasette-lite-lab.mindflakes.com/sdb/2022-10-02_93eff57de3573985_ca_unclaimed_property.sqlite#/2022-10-02_93eff57de3573985_ca_unclaimed_property?sql=SELECT+*+FROM+records+WHERE+records.Owner_name+MATCH+%0A%22charles+munger%22%0AORDER+BY+cast%28current_cash_balance+AS+float%29+DESC%3B

This commit itself may be buggy, but I just wanted to note this here. I'm 80% certain this isn't handled at the higher level SQLite worker and that this issue could still happen with this project's lazyFile. I'm only using this project's lazyFile, but I'm pretty sure my use of it would be similar to this project's sqlite worker.

My guess at the best way to reproduce this is to do many reads of a contiguous block near the end of a server chunk boundary, until the speedup results in the rangeMapper being called with a range for which it cannot return a reasonable from/to/url structure, because to is beyond the file size of the server chunk.

Is it possible to compress database?

For large datasets it would be great to be able to use compression. Is there any way to achieve full database compression and read it over http?

I was thinking it might be possible to just compress the whole database file, but I'm not sure if there are compression formats that allow random access via range requests over HTTP.

Out of memory errors?

I'm trying to query over a sequence of multiple SQLite3 databases, and after a few dozen, I start getting the error:

RuntimeError: abort(RangeError: WebAssembly.instantiate(): Out of memory: wasm memory). Build with -s ASSERTIONS=1 for more info.

I'm using a version of your load() that accepts an argument for the sqlite3 path.

Is there anything I can do to fix this error? I'm not purposefully persisting any objects, as far as I know, and I only need to access one database at a time. Is there something I need to do to explicitly free up memory used by a previous createDbWorker() instance? Is there like a destroyDbWorker()?

New NPM release and/or build instructions?

I was trying to use the cache busting added in #21 and discovered that it isn't in the version of the module released on NPM.

I tried installing the module from the git repository instead, but because package.json lacks a prepare script, it doesn't run the build process and installs an empty package.

I tried adding a prepare script, so that I could install the module from the repo, but failed at it: https://github.com/phiresky/sql.js-httpvfs/compare/master...jordemort:npm-prepare?expand=1

I don't really need to build the module myself, and would be perfectly content to consume an updated version from NPM, but failing that, I would appreciate some help in figuring out how to build it.

  • First, I tried npm run build
  • Next, I tried (cd sql.js && npm run rebuild) ; npm run build
  • Finally I tried adding yarn to the development dependencies and did (cd sql.js && npm run rebuild) ; yarn run build

All three attempts ended in errors similar to:

npm ERR! ERROR in ./src/index.ts
npm ERR! Module build failed (from ./node_modules/ts-loader/index.js):
npm ERR! Error: Debug Failure. False expression: Non-string value passed to `ts.resolveTypeReferenceDirective`, likely by a wrapping package working with an outdated `resolveTypeReferenceDirectives` signature. This is probably not a problem in TS itself.
npm ERR!     at Object.resolveTypeReferenceDirective (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/typescript/lib/typescript.js:43192:18)
npm ERR!     at /Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/ts-loader/dist/servicesHost.js:679:18
npm ERR!     at /Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/ts-loader/dist/servicesHost.js:118:141
npm ERR!     at Array.map (<anonymous>)
npm ERR!     at Object.resolveTypeReferenceDirectives (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/ts-loader/dist/servicesHost.js:118:124)
npm ERR!     at actualResolveTypeReferenceDirectiveNamesWorker (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/typescript/lib/typescript.js:118249:163)
npm ERR!     at resolveTypeReferenceDirectiveNamesWorker (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/typescript/lib/typescript.js:118549:26)
npm ERR!     at processTypeReferenceDirectives (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/typescript/lib/typescript.js:120046:31)
npm ERR!     at findSourceFileWorker (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/typescript/lib/typescript.js:119931:21)
npm ERR!     at findSourceFile (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/typescript/lib/typescript.js:119783:26)
npm ERR! 
npm ERR! ERROR in ./src/sqlite.worker.ts
npm ERR! Module build failed (from ./node_modules/ts-loader/index.js):
npm ERR! TypeError: Cannot read properties of undefined (reading 'getSourceFile')
npm ERR!     at Object.getEmitOutput (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/ts-loader/dist/instances.js:514:53)
npm ERR!     at getEmit (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/ts-loader/dist/index.js:261:37)
npm ERR!     at successLoader (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/ts-loader/dist/index.js:39:11)
npm ERR!     at Object.loader (/Users/jordan/.npm/_cacache/tmp/git-clone32cgcP/node_modules/ts-loader/dist/index.js:23:5)

Support for Sharding?

I'm researching the feasibility of storing a lot of text data in a Sqlite fts5 index to allow relative fast text search on a static website hosted via S3/Cloudfront.

I've written a proof of concept using your minimal example, and it works really well.

However, my data changes relatively often, involving daily inserts and occasional updates, and the database file still gets quite large (in my case hundreds of MB). That means I'd have to upload a very large file to S3 every day to keep the search fresh, and that would get expensive pretty quickly as the database file continues to grow.

The saving grace is that the vast majority of the data in the database doesn't change. Therefore, is there any way to partition the file, or split the file into separate pieces, so that the records that don't change can be separated from the records that do change, so you only have to upload the shard containing the "newest" data?

Obviously, this isn't really Sqlite's normal use case, but I stumbled across this SO question describing how Sqlite supports attaching multiple logical database files and querying them as a single database.

The use case I'm imagining is creating an SQLite database file for each month of the year, and storing an FTS index for the records in the database corresponding to the data's creation date. That way, I'll only ever be updating, and therefore uploading, the current month's file, reducing my upload time and bandwidth costs substantially.

Would this work from sql.js? Can SQLite, when loaded via sql.js, search an FTS index across multiple attached databases that each contain their own local FTS index?

Question: How to use the library with sveltekit?

Hi,

I wanted to know if it is possible to use Rollup.js as the module bundler instead of webpack. If so, I would like to see a minimal example of how it could be done, because I'm trying to use the library with SvelteKit, and SvelteKit seems to use a rollup.js/vite.js combination to do the bundling.

Regards,

browser hanging on webworker instantiation

Hey there, thanks for writing this lib, it's very cool!

I'm trying to use your modified sql.js implementation more directly, but curiously the call to initSqlJs with locateFile set to the httpvfs sql.js version causes the browser to hang. Using the original repo's implementation doesn't.

this hangs (as does serving the file myself locally)

const SQL = await initSqlJs({ 
        locateFile: file => {
            return 'https://phiresky.github.io/blog/sql-wasm.wasm';
        }
});

this does not

const SQL = await initSqlJs({ 
        locateFile: file => {
           return `https://sql.js.org/dist/sql-wasm.wasm`;
        }
});

there's nothing obvious in the additional commits in the https://github.com/phiresky/sql.js repo - am I missing something? thanks!

Expected number of requests for ORDER BY queries on indexed columns

So I seem to be ending up with quite a lot of requests to query my indexed column, more than I would have expected. Would love to know whether it looks like I might be doing something wrong or whether that's just the expected behaviour!

This is how I'm making the database:

CREATE TABLE images(
  id TEXT NOT NULL,
  server TEXT NOT NULL,
  secret TEXT NOT NULL,
  original_secret TEXT NOT NULL,
  width INTEGER NOT NULL,
  height INTEGER NOT NULL,
  faves INTEGER NOT NULL,
  comments INTEGER NOT NULL,
  views INTEGER NOT NULL
);

.separator ' '
.import shuffled_api_dump.txt images

create index idx_views
    on images(views);

create index idx_faves
    on images(faves, views);

pragma journal_mode = delete; -- to be able to actually set page size
pragma page_size = 1024; -- trade off of number of requests that need to be made vs overhead.
vacuum; -- reorganize database and apply changed page size

and then I'm splitting it with create_db.sh.

A query like select * from images order by views desc limit 20 results in 47 requests.

Database has about 5 million rows.

Does that sound right?
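
Following the covering-index advice in the readme's debugging section, one thing that might reduce the request count (an untested suggestion; the column list simply repeats all columns of the table so the index can satisfy select *) is:

create index idx_views_cover
    on images(views, id, server, secret, original_secret, width, height, faves, comments);

-- explain query plan select * from images order by views desc limit 20
-- should then show a COVERING INDEX instead of index lookups followed by
-- table lookups by rowid for each of the 20 rows.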

Running into `Error: doXHR failed (bug)!`

Amazing project! I had to try it out with a 15 GiB database of my own (sorry, I couldn’t resist testing the limits).

I started with simple queries and gradually made them more complex. This worked fine. But when I tried running a complex query immediately without having any other data loaded first, I would run into the error doXHR failed (bug)! (after a minute or so of loading gradually increasing chunks up to the configured max chunk size of 5 MiB).

I have built a sample Observable Notebook that exhibits the problem: https://observablehq.com/@sabberworm/imdb.

The “Aggregate all years” query always runs into this error (though I have seen it running fine when starting from a simpler query, see above).

I assume the other error that pops up ("database disk image is malformed") is a follow-on error stemming from the missing data blocks.

Abort query request with AbortController?

Hi @phiresky , thanks for such fantastic and clever project.

Is there any way I can abort a query request? For example, while testing I wanted to cancel an unfortunate query that would return too many rows, and I wanted to abort the request before it fetched the whole DB.

I'm wondering if it would be possible to use AbortController as a way to stop the async calls.

Can't import createDbWorker in JavaScript?

in db.js
import { createDbWorker } from "sql.js-httpvfs";
^^^^^^^^^^^^^^
SyntaxError: Named export 'createDbWorker' not found. The requested module 'sql.js-httpvfs' is a CommonJS module, which may not support all module.exports as named exports.

in package.json
...
"type": "module",
...
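
The workaround described in the "Not compatible with Node.js 16+" issue above applies here too: import the CommonJS module via its default export and destructure it, e.g.

import pkg from "sql.js-httpvfs";
const { createDbWorker } = pkg;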

Document workaround for slow HEAD

When this library loads a database file, it does a HEAD request to get the total length of the database file. On some dev servers the etag is calculated, which in turn reads the entire database file. For large databases this is very slow, on the order of seconds.

I found this to be the case when using vite on windows. I wrote a small vite plugin which intercepts these HEAD calls and just returns Content-Length

import fs from "node:fs";

const SkipHeadEtag = () => ({
  name: "skip-head-etag",
  configureServer(server) {
    server.middlewares.use((req, res, next) => {
      if (req.method === "HEAD") {
        const { size } = fs.statSync(req.url.slice(1));
        res.setHeader("Accept-Ranges", "bytes");
        res.setHeader("Cache-Control", "no-cache");
        res.setHeader("Content-Length", String(size));
        res.end();
        return;
      }
      next();
    });
  },
});

That plugin is incomplete. It doesn't handle 404 properly, but shows the main idea of the workaround.

concurrent access

Congratulations for the wonderful idea and the excellent implementation. This project can have major implications for geospatial data as most advanced geospatial formats use SQLite.

For this use, having concurrent read access will be a really useful improvement since the client downloads multiple map tiles from a database and these are completely independent.

From what I understand the underlying sql.js project has only a passing interest in concurrent access since their main objective is in-memory databases where the mono-threaded nature of JavaScript is the main problem.

I see only one possible solution left and this is to use multiple databases and to share the VFS code / cache between them.

Do you have any opinion on this? I am willing to implement it.

help: runing it on nodejs

Well, this lib seems to be very promising for solving my current problem.

I want to run it on AWS Lambda, but Web Workers are not supported in Node.js. I'm planning to fork this repo in order to make a synchronous version. I understand your code at a surface level, so can I ask you for an idea of how to make it work?

I mean, it seems to be possible, but I'm not sure about this part right here:

vtab.ts

  // send message to main thread
  (self as DedicatedWorkerGlobalScope).postMessage({
    action: "eval",
    notify: sab,
    request,
  });

I'll be very happy with anything you could say about it. Thanks!

[not an issue] Questions on how the plugin works and how to speed up initial worker load

Hi!

First of all thanks for the work in this project.

I have a personal project in progress at lensdatabase.org, intended to be a live table for lenses, and I chose your plugin because of its speed at applying filters by just modifying the query. The code is currently at https://github.com/jiwidi/lens_database.

My problem now is that the initial load of the worker takes a lot of time; after that, all queries run very fast. The worker prints the following in the console while loading (~15 sec):
[screenshot of console output]

What does that loading mean? Is there any way I could speed this up or precompute that load so it's faster on the client side? I'm fairly new to SQLite, so maybe I'm missing some key concepts, but I would love some light on this.

Best,
Jaime

Failing to init WASM with Node.js / Web Worker polyfill / type: `module`

Error [TypeError]: Cannot set properties of undefined (setting 'exports')
    at webpack://sql.js-httpvfs/./sql.js/dist/sql-wasm.js:72:17903
    at new Promise (<anonymous>)
    at Object.default (webpack://sql.js-httpvfs/./sql.js/dist/sql-wasm.js:22:24)
    at init (webpack://sql.js-httpvfs/./src/sqlite.worker.ts:50:37)
    at Object.apply [as SplitFileHttpDatabase] (webpack://sql.js-httpvfs/./src/sqlite.worker.ts:109:26)
    at r (webpack://sql.js-httpvfs/./node_modules/comlink/dist/esm/comlink.mjs:94:48)
    at /home/mmom/src/ol-mbtiles/node_modules/web-worker/cjs/node.js:47:17
    at Array.forEach (<anonymous>)
    at dispatchEvent (/home/mmom/src/ol-mbtiles/node_modules/web-worker/cjs/node.js:45:10)
    at MessagePort.<anonymous> (/home/mmom/src/ol-mbtiles/node_modules/web-worker/cjs/node.js:171:25)

Caused by sql-js/sql.js#406

Can't create JS function with create_function()

Hi, I was trying to register a JS function, but it throws this error. I'm not sure what happened here.

Error Message

(index):1 Uncaught DOMException: Failed to execute 'postMessage' on 'MessagePort': function getFlag(country_code) {
        // just some unicode magic
        return String.fromCodePoint(...Arra...<omitted>... } could not be cloned.
    at https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/sql.js-httpvfs.js:157:44
    at new Promise (<anonymous>)
    at E (https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/sql.js-httpvfs.js:153:18)
    at Object.apply (https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/sql.js-httpvfs.js:113:20)
    at https://adityawarmanfw.id/p/fb3644c1-bad8-4063-8f10-5f6ae4da8806/:59:24
    at async https://adityawarmanfw.id/p/fb3644c1-bad8-4063-8f10-5f6ae4da8806/:54:20

Fix broken build due to invalid dep graph

Prerelease of typescript can't satisfy any non-prerelease ranges.

$ npm i
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! 
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/typescript
npm ERR!   dev typescript@"^4.3.0-dev.20210331" from the root project
npm ERR! 
npm ERR! Could not resolve dependency:
npm ERR! peer typescript@">=2.7" from [email protected]
npm ERR! node_modules/ts-node
npm ERR!   dev ts-node@"^9.1.1" from the root project
npm ERR! 
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
[...snip...]

webpack cannot bundle the worker

I followed the usage instructions and tried to bundle the worker and wasm. But after I run "npm run build", it creates a sqlite.worker.[hash].js file and a sql-wasm.[hash].wasm in the "static/media" folder, and then the error says "require is undefined". I guess the instructions only work for webpack, but I am using react-scripts build. Is there any way I can still use sql.js-httpvfs in a project created with create-react-app?

bounding number/size of requests

It'd be nice to have the ability to bail out if a query is going to require too much data, rather than potentially fetching arbitrarily large amounts of data.
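
The optional maxBytesToRead argument to createDbWorker described in the readme's usage section covers this; a short sketch (table name and limit are placeholders):

const maxBytesToRead = 10 * 1024 * 1024; // give up after ~10 MiB have been fetched
const worker = await createDbWorker([config], workerUrl.toString(), wasmUrl.toString(), maxBytesToRead);
try {
  await worker.db.exec(`select * from some_big_table`);
} catch (e) {
  // a request that would exceed the limit makes the query fail with an SQLite disk I/O error
  console.log("query aborted, bytes read so far:", await worker.worker.bytesRead);
}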

streamlining HTTP request pattern with a sidecar file

Hi @phiresky, I wanted to share some hacking I did recently on sqlite_web_vfs that I think would benefit this project too: helper .dbi files

The idea is to scan the main database file and collect all the btree interior pages into a small sidecar file (itself a SQLite database). The VFS is then coded to download this file before accessing the main database, and serve requests for those pages from that local copy (otherwise fall through to HTTP requests on the main database as usual).

This can largely eliminate the sequential chains of tiny range requests we see SQLite make in the course of a btree lookup -- usually an overall latency win, even if few of the pages in the sidecar file are needed for any given query (since they're all fetched at once, and can service multiple queries). IME the btree interior pages are <1% of the main database but usually spread all throughout (and vacuum doesn't try to make them contiguous, unfortunately).

Server crashes in the minimal example

All your steps for the minimal example work perfectly for me except:

./node_modules/.bin/http-server

For me, it initially launches correctly, but then crashes with the error:

[Thu Apr 07 2022 22:28:06 GMT-0400 (Eastern Daylight Time)]  "GET /" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36"
_http_outgoing.js:470
    throw new ERR_HTTP_HEADERS_SENT('set');
    ^

Error [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client
    at ServerResponse.setHeader (_http_outgoing.js:470:11)
    at module.exports.ResponseStream.(anonymous function) [as setHeader] (/home/chris/git/asr/sqljs/node_modules/union/lib/response-stream.js:100:34)
    at Object.exports.(anonymous function) (/home/chris/git/asr/sqljs/node_modules/http-server/lib/core/status-handlers.js:57:7)
    at Readable.stream.on (/home/chris/git/asr/sqljs/node_modules/http-server/lib/core/index.js:339:22)
    at Readable.emit (events.js:198:13)
    at emitErrorNT (internal/streams/destroy.js:91:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
    at process._tickCallback (internal/process/next_tick.js:63:19)

I'm unfamiliar with reading Node tracebacks. It looks like it's receiving a request, but is having trouble writing the response. What's causing this?

The library doesn't work after upgrading to `0.8.10`

Hi,

I was recently working with the library, but I noticed that version 0.8.10 is not working and returns errors in some cases.

To reproduce the bug you have to install version 0.8.10.

npm install sql.js-httpvfs@0.8.10

Clone the repository and bundle the example; it will return an empty array [] instead of [{"foo":"hello","bar":"world"}].

I also wanted to point out that, in the case of *.db files (mine was generated from an SQLAlchemy schema), I got an error stating that the database disk image is malformed, as can be seen in the screenshot below:
[screenshot of the error]

Note: I got no errors with my SQLAlchemy-generated database when I tested with 0.8.9, and it worked fine; I also tested the attached example with 0.8.9 and it returned the data, not an empty array.

It seems like version 0.8.10 is still a work in progress, but it can be installed from the npm registry.

Regards,

Firefox issue with virtual tables (gitlab pages sqlite.db file hosting)

Edit: it seems I've had two different issues, one when creating the virtual tables (on my side) and one when consuming the sqlite db file from GitLab Pages; I'm putting an update in the next messages.

Hello! Thanks for the really cool lib!

I'm trying to use it with a database served from a GitLab page/artifact, but the library complains that the database is malformed, while downloading it and opening it with sqlite or a DB browser seems to work.

Stacktrace:

Uncaught (in promise) Error: SQLite: database disk image is malformed
    handleError https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    step https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    exec https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    r https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    l https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    serialize https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    g https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    r https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    promise callback*r https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    l https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    630 https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    __webpack_require__ https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    <anonymous> https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    <anonymous> https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    <anonymous> https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1
    <anonymous> https://joblist.gitlab.io/components/assets/sqlite.worker-e48eea05.js:1

Console warning:

Warning: The server did not respond with Accept-Ranges=bytes. It either does not support byte serving or does not advertise it (`Accept-Ranges: bytes` header missing), or your database is hosted on CORS and the server doesn't mark the accept-ranges header as exposed. This may lead to incorrect results. (seen response headers: cache-control: max-age=600
content-length: 5263360
content-type: application/octet-stream
date: Wed, 16 Aug 2023 09:13:26 GMT
etag: "1febf8e0dcc95f159ed8d35648ce1493f78e109342e7186a17e3254c687ad317"
expires: Wed, 16 Aug 2023 09:23:26 UTC
last-modified: Wed, 16 Aug 2023 03:03:27 GMT
permissions-policy: interest-cohort=()
vary: Origin
x-firefox-spdy: h2)

I'm not at all sure this is a problem with your lib, but I'm putting it here as a possible cross-link reference for related usage issues.

I wonder if maybe that is because of how I've got the project set up with the build tools, as it seems to work in Chromium but not Firefox (testing on Linux).

Any way to optimize this recursive query?

Hello!

I have a sqlite DB that represents a DAG. It has two tables, node and edge, where node has some extra data about each node and edge has a node_to and node_from FK to node.id.

I created a query to get all the "ancestors" of a certain node. I initially tried running this query with sql.js-httpvfs, but the query took about 30 seconds to run, even with local HTTP access. I think the reason was poor spatial locality requiring a SCAN of the DB.

Is there any way to optimize this query to minimize SCAN?

WITH RECURSIVE
    ancestor(n) AS (
      VALUES(?)
      UNION ALL
      SELECT node_from FROM edge, ancestor
      WHERE edge.node_to=ancestor.n
    )
    SELECT node_from,node_to,node.* FROM edge
    JOIN node ON node_from = node.id WHERE edge.node_to IN ancestor;

Here is the query plan for the query:

QUERY PLAN
|--SCAN edge
|--LIST SUBQUERY 3
|  |--CO-ROUTINE ancestor
|  |  |--SETUP
|  |  |  `--SCAN CONSTANT ROW
|  |  `--RECURSIVE STEP
|  |     |--SCAN ancestor
|  |     `--SEARCH edge USING AUTOMATIC COVERING INDEX (node_to=?)
|  `--SCAN ancestor
`--SEARCH node USING INDEX sqlite_autoindex_node_1 (id=?)

As a workaround, I've just exported the data to JSON, because it actually wasn't that big, but I'm curious if this could be optimized to work.
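
One thing that might help (an untested suggestion, based on the covering-index advice in the readme above): the AUTOMATIC COVERING INDEX in the plan means SQLite scans the whole edge table once per query just to build a temporary index, and the outer SCAN edge also reads the whole table. Creating that index ahead of time in the database file avoids both scans:

create index idx_edge_node_to on edge(node_to, node_from);

-- the recursive step and the outer lookup can then both use
-- SEARCH edge USING COVERING INDEX idx_edge_node_to (node_to=?)
-- instead of scanning the edge table.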
