Comments (9)
So I've been investigating this specifically in the context of running sqlite3_wal_checkpoint_v2 on the thread pool in disk mode.
I'm not sure if one of the goals of this investigation was to implement incremental snapshots, along the lines of what @MathieuBordere described, maybe not, but if we consider only incremental snapshots for the current in-memory FSM (i.e. not disk mode) it shouldn't be necessary to run sqlite3_wal_checkpoint_v2 in the threadpool.
A possible approach could be roughly something like:
- Run sqlite3_wal_checkpoint_v2 in the main thread normally, as we do now.
- That will trigger xWrite calls in our VFS; those should behave just as they behave now (i.e. they overwrite pages in the in-memory database file), but on top of that behavior our VFS should also record the index of every page that gets updated by the checkpoint. This is similar to what we already do to record write transactions that need to be replicated via raft.
- After sqlite3_wal_checkpoint_v2 returns, we would run a job in libuv's threadpool that updates the database file contained in the last snapshot, using the updated pages from the in-memory database file that have indexes matching the recorded ones.
- In particular, this should be done in a transactional way. For example, the job could first copy to a "journal" file (e.g. <snapshot-filename>.journal) all the current non-updated pages in the on-disk snapshot database file that are going to be replaced with new pages copied from the in-memory database file. Then the job would start to actually update the existing snapshot and rename it when done. The journal file could be used to roll back the snapshot file to its original content in case raft crashes before the job fully completes the update; the rollback would take place when raft restarts.
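To make the transactional update step concrete, here is a rough C sketch of how the threadpool job might journal the old pages, apply the new ones, and roll back after a crash. Every name here, the journal format, and the fixed page size are assumptions for illustration, not actual dqlite code; a real implementation would fsync() at the marked points and clean up on every error path.

```c
/* Sketch only: transactional in-place page update with a rollback journal.
 * Assumptions (not real dqlite code): fixed page size, 0-based page indexes,
 * journal format is a sequence of (uint32 index, page bytes) records. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 512 /* assumption; real code would use the database page size */

/* Step 1: save the pages about to be overwritten into <snapshot>.journal. */
static int journal_old_pages(const char *snapshot, const char *journal,
                             const unsigned *indexes, unsigned n)
{
    FILE *db = fopen(snapshot, "rb");
    FILE *jf = fopen(journal, "wb");
    char page[PAGE_SIZE];
    if (db == NULL || jf == NULL) return -1;
    for (unsigned i = 0; i < n; i++) {
        uint32_t ix = indexes[i];
        fseek(db, (long)ix * PAGE_SIZE, SEEK_SET);
        if (fread(page, PAGE_SIZE, 1, db) != 1) return -1;
        fwrite(&ix, sizeof ix, 1, jf);
        fwrite(page, PAGE_SIZE, 1, jf);
    }
    fflush(jf); /* a real implementation would fsync() here */
    fclose(jf);
    fclose(db);
    return 0;
}

/* Step 2: copy the recorded pages from the in-memory file into the snapshot. */
static int apply_pages(const char *snapshot, const unsigned *indexes,
                       const char *pages, unsigned n)
{
    FILE *db = fopen(snapshot, "r+b");
    if (db == NULL) return -1;
    for (unsigned i = 0; i < n; i++) {
        fseek(db, (long)indexes[i] * PAGE_SIZE, SEEK_SET);
        fwrite(pages + (size_t)i * PAGE_SIZE, PAGE_SIZE, 1, db);
    }
    fflush(db); /* fsync() in real code, then unlink the journal */
    fclose(db);
    return 0;
}

/* Recovery path: if the journal still exists at startup, the update did not
 * complete; write the saved pages back and delete the journal. */
static int rollback_from_journal(const char *snapshot, const char *journal)
{
    FILE *jf = fopen(journal, "rb");
    FILE *db = fopen(snapshot, "r+b");
    uint32_t ix;
    char page[PAGE_SIZE];
    if (jf == NULL || db == NULL) return -1;
    while (fread(&ix, sizeof ix, 1, jf) == 1) {
        if (fread(page, PAGE_SIZE, 1, jf) != 1) return -1;
        fseek(db, (long)ix * PAGE_SIZE, SEEK_SET);
        fwrite(page, PAGE_SIZE, 1, db);
    }
    fclose(db);
    fclose(jf);
    remove(journal);
    return 0;
}
```

The key property is that the journal is fully written and durable before the snapshot file is touched, so at any crash point either the update can be discarded (journal exists, roll back) or it completed (journal gone).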
from dqlite.
I'm not sure if one of the goals of this investigation was to implement incremental snapshots, along the lines of what @MathieuBordere described, maybe not, but if we consider only incremental snapshots for the current in-memory FSM (i.e. not disk mode) it shouldn't be necessary to run sqlite3_wal_checkpoint_v2 in the threadpool.
It wasn't, we were thinking about it in the context of disk mode.
from dqlite.
(unix-excl has been included in SQLite since v3.7.6 from 2011; Focal has v3.31.1.)
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
Specifically, we would have to go through the unix-excl xShmMap. Calling that from another VFS method might be questionable, I'm not sure. Incidentally, sqlite/sqlite@3cb9339 is a helpful (if outdated) commit to look at on the SQLite side.
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
Indeed, we do need direct control over the "mmap"-ed wal-index, so I think using the shm-related VFS methods of unix-excl won't cut it, and the amount of code that would be saved is probably relatively small.
Note that running sqlite3 operations off the main thread would come with a lot of changes throughout the entire dqlite codebase and would probably be close to a rewrite.
What problem are we trying to solve here?
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
Specifically, we would have to go through the unix-excl xShmMap. Calling that from another VFS method might be questionable, I'm not sure. Incidentally, sqlite/sqlite@3cb9339 is a helpful (if outdated) commit to look at on the SQLite side.
Yes, in dqlite too we only use heap memory for xShmMap.
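For reference, a heap-backed shm map boils down to something like the following simplified sketch. The struct and function names are invented for illustration; SQLite's real entry point is the xShmMap member of sqlite3_io_methods, which takes a sqlite3_file and a void volatile ** out-pointer, and real code would also need the locking and unmap/delete logic.

```c
/* Simplified sketch of heap-backed shared memory in the spirit of xShmMap:
 * regions are allocated lazily on the heap instead of being mmap()-ed from a
 * -shm file. SQLite first probes a region with extend=0 and expects NULL if
 * it doesn't exist yet, then retries with extend=1 to create it. */
#include <stdlib.h>

struct shm {
    void **regions;   /* heap-allocated shm regions, invented container */
    int n_regions;
};

static int shm_map(struct shm *s, int region, int size, int extend, void **out)
{
    *out = NULL;
    if (region >= s->n_regions) {
        if (!extend) {
            return 0; /* probe: region absent, report NULL, no error */
        }
        void **grown =
            realloc(s->regions, (size_t)(region + 1) * sizeof *grown);
        if (grown == NULL) return -1;
        for (int i = s->n_regions; i <= region; i++) {
            grown[i] = calloc(1, (size_t)size); /* zero-filled, like fresh shm */
            if (grown[i] == NULL) return -1;
        }
        s->regions = grown;
        s->n_regions = region + 1;
    }
    *out = s->regions[region];
    return 0;
}
```

Because the regions live on the heap and are never shared across processes, this only makes sense when all connections go through the same in-process VFS, which is exactly the unix-excl-style single-process assumption being discussed.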
from dqlite.
So I've been investigating this specifically in the context of running sqlite3_wal_checkpoint_v2 on the thread pool in disk mode. I think this poses fewer fundamental challenges than doing the same with sqlite3_step, although it's certainly possible that I'm overlooking something. In any case, I've been looking into how extensively we'd have to modify our VFS to make this safe.
Indeed, we do need direct control over the "mmap"-ed wal-index, so I think using the shm-related VFS methods of unix-excl won't cut it, and the amount of code that would be saved is probably relatively small.
For reference, I have a branch implementing this idea (incomplete, since the disk-mode VFS is not updated yet, but enough to see what kind of changes are entailed, I think), using xShmMap to access the wal-index: https://github.com/cole-miller/dqlite/tree/unix-excl. It's certainly possible that I'm missing something, but I'm optimistic: the only places where we need to touch the wal-index ourselves are in functions that don't belong to the sqlite3_vfs or sqlite3_file vtables [edit: except we call vfsAmendWalIndexHeader from vfsFileControlCommitPhaseTwo], which disposes of my biggest worry (that we'd have reentrancy problems/deadlocks).
You're right that the raw number of lines of code that this would allow us to delete is not impressive, but it seems like a win to me if we can get SQLite to do more work for us in an area of the code that's tightly coupled to SQLite -- especially if we're already revisiting that code with concurrency in mind. Does that reasoning make sense? I'm certainly open to being convinced otherwise, especially if there's a problem that I've overlooked here.
from dqlite.
So I've been investigating this specifically in the context of running sqlite3_wal_checkpoint_v2 on the thread pool in disk mode.
My question was more: why do you think you need disk mode at all in the first place?
As far as I understand it "disk mode" means "changing the dqlite raft FSM to be disk-based instead of memory-based".
We had a few calls and mail exchanges, but as an outsider, I still don't know what use case Canonical has that makes such a change an immediate hard requirement which can't even be deferred. I understand if we're talking about some private project whose details can't be disclosed. However my personal feeling is that if there is a use case for which dqlite doesn't work in its current form, there are also chances that other databases might be a better fit for that use case.
Note that there are ways to minimize the cpu/disk/memory impact of snapshots without having to switch to a disk-based FSM (e.g. taking and sending snapshots in chunks, which would be a relatively easy improvement). A disk-based FSM (aka disk mode) is only needed for workloads that have data sets that don't fit in memory (and again, chances are that for many of those workloads other databases might be a better fit).
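As an illustration of the chunked approach, streaming a snapshot file in fixed-size pieces rather than loading it whole could look roughly like this. Every name here is invented; the callback is a stand-in for whatever would hand each chunk to raft's transport layer.

```c
/* Illustrative only: stream a snapshot file in fixed-size chunks instead of
 * reading it into one big buffer, bounding memory use during snapshot send. */
#include <stdio.h>
#include <string.h>

typedef int (*chunk_cb)(const char *buf, size_t len, void *arg);

static int send_snapshot_chunked(const char *path, size_t chunk_size,
                                 chunk_cb cb, void *arg)
{
    FILE *f = fopen(path, "rb");
    char buf[4096]; /* in this sketch, chunk_size must fit in this buffer */
    size_t n;
    if (f == NULL || chunk_size > sizeof buf) return -1;
    while ((n = fread(buf, 1, chunk_size, f)) > 0) {
        if (cb(buf, n, arg) != 0) { /* transport error: abort the transfer */
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return 0;
}

/* Example callback: just counts chunks and bytes delivered. */
struct counter { size_t bytes; int chunks; };
static int count_cb(const char *buf, size_t len, void *arg)
{
    (void)buf;
    struct counter *c = arg;
    c->bytes += len;
    c->chunks++;
    return 0;
}
```

The point is only that peak memory becomes O(chunk size) instead of O(snapshot size); the hard parts (keeping the file stable while it's being streamed, resuming, flow control) are exactly what the real design discussion is about.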
All in all, before deciding for any solution I think it'd be important to have numbers and details about the problem being faced.
I think this poses fewer fundamental challenges than doing the same with sqlite3_step, although it's certainly possible that I'm overlooking something. In any case, I've been looking into how extensively we'd have to modify our VFS to make this safe.
I didn't yet reason about running sqlite3_wal_checkpoint_v2 in a separate thread, and the implications it has. However the current form of disk mode where the VFS makes blocking I/O calls against the file system is already problematic in my mind as outlined in this comment.
Indeed, we do need direct control on the "mmap"-ed wal-index, so I think using the shm-related VFS methods of unix-excl won't cut it, and the amount of code that would be saved is probably relatively small.
For reference, I have a branch implementing this idea (incomplete, since the disk-mode VFS is not updated yet, but enough to see what kind of changes are entailed, I think), using xShmMap to access the wal-index: https://github.com/cole-miller/dqlite/tree/unix-excl. It's certainly possible that I'm missing something, but I'm optimistic: the only places where we need to touch the wal-index ourselves are in functions that don't belong to the sqlite3_vfs or sqlite3_file vtables [edit: except we call vfsAmendWalIndexHeader from vfsFileControlCommitPhaseTwo], which disposes of my biggest worry (that we'd have reentrancy problems/deadlocks).
You're right that the raw number of lines of code that this would allow us to delete is not impressive, but it seems like a win to me if we can get SQLite to do more work for us in an area of the code that's tightly coupled to SQLite -- especially if we're already revisiting that code with concurrency in mind. Does that reasoning make sense? I'm certainly open to being convinced otherwise, especially if there's a problem that I've overlooked here.
As far as I understand, the code that would be saved is essentially the code in the vfsShmMap function, which already seems short and straightforward. Keeping the current code gives us the advantage of being able to perform assertions that validate our expectations about the use SQLite makes of this API (should that change in a way that matters to dqlite). Leveraging unix-excl would come with a bit more indirection than we have now. All in all, I'm not sure it'd be a net win, but YMMV.
from dqlite.