Comments (9)
So I've been investigating this specifically in the context of running sqlite3_wal_checkpoint_v2 on the thread pool in disk mode.
I'm not sure if one of the goals of this investigation was to implement incremental snapshots, along the lines of what @MathieuBordere described, maybe not, but if we consider only incremental snapshots for the current in-memory FSM (i.e. not disk mode) it shouldn't be necessary to run sqlite3_wal_checkpoint_v2 in the threadpool.
A possible approach could be roughly something like:
- Run sqlite3_wal_checkpoint_v2 in the main thread normally, as we do now.
- That will trigger xWrite calls in our VFS; those should behave just as they behave now (i.e. they overwrite pages in the in-memory database file), but on top of that behavior our VFS should also record the index of every page that gets updated by the checkpoint. This is similar to what we already do to record write transactions that need to be replicated via raft.
- After sqlite3_wal_checkpoint_v2 returns, we would run a job in libuv's threadpool that updates the database file contained in the last snapshot, using the updated pages from the in-memory database file that have indexes matching the recorded ones.
- In particular, this should be done in a transactional way. For example, the job could first copy to a "journal" file (e.g. <snapshot-filename>.journal) all the current non-updated pages in the on-disk snapshot database file that are going to be replaced with new pages copied from the in-memory database file. Then the job would start to actually update the existing snapshot and rename it when done. The journal file could be used to roll back the snapshot file to its original content in case raft crashes before the job fully completes the update; the rollback would take place when raft restarts.
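To make the transactional update step concrete, here is a rough C sketch of how the threadpool job might journal the old pages, apply the new ones, and roll back after a crash. Every name here, the journal format, and the fixed page size are assumptions for illustration, not actual dqlite code; a real implementation would fsync() at the marked points and clean up on every error path.

```c
/* Sketch only: transactional in-place page update with a rollback journal.
 * Assumptions (not real dqlite code): fixed page size, 0-based page indexes,
 * journal format is a sequence of (uint32 index, page bytes) records. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 512 /* assumption; real code would use the database page size */

/* Step 1: save the pages about to be overwritten into <snapshot>.journal. */
static int journal_old_pages(const char *snapshot, const char *journal,
                             const unsigned *indexes, unsigned n)
{
    FILE *db = fopen(snapshot, "rb");
    FILE *jf = fopen(journal, "wb");
    char page[PAGE_SIZE];
    if (db == NULL || jf == NULL) return -1;
    for (unsigned i = 0; i < n; i++) {
        uint32_t ix = indexes[i];
        fseek(db, (long)ix * PAGE_SIZE, SEEK_SET);
        if (fread(page, PAGE_SIZE, 1, db) != 1) return -1;
        fwrite(&ix, sizeof ix, 1, jf);
        fwrite(page, PAGE_SIZE, 1, jf);
    }
    fflush(jf); /* a real implementation would fsync() here */
    fclose(jf);
    fclose(db);
    return 0;
}

/* Step 2: copy the recorded pages from the in-memory file into the snapshot. */
static int apply_pages(const char *snapshot, const unsigned *indexes,
                       const char *pages, unsigned n)
{
    FILE *db = fopen(snapshot, "r+b");
    if (db == NULL) return -1;
    for (unsigned i = 0; i < n; i++) {
        fseek(db, (long)indexes[i] * PAGE_SIZE, SEEK_SET);
        fwrite(pages + (size_t)i * PAGE_SIZE, PAGE_SIZE, 1, db);
    }
    fflush(db); /* fsync() in real code, then unlink the journal */
    fclose(db);
    return 0;
}

/* Recovery path: if the journal still exists at startup, the update did not
 * complete; write the saved pages back and delete the journal. */
static int rollback_from_journal(const char *snapshot, const char *journal)
{
    FILE *jf = fopen(journal, "rb");
    FILE *db = fopen(snapshot, "r+b");
    uint32_t ix;
    char page[PAGE_SIZE];
    if (jf == NULL || db == NULL) return -1;
    while (fread(&ix, sizeof ix, 1, jf) == 1) {
        if (fread(page, PAGE_SIZE, 1, jf) != 1) return -1;
        fseek(db, (long)ix * PAGE_SIZE, SEEK_SET);
        fwrite(page, PAGE_SIZE, 1, db);
    }
    fclose(db);
    fclose(jf);
    remove(journal);
    return 0;
}
```

The key property is that the journal is fully written and durable before the snapshot file is touched, so at any crash point either the update can be discarded (journal exists, roll back) or it completed (journal gone).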
from dqlite.
I'm not sure if one of the goals of this investigation was to implement incremental snapshots, along the lines of what @MathieuBordere described, maybe not, but if we consider only incremental snapshots for the current in-memory FSM (i.e. not disk mode) it shouldn't be necessary to run sqlite3_wal_checkpoint_v2 in the threadpool.
It wasn't, we were thinking about it in the context of disk mode.
from dqlite.
(unix-excl has been included in SQLite since v3.7.6 from 2011; Focal has v3.31.1.)
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
Specifically, we would have to go through the unix-excl xShmMap. Calling that from another VFS method might be questionable, I'm not sure. Incidentally, sqlite/sqlite@3cb9339 is a helpful (if outdated) commit to look at on the SQLite side.
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
Indeed, we do need direct control over the "mmap"-ed wal-index, so I think using the shm-related VFS methods of unix-excl won't cut it, and the amount of code that would be saved is probably relatively small.
Note that running sqlite3 operations off the main thread would come with a lot of changes throughout the entire dqlite codebase and would probably be close to a rewrite.
What problem are we trying to solve here?
from dqlite.
We still need to touch the wal-index directly in some of our other VFS code, e.g. vfsAmendWalIndexHeader, but hopefully it's possible to port that code over to use the VFS shm accessors (there are probably complications I haven't thought of here).
Specifically, we would have to go through the unix-excl xShmMap. Calling that from another VFS method might be questionable, I'm not sure. Incidentally, sqlite/sqlite@3cb9339 is a helpful (if outdated) commit to look at on the SQLite side.
Yes, in dqlite too we only use heap memory for xShmMap.
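For reference, a heap-backed shm map boils down to something like the following simplified sketch. The struct and function names are invented for illustration; SQLite's real entry point is the xShmMap member of sqlite3_io_methods, which takes a sqlite3_file and a void volatile ** out-pointer, and real code would also need the locking and unmap/delete logic.

```c
/* Simplified sketch of heap-backed shared memory in the spirit of xShmMap:
 * regions are allocated lazily on the heap instead of being mmap()-ed from a
 * -shm file. SQLite first probes a region with extend=0 and expects NULL if
 * it doesn't exist yet, then retries with extend=1 to create it. */
#include <stdlib.h>

struct shm {
    void **regions;   /* heap-allocated shm regions, invented container */
    int n_regions;
};

static int shm_map(struct shm *s, int region, int size, int extend, void **out)
{
    *out = NULL;
    if (region >= s->n_regions) {
        if (!extend) {
            return 0; /* probe: region absent, report NULL, no error */
        }
        void **grown =
            realloc(s->regions, (size_t)(region + 1) * sizeof *grown);
        if (grown == NULL) return -1;
        for (int i = s->n_regions; i <= region; i++) {
            grown[i] = calloc(1, (size_t)size); /* zero-filled, like fresh shm */
            if (grown[i] == NULL) return -1;
        }
        s->regions = grown;
        s->n_regions = region + 1;
    }
    *out = s->regions[region];
    return 0;
}
```

Because the regions live on the heap and are never shared across processes, this only makes sense when all connections go through the same in-process VFS, which is exactly the unix-excl-style single-process assumption being discussed.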
from dqlite.
So I've been investigating this specifically in the context of running sqlite3_wal_checkpoint_v2 on the thread pool in disk mode. I think this poses fewer fundamental challenges than doing the same with sqlite3_step, although it's certainly possible that I'm overlooking something. In any case, I've been looking into how extensively we'd have to modify our VFS to make this safe.
Indeed, we do need direct control over the "mmap"-ed wal-index, so I think using the shm-related VFS methods of unix-excl won't cut it, and the amount of code that would be saved is probably relatively small.
For reference, I have a branch implementing this idea (incomplete, since the disk-mode VFS is not updated yet, but enough to see what kind of changes are entailed, I think), using xShmMap to access the wal-index: https://github.com/cole-miller/dqlite/tree/unix-excl. It's certainly possible that I'm missing something, but I'm optimistic: the only places where we need to touch the wal-index ourselves are in functions that don't belong to the sqlite3_vfs or sqlite3_file vtables [edit: except we call vfsAmendWalIndexHeader from vfsFileControlCommitPhaseTwo], which disposes of my biggest worry (that we'd have reentrancy problems/deadlocks).
You're right that the raw number of lines of code that this would allow us to delete is not impressive, but it seems like a win to me if we can get SQLite to do more work for us in an area of the code that's tightly coupled to SQLite -- especially if we're already revisiting that code with concurrency in mind. Does that reasoning make sense? I'm certainly open to being convinced otherwise, especially if there's a problem that I've overlooked here.
from dqlite.
So I've been investigating this specifically in the context of running sqlite3_wal_checkpoint_v2 on the thread pool in disk mode.
My question was more: why do you think you need disk mode at all in the first place?
As far as I understand it "disk mode" means "changing the dqlite raft FSM to be disk-based instead of memory-based".
We had a few calls and mail exchanges, but as an outsider, I still don't know what use case Canonical has that makes such a change an immediate hard requirement which can't even be deferred. I understand if we're talking about some private project whose details can't be disclosed. However my personal feeling is that if there is a use case for which dqlite doesn't work in its current form, there are also chances that other databases might be a better fit for that use case.
Note that there are ways to minimize the cpu/disk/memory impact of snapshots without having to switch to a disk-based FSM (e.g. taking and sending snapshots in chunks, which would be a relatively easy improvement). A disk-based FSM (aka disk mode) is only needed for workloads that have data sets that don't fit in memory (and again, chances are that for many of those workloads other databases might be a better fit).
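As an illustration of the chunked approach, streaming a snapshot file in fixed-size pieces rather than loading it whole could look roughly like this. Every name here is invented; the callback is a stand-in for whatever would hand each chunk to raft's transport layer.

```c
/* Illustrative only: stream a snapshot file in fixed-size chunks instead of
 * reading it into one big buffer, bounding memory use during snapshot send. */
#include <stdio.h>
#include <string.h>

typedef int (*chunk_cb)(const char *buf, size_t len, void *arg);

static int send_snapshot_chunked(const char *path, size_t chunk_size,
                                 chunk_cb cb, void *arg)
{
    FILE *f = fopen(path, "rb");
    char buf[4096]; /* in this sketch, chunk_size must fit in this buffer */
    size_t n;
    if (f == NULL || chunk_size > sizeof buf) return -1;
    while ((n = fread(buf, 1, chunk_size, f)) > 0) {
        if (cb(buf, n, arg) != 0) { /* transport error: abort the transfer */
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return 0;
}

/* Example callback: just counts chunks and bytes delivered. */
struct counter { size_t bytes; int chunks; };
static int count_cb(const char *buf, size_t len, void *arg)
{
    (void)buf;
    struct counter *c = arg;
    c->bytes += len;
    c->chunks++;
    return 0;
}
```

The point is only that peak memory becomes O(chunk size) instead of O(snapshot size); the hard parts (keeping the file stable while it's being streamed, resuming, flow control) are exactly what the real design discussion is about.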
All in all, before deciding for any solution I think it'd be important to have numbers and details about the problem being faced.
I think this poses fewer fundamental challenges than doing the same with sqlite3_step, although it's certainly possible that I'm overlooking something. In any case, I've been looking into how extensively we'd have to modify our VFS to make this safe.
I didn't yet reason about running sqlite3_wal_checkpoint_v2 in a separate thread, and the implications it has. However the current form of disk mode where the VFS makes blocking I/O calls against the file system is already problematic in my mind as outlined in this comment.
Indeed, we do need direct control on the "mmap"-ed wal-index, so I think using the shm-related VFS methods of unix-excl won't cut it, and the amount of code that would be saved is probably relatively small.
For reference, I have a branch implementing this idea (incomplete, since the disk-mode VFS is not updated yet, but enough to see what kind of changes are entailed, I think), using xShmMap to access the wal-index: https://github.com/cole-miller/dqlite/tree/unix-excl. It's certainly possible that I'm missing something, but I'm optimistic: the only places where we need to touch the wal-index ourselves are in functions that don't belong to the sqlite3_vfs or sqlite3_file vtables [edit: except we call vfsAmendWalIndexHeader from vfsFileControlCommitPhaseTwo], which disposes of my biggest worry (that we'd have reentrancy problems/deadlocks).
You're right that the raw number of lines of code that this would allow us to delete is not impressive, but it seems like a win to me if we can get SQLite to do more work for us in an area of the code that's tightly coupled to SQLite -- especially if we're already revisiting that code with concurrency in mind. Does that reasoning make sense? I'm certainly open to being convinced otherwise, especially if there's a problem that I've overlooked here.
As far as I understand, the code that would be saved is essentially the code in the vfsShmMap function, which already seems short and straightforward. Keeping the current code gives us the advantage of being able to perform assertions that validate our expectations about the use SQLite makes of this API (should that change in a way that matters to dqlite). Leveraging unix-excl would come with a bit more indirection than we have now. All in all, I'm not sure it'd be a net win, but YMMV.
from dqlite.