Comments (23)
Currently just testing dqlite and how quickly it grows with inserts/updates and shrinks with deletes.
I assume here you mean the raft-related growth and compaction (due to trailing logs and snapshots) and NOT an increase/decrease of the actual database (which is clearly mostly stable if the number and size of inserts more or less matches the number of deletes). If that's the case, tweaking the trailing/threshold values should help; feel free to experiment by changing the hard-coded values and recompiling.
We do plan to optimize for large numbers of small insert/deletes by supporting a finer grained replication model, where instead of replicating Write-Ahead Log pages we would replicate single Btree nodes. In that case the size of the raft entries being passed around should be significantly smaller, and so the amount of memory needed to keep trailing entries around would be much smaller too.
If just tweaking the trailing/snapshot values is not enough for your use case, please let me know so we have an idea of a concrete real-world case where a smarter replication model would matter.
from dqlite.
Looks like you're right that forcing a checkpoint will reclaim the space.
from dqlite.
Internally dqlite performs checkpoints automatically using sqlite3_wal_checkpoint_v2(). Trying to issue a manual PRAGMA wal_checkpoint from a connected client might be problematic: you should not use the PASSIVE mode, because it can potentially lead to different results when run on the leader and when run on the followers, since the set of readers might differ. On the other hand, FULL (or its RESTART and TRUNCATE variants) can't use the busy handler to block and retry, since dqlite is single-threaded, so at best you need to retry on the client side or orchestrate things so that you know no readers are active at that time.
I didn't quite test the above, but those should be more or less the issues involved.
from dqlite.
When a new insert or update is made, a new raft log entry containing a new page of the SQLite Write-Ahead Log will be created in-memory, temporarily increasing memory usage. After a while outstanding raft log entries get compacted into a snapshot, and the associated memory is released.
Compaction happens after 1024 log entries and leaves behind a trail of 8192 entries as a "moving window". At the moment the values are hard-coded (see line 64 in server.c), but we could expose them if there's a need to tweak them.
So to answer your question: for UPDATEs memory usage will grow up to a certain amount and then fluctuate between a lower and an upper bound. However, if you keep performing new INSERTs forever without ever performing DELETEs, then memory usage will grow indefinitely, since a copy of the entire database is kept in-memory.
from dqlite.
Thanks for the info. A little curious how you decided on 8192, since the default value in raft was 128. I think it would be helpful for those to be exposed. If you were to expose them, would you just add dqlite_set_snapshot_threshold and dqlite_set_snapshot_trailing to the API, or do something else?
from dqlite.
From where did you take the number 128? The default value in C-raft should be 2048.
Right now I don't recall exactly why I picked 8192, probably to be a bit more conservative (essentially that value determines how far behind a follower node can fall without the need to send it a full snapshot).
Yes, I think the APIs to add would be roughly what you describe. I didn't add them so far to avoid exposing unnecessary details and then having to support that public API indefinitely.
What is your use case exactly? Maybe there are more effective ways to solve your problem, or maybe there's a simpler or more user-friendly API we can expose (e.g. dqlite_set_cache_size or something along those lines).
from dqlite.
The comment for raft_set_snapshot_trailing says the default value was 128. Didn't actually check the code.
If I use dqlite it would be running in a Docker container with a hard memory limit. Just trying to get an idea of average memory usage so I can set the limit appropriately.
from dqlite.
The comment for raft_set_snapshot_trailing says the default value was 128. Didn't actually check the code.
Oh okay, I need to update that then.
If I use dqlite it would be running in a Docker container with a hard memory limit. Just trying to get an idea of average memory usage so I can set the limit appropriately.
Yeah, so as I suspected, I think users will typically be concerned with the final memory usage. So I'm keen to add an API as close as possible to that use case.
Are you using dqlite indirectly because it's what backs LXD's database, or do you have a custom application which uses dqlite directly?
In the latter case, do you also have an idea of the size of the "actual" database? In other words, if you were using SQLite directly, what would be the maximum size of the database file image?
I'm asking because if the size of the actual database is large compared to the trailing raft log entries, it probably won't change much what the exact value of snapshot threshold/trailing is.
from dqlite.
Are you using dqlite indirectly because it's what backs LXD's database, or do you have a custom application which uses dqlite directly?
It would be a custom application that uses dqlite.
In the latter case, do you also have an idea of the size of the "actual" database? In other words, if you were using SQLite directly, what would be the maximum size of the database file image?
Not sure yet, since it's currently key/value using raft and I would like to transition to a SQL backing. Currently just testing dqlite and how quickly it grows with inserts/updates and shrinks with deletes.
from dqlite.
I assume here you mean the raft-related growth and compaction (due to trailing logs and snapshots)
Yeah, raft-related growth and compaction.
from dqlite.
Tweaking trailing/snapshot values will help a little. Also wondering if there's any way to shrink the number of free pages after a large amount of deletes?
from dqlite.
Probably using VACUUM, but I don't know its internals and never really tested it. If you try it and it doesn't work, please let me know.
from dqlite.
I've tried using VACUUM once and it seemed to double the memory usage. auto_vacuum doesn't seem to do anything.
from dqlite.
Then I'll have to look into this at some point. For now we haven't had issues because of this, so I'm not sure when I'll get a chance to investigate.
from dqlite.
Actually, thinking about it, a possible explanation is that when the journal is in WAL mode (as dqlite sets it), SQLite first creates new WAL pages when running VACUUM, and only upon checkpointing the WAL back into the database file does the actual number of used pages shrink.
You can probably check this yourself using the sqlite3 command line, running PRAGMA journal_mode=WAL and looking at what happens to the main db file and to the -wal file. There should also be a way to manually trigger checkpoints from the command line (probably via some PRAGMA statement).
from dqlite.
Looks like to turn on auto_vacuum, it has to be done before enabling WAL mode. Also it seems that it won't actually truncate the pages until the connection is closed. I believe for dqlite that only occurs when the leader is closed.
from dqlite.
Looks like to turn on auto_vacuum, it has to be done before enabling WAL mode.
What are you basing this statement on? Empirical test or documentation? My reading of the relevant SQLite documentation is a bit different, see here and here.
Also seems that it won't actually truncate the pages until the connection is closed. I believe for dqlite that only occurs when leader is closed.
Same as above. I didn't do any test, but I couldn't find documentation indicating the behavior you described.
from dqlite.
That's what I observe from testing.
from dqlite.
What testing did you do exactly?
There is a subtle point in the docs, that I'm not sure you considered:
when not in write-ahead log mode, the page_size and/or auto_vacuum properties of an existing database may be changed by using the page_size and/or pragma auto_vacuum pragmas and then immediately VACUUMing the database. When in write-ahead log mode, only the auto_vacuum support property can be changed using VACUUM.
I wrote the following program that sets auto_vacuum to 1 (aka FULL) on a database after setting WAL mode; you can run it with go run -tags libsqlite3:
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3"
)

const filename = "vacuum-test.db"

func main() {
	os.Remove(filename)

	// Open the db.
	db, err := sql.Open("sqlite3", filename)
	if err != nil {
		log.Fatalf("open: %v", err)
	}
	defer db.Close()

	// Set WAL journal mode.
	if _, err := db.Exec("PRAGMA journal_mode=wal"); err != nil {
		log.Fatalf("set WAL mode: %v", err)
	}
	row := db.QueryRow("PRAGMA journal_mode")
	var mode string
	if err := row.Scan(&mode); err != nil {
		log.Fatalf("query journal mode: %v", err)
	}
	if mode != "wal" {
		log.Fatalf("WAL mode not set: %s", mode)
	}

	// Set auto vacuum.
	if _, err := db.Exec("PRAGMA auto_vacuum=1"); err != nil {
		log.Fatalf("set auto_vacuum: %v", err)
	}
	if _, err := db.Exec("VACUUM"); err != nil {
		log.Fatalf("vacuum: %v", err)
	}
	row = db.QueryRow("PRAGMA auto_vacuum")
	var auto string
	if err := row.Scan(&auto); err != nil {
		log.Fatalf("query auto vacuum: %v", err)
	}
	if auto != "1" {
		log.Fatalf("auto vacuum not set: %s", auto)
	}
	fmt.Printf("auto_vacuum=%s\n", auto)
}
Note that if I remove the VACUUM call right after PRAGMA auto_vacuum=1, then the program fails, which might be why you were seeing that problem in your testing.
from dqlite.
Also, note that depending on your workload auto_vacuum might or might not be enough, since AFAIU it only reshuffles the order of free pages so that it can possibly truncate the db, but it does not actually re-arrange data or defragment (like VACUUM does), so the amount of space you actually free might be considerably less than with a VACUUM. Again, this is just from reading the docs; I didn't do any actual testing.
from dqlite.
I was just testing with a simple C program using dqlite's vfs. I was mostly watching memory usage and when vfs functions were being called. I tried setting auto_vacuum=1 after journal_mode and noticed it didn't actually change, so I tried it before. Didn't try calling VACUUM after setting auto_vacuum=1. I'll run some tests again doing that.
from dqlite.
I was just testing with simple c program using dqlite's vfs.
Do you mean manually invoking the internal dqlite vfs functions (e.g. vfsInit) to register a VFS and then driving SQLite normally with the regular SQLite C API?
If so, I think it would be simpler to just use the regular SQLite API (perhaps via Go or bash) with its stock on-disk VFS and observe how the sizes of the main db file and of the WAL file vary. That's enough to know how VACUUM interacts with WAL mode, and what you find in terms of file size will directly translate to memory usage with dqlite's vfs.
I suspect that you'll need to perform a checkpoint of the WAL file after issuing a VACUUM or committing a tx with auto_vacuum on, in order for the space to actually be reclaimed.
Didn't try calling vacuum after setting auto_vacuum=1
That's pretty much essential; it won't work otherwise.
from dqlite.
Unfortunately, it looks like manually triggering a checkpoint doesn't reclaim the space in dqlite.
from dqlite.