Comments (18)
I expected it to be a bit slower, because more database queries are done than before, when the whole block list was held in memory. But not that much slower... The good news is that I have a suspect: one index on the blocks table has the wrong column order. I will need to provide a database migration for this. (It would be easy to test with PostgreSQL but SQLite has limitations again and the whole table needs to be recreated.)
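For context on why column order matters: a composite index only accelerates queries that constrain its leading column(s). A minimal sketch with Python's stdlib sqlite3 module (a simplified two-column table, not Benji's actual schema) shows the difference via EXPLAIN QUERY PLAN:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blocks (id INTEGER, version_uid INTEGER)")

# version_uid in the trailing position: a query filtering only on
# version_uid cannot seek this index and falls back to a scan.
con.execute("CREATE INDEX ix_wrong ON blocks (id, version_uid)")
plan_wrong = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM blocks WHERE version_uid = 1"
).fetchall()
con.execute("DROP INDEX ix_wrong")

# version_uid in the leading position: the same query becomes an index search.
con.execute("CREATE INDEX ix_right ON blocks (version_uid, id)")
plan_right = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM blocks WHERE version_uid = 1"
).fetchall()

print(plan_wrong)  # a SCAN (full pass over the table or a covering index)
print(plan_right)  # a SEARCH using ix_right
```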
from benji.
It would be easy to test with PostgreSQL but SQLite has limitations again and the whole table needs to be recreated.
Really sorry, I know writing migrations is the least fun thing to do. I hope Alembic can help a bit; at least I think that for ALTER TABLE it recreates the table automatically if SQLite is in use. I'll keep daily scrubbing of 15% activated for now, so I can test the change under realistic conditions once it's there 😉.
Okay, I barely survived writing the migration ;) Alembic didn't detect the change at all... Please try current master, it contains 226bb56. If the performance of deep scrubbing is still bad after that, you could try increasing _BLOCKS_READ_WORK_PACKAGE in class benji.benji.Benji. But I would think that this only matters for images larger than about 40 GB.
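The ~40 GB figure is presumably the work-package size times the block size. As a back-of-the-envelope check, assuming a 4 MiB block size and a work package of 10,000 blocks (both values are assumptions for illustration here; check benji.benji.Benji for the real constants):

```python
# Both values are assumptions for illustration, not Benji's actual constants.
block_size = 4 * 1024 * 1024       # hypothetical block size: 4 MiB
blocks_read_work_package = 10_000  # hypothetical _BLOCKS_READ_WORK_PACKAGE
threshold_gib = block_size * blocks_read_work_package / 1024**3
print(threshold_gib)  # about 39 GiB, i.e. roughly the quoted 40 GB
```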
database-migrate sadly says:
# benji database-migrate
INFO: $ /opt/benji/bin/benji -c /etc/benji.yaml database-migrate
INFO: Migrating from database schema revision 151248f94062 to 368014edd88c.
ERROR: ValueError: No such constraint: 'pk_blocks'
Dumping the schema in sqlite3 (.schema blocks) shows:
CREATE TABLE blocks (
id INTEGER NOT NULL,
uid_right INTEGER,
uid_left INTEGER,
size INTEGER,
version_uid INTEGER NOT NULL,
valid BOOLEAN NOT NULL,
checksum BLOB,
PRIMARY KEY (id, version_uid),
FOREIGN KEY(version_uid) REFERENCES versions (uid) ON DELETE CASCADE,
CHECK (valid IN (0, 1))
);
CREATE INDEX ix_blocks_checksum ON blocks (checksum);
CREATE INDEX ix_blocks_uid_left ON blocks (uid_left, uid_right);
i.e. the "primary key" is not a named constraint.
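Whether Alembic can drop the primary key by name depends entirely on how the table was declared. A small stdlib-sqlite3 sketch (simplified schema, not Benji's) shows the two DDL variants side by side:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Old-style schema: the PRIMARY KEY clause is anonymous.
con.execute("CREATE TABLE old_blocks (id INTEGER, PRIMARY KEY (id))")
# New-style schema: the constraint carries an explicit name.
con.execute(
    "CREATE TABLE new_blocks (id INTEGER, CONSTRAINT pk_blocks PRIMARY KEY (id))"
)
# sqlite_master stores the literal CREATE TABLE statement.
old_sql = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'old_blocks'"
).fetchone()[0]
new_sql = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'new_blocks'"
).fetchone()[0]
print(old_sql)  # no 'pk_blocks' anywhere: drop_constraint('pk_blocks') has nothing to find
print(new_sql)  # contains 'CONSTRAINT pk_blocks PRIMARY KEY'
```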
(in case I am really the only one still using SQLite3, of course I would be fine with a manual migration as described here: https://stackoverflow.com/a/16901926 but I fear I am not the only one)
That's interesting!
I just modified the code to drop the line:
batch_op.drop_constraint(batch_op.f('pk_blocks'))
(i.e. I just commented it out)
and I only kept the creation of the new primary key.
Here's the result:
# sqlite3 benji.sqlite ".schema blocks"
CREATE TABLE "blocks" (
id INTEGER NOT NULL,
uid_right INTEGER,
uid_left INTEGER,
size INTEGER,
version_uid INTEGER NOT NULL,
valid BOOLEAN NOT NULL,
checksum BLOB,
CONSTRAINT pk_blocks PRIMARY KEY (version_uid, id),
CHECK (valid IN (0, 1)),
FOREIGN KEY(version_uid) REFERENCES versions (uid) ON DELETE CASCADE
);
CREATE INDEX ix_blocks_uid_left ON blocks (uid_left, uid_right);
CREATE INDEX ix_blocks_checksum ON blocks (checksum);
That looks just fine, maybe because Alembic recreates the table anyway, and only one primary key can exist at a time...?
Interestingly, the database size has changed significantly:
# ls -lrtah
-rw-r--r--. 1 root root 555M 1. Apr 18:46 benji.sqlite.back
-rw-r--r--. 1 root root 841M 1. Apr 18:48 benji.sqlite
but that's probably expected with the new index (maybe I should run a VACUUM; at some point).
Does this trick also work with PostgreSQL / MySQL maybe?
I've added code to automatically name all constraints in mid-January. Your database is probably older than that, so the constraint is unnamed. Simply removing the drop_constraint would break PostgreSQL.
I've added code to automatically name all constraints in mid-January. Your database is probably older than that, so the constraint is unnamed.
Ah! Yes, of course, that's an explanation. Sadly, there's probably no way to migrate to that.
Or would:
benji metadata-export -o foo
rm /path/to/mydb
benji database-init
benji metadata-import -i foo
do the trick, since the db is freshly initialized (with constraint names)? Would I lose anything?
Simply removing the drop_constraint would break PostgreSQL.
I understand. It probably neither allows creating a second constraint with the same name, nor does it automatically remove the "old" primary key constraint when a new key with a different name is added.
Then I agree it's better to break here, especially since it won't affect any "younger" SQLite users than me 😉.
Naming the constraint is exactly for migration cases like this and is recommended in the Alembic documentation, which is why I added it.
Actually I've not tested a full export and import of the database. It should work, but I would keep a backup handy...
If you try this please directly compress the export as the JSON is bulky.
If you try this please directly compress the export as the JSON is bulky.
Good point! I started an export and it's really huge. Luckily, a DB backup is easy with SQLite. I guess I will do an export, a dump of the SQLite3 DB, then do the reinitialization and import, do another dump, and compare the dumps. I'm not sure I'll manage today before our backups start; the export to JSON is (expectedly) quite slow.
If I don't finish in time, I'll try tomorrow, but at least I'll see the performance changes already tonight 😉.
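On the compression point: since the export is written to a file, it can at least be stream-compressed in fixed-size chunks without ever holding it in memory. A sketch using only the Python stdlib (the file names are placeholders, not Benji's):

```python
import gzip
import os
import shutil
import tempfile

def compress_file(path: str) -> str:
    """Stream-compress `path` to `path + '.gz'` in fixed-size chunks."""
    out = path + ".gz"
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies chunk by chunk, constant memory
    return out

# Demo on a throwaway file standing in for the JSON export:
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".json")
tmp.write(b'{"versions": []}' * 1000)
tmp.close()
gz_path = compress_file(tmp.name)
print(os.path.getsize(tmp.name), os.path.getsize(gz_path))
```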
But you realize that this would be mostly of academic interest? If the primary key constraint is now named and in the right order, the database should be okay to use. There might be other constraints which are unnamed, too. But this will only pose a (minor) problem if a database migration touches these.
But you realize that this would be mostly of academic interest?
Yes, but thanks for pointing it out 😉.
It will also be interesting to see if the full export actually works. I hard-cancelled it just now partway through, and it left behind a locked version that blocked further metadata exports, so I restored the DB from a backup.
However, it was already using a huge amount of RAM. I'll test again and find out if exporting a 500 MB DB is possible at all on a machine with 16 GB of memory.
If it fails, I'll give up 😉.
There might be other constraints which are unnamed, too. But this will only pose a (minor) problem if a database migration touches these.
If it fails on first attempt, I'll indeed decide to ignore the issue until it pops up again in a future migration.
Another side note: Running:
VACUUM FULL;
ANALYZE;
changes the size of our SQLite3 DB from 841M to 492M, and probably also increases performance. I'll probably make that part of our daily backup script; it only takes 2-3 minutes.
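The shrinking can be reproduced with the stdlib sqlite3 module; deleted rows leave free pages behind until a VACUUM rewrites the file (note that SQLite only accepts plain VACUUM; VACUUM FULL is PostgreSQL syntax):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
con = sqlite3.connect(path)
con.execute("CREATE TABLE blocks (data BLOB)")
con.executemany("INSERT INTO blocks VALUES (?)", [(b"x" * 4096,)] * 1000)
con.commit()
con.execute("DELETE FROM blocks")  # pages are freed, but the file keeps its size
con.commit()
size_before = os.path.getsize(path)
con.execute("VACUUM")              # rewrites the file, dropping the free pages
con.execute("ANALYZE")             # refreshes the query planner's statistics
size_after = os.path.getsize(path)
print(size_before, size_after)
```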
I'll test again and find out if exporting a 500 MB DB is possible at all on a machine with 16 GB of memory.
Indeed it's impossible:
# zgrep checksum /vmbackup/foo.gz | wc -l
927010
# sqlite3 benji.sqlite.back3 "select count(*) from blocks;"
6736896
i.e. 1/7th is done, and Benji is already using 3.4 GB and seems to use linearly more memory over time. So the 16 GB of RAM of the machine are insufficient for metadata-export.
Hence, I'll just run VACUUM FULL and ANALYZE and check how the new primary key performs overnight. If all goes well, I'll reply tomorrow morning 😉.
@olifre It's probably impossible the way it's designed, with one enclosing data structure for all versions. It would need a streaming writer for JSON.
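A streaming writer would emit one version at a time instead of building the whole structure in memory first. A rough sketch of the idea (the `{"versions": [...]}` shape and field names are illustrative, not Benji's actual export format):

```python
import io
import json

def stream_export(versions, fileobj):
    # Emit {"versions": [...]} one element at a time; only a single
    # version dict is materialized in memory at any point.
    fileobj.write('{"versions": [')
    for i, version in enumerate(versions):
        if i:
            fileobj.write(",")
        fileobj.write(json.dumps(version))
    fileobj.write("]}")

# In Benji this could be fed by a lazy database iterator; a generator stands in here.
buf = io.StringIO()
stream_export(({"uid": n, "blocks": []} for n in range(3)), buf)
print(buf.getvalue())
```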
Before the recent changes, deep-scrubbing ran from 06:47:25 to 09:19:07, i.e. less than 3 hours.
With the regression, it took from 07:01:17 to 12:06:50, i.e. more than 5 hours. It seems we also spent 13 minutes more on the actual snapshot backups and purging of old backups, but that's an acceptable loss for many small volumes if large volumes are faster (and I see the backups of larger volumes are indeed a bit faster).
Now, with the fix, the results are:
Deep-scrubbing already starts at 06:16:27, since backups and purging of old backups are much faster (we start those at 5 am).
And deep-scrubbing already finishes at 07:21!
So in total, we saved about 5 hours(!). I cannot fully say this is "only" from the index change, since I also ran VACUUM FULL and ANALYZE on the database. I also did that for Backy2, and there it saved 40 minutes, so this also seems to have a hefty effect (for SQLite3). For PostgreSQL that's probably not needed at all (due to autovacuum), so it would only be a recommendation for SQLite3 users.
I'm a bit astonished, since SQLite3 should "auto-create" temporary indices "on-the-fly" in memory in cases where it helps; but maybe the SQLite3 version shipped with CentOS 7 is too old for that.
So in any case, the performance issue is solved and Benji is faster than it ever was (I think the index change should also help PostgreSQL / MySQL users), and I now understand why the JSON export can't work. So this issue is resolved 😉.
Thanks again!
Yes, it will also help PostgreSQL users; the index was simply wrong. It happened when I rearranged the columns in the blocks table, and it became apparent after I switched from the in-memory block list to the iterator approach. Thanks for the feedback. I'll close this.