Comments (18)

elemental-lf commented on July 30, 2024

I expected it to be a bit slower because more database queries are done now than before, when the whole block list was held completely in memory. But not that much slower... The good news is that I have a suspect: one index on the blocks table has its columns in the wrong order. I will need to provide a database migration for this. (It would be easy to test with PostgreSQL but SQLite has limitations again and the whole table needs to be recreated.)
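To illustrate why the column order matters, here is a toy sqlite3 demonstration with made-up table definitions, not Benji's actual schema or queries: a lookup by version_uid can only use an index whose leading column is version_uid.

# Toy demonstration (not Benji code): effect of the primary key column order
# on lookups by version_uid.
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE blocks_old (id INTEGER, version_uid INTEGER, '
            'PRIMARY KEY (id, version_uid))')
con.execute('CREATE TABLE blocks_new (id INTEGER, version_uid INTEGER, '
            'PRIMARY KEY (version_uid, id))')
for table in ('blocks_old', 'blocks_new'):
    plan = con.execute(
        f'EXPLAIN QUERY PLAN SELECT * FROM {table} WHERE version_uid = 1').fetchall()
    print(table, plan)
# blocks_old ends up with a full table scan, blocks_new with a search on the
# automatic primary-key index (version_uid=?).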

olifre commented on July 30, 2024

It would be easy to test with PostgreSQL but SQLite has limitations again and the whole table needs to be recreated.

Really sorry, I know writing migrations is the least fun thing to do. I hope Alembic can help a bit; at least I think that for ALTER TABLE it also recreates the table automatically if SQLite is in use. I'll keep daily scrubbing of 15% activated for now so I can test the change in realistic conditions once it's there 😉.

elemental-lf commented on July 30, 2024

Okay, I barely survived writing the migration ;) Alembic didn't detect the change at all... Please try current master; it contains 226bb56. If the performance of deep scrubbing is still bad after that, you could try increasing _BLOCKS_READ_WORK_PACKAGE in the class benji.benji.Benji. But I would think that this only matters for images larger than about 40 GB.
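For reference, a batch migration of this kind looks roughly like the sketch below; this is not the verbatim content of 226bb56, and the revision identifiers are placeholders. Alembic's batch mode copies the data into a new table behind the scenes, which is the only way SQLite can change a primary key.

# Rough sketch of such a migration, not the actual content of 226bb56.
from alembic import op

revision = 'xxxxxxxxxxxx'       # placeholder revision identifiers
down_revision = 'yyyyyyyyyyyy'

def upgrade():
    # Batch mode recreates the blocks table with the changed primary key,
    # because SQLite cannot alter constraints in place.
    with op.batch_alter_table('blocks') as batch_op:
        batch_op.drop_constraint(batch_op.f('pk_blocks'))
        batch_op.create_primary_key(batch_op.f('pk_blocks'), ['version_uid', 'id'])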

olifre commented on July 30, 2024

database-migrate sadly says:

# benji database-migrate
    INFO: $ /opt/benji/bin/benji -c /etc/benji.yaml database-migrate
    INFO: Migrating from database schema revision 151248f94062 to 368014edd88c.
   ERROR: ValueError: No such constraint: 'pk_blocks'

Dumping the schema in sqlite3 (.schema blocks) shows:

CREATE TABLE blocks (
        id INTEGER NOT NULL, 
        uid_right INTEGER, 
        uid_left INTEGER, 
        size INTEGER, 
        version_uid INTEGER NOT NULL, 
        valid BOOLEAN NOT NULL, 
        checksum BLOB, 
        PRIMARY KEY (id, version_uid), 
        FOREIGN KEY(version_uid) REFERENCES versions (uid) ON DELETE CASCADE, 
        CHECK (valid IN (0, 1))
);
CREATE INDEX ix_blocks_checksum ON blocks (checksum);
CREATE INDEX ix_blocks_uid_left ON blocks (uid_left, uid_right);

i.e. the "primary key" is not a named constraint.

olifre commented on July 30, 2024

(In case I am really the only one still using SQLite3, I would of course be fine with a manual migration as described here: https://stackoverflow.com/a/16901926, but I fear I am not the only one.)
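For the record, the recreate-and-copy approach from that answer boils down to the following toy sketch (deliberately reduced to two columns and run in memory; the real blocks table has more columns and indices):

# Toy, in-memory illustration of the manual recreate-and-copy approach;
# deliberately simplified, not the real blocks schema.
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE blocks (id INTEGER NOT NULL, version_uid INTEGER NOT NULL,
                         PRIMARY KEY (id, version_uid));
    INSERT INTO blocks VALUES (1, 10), (2, 10);
    -- recreate the table with the desired named primary key, then copy the rows
    ALTER TABLE blocks RENAME TO blocks_old;
    CREATE TABLE blocks (id INTEGER NOT NULL, version_uid INTEGER NOT NULL,
                         CONSTRAINT pk_blocks PRIMARY KEY (version_uid, id));
    INSERT INTO blocks SELECT id, version_uid FROM blocks_old;
    DROP TABLE blocks_old;
""")
print(con.execute('SELECT * FROM blocks').fetchall())   # [(1, 10), (2, 10)]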

olifre commented on July 30, 2024

That's interesting!
I just modified the code to drop the line
batch_op.drop_constraint(batch_op.f('pk_blocks'))
(i.e. I commented it out) and kept only the creation of the new primary key.

Here's the result:

# sqlite3  benji.sqlite ".schema blocks"
CREATE TABLE "blocks" (
        id INTEGER NOT NULL, 
        uid_right INTEGER, 
        uid_left INTEGER, 
        size INTEGER, 
        version_uid INTEGER NOT NULL, 
        valid BOOLEAN NOT NULL, 
        checksum BLOB, 
        CONSTRAINT pk_blocks PRIMARY KEY (version_uid, id), 
        CHECK (valid IN (0, 1)), 
        FOREIGN KEY(version_uid) REFERENCES versions (uid) ON DELETE CASCADE
);
CREATE INDEX ix_blocks_uid_left ON blocks (uid_left, uid_right);
CREATE INDEX ix_blocks_checksum ON blocks (checksum);

That looks just fine, maybe because Alembic recreates the table anyway and a table can only have one primary key at a time...?

Interestingly, the database size has changed significantly:

# ls -lrtah
-rw-r--r--. 1 root root 555M  1. Apr 18:46 benji.sqlite.back
-rw-r--r--. 1 root root 841M  1. Apr 18:48 benji.sqlite

but that's probably expected with the new index (maybe I should do a VACUUM; at some point).

Does this trick also work with PostgreSQL / MySQL maybe?

elemental-lf commented on July 30, 2024

I've added code to automatically name all constraints in mid January. Your database is probably older than that and so the constraint is unnamed. Simply removing the drop_constraint would break PostgreSQL.
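For context, the naming convention recommended in the Alembic documentation is attached to the SQLAlchemy MetaData; the sketch below is generic and not necessarily Benji's exact setup.

# Generic sketch of constraint naming via SQLAlchemy's naming_convention,
# as recommended in the Alembic documentation; not necessarily Benji's code.
from sqlalchemy import MetaData

convention = {
    'ix': 'ix_%(column_0_label)s',
    'uq': 'uq_%(table_name)s_%(column_0_name)s',
    'ck': 'ck_%(table_name)s_%(constraint_name)s',
    'fk': 'fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s',
    'pk': 'pk_%(table_name)s',
}
metadata = MetaData(naming_convention=convention)
# Tables declared against this metadata get, for example, a primary key named
# pk_blocks, which later migrations can drop or recreate by name on any backend.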

olifre commented on July 30, 2024

I've added code to automatically name all constraints in mid January. Your database is probably older than that and so the constraint is unnamed.

Ah! Yes, of course, that explains it. Sadly, there's probably no way to migrate to that.
Or would:

benji metadata-export -o foo
rm /path/to/mydb
benji database-init
benji metadata-import -i foo

do the trick, since the db is freshly initialized (with constraint names)? Would I lose anything?

Simply removing the drop_constraint would break PostgreSQL.

I understand. It probably neither allows creating a second constraint with the same name nor automatically removes the "old" primary key constraint when a new key with a different name is added.

Then I agree it's better to break here, especially since it won't affect any SQLite users "younger" than me 😉.

elemental-lf commented on July 30, 2024

Naming the constraints is exactly for migration cases like this and is recommended in the Alembic documentation, which is why I added it.
Actually, I've not tested a full export and import of the database. It should work, but I would keep a backup handy...

elemental-lf commented on July 30, 2024

If you try this, please compress the export directly, as the JSON is bulky.

olifre commented on July 30, 2024

If you try this, please compress the export directly, as the JSON is bulky.

Good point! I started an export and it's really huge. Luckily, a DB backup is easy with SQLite. I guess I will do an export, a dump of the SQLite3 DB, then do the reinitialization and import, do another dump and compare the dumps. I'm not sure if I'll manage today before our backups start; the export to JSON is (expectedly) quite slow.
If I don't finish in time, I'll try tomorrow, but at least I'll see the performance changes tonight 😉.

elemental-lf commented on July 30, 2024

But you realize that this would be mostly of academic interest? If the primary key constraint is now named and in the right order, the database should be okay to use. There might be other constraints which are unnamed, too. But this will only pose a (minor) problem if a database migration should touch them.

olifre commented on July 30, 2024

But you realize that this would be mostly of academic interest?

Yes, but thanks for pointing it out 😉.
It will also be interesting to see if the full export actually works. I cancelled it hard just now partway through, and it left behind a locked version, blocking further metadata exports, so I restored the DB from a backup.
However, it was already using a huge amount of RAM. I'll test again and find out if exporting a 500 MB DB is possible at all on a machine with 16 GB of memory.
If it fails, I'll give up 😉.

There might be other constraints which are unnamed, too. But this will only pose a (minor) problem if a database migration should touch them.

If it fails on the first attempt, I'll indeed ignore the issue until it pops up again in a future migration.

olifre commented on July 30, 2024

Another side-note: Running:

VACUUM;
ANALYZE;

changes the size of our SQLite3 DB from 841M to 492M, and probably also increases performance. I'll probably make that part of our daily backup script; it takes just 2-3 minutes.
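A minimal sketch of such a maintenance step could look like this (the database path is a placeholder; note that SQLite only supports plain VACUUM, the FULL variant is PostgreSQL syntax):

# Minimal sketch of a nightly maintenance step for the SQLite metadata DB;
# the path is a placeholder. VACUUM must run outside a transaction, hence
# isolation_level=None (autocommit).
import sqlite3

con = sqlite3.connect('/path/to/benji.sqlite', isolation_level=None)
con.execute('VACUUM')
con.execute('ANALYZE')
con.close()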

olifre commented on July 30, 2024

I'll test again and find out if exporting a 500 MB DB is possible at all on a machine with 16 GB of memory.

Indeed it's impossible:

# zgrep checksum /vmbackup/foo.gz | wc -l
927010
# sqlite3 benji.sqlite.back3 "select count(*) from blocks;"
6736896

i.e. 1/7th is done, and Benji is using 3.4 GB and seems to use linearly more memory over time. So the 16 GB of RAM of the machine are insufficient for metadata-export.

Hence, I'll just VACUUM and ANALYZE and check how the new primary key performs overnight. If all goes well, I'll reply tomorrow morning 😉.

elemental-lf commented on July 30, 2024

@olifre it's probably impossible the way it's currently designed, with one enclosing data structure for all versions. It would need a streaming writer for JSON.
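Purely as an illustration of what a streaming writer would mean (the helper name iter_versions is made up, not a Benji API):

# Illustration only: write one version at a time instead of building one big
# structure in memory. iter_versions is a made-up callable, not a Benji API.
import json

def export_metadata(fh, iter_versions):
    fh.write('{"versions": [\n')
    first = True
    for version in iter_versions():  # yields one JSON-serializable dict at a time
        if not first:
            fh.write(',\n')
        json.dump(version, fh)
        first = False
    fh.write('\n]}\n')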

olifre commented on July 30, 2024

Before the improvements, this ran from 06:47:25 to 09:19:07, i.e. less than 3 hours.
Afterwards, it took from 07:01:17 to 12:06:50, i.e. more than 5 hours. It seems we also spent 13 minutes more on the actual snapshot backups and purging of old backups, but that's an acceptable loss across the many small volumes if the large volumes are faster (and I see the backups of larger volumes are indeed a bit faster).

With the index change in place, the results are:
Deep-scrubbing already starts at 06:16:27, since the backups and purging of old backups are much faster (we start those at 5am).
And deep-scrubbing already finishes at 07:21!

So in total, we saved about 5 hours(!). I cannot fully say this is "only" from the index change, since I also ran VACUUM and ANALYZE on the database. I also did that for Backy2, and there it saved 40 minutes, so this also seems to have a hefty effect (for SQLite3). For PostgreSQL it's probably not needed at all (due to autovacuum), so that would only be a recommendation for SQLite3 users.

I'm a bit astonished, since SQLite3 should "auto-create" temporary indices on the fly in memory in cases where it helps, but maybe the SQLite3 version shipped with CentOS 7 is too old for that.

So in any case, the performance issue is solved and Benji is faster than it ever was (I think the index change should also help PostgreSQL / MySQL users), and I understand why the JSON export can't work, so this issue can be closed from my side 😉.

Thanks again!

elemental-lf commented on July 30, 2024

Yes, it will also help PostgreSQL users; the index was simply wrong. It happened when I rearranged the columns in the blocks table, and it became apparent after I switched from the in-memory block list to the iterator approach. Thanks for the feedback. I'll close this.
