
Comments (5)

davidbartonau commented on July 16, 2024

Hi James,

Would you be able to look at #124 as it might do what you want?

Regards, David


jamespharaoh commented on July 16, 2024

I'm talking about a new type of backup instruction, one enabling an internal reference in the backup stream to part of an existing chunk. What you've referenced addresses a range of the expanded backup; they are separate concepts.
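Roughly, I mean something along these lines, sketched as a hypothetical C++ struct (the names are made up for illustration; this is not zbackup's actual instruction format):

    #include <cstddef>
    #include <string>

    // Sketch only: a backup instruction that refers to part of an existing chunk.
    // Field names are invented; zbackup's real instructions do not include this.
    struct PartialChunkInstruction
    {
        std::string chunkId;  // identifier of a chunk already in the chunk store
        std::size_t offset;   // byte offset into that chunk's expanded data
        std::size_t length;   // number of bytes to emit starting at offset
    };

    int main()
    {
        // Emit 1024 bytes starting at offset 100 of an already-stored chunk,
        // instead of storing that data again.
        PartialChunkInstruction ref{ "abc123", 100, 1024 };
        return ref.length == 1024 ? 0 : 1;
    }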


mknjc commented on July 16, 2024

Could you please explain how zbackup is able to determine whether the current ring buffer data is part of an already-saved chunk, without a rolling hash for every possible sub-block of a chunk?
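For clarity, this is the kind of rolling hash scheme I have in mind, as a self-contained toy (illustrative only, not zbackup's code). The rolling sum covers one full chunk-sized window, so it can only flag full-chunk candidates, which is why I don't see how a partial chunk could be recognised without weak hashes for every possible sub-block:

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_set>

    // Toy rsync-style rolling checksum over a fixed window (illustrative only).
    struct RollingSum
    {
        std::uint32_t a = 0, b = 0;
        std::size_t window = 0;

        void addByte( std::uint8_t c ) { a += c; b += a; ++window; }

        // Slide the window one byte: drop 'out', take in 'in'.
        void rotate( std::uint8_t out, std::uint8_t in )
        {
            a += in - out;
            b += a - static_cast< std::uint32_t >( window ) * out;
        }

        std::uint64_t digest() const
        { return ( std::uint64_t( b ) << 32 ) | a; }
    };

    int main()
    {
        std::string data( 1 << 16, 'x' );
        std::size_t const chunkSize = 4096;

        // Pretend this holds the weak (rolling) hashes of chunks already stored.
        std::unordered_set< std::uint64_t > weakIndex;

        RollingSum sum;
        for ( std::size_t i = 0; i < chunkSize; ++i )
            sum.addByte( static_cast< std::uint8_t >( data[ i ] ) );

        std::size_t candidates = 0;
        for ( std::size_t pos = 0; pos + chunkSize < data.size(); ++pos )
        {
            if ( weakIndex.count( sum.digest() ) )
                ++candidates; // an expensive strong (SHA) check would go here
            sum.rotate( static_cast< std::uint8_t >( data[ pos ] ),
                        static_cast< std::uint8_t >( data[ pos + chunkSize ] ) );
        }
        std::cout << candidates << " full-window candidates checked\n";
    }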


mknjc commented on July 16, 2024

I see one possible optimization in handleMoreData: zbackup could check whether it already has a partial chunk (emitted from addChunkIfMatched, when there is already data in chunkToSave), instead of only checking whether there are chunks with the matching hash once a full chunk is in the ring buffer. But this might cause problems: if a partial chunk is a prefix of another chunk, zbackup would never check whether the incoming data matches the bigger chunk, because it always deduplicates the prefix. So I don't know whether this increases or decreases the backup size.
This might also be compatible with the current file format.
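A self-contained toy of the check I mean follows (handleMoreData, addChunkIfMatched and chunkToSave are zbackup's real names, but everything below is a simplified illustration with made-up helpers, not its actual code):

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // Toy chunk index: strong hash of chunk contents -> chunk id. Illustrative
    // only; std::hash stands in for the real strong (SHA) hash.
    struct ToyIndex
    {
        std::unordered_map< std::size_t, int > byHash;

        bool lookup( std::string const & data, int & id ) const
        {
            auto it = byHash.find( std::hash< std::string >{}( data ) );
            if ( it == byHash.end() )
                return false;
            id = it->second;
            return true;
        }
    };

    // The proposed extra check: when a partial chunk has been cut off (e.g. data
    // left in chunkToSave after a match), try to look it up directly instead of
    // waiting for a full chunk to accumulate in the ring buffer.
    bool tryEmitPartial( std::string & chunkToSave, ToyIndex const & index )
    {
        int id;
        if ( !chunkToSave.empty() && index.lookup( chunkToSave, id ) )
        {
            // Caveat from above: if this partial chunk is a prefix of a larger
            // stored chunk, we keep deduplicating the prefix and never test the
            // larger chunk against the incoming data.
            std::cout << "emit reference to existing chunk " << id << "\n";
            chunkToSave.clear();
            return true;
        }
        return false;
    }

    int main()
    {
        ToyIndex index;
        index.byHash[ std::hash< std::string >{}( "partial data" ) ] = 42;

        std::string chunkToSave = "partial data";
        tryEmitPartial( chunkToSave, index ); // emits a reference, clears buffer
    }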


jamespharaoh commented on July 16, 2024

It wouldn't be compatible with the current file format, I'm fairly sure, but yes, this is what I'm suggesting. My main reason for suggesting this optimization in zbackup is that when I first mentioned this to @dragonroot he wasn't keen to have it included in zbackup itself, because zbackup wouldn't make use of it and it would therefore be difficult to test, whereas this optimization certainly would be implementable and testable.

However, I've had experience of zbackup stalling on data (which I later realised contained many, many copies of the same content over and over again), and it now seems clear that this was because it was repeatedly computing SHA sums over large stretches of data, since it kept finding rolling sum matches. If it could match partial chunks it would (a) not store so many duplicated chunks and (b) be able to optimize more sensibly when it performs SHA checks, because it would have a choice of two efficient ways to handle overlapping chunks and would only have to compute an SHA sum roughly once per chunk size in this case.
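To illustrate (b), something like the throttle below (a sketch with made-up names, not anything zbackup does today) would cap the strong-hash work at roughly one SHA computation per chunk-sized span even when the rolling sum matches at almost every offset:

    #include <cstddef>
    #include <iostream>

    // Sketch only: limit expensive strong-hash checks to roughly one per chunk
    // size, even if the cheap rolling sum matches at almost every offset.
    struct ShaThrottle
    {
        std::size_t chunkSize;
        std::size_t lastCheckedAt = 0;
        bool firstCheck = true;

        // Returns true if we should pay for a strong (SHA) hash at this offset.
        bool shouldCheck( std::size_t offset )
        {
            if ( firstCheck || offset - lastCheckedAt >= chunkSize )
            {
                firstCheck = false;
                lastCheckedAt = offset;
                return true;
            }
            return false;
        }
    };

    int main()
    {
        ShaThrottle throttle{ 4096 };
        std::size_t shaCount = 0;

        // Pretend the rolling sum matched at every single offset of a 1 MiB run.
        for ( std::size_t offset = 0; offset < ( 1 << 20 ); ++offset )
            if ( throttle.shouldCheck( offset ) )
                ++shaCount;

        std::cout << shaCount << " SHA computations instead of " << ( 1 << 20 ) << "\n";
    }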

