
Comments (5)

davidbartonau commented on July 16, 2024

Hi James,

Would you be able to look at #124 as it might do what you want?

Regards, David


jamespharaoh commented on July 16, 2024

I'm talking about a new type of backup instruction, one enabling an internal reference in the backup stream to part of an existing chunk. What you've referenced addresses a range of the expanded backup; they are separate concepts.
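Roughly, I mean something along these lines, sketched as a hypothetical C++ struct (the names are made up for illustration; this is not zbackup's actual instruction format):

    #include <cstddef>
    #include <string>

    // Sketch only: a backup instruction that refers to part of an existing chunk.
    // Field names are invented; zbackup's real instructions do not include this.
    struct PartialChunkInstruction
    {
        std::string chunkId;  // identifier of a chunk already in the chunk store
        std::size_t offset;   // byte offset into that chunk's expanded data
        std::size_t length;   // number of bytes to emit starting at offset
    };

    int main()
    {
        // Emit 1024 bytes starting at offset 100 of an already-stored chunk,
        // instead of storing that data again.
        PartialChunkInstruction ref{ "abc123", 100, 1024 };
        return ref.length == 1024 ? 0 : 1;
    }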


mknjc commented on July 16, 2024

Could you please explain how zbackup is able to determine whether the current ring buffer data is part of an already-saved chunk, without a rolling hash for every possible sub-block of a chunk?
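For clarity, this is the kind of rolling hash scheme I have in mind, as a self-contained toy (illustrative only, not zbackup's code). The rolling sum covers one full chunk-sized window, so it can only flag full-chunk candidates, which is why I don't see how a partial chunk could be recognised without weak hashes for every possible sub-block:

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_set>

    // Toy rsync-style rolling checksum over a fixed window (illustrative only).
    struct RollingSum
    {
        std::uint32_t a = 0, b = 0;
        std::size_t window = 0;

        void addByte( std::uint8_t c ) { a += c; b += a; ++window; }

        // Slide the window one byte: drop 'out', take in 'in'.
        void rotate( std::uint8_t out, std::uint8_t in )
        {
            a += in - out;
            b += a - static_cast< std::uint32_t >( window ) * out;
        }

        std::uint64_t digest() const
        { return ( std::uint64_t( b ) << 32 ) | a; }
    };

    int main()
    {
        std::string data( 1 << 16, 'x' );
        std::size_t const chunkSize = 4096;

        // Pretend this holds the weak (rolling) hashes of chunks already stored.
        std::unordered_set< std::uint64_t > weakIndex;

        RollingSum sum;
        for ( std::size_t i = 0; i < chunkSize; ++i )
            sum.addByte( static_cast< std::uint8_t >( data[ i ] ) );

        std::size_t candidates = 0;
        for ( std::size_t pos = 0; pos + chunkSize < data.size(); ++pos )
        {
            if ( weakIndex.count( sum.digest() ) )
                ++candidates; // an expensive strong (SHA) check would go here
            sum.rotate( static_cast< std::uint8_t >( data[ pos ] ),
                        static_cast< std::uint8_t >( data[ pos + chunkSize ] ) );
        }
        std::cout << candidates << " full-window candidates checked\n";
    }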


mknjc commented on July 16, 2024

I see one possible optimization in handleMoreData: zbackup could check whether it already has a partial chunk (emitted from addChunkIfMatched, when there is already data in chunkToSave), instead of only checking whether there are chunks with the matching hash once a full chunk is in the ring buffer. But this might cause problems: if a partial chunk is a prefix of another chunk, zbackup would never check whether the incoming data matches the bigger chunk, because it always deduplicates the prefix. So I don't know whether this increases or decreases the backup size.
This might also be compatible with the current file format.
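A self-contained toy of the check I mean follows (handleMoreData, addChunkIfMatched and chunkToSave are zbackup's real names, but everything below is a simplified illustration with made-up helpers, not its actual code):

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    // Toy chunk index: strong hash of chunk contents -> chunk id. Illustrative
    // only; std::hash stands in for the real strong (SHA) hash.
    struct ToyIndex
    {
        std::unordered_map< std::size_t, int > byHash;

        bool lookup( std::string const & data, int & id ) const
        {
            auto it = byHash.find( std::hash< std::string >{}( data ) );
            if ( it == byHash.end() )
                return false;
            id = it->second;
            return true;
        }
    };

    // The proposed extra check: when a partial chunk has been cut off (e.g. data
    // left in chunkToSave after a match), try to look it up directly instead of
    // waiting for a full chunk to accumulate in the ring buffer.
    bool tryEmitPartial( std::string & chunkToSave, ToyIndex const & index )
    {
        int id;
        if ( !chunkToSave.empty() && index.lookup( chunkToSave, id ) )
        {
            // Caveat from above: if this partial chunk is a prefix of a larger
            // stored chunk, we keep deduplicating the prefix and never test the
            // larger chunk against the incoming data.
            std::cout << "emit reference to existing chunk " << id << "\n";
            chunkToSave.clear();
            return true;
        }
        return false;
    }

    int main()
    {
        ToyIndex index;
        index.byHash[ std::hash< std::string >{}( "partial data" ) ] = 42;

        std::string chunkToSave = "partial data";
        tryEmitPartial( chunkToSave, index ); // emits a reference, clears buffer
    }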


jamespharaoh commented on July 16, 2024

It wouldn't be compatible with the current file format, I'm fairly sure, but yes, this is what I'm suggesting. My main reason for suggesting this optimization in zbackup is that when I first mentioned this to @dragonroot he wasn't keen to have it included in zbackup itself, because zbackup wouldn't make use of it and it would therefore be difficult to test, whereas this optimization certainly would be implementable and testable.

However, I've had experience of zbackup stalling on data (which I later realised contained many, many copies of the same content over and over again), and it now seems clear that this was because it was repeatedly computing SHA sums over large stretches of data, since it kept finding rolling sum matches. If it could match partial chunks it would (a) not store so many duplicated chunks and (b) be able to optimize more sensibly when it performs SHA checks, because it would have a choice of two efficient ways to handle overlapping chunks and would only have to compute an SHA sum roughly once per chunk size in this case.
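To illustrate (b), something like the throttle below (a sketch with made-up names, not anything zbackup does today) would cap the strong-hash work at roughly one SHA computation per chunk-sized span even when the rolling sum matches at almost every offset:

    #include <cstddef>
    #include <iostream>

    // Sketch only: limit expensive strong-hash checks to roughly one per chunk
    // size, even if the cheap rolling sum matches at almost every offset.
    struct ShaThrottle
    {
        std::size_t chunkSize;
        std::size_t lastCheckedAt = 0;
        bool firstCheck = true;

        // Returns true if we should pay for a strong (SHA) hash at this offset.
        bool shouldCheck( std::size_t offset )
        {
            if ( firstCheck || offset - lastCheckedAt >= chunkSize )
            {
                firstCheck = false;
                lastCheckedAt = offset;
                return true;
            }
            return false;
        }
    };

    int main()
    {
        ShaThrottle throttle{ 4096 };
        std::size_t shaCount = 0;

        // Pretend the rolling sum matched at every single offset of a 1 MiB run.
        for ( std::size_t offset = 0; offset < ( 1 << 20 ); ++offset )
            if ( throttle.shouldCheck( offset ) )
                ++shaCount;

        std::cout << shaCount << " SHA computations instead of " << ( 1 << 20 ) << "\n";
    }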

