
Comments (11)

lfam commented on August 15, 2024

I use gzip --rsyncable. Do you know any other deduplicatable compression schemes?

from borg.

MarkusTeufelberger commented on August 15, 2024

IMHO a backup should restore bit-identical data 100% of the time, no matter what data you throw at it. That is probably difficult to achieve with compression algorithms that might have slight inconsistencies between versions (e.g. embedded timestamps?).

I'm also not sure that simple deduplication would compress data better than dedicated compression algorithms, so unless a lot of duplicated data is compressed in many separate archives, decompressing before deduplication might not be very helpful. In that case a global compression step (create an archive of all input files and store this) would help as well.
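
To illustrate the embedded-timestamp concern, here is a minimal sketch using Python's standard gzip module (the `mtime` keyword needs Python 3.8 or later). The payload and the two timestamp values are made up for the example; it only shows that the same data can produce different compressed bytes, not how borg would handle it.

```python
import gzip

# Made-up payload standing in for a file that gets compressed twice.
payload = b"example file contents\n" * 1000

# Same data and same compression level; only the gzip header timestamp differs.
first = gzip.compress(payload, mtime=1)
second = gzip.compress(payload, mtime=2)

print(first == second)                                    # False: the compressed bytes differ
print(gzip.decompress(first) == gzip.decompress(second))  # True: the payload is identical
```

So a tool that stores only the decompressed payload and recompresses on restore would have to record such header fields (and the exact compressor settings) to get the original file back bit for bit.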


RonnyPfannschmidt commented on August 15, 2024

Archive deduplication should only happen if the tool can restore the archives perfectly.

I suspect zip files will be impossible, but various others (tarball streams) may fit very well.


ThomasWaldmann commented on August 15, 2024

I see issues with reproducing bit-identical data with that method also, so maybe it's better to use some compressor with a compression method optimized for deduplication (see --rsyncable).


RonnyPfannschmidt commented on August 15, 2024

I think it would be acceptable as an opt-in for stream-compressed formats like tar overlaid with bzip/gzip/lzma.

I care about bit-identical content of the uncompressed data.


ThomasWaldmann commented on August 15, 2024

maybe a similar effect can be had with no effort by using deduplication-friendly archive formats, where not the complete compressed stream changes if there is one little change at the beginning of the uncompressed data, but just one or a few blocks.
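
A rough sketch of that idea, assuming a toy format that compresses fixed-size blocks independently and a toy store that keeps one copy per chunk hash. `BLOCK_SIZE`, `compress_blocks` and `new_chunks` are hypothetical names for this example and are not part of borg or of any real archive format.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def compress_blocks(data):
    """Compress each fixed-size block independently of its neighbours."""
    return [zlib.compress(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def new_chunks(store, chunks):
    """Add chunks to a toy deduplicating store; return how many were new."""
    added = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        if digest not in store:
            store.add(digest)
            added += 1
    return added


old = b"".join(f"line {i}\n".encode() for i in range(20000))
new = b"L" + old[1:]  # one small change at the very beginning, same length

store = set()
first_backup = new_chunks(store, compress_blocks(old))
second_backup = new_chunks(store, compress_blocks(new))
print(first_backup, second_backup)  # the second backup adds only 1 new chunk
```

Only the block containing the change has to be stored again. The sketch keeps the data length unchanged on purpose: an insertion would shift every fixed-size block boundary, so a real format would need boundaries that depend on content. The --rsyncable option mentioned in this thread aims at a similar effect for gzip output.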


ThomasWaldmann commented on August 15, 2024

No, but that would be an interesting topic to research.


ThomasWaldmann commented on August 15, 2024

@MarkusTeufelberger the use case JS had in mind is archiving NixOS source packages. Over time, there can be a lot of duplication between historical versions of the same package's contents. But because some parts of the content change, the package as a whole may not be deduplicatable, at least not if a "streaming compression" of everything is used.


oderwat commented on August 15, 2024

I think this should generally be avoided, but I could imagine allowing "data_unpack / data_pack" scripts for single files, which the user has to define and which are therefore probably not fully transparent. Like this:

You store a folder /var/xxx-files/ as /var/backup/mytar.tgz and borg gets a file which says that this tgz file has to be fed to a script which returns a "temporary path" (or error) for create and extract (mount?).

That script could be "un-pack-tgz" and return /tmp/unpack/file/ as the path, which borg then uses to back up this file. Modes could be "unpack", "cleanup", "pack" ... and these could be just some simple shell scripts the community provides.

BTW.. to store "tgz" this could simply gunzip/gzip the tar file.
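
A minimal sketch of what such "unpack" / "pack" modes could look like for the gunzip/gzip case, with a round-trip check so the hook is only used when it reproduces the original file exactly. The function names and the fixed gzip settings are assumptions for illustration; borg has no such hook interface.

```python
import gzip
import io
import tarfile


def unpack(tgz_bytes):
    """Hypothetical 'unpack' mode: strip the gzip layer, keep the plain tar stream."""
    return gzip.decompress(tgz_bytes)


def pack(tar_bytes):
    """Hypothetical 'pack' mode: re-gzip with fixed, assumed settings."""
    return gzip.compress(tar_bytes, compresslevel=9, mtime=0)


def is_reproducible(tgz_bytes):
    """True only if unpack followed by pack yields the original bytes again."""
    return pack(unpack(tgz_bytes)) == tgz_bytes


# Build a small tar stream in memory to stand in for mytar.tgz's contents.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    content = b"hello borg\n"
    info = tarfile.TarInfo(name="xxx-files/hello.txt")
    info.size = len(content)
    tar.addfile(info, io.BytesIO(content))
tar_bytes = buf.getvalue()

matching = gzip.compress(tar_bytes, compresslevel=9, mtime=0)     # same settings as pack()
foreign = gzip.compress(tar_bytes, compresslevel=1, mtime=12345)  # different settings

print(is_reproducible(matching))  # True: safe to store the unpacked tar
print(is_reproducible(foreign))   # False: store the .tgz as-is instead
```

When the check fails, the backup would fall back to storing the compressed file unchanged, which keeps the bit-identical guarantee discussed above.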


gmatht commented on August 15, 2024

Decompression would allow e.g. borg.tgz and borg/ to be deduplicated. Not trivial, so probably not a priority at this point, but zsync has achieved an even more impressive goal: rsyncing non-rsyncable gzips, so it is definitely possible.


ThomasWaldmann commented on August 15, 2024

Considering the complexity of this, the concerns about bit-identical reproduction, and the lack of any actual work or progress on this for over 2 years, I am closing this.

