Comments (11)
I use gzip --rsyncable. Do you know of any other deduplication-friendly compression schemes?
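To illustrate why a mode like --rsyncable matters, here is a small sketch (plain Python, with zlib standing in for gzip's DEFLATE, and naive fixed-size chunking standing in for a deduplicator's chunker): a one-byte change near the start of a file leaves almost all uncompressed chunks shared, but causes the plain compressed streams to diverge from that point on.

```python
import hashlib
import zlib

def chunk_hashes(data, size=4096):
    """Hash fixed-size chunks, the way a naive deduplicator might."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

# Two versions of a file that differ in a single byte near the start.
old = b"".join(b"log line %010d\n" % i for i in range(50_000))
new = b"X" + old[1:]

# Uncompressed, almost every chunk is shared between the two versions ...
plain_shared = len(chunk_hashes(old) & chunk_hashes(new))

# ... but after plain DEFLATE (gzip without --rsyncable), the compressed
# streams diverge from the changed byte onward, so chunks no longer match.
comp_shared = len(chunk_hashes(zlib.compress(old)) &
                  chunk_hashes(zlib.compress(new)))

print(plain_shared, comp_shared)
```

--rsyncable mitigates this by periodically resetting the compressor state at content-defined points, so a local change only perturbs a bounded region of the compressed stream.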
from borg.
Imho a backup tool should restore bit-identical copies of the data you throw at it, 100% of the time, no matter what that data is. This is probably difficult to achieve with compression algorithms that might have slight inconsistencies (e.g. embedded timestamps?) between versions.
I'm also not sure that simple deduplication compresses data better than dedicated compression algorithms, so unless a lot of duplicated data is compressed into many separate archives, decompressing before deduplication might not help much. In that case a global compression step (create an archive of all input files and store that) would also help.
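The bit-identical concern is easy to demonstrate (a small Python sketch, zlib standing in for the gzip family): the same input has many valid compressed representations, so a tool that decompresses on backup and recompresses on restore cannot promise to reproduce the original archive bytes.

```python
import zlib

data = b"the same payload, compressed twice with different settings\n" * 100

fast = zlib.compress(data, level=1)
best = zlib.compress(data, level=9)

# Both streams decompress to exactly the original payload ...
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data

# ... yet the compressed bytes themselves differ (even the stream header
# encodes the compression level), so recompressing on restore would not
# reproduce the original archive bit for bit.
print(fast == best)  # False
```

And that is only within one zlib version at two settings; different compressor versions, or formats like gzip that embed a timestamp in the header, add further sources of divergence.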
archive deduplication should only happen if the tool can perfectly restore the archives
i suspect zip files will be impossible, but various other formats may fit very well (tarball streams)
I see issues with reproducing bit-identical data with that method too, so maybe it's better to use a compressor with a compression method optimized for deduplication (see --rsyncable).
i think it would be acceptable as an opt-in for stream-compressed formats like tar overlaid with bzip2/gzip/lzma
i care about bit-identical content of the uncompressed data.
maybe a similar effect could be had with no effort by using deduplication-friendly archive formats, where a small change at the beginning of the uncompressed data does not change the complete compressed stream, but only one or a few blocks.
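A sketch of that block-wise idea (hypothetical, not an existing borg feature; formats like bgzip or squashfs work this way): compress fixed-size blocks independently, so a change in one block leaves every other compressed block byte-identical and therefore deduplicatable.

```python
import zlib

BLOCK = 64 * 1024

def compress_blockwise(data):
    """Compress each fixed-size block on its own, so a local change
    in the input only affects the one compressed block containing it."""
    return [zlib.compress(data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)]

old = b"".join(b"record %08d\n" % i for i in range(100_000))
new = old[:5] + b"X" + old[6:]          # one changed byte near the start

changed = sum(a != b for a, b in zip(compress_blockwise(old),
                                     compress_blockwise(new)))
print(changed)  # 1 -- only the block containing the change differs
```

The trade-off is a somewhat worse compression ratio, since each block starts with an empty dictionary.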
No, but that would be an interesting topic to research.
@MarkusTeufelberger the use case JS had in mind is archiving NixOS source packages. Over time there can be a lot of duplication between historical versions of the same package's contents (but since some parts of the content change, the package as a whole may not deduplicate - at least not if a "streaming compression" of everything is used).
I think this should generally be avoided, but I could imagine allowing the user to define "data_unpack / data_pack" scripts for single files, which the user has to provide and which are therefore probably not fully transparent. Like this:
You store a folder /var/xxx-files/
as /var/backup/mytar.tgz
and borg gets a file which says that this tgz file has to be fed to a script which returns a "temporary path" (or an error) for create and extract (mount?).
That script could be "un-pack-tgz" and would return /tmp/unpack/file/ as the path which borg then uses to back up this file. Modes could be "unpack", "cleanup", "pack" ... and could just be simple shell scripts the community provides.
BTW, to store a "tgz" this could simply gunzip/gzip the tar file.
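The tgz case of that hook idea could be sketched like this (hypothetical function names and modes, not an existing borg interface): "unpack" gunzips the archive so borg sees the raw tar, which deduplicates far better than the gzipped stream; "pack" re-gzips on restore.

```python
import gzip
import tempfile
from pathlib import Path

def unpack_tgz(tgz_path):
    """Hypothetical "unpack" hook: gunzip a .tgz to a temporary path and
    return that path, so the backup tool archives the raw tar stream."""
    tmp = Path(tempfile.mkdtemp(prefix="unpack-"))
    tar_path = tmp / (Path(tgz_path).stem + ".tar")
    tar_path.write_bytes(gzip.decompress(Path(tgz_path).read_bytes()))
    return tar_path

def pack_tgz(tar_path, tgz_path):
    """Hypothetical "pack" hook for restore: re-gzip the tar. Note the
    result is generally NOT bit-identical to the original .tgz (gzip
    embeds a timestamp, and its output depends on version and level)."""
    # mtime=0 at least keeps this script's own output reproducible
    Path(tgz_path).write_bytes(
        gzip.compress(Path(tar_path).read_bytes(), mtime=0))
```

This restores the uncompressed tar content exactly, but it illustrates the transparency problem discussed above: the repacked .tgz bytes may differ from the original file.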
Decompression would allow e.g. borg.tgz and borg/ to be deduplicated. Not trivial, so probably not a priority at this point, but zsync has achieved an even more impressive goal: rsyncing non-rsyncable gzips, so it is definitely possible.
considering the complexity of this, the concerns about / potential issues with bit-identical reproduction, and the lack of actual work / progress on this for over 2 years, i am closing this.