Comments (19)
@ThomasWaldmann it's true that backward compatibility can eventually turn into an inconvenience. But please keep in mind this is backup software we're talking about. You can't treat it like a UI or some other self-contained software. Its data is designed to last an indeterminate amount of time; otherwise, it is not a backup. Please name one backup software that doesn't treat its backup format with extreme care. All formats I can think of are very long-lived, especially the successful ones: think cpio, pax, tar... there may be format versions, but those are very few and still handled by recent versions. Proprietary backup solutions may show a bit more variability, but even they can read old formats almost without exception.
Breaking compatibility with upstream attic is not a smart move, especially without a very good reason and a data migration path in place. To break it for some silly strings is simply absurd. This issue is called "Discuss Goals", so let's do that. What is the mission of backup software? Is it flexibility? Is it fixing bugs fast? No: it is keeping data safe. Those things are nice, and we all want them, but they are not the software's mission. If backup software fails to keep data accessible, it fails as backup software; it's simply useless.
Besides, if so many changes are required to the backup format, then it was designed badly. Is the attic backup format badly designed? Why? What are its shortcomings, specifically?
Now, I can understand making no promises of data integrity or compatibility for development versions. That's just common sense. BUT development should strive from the start to stick to one and only one format (be it attic's or some variation on it). The format should be versioned, just in case a real need to change it appears in the future, but if it is well designed it should support most features we might want. And if it doesn't, then either it wasn't well designed or we should really think hard about whether such a feature is needed at all. And at a minimum, read-only backward compatibility is a must.
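The versioning-plus-read-only-compatibility idea can be sketched in a few lines: embed a magic string and a format version in the repository header, let newer releases keep reading older versions, and refuse anything unknown. Everything here (magic value, version numbers, function names) is hypothetical, not borg's actual on-disk format:

```python
import struct

MAGIC = b"BORGREPO"          # hypothetical 8-byte repository magic
CURRENT_VERSION = 2          # format version this release writes
READABLE_VERSIONS = {1, 2}   # older versions this release still reads

def write_header(version=CURRENT_VERSION):
    """Serialize the magic followed by a big-endian uint32 version."""
    return MAGIC + struct.pack(">I", version)

def read_header(blob):
    """Validate the magic and return the format version, or raise."""
    if blob[:8] != MAGIC:
        raise ValueError("not a repository (bad magic)")
    (version,) = struct.unpack(">I", blob[8:12])
    if version not in READABLE_VERSIONS:
        raise ValueError(f"unsupported repository version {version}")
    return version
```

The point is that the compatibility promise becomes an explicit set (`READABLE_VERSIONS`) instead of an accident of the code.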
Just my opinion, of course.
from borg.
- Avoid getting into the "compatible forever" trap - we should maybe not assure compatibility of development versions nor spanning major releases.
- When used for long-term archiving, special care might be required.
I think extracting files from an archive should be possible at any time, regardless of the borgbackup version used. At a minimum, the latest borgbackup version should be able to extract archives created by older versions, though not necessarily vice versa.
A converting operation could be the solution (thx, @joolswills)
One thing I would like to see, if/when the repository format changes to offer new features, is the ability to convert existing repositories in place, rather than having to start again.
Agree with your points. Thanks for your efforts.
Converters are an option to think about when going from one release to an (incompatible) newer release.
But, someone would have to write that code and it is a burden and slows down development. Also, converting a large amount of historical archives might be a very time and space consuming affair, thus maybe impractical even if you had a converter.
Thus, I am still proposing just breaking compat now and then. Someone who wants conservative long-term archiving might be better off using tar (or attic), not something that is being heavily developed. And the whole point of this fork is to accelerate development. :)
Also, of course nobody wants to write converters that convert between development snapshots and alpha/beta/rc releases.
agreed with regard to dev, but if a new feature is added, such as a new compression method, it would be nice to be able to do things like recompress existing data. Things like decrypting/encrypting repositories after setup would also be useful. Just ideas anyway!
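The recompress idea amounts to walking every stored chunk, decoding it with the old codec and re-encoding it with the new one; the chunk IDs and content stay identical, only the stored encoding changes. A minimal, hypothetical sketch using stdlib codecs (borg's real chunk format and compression dispatch are more involved):

```python
import zlib
import lzma

# Hypothetical codec tables; a real implementation would dispatch
# on a per-chunk compression tag stored in the repository.
DECODERS = {"zlib": zlib.decompress, "lzma": lzma.decompress}
ENCODERS = {"zlib": zlib.compress, "lzma": lzma.compress}

def recompress_chunk(old_blob, source="zlib", target="lzma"):
    """Decompress a stored chunk with its original codec and
    re-encode it with a new one; the payload bytes are unchanged."""
    raw = DECODERS[source](old_blob)
    return ENCODERS[target](raw)
```

Applied over all segments, this would be an in-place "recompress" operation that never touches the deduplication identity of the data.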
i really like the promise of attic to keep backwards compatibility forever for storage.
who knows when i will need to restore this old backup? do we need to make such changes now anyways?
maybe an alternative here would be to stabilise versions at some point and treat those things as "API changes". so we could have an X.0 release that is backwards-compatible with X-1.0 and forward-compatible with X+1.0, so that you could migrate your repo by upgrading incrementally?
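The incremental X-1.0 to X.0 migration could be modelled as a chain of single-step converters: each release only has to know how to upgrade the previous format, and a full migration just applies the steps in order. A hypothetical sketch (versions, keys and converter bodies are invented for illustration):

```python
# Each entry upgrades an in-memory repo description from version N
# to N+1; chaining them yields the full migration path.
CONVERTERS = {
    1: lambda repo: {**repo, "version": 2, "magic": b"BORG"},
    2: lambda repo: {**repo, "version": 3, "index_kind": "hashtable"},
}

def migrate(repo, target_version):
    """Apply single-step converters until target_version is reached."""
    while repo["version"] < target_version:
        step = CONVERTERS.get(repo["version"])
        if step is None:
            raise RuntimeError(f"no converter from version {repo['version']}")
        repo = step(repo)
    return repo
```

The nice property is that each converter is written once, against one well-known predecessor format, instead of every release having to read every historical format directly.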
@anarcat I thought about it quite a lot. Keeping something compatible forever sounds great, but from development and maintenance standpoint, it is a pain.
For example, look at flash player or windows - they try to be compatible forever (or at least over a very long time) and it results in an accumulation of a lot of crap code (there was a talk by FX at a CCC conference about it that went into quite some detail). They basically rewrote that thing multiple times (for reasons), but always kept the old code as well. Thus you now have not just the bugs in the latest code, but all the remaining bugs in all the old versions, too. Windows still has broken time stdlib functions, because they were broken in the same way on DOS.
I am not saying attic's quality is like flash player's. :) We are lucky that the code base is quite good, but I found some places where it is too hardcoded or where maybe even a layer is missing, blocking the development of a needed feature. Maybe we will find more over time; that remains to be seen.
Your idea of +/- 1 version compatibility is somewhat similar to a converter, right?
BTW, I pushed some code to the repo. It is not compatible with attic due to changes of the magic strings (like ATTIC_KEY, ATTICSEG, ATTICIDX). Other than that, it was mostly the content of the rather conservative "merge" branch + s/attic/borg/g. See CHANGES.txt for details.
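Since only the magic strings differ at this point, a tool (or a future converter) could tell an attic repository from a borg one just by peeking at a segment file's leading bytes. `ATTICSEG` is the magic named above; `BORG_SEG` is assumed here as its renamed counterpart, not confirmed from the source:

```python
# Sketch: classify a segment file by its leading 8-byte magic.
# ATTICSEG comes from the comment above; BORG_SEG is an assumption.
MAGICS = {b"ATTICSEG": "attic", b"BORG_SEG": "borg"}

def detect_segment_flavor(header):
    """Return 'attic', 'borg', or 'unknown' for a segment header."""
    for magic, flavor in MAGICS.items():
        if header.startswith(magic):
            return flavor
    return "unknown"
```

A converter built on this could accept either flavor on read and always write the new magic, which is essentially read-only backward compatibility for free.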
so i understand how hard it can be for windows to keep backwards compat with crap like DOS, or to keep flash player stable. i think that's a different problem from what we are dealing with here: as you said, attic is fairly well designed and implemented, and there are some tweaks we want to make, unless i misunderstand something deeper here.
a good example is changing the ATTIC strings: why would we do that at all, if not to gratuitously break backwards compatibility?
It's a fork. If changes are necessary, now is the time, especially if those changes make borg more resilient to future changes.
Also, the idea of backup software being able to keep files and format "forever" is a little ambitious (for borg or attic). Recently a msgpack bug was discovered: attic is not standalone, its very format depends on third-party libraries. If those evolve or are abandoned, you will need to do something about your repositories.
But I agree read-only backward compatibility is a minimum; the ability to update the format (or change compression level/encryption) would definitely come in handy.
@barsanuphe I didn't say "forever", I said an indeterminate amount of time. Meaning, "long enough". Is the time between borg releases long enough? I don't think so, not by a long shot. Tar files have lasted decades; I don't see a reason not to strive for that kind of quality. Users expect that kind of quality from backup software, not having to re-encode archives at every upgrade.
You say "if changes are necessary", and that's precisely my point. That necessity needs to be spelled out very clearly, for very good reasons, before format changes can be considered. And so far it hasn't been. I agree that now is the right time to discuss those needs. The sooner the format is established, the sooner borg development can begin.
> Converters are an option to think about when going from one release to an (incompatible) newer release.
> But, someone would have to write that code and it is a burden and slows down development.

Maybe, but it must be done.

> Also, converting a large amount of historical archives might be a very time and space consuming affair, thus maybe impractical even if you had a converter.

Maybe, or maybe not. You must provide the conversion functionality and let the user decide. He may choose to prune archives before converting and keep only a small subset of them. Or he may have the time and space and want everything.
Like anarcat and aride, I too believe this fork should improve attic in a sensible way without breaking too many things at once. Sure, nobody knows what bugs will be found in the future. However, I really don't see why changes in 3rd-party libraries should break compatibility with attic, as attic would need to be upgraded as well in such a case.
An example would be the readability of attic's code. Variable names like "t0" and "st" are nice to write but don't make the code very readable and break PEP 0008, as they clearly aren't words:
lowercase with words separated by underscores as necessary to improve readability.
So should we change those variable names to something better to benefit readability, or should we keep them to benefit code compatibility? I would vote for the latter at the moment.
Concerning attic and backward compatibility with older repo formats: I have an idea. It may completely suck... but here it is, FWIW. Flick your patience switch to the ON position, because it involves/requires modularising the code and I need to discuss that first. Bear with me - the description may be long, but the concept will (IMO) result in a quite neat, tidy, more functional and more extensible tool.
So, what I'm thinking is that there seems to be a pretty clear boundary line where the code can be modularised, as described below:
- The `borg_core` module. This is the 'engine room'. It is the only module that actually works on and touches backup repos. Its functionality is `init`, `create`, `extract`, `check`, `delete`, `list`, `prune`, `info`, `change-passphrase` - but these are only internal APIs to the 'core' and not user facing... other modules wrap around `borg_core` to provide filesystem abstraction and user-facing commands as described further below. However, under this modularised approach this 'core' only communicates file content and metadata via filesystem-agnostic data structures with the bare minimum knowledge to carry out the above functions. The data structure is a list of one or more of what I'll call 'file packages' (FPs). FPs are relatively 'future proof' data structures (e.g. leveraging msgpack/protocol buffers/whatever) that contain the content `borg_core` needs to get its job done (e.g. file content, file name, perhaps/probably a file content checksum) and also provide for arbitrary additional content that can be used for filesystem- and/or OS-specific data (e.g. xattrs, ACLs, whatever) that `borg_core` just stores in and retrieves from the repo but doesn't need to use directly or understand in any specific way.
- One or more filesystem-specific (or even OS-specific) interface modules (or perhaps we could call them "filesystem shims"... I dunno). These modules wrap around `borg_core` to provide filesystem-specific behaviour. For the sake of providing a concrete example, consider a module which I will give the name `borg_extfs_shim`, as it is designed to make `borg` work on ext3/4 filesystems:
  - In the case of `init`, it simply passes through to the `init` method of the 'core' module.
  - In the case of `create`, it handles the filesystem scanning (exclude globs/regexps and special fs-specific stuff). The module reads the files to be backed up and packages them into a list/stream of FPs. It streams this list of FPs to `borg_core.create` to be pushed into the repo. Options to the `borg_extfs_shim.create` method can specify how much/little of the metadata (perms, xattrs, ACLs, whatever) gets stored in the metadata portion of each FP and consequently stored in the repo via `borg_core`.
  - In the case of `extract`, the approximate reverse of `create` occurs. In the most simple case `borg_core` spews a list/stream of FPs back to `borg_generic_linuxfs_shim`, which unpacks the file content and metadata and writes the described files to the filesystem using its special-sauce knowledge of (in this case) ext3/4 filesystems. There are more complex cases (e.g. partial extract of only certain files) that I won't go into for the sake of brevity, as this post is already very long.
  - In the case of `check`, it could be as little as passing straight through to `borg_core`'s own `check` method, but more feature-rich code could also be created to do certain checks on the metadata of files in the archive if deemed worthwhile/necessary.
  - In the case of `list`, this module receives from `borg_core.list` a stream/list of FPs' metadata only (no file content). This module then interprets the metadata and pretty-prints to stdout a detailed list of the files, and any metadata deemed relevant, in the specified archive. In other words, it's very much like `extract` but only crunches metadata, not file content.
  - In the case of `delete`, `prune`, `info` and `change-passphrase`, these would be passed straight through to their counterpart methods in `borg_core`.
- The `borg` module. This is the user-facing module - the 'front end'. But there could be others if a new use case warranted it. By default, it automatically determines which filesystem shim module will be used to interact with `borg_core`, but there could be a command-line switch to force a specific shim to be used. For example: in the case of `borg create myrepo::my-archive ~/Documents`, the `borg` module determines that the filesystem being read is ext4, so engages `borg_extfs_shim.create(repo="myrepo", archive="my-archive", source="~/Documents", ...)` etc. Hopefully you get the drift.
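The module boundary described above could be sketched as an abstract interface that exchanges FPs with the core. All class and method names here are hypothetical, invented to illustrate the shim idea, not anything in attic/borg:

```python
from abc import ABC, abstractmethod

class FilePackage:
    """Filesystem-agnostic unit exchanged with the core: a name, the
    file content, and an open-ended metadata mapping (xattrs, ACLs, ...)
    that the core stores verbatim without interpreting."""
    def __init__(self, name, content, metadata=None):
        self.name = name
        self.content = content
        self.metadata = metadata or {}

class FilesystemShim(ABC):
    """One shim per filesystem family; the core only ever sees FPs."""

    @abstractmethod
    def scan(self, path):
        """Yield FilePackages for everything under `path` (the
        fs-specific half of `create`)."""

    @abstractmethod
    def restore(self, packages, target):
        """Write FilePackages back to `target` using fs-specific
        knowledge (the fs-specific half of `extract`)."""
```

A `borg_extfs_shim` would then be one concrete subclass, a FUSE or stdin shim another, and the core never needs to change when a new filesystem is supported.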
This design opens up numerous possibilities which both improve modularisation of the code AND could make the issue of repo backward compatibility a much easier goal to achieve.
Concerning modularisation, consider now a module `borg_fuse_shim`, which only implements `borg mount`. This is more neat/tidy/modularised, no? FUSE mounting is a great feature, but should not really be a part of the 'core' of borg.
Concerning modularisation once more, consider now a module `borg_stdin_shim`, which only implements `borg create` with the specific function of accepting a stream on stdin and presenting it as a single FP to `borg_core`. A nice, neat solution that moves this feature, which is nice but not critical/central, out of the core functionality of borg.
Concerning repo backward compatibility, this new modularised approach brings that goal much closer, for two reasons:
- `borg_core` is now (almost) entirely file metadata-agnostic. This ensures that there will be minimal need, in future, to change the repo data structures concerning the metadata of archived file content. Admittedly, however, it has no impact on backward compatibility of the on-disk repo format. Want to support a new feature of a specific filesystem (e.g. NTFS, xfs, reiser, btrfs, NFS, Amazon S3) in future? No problem! Just write a new shim or expand an existing one, leaving `borg_core` untouched.
- A specific shim can be written to output an entire repo as a stream (e.g. to stdout) in a well-defined format. That format could be as simple as a serialised (msgpacked) `dict` where the key is the archive name and the value is the list of FPs (exactly the list/stream that the shims use to pass data back and forth with `borg_core`). Combine that with a method to read an entire repo as a stream (e.g. from stdin) and you now have a mechanism for upgrading from one repo format to the next (and downgrading, for that matter). This might take the form of the following piped command: `attic_v1.53 streamout my-old-repo | attic_2.01 streamin my-new-repo`. All that is required to achieve this is the necessary disk space and adequate cups of coffee.
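The streamout/streamin mechanism is essentially a full-repo serialisation round-trip: dump every archive's FP list in a self-describing format, pipe it into a newer version that rebuilds the repo. A dependency-free sketch, with json plus base64 standing in for the msgpack encoding suggested above and all names hypothetical:

```python
import base64
import json

def streamout(repo):
    """Serialize a whole repo (archive name -> list of FP dicts) to one
    text blob; file content bytes are base64-wrapped for json."""
    def enc(fp):
        return {**fp, "content": base64.b64encode(fp["content"]).decode()}
    return json.dumps({name: [enc(fp) for fp in fps]
                       for name, fps in repo.items()})

def streamin(blob):
    """Rebuild the repo structure from a streamout() blob."""
    def dec(fp):
        return {**fp, "content": base64.b64decode(fp["content"])}
    return {name: [dec(fp) for fp in fps]
            for name, fps in json.loads(blob).items()}
```

Because the stream format is independent of the on-disk repo layout, an old release's `streamout` piped into a new release's `streamin` is exactly the upgrade path described above.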
Sorry for the enormous post. Hopefully the idea doesn't suck. If it does, sorry for giving my readers eyestrain for no good reason.... ;-)
[i'm hesitant to add more to the wildly ranging conversation here, but it seems that one big issue in the goals of the project is backwards compatibility, so i'll add something about that. maybe a separate issue should be opened to summarize the conversation here and clarify borg's way of dealing with the issue...]
anyways. so i understand where the "fork allows us to change" idea is coming from and i respect that. maybe it's fine to make a break to allow cleaning up bad assumptions in the code. i am worried about:
- gratuitous changes: here i am referring specifically to 159315e - this commit changes the magic strings without a good justification. this seems contrary even to the goals stated in the summary here (namely "Don't break it accidentally / without good reason / without warning")
- eternal upgrade chase: even if we accept some of those changes, at some point those changes need to stop and stabilise. maybe that's what a 1.0 release looks like. but then that means the software can't actually be used reliably in production until then, at which point it is locked. so a little more thought needs to go into how to introduce format changes safely and reliably.
- future-proofing: backup software should be self-contained (for disaster recovery) and able to deal with really old data. data that can't be read directly should be convertible, as a worst-case option, but never lost (this already fails wrt attic because of the above commit, but i guess that's an acceptable compromise if we consider borg a new backup software and not a fork (which it isn't))
I really like jborg's example of how old tar archives from 30 years ago can still be read. tar's specification also has the benefit of fitting within three paragraphs on wikipedia - clearly a different implementation. yet i believe it is a standard any backup software should aspire to. notice how tar has dealt with potentially backwards-incompatible changes...
basically, my position is that attic/borg should not break backwards compatibility, and should support past formats forever. i haven't seen compelling evidence or changes that warrant such a break at this point, and i would like those proposing such changes to show an example, otherwise the conversation will likely continue to go nowhere... i believe that any such change can be made in a backwards-compatible way; the current format is not so bad that it will explode in the future...
since so many discussions were about backwards compatibility here, i thought it was relevant to open an issue specifically about this in #26.
oh, and in PR #25, i actually suggest we document the goals stated in the summary here "as is" (mostly), meaning that i agree with those.
i wonder if a code of conduct or something similar wouldn't be a good idea too... a few ideas:
- http://www.ubuntu.com/about/about-ubuntu/conduct - i like that one...
- https://libreplanet.org/wiki/LibrePlanet:About/Code_of_Conduct/Draft - simpler?
- https://www.ietf.org/rfc/rfc1855.txt (classic - 1995!)
Any progress on this issue? What is outstanding? Can it be closed?
Guess this can be closed.