Comments (19)

aride commented on July 17, 2024

@ThomasWaldmann it's true that backward compatibility can eventually turn into an inconvenience. But please keep in mind this is backup software we're talking about. You can't treat it like some UI or other self-contained software. Its data is designed to last an indeterminate amount of time; otherwise, it is not a backup. Please name one backup software that doesn't treat its backup format with extreme care. All formats I can think of are very long-lived, especially those that have been successful: think cpio, pax, tar... there may be format versions, but those are very few and still handled by recent versions. Proprietary backup solutions may show a bit more variability, but they can still read old formats almost without exception.

Breaking compatibility with upstream attic is not a smart move, especially without a very good reason and a data migration path in place. To break it for some silly strings is simply absurd. This issue is called "Discuss Goals", so let's do that. What is the mission of backup software? Is it to have flexibility? Is it to fix bugs fast? No, it is to keep data safe. Those other things are nice, and we all want them, but they are not the software's mission. If backup software fails to keep data accessible, it fails as backup software; it's simply useless.

Besides, if so many changes are required to the backup format, then it is badly designed. Is the attic backup format badly designed? Why? What are its shortcomings, specifically?

Now, I can understand making no promises of data integrity or compatibility for development versions. That's just common sense. BUT development should strive from the start to stick to one and only one format (be it attic's or some variation on it). The format should be versioned, just in case a real need to change it appears in the future, but if it is well designed it should support most features we might want. And if it doesn't, then either it wasn't well designed or we should think hard about whether such a feature is really needed. And at a minimum, read-only backward compatibility is a must.

Just my opinion, of course.

silvio commented on July 17, 2024
  • Avoid getting into the "compatible forever" trap - we should maybe not assure compatibility of development versions nor spanning major releases.
  • When used for long-term archiving, special care might be required.

I think extracting files from an archive should be possible at any time, independent of the borgbackup version used. At the very least, the latest borgbackup version should be able to extract archives created by older versions, though not necessarily vice versa.
A conversion operation could be the solution (thx, @joolswills).

joolswills commented on July 17, 2024

One thing I would like to see is, if/when the repository format changes to offer new features, the ability to convert existing repositories in place rather than having to start again.

Agree with your points. Thanks for your efforts.

ThomasWaldmann commented on July 17, 2024

Converters are an option to think about when going from one release to an (incompatible) newer release.

But, someone would have to write that code and it is a burden and slows down development. Also, converting a large amount of historical archives might be a very time- and space-consuming affair, and thus maybe impractical even if you had a converter.

Thus, I am still proposing just breaking compat now and then. Someone who wants conservative long-term archiving might be better off using tar (or attic), not something that is being heavily developed. And the whole point of this fork is to accelerate development. :)

Also, of course nobody wants to write converters that convert between development snapshots and alpha/beta/rc releases.
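
For the narrow break discussed so far, where only the magic strings changed, such a converter could in principle be small. A minimal hypothetical sketch, assuming attic segment files start with the 8-byte magic ATTICSEG, that the replacement magic has the same length, and that the on-disk layout is otherwise unchanged (none of the names below are actual borg code):

```python
import os

# Attic segment files begin with b'ATTICSEG'; the replacement magic is an
# assumption for illustration, chosen to have the same length.
OLD_MAGIC = b'ATTICSEG'
NEW_MAGIC = b'BORG_SEG'

def convert_segment_inplace(path):
    """Rewrite the leading magic of one segment file, if it matches."""
    assert len(OLD_MAGIC) == len(NEW_MAGIC)  # in-place needs equal lengths
    with open(path, 'r+b') as f:
        if f.read(len(OLD_MAGIC)) == OLD_MAGIC:
            f.seek(0)
            f.write(NEW_MAGIC)
            return True
    return False

def convert_repo_inplace(repo_path):
    """Walk a repository's data/ directory and convert every segment file."""
    converted = 0
    for root, _dirs, files in os.walk(os.path.join(repo_path, 'data')):
        for name in files:
            if convert_segment_inplace(os.path.join(root, name)):
                converted += 1
    return converted
```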

joolswills commented on July 17, 2024

Agreed in regards to dev, but if a new feature is added, such as a new compression method, it would be nice to be able to do things like recompressing existing data. Things like decrypting/encrypting repositories after setup would also be useful. Just ideas anyway!

anarcat commented on July 17, 2024

i really like the promise of attic to keep backwards compatibility forever for storage.

who knows when i will need to restore this old backup? do we need to make such changes now anyways?

maybe an alternative here would be to stabilise versions at some point and treat those things as "API changes". so we could have an X.0 release that is backwards-compatible with X-1.0 and forward-compatible with X+1.0, so that you could migrate your repo by upgrading incrementally?
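
a sketch of how such a +/-1 compatibility window might be enforced, purely illustrative, assuming the repository records an integer format version (none of these names are attic's or borg's actual API):

```python
CURRENT_VERSION = 3      # the format version this release writes
MIN_READ_VERSION = 2     # we can still read the previous format

class RepoVersionError(Exception):
    pass

def check_repo_version(repo_version, write=False):
    """enforce the +/-1 window: read current and previous, write only current."""
    if repo_version > CURRENT_VERSION:
        raise RepoVersionError('repo written by a newer release: upgrade the software')
    if repo_version < MIN_READ_VERSION:
        raise RepoVersionError('repo format too old: upgrade through intermediate releases')
    if write and repo_version != CURRENT_VERSION:
        raise RepoVersionError('convert the repo before writing to it')
```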

ThomasWaldmann commented on July 17, 2024

@anarcat I thought about it quite a lot. Keeping something compatible forever sounds great, but from a development and maintenance standpoint, it is a pain.

For example, look at flash player or windows - they try to be compatible forever (or at least over a very long time) and it results in an accumulation of a lot of crap code (there was a talk by FX at a CCC conference about it that went into quite some detail). They basically rewrote that thing multiple times (for reasons), but always kept the old code around, too. Thus you now have not just the bugs in the latest code, but all remaining bugs in all the old versions, too. Windows still has broken time stdlib functions because they were broken in the same way on DOS.

I am not saying attic's code quality is like flash player's. :) We are lucky that the code base is quite good, but I found some places where it is too hardcoded, or where maybe even a layer is missing and blocking the development of a needed feature. Maybe we will find more over time; that remains to be seen.

Your idea with +/- 1 version compatibility is somehow similar to a converter, right?

ThomasWaldmann commented on July 17, 2024

BTW, I pushed some code to the repo. It is not compatible with attic due to changes to the magic strings (like ATTIC_KEY, ATTICSEG, ATTICIDX). Other than that, it is mostly the content of the rather conservative "merge" branch + s/attic/borg/g. See CHANGES.txt for details.
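
To make the incompatibility concrete: the two formats can be told apart just by peeking at a file's leading bytes. A hypothetical sniffer, using the attic magics named above (this is not actual borg code, and anything not matching is merely assumed to be borg's):

```python
# Map a file's leading bytes to the format that presumably wrote it.
ATTIC_MAGICS = {
    b'ATTICSEG': 'attic segment file',
    b'ATTICIDX': 'attic index file',
    b'ATTIC_KEY': 'attic key file',
}

def sniff_format(path, maxlen=16):
    """Return a human-readable guess at the format of `path`."""
    with open(path, 'rb') as f:
        head = f.read(maxlen)
    for magic, kind in ATTIC_MAGICS.items():
        if head.startswith(magic):
            return kind
    return 'not attic (borg, or something else)'
```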

anarcat commented on July 17, 2024

so i understand how hard it can be for windows to keep backwards compat with crap like DOS, or to keep flash player stable. i think that's a different problem from the one we have here: as you said, attic is fairly well designed and implemented, and there are some tweaks we want to make, unless i misunderstand something deeper here.

a good example is changing the ATTIC strings: why would we do that at all, if not to gratuitously break backwards compatibility?

barsanuphe commented on July 17, 2024

It's a fork. If changes are necessary, now is the time, especially if those changes make borg more resilient to future changes.
Also, the idea of backup software being able to keep files and formats "forever" is a little ambitious (for borg or attic). Recently a msgpack bug was discovered: attic is not standalone, its very format is dependent on third-party libraries. If those evolve or are abandoned, you will need to do something about your repositories.
But I agree read-only backward compatibility is a minimum; the ability to update the format (or change compression level/encryption) would definitely come in handy.

aride commented on July 17, 2024

@barsanuphe I didn't say "forever", I said an indeterminate amount of time. Meaning, "long enough". Is the time between borg releases long enough? I don't think so, not by a long shot. Tar files have lasted decades; I don't see a reason not to strive for that kind of quality. Users expect that kind of quality from backup software, not having to re-encode archives at every upgrade.

You say "if changes are necessary"; that's precisely my point. That necessity needs to be spelled out very clearly, for very good reasons, before format changes can be considered. And so far it hasn't been. I agree that now is the right time to discuss those needs. The sooner the format is settled, the sooner borg development can proceed.

Ernest0x commented on July 17, 2024

> Converters are an option to think about when going from one release to an (incompatible) newer release.
>
> But, someone would have to write that code and it is a burden and slows down development.

Maybe, but it must be done.

> Also, converting a large amount of historical archives might be a very time- and space-consuming affair, and thus maybe impractical even if you had a converter.

Maybe, or maybe not. You must provide the conversion functionality and let the user decide. He may choose to prune archives before converting and keep only a small subset of them. Or he may have the time and space and want everything.

maltefiala commented on July 17, 2024

Like anarcat and aride, I too believe this fork should improve attic in a sensible way without breaking too many things at once. Sure, nobody knows what bugs will be found in the future. However, I really don't see why changes in 3rd-party libraries should break compatibility with attic, as attic would need to be upgraded as well in such a case.

An example would be the readability of attic's code. Variable names like "t0" and "st" are nice to write, but they don't make the code very readable and they break PEP 8, as they clearly aren't words:

> lowercase with words separated by underscores as necessary to improve readability.

So should we change those variable names to something better to benefit readability or should we stay with them to benefit code compatibility? I would vote for the latter at the moment.
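
For concreteness, the kind of rename in question would look like this (illustrative snippets, not actual attic code):

```python
import os
import time

path = '/etc/hostname'  # illustrative input

# As commonly written in the attic code base:
t0 = time.time()
st = os.stat(path)

# PEP 8-leaning renames; behaviour is identical, but diffs against
# upstream attic become noisier:
start_time = time.time()
stat_result = os.stat(path)
```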

level323 commented on July 17, 2024

Concerning attic and backward compatibility with older repo formats: I have an idea. It may completely suck... but here it is, FWIW. Flick your patience switch to the ON position, because it involves/requires modularising the code and I need to discuss that first. Bear with me - the description may be long, but the concept will (IMO) result in a quite neat, tidy, more functional and more extensible tool.

So, what I'm thinking is that there seems to be a pretty clear boundary line where the code can be modularised, as described below:

  1. The borg_core module. This is the 'engine room'. It is the only module that actually works on and touches backup repos. Its functionality is init, create, extract, check, delete, list, prune, info, change-passphrase - but these are only internal APIs to the 'core' and not user-facing... other modules wrap around borg_core to provide filesystem abstraction and user-facing commands, as described further below. However, under this modularised approach the 'core' only communicates file content and metadata via filesystem-agnostic data structures, with the bare minimum knowledge needed to carry out the above functions. The data structure is a list of one or more of what I'll call 'file packages' (FPs). FPs are relatively 'future proof' data structures (e.g. leveraging msgpack/protocol buffers/whatever) that contain the content borg_core needs to get its job done (e.g. file content, file name, perhaps/probably a file content checksum) and also provide for arbitrary additional content that can be used for filesystem- and/or OS-specific data (e.g. xattrs, ACLs, whatever) that borg_core just stores in and retrieves from the repo but doesn't need to use directly or understand in any specific way. (A rough sketch of an FP appears after this list.)
  2. One or more filesystem-specific (or even OS-specific) interface modules (or perhaps we could call them "filesystem shims"...I dunno). These modules wrap around borg_core to provide filesystem-specific behaviour. For the sake of providing a concrete example, consider a module which I will give the name borg_extfs_shim, as it is designed to make borg work on ext3/4 filesystems:
    • In the case of init it simply passes through to the 'init' method of the 'core' module.
    • In the case of create it handles the filesystem scanning (exclude globs/regexps and special fs-specific stuff). The module reads the files to be backed up and packages them into a list/stream of FPs. It streams this list of FPs to borg_core.create to be pushed into the repo. Options to the borg_extfs_shim.create method can specify how much/little of the metadata (perms, xattrs, ACLs, whatever) gets stored in the metadata portion of each FP and consequently stored in the repo via borg_core.
    • In the case of extract, the approximate reverse of create occurs. In the simplest case borg_core spews a list/stream of FPs back to borg_extfs_shim, which unpacks the file content and metadata and writes the described files to the filesystem using its special-sauce knowledge of (in this case) the ext3/4 filesystem. There are more complex cases (e.g. partial extract of only certain files) that I won't go into for the sake of brevity, as this post is already very long.
    • In the case of check, it could be as little as passing straight through to borg_core's own check method, but more feature-rich code could also be created to do certain checks on metadata on files in the archive if deemed worthwhile/necessary.
    • In the case of list, this module receives from borg_core.list a stream/list of FP metadata only (no file content). This module then interprets the metadata and pretty-prints to stdout a detailed list of files and any metadata deemed relevant in the specified archive. In other words, it's very much like extract but only crunches metadata, not file content.
    • In the case of delete, prune, info and change-passphrase, these would be passed straight through to their counterpart methods in borg_core.
  3. The borg module. This is the user-facing module - the 'front end'. But there could be others if a new use case warranted it. By default, it automatically determines which filesystem shim module will be used to interact with borg_core, but there could be a command-line switch to force a specific shim to be used. For example:
    • In the case of borg create myrepo::my-archive ~/Documents, the borg module determines that the filesystem being read is ext4, so it engages borg_extfs_shim.create(repo="myrepo", archive="my-archive", source="~/Documents", ...).
    • Hopefully you get the drift.
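
As referenced above, a rough sketch of what one FP and the shim-to-core hand-off could look like, using msgpack as suggested; every name here is hypothetical, not an actual attic or borg API:

```python
import msgpack

def make_fp(name, content, checksum=None, extra=None):
    """Build one 'file package' (FP): the fields borg_core needs,
    plus an opaque bag of filesystem-specific metadata that the
    core stores and returns but never interprets."""
    return {
        'name': name,            # file name (needed by the core)
        'content': content,      # file content as bytes
        'checksum': checksum,    # optional content checksum
        'extra': extra or {},    # xattrs, ACLs, ... opaque to borg_core
    }

def pack_fp(fp):
    """Serialise an FP for storage or streaming."""
    return msgpack.packb(fp, use_bin_type=True)

def unpack_fp(data):
    """Deserialise an FP coming back out of the repo."""
    return msgpack.unpackb(data, raw=False)

# A shim would then stream FPs into the core, e.g.:
#   borg_core.create(repo='myrepo', archive='my-archive', fps=fp_iterator)
```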

This design opens up numerous possibilities which both improve modularisation of the code AND could make the issue of repo backward compatibility a much easier goal to achieve.

Concerning modularisation, consider now a module borg_fuse_shim, which only implements borg mount. This is more neat/tidy/modularised, no? Fuse mounting is a great feature, but should not really be a part of the 'core' of borg.

Concerning modularisation once more, consider now a module borg_stdin_shim, which only implements borg create with the specific function of accepting a stream on stdin and presenting it as a single FP to borg_core. A nice, neat solution that moves this feature, which is nice but not critical/central, out of the core functionality of borg.

Concerning repo backward compatibility, this new modularised approach brings the goal of repo backward compatibility much closer, for two reasons:

  1. borg_core is now (almost) entirely file-metadata-agnostic. This ensures there will be minimal need to change the repo data structures concerning the metadata of archived file content for the foreseeable future. Admittedly, however, it has no impact on backward compatibility of the on-disk repo format. Want to support a new feature of a specific filesystem (e.g. NTFS, xfs, reiser, btrfs, NFS, Amazon S3) in future? No problem! Just write a new shim or expand an existing one, leaving borg_core untouched.
  2. A specific shim can be written to output an entire repo as a stream (e.g. to stdout) in a well-defined format. That format could be as simple as a serialised (msgpacked) dict where the key is the archive name and the value is the list of FPs (exactly the list/stream that the shims use to pass data back and forth with borg_core). Combine this with a method to read an entire repo as a stream (e.g. from stdin) and you now have a mechanism for upgrading from one repo format to the next (and downgrading, for that matter). This might take the form of the following piped command: attic_v1.53 streamout my-old-repo | attic_2.01 streamin my-new-repo. All that is required to achieve this is the necessary disk space and adequate cups of coffee. (A sketch follows below.)
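
A minimal sketch of that stream idea, with hypothetical names throughout (list_archives/read_archive/write_archive are assumed repo methods, not real APIs): each archive goes out as one msgpacked record on stdout and is read back from stdin on the other side of the pipe.

```python
import sys
import msgpack

def streamout(repo, out=sys.stdout.buffer):
    """Dump every archive as one (name, list-of-FPs) record."""
    for name in repo.list_archives():            # hypothetical repo API
        record = {'archive': name, 'fps': repo.read_archive(name)}
        out.write(msgpack.packb(record, use_bin_type=True))

def streamin(repo, inp=sys.stdin.buffer):
    """Rebuild archives from a stream produced by streamout."""
    unpacker = msgpack.Unpacker(inp, raw=False)
    for record in unpacker:                      # one record per archive
        repo.write_archive(record['archive'], record['fps'])
```

The old and new binaries would only need to agree on this stream format, never on each other's repo layout.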

Sorry for the enormous post. Hopefully the idea doesn't suck. If it does, sorry for giving my readers eyestrain for no good reason.... ;-)

anarcat commented on July 17, 2024

[i'm hesitant to add more to the wide-ranging conversation here, but it seems that one big issue in the goals of the project is backwards compatibility, so i'll add something about that. maybe a separate issue should be opened to summarize the conversation here and clarify borg's way of dealing with the issue...]

anyways. so i understand where the "fork allows us to change" idea is coming from and i respect that. maybe it's fine to make a break to allow cleaning up bad assumptions in the code. i am worried about:

  1. gratuitous changes: here i am referring specifically to 159315e - this commit changes magic numbers without a good justification. this seems to be contrary to even the goals stated in the summary here (namely "Don't break it accidentally / without good reason / without warning")
  2. eternal upgrade chase: even if we accept some of those changes, at some point those changes need to stop and stabilise. maybe that's what a 1.0 release looks like. but then that means the software can't actually be used reliably in production until then, at which point the format is locked. so a little more thought needs to be put into how to introduce format changes safely and reliably.
  3. future-proofing: backup software should be self-contained (for disaster recovery) and able to deal with really old data. data that can't be read directly should be convertible, as a worst-case option, but never lost (this already fails wrt attic because of the above commit, but i guess that's an acceptable compromise if we consider borg a new backup software and not a fork (which it isn't))

I really like jborg's example of how old tar archives from 30 years ago can still be read. tar's specification also has the benefit of fitting within three paragraphs on wikipedia - clearly a different kind of software, yet i believe it is a standard any backup software should aspire to. notice how tar has dealt with potentially backwards-incompatible changes...

basically, my position is that attic/borg should not break backwards compatibility, and should support past formats forever. i haven't seen compelling evidence or changes that warrant such a break at this point, and i would like those proposing such changes to show an example, otherwise the conversation will likely continue to go nowhere... i believe that any such change can be made in a backwards-compatible way; the current format is not so bad that it will explode in the future...

anarcat commented on July 17, 2024

since so many discussions were about backwards compatibility here, i thought it was relevant to open an issue specifically about this in #26.

anarcat commented on July 17, 2024

oh, and in PR #25, i actually suggest we document the goals stated in the summary here "as is" (mostly), meaning that i agree with those.

i wonder if a code of conduct or something similar wouldn't be a good idea too... a few ideas:

JensRantil commented on July 17, 2024

Any progress on this issue? What is outstanding? Can it be closed?

ThomasWaldmann commented on July 17, 2024

Guess this can be closed.
