
benji's Introduction

[Badges: CI build status, license, PyPI version, supported Python versions]

Benji Backup

Benji Backup is a block-based, deduplicating backup software. It builds on the excellent foundations and concepts of backy² by Daniel Kraft.

While Benji can back up any block device or image file (including LVM logical volumes and snapshots), it excels at backing up Ceph RBD images, and it also includes preliminary support for backing up iSCSI targets.

Benji is written in Python and is available on PyPI for installation with pip. Benji also features a generic container image with all dependencies included, as well as an image and Helm chart to integrate Benji into a Kubernetes environment for backing up Ceph RBD based persistent volumes.

The documentation is available here.

Status

Benji is beta quality and will probably stay that way due to time constraints. Please open an issue on GitHub if you have a usage question that the documentation does not cover, or covers incorrectly. Also have a look at the CHANGES file for upgrade notes.

Benji requires Python 3.6.5 or newer because older Python versions have shortcomings in the concurrent.futures implementation which lead to excessive memory usage.

The master branch contains the development version of Benji, may be broken at times, and may even destroy your backups. Please use the latest pre-releases to get some semblance of stability and a migration path from one pre-release to the next.

The benji-k8s container image together with the Helm chart provides a solid way for backing up persistent volumes provided by Ceph RBD. This includes volumes provisioned by Rook, Ceph CSI or the older volume plugin integrated into kubelet.

Main Features

Small backups

Benji deduplicates all data read and each unique block is only written to the storage location once. The deduplication takes into account all historic data present on the backup storage and so spans all backups and all backup sources.
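The deduplication scheme can be illustrated with a toy sketch (this is not Benji's actual code, just the idea): blocks are addressed by their BLAKE2b digest, so a block already present on the storage is never written a second time, no matter which backup or backup source it came from.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; Benji's default block size is 4 MiB

def backup(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks and store each unique block once.

    Returns the list of block digests that make up this backup. `store`
    maps digest -> block and stands in for the deduplicated backup storage.
    """
    digests = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.blake2b(block, digest_size=32).hexdigest()
        store.setdefault(digest, block)  # only written if not already present
        digests.append(digest)
    return digests

def restore(digests: list, store: dict) -> bytes:
    """Reassemble a backup from its ordered list of block digests."""
    return b''.join(store[d] for d in digests)

store = {}
v1 = backup(b'AAAABBBBCCCC', store)
v2 = backup(b'AAAABBBBDDDD', store)  # shares its first two blocks with v1
# store now holds only 4 unique blocks across both backups
```

The same digest also doubles as an integrity check on read-back, which is the basis of the scrubbing feature described below.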

In addition Benji supports fast state-of-the-art compression to further reduce the storage space requirements.

Fast backups
When used with Ceph RBD images, Benji uses snapshots and the rbd diff command to back up only the blocks that have changed since the last backup. The same mechanism can be extended to other backup sources.
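The core of this mechanism can be sketched as follows (an illustration, not Benji's implementation): rbd diff reports changed byte extents, and mapping those extents onto fixed-size blocks yields the set of blocks the next backup has to re-read.

```python
def changed_blocks(extents, block_size):
    """Map changed byte extents (offset, length) onto the set of
    fixed-size block indices that must be re-read on the next backup."""
    blocks = set()
    for offset, length in extents:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size  # last byte touched
        blocks.update(range(first, last + 1))
    return blocks

# An extent touching bytes 5..12 with 4-byte blocks spans blocks 1, 2 and 3
changed_blocks([(5, 8)], 4)
```

All blocks not in this set can be taken over from the previous backup's metadata without reading them again.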
Fast restores
Sparse blocks are skipped on restore, providing fast restores of sparsely populated disk images.
Low bandwidth requirements
As only changed and not yet known blocks are written to the backup storage, the bandwidth requirements for the network connection between Benji and the storage location are usually low. Even with newly created block devices the traffic to the backup storage location is generally small as these devices mostly contain sparse blocks. Enabling compression further reduces the bandwidth requirements.
Support for a variety of backup storage locations

Benji supports AWS S3 as a backup storage location, and it has options to enable compatibility with other S3 implementations like Google Cloud Storage, Ceph's RADOS Gateway or MinIO.

Benji also supports Backblaze's B2 Cloud Storage, which opens up a very cost-effective way to store backups.

Benji is able to use any file based storage including external hard drives and network based storage solutions like NFS, SMB or even CephFS.

Multiple different storage locations can be used simultaneously and in parallel to accommodate different backup strategies.

Confidentiality
Benji supports AES-256 in GCM mode to encrypt all data blocks on the backup storage. By using envelope encryption every block is encrypted with its own unique random key. This makes plaintext attacks even more difficult.
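The envelope-encryption structure can be sketched like this. Note that this is an illustration only: the XOR keystream below is an insecure toy stand-in for AES-256-GCM, and none of this is Benji's actual code. The point is the key hierarchy: every block gets its own random key, and only that key is wrapped with the master key.

```python
import hashlib
import os

def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher. NOT secure -- a stand-in for AES-256-GCM
    used here only to show the envelope structure."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.blake2b(
            key + counter.to_bytes(8, 'big'), digest_size=32).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, keystream))

def encrypt_block(master_key: bytes, block: bytes):
    block_key = os.urandom(32)                        # unique random key per block
    ciphertext = _toy_cipher(block_key, block)        # data encrypted with block key
    wrapped_key = _toy_cipher(master_key, block_key)  # block key wrapped with master key
    return wrapped_key, ciphertext

def decrypt_block(master_key: bytes, wrapped_key: bytes, ciphertext: bytes) -> bytes:
    block_key = _toy_cipher(master_key, wrapped_key)  # unwrap the per-block key
    return _toy_cipher(block_key, ciphertext)
```

Because each block is encrypted under a different key, identical plaintext blocks produce unrelated ciphertexts, which is what frustrates known-plaintext attacks against the storage.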
Integrity
Each data block in Benji is protected by a checksum. This checksum is not only used for deduplication but also to ensure the integrity of the whole backup. Long-term availability of backups is ensured by regularly checking existing backups for bit rot.
Integrated NBD server

Benji brings its own NBD (network block device) server which makes backup images directly accessible as a block device - even over the network. The block device can be mounted if it contains a filesystem and any individual files needed can be easily restored even though Benji is a block based backup solution.

Benji can also provide a writable version of a backup via NBD. This enables repair operations like fsck. The original backup is not changed in this case. All changes are transparently written to a new backup via copy-on-write and this new backup can be restored just like any other backup after the repair is complete.

Concurrency
Benji supports running multiple operations simultaneously. Instances can be distributed across different hosts or containers without the need for a central server.
Extensibility
Benji comes with a module framework to easily add new protocols for accessing backup sources or storages. New compression and encryption algorithms are also easily integrated into Benji.

benji's People

Contributors

1337andre, alexander-bauer, allenporter, arcticsnowman, bk203, elemental-lf, fr3aker, gschoenberger, jasonb5, jubalh, ksperis, kvaps, olifre, pn-d9t, q3k, sea-you, serialvelocity, wamdam, wech71


benji's Issues

Large RBD Volume backups are very slow

Hey guys, I have been testing quite a bit with Benji. I am using Benji with a simple PostgreSQL database.

I am noticing that it takes a very long time to backup a large RBD volume with no changes.

For example, I have two RBD volumes set up: one is 6TB and one is 500G. The 500G volume with no changes takes 4 minutes for a complete backup. The 6TB volume takes between 80 and 100 minutes.

Is there anything that can be done to speed up this process? It seems the Benji preparation step is limited to a single CPU and just spins at 100%.

As a comparison, backy2 is able to back up this same volume in 20 minutes, which is still pretty long for no change.

LVM Backups with hints

Our Ceph RBD backups have been going really well and we would love to start using Benji for LVM backups.

However, some of our LVM volumes are quite large (6-15TB). This means Benji has to read the entire LVM snapshot on each backup. At about 200MB/s per Benji backup, that is a 10 hour backup for a 6TB volume, or 20+ hours for a 15TB volume.

Does anyone know of a way to come up with a "hints" file between two LVM snapshots, similar to the hints file for Ceph/RBD? I noticed in the documentation that it's "possible", but there wasn't much talk about it.
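Absent change tracking in the device-mapper layer, there is no obvious metadata shortcut, but a hints file could in principle be produced by comparing the two snapshots directly. That still reads both snapshots once, but it would let the backup skip hashing and uploading unchanged blocks. A rough sketch follows; the exact hints format Benji expects should be checked against its documentation, and the {offset, length, exists} shape below is an assumption mirroring rbd diff --format=json output.

```python
import json
from io import BytesIO

def diff_hints(old, new, block_size):
    """Compare two equal-sized snapshot streams block by block and emit
    rbd-diff-style hint entries for blocks whose content differs.

    `old` and `new` are binary file-like objects (e.g. the opened
    snapshot devices)."""
    hints = []
    offset = 0
    while True:
        a = old.read(block_size)
        b = new.read(block_size)
        if not b:
            break
        if a != b:
            hints.append({'offset': offset, 'length': len(b), 'exists': 'true'})
        offset += len(b)
    return hints

# Only the middle 4-byte block differs between these two toy "snapshots":
hints = diff_hints(BytesIO(b'AAAABBBBCCCC'), BytesIO(b'AAAAXXXXCCCC'), 4)
print(json.dumps(hints))
```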

incomplete backup can't be removed

When a backup is incomplete due to a killed process, power outage, etc., like this:

| 2019-02-04T23:48:03 | V0000000007 | RBD-2810875e-ba27-4ccf-b802-96268ac7104e | FREEZE_2019-02-04T19:07:57.864475+00:00 | 120.0GiB |     4.0MiB | incomplete |   False   | file    |

trying to remove the backup does not work

benji rm -f V7
    INFO: $ /backup/benji/venv/bin/benji rm -f V7
   ERROR: Version V0000000007 is already locked.

What would be the way to clean it from the database and the storage?

ERROR: ModuleNotFoundError: No module named 'rados'

When trying to do benji database-init I get the following error:

INFO: $ /usr/local/benji/bin/benji database-init
ERROR: ModuleNotFoundError: No module named 'rados'

My configuration looks like this:

configurationVersion: '1'
logFile: /var/log/benji.log
databaseEngine: sqlite:////backup/benji.sqlite
blockSize: 4194304
processName: benji
disallowRemoveWhenYounger: 6
defaultStorage: local
storages:
  - name: local
    storageId: 1
    module: file
    configuration:
      path: /backup/
  - name: b2
    storageId: 2
    module: b2
    configuration:
      accountId: {blablabla}
      applicationKey: {blablabla}
      bucketName: hugetest
      writeObjectAttempts: 2
      uploadAttempts: 5
      bandwidthRead: 0
      bandwidthWrite: 0
      simultaneousWrites: 20
      simultaneousReads: 20
ios:
  - name: rbd
    module: rbd
    configuration:
      cephConfigFile: /etc/ceph/ceph.conf
      clientIdentifier: admin
      newImageFeatures:
        - RBD_FEATURE_LAYERING
        - RBD_FEATURE_EXCLUSIVE_LOCK
        - RBD_FEATURE_STRIPINGV2
      simultaneousReads: 20
  - name: file
    module: file

NBD export fails to mount with default blocksize

Trying to mount any backup via NBD:

backup-host # benji nbd -r
desktop-machine # nbd-client -N V0000000002 localhost -p 10809 /dev/nbd0
Negotiation: ..size = 10240MB
bs=1024, sz=10737418240 bytes
desktop-machine # LANG=C mount -t ext4 -o ro,norecovery /dev/nbd0p1 mnt
mount: /home/olifre/mnt: wrong fs type, bad option, bad superblock on /dev/nbd0p1, missing codepage or helper program, or other error.

Interestingly, syslog on desktop-machine contains:

[2541735.786008]  nbd0: p1
[2541735.786062] nbd0: p1 size 41938944 extends beyond EOD, truncated
[2541753.379898] EXT4-fs (nbd0p1): VFS: Can't find ext4 filesystem

Trying partprobe (but that seems to happen automatically on newer systems?):

desktop-machine # LANG=C partprobe /dev/nbd0
Error: Can't have a partition outside the disk!

backup-host is a CentOS 7 machine, so I don't have NBD kernel modules :sad:.
desktop-machine is a modern kernel 4.19 Gentoo machine with:

nbd-tools version: 3.15.3
kernel: 4.19

Retrying with:

nbd-client -N V0000000002 localhost -b 512 -p 10809 /dev/nbd0

works fine, though. I can reproduce the same with backy2, by the way.

Is this some configuration issue / version problem on my end?
The documentation also contains the example

# modprobe nbd
# nbd-client -N V0000000001 127.0.0.1 -p 10809 /dev/nbd0
Negotiation: ..size = 10MB
bs=1024, sz=10485760 bytes

so I presume this should theoretically work with other block sizes than 512?

sqlite3.OperationalError: database is locked

Attempting to export a version via NBD which is valid and not concurrently accessed, the following happens:

    INFO: Starting to serve NBD on 127.0.0.1:10809
    INFO: Incoming connection from 127.0.0.1:53868
    INFO: [127.0.0.1:53868] Negotiated export: V0000000002
   ERROR: Task exception was never retrieved
future: <Task finished coro=<NbdServer.handler() done, defined at /opt/benji/lib64/python3.6/site-packages/benji/nbdserver.py:112> exception=InternalError('Attempt to release lock "V0000000002" even though it isn\'t held',)>
Traceback (most recent call last):
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/benji/lib64/python3.6/site-packages/benji/database.py", line 1054, in lock
    self._session.commit()
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 954, in commit
    self.transaction.commit()
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 467, in commit
    self._prepare_impl()
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 447, in _prepare_impl
    self.session.flush()
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2313, in flush
    self._flush(objects)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2440, in _flush
    transaction.rollback(_capture_exception=True)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2404, in _flush
    flush_context.execute()
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 395, in execute
    rec.execute(self)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 560, in execute
    uow
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
    mapper, table, insert)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 836, in _emit_insert_statements
    execute(statement, multiparams)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/opt/benji/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked [SQL: 'INSERT INTO locks (host, process_id, lock_name, reason, date) VALUES (?, ?, ?, ?, ?)'] [parameters: ('backup.virt.physik.uni-bonn.de', '7e6f9db0137511e9b9fe003048c63494', 'V0000000002', 'NBD', '2019-01-08 18:44:58.876437')] (Background on this error at: http://sqlalche.me/e/e3q8)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/benji/lib64/python3.6/site-packages/benji/nbdserver.py", line 178, in handler
    self.store.open(version)
  File "/opt/benji/lib64/python3.6/site-packages/benji/benji.py", line 1120, in open
    self._benji_obj._locking.lock_version(version.uid, reason='NBD')
  File "/opt/benji/lib64/python3.6/site-packages/benji/database.py", line 1117, in lock_version
    locked_msg='Version {} is already locked.'.format(version_uid.v_string))
  File "/opt/benji/lib64/python3.6/site-packages/benji/database.py", line 1058, in lock
    raise AlreadyLocked(locked_msg)
benji.exception.AlreadyLocked: Version V0000000002 is already locked.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/benji/lib64/python3.6/site-packages/benji/nbdserver.py", line 309, in handler
    self.store.close(version)
  File "/opt/benji/lib64/python3.6/site-packages/benji/benji.py", line 1123, in close
    self._benji_obj._locking.unlock_version(version.uid)
  File "/opt/benji/lib64/python3.6/site-packages/benji/database.py", line 1126, in unlock_version
    self.unlock(lock_name=version_uid.v_string)
  File "/opt/benji/lib64/python3.6/site-packages/benji/database.py", line 1091, in unlock
    raise InternalError('Attempt to release lock "{}" even though it isn\'t held'.format(lock_name))
benji.exception.InternalError: Attempt to release lock "V0000000002" even though it isn't held

Since a backup of a different snapshot of the same image was performed in parallel, the database lock is fully expected.

Not sure if this should really work, but it should at least not crash ๐Ÿ˜‰ .

Grafana Dashboards

I think it would be helpful if the benji project shared a generic grafana dashboard.json file to help users discover what key things they should be monitoring via prometheus.

Metadata-import gets stuck

Hello,

When I try to run metadata-import it gets stuck and never completes (I've waited all night and nothing happens).

# benji --log-level  DEBUG metadata-import
    INFO: $ /benji/bin/benji --log-level DEBUG metadata-import
   DEBUG: commands.metadata_import(**{'input_file': None})
   DEBUG: Using block hash BLAKE2b with kwargs {'digest_bits': 256}.
   DEBUG: Resolved schema for benji.io.base-v1: {'configuration': {'type': 'dict', 'nullable': True, 'required': True, 'schema': {'simultaneousReads': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}}}}.
   DEBUG: Resolved schema for benji.io.rbd-v1: {'configuration': {'type': 'dict', 'nullable': True, 'required': True, 'schema': {'simultaneousReads': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}, 'cephConfigFile': {'type': 'string', 'empty': False, 'default': '/etc/ceph/ceph.conf'}, 'clientIdentifier': {'type': 'string', 'empty': False, 'default': 'admin'}, 'newImageFeatures': {'type': 'list', 'empty': False, 'default': ['RBD_FEATURE_LAYERING'], 'schema': {'type': 'string', 'regex': '^RBD_FEATURE_.*'}}}, 'empty': False}}.
   DEBUG: Configuration for module benji.io.rbd: {'cephConfigFile': '/etc/ceph/ceph.conf', 'clientIdentifier': 'admin', 'newImageFeatures': ['RBD_FEATURE_LAYERING', 'RBD_FEATURE_EXCLUSIVE_LOCK', 'RBD_FEATURE_STRIPINGV2'], 'simultaneousReads': 20}.
   DEBUG: Resolved schema for benji.storage.base-v1: {'configuration': {'type': 'dict', 'empty': False, 'schema': {'activeTransforms': {'type': 'list', 'empty': False, 'schema': {'type': 'string', 'empty': False}}, 'simultaneousWrites': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}, 'simultaneousReads': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}, 'bandwidthRead': {'type': 'integer', 'empty': False, 'min': 0, 'default': 0}, 'bandwidthWrite': {'type': 'integer', 'empty': False, 'min': 0, 'default': 0}, 'consistencyCheckWrites': {'type': 'boolean', 'empty': False, 'default': False}, 'hmac': {'type': 'dict', 'empty': False, 'schema': {'password': {'type': 'string', 'empty': False, 'required': True, 'minlength': 8}, 'kdfSalt': {'type': 'string', 'empty': False, 'required': True}, 'kdfIterations': {'type': 'integer', 'empty': False, 'required': True, 'min': 1000}}}}}}.
   DEBUG: Resolved schema for benji.storage.base.ReadCache-v1: {'configuration': {'type': 'dict', 'empty': False, 'schema': {'activeTransforms': {'type': 'list', 'empty': False, 'schema': {'type': 'string', 'empty': False}}, 'simultaneousWrites': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}, 'simultaneousReads': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}, 'bandwidthRead': {'type': 'integer', 'empty': False, 'min': 0, 'default': 0}, 'bandwidthWrite': {'type': 'integer', 'empty': False, 'min': 0, 'default': 0}, 'consistencyCheckWrites': {'type': 'boolean', 'empty': False, 'default': False}, 'hmac': {'type': 'dict', 'empty': False, 'schema': {'password': {'type': 'string', 'empty': False, 'required': True, 'minlength': 8}, 'kdfSalt': {'type': 'string', 'empty': False, 'required': True}, 'kdfIterations': {'type': 'integer', 'empty': False, 'required': True, 'min': 1000}}}, 'readCache': {'type': 'dict', 'empty': False, 'schema': {'directory': {'type': 'string', 'required': True, 'empty': False, 'dependencies': ['maximumSize', 'shards']}, 'maximumSize': {'type': 'integer', 'required': True, 'min': 1, 'dependencies': ['directory', 'shards']}, 'shards': {'type': 'integer', 'required': True, 'min': 1, 'dependencies': ['directory', 'maximumSize']}}}}, 'required': True}}.
   DEBUG: Resolved schema for benji.storage.b2-v1: {'configuration': {'type': 'dict', 'empty': False, 'schema': {'activeTransforms': {'type': 'list', 'empty': False, 'schema': {'type': 'string', 'empty': False}}, 'simultaneousWrites': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}, 'simultaneousReads': {'type': 'integer', 'empty': False, 'min': 1, 'default': 1}, 'bandwidthRead': {'type': 'integer', 'empty': False, 'min': 0, 'default': 0}, 'bandwidthWrite': {'type': 'integer', 'empty': False, 'min': 0, 'default': 0}, 'consistencyCheckWrites': {'type': 'boolean', 'empty': False, 'default': False}, 'hmac': {'type': 'dict', 'empty': False, 'schema': {'password': {'type': 'string', 'empty': False, 'required': True, 'minlength': 8}, 'kdfSalt': {'type': 'string', 'empty': False, 'required': True}, 'kdfIterations': {'type': 'integer', 'empty': False, 'required': True, 'min': 1000}}}, 'readCache': {'type': 'dict', 'empty': False, 'schema': {'directory': {'type': 'string', 'required': True, 'empty': False, 'dependencies': ['maximumSize', 'shards']}, 'maximumSize': {'type': 'integer', 'required': True, 'min': 1, 'dependencies': ['directory', 'shards']}, 'shards': {'type': 'integer', 'required': True, 'min': 1, 'dependencies': ['directory', 'maximumSize']}}}, 'accountId': {'type': 'string', 'required': True, 'empty': False, 'excludes': ['accountIdFile']}, 'accountIdFile': {'type': 'string', 'required': True, 'empty': False, 'excludes': ['accountId']}, 'applicationKey': {'type': 'string', 'required': True, 'empty': False, 'excludes': ['applicationKeyFile']}, 'applicationKeyFile': {'type': 'string', 'required': True, 'empty': False, 'excludes': ['applicationKey']}, 'bucketName': {'type': 'string', 'required': True, 'empty': False}, 'accountInfoFile': {'type': 'string', 'empty': False}, 'uploadAttempts': {'type': 'integer', 'empty': False, 'default': 5, 'min': 1}, 'writeObjectAttempts': {'type': 'integer', 'empty': False, 'default': 3, 'min': 1}, 
'readObjectAttempts': {'type': 'integer', 'empty': False, 'default': 3, 'min': 1}}, 'required': True}}.
   DEBUG: Configuration for module benji.storage.b2: {'accountId': '***', 'applicationKey': '***', 'bucketName': '***', 'writeObjectAttempts': 2, 'uploadAttempts': 5, 'bandwidthRead': 0, 'bandwidthWrite': 0, 'simultaneousWrites': 20, 'simultaneousReads': 20, 'consistencyCheckWrites': False, 'readObjectAttempts': 3}.
   DEBUG: Current database schema revision: fe79ce75cefa.
   DEBUG: Expected database schema revision: fe79ce75cefa.

   
# benji version-info
    INFO: $ /benji/bin/benji version-info
    INFO: Benji version: 0.1.1.dev73+g1f49663.
    INFO: Configuration version: 1, supported >=1,<2.
    INFO: Metadata version: 1.0.0, supported >=1,<2.
    INFO: Object metadata version: 1.0.0, supported >=1,<2.

Prometheus metrics for invalid and incomplete backup versions

Report number of invalid backup versions as a Prometheus metric. This is easy to implement.
For the incomplete metric we need to exclude recent backup versions because it is okay for them to be in this state if they belong to a running backup. This will require support for comparing dates in the filter expression parser in benji.database._QueryBuilder.

benji ls 'labels["benji-backup.me/instance"] == "benji-k8s" and status == "invalid"'
benji ls 'labels["benji-backup.me/instance"] == "benji-k8s" and status == "incomplete" and date < ...'

state of project

Hi,

I found this project after I discovered some problems with backy2.

I started packaging backy2 for openSUSE, but it is too unmaintained upstream to make it into the distro. I wondered what the plans for this fork are. It seems much more active and well maintained.

However, when I wrote an email to the maintainer, it bounced. Not a good sign either :-)

I would like to know what your future goals with this project are and how reliable it will be.
Thanks.

Anyway to improve restore speeds

I could be missing something, but I was wondering if there is any way to improve the restore speed of a Benji backup.

I've messed with these settings and they don't seem to have much effect:

simultaneousReads: 
simultaneousWrites: 

The best I can ever get is about 15MB/s in writes during the restore. Benji is set up on a host which has a 10G connection on the Ceph public network. I can mount an RBD volume on the Benji VM and write at 200MB/s.

Maybe there are some other settings to improve this? Appreciate the input as always!

Configuration for module benji.config is invalid after upgrade

Hello,

After upgrading the Docker image to the latest version (based on CentOS) I can't run Benji. I found that the config file syntax has changed, so I edited my config based on the documentation.

Unfortunately, I'm still getting an error:

# benji ls
Configuration validation errors:
  configuration.databaseEngine: required field
  configuration.io: unknown field
  configuration.ios: required field
  configuration.metadataEngine: unknown field
  configuration.storage: unknown field
  configuration.storages: required field
Uncaught exception
Traceback (most recent call last):
  File "/benji/bin/benji", line 11, in <module>
    load_entry_point('benji===unknown', 'console_scripts', 'benji')()
  File "/benji/lib64/python3.6/site-packages/benji/scripts/benji.py", line 812, in main
    config = Config()
  File "/benji/lib64/python3.6/site-packages/benji/config.py", line 171, in __init__
    self._config = ConfigDict(self.validate(module=__name__, config=config))
  File "/benji/lib64/python3.6/site-packages/benji/config.py", line 127, in validate
    raise ConfigurationError('Configuration for module {} is invalid.'.format(module))
benji.exception.ConfigurationError: Configuration for module benji.config is invalid.

My config (I think it's fine):

configurationVersion: '1'

logFile: /var/log/benji/benji.log
blockSize: 4194304
#hashFunction: BLAKE2b,digest_bits=256
processName: benji
disallowRemoveWhenYounger: 6

metadataEngine: postgresql://postgres:***@postgres/benji
defaultStorage: b2

storage:
  file:
    path: /var/lib/benji/data

  b2:
     accountId: ***
     applicationKey: ***
     bucketName: ceph-backup
     writeObjectAttempts: 2
     uploadAttempts: 5

  simultaneousWrites: 10
  simultaneousReads: 10
  bandwidthRead: 0
  bandwidthWrite: 0

nbd:
  cacheDirectory: /tmp

io:
  file:
    simultaneousReads: 20

  rbd:
    cephConfigFile: /etc/ceph/ceph.conf
    simultaneousReads: 10
    simultaneousWrites: 10
    newImageFeatures:
      - RBD_FEATURE_LAYERING
      - RBD_FEATURE_EXCLUSIVE_LOCK
      - RBD_FEATURE_STRIPINGV2
      #-RBD_FEATURE_OBJECT_MAP
      #-RBD_FEATURE_FAST_DIFF
      #-RBD_FEATURE_DEEP_FLATTEN
    clientIdentifier: admin

The error also occurs with the default config copied from the docs:

configurationVersion: '1'
dataBackend:
  type: file
  file:
    path: /tmp
metadataBackend:
  engine: sqlite:///tmp/benji.sqlite

metadata-export and metadata-restore unavailable

Hello,

I'm using the latest version of the elementalnet/benji Docker container and I can't use the metadata-export and metadata-restore commands.

root@880e3487f43a:/# benji metadata-export
usage: benji [-h] [-v] [-m] [-V] [-c CONFIGFILE] [--no-color]
{initdb,backup,restore,protect,unprotect,rm,enforce,scrub,deep-scrub,bulk-scrub,bulk-deep-scrub,export,import,export-to-backend,import-from-backend,cleanup,ls,stats,diff-meta,nbd,add-tag,rm-tag}
...
benji: error: invalid choice: 'metadata-export' (choose from 'initdb', 'backup', 'restore', 'protect', 'unprotect', 'rm', 'enforce', 'scrub', 'deep-scrub', 'bulk-scrub', 'bulk-deep-scrub', 'export', 'import', 'export-to-backend', 'import-from-backend', 'cleanup', 'ls', 'stats', 'diff-meta', 'nbd', 'add-tag', 'rm-tag')
root@880e3487f43a:/# benji metadata-restore
usage: benji [-h] [-v] [-m] [-V] [-c CONFIGFILE] [--no-color]
{initdb,backup,restore,protect,unprotect,rm,enforce,scrub,deep-scrub,bulk-scrub,bulk-deep-scrub,export,import,export-to-backend,import-from-backend,cleanup,ls,stats,diff-meta,nbd,add-tag,rm-tag}
...
benji: error: invalid choice: 'metadata-restore' (choose from 'initdb', 'backup', 'restore', 'protect', 'unprotect', 'rm', 'enforce', 'scrub', 'deep-scrub', 'bulk-scrub', 'bulk-deep-scrub', 'export', 'import', 'export-to-backend', 'import-from-backend', 'cleanup', 'ls', 'stats', 'diff-meta', 'nbd', 'add-tag', 'rm-tag')

Document faster way to restore sparse backups to existing volumes

The following trick seems to work.
I am unsure if it has an impact on shareability of objects in case there are clones of the volume, though (we don't use clones yet).

# Check sizes... 
rbd info test-vm.example.com-disk1
rbd info test-vm.example.com-disk2
# Resize to "empty". Truncation resets all blocks to zero.
rbd resize --size 0M test-vm.example.com-disk1 --allow-shrink
rbd resize --size 0M test-vm.example.com-disk2 --allow-shrink
# Size back up.
rbd resize --size 20G test-vm.example.com-disk1
rbd resize --size 5G test-vm.example.com-disk2
benji restore -s -f 134 rbd://rbd/test-vm.physik.uni-bonn.de-disk1
benji restore -s -f 158 rbd://rbd/test-vm.physik.uni-bonn.de-disk2

The advantage over purging the volume and recreating it is that existing snapshots can be kept.
Since Ceph RBD actively knows that all non-restored blocks are zeroed, it will also not consume space for them.
Using restore without -s appears to consume the full space (at least according to rbd du) until fstrim is run again afterwards.

If you do not find a flaw in my reasoning, it might be worthwhile to document this, or maybe even implement it as a flag to benji restore ;-).

Progress broken for restore

After updating to the most recent version, restore now looks as follows:

    INFO: $ /opt/benji/bin/benji -c /etc/benji.yaml restore -f -s 2953 rbd://rbd/test-vm.example.com-disk1
    INFO: Active transforms for storage vmbackup: zstd (zstd).
    INFO: Restored 0/1 blocks (0.0%)
    INFO: Restored 0/2 blocks (0.0%)
    INFO: Restored 0/3 blocks (0.0%)
    INFO: Restored 0/4 blocks (0.0%)
    INFO: Restored 0/5 blocks (0.0%)
    INFO: Restored 0/6 blocks (0.0%)
    INFO: Restored 0/7 blocks (0.0%)
    INFO: Restored 1/8 blocks (12.5%)
    INFO: Restored 2/9 blocks (22.2%)
    INFO: Restored 3/10 blocks (30.0%)
...
    INFO: Restored 737/744 blocks (99.1%)
    INFO: Restored 738/745 blocks (99.1%)
    INFO: Restored 739/746 blocks (99.1%)
    INFO: Restored 740/747 blocks (99.1%)
    INFO: Restored 741/747 blocks (99.2%)
    INFO: Restored 742/747 blocks (99.3%)
    INFO: Restored 743/747 blocks (99.5%)
    INFO: Restored 744/747 blocks (99.6%)
    INFO: Restored 745/747 blocks (99.7%)
    INFO: Restored 746/747 blocks (99.9%)
    INFO: Restored 747/747 blocks (100.0%)
    INFO: Restore of version V0000002953 successful.

It seems the progress information is somehow broken. Checking the process name, the percentage shown is different (and appears to scale more correctly). Maybe that broke in the parallelization?

Can Benji do this?

We are interested in having powered off cold backup storage.

We would have two completely separate backup storage locations. Only one would be powered on and in use at a time; we would alternate between them weekly.

How would Benji handle backups in this scenario? I am thinking we would need at least one full backup on each storage location. The biggest question in my head is how Benji will know to create a full backup on the second storage set once it is introduced, given that a full backup will already exist in the database on the first storage.

Appreciate the input!
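For what it's worth, the weekly alternation itself can be sketched by deriving the active storage set from the ISO week number. This is a hypothetical sketch: the storage names set-a/set-b are assumptions, and each set would still need its own initial full backup before differentials could be based on it.

```shell
# Pick the active storage set from the ISO week number (assumed names).
pick_storage() {
    local week=$1                      # ISO week number, e.g. from: date +%V
    # The 10# prefix forces base-10 so "08"/"09" are not parsed as octal.
    if (( 10#$week % 2 == 0 )); then
        echo "set-a"
    else
        echo "set-b"
    fi
}

pick_storage 14    # even week -> set-a
pick_storage 15    # odd week  -> set-b
```

A wrapper script could then pass the result of `pick_storage "$(date +%V)"` to whichever benji invocation targets that storage.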

module 'b2' has no attribute 'bucket'

When trying to make a backup with Benji to Backblaze B2, I get this error:

ERROR: An exception of type AttributeError occurred: module 'b2' has no attribute 'bucket'

Odd deduplication Results

I am in the process of testing Benji to backup to two separate storage locations.

I currently have 15 RBD volumes, each of which has 43G of its 100G used. 14 of them are Benji restores of the original image.

The initial backup to the first storage resulted in 43G of data used.

The initial backup to the second storage resulted in 114G of data used.

Seeing as I am backing up the same block devices, shouldn't I see similar space utilization?

I attempted to run a cleanup on the storage as well, but there wasn't anything to clean up.

Ceph RBD: Backups when object maps are invalid

This is more of a question than an issue (hopefully). What happens to a differential backup if the object map is invalid?

I recently learned many of our snapshots have invalid object maps (Mimic, 13.2.1) for unclear reasons:

$ rbd --id=libvirt object-map check test-vm.example.com-disk2@b-2018-12-26T01:22:08+0100
Object Map Check: 5% complete...2019-01-09 17:42:06.709 7f8afcff9700 -1 librbd::ObjectMapIterateRequest: object map error: object rbd_data.88a47c83e458.0000000000000040 marked as 1, but should be 3
Object Map Check: 99% complete...2019-01-09 17:42:06.788 7f8af3fff700 -1 librbd::object_map::InvalidateRequest: 0x7f8ae00323e0 should_complete: r=0
Object Map Check: 100% complete...done

After a check it is marked as invalid, and the same is then true for fast-diff (it could be recreated with object-map rebuild).

So this question is two-fold:

  • Does Benji explicitly check object maps?
  • Does Benji treat object maps marked as invalid correctly?

Or is this all the user's responsibility, since the user runs rbd diff to feed the RBD hints to Benji?

Example bash script fails to work when no labels are specified

@wanthalf noticed that the example bash script scripts/ceph.sh fails to work when no labels are specified. Could you please try this patch and report if it helps? Thanks.

diff --git a/scripts/ceph.sh b/scripts/ceph.sh
index cda4b31..846a76d 100644
--- a/scripts/ceph.sh
+++ b/scripts/ceph.sh
@@ -59,7 +59,7 @@ function benji::backup::ceph::initial {
         || return $?
 
     VERSION_UID="$(benji -m --log-level "$BENJI_LOG_LEVEL" backup -s "$CEPH_RBD_SNAPSHOT" -r "$CEPH_RBD_DIFF_FILE" \
-        $(printf -- "-l %s " "${VERSION_LABELS[@]}") rbd:"$CEPH_POOL"/"$CEPH_RBD_IMAGE"@"$CEPH_RBD_SNAPSHOT" \
+        $([[ ${#VERSION_LABELS[@]} -gt 0 ]] && printf -- "-l %s " "${VERSION_LABELS[@]}") rbd:"$CEPH_POOL"/"$CEPH_RBD_IMAGE"@"$CEPH_RBD_SNAPSHOT" \
         "$VERSION_NAME" 2> >(tee "$BENJI_BACKUP_STDERR_FILE" >&2) | _extract_version_uid | benji::version::uid::format)"
     local EC=$?
     BENJI_BACKUP_STDERR="$(<${BENJI_BACKUP_STDERR_FILE})"
@@ -98,7 +98,7 @@ function benji::backup::ceph::differential {
         || return $?
 
     VERSION_UID="$(benji -m --log-level "$BENJI_LOG_LEVEL" backup -s "$CEPH_RBD_SNAPSHOT" -r "$CEPH_RBD_DIFF_FILE" -f "$BENJI_VERSION_UID_LAST" \
-        $(printf -- "-l %s " "${VERSION_LABELS[@]}") rbd:"$CEPH_POOL"/"$CEPH_RBD_IMAGE"@"$CEPH_RBD_SNAPSHOT" \
+        $([[ ${#VERSION_LABELS[@]} -gt 0 ]] && printf -- "-l %s " "${VERSION_LABELS[@]}") rbd:"$CEPH_POOL"/"$CEPH_RBD_IMAGE"@"$CEPH_RBD_SNAPSHOT" \
         "$VERSION_NAME" 2> >(tee "$BENJI_BACKUP_STDERR_FILE" >&2) | _extract_version_uid  | benji::version::uid::format)"
     local EC=$?
     BENJI_BACKUP_STDERR="$(<${BENJI_BACKUP_STDERR_FILE})"
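The bash behaviour the patch works around can be demonstrated in isolation: printf applies its format string once even when given no arguments, so an unguarded expansion of an empty labels array still emits a dangling -l option.

```shell
labels=()

# Unguarded: printf repeats the format once even with zero arguments,
# producing a stray "-l" that benji then sees as an option with no value.
unguarded=$(printf -- "-l %s " "${labels[@]}")
echo "unguarded: [$unguarded]"

# Guarded (as in the patch): the length check short-circuits for an
# empty array, so nothing at all is emitted.
guarded=$([[ ${#labels[@]} -gt 0 ]] && printf -- "-l %s " "${labels[@]}") || true
echo "guarded: [$guarded]"

# With a non-empty array, both forms behave identically.
labels=("foo=bar" "baz=qux")
echo "two labels: [$(printf -- "-l %s " "${labels[@]}")]"
```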

extras_require keys should not contain spaces

It seems not to work as intended:

pip install "benji[disk based read cache]"
pip install "benji[disk\ based\ read\ cache]"
pip install "benji[disk%20based%20read%20cache]"

all fail with something along the lines of:

Exception:
Traceback (most recent call last):
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/packaging/requirements.py", line 93, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/pyparsing.py", line 1632, in parseString
    raise exc
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/pyparsing.py", line 1622, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/pyparsing.py", line 1379, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/pyparsing.py", line 3395, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/pyparsing.py", line 1383, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/pyparsing.py", line 3183, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pip._vendor.pyparsing.ParseException: Expected stringEnd (at char 11), (line:1, col:12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/benji/lib64/python3.6/site-packages/pip/_internal/basecommand.py", line 141, in main
    status = self.run(options, args)
  File "/opt/benji/lib64/python3.6/site-packages/pip/_internal/commands/install.py", line 274, in run
    self.name, wheel_cache
  File "/opt/benji/lib64/python3.6/site-packages/pip/_internal/basecommand.py", line 213, in populate_requirement_set
    wheel_cache=wheel_cache
  File "/opt/benji/lib64/python3.6/site-packages/pip/_internal/req/req_install.py", line 248, in from_line
    extras = Requirement("placeholder" + extras.lower()).extras
  File "/opt/benji/lib64/python3.6/site-packages/pip/_vendor/packaging/requirements.py", line 97, in __init__
    requirement_string[e.loc:e.loc + 8]))
pip._vendor.packaging.requirements.InvalidRequirement: Invalid requirement, parse error at "'[disk ba'"

Looking at loli/medpy#49, it seems spaces do not work in the keys of extras_require (i.e. the tooling does not like them).

Add override-lock to other options

Benji has been going really well for us. However, we just hit this:

(benji) [root@ceph-benji ~]# /usr/local/benji/bin/benji restore -f --sparse V0000001452 file:/benji-bunker/test.img
INFO: $ /usr/local/benji/bin/benji restore -f --sparse V0000001452 file:/benji-bunker/test.img
ERROR: AlreadyLocked: Version V0000001452 is already locked.

Is there any way we can get --override-lock added to the restore command?

Deep-Scrubbing incomplete backups marks them as valid

Cancelling a differential backup:

    INFO: $ /opt/benji/bin/benji -c /etc/benji.yaml backup -s b-2018-12-09T01:22:13+0100 -r /tmp/benji_vmbackup.draxbH -f 7 rbd://rbd/myhost.example.com-disk1@b-2018-12-09T01:22:13+0100 myhost.example.com-disk1
    INFO: Marked version V0000000008 as unprotected.
    INFO: Starting sanity check with 0.1% of the ignored blocks.
    INFO: Finished sanity check. Checked 3 blocks {2, 3, 4, 5, 6, 2462, 2335, 546, 115, 127}.
    INFO: Active transforms for storage vmbackup: zstd (zstd).
    INFO: Backed up 3/568 blocks (0.5%)
    INFO: Backed up 6/568 blocks (1.1%)
    INFO: Backed up 9/568 blocks (1.6%)
    INFO: Backed up 12/568 blocks (2.1%)
    INFO: Backed up 15/568 blocks (2.6%)
    INFO: Backed up 18/568 blocks (3.2%)
^C WARNING: IO backend closed with 549 outstanding read jobs, cancelling them.
 WARNING: Storage backend closed with 15 outstanding write jobs, cancelling them.
^C   ERROR: 

yields an invalid backup, as expected. Running deep-scrub marks it as valid, however:

    INFO: $ /opt/benji/bin/benji -c /etc/benji.yaml deep-scrub V0000000008
 WARNING: Version V0000000008 is already marked as invalid.
    INFO: Active transforms for storage vmbackup: zstd (zstd).
    INFO: Deep scrubbed 3/463 blocks (0.6%)
    INFO: Deep scrubbed 6/463 blocks (1.3%)
    INFO: Deep scrubbed 9/463 blocks (1.9%)
    INFO: Deep scrubbed 12/463 blocks (2.6%)
<...>
    INFO: Deep scrubbed 453/463 blocks (97.8%)
    INFO: Deep scrubbed 456/463 blocks (98.5%)
    INFO: Deep scrubbed 459/463 blocks (99.1%)
    INFO: Deep scrubbed 462/463 blocks (99.8%)
    INFO: Deep scrubbed 463/463 blocks (100.0%)
    INFO: Marked version V0000000008 as valid.
    INFO: Deep scrub of version V0000000008 successful.

Interestingly, the number of blocks differs.

Removing that backup version yields a warning about missing metadata, though:

benji rm -f V0000000008
    INFO: $ /opt/benji/bin/benji -c /etc/benji.yaml rm -f V0000000008
    INFO: Active transforms for storage vmbackup: zstd (zstd).
 WARNING: Unable to remove version V0000000008 metadata from backend storage, the object wasn't found.
    INFO: Removed backup version V0000000008 with 2560 blocks.

Maybe deep-scrub is missing a metadata validity check?

Retention removed version it shouldn't have

I believe I have hit a bug. I finally got a larger-scale test going: 15 RBD volumes being backed up nightly by Benji.

The first backup I did manually, the rest were scheduled via cron.

| 2019-04-05T10:39:05 | V0000000199 | vm-114-disk-0 | set-a-2019-04-05T10:39:02 | 100.0GiB | 4.0MiB | valid | False | set-a |
| 2019-04-06T00:08:35 | V0000000214 | vm-114-disk-0 | set-a-2019-04-06T00:08:32 | 100.0GiB | 4.0MiB | valid | False | set-a |
| 2019-04-07T00:08:35 | V0000000229 | vm-114-disk-0 | set-a-2019-04-07T00:08:32 | 100.0GiB | 4.0MiB | valid | False | set-a |
| 2019-04-08T00:08:35 | V0000000244 | vm-114-disk-0 | set-a-2019-04-08T00:08:33 | 100.0GiB | 4.0MiB | valid | False | set-a |
+---------------------+-------------+---------------+---------------------------+----------+------------+--------+-----------+---------+

I ran an enforce command:

/usr/local/benji/bin/benji enforce latest1,days14,months17

This resulted in:

| 2019-04-05T09:50:19 | V0000000139 | vm-114-disk-0 | set-a-2019-04-05T09:50:17 | 100.0GiB | 4.0MiB | valid | False | set-a |
| 2019-04-07T00:08:35 | V0000000229 | vm-114-disk-0 | set-a-2019-04-07T00:08:32 | 100.0GiB | 4.0MiB | valid | False | set-a |
| 2019-04-08T00:08:35 | V0000000244 | vm-114-disk-0 | set-a-2019-04-08T00:08:33 | 100.0GiB | 4.0MiB | valid | False | set-a |
+---------------------+-------------+---------------+---------------------------+----------+------------+--------+-----------+---------+

So for whatever reason it removed V0000000214, which was my 04-06 daily snapshot.

I should also say that when running a dry run, it is very hard to interpret the output in a way that allows a quick decision on what is being deleted. Is there any way to list each version with its corresponding date, line by line?
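On the readability question: Benji's machine-readable output (-m), which is used with jq elsewhere in this tracker, can be flattened to one line per version. A sketch on a sample of that JSON (the exact field names are assumed from the jq examples in other issues):

```shell
# Sample of what `benji -m ls` is assumed to emit: a JSON object
# with a .versions array.
sample='{"versions":[
  {"date":"2019-04-06T00:08:35","uid":"V0000000214","name":"vm-114-disk-0"},
  {"date":"2019-04-07T00:08:35","uid":"V0000000229","name":"vm-114-disk-0"}
]}'

# One line per version: date, uid and name, tab-separated.
printf '%s' "$sample" | jq -r '.versions[] | [.date, .uid, .name] | @tsv'
```

In practice this would be `benji -m ls | jq -r '.versions[] | [.date, .uid, .name] | @tsv'`.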

Add storage option to restore with --database-backend-less option

When restoring a version with the --database-backend-less option, the restore fails if the version resides on a storage other than the default storage defined in /etc/benji/benji.yaml. Would it be possible to add an option to restore with --database-backend-less that specifies the storage on which the version resides?

Config Issues for RBD Features

Looking at this.

https://github.com/elemental-lf/benji/blob/master/etc/benji.yaml

It seems like this should work in my config file:

  - name:
    module: rbd
    configuration:
      cephConfigFile: /etc/ceph/ceph.conf
      clientIdentifier: admin
      newImageFeatures:
        - RBD_FEATURE_LAYERING
        - RBD_FEATURE_EXCLUSIVE_LOCK
        - RBD_FEATURE_STRIPINGV2
        - RBD_FEATURE_OBJECT_MAP
        - RBD_FEATURE_FAST_DIFF
        - RBD_FEATURE_DEEP_FLATTEN

[root@ceph-backups ~]# /usr/local/benji/bin/benji ls
Uncaught exception
Traceback (most recent call last):
File "/usr/local/benji/lib64/python3.6/site-packages/benji/config.py", line 144, in __init__
config = ruamel.yaml.load(f, Loader=ruamel.yaml.SafeLoader)
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/main.py", line 935, in load
return loader._constructor.get_single_data()
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/constructor.py", line 109, in get_single_data
node = self.composer.get_single_node()
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 78, in get_single_node
document = self.compose_document()
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 101, in compose_document
node = self.compose_node(None, None)
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 138, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 218, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 136, in compose_node
node = self.compose_sequence_node(anchor)
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 180, in compose_sequence_node
node.value.append(self.compose_node(node, index))
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 138, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/composer.py", line 211, in compose_mapping_node
while not self.parser.check_event(MappingEndEvent):
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/parser.py", line 141, in check_event
self.current_event = self.state()
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/parser.py", line 563, in parse_block_mapping_key
if self.scanner.check_token(KeyToken):
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/scanner.py", line 169, in check_token
self.fetch_more_tokens()
File "/usr/local/benji/lib64/python3.6/site-packages/ruamel/yaml/scanner.py", line 321, in fetch_more_tokens
self.reader.get_mark(),
ruamel.yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
in "/etc/benji/benji.yaml", line 20, column 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/benji/bin/benji", line 11, in <module>
load_entry_point('benji==0.4.0.dev6+gb61bef7', 'console_scripts', 'benji')()
File "/usr/local/benji/lib64/python3.6/site-packages/benji/scripts/benji.py", line 297, in main
config = Config()
File "/usr/local/benji/lib64/python3.6/site-packages/benji/config.py", line 146, in __init__
raise ConfigurationError('Configuration file {} is invalid.'.format(source)) from exception
benji.exception.ConfigurationError: Configuration file /etc/benji/benji.yaml is invalid.
[root@ceph-backups ~]#

Can't make backblaze backups

Hello.
I've created a simple setup, which is as follows:

configurationVersion: '1'                                                                                                                                                                                                                                                                                      
databaseEngine: postgresql://benji:***@192.168.107.151/benji
defaultStorage: cloud-backups
storages:
  - name: cloud-backups
    storageId: 1
    module: b2
    configuration:
      accountId: 0cc**
      applicationKey: 003**
      bucketName: cephbackup
ios:
  - name: rbd
    module: rbdaio
    configuration:
      simultaneousReads: 5
      simultaneousWrites: 5 

When I try to make a backup, I get the following error message:
ERROR: InvalidAuthToken: Invalid authorization token. Server said: (bad_auth_token)

The accountId (master application key) and applicationKey are correct.
What am I doing wrong?

Dumb question - How do we use the scripts?

How do we leverage the scripts provided at benji/scripts/ceph.sh etc?
Running bash ceph.sh fails with

bash-4.2# source scripts/ceph.sh
bash: scripts/ceph.sh: line 64: syntax error near unexpected token `}'
bash: scripts/ceph.sh: line 64: `        } catch {'
bash-4.2#

I'm sure I'm missing something here? I'd rather not have to rewrite the script into 'standard' bash or use the old Backy2 scripts; this one looks to do exactly what I want, I just can't figure out how to use it :/

Slow incremental backup of empty, but large RBD volume

I have an RBD volume, 250 GB in size, with zero blocks provisioned, snapshotted daily.
The rbd diff between two snapshots contains zero blocks.
Taking a backup using this empty diff between two snapshots takes a long time and is CPU-bound (>1 hour on a slow machine). Benji is working at this stage:

benji [Preparing version V0000000314 (6.2%)]

Checking with strace, I see a lot of accesses to the SQLite backend (though not enough to explain the high CPU load on their own). I presume the time is spent in the loop over all blocks:

for id in range(num_blocks):

What's not yet clear to me:

  • This time seems to be spent even though all blocks are sparse and zero blocks have changed. Is this only the case because there has never been a non-empty backup?
  • Do you have an idea what eats the time here?

I'm thinking about hourly backups for some VMs with large volumes but not-so-large changesets between backups. If this slowness only affects RBD volumes that have always been empty, it's not a "real" issue; but if it is a constant overhead proportional to the total block count, it prevents that use case.

I don't observe this effect with Backy2, but a lot has changed in terms of metadata (and Backy2 is slower in almost all other operations 😉).

Versions and Retention

I was wondering if there is any way, besides using labels, to determine which versions are full backups and which are differential.

I am also wondering how retention determines which version to keep in specific scenarios. For example, if we are keeping 14 daily versions and want to keep monthly versions as well, how does Benji determine which daily is the best one to keep? Does it just keep the oldest daily as the monthly?

Error when trying to init the database

Hello,

I have set up a CentOS 7 VM and installed Benji without a container. Everything works well, but when I try to init the database, I get the following error:

{'event': 'Uncaught exception', 'level': 'error', 'timestamp': 1567076555.5844882, 'file': '/usr/local/lib/python3.6/site-packages/benji/logging.py', 'line': 169, 'function': 'handle_exception', 'process': 2333, 'thread_name': 'MainThread', 'thread_id': 140201858869056, 'exception': 'Traceback (most recent call last):\n File "/usr/local/bin/benji", line 10, in <module>\n sys.exit(main())\n File "/usr/local/lib/python3.6/site-packages/benji/scripts/benji.py", line 292, in main\n config = Config()\n File "/usr/local/lib/python3.6/site-packages/benji/config.py", line 168, in __init__\n if version_obj not in VERSIONS.configuration.supported:\n File "/usr/local/lib/python3.6/site-packages/semantic_version/base.py", line 642, in __contains__\n return self.match(version)\n File "/usr/local/lib/python3.6/site-packages/semantic_version/base.py", line 630, in match\n return self.clause.match(version)\n File "/usr/local/lib/python3.6/site-packages/semantic_version/base.py", line 745, in match\n return all(clause.match(version) for clause in self.clauses)\n File "/usr/local/lib/python3.6/site-packages/semantic_version/base.py", line 745, in <genexpr>\n return all(clause.match(version) for clause in self.clauses)\n File "/usr/local/lib/python3.6/site-packages/semantic_version/base.py", line 910, in match\n return version >= self.target\n File "/usr/local/lib/python3.6/site-packages/semantic_version/base.py", line 467, in __ge__\n return self.precedence_key >= other.precedence_key\nTypeError: \'>=\' not supported between instances of \'NoneType\' and \'int\''}

Could someone please explain to me what to do? Thank you very much.

Filter expressions slow / triple expressions fail

Using:

benji -m ls 'snapshot_name == "b-2018-10-31T01:22:09+0100" and name == "somehost.example.com-disk1" and valid == True'

yields:

ERROR: Invalid filter expression snapshot_name == "b-2018-10-31T01:22:09+0100" and name == "somehost.example.com-disk1" and valid == True (2).

Interestingly, this works fine:

benji -m ls '(snapshot_name == "b-2018-10-31T01:22:09+0100" and name == "somehost.example.com-disk1") and valid == True'

but executes for about half a minute.

The same slowness is observed if just brackets are added:

benji -m ls '(snapshot_name == "b-2018-10-31T01:22:09+0100" and name == "somehost.example.com-disk1")'

As compared to e.g.

benji -m ls 'snapshot_name == "b-2018-10-31T01:22:09+0100" and name == "somehost.example.com-disk1"' | jq -r '.versions[] | select(.valid == true)'

which executes in about 2 seconds on the same host (including python interpreter startup, activating the virtualenv etc.).

It seems that:

  • Having three logically combined expressions is not supported (yet). Is this expected?
  • Adding brackets triggers extremely slow runtime. Does this cause a full table scan? Even then, I would not expect that.

Using jq, however, is a viable workaround for my use case.

Nota bene: The filter system is a really cool feature! 👍

Retention Policy and Epoch time stamps

I am working on enforcing a retention policy and am hitting a somewhat odd issue.

Benji is reporting the epoch time stamp for the existing snapshots in the "future", which is causing the enforce option to not work as expected.

| 2019-03-26T09:58:58 | V0000000018 | vm-101-disk-0 | 2019-03-26T09:58:20 | 2.0TiB | 4.0MiB | valid | False | Ceph-Backup |

As you can see, the date is 2019-03-26 at 09:58:58, but when I run benji enforce it is not the same.

[root@ceph-backups ~]# /usr/local/benji/bin/benji enforce hours3 --dry-run
INFO: $ /usr/local/benji/bin/benji enforce hours3 --dry-run
WARNING: Version V0000000018: 1553623138.596453 isn't earlier than the reference time 1553611165.268765.

1553623138.596453 = Tue, 26 Mar 2019 13:58:58 -0400

Is there an option I need to set for Benji to properly calculate this epoch with my timezone? Benji is reporting the correct current epoch.
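The offset can be made visible by rendering the epoch value from the warning in different timezones (GNU date syntax; America/New_York is an assumption based on the -0400 offset quoted above):

```shell
epoch=1553623138

# 1553623138 is 17:58:58 UTC...
TZ=UTC date -d "@$epoch" '+%Y-%m-%dT%H:%M:%S %Z'

# ...which is 13:58:58 in a -0400 zone, matching the conversion quoted in
# the report, while the version table shows 09:58:58, a further 4 hours off.
TZ=America/New_York date -d "@$epoch" '+%Y-%m-%dT%H:%M:%S %Z'
```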

Benji Read Performance

We have been doing some more testing with Benji and seem to be running into an odd performance issue.

We are pointing Benji at a logical volume. It reads the logical volume at around 200 MB/s, but I know my storage has considerably more performance: for example, with two Benji backups running at the same time against the same storage, I can hit 400 MB/s. Is there anything I can do so that a single backup can achieve the same speed?

I have tried experimenting with the simultaneousReads and simultaneousWrites options, but they don't seem to have an effect: with those set to 15 I hit 200 MB/s, and set to 30 I still hit 200 MB/s.

The storage has almost no latency and is under very little load.

CentOS 7 Issues

Hey all, very cool project; I am hoping to make good use of it. I would like to get Benji working on CentOS 7, and based on the install steps it was pretty straightforward. However, I can't seem to get around this specific issue.

(benji) [root@ceph-backups backy2]# benji backup --snapshot-name 2019-03-20T10:37:1 --rbd-hints /tmp/vm-101-disk-0.diff rbd://Test/vm-101-disk-0 vm-101-disk-0
INFO: $ /usr/local/benji/bin/benji backup --snapshot-name 2019-03-20T10:37:1 --rbd-hints /tmp/vm-101-disk-0.diff rbd://Test/vm-101-disk-0 vm-101-disk-0
ERROR: Module file benji.io.rbd not found or related import error.

Am I missing something major?

deep-scrub performance regression

After the recent performance improvements especially for large backups, I see a hefty performance regression for deep-scrubbing.

We run:

benji batch-deep-scrub -p 15

each day (such that all backups are scrubbed roughly once per week).

Before the improvements, this ran from 06:47:25 to 09:19:07, i.e. less than 3 hours.
Now it takes from 07:01:17 to 12:06:50, i.e. more than 5 hours. It seems we also spend about 13 minutes more on the actual snapshot backups and the purging of old backups, but that is acceptable to lose on many small volumes if large volumes are faster (and the backups of larger volumes are indeed a bit faster).

Any ideas?
By now I think that running a full deep scrub over everything once a week, instead of 15% deep scrubs daily, might be more efficient, but this slowdown is still a bit peculiar.

Backup slowdown caused by blocking stderr output over a broken SSH connection

Hi,

I'm evaluating the backup solution. When backing up RBD to local file storage, the progress stopped here:

2019-02-05 02:08:22,932 [1354678] Backed up 10620/117943 blocks (9.0%)

The process is still there but does nothing; the same goes for the PostgreSQL session:

postgres 1354685  0.1  1.2 322188 103876 ?       Ss   00:58   0:31 postgres: 11/main: benji benji ::1(38032) idle in transaction

Running strace on the PID gives something like this:

[pid 1354689] epoll_wait(9,  <unfinished ...>
[pid 1354688] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid 1354687] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 1354688] <... clock_gettime resumed> {tv_sec=1794949, tv_nsec=712566362}) = 0
[pid 1354688] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid 1354686] futex(0x12f0044, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 1354688] <... clock_gettime resumed> {tv_sec=1794949, tv_nsec=712654055}) = 0
[pid 1354678] write(2, "   DEBUG: Queued block 11171 for"..., 70 <unfinished ...>
[pid 1354688] epoll_wait(6,  <unfinished ...>
[pid 1354699] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 1354699] clock_gettime(CLOCK_REALTIME, {tv_sec=1549351027, tv_nsec=750511022}) = 0
[pid 1354699] futex(0x185c0c8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 1354699] futex(0x185c140, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1549351029, tv_nsec=750511022}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 1354699] clock_gettime(CLOCK_REALTIME, {tv_sec=1549351029, tv_nsec=750843541}) = 0
[pid 1354699] futex(0x185c0c8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 1354699] futex(0x185c140, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1549351031, tv_nsec=750843541}, 0xffffffff <unfinished ...>
[pid 1354694] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 1354694] clock_gettime(CLOCK_REALTIME, {tv_sec=1549351030, tv_nsec=176449012}) = 0
[pid 1354694] clock_gettime(CLOCK_REALTIME, {tv_sec=1549351030, tv_nsec=176568263}) = 

Database migration fails with sqlite

Hi,

after adopting cb2447c, I get:

# benji database-migrate
    INFO: $ /opt/benji/bin/benji -c /etc/benji.yaml database-migrate
    INFO: Migrating from database schema revision fe79ce75cefa to 151248f94062.
   ERROR: NotImplementedError: This backend does not support multiple-table criteria within UPDATE

Using Encryption

I am testing out Benji's encryption.

Here is what I have in my config (master key replaced with xxx):

transforms:
  - name: encryption
    module: aes_256_gcm
    configuration:
      masterKey: xxxxxxxxxxxxx

I did a backup, then commented out the above in benji.yaml.

I kicked off a restore, and it seems to be restoring the data. Shouldn't it fail if encryption is enabled and the masterKey is not available?

Differential backups of Ceph images not always working

Following the steps on https://benji-backup.me/backup.html#examples to do differential backups works only sometimes; most of the time I encounter the following problem:

(benji) root@node3:/# rbd snap create ceph-ssd/vm-111-disk-1@backup1
(benji) root@node3:/# rbd diff --whole-object ceph-ssd/vm-111-disk-1@backup1 --format=json > /tmp/vm1.diff
(benji) root@node3:/# benji backup --snapshot-name backup1 --rbd-hints /tmp/vm1.diff rbd:ceph-ssd/vm-111-disk-1@backup1 vm1
INFO: $ /usr/local/benji/bin/benji backup --snapshot-name backup1 --rbd-hints /tmp/vm1.diff rbd:ceph-ssd/vm-111-disk-1@backup1 vm1
INFO: Backed up 1/15 blocks (6.7%)
INFO: Backed up 2/15 blocks (13.3%)
INFO: Backed up 3/15 blocks (20.0%)
INFO: Backed up 4/15 blocks (26.7%)
INFO: Backed up 5/15 blocks (33.3%)
INFO: Backed up 6/15 blocks (40.0%)
INFO: Backed up 7/15 blocks (46.7%)
INFO: Backed up 8/15 blocks (53.3%)
INFO: Backed up 9/15 blocks (60.0%)
INFO: Backed up 10/15 blocks (66.7%)
INFO: Backed up 11/15 blocks (73.3%)
INFO: Backed up 12/15 blocks (80.0%)
INFO: Backed up 13/15 blocks (86.7%)
INFO: Backed up 14/15 blocks (93.3%)
INFO: Backed up 15/15 blocks (100.0%)
INFO: Set status of version V0000000001 to valid.
INFO: Backed up metadata of version V0000000001.
INFO: New version V0000000001 created, backup successful.

(benji) root@node3:/# rbd snap create ceph-ssd/vm-111-disk-1@backup2
(benji) root@node3:/# rbd diff --whole-object ceph-ssd/vm-111-disk-1@backup2 --from-snap backup1 --format=json > /tmp/vm1.diff
(benji) root@node3:/# benji ls 'name == "vm1" and snapshot_name == "backup1"'
INFO: $ /usr/local/benji/bin/benji ls name == "vm1" and snapshot_name == "backup1"
+---------------------+-------------+------+---------------+--------+------------+--------+-----------+---------+
| date | uid | name | snapshot_name | size | block_size | status | protected | storage |
+---------------------+-------------+------+---------------+--------+------------+--------+-----------+---------+
| 2019-09-03T02:59:44 | V0000000001 | vm1 | backup1 | 1.0GiB | 4.0MiB | valid | False | usb |
+---------------------+-------------+------+---------------+--------+------------+--------+-----------+---------+
(benji) root@node3:/# benji backup --snapshot-name backup2 --rbd-hints /tmp/vm1.diff --base-version V0000000001 rbd:ceph-ssd/vm-111-disk-1@backup2 vm1
INFO: $ /usr/local/benji/bin/benji backup --snapshot-name backup2 --rbd-hints /tmp/vm1.diff --base-version V0000000001 rbd:ceph-ssd/vm-111-disk-1@backup2 vm1
WARNING: Hints are empty, assuming nothing has changed.
INFO: Starting sanity check with 0.1% of the ignored blocks.
ERROR: Source and backup don't match in regions outside of the ones indicated by the hints.
ERROR: Looks like the hints don't match or the source is different.
ERROR: Found wrong source data at block 4: offset 16777216, length 4194304
ERROR: InputDataError: Source changed in regions outside of ones indicated by the hints.
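A possible guard while this is unclear: rbd diff --format=json emits a bare empty array ([]) when nothing changed between the two snapshots, so a wrapper could detect that case and handle it explicitly instead of passing empty hints to benji. A minimal sketch (the file name and the simulated diff are stand-ins for the real rbd diff output):

```shell
diff_file=$(mktemp)

# Simulate `rbd diff ... --format=json > "$diff_file"` for an unchanged image:
printf '[]' > "$diff_file"

# Treat an empty extent list specially rather than handing benji empty hints.
if [ "$(cat "$diff_file")" = "[]" ]; then
    echo "empty diff: nothing changed since the base snapshot"
else
    echo "non-empty diff: passing --rbd-hints $diff_file"
fi

rm -f "$diff_file"
```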

Cleanup is terribly slow

Hello,

I'm using Benji with RBD and Backblaze, and I've made a simple test:

  1. Backed up two 25GB RBD volumes - took 45 minutes
  2. Executed benji cleanup to remove unused blocks from Backblaze - took 110 minutes

Summary: removing backups takes 2.5x more time than doing the backups.

Is there any chance of speeding up this process?

Input/output error accessing NBD

I've configured Benji to use a dedicated PostgreSQL database and storing backups in a Ceph radosgw using both compression and encryption. Backups and incremental backups complete without errors and I'm able to restore backups to new RBD images where a sha512sum matches the source and destination volumes.

Testing NBD exports is however unsuccessful, are there any caveats that I should be aware of?

Server:
(benji) [root@kvm1a ~]# benji nbd -a 192.168.1.60 -r
    INFO: $ /usr/local/benji/bin/benji nbd -a 192.168.1.60 -r
    INFO: Starting to serve NBD on 192.168.1.60:10809
    INFO: Incoming connection from 192.168.1.61:58222.
    INFO: [192.168.1.61:58222] Negotiated export: V0000000085.
    INFO: [192.168.1.61:58222] Version V0000000085 has been opened.
    INFO: [192.168.1.61:58222] Export is read only.
    INFO: Active transforms for storage radosgw: zstd (zstd), aes_256_gcm (aes_256_gcm).
   ERROR: [192.168.1.61:58222] 0 bytes read on a total of 28 expected bytes

Client:
[admin@kvm1b ~]# nbd-client -N V0000000085 192.168.1.60 -p 10809 -b 512 -t 10 /dev/nbd5
Negotiation: ..size = 10240MB
Connected /dev/nbd5
[admin@kvm1b ~]# parted /dev/nbd5
Warning: Error fsyncing/closing /dev/nbd5: Input/output error
Retry/Ignore? i
GNU Parted 3.2
Using /dev/nbd5
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Error: /dev/nbd5: unrecognised disk label
Model: Unknown (unknown)
Disk /dev/nbd5: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
(parted) q
Warning: Error fsyncing/closing /dev/nbd5: Input/output error
Retry/Ignore? i
[admin@kvm1b ~]# nbd-client -d /dev/nbd5

The size of the NBD block device appears to be correct...

The Benji backup I'm attempting to mount via NBD:
  (benji) [root@kvm1a ~]# benji ls | grep -e '+' -e name -e V0000000085
  INFO: $ /usr/local/benji/bin/benji ls
  +---------------------+-------------+------------------+-----------------------+---------+------------+--------+-----------+---------+
  | date                | uid         | name             | snapshot_name         | size    | block_size | status | protected | storage |
  +---------------------+-------------+------------------+-----------------------+---------+------------+--------+-----------+---------+
  | 2019-10-07T21:03:54 | V0000000085 | office-sip-disk0 | b-2019-10-07T21:03:49 | 10.0GiB | 4.0MiB     | valid  | False     | radosgw |
  +---------------------+-------------+------------------+-----------------------+---------+------------+--------+-----------+---------+
