
MongoDB Consistent Backup Tool - mongodb-consistent-backup

About

Creates cluster-consistent point-in-time backups of MongoDB with optional archiving, compression/de-duplication, encryption and upload functionality

The motivation for this tool is explained in this Percona blog post (more posts coming soon): "MongoDB Consistent Backups"

Note: Percona does not actively develop this tool since the release of [percona-backup-mongodb](https://github.com/percona/percona-backup-mongodb/) at the beginning of October 2019. This newer tool makes consistent backups for sharded clusters with the same dump and oplog-tailing procedures, but uses Go-based agents rather than a single Python process.

Features

  • Works on a single replset (2+ members) or a sharded cluster
  • Auto-discovers healthy members for backup by considering replication lag, replication 'priority' and by preferring 'hidden' members
  • Creates cluster-consistent backups across many separate shards
  • 'mongodump' is the default (and currently only) backup method. Other methods coming soon!
  • Transparent restore process (just add --oplogReplay flag to your mongorestore command)
  • Archiving and compression of backups (optional)
  • Block de-duplication and optional AES encryption at rest via ZBackup archiving method (optional)
  • AWS S3 Secure Multipart backup uploads (optional)
  • Google Cloud Storage Secure backup uploads (optional)
  • Rsync (over SSH) secure backup uploads (optional)
  • Nagios NSCA push notification support (optional)
  • Zabbix sender notification support (optional)
  • Modular backup, archiving, upload and notification components
  • Support for MongoDB Authentication and SSL database connections
  • Support for Read Preference Tags for selecting specific nodes for backup
  • mongodb+srv:// DNS Seedlist support
  • Rotation of backups by time or count
  • Multi-threaded, single executable
  • Auto-scales to number of available CPUs by default

Limitations

  • MongoDB Replication is required on all nodes (sharding config servers included)
  • The host running 'mongodb-consistent-backup' must have enough disk, network and CPU resources to back up all shards in parallel
  • When MongoDB authentication is used, the same user/password/authdb and role(s) must exist on all hosts

Requirements

  • MongoDB / Percona Server for MongoDB 3.2 and above with Replication enabled
  • Backup consistency depends on consistent server time across all hosts! Server time must be synchronized on all nodes using ntpd and a consistent time source, or a virtualization guest agent that syncs time
  • Must have 'mongodump' installed and specified if not at the default: /usr/bin/mongodump. Even if you do not run MongoDB 3.2+, it is strongly recommended to use MongoDB 3.2+ mongodump binaries due to their inline compression and parallelism features
  • Must have Python 2.7 installed

Releases

Pre-built release binaries and packages are available on our GitHub Releases Page. We recommend most users deploy mongodb_consistent_backup using these packages.

Build/Install

To build on CentOS/RedHat, you will need the following packages installed:

$ yum install python python-devel python-virtualenv gcc git make libffi-devel openssl-devel

To build a CentOS/RedHat RPM of the tool (recommended):

$ cd /path/to/mongodb_consistent_backup
$ yum install -y rpm-build
$ make rpm

To build and install from source (to default '/usr/local/bin/mongodb-consistent-backup'):

$ cd /path/to/mongodb_consistent_backup
$ make
$ make install

Use the PREFIX= variable to change the installation path (default: /usr/local), e.g. 'make PREFIX=/usr install' installs to '/usr/bin/mongodb-consistent-backup'.

MongoDB Authorization

If your replset/cluster uses Authentication, you must add a user with the "backup" and "clusterMonitor" built-in auth roles.

To create a user, execute the following (replace the 'pwd' field with a secure password!):

db.getSiblingDB("admin").createUser({
        user: "mongodb_consistent_backup",
        pwd: "PASSWORD-HERE",
        roles: [
                { role: "backup", db: "admin" },
                { role: "clusterMonitor", db: "admin" }
        ]
})

User and password are set using the 'user' and 'password' config-file fields, or via the '-u' and '-p' command-line flags (not recommended due to security concerns).

Run a Backup

Using Command-Line Flags

Note: username+password is visible in process lists when set using the command-line flags. Use a config file (below) to hide credentials!

$ mongodb-consistent-backup -H mongos1.example.com -P 27018 -u mongodb-consistent-backup -p s3cr3t -n prodwebsite -l /var/lib/mongodb-consistent-backup
...
...
$ ls /var/lib/mongodb-consistent-backup
prodwebsite

Using a Config File

The tool supports a YAML-based config file for settings. The config file is loaded first, and any command-line arguments override the file-based config settings.

$ mongodb-consistent-backup --config /etc/mongodb-consistent-backup.yml
...

An example (with comments) of the YAML-based config file is here: conf/mongodb-consistent-backup.example.conf.

A description of all available config settings can also be listed by passing the '--help' flag to the tool.
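As a starting point only, a minimal config might look like the sketch below. The 'production' environment key and the field names are taken from the config structure shown elsewhere on this page; verify them against conf/mongodb-consistent-backup.example.conf, and treat all values as placeholders:

```yaml
production:
  host: mongos1.example.com
  port: 27018
  user: mongodb_consistent_backup
  password: s3cr3t
  backup:
    method: mongodump
    name: prodwebsite
    location: /var/lib/mongodb-consistent-backup
  archive:
    method: tar
  upload:
    method: none
```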

Restore a Backup

The backups are mongorestore compatible and stored in a directory per backup. The --oplogReplay flag MUST be present to replay the oplogs to ensure consistency.

$ tar xfvz <shardname>.tar.gz
...
$ mongorestore --host mongod12.example.com --port 27017 -u admin -p 123456 --oplogReplay --gzip --dir /var/lib/mongodb-consistent-backup/default/20170424_0000/rs0/dump

Run as Docker Container

To persist logs, configs and backup data, three directories should be mapped into the Docker container.

The 'docker run' command -v/--volume flags in the examples below map container paths to paths on your Docker host. The examples assume there is a path on the Docker host named '/data/mongobackup' with 'data', 'conf' and 'logs' subdirectories mapped into the container. Replace any instance of '/data/mongobackup' below with a different path if necessary.

Note: store a copy of your mongodb-consistent-backup.conf in the 'conf' directory and pass its container path via the --config= flag if you wish to use config files.

Via Docker Hub

$ mkdir -p /data/mongobackup/{conf,data,logs}
$ cp -f /path/to/mongodb-consistent-backup.conf /data/mongobackup/conf
$ docker run -it \
    -v "/data/mongobackup/conf:/conf:Z" \
    -v "/data/mongobackup/data:/var/lib/mongodb-consistent-backup:Z" \
    -v "/data/mongobackup/logs:/var/log/mongodb-consistent-backup:Z" \
  perconalab/mongodb_consistent_backup:latest --config=/conf/mongodb-consistent-backup.conf

Build and Run Docker Image

$ cd /path/to/mongodb_consistent_backup
$ make docker
$ mkdir -p /data/mongobackup/{conf,data,logs}
$ cp -f /path/to/mongodb-consistent-backup.conf /data/mongobackup/conf
$ docker run -it \
    -v "/data/mongobackup/conf:/conf:Z" \
    -v "/data/mongobackup/data:/var/lib/mongodb-consistent-backup:Z" \
    -v "/data/mongobackup/logs:/var/log/mongodb-consistent-backup:Z" \
  mongodb_consistent_backup --config=/conf/mongodb-consistent-backup.conf

ZBackup Archiving (Optional)

Note: the ZBackup archive method is not yet compatible with the 'Upload' phase. Disable uploading by setting 'upload.method' to 'none' in the meantime.

ZBackup (with LZMA compression) is an optional archive method for mongodb_consistent_backup. This archive method significantly reduces disk usage for backups via de-duplication and compression.

ZBackup offers block de-duplication and compression of backups and optionally supports AES-128 (CBC mode with PKCS#7 padding) encryption at rest. The ZBackup archive method causes backups to be stored via ZBackup at archive time.

To enable it, ZBackup must be installed on your system and the 'archive.method' config-file variable (or the --archive.method= command-line flag) must be set to 'zbackup'.

ZBackup's compression is most efficient when compression is disabled in the backup phase; to do this, set 'backup.<method>.compression' to 'none'.
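Putting these settings together, a hypothetical config fragment for ZBackup archiving could look like the following (field names inferred from the settings named in this section; verify against the bundled example config):

```yaml
production:
  backup:
    method: mongodump
    mongodump:
      compression: none    # let ZBackup's LZMA handle compression
  archive:
    method: zbackup
  upload:
    method: none           # ZBackup is not yet compatible with the Upload phase
```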

Install on CentOS/RHEL

$ yum install zbackup

Install on Debian/Ubuntu

$ apt-get install zbackup

Get Backup from ZBackup

ZBackup data is stored in a storage directory named 'mongodb_consistent_backup-zbackup' and must be restored using a 'zbackup restore ...' command.

$ zbackup restore --password-file /etc/zbackup.passwd /mnt/backup/default/mongodb_consistent_backup-zbackup/backups/20170424_0000.tar | tar -xf -

Delete Backup from ZBackup

To remove a backup, first delete its .tar file in the 'backups' subdirectory of the ZBackup storage directory. Afterwards, run a 'zbackup gc full' garbage collection to remove unused data.

$ rm -f /mnt/backup/default/mongodb_consistent_backup-zbackup/backups/20170424_0000.tar
$ zbackup gc full --password-file /etc/zbackup.passwd /mnt/backup/default/mongodb_consistent_backup-zbackup

Submitting Code

  • Submitted code must pass Python 'flake8' checks. Run 'make flake8' to test.
  • To make review easier, pull requests must address and solve one problem at a time.

mongodb_consistent_backup's People

Contributors

akira-kurogane, box9527, corey-hammerton, dbmurphy, delgod, dschneller, genuss, islue, jessewiles, k0te, maulal, omasana, timvaillancourt, warp3r

mongodb_consistent_backup's Issues

Incorrect count of secondaries in Replset.py

In this line, a secondary is counted as a replSet member only if it has a better score than the previous one.
https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/MongoBackup/Replset.py#L113

If there are more than three members in the replSet, it will randomly raise "Not enough secondaries in replset %s to safely take backup!"

I don't know whether it is intended for secondaries with too large a lag to be counted towards the QUORUM (https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/MongoBackup/Replset.py#L119) or not (https://github.com/Percona-Lab/mongodb_consistent_backup/blob/master/MongoBackup/Replset.py#L109)

Support replicaset-based config servers properly

The code currently assumes config servers are not a replset and there is no Oplog/Tail.py thread opened for it.

Fix: add detection for replicaset-based config servers and dump them using the same method as regular shards. If non-replicaset, dump the config server after all shard dumps complete.

1.0.3: Setting replication.max_lag_secs in YAML-config causes Replset:find_secondary to have score '0' for all nodes

When setting the field in the config file on release 1.0.3:

production:
  ...
  replication:
    max_lag_secs: <num>
  ...

All nodes receive a score of '0' and are then chosen at random. This is incorrect as (of course) the scoring of secondaries should not be zero.

EDIT: this may be a type problem, with the int being treated as a string. Check that the config value becomes an int that the code is able to compare.
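If the suspected cause is right (the YAML value arriving as the string '5' rather than the integer 5), a small coercion step on load would guard against it. A hypothetical sketch, not the tool's actual code (the function name is invented):

```python
def normalize_max_lag_secs(value, default=5):
    # Config parsing can hand back '5' (str) instead of 5 (int); lag
    # comparisons against an int then misbehave. Coerce early and fail
    # loudly on garbage instead of silently scoring every node as 0.
    if value is None:
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError("replication.max_lag_secs must be an integer, got: %r" % (value,))
```

With this in place, normalize_max_lag_secs('10') yields the int 10, which compares correctly against a measured lag in seconds.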

Slash in collection names makes the backup fail

Hi

Thanks for the awesome tool. I am trying to use it via Docker. It fails:

[2017-06-14 21:17:02,893] [INFO] [MongodumpThread-2] [MongodumpThread:wait:92] xxxx-prod-mongo/ip-192-168-160-21.ec2.internal:27000: Failed: "abc.abc_allows_/api/accounts/list/:userId" contains a path separator '/' and can't be dumped to the filesystem

The same error occurs if I try to use mongodump directly, but if we give mongodump --archive=xxxx.dump, then it succeeds.

I don't see any way of passing this parameter while using the tool here. How can we add more config options, like --archive=path_to_dump, so that this tool can also work?

Not logging storage location

During a replica set backup, all mentions of where the oplog and data are stored are empty, like below:

[2017-05-18 10:08:49,713] [INFO] [MongodumpThread-2] [MongodumpThread:wait:92] nosql15/dvnosql02-prd:27017:     writing captured oplog to
[2017-05-18 10:08:49,728] [INFO] [MongodumpThread-2] [MongodumpThread:run:161] Backup nosql15/dvnosql02-prd:27017 completed in 0.15 seconds, 5 oplog changes, end ts: Timestamp(1495094929, 35)

'NoneType' object is not callable

I receive the following error when trying to compile:

Could not satisfy all requirements for multiprocessing:
multiprocessing
Exception in thread Thread-2 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
File "/usr/local/lib/python2.7/threading.py", line 754, in run
File "~/mongodb_consistent_backup/build/venv/lib/python2.7/site-packages/pex/crawler.py", line 144, in execute
File "/usr/local/lib/python2.7/Queue.py", line 172, in get
<type 'exceptions.TypeError'>: 'NoneType' object is not callable

Prefer hidden secondary hosts for backups

It seems the backup tool will pick the replset member which is most in-sync even if there is a hidden secondary.

If it's within safe replication lag limits, we should probably prefer hidden nodes slightly to cause less impact to a live DB setup

Use 'selected secondaries' list consistently in Sharded backup mode

Currently ReplsetHandlerSharded is called twice in a backup: once in Oplog/Tailer.py and once in Mongodumper.py. Calling this twice can lead to different results and break the backup.

Fix: call ReplsetHandlerSharded from Backup.py and pass the same object/result down to Oplog/Tailer.py and Mongodumper.py.
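The proposed fix could be sketched roughly as follows; the class and function names here are simplified stand-ins for the real modules, not the actual code:

```python
class Tailer(object):
    def __init__(self, secondaries):
        self.secondaries = secondaries  # shared selection, not re-resolved

class Mongodumper(object):
    def __init__(self, secondaries):
        self.secondaries = secondaries

def run_backup(choose_secondaries):
    # Resolve the secondaries exactly once, then pass the same object to
    # both consumers so the tailer and the dumper can never disagree.
    secondaries = choose_secondaries()
    return Tailer(secondaries), Mongodumper(secondaries)
```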

Spec file needs updating

The spec file references some old locations and should be updated for the new paths of this repo.

build script error

Hi,

When I try to build the project, I receive an error from the scripts/build.sh file at this line:

${venvdir}/bin/python2.7 ${venvdir}/bin/pip install --download-cache=${pipdir} pex requests

I changed the pip install option to:

--cache-dir

And build process finished correctly.

Could it be that the pip version assumed by the build script is too old? My current version of Python is 2.7.13 and pip is 9.0.1.

I do not know if the build script is correct or not.

There is no cleanup stage if the mongodump executable is not found

Here is the result of a sample run:

$ mongodb-consistent-backup -n ff -l ~/backup_dir/ -P 21021
[2017-07-27 12:47:29,771] [INFO] [MainProcess] [Main:init:143] Starting mongodb-consistent-backup version 1.0.3 (git commit: 90e7f2fd9cb58bcbdc63bf0d8640e3c7f411e7f0)
[2017-07-27 12:47:29,771] [INFO] [MainProcess] [Main:init:144] Loaded config: {"archive": {"method": "tar", "tar": {"compression": "gzip"}, "zbackup": {"binary": "/usr/bin/zbackup", "cache_mb": 128, "compression": "lzma"}}, "authdb": "admin", "backup": {"location": "/home/shahriyar.rzaev/backup_dir/", "method": "mongodump", "mongodump": {"binary": "/usr/bin/mongodump", "compression": "auto"}, "name": "ff"}, "environment": "production", "host": "localhost", "lock_file": "/tmp/mongodb-consistent-backup.lock", "notify": {"method": "none"}, "oplog": {"compression": "none", "tailer": {"status_interval": 30}}, "port": 21021, "replication": {"max_lag_secs": 5, "max_priority": 1000}, "sharding": {"balancer": {"ping_secs": 3, "wait_secs": 300}}, "upload": {"method": "none", "s3": {"chunk_size_mb": 50, "region": "us-east-1", "retries": 5, "secure": true, "threads": 4}}}
[2017-07-27 12:47:29,771] [INFO] [MainProcess] [Stage:init:32] Notify stage disabled, skipping
[2017-07-27 12:47:29,774] [INFO] [MainProcess] [State:init:135] Initializing root state directory /home/shahriyar.rzaev/backup_dir/ff
[2017-07-27 12:47:29,775] [INFO] [MainProcess] [State:load_backups:153] Found 0 existing completed backups for set
[2017-07-27 12:47:29,775] [INFO] [MainProcess] [State:init:119] Initializing backup state directory: /home/shahriyar.rzaev/backup_dir/ff/20170727_1247
[2017-07-27 12:47:29,776] [INFO] [MainProcess] [Stage:init:32] Upload stage disabled, skipping
[2017-07-27 12:47:29,776] [INFO] [MainProcess] [Main:run:268] Running backup in replset mode using seed node(s): localhost:21021
[2017-07-27 12:47:29,863] [CRITICAL] [MainProcess] [Mongodump:can_gzip:60] Cannot find or execute the mongodump binary file /usr/bin/mongodump!

But if it could not connect to MongoDB, there is a cleanup stage:

$ mongodb-consistent-backup -n ff -l /home/shahriyar.rzaev/backup_dir/
[2017-07-27 12:34:44,318] [INFO] [MainProcess] [Main:init:143] Starting mongodb-consistent-backup version 1.0.3 (git commit: 90e7f2fd9cb58bcbdc63bf0d8640e3c7f411e7f0)
[2017-07-27 12:34:44,318] [INFO] [MainProcess] [Main:init:144] Loaded config: {"archive": {"method": "tar", "tar": {"compression": "gzip"}, "zbackup": {"binary": "/usr/bin/zbackup", "cache_mb": 128, "compression": "lzma"}}, "authdb": "admin", "backup": {"location": "/home/shahriyar.rzaev/backup_dir/", "method": "mongodump", "mongodump": {"binary": "/usr/bin/mongodump", "compression": "auto"}, "name": "ff"}, "environment": "production", "host": "localhost", "lock_file": "/tmp/mongodb-consistent-backup.lock", "notify": {"method": "none"}, "oplog": {"compression": "none", "tailer": {"status_interval": 30}}, "port": 27017, "replication": {"max_lag_secs": 5, "max_priority": 1000}, "sharding": {"balancer": {"ping_secs": 3, "wait_secs": 300}}, "upload": {"method": "none", "s3": {"chunk_size_mb": 50, "region": "us-east-1", "retries": 5, "secure": true, "threads": 4}}}
[2017-07-27 12:34:44,318] [INFO] [MainProcess] [Stage:init:32] Notify stage disabled, skipping
[2017-07-27 12:34:49,330] [ERROR] [MainProcess] [DB:connect:47] Unable to connect to localhost:27017! Error: localhost:27017: [Errno 111] Connection refused
[2017-07-27 12:34:49,331] [CRITICAL] [MainProcess] [Main:exception:218] Cannot connect to seed host(s): localhost:27017
[2017-07-27 12:34:49,331] [INFO] [MainProcess] [Main:cleanup_and_exit:174] Starting cleanup procedure! Stopping running threads
[2017-07-27 12:34:49,334] [INFO] [MainProcess] [Main:cleanup_and_exit:204] Cleanup complete, exiting

Some of the oplog tailers never stop

Here is a piece of the logs. No warning or error was captured.

[2017-06-03 16:42:55,634] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 176740 oplog changes, ts: Timestamp(1496508174, 25)
[2017-06-03 16:43:01,184] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1123373 oplog changes, ts: Timestamp(1496508180, 48)
[2017-06-03 16:43:26,595] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 178661 oplog changes, ts: Timestamp(1496508205, 35)
[2017-06-03 16:43:32,116] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1134272 oplog changes, ts: Timestamp(1496508211, 64)
[2017-06-03 16:43:56,693] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 180553 oplog changes, ts: Timestamp(1496508235, 46)
[2017-06-03 16:44:02,942] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1144761 oplog changes, ts: Timestamp(1496508241, 310)
[2017-06-03 16:44:26,900] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 182454 oplog changes, ts: Timestamp(1496508265, 56)
[2017-06-03 16:44:33,736] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1155331 oplog changes, ts: Timestamp(1496508272, 217)
[2017-06-03 16:44:56,900] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 184355 oplog changes, ts: Timestamp(1496508296, 15)
[2017-06-03 16:45:04,596] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1165687 oplog changes, ts: Timestamp(1496508303, 219)
[2017-06-03 16:45:27,104] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 186225 oplog changes, ts: Timestamp(1496508326, 8)
[2017-06-03 16:45:35,380] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1175690 oplog changes, ts: Timestamp(1496508334, 143)
[2017-06-03 16:45:57,339] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 188117 oplog changes, ts: Timestamp(1496508356, 30)
[2017-06-03 16:46:06,226] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1185682 oplog changes, ts: Timestamp(1496508365, 125)
[2017-06-03 16:46:27,434] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 189992 oplog changes, ts: Timestamp(1496508386, 35)
[2017-06-03 16:46:37,111] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1196267 oplog changes, ts: Timestamp(1496508396, 34)
[2017-06-03 16:46:57,555] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 191871 oplog changes, ts: Timestamp(1496508416, 43)
[2017-06-03 16:47:07,986] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1206821 oplog changes, ts: Timestamp(1496508427, 1)
[2017-06-03 16:47:27,650] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 193755 oplog changes, ts: Timestamp(1496508446, 57)
[2017-06-03 16:47:38,744] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1217278 oplog changes, ts: Timestamp(1496508457, 294)
[2017-06-03 16:47:57,678] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 195634 oplog changes, ts: Timestamp(1496508476, 52)
[2017-06-03 16:48:09,573] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1228031 oplog changes, ts: Timestamp(1496508488, 220)
[2017-06-03 16:48:27,820] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 197515 oplog changes, ts: Timestamp(1496508506, 64)
[2017-06-03 16:48:40,427] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1237997 oplog changes, ts: Timestamp(1496508519, 123)
[2017-06-03 16:48:43,486] [INFO] [MongodumpThread-5] [MongodumpThread:wait:92] ps-rs1/ip-10-1-15-43:27017:      [####....................]  .oplog  196708/1017579  (19.3%)
[2017-06-03 16:48:46,486] [INFO] [MongodumpThread-5] [MongodumpThread:wait:92] ps-rs1/ip-10-1-15-43:27017:      [###############.........]  .oplog  640370/1017579  (62.9%)
[2017-06-03 16:48:49,486] [INFO] [MongodumpThread-5] [MongodumpThread:wait:92] ps-rs1/ip-10-1-15-43:27017:      [########################]  .oplog  1065050/1017579  (104.7%)
[2017-06-03 16:48:50,471] [INFO] [MongodumpThread-5] [MongodumpThread:wait:92] ps-rs1/ip-10-1-15-43:27017:      [########################]  .oplog  1180521/1017579  (116.0%)
[2017-06-03 16:48:58,662] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 199453 oplog changes, ts: Timestamp(1496508537, 48)
[2017-06-03 16:49:00,705] [INFO] [MongodumpThread-5] [MongodumpThread:run:161] Backup ps-rs1/ip-10-1-15-43:27017 completed in 2936.25 seconds, 1180521 oplog changes, end ts: Timestamp(1496508522, 35)
[2017-06-03 16:49:11,341] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1248646 oplog changes, ts: Timestamp(1496508550, 148)
[2017-06-03 16:49:28,739] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 201349 oplog changes, ts: Timestamp(1496508567, 59)
[2017-06-03 16:49:42,211] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1259175 oplog changes, ts: Timestamp(1496508581, 35)
[2017-06-03 16:49:59,717] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 203345 oplog changes, ts: Timestamp(1496508598, 90)
[2017-06-03 16:50:13,149] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1269164 oplog changes, ts: Timestamp(1496508612, 50)
[2017-06-03 16:50:29,805] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 205246 oplog changes, ts: Timestamp(1496508628, 96)
[2017-06-03 16:50:43,957] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1279387 oplog changes, ts: Timestamp(1496508642, 309)
[2017-06-03 16:50:55,486] [INFO] [MongodumpThread-6] [MongodumpThread:wait:92] ps-rs0/ip-10-1-15-239:27017:     [#######.................]  .oplog  361056/1120609  (32.2%)
[2017-06-03 16:50:58,486] [INFO] [MongodumpThread-6] [MongodumpThread:wait:92] ps-rs0/ip-10-1-15-239:27017:     [#################.......]  .oplog  797852/1120609  (71.2%)
[2017-06-03 16:50:59,961] [INFO] [TailThread-4] [TailThread:status:61] Oplog tailer ps-rsc/ip-10-1-12-46:27019 status: 207130 oplog changes, ts: Timestamp(1496508658, 109)
[2017-06-03 16:51:01,486] [INFO] [MongodumpThread-6] [MongodumpThread:wait:92] ps-rs0/ip-10-1-15-239:27017:     [########################]  .oplog  1174436/1120609  (104.8%)
[2017-06-03 16:51:02,356] [INFO] [MongodumpThread-6] [MongodumpThread:wait:92] ps-rs0/ip-10-1-15-239:27017:     [########################]  .oplog  1281210/1120609  (114.3%)
[2017-06-03 16:51:13,938] [INFO] [MongodumpThread-6] [MongodumpThread:run:161] Backup ps-rs0/ip-10-1-15-239:27017 completed in 3069.48 seconds, 1281210 oplog changes, end ts: Timestamp(1496508652, 197)
[2017-06-03 16:51:14,722] [INFO] [TailThread-3] [TailThread:status:61] Oplog tailer ps-rs0/ip-10-1-15-239:27017 status: 1289044 oplog changes, ts: Timestamp(1496508673, 255)
[2017-06-03 16:51:17,912] [INFO] [MainProcess] [Mongodump:wait:92] All mongodump backups completed successfully
[2017-06-03 16:51:17,913] [INFO] [MainProcess] [Stage:run:92] Completed running stage mongodb_consistent_backup.Backup with task Mongodump in 3073.48 seconds
[2017-06-03 16:51:17,914] [INFO] [MainProcess] [Tailer:stop:72] Stopping all oplog tailers
[2017-06-03 16:51:17,942] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:18,443] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:18,944] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:19,446] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:19,947] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:20,448] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:20,949] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:21,450] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:21,952] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:22,453] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:22,954] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:23,455] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:23,956] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:24,457] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:24,959] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:25,460] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:25,961] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:26,462] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:26,963] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:27,464] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:27,966] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:28,467] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:28,968] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:29,469] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:29,970] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)
[2017-06-03 16:51:30,472] [INFO] [MainProcess] [Tailer:stop:94] Waiting for ps-rs1/ip-10-1-15-43:27017 tailer to reach ts: Timestamp(1496508677, 234), currrent: Timestamp(1496506248, 198)

Should error on non-replset/non-cluster systems

We assume that if we are not connecting to a mongos, we must be connecting to a replica set. Instead of assuming, we should detect when the tool is running against a standalone mongod and, if so, abort with a warning that a consistent backup is not possible without an oplog (which only exists when running as a replica set or shard).
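
A minimal sketch of such a check (helper names hypothetical), operating on the document returned by the isMaster command:

```python
def node_type(ismaster_doc):
    # Classify a node from the result of the isMaster command:
    # a mongos reports msg='isdbgrid', a replica-set member reports
    # its 'setName', and anything else is a standalone mongod.
    if ismaster_doc.get('msg') == 'isdbgrid':
        return 'mongos'
    if 'setName' in ismaster_doc:
        return 'replset'
    return 'standalone'

def assert_backup_possible(ismaster_doc):
    # A standalone mongod has no oplog, so a consistent point-in-time
    # backup is impossible: abort early with a clear error.
    if node_type(ismaster_doc) == 'standalone':
        raise RuntimeError(
            "Host is a standalone mongod; a consistent backup requires "
            "an oplog (only present on replica sets and shards)")
```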

check_balancer_running method never returns False on 3.4

We have a small MongoDB 3.4 sharded cluster where we have been testing MongoDB Consistent Backup.
After configuring the YAML config file we were able to start the operation, but we only got this repeated message before a timeout kicked in:

[2017-05-12 13:59:44,684] [INFO] [MainProcess] [Sharding:stop_balancer:113] Balancer is still running, sleeping for 3 sec(s)

Having a look at the code, it seems that check_balancer_running() never returns False, because config.locks.state is always 2 in 3.4. The MongoDB documentation states:

Changed in version 3.4: As of version 3.4, the state field will always have a value 2 to prevent any legacy mongos instances from performing the balancing operation. The when field specifies the time when the config server member became the primary.

check_balancer_running(), however, looks for state 0 to determine that the balancer is disabled:

    def check_balancer_running(self):
        try:
            config = self.connection['config']
            lock   = config['locks'].find_one({'_id': 'balancer'})
            if 'state' in lock and int(lock['state']) == 0:
                return False
            return True
        except Exception, e:
            raise DBOperationError(e)

Could you please have a look?
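
One possible version-aware fix, sketched here as a pure function over the relevant config documents (not the project's actual patch): on 3.4+ the balancer's on/off state is reflected by the 'stopped' field of the config.settings balancer document rather than the lock state.

```python
def balancer_running(server_version, lock_doc, settings_doc):
    # lock_doc:     config.locks document with _id 'balancer' (or None)
    # settings_doc: config.settings document with _id 'balancer' (or None)
    if server_version >= (3, 4):
        # On 3.4+ the lock 'state' is pinned at 2, so consult the
        # 'stopped' flag set by sh.stopBalancer() instead.
        return not (settings_doc and settings_doc.get('stopped'))
    # Pre-3.4: lock state 0 means no balancer round is in progress.
    if lock_doc and 'state' in lock_doc and int(lock_doc['state']) == 0:
        return False
    return True
```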

Reuse DB connections in Main thread

In version 0.2.0 we do not always reuse MongoDB connections. DB connections cannot be reused across threads (they must be reopened in each thread), but there are many connections in the "MainProcess" thread that could be reused by passing a DB class from Backup.py to ReplsetHandler.py and ShardingHandler.py.

Here is the DB:connect debug output for a 2-shard cluster (11 DB connections created!):

[tim@centos7 mongodb_consistent_backup]$ grep 'DB:connect' log 
[2016-07-08 19:41:40,493] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongos1:27017
[2016-07-08 19:41:40,500] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongos1:27017
[2016-07-08 19:41:50,602] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongos1:27017
[2016-07-08 19:41:50,612] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongod1:27017
[2016-07-08 19:41:50,626] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongod3:27017
[2016-07-08 19:41:50,836] [DEBUG] [OplogTail-1] [DB:connect:23] Getting MongoDB connection to centos7-mongod2:27017
[2016-07-08 19:41:50,839] [DEBUG] [OplogTail-2] [DB:connect:23] Getting MongoDB connection to centos7-mongod4:27017
[2016-07-08 19:41:50,856] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongos1:27017
[2016-07-08 19:41:50,884] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongos1:27017
[2016-07-08 19:41:50,890] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongod1:27017
[2016-07-08 19:41:50,905] [DEBUG] [MainProcess] [DB:connect:23] Getting MongoDB connection to centos7-mongod3:27017
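
A sketch of the suggested refactor (class and method names hypothetical): Backup.py constructs one connection holder and hands it to the handlers instead of each opening its own. The factory argument would be pymongo.MongoClient in real use; it is injectable here so the idea can be demonstrated without a server.

```python
class SharedDB(object):
    def __init__(self, host, port, connection_factory):
        self.host = host
        self.port = port
        self._factory = connection_factory
        self._conn = None

    def connection(self):
        # Open one connection lazily and hand the same object to every
        # caller in this thread. Worker threads/processes must still
        # open their own connections (connections cannot cross threads).
        if self._conn is None:
            self._conn = self._factory(self.host, self.port)
        return self._conn
```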

Cannot authenticate with a user - config.username != config.user

In mongodb_consistent_backup/Common/Config.py you're assigning the user to config.user; however, the DB class (and other places) attempts to access config.username.

The most likely fix would be in Config.py:

-        parser.add_argument("-u", "--user", dest="user", help="MongoDB Authentication Username (for optional auth)", type=str)
+        parser.add_argument("-u", "--user", dest="username", help="MongoDB Authentication Username (for optional auth)", type=str)

Support nsca crypto

Allow configuration of pynsca crypto mode.
Include python-mcrypt and pycrypto in requirements.txt.
Don't assume mode 16 just because a password is defined.

No option for backing up from primary

We have a use case where the backup needs to be taken in the data center hosting the replica set primaries, while the secondaries are located on other continents.
I read through the documentation, but I didn't find any option to specify primary reads. Apparently this option is statically passed to mongodump:

mongodump_flags.extend(["--readPreference=secondary"])

in MongodumpThread.py, along with a secondary hostname.
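
A sketch of what a configurable alternative could look like (a hypothetical backup.mongodump.read_preference option, not an existing flag of the tool):

```python
def build_mongodump_flags(host, port, read_pref='secondary'):
    # read_pref would come from config (e.g. 'primary' for the use case
    # above) instead of the hard-coded '--readPreference=secondary'.
    flags = ['--host', host, '--port', str(port)]
    flags.append('--readPreference=%s' % read_pref)
    return flags
```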

Any plan to add flexibility on this aspect?

Thanks

Should replication lag be calculated from heartbeat metrics?

Currently I use mongodb_consistent_backup as a backup tool for my config-server replica set, where updates are very rare. The tool calculates lag using the last oplog operation time, which in my case often causes false-positive reports of too-high replication lag. Would it be more sensible to use the lastHeartbeat* metrics, or am I missing something here?
As an example, please see the attached metrics from one of my secondary replicas:
Note the oplog lag goes up to 10 seconds although the heartbeat lag doesn't exceed 2 seconds (my heartbeatInterval).
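
The idea can be sketched as an idle-aware lag calculation (illustrative only, not the tool's implementation): if the secondary already holds the primary's latest optime, report zero lag, and otherwise let a fresh heartbeat bound the estimate.

```python
from datetime import datetime, timedelta

def replication_lag(primary_optime, secondary_optime, heartbeat_age=None):
    # On an idle replica set the last oplog entry is old on every member;
    # matching optimes therefore mean zero lag, not a stale secondary.
    if secondary_optime >= primary_optime:
        return timedelta(0)
    lag = primary_optime - secondary_optime
    if heartbeat_age is not None and heartbeat_age < lag:
        # Heartbeats are fresh, so the member is reachable and likely
        # just idle; prefer the smaller heartbeat-based bound.
        return heartbeat_age
    return lag
```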

error: 'utf8' codec can't decode byte 0xa1

$ grep TailThread-2 /var/log/mongodb-consistent-backup/backup.stats.20170525_0557.log
...
[2017-05-25 07:09:30,587] [INFO] [TailThread-2] [TailThread:status:60] Oplog tailer rs0/ip-10-1-16-127:27017 status: 1227990 oplog changes, ts: Timestamp(1495696169, 6)
[2017-05-25 07:10:01,011] [INFO] [TailThread-2] [TailThread:status:60] Oplog tailer rs0/ip-10-1-16-127:27017 status: 1233520 oplog changes, ts: Timestamp(1495696200, 2)
[2017-05-25 07:10:22,763] [CRITICAL] [TailThread-2] [TailThread:run:102] Tailer rs0/ip-10-1-16-127:27017 error: 'utf8' codec can't decode byte 0xa1 in position 0: invalid start byte

Sorry, the 'utf8' codec issue again.
I have checked the surrounding code but can't find anything calling decode. Any ideas?

Did not use gzip compression

[root@localhost prodwebsite]# mongodb-consistent-backup -H 127.0.0.1 -P 27017 -n prodwebsite --archive.tar.threads 2 --archive.tar.compression gzip -l /home/data/db
[2017-08-14 05:40:34,549] [INFO] [MainProcess] [Main:init:144] Starting mongodb-consistent-backup version 1.1.0 (git commit: 34818a2)
[2017-08-14 05:40:34,549] [INFO] [MainProcess] [Main:init:145] Loaded config: {"archive": {"method": "tar", "tar": {"compression": "gzip", "threads": 2}, "zbackup": {"binary": "/usr/bin/zbackup", "cache_mb": 128, "compression": "lzma"}}, "authdb": "admin", "backup": {"location": "/home/data/db", "method": "mongodump", "mongodump": {"binary": "/usr/bin/mongodump", "compression": "auto"}, "name": "prodwebsite"}, "environment": "production", "host": "127.0.0.1", "lock_file": "/tmp/mongodb-consistent-backup.lock", "notify": {"method": "none"}, "oplog": {"compression": "none", "flush": {"max_docs": 1000, "max_secs": 1}, "tailer": {"enabled": "true", "status_interval": 30}}, "port": 27017, "replication": {"max_lag_secs": 10, "max_priority": 1000}, "sharding": {"balancer": {"ping_secs": 3, "wait_secs": 300}}, "upload": {"gs": {"threads": 4}, "method": "none", "s3": {"chunk_size_mb": 50, "region": "us-east-1", "retries": 5, "secure": true, "threads": 4}}}
[2017-08-14 05:40:34,549] [INFO] [MainProcess] [Stage:init:32] Notify stage disabled, skipping
[2017-08-14 05:40:34,553] [INFO] [MainProcess] [State:init:135] Initializing root state directory /home/data/db/prodwebsite
[2017-08-14 05:40:34,554] [INFO] [MainProcess] [State:load_backups:153] Found 4 existing completed backups for set
[2017-08-14 05:40:34,554] [INFO] [MainProcess] [State:init:119] Initializing backup state directory: /home/data/db/prodwebsite/20170814_0540
[2017-08-14 05:40:34,556] [INFO] [MainProcess] [Stage:init:32] Upload stage disabled, skipping
[2017-08-14 05:40:34,556] [INFO] [MainProcess] [Main:run:269] Running backup in replset mode using seed node(s): 127.0.0.1:27017
[2017-08-14 05:40:34,561] [INFO] [MainProcess] [Main:run:297] Backup method supports compression, disabling compression in archive step
[2017-08-14 05:40:34,562] [INFO] [MainProcess] [Task:compression:38] Setting Tar compression method: none
[2017-08-14 05:40:34,562] [INFO] [MainProcess] [Stage:run:83] Running stage mongodb_consistent_backup.Backup with task: Mongodump
[2017-08-14 05:40:34,564] [INFO] [MainProcess] [Replset:find_primary:153] Found PRIMARY: rs0/127.0.0.1:27018 with optime Timestamp(1502700966, 4)
[2017-08-14 05:40:34,565] [INFO] [MainProcess] [Replset:find_secondary:229] Found SECONDARY rs0/127.0.0.1:27017: {'priority': 1, 'lag': 0, 'optime': Timestamp(1502700966, 4), 'score': 100}
[2017-08-14 05:40:34,565] [INFO] [MainProcess] [Replset:find_secondary:229] Found SECONDARY rs0/127.0.0.1:27019: {'priority': 1, 'lag': 0, 'optime': Timestamp(1502700966, 4), 'score': 100}
[2017-08-14 05:40:34,565] [INFO] [MainProcess] [Replset:find_secondary:239] Choosing SECONDARY rs0/127.0.0.1:27017 for replica set rs0 (score: 100)
[2017-08-14 05:40:34,567] [WARNING] [MainProcess] [Mongodump:threads:126] Threading unsupported by mongodump version 3.0.4. Use mongodump 3.2.0 or greater to enable per-dump threading.
[2017-08-14 05:40:34,567] [WARNING] [MainProcess] [Mongodump:threads:126] Threading unsupported by mongodump version 3.0.4. Use mongodump 3.2.0 or greater to enable per-dump threading.
[2017-08-14 05:40:34,567] [INFO] [MainProcess] [Mongodump:run:158] Starting backups using mongodump 3.0.4 (options: threads_per_dump=None, mongodump=3.0.4, git=efe71bf185cdcfe9632f1fc2e42ca4e895f93269, compression=auto)
[2017-08-14 05:40:34,572] [INFO] [MongodumpThread-2] [MongodumpThread:run:140] Starting mongodump backup of rs0/127.0.0.1:27017
[2017-08-14 05:40:34,580] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: writing admin.system.indexes to /home/data/db/prodwebsite/20170814_0540/rs0/dump/admin/system.indexes.bson
[2017-08-14 05:40:34,582] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: writing admin.system.users to /home/data/db/prodwebsite/20170814_0540/rs0/dump/admin/system.users.bson
[2017-08-14 05:40:34,583] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: writing admin.system.users metadata to /home/data/db/prodwebsite/20170814_0540/rs0/dump/admin/system.users.metadata.json
[2017-08-14 05:40:34,584] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: done dumping admin.system.users
[2017-08-14 05:40:34,585] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: writing admin.system.version to /home/data/db/prodwebsite/20170814_0540/rs0/dump/admin/system.version.bson
[2017-08-14 05:40:34,585] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: writing admin.system.version metadata to /home/data/db/prodwebsite/20170814_0540/rs0/dump/admin/system.version.metadata.json
[2017-08-14 05:40:34,587] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: done dumping admin.system.version
[2017-08-14 05:40:34,587] [INFO] [MongodumpThread-2] [MongodumpThread:wait:107] rs0/127.0.0.1:27017: writing captured oplog to /home/data/db/prodwebsite/20170814_0540/rs0/dump/oplog.bson
[2017-08-14 05:40:34,589] [INFO] [MongodumpThread-2] [MongodumpThread:run:176] Backup rs0/127.0.0.1:27017 completed in 0.02 seconds, 0 oplog changes
[2017-08-14 05:40:38,573] [INFO] [MainProcess] [Mongodump:wait:110] All mongodump backups completed successfully
[2017-08-14 05:40:38,574] [INFO] [MainProcess] [Stage:run:92] Completed running stage mongodb_consistent_backup.Backup with task Mongodump in 4.01 seconds
[2017-08-14 05:40:38,575] [INFO] [MainProcess] [Stage:run:83] Running stage mongodb_consistent_backup.Archive with task: Tar
[2017-08-14 05:40:38,582] [INFO] [MainProcess] [Tar:run:56] Archiving backup directories with pool of 1 thread(s)
[2017-08-14 05:40:38,584] [INFO] [PoolWorker-3] [TarThread:run:41] Archiving directory: /home/data/db/prodwebsite/20170814_0540/rs0
[2017-08-14 05:40:40,585] [INFO] [MainProcess] [Stage:run:92] Completed running stage mongodb_consistent_backup.Archive with task Tar in 2.01 seconds
[2017-08-14 05:40:40,586] [INFO] [MainProcess] [Main:update_symlinks:161] Updating prodwebsite previous symlink to: /home/data/db/prodwebsite/20170814_0538
[2017-08-14 05:40:40,587] [INFO] [MainProcess] [Main:update_symlinks:167] Updating prodwebsite latest symlink to: /home/data/db/prodwebsite/20170814_0540
[2017-08-14 05:40:40,587] [INFO] [MainProcess] [Main:run:473] Completed mongodb-consistent-backup in 6.03 sec

[root@localhost prodwebsite]# cd /home/data/db/prodwebsite/20170814_0540
[root@localhost 20170814_0540]# ls
mongodb-consistent-backup_META rs0.tar

Incremental backups

It would be nice to support incremental backups, e.g.:

  • First release: v1, replica sets only.
--incremental-base $PATH_TO_UNTARRED_FULL_BACKUP

  • Final release: v2 with shards.
--incremental-base $PATH_TO_INCREMENTAL_BACKUP

The untarred path should contain some metadata (e.g. shards, previous backups, the last oplog position for each shard, ...).

If you are interested in the feature and provide some guidance I'm willing to contribute.
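
A sketch of what such metadata might look like (field names purely illustrative, not a proposed on-disk format):

```python
import json

def write_backup_meta(path, shards, last_oplog_ts, previous_backup=None):
    # last_oplog_ts:   {shard_name: [seconds, increment]} end timestamps
    # previous_backup: name of the base backup, forming the incremental chain
    meta = {
        'shards': shards,
        'last_oplog_ts': last_oplog_ts,
        'previous_backup': previous_backup,
    }
    with open(path, 'w') as f:
        json.dump(meta, f)

def read_backup_meta(path):
    with open(path) as f:
        return json.load(f)
```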

ResolverThread does not end

Hi again,

I have another problem when I run the tool with the following configuration file:

<test.conf>

production:
  host: 22.6.48.124
  port: 30010
  username: <user>
  password: ******
  log_dir: /home/vagrant/mgbackup/log
  backup:
    method: mongodump
    name: test
    location: /home/vagrant/shared
    mongodump:
      binary: /home/vagrant/mgbin/mongodump
  archive:
    method: tar
  notify:
    method: none
  upload:
    method: none

The process does not finish, and the last log line shows that the process is trying to resolve the oplog:

[vagrant@mongobackupboxcentos bin]$ ./mongodb-consistent-backup --config /home/vagrant/mgbackup/conf/test.conf
[2017-04-28 07:53:59,383] [INFO] [MainProcess] [Main:init:127] Starting mongodb-consistent-backup version 1.0.0 (git commit: d780ad545b603d3a2f807e1813f1de407e81f1ba)
[2017-04-28 07:53:59,383] [INFO] [MainProcess] [Main:init:128] Loaded config: {"archive": {"method": "tar", "tar": {"compression": "gzip"}, "zbackup": {"binary": "/usr/bin/zbackup", "cache_mb": 128, "compression": "lzma"}}, "authdb": "admin", "backup": {"location": "/home/vagrant/shared", "method": "mongodump", "mongodump": {"binary": "/home/vagrant/mgbin/mongodump", "compression": "gzip"}, "name": "test"}, "configPath": "/home/vagrant/mgbackup/conf/test.conf", "environment": "production", "host": "22.6.48.124", "lock_file": "/tmp/mongodb-consistent-backup.lock", "log_dir": "/home/vagrant/mgbackup/log", "notify": {"method": "none"}, "oplog": {"compression": "none", "tailer": {"status_interval": 30}}, "password": "******", "port": 30010, "replication": {"max_lag_secs": 5, "max_priority": 1000}, "sharding": {"balancer": {"ping_secs": 3, "wait_secs": 300}}, "upload": {"method": "none", "s3": {"chunk_size_mb": 50, "threads": 4}}, "username": "<user>"}
[2017-04-28 07:53:59,490] [INFO] [MainProcess] [State:init:127] Initializing root state directory /home/vagrant/shared/test
[2017-04-28 07:53:59,558] [INFO] [MainProcess] [State:load_backups:145] Found 0 existing completed backups for set
[2017-04-28 07:53:59,571] [INFO] [MainProcess] [State:init:111] Initializing backup state directory: /home/vagrant/shared/test/20170428_0753
[2017-04-28 07:53:59,582] [INFO] [MainProcess] [Stage:init:32] Notify stage disabled, skipping
[2017-04-28 07:53:59,583] [INFO] [MainProcess] [Stage:init:32] Upload stage disabled, skipping
[2017-04-28 07:53:59,583] [INFO] [MainProcess] [Main:run:305] Running backup in sharding mode using seed node(s): 22.6.48.124:30010
[2017-04-28 07:53:59,620] [INFO] [MainProcess] [Sharding:get_start_state:43] Began with balancer state running: True
[2017-04-28 07:53:59,670] [INFO] [MainProcess] [Sharding:get_config_server:145] Found sharding config server: CSRSA_PNUX/22.6.48.124:30000,22.6.48.125:30000,22.6.48.126:30000
[2017-04-28 07:53:59,716] [INFO] [MainProcess] [Sharding:stop_balancer:106] Stopping the balancer and waiting a max of 300 sec
[2017-04-28 07:53:59,759] [INFO] [MainProcess] [Sharding:stop_balancer:117] Balancer stopped after 0.04 seconds
[2017-04-28 07:53:59,942] [INFO] [MainProcess] [Main:run:360] Backup method supports compression, disabling compression in archive step and enabling oplog compression
[2017-04-28 07:53:59,943] [INFO] [MainProcess] [Task:compression:38] Setting Tar compression method: none
[2017-04-28 07:53:59,943] [INFO] [MainProcess] [Task:compression:38] Setting Tailer compression method: gzip
[2017-04-28 07:53:59,943] [INFO] [MainProcess] [Tailer:run:41] Starting oplog tailers on all replica sets (options: compression=gzip, status_secs=30)
[2017-04-28 07:53:59,975] [INFO] [MainProcess] [Replset:find_primary:132] Found PRIMARY: CSRSA_PNUX/22.6.48.124:30000 with optime Timestamp(1493365879, 1)
[2017-04-28 07:53:59,975] [INFO] [MainProcess] [Replset:find_secondary:204] Found SECONDARY CSRSA_PNUX/22.6.48.125:30000: {'priority': 1, 'configsvr': True, 'lag': 0.0, 'optime': Timestamp(1493365879, 1), 'score': 100}
[2017-04-28 07:53:59,976] [INFO] [MainProcess] [Replset:find_secondary:204] Found SECONDARY CSRSA_PNUX/22.6.48.126:30000: {'priority': 1, 'configsvr': True, 'lag': 0.0, 'optime': Timestamp(1493365879, 1), 'score': 100}
[2017-04-28 07:53:59,976] [INFO] [MainProcess] [Replset:find_secondary:215] Choosing SECONDARY CSRSA_PNUX/22.6.48.125:30000 for replica set CSRSA_PNUX (score: 100)
[2017-04-28 07:53:59,995] [INFO] [TailThread-2] [TailThread:run:64] Tailing oplog on CSRSA_PNUX/22.6.48.125:30000 for changes
[2017-04-28 07:54:00,510] [INFO] [MainProcess] [Replset:find_primary:132] Found PRIMARY: RSA_PNUX/22.6.48.124:30001 with optime Timestamp(1493109944, 111)
[2017-04-28 07:54:00,511] [INFO] [MainProcess] [Replset:find_secondary:204] Found SECONDARY RSA_PNUX/22.6.48.125:30001: {'priority': 1, 'lag': 0, 'optime': Timestamp(1493109944, 111), 'score': 100}
[2017-04-28 07:54:00,511] [INFO] [MainProcess] [Replset:find_secondary:204] Found SECONDARY RSA_PNUX/22.6.48.126:30001: {'priority': 1, 'lag': 0, 'optime': Timestamp(1493109944, 111), 'score': 100}
[2017-04-28 07:54:00,511] [INFO] [MainProcess] [Replset:find_secondary:215] Choosing SECONDARY RSA_PNUX/22.6.48.125:30001 for replica set RSA_PNUX (score: 100)
[2017-04-28 07:54:00,531] [INFO] [TailThread-3] [TailThread:run:64] Tailing oplog on RSA_PNUX/22.6.48.125:30001 for changes
[2017-04-28 07:54:01,024] [INFO] [MainProcess] [Stage:run:83] Running stage mongodb_consistent_backup.Backup with task: Mongodump
[2017-04-28 07:54:01,030] [INFO] [MainProcess] [Mongodump:run:134] Starting backups using mongodump r3.4.4 (options: compression=gzip, threads_per_dump=1)
[2017-04-28 07:54:01,064] [INFO] [MongodumpThread-4] [MongodumpThread:run:96] Starting mongodump backup of CSRSA_PNUX/22.6.48.125:30000
[2017-04-28 07:54:01,065] [INFO] [MongodumpThread-5] [MongodumpThread:run:96] Starting mongodump backup of RSA_PNUX/22.6.48.125:30001
[2017-04-28 07:54:01,232] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      writing admin.system.users to
[2017-04-28 07:54:01,244] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      done dumping admin.system.users (1 document)
[2017-04-28 07:54:01,244] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      writing admin.system.version to
[2017-04-28 07:54:01,256] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      done dumping admin.system.version (1 document)
[2017-04-28 07:54:01,262] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      writing test.example to
[2017-04-28 07:54:01,348] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing admin.system.users to
[2017-04-28 07:54:01,371] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping admin.system.users (1 document)
[2017-04-28 07:54:01,371] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing admin.system.version to
[2017-04-28 07:54:01,392] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping admin.system.version (1 document)
[2017-04-28 07:54:01,409] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.shards to
[2017-04-28 07:54:01,438] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.shards (1 document)
[2017-04-28 07:54:01,439] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.chunks to
[2017-04-28 07:54:01,489] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.chunks (0 documents)
[2017-04-28 07:54:01,490] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.mongos to
[2017-04-28 07:54:01,520] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.mongos (2 documents)
[2017-04-28 07:54:01,521] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.lockpings to
[2017-04-28 07:54:01,559] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.lockpings (5 documents)
[2017-04-28 07:54:01,559] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.settings to
[2017-04-28 07:54:01,599] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.settings (1 document)
[2017-04-28 07:54:01,600] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.version to
[2017-04-28 07:54:01,644] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.version (1 document)
[2017-04-28 07:54:01,644] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.locks to
[2017-04-28 07:54:01,692] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.locks (2 documents)
[2017-04-28 07:54:01,692] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.databases to
[2017-04-28 07:54:01,744] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.databases (1 document)
[2017-04-28 07:54:01,744] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.tags to
[2017-04-28 07:54:01,797] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.tags (0 documents)
[2017-04-28 07:54:01,798] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing config.changelog to
[2017-04-28 07:54:01,858] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    done dumping config.changelog (1 document)
[2017-04-28 07:54:01,876] [INFO] [MongodumpThread-4] [MongodumpThread:wait:71] CSRSA_PNUX/22.6.48.125:30000:    writing captured oplog to
[2017-04-28 07:54:02,449] [INFO] [MongodumpThread-4] [MongodumpThread:run:132] Backup CSRSA_PNUX/22.6.48.125:30000 completed in 1.38 seconds, 2 oplog changes, end ts: Timestamp(1493365882, 2)
[2017-04-28 07:54:04,111] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      [##......................]  test.example  32/300  (10.7%)
[2017-04-28 07:54:07,111] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      [#####...................]  test.example  67/300  (22.3%)
[2017-04-28 07:54:10,111] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      [########................]  test.example  102/300  (34.0%)
[2017-04-28 07:54:13,112] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      [##########..............]  test.example  137/300  (45.7%)
[2017-04-28 07:54:16,111] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      [#############...........]  test.example  172/300  (57.3%)
[2017-04-28 07:54:18,411] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      [########################]  test.example  300/300  (100.0%)
[2017-04-28 07:54:18,412] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      done dumping test.example (300 documents)
[2017-04-28 07:54:18,415] [INFO] [MongodumpThread-5] [MongodumpThread:wait:71] RSA_PNUX/22.6.48.125:30001:      writing captured oplog to
[2017-04-28 07:54:18,644] [INFO] [MongodumpThread-5] [MongodumpThread:run:132] Backup RSA_PNUX/22.6.48.125:30001 completed in 17.58 seconds, 0 oplog changes
[2017-04-28 07:54:22,558] [INFO] [MainProcess] [Mongodump:wait:88] All mongodump backups completed successfully
[2017-04-28 07:54:22,559] [INFO] [MainProcess] [Stage:run:92] Completed running stage mongodb_consistent_backup.Backup with task Mongodump in 21.53 seconds
[2017-04-28 07:54:22,568] [INFO] [MainProcess] [Tailer:stop:70] Stopping all oplog tailers
[2017-04-28 07:54:22,678] [INFO] [TailThread-3] [TailThread:run:120] Done tailing oplog on RSA_PNUX/22.6.48.125:30001, 0 oplog changes
[2017-04-28 07:54:23,823] [INFO] [TailThread-2] [TailThread:run:120] Done tailing oplog on CSRSA_PNUX/22.6.48.125:30000, 11 oplog changes, end ts: Timestamp(1493365898, 4)
[2017-04-28 07:54:24,581] [INFO] [MainProcess] [Tailer:stop:110] Oplog tailing completed in 24.64 seconds
[2017-04-28 07:54:24,592] [INFO] [MainProcess] [Sharding:restore_balancer_state:98] Restoring balancer state to: True
[2017-04-28 07:54:24,690] [INFO] [MainProcess] [Task:compression:38] Setting Resolver compression method: gzip
[2017-04-28 07:54:24,690] [INFO] [MainProcess] [Resolver:run:84] Resolving oplogs (options: threads=2, compression=gzip)
[2017-04-28 07:54:24,703] [INFO] [MainProcess] [Resolver:run:96] No oplog changes to resolve for 22.6.48.125:30001
[2017-04-28 07:54:24,709] [INFO] [PoolWorker-6] [ResolverThread:run:29] Resolving oplog for 22.6.48.125:30000 to max ts: Timestamp(1493365898, 0)
Killed
[vagrant@mongobackupboxcentos bin]$

Finally I killed the process because it had run for more than one hour; maybe the oplog is empty or has no operations.

Is it possible that my configuration or deployment is not correct?

Backup archives not gzipped

Hi,

I'm just having a play with this. All looks good apart from the fact my backup files don't get gzipped.

My command-line is as follows...

mongodb-consistent-backup --host=XXXXXXX --port=27017 --user=XXXXXX --password=XXXXXX --backup_binary=/bin/mongodump --location=/home/mcb --name=REFBACKUP

This produces the following files...

configsvr.tar
shard0.tar
shard1.tar
shard2.tar

The file command reports...

shard0.tar: POSIX tar archive (GNU)

I see there's a flag to turn gzip compression off (--no-archive-gzip) but not one to enable it, so it's on by default, I guess? Am I doing something wrong? gzip is installed on my test system:

Red Hat Enterprise Linux Server release 7.2 (Maipo)
Python 2.7.5
mongodump version: r3.2.10

Any ideas?

Cheers,

Rhys

Support list of hosts

It would be nice if the tool supported a list of mongos hosts/ports. Right now we take in a single host/port, and in a sharded system this means relying on a single mongos host/port being up.

Perhaps we can allow the -H/--host flag to take either a single host or a CSV list (host:port,host:port or similar), and on the YAML-config side we would take a YAML array of hosts.

We could also ditch using the mongos and use the config servers.
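
A sketch of the CSV parsing (helper name hypothetical); note that pymongo's MongoClient already accepts a list of 'host:port' seed strings, so a parsed list could be passed straight through:

```python
def parse_seed_hosts(value, default_port=27017):
    # Split '-H host:port,host2' style input into (host, port) tuples.
    seeds = []
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        if ':' in item:
            host, port = item.rsplit(':', 1)
            seeds.append((host, int(port)))
        else:
            seeds.append((item, default_port))
    return seeds
```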

Resolver TimeoutError

I updated to commit 7c7da82 and the backup hung after a TimeoutError exception. I retried several times and every attempt ended with the process hanging.
I am using Percona Server for MongoDB 3.2, and I think it worked in 1.0.1 (with some patches, such as bson encode). Any idea?

[2017-05-16 06:58:32,013] [INFO] [MainProcess] [Task:compression:38] Setting Resolver compression method: gzip
[2017-05-16 06:58:32,013] [INFO] [MainProcess] [Resolver:run:91] Resolving oplogs (options: threads=8, compression=gzip)
[2017-05-16 06:58:32,029] [DEBUG] [PoolWorker-8] [Oplog:open:30] Opening oplog file /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rs1/dump/oplog.bson
[2017-05-16 06:58:32,030] [DEBUG] [PoolWorker-8] [Oplog:open:30] Opening oplog file /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rs1/oplog-tailed.bson
[2017-05-16 06:58:32,030] [INFO] [PoolWorker-8] [ResolverThread:run:30] Resolving oplog for ip-10-1-15-43:27017 to max ts: Timestamp(1494917906, 0)
[2017-05-16 06:58:32,037] [DEBUG] [PoolWorker-9] [Oplog:open:30] Opening oplog file /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rs0/dump/oplog.bson
[2017-05-16 06:58:32,038] [DEBUG] [MainProcess] [Resolver:wait:77] Waiting for 3 oplog resolver thread(s) to stop
[2017-05-16 06:58:32,039] [DEBUG] [PoolWorker-9] [Oplog:open:30] Opening oplog file /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rs0/oplog-tailed.bson
[2017-05-16 06:58:32,039] [INFO] [PoolWorker-9] [ResolverThread:run:30] Resolving oplog for ip-10-1-15-239:27017 to max ts: Timestamp(1494917906, 0)
[2017-05-16 06:58:32,039] [DEBUG] [PoolWorker-10] [Oplog:open:30] Opening oplog file /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rsc/dump/oplog.bson
[2017-05-16 06:58:32,040] [DEBUG] [PoolWorker-10] [Oplog:open:30] Opening oplog file /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rsc/oplog-tailed.bson
[2017-05-16 06:58:32,040] [INFO] [PoolWorker-10] [ResolverThread:run:30] Resolving oplog for ip-10-1-12-46:27019 to max ts: Timestamp(1494917906, 0)

Traceback (most recent call last):
  File "/var/lib/mongodb-consistent-backup/.pex/install/mongodb_consistent_backup-1.0.1-py2-none-any.whl.b9b798d133a47c06550dbd36a7afa63e6729fcac/mongodb_consistent_backup-1.0.1-py2-none-any.whl/mongodb_consistent_backup/__init__.py", line 15, in run
    m.run()
  File "/var/lib/mongodb-consistent-backup/.pex/install/mongodb_consistent_backup-1.0.1-py2-none-any.whl.b9b798d133a47c06550dbd36a7afa63e6729fcac/mongodb_consistent_backup-1.0.1-py2-none-any.whl/mongodb_consistent_backup/Main.py", line 420, in run
    resolver_summary = self.resolver.run()
  File "/var/lib/mongodb-consistent-backup/.pex/install/mongodb_consistent_backup-1.0.1-py2-none-any.whl.b9b798d133a47c06550dbd36a7afa63e6729fcac/mongodb_consistent_backup-1.0.1-py2-none-any.whl/mongodb_consistent_backup/Oplog/Resolver/Resolver.py", line 125, in run
    self.wait()
  File "/var/lib/mongodb-consistent-backup/.pex/install/mongodb_consistent_backup-1.0.1-py2-none-any.whl.b9b798d133a47c06550dbd36a7afa63e6729fcac/mongodb_consistent_backup-1.0.1-py2-none-any.whl/mongodb_consistent_backup/Oplog/Resolver/Resolver.py", line 84, in wait
    raise e
TimeoutError
[2017-05-16 06:58:56,127] [DEBUG] [PoolWorker-10] [ResolverThread:close:60] Closing oplog file handles
[2017-05-16 06:58:56,128] [DEBUG] [PoolWorker-10] [ResolverThread:close:66] Removing temporary/tailed oplog file: /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rsc/oplog-tailed.bson
[2017-05-16 06:58:56,129] [INFO] [PoolWorker-10] [ResolverThread:run:55] Applied 294329 oplog changes to ip-10-1-12-46:27019 oplog, end ts: Timestamp(1494917906, 1)
[2017-05-16 06:59:00,490] [DEBUG] [PoolWorker-9] [ResolverThread:close:60] Closing oplog file handles
[2017-05-16 06:59:00,490] [DEBUG] [PoolWorker-9] [ResolverThread:close:66] Removing temporary/tailed oplog file: /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rs0/oplog-tailed.bson
[2017-05-16 06:59:00,500] [INFO] [PoolWorker-9] [ResolverThread:run:55] Applied 12040 oplog changes to ip-10-1-15-239:27017 oplog, end ts: Timestamp(1494917906, 1)
[2017-05-16 06:59:06,690] [DEBUG] [PoolWorker-8] [ResolverThread:close:60] Closing oplog file handles
[2017-05-16 06:59:06,690] [DEBUG] [PoolWorker-8] [ResolverThread:close:66] Removing temporary/tailed oplog file: /var/lib/mongodb-consistent-backup/audience/20170516_0542/ps-rs1/oplog-tailed.bson
[2017-05-16 06:59:06,700] [INFO] [PoolWorker-8] [ResolverThread:run:55] Applied 122145 oplog changes to ip-10-1-15-43:27017 oplog, end ts: Timestamp(1494917906, 1)

Script failed on S3 step but tar is on S3

I am encountering the following error when uploading to S3, even though I can see the tar backup on S3:

[2017-07-07 16:04:01,612] [ERROR] [MainProcess] [S3:run:131] Uploading to AWS S3 failed! Error:
[2017-07-07 16:04:01,630] [ERROR] [MainProcess] [Stage:run:95] Stage mongodb_consistent_backup.Upload did not complete!
[2017-07-07 16:04:01,630] [CRITICAL] [MainProcess] [Main:exception:218] Problem performing upload of backup! Error: Stage mongodb_consistent_backup.Upload did not complete!

ipv6 support

The tool does not work with IPv6-only hosts.
The log shows the following line:

[2017-05-22 15:38:29,314] [CRITICAL] [MainProcess] [Main:exception:218] Error setting up mongodb-consistent-backup: Could not resolve host 'myhostname', error: [Errno -2] Name or service not known

The problem is that the validate_hostname() function here does not support IPv6.

The mongodump tool supports IPv6 out of the box. For testing purposes I simply commented out this check, and backups completed without problems.
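A hostname check that also accepts IPv6 could lean on socket.getaddrinfo() with AF_UNSPEC, which resolves A records, AAAA records and IPv6 literals alike. A minimal sketch (the function name mirrors the tool's validate_hostname(), but this is not the project's actual code):

```python
import socket

def validate_hostname(hostname):
    # AF_UNSPEC asks the resolver for IPv4 and/or IPv6 results, so
    # IPv6-only hosts and literals like '::1' pass the check too
    try:
        socket.getaddrinfo(hostname, None, socket.AF_UNSPEC)
        return True
    except socket.gaierror as e:
        raise Exception("Could not resolve host '%s', error: %s" % (hostname, e))

print(validate_hostname("localhost"))
print(validate_hostname("127.0.0.1"))
```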

S3.py raises a string Exception which is not allowed

In mongodb_consistent_backup/Upload/S3/S3.py, the S3 class raises a string exception in its __init__ method (currently on line 50). I presume you want:
raise OperationError("Invalid S3 security key or region detected!")
rather than just:
raise "Invalid S3 security key or region detected!"
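For context, `raise <some string>` has been a TypeError since Python 2.6, so the current line can never surface a useful error. A minimal sketch of the intended behaviour (the OperationError class below is a stand-in; the tool defines its own):

```python
class OperationError(Exception):
    """Stand-in for the tool's own error class (assumption)."""
    pass

def check_s3_settings(secret_key, region):
    # Raising an Exception subclass works on Python 2 and 3 alike;
    # raising a bare string is a TypeError on any modern interpreter
    if not secret_key or not region:
        raise OperationError("Invalid S3 security key or region detected!")

try:
    check_s3_settings(None, "us-east-1")
except OperationError as e:
    print("caught: %s" % e)
```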

max_lag_secs should be a tunable parameter

The max_lag_secs value, which determines how lagged a secondary can be while still qualifying for backup, should be a tunable/user-defined value. This would let users proceed with a backup under whatever amount of lag is reasonable for their environment.
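As a sketch of what the tunable would govern (the parameter name and default below are assumptions, not the tool's actual interface):

```python
def is_backup_candidate(primary_optime_secs, secondary_optime_secs, max_lag_secs=10):
    # A secondary qualifies for backup only if its replication lag,
    # in seconds, is within the user-defined threshold
    lag = primary_optime_secs - secondary_optime_secs
    return lag <= max_lag_secs

print(is_backup_candidate(1000, 995))                   # within the default
print(is_backup_candidate(1000, 960))                   # too far behind
print(is_backup_candidate(1000, 960, max_lag_secs=60))  # OK with a looser limit
```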

Mongodump commands hang on larger data sets

On long-running backups I've noticed mongodump commands stall at a certain point into the backup. The mongodump process remains running with low CPU/memory usage, doing nothing, and the mongodb-consistent-backup tool sits waiting for it to complete.

This appears to be due to the use of subprocess.Popen() with PIPE-based output in Common/LocalCommand.py. Since .communicate() is only called after the command has completed, the PIPE buffers a lot of data and eventually fills.

This issue tracks the problem and its resolution: frequently draining the output of Popen() processes so the buffer does not fill.
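One way to avoid the stall, shown as a sketch rather than the project's actual fix, is to send the child's output to a temporary file instead of a PIPE, so the OS never blocks the child on a full pipe buffer:

```python
import subprocess
import tempfile

def run_command(cmd):
    # Output goes to disk rather than a fixed-size pipe buffer, so a
    # chatty long-running command like mongodump can never block on write()
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        proc.wait()
        out.seek(0)
        return proc.returncode, out.read().decode()

rc, output = run_command(["echo", "dump complete"])
print(rc, output.strip())
```

The alternative is to keep PIPE and drain proc.stdout incrementally while the command runs; either way, the key is that nothing accumulates unread in the pipe.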

backup a sharded configuration - Problem getting shard secondaries! Error: need more than 1 value to unpack

Hi

I get the error below when I run the program with these parameters:

./mongodb-consistent-backup -H dvxx2wsc1b -P 27014 -l /images -n ESSAI -B /opt/mongodb/na/3.2.10/bin/mongodump

TCP 27014 -> mongos

[2017-03-22 12:03:49,864] [INFO] [MainProcess] [Backup:run:221] Starting mongodb-consistent-backup version 0.3.6 (git commit hash: GIT_COMMIT_HASH)
[2017-03-22 12:03:49,864] [INFO] [MainProcess] [Backup:run:268] Running backup of dvxx2wsc1b:27014 in sharded mode
[2017-03-22 12:03:49,867] [INFO] [MainProcess] [Sharding:get_start_state:41] Began with balancer state running: True
[2017-03-22 12:03:49,868] [ERROR] [MainProcess] [Backup:exception:200] Problem getting shard secondaries! Error: need more than 1 value to unpack
Traceback (most recent call last):
  File "/root/.pex/install/MongoBackup-0.3.6-py2-none-any.whl.9d75e3e74edc5b8c29db47d0db827306e806ed35/MongoBackup-0.3.6-py2-none-any.whl/MongoBackup/Backup.py", line 294, in run
    self.secondaries = self.replset_sharded.find_secondaries()
  File "/root/.pex/install/MongoBackup-0.3.6-py2-none-any.whl.9d75e3e74edc5b8c29db47d0db827306e806ed35/MongoBackup-0.3.6-py2-none-any.whl/MongoBackup/ReplsetSharded.py", line 68, in find_secondaries
    for rs_name in self.get_replsets():
  File "/root/.pex/install/MongoBackup-0.3.6-py2-none-any.whl.9d75e3e74edc5b8c29db47d0db827306e806ed35/MongoBackup-0.3.6-py2-none-any.whl/MongoBackup/ReplsetSharded.py", line 48, in get_replsets
    shard_name, members = shard['host'].split('/')
ValueError: need more than 1 value to unpack
[2017-03-22 12:03:49,869] [INFO] [MainProcess] [Backup:cleanup_and_exit:172] Starting cleanup and exit procedure! Killing running threads
[2017-03-22 12:03:49,869] [INFO] [MainProcess] [Sharding:restore_balancer_state:82] Restoring balancer state to: True
[2017-03-22 12:03:49,871] [INFO] [MainProcess] [Backup:cleanup_and_exit:195] Cleanup complete. Exiting

My configuration is :

mongos> db.shards.find()
{ "_id" : "rs1", "host" : "rs1/10.198.203.31:27001,10.198.203.31:27002,10.198.203.31:27003" }
{ "_id" : "rs2", "host" : "rs2/10.198.203.31:27011,10.198.203.31:27012,10.198.203.31:27013" }

Do you have any idea?
Best regards
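For reference, the two-value unpack on shard['host'].split('/') raises exactly this error whenever the host string lacks a 'replset/' prefix (for example a standalone, non-replset shard, or a host string in an unexpected format). A defensive parse might look like this sketch (not the project's actual fix):

```python
def parse_shard_host(host):
    # "rs1/10.198.203.31:27001,..." -> ("rs1", [member, ...])
    # "10.198.203.31:27001"         -> (None,  [member])
    if '/' in host:
        rs_name, members = host.split('/', 1)
    else:
        rs_name, members = None, host  # no replset prefix: standalone shard
    return rs_name, members.split(',')

print(parse_shard_host("rs1/10.198.203.31:27001,10.198.203.31:27002"))
print(parse_shard_host("10.198.203.31:27001"))
```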

--config - using any configuration other than "production" causes a config error, however, this error is not clear

The config parses fine when production is used, as per:

production:
    host: some.mongo.host
    port: 27018
    backup:
        method: mongodump
        name: dump1
        location: /tmp/backup

However, the following configuration produces this error:

development:
    host: some.mongo.host
    port: 27018
    backup:
        method: mongodump
        name: dump1
        location: /tmp/backup
Error setting up configuration: 'Field "backup.name" must be set via command-line or config file!'!

This error is very unclear.
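A clearer failure mode would be to report that the top-level section itself was not found. As a sketch of the idea (the environment-selection mechanism here is an assumption about how the loader works, not the tool's actual code):

```python
def load_env_section(config, env):
    # Fail loudly on the real problem (unknown top-level section) instead
    # of a misleading "backup.name must be set" error further down
    if env not in config:
        raise KeyError("No '%s' section in config; found: %s"
                       % (env, ", ".join(sorted(config))))
    return config[env]

config = {
    "development": {
        "host": "some.mongo.host",
        "port": 27018,
        "backup": {"method": "mongodump", "name": "dump1",
                   "location": "/tmp/backup"},
    },
}
print(load_env_section(config, "development")["backup"]["name"])
```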

Authentication error when backing up a sharded MongoDB cluster

Here are the logs:
[admin@iZ23eo8z1y0Z mongodb_consistent_backup-master]$ /usr/local/bin/mongodb-consistent-backup -P 30001 -u dbadmin -p 'Tg@Rs12fg' -n fafabackup -l /alidata1/admin/backup
[2016-10-28 15:02:47,423] [INFO] [MainProcess] [Backup:run:220] Starting mongodb-consistent-backup version 0.3.1 (git commit hash: GIT_COMMIT_HASH)
[2016-10-28 15:02:47,424] [INFO] [MainProcess] [Backup:run:267] Running backup of localhost:30001 in sharded mode
[2016-10-28 15:02:47,447] [INFO] [MainProcess] [Sharding:get_start_state:41] Began with balancer state running: True
[2016-10-28 15:02:47,526] [INFO] [MainProcess] [Sharding:get_config_server:129] Found sharding config server: cfg1/10.139.55.215:20001

[2016-10-28 15:03:17,759] [CRITICAL] [MainProcess] [DB:auth_if_required:42] Unable to authenticate with host cfg1/10.139.55.215:20001: cfg1/10.139.55.215:20001: [Errno -2] Name or service not known

[2016-10-28 15:03:17,760] [CRITICAL] [MainProcess] [Sharding:get_config_server:142] Unable to locate config servers for localhost:30001!
[2016-10-28 15:03:17,760] [ERROR] [MainProcess] [Backup:exception:199] Problem getting shard secondaries! Error: cfg1/10.139.55.215:20001: [Errno -2] Name or service not known
Traceback (most recent call last):
  File "/home/admin/.pex/install/MongoBackup-0.3.1-py2-none-any.whl.32e77b741ee1b2f9541da2e6824ccd221ede9f39/MongoBackup-0.3.1-py2-none-any.whl/MongoBackup/Backup.py", line 293, in run
    self.secondaries = self.replset_sharded.find_secondaries()
  File "/home/admin/.pex/install/MongoBackup-0.3.1-py2-none-any.whl.32e77b741ee1b2f9541da2e6824ccd221ede9f39/MongoBackup-0.3.1-py2-none-any.whl/MongoBackup/ReplsetSharded.py", line 68, in find_secondaries
    for rs_name in self.get_replsets():
  File "/home/admin/.pex/install/MongoBackup-0.3.1-py2-none-any.whl.32e77b741ee1b2f9541da2e6824ccd221ede9f39/MongoBackup-0.3.1-py2-none-any.whl/MongoBackup/ReplsetSharded.py", line 59, in get_replsets
    configsvr = self.sharding.get_config_server()
  File "/home/admin/.pex/install/MongoBackup-0.3.1-py2-none-any.whl.32e77b741ee1b2f9541da2e6824ccd221ede9f39/MongoBackup-0.3.1-py2-none-any.whl/MongoBackup/Sharding.py", line 143, in get_config_server
    raise e
ServerSelectionTimeoutError: cfg1/10.139.55.215:20001: [Errno -2] Name or service not known
[2016-10-28 15:03:17,761] [INFO] [MainProcess] [Backup:cleanup_and_exit:171] Starting cleanup and exit procedure! Killing running threads
[2016-10-28 15:03:17,761] [INFO] [MainProcess] [Sharding:restore_balancer_state:82] Restoring balancer state to: True
[2016-10-28 15:03:17,764] [INFO] [MainProcess] [Backup:cleanup_and_exit:194] Cleanup complete. Exiting

Please help me; I do not know why this happens.
It says 'Unable to authenticate with host cfg1/10.139.55.215:20001', but every member of the cluster and shards has the same user and password.

The backup is not point in time if a tailed oplog's last timestamp is before a mongodump's.

This is only for a sharded cluster, and could occur frequently for small sharded clusters.

So the current process is:

  1. Start tailed oplogs.
  2. Start mongodumps of each replica set with --oplog.
  3. Finish mongodumps.
  4. Stop tailed oplogs.

Then the evaluation is made using Resolver.get_consistent_end_ts(), which looks at the tailed oplogs and then appends the oplog entries from the tailed oplogs to the mongodump oplogs.

However, in its current state, it can commonly happen that the backups taken are not point-in-time.

When I was testing, I found that the optime on the config servers updates a lot less frequently. In fact, it looks like it can be every 10 seconds:

configReplSet:PRIMARY> db.oplog.rs.find({}, {'ts': 1, 'op': 1, 'o': 1}).sort({$natural:-1})
{ "ts" : Timestamp(1496305850, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:50.770Z") } } }
{ "ts" : Timestamp(1496305848, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:48.833Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305848, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:48.830Z"), "up" : NumberLong(8444197), "waiting" : false } } }
{ "ts" : Timestamp(1496305839, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:39.157Z") } } }
{ "ts" : Timestamp(1496305838, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:38.823Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305838, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:38.817Z"), "up" : NumberLong(8444187), "waiting" : false } } }
{ "ts" : Timestamp(1496305828, 3), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:28.961Z") } } }
{ "ts" : Timestamp(1496305828, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:28.809Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305828, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:28.804Z"), "up" : NumberLong(8444177), "waiting" : false } } }
{ "ts" : Timestamp(1496305820, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:20.765Z") } } }
{ "ts" : Timestamp(1496305818, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:18.800Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305818, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:18.796Z"), "up" : NumberLong(8444167), "waiting" : false } } }
{ "ts" : Timestamp(1496305809, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:09.151Z") } } }
{ "ts" : Timestamp(1496305808, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:08.789Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305808, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:08.782Z"), "up" : NumberLong(8444157), "waiting" : false } } }
{ "ts" : Timestamp(1496305798, 3), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:58.955Z") } } }
{ "ts" : Timestamp(1496305798, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:58.776Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305798, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:58.771Z"), "up" : NumberLong(8444147), "waiting" : false } } }
{ "ts" : Timestamp(1496305790, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:50.764Z") } } }
{ "ts" : Timestamp(1496305788, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:48.770Z"), "waiting" : true } } }

This becomes a problem because the last mongodump to finish has an oplog time near the current time, but when the dumps finish, the oplog tailers are killed off almost immediately, so a config server oplog tailer may have a last_ts of up to 10 seconds earlier.
When get_consistent_end_ts() is evaluated, the ResolverThread then has a max_end_ts equal to the config server's last_ts. Tailed oplog entries are appended to the mongodump oplogs only if they are prior to this max_end_ts; however, one of the mongodump oplogs already exceeds the max_end_ts. This leaves one mongodump with a higher oplog time than the others.

I was testing this by adding around 1000 documents per second against the mongos for one of the sharded collections, and then comparing the documents.

I can work around this by adding a sleep after the mongodumps finish, giving the tailed oplog for the config server a chance to write its next timestamp, from which a point-in-time backup is then made.

The graceful fix for this would be:
Once the mongodumps finish, take the last oplog timestamp from the mongodumps, and then ensure the oplog tailers exceed this. This way, get_consistent_end_ts() would then be correct.
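The rule above can be sketched as a small check over oplog timestamps, modeled here as plain (seconds, increment) tuples rather than the real bson.Timestamp (an assumption for illustration):

```python
def consistent_end_ts(dump_end_ts, tailer_end_ts):
    # The consistent cut cannot exceed what the slowest tailer captured,
    # and every tailer must have passed the newest dump timestamp
    newest_dump = max(dump_end_ts.values())
    oldest_tailer = min(tailer_end_ts.values())
    if oldest_tailer < newest_dump:
        raise RuntimeError(
            "a tailer stopped at %r, before the newest dump ts %r: "
            "keep tailing until every tailer passes it" %
            (oldest_tailer, newest_dump))
    return oldest_tailer

dumps   = {"rs0": (1496305855, 1), "rs1": (1496305856, 2)}
tailers = {"rs0": (1496305857, 1), "rs1": (1496305857, 3),
           "configReplSet": (1496305850, 1)}  # config tailer lags ~6s
try:
    consistent_end_ts(dumps, tailers)
except RuntimeError as e:
    print("inconsistent: %s" % e)

tailers["configReplSet"] = (1496305860, 1)  # once the tailers catch up
print("consistent end ts:", consistent_end_ts(dumps, tailers))
```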

Please feel free to ask if you need any further information.

Thanks,

Documentation required for current features

Missing documentation:

  1. General command-line usage (update and extend).
  2. Build/Installation.
  3. Installation with MongoDB Authorization: what user roles are needed to backup DB w/auth enabled (include example user JS, etc).
  4. Restoring a full cluster from backups. (added from Issue #1)
  5. AWS S3 upload feature: how to make the AWS keys, what access required, setting up bucket policy, lifecycles/TTL, etc.
  6. Nagios NSCA notifications: what to setup on the Nagios/NSCA server to receive alerts from the tool.

This could go in a 'docs' subdir as git markdown, IMO.
