Code Monkey home page Code Monkey logo

couchdb-dump's Introduction

Couchdb-dump (& restore)

It works on LINUX/UNIX, Bash based systems (MacOSx)

Bash command line script to EASILY Backup & Restore a CouchDB database

  • Needs bash (plus curl, tr, file, split, awk, sed)
  • Dumped database is output to a file (configurable).

Quickstart (& quickend)

  • Backup:

bash couchdb-dump.sh -b -H 127.0.0.1 -d my-db -f dumpedDB.json -u admin -p password

  • Restore:

bash couchdb-dump.sh -r -H 127.0.0.1 -d my-db -f dumpedDB.json -u admin -p password

Why do you need it?

Surprisingly, there is not a straightforward way to dump a CouchDB database. Often you are suggested to replicate it or to dump it with the couchdb _all_docs directive.

But, using the _all_docs directive provides you with JSON which cannot be directly re-import back into CouchDB.

Hence, the goal of this script(s) is to give you a simple way to Dump & Restore your CouchDB database.

NOTE

Attachments in Database documents are only supported in CouchDB 1.6+

Usage

Usage: ./couchdb-dump.sh [-b|-r] -H <COUCHDB_HOST> -d <DB_NAME> -f <BACKUP_FILE> [-u <username>] [-p <password>] [-P <port>] [-l <lines>] [-t <threads>] [-a <import_attempts>]
	-b   Run script in BACKUP mode.
	-r   Run script in RESTORE mode.
	-H   CouchDB Hostname or IP. Can be provided with or without 'http(s)://'
	-d   CouchDB Database name to backup/restore.
	-f   File to Backup-to/Restore-from.
	-P   Provide a port number for CouchDB [Default: 5984]
	-u   Provide a username for auth against CouchDB [Default: blank]
	       -- can also set with 'COUCHDB_USER' environment var
	-p   Provide a password for auth against CouchDB [Default: blank]
	       -- can also set with 'COUCHDB_PASS' environment var
	-l   Number of lines (documents) to Restore at a time. [Default: 5000] (Restore Only)
	-t   Number of CPU threads to use when parsing data [Default: nProcs-1] (Backup Only)
	-a   Number of times to Attempt import before failing [Default: 3] (Restore Only)
	-c   Create DB on demand, if they are not listed.
	-q   Run in quiet mode. Suppress output, except for errors and warnings.
	-z   Compress output file (Backup Only)
	-T   Add datetime stamp to output file name (Backup Only)
	-V   Display version information.
	-h   Display usage information.

Example: ./couchdb-dump.sh -b -H 127.0.0.1 -d mydb -f dumpedDB.json -u admin -p password

Bonus 1! Full Database Compaction

In the past, we've used this script to greatly compress a bloated database. In our use case, we had non-sequential IDs which cause CouchDB's B-Tree to balloon out of control, even with daily compactions.

How does this fix work? When running the export, all of the documents are pulled out in "ID Order"- When re-importing these (now sorted) documents again, the B-Tree can be created in a much more efficient manner. We've seen 15GB database files, containing only 2.1GB of raw JSON, reduced to 2.5GB on disk after import!

Bonus 2! Purge Historic and Deleted Data

CouchDB is an append-only database. When you delete records, the metadata is maintained for future reference, and is never fully deleted. All documents also retain a historic revision count. With the above points in mind; the export and import does not include Deleted documents, or old revisions; therefore, using this script, you can export and re-import your data, cleansing it of any previously (logically) deleted data!

If you pair this with deletion and re-creation of replication rules (using the 'update_seq' parameter to avoid re-pulling the entire DB/deleted documents from a remote node) you can manually compress and clean out an entire cluster of waste, node-by-node. Note though; after creating all the rules with a fixed update_seq, once completed to the entire cluster, you will need to destroy and recreate all replication rules without the fixed update_seq - else, when restarting a node etc, replication will restart from the old seq.

couchdb-dump's People

Contributors

danielebailo avatar dalgibbard avatar epos-eu avatar ahodgkinson avatar dgibbard-cisco avatar splanquart avatar maxhbr avatar chozekun avatar tonklon avatar theobrigitte avatar bryant1410 avatar noni73 avatar psander-com avatar peteruithoven avatar nuin avatar yiu31802 avatar hadrien-toma avatar skade avatar david-byng avatar lightweight avatar atlas48 avatar fungiboletus avatar aduchate avatar ateam-adam avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.