couchdb-dump's People

Contributors

aduchate, ahodgkinson, ateam-adam, atlas48, bryant1410, chozekun, dalgibbard, danielebailo, david-byng, dgibbard-cisco, epos-eu, fungiboletus, hadrien-toma, lightweight, maxhbr, noni73, nuin, peteruithoven, psander-com, skade, splanquart, theobrigitte, tonklon, yiu31802

couchdb-dump's Issues

split: illegal option -- d

I am running the script on a Mac, but I get the error "split: illegal option -- d" followed by "ERROR: Unable to create split files." I am trying to dump a >2 GB JSON file.
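
For reference, BSD split on macOS doesn't support the -d (numeric suffixes) flag the error points at. A minimal sketch of two possible workarounds, assuming Homebrew is available; the filename and chunk size below are placeholders:

# Option 1: install GNU coreutils and use its split (installed as gsplit)
brew install coreutils
gsplit -d -l 5000 dump.json dump.json.split

# Option 2: drop -d and accept alphabetic suffixes (dump.json.splitaa, ...)
split -l 5000 dump.json dump.json.split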

Backup exits with exitcode 1 in silent mode

In silent mode, the last line of the backup path causes the script to exit with code 1:

$echoVerbose && echo "... INFO: Export completed successfully. File available at: ${file_name}"

This is caused by $echoVerbose evaluating to false, so the && list returns a non-zero status. In non-silent mode the echo command results in exit code 0.
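
A minimal sketch of one way to keep the final exit status at 0 when verbose output is suppressed, assuming $echoVerbose expands to true or false as described above (a suggestion, not the maintainers' fix):

if $echoVerbose; then
    echo "... INFO: Export completed successfully. File available at: ${file_name}"
fi
exit 0

Alternatively, appending "|| true" to the existing line keeps the one-liner style while forcing a zero status.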

Doesn't work in OS X 10.10

Great app! Just has some issues with BSD-style commands:

[zero]couchdb-dump(master)→ ./couchdb-backup.sh -H thehost.whatever.com -d delivery-index -f db.json -b
./couchdb-backup.sh: line 157: nproc: command not found
expr: syntax error
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.2M    0 12.2M    0     0  1000k      0 --:--:--  0:00:12 --:--:--  799k
... INFO: File contains Windows carridge returns- converting...
du: illegal option -- B
usage: du [-H | -L | -P] [-a | -s | -d depth] [-c] [-h | -k | -m | -g] [-x] [-I mask] [file ...]
... ERROR: checkdiskspace() was not passed the correct arguments.
[zero]couchdb-dump(master %)→ uname -a
Darwin zero.local 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64
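
For context, nproc and du -B are GNU-only. A minimal sketch of BSD-compatible fallbacks (the variable names are placeholders, not the script's):

# CPU count: use nproc where available, otherwise BSD/macOS sysctl
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu)

# File size in KB: du -k is portable, unlike the GNU-only du -B
size_kb=$(du -k "${file_name}" | awk '{print $1}')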

Backup & Restore database with docs with binary attachments

Great work on these scripts! Thank you for writing these and sharing. It's great to be able to just grab a copy of a whole CouchDB database.

Did you design the scripts with binary attachments in mind? We seem to be able to download a full database that includes binary attachments, and the backup file and the original database are similar in size. However, when I try to restore this database dump with binary files, I get the following error:

{"error":"bad_request","reason":"invalid UTF-8 JSON"}

Have you come across the above error before?

I have been able to restore pure JSON (non binary) databases without any trouble.

Thank you!

Graham
CTO, Telephonic

Improvement - Combine both codesets into a single utility

As per the title: a lot of the code is shared between these two scripts. We could likely merge them without much hassle. Although, if we do, we should look to push most of the code into separate functions rather than running it serially :)

Error on OS X, line 497

I get these errors on OS X 10.11.4:

./couchdb-backup.sh: line 497: syntax error near unexpected token `<'
./couchdb-backup.sh: line 497: `        done < <(cat ${design_file_name})'
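
The < <(...) process substitution is bash-only and fails with exactly this syntax error when the script is interpreted by a POSIX sh. A minimal sketch of an equivalent loop that avoids it (and the unnecessary cat), assuming the loop only needs the file line by line:

while read -r line; do
    printf '%s\n' "${line}"   # placeholder for the real per-line processing
done < "${design_file_name}"

If the root cause is the script being run by sh, invoking it explicitly with bash should also avoid the error.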

Document update conflict upon restore

Hi,
I'm trying to use your script to dump DBs: everything's fine during export, but when I try to restore the DB, I get:

["... WARN: CouchDB Reported an error during import - Attempt 1/3 - Retrying...","... WARN: CouchDB Reported an error during import - Attempt 2/3 - Retrying...","... ERROR: CouchDB Reported: {"error":"conflict","reason":"Document update conflict."}"]

Am I missing something? Do I have to delete records manually before restore?
Thank you

Dumping doesn't escape " and '

When dumping, the characters ' and " are not correctly escaped if they are present in document values.

I dumped a DB and wanted to re-import it, and the import command gave me this error:
{"error":"bad_request","reason":"invalid_json"}

Is it possible to only append the delta of a file

Hi,
I'm wondering if it's possible to append only the delta to an existing file rather than creating a new file each time. The option is currently not available; have you considered adding it in the future?
Thanks in advance!

Stuck at 'Stage 1 - Document filtering'

I'm using couchdb-dump version: 1.1.7

I have a database which is successfully downloaded to a file (39MB), but it gets stuck at Stage 1 - Document filtering.

... INFO: Output file bob.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.2M    0 38.2M    0     0  10.0M      0 --:--:--  0:00:03 --:--:-- 10.0M
... INFO: File may contain Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering

Since it's below 250MB, the parsing isn't multi-threaded.

I'm assuming it's stuck at the sed line:

$sed_cmd ${sed_edit_in_place} 's/.*,"doc"://g'

Could someone explain the purpose of removing .*,"doc":? Is this the Database Compaction or Purge Historic and Deleted Data logic?

Looking into the JSON file, it removed the following part from each line:

{"id":"...","key":"...","value":{"rev":"..."},"doc":

I think a comment above that code would be welcome.
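
For what it's worth, here is a minimal illustration of what that sed appears to do, assuming the export came from _all_docs?include_docs=true (the sample document is made up): it strips the _all_docs row wrapper so only the bare document remains, ready to be wrapped in a {"docs":[...]} body for _bulk_docs on import, so it does not look like compaction or purge logic.

echo '{"id":"abc","key":"abc","value":{"rev":"1-x"},"doc":{"_id":"abc","_rev":"1-x","name":"test"}},' \
  | sed 's/.*,"doc"://g'
# Output: {"_id":"abc","_rev":"1-x","name":"test"}},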

I'm assuming my issue is caused by binary attachments in all the docs.

I don't think I'm helped with #31, since I do want this to happen.

Improvement - Versioning contained within the code

It would probably be a reasonable idea to set 'version=x.x' within the code, and offer this via an argument that can be passed; i.e.:

./couchdb-dump.sh -V
CouchDB Dump - Version x.x

This will assist with identifying which version of the code someone is running if/when they report an issue.
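
A minimal sketch of what that could look like (the variable name, version string and option handling are hypothetical, not existing code):

scriptversion="1.1.7"

if [ "$1" = "-V" ]; then
    echo "CouchDB Dump - Version ${scriptversion}"
    exit 0
fi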

On the fly compression for backups

Brain dump:
curl to stdout and gzip the stream; curl's status could still be monitored through pipefail.
We would need to figure out a more sensible way to check for JSON errors returned from CouchDB (curl sees them as successful, but CouchDB reports an error) - HTTP headers maybe? The import would also need to check the file type and treat gzipped backups accordingly. Plus the ability to choose the compression agent.

In relation to chunked backups, it might be a bit fiddly though.
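
A minimal sketch of the streaming idea, assuming a plain _all_docs export and gzip as the compression agent; the URL and variable names are placeholders:

set -o pipefail
curl -sS -f "${couch_url}/${db_name}/_all_docs?include_docs=true" \
    | gzip -c > "${file_name}.gz" \
    || { echo "... ERROR: export or compression failed"; exit 1; }

curl -f at least turns HTTP-level errors into a non-zero exit status, though body-level CouchDB errors would still need the separate handling noted above.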

... ERROR: Insufficient Disk Space Available:

The dump seems to save the file, but I get the following error in the terminal:

couchdb-backup.sh: line 74: [: -ge: unary operator expected

... ERROR: Insufficient Disk Space Available:
        * Full Path:           food.json
        * Affected Directory:   food.json
        * Space Available:       KB
        * Total Space Required: 294 KB
expr: syntax error
        * Additional Space Req:  KB
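
The "[: -ge: unary operator expected" message and the blank "Space Available" value suggest the free-space variable ends up empty before the numeric comparison. A minimal sketch of a guarded check (variable names are hypothetical, not the script's):

available_kb=$(df -Pk "$(dirname "${file_name}")" | awk 'NR==2 {print $4}')
if [ -z "${available_kb}" ]; then
    echo "... ERROR: could not determine available disk space"
    exit 1
elif [ "${available_kb}" -lt "${required_kb}" ]; then
    echo "... ERROR: Insufficient Disk Space Available"
    exit 1
fi

The "Affected Directory: food.json" line also hints that the directory is being derived from a bare relative filename; running df against the dirname of the file would sidestep that.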

Can all docs be dumped with all revisions?

hello,

couchdb-dump works perfectly, thanks a lot.
But I have a little question: I want to dump all docs with all their revisions (to preserve the history of docs) on a CouchDB 2 cluster. Is that possible with the update_seq option?

Thanks .

Victor.

Please add license

Hi,
really nice script, but could you please add a license to it (e.g. Apache 2.0)?
I really like the script and would like to use it at work, but I can only use it if there is a license attached which allows commercial usage...

Thanks in advance
Konrad

[Improvement] During batched import, offer to resume on failed file import

Our Production DB is pretty big/fiddly (around 7 million documents), and when re-importing it I've sometimes hit CouchDB errors halfway or so through - meaning I have to delete the DB, clean up the files, and start afresh. It would be useful if the script detected failure and offered the user a few retries (enabling the user to restart the DB or whatever as appropriate before retrying) before failing - when our imports take up to 3 hours to complete, a failure halfway means 1.5 hours of lost time... :(

ERROR: Curl encountered an issue whilst dumping the database

I am running this code in file sichereBbAufDropbox.sh:

# dumps artendb
# prepends the date to the filename
# compresses the file
# copies the file to Dropbox
# removes the file
FILENAME=$(date +"%Y-%m-%d_%H-%M-%S_artendb_dump.txt")
FILENAME_GZ=$FILENAME.tar.gz
/home/alex/backup/couchdb-backup.sh -b -H http://localhost:5984 -d artendb -f $FILENAME -u admin -p secret -P 5984
tar cvzf $FILENAME_GZ $FILENAME
/home/alex/backup/dropbox_uploader.sh upload $FILENAME_GZ $FILENAME_GZ
rm $FILENAME
rm $FILENAME_GZ

This is the output:

alex@ae-2018-01:~/backup$ bash sichereBbAufDropbox.sh
... INFO: Output file 2018-01-07_12-05-23_artendb_dump.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  197M    0  197M    0     0  4962k      0 --:--:--  0:00:40 --:--:--     0
curl: (18) transfer closed with outstanding read data remaining
... ERROR: Curl encountered an issue whilst dumping the database.

I am using:

  • ubuntu 16.04.3
  • couchdb 2.2.1
  • curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
  • couchdb-backup.sh downloaded today

This happens on a newly installed server. It used to work on the last server with the same db. I run similar backups on two other servers and they work fine.

separating out design documents not working on OS X

... INFO: Separating Design documents
... INFO: Duplicating original file for alteration
... INFO: Stripping _design elements from regular documents
... INFO: Fixing end document
... INFO: Inserting Design documents
... INFO: Successfully imported 0 Design Documents
... INFO: Small dataset. Importing as a single file.
{"error":"bad_request","reason":"Missing JSON list of 'docs'"}

This is the output I'm getting. I'm wondering if there is something wrong with the sed command?

Problems with usernames and passwords (authorization failed)

How carefully has this script been tested with usernames and passwords? I ask because I am attempting to make a backup of a server that requires a username and password, and I keep getting authentication errors. I worry that this code never worked.

Upon investigation, I found that the following code caused problems (approx. line 273):

if [ ! "x$username" = "x" ]&&[ ! "x$password" = "x" ]; then
    curlopt="${curlopt} -U '${username}:${password}'"
fi

Which I patched to the following (which also works, though it's not a robust solution; see the note below):

if [ ! "x$username" = "x" ]&&[ ! "x$password" = "x" ]; then
    curlopt="${curlopt} --user ${username}:${password}"
fi

Changes:

  1. Changed -U to --user, i.e. from proxy authentication (--proxy-user) to server authentication (--user).
  2. The single quotes around the username and password are removed. With the single quotes, I think they are sent as part of the username/password.

Can you test that this change works and verify it works for you with servers that require a user name and password?

Note: This is not a complete fix. When usernames and passwords contain special characters, the curl command line will fail. The better solution is to refactor the curl command line and put the username/password quoting there, as is done with the curl URL. I can implement this once we know that the fix basically works.
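
A minimal sketch of the refactor described in the note, passing the credentials as a separately quoted argument (via a bash array) instead of embedding them in the curlopt string, so that special characters survive; this is a suggestion, not the patched code:

curl_auth=()
if [ -n "${username}" ] && [ -n "${password}" ]; then
    curl_auth=(--user "${username}:${password}")
fi
curl "${curl_auth[@]}" "${url}"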

Makes two output files - the second one with the same name but with "" on the end

Hi,
Thanks for a very useful script.
Just thought you might like to know of some slightly odd results..

I'm on a mac using Yosemite.
The script works great.. but oddly outputs TWO files.

For example: a commandline like this:

bash couchdb-backup.sh -b -H 127.0.0.1 -d my_users -f my_users.json

will produce two files:

my_users.json
my_users.json""

The JSON file produced appears to be just fine..
It's just the two files.. strikes me as odd...

Thanks again..

[Improvement] Cleanup output text + Temp file management

  • Fix 'Multithreading Parsing' to 'Multithreaded Parsing'
  • Rename ${file_name}.design to ${file_name}-design to match the nodesign filename
  • Ensure removal of ${file_name}-nodesign and ${file_name}-design on successful import (files to be retained for debug/analysis if import fails)

Improvement - Check Available Disk Space

We should ideally be checking for available disk space where possible.
Note:

  • We can't check this for the main export, as it's impossible to determine the completed output file size
  • Every time we run sed etc., it will create a temp file roughly equal in size to the original file (minus whatever we're cutting out).
  • When running a restore which contains design files, we're creating a clone of the DB file (minus _design docs) so that we're not amending the original export (in case something fails, the user wants to import again, etc.). We then run sed on that... which means at that point we need to account for approx "<DB_FILE_SIZE>*3" of available disk space - see the sketch below.
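
A minimal sketch of that last check, assuming sizes in KB and the rough *3 multiplier above (variable names are placeholders):

required_kb=$(( $(du -k "${file_name}" | awk '{print $1}') * 3 ))
available_kb=$(df -Pk "$(dirname "${file_name}")" | awk 'NR==2 {print $4}')
if [ "${available_kb}" -lt "${required_kb}" ]; then
    echo "... ERROR: Insufficient Disk Space Available (need approx ${required_kb} KB)"
    exit 1
fi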

[BUG] - Design documents cause insert failures

In a CouchDB Database which has _design documents defined (data restrictions, views etc), when exporting the JSON using ./couchdb-dump, these special document types are appended to the end of the JSON dump.
_bulk_docs can't handle these, so the last split file to be inserted fails for all documents contained within it.

The fix here is to break out all of the _design documents from the exported JSON when we want to restore the data, and handle these first.

NOTE: I have the fix for this already; I'll request a merge in a little while.
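
For reference, a minimal sketch of the separation step, assuming the export keeps one document per line; the -design/-nodesign filenames are placeholders:

# Pull _design documents into their own file and strip them from the bulk import
grep '"_design/' "${file_name}" > "${file_name}-design"
grep -v '"_design/' "${file_name}" > "${file_name}-nodesign"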

Error when restoring: POST body must include `docs` parameter.

I have backed up my db using this command:

./couchdb-backup.sh -b -H http://localhost:5984 -d artendb -f $FILENAME -u name -p password -P 5984

Now I try to restore using:

bash couchdb-backup.sh -r -H 127.0.0.1 -d artendb -f 2016-07-21_23-00-01_artendb_dump.json -u name -p password

But I get this output:

alex@pca:/mnt/c/Users/alexa/Downloads$ bash couchdb-backup.sh -r -H 127.0.0.1 -d artendb -f 2016-07-21_23-00-01_artendb_dump.json -u name -p password
... INFO: Checking for database
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   103    0   103    0     0  10567      0 --:--:-- --:--:-- --:--:-- 11444
... INFO: Checking for Design documents
... INFO: No Design Documents found for import.
... INFO: Block import set to 5000 lines.
... INFO: Generating files to import
... INFO: Header already applied to 2016-07-21_23-00-01_artendb_dump.json.splitaaa
... INFO: Adding footer to 2016-07-21_23-00-01_artendb_dump.json.splitaaa
... INFO: Inserting 2016-07-21_23-00-01_artendb_dump.json.splitaaa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.8M  100    76  100 27.8M    108  39.7M --:--:-- --:--:-- --:--:-- 39.7M
... WARN: CouchDB Reported and error during import - Attempt 1/3 - Retrying...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.8M  100    76  100 27.8M    105  38.7M --:--:-- --:--:-- --:--:-- 38.7M
... WARN: CouchDB Reported and error during import - Attempt 2/3 - Retrying...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.8M  100    76  100 27.8M    109  40.0M --:--:-- --:--:-- --:--:-- 40.0M
... ERROR: CouchDB Reported: {"error":"bad_request","reason":"POST body must include `docs` parameter."}

This is the backup: https://www.dropbox.com/s/y9b2ztle2xuwwrk/2016-07-21_23-00-01_artendb_dump.txt.tar.gz?dl=0 (it was unzipped before using)

What could I be doing wrong?

I am currently running couchdb v2.2.0.
At the time of backing up it would have been a 1.6 version, I guess.

If which is not present, curl is said to be not present.

## Check for curl
if [ "x`which curl`" = "x" ]; then
    echo "... ERROR: This script requires 'curl' to be present."
    exit 1
fi

Some systems, for example lightweight containers, don't have which but do have curl.

## Check for curl
curl --version > /dev/null
if [ "$?" != "0" ]
then
    echo "... ERROR: This script requires 'curl' to be present."
    exit 1
fi

...should do the trick too.

couchdb-dump doesn't work with busybox's grep anymore

Since 2640981, the script fails silently when busybox's grep is used to do the backup. The -U option is not recognized and it produces the following error:

grep: unrecognized option: U

However, it doesn't stop the script, and a file is still produced. It's when you try to restore it that CouchDB complains with the following error:

... ERROR: CouchDB Reported: {"error":"bad_request","reason":"invalid UTF-8 JSON"}

In my case I used Alpine Linux to do the backups, which uses busybox's grep by default, but I was able to fix the issue by installing GNU grep with the command apk add grep. However, I have two weeks of backups that are not valid, as I didn't detect the issue because the script doesn't fail, and I don't test restores frequently enough (my bad). Do you think it could be possible to fix the existing files so I can restore them?

The line in question:

if grep -qU $'\x0d' $file_name; then
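
A minimal sketch of a check that avoids the GNU-only -U flag and should also work with busybox grep (the conversion step shown is illustrative, not the script's):

if grep -q "$(printf '\r')" "${file_name}"; then
    echo "... INFO: File contains Windows carriage returns - converting..."
    tr -d '\r' < "${file_name}" > "${file_name}.tmp" && mv "${file_name}.tmp" "${file_name}"
fi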

[Bug] Import attempts to load _design files when none are present

I found when importing a dataset with no design docs that the if statement in use wasn't matching correctly.

Example:

./couchdb-backup.sh -R -H 127.0.0.1 -u admin -p pass -d db6 -f pim.short.json
... INFO: Separating Design documents
... INFO: Duplicating original file for alteration
... INFO: Stripping _design elements from regular documents

sed: 1: "db.json": extra characters at the end of d command on macos 10.10

I'm trying to back up a remote CouchDB; I'm on macOS 10.10.
bash couchdb-backup.sh -b -H 1.2.3.4 -d _users -f db._users.json -u uuuu -p pppp

here's the output:
... INFO: File contains Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
sed: 1: "db._users.json": extra characters at the end of d command

Invalid UTF-8 JSON error while restoring large file

I did a database export that has 44 MB / 33k lines. When restoring it, it is split into several split* files. When I try to restore the dump, I receive the following error:

[root@kazoo1 ~]# ./couchdb-dump.sh -a 1 -c -r -H localhost -d account%2F9b%2F7d%2Fa8712e54b4d596b51a1e74f58208 -f 9b7da8712e54b4d596b51a1e74f58208/account.json -P 15984
... INFO: Checking for database
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1613    0  1613    0     0   128k      0 --:--:-- --:--:-- --:--:--  131k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    12  100    12    0     0     74      0 --:--:-- --:--:-- --:--:--    74
... INFO: Checking for Design documents
... INFO: No Design Documents found for import.
... INFO: Block import set to 5000 lines.
... INFO: Generating files to import
... INFO: Header already applied to 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
... INFO: Adding footer to 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
... INFO: Inserting 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  533k  100    54  100  532k   1587  15.2M --:--:-- --:--:-- --:--:-- 15.7M
... ERROR: CouchDB Reported: {"error":"bad_request","reason":"invalid UTF-8 JSON"}

I checked the split* files and they are not valid JSON - is that normal? What can I do to troubleshoot this?
CouchDB version: 2.1.2
Thanks!

Getting error - sed: 1: "test.json": undefined label 'est.json' (Mac OS X 10.10.4)

I'm getting an error I don't understand (Mac OS X 10.10.4).

This is the command...
bash couchdb-backup.sh -b -H 127.0.0.1 -d test -f test.json

This is the output
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  961k    0  961k    0     0  6875k      0 --:--:-- --:--:-- --:--:-- 6916k
... INFO: File contains Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
sed: 1: "test.json": undefined label 'est.json'
Stage failed.

Seems like it's tripping over something there that I can't make out. Any suggestions appreciated.

Improvement - DB export - Data Parsing could be Multi-threaded

Me again :)

Running the sed statements after exporting the DB can take more than 5 minutes on a 2GB exported file - the main limitation being that sed is capped to a single CPU.
We should probably:

  • Count the number of CPUs, split the export into that many files, and run the sed across all split parts simultaneously for maximum performance (ie. forced multi-threaded processing)
  • Allow the end-user to define the CPU concurrency value manually (as long as it's less than CPU count - WARN if user tries to set it higher, and override to MAX setting instead) - this allows them to throttle it back to a limited number of CPUs in case they're running the export on a machine which is used for other things (which they don't want to impact)
  • Identify at which point (ie. filesize) this becomes useful, and not split it if below that.

Note that 'Header correction' and 'Final document line correction' will then only need to be applied to the first and last file splits respectively.
After finishing processing, split files should be re-merged to a single file.
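
A minimal sketch of the approach, assuming GNU sed and nproc/sysctl for the CPU count; chunk boundary handling, header/footer correction and error checking are left out:

cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu)
lines=$(wc -l < "${file_name}")
split -l $(( (lines / cores) + 1 )) "${file_name}" "${file_name}.split"

for part in "${file_name}".split*; do
    sed -i 's/.*,"doc"://g' "${part}" &   # one sed per chunk, run in parallel
done
wait

cat "${file_name}".split* > "${file_name}" && rm -f "${file_name}".split*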

Allow for batched exports with retry

When exporting very large datasets, it would be nice to break the export up, so that any failures can be reattempted, without the user needing to restart the job from scratch.
