danielebailo / couchdb-dump
Bash command line scripts to dump & restore a CouchDB database
License: Other
Hello!
attachments=true in view API endpoints only works for CouchDB versions above 1.6.0. Here is more information about this feature.
Thanks!
I am running the script on a Mac, but I get the error "split: illegal option -- d" followed by "ERROR: Unable to create split files." I am trying to dump a JSON file larger than 2 GB.
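BSD split (as shipped on macOS) has no -d numeric-suffix option, which GNU split does. Assuming the script's split -d call is what fails here, a portable sketch is to accept the default alphabetic suffixes instead:

```shell
# Portable across GNU and BSD split: no -d, default alphabetic suffixes (aa, ab, ...)
printf 'line1\nline2\nline3\nline4\n' > big.json
split -l 2 big.json big.json.split
ls big.json.split*   # big.json.splitaa  big.json.splitab
```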
In silent mode, the last line of backup mode causes the script to exit with status 1.
This is caused by the $echoVerbose check.
In non-silent mode the echo command results in exit status 0.
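A minimal reproduction of the pattern described (variable and message text are illustrative, not the script's exact lines):

```shell
echoVerbose=false

# The script's closing line is conceptually:  $echoVerbose && echo "... INFO: done"
# When echoVerbose=false, the && chain short-circuits with status 1;
# if that is the script's final command, the whole script exits 1.
status=0
$echoVerbose && echo "... INFO: done" || status=$?
echo "exit status would be: $status"    # 1

# Fix: guarantee success regardless of verbosity
$echoVerbose && echo "... INFO: done" || true
echo "exit status now: $?"              # 0
```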
After importing the file I exported I now have duplicated _rev and rev fields in my docs.
Great app! Just has some issues with BSD-style commands:
[zero]couchdb-dump(master)→ ./couchdb-backup.sh -H thehost.whatever.com -d delivery-index -f db.json -b
./couchdb-backup.sh: line 157: nproc: command not found
expr: syntax error
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 12.2M 0 12.2M 0 0 1000k 0 --:--:-- 0:00:12 --:--:-- 799k
... INFO: File contains Windows carridge returns- converting...
du: illegal option -- B
usage: du [-H | -L | -P] [-a | -s | -d depth] [-c] [-h | -k | -m | -g] [-x] [-I mask] [file ...]
... ERROR: checkdiskspace() was not passed the correct arguments.
[zero]couchdb-dump(master %)→ uname -a
Darwin zero.local 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64
Great work on these scripts! Thank you for writing these and sharing. It's great to be able to just grab a copy of a whole CouchDB database.
Did you design the scripts with binary attachments in mind? We seem to be able to download a full database that includes binary attachments, and the backup file and the original database are similar in size. However, when I try to restore this database dump with binary files, I get the following error:
{"error":"bad_request","reason":"invalid UTF-8 JSON"}
Have you come across the above error before?
I have been able to restore pure JSON (non binary) databases without any trouble.
Thank you!
Graham
CTO, Telephonic
As per the title: a lot of the code is shared between these two scripts. We could likely merge it without much hassle. Although, if we do, we should look to push most of the code into dedicated functions rather than running it serially :)
Is it possible to backup all available databases?
This tool looks fantastic! Thanks @danielebailo for putting it together.
We have a very large CouchDB installation (~400GB in size). Are there any downsides to running this tool against a large data set like this?
I get these errors on OS X 10.11.4:
./couchdb-backup.sh: line 497: syntax error near unexpected token `<'
./couchdb-backup.sh: line 497: ` done < <(cat ${design_file_name})'
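The `< <( ... )` construct is bash process substitution, so this error usually means the script was run under a non-bash /bin/sh (e.g. via `sh couchdb-backup.sh`, or a shell where /bin/sh is not bash). Assuming that is the cause, invoking the script with bash explicitly avoids the syntax error:

```shell
# Process substitution works under bash:
bash -c 'while read -r line; do echo "got: $line"; done < <(printf "a\nb\n")'
# Running the same construct under a strictly POSIX shell raises
# "syntax error near unexpected token `<'" -- so run the script as:
#   bash ./couchdb-backup.sh ...
```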
Hi,
I'm trying to use your script to dump DBs: everything is fine during export, but when I try to restore the DB, I get:
["... WARN: CouchDB Reported an error during import - Attempt 1/3 - Retrying...","... WARN: CouchDB Reported an error during import - Attempt 2/3 - Retrying...","... ERROR: CouchDB Reported: {"error":"conflict","reason":"Document update conflict."}"]
Am I missing something? Do I have to delete records manually before restore?
Thank you
When dumping, the characters ' and " are not correctly escaped if they are present in the values.
Dumped a db and wanted to re-import it again and the import command gave me this error
{"error":"bad_request","reason":"invalid_json"}
e.g. when a string contains HTML, where attributes need escapes on the href="...", etc.
Otherwise, interesting script.
Hi,
I'm wondering if it's possible to append only the delta to a file rather than creating a new file each time? I know the option is currently not available, but have you considered adding it in the future?
Thanks in advance!
Fresh builds of Ubuntu/CentOS frequently lack curl. Please add a check for this rather than failing messily.
I'm using couchdb-dump version: 1.1.7
I have a database which is successfully downloaded to a file (39 MB), but it gets stuck at "Stage 1 - Document filtering".
... INFO: Output file bob.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 38.2M 0 38.2M 0 0 10.0M 0 --:--:-- 0:00:03 --:--:-- 10.0M
... INFO: File may contain Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
Since it's below 250MB, the parsing isn't multi-threaded.
I'm assuming it's stuck at the sed line:
$sed_cmd ${sed_edit_in_place} 's/.*,"doc"://g'
Could someone explain the purpose of removing .*,"doc": ? Is this the Database Compaction or Purge Historic and Deleted Data logic?
Looking into the json file, it removed the following part on each line.
{"id":"...","key":"...","value":{"rev":"..."},"doc":
I think a comment above that code is welcome.
I'm assuming my issue is caused by binary attachments in all the docs.
I don't think I'm helped with #31, since I do want this to happen.
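For reference, a dump produced via GET /db/_all_docs?include_docs=true wraps each document in a result row; presumably the sed strips the row prefix so that only the document body (which _bulk_docs needs) remains. A minimal illustration, assuming that row shape:

```shell
# Each _all_docs row wraps the real document; stripping everything up to
# ,"doc": leaves just the document body for a later _bulk_docs import.
printf '%s\n' '{"id":"a","key":"a","value":{"rev":"1-x"},"doc":{"_id":"a","n":1},' |
  sed 's/.*,"doc"://g'
# -> {"_id":"a","n":1},
```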
It would probably be a reasonable idea to set 'version=x.x' within the code, and offer this within the arguments that can be passed; ie:
./couchdb-dump.sh -V
CouchDB Dump - Version x.x
This will assist with identifying which version of the code someone is running if/when they report an issue.
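A minimal sketch of the suggestion (the version string and variable name are illustrative placeholders, not the script's actual code):

```shell
#!/bin/sh
scriptversion="x.x"   # placeholder: set to the real release number

# Hypothetical argument handling for a version flag
case "$1" in
  -V|--version)
    echo "CouchDB Dump - Version ${scriptversion}"
    exit 0
    ;;
esac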
Restoring DB from backup file error.
Brain dump:
curl to stdout and gzip the stream; we could still monitor curl's status through pipefail.
Would need to figure out a more sensible way to check for JSON errors returned from CouchDB (curl sees them as successful, but CouchDB reports an error) - HTTP headers maybe? The import would also need to check the file type and treat gzipped backups accordingly. Plus: the ability to choose the compression agent.
In relation to chunked backups, it might be a bit fiddly though.
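A sketch of the streaming idea with pipefail (the producer function stands in for the real curl call; all names here are placeholders):

```shell
# bash: pipefail makes the pipeline fail if the producer (curl) fails mid-stream
set -o pipefail
produce() { printf '{"docs":[]}\n'; }   # stand-in for: curl -fsS "$HOST/$DB/_all_docs?include_docs=true"
produce | gzip > dump.json.gz
echo "pipeline status: $?"              # non-zero if the producer failed
gzip -dc dump.json.gz                   # the restore side would detect .gz and decompress
```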
The dump seems to save the file, but I get the following error in the terminal:
couchdb-backup.sh: line 74: [: -ge: unary operator expected
... ERROR: Insufficient Disk Space Available:
* Full Path: food.json
* Affected Directory: food.json
* Space Available: KB
* Total Space Required: 294 KB
expr: syntax error
* Additional Space Req: KB
I am wondering: is there any way to take a backup of all revisions?
The password with the -p parameter is visible with ps aux
. Is it possible to specify the password with an environment variable?
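The script doesn't currently support this, as far as I can see, but a hedged sketch of the idea, using a hypothetical environment-variable name:

```shell
# Hypothetical: fall back to an environment variable when -p is not given,
# so the secret never appears as a process argument in `ps aux` output.
# COUCHDB_PASSWORD is an assumed name, not one the script defines.
if [ -z "${password}" ] && [ -n "${COUCHDB_PASSWORD}" ]; then
  password="${COUCHDB_PASSWORD}"
fi
```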
Hello,
couchdb-dump is perfect and works perfectly, thanks a lot.
But I have a little question: I want to dump all docs with all revisions (to preserve document history) on a CouchDB 2 cluster. Is that possible with the update_seq option?
Thanks.
Victor.
Hi,
really nice script, but could you please upload a license for it (e.g. Apache 2.0)?
I really like that script and would like to use it at work. But I can only use it, if there is a license attached which allows commercial usage...
Thanks in advance
Konrad
Commit that originated the issue
... INFO: File contains Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
sed: can't read s/.*,"doc"://g: No such file or directory
Stage failed.
Our production DB is pretty big/fiddly (around 7 million+ documents), and when re-importing it, I've sometimes hit CouchDB errors halfway or so through, meaning I have to delete the DB, clean up the files, and start afresh. It would be useful if the script detected failure and offered the user a few retries (letting the user restart the DB or whatever is appropriate before retrying) before giving up. When our imports take up to 3 hours to complete, a failure halfway means 1.5 hours of lost time... :(
Hey; a problem I'm having (as noted in: http://stackoverflow.com/questions/10979479/how-to-do-bulk-insert-from-huge-json-file-460-mb-in-couchdb) is that large imports time out.
Ideally, the restore code should import a (configurable?) subset of data in blocks, rather than all at once.
I am running this code in file sichereBbAufDropbox.sh
:
# dumps artendb
# prepends the date to the filename
# compresses the file
# copies the file to Dropbox
# removes the file
FILENAME=$(date +"%Y-%m-%d_%H-%M-%S_artendb_dump.txt")
FILENAME_GZ=$FILENAME.tar.gz
/home/alex/backup/couchdb-backup.sh -b -H http://localhost:5984 -d artendb -f $FILENAME -u admin -p secret -P 5984
tar cvzf $FILENAME_GZ $FILENAME
/home/alex/backup/dropbox_uploader.sh upload $FILENAME_GZ $FILENAME_GZ
rm $FILENAME
rm $FILENAME_GZ
This is the output:
alex@ae-2018-01:~/backup$ bash sichereBbAufDropbox.sh
... INFO: Output file 2018-01-07_12-05-23_artendb_dump.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 197M 0 197M 0 0 4962k 0 --:--:-- 0:00:40 --:--:-- 0
curl: (18) transfer closed with outstanding read data remaining
... ERROR: Curl encountered an issue whilst dumping the database.
I am using:
This happens on a newly installed server. It used to work on the last server with the same db. I run similar backups on two other servers and they work fine.
... INFO: Separating Design documents
... INFO: Duplicating original file for alteration
... INFO: Stripping _design elements from regular documents
... INFO: Fixing end document
... INFO: Inserting Design documents
... INFO: Successfully imported 0 Design Documents
... INFO: Small dataset. Importing as a single file.
{"error":"bad_request","reason":"Missing JSON list of 'docs'"}
This is the output I'm getting. I'm wondering if there is something wrong with the sed command?
How carefully has this script been tested with usernames and passwords? I ask because I am attempting to make a backup of a server that requires a username and password, and I keep getting authentication errors. I worry that this code never worked.
Upon investigation, I found that the following code caused problems (approx. line 273):
if [ ! "x$username" = "x" ]&&[ ! "x$password" = "x" ]; then
curlopt="${curlopt} -U '${username}:${password}'"
fi
Which I patched to (and which also worked, though it's not a robust solution. See the note below):
if [ ! "x$username" = "x" ]&&[ ! "x$password" = "x" ]; then
curlopt="${curlopt} --user ${username}:${password}"
fi
Changes: -U to --user. In curl, -U is short for --proxy-user, so this effectively changes the option from --proxy-user to --user.
Can you test that this change works and verify it works for you with servers that require a user name and password?
Note: This is not a complete fix. When usernames and passwords contain special characters, the curl command line will fail. The better solution is to refactor the curl command line and apply the username/password quoting there, as is done with the curl URL. I can implement this once we know that the fix basically works.
Hi,
Thanks for a very useful script.
Just thought you might like to know of some slightly odd results..
I'm on a mac using Yosemite.
The script works great.. but oddly outputs TWO files.
For example: a commandline like this:
bash couchdb-backup.sh -b -H 127.0.0.1 -d my_users -f my_users.json
will produce two files:
my_users.json
my_users.json""
The JSON file produced appears to be just fine.
It's just the two files... strikes me as odd.
Thanks again..
- Rename ${file_name}.design to ${file_name}-design to match the nodesign filename ${file_name}-nodesign
- Remove ${file_name}-nodesign and ${file_name}-design on successful import (files to be retained for debug/analysis if import fails)
- We should ideally be checking for available disk space where possible.
Note:
In a CouchDB Database which has _design documents defined (data restrictions, views etc), when exporting the JSON using ./couchdb-dump, these special document types are appended to the end of the JSON dump.
_bulk_docs can't handle these, so the last split file to be inserted fails for all documents contained within it.
The fix here is to break out all of the _design documents from the exported JSON when we want to restore the data, and handle these first.
NOTE: I have the fix for this already; I'll request a merge in a little while.
Hello,
is there an option to backup/restore all databases?
Thank you!
I have backed up my db using this command:
./couchdb-backup.sh -b -H http://localhost:5984 -d artendb -f $FILENAME -u name -p password -P 5984
Now I try to restore using:
bash couchdb-backup.sh -r -H 127.0.0.1 -d artendb -f 2016-07-21_23-00-01_artendb_dump.json -u name -p password
But I get this output:
alex@pca:/mnt/c/Users/alexa/Downloads$ bash couchdb-backup.sh -r -H 127.0.0.1 -d artendb -f 2016-07-21_23-00-01_artendb_dump.json -u name -p password
... INFO: Checking for database
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 103 0 103 0 0 10567 0 --:--:-- --:--:-- --:--:-- 11444
... INFO: Checking for Design documents
... INFO: No Design Documents found for import.
... INFO: Block import set to 5000 lines.
... INFO: Generating files to import
... INFO: Header already applied to 2016-07-21_23-00-01_artendb_dump.json.splitaaa
... INFO: Adding footer to 2016-07-21_23-00-01_artendb_dump.json.splitaaa
... INFO: Inserting 2016-07-21_23-00-01_artendb_dump.json.splitaaa
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 27.8M 100 76 100 27.8M 108 39.7M --:--:-- --:--:-- --:--:-- 39.7M
... WARN: CouchDB Reported and error during import - Attempt 1/3 - Retrying...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 27.8M 100 76 100 27.8M 105 38.7M --:--:-- --:--:-- --:--:-- 38.7M
... WARN: CouchDB Reported and error during import - Attempt 2/3 - Retrying...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 27.8M 100 76 100 27.8M 109 40.0M --:--:-- --:--:-- --:--:-- 40.0M
... ERROR: CouchDB Reported: {"error":"bad_request","reason":"POST body must include `docs` parameter."}
This is the backup: https://www.dropbox.com/s/y9b2ztle2xuwwrk/2016-07-21_23-00-01_artendb_dump.txt.tar.gz?dl=0 (it was unzipped before using)
What could I be doing wrong?
I am currently running couchdb v2.2.0.
At the time of backing up it would have been a 1.6 version, I guess.
## Check for curl
if [ "x`which curl`" = "x" ]; then
echo "... ERROR: This script requires 'curl' to be present."
exit 1
fi
Some systems, for example lightweight containers, don't have which but do have curl.
## Check for curl
curl --version > /dev/null
if [ "$?" != "0" ]
then
echo "... ERROR: This script requires 'curl' to be present."
exit 1
fi
...should do the trick too.
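Another POSIX-portable option that avoids both which and executing curl: command -v is specified by POSIX and works under busybox/dash as well:

```shell
## Check for curl without relying on `which` or running curl itself
if ! command -v curl >/dev/null 2>&1; then
  echo "... ERROR: This script requires 'curl' to be present."
  exit 1
fi
```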
Since 2640981, the script fails silently when busybox's grep is used to do the backup. The -U option is not recognized and it produces the following error:
grep: unrecognized option: U
However, it doesn't stop the script and produces a file. It's when you try to restore it that couchdb complains with the following error :
... ERROR: CouchDB Reported: {"error":"bad_request","reason":"invalid UTF-8 JSON"}
In my case, I used Alpine Linux to do the backups, where busybox's grep is the default, but I was able to fix the issue by installing GNU grep with the command "apk add grep". However, I have two weeks of backups that are not valid, as I didn't detect the issue because the script doesn't fail and I don't test restores frequently enough (my bad). Do you think it could be possible to fix the existing files so I can restore them?
The line in question:
Line 340 in fb21b73
When importing a dataset with no design docs, I found that the if statement in use wasn't matching correctly.
Example:
./couchdb-backup.sh -R -H 127.0.0.1 -u admin -p pass -d db6 -f pim.short.json
... INFO: Separating Design documents
... INFO: Duplicating original file for alteration
... INFO: Stripping _design elements from regular documents
I'm trying to back up a remote CouchDB; I'm using macOS 10.10.
bash couchdb-backup.sh -b -H 1.2.3.4 -d _users -f db._users.json -u uuuu -p pppp
here's the output:
... INFO: File contains Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
sed: 1: "db._users.json": extra characters at the end of d command
I did a database export that is 44 MB / 33k lines. When I try to restore the dump, it is split into several split* files and I receive the following error:
[root@kazoo1 ~]# ./couchdb-dump.sh -a 1 -c -r -H localhost -d account%2F9b%2F7d%2Fa8712e54b4d596b51a1e74f58208 -f 9b7da8712e54b4d596b51a1e74f58208/account.json -P 15984
... INFO: Checking for database
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1613 0 1613 0 0 128k 0 --:--:-- --:--:-- --:--:-- 131k
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 12 100 12 0 0 74 0 --:--:-- --:--:-- --:--:-- 74
... INFO: Checking for Design documents
... INFO: No Design Documents found for import.
... INFO: Block import set to 5000 lines.
... INFO: Generating files to import
... INFO: Header already applied to 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
... INFO: Adding footer to 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
... INFO: Inserting 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 533k 100 54 100 532k 1587 15.2M --:--:-- --:--:-- --:--:-- 15.7M
... ERROR: CouchDB Reported: {"error":"bad_request","reason":"invalid UTF-8 JSON"}
I checked the split* files and they are not valid JSON, is that normal? What can I do to troubleshoot this?
CouchDB version: 2.1.2
Thanks!
I'm getting an error I don't understand (Mac OS X 10.10.4).
This is the command...
bash couchdb-backup.sh -b -H 127.0.0.1 -d test -f test.json
This is the output
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 961k 0 961k 0 0 6875k 0 --:--:-- --:--:-- --:--:-- 6916k
... INFO: File contains Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
sed: 1: "test.json": undefined label 'est.json'
Stage failed.
It seems to be tripping over something there that I can't make out. Any suggestions appreciated.
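My assumption from the error text: BSD sed's -i option requires an explicit (possibly empty) backup-suffix argument, so when the script calls GNU-style `sed -i 'expr' file`, BSD sed consumes the expression as the suffix and tries to parse the filename as a sed script. A sketch of the portable pattern:

```shell
printf 'a\n' > demo.txt
if sed --version >/dev/null 2>&1; then
  sed -i 's/a/b/' demo.txt      # GNU sed: -i takes an optional *attached* suffix
else
  sed -i '' 's/a/b/' demo.txt   # BSD sed: -i requires a separate (possibly empty) suffix
fi
cat demo.txt                    # -> b
```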
Is there an easy / recommended way to backup / restore all databases under an account?
Is this script also sufficient for backing up doc attachments?
Passing the '-t' option for choosing the number of threads for Backup parsing doesn't work. Fixing now.
Me again :)
Running the sed statements after exporting the DB can take more than 5 minutes on a 2 GB exported file; the main limitation is that sed is capped to a single CPU.
We should probably:
Note that 'Header correction' and 'Final document line correction' will then only need to be applied to the first and last file splits respectively.
After finishing processing, split files should be re-merged to a single file.
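A rough sketch of the split-then-parallel-sed idea (chunk size and filenames are illustrative; the real script would also apply the header and final-line corrections described above to the first and last chunks):

```shell
# 1. split the export into chunks; 2. run sed on each chunk in the background;
# 3. wait for all workers, then re-merge the chunks in order
split -l 100000 dump.json chunk.
for f in chunk.??; do
  sed 's/.*,"doc"://g' "$f" > "$f.out" &
done
wait
cat chunk.??.out > dump.processed.json
rm -f chunk.?? chunk.??.out
```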
When exporting very large datasets, it would be nice to break the export up, so that any failures can be reattempted, without the user needing to restart the job from scratch.
ERROR: Unable to post data to "http://localhost:5984/communities-api-dev/_bulk_docs" (http status code = 100)
I think it's because my file is huge: 80M Aug 17 01:12 backup.js
There is an issue with the error handling logic at
https://github.com/danielebailo/couchdb-dump/blob/master/couchdb-backup.sh#L510
where the exit status is always 0 for this kind of API error.
line 166: cores=`nproc`
In some cases the nproc command is not present in the OS.
It would be better to introduce a test: if nproc is unavailable, fall back to cores=1.
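A sketch of the suggested fallback (the macOS/BSD branch is an extra assumption beyond the report):

```shell
# Detect core count with graceful degradation when nproc is absent
if command -v nproc >/dev/null 2>&1; then
  cores=$(nproc)
elif command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
  cores=$(sysctl -n hw.ncpu)    # macOS/BSD equivalent
else
  cores=1                       # safe single-threaded fallback
fi
echo "cores: $cores"
```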