couchdb-dump's People

Contributors

aduchate, ahodgkinson, ateam-adam, atlas48, bryant1410, chozekun, dalgibbard, danielebailo, david-byng, dgibbard-cisco, epos-eu, fungiboletus, hadrien-toma, lightweight, maxhbr, noni73, nuin, peteruithoven, psander-com, skade, splanquart, theobrigitte, tonklon, yiu31802

couchdb-dump's Issues

split: illegal option -- d

I am running the script on a Mac, but I get the error "split: illegal option -- d" followed by "ERROR: Unable to create split files." I am trying to dump a >2 GB JSON file.
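
For reference, BSD split on macOS doesn't support the -d (numeric suffixes) flag the error points at. A minimal sketch of two possible workarounds, assuming Homebrew is available; the filename and chunk size below are placeholders:

# Option 1: install GNU coreutils and use its split (installed as gsplit)
brew install coreutils
gsplit -d -l 5000 dump.json dump.json.split

# Option 2: drop -d and accept alphabetic suffixes (dump.json.splitaa, ...)
split -l 5000 dump.json dump.json.split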

Backup exits with exitcode 1 in silent mode

In silent mode, the last line of the backup path causes the script to exit with code 1:

$echoVerbose && echo "... INFO: Export completed successfully. File available at: ${file_name}"

This is caused by $echoVerbose evaluating to false, so the && list returns a non-zero status. In non-silent mode the echo command results in exit code 0.
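
A minimal sketch of one way to keep the final exit status at 0 when verbose output is suppressed, assuming $echoVerbose expands to true or false as described above (a suggestion, not the maintainers' fix):

if $echoVerbose; then
    echo "... INFO: Export completed successfully. File available at: ${file_name}"
fi
exit 0

Alternatively, appending "|| true" to the existing line keeps the one-liner style while forcing a zero status.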

Doesn't work in OS X 10.10

Great app! Just has some issues with BSD-style commands:

[zero]couchdb-dump(master)→ ./couchdb-backup.sh -H thehost.whatever.com -d delivery-index -f db.json -b
./couchdb-backup.sh: line 157: nproc: command not found
expr: syntax error
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.2M    0 12.2M    0     0  1000k      0 --:--:--  0:00:12 --:--:--  799k
... INFO: File contains Windows carridge returns- converting...
du: illegal option -- B
usage: du [-H | -L | -P] [-a | -s | -d depth] [-c] [-h | -k | -m | -g] [-x] [-I mask] [file ...]
... ERROR: checkdiskspace() was not passed the correct arguments.
[zero]couchdb-dump(master %)→ uname -a
Darwin zero.local 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64
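
For context, nproc and du -B are GNU-only. A minimal sketch of BSD-compatible fallbacks (the variable names are placeholders, not the script's):

# CPU count: use nproc where available, otherwise BSD/macOS sysctl
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu)

# File size in KB: du -k is portable, unlike the GNU-only du -B
size_kb=$(du -k "${file_name}" | awk '{print $1}')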

Backup & Restore database with docs with binary attachments

Great work on these scripts! Thank you for writing these and sharing. It's great to be able to just grab a copy of a whole CouchDB database.

Did you design the scripts with binary attachments in mind? We seem to be able to download a full database that includes binary attachments, and the backup file and the original database are similar in size. However, when I try to restore this database dump with binary files, I get the following error:

{"error":"bad_request","reason":"invalid UTF-8 JSON"}

Have you come across the above error before?

I have been able to restore pure JSON (non binary) databases without any trouble.

Thank you!

Graham
CTO, Telephonic

Improvement - Combine both codesets into a single utility

As per the title: a lot of the code is shared between these two scripts. We could likely merge them without much hassle. Although, if we do, we should look to push most of the code into separate functions rather than running it serially :)

Error on OS X, line 497

I get these errors on OS X 10.11.4:

./couchdb-backup.sh: line 497: syntax error near unexpected token `<'
./couchdb-backup.sh: line 497: `        done < <(cat ${design_file_name})'
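
The < <(...) process substitution is bash-only and fails with exactly this syntax error when the script is interpreted by a POSIX sh. A minimal sketch of an equivalent loop that avoids it (and the unnecessary cat), assuming the loop only needs the file line by line:

while read -r line; do
    printf '%s\n' "${line}"   # placeholder for the real per-line processing
done < "${design_file_name}"

If the root cause is the script being run by sh, invoking it explicitly with bash should also avoid the error.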

Document update conflict upon restore

Hi,
I'm trying to use your script to dump DBs: everything's fine during export, but when I try to restore the DB, I get:

["... WARN: CouchDB Reported an error during import - Attempt 1/3 - Retrying...","... WARN: CouchDB Reported an error during import - Attempt 2/3 - Retrying...","... ERROR: CouchDB Reported: {"error":"conflict","reason":"Document update conflict."}"]

Am I missing something? Do I have to delete records manually before restore?
Thank you

Dumping doesn't escape " and '

When dumping, the characters ' and " are not correctly escaped if they are present in document values.

I dumped a DB and wanted to re-import it, and the import command gave me this error:
{"error":"bad_request","reason":"invalid_json"}

Is it possible to only append the delta of a file

Hi,
I'm wondering if it's possible to append only the delta to an existing file rather than creating a new file each time. The option is currently not available; have you considered adding it in the future?
Thanks in advance!

Stuck at 'Stage 1 - Document filtering'

I'm using couchdb-dump version: 1.1.7

I have a database which is successfully downloaded to a file (39MB), but it gets stuck at Stage 1 - Document filtering.

... INFO: Output file bob.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.2M    0 38.2M    0     0  10.0M      0 --:--:--  0:00:03 --:--:-- 10.0M
... INFO: File may contain Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering

Since it's below 250MB, the parsing isn't multi-threaded.

I'm assuming it's stuck at the sed line:

$sed_cmd ${sed_edit_in_place} 's/.*,"doc"://g'

Could someone explain the purpose of removing .*,"doc":? Is this the Database Compaction or Purge Historic and Deleted Data logic?

Looking into the JSON file, it removed the following part from each line:

{"id":"...","key":"...","value":{"rev":"..."},"doc":

I think a comment above that code would be welcome.
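
For what it's worth, here is a minimal illustration of what that sed appears to do, assuming the export came from _all_docs?include_docs=true (the sample document is made up): it strips the _all_docs row wrapper so only the bare document remains, ready to be wrapped in a {"docs":[...]} body for _bulk_docs on import, so it does not look like compaction or purge logic.

echo '{"id":"abc","key":"abc","value":{"rev":"1-x"},"doc":{"_id":"abc","_rev":"1-x","name":"test"}},' \
  | sed 's/.*,"doc"://g'
# Output: {"_id":"abc","_rev":"1-x","name":"test"}},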

I'm assuming my issue is caused by binary attachments in all the docs.

I don't think I'm helped with #31, since I do want this to happen.

Improvement - Versioning contained within the code

It would probably be a reasonable idea to set 'version=x.x' within the code, and offer this via an argument that can be passed; i.e.:

./couchdb-dump.sh -V
CouchDB Dump - Version x.x

This will assist with identifying which version of the code someone is running if/when they report an issue.
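
A minimal sketch of what that could look like (the variable name, version string and option handling are hypothetical, not existing code):

scriptversion="1.1.7"

if [ "$1" = "-V" ]; then
    echo "CouchDB Dump - Version ${scriptversion}"
    exit 0
fi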

On the fly compression for backups

Brain dump:
curl to stdout and gzip the stream; curl's status could still be monitored through pipefail.
We would need to figure out a more sensible way to check for JSON errors returned from CouchDB (curl sees them as successful, but CouchDB reports an error) - HTTP headers maybe? The import would also need to check the file type and treat gzipped backups accordingly. Plus the ability to choose the compression agent.

In relation to chunked backups, it might be a bit fiddly though.
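
A minimal sketch of the streaming idea, assuming a plain _all_docs export and gzip as the compression agent; the URL and variable names are placeholders:

set -o pipefail
curl -sS -f "${couch_url}/${db_name}/_all_docs?include_docs=true" \
    | gzip -c > "${file_name}.gz" \
    || { echo "... ERROR: export or compression failed"; exit 1; }

curl -f at least turns HTTP-level errors into a non-zero exit status, though body-level CouchDB errors would still need the separate handling noted above.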

... ERROR: Insufficient Disk Space Available:

The dump seems to save the file, but I get the following error in the terminal:

couchdb-backup.sh: line 74: [: -ge: unary operator expected

... ERROR: Insufficient Disk Space Available:
        * Full Path:           food.json
        * Affected Directory:   food.json
        * Space Available:       KB
        * Total Space Required: 294 KB
expr: syntax error
        * Additional Space Req:  KB
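
The "[: -ge: unary operator expected" message and the blank "Space Available" value suggest the free-space variable ends up empty before the numeric comparison. A minimal sketch of a guarded check (variable names are hypothetical, not the script's):

available_kb=$(df -Pk "$(dirname "${file_name}")" | awk 'NR==2 {print $4}')
if [ -z "${available_kb}" ]; then
    echo "... ERROR: could not determine available disk space"
    exit 1
elif [ "${available_kb}" -lt "${required_kb}" ]; then
    echo "... ERROR: Insufficient Disk Space Available"
    exit 1
fi

The "Affected Directory: food.json" line also hints that the directory is being derived from a bare relative filename; running df against the dirname of the file would sidestep that.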

Can all docs be dumped with all revisions?

hello,

couchdb-dump works perfectly, thanks a lot.
But I have a little question: I want to dump all docs with all their revisions (to preserve the history of docs) on a CouchDB 2 cluster. Is that possible with the update_seq option?

Thanks .

Victor.

Please add license

Hi,
really nice script, but could you please add a license to it (e.g. Apache 2.0)?
I really like the script and would like to use it at work, but I can only use it if there is a license attached which allows commercial usage...

Thanks in advance
Konrad

[Improvement] During batched import, offer to resume on failed file import

Our Production DB is pretty big/fiddly (around 7 million documents), and when re-importing it I've sometimes hit CouchDB errors halfway or so through - meaning I have to delete the DB, clean up the files, and start afresh. It would be useful if the script detected failure and offered the user a few retries (enabling the user to restart the DB or whatever as appropriate before retrying) before failing - when our imports take up to 3 hours to complete, a failure halfway means 1.5 hours of lost time... :(

ERROR: Curl encountered an issue whilst dumping the database

I am running this code in file sichereBbAufDropbox.sh:

# dumps artendb
# prepends the date to the filename
# compresses the file
# copies the file to Dropbox
# removes the file
FILENAME=$(date +"%Y-%m-%d_%H-%M-%S_artendb_dump.txt")
FILENAME_GZ=$FILENAME.tar.gz
/home/alex/backup/couchdb-backup.sh -b -H http://localhost:5984 -d artendb -f $FILENAME -u admin -p secret -P 5984
tar cvzf $FILENAME_GZ $FILENAME
/home/alex/backup/dropbox_uploader.sh upload $FILENAME_GZ $FILENAME_GZ
rm $FILENAME
rm $FILENAME_GZ

This is the output:

alex@ae-2018-01:~/backup$ bash sichereBbAufDropbox.sh
... INFO: Output file 2018-01-07_12-05-23_artendb_dump.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  197M    0  197M    0     0  4962k      0 --:--:--  0:00:40 --:--:--     0
curl: (18) transfer closed with outstanding read data remaining
... ERROR: Curl encountered an issue whilst dumping the database.

I am using:

  • ubuntu 16.04.3
  • couchdb 2.2.1
  • curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
  • couchdb-backup.sh downloaded today

This happens on a newly installed server. It used to work on the last server with the same db. I run similar backups on two other servers and they work fine.

separating out design documents not working on OS X

... INFO: Separating Design documents
... INFO: Duplicating original file for alteration
... INFO: Stripping _design elements from regular documents
... INFO: Fixing end document
... INFO: Inserting Design documents
... INFO: Successfully imported 0 Design Documents
... INFO: Small dataset. Importing as a single file.
{"error":"bad_request","reason":"Missing JSON list of 'docs'"}

This is the output I'm getting. I'm wondering if there is something wrong with the sed command?

Problems with usernames and passwords (authorization failed)

How carefully has this script been tested with usernames and passwords? I ask because I am attempting to make a backup of a server that requires a username and password, and I keep getting authentication errors. I worry that this code never worked.

Upon investigation, I found that the following code caused problems (approx. line 273):

if [ ! "x$username" = "x" ]&&[ ! "x$password" = "x" ]; then
    curlopt="${curlopt} -U '${username}:${password}'"
fi

Which I patched to the following (which also works, though it's not a robust solution; see the note below):

if [ ! "x$username" = "x" ]&&[ ! "x$password" = "x" ]; then
    curlopt="${curlopt} --user ${username}:${password}"
fi

Changes:

  1. Changed -U to --user, i.e. from proxy authentication (--proxy-user) to server authentication (--user).
  2. The single quotes around the username and password are removed. With the single quotes, I think they are sent as part of the username/password.

Can you test that this change works and verify it works for you with servers that require a user name and password?

Note: This is not a complete fix. When usernames and passwords contain special characters, the curl command line will fail. The better solution is to refactor the curl command line and put the username/password quoting there, as is done with the curl URL. I can implement this once we know that the fix basically works.
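
A minimal sketch of the refactor described in the note, passing the credentials as a separately quoted argument (via a bash array) instead of embedding them in the curlopt string, so that special characters survive; this is a suggestion, not the patched code:

curl_auth=()
if [ -n "${username}" ] && [ -n "${password}" ]; then
    curl_auth=(--user "${username}:${password}")
fi
curl "${curl_auth[@]}" "${url}"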

Makes two output files - the second one with the same name but with "" on the end

Hi,
Thanks for a very useful script.
Just thought you might like to know of some slightly odd results..

I'm on a mac using Yosemite.
The script works great.. but oddly outputs TWO files.

For example: a commandline like this:

bash couchdb-backup.sh -b -H 127.0.0.1 -d my_users -f my_users.json

will produce two files:

my_users.json
my_users.json""

The JSON file produced appears to be just fine..
It's just the two files.. strikes me as odd...

Thanks again..

[Improvement] Cleanup output text + Temp file management

  • Fix 'Multithreading Parsing' to 'Multithreaded Parsing'
  • Rename ${file_name}.design to ${file_name}-design to match the nodesign filename
  • Ensure removal of ${file_name}-nodesign and ${file_name}-design on successful import (files to be retained for debug/analysis if import fails)

Improvement - Check Available Disk Space

We should ideally be checking for available disk space where possible.
Note:

  • We can't check this for the main export, as it's impossible to determine the completed output file size
  • Every time we run sed etc., it will create a temp file roughly equal in size to the original file (minus whatever we're cutting out).
  • When running a restore which contains design files, we're creating a clone of the DB file (minus _design docs) so that we're not amending the original export (in case something fails, the user wants to import again, etc.). We then run sed on that... which means at that point we need to account for approx "<DB_FILE_SIZE>*3" of available disk space - see the sketch below.
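
A minimal sketch of that last check, assuming sizes in KB and the rough *3 multiplier above (variable names are placeholders):

required_kb=$(( $(du -k "${file_name}" | awk '{print $1}') * 3 ))
available_kb=$(df -Pk "$(dirname "${file_name}")" | awk 'NR==2 {print $4}')
if [ "${available_kb}" -lt "${required_kb}" ]; then
    echo "... ERROR: Insufficient Disk Space Available (need approx ${required_kb} KB)"
    exit 1
fi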

[BUG] - Design documents cause insert failures

In a CouchDB Database which has _design documents defined (data restrictions, views etc), when exporting the JSON using ./couchdb-dump, these special document types are appended to the end of the JSON dump.
_bulk_docs can't handle these, so the last split file to be inserted fails for all documents contained within it.

The fix here is to break out all of the _design documents from the exported JSON when we want to restore the data, and handle these first.

NOTE: I have the fix for this already; I'll request a merge in a little while.
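
For reference, a minimal sketch of the separation step, assuming the export keeps one document per line; the -design/-nodesign filenames are placeholders:

# Pull _design documents into their own file and strip them from the bulk import
grep '"_design/' "${file_name}" > "${file_name}-design"
grep -v '"_design/' "${file_name}" > "${file_name}-nodesign"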

Error when restoring: POST body must include `docs` parameter.

I have backed up my db using this command:

./couchdb-backup.sh -b -H http://localhost:5984 -d artendb -f $FILENAME -u name -p password -P 5984

Now I try to restore using:

bash couchdb-backup.sh -r -H 127.0.0.1 -d artendb -f 2016-07-21_23-00-01_artendb_dump.json -u name -p password

But I get this output:

alex@pca:/mnt/c/Users/alexa/Downloads$ bash couchdb-backup.sh -r -H 127.0.0.1 -d artendb -f 2016-07-21_23-00-01_artendb_dump.json -u name -p password
... INFO: Checking for database
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   103    0   103    0     0  10567      0 --:--:-- --:--:-- --:--:-- 11444
... INFO: Checking for Design documents
... INFO: No Design Documents found for import.
... INFO: Block import set to 5000 lines.
... INFO: Generating files to import
... INFO: Header already applied to 2016-07-21_23-00-01_artendb_dump.json.splitaaa
... INFO: Adding footer to 2016-07-21_23-00-01_artendb_dump.json.splitaaa
... INFO: Inserting 2016-07-21_23-00-01_artendb_dump.json.splitaaa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.8M  100    76  100 27.8M    108  39.7M --:--:-- --:--:-- --:--:-- 39.7M
... WARN: CouchDB Reported and error during import - Attempt 1/3 - Retrying...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.8M  100    76  100 27.8M    105  38.7M --:--:-- --:--:-- --:--:-- 38.7M
... WARN: CouchDB Reported and error during import - Attempt 2/3 - Retrying...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.8M  100    76  100 27.8M    109  40.0M --:--:-- --:--:-- --:--:-- 40.0M
... ERROR: CouchDB Reported: {"error":"bad_request","reason":"POST body must include `docs` parameter."}

This is the backup: https://www.dropbox.com/s/y9b2ztle2xuwwrk/2016-07-21_23-00-01_artendb_dump.txt.tar.gz?dl=0 (it was unzipped before using)

What could I be doing wrong?

I am currently running couchdb v2.2.0.
At the time of backing up it would have been a 1.6 version, I guess.

If which is not present, curl is said to be not present.

## Check for curl
if [ "x`which curl`" = "x" ]; then
    echo "... ERROR: This script requires 'curl' to be present."
    exit 1
fi

Some systems, for example lightweight containers, don't have which but do have curl.

## Check for curl
curl --version > /dev/null
if [ "$?" != "0" ]
then
    echo "... ERROR: This script requires 'curl' to be present."
    exit 1
fi

...should do the trick too.

couchdb-dump doesn't work with busybox's grep anymore

Since 2640981, the script fails silently when busybox's grep is used to do the backup. The -U option is not recognized and it produces the following error:

grep: unrecognized option: U

However, it doesn't stop the script, and a file is still produced. It's when you try to restore it that CouchDB complains with the following error:

... ERROR: CouchDB Reported: {"error":"bad_request","reason":"invalid UTF-8 JSON"}

In my case I used Alpine Linux to do the backups, which uses busybox's grep by default, but I was able to fix the issue by installing GNU grep with the command apk add grep. However, I have two weeks of backups that are not valid, as I didn't detect the issue because the script doesn't fail, and I don't test restores frequently enough (my bad). Do you think it could be possible to fix the existing files so I can restore them?

The line in question:

if grep -qU $'\x0d' $file_name; then
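
A minimal sketch of a check that avoids the GNU-only -U flag and should also work with busybox grep (the conversion step shown is illustrative, not the script's):

if grep -q "$(printf '\r')" "${file_name}"; then
    echo "... INFO: File contains Windows carriage returns - converting..."
    tr -d '\r' < "${file_name}" > "${file_name}.tmp" && mv "${file_name}.tmp" "${file_name}"
fi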

[Bug] Import attempts to load _design files when none are present

I found when importing a dataset with no design docs that the if statement in use wasn't matching correctly.

Example:

./couchdb-backup.sh -R -H 127.0.0.1 -u admin -p pass -d db6 -f pim.short.json
... INFO: Separating Design documents
... INFO: Duplicating original file for alteration
... INFO: Stripping _design elements from regular documents

sed: 1: "db.json": extra characters at the end of d command on macos 10.10

I'm trying to back up a remote CouchDB; I'm on macOS 10.10.
bash couchdb-backup.sh -b -H 1.2.3.4 -d _users -f db._users.json -u uuuu -p pppp

here's the output:
... INFO: File contains Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
sed: 1: "db._users.json": extra characters at the end of d command

Invalid UTF-8 JSON error while restoring large file

I did a database export that has 44 MB / 33k lines. When restoring it, it is split into several split* files. When I try to restore the dump, I receive the following error:

[root@kazoo1 ~]# ./couchdb-dump.sh -a 1 -c -r -H localhost -d account%2F9b%2F7d%2Fa8712e54b4d596b51a1e74f58208 -f 9b7da8712e54b4d596b51a1e74f58208/account.json -P 15984
... INFO: Checking for database
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1613    0  1613    0     0   128k      0 --:--:-- --:--:-- --:--:--  131k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    12  100    12    0     0     74      0 --:--:-- --:--:-- --:--:--    74
... INFO: Checking for Design documents
... INFO: No Design Documents found for import.
... INFO: Block import set to 5000 lines.
... INFO: Generating files to import
... INFO: Header already applied to 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
... INFO: Adding footer to 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
... INFO: Inserting 9b7da8712e54b4d596b51a1e74f58208/account.json.splitaaa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  533k  100    54  100  532k   1587  15.2M --:--:-- --:--:-- --:--:-- 15.7M
... ERROR: CouchDB Reported: {"error":"bad_request","reason":"invalid UTF-8 JSON"}

I checked the split* files and they are not valid JSON - is that normal? What can I do to troubleshoot this?
CouchDB version: 2.1.2
Thanks!

Getting error - sed: 1: "test.json": undefined label 'est.json' (Mac OS X 10.10.4)

I'm getting an error I don't understand (Mac OS X 10.10.4).

This is the command...
bash couchdb-backup.sh -b -H 127.0.0.1 -d test -f test.json

This is the output
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  961k    0  961k    0     0  6875k      0 --:--:-- --:--:-- --:--:-- 6916k
... INFO: File contains Windows carridge returns- converting...
... INFO: Completed successfully.
... INFO: Amending file to make it suitable for Import.
... INFO: Stage 1 - Document filtering
sed: 1: "test.json": undefined label 'est.json'
Stage failed.

Seems like it's tripping over something there that I can't make out. Any suggestions appreciated.

Improvement - DB export - Data Parsing could be Multi-threaded

Me again :)

Running the sed statements after exporting the DB can take more than 5 minutes on a 2GB exported file - the main limitation being that sed is capped to a single CPU.
We should probably:

  • Count the number of CPUs, split the export into that many files, and run the sed across all split parts simultaneously for maximum performance (ie. forced multi-threaded processing)
  • Allow the end-user to define the CPU concurrency value manually (as long as it's less than CPU count - WARN if user tries to set it higher, and override to MAX setting instead) - this allows them to throttle it back to a limited number of CPUs in case they're running the export on a machine which is used for other things (which they don't want to impact)
  • Identify at which point (ie. filesize) this becomes useful, and not split it if below that.

Note that 'Header correction' and 'Final document line correction' will then only need to be applied to the first and last file splits respectively.
After finishing processing, split files should be re-merged to a single file.
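
A minimal sketch of the approach, assuming GNU sed and nproc/sysctl for the CPU count; chunk boundary handling, header/footer correction and error checking are left out:

cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu)
lines=$(wc -l < "${file_name}")
split -l $(( (lines / cores) + 1 )) "${file_name}" "${file_name}.split"

for part in "${file_name}".split*; do
    sed -i 's/.*,"doc"://g' "${part}" &   # one sed per chunk, run in parallel
done
wait

cat "${file_name}".split* > "${file_name}" && rm -f "${file_name}".split*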

Allow for batched exports with retry

When exporting very large datasets, it would be nice to break the export up, so that any failures can be reattempted, without the user needing to restart the job from scratch.
