
dynamodb-replicator's Issues

dyno failing to restore incremental backup?

simple test case with a small table:

ENV=dev
TABLE=dsrtest2

. config.env.$ENV

bin/incremental-backfill.js $AWS_REGION/$TABLE s3://$BackupBucket/$BackupPrefix

bin/incremental-snapshot.js s3://$BackupBucket/$BackupPrefix/$TABLE  s3://$BackupBucket/${TABLE}-snapshot

s3print s3://$BackupBucket/${TABLE}-snapshot | dyno put $AWS_REGION/dsr-test-restore-$TABLE

%% sh test-backup.sh
12 - 11.89/s[Fri, 09 Dec 2016 23:54:59 GMT] [info] [incremental-snapshot] Starting snapshot from s3://dsr-ddb-rep-testing/testprefix/dsrtest2 to s3://dsr-ddb-rep-testing/dsrtest2-snapshot
[Fri, 09 Dec 2016 23:55:01 GMT] [info] [incremental-snapshot] Starting upload of part #0, 0 bytes uploaded, 12 items uploaded @ 6.26 items/s
[Fri, 09 Dec 2016 23:55:01 GMT] [info] [incremental-snapshot] Uploaded snapshot to s3://dsr-ddb-rep-testing/dsrtest2-snapshot
[Fri, 09 Dec 2016 23:55:01 GMT] [info] [incremental-snapshot] Wrote 12 items and 148 bytes to snapshot
undefined:1
�
^

SyntaxError: Unexpected token  in JSON at position 0
    at Object.parse (native)
    at Function.module.exports.deserialize (/Users/draistrick/git/github/dynamodb-replicator/node_modules/dyno/lib/serialization.js:49:18)
    at Transform.Parser.parser._transform (/Users/draistrick/git/github/dynamodb-replicator/node_modules/dyno/bin/cli.js:94:25)
    at Transform._read (_stream_transform.js:167:10)
    at Transform._write (_stream_transform.js:155:12)
    at doWrite (_stream_writable.js:307:12)
    at writeOrBuffer (_stream_writable.js:293:5)
    at Transform.Writable.write (_stream_writable.js:220:11)
    at Stream.ondata (stream.js:31:26)
    at emitOne (events.js:96:13)

Next step would be to diff the two tables - but the pipe to dyno fails. I've tried 1.0.0 and 1.3.0 with the same result.

What data format is dyno expecting? The file on s3 (tried multiple tables including real data tables) is a binary blob?

cheese:~%% aws --region=us-west-2 s3 cp s3://dsr-ddb-rep-testing/dsrtest-snapshot -
m�1�
��ߠl�EG�EB�uL0\�Tuq�ݵ#������$L�6�/8�%Z�r�[d�p
���5h)��X�ֻ�j�ƪ�
 ۘ��&�WJ'❑��`�T��������􁒷
cheese:~%%

So maybe this is a problem with backfill? or I'm missing something? :)

2016-12-09 18:54:35 149 dsrtest-snapshot
2016-12-09 18:55:01 148 dsrtest2-snapshot
2016-12-09 18:37:20 1428 receipt_log_dev-01-snapshot
2016-12-09 18:53:15 13457328 showdownlive_dev-01-snapshot
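For what it's worth, the snapshot object looks gzip-compressed (a later issue in this list decompresses one with gzcat), so the binary blob may just need a gunzip step before it reaches dyno put - inserting gzcat into the pipe might be worth a try. Here's a minimal Node sketch for inspecting a downloaded snapshot locally, assuming it is a gzipped, newline-delimited list of items:

var fs = require('fs');
var zlib = require('zlib');
var readline = require('readline');

// Assumes the snapshot was downloaded first, e.g.:
//   aws s3 cp s3://$BackupBucket/${TABLE}-snapshot ./dsrtest2-snapshot
var lines = readline.createInterface({
    input: fs.createReadStream('dsrtest2-snapshot').pipe(zlib.createGunzip())
});

lines.on('line', function(line) {
    if (!line.trim().length) return;   // skip blank lines
    console.log(line);                 // each non-empty line should be one serialized item
});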

Trouble setting up the Lambda Function

Hey,

Thanks for open sourcing this great resource.

I have been trying to set it up, but I get the error below when running the replicator as a Lambda function.

Would you be able to take a look and see if it's something obvious? :-)

I have manually set up the following environment variables like so:
process.env.ReplicaTable = "testTableReplica";
process.env.ReplicaRegion = "us-west-2";
process.env.ReplicaEndpoint = "https://dynamodb.us-west-2.amazonaws.com";

Are there any others I have missed?

I call index.replicate as the function name

I have tried diff-tables us-west-2/testTable us-west-2/testTableReplica --backfill and that worked without a hitch, so I am certain it's not a difference in the tables, etc.

I am looking into using Streambot for real deployment, which looks sweet as it removes the configuration from the code entirely. I figured the best way was to get a basic example up and running first, then translate that into the Streambot JS.
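For reference, here's the stripped-down shape of what I'm deploying (a sketch only - I'm assuming replicator.replicate takes an (event, callback) pair, and that the three variables above are all the configuration it needs):

process.env.ReplicaTable = "testTableReplica";
process.env.ReplicaRegion = "us-west-2";
process.env.ReplicaEndpoint = "https://dynamodb.us-west-2.amazonaws.com";

var replicator = require('dynamodb-replicator');

// Deployed with index.replicate as the Lambda handler name.
module.exports.replicate = function(event, context) {
    replicator.replicate(event, function(err) {
        if (err) return context.fail(err);
        context.succeed();
    });
};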

2015-12-04T09:16:14.357Z    8ff15f9e-ca79-478b-9794-1e528a76d52a    [failed-request] request-id: undefined | id-2: undefined | params:
{
    "RequestItems": {
        "testTableReplica": [
            {
                "PutRequest": {
                    "Item": {
                        "Id": {
                            "S": "ddddddd"
                        }
                    }
                }
            },
            {
                "PutRequest": {
                    "Item": {
                        "Id": {
                            "S": "hjhgfhgfhg"
                        }
                    }
                }
            }
        ]
    }
}

Cheers
Adrian

Cannot restore table backup done using dynamodb-replicator

I have executed the command:
backup-table eu-west-1/ARTIFICIAL_APPLICANT_ID s3://some-s3

The backup file is on S3 but contains binary data, not JSON.

when I run
s3print s3://some-s3/ull/bba24099b07f53f5/0 | dyno import eu-west-1/TMP_ATSI_ARTIFICIAL_APPLICANT_ID

it says:
undefined:1

^

SyntaxError: Unexpected token in JSON at position 0
at JSON.parse ()
at Function.module.exports.deserialize (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\lib\serialization.js:49:18)
at Transform.Parser.parser._transform (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\bin\cli.js:94:25)
at Transform._read (_stream_transform.js:186:10)
at Transform._write (_stream_transform.js:174:12)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Transform.Writable.write (_stream_writable.js:290:11)
at Stream.ondata (internal/streams/legacy.js:16:26)
at emitOne (events.js:116:13)
events.js:183
throw er; // Unhandled 'error' event
^

Error: write EPIPE
at _errnoException (util.js:1024:11)
at Socket._writeGeneric (net.js:767:25)
at Socket._write (net.js:786:8)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Socket.Writable.write (_stream_writable.js:290:11)
at Socket.write (net.js:704:40)
at PassThrough.ondata (_stream_readable.js:639:20)
at emitOne (events.js:116:13)
at PassThrough.emit (events.js:211:7)

Comparison of serialized features

mapbox/dyno#86 is exposing a Dyno.serialize() function that should better support the "new" DynamoDB data types. We should use the string result of this function to compare two objects instead of assert.deepEqual.

Let's branch off from #21 to try this out.
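A rough sketch of what that comparison could look like, assuming Dyno.serialize() lands as proposed in mapbox/dyno#86:

var Dyno = require('dyno');

// Comparing serialized strings sidesteps deepEqual's handling of Buffers,
// Sets, and the newer DynamoDB types.
function itemsEqual(a, b) {
    return Dyno.serialize(a) === Dyno.serialize(b);
}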

cc @jakepruitt

No commit statuses or check runs found!

👋 Hey there! It's Changebot, and I help repositories follow our engineering best practices. My magic wand found some things I wanted to highlight for your review:

Item: Number of status checks at time of merging
Current status: 0
Best practice guidelines: >= 1

Could not find any status checks for this PR: #108

Can you take a look at these best practices and make any adjustments if needed?

Please visit my status check docs if you have any questions.

Restore incremental backups?

Love the incremental backups to S3; they work really well.

What do you use for restoring these incremental backups?

i.e. the latest copy from S3 restored to DynamoDB,
or
restoring a point in time from S3 to DynamoDB.

Replicate to a table with a different key schema

A bit of an unusual use case, but when trying to replicate to a destination table with a different key schema than the source table, this line is problematic. We are assuming both tables have identical key schemas.

We could work around this by allowing the user to define the destination key schema explicitly, or by looking up the key schema for the destination table.
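For the lookup option, something along these lines might work (a sketch only; the client setup and table name are placeholders):

var AWS = require('aws-sdk');

var replica = new AWS.DynamoDB({ region: 'us-west-2' });

replica.describeTable({ TableName: 'replica-table' }, function(err, data) {
    if (err) throw err;
    // e.g. [ 'partitionKey' ] or [ 'partitionKey', 'sortKey' ]
    var keyAttributes = data.Table.KeySchema.map(function(k) {
        return k.AttributeName;
    });
    // Build the replica's put/delete key from these attributes instead of
    // assuming the source table's key schema.
    console.log('Replica key attributes:', keyAttributes);
});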

Not usable tool

It's nice that you have implemented such a tool, but there is no way to restore a table from a backup.

Questions:

  • Why is there no restore option in dynamodb-replicator?
  • When I try to restore using dyno (which I assume can read these backups), I end up with an error:

s3print s3://some-s3/ull/bba24099b07f53f5/0 | dyno put eu-west-1/TMP_ATSI_ARTIFICIAL_APPLICANT_ID
undefined:1

^

SyntaxError: Unexpected token in JSON at position 0
at JSON.parse ()
at Function.module.exports.deserialize (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\lib\serialization.js:49:18)
at Transform.Parser.parser._transform (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\bin\cli.js:94:25)
at Transform._read (_stream_transform.js:186:10)
at Transform._write (_stream_transform.js:174:12)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Transform.Writable.write (_stream_writable.js:290:11)
at Stream.ondata (internal/streams/legacy.js:16:26)
at emitOne (events.js:116:13)
events.js:183
throw er; // Unhandled 'error' event
^

Error: write EPIPE
at _errnoException (util.js:1024:11)
at Socket._writeGeneric (net.js:767:25)
at Socket._write (net.js:786:8)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Socket.Writable.write (_stream_writable.js:290:11)
at Socket.write (net.js:704:40)
at PassThrough.ondata (_stream_readable.js:639:20)
at emitOne (events.js:116:13)
at PassThrough.emit (events.js:211:7)

Underscore .isEqual fails for buffers

This equality check is inadequate for comparing two JavaScript objects that may include buffers. This leads to false-positive different-in-replica reports.

$ node
> var u = require('underscore');
undefined
> var a = { hello: new Buffer('world') };
undefined
> var b = { hello: new Buffer('world') };
undefined
> u.isEqual(a, b)
false
> var assert = require('assert')
undefined
> assert.deepEqual(a, b)
undefined
> 
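If it helps, a minimal drop-in replacement for the u.isEqual call could lean on assert.deepEqual, which treats equal Buffers as equal (as the session above shows):

var assert = require('assert');

function recordsEqual(a, b) {
    try {
        assert.deepEqual(a, b);
        return true;
    } catch (err) {
        return false;
    }
}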

incremental-snapshot creates extra output in snapshot

bin/incremental-snapshot.js s3://$BackupBucket/$BackupPrefix/$TABLE s3://$BackupBucket/${TABLE}-snapshot

sometimes this leaves empty lines in the snapshot output - for example:

aws s3 cp s3://$BackupBucket/${TABLE}-snapshot - | gzcat

{"what":{"S":"new1"},"a":{"S":"b"}}
{"what":{"S":"new2"},"a":{"S":"11"}}
{"what":{"S":"a"},"b":{"S":"ccd"}}
{"what":{"S":"test2"},"a":{"S":"asdf"}}
{"what":{"S":"new10"},"a":{"S":"b"}}
{"what":{"S":"sdfg"},"a":{"S":"asdf"}}
{"what":{"S":"asdf"},"aa":{"S":"bb"}}
{"what":{"S":"test"},"a":{"S":"test1"}}

{"what":{"S":"new"},"a":{"S":"fish faster 8"}}
{"what":{"S":"test4"}}
{"what":{"S":"b"},"a":{"S":"aa"},"b":{"S":"cc"}}

This one, anyway, is easy enough to handle during decompression ( | gzcat | grep -v '^$' ) - but...

I removed the other case; I found where that data came from, and it was on me.

CLI tool to replicate a single record

If a record is identified as out-of-sync between the primary and replica tables, it would be convenient to be able to run a CLI command to bring them in sync.
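A rough sketch of what such a command might do under the hood (the table names, region, and key are hypothetical; this is not an existing tool):

var AWS = require('aws-sdk');

var primary = new AWS.DynamoDB({ region: 'us-east-1' });
var replica = new AWS.DynamoDB({ region: 'us-west-2' });
var key = { Id: { S: 'some-out-of-sync-id' } };

primary.getItem({ TableName: 'primary-table', Key: key, ConsistentRead: true }, function(err, data) {
    if (err) throw err;
    if (!data.Item) {
        // The record no longer exists in the primary, so remove it from the replica too.
        return replica.deleteItem({ TableName: 'replica-table', Key: key }, function(err) {
            if (err) throw err;
        });
    }
    replica.putItem({ TableName: 'replica-table', Item: data.Item }, function(err) {
        if (err) throw err;
    });
});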

Error Running diff-tables with --repair or --backfill

Hi,

First, thanks for a great tool. Provides a viable alternative to the relative black box that is the official AWS solution.

I have configured Dynamo replication using the replicator function with Lambda, but was keen to use the diff-tables script to attain a bit of confidence in what was being replicated. Unfortunately it fails whenever I attempt to pass the --repair or --backfill flag (i.e. to actually make any changes).

The stack trace is as follows:

/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:30
            throw err;
                  ^
TypeError: Object.keys called on non-object
    at Function.keys (native)
    at Response.<anonymous> (/usr/local/lib/node_modules/dynamodb-replicator/diff.js:206:32)
    at Request.<anonymous> (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:353:18)
    at Request.callListeners (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:595:14)
    at Request.transition (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:21:10)
    at AcceptorStateMachine.runTo (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:37:9)

Any input would be appreciated!

Even more logging

Debugging failed Lambda invocations is difficult. Some thoughts on what we need to log better:

Replication function

  • We will only run one replication per key, even if that key is affected more than once. We need a list of the unique keys, and a cross-checkable list of the keys that were affected
  • Keep a running count of number of records that have been successfully replicated for quick and easy comparison

Backup function

  • We run each change, even if that means running more than once per unique key. This means we need a list of each change-key combo, cross-checkable against the changes that have been implemented
  • I'm wondering if we should try and use a setTimeout to actually print a list of changes that failed to be implemented in 58s or something.
  • Consider, for each change/key combination in an invocation, logging an md5sum of it. It's difficult to search CloudWatch logs for JSON objects, which is what you'd like to do in order to confirm that a change/key was retried (a sketch follows this list).
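A sketch of the md5 idea from the last bullet, reusing the hashing pattern the backup function already applies to stream keys (the eventName suffix and the helper name are my own additions):

var crypto = require('crypto');

function changeHash(change) {
    return crypto.createHash('md5')
        .update(JSON.stringify(change.dynamodb.Keys))
        .update(change.eventName)
        .digest('hex');
}

// Log changeHash(change) both when a change is attempted and when it succeeds,
// then grep CloudWatch for the hash to confirm that a change/key was retried.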

incremental-snapshot doesn't handle S3 timeouts well

incremental-snapshot.js doesn't seem to handle S3 timeouts very well, leaving a broken (partial, missing, or otherwise) snapshot in its wake:

bin/incremental-snapshot.js s3://$BackupBucket/$BackupPrefix/$TABLE s3://$BackupBucket/${TABLE}-snapshot

[Tue, 10 Jan 2017 17:12:49 GMT] [info] [incremental-snapshot] Starting snapshot from s3://dsr-ddb-rep-testing/testprefix/showdownlive_gamedata_dev-01 to s3://dsr-ddb-rep-testing/showdownlive_gamedata_dev-01-snapshot
[Tue, 10 Jan 2017 17:12:59 GMT] [info] [incremental-snapshot] Starting upload of part #0, 0 bytes uploaded, 3000 items uploaded @ 297.65 items/s
[Tue, 10 Jan 2017 17:13:06 GMT] [error] [incremental-snapshot] TimeoutError: Connection timed out after 1000ms
    at ClientRequest.<anonymous> (/Users/draistrick/git/github/dynamodb-replicator/node_modules/aws-sdk/lib/http/node.js:56:34)
    at ClientRequest.g (events.js:286:16)
    at emitNone (events.js:86:13)
    at ClientRequest.emit (events.js:185:7)
    at TLSSocket.emitTimeout (_http_client.js:614:10)
    at TLSSocket.g (events.js:286:16)
    at emitNone (events.js:91:20)
    at TLSSocket.emit (events.js:185:7)
    at TLSSocket.Socket._onTimeout (net.js:333:8)
    at tryOnTimeout (timers.js:228:11)
    message: Connection timed out after 1000ms
    code: NetworkingError
    region: us-west-2
    hostname: dsr-ddb-rep-testing.s3-us-west-2.amazonaws.com

This case also exits 0 instead of with an error, so it's hard to handle externally.
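Until that's addressed, one possible local tweak is to give the S3 client more generous timeout and retry settings - this is only a guess, and it only helps if incremental-snapshot.js builds its client from the default aws-sdk config rather than passing its own httpOptions:

var AWS = require('aws-sdk');

// Raise the request timeout and retry count; the 1000ms in the error above
// suggests a very aggressive timeout is in effect somewhere.
AWS.config.update({
    maxRetries: 10,
    httpOptions: { timeout: 30000 }
});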

incremental backup and incremental backfill generate different file names

Hi there!

First off, great library. It's super useful and a much better/simpler option (for me) than the whole EMR/Datapipeline situation.

I have this simple lambda function that is subscribed to the tables I want to update:
(the bucket, region, and prefix are set as env variables in the lambda function)

var replicator = require('dynamodb-replicator')
module.exports.streaming = (event, context, callback) => {
  return replicator.backup(event, callback)
}

Then I ran the backfill by importing dynamodb-replicator/s3-backfill and passing it a config object.

However, I noticed that when records get updated via the stream/lambda function, they are written to a different file from the one created by the backfill.

I see that the formula for generating filenames is slightly different.

// backfill
            var id = crypto.createHash('md5')
                .update(Dyno.serialize(key))
                .digest('hex');

// backup
            var id = crypto.createHash('md5')
                .update(JSON.stringify(change.dynamodb.Keys))
                .digest('hex');

https://github.com/mapbox/dynamodb-replicator/blob/master/s3-backfill.js#L46-L48
https://github.com/mapbox/dynamodb-replicator/blob/master/index.js#L130-L132

Does this make any practical difference? Should the restore function work regardless?

dynamodb-replicator needs a team as a repo admin

👋 Hey there! It's Changebot, and I help repositories follow our engineering best practices. My magic wand found some things I wanted to highlight for your review:

Item: Teams enabled on dynamodb-replicator
Current status: None
Best practice guidelines: Your team

To follow least privilege best practices, please add your team as the repo admin.

Can you take a look at these best practices and make any adjustments if needed?

Please tag @mapbox/security-and-compliance on this issue if you have any questions

Documentation: real world user documentation

Would it be possible to get some real-world, complete setup examples for using this tool?

Before I go further - I appreciate the hard work involved in getting this tool this far, and don't take my critical comments below in the wrong context. I'm trying to help improve the user experience for possibly the best and only complete replication and backup/restore tool for dynamo that exists today!! :)

The current documentation is just a marketing glossy - a user doesn't even have a feature-to-tool map. Where is "A replicator function that processes events from a DynamoDB stream"? What file? I have to become an expert in Node, Lambda, DDB streams, and Streambot to be able to consume this project.

A more complete walkthrough (even if it's only an example setup) would be great - aws cli, aws web console, just something to help a user understand all of the pieces required (and to know how to skip the pieces that are not required).

How do I set up and configure DDB streams for the purpose of using this tool?

How do I set up Lambda?

IAM as it applies to the tool and related services?

S3 as it relates to the tool? (Specific bucket setup requirements for this usage?)

Other areas of concern - How does using the tool for replication, and for backups, impact ddb scaling in various scenarios?

Thanks - I'd love to use it, but figuring out how to use this is going to be a huge undertaking and trial-and-error exercise.

Even just a high level sketch of the pieces, without tons of detail, would be a great starting point for us to contribute to.

Cross-Account Replication

Hello,

Do you have any plans to include cross-account replication between DynamoDB tables?
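I imagine it would involve the replica client assuming a role in the target account, something like the sketch below (the role ARN and region are placeholders):

var AWS = require('aws-sdk');

var sts = new AWS.STS();

sts.assumeRole({
    RoleArn: 'arn:aws:iam::123456789012:role/replica-writer',
    RoleSessionName: 'dynamodb-replicator'
}, function(err, data) {
    if (err) throw err;
    var replica = new AWS.DynamoDB({
        region: 'us-west-2',
        accessKeyId: data.Credentials.AccessKeyId,
        secretAccessKey: data.Credentials.SecretAccessKey,
        sessionToken: data.Credentials.SessionToken
    });
    // ...hand this client (or its credentials) to whatever writes to the replica table
});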

Thanks!
Pierre

Throughput Exceptions

Hi guys,

I've gotten some throughput-exceeded exceptions, which are fine because I was provisioning my tables with a very low number, but is there some kind of exponential backoff feature in the tool? (I didn't see any; maybe I looked in the wrong place.)

Also, is there a way to limit the capacity used by the tool?
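For what it's worth, the aws-sdk clients themselves support retries with exponential backoff; I'm not sure whether the tool exposes these options, so this is only a guess at a mitigation:

var AWS = require('aws-sdk');

var dynamodb = new AWS.DynamoDB({
    region: 'us-west-2',
    maxRetries: 10,                     // retry throttled requests more times
    retryDelayOptions: { base: 100 }    // exponential backoff starting at 100ms
});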

Any suggestions welcome!

Thank you

dynamodb-incremental backup

Hey,

Thanks a lot for creating this utility. It would be really helpful if you could go through the steps below and let me know what I am missing.

I have configured the utility to take incremental backups of a DynamoDB table, but I am not sure of the series of steps required to successfully implement incremental backup. Below are the steps I followed:

  1. Executed incremental-backfill, which created a single file for all items in the table at the S3 location.
  2. Enabled versioning on the S3 bucket.
  3. Enabled streams on the DynamoDB table.
  4. Created a Lambda function to capture the inserts/updates/deletes from the table's stream.
  5. Performed updates on a few items in the table.
  6. Executed incremental-backfill again to take another backup.

While executing step 6, all the items were backed up again, whereas only the updated items should have been backed up.

I am not sure what the next step should be for a successful implementation of the utility.
