Snapshot Tool for Amazon Aurora

The Snapshot Tool for Amazon Aurora automates the task of creating manual snapshots, copying them into a different account and a different region, and deleting them after a specified number of days. It also allows you to specify the backup schedule (at what times and how often) and a retention period in days. This version will only work with Amazon Aurora MySQL and PostgreSQL instances. For a version that works with other Amazon RDS instances, please visit the Snapshot Tool for Amazon RDS.

Getting Started

Building From Source and Deploying

You will also need to build the Lambda function code from source and deploy it to your own bucket in your own account. To build, you need a Unix-like system (e.g., macOS or a flavour of Linux) with make and zip installed.

  1. Create an S3 bucket to hold the Lambda function zip files. The bucket must be in the same region where the Lambda functions will run, and the Lambda functions must run in the same region as the Aurora clusters.

  2. Clone the repository with git, or download it from GitHub

  3. Edit the Makefile file and set S3DEST to be the bucket name where you want the functions to go. Set the AWSARGS, AWSCMD and ZIPCMD variables as well.

  4. Type make at the command line. It will call zip to make the zip files, and then it will call aws s3 cp to copy the zip files to the bucket you named.

  5. Be sure to use the correct bucket name in the CodeBucket parameter when launching the stack in both accounts.

To deploy to your accounts, use the provided CloudFormation templates: snapshot_tool_aurora_source.json runs in the source account (the account that runs the Aurora clusters), and snapshot_tool_aurora_dest.json runs in the destination account (the account where you'd like to keep your snapshots).

IMPORTANT Run the CloudFormation templates in the same region where your Aurora clusters run (in both the source and destination accounts). If that is not possible because AWS Step Functions is not available there, you will need to use the SourceRegionOverride parameter explained below.

Source Account

Components

The following components will be created in the source account:

  • 3 Lambda functions (TakeSnapshotsAurora, ShareSnapshotsAurora, DeleteOldSnapshotsAurora)
  • 3 State Machines (Amazon Step Functions) to trigger execution of each Lambda function (stateMachineTakeSnapshotAurora, stateMachineShareSnapshotAurora, stateMachineDeleteOldSnapshotsAurora)
  • 3 Cloudwatch Event Rules to trigger the state machines
  • 3 Cloudwatch Alarms and 1 associated SNS Topic to alert on State Machine failures
  • A Cloudformation stack containing all these resources

Installing in the source account

Run snapshot_tool_aurora_source.json in the CloudFormation console. You will need to specify the different parameters. The default values will back up all Aurora clusters in the region at 1AM UTC, once a day. If your clusters are encrypted, you will need to grant the destination account access to the KMS key. You can read more on how to do that here: https://aws.amazon.com/premiumsupport/knowledge-center/share-cmk-account/

Here is a breakdown of each parameter for the source template:

  • BackupInterval - how many hours between backups

  • BackupSchedule - at what times and how often to run backups. Set it in accordance with BackupInterval. For example, set BackupInterval to 8 hours and BackupSchedule to 0 0,8,16 * * ? * if you want backups to run at 0, 8 and 16 UTC. If your schedule fires more often than BackupInterval, snapshots will only be created when the latest snapshot is older than BackupInterval

  • ClusterNamePattern - set to the names of the clusters you want this tool to back up. You can use a Python regex that will be searched for in the cluster identifier. For example, if your clusters are named prod-01, prod-02, etc., you can set ClusterNamePattern to prod. The string you specify will be matched anywhere in the name unless you use an anchor such as ^ or $. In most cases, a simple name like "prod" or "dev" will suffice. More information on Python regular expressions here: https://docs.python.org/2/howto/regex.html

  • DestinationAccount - the account where you want snapshots to be copied to

  • LogLevel - The log level you want as output to the Lambda functions. ERROR is usually enough. You can increase to INFO or DEBUG.

  • RetentionDays - the number of days you want your snapshots to be kept. Snapshots created more than RetentionDays ago will be automatically deleted (only if they contain a tag with Key: CreatedBy, Value: Snapshot Tool for Aurora)

  • ShareSnapshots - Set to TRUE if you are sharing snapshots with a different account. If you set it to FALSE, the State Machine, Lambda functions and associated Cloudwatch Alarms related to sharing across accounts will not be created. This is useful if you only want to take backups and manage retention, but do not need to copy them across accounts or regions.

  • SourceRegionOverride - if you are running Aurora on a region where Step Functions is not available, this parameter will allow you to override the source region. For example, at the time of this writing, you may be running Aurora in Northern California (us-west-1) and would like to copy your snapshots to Montreal (ca-central-1). Neither region supports Step Functions at the time of this writing so deploying this tool there will not work. The solution is to run this template in a region that supports Step Functions (such as North Virginia or Ohio) and set SourceRegionOverride to us-west-1. IMPORTANT: deploy to the closest regions for best results.

  • CodeBucket - this parameter specifies the bucket where the code for the Lambda functions is located. The Lambda function code is located in the lambda directory. These files need to be zipped and placed at the root of the bucket or the CloudFormation templates will fail. Please see the build instructions above on how to create this file hierarchy

  • DeleteOldSnapshots - Set to TRUE to enable functionality that will delete snapshots after RetentionDays. Set to FALSE if you want to disable this functionality completely (the associated Lambda and State Machine resources will not be created in the account). WARNING: if you decide to enable this functionality later on, bear in mind it will delete all snapshots created by this tool that are older than RetentionDays, not just the ones created after DeleteOldSnapshots is set to TRUE.

  • SnapshotNamePrefix - Set a name that will be added to the front of the snapshot identifiers when created, so that they are formatted as Prefix-ClusterIdentifier-Timestamp. Useful if you share snapshots from multiple accounts and need to identify which account they came from. Use 'NONE' or leave empty if you do not need a prefix (default)
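
The parameter semantics above can be sketched in Python. This is an illustrative sketch only, based on the behaviour described in this section; the function names and dictionary shapes are hypothetical, not the tool's actual code.

```python
import re
from datetime import timedelta

def cluster_matches(cluster_id, name_pattern):
    # ClusterNamePattern is searched anywhere in the identifier
    # unless the regex itself is anchored with ^ or $.
    return re.search(name_pattern, cluster_id) is not None

def needs_backup(latest_snapshot_time, now, backup_interval_hours):
    # A new snapshot is only taken when the latest one is older than
    # BackupInterval, even if BackupSchedule fires more often.
    return now - latest_snapshot_time > timedelta(hours=backup_interval_hours)

def snapshot_identifier(prefix, cluster_id, now):
    # SnapshotNamePrefix yields Prefix-ClusterIdentifier-Timestamp;
    # 'NONE' or an empty value means no prefix.
    stamp = now.strftime('%Y-%m-%d-%H-%M')
    if prefix and prefix != 'NONE':
        return '{}-{}-{}'.format(prefix, cluster_id, stamp)
    return '{}-{}'.format(cluster_id, stamp)

def expired(snapshot_time, tags, now, retention_days):
    # Only snapshots tagged CreatedBy: Snapshot Tool for Aurora are
    # ever deleted, and only once they are older than RetentionDays.
    created_by_tool = any(t['Key'] == 'CreatedBy' and
                          t['Value'] == 'Snapshot Tool for Aurora'
                          for t in tags)
    return created_by_tool and \
        now - snapshot_time > timedelta(days=retention_days)
```

For example, with BackupInterval set to 8, a cluster whose latest snapshot is 9 hours old is due for a new snapshot, while one backed up 2 hours ago is skipped.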

Destination Account

Components

The following components will be created in the destination account:

  • 2 Lambda functions (CopySnapshotsDestAurora, DeleteOldSnapshotsDestAurora)
  • 2 State Machines (Amazon Step Functions) to trigger execution of each Lambda function (stateMachineCopySnapshotsDestAurora, stateMachineDeleteOldSnapshotsDestAurora)
  • 2 Cloudwatch Event Rules to trigger the state functions
  • 2 Cloudwatch Alarms and 1 associated SNS Topic to alert on State Machine failures
  • A Cloudformation stack containing all these resources

On your destination account, you will need to run snapshot_tool_aurora_dest.json in the CloudFormation console. As before, you will need to run it in a region where Step Functions is available. The following parameters are available:

  • DestinationRegion - the region where you want your snapshots to be copied. If you set it to the same as the source region, the snapshots will be copied from the source account but will be kept in the source region. This is useful if you would like to keep a copy of your snapshots in a different account but would prefer not to copy them to a different region.
  • CrossAccountCopy - if you only need to copy snapshots across regions and not to a different account, set this to FALSE. When set to FALSE, any snapshots shared with the account will be ignored.
  • SnapshotPattern - similar to ClusterNamePattern. See above
  • DeleteOldSnapshots - Set to TRUE to enable functionality that will delete snapshots after RetentionDays. Set to FALSE if you want to disable this functionality completely (the associated Lambda and State Machine resources will not be created in the account). WARNING: if you decide to enable this functionality later on, bear in mind it will delete ALL SNAPSHOTS older than RetentionDays created by this tool, not just the ones created after DeleteOldSnapshots is set to TRUE.
  • KmsKeySource - KMS key to be used for copying encrypted snapshots in the source region. If you are copying to a different region, you will also need to provide a second key in the destination region.
  • KmsKeyDestination - KMS key to be used for copying encrypted snapshots to the destination region. If you are not copying to a different region, this parameter is not necessary.
  • RetentionDays - as in the source account, the number of days you want your snapshots to be kept. Do not set this parameter lower than in the source account. Snapshots created more than RetentionDays ago will be automatically deleted (only if they contain a tag with Key: CopiedBy, Value: Snapshot Tool for Aurora)

Updating

This tool is fundamentally stateless. The state is mainly in the tags on the snapshots themselves and the parameters to the CloudFormation stack. If you make changes to the parameters or make changes to the Lambda function code, it is best to delete the stack and then launch the stack again.

Authors

License

This project is licensed under the Apache License - see the LICENSE.txt file for details

aurora-snapshot-tool's People

Contributors

dougneal, jinnko, larslevie, lefthand, mrcoronel, ninetyninenines, polaskj, rts-rob

aurora-snapshot-tool's Issues

Retention days parameter in the dest account not working as expected

Hello,
I have set up the snapshot tool for both Aurora PostgreSQL and Aurora MySQL, and for both of them I set RetentionDays to 3 in the destination account. The source account has a retention of 1 day.

I only see one snapshot in the destination account/region. I am not sure why it is deleting the other snapshots when we specified the 3-day retention in the destination. Can you please let me know if something else needs to be done?

Same account, different region

Hello, I'm trying to use this project to copy Aurora snapshots from a source region (us-east-1) to a destination region (us-west-2) within the same AWS account.

Source snapshot is being created properly every day, but I see nothing in the dest region.

I'm wondering if I need a DB created in that region, because currently it is empty. I just wanted to have the snapshots replicated in another region.

Any hint?

Thanks.

Failed to create stack

Hi Expert,

My DR plan is:

  1. do snapshot in us-west-1 region, us-west-1 is source.
  2. copy snapshot to us-west-2 region. us-west-2 is destination

Because this is a DR plan, I would like to put the snapshot tool in a different region instead of us-west-1. I assume we cannot use us-west-1 any more when something happens, which means I cannot use S3 in us-west-1.

So I created an S3 bucket in us-west-2 for us-west-1, but I failed to create the stack in us-west-1.

Here is my mapping:
"Mappings": {
    "Buckets": {
        "us-east-1":      { "Bucket": "snapshots-tool-aurora-us-east-1" },
        "us-west-1":      { "Bucket": "xxxx-db-dr-tools" },
        "us-west-2":      { "Bucket": "snapshots-tool-aurora-us-west-2" },
        "us-east-2":      { "Bucket": "snapshots-tool-aurora-us-east-2" },
        "ap-southeast-2": { "Bucket": "snapshots-tool-aurora-ap-southeast-2" },
        "ap-northeast-1": { "Bucket": "snapshots-tool-aurora-ap-northeast-1" },
        "eu-west-1":      { "Bucket": "snapshots-tool-aurora-eu-west-1" },
        "eu-west-2":      { "Bucket": "snapshots-tool-aurora-eu-west-2-real" },
        "eu-central-1":   { "Bucket": "snapshots-tool-aurora-eu-central-1" }
    }
},

Here is error information:

10:28:07 UTC-0700 ROLLBACK_IN_PROGRESS AWS::CloudFormation::Stack snapshot-stack-ben The following resource(s) failed to create: [lambdaTakeSnapshotsAurora, lambdaDeleteOldSnapshotsAurora]. . Rollback requested by user.
10:28:06 UTC-0700 CREATE_FAILED AWS::Lambda::Function lambdaTakeSnapshotsAurora Resource creation cancelled
10:28:06 UTC-0700 CREATE_FAILED AWS::Lambda::Function lambdaDeleteOldSnapshotsAurora Error occurred while GetObject. S3 Error Code: PermanentRedirect. S3 Error Message: The bucket is in this region: us-west-2. Please use this region to retry the request (Service: AWSLambda; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: dc5c1ed1-c343-11e8-a7e0-adf5d5e6a444)

Ben

Support for Aurora PostgreSQL?

The README.md says it's Aurora MySQL only. But there is also #3, which is closed and says PostgreSQL appears to be functional as well.

Can you tell me whether this tool does in fact support Aurora PostgreSQL? If so, the README.md probably needs updating.

Thanks!

Incomplete pagination prevents sharing

Problem

When an account has more than 25 Aurora cluster snapshots of any type, snapshots are no longer copied over to the target account.

Root Cause

snapshots_tool_utils.py uses custom pagination code. Somehow this pagination is limited to 25 rows. Once an account has more than 25 manual backups, snapshots are created but not shared, and thus are not copied over.

Proposed Solution

Modify snapshots_tool_utils.py to use the built-in Boto paginator. A PR will be submitted against this issue.
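
The fix amounts to exhausting the Marker token instead of stopping after one page, which is what boto3's built-in get_paginator('describe_db_cluster_snapshots') does. A minimal illustration of the loop, using a hypothetical fake paged API so the sketch is self-contained:

```python
def describe_all_snapshots(describe_page):
    """Collect every snapshot by following the Marker token until the
    service stops returning one (i.e. the last page is reached)."""
    snapshots, marker = [], None
    while True:
        page = describe_page(Marker=marker)      # one API call per page
        snapshots.extend(page['DBClusterSnapshots'])
        marker = page.get('Marker')              # absent on the last page
        if marker is None:
            return snapshots

def make_fake_api(items, page_size=25):
    """Hypothetical stand-in for describe_db_cluster_snapshots that
    returns at most page_size records per call, like the real API."""
    def describe_page(Marker=None):
        start = int(Marker or 0)
        page = {'DBClusterSnapshots': items[start:start + page_size]}
        if start + page_size < len(items):
            page['Marker'] = str(start + page_size)
        return page
    return describe_page
```

With 60 snapshots and 25-record pages, three calls are made and all 60 records come back; stopping after the first call reproduces the 25-row truncation described above.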

Missing rds:addTagsToResources

The problem:

is not authorized to perform: rds:AddTagsToResource on resource: <ResourceArn>

Appeared on:

  • dest-lambda Copy Snapshots Aurora
  • source-lambda Backup Snapshots Aurora

I solved it by adding the permission in the template under iamroleSnapshotsAurora, and it worked fine afterwards.

KMS ID or KMS ARN

Hi!

I'm having a hard time getting this tool to work as it keeps giving errors about kms keys not being found, but I am not completely sure how to debug such errors.

What values are required for setting up the KMS key configuration? Should I use the KMS Key ID or the KMS Key ARN?

Occasional rate limiting errors

We are running this tool with quite a lot of snapshots being generated (once an hour, kept for 7 days for multiple clusters).

In this configuration, we are seeing the occasional errors due to rate limiting:

[ERROR]	2018-10-08T02:30:33.680Z	6153e97f-733f-4764-899c-a89c052312c8	Exception sharing dev-aurora-cluster20181001042229986400000001-2018-10-04-23-00: An error occurred (Throttling) when calling the ModifyDBClusterSnapshotAttribute operation (reached max retries: 4): Rate exceeded

or

An error occurred (Throttling) when calling the ListTagsForResource operation (reached max retries: 4): Rate exceeded: ClientError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 49, in lambda_handler
    ResourceName=snapshot_arn)
  File "/var/runtime/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (Throttling) when calling the ListTagsForResource operation (reached max retries: 4): Rate exceeded

This happens often enough that it not only fails the Lambda but also the Step Function after retries.

Is there a way to reduce the number of calls the tool makes or are we simply going beyond the limit of what's possible with the current implementation?
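
One general mitigation is to retry throttled calls with exponential backoff and jitter. The sketch below is an illustrative pure-Python helper, not part of the tool; in practice botocore's client Config also exposes configurable retry behaviour.

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry a callable when it raises a throttling error, sleeping
    base_delay * 2**attempt (with jitter) between attempts."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:
            throttled = 'Throttling' in str(err)
            if not throttled or attempt == max_attempts - 1:
                raise          # not a throttle, or out of attempts
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
```

Spacing the share/copy calls out this way usually keeps a burst of snapshots under the API rate limit, at the cost of longer run time (which is itself bounded by the Lambda timeout).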

Need overly permissive policy to avoid KMSKeyNotAccessibleFault

Hi!

If I add the permissions to the KMS Key as described in https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying-external-accounts.html

{
    "Sid": "Allow an external account to use this CMK",
    "Effect": "Allow",
    "Principal": { "AWS": [ "arn:aws:iam::444455556666:root" ] },
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
    ],
    "Resource": "*"
}

I get the following error when the lambda for local copy runs:
[ERROR] 2020-03-31T09:00:41.188Z An error occurred (KMSKeyNotAccessibleFault) when calling the CopyDBClusterSnapshot operation: The source snapshot KMS key [arn:aws:kms:eu-west-1:accountnumber:key/keynumber] does not exist, is not enabled or you do not have permissions to access it.

However, if I do the same thing, but change the "Action" to '*', like this:
{
    "Sid": "Allow an external account to use this CMK",
    "Effect": "Allow",
    "Principal": { "AWS": [ "arn:aws:iam::444455556666:root" ] },
    "Action": "*",
    "Resource": "*"
}

It works without issues.

What permission might be missing? I am not comfortable with this overly permissive policy...

Thanks in advance!

Same-Account snapshot copies

It doesn't look like same-account snapshot copying is available. Am I missing something? Is there a way to do this? Thanks!

Including automated snapshots increases run time and cost.

Problem

Automated snapshots are parsed by the tool, even though only manual snapshots can be shared and copied. This results in unnecessary pagination and additional runtime, which leads to additional cost.

Root Cause

Calls to describe_db_cluster_snapshots are not filtered by snapshot type.

Proposed Solution

Modify snapshot pagination calls to only apply to manually-created snapshots via the SnapshotType parameter.
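
Client-side, the filter is a one-line predicate over the records already returned; passing SnapshotType='manual' to describe_db_cluster_snapshots achieves the same thing server-side without transferring the automated records at all. A hypothetical sketch:

```python
def manual_only(snapshots):
    # Automated snapshots cannot be shared or copied, so parsing them
    # only adds pagination and runtime; keep only the manual ones.
    return [s for s in snapshots if s.get('SnapshotType') == 'manual']
```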

Snapshot issue

Every time the Step Function runs to take a snapshot on one instance, it throws "errorMessage": "Could not back up every cluster. Backups pending: 1"; if I run it on multiple instances, "errorMessage": "Could not back up every cluster. Backups pending: 3". Any idea how to solve it?

Public bucket access

Is there a way to create the S3 bucket so that it doesn't open up public access and still allows the Makefile, specifically this line, to work? I can only figure out how to make it work with a publicly accessible bucket.

Related question: why is that line necessary? I tried commenting it out, and everything seemed to work, at least with my setup. I acknowledge that it's not a security vulnerability, in that only open-source code is stored in the bucket, but it doesn't seem to follow the principle of least privilege.

Thanks in advance!

How to check destination settings?

Hello,

I have set up one stack in the us-west-1 region for source snapshots, and one stack in the us-west-2 region for destination snapshots. I found two snapshots were created correctly in us-west-1, but I did not find any snapshots copied to us-west-2.

Here are destination settings:
Key | Value | Resolved Value
CodeBucket | DEFAULT_BUCKET |  
CrossAccountCopy | FALSE |  
DeleteOldSnapshots | TRUE |  
DestinationRegion | us-west-2 |  
KmsKeyDestination | None |  
KmsKeySource | None |  
LogLevel | ERROR |  
RetentionDays | 7 |  
SnapshotPattern | snapshot-db-dr |  
SNSTopic |   |  
SourceRegionOverride | NO

Here are source settings:
Key | Value | Resolved Value
BackupInterval | 2 |  
BackupSchedule | 0 19,21,23 * * ? * |  
ClusterNamePattern | snapshot-db-dr |  
CodeBucket | DEFAULT_BUCKET |  
DeleteOldSnapshots | TRUE |  
DestinationAccount | 000000000000 |  
LogLevel | ERROR |  
RetentionDays | 7 |  
ShareSnapshots | FALSE |  
SNSTopic |   |  
SourceRegionOverride | NO

Thanks,
Ben

Unable to copy AWS managed KMS encrypted snapshots between accounts

I have an Aurora cluster, encrypted with the AWS managed KMS key. The snapshot tool successfully creates a snapshot in the source account, and shares it with the backup account, however the copy into the backup account fails with:

The source snapshot KMS key [arn:aws:kms:(source account arn)] does not exist, is not enabled or you do not have permissions to access it.

Being an AWS managed key, I'm (as far as I can tell) unable to grant the backup account permissions to use it.

I'd much rather not recreate the cluster with a CMK. Is there a way of working around this?

Patterns match against snapshot instead of cluster

Pattern matching in the share function matches against the DBClusterSnapshotIdentifier instead of the DBClusterIdentifier. This results in orphaned snapshots which are not shared, and therefore not copied to the destination account.

An example:

The pattern .+(-production)$ matches all clusters that end in -production. Given an Aurora cluster django-production and an Aurora cluster django-production-reporting, we want to copy the first but not the second.

The pattern matches correctly when taking snapshots, but appends YYYY-MM-DD-hh-mm to the DBClusterSnapshotIdentifier. When the share function executes, the DBClusterSnapshotIdentifier does not match the pattern, so the snapshot is not shared.
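
The mismatch can be reproduced with the example above. The helper names below are hypothetical, but the field names are the ones describe_db_cluster_snapshots returns:

```python
import re

PATTERN = r'.+(-production)$'

def shared_today(snapshot):
    # Buggy behaviour: the $-anchored pattern is tested against the
    # snapshot identifier, which ends in a timestamp, so it never matches.
    return re.search(PATTERN, snapshot['DBClusterSnapshotIdentifier']) is not None

def shared_fixed(snapshot):
    # Proposed fix: test the pattern against the originating cluster
    # identifier instead of the snapshot identifier.
    return re.search(PATTERN, snapshot['DBClusterIdentifier']) is not None
```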

Snapshot clusters by tag

Cloudformation does not currently support setting the cluster identifier. This makes snapshotting based on this name less useful. Would it be possible to snapshot based on a tag?
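
This is not an existing feature of the tool, but a tag-based selector could look like the following hypothetical sketch, assuming each cluster description carries a TagList in the shape returned by the RDS API:

```python
def clusters_with_tag(clusters, key, value):
    # Select clusters by tag rather than by identifier pattern, which
    # sidesteps CloudFormation's inability to set the cluster identifier.
    return [c['DBClusterIdentifier'] for c in clusters
            if any(t['Key'] == key and t['Value'] == value
                   for t in c.get('TagList', []))]
```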

Parameters: [CodeBucket] must have values

There is an error when applying the template:

Parameters: [CodeBucket] must have values

Documentation for that parameter says

Name of the bucket that contains the lambda functions to deploy. Leave the default value to download the code from the AWS Managed buckets

Add option to use a different KMS key when sharing

This project gave me a great starting point, so foremost thank you for that!
Just as reported in the sister project (eg: awslabs/rds-snapshot-tool#60), my use case was to share a snapshot with a different account. Since some of the clusters are using the default KMS key this doesn't really work, as the destination account can never access the needed KMS key and therefore can't make a local copy.

To fix this I implemented an extra copy step in the source account (after the take snapshot) to bring all the snapshots to use the same KMS key and then share them. It's potentially less efficient as it generates an extra snapshot copy, but it makes the process generic after that point.

Maybe it's something that can be added to the project, but it would make the generic solution more complex.
