
etcd-aws's Introduction

etcd-aws

This repository contains tools for building a robust etcd cluster in AWS.

It uses CloudFormation to establish a three-node autoscaling group of etcd instances. If a single node fails, the cluster remains available and replacement nodes are integrated into the cluster automatically. Each node in the cluster can be replaced by a new node, one at a time, and the cluster remains available. If all nodes fail simultaneously, the cluster recovers from the backup stored in S3 without intervention.
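The availability claims above follow from etcd's majority quorum. As a quick check, this sketch computes the quorum size (a strict majority, n/2 + 1 with integer division) and the number of failures a cluster of n members can tolerate. It only illustrates the arithmetic and is not code from etcd-aws.

```go
package main

import "fmt"

// quorum returns the number of members that must agree for an etcd
// cluster of size n to make progress: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while the remaining
// members still form a quorum.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("size=%d quorum=%d tolerates=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

For three members the quorum is 2, so one node can be lost or replaced at a time; losing two at once leaves the survivor without a quorum.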

Please see this blog post for more on how this little utility came to be.

Invoking the etcd-aws program will configure and launch etcd based on the current autoscaling group:

etcd-aws

It is also available as a Docker container:

/usr/bin/docker run --name etcd-aws \
  -p 2379:2379 -p 2380:2380 \
  -v /var/lib/etcd2:/var/lib/etcd2 \
  -e ETCD_BACKUP_BUCKET=my-etcd-backups \
  --rm crewjam/etcd-aws

CloudFormation

The program etcd-aws-cfn generates and deploys a CloudFormation template:

go install ./...
etcd-aws-cfn -key-pair my-key

You can also generate the CloudFormation template and deploy it yourself:

etcd-aws-cfn -key-pair my-key -dry-run > etcd.template

The template consists of:

  • A VPC containing three subnets across three availability zones.
  • An autoscaling group of CoreOS instances running etcd with an initial size of 3.
  • An internal load balancer that routes etcd client requests to the autoscaling group.
  • A lifecycle hook that monitors the autoscaling group and sends termination events to an SQS queue.
  • An S3 bucket that stores the backup.
  • CloudWatch alarms that monitor the health of the cluster and check that backups are being made.

Cluster Discovery

The program etcd-aws discovers other cluster members by looking for EC2 instances that are part of the same autoscaling group. It invokes etcd with appropriate configuration settings based on the result of cluster discovery.
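Discovery ultimately has to produce etcd's --initial-cluster value, a comma-separated list of name=peerURL pairs. This is a minimal sketch, assuming a hypothetical Instance type already filled in from the EC2 API and assuming each member is named after its instance ID; port 2380 matches the peer port published in the Docker example above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Instance is a hypothetical, simplified view of an EC2 instance found
// in the autoscaling group (real code would read it from the EC2 API).
type Instance struct {
	ID        string // e.g. "i-0123abcd"
	PrivateIP string // e.g. "10.0.1.10"
}

// initialCluster formats instances as etcd's --initial-cluster value.
// Naming each member after its instance ID is an assumption.
func initialCluster(instances []Instance, peerPort int) string {
	pairs := make([]string, 0, len(instances))
	for _, inst := range instances {
		pairs = append(pairs, fmt.Sprintf("%s=http://%s:%d", inst.ID, inst.PrivateIP, peerPort))
	}
	sort.Strings(pairs) // deterministic order regardless of API ordering
	return strings.Join(pairs, ",")
}

func main() {
	asg := []Instance{
		{ID: "i-b", PrivateIP: "10.0.2.10"},
		{ID: "i-a", PrivateIP: "10.0.1.10"},
	}
	fmt.Println("--initial-cluster=" + initialCluster(asg, 2380))
}
```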

When a node is added to an existing cluster, etcd-aws automatically registers it as a cluster member before launching etcd on it.

The program monitors an AWS AutoScaling Lifecycle Hook to detect when nodes are terminated, and removes them from the cluster. This is important because a terminated node that is still listed as a member continues to count in etcd's quorum calculation; removing it keeps stale members from accumulating and making quorum harder to reach.
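Handling one termination event can be sketched as: decode the lifecycle notification from the SQS message body, ignore anything that is not a termination (such as the hook's initial test notification), and find the etcd member for the terminating instance. That member ID could then be removed with the etcd v2 API call DELETE /v2/members/<id>. Matching members by name assumes each member is named after its instance ID; the member IDs below are hypothetical, and the instance IDs come from the issue log further down.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lifecycleMessage holds the fields of an Auto Scaling lifecycle
// notification used here; other fields are ignored.
type lifecycleMessage struct {
	LifecycleTransition string `json:"LifecycleTransition"`
	EC2InstanceID       string `json:"EC2InstanceId"`
}

// member is a simplified etcd member record (hypothetical type).
type member struct {
	ID   string
	Name string
}

// memberToRemove returns the etcd member ID for a terminating instance,
// or "" if the message is not a termination or no member matches.
func memberToRemove(body []byte, members []member) (string, error) {
	var msg lifecycleMessage
	if err := json.Unmarshal(body, &msg); err != nil {
		return "", err
	}
	if msg.LifecycleTransition != "autoscaling:EC2_INSTANCE_TERMINATING" {
		return "", nil // e.g. the hook's initial test notification
	}
	for _, m := range members {
		if m.Name == msg.EC2InstanceID {
			return m.ID, nil
		}
	}
	return "", nil
}

func main() {
	body := []byte(`{"LifecycleTransition":"autoscaling:EC2_INSTANCE_TERMINATING","EC2InstanceId":"i-b657a42c"}`)
	members := []member{{ID: "8e9e05c52164694d", Name: "i-b657a42c"}, {ID: "91bc3c398fb3c146", Name: "i-d3106b54"}}
	id, err := memberToRemove(body, members)
	fmt.Println(id, err)
}
```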

Backup

Periodically, etcd-aws writes a file to S3 containing every key in the etcd database and its value.
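The backup file format is not described here, so this sketch assumes a simple, hypothetical layout: a JSON array of key/value objects, sorted by key so identical data always produces identical bytes. Uploading the result to the bucket named by ETCD_BACKUP_BUCKET (for example with an S3 PutObject call from the AWS SDK) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// entry is one key/value pair in the backup. This JSON layout is an
// assumption for illustration, not etcd-aws's actual backup format.
type entry struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// encodeBackup serializes all keys into one JSON document, sorted by key.
func encodeBackup(kv map[string]string) ([]byte, error) {
	entries := make([]entry, 0, len(kv))
	for k, v := range kv {
		entries = append(entries, entry{Key: k, Value: v})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].Key < entries[j].Key })
	return json.Marshal(entries)
}

func main() {
	data, err := encodeBackup(map[string]string{"/b": "2", "/a": "1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```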

When creating the first node of a cluster, etcd-aws checks for an existing backup and automatically restores it. In this way, an etcd-aws cluster can recover from failure of all nodes in the cluster.
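The restore decision can be sketched as: restore only when this node is founding a new cluster (no existing members) and a backup was found. The sketch assumes a hypothetical backup format, a JSON array of {key, value} objects, and writes through a small hypothetical putter interface so it runs without etcd; a real implementation would call an etcd client instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type entry struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// putter abstracts "write one key to etcd" (hypothetical interface).
type putter interface {
	Put(key, value string) error
}

// mapPutter is an in-memory putter used for demonstration.
type mapPutter map[string]string

func (m mapPutter) Put(k, v string) error { m[k] = v; return nil }

// maybeRestore restores backup into dst only when there are no existing
// members and a backup exists (backup is nil when none was found).
func maybeRestore(existingMembers int, backup []byte, dst putter) (bool, error) {
	if existingMembers > 0 || backup == nil {
		return false, nil
	}
	var entries []entry
	if err := json.Unmarshal(backup, &entries); err != nil {
		return false, err
	}
	for _, e := range entries {
		if err := dst.Put(e.Key, e.Value); err != nil {
			return false, err
		}
	}
	return true, nil
}

func main() {
	backup := []byte(`[{"key":"/a","value":"1"},{"key":"/b","value":"2"}]`)
	store := mapPutter{}
	restored, err := maybeRestore(0, backup, store)
	fmt.Println(restored, err, store["/b"])
}
```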

Load Balancer

The CloudFormation template creates a load balancer that etcd clients can use to discover cluster members. etcd clients tend to be cluster-aware: they discover the cluster members on initial contact. You can point an etcd client at the load balancer, which provides the initial member list; the client then connects directly to the current nodes in the cluster. This avoids the need for clients to maintain and update a list of etcd nodes.

etcd-aws's People

Contributors

cmattoon, crewjam, pieterlange


etcd-aws's Issues

Migrating to etcd3

Since Kubernetes 1.6.0 uses etcd3 by default, is there any plan to upgrade the embedded etcd to the latest version?

Compilation of etcd-aws-cfn fails

Per the documentation, I ran "go install ./..." in the project directory to install the etcd-aws-cfn tool. I got this error:

go install ./...
# github.com/crewjam/etcd-aws/aws
aws/health.go:24: cannot use cloudformation.String("1") (type *cloudformation.StringExpr) as type *cloudformation.IntegerExpr in field value
aws/health.go:33: cannot use cloudformation.String("60") (type *cloudformation.StringExpr) as type *cloudformation.IntegerExpr in field value
aws/health.go:54: cannot use cloudformation.String("1") (type *cloudformation.StringExpr) as type *cloudformation.IntegerExpr in field value
aws/health.go:63: cannot use cloudformation.String("300") (type *cloudformation.StringExpr) as type *cloudformation.IntegerExpr in field value

I replaced cfn.String with cfn.Integer in health.go, and compilation succeeded. I am going to test to make sure the tool works as expected, then I'll submit a pull request.

AWS CloudFormation Template

Hi guys,

I read the whole blog post and loved the solution, but I'm having trouble getting the AWS CloudFormation template. I'm not a Go developer and don't even know how to start building this, but I really need this solution. Could someone send me the etcd.template?

Best regards.

TKS!

Unable to compile package

Could you please give more details on how to compile this repository with Go? "go install ./..." complains about broken dependencies and doesn't work even if I run "go get %" for all of them...

Flocker

Hi, love your work. I didn't realise it was possible to save/restore the state of etcd and start up without a discovery token.

I have a similar requirement to get an HA CoreOS cluster working in AWS, and I had been apprehensive about the cluster getting damaged during a catastrophic reboot.

Having understood what you are doing, have you considered Flocker?

Since you are running etcd-aws in Docker, Flocker would give you the load/save into an EBS volume. I don't know if it would make much difference to your design, but it removes the need for a named S3 bucket....

ResourceSignal timeout

Hi,
It seems 5 minutes is not enough for:

  "MasterAutoscale": {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "CreationPolicy": {
      "ResourceSignal": {
        "Count": 3,
        "Timeout": "PT5M"
      }
    },

The deployment stops.

When I change it to 10 minutes, it works.
Here's an example:

14:38:32 UTC+0300   CREATE_IN_PROGRESS  AWS::AutoScaling::AutoScalingGroup  MasterAutoscale     Received SUCCESS signal with UniqueId i-df0b7058
Physical ID:CF23-ETCD-PBOGUK-STACK-MasterAutoscale-4N1QTY021CQU
14:32:59 UTC+0300   CREATE_IN_PROGRESS  AWS::AutoScaling::AutoScalingGroup  MasterAutoscale     Received SUCCESS signal with UniqueId i-d3106b54
Physical ID:CF23-ETCD-PBOGUK-STACK-MasterAutoscale-4N1QTY021CQU
14:32:38 UTC+0300   CREATE_IN_PROGRESS  AWS::AutoScaling::AutoScalingGroup  MasterAutoscale     Received SUCCESS signal with UniqueId i-b657a42c

As you can see, the difference between 14:38:32 and 14:32:59 is more than 5 minutes.
I made about 10 attempts and the results were quite similar.

FYI

Only one node starts etcd-aws

Hi,

I'm facing the following situation:
After the cluster finished creating, only one node runs the etcd-aws service.
On the other two I see:
Failed Units: 1
etcd-aws.service

journalctl -xe
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: [etcd.service etcd2.service] are inactive
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: Unlocking old locks failed: [etcd.service etcd2.service] are inactive. Retrying in 5m0s.

The etcd-aws service (and its Docker container) only starts working if I start it by hand (as root, by running systemctl start etcd-aws).

To recap:
Only one node starts etcd-aws after the CloudFormation deployment; on the other two I have to start the etcd-aws service by hand.

Any suggestions?
