palantir / bouncer
An application to cycle (bounce) all nodes in a coordinated fashion in an AWS ASG or set of related ASGs
License: Apache License 2.0
Currently the bouncer AWS client hardcodes os.Getenv for AWS_DEFAULT_REGION and overwrites the region in the AWS config object. However, the AWS SDK itself supports both AWS_REGION and AWS_DEFAULT_REGION, which makes bouncer slightly less intuitive to use.
The fix for me was to use this null_resource:
resource "null_resource" "concourse_worker_bouncer" {
  # A map of arbitrary strings that, when changed, will force the null resource to be replaced, re-running any associated provisioners.
  # Used to trigger based on the evaluation of conditionally gated mutually exclusive launchtemplates
  triggers {
    lc_change = "${element(concat(aws_launch_template.concourse_worker_launchtemplate.*.id, aws_launch_template.concourse_worker_launchtemplate_ephemeral.*.id), 0)}"
  }

  provisioner "local-exec" {
    # Bounce all nodes in this ASG using the canary method
    command = "AWS_DEFAULT_REGION=${var.region} ./bouncerw canary -a '${aws_autoscaling_group.concourse_worker_asg.name}:${var.concourse_worker_instance_count}'"
  }
}
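The region lookup asked for in this issue could be sketched in Go roughly as follows. This is only an illustration, not bouncer's code: the function name resolveRegion and the precedence (AWS_REGION first, then AWS_DEFAULT_REGION) are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// resolveRegion returns the first non-empty value among AWS_REGION and
// AWS_DEFAULT_REGION. The order is an assumption made for this sketch.
// The lookup function is injected so the logic can be exercised without
// touching the real environment.
func resolveRegion(getenv func(string) string) (string, error) {
	for _, key := range []string{"AWS_REGION", "AWS_DEFAULT_REGION"} {
		if v := getenv(key); v != "" {
			return v, nil
		}
	}
	return "", fmt.Errorf("no region set: export AWS_REGION or AWS_DEFAULT_REGION")
}

func main() {
	region, err := resolveRegion(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using region", region)
}
```

With a lookup like this, the AWS_DEFAULT_REGION prefix in the local-exec command above would only be needed when neither variable is already exported.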
Hi,
I was just doing some testing with this tool with an ALB and ASG and a Canary deployment. One thing I noticed was bouncer would assert instance health outside of the ASG.
Bouncer seemed to assert only that the instance had booted, not that it was fully operational within the ASG. At that point it started to drain connections and deregister the existing instance, which caused a blip of downtime before the ASG had asserted that the canary instance was healthy.
Is it expected that the canary's health is not asserted this way?
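The check being asked about could look something like the Go sketch below: only start draining old instances once the ASG itself reports every canary as in service and healthy. The asgInstance type and both function names are hypothetical stand-ins, not bouncer's types. The string values "InService" and "Healthy" are the lifecycle state and health status the ASG API reports.

```go
package main

import "fmt"

// asgInstance is a hypothetical, trimmed-down stand-in for the per-instance
// data an ASG reports.
type asgInstance struct {
	ID             string
	LifecycleState string // e.g. "Pending", "InService", "Terminating"
	HealthStatus   string // e.g. "Healthy", "Unhealthy"
}

// readyInASG reports whether the ASG considers the instance in service
// and healthy, rather than merely booted.
func readyInASG(inst asgInstance) bool {
	return inst.LifecycleState == "InService" && inst.HealthStatus == "Healthy"
}

// safeToDrain returns true only when there is at least one canary and
// every canary is ready from the ASG's point of view.
func safeToDrain(canaries []asgInstance) bool {
	if len(canaries) == 0 {
		return false
	}
	for _, c := range canaries {
		if !readyInASG(c) {
			return false
		}
	}
	return true
}

func main() {
	canaries := []asgInstance{
		{ID: "i-new", LifecycleState: "Pending", HealthStatus: "Healthy"},
	}
	fmt.Println("safe to drain old instances:", safeToDrain(canaries))
}
```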
It appears something has changed with bintray. This has been working in my environment for an extended period of time
I am not able to get any response from
BOUNCER_VERSION=$(curl -s "https://api.bintray.com/packages/palantir/releases/bouncer" | grep -Eoh '"latest_version":"\S*?"' | cut -d ':' -f 2 | cut -d ',' -f 1 | sed 's/"//g')
As such, the download command gives me an empty file:
wget -q -O bouncer.tgz "https://palantir.bintray.com/releases/com/palantir/bouncer/bouncer/${BOUNCER_VERSION}/bouncer-${BOUNCER_VERSION}-linux-amd64.tgz"
I have tried switching things around to get the package directly from github like:
wget -q -O bouncer.tgz "https://github.com/palantir/bouncer/releases/download/${BOUNCER_VERSION}bouncer-${BOUNCER_VERSION}-linux-amd64.tgz"
I am able to download from github but then get a different error:
null_resource.server_bouncer_consul_cluster: Provisioning with 'local-exec'...
null_resource.server_bouncer_consul_cluster (local-exec): Executing: ["/bin/sh" "-c" "../../../bouncerw rolling -a 'XXXXXXXXXXXXXXX:5' 'stage'"]
null_resource.server_bouncer_consul_cluster (local-exec): BOUNCER_VERSION is not set. Looking for the latest bouncer release...
null_resource.server_bouncer_consul_cluster (local-exec): Installing bouncer version 0.10.0
module.consul_cluster.aws_launch_configuration.launch_configuration.deposed: Destruction complete after 0s
null_resource.server_bouncer_consul_cluster (local-exec): ../../../bouncerw: line 29: ./bouncer: cannot execute binary file: Exec format error
null_resource.server_bouncer_consul_cluster: Creation complete after 2s (ID: 578946282353217746)
Hey,
When using your tool in canary mode with new instances that fail the lifecycle hook and never go into service in the ASG, bouncer just continues until it hits the default timeout (20 mins). Is there a way to make bouncer put the desired capacity back to its original value when it hits the timeout?
If this isn't possible, could you treat this as a feature request?
Cheers
Richard
Currently the AWS client only accepts configuration from the environment: https://github.com/palantir/bouncer/blob/master/aws/aws.go#L39
This prevents us from using other config mechanisms such as profiles.
Hi, I was wondering if there is any plan to add support for spot fleets (aws_spot_fleet_request)?
We use a mix of auto scaling groups and spot fleets and it would be great if you could cycle instances in a spot fleet in the same way you can currently with auto scaling groups using bouncer.
Thanks
Not all of our ASGs are behind ALBs, and the EC2 health check does not really tell us much about the health of the application.
It would be neat to add an option to specify the Consul service of your application running on that ASG and use that to determine if Bouncer should kill the old instances or not.
Details mentioned here: https://stackoverflow.com/questions/64820508/terraform-issue-in-gitlab
null_resource.server_canary_bouncer (local-exec): Executing: ["/bin/sh" "-c" "./bouncer canary -a 'my-asg':$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name 'my-asg' --query 'AutoScalingGroups[0].DesiredCapacity')"]
null_resource.server_canary_bouncer (local-exec): /bin/sh: ./bouncer: No such file or directory
Error: Error running command './bouncer canary -a 'my-asg':$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name 'my-asg' --query 'AutoScalingGroups[0].DesiredCapacity')': exit status 127. Output: /bin/sh: ./bouncer: No such file or directory
[terragrunt] 2020/11/12 12:16:31 Hit multiple errors:
exit status 1
Cleaning up file based variables
00:01
ERROR: Job failed: exit code 1
It is not clearly documented how to install this tool.
More of a question, but wondering if you've used bouncer with ECS clusters. If so, what's the recommended approach? If not, any interest making bouncer ECS "aware"?
I'm new to bouncer. I'm following the README to bounce my ASGs but am getting errors. I'm trying to bounce in serial mode:
./bouncer serial -a example-asg:2
INFO[0000] Beginning bouncer serial run
ERRO[0005] ASG desired capacity doesn't match expected starting value ASG=example-asg desired_capacity actual=1 desired_capacity given=2
FATA[0005] error validating initial ASG state
Can someone help me with the usage, and with scaling my ASG up and down?
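The error text above suggests that the number after the colon in the -a argument is compared with the ASG's current desired capacity before anything is bounced. A minimal Go sketch of such a check, under that assumption, follows. parseASGArg and checkStartingCapacity are hypothetical names, not bouncer's functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseASGArg splits an argument of the form "name:capacity".
func parseASGArg(arg string) (string, int, error) {
	i := strings.LastIndex(arg, ":")
	if i <= 0 || i == len(arg)-1 {
		return "", 0, fmt.Errorf("expected name:capacity, got %q", arg)
	}
	n, err := strconv.Atoi(arg[i+1:])
	if err != nil || n < 0 {
		return "", 0, fmt.Errorf("invalid capacity in %q", arg)
	}
	return arg[:i], n, nil
}

// checkStartingCapacity fails when the capacity given on the command line
// differs from the ASG's actual desired capacity.
func checkStartingCapacity(arg string, actual int) error {
	name, given, err := parseASGArg(arg)
	if err != nil {
		return err
	}
	if given != actual {
		return fmt.Errorf("ASG %s: desired capacity is %d but %d was given", name, actual, given)
	}
	return nil
}

func main() {
	fmt.Println(checkStartingCapacity("example-asg:2", 1))
	fmt.Println(checkStartingCapacity("example-asg:1", 1))
}
```

If that reading is right, the run above would get past this validation with example-asg:1, since the log reports an actual desired capacity of 1.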
Currently bouncer only uses the LaunchConfigurationName to determine if an instance needs to be replaced.
This does not work for ASGs using LaunchTemplates.
aws-sdk ref
func isInstanceOld(asgInst *autoscaling.Instance, ec2Inst *ec2.Instance, launchConfigName *string, force bool, startTime time.Time) bool {
	if asgInst.LaunchConfigurationName == nil {
		log.WithFields(log.Fields{
			"InstanceID": *asgInst.InstanceId,
		}).Debug("Instance marked as old because launch config is nil")
		return true
	}

	if *asgInst.LaunchConfigurationName != *launchConfigName {
		log.WithFields(log.Fields{
			"InstanceID":   *asgInst.InstanceId,
			"LaunchConfig": *asgInst.LaunchConfigurationName,
		}).Debug("Instance marked as old because of launch config")
		return true
	}

	// In force mode, mark any node that was launched before this runner was started as old
	if force {
		if startTime.After(*ec2Inst.LaunchTime) {
			log.WithFields(log.Fields{
				"InstanceID": *asgInst.InstanceId,
				"LaunchTime": *ec2Inst.LaunchTime,
			}).Debug("Instance marked as old because of launch time (force mode)")
			return true
		}
	}

	return false
}
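A launch-template-aware variant of this comparison might look like the sketch below. It is not the project's implementation: the types are hypothetical, simplified mirrors of the launch configuration and launch template fields, and isOutdated is an illustrative name. The sketch also assumes the ASG's template version is already a concrete number. Resolving $Latest or $Default to a number is left out.

```go
package main

import "fmt"

// launchTemplateSpec is a hypothetical, simplified mirror of a launch
// template reference (ID plus concrete version).
type launchTemplateSpec struct {
	ID      string
	Version string
}

// launchSource holds whichever launch mechanism is in use. The same shape
// is used for the ASG and for each of its instances in this sketch.
type launchSource struct {
	LaunchConfigurationName *string
	LaunchTemplate          *launchTemplateSpec
}

// isOutdated reports whether an instance was launched from something other
// than what the ASG currently specifies.
func isOutdated(inst, asg launchSource) bool {
	switch {
	case asg.LaunchTemplate != nil:
		lt := inst.LaunchTemplate
		return lt == nil || lt.ID != asg.LaunchTemplate.ID || lt.Version != asg.LaunchTemplate.Version
	case asg.LaunchConfigurationName != nil:
		lc := inst.LaunchConfigurationName
		return lc == nil || *lc != *asg.LaunchConfigurationName
	default:
		// The ASG names neither mechanism: treat the instance as old.
		return true
	}
}

func main() {
	asg := launchSource{LaunchTemplate: &launchTemplateSpec{ID: "lt-123", Version: "4"}}
	oldInst := launchSource{LaunchTemplate: &launchTemplateSpec{ID: "lt-123", Version: "3"}}
	newInst := launchSource{LaunchTemplate: &launchTemplateSpec{ID: "lt-123", Version: "4"}}
	fmt.Println(isOutdated(oldInst, asg), isOutdated(newInst, asg))
}
```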
We should have a mechanism to tag ASGs as not bouncer-compatible so we can't accidentally automatically ruin the world (e.g. in our environment oops i murdered GHE everyone wants to kill me.)
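A guard of this kind could be as small as the Go sketch below, which refuses to proceed if any target ASG carries an opt-out tag. The tag key bouncer-opt-out and the function name are hypothetical and chosen purely for illustration. Nothing here reflects an existing bouncer feature.

```go
package main

import (
	"fmt"
	"sort"
)

// optOutTagKey is a hypothetical tag key used only for this sketch.
const optOutTagKey = "bouncer-opt-out"

// blockedASGs returns, in sorted order, the names of ASGs whose tags mark
// them as not safe to bounce.
func blockedASGs(tagsByASG map[string]map[string]string) []string {
	var blocked []string
	for name, tags := range tagsByASG {
		if tags[optOutTagKey] == "true" {
			blocked = append(blocked, name)
		}
	}
	sort.Strings(blocked)
	return blocked
}

func main() {
	asgs := map[string]map[string]string{
		"web-asg": {"team": "web"},
		"ghe-asg": {optOutTagKey: "true"},
	}
	if blocked := blockedASGs(asgs); len(blocked) > 0 {
		fmt.Println("refusing to bounce:", blocked)
		return
	}
	fmt.Println("ok to bounce")
}
```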
I've recently been seeing bouncer fail in our TF runs due to AWS rate limiting:
time="2019-07-16T17:07:20Z" level=fatal msg="error in run: error building ASGSet: Error getting information for ASG myservice: error getting AWS ASG object: Error describing ASGs: Throttling: Rate exceeded
status code: 400, request id: 2c3a4f65-a7ec-11e9-a003-3ff6a1d0860b"
While the AWS SDK includes some default retry logic, it seems that the number of retries or the backoff rate is not high enough. In addition to increasing that, it would be nice to make bouncer warn about but otherwise ignore throttling errors so that it isn't interrupted in the middle of operation. Since it is already polling based, skipping a single check due to an error shouldn't be a problem.
Alternately, maybe there is a way to reduce the number of API calls made during a run to reduce the chance of hitting rate limits.
Open request for packaging to support the darwin-arm64 architecture for M1 MacBooks.
We should have a policy section covering how to grant permissions required to successfully bounce, and which ones we can scope if possible.
I ran into a situation where my ASG got into this state:
level=info msg="Killing a batch of nodes" Extra nodes=1 Healthy nodes=8 Old nodes=0
It seems that bouncer got stuck in this state and never killed off the extra healthy new node and instead eventually continued to add new nodes to the ASG.
Hi Team,
I am trying to use bouncer, but something is missing here. Please let me know how to fix this.
version: 0.8.7
~/bouncer(master)$ ./bouncerw serial --help
Attempting to lock .bouncer_download_lock
Lock acquired
BOUNCER_VERSION is not set. Looking for the latest bouncer release...
Installing bouncer version 0.8.7
tar: bouncer: Cannot open: File exists
tar: Exiting with failure status due to previous errors
Releasing lock on .bouncer_download_lock
Lock released
./bouncerw: line 58: ./bouncer: Is a directory
Hi there,
I can't download the bouncer via:
https://bintray.com/palantir/releases/bouncer/_latestVersion
Thanks
Hi,
We are using bouncer for canary deployment.
We've updated from version 0.8.0 to 0.12.0 and are using the latest version of the bouncerw file.
There is a bug in line 39 involving jq:
https://github.com/palantir/bouncer/blob/master/bouncerw#L39
Since pull request #116, jq is called without any arguments:
Lock acquired
BOUNCER_VERSION is not set. Looking for the latest bouncer release...
jq - commandline JSON processor [version 1.5-1-a5b5cbe]
Usage: jq [options] <jq filter> [file...]
jq is a tool for processing JSON inputs, applying the given filter to its JSON text inputs and producing the filter's results as JSON on standard output.
The simplest filter is ., which is the identity filter, copying jq's input to its output unmodified (except for formatting).
For more advanced filters see the jq(1) manpage ("man jq") and/or https://stedolan.github.io/jq
Some of the options include:
-c compact instead of pretty-printed output;
-n use null as the single input value;
-e set the exit status code based on the output;
-s read (slurp) all inputs into an array; apply filter to it;
-r output raw strings, not JSON texts;
-R read raw strings, not JSON texts;
-C colorize JSON;
-M monochrome (don't colorize JSON);
-S sort keys of objects on output;
--tab use tabs for indentation;
--arg a v set variable $a to value <v>;
--argjson a v set variable $a to JSON value <v>;
--slurpfile a f set variable $a to an array of JSON texts read from <f>;
See the manpage for more options.
Installing bouncer version
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
chmod: cannot access './bouncer': No such file or directory
As a workaround, we have set BOUNCER_VERSION to a fixed value.
Hi,
I love this util, just a request.
Is it possible to add a delay during canary deployment, please? For example, when we select canary, it launches an additional EC2 replica, deploys the container, performs the health check, and then terminates the older EC2 instance. In this scenario, could we introduce a delay after the health check and before the old instance is terminated?
thanks,
With LaunchTemplate support added, the required permissions in the README need to be revised.
AWS_REGION is not the AWS-standard env var per https://docs.aws.amazon.com/cli/latest/userguide/cli-environment.html; we should switch to AWS_DEFAULT_REGION.