
amazon-k8s-node-drainer's Introduction


Note: This repository is archived and the code is no longer maintained. We recommend using the Karpenter tool for the functionality this repository provided.

Amazon EKS Node Drainer [DEPRECATED]

This sample code provides a means to gracefully terminate nodes of an Amazon Elastic Kubernetes Service (Amazon EKS) cluster when they are managed as part of an Amazon EC2 Auto Scaling group.

The code provides an AWS Lambda function that integrates with an Amazon EC2 Auto Scaling lifecycle hook. When invoked, the Lambda function calls the Kubernetes API to cordon the node being terminated and evict all evictable pods from it, then waits until all pods have been evicted before the Auto Scaling group continues terminating the EC2 instance. The function may be killed by its timeout before all evictions complete, in which case the lifecycle hook may re-invoke it to try again. If the lifecycle heartbeat expires, termination of the EC2 instance continues regardless of whether draining succeeded. You may need to increase the function and heartbeat timeouts in template.yaml if your pods have very long termination grace periods.
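If evictions need longer than a single invocation allows, one option (not part of this sample) is to record a lifecycle action heartbeat from the function while pods are still terminating, which resets the heartbeat timeout. A minimal sketch with boto3, assuming the hook name, Auto Scaling group name and instance ID are taken from the lifecycle hook event:

    import boto3

    asg = boto3.client('autoscaling')

    def keep_instance_waiting(hook_name, asg_name, instance_id):
        """Extend the lifecycle hook so the Auto Scaling group does not
        terminate the instance while evictions are still in flight."""
        asg.record_lifecycle_action_heartbeat(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=asg_name,
            InstanceId=instance_id)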

Using this approach can minimise disruption to the services running in your cluster by allowing Kubernetes to reschedule pods before the instance being terminated enters the Terminating state. It works by using Amazon EC2 Auto Scaling lifecycle hooks to trigger an AWS Lambda function that uses the Kubernetes API to cordon the node and evict the pods.

NB: The Lambda function created assumes that the Amazon EKS cluster's Kubernetes API server endpoint has public access enabled. If your endpoint only has private access enabled, you must modify template.yaml to ensure the Lambda function runs in the correct VPC and subnet.

This Lambda can also be used against a non-EKS Kubernetes cluster by reading a kubeconfig file from an S3 bucket specified by the KUBE_CONFIG_BUCKET and KUBE_CONFIG_OBJECT environment variables. If both variables are set, the Drainer function assumes the target is a non-EKS cluster and does not add IAM authenticator signatures to Kubernetes API requests. It is recommended to apply the principle of least privilege to the IAM role that governs access between the Lambda function and the S3 bucket.
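A minimal sketch of how that kubeconfig-from-S3 path could work, assuming the KUBE_CONFIG_BUCKET and KUBE_CONFIG_OBJECT environment variables described above and a scratch file under /tmp (the helper name and file path are illustrative, not the sample's exact code):

    import os

    import boto3
    from kubernetes import client, config

    KUBE_FILEPATH = '/tmp/kubeconfig'  # Lambda's writable scratch space

    def build_k8s_client():
        """Load cluster credentials, preferring a kubeconfig stored in S3."""
        bucket = os.environ.get('KUBE_CONFIG_BUCKET')
        key = os.environ.get('KUBE_CONFIG_OBJECT')
        if bucket and key:
            # Non-EKS mode: use the kubeconfig as-is and skip IAM
            # authenticator token injection.
            boto3.client('s3').download_file(bucket, key, KUBE_FILEPATH)
            config.load_kube_config(KUBE_FILEPATH)
        else:
            # EKS mode: a kubeconfig pointing at the EKS endpoint is generated
            # and a signed bearer token is added to requests (not shown here).
            raise NotImplementedError('EKS token generation not shown here')
        return client.CoreV1Api()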

amazon-k8s-node-drainer's People

Contributors

ctd, dependabot[bot], hieronymuslex, jonathano, matteofigus, mlafeldt, roeera, svozza, thomasdupas, victorgs

amazon-k8s-node-drainer's Issues

Module import error - yaml

I'm trying to install this with a new installation of EKS using eksctl, and have not been able to successfully execute the lambda.

I get this error in the logs:
Unable to import module 'handler': No module named 'yaml'

I tried using Python 3.7 and 3.6 with no success.
Any suggestions?

Completed Job pods cause an error

If the node being drained has any pods that are not ready, such as a pod created by a Job that has completed, then there will be an error.

The completed Job's pod is never removed from the list of evictable pods, so the code loops forever (or until the Lambda times out) waiting for it to be evicted.

The pod_is_evictable method should ignore any pods that are not in a ready state, as well as DaemonSet pods. An alternative would be to ignore a pod whose owner_reference is a Job.
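A hedged sketch of such a check, assuming pod objects from the official Python client (this extends rather than reproduces the repo's pod_is_evictable):

    def pod_is_evictable(pod):
        """Return False for pods a drain should not, or need not, evict."""
        annotations = pod.metadata.annotations or {}
        if annotations.get('kubernetes.io/config.mirror'):
            return False  # static/mirror pods cannot be evicted
        for ref in pod.metadata.owner_references or []:
            if ref.kind == 'DaemonSet':
                return False  # DaemonSet pods are skipped by drains
            if ref.kind == 'Job' and pod.status.phase in ('Succeeded', 'Failed'):
                return False  # completed Job pods will never drain away
        return True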

Is it possible to first wait for pods to gracefully terminate and then terminate the node?

Hello,

We raised a case with the AWS support team; below is the problem statement:

We have an AWS EKS production environment containing ~500 EKS worker nodes (1.15). Most of the nodes are more than 80 days old, and at that age we notice degraded performance during pod deployments. We therefore want to do an instance refresh on the EKS nodes that first cordons each node and waits for its pods to terminate gracefully before the node is terminated.

Based on that, AWS Support referred us to this GitHub repository, amazon-k8s-node-drainer. We are running a POC with it on our DEV EKS environment and observe that it behaves the same way as a plain instance refresh, without following the standard process:

  1. cordon the node
  2. wait for all deployed pods to terminate
  3. destroy the node
  4. according to ASG, new node is added

So we wanted to check whether there is a way to follow this standard process before terminating the EKS nodes.

Thanks
HK

node-drainer isn't aware of LoadBalancer inflight requests causing 502s

As pointed out by the comment below from the kubernetes/autoscaler repo, this and other lifecycle hooks are very naive in assuming that the instance can be terminated after checking the node's non-DaemonSet pods are gone. kube-proxy may still be proxying connections to other nodes.

There should be a Kubernetes event we can trigger or wait for when the cloud controller deregisters the node from all of its ELBs, before completing the lifecycle hook.
The less sophisticated alternative would be to accept a maximum draining-wait value that the operator would have to keep in sync with ELB drain settings beyond the default of 300 seconds.

Most drain techniques won't remove daemonset pods, so kube-proxy should continue to run until the instance itself is gone, which in my testing continues to allow it to side-route traffic to pods. There are myriad node draining rigs out there, and it appears "EKS Node Groups" come stock with yet another one. We can use this one, github.com/aws-samples/amazon-k8s-node-drainer to illustrate what we found to be the problem.

https://github.com/aws-samples/amazon-k8s-node-drainer/blob/master/drainer/handler.py#L149

        cordon_node(v1, node_name)

        remove_all_pods(v1, node_name)

        asg.complete_lifecycle_action(LifecycleHookName=lifecycle_hook_name,
                                      AutoScalingGroupName=auto_scaling_group_name,
                                      LifecycleActionResult='CONTINUE',
                                      InstanceId=instance_id)

That is the basic runbook in the handler for the drain. A procedure like this would fire once CA has decided the node should be removed, so CA has already evicted most of the pods itself. asg.complete_lifecycle_action will actually destroy the instance. This is a very common and sane thing to do, and is similar to what we had in place; but it will still drop connections during scale-down with servicetype loadbalancer. Because, CA evicts most/all pods (but not kube-proxy) and eventually will terminate the node by calling the correct hook which will allow draining rigs to kick into gear. Taking the above code, the draining would look like:

  1. cordon_node is called, which marks the node unschedulable (@Jeffwan 's PR will do this earlier, as part of the CA process, which may almost completely solve this issue if that merges). When the node is marked unschedulable k8s will START to remove it from all ELBs it was added to, honoring any drain settings. Remember this node may be actively holding connections EVEN THOUGH it may have nothing but kube-proxy running, because of how servicetype loadbalancer works. Cordon is non-blocking and will return virtually immediately.
  2. remove_all_pods is called, which should evict all pods that aren't daemonsets. Again, this should leave kube-proxy running and still allow the node to side-route traffic to pods. This will likely run very quickly, or immediately, because CA has likely already evicted the pods before this chain of events starts.
  3. asg.complete_lifecycle_action is called telling AWS it can actually destroy the node itself, which will stop kube-proxy (obviously) breaking any connections still routing through kube-proxy.

The issue is it's probably not safe to actually stop the node just because all pods have been evicted. cordon_node is non-blocking, and only signals that k8s should start the process of removing the nodes from the elbs, but doesn't wait (and shouldn't) until the nodes are actually removed from the ELBs. In our case, we have a 300s elb drain configured, so we should wait at least 300 seconds after cordon_node before terminating the node with asg.complete_lifecycle_action. Our solution was to add logic between remove_all_pods and asg.complete_lifecycle_action. Our logic right now is to make sure we've slept at least as long as our longest ELB drain after calling cordon and before calling asg.complete_lifecycle_action. We plan to add an actual check to make sure k8s has removed the instance from all ELBs on its own before subsequently calling the lifecycle hook, rather than relying on an arbitrary sleep. A nearly arbitrary sleep is, however, the kubernetes way. Most of these drain procedures aren't dealing with the fact that the node is possibly still handling production traffic when all pods, save for kube-proxy and daemonsets, are gone.

Originally posted by @bshelton229 in kubernetes/autoscaler#1907 (comment)
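As a sketch of the "actual check" described in the quoted comment, one could poll each target group until the instance is no longer registered (or is in the unused state) before completing the lifecycle action. The helper below is hypothetical and assumes ALB/NLB target groups; classic ELBs would need a similar check via describe_instance_health:

    import boto3

    def instance_in_any_target_group(instance_id, target_group_arns):
        """True if the instance is still registered (e.g. healthy or
        draining) with any of the given target groups."""
        elbv2 = boto3.client('elbv2')
        for tg_arn in target_group_arns:
            health = elbv2.describe_target_health(TargetGroupArn=tg_arn)
            for desc in health['TargetHealthDescriptions']:
                if (desc['Target']['Id'] == instance_id
                        and desc['TargetHealth']['State'] != 'unused'):
                    return True
        return False

Polling this (with a cap such as the longest configured deregistration delay) between remove_all_pods and asg.complete_lifecycle_action would avoid relying on an arbitrary sleep.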

'str' object has no attribute 'path'

Hi,

Scenario 1:
When I follow the provided instructions as-is, I get the following error:

[ERROR] AttributeError: 'str' object has no attribute 'path'
Traceback (most recent call last):
File "/var/task/handler.py", line 129, in lambda_handler
return _lambda_handler(k8s.config, k8s.client, event)
File "/var/task/handler.py", line 114, in _lambda_handler
k8s_config.load_kube_config(KUBE_FILEPATH)
File "/var/task/kubernetes/config/kube_config.py", line 649, in load_kube_config
loader.load_and_set(config)
File "/var/task/kubernetes/config/kube_config.py", line 461, in load_and_set
self._load_authentication()
File "/var/task/kubernetes/config/kube_config.py", line 205, in _load_authentication
if self._load_user_token():
File "/var/task/kubernetes/config/kube_config.py", line 395, in _load_user_token
base_path = self._get_base_path(self._user.path)

Scenario 2:
When I update create_kube_config

from:

    'user': 'lambda'

to:

    'user': [
        'lambda'
    ]

I get the following error:

[ERROR] ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '71f53e5f-1fe3-420a-9db9-5d59bbca405d', 'Content-Type': 'application/json', 'Date': 'Tue, 16 Jul 2019 06:08:44 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

My configmap aws-auth is as follows:

    apiVersion: v1
    data:
      mapRoles: |
        - role: arn:aws:iam::454587422204:role/k8s-lambda-DrainerRole-1AX8LKESGUQO5
          username: lambda
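For reference, a minimal kubeconfig structure that the Python client's load_kube_config() accepts looks roughly like the dict below; the endpoint, CA data and token are placeholders, not values from this repo. The 'str' object has no attribute 'path' error in Scenario 1 usually points at a malformed users entry rather than at the context's 'user': 'lambda' reference, which is supposed to be a plain string:

    kubeconfig = {
        'apiVersion': 'v1',
        'kind': 'Config',
        'clusters': [{
            'name': 'k8s',
            'cluster': {
                'server': 'https://CLUSTER_ENDPOINT',               # placeholder
                'certificate-authority-data': 'BASE64_ENCODED_CA',  # placeholder
            },
        }],
        'users': [{
            'name': 'lambda',
            # placeholder token; some setups inject it per request instead
            'user': {'token': 'BEARER_TOKEN'},
        }],
        'contexts': [{
            'name': 'lambda-context',
            'context': {'cluster': 'k8s', 'user': 'lambda'},
        }],
        'current-context': 'lambda-context',
    }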

PrivateDNSName not returned from ec2.describe_instances

Most of the time when this runs it can't find the Kubernetes node, because the node name it looks up is "".

I added some logging: neither PrivateDnsName nor any private IP information is returned by ec2 describe_instances.

Is it possible these detach before the lifecycle hook completes?
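For debugging, a minimal boto3 lookup of the private DNS name (which the AWS cloud provider uses as the Kubernetes node name) might look like the hypothetical helper below; an empty result would suggest the instance details are already gone by the time the hook fires:

    import boto3

    def node_name_from_instance(instance_id, region):
        """Return the instance's PrivateDnsName, or None if it is missing."""
        ec2 = boto3.client('ec2', region_name=region)
        reservations = ec2.describe_instances(
            InstanceIds=[instance_id])['Reservations']
        instances = [i for r in reservations for i in r['Instances']]
        if not instances:
            return None
        return instances[0].get('PrivateDnsName') or None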

Lambda fails due to 401 Unauthorized Response

Context:

  • Region: us-east-1
  • k8s version: 1.10

Expected behavior:
Node gets autoscaled down, lifecycle hook triggers Lambda function, pods drain from node

Actual behavior:
Node gets autoscaled down, lifecycle hook triggers Lambda function, function fails with the following trace:

[INFO]	2019-04-12T02:43:48.640Z	cb9a6c5c-b738-41c8-99d7-5e094ed4bf15	No kubeconfig file found. Generating...

[INFO]	2019-04-12T02:43:49.252Z	cb9a6c5c-b738-41c8-99d7-5e094ed4bf15	Node name: ip-10-4-5-65.ec2.internal
[ERROR]	2019-04-12T02:43:49.548Z	cb9a6c5c-b738-41c8-99d7-5e094ed4bf15	There was an error removing the pods from the node ip-10-4-5-65.ec2.internal
Traceback (most recent call last):
File "/var/task/handler.py", line 137, in _lambda_handler
if not node_exists(v1, node_name):
File "/var/task/k8s_utils.py", line 47, in node_exists
nodes = api.list_node(include_uninitialized=True, pretty=True).items
File "/var/task/kubernetes/client/apis/core_v1_api.py", line 13392, in list_node
(data) = self.list_node_with_http_info(**kwargs)
File "/var/task/kubernetes/client/apis/core_v1_api.py", line 13489, in list_node_with_http_info
collection_formats=collection_formats)
File "/var/task/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/var/task/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/var/task/kubernetes/client/api_client.py", line 355, in request
headers=headers)
File "/var/task/kubernetes/client/rest.py", line 231, in GET
query_params=query_params)
File "/var/task/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '674797e4-c6a2-4757-b2b5-f613ee0b9768', 'Content-Type': 'application/json', 'Date': 'Fri, 12 Apr 2019 02:43:49 GMT', 'Content-Length': '165'})
HTTP response body: {
"kind": "Status",
"apiVersion": "v1",
"metadata": {

},
"status": "Failure",
"message": "Unauthorized",
"reason": "Unauthorized",
"code": 401
}
END RequestId: cb9a6c5c-b738-41c8-99d7-5e094ed4bf15
REPORT RequestId: cb9a6c5c-b738-41c8-99d7-5e094ed4bf15	Duration: 1306.67 ms	Billed Duration: 1400 ms Memory Size: 256 MB	Max Memory Used: 122 MB

What I've done:

  • Confirmed RBAC setup correctly per README
  • Confirmed cluster name and region
  • Confirmed aws-auth configmap included the correct lambda arn and username
  • Confirmed IAM permissions and that STS is being called for the bearer token generation
  • Confirmed that k8s api is not vpc-only (I would assume no response otherwise)
  • Done my best to validate that the returned bearer token is correctly formatted

As best I can guess this is a bearer token issue, but I can't figure out what might be wrong in that regard.
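For comparison, the bearer token for an EKS cluster is commonly generated by presigning an STS GetCallerIdentity call with the cluster name in an x-k8s-aws-id header and base64-encoding the URL with a k8s-aws-v1. prefix. A sketch of that recipe (an assumption about the expected token format, not necessarily this sample's exact code):

    import base64

    import boto3
    from botocore.signers import RequestSigner

    def get_bearer_token(cluster_name, region):
        """Build an EKS auth token the way aws-iam-authenticator does."""
        session = boto3.session.Session()
        sts = session.client('sts', region_name=region)
        signer = RequestSigner(
            sts.meta.service_model.service_id, region, 'sts', 'v4',
            session.get_credentials(), session.events)
        params = {
            'method': 'GET',
            'url': f'https://sts.{region}.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15',
            'body': {},
            'headers': {'x-k8s-aws-id': cluster_name},
            'context': {},
        }
        signed_url = signer.generate_presigned_url(
            params, region_name=region, expires_in=60, operation_name='')
        return 'k8s-aws-v1.' + base64.urlsafe_b64encode(
            signed_url.encode('utf-8')).decode('utf-8').rstrip('=')

If a token produced this way works with kubectl --token against the cluster, the remaining suspects are the aws-auth mapping and the IAM role the Lambda actually assumes at runtime.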

How does this draining flow differ from EKS built-in mechanisms?

Hello, I was recently facing nodegroup update rollout issues. While looking for more control, or at least a more verbose view of the instance replacement flow, I stumbled upon this repo. However, the README.md does not specify in what way the Lambda lifecycle hook differs from the EKS API's built-in nodegroup management and draining machinery, which is documented here and here,

where nodes are naturally terminated at the end of an update such as an AMI change.

From what I've seen in a live cluster, the 'default' flow also cordons the node and waits for pods to drain.

How to connect to the EKS endpoint from a Lambda in a VPC

Hi there,

Why do I see the warning messages below when I configure the "VpcConfig" property of "AWS::Serverless::Function" in template.yaml?

    ---
    2019-09-25 08:21:10,539 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f92e39c43d0>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/nodes?includeUninitialized=True&pretty=True
    
    08:21:10
    [WARNING]   2019-09-25T08:21:10.539Z    e0b2d964-f400-47cd-b29f-253e7fe463e5    Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f92e39c43d0>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/nodes?includeUninitialized=True&pretty=True
    [WARNING]   2019-09-25T08:21:10.539Z    e0b2d964-f400-47cd-b29f-253e7fe463e5    Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f92e39c43d0>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/nodes?includeUninitialized=True&pretty=True
    ---
 
However, I am sure the private subnet and security group allow traffic to the EKS API server endpoint: I launched an EC2 instance in the same subnet with the same security group, and I can dig/curl the EKS API server endpoint.

I found this description in the README.md on GitHub:

    NB: The Lambda function created assumes that the Amazon EKS cluster's Kubernetes API server endpoint has public access enabled. If your endpoint only has private access enabled, you must modify template.yaml to ensure the Lambda function runs in the correct VPC and subnet.
   
Therefore, I assumed that running the function within the VPC's private subnet should work. Could you please point out how to correctly configure the VPC for the function? Or should I do anything else besides configuring the "VpcConfig" property to make it work?
 
Thanks for your help in advance.

Signal Returned to ASG before pods terminate gracefully.

Using k8s 1.11, the evict API endpoint appears to ignore pods that handle SIGTERM gracefully with a generous termination grace period set. Haven't tested other k8s versions yet.

We've had better luck with the delete pod API endpoint and polling pods until they are all terminated, although this does not respect any pod disruption budgets.
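A rough sketch of the delete-and-poll approach, using the official Python client; delete_and_wait is a hypothetical helper and, as noted above, it bypasses PodDisruptionBudgets:

    import time

    from kubernetes import client

    def delete_and_wait(v1, pods, poll_interval=10, timeout=300):
        """Delete each pod and poll until they disappear from the API server.
        Unlike the eviction API, this ignores PodDisruptionBudgets."""
        for pod in pods:
            v1.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace,
                                     body=client.V1DeleteOptions())
        remaining = {(p.metadata.namespace, p.metadata.name) for p in pods}
        deadline = time.time() + timeout
        while remaining and time.time() < deadline:
            time.sleep(poll_interval)
            still_here = set()
            for ns, name in remaining:
                try:
                    v1.read_namespaced_pod(name, ns)
                    still_here.add((ns, name))
                except client.rest.ApiException as exc:
                    if exc.status != 404:
                        raise
            remaining = still_here
        return remaining  # pods still present when the timeout expired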

Failure in step "sam build --use-container --skip-pull-image"

When following this guide for the sam build step, the issue below occurred:

demo-user:~/environment/amazon-k8s-node-drainer (master) $ sam build --use-container --skip-pull-image
Starting Build inside a container
Building resource 'DrainerFunction'

Fetching lambci/lambda:build-python3.7 Docker container image.................................................................................................................................................
Mounting /home/ec2-user/environment/amazon-k8s-node-drainer/drainer as /tmp/samcli/source:ro,delegated inside runtime container
Traceback (most recent call last):
File "/usr/local/bin/sam", line 11, in
sys.exit(cli())
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/click/decorators.py", line 64, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/samcli/lib/telemetry/metrics.py", line 93, in wrapped
raise exception # pylint: disable=raising-bad-type
File "/usr/local/lib/python3.6/site-packages/samcli/lib/telemetry/metrics.py", line 62, in wrapped
return_value = func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/samcli/commands/build/command.py", line 127, in cli
mode,
File "/usr/local/lib/python3.6/site-packages/samcli/commands/build/command.py", line 192, in do_cli
artifacts = builder.build()
File "/usr/local/lib/python3.6/site-packages/samcli/lib/build/app_builder.py", line 104, in build
lambda_function.runtime)
File "/usr/local/lib/python3.6/site-packages/samcli/lib/build/app_builder.py", line 195, in _build_function
runtime)
File "/usr/local/lib/python3.6/site-packages/samcli/lib/build/app_builder.py", line 267, in _build_function_on_container
container.wait_for_logs(stdout=stdout_stream, stderr=stderr_stream)
File "/usr/local/lib/python3.6/site-packages/samcli/local/docker/container.py", line 197, in wait_for_logs
raise RuntimeError("Container does not exist. Cannot get logs for this container")
RuntimeError: Container does not exist. Cannot get logs for this container
demo-user:~/environment/amazon-k8s-node-drainer (master) $

Issue with Lambda: errors while evicting

I am getting the following error. Any ideas how to fix it? Currently, the Lambda can't evict any pods.

[ERROR] 2020-05-07T11:29:40.860Z 3caa9c4f-db5d-4e2c-95eb-e15d59d51795 Unexpected error adding eviction for pod flux/memcached-65cbddbbdb-b8bxg

Traceback (most recent call last):
File "/var/task/k8s_utils.py", line 87, in evict_pods
pod.metadata.name + '-eviction', pod.metadata.namespace, body)
File "/var/task/kubernetes/client/apis/core_v1_api.py", line 6353, in create_namespaced_pod_eviction
(data) = self.create_namespaced_pod_eviction_with_http_info(name, namespace, body, **kwargs)
File "/var/task/kubernetes/client/apis/core_v1_api.py", line 6450, in create_namespaced_pod_eviction_with_http_info
collection_formats=collection_formats)
File "/var/task/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/var/task/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/var/task/kubernetes/client/api_client.py", line 377, in request
body=body)
File "/var/task/kubernetes/client/rest.py", line 266, in POST
body=body)
File "/var/task/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (400)
Reason: Bad Request
