amazon-eks-user-guide's Issues

Why still use "Slave"?

Could use "Worker" instead of the potentially unsettling word "Slave" in this document.

Missing step in updating CoreDNS image version to v1.6.6

The proxy plugin was removed in CoreDNS v1.5.0. Therefore, for an existing EKS cluster to use CoreDNS v1.6.6, the coredns ConfigMap must be modified so that the Corefile uses the forward plugin instead of the proxy plugin.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  labels: coredns
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        kubernetes cluster.local {
          pods insecure
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
    }
EOF
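
The replacement of proxy by forward can be sketched as a small text transformation. This is only an illustration, assuming the Corefile uses the simple `proxy . /etc/resolv.conf` form; proxy options that have no forward equivalent would need manual review. The function name and the sample Corefile are hypothetical.

```python
import re


def migrate_proxy_to_forward(corefile: str) -> str:
    """Rename each `proxy` directive to `forward`, keeping indentation and arguments."""
    # Only match `proxy` when it is the first word on a line.
    return re.sub(r"(?m)^([ \t]*)proxy([ \t]+)", r"\1forward\2", corefile)


# Hypothetical pre-1.5.0 Corefile used only for illustration.
old_corefile = """.:53 {
    kubernetes cluster.local {
      pods insecure
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
}
"""

new_corefile = migrate_proxy_to_forward(old_corefile)
print(new_corefile)
```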

Create a service account for the ALB ingress controller fails

The docs for Step 4 (create a service account for the ALB ingress controller and attach the policy to the service account) fail for clusters that were not created with eksctl.

If you did not create your cluster with eksctl and you run the command in step 4:

eksctl create iamserviceaccount \
--region us-east-1 \
--name alb-ingress-controller \
--namespace kube-system \
--cluster prod \
--attach-policy-arn arn:aws:iam::111122223333:policy/ALBIngressControllerIAMPolicy

It will throw the error:

no eksctl-managed CloudFormation stacks found for "cluster_name".

To reproduce the error:

  1. Create a cluster via the console and attach unmanaged Linux worker nodes
  2. follow the steps in:

Issue 2:

When selecting the link "Edit this page on GitHub" the docs appear to be out of sync.

cluster-autoscaler-autodiscover.yaml is using an old version of cluster-autoscaler

The current documentation uses the following command:

kubectl apply -f

This YAML file uses an old version of cluster-autoscaler (currently the one that supports Kubernetes v1.12), even though newer Docker images have been released.

So I think the documentation needs to be updated to use the cluster-autoscaler release whose version matches the Kubernetes version of the cluster we create.

For example, version 1.14.7 is available now.
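
A minimal sketch of the version check being asked for, assuming that "matching" means the same major and minor numbers (for example 1.14.x with a 1.14 cluster). The helper names are hypothetical and not part of any tool.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as 'v1.14.7' or '1.14'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def autoscaler_matches_cluster(autoscaler_version: str, cluster_version: str) -> bool:
    """True when cluster-autoscaler and the cluster share the same major.minor."""
    return major_minor(autoscaler_version) == major_minor(cluster_version)


print(autoscaler_matches_cluster("1.14.7", "1.14"))   # same minor version
print(autoscaler_matches_cluster("v1.12.3", "1.14"))  # mismatched minor version
```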

pod logging from fluentd to cloudwatch


I have an EKS on Fargate Kubernetes cluster and I am using Fluentd as a sidecar container to send logs to CloudWatch. My Fluentd container can access the shared volume from the application container, but it is unable to send logs to CloudWatch. Is there a Fluentd sidecar implementation that I can use as a ready-made template? Or is there a better alternative for pod logging?


File Does Not Exist

With the latest updates, the instructions to pull the following file fail:

Steps to reproduce:

Run curl -o dns.yaml

Expected results:
Expected a valid file to return

Actual results:
<?xml version="1.0" encoding="UTF-8"?> <Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>cloudformation/2020-04-21/dns.yaml</Key><RequestId>74634F3E16F9FED8</RequestId><HostId>+azDf7F14FlGPI0VCZ8HHuc8kl/XsvEYwqqy+wikEThIK7lbbPlMMRnqDa7dkEGbUJqCijwogM4=</HostId></Error>%

Make IRSA refresh token mechanics explicit

Somewhere in the technical overview on IRSA we should mention that the token is automatically refreshed by the kubelet, referring to the upstream docs:

The kubelet will request and store the token on behalf of the pod, make the token available to the pod at a configurable file path, and refresh the token as it approaches expiration. Kubelet proactively rotates the token if it is older than 80% of its total TTL, or if the token is older than 24 hours.
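
The rotation rule quoted above can be written out as a small predicate. This is a sketch of the stated rule only (older than 80% of the TTL, or older than 24 hours), not the kubelet's actual code; the function and constant names are hypothetical.

```python
from datetime import timedelta

# Thresholds taken from the quoted upstream text.
TTL_FRACTION = 0.8
MAX_TOKEN_AGE = timedelta(hours=24)


def should_rotate(age: timedelta, ttl: timedelta) -> bool:
    """Return True when a token of the given age and total TTL is due for rotation."""
    return age > TTL_FRACTION * ttl or age > MAX_TOKEN_AGE


# A 1-hour token is rotated after 48 minutes.
print(should_rotate(timedelta(minutes=50), timedelta(hours=1)))
# A 48-hour token is rotated once it is older than 24 hours.
print(should_rotate(timedelta(hours=25), timedelta(hours=48)))
```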

How to Upgrade EKS Worker Nodes and EKS Cluster?

I was unable to find any documentation on how to upgrade worker nodes for a new Kubernetes version or because of security issues. How will this work with EKS?

The other information I could not find is how to upgrade the EKS cluster to a new version of Kubernetes. For example, the current version provided by AWS is Kubernetes 1.10, but Kubernetes 1.11 is already available. What will the upgrade strategy for minor and major versions look like? I know you mentioned this in some of the talks, but nothing is documented as of now.

Please mention/point to the aws-eks-ami release in the AMI list

It's extremely difficult right now to figure out what has changed in the AMIs: the AMI list only has region-specific AMI IDs, the commit logs tend to be of the form "update AMI ids", and the Marketplace doesn't have a changelog, although it does point to version numbers.

What I do right now is look at the history of eks-optimized-ami.html, find the ID I used last, and then walk through the history, assuming that each of the changes correlates to an entry in the amazon-eks-ami changelog.

I then take the current value of the AMI ID for my region, and update my internal documentation to point to that AMI ID. Every few weeks I check again with the same process, and update my docs again (and potentially even replace nodes).

Update apiVersion for Kubernetes 1.16

The vpc-resource-controller.yaml linked in this Windows Support doc uses a deployment with:

apiVersion: apps/v1beta1 

This API version is no longer served as of Kubernetes 1.16:

When trying to apply to the cluster:

error: unable to recognize "": no matches for kind "Deployment" in version "apps/v1beta1"

This should be an easy change: just update the apiVersion in the file.
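
As a rough sketch of that change, the snippet below rewrites the apiVersion of a Deployment manifest held as a plain dictionary. It assumes the manifest has already been parsed; the function name and sample manifest are hypothetical. Note that apps/v1 requires an explicit spec.selector, so the sketch reports when one is missing rather than inventing it.

```python
# Deployment API versions that Kubernetes 1.16 no longer serves.
REMOVED_DEPLOYMENT_API_VERSIONS = {"apps/v1beta1", "apps/v1beta2", "extensions/v1beta1"}


def upgrade_deployment(manifest: dict) -> dict:
    """Return a copy of the manifest with a Deployment moved to apps/v1."""
    updated = dict(manifest)
    if (updated.get("kind") == "Deployment"
            and updated.get("apiVersion") in REMOVED_DEPLOYMENT_API_VERSIONS):
        updated["apiVersion"] = "apps/v1"
        # apps/v1 does not default the selector, so it must be present.
        if "selector" not in updated.get("spec", {}):
            raise ValueError("apps/v1 Deployments need an explicit spec.selector")
    return updated


# Hypothetical manifest used only for illustration.
old = {
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "spec": {"selector": {"matchLabels": {"app": "example"}}},
}
print(upgrade_deployment(old)["apiVersion"])
```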

GPU AMI doesn't work with CUDA 10 image

The currently documented method for testing that GPU workers are configured properly points to the latest CUDA image, which is version 10. It seems the current GPU AMI doesn't support this version. When I run the example pod manifest, I get the following error:

container_linux.go:262: starting container process caused "process_linux.go:345: container init 
caused \"process_linux.go:328: running prestart hook 0 caused \\\"error running hook: exit status 1,
 stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --
ldconfig=@/sbin/ldconfig --device=GPU-ffa9ca4e-a464-f033-f4e9-4712c549302b --compute -
utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=7659
bded3be6bd6c/merged]\\\\nnvidia-container-cli: requirement error: invalid expression\\\\n\\\"\""

Changing the image version in the manifest to nvidia/cuda:9.2-devel works as expected. This means that the GPU AMI only supports CUDA image versions <10.0.

NodeSecurityGroupFromControlPlaneIngress in amazon-eks-nodegroup.yaml template is very limiting

In the amazon-eks-nodegroup.yaml template that is linked from this guide, there is this security group ingress rule:

  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

Basically, this setting makes it impossible for Kubernetes services to access pods that have containerPort set to anything below 1025, which is a huge issue since so many of them use port 80 (e.g. nginx).

So, the question is: why is this set to 1025, not 0?
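
To make the effect of the range concrete, here is a small sketch that checks whether a containerPort falls inside the FromPort/ToPort range of this one rule. It only models this single ingress rule and ignores any other rules the template may define; the function name is hypothetical.

```python
def port_allowed(port: int, from_port: int = 1025, to_port: int = 65535) -> bool:
    """True when the port lies inside the rule's inclusive FromPort..ToPort range."""
    return from_port <= port <= to_port


for container_port in (80, 443, 8080):
    print(container_port, port_allowed(container_port))

# With FromPort set to 0, as the question suggests, port 80 would be covered.
print(port_allowed(80, from_port=0))
```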

Failed to regenerate ASG cache: cannot autodiscover ASGs

Hi, I followed this documentation to set up my Kubernetes cluster and install Cluster Autoscaler.
I used this command to create my cluster:

eksctl create cluster \
--name gara-int-cluster \
--version 1.15  \
--region eu-west-3 \
--nodegroup-name standard-workers \
--node-type t2.micro \
--nodes 3 \
--nodes-min 1 \
--nodes-max 4 \
--ssh-access \
--ssh-public-key \

Then I followed the instructions to add cluster autoscaler.
When I check the autoscaler logs at the end, I have this error:

E0322 18:48:28.135501 1 aws_manager.go:259] Failed to regenerate ASG cache: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::849616654863:assumed-role/eksctl-gara-int-cluster-nodegroup-NodeInstanceRole-19QKQO25WZ3M9/i-0ee6bc34b33c65e12 is not authorized to perform: autoscaling:DescribeTags
status code: 403, request id: 7a3526a2-0cc6-4377-88a7-b558fb094e77
F0322 18:48:28.218626 1 aws_cloud_provider.go:351] Failed to create AWS Manager: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::849616654863:assumed-role/eksctl-gara-int-cluster-nodegroup-NodeInstanceRole-19QKQO25WZ3M9/i-0ee6bc34b33c65e12 is not authorized to perform: autoscaling:DescribeTags
status code: 403, request id: 7a3526a2-0cc6-4377-88a7-b558fb094e77
goroutine 44 [running]:, 0x3, 0xc0004c01e0, 0x17c)
/gopath/src/ +0xb1

I'm new to AWS. What did I do wrong? Thanks in advance.

EKS worker nodes ami local certificate expired

When I make a curl request from any of the EKS worker nodes, or when I try to establish a secure connection over HTTPS from any of the pods running on those worker nodes, I receive the following error:

curl: (60) SSL certificate problem: certificate has expired
More details here:

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

The EKS master version: 1.14
AMI ID: amazon-eks-node-1.14-v20200507 | ami-0486134a23d903f10
AWS region: us-west-2 (Oregon)

I have also checked my domain certificates: their state is active, and HTTPS requests are served correctly from outside the EKS nodes.

This is blocking me across all regions, would appreciate if it gets resolved ASAP.

Thanks in advance
-Rohith Vallabhaneni.

Deploy the Amazon EBS CSI Driver command's GitHub link expired.

The GitHub link in the "Deploy the Amazon EBS CSI Driver" command has expired. The repository now uses kustomization.yaml:

kubectl apply -k ""

The document needs to be updated with the new steps

external-dns yaml in CloudFormation is missing lameduck config

The YAML for CoreDNS that is referenced here is missing a lameduck option that was added by the coredns folks back in October (coredns/deployment#206). The missing option can cause DNS failures during coredns updates, as well as during cluster autoscaling (where we see them happen) when a node coredns is running on is affected.

There are no docs to update, but the cloudformation yaml should be updated to include this option to prevent issues like this in the future. The lameduck option effectively acts as a preStop hook for coredns so that it stops accepting requests before it's actually gone.

Explicit list of tag required


While trying to build an EKS cluster without using CloudFormation, I had to do some trial and error to figure out the tags required by a fully working EKS cluster.

Should the doc include a specific page stating all the required tags for people not using CF? Am I missing some btw?

Something like this (I can do a PR if needed):

Tags required by EKS

VPC Tagging Requirement

  •<cluster-name> set to shared.


Subnet Tagging Requirement

  •<cluster-name> set to shared.
  • For private subnets: set to 1.
  • For public subnets: set to 1.


Security Group Tagging Requirement

The security group used by the worker nodes should be tagged with:

  •<cluster-name> set to owned


Autoscaling Group Tagging Requirement

The Autoscaling group used to launch the worker nodes should be tagged with:

  •<cluster-name> set to owned

Also set PropagateAtLaunch = true to copy the tag to EC2 instances that are
launched as part of the Auto Scaling group


metric server installation script fails (curl , jq)

The documentation says:

DOWNLOAD_URL=$(curl -Ls "" | jq -r .tarball_url)
DOWNLOAD_VERSION=$(grep -o '[^/v]*$' <<< $DOWNLOAD_URL)


curl -Ls ""
"message": "Not Found",
"documentation_url": ""

The response does not contain tarball_url, so the script fails.
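
A defensive version of the same two steps can be sketched as follows. It mirrors the `jq -r .tarball_url` lookup and the `grep -o '[^/v]*$'` extraction, but stops with a clear error when the response has no tarball_url. The function name and the sample URL are hypothetical; no request is made.

```python
import json
import re


def extract_version(response_body: str) -> str:
    """Return the version suffix of tarball_url, or raise if the key is absent."""
    payload = json.loads(response_body)
    tarball_url = payload.get("tarball_url")
    if not tarball_url:
        reason = payload.get("message", "unknown error")
        raise ValueError(f"response has no tarball_url: {reason}")
    # Same pattern as the grep call: trailing characters that are not '/' or 'v'.
    return re.search(r"[^/v]*$", tarball_url).group(0)


# Hypothetical successful response body.
ok_body = '{"tarball_url": "https://example.invalid/tarball/v0.3.6"}'
print(extract_version(ok_body))
```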

Which kube-proxy version for 1.13?

What is the currently recommended kube-proxy version for EKS 1.13? Is it 1.13.7, 1.13.8 or 1.13.10?

This commit updated the recommended version to 1.13.8 (165a747) but it was reset to 1.13.7 by (4f19128). Not sure if that was intentional?

Also the current K8s version for EKS 1.13 is 1.13.10 so maybe that should also be used for kube-proxy?

An error occurred (UnsupportedAvailabilityZoneException) when calling the CreateCluster operation: Cannot create cluster 'rotha-dev-devel' because us-east-1c, the targeted availability zone, does not currently have sufficient capacity to support the cluster.

I've been following the AWS EKS Getting Started Guide to the letter, and when trying to run the aws eks create-cluster step I am receiving the following error:

An error occurred (UnsupportedAvailabilityZoneException) when calling the CreateCluster operation: Cannot create cluster 'rotha-dev-devel' because us-east-1c, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1d

How do I specify which availability zone the cluster will be deployed to? If I'm understanding correctly, the chosen AZ is specified when creating the VPC.

The amazon-eks-vpc-sample.yaml file used has the following section in it:

        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region

This is repeated 3 times, one for each Subnet. The only change is the '0' gets changed to '1' and '2'. I'm guessing the change needs to go here, but I have no idea what needs to be put there instead.
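
The selection logic in that template section can be sketched as below: an index is applied to the list of zones returned for the Region. The zone list here is an assumption for illustration (it is not queried from AWS), and the helper name is hypothetical. Under that assumption, index '2' lands on us-east-1c, the zone named in the error, while listing the zones from the error message explicitly avoids it.

```python
from typing import List


def fn_select(index: str, values: List[str]) -> str:
    """Mimic Fn::Select: return the element at the given (string) index."""
    return values[int(index)]


# Assumed result of Fn::GetAZs for the Region, for illustration only.
region_azs = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]

# What the three subnets get with indexes '0', '1' and '2'.
picked = [fn_select(i, region_azs) for i in ("0", "1", "2")]

# Zones the error message says to retry with.
allowed_azs = ["us-east-1a", "us-east-1b", "us-east-1d"]
explicit = [az for az in region_azs if az in allowed_azs][:3]

print(picked)
print(explicit)
```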

Restricting Access to Amazon EC2 Instance Profile Credentials -does not seem to work

After running the script below on the Amazon Linux AMI worker node, as explained in the documentation:
sudo yum install -y iptables-services

sudo yum install -y initscripts

sudo iptables --insert FORWARD 1 --in-interface eni+ --destination --jump DROP

sudo iptables-save | tee /etc/sysconfig/iptables

sudo systemctl enable iptables.service

I am getting errors for the cert-manager-cainjector and cluster-autoscaler pods. Both pods went into CrashLoopBackOff status with the issue below.

kubectl logs aws-cluster-autoscaler-5cd9d77588-tm4nz -n kube-system
Error from server: Get dial tcp connect: no route to host

I even tried to insert "-A FORWARD -d -i eni+ -j DROP" directly into the /etc/sysconfig/iptables file and restarted iptables.service, but still no luck.

Both the cert-manager-cainjector and cluster-autoscaler add-ons are configured with IRSA.
The SSM add-on is configured with the hostNetwork: true option, and the "AmazonSSMManagedInstanceCore" IAM role is added to the instance profile so that the SSM daemonset can work. The IAM role for SSM was added to the instance profile because IRSA does not seem to work with the SSM agent.

Is there anything that can be done to fix this "connect: no route to host" issue?


Is this an out-take from the "BRADY BUNCH"?

Restricting Access to Amazon EC2 Instance Profile Credentials on Windows node

Hi all,

Is there any way (for example, a script) to restrict access to Amazon EC2 instance profile credentials on a Windows node? I have configured the service account and I want my pods to use the created IAM role, but the .NET app tries to get the credentials from EC2 Metadata and reports this error:

[Amazon.Util.EC2InstanceMetadata:0] - Unable to contact EC2 Metadata service to obtain a metadata token. Attempting to access IMDS without a token.

Broken link to Kubernetes Dashboard deployment yaml file

Re: this page

This link now returns a 404: kubectl apply -f

The kubernetes/dashboard repo now recommends using this command:

kubectl apply -f

Proxy Protocol annotation in K8s in AWS fail to enable in AWS ELBv2

I followed this guide to enable Proxy Protocol on AWS ELBv2 with the '*' annotation value, but AWS ELBv2 fails to enable the attribute.

A sample value.yaml is below:

kind: Service
  annotations: '*' "nlb"
  name: lb
    name: lb
  - port: {{ .Values.simpleflask.service.ports.port }}
    targetPort: {{ .Values.simpleflask.service.ports.targetPort }}
    run: simpleflask
  type: LoadBalancer

Suggest to add the notice if using amazon-k8s-node-drainer before upgrading Kubernetes 1.16

There is an issue with amazon-k8s-node-drainer on Kubernetes 1.16: the name of the Pod eviction object must match the Pod name.

For users who have already integrated the solution (for example, using Cluster Autoscaler to scale in), I recommend adding a notice to update or patch amazon-k8s-node-drainer before upgrading the cluster version. This prevents potential issues where the Pod eviction API call is not sent correctly when CA terminates a worker node.


Passing AWS_PROFILE on update-kubeconfig cli

Add the profile details to the generated kubeconfig when update-kubeconfig is called with a profile:

#aws eks update-kubeconfig --name k8s-test --kubeconfig k8s-test.config --profile myprofile

Currently this generates:

      args:
      - token
      - -i
      - k8s-test
      command: aws-iam-authenticator

It would be good to add the profile details to the generated kubeconfig when not using the default profile:

- name: aws
  user:
    exec:
      command: aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "k8s-test"
      env:
        - name: AWS_PROFILE
          value: "myprofile"
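
The requested behavior could look roughly like this, assuming the kubeconfig user entry is available as a parsed dictionary with the usual user/exec layout. This is only a sketch of the idea, not how the AWS CLI implements update-kubeconfig; the function name is hypothetical.

```python
import copy


def with_aws_profile(user_entry: dict, profile: str) -> dict:
    """Return a copy of a kubeconfig user entry with AWS_PROFILE set in exec.env."""
    entry = copy.deepcopy(user_entry)
    exec_cfg = entry["user"]["exec"]
    # Drop any existing AWS_PROFILE so the new value is not duplicated.
    env = [e for e in (exec_cfg.get("env") or []) if e["name"] != "AWS_PROFILE"]
    env.append({"name": "AWS_PROFILE", "value": profile})
    exec_cfg["env"] = env
    return entry


# User entry matching the generated output shown in the issue.
generated = {
    "name": "aws",
    "user": {
        "exec": {
            "command": "aws-iam-authenticator",
            "args": ["token", "-i", "k8s-test"],
        }
    },
}
patched = with_aws_profile(generated, "myprofile")
print(patched["user"]["exec"]["env"])
```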

Cross account roles for IRSA

Add a section to the EKS documentation that explains how to create a cross-account IRSA role, e.g. pod in cluster A in account A assumes a role in account B.

EKS Worker nodes & public Ip address

I got feedback from an AWS partner that the worker nodes created by the CloudFormation template have a public IP address by default.
Shouldn't that be a parameter/setting?

See also

To launch your worker nodes with the AWS Management Console

Cloud Formation Script
"AssociatePublicIpAddress: true"

Nodegroup creation fails on private subnet with VPC endpoints.

Steps to reproduce:

  • Create a private subnet
  • Create a gateway endpoint for Amazon S3
  • Create a private endpoint for com.amazonaws.region.ecr.dkr
  • Try to create a managed nodegroup

It will fail with the message:
"NodeCreationFailure - Instances failed to join the kubernetes cluster"

The command:
$ kubectl describe pod aws-node-xxxxx -n kube-system

Will return:
FailedCreatePodSandBox 2m23s (x2610 over 4d) kubelet, Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "" : Error response from daemon: Get no basic auth credentials

How to fix:

Recommended VPC setup for EKS

In the tutorial we recommend that folks use private subnets for worker nodes and public subnets for provisioning public facing load balancers. The sample script we provide to provision a VPC creates only public subnets. This seems inconsistent. Can we provide a sample CFn template that creates public & private subnets instead? Happy to provide a PR for that template, but I'm not sure how.

ICE Error explanation

Can this be explained further by providing an example or by explaining it in detail -

"Retry creating your cluster with subnets in your cluster VPC that are hosted in the Availability Zones returned by this error message."

Also, is this related to unavailability of EC2 instances in the AZ or is this because of unavailability of EKS Cluster altogether in the AZ?

And what about UnsupportedAvailabilityZoneException? Can this be explained a little bit more?

If the zone is not supported, will it ever be supported or is it in the process of being supported?

Fix windows support doc with GMSA support

As of EKS Kubernetes version 1.16 [1], GMSA is a supported feature [2] (graduated to beta).

Windows GMSA support has graduated from alpha to beta, and is now supported by Amazon EKS. For more information, see Configure GMSA for Windows Pods and containers in the Kubernetes documentation.

However, the windows-support.html doc [1] still states that GMSA is not yet supported by EKS.

Group Managed Service Accounts (GMSA) for Windows pods and containers is a Kubernetes 1.14 alpha feature that is not supported by Amazon EKS. You can follow the instructions in the Kubernetes documentation to enable and test this alpha feature on your clusters.



Metrics-server installation via Helm

Maybe add a Helm chart for metrics-server installation instead of, or as an alternative to, curl & jq, similar to the Prometheus installation instructions?

P.S. I could submit a PR if needed.

AWS EKS coredns link is broken.

This link was recently updated and is broken.

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>cloudformation/2020-04-21/dns.yaml</Key><RequestId>54A29959EFDCDEE5</RequestId><HostId>YtwFSyDH8vDjsggUU1QrM/gvp6GLLgavT5sZTDdoKusbzheBZ5BT8e0RwSt5sqwmo15rf2hO/+U=</HostId></Error>%

[networking] EKS Calico and NetworkPolicy Updates


I've run into a litany of questions around EKS and NetworkPolicy implementation, so I'm opening this issue to get some answers in case I missed something. Afterwards, I will open a PR to update the documentation so that no one else has to wonder about these things.


Calico and AWS VPC CNI

I've found this blog post Exploring the Networking Foundation for EKS: amazon-vpc-cni-k8s + Calico, which states the following:

Really, all you need to know is: use amazon-vpc-cni-k8s as the CNI plugin, apply a simple manifest to deploy Calico as a daemonset, and Bob’s your uncle.


Our recommendation is: if these are not hard blockers for your deployment, you should use the amazon-vpc-cni-k8s plugin as the simplest and best-performing native networking solution for AWS — and, of course, it is what will come by default with EKS. Whichever networking approach you choose, you can rest assured that you have industry-standard container network security courtesy of Calico.

Do the AWS VPC CNI and Calico work together, or are they exclusive?

From what I understand, the VPC CNI handles networking, but Calico is needed for implementing NetworkPolicy objects. Is this correct?

Can only Calico be used?

The second paragraph implies that the VPC CNI can be ditched altogether, and that Calico can be used exclusively. Is this still the case? If so, are there existing EKS-specific instructions on how to only use Calico?

Calico and Fargate

From the Installing Calico on Amazon EKS guide, it says:

Calico is not supported when using Fargate with Amazon EKS.

What does this mean exactly? Does this mean that if you have a Fargate profile installed for your cluster at all that the entire Calico networking stack doesn't work? Or does this mean that NetworkPolicy and GlobalNetworkPolicy objects will not work when used in / applied to the Fargate namespace?

Calico and GlobalNetworkPolicy objects

There's very little documentation on these GlobalNetworkPolicy objects, but the official EKS chart for Calico does install that CRD.

What limitations are there with EKS using ? The spec document for GlobalNetworkPolicy on Calico reference

Will be supported in the future? If not, what functionality will not work with

