awsdocs / amazon-eks-user-guide Goto Github PK

The open source version of the Amazon EKS user guide. You can submit feedback & requests for changes by submitting issues in this repo or by making proposed changes & submitting a pull request.

License: Other

amazon-eks-user-guide's Introduction

Amazon EKS User Guide

The open source version of the Amazon EKS User Guide. There are Edit this page on GitHub links at the bottom of each applicable documentation page in that guide. Once you've navigated to an applicable page, you can contribute suggested edits directly by choosing the Edit this file pencil icon, making the suggested edits, then choosing the Create pull request button.

License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.

The sample code within this documentation is made available under a modified MIT license. See the LICENSE-SAMPLECODE file.

Default branch

The default branch for this repo has changed to main.

If you have cloned the previous default branch, please update your local repo to use the main branch.

amazon-eks-user-guide's People

Contributors

Stargazers

Watchers

Forkers

vchan2002 pathikvsharma sruon geremycohen bcreane ozdanborne wangbus acbharat satoshihirose norbinsh otoolema andrewd-uriux graham-m gibbsie mtulio rkilgore-meta ankitgupta21 veryfatboy changlees sangeeyeah devopseze ik-kubernetes glenngillen benluteijn jayh5 cjpcloud sankarcloud rajuantony mrichman jonjozwiak woernfl badaldavda8 stevehouel vishu77 anshrma taylorb-syd sanstorage4u thomcost hamedhsn runlevel-six theonlydoo toricls maiconrocha venkatavinaykumar hugespoon bcelenza jicowan tiffanyfay skyvaibhav zuzud kellygriffin govindcnd joaovitor jordanpurinton ee08b397 dranes opensourcezombie goverthanan tkornai shibataka000 424d57 starchx jasonswindle georgeerickson nandeeshb09 lewayne-aws parijatmishra mihir-acharya tabern leecalcote sudheerchamarthi bhsmith60 0xlen mariomerco royalbhargav100 cmdallas mmsokal sohaib112 groksrc rimaulana ashwanijha04 jonathan-foxx fortejas andersjw mailjunze justinplute anelavelly mmerkes ivallhon keikumata srutig smrutiranjantripathy martina-if jeffwan kalpesh-khule realvz tbondarchuk travcunn dharanibalasani krzemienski

amazon-eks-user-guide's Issues

Restricting Access to Amazon EC2 Instance Profile Credentials on Windows node

Hi all,

Is there any way (script) to restrict access to Amazon EC2 instance profile credentials on Windows node? I have configured the Service Account and I want my pods to use the created IAM role. But .net app tries to get the credentials from EC2 Metadata with error:

[Amazon.Util.EC2InstanceMetadata:0] - Unable to contact EC2 Metadata service to obtain a metadata token. Attempting to access IMDS without a token.

external-dns yaml in CloudFormation is missing lameduck config

The YAML for CoreDNS that is referenced here is missing a lameduck option that was added by the coredns folks back in October (coredns/deployment#206). The missing option can cause DNS failures during coredns updates, as well as during cluster autoscaling (where we see them happen) when a node coredns is running on is affected.

There are no docs to update, but the cloudformation yaml should be updated to include this option to prevent issues like this in the future. The lameduck option effectively acts as a preStop hook for coredns so that it stops accepting requests before it's actually gone.

Cross account roles for IRSA

Add a section to the EKS documentation that explains how to create a cross-account IRSA role, e.g. pod in cluster A in account A assumes a role in account B.

Failed to regenerate ASG cache: cannot autodiscover ASGs

Hi, I follow this documentation in order to set up my K8S cluster and install cluster autoscaler.
I used this command to create my cluster

eksctl create cluster \
--name gara-int-cluster \
--version 1.15  \
--region eu-west-3 \
--nodegroup-name standard-workers \
--node-type t2.micro \
--nodes 3 \
--nodes-min 1 \
--nodes-max 4 \
--ssh-access \
--ssh-public-key gara-cluster-public-eks.pub \
--managed

Then I followed the instructions to add cluster autoscaler.
When I check the autoscaler logs at the end, I have this error:

E0322 18:48:28.135501 1 aws_manager.go:259] Failed to regenerate ASG cache: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::849616654863:assumed-role/eksctl-gara-int-cluster-nodegroup-NodeInstanceRole-19QKQO25WZ3M9/i-0ee6bc34b33c65e12 is not authorized to perform: autoscaling:DescribeTags
status code: 403, request id: 7a3526a2-0cc6-4377-88a7-b558fb094e77
F0322 18:48:28.218626 1 aws_cloud_provider.go:351] Failed to create AWS Manager: cannot autodiscover ASGs: AccessDenied: User: arn:aws:sts::849616654863:assumed-role/eksctl-gara-int-cluster-nodegroup-NodeInstanceRole-19QKQO25WZ3M9/i-0ee6bc34b33c65e12 is not authorized to perform: autoscaling:DescribeTags
status code: 403, request id: 7a3526a2-0cc6-4377-88a7-b558fb094e77
goroutine 44 [running]:
k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog.stacks(0x4cbb001, 0x3, 0xc0004c01e0, 0x17c)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/klog.go:900 +0xb1

I'm new with AWS. What did I do wrong ? Thanks in advance

Metrics-server installation via Helm

Maybe add a Helm chart for metrics-server installation instead or as an alternative to curl & jq? Similarly to Prometheus installation instructions.

P.S. I could submit a PR if needed.

EKS Worker nodes & public Ip address

I got feedback by an AWS partner, that the worker nodes, created by the cloud formation template are having a public IP address by default.
Shouldn't that be a parameter/setting?

AWS IAM authenticator curl command downloads file without .exe extension

The Windows instructions at https://github.com/awsdocs/amazon-eks-user-guide/blob/master/doc_source/install-aws-iam-authenticator.md say to run the command curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.12.7/2019-03-27/bin/windows/amd64/aws-iam-authenticator.exe, but the command downloads a file that I had to manually rename to .exe before it worked.

ICE Error explanation

Can this be explained further by providing an example or by explaining it in detail -

"Retry creating your cluster with subnets in your cluster VPC that are hosted in the Availability Zones returned by this error message."

Also, is this related to unavailability of EC2 instances in the AZ or is this because of unavailability of EKS Cluster altogether in the AZ?

And what about UnsupportedAvailabilityZoneException? Can this be explained a little bit more?

If the zone is not supported, will it ever be supported or is it in the process of being supported?

Broken link to Kubernetes Dashboard deployment yaml file

Re: this page

This link now returns a 404: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

The kubernetes/dashboard repo now recommends using this command:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml

Update apiVersion for Kubernetes 1.16

The vpc-resource-controller.yaml linked in this Windows Support doc uses a deployment with:

apiVersion: apps/v1beta1

This was deprecated in kubernetes 1.16:
https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/

When trying to apply to the cluster:

error: unable to recognize "https://amazon-eks.s3.us-west-2.amazonaws.com/manifests/us-east-1/vpc-resource-controller/latest/vpc-resource-controller.yaml": no matches for kind "Deployment" in version "apps/v1beta1"

Should be an easy change in just updating the apiVersion in the file

An error occurred (UnsupportedAvailabilityZoneException) when calling the CreateCluster operation: Cannot create cluster 'rotha-dev-devel' because us-east-1c, the targeted availability zone, does not currently have sufficient capacity to support the cluster.

I've been following the AWS EKS Getting Started Guide to the letter, and when trying to run the aws eks create-cluster step I am receiving the following error:

An error occurred (UnsupportedAvailabilityZoneException) when calling the CreateCluster operation: Cannot create cluster 'rotha-dev-devel' because us-east-1c, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1d

How do I specify which availability zone the cluster will be deployed to? If I'm understanding correctly, the chosen AZ is specified when creating the VPC.

The amazon-eks-vpc-sample.yaml file used has the following section in it:

AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region

This is repeated 3 times, one for each Subnet. The only change is the '0' gets changed to '1' and '2'. I'm guessing the change needs to go here, but I have no idea what needs to be put there instead.

Why still use "Slave"?

Could use "Worker" instead of the potentially unsettling word "Slave" in this document.

add eksctl page

cc @christopherhein

Restricting Access to Amazon EC2 Instance Profile Credentials -does not seem to work

After running the below script in the Amazon AMI Linux worker node as explained in https://docs.aws.amazon.com/eks/latest/userguide/restrict-ec2-credential-access.html ,
sudo yum install -y iptables-services

sudo yum install -y initscripts

sudo iptables --insert FORWARD 1 --in-interface eni+ --destination 169.254.169.254/32 --jump DROP

sudo iptables-save | tee /etc/sysconfig/iptables

sudo systemctl enable iptables.service

I am getting errors for the cert-manager-cainjector and cluster-autoscaler pods.The cert-manager-cainjector and cluster-autoscaler pods went into crashloopbackerror status with the below issue.

kubectl logs aws-cluster-autoscaler-5cd9d77588-tm4nz -n kube-system
Error from server: Get https://10.73.45.50:10250/containerLogs/kube-system/aws-cluster-autoscaler-5cd9d77588-tm4nz/aws-cluster-autoscaler: dial tcp 10.73.45.50:10250: connect: no route to host

I even tried to insert "-A FORWARD -d 169.254.169.254/32 -i eni+ -j DROP" directly into /etc/sysconfig/iptables file , and restarted the iptables.service.But still no luck.

Both cert-manager-cainjector and cluster-autoscaler add-ons are configured with IRSA.
SSM add-on is configured with hostNetwork: true option and "AmazonSSMManagedInstanceCore" IAM role is added to the instance profile ,so that SSM daemonset can work.Instance profile has been added with IAM role for SSM because IRSA does not seem to work in SSM agent.

Is there anything that can be done to fix this connect:no route to host issue?

heptio-authenticator-aws was renamed to aws-iam-authenticator

heptio-authenticator-aws using in kubectl was renamed to aws-iam-authenticator.
kubernetes-sigs/aws-iam-authenticator@8875022

The rename causes problems in the following tutorial.
https://docs.aws.amazon.com/eks/latest/userguide/configure-kubectl.html
https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html

remove

Is this an out-take from the "BRADY BUNCH"?

EKS worker nodes ami local certificate expired

When I make a curl request from any of the EKS worker nodes or when I'm trying to establish the secure connection over HTTPS from any of the pods running on that worker nodes, I'm receiving the following error:


curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not establish a secure connection to it. To learn more about this situation and how to fix it, please visit the web page mentioned above.

The EKS master version: 1.14
AMI ID: amazon-eks-node-1.14-v20200507 | ami-0486134a23d903f10
AWS region: us-west-2(Oregon)

I have also checked my domain certs, their state is active, the HTTPS requests are getting served from outside EKS nodes.

This is blocking me across all regions, would appreciate if it gets resolved ASAP.

Thanks in advance
-Rohith Vallabhaneni.

Fix windows support doc with GMSA support

As per EKS Kubernetes version 1.16 [1] , GMSA is supported [2] feature (graduated to beta).

Windows GMSA support has graduated from alpha to beta, and is now supported by Amazon EKS. For more information, see Configure GMSA for Windows Pods and containers in the Kubernetes documentation.

However, windows-support.html [1] doc still references that GMSA is not yet supported by EKS.

Group Managed Service Accounts (GMSA) for Windows pods and containers is a Kubernetes 1.14 alpha feature that is not supported by Amazon EKS. You can follow the instructions in the Kubernetes documentation to enable and test this alpha feature on your clusters.

References:

[1] https://kubernetes.io/docs/tasks/configure-pod-container/configure-gmsa/
[2] https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
[3] https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html

Failing to communicate with API server: Get https://172.20.0.1:443/api/v1/componentstatuses?timeout=30s: waiting for cluster agent to connect

Recently we have upgraded eks cluster from eks v1.11 to v1.12 version , post upgrade we are getting error communicating with api server intermittently.

Please help here to fix this issue,as we are facing our services are going down intermittently.

amazon cni version -- 1.5.3
core dns -- 1.2.2
kube proxy -- 1.12.6

Template error: Fn::Select cannot select nonexistent value at index 2

I cannot use the template https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2018-08-30/amazon-eks-vpc-sample.yaml in us-east-1. Why it failed at creating the third subnet?

AWS EKS coredns link is broken.

This link was recently updated and is broken. https://github.com/awsdocs/amazon-eks-user-guide/blame/master/doc_source/coredns.md#L42

curl https://amazon-eks.s3.us-west-2.amazonaws.com/cloudformation/2020-04-21/dns.yaml
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>cloudformation/2020-04-21/dns.yaml</Key><RequestId>54A29959EFDCDEE5</RequestId><HostId>YtwFSyDH8vDjsggUU1QrM/gvp6GLLgavT5sZTDdoKusbzheBZ5BT8e0RwSt5sqwmo15rf2hO/+U=</HostId></Error>%

Deploy the Amazon EBS CSI Driver command's GitHub link expired.

Deploy the Amazon EBS CSI Driver commands GitHub link expired. It is now updated with kustomization.yaml.

kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"

The document needs to be updated with the new steps

Which kube-proxy version for 1.13?

What is the currently recommended kube-proxy version for EKS 1.13? Is it 1.13.7, 1.13.8 or 1.13.10?

This commit updated the recommended version to 1.13.8 (165a747) but it was reset to 1.13.7 by (4f19128). Not sure if that was intentional?

Also the current K8s version for EKS 1.13 is 1.13.10 so maybe that should also be used for kube-proxy?

pod logging from fluentd to cloudwatch

I have eks on fargate k8 cluster and i am using fluentd as side car container to send logs to cloudwatch. My fluentd container is able to access shared volume from application container but its unable to send logs to cloudwatch. Is there some fluent-d side-car implementation which i can use as readymade template? or if there's some other better alternative for pod logging?

Make IRSA refresh token mechanics explicit

Somewhere in the technical overview on IRSA we should mention that the token is automatically refreshed by the kubelet, referring to the upstream docs:

The kubelet will request and store the token on behalf of the pod, make the token available to the pod at a configurable file path, and refresh the token as it approaches expiration. Kubelet proactively rotates the token if it is older than 80% of its total TTL, or if the token is older than 24 hours.

cluster-autoscaler-autodiscover.yaml is using an old version of cluster-autoscaler

On current documentation at https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html we use following command:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

This yaml file used an old version of cluster-autoscaler (currently the one supports kubernetes v1.12)

https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/cluster-autoscaler?gcrImageListsize=30 show the latest docker images

So I think documentation needs to be updated to reflect to use the same major version of cluster-autoscaler that matches the major version of the kubernetes cluster we create.

For example we have 1.14.7 of it now https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/cluster-autoscaler@sha256:86ae2c68bdebbfeb5ac73ebe6bef790da6d8e9b4d186852f016f0a46355cd21d/details?tab=info

Please mention/point to the aws-eks-ami release in the AMI list

It's extremely difficult right now to figure out what has changed in the AMIs: https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html only has region specific AMI IDs, the commit logs tend to be in the shape of "update AMI ids", the market place doesn't have a changelog -- but it does point to version numbers.

What I right now do is to look at the history of the eks-optimized-ami.html, try to find back the ID I used last, and then walk through the history, assuming that each of the changes correlates to an entry in the amazon-eks-ami changelog.

I then take the current value of the AMI ID for my region, and update my internal documentation to point to that AMI ID. Every few weeks I check again with the same process, and update my docs again (and potentially even replace nodes).

EKS + CloudWatch logging is not supported right now

There is a page in the documentation the explains that we can get EKS logs to CLoudWatch.

It needs to be removed as it is misleading in terms ok EKS features and functionality right now.

[networking] EKS Calico and NetworkPolicy Updates

Description

I've ran into a litany of questions around EKS and NetworkPolicy implementation, so I'm opening this issue to get some answers in case I missed something. Afterwards, I will open a PR to update the documentation so that no one else has to wonder about these things.

Questions

Calico and AWS VPC CNI

I've found this blog post Exploring the Networking Foundation for EKS: amazon-vpc-cni-k8s + Calico, which states the following:

Really, all you need to know is: use amazon-vpc-cni-k8s as the CNI plugin, apply a simple manifest to deploy Calico as a daemonset, and Bob’s your uncle.

...

Our recommendation is: if these are not hard blockers for your deployment, you should use the amazon-vpc-cni-k8s plugin as the simplest and best-performing native networking solution for AWS — and, of course, it is what will come by default with EKS. Whichever networking approach you choose, you can rest assured that you have industry-standard container network security courtesy of Calico.

Do the AWS VPC CNI and Calico work together, or are they exclusive?

From what I understand, the VPC CNI handles networking, but Calico is needed for implementing NetworkPolicy objects. Is this correct?

Can only Calico be used?

The second paragraph implies that the VPC CNI can be ditched altogether, and that Calico can be used exclusively. Is this still the case? If so, are there existing EKS-specific instructions on how to only use Calico?

Calico and Fargate

From the Installing Calico on Amazon EKS guide, it says:

Calico is not supported when using Fargate with Amazon EKS.

What does this mean exactly? Does this mean that if you have a Fargate profile installed for your cluster at all that the entire Calico networking stack doesn't work? Or does this mean that NetworkPolicy and GlobalNetworkPolicy objects will not work when used in / applied to the Fargate namespace?

Calico and `GlobalNetworkPolicy` objects

There's very little documentation on these GlobalNetworkPolicy objects, but the official EKS chart for Calico does install that CRD.

What limitations are there with EKS using crd.projectcalico.org/v1 ? The spec document for GlobalNetworkPolicy on Calico reference projectcalico.org/v3.

Will projectcalico.org/v3 be supported in the future? If not, what functionality will not work with crd.projectcalico.org/v1?

Missing step in updating CoreDNS image version to v1.6.6

proxy plugin support was removed since CoreDNS v1.5.0. Therefore, for an existing EKS cluster to be able to use CoreDNS v1.6.6, coredns ConfigMap needs to be modified so that Corefile will use forward plugin to replace proxy plugin.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
EOF

How to mount the pod's data into AWS s3 bucket?

GPU AMI doesn't work with CUDA 10 image

The currently documented method to test GPU workers are configured properly points to the latest CUDA image, which is version 10. It seems the current GPU AMI doesn't support this particular version. When I run the example pod manifest, I get the following error:

container_linux.go:262: starting container process caused "process_linux.go:345: container init 
caused \"process_linux.go:328: running prestart hook 0 caused \\\"error running hook: exit status 1,
 stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --
ldconfig=@/sbin/ldconfig --device=GPU-ffa9ca4e-a464-f033-f4e9-4712c549302b --compute -
utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=7659
 /var/lib/docker/overlay2/0e8ac16367d1fdbaef77a014319383fe142129534864f51ac90c
bded3be6bd6c/merged]\\\\nnvidia-container-cli: requirement error: invalid expression\\\\n\\\"\""

Changing the image version in the manifest to nvidia/cuda:9.2-devel works as expected. This means that the GPU AMI only supports CUDA image versions <10.0.

metric server installation script fails (curl , jq)

On https://docs.aws.amazon.com/eks/latest/userguide/metrics-server.html
it says

DOWNLOAD_URL=$(curl -Ls "https://api.github.com/repos/kubernetes-sigs/metrics-server/releases/latest" | jq -r .tarball_url)
DOWNLOAD_VERSION=$(grep -o '[^/v]*$' <<< $DOWNLOAD_URL)

but

curl -Ls "https://api.github.com/repos/kubernetes-sigs/metricsserver/releases/latest"
{
"message": "Not Found",
"documentation_url": "https://developer.github.com/v3/repos/releases/#get-the-latest-release"
}

does not return tarball_url so script fails

Passing AWS_PROFILE on update-kubeconfig cli

Add profile details if calling update passing update-kubeconfig with profile details

#aws eks update-kubeconfig --name k8s-test --kubeconfig k8s-test.config --profile myprofile

Currently this generates as

  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - token
      - -i
      - k8s-test
      command: aws-iam-authenticator

Its good to add the profile details on the cli when not using default profile.

- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "k8s-test"

      env:
        - name: AWS_PROFILE
          value: "myprofile"

Helm installation update

Now, Helm 3.0 is out of use tiller, I think the document should be updated!

Suggest to add the notice if using amazon-k8s-node-drainer before upgrading Kubernetes 1.16

There has an issue of amazon-k8s-node-drainer if using Kubernetes 1.16, the Pod eviction object name should be match the Pod name.

For users already integrated the solution(e.g. using Cluster AutoScaler to do scale in), to prevent any potential issue when CA terminating the Worker Node and unable correctly send Pod eviction API, it is recommended to add notice to update/patch the amazon-k8s-node-drainer before upgrading the cluster version.

Reference

aws-samples/amazon-k8s-node-drainer#30

Latest version in referenced yaml for deployment uses image v1.6.2

cni-upgrades specifies latest recommended version as v1.6.1, but the
referenced yaml uses the image v1.6.2

Minor grammar error in EKS Control Plane logging section

Fix: #39

Nodegroup creation fails on private subnet with VPC endpoints.

Steps to reproduce:

Create a private subnet
Create a gateway endpoint for Amazon S3
Create a private endpoint for com.amazonaws.region.ecr.dkr
Try to create a managed nodegroup

It will fail with the message:
"NodeCreationFailure - Instances failed to join the kubernetes cluster"

The command:
$ kubectl describe pod aws-node-xxxxx -n kube-system

Will return:
FailedCreatePodSandBox 2m23s (x2610 over 4d) kubelet, ip-xx-x-xx-xx.us-west-2.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause-amd64:3.1" : Error response from daemon: Get https://602401143452.dkr.ecr.us-west-2.amazonaws.com/v2/eks/pause-amd64/manifests/3.1: no basic auth credentials

How to fix:

Create a VPC endpoint for com.amazonaws.region.ecr.api, see https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html#ecr-setting-up-vpc-create
Try to create a managed nodegroup again

Explicit list of tag required

Hi,

While trying to build a EKS cluster without using CloudFormation, I had to do some trial and error to figure out the tags required by a fully working EKS cluster.

Should the doc include a specific page stating all the required tags for people not using CF? Am I missing some btw?

Something like this (I can do a PR if needed):

Tags required by EKS

VPC Tagging Requirement

kubernetes.io/cluster/<cluster-name> set to shared.

See https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html.

Subnet Tagging Requirement

kubernetes.io/cluster/<cluster-name> set to shared.
For private subnets: kubernetes.io/role/internal-elb set to 1.
For public subnets: kubernetes.io/role/elb set to 1.

See https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html.

Security Group Tagging Requirement

The security group used by the worker nodes should be tagged with:

kubernetes.io/cluster/<cluster-name> set to owned

See https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2019-02-11/amazon-eks-nodegroup.yaml.

Autoscaling Group Tagging Requirement

The Autoscaling group used to launch the worker nodes should be tagged with:

kubernetes.io/cluster/<cluster-name> set to owned

Also set PropagateAtLaunch = true to copy the tag to EC2 instances that are
launched as part of the Auto Scaling group

See https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2019-02-11/amazon-eks-nodegroup.yaml.

Recommended VPC setup for EKS

In the tutorial we recommend that folks use private subnets for worker nodes and public subnets for provisioning public facing load balancers. The sample script we provide to provision a VPC creates only public subnets. This seems inconsistent. Can we provide a sample CFn template that creates public & private subnets instead? Happy to provide a PR for that template, but I'm not sure how.

NodeSecurityGroupFromControlPlaneIngress in amazon-eks-nodegroup.yaml template is very limiting

In the amazon-eks-nodegroup.yaml template that is linked from this guide, there is this security group:

  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

Basically, this setting makes it impossible for Kubernetes services to access pods that have have containerPort set to anything below 1025, which is a huge issue since so many of them use the 80 port (e.g. nginx).

So, the question is: why is this set to 1025, not 0?

File Does Not Exist

With the lastest updates the instructions to pull the following file fails:

https://amazon-eks.s3.us-west-2.amazonaws.com/cloudformation/2020-04-21/dns.yaml

Steps to reproduce:

Run curl -o dns.yaml https://amazon-eks.s3.us-west-2.amazonaws.com/cloudformation/2020-04-21/dns.yaml

Expected results:
Expected a valid file to return

Actual results:
<?xml version="1.0" encoding="UTF-8"?> <Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>cloudformation/2020-04-21/dns.yaml</Key><RequestId>74634F3E16F9FED8</RequestId><HostId>+azDf7F14FlGPI0VCZ8HHuc8kl/XsvEYwqqy+wikEThIK7lbbPlMMRnqDa7dkEGbUJqCijwogM4=</HostId></Error>%

explain why you would want to uncomment the roles section of kubeconfig's heptio config

The docs specify:

Note

Optionally, you can uncomment the -r and lines and substitute an IAM role ARN to use with your user.

But don't say why.

Adding note regarding the use of multiple profiles

One thing that could use some clarification is how to make sure your kubeconfig talks to the right account/profile, thus the right IAM user. Since many of us have multiple profiles/accounts in our AWS configurations.

based on this....

kubernetes-sigs/aws-iam-authenticator#71

Create a service account for the ALB ingress controller fails

The docs for https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html

issue:
Step 4: Create a service account for the ALB ingress controller and attach the policy to the service account.

If you do not create your cluster with eksctl and you run the command in step 4:

eksctl create iamserviceaccount
--region us-east-1
--name alb-ingress-controller
--namespace kube-system
--cluster prod
--attach-policy-arn arn:aws:iam::111122223333:policy/ALBIngressControllerIAMPolicy
--approve

It will throw the error:

no eksctl-managed CloudFormation stacks found for "cluster_name".

To reproduce the error:

create a cluster via the console and attach unmanaged linux worker nodes
follow the steps in: https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html

Issue 2:

When selecting the link "Edit this page on GitHub" the docs appear to be out of sync.

How to Upgrade EKS Worker Nodes and EKS Cluster?

I was unable to find any documentation on how to upgrade worker nodes for a new Kubernetes version or because of security issues. How will this work with EKS?

The other information I could not find is about how to upgrade the EKS cluster to a new version of Kubernetes. For example, the current version provided by AWS is Kubernetes 1.10, but Kubernetes 1.11 is already available. How does the upgrade strategy for minor and major versions will look like? I know you mentioned that in some of the talks, but there is nothing documented as of now.

Link does work

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login

but below will work

http://localhost:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/#!/overview?namespace=default

heptio-authenticator-aws has been renamed to aws-iam-authenticator

Seems we need update the document to replace heptio-authenticator-aws to aws-iam-authenticator

Below is the commit in upstream

Renamed heptio-authenticator-aws -> aws-iam-authenticator

kubernetes-sigs/aws-iam-authenticator@678cdff

Proxy Protocol annotation in K8s in AWS fail to enable in AWS ELBv2

Following to this guide to enable Proxy Protocol on AWS ELBv2, but AWS ELBv2 fail to enable the attribute.

service.beta.kubernetes.io/aws-load-balancer-proxy-protocol:'*'

A sample value.yaml is below:

kind: Service
metadata:
  annotations: 
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
  name: lb
  labels:
    name: lb
spec:
  ports:
  - port: {{ .Values.simpleflask.service.ports.port }}
    targetPort: {{ .Values.simpleflask.service.ports.targetPort }}
  selector:
    run: simpleflask
  type: LoadBalancer```