aws / amazon-vpc-cni-k8s
Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
License: Apache License 2.0
I'm currently trying to set up a new environment on AWS EKS, but I'm facing an issue with large-scale deployments.
Here is the error thrown by k8s:
Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "foobar_8x7sahio0-98d_default" network: add command: failed to setup network: setup NS network: failed to add host route: file exists
This leads to longer deployments and is becoming a serious issue.
Region: us-east-1
Platform: AWS EKS, version 1.10
AMI: ubuntu-eks/1.10.3/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20180613.1 (ami-39397a46)
Instance type: m5.4xlarge (it's supposed to allow using 238 secondary IPs; 119 max on the same host)
When there are any issues with the CNI, only the log file on the node where the pod is running contains the reason (kubectl describe pods).
This is not required by AWS CNI but kubelet fails without it.
All AWS API calls should handle these three error cases (mentioned in the title), basically by implementing exponential backoff when we hit these errors.
Currently the plugin reserves the first IP address on each ENI as unavailable for pods. I believe it's only necessary to reserve the primary IP address of the node itself. We should modify the IPAM logic to make all IP addresses on secondary ENIs available to pods.
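The capacity difference can be sketched with a small calculation; the ENI and per-ENI IP limits below are illustrative placeholders, not a specific instance type's real limits:

```go
package main

import "fmt"

// podCapacity computes how many pod IPs a node can hand out.
// reserveAllENIs models the current behaviour (first IP of every
// ENI withheld); the proposal reserves only the node's primary IP.
func podCapacity(enis, ipsPerENI int, reserveAllENIs bool) int {
	if reserveAllENIs {
		return enis * (ipsPerENI - 1) // current: one IP lost per ENI
	}
	return enis*ipsPerENI - 1 // proposed: only the primary IP lost
}

func main() {
	fmt.Println(podCapacity(4, 15, true))  // 56
	fmt.Println(podCapacity(4, 15, false)) // 59
}
```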
TL;DR: k8s advertises a secondary IP as the private IP for a node where I'm using host networking for a pod (and a headless service to direct traffic to it). The pod drops the packets that are destined for a non-configured IP.
NodeA - Pod1 uses host based networking for an application that is configured as the backend for a headless service (service-pod1).
NodeB - Pod2 resolves an A record for "service-pod1" and receives 10.192.131.176.
The private IP that's configured on the OS for NodeA is 10.192.159.32 (listed as one of the primary private IPs in the EC2 view).
When NodeB/Pod2 sends traffic to the IP 10.192.131.176 it gets dropped, likely by the kernel because the IP isn't configured on the OS.
To isolate the issue I tested outside of the pods, from NodeB to NodeA, using ping and ncat.
When pinging from NodeB to 10.192.131.176 I receive a response from NodeA's only configured IP, 10.192.159.32:
From 10.192.159.32 icmp_seq=1 Destination Host Unreachable
When trying to connect to an nc listener from NodeB (on NodeA: nc -l 7000) I get an error on NodeB: Ncat: No route to host.
I confirmed using tcpdump in both of these tests that the traffic is reaching NodeA, and also that using the configured IP as the destination (10.192.159.32) works as expected.
I terminated the node and the issue is still present, but with different IPs in play.
There is also another node where describe shows one of the non-configured secondary IPs, but that doesn't seem to present an issue when the pods aren't using host networking (traffic is destined to the pod-configured IP).
My main concern is that the kubernetes headless service is returning an IP that's unusable.
A workaround for this might be to use a non-local bind for the application, but that seems like it should be unnecessary.
k8s version 1.9, node versions v1.9.2-eks.1
Please let me know if more info is needed.
Hi, moving kubernetes/kops#4218 here as suggested
I am getting an error when running kubectl logs POD -n NAMESPACE, specifically:
Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out
This error seems to be related to the kubelet registering the wrong IP: as far as I can tell, it reports one of the secondary private IPs (on eth0).
In the example error reported:
Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out
10.103.20.110 is a secondary private IP on eth0 and it is the IP shown by kubectl describe node:
Addresses:
InternalIP: 10.103.20.110
InternalDNS: ip-10-103-21-40.megamind.internal
Hostname: ip-10-103-21-40.megamind.internal
Note that only a single InternalIP is reported, and it is not the one the kubelet is listening on.
Locally, curl works against both the primary IPs on eth0 and eth1.
The problem has also been occurring on the master nodes, where it manifests as new nodes being unable to join the cluster because the kubelet cannot contact the API (the IPs are wrong).
If I pass the --node-ip=$(curl http://169.254.169.254/latest/meta-data/local-ipv4) argument to the kubelet, everything works as expected.
I could not reproduce this specific issue by simply adding an extra ENI and secondary IPs (trying to emulate the plugin's behaviour) on a cluster running the flannel overlay network; there, the kubelet reports all available IP addresses correctly.
Versions:
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Need to add the following:
tolerations:
- operator: "Exists"
  effect: "NoExecute"
- operator: "Exists"
  effect: "NoSchedule"
Pods in a VPC can talk to:
SNATing should not be done in cases 2-4.
Which IP ranges require NATing should be configurable instead of defaulting to the VPC CIDR.
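A sketch of how a configurable exclusion list could work; the function and the idea of passing CIDRs as a list are assumptions for illustration, not the plugin's actual interface:

```go
package main

import (
	"fmt"
	"net"
)

// shouldSNAT reports whether traffic to dst should be SNATed,
// given a configurable list of CIDRs that must NOT be NATed
// (instead of hard-coding the single VPC CIDR).
func shouldSNAT(dst string, excludeCIDRs []string) bool {
	ip := net.ParseIP(dst)
	for _, c := range excludeCIDRs {
		_, ipnet, err := net.ParseCIDR(c)
		if err != nil {
			continue // skip malformed entries
		}
		if ipnet.Contains(ip) {
			return false
		}
	}
	return true
}

func main() {
	exclude := []string{"172.31.0.0/16", "172.32.0.0/16"} // e.g. both CIDRs of a VPC
	fmt.Println(shouldSNAT("172.32.0.27", exclude)) // false: stays un-NATed
	fmt.Println(shouldSNAT("8.8.8.8", exclude))     // true: SNAT to the primary IP
}
```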
When using nodeport services (with or without loadbalancer), if iptables redirects traffic to a local pod with an IP on a secondary interface, the traffic is dropped by the reverse path filter.
Everything seems to work OK because if the first SYN is dropped the client retries (though queries load-balanced to the local pod take much longer), and the retry will (probably) be sent to another host (or to a pod on the primary interface).
This can be seen by logging martian packets. When traffic is sent to a local pod on a secondary interface, it will be dropped.
The reason is the following:
To trigger the issue consistently:
I was using a VPC that had two IPv4 CIDRs and a subnet in only one of the CIDRs; this resulted in all traffic being dropped under any policy, even when trying to match on a pod label. Using tcpdump on the host with the source pod, I observed that all traffic from the pod was SNATed before being sent to the destination host.
The IP addresses being assigned to pods were in the subnet 172.32.0.0/16, while the SNAT iptables rule was set up for 172.31.0.0/16. This resulted in all traffic between hosts being SNATed, which caused Calico network policy to block all traffic between hosts.
Here is the iptables SNAT rule:
-A POSTROUTING ! -d 172.31.0.0/16 -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 172.32.0.27
SNAT should be set up correctly when there is more than one CIDR in a VPC.
What?
Request: would it be possible to remove the restriction on the number of pods for an instance type? Currently there is a restriction:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI
Logs are not flushed to disk (to [host]/var/log/aws-routed-eni/ipamd.log.xxx) when ipamd initialization fails, which makes debugging hard.
This is due to the defer clause in the main function: deferred calls do not run when os.Exit() is used, so logs are not flushed on initialization errors.
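A minimal Go illustration of the pitfall and one common fix (the structure is illustrative, not ipamd's actual code): deferred calls only fire when their enclosing function returns, so moving the work into a function that returns an exit code, and calling os.Exit only after it returns, lets the flush happen on every path.

```go
package main

import (
	"fmt"
	"os"
)

var flushed bool

// flushLogs stands in for the logger's flush-to-disk call.
func flushLogs() { flushed = true }

// initIPAM simulates a failing initialization.
func initIPAM() error { return fmt.Errorf("simulated init failure") }

// run does the real work and returns an exit code. Because the
// defer lives here (not in main), it fires even on the error path.
func run() int {
	defer flushLogs()
	if err := initIPAM(); err != nil {
		fmt.Fprintln(os.Stderr, "ipamd init failed:", err)
		return 1 // defer still runs before we return
	}
	return 0
}

func main() {
	code := run()
	fmt.Println("flushed:", flushed, "exit code:", code) // flushed: true exit code: 1
	// os.Exit(code) would go here; it skips deferred calls, which is
	// exactly why the defer must live inside run(), not main().
}
```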
On a node that is only 3 days old, all containers scheduled onto it get stuck in ContainerCreating. This is an m4.large node. The AWS console shows that it has the maximum number of private IPs reserved, so getting resources isn't the problem. There are no pods running on the node other than daemon sets. All new nodes that came up after cordoning this node came up fine as well. This is a big problem because the node is considered Ready and is accepting pods despite the fact that it can't launch any.
The resolution: once I deleted aws-node on the host from the kube-system namespace, all the stuck containers came up. The version used is amazon-k8s-cni:0.1.4.
In addition to fixing the underlying issue, is there any mechanism for the aws-node process to have a health check and either get killed and restarted, or have the node drained and cordoned, if failures are detected? Even as an option?
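A sketch of what such a health check might look like as a liveness probe on the daemonset container; this assumes some health endpoint or script exists inside the image, and the command, path, and timings below are entirely hypothetical:

```yaml
# Hypothetical sketch only: /app/health-check.sh is a placeholder,
# not a real file in the amazon-k8s-cni image.
livenessProbe:
  exec:
    command: ["/app/health-check.sh"]
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 3
```

With a probe like this, kubelet would restart the aws-node container on repeated failures; draining/cordoning the node would still need an external controller.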
I have left the machine running in case any more logs are needed. The logs on the host show:
skipping: failed to "CreatePodSandbox" for <Pod Name>
error. The main reason was: failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"<PodName>-67484c8f8-xj96c_default\" network: add cmd: failed to assign an IP address to container"
Other choice error logs are:
kernel: IPVS: Creating netns size=2104 id=32780
kernel: IPVS: Creating netns size=2104 id=32781
kernel: IPVS: Creating netns size=2104 id=32782
kubelet: E0410 17:59:24.393151 4070 cni.go:259] Error adding network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.393185 4070 cni.go:227] Error while adding to cni network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.427733 4070 cni.go:259] Error adding network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.428095 4070 cni.go:227] Error while adding to cni network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.506935 4070 cni.go:259] Error adding network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.506962 4070 cni.go:227] Error while adding to cni network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.509609 4070 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "<PodName>-69dfc9984b-v8dw9_default" network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.509661 4070 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "<PodName>-69dfc9984b-v8dw9_default(8808ea13-3ce6-11e8-815e-02a9ad89df3c)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "<PodName>-69dfc9984b-v8dw9_default" network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.509699 4070 kuberuntime_manager.go:647] createPodSandbox for pod "<PodName>-69dfc9984b-v8dw9_default(8808ea13-3ce6-11e8-815e-02a9ad89df3c)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "<PodName>-69dfc9984b-v8dw9_default" network: rpc error: code = Unavailable desc = grpc: the connection is unavailable
kubelet: E0410 17:59:24.509771 4070 pod_workers.go:186] Error syncing pod 8808ea13-3ce6-11e8-815e-02a9ad89df3c ("<PodName>-69dfc9984b-v8dw9_default(8808ea13-3ce6-11e8-815e-02a9ad89df3c)"), skipping: failed to "CreatePodSandbox" for "<PodName>-69dfc9984b-v8dw9_default(8808ea13-3ce6-11e8-815e-02a9ad89df3c)" with CreatePodSandboxError: "CreatePodSandbox for pod \"<PodName>-69dfc9984b-v8dw9_default(8808ea13-3ce6-11e8-815e-02a9ad89df3c)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"<PodName>-69dfc9984b-v8dw9_default\" network: rpc error: code = Unavailable desc = grpc: the connection is unavailable"
In the example yaml files in misc/, the name and value for the k8s-app label is aws-node, which is not very descriptive. I propose changing it to amazon-cni or amazon-vpc-cni-k8s.
Recently #99 was merged, which changed the interface prefix from cali back to eni. This change is important because Calico can now be installed onto existing clusters without modification to the existing aws-node daemonset, in EKS for example.
I'm requesting a point release of this repo so that there is a static link to the new version of that calico manifest.
I raised a PR to address the EKS docs side of the change over here: awsdocs/amazon-eks-user-guide#9
But it was merged without answering any of my questions, so I figured I'd propose a new solution here.
Thanks!
During a scale test deploying 8000 pods over 150 nodes (c3.xlarge), 12 pods got stuck in the ContainerCreating state.
The nodes where these pods were assigned complain that there are NO available IP addresses for these pods. There are a total of 42 IP addresses and all of them have been assigned to pods.
#e.g. in /var/log/aws-routed-eni/ipamd.log.2018-05-12-00
2018-05-12T00:59:59Z [INFO] DataStore has no available IP addresses
By the current CNI design, this node should have 4 ENIs and 56 usable IP addresses, and from the EC2 console you can see there are 4 ENIs and 60 IP addresses available for this node.
But the ipamd datastore indicates that there are only 3 ENIs and 42 IP addresses.
After reloading the aws-node container on this node, 4 ENIs and 56 IP addresses are added to the datastore, and all 12 remaining pods transition into the "Running" state.
If the cni-ipamd daemonset pod is rebooted while it is in the middle of increasing the IP pool size (create ENI / attach ENI / add the ENI and its IPs to the datastore), the newly attached ENI and its IP addresses can be lost and never get added to the L-IPAMD datastore.
Hello,
We are using this plugin in our own k8s cluster and everything had been working fine until we upgraded to 1.0.0 (redeploying a full cluster, not an in-place upgrade).
A few unusual things regarding our network/cluster:
=====Starting installing AWS-CNI =========
=====Starting amazon-k8s-agent ===========
ERROR: logging before flag.Parse: W0615 02:36:27.560592 9 client_config.go:533] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
/var/log/aws-routed-eni
total 920
drwxr--r-x+ 2 root root 67 Jun 15 12:12 .
drwxr-xr-x+ 17 root root 4096 Jun 15 12:12 ..
-rw-r--r--+ 1 root root 1652 Jun 15 12:36 ipamd.log.2018-06-15-02
-rw-r--r--+ 1 root root 814228 Jun 15 12:40 plugin.log.2018-06-15-02
/var/log/aws-routed-eni/ipamd.log.2018-06-15-02
2018-06-15T02:12:30Z [INFO] Starting L-IPAMD 1.0.0 ...
2018-06-15T02:12:30Z [INFO] Testing communication with server
[ ... skipped many duplicates ...]
2018-06-15T02:36:27Z [INFO] Starting L-IPAMD 1.0.0 ...
2018-06-15T02:36:27Z [INFO] Testing communication with server
/var/log/aws-routed-eni/plugin.log.2018-06-15-02
(thousands of similar lines skipped)
2018-06-15T02:26:38Z [INFO] Received CNI add request: ContainerID(c11bf6d18d938bf9e64a48b889358ef1f9d919e3e7af70d44971d60e57167a8b) Netns(/proc/7106/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=c11bf6d18d938bf9e64a48b889358ef1f9d919e3e7af70d44971d60e57167a8b) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:39Z [INFO] Received CNI add request: ContainerID(cdf5d2dc0d4c449fe81f95e7e1179d0de4b23497e85ebda29abc9069ea98166e) Netns(/proc/7195/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=cdf5d2dc0d4c449fe81f95e7e1179d0de4b23497e85ebda29abc9069ea98166e) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:40Z [INFO] Received CNI add request: ContainerID(4d80aefed852bcda93348b54f33a74fbe32feec9a50129ff3b0439f5048e10a4) Netns(/proc/7282/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=4d80aefed852bcda93348b54f33a74fbe32feec9a50129ff3b0439f5048e10a4) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:41Z [INFO] Received CNI add request: ContainerID(f2d7900a1939205c589abf4e021c49270e8ee0999eda9b364525efc603a3c15b) Netns(/proc/7366/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=f2d7900a1939205c589abf4e021c49270e8ee0999eda9b364525efc603a3c15b) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:42Z [INFO] Received CNI add request: ContainerID(aa143a27a4097346862475f597b2933eef573487051bfac0e10b2d7feff1baca) Netns(/proc/7458/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=aa143a27a4097346862475f597b2933eef573487051bfac0e10b2d7feff1baca) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:42Z [ERROR] Error received from AddNetwork grpc call for pod kube-dns-599dbfffb4-mpg6x namespace kube-system container aa143a27a4097346862475f597b2933eef573487051bfac0e10b2d7feff1baca: rpc error: code = Unavailable desc = grpc: the connection is unavailable
2018-06-15T02:26:43Z [INFO] Received CNI add request: ContainerID(2b1fbb2744ae7d5e72c52230346f9a331650847724ceb68ad72f2de373847a5e) Netns(/proc/7546/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=2b1fbb2744ae7d5e72c52230346f9a331650847724ceb68ad72f2de373847a5e) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:44Z [INFO] Received CNI add request: ContainerID(ebadb4daf43fd330f1182b4b5b9797e12406e68c1156a9e5fe75311cd9f26cec) Netns(/proc/7633/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=ebadb4daf43fd330f1182b4b5b9797e12406e68c1156a9e5fe75311cd9f26cec) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:45Z [INFO] Received CNI add request: ContainerID(f6b511e25f47017d8439c4c90af6e3a222064fd3b2c2c16eb17694a251e9e466) Netns(/proc/7721/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=f6b511e25f47017d8439c4c90af6e3a222064fd3b2c2c16eb17694a251e9e466) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:46Z [INFO] Received CNI add request: ContainerID(2ffaa10f5ba964324a90f995f1a8d03a36a04fcfc75e697ece416c5aae3684a7) Netns(/proc/7806/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=2ffaa10f5ba964324a90f995f1a8d03a36a04fcfc75e697ece416c5aae3684a7) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:46Z [ERROR] Error received from AddNetwork grpc call for pod kube-dns-599dbfffb4-mpg6x namespace kube-system container 2ffaa10f5ba964324a90f995f1a8d03a36a04fcfc75e697ece416c5aae3684a7: rpc error: code = Unavailable desc = grpc: the connection is unavailable
2018-06-15T02:26:48Z [INFO] Received CNI add request: ContainerID(3c3ca85bdaa36f50790d0143225581ef7681fb699c11920b670d89d284ab9630) Netns(/proc/7896/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=3c3ca85bdaa36f50790d0143225581ef7681fb699c11920b670d89d284ab9630) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
kubectl get pod aws-node-9dm8j -o yaml -n kube-system
apiVersion: v1
kind: Pod
metadata:
  annotations:
    podpreset.admission.kubernetes.io/podpreset-proxy-preset: "230"
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: 2018-06-15T02:22:02Z
  generateName: aws-node-
  labels:
    controller-revision-hash: "993007391"
    k8s-app: aws-node
    pod-template-generation: "1"
  name: aws-node-9dm8j
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: aws-node
    uid: 8af02149-7041-11e8-b62b-02877b22b514
  resourceVersion: "1640"
  selfLink: /api/v1/namespaces/kube-system/pods/aws-node-9dm8j
  uid: e367fa34-7042-11e8-b62b-02877b22b514
spec:
  containers:
  - env:
    - name: AWS_VPC_K8S_CNI_LOGLEVEL
      value: DEBUG
    - name: MY_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: WARM_ENI_TARGET
      value: "1"
    envFrom:
    - configMapRef:
        name: proxy-config
    image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.0.0
    imagePullPolicy: IfNotPresent
    name: aws-node
    resources:
      requests:
        cpu: 10m
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/opt/cni/bin
      name: cni-bin-dir
    - mountPath: /host/etc/cni/net.d
      name: cni-net-dir
    - mountPath: /host/var/log
      name: log-dir
    - mountPath: /var/run/docker.sock
      name: dockersock
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: aws-node-token-g4hp6
      readOnly: true
  dnsPolicy: ClusterFirst
  hostNetwork: true
  nodeName: ip-10-8-208-183.ap-southeast-2.compute.internal
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: aws-node
  serviceAccountName: aws-node
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - hostPath:
      path: /opt/cni/bin
      type: ""
    name: cni-bin-dir
  - hostPath:
      path: /etc/cni/net.d
      type: ""
    name: cni-net-dir
  - hostPath:
      path: /var/log
      type: ""
    name: log-dir
  - hostPath:
      path: /var/run/docker.sock
      type: ""
    name: dockersock
  - name: aws-node-token-g4hp6
    secret:
      defaultMode: 420
      secretName: aws-node-token-g4hp6
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-06-15T02:22:02Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-06-15T02:42:38Z
    message: 'containers with unready status: [aws-node]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-06-15T02:22:02Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa
    image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.0.0
    imageID: docker://sha256:7e6390decb990137bdb11335c5d8c3f6b08fed446ec6c5283d3dac7bf3bd70ae
    lastState:
      terminated:
        containerID: docker://2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa
        exitCode: 1
        finishedAt: 2018-06-15T02:42:38Z
        reason: Error
        startedAt: 2018-06-15T02:42:08Z
    name: aws-node
    ready: false
    restartCount: 8
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=aws-node pod=aws-node-9dm8j_kube-system(e367fa34-7042-11e8-b62b-02877b22b514)
        reason: CrashLoopBackOff
  hostIP: 10.8.208.183
  phase: Running
  podIP: 10.8.208.183
  qosClass: Burstable
  startTime: 2018-06-15T02:22:02Z
docker ps -a | grep aws-node-9dm8j
2a79a400f494 7e6390decb99 "/bin/sh -c /app/ins…" About a minute ago Exited (1) About a minute ago k8s_aws-node_aws-node-9dm8j_kube-system_e367fa34-7042-11e8-b62b-02877b22b514_8
91c0e4bc827d gcrio.artifactory.ai.cba/google_containers/pause:latest "/pause" 22 minutes ago Up 22 minutes k8s_POD_aws-node-9dm8j_kube-system_e367fa34-7042-11e8-b62b-02877b22b514_0
docker inspect 2a79a400f494
[
{
"Id": "2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa",
"Created": "2018-06-15T02:42:08.310748437Z",
"Path": "/bin/sh",
"Args": [
"-c",
"/app/install-aws.sh"
],
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 1,
"Error": "",
"StartedAt": "2018-06-15T02:42:08.478002133Z",
"FinishedAt": "2018-06-15T02:42:38.586149666Z"
},
"Image": "sha256:7e6390decb990137bdb11335c5d8c3f6b08fed446ec6c5283d3dac7bf3bd70ae",
"ResolvConfPath": "/var/lib/docker/containers/91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290/hostname",
"HostsPath": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/etc-hosts",
"LogPath": "/var/lib/docker/containers/2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa/2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa-json.log",
"Name": "/k8s_aws-node_aws-node-9dm8j_kube-system_e367fa34-7042-11e8-b62b-02877b22b514_8",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "",
"ExecIDs": null,
"HostConfig": {
"Binds": [
"/opt/cni/bin:/host/opt/cni/bin",
"/etc/cni/net.d:/host/etc/cni/net.d",
"/var/log:/host/var/log",
"/var/run/docker.sock:/var/run/docker.sock",
"/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/volumes/kubernetes.io~secret/aws-node-token-g4hp6:/var/run/secrets/kubernetes.io/serviceaccount:ro,Z",
"/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/etc-hosts:/etc/hosts:Z",
"/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/containers/aws-node/da09d6ea:/dev/termination-log:Z"
],
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"NetworkMode": "container:91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290",
"PortBindings": null,
"RestartPolicy": {
"Name": "",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"Dns": null,
"DnsOptions": null,
"DnsSearch": null,
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "container:91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 999,
"PidMode": "",
"Privileged": true,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": [
"seccomp=unconfined",
"label=disable"
],
"UTSMode": "host",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 10,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "/kubepods/burstable/pode367fa34-7042-11e8-b62b-02877b22b514",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DiskQuota": 0,
"KernelMemory": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": 0,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0
},
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02-init/diff:/var/lib/docker/overlay2/c30cba254228c12f8cc1c2fd3f52356da0eebffc059e1615f906166ed139154d/diff:/var/lib/docker/overlay2/391f2b24d1f995a56d1b26efaec30eccb4c888d9c49c999f096dd7d8ae1aa37f/diff:/var/lib/docker/overlay2/7dd99ed23bff31b4f7edc34fd94b4af8f4d45ee8e74418552b7d503b6487f741/diff:/var/lib/docker/overlay2/b49c7dca7adebea285beedeeb4c1582995fdb24d410b752e75fa9061f384023c/diff:/var/lib/docker/overlay2/da29fe35e1be44003dc292f07a417fd349fc3696a5284ca3bd9ec83750c173f5/diff:/var/lib/docker/overlay2/af3dfca75d11fd663b8cfb64c3368377a62334f6555fb90c73fde785ca076f8a/diff:/var/lib/docker/overlay2/0ed9dd21485dacd3b0b38831dbf20e676b9ac612662ca09db01804bfb0ef5104/diff:/var/lib/docker/overlay2/1dfd12e815496f9dc05974cc75f3b7806acad57f4c8386839766f27b36e9cc9f/diff",
"MergedDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02/merged",
"UpperDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02/diff",
"WorkDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02/work"
},
"Name": "overlay2"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/run/docker.sock",
"Destination": "/var/run/docker.sock",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/volumes/kubernetes.io~secret/aws-node-token-g4hp6",
"Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
"Mode": "ro,Z",
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/etc-hosts",
"Destination": "/etc/hosts",
"Mode": "Z",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/containers/aws-node/da09d6ea",
"Destination": "/dev/termination-log",
"Mode": "Z",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/opt/cni/bin",
"Destination": "/host/opt/cni/bin",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/etc/cni/net.d",
"Destination": "/host/etc/cni/net.d",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/log",
"Destination": "/host/var/log",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
],
"Config": {
"Hostname": "ANL05300084",
"Domainname": "",
"User": "0",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"AWS_VPC_K8S_CNI_LOGLEVEL=DEBUG",
"MY_NODE_NAME=ip-10-8-208-183.ap-southeast-2.compute.internal",
"WARM_ENI_TARGET=1",
"HTTPS_PROXY=http://proxy:3128",
"HTTP_PROXY=http://proxy:3128",
"NO_PROXY=169.254.169.254, localhost, 127.0.0.1, s3.ap-southeast-2.amazonaws.com, s3-ap-southeast-2.amazonaws.com, dynamodb.ap-southeast-2.amazonaws.com, 10.8.192.0/25, 10.8.200.0/25, 10.8.248.0/24, 10.8.224.0/23, 10.8.240.0/23, 10.8.208.0/22, 10.12.210.0/24, 10.12.210.1, 10.8.208.183, 10.12.210.2",
"KUBERNETES_PORT_443_TCP=tcp://10.12.210.1:443",
"KUBE_DNS_SERVICE_HOST=10.12.210.2",
"KUBE_DNS_SERVICE_PORT=53",
"KUBE_DNS_SERVICE_PORT_DNS=53",
"KUBE_DNS_PORT=udp://10.12.210.2:53",
"KUBE_DNS_PORT_53_UDP_PORT=53",
"KUBE_DNS_PORT_53_TCP_PROTO=tcp",
"KUBERNETES_SERVICE_PORT=443",
"KUBERNETES_PORT_443_TCP_PROTO=tcp",
"KUBERNETES_PORT_443_TCP_ADDR=10.12.210.1",
"KUBE_DNS_PORT_53_UDP=udp://10.12.210.2:53",
"KUBERNETES_SERVICE_HOST=10.12.210.1",
"KUBE_DNS_PORT_53_TCP_PORT=53",
"KUBE_DNS_PORT_53_TCP_ADDR=10.12.210.2",
"KUBE_DNS_PORT_53_TCP=tcp://10.12.210.2:53",
"KUBERNETES_PORT=tcp://10.12.210.1:443",
"KUBERNETES_PORT_443_TCP_PORT=443",
"KUBE_DNS_SERVICE_PORT_DNS_TCP=53",
"KUBE_DNS_PORT_53_UDP_PROTO=udp",
"KUBE_DNS_PORT_53_UDP_ADDR=10.12.210.2",
"KUBERNETES_SERVICE_PORT_HTTPS=443",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": null,
"Healthcheck": {
"Test": [
"NONE"
]
},
"ArgsEscaped": true,
"Image": "sha256:7e6390decb990137bdb11335c5d8c3f6b08fed446ec6c5283d3dac7bf3bd70ae",
"Volumes": null,
"WorkingDir": "/app",
"Entrypoint": [
"/bin/sh",
"-c",
"/app/install-aws.sh"
],
"OnBuild": null,
"Labels": {
"annotation.io.kubernetes.container.hash": "16e6c0d7",
"annotation.io.kubernetes.container.restartCount": "8",
"annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
"annotation.io.kubernetes.container.terminationMessagePolicy": "File",
"annotation.io.kubernetes.pod.terminationGracePeriod": "30",
"io.kubernetes.container.logpath": "/var/log/pods/e367fa34-7042-11e8-b62b-02877b22b514/aws-node/8.log",
"io.kubernetes.container.name": "aws-node",
"io.kubernetes.docker.type": "container",
"io.kubernetes.pod.name": "aws-node-9dm8j",
"io.kubernetes.pod.namespace": "kube-system",
"io.kubernetes.pod.uid": "e367fa34-7042-11e8-b62b-02877b22b514",
"io.kubernetes.sandbox.id": "91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290"
}
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {}
}
}
]
kubectl -n kube-system set image ds/aws-node aws-node=602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:0.1.4
kubectl describe ds aws-node -n kube-system
Name: aws-node
Selector: k8s-app=aws-node
Node-Selector: <none>
Labels: k8s-app=aws-node
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"k8s-app":"aws-node"},"name":"aws-node","namespace":"kube-...
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 1
Number of Nodes Misscheduled: 0
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: k8s-app=aws-node
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Service Account: aws-node
Containers:
aws-node:
Image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:0.1.4
Port: <none>
Host Port: <none>
Requests:
cpu: 10m
Environment:
AWS_VPC_K8S_CNI_LOGLEVEL: DEBUG
MY_NODE_NAME: (v1:spec.nodeName)
WARM_ENI_TARGET: 1
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/host/var/log from log-dir (rw)
/var/run/docker.sock from dockersock (rw)
Volumes:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
log-dir:
Type: HostPath (bare host directory volume)
Path: /var/log
HostPathType:
dockersock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 43m daemonset-controller Created pod: aws-node-9rznc
Normal SuccessfulCreate 33m daemonset-controller Created pod: aws-node-9dm8j
Normal SuccessfulDelete 3m daemonset-controller Deleted pod: aws-node-9dm8j
Normal SuccessfulCreate 2m daemonset-controller Created pod: aws-node-zsqn9
tail /var/log/aws-routed-eni/plugin.log.2018-06-15-02
2018-06-15T02:53:36Z [INFO] Received CNI add request: ContainerID(f1fb5197777177af62fc93851dd451b48884bb5b8c8c247909a55dbc43d4721b) Netns(/proc/13285/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=f1fb5197777177af62fc93851dd451b48884bb5b8c8c247909a55dbc43d4721b) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:53:36Z [INFO] Received add network response for pod kube-dns-599dbfffb4-mpg6x namespace kube-system container f1fb5197777177af62fc93851dd451b48884bb5b8c8c247909a55dbc43d4721b: 10.8.208.169, table 0
2018-06-15T02:53:36Z [INFO] Added toContainer rule for 10.8.208.169/32
Please let me know if you need any additional information.
Best,
Ruslan.
We noticed in some deployments that even though an instance has been terminated, the ENIs allocated by ipamD are NOT released back to EC2. In addition, the secondary IP addresses allocated on these ENIs are also NOT released.
When too many of these leaked ENIs and secondary IP addresses accumulate, the subnet's available IP pool can be depleted and nodes in the cluster will fail to allocate secondary IP addresses. When this happens, a Pod may not be able to get an IP and gets stuck in ContainerCreating.
You can verify whether you are running into this issue in the console:
and in the description aws-K8S-i-02cf6e80932099598, the instance i-02cf6e80932099598 has already been terminated.
Workaround: manually delete these ENIs after confirming that the instance has already been terminated.
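A cleanup job could automate the confirmation step above. As a hedged sketch (the `aws-K8S-<instance-id>` description format is taken from this report; the `leakedENI` helper and its signature are hypothetical, and a real tool would feed it results from `DescribeNetworkInterfaces` and `DescribeInstances`), the core decision is a pure function:

```go
package main

import (
	"fmt"
	"strings"
)

// leakedENI reports whether an ENI should be deleted: it was created by
// ipamD (description "aws-K8S-<instance-id>"), it is detached, and its
// owning instance is no longer running. running is the set of live
// instance IDs.
func leakedENI(description, status string, running map[string]bool) bool {
	const prefix = "aws-K8S-"
	if !strings.HasPrefix(description, prefix) {
		return false // not created by ipamD; leave it alone
	}
	if status != "available" {
		return false // still attached somewhere
	}
	instanceID := strings.TrimPrefix(description, prefix)
	return !running[instanceID]
}

func main() {
	running := map[string]bool{"i-0abc": true}
	fmt.Println(leakedENI("aws-K8S-i-02cf6e80932099598", "available", running)) // true: owner terminated
	fmt.Println(leakedENI("aws-K8S-i-0abc", "available", running))              // false: owner still running
}
```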
Today, we manually add the version to main.go. This is error-prone and often forgotten when making changes. Ideally, the version should be set automatically by the build process and tied to the git SHA.
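The usual Go approach is to declare a placeholder variable and let the build overwrite it with linker flags, so no source edit is needed per release. A minimal sketch (the variable name and log format are illustrative, not the plugin's actual code):

```go
package main

import "fmt"

// version is a placeholder that the build overwrites via the linker, e.g.:
//
//	go build -ldflags "-X main.version=$(git describe --tags --always --dirty)"
//
// so every binary carries the git SHA without anyone editing main.go.
var version = "unknown"

func versionString() string {
	return "Starting L-IPAMD " + version + " ..."
}

func main() {
	fmt.Println(versionString()) // prints "Starting L-IPAMD unknown ..." for a plain `go build`
}
```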
I've got a cluster up and running with the CNI plugin working. My nodes are all healthy and everything seems fine. I do a simple kubectl run nginx --image=nginx
, but now my images won't pull.
I can pull the image from the host as well as ping DNS names from there. Additionally, I've added the --node-ip flag
to the kubelets. Has anyone run into this before?
This seems pretty much identical to the point-to-point CNI driver. https://github.com/containernetworking/plugins/tree/master/plugins/main/ptp
Can someone explain how it is different?
We should not use TagResources for tagging our ENIs; it's a more powerful API, and scoping down permissions for it is harder.
Code where we use this today -
amazon-vpc-cni-k8s/pkg/awsutils/awsutils.go
Line 627 in 5d6757f
We should be using ec2:CreateTags instead.
More information is available here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-resources
We're interested in consuming this plugin via kops.
I understand this is pre-release, but could you give us a tag? Just so we don't have to use latest: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:latest
Thanks
A few people have had trouble building the plugin (e.g. #53 (comment)). We should document how to build on Linux and OS X.
The node tech-support bundle needs to collect:
- iptables-save
- iptables -nvL
- iptables -nvL -t nat
- CNI configuration in /etc/cni/net.d
- kubelet logs
It should also:
- make the top-level name unique (e.g. by instance-id)
- include the tool version
Hi,
I'm running my own cluster on AWS and would love to use this for my networking. When trying out the manifest, I realized the image it uses (602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.0.0) is private. Can you make it public? Or is it intentionally not provided and something I need to build myself?
Today, this plugin uses port 51678 for its introspection service (#8), and this port is also used by the ECS agent. This port conflict can prevent the ECS agent from running alongside this CNI plugin.
The controller is not aware of how many IP addresses are available to be assigned to pods; it tries to assign an IP address to the new pod and then fails reactively. It should have this information proactively.
Created a 2 node t2.medium
cluster as:
kops create cluster \
--name example.cluster.k8s.local \
--zones us-east-1a,us-east-1b,us-east-1c \
--networking amazon-vpc-routed-eni \
--node-size t2.medium \
--kubernetes-version 1.8.4 \
--yes
Created a Deployment using the configuration file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: nginx-deployment
spec:
replicas: 3
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.12.1
ports:
- containerPort: 80
- containerPort: 443
Scaled the replicas:
kubectl scale --replicas=30 deployment/nginx-deployment
30 pods are expected in the cluster (2 nodes * 3 ENIs * 5 IPs per ENI) but only 27 pods are available. Three pods are always stuck in the ContainerCreating state.
A similar cluster was created with m4.2xlarge nodes. 120 pods (2 nodes * 4 ENIs * 15 IPs per ENI) are expected in the cluster, but only 109 pods are available.
More details about one of the pods that is not getting scheduled:
$ kubectl describe pod/nginx-deployment-745df977f7-tndc7
Name: nginx-deployment-745df977f7-tndc7
Namespace: default
Node: ip-172-20-68-116.ec2.internal/172.20.95.9
Start Time: Wed, 20 Dec 2017 14:32:53 -0800
Labels: app=nginx
pod-template-hash=3018953393
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-745df977f7","uid":"96335f7f-e5d5-11e7-bc3a-0a41...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container nginx
Status: Pending
IP:
Created By: ReplicaSet/nginx-deployment-745df977f7
Controlled By: ReplicaSet/nginx-deployment-745df977f7
Containers:
nginx:
Container ID:
Image: nginx:1.12.1
Image ID:
Ports: 80/TCP, 443/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7wp6n (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-7wp6n:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-7wp6n
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 55m default-scheduler Successfully assigned nginx-deployment-745df977f7-tndc7 to ip-172-20-68-116.ec2.internal
Normal SuccessfulMountVolume 55m kubelet, ip-172-20-68-116.ec2.internal MountVolume.SetUp succeeded for volume "default-token-7wp6n"
Warning FailedCreatePodSandBox 55m (x8 over 55m) kubelet, ip-172-20-68-116.ec2.internal Failed create pod sandbox.
Normal SandboxChanged 5m (x919 over 55m) kubelet, ip-172-20-68-116.ec2.internal Pod sandbox changed, it will be killed and re-created.
Warning FailedSync 10s (x1017 over 55m) kubelet, ip-172-20-68-116.ec2.internal Error syncing pod
More details about the exact steps are at https://gist.github.com/arun-gupta/87f2c9ff533008f149db6b53afa73bd0
Hi,
We are running a POC using the EKS preview.
Since we are behind a corporate proxy, we need to enable proxy for the CNI.
However, after a conversation with the AWS support team, it seems the CNI container does not support a proxy right now.
Is this going to be on the roadmap?
Thanks
Eric Liu
I followed instructions here and when the daemonset pods start up I get the following in the event log:
Normal SuccessfulMountVolume 19s kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com MountVolume.SetUp succeeded for volume "log-dir"
Normal SuccessfulMountVolume 19s kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com MountVolume.SetUp succeeded for volume "cni-net-dir"
Normal SuccessfulMountVolume 19s kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com MountVolume.SetUp succeeded for volume "cni-bin-dir"
Normal SuccessfulMountVolume 19s kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com MountVolume.SetUp succeeded for volume "default-token-mww6r"
Normal BackOff 18s kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com Back-off pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:latest"
Warning Failed 18s kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com Error: ImagePullBackOff
Normal Pulling 5s (x2 over 19s) kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:latest"
Warning Failed 5s (x2 over 19s) kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:latest": rpc error: code = Unknown desc = failed to resolve image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:latest": unexpected status code https://602401143452.dkr.ecr.us-west-2.amazonaws.com/v2/amazon-k8s-cni/manifests/latest: 401 Unauthorized
Warning Failed 5s (x2 over 19s) kubelet, k8sworker-0-006.infra.poc.aun1.i.wish.com Error: ErrImagePull
where the salient information is:
Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:latest": rpc error: code = Unknown desc = failed to resolve image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:latest": unexpected status code https://602401143452.dkr.ecr.us-west-2.amazonaws.com/v2/amazon-k8s-cni/manifests/latest: 401 Unauthorized
I can reproduce this with curl from within my VPC, and from outside of AWS:
$ curl https://602401143452.dkr.ecr.us-west-2.amazonaws.com/v2/amazon-k8s-cni/manifests/latest
Not Authorized
Also when I try to get to it through a browser it asks for HTTP Basic Auth.
Have I missed some configuration somewhere? Have these images moved?
Ipamd allocates ENIs based on IP pool size, but it doesn't consider the per-instance limits specified in https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html. Therefore, even when the available IP pool has reached its maximum for a given instance type, ipamd keeps trying to allocate ENIs. This loop runs every 5 seconds and invokes three EC2 APIs each time (Create, Attach and Delete).
2018-02-27T13:58:33Z [DEBUG] IP pool stats: total=15, used=10, c.currentMaxAddrsPerENI =6, c.maxAddrsPerENI = 6
2018-02-27T13:58:33Z [INFO] Created a new eni: eni-a7b80f83
2018-02-27T13:58:33Z [DEBUG] Trying to tag newly created eni: keys=k8s-eni-key, value=i-074705deafbe739c0
2018-02-27T13:58:34Z [DEBUG] Tag the newly created eni with arn: arn:aws:ec2:us-west-2:908176817140:network-interface/eni-a7b80f83
2018-02-27T13:58:34Z [DEBUG] Discovered device number is used: 1
2018-02-27T13:58:34Z [DEBUG] Discovered device number is used: 0
2018-02-27T13:58:34Z [DEBUG] Discovered device number is used: 2
2018-02-27T13:58:34Z [DEBUG] Found a free device number: 3
2018-02-27T13:58:34Z [INFO] Exceeded instance eni attachment limit: 3
2018-02-27T13:58:34Z [ERROR] Failed to attach eni eni-a7b80f83: AttachmentLimitExceeded: Interface count 4 exceeds the limit for t2.medium
status code: 400, request id: 42da274d-53a5-43bb-90a4-eaa4cc5a94a2
2018-02-27T13:58:34Z [DEBUG] Trying to delete eni: eni-a7b80f83
2018-02-27T13:58:34Z [INFO] Successfully deleted eni: eni-a7b80f83
2018-02-27T13:58:34Z [ERROR] Failed to increase pool size due to not able to allocate ENI allocate eni: error attaching eni: failed to attach eni: AttachmentLimitExceeded: Interface count 4 exceeds the limit for t2.medium
status code: 400, request id: 42da274d-53a5-43bb-90a4-eaa4cc5a94a2
2018-02-27T13:58:39Z [DEBUG] IP pool stats: total=15, used=10, c.currentMaxAddrsPerENI =6, c.maxAddrsPerENI = 6
2018-02-27T13:58:40Z [INFO] Created a new eni: eni-c6b90ee2
2018-02-27T13:58:40Z [DEBUG] Trying to tag newly created eni: keys=k8s-eni-key, value=i-074705deafbe739c0
2018-02-27T13:58:40Z [DEBUG] Tag the newly created eni with arn: arn:aws:ec2:us-west-2:908176817140:network-interface/eni-c6b90ee2
2018-02-27T13:58:40Z [DEBUG] Discovered device number is used: 1
2018-02-27T13:58:40Z [DEBUG] Discovered device number is used: 0
2018-02-27T13:58:40Z [DEBUG] Discovered device number is used: 2
2018-02-27T13:58:40Z [DEBUG] Found a free device number: 3
2018-02-27T13:58:40Z [INFO] Exceeded instance eni attachment limit: 3
2018-02-27T13:58:40Z [ERROR] Failed to attach eni eni-c6b90ee2: AttachmentLimitExceeded: Interface count 4 exceeds the limit for t2.medium
status code: 400, request id: 7978c341-ba82-426a-9eb8-d616b558891f
2018-02-27T13:58:40Z [DEBUG] Trying to delete eni: eni-c6b90ee2
2018-02-27T13:58:40Z [INFO] Successfully deleted eni: eni-c6b90ee2
2018-02-27T13:58:40Z [ERROR] Failed to increase pool size due to not able to allocate ENI allocate eni: error attaching eni: failed to attach eni: AttachmentLimitExceeded: Interface count 4 exceeds the limit for t2.medium
status code: 400, request id: 7978c341-ba82-426a-9eb8-d616b558891f
2018-02-27T13:58:45Z [DEBUG] IP pool stats: total=15, used=10, c.currentMaxAddrsPerENI =6, c.maxAddrsPerENI = 6
2018-02-27T13:58:46Z [INFO] Created a new eni: eni-fbbb0cdf
2018-02-27T13:58:46Z [DEBUG] Trying to tag newly created eni: keys=k8s-eni-key, value=i-074705deafbe739c0
2018-02-27T13:58:46Z [DEBUG] Tag the newly created eni with arn: arn:aws:ec2:us-west-2:908176817140:network-interface/eni-fbbb0cdf
2018-02-27T13:58:46Z [DEBUG] Discovered device number is used: 1
2018-02-27T13:58:46Z [DEBUG] Discovered device number is used: 0
2018-02-27T13:58:46Z [DEBUG] Discovered device number is used: 2
2018-02-27T13:58:46Z [DEBUG] Found a free device number: 3
2018-02-27T13:58:46Z [INFO] Exceeded instance eni attachment limit: 3
Number of ENIs allocated today:
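The Create/Attach/Delete round trip in the logs above could be skipped entirely by consulting the per-instance ENI limit before calling CreateNetworkInterface. A hedged sketch (the `canAllocENI` helper and the table are hypothetical; the limits shown are the documented values for these four types, and a real implementation would cover every instance type):

```go
package main

import "fmt"

// maxENIPerInstance holds a few per-type ENI limits from the EC2
// documentation; a real table would cover every instance type.
var maxENIPerInstance = map[string]int{
	"t2.medium":  3,
	"m4.large":   2,
	"m4.2xlarge": 4,
	"m5.4xlarge": 8,
}

// canAllocENI reports whether another ENI may be attached, letting ipamD
// stop its 5-second retry loop once the instance limit is reached.
func canAllocENI(instanceType string, attached int) bool {
	limit, ok := maxENIPerInstance[instanceType]
	if !ok {
		return true // unknown type: fall back to trying and handling the API error
	}
	return attached < limit
}

func main() {
	fmt.Println(canAllocENI("t2.medium", 3)) // false: already at the 3-ENI limit
}
```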
Dear all,
I would like to know whether this plug-in can be used for a Kubernetes cluster with Windows nodes: a Linux master node and Windows worker nodes running Windows containers.
Is that possible?
Thanks.
Best regards
Kejun xu
When the amazon-k8s-cni pod is restarted (either through docker restart CONTAINER_ID, during a rolling update of the DaemonSet, or when the container is terminated), the new pod ends up in a crash loop.
2018-03-01T06:28:10Z [INFO] Starting L-IPAMD 0.1.2 ...
2018-03-01T06:28:10Z [DEBUG] Discovered region: us-west-2
2018-03-01T06:28:10Z [DEBUG] Found avalability zone: us-west-2a
2018-03-01T06:28:10Z [DEBUG] Discovered the instance primary ip address: 10.0.107.27
2018-03-01T06:28:10Z [DEBUG] Found instance-id: i-022cd86d550d8d584
2018-03-01T06:28:10Z [DEBUG] Found primary interface's mac address: 06:d6:8a:ab:62:c6
2018-03-01T06:28:10Z [DEBUG] Discovered 3 interfaces.
2018-03-01T06:28:10Z [DEBUG] Found device-number: 2
2018-03-01T06:28:10Z [DEBUG] Found account ID: 908176817140
2018-03-01T06:28:10Z [DEBUG] Found eni: eni-242b8713
2018-03-01T06:28:10Z [DEBUG] Found device-number: 1
2018-03-01T06:28:10Z [DEBUG] Found eni: eni-d02488e7
2018-03-01T06:28:10Z [DEBUG] Found device-number: 0
2018-03-01T06:28:10Z [DEBUG] Found eni: eni-462c8071
2018-03-01T06:28:10Z [DEBUG] Found eni eni-462c8071 is a primary eni
2018-03-01T06:28:10Z [DEBUG] Found security-group id: sg-f4d6ad8b
2018-03-01T06:28:10Z [DEBUG] Found subnet-id: subnet-f98368b2
2018-03-01T06:28:10Z [DEBUG] Found vpc-ipv4-cidr-block: 10.0.0.0/16
2018-03-01T06:28:10Z [DEBUG] Total number of interfaces found: 3
2018-03-01T06:28:10Z [DEBUG] Found eni mac address : 06:31:b4:c1:38:80
2018-03-01T06:28:10Z [DEBUG] Found eni: eni-242b8713, mac 06:31:b4:c1:38:80, device 3
2018-03-01T06:28:10Z [DEBUG] Found cidr 10.0.96.0/19 for eni 06:31:b4:c1:38:80
2018-03-01T06:28:10Z [DEBUG] Found ip addresses [10.0.124.25 10.0.116.74 10.0.114.250 10.0.124.17 10.0.102.59 10.0.113.127] on eni 06:31:b4:c1:38:80
2018-03-01T06:28:10Z [DEBUG] Found eni mac address : 06:6d:ab:65:c6:be
2018-03-01T06:28:10Z [DEBUG] Found eni: eni-d02488e7, mac 06:6d:ab:65:c6:be, device 2
2018-03-01T06:28:10Z [DEBUG] Found cidr 10.0.96.0/19 for eni 06:6d:ab:65:c6:be
2018-03-01T06:28:10Z [DEBUG] Found ip addresses [10.0.105.80 10.0.104.104 10.0.111.56 10.0.110.194 10.0.120.149 10.0.123.115] on eni 06:6d:ab:65:c6:be
2018-03-01T06:28:10Z [DEBUG] Found eni mac address : 06:d6:8a:ab:62:c6
2018-03-01T06:28:10Z [DEBUG] Using device number 0 for primary eni: eni-462c8071
2018-03-01T06:28:10Z [DEBUG] Found eni: eni-462c8071, mac 06:d6:8a:ab:62:c6, device 0
2018-03-01T06:28:10Z [DEBUG] Found cidr 10.0.96.0/19 for eni 06:d6:8a:ab:62:c6
2018-03-01T06:28:10Z [DEBUG] Found ip addresses [10.0.107.27 10.0.96.31 10.0.114.187 10.0.97.251 10.0.109.232 10.0.102.149] on eni 06:d6:8a:ab:62:c6
2018-03-01T06:28:10Z [DEBUG] Trying to execute command[/sbin/ip [ip rule add not to 10.0.0.0/16 table main priority 1024]]
2018-03-01T06:28:11Z [INFO] Starting L-IPAMD 0.1.2 ...
2018-03-01T06:28:11Z [DEBUG] Discovered region: us-west-2
2018-03-01T06:28:11Z [DEBUG] Found avalability zone: us-west-2a
2018-03-01T06:28:11Z [DEBUG] Discovered the instance primary ip address: 10.0.107.27
2018-03-01T06:28:11Z [DEBUG] Found instance-id: i-022cd86d550d8d584
2018-03-01T06:28:11Z [DEBUG] Found primary interface's mac address: 06:d6:8a:ab:62:c6
2018-03-01T06:28:11Z [DEBUG] Discovered 3 interfaces.
2018-03-01T06:28:11Z [DEBUG] Found device-number: 2
2018-03-01T06:28:11Z [DEBUG] Found account ID: 908176817140
2018-03-01T06:28:11Z [DEBUG] Found eni: eni-242b8713
2018-03-01T06:28:11Z [DEBUG] Found device-number: 1
2018-03-01T06:28:11Z [DEBUG] Found eni: eni-d02488e7
2018-03-01T06:28:11Z [DEBUG] Found device-number: 0
2018-03-01T06:28:11Z [DEBUG] Found eni: eni-462c8071
2018-03-01T06:28:11Z [DEBUG] Found eni eni-462c8071 is a primary eni
2018-03-01T06:28:11Z [DEBUG] Found security-group id: sg-f4d6ad8b
2018-03-01T06:28:11Z [DEBUG] Found subnet-id: subnet-f98368b2
2018-03-01T06:28:11Z [DEBUG] Found vpc-ipv4-cidr-block: 10.0.0.0/16
2018-03-01T06:28:11Z [DEBUG] Total number of interfaces found: 3
2018-03-01T06:28:11Z [DEBUG] Found eni mac address : 06:31:b4:c1:38:80
2018-03-01T06:28:11Z [DEBUG] Found eni: eni-242b8713, mac 06:31:b4:c1:38:80, device 3
2018-03-01T06:28:11Z [DEBUG] Found cidr 10.0.96.0/19 for eni 06:31:b4:c1:38:80
2018-03-01T06:28:11Z [DEBUG] Found ip addresses [10.0.124.25 10.0.116.74 10.0.114.250 10.0.124.17 10.0.102.59 10.0.113.127] on eni 06:31:b4:c1:38:80
2018-03-01T06:28:11Z [DEBUG] Found eni mac address : 06:6d:ab:65:c6:be
2018-03-01T06:28:11Z [DEBUG] Found eni: eni-d02488e7, mac 06:6d:ab:65:c6:be, device 2
2018-03-01T06:28:11Z [DEBUG] Found cidr 10.0.96.0/19 for eni 06:6d:ab:65:c6:be
2018-03-01T06:28:11Z [DEBUG] Found ip addresses [10.0.105.80 10.0.104.104 10.0.111.56 10.0.110.194 10.0.120.149 10.0.123.115] on eni 06:6d:ab:65:c6:be
2018-03-01T06:28:11Z [DEBUG] Found eni mac address : 06:d6:8a:ab:62:c6
2018-03-01T06:28:11Z [DEBUG] Using device number 0 for primary eni: eni-462c8071
2018-03-01T06:28:11Z [DEBUG] Found eni: eni-462c8071, mac 06:d6:8a:ab:62:c6, device 0
2018-03-01T06:28:11Z [DEBUG] Found cidr 10.0.96.0/19 for eni 06:d6:8a:ab:62:c6
2018-03-01T06:28:11Z [DEBUG] Found ip addresses [10.0.107.27 10.0.96.31 10.0.114.187 10.0.97.251 10.0.109.232 10.0.102.149] on eni 06:d6:8a:ab:62:c6
2018-03-01T06:28:11Z [DEBUG] Trying to execute command[/sbin/ip [ip rule add not to 10.0.0.0/16 table main priority 1024]]
I can see the code getting to this point. When I manually execute the command on the instance:
[ec2-user@ip-10-0-0-6 ~]$ sudo /sbin/ip rule add not to 10.0.0.0/16 table main priority 1024
RTNETLINK answers: File exists
Whenever there is a PR, continuous integration should be kicked off automatically:
When trying to create resources the following error is thrown:
"error: error converting YAML to JSON: yaml: line 11: mapping values are not allowed in this context"
Will tidy up and send through a PR.
As described in issue #18, kube-scheduler is not aware of the number of available IPv4 addresses on a node. Kube-Scheduler can schedule a Pod to run on a node even after the node has exhausted all of its IPv4 addresses.
Here we propose to use Kubernetes extended resources (supported since Kubernetes 1.8) to make kube-scheduler aware of each node's available IPv4 addresses:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
containers:
- name: my-container
image: myimage
resources:
requests:
vpc.amazonaws.com/ipv4: 1
limits:
vpc.amazonaws.com/ipv4: 1
eni-ip-controller is a new component that runs in a Kubernetes cluster and watches Kubernetes node resources. When a new node joins the cluster, eni-ip-controller updates the API server with the node's vpc.amazonaws.com/ipv4 resource (the number of available IPv4 addresses on the new node). Here is the workflow:
Here is an HTTP request that advertises 15 "vpc.amazonaws.com/ipv4" resources on node k8s-node-1
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/vpc.amazonaws.com~1ipv4", "value": "15"}]' \
http://k8s-master:8080/api/v1/nodes/k8s-node-1/status
Since "vpc.amazonaws.com/ipv4" is NOT a standard resource like "cpu" or "memory", a Pod that does NOT specify "vpc.amazonaws.com/ipv4" can consume an ENI IPv4 address on a node without being accounted for by the scheduler.
Here are few options to solve this:
Using a taint, such that a pod that does NOT specify a "vpc.amazonaws.com/ipv4" resource limit will be evicted or not scheduled. This is accomplished by:
kubectl taint nodes <node-name> vpc-ipv4=true:NoSchedule
kubectl taint nodes <node-name> vpc-ipv4=true:NoExecute
tolerations:
- key: "vpc-ipv4"
operator: "Equal"
value: "true"
effect: "NoSchedule"
- key: "vpc-ipv4"
operator: "Equal"
value: "true"
effect: "NoExecute"
Kubernetes 1.9 introduces ExtendedResourceToleration, where the API server can automatically add tolerations for such taints.
A Kubernetes Initializer can be used to always inject the "vpc.amazonaws.com/ipv4" resource request into a pod spec. Here is the workflow:
Kubernetes custom schedulers allow us to build a new scheduler for all pods to be scheduled on our cluster.
When L-IPAM runs out of IPv4 addresses, L-IPAM can taint node itself as not-schedulable.
kubernetes scheduler extender allows us to implement a "scheduler extender" process that the standard Kubernetes scheduler calls out to as a final pass when making scheduling decisions.
Given that kops is currently one of the main Kubernetes installation methods on AWS, it would be nice to know how to go about installing this plugin (if it's possible). From what I can tell, only a select list of CNI plugins is currently installable out of the box. I'm wondering if it's possible to manually set the kubelet flags specified in the docs. Deploying the daemonset and adding additional permissions to the role ARN should be trivial.
Thanks!
Currently the proposal lists this limitation:
All ENIs on an instance share the same subnet and the same security groups.
This is at odds with some of the stated goals. I get that this is a point-in-time view of the CNI, but I'm curious what the plan is going forward to support this.
Some specific questions:
- How does this compare to the awsvpc networking mode in ECS, which seems to (as far as I've been able to tell) manage ENIs directly and drop their interface directly into the relevant container's network namespace?
- If I want a pod to use sg-123456 (because let's say that's the only SG allowed to speak to my RDS database), how would I tell k8s that?

Not sure if it makes more sense to answer these here or just to add more to the proposal, but those are the questions that jumped to mind when I was reading it.
To my knowledge, GKE and AKS both split the K8s overlay network into subnets, with each subnet containing just the pods of one node. With the VPC CNI, however, it looks like pod IP ranges overlap across nodes. Is there any way to split the network per node?
We need this to identify whether pod-to-pod traffic is going from one node to another. Querying the k8s API is not an option for performance reasons.
Right now, github.com/aws/amazon-vpc-cni-k8s uses some code from github.com/aws/amazon-ecs-agent and github.com/aws/amazon-ecs-cni-plugin. This shared code needs to be put into a new repo, and github.com/aws/amazon-vpc-cni-k8s should vendor it from there.
If I understand it correctly, in order to allow Pod-to-Pod communication over IP addresses associated with secondary ENIs attached to master and worker nodes, you should configure security groups so that:
Otherwise pods are unable to communicate with each other, right?
If so, could we add a note about this to the README?
I found three scenarios (listed below) where ENI creation ends up exhausting EC2 API TPS.
2018-03-01T02:51:09Z [ERROR] Failed to increase pool size due to not able to allocate ENI allocate eni: failed to create eni: failed to create network interface: InsufficientFreeAddressesInSubnet: The specified subnet does not have enough free addresses to satisfy the request.
status code: 400, request id: b3e57ddd-383a-4743-9b91-5fb3c0251e86
2018-03-01T02:51:14Z [DEBUG] IP pool stats: total=9, used=7, c.currentMaxAddrsPerENI =5, c.maxAddrsPerENI = 6
2018-03-01T02:51:14Z [ERROR] Failed to CreateNetworkInterface InsufficientFreeAddressesInSubnet: The specified subnet does not have enough free addresses to satisfy the request.
status code: 400, request id: ded36c17-afbe-4aad-89d8-804b9f28e28a
2018-03-01T02:51:14Z [ERROR] Failed to increase pool size due to not able to allocate ENI allocate eni: failed to create eni: failed to create network interface: InsufficientFreeAddressesInSubnet: The specified subnet does not have enough free addresses to satisfy the request.
status code: 400, request id: ded36c17-afbe-4aad-89d8-804b9f28e28a
2018-03-01T02:51:19Z [DEBUG] IP pool stats: total=9, used=7, c.currentMaxAddrsPerENI =5, c.maxAddrsPerENI = 6
2018-03-01T07:03:24Z [ERROR] Failed to increase pool size due to not able to allocate ENI allocate eni: failed to create eni: failed to create network interface: UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: 1190f1d9-158f-4b57-81d5-4d406f56606f
2018-03-01T07:03:29Z [DEBUG] IP pool stats: total=5, used=0, c.currentMaxAddrsPerENI =6, c.maxAddrsPerENI = 6
2018-03-01T07:03:29Z [ERROR] Failed to CreateNetworkInterface UnauthorizedOperation: You are not authorized to perform this operation.
status code: 403, request id: 43351f80-8066-4494-8ff5-b47eb976e569
2018-03-01T07:03:29Z [ERROR] Failed to increase pool size due to not able to allocate ENI allocate eni: failed to create eni: failed to create network interface: UnauthorizedOperation: You are not authorized to perform this operation.
We've been experiencing an issue where connections to a ClusterIP service intermittently time out. The cluster is set up with kops (3 masters and 3 nodes across 3 private subnets in a single existing VPC), running Kubernetes 1.9 and v1.0 of the AWS CNI.
The timeouts stop when I manually disable the source/destination check on the secondary ENIs attached to the nodes. The primary ENIs attached to the nodes already have the source/destination check disabled, as that is done by a controller deployed by kops (ottoyiu/k8s-ec2-srcdst).
I noticed that the ENI attribute is not set to disable the check here after L-IPAM allocates a new ENI and modifies its attributes. It was my understanding that it shouldn't be required for secondary ENIs in this situation anyway, as there is no NAT involved in the connection, so possibly this is a misconfiguration in the cluster?
I haven't taken any tcpdumps of the behaviour or run the support script, but I'm more than happy to if that would be helpful.
Thanks
Luke
So, I played around with running hypervisor-based containers on EKS + i3.metal with the aws-cni. The containers come up fine, but relying on the static-arp / network namespace solution doesn't seem to work when the pod is in a hypervisor.
Perhaps using a vbridge with port-isolating ebtables rules could work? Then I think ARP just works for the gateway without a static entry.
As per the CNI spec (https://github.com/containernetworking/cni/blob/master/SPEC.md), "Plugins should generally complete a DEL action without error even if some resources are missing." The CNI plugin should avoid throwing errors like https://github.com/aws/amazon-vpc-cni-k8s/blob/master/plugins/routed-eni/cni.go#L234 and continue with the deletion.
I'm experiencing a routing issue from outside of my VPC where my EKS cluster is located. My setup is as follows:
VPC A with 3 private subnets; the fourth subnet is public with a NAT gateway.
VPC B with VPN access.
Peering connection between the two.
VPC A houses my EKS cluster with 3 worker nodes each in a different subnet. VPC B is our existing infrastructure (different region) with VPN access.
Sometimes (not always) I'll have trouble getting a route into a pod from VPC B: the connection times out, and ping doesn't work either. If I SSH into one of the worker nodes in VPC A, I can route just fine into the pod.
Let me know if you need more information as I can reproduce pretty easily. I posted this question in the aws eks slack channel and they directed me to create an issue here.
Thank you!
Hi,
I've got some pods in this state, and while I'm investigating the root cause right now, I wish this error were more actionable. What failed that made it impossible to assign an IP address? Was some pool exhausted? Did a syscall fail? Did an API call fail?
amazon-vpc-cni-k8s/pkg/awsutils/awsutils.go
Line 657 in eb8fc86
If I spin up an ASG with 200 nodes of larger EC2 instance types that support up to 50 secondary IPs per ENI, invoking the AssignPrivateIpAddresses API continuously on all nodes will result in a RequestLimitExceeded exception.
Using m4.large instances, I can only have 27 pods per node.