Comments (5)
I have this same issue with releases of the aws-load-balancer-controller chart >= 1.7. I pinned the load balancer controller pod image to v2.4.2 in both cases, so this is likely a chart-level manifest/configuration issue, IMO.
Solution 1: downgrading the chart to 1.6.2 fixed it for me.
Solution 2: upgrading the image.tag value, which pins the load balancer controller image, to >=2.7 while keeping the chart version >=1.7 also works.
Solution 3 (untested): setting the readinessProbe helm chart value to empty might also work (see below for why).
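A minimal sketch of solutions 1 and 2 as Helm commands. The release name `aws-load-balancer-controller`, the repo alias `eks`, the `kube-system` namespace and the tag `v2.7.0` are assumptions about a typical install, so adjust them to yours. The functions only print the commands so they can be reviewed before being run.

```shell
# Assumed names for illustration; change them to match your install.
RELEASE="aws-load-balancer-controller"
CHART="eks/aws-load-balancer-controller"
NAMESPACE="kube-system"

# Solution 1: pin the chart to 1.6.2 and keep the values already set
# on the release (including an older pinned image.tag).
print_downgrade_chart() {
  echo "helm upgrade $RELEASE $CHART -n $NAMESPACE --version 1.6.2 --reuse-values"
}

# Solution 2: raise image.tag to a >=2.7 controller image.
# No --version flag: Helm then uses the latest chart version in the repo.
print_upgrade_image() {
  echo "helm upgrade $RELEASE $CHART -n $NAMESPACE --reuse-values --set image.tag=${1:-v2.7.0}"
}

print_downgrade_chart
print_upgrade_image v2.7.0
```

Only one of the two commands is needed. Solution 3 is left out because it is untested.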
A bit more detail on the problem (in my case) and why the solutions work:
The pods belonging to the load balancer controller deployment never become ready. With charts >= 1.7 they keep failing their readiness probes:
NAME READY STATUS RESTARTS AGE
aws-load-balancer-controller-9b866d9c-4n665 0/1 Running 0 6m36s
aws-load-balancer-controller-9b866d9c-fxzsz 0/1 Running 0 6m36s
Pod logs
{"level":"info","ts":1713346839.8052669,"msg":"version","GitVersion":"v2.4.2","GitCommit":"77370be7f8e13787a3ec0cfa99de1647010f1055","BuildDate":"2022-05-24T22:33:27+0000"}
{"level":"info","ts":1713346839.862494,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1713346839.8755138,"logger":"setup","msg":"adding health check for controller"}
{"level":"info","ts":1713346839.8758347,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":1713346839.876017,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding"}
{"level":"info","ts":1713346839.8762178,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-elbv2-k8s-aws-v1beta1-targetgroupbinding"}
{"level":"info","ts":1713346839.8763244,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-networking-v1-ingress"}
{"level":"info","ts":1713346839.876444,"logger":"setup","msg":"starting podInfo repo"}
I0417 09:40:41.877204 1 leaderelection.go:243] attempting to acquire leader lease kube-system/aws-load-balancer-controller-leader...
{"level":"info","ts":1713346841.877265,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1713346841.8773456,"logger":"controller-runtime.webhook.webhooks","msg":"starting webhook server"}
{"level":"info","ts":1713346841.877691,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1713346841.877793,"logger":"controller-runtime.webhook","msg":"serving webhook server","host":"","port":9443}
{"level":"info","ts":1713346841.8782332,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
load balancer controller pod description:
Name: aws-load-balancer-controller-9b866d9c-4n665
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: aws-load-balancer-controller
Node: ip-10-0-60-11.eu-west-2.compute.internal/10.0.60.11
Start Time: Wed, 17 Apr 2024 09:40:39 +0000
Labels: app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/name=aws-load-balancer-controller
pod-template-hash=9b866d9c
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 10.0.43.232
IPs:
IP: 10.0.43.232
Controlled By: ReplicaSet/aws-load-balancer-controller-9b866d9c
Containers:
aws-load-balancer-controller:
Container ID: containerd://e42b03eede8cd18a7912b80e9779c638b5817812eb4b9788800f58636d6d2135
Image: public.ecr.aws/eks/aws-load-balancer-controller:v2.4.2
Image ID: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller@sha256:321e6ff4b55a0bb8afc090cd2f7ef5b0be8cd356e407ce525ee8b24866382808
Ports: 9443/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--cluster-name=mlops-demo
--ingress-class=alb
State: Running
Started: Wed, 17 Apr 2024 09:40:39 +0000
Ready: False
Restart Count: 0
Liveness: http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
Readiness: http-get http://:61779/readyz delay=10s timeout=10s period=10s #success=1 #failure=2
Environment:
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_DEFAULT_REGION: eu-west-2
AWS_REGION: eu-west-2
AWS_ROLE_ARN: arn:aws:iam::743582000746:role/aws-load-balancer-controller
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2d6k5 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
cert:
Type: Secret (a volume populated by a Secret)
SecretName: aws-load-balancer-tls
Optional: false
kube-api-access-2d6k5:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned kube-system/aws-load-balancer-controller-9b866d9c-4n665 to ip-10-0-60-11.eu-west-2.compute.internal
Normal Pulled 10m kubelet Container image "public.ecr.aws/eks/aws-load-balancer-controller:v2.4.2" already present on machine
Normal Created 10m kubelet Created container aws-load-balancer-controller
Normal Started 10m kubelet Started container aws-load-balancer-controller
Warning Unhealthy 4m54s (x35 over 9m54s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 404
As a result, no services (regardless of load balancer type) can be created when applying either ingress or generic service manifests. This seems to be expected behaviour, as outlined in this issue.
kubectl apply -f eks-cluster-w-alb/manifests/alb-example/ # deploy remaining stack
deployment.apps/echoserver unchanged
namespace/alb-example unchanged
Error from server (InternalError): error when creating "eks-cluster-w-alb/manifests/alb-example/ingress.yaml": Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1-ingress?timeout=10s": no endpoints available for service "aws-load-balancer-webhook-service"
Error from server (InternalError): error when creating "eks-cluster-w-alb/manifests/alb-example/service.yaml": Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-v1-service?timeout=10s": no endpoints available for service "aws-load-balancer-webhook-service"
Error from server (InternalError): error when creating "eks-cluster-w-alb/manifests/alb-example/test-service.yaml": Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-v1-service?timeout=10s": no endpoints available for service "aws-load-balancer-webhook-service"
I've noticed this change, which introduces a readiness probe for the controller container and looks like it was made in the 1.7.0 chart release.
This readiness probe seems to be supported only by controller images >=2.7, which would explain why the v2.4.2 pods above answer the /readyz probe with a 404. In other words, the following combinations of (chart version, controller image version) seem to be compatible:
- (<1.7, *)
- (>=1.7, >=2.7)
- you might be able to use <2.7 controller image versions if you disable the readinessProbe helm value, removing the unsupported probe endpoint for older images
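The first two combinations above can be sketched as a small check. This is only an illustration of the rule as observed in this thread, not an official compatibility matrix. The function names `version_ge` and `is_compatible` are hypothetical, the comparison relies on GNU `sort -V`, and versions are expected in full `x.y.z` form (a leading `v` is ignored).

```shell
# version_ge A B: succeeds when version A >= version B.
# Relies on GNU `sort -V`; a leading "v" is stripped from both arguments.
version_ge() {
  [ "$(printf '%s\n%s\n' "${1#v}" "${2#v}" | sort -V | head -n 1)" = "${2#v}" ]
}

# is_compatible CHART IMAGE: encodes (<1.7, *) or (>=1.7, >=2.7).
# The "disable readinessProbe" case is not covered because it is untested.
is_compatible() {
  if ! version_ge "$1" 1.7.0; then
    return 0   # charts before 1.7 do not add the readiness probe
  fi
  version_ge "$2" 2.7.0
}

# Try a few (chart, image) pairs.
while read -r chart image; do
  if is_compatible "$chart" "$image"; then
    echo "chart $chart + image $image: ok"
  else
    echo "chart $chart + image $image: readiness probe likely to fail"
  fi
done <<'EOF'
1.6.2 v2.4.2
1.7.0 v2.4.2
1.7.0 v2.7.0
EOF
```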
from eks-charts.
@dev-travelex, could you run the following commands and share the output?
kubectl -n kube-system describe deployment/aws-load-balancer-controller
kubectl -n kube-system describe endpoints/aws-load-balancer-webhook-service
from eks-charts.
I went to 2.6.2 after getting that error, and deployed 2.7.0 again after seeing your comment. This time I didn't get the webhook error. Nonetheless, here is the output:
kubectl -n kube-system describe deployment/aws-load-balancer-controller
Name: aws-load-balancer-controller
Namespace: kube-system
CreationTimestamp: Sat, 10 Feb 2024 01:20:22 +0000
Labels: app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=aws-load-balancer-controller
app.kubernetes.io/version=v2.7.0
helm.sh/chart=aws-load-balancer-controller-1.7.0
Annotations: deployment.kubernetes.io/revision: 2
meta.helm.sh/release-name: aws-load-balancer-controller
meta.helm.sh/release-namespace: kube-system
Selector: app.kubernetes.io/instance=aws-load-balancer-controller,app.kubernetes.io/name=aws-load-balancer-controller
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/name=aws-load-balancer-controller
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Service Account: aws-load-balancer-controller
Containers:
aws-load-balancer-controller:
Image: public.ecr.aws/eks/aws-load-balancer-controller:v2.7.0
Ports: 9443/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--cluster-name=nonprod-bliss-eks
--ingress-class=alb
--aws-vpc-id=vpc-0c6f43c09d80645f7
Liveness: http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
Readiness: http-get http://:61779/readyz delay=10s timeout=10s period=10s #success=1 #failure=2
Environment: <none>
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
Volumes:
cert:
Type: Secret (a volume populated by a Secret)
SecretName: aws-load-balancer-tls
Optional: false
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: aws-load-balancer-controller-6d59ff6457 (0/0 replicas created)
NewReplicaSet: aws-load-balancer-controller-b55586fb8 (2/2 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 22m deployment-controller Scaled up replica set aws-load-balancer-controller-6d59ff6457 to 2
Normal ScalingReplicaSet 6m51s deployment-controller Scaled up replica set aws-load-balancer-controller-b55586fb8 to 1
Normal ScalingReplicaSet 5m55s deployment-controller Scaled down replica set aws-load-balancer-controller-6d59ff6457 to 1 from 2
Normal ScalingReplicaSet 5m55s deployment-controller Scaled up replica set aws-load-balancer-controller-b55586fb8 to 2 from 1
Normal ScalingReplicaSet 5m3s deployment-controller Scaled down replica set aws-load-balancer-controller-6d59ff6457 to 0 from 1
kubectl -n kube-system describe endpoints/aws-load-balancer-webhook-service
Name: aws-load-balancer-webhook-service
Namespace: kube-system
Labels: app.kubernetes.io/component=webhook
app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=aws-load-balancer-controller
app.kubernetes.io/version=v2.7.0
helm.sh/chart=aws-load-balancer-controller-1.7.0
prometheus.io/service-monitor=false
Annotations: <none>
Subsets:
Addresses: 100.68.90.208,100.68.90.24
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
webhook-server 9443 TCP
from eks-charts.
The logs show it has been stuck at this line for more than 24 hours now. Would that affect the actual deployment of a load balancer?
I0210 23:42:56.770284 1 leaderelection.go:248] attempting to acquire leader lease kube-system/aws-load-balancer-controller-leader...
from eks-charts.
I encountered the same issue. After reviewing @SebastianScherer88's solution, I saw that my setup seemed to be using versions 1.8 and 2.6. I decided to upgrade to version 2.7, and that resolved the problem. Thanks!
from eks-charts.