Comments (22)
It may well be that since etcd has no quorum, the control plane will no longer schedule pods anywhere. I suggest creating a cluster with 2 worker nodes, deploying Flux on one of the workers, making that node fail, and seeing if it gets rescheduled to the healthy node.
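A throwaway two-worker cluster for this test could be created with, for example, kind (just one possible tool for the experiment; any cluster with two schedulable workers will do). A minimal sketch, with a hypothetical file name:

# two-workers.yaml (example name)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

# create the cluster with:
#   kind create cluster --config two-workers.yaml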
from flux-aio.
If you can reproduce this, please add a toleration like the one below with kubectl edit, and retest:
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 30
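For orientation, this is roughly where that toleration ends up in the flux Deployment when added with kubectl edit; a minimal sketch of just the relevant part of the spec, with everything else left as generated:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flux
  namespace: flux-system
spec:
  template:
    spec:
      # Evict the pod 30s after the node is marked unreachable,
      # instead of tolerating the taint indefinitely.
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 30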
from flux-aio.
Here is the CUE bit for Timoni that I use:
"flux": {
module: {
url: "oci://ghcr.io/stefanprodan/modules/flux-aio"
version: "2.1.2"
}
namespace: "flux-system"
values: {
hostNetwork: true
securityProfile: "privileged"
controllers: notification: enabled: false
}
}
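(For reference, an entry shaped like this normally sits under instances: in a Timoni bundle file. Assuming the bundle is saved as, say, flux-aio.cue, where the file name is just an example, it would be applied with something like:
$ timoni bundle apply -f flux-aio.cue
)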
from flux-aio.
If this node goes down, the controllers are not deployed to other nodes.
Is the Kubernetes control plane still working? I expect it to reschedule the pod on a different node. Maybe the toleration we set in Flux is too broad; I set it like this
from flux-aio.
I can still schedule pods.
Weave Dashboard, AWX and the Kubernetes Dashboard are all rescheduled to other nodes. Only Flux is not, and it still shows "Running".
from flux-aio.
If you describe the Flux pod, is there any hint in the events about some blocker to rescheduling?
from flux-aio.
The events show this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 39m kubelet Created container kustomize-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 39m kubelet Created container source-controller
Normal Started 39m kubelet Started container source-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal SandboxChanged 39m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Started 39m kubelet Started container kustomize-controller
Normal Started 39m kubelet Started container helm-controller
Normal Created 39m kubelet Created container helm-controller
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9794/healthz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m (x5 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9794/readyz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9792/healthz": dial tcp 10.0.2.22:9792: connect: connection refused
Warning Unhealthy 38m (x8 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9790/": dial tcp 10.0.2.22:9790: connect: connection refused
Warning NodeNotReady 16m (x3 over 93m) node-controller Node is not ready
The full description:
[disi@vmalmakw1s ~]$ kubectl describe pod flux-57bd866b6d-zbrfc -n flux-system
Name: flux-57bd866b6d-zbrfc
Namespace: flux-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: flux
Node: vmalmakms.home/10.0.2.22
Start Time: Sat, 25 Nov 2023 22:11:15 +0000
Labels: app.kubernetes.io/name=flux
pod-template-hash=57bd866b6d
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: true
prometheus.io/scrape: true
Status: Running
IP: 10.0.2.22
IPs:
IP: 10.0.2.22
Controlled By: ReplicaSet/flux-57bd866b6d
Containers:
source-controller:
Container ID: containerd://f95a5ff962bb8b4697cc3b3b933b2c45499f594b65802f113aebb587ff822b61
Image: ghcr.io/fluxcd/source-controller:v1.1.2
Image ID: ghcr.io/fluxcd/source-controller@sha256:b776e085ac079bf22ed23afe2874aebd10efcfaa740ec25748774608bbc79932
Ports: 9790/TCP, 9791/TCP, 9792/TCP
Host Ports: 9790/TCP, 9791/TCP, 9792/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9791
--health-addr=:9792
--storage-addr=:9790
--storage-path=/data
--storage-adv-addr=flux.$(RUNTIME_NAMESPACE).svc.cluster.local.
--concurrent=5
--requeue-dependency=30s
--watch-label-selector=!sharding.fluxcd.io/key
--helm-cache-max-size=10
--helm-cache-ttl=60m
--helm-cache-purge-interval=5m
State: Running
Started: Sun, 26 Nov 2023 08:33:56 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 26 Nov 2023 08:33:25 +0000
Finished: Sun, 26 Nov 2023 08:33:56 +0000
Ready: True
Restart Count: 15
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-sc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-sc/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nznjt (ro)
kustomize-controller:
Container ID: containerd://997b139a3e85abd2da13c1d95fbb585bf6cfe29967bcd241d95a885493213971
Image: ghcr.io/fluxcd/kustomize-controller:v1.1.1
Image ID: ghcr.io/fluxcd/kustomize-controller@sha256:e2b3c9e1292564bbfaa513f3cc6fa1df1194fae8ba9483fbe581099d0c585d94
Ports: 9793/TCP, 9794/TCP
Host Ports: 9793/TCP, 9794/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9793
--health-addr=:9794
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
State: Running
Started: Sun, 26 Nov 2023 08:33:56 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 26 Nov 2023 08:33:25 +0000
Finished: Sun, 26 Nov 2023 08:33:56 +0000
Ready: True
Restart Count: 15
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-kc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-kc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nznjt (ro)
helm-controller:
Container ID: containerd://c6a9a4ec46740520acc2f70a763259469830add59b7904a4ee0b00e8e97d2dd1
Image: ghcr.io/fluxcd/helm-controller:v0.36.2
Image ID: ghcr.io/fluxcd/helm-controller@sha256:6ee7e590e57350ac91cfdeee4587d0e9e6f52e723c56d4b7878c59279bd36f00
Ports: 9795/TCP, 9796/TCP
Host Ports: 9795/TCP, 9796/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9795
--health-addr=:9796
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
State: Running
Started: Sun, 26 Nov 2023 08:33:25 +0000
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Sun, 26 Nov 2023 07:49:37 +0000
Finished: Sun, 26 Nov 2023 08:33:09 +0000
Ready: True
Restart Count: 13
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-hc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-hc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nznjt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady True
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-nznjt:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 39m kubelet Created container kustomize-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 39m kubelet Created container source-controller
Normal Started 39m kubelet Started container source-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal SandboxChanged 39m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Started 39m kubelet Started container kustomize-controller
Normal Started 39m kubelet Started container helm-controller
Normal Created 39m kubelet Created container helm-controller
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9794/healthz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m (x5 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9794/readyz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9792/healthz": dial tcp 10.0.2.22:9792: connect: connection refused
Warning Unhealthy 38m (x8 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9790/": dial tcp 10.0.2.22:9790: connect: connection refused
Warning NodeNotReady 16m (x3 over 93m) node-controller Node is not ready
from flux-aio.
Hmm, why is Status: Running if the liveness probe fails? Can you also describe the ReplicaSet/flux-57bd866b6d and the flux Deployment, please?
from flux-aio.
ReplicaSet:
Name: flux-57bd866b6d
Namespace: flux-system
Selector: app.kubernetes.io/name=flux,pod-template-hash=57bd866b6d
Labels: app.kubernetes.io/name=flux
pod-template-hash=57bd866b6d
Annotations: app.kubernetes.io/role: cluster-admin
deployment.kubernetes.io/desired-replicas: 1
deployment.kubernetes.io/max-replicas: 1
deployment.kubernetes.io/revision: 1
Controlled By: Deployment/flux
Replicas: 1 current / 1 desired
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app.kubernetes.io/name=flux
pod-template-hash=57bd866b6d
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: true
prometheus.io/scrape: true
Service Account: flux
Containers:
source-controller:
Image: ghcr.io/fluxcd/source-controller:v1.1.2
Ports: 9790/TCP, 9791/TCP, 9792/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9791
--health-addr=:9792
--storage-addr=:9790
--storage-path=/data
--storage-adv-addr=flux.$(RUNTIME_NAMESPACE).svc.cluster.local.
--concurrent=5
--requeue-dependency=30s
--watch-label-selector=!sharding.fluxcd.io/key
--helm-cache-max-size=10
--helm-cache-ttl=60m
--helm-cache-purge-interval=5m
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-sc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-sc/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/data from data (rw)
/tmp from tmp (rw)
kustomize-controller:
Image: ghcr.io/fluxcd/kustomize-controller:v1.1.1
Ports: 9793/TCP, 9794/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9793
--health-addr=:9794
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-kc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-kc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
helm-controller:
Image: ghcr.io/fluxcd/helm-controller:v0.36.2
Ports: 9795/TCP, 9796/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9795
--health-addr=:9796
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-hc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-hc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Priority Class Name: system-cluster-critical
Events: <none>
Deployment:
Name: flux
Namespace: flux-system
CreationTimestamp: Mon, 20 Nov 2023 15:16:57 +0000
Labels: app.kubernetes.io/managed-by=timoni
app.kubernetes.io/name=flux
app.kubernetes.io/part-of=flux
app.kubernetes.io/version=v2.1.2
instance.timoni.sh/name=flux
instance.timoni.sh/namespace=flux-system
Annotations: app.kubernetes.io/role: cluster-admin
deployment.kubernetes.io/revision: 1
Selector: app.kubernetes.io/name=flux
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: Recreate
MinReadySeconds: 0
Pod Template:
Labels: app.kubernetes.io/name=flux
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: true
prometheus.io/scrape: true
Service Account: flux
Containers:
source-controller:
Image: ghcr.io/fluxcd/source-controller:v1.1.2
Ports: 9790/TCP, 9791/TCP, 9792/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9791
--health-addr=:9792
--storage-addr=:9790
--storage-path=/data
--storage-adv-addr=flux.$(RUNTIME_NAMESPACE).svc.cluster.local.
--concurrent=5
--requeue-dependency=30s
--watch-label-selector=!sharding.fluxcd.io/key
--helm-cache-max-size=10
--helm-cache-ttl=60m
--helm-cache-purge-interval=5m
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-sc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-sc/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/data from data (rw)
/tmp from tmp (rw)
kustomize-controller:
Image: ghcr.io/fluxcd/kustomize-controller:v1.1.1
Ports: 9793/TCP, 9794/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9793
--health-addr=:9794
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-kc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-kc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
helm-controller:
Image: ghcr.io/fluxcd/helm-controller:v0.36.2
Ports: 9795/TCP, 9796/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9795
--health-addr=:9796
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-hc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-hc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available False MinimumReplicasUnavailable
OldReplicaSets: <none>
NewReplicaSet: flux-57bd866b6d (1/1 replicas created)
Events: <none>
from flux-aio.
Really odd: the ReplicaSet says Pods Status: 1 Running, which is really strange, but the Deployment says Replicas: 1 unavailable, yet it doesn't create a new ReplicaSet.
from flux-aio.
I will see later today if I start over and redeploy the entire cluster, then test again and see if it shows the same behaviour.
from flux-aio.
I guess if you delete the pod it will get rescheduled; this looks like some race condition in the Kubernetes scheduler, or the toleration makes it trip.
from flux-aio.
Correct :)
I did delete the pod.
flux-system flux-57bd866b6d-j6z7x 3/3 Running 0 56s 10.0.2.24 vmalmakw2s.home <none> <none>
flux-system flux-57bd866b6d-zbrfc 3/3 Terminating 43 (75m ago) 11h 10.0.2.22 vmalmakms.home <none> <none>
A new one was created and it's working fine.
But it did not happen automatically.
from flux-aio.
Hmm, so it looks like it got stuck in Terminating, but why wasn't this status reflected in the ReplicaSet, and why didn't it time out? I wonder if this is some bug in Kubernetes.
from flux-aio.
It only showed the Terminating status after I ran:
$ kubectl delete pod flux-57bd866b6d-zbrfc -n flux-system
i.e. the Kubernetes Dashboard and other pods also linger for some time in Terminating. Is the default around 15 min before those get removed by Kubernetes?
Edit: this removes it immediately:
$ kubectl delete pod --force flux-57bd866b6d-zbrfc -n flux-system
from flux-aio.
If you manage to reproduce this, it would be good to take snapshots of the Deployment and ReplicaSet and see what events are issued for those; I guess those expired, and that's why there are no events listed now.
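Something like this would capture the snapshots and the events (output file names are just examples):
$ kubectl -n flux-system get deployment flux -o yaml > flux-deploy.yaml
$ kubectl -n flux-system get replicaset flux-57bd866b6d -o yaml > flux-rs.yaml
$ kubectl -n flux-system get events --sort-by=.lastTimestamp > flux-events.txt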
from flux-aio.
Here are some attempts to log events:
Flux is now running on vmalmakw2s, and if I shut that down, there is no event on the ReplicaSet or Deployment after ~15 min. I ran this:
$ watch "kubectl describe replicasets.apps -n flux-system flux-57bd866b6d | grep Events"
And no events on the Deployment either.
I then started the node again and still no events.
Then I shut down the node with the awx-operator deployment and monitored it.
The only event on the new ReplicaSet for awx-operator, which was deployed after ~6 min:
Normal SuccessfulCreate 89s replicaset-controller Created pod: awx-operator-controller-manager-5cd65bb78d-7wn64
I hope this helps.
Now, I'll edit Flux as you stated above and test again.
from flux-aio.
[disi@vmalmakw1s ~]$ kubectl edit deployments.apps -n flux-system flux
deployment.apps/flux edited
Still running fine. Tested sync with git and events.
Deployment log:
Normal ScalingReplicaSet 3m12s deployment-controller Scaled down replica set flux-57bd866b6d to 0 from 1
Normal ScalingReplicaSet 3m12s deployment-controller Scaled up replica set flux-5c4dd674fc to 1
ReplicaSet log:
Normal SuccessfulCreate 6m11s replicaset-controller Created pod: flux-57bd866b6d-qj947
Normal SuccessfulDelete 3m49s replicaset-controller Deleted pod: flux-57bd866b6d-qj947
Running on node "vmalmakms".
Monitoring:
$ watch "kubectl describe replicasets.apps -n flux-system flux-57bd866b6d | grep -A 6 Events"
$ watch "kubectl describe deployments.apps -n flux-system flux | grep -A 6 Events"
Now shutting down "vmalmakms"...
It's working :)
New pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 62s default-scheduler Successfully assigned flux-system/flux-5c4dd674fc-dfpqk to vmalmakw2s.home
Normal Pulled 62s kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 62s kubelet Created container source-controller
Normal Started 61s kubelet Started container source-controller
Normal Pulled 61s kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal Created 61s kubelet Created container kustomize-controller
Normal Started 61s kubelet Started container kustomize-controller
Normal Pulled 61s kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Created 61s kubelet Created container helm-controller
Normal Started 61s kubelet Started container helm-controller
Old pod:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m36s default-scheduler Successfully assigned flux-system/flux-5c4dd674fc-bxx65 to vmalmakms.home
Normal Pulled 9m36s kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 9m36s kubelet Created container source-controller
Normal Started 9m36s kubelet Started container source-controller
Normal Pulled 9m36s kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal Created 9m36s kubelet Created container kustomize-controller
Normal Started 9m36s kubelet Started container kustomize-controller
Normal Pulled 9m36s kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Created 9m36s kubelet Created container helm-controller
Normal Started 9m35s kubelet Started container helm-controller
Warning NodeNotReady 64s node-controller Node is not ready
from flux-aio.
No events on the Deployment; a new ReplicaSet is created:
flux-system flux-57bd866b6d 0 0 0 5d20h
flux-system flux-5c4dd674fc 1 1 1 15m
from flux-aio.
Ok, so the tolerationSeconds: 30 made it reschedule? And without it, it stays dead on the failing node?
from flux-aio.
Ok, so the tolerationSeconds: 30 made it reschedule? And without it, it stays dead on the failing node?
Hi, yes, without this parameter it just stays there forever in the Running state. I would probably change it to ~5 min, like the Kubernetes default? It now reschedules way ahead of other pods.
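With the blanket op=Exists toleration shown in the describe output above and no tolerationSeconds, the NoExecute taints are tolerated indefinitely, so the pod is never evicted from the unreachable node. For comparison, a stock cluster (via the DefaultTolerationSeconds admission plugin) injects roughly the following into pods that do not already tolerate these taints, which is where the ~5 min default comes from; a sketch, not taken from this cluster:

tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 300
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 300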
from flux-aio.
Thanks @disi for all the tests. I have published the fix; rerunning timoni bundle apply should set the right tolerations now.
from flux-aio.