kubereboot / kured
Kubernetes Reboot Daemon
Home Page: https://kured.dev
License: Apache License 2.0
Kured seems to have been built with the assumption that all nodes are trustworthy.
This assumption can be broken in a multitenant environment, or in an environment where a single node gets compromised.
From a quick look through the code, there is some low-hanging fruit that could be addressed to tighten up security:
The lock is implemented by tweaking the kured daemonset. But this means kured has enough permissions to edit any daemonset in the entire cluster and replace its pods with a privileged container of the attacker's choice. This could be fixed in two ways I can think of. One is to use a different object for the lock, such as a configmap. The other is to add a separate service that is in charge of issuing locks, so the nodes themselves can't.
Permissions are granted to the kured service account to drain/cordon/uncordon workloads, but there is no way to restrict which node it is allowed to act on. So rather than using the account to drain the current node, an attacker could drain every node except the current one, attracting workloads along with their secrets onto the compromised node, then read them off the host, which now has access to them. This too can be solved in one of two ways: have a central service perform the draining/uncordoning, which can be pinned to a more trusted node; or, alternatively, use the kubelet's own credentials to drain itself, since the NodeRestriction admission controller should only allow a node credential to act on its own node.
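As an illustration of the first suggestion, here is a minimal sketch of what a dedicated lock object could look like, assuming a ConfigMap named kured-lock in kube-system that carries the existing lock annotation (the object name is hypothetical; kured does not do this today):
apiVersion: v1
kind: ConfigMap
metadata:
  name: kured-lock                 # hypothetical dedicated lock object
  namespace: kube-system
  annotations:
    weave.works/kured-node-lock: '{"nodeID":"manual"}'
# RBAC for locking could then be limited to get/update on this single
# ConfigMap instead of update on the kured DaemonSet itself.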
Kured kicked in rightly, and disabled scheduling for my node:
app03.lan.davidkarlsen.com Ready,SchedulingDisabled <none> 62d v1.11.2
I can see a number of pods being killed:
root@app03:/var/log/containers# tail kured-vkwnk_kube-system_kured-83287b3a6ba5d8a4dfd8a22822932a1655b71cc2ca2bfbd5007f5d389992100c.log
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"kube-system-kubernetes-dashboard-proxy-55c7756d46-dsqzq\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.066034203Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"coredns-78fcdf6894-k87gl\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.066110928Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-grafana-788f47b84-bkggz\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.066134368Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-prometheus-alertmanager-cbcc46d55-gwkqz\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.179296096Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"logging-cerebro-6794fc6bc6-t26v9\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.179378188Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monocular-monocular-mongodb-5644f785b9-24tmz\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.202395762Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-prometheus-blackbox-exporter-7775df5698-86s67\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.202444091Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"monitoring-prometheus-server-75bfb9f66-xm9vp\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.249172642Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"kube-ops-view-kube-ops-view-kube-ops-view-6db67848c4-krmx8\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.449382938Z"}
{"log":"time=\"2018-08-15T09:47:13Z\" level=info msg=\"pod \\\"logging-elasticsearch-client-5978d8f465-t9kkm\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-15T09:47:13.649729383Z"}
root@app03:/var/log/containers#
But then nothing more happens. I guess it fails at something, but the logs should say why.
These pods are left (mainly daemons, except for the nginx-ingress):
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
auditbeat auditbeat-auditbeat-q4k65 0 (0%) 0 (0%) 0 (0%) 0 (0%)
datadog datadog-datadog-agent-datadog-pr9w5 200m (2%) 200m (2%) 256Mi (1%) 256Mi (1%)
kube-system calico-node-qlmcc 250m (3%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-5cf42 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-system-nginx-ingress-controller-84f76b76cb-jp8dr 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-system-nginx-ingress-default-backend-6b557bb97c-vlfqc 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kured-vkwnk 0 (0%) 0 (0%) 0 (0%) 0 (0%)
logging fluent-bit-2djgl 100m (1%) 0 (0%) 100Mi (0%) 100Mi (0%)
monitoring monitoring-prometheus-node-exporter-6jl6k 0 (0%) 0 (0%) 0 (0%) 0 (0%)
any hints?
Hi folks,
Apologies if this seems like a daft question. I noticed that kured, as deployed in our clusters, is using quite a large amount of memory at first glance, as shown by kubectl top pods:
NAME CPU(cores) MEMORY(bytes)
kured-57mc6 0m 489Mi
kured-7rdms 0m 547Mi
kured-bpcz8 0m 431Mi
kured-jzkpr 0m 385Mi
kured-kf2dj 0m 302Mi
kured-sm8r6 0m 472Mi
Is kured supposed to be using this much memory, and if so, what is likely to contribute to this usage? If this memory usage seems high, are there things that could be done to reduce it?
Thanks in advance for your help.
Tainting master nodes seems like a widely used way of forbidding pods from being scheduled on master nodes. We want kured pods to be created on master nodes too, though, to manage their updates as well.
We are considering adding the corresponding toleration so users aren't confused when kured doesn't reboot their master nodes (as seen on Slack).
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
It's probably more useful to be told when a node is being taken out of service than to know that an already-out-of-service node is about to be rebooted.
Are you planning to release new versions or are you recommending everyone uses git master?
Hi! Thanks for kured, very useful. I was wondering, if I have scheduled backups and right during a backup kured detects that a node requires rebooting, will the backups or any other cron jobs complete first or will they be terminated or something? Thanks
Feature request: the ability to set the order of the nodes based on a configured query.
For example, with a StatefulSet pod: draining Prometheus with only 1 replica causes metrics to be lost.
To avoid more than one drain causing metrics loss, the Prometheus node could be set to be the last node upgraded, thereby preventing Prometheus from being moved more than once.
When the first node reboots, the cluster autoscaler (CAS) brings up another node to host the drained pods that are pending scheduling. Once the process of rebooting all nodes is complete, there is this one new node that has not been patched or rebooted. The cycle of rebooting nodes and the CAS bringing up a new, unpatched node continues. Do you have any suggestions on how this should work?
running on master-114c349
{"log":"time=\"2018-12-06T21:32:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:32:02.953439101Z"}
{"log":"time=\"2018-12-06T21:33:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:33:02.955788412Z"}
{"log":"time=\"2018-12-06T21:34:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:34:02.960860999Z"}
{"log":"time=\"2018-12-06T21:34:13Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:34:13.861012393Z"}
{"log":"time=\"2018-12-06T21:34:13Z\" level=info msg=\"Reboot not required\"\n","stream":"stderr","time":"2018-12-06T21:34:13.861419579Z"}
{"log":"time=\"2018-12-06T21:35:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:35:02.963461403Z"}
{"log":"time=\"2018-12-06T21:36:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:36:02.970068445Z"}
{"log":"time=\"2018-12-06T21:37:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:37:02.972730235Z"}
{"log":"time=\"2018-12-06T21:38:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:38:02.978542906Z"}
{"log":"time=\"2018-12-06T21:39:02Z\" level=warning msg=\"nsenter: setns(): can't reassociate to namespace 'mnt': Operation not permitted\" cmd=/usr/bin/nsenter std=err\n","stream":"stderr","time":"2018-12-06T21:39:02.980484477Z"}
Some updates need to happen after the node is drained so that they don't affect workloads (some kinds of Docker upgrades, for example).
So the procedure might be: drain the node, run the update script, then reboot.
Can a hook on the node agent be added to run a script before reboot but after drain?
Thanks very much for this, it's really useful.
There doesn't seem to be any RBAC support yet, and it would be useful. It probably needs to run with its own service account kube-system:kured, and have a role and rolebindings, maybe packaged up as a helm chart?
I'm happy to develop and submit a patch, if you are ok to review it?
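For illustration, a rough sketch of what those RBAC objects might look like, based only on what kured needs to do (cordon/drain nodes and place its lock); the exact resources and verbs here are assumptions, not the project's published manifests:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
# cordon/uncordon and drain touch nodes, pods and evictions
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "delete"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
- kind: ServiceAccount
  name: kured
  namespace: kube-system
# A namespaced Role/RoleBinding in kube-system allowing get/update on the
# kured DaemonSet would also be needed for the lock annotation.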
Hi!
For some reason, quay.io doesn't have an image tagged 'latest', even though commit c42fff3 seems to suggest there is a latest tag.
Events of a new deploy (using the helm chart), when pointing to 'latest':
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
11s 11s 1 kured-01.155212f4960bd900 DaemonSet Normal SuccessfulCreate daemonset-controller Created pod: kured-01-zb8jc
10s 10s 1 kured-01-zb8jc.155212f4d9572677 Pod spec.containers{kured} Normal Pulling kubelet, aks-nodepool1-31356381-1 pulling image "quay.io/weaveworks/kured:latest"
9s 9s 1 kured-01-zb8jc.155212f4ebbc020b Pod spec.containers{kured} Warning Failed kubelet, aks-nodepool1-31356381-1 Failed to pull image "quay.io/weaveworks/kured:latest": rpc error: code = Unknown desc = Tag latest not found in repository quay.io/weaveworks/kured
9s 9s 1 kured-01-zb8jc.155212f4ebbc4a50 Pod spec.containers{kured} Warning Failed kubelet, aks-nodepool1-31356381-1 Error: ErrImagePull
8s 8s 1 kured-01-zb8jc.155212f5368dff63 Pod spec.containers{kured} Normal BackOff kubelet, aks-nodepool1-31356381-1 Back-off pulling image "quay.io/weaveworks/kured:latest"
8s 8s 1 kured-01-zb8jc.155212f5368e2e43 Pod spec.containers{kured} Warning Failed kubelet, aks-nodepool1-31356381-1 Error: ImagePullBackOff
I noticed that the file https://github.com/weaveworks/kured/releases/download/1.1.0/kured-1.1.0.yaml still references quay.io, not Docker Hub, so the installation instructions fail.
Slack incoming webhooks let you override the channel name. It would be good to provide an option to override it, since we can have a single incoming webhook integration but send notifications for different environments to different channels.
We are using kured on AKS and I regularly see that nodes stay in status Ready,SchedulingDisabled and I have to uncordon them manually.
When I look into the log file of the kured pod it shows:
time="2019-03-06T06:30:27Z" level=info msg="Kubernetes Reboot Daemon: 1.1.0"
time="2019-03-06T06:30:27Z" level=info msg="Node ID: aks-default-13951270-0"
time="2019-03-06T06:30:27Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2019-03-06T06:30:27Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1h0m0s"
time="2019-03-06T06:30:28Z" level=info msg="Holding lock"
time="2019-03-06T06:30:28Z" level=info msg="Uncordoning node aks-default-13951270-0"
time="2019-03-06T06:30:29Z" level=info msg="node/aks-default-13951270-0 uncordoned" cmd=/usr/bin/kubectl std=out
time="2019-03-06T06:30:29Z" level=info msg="Releasing lock"
So it says it uncordoned it, but still I regularly see that nodes are in fact not uncordoned.
Is this something you guys see more often?
Kured v1.2 fails on my kubeadm-created K8s v1.16.0 cluster. It seems that this issue has been solved by fix #75, which isn't part of a release yet.
The error message I received before compiling kured from the latest source was:
time="2019-09-22T19:45:07Z" level=info msg="Blocking Pod Selectors: []"
time="2019-09-22T19:45:07Z" level=fatal msg="Error testing lock: the server could not find the requested resource"
It would be really nice to get a kured release v1.3 including an updated stable Helm chart. This would hopefully make the stable kured Helm chart work on K8s v1.16 clusters without modifications.
I have not used Kured on older Kubernetes versions (yet).
A flag to set delays between reboots should be added. Just as --period checks for the sentinel file, we should have something like --reboot-delay to specify a time frame between reboots of the nodes that have the sentinel file. A use case I have is a cluster containing large pods that take 30 minutes to load. If I let kured do its job, the nodes are restarted too quickly for the pods to recover.
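A sketch of how the proposed flag might look in the DaemonSet container args; --period exists, but --reboot-delay is only the proposal from this request, not a real kured flag:
command:
- /usr/bin/kured
args:
- --period=1h
- --reboot-delay=30m        # proposed flag: wait 30m after one node reboots before the next may start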
According to https://github.com/weaveworks/kured/blob/d7b9c9fbec26e113d8b90499c3e58bc06098581c/cmd/kured/main.go#L256, kured will only drain the node before rebooting if the node was not already cordoned/unschedulable.
Just because a node is marked unschedulable does not mean all workloads were already drained. The node might have been cordoned manually but not drained before, kured might have started a drain but might have been interrupted (crash, update, whatever) before finishing the drain. Always running drain before reboot makes this safer, and if there are no more workloads to drain, is effectively a no-op.
this is the output I get from kured:
{"log":"time=\"2018-08-07T13:27:21Z\" level=info msg=\"Kubernetes Reboot Daemon: master-5731b98\"\n","stream":"stderr","time":"2018-08-07T13:27:21.023797488Z"}
{"log":"time=\"2018-08-07T13:27:21Z\" level=info msg=\"Node ID: app02.lan.davidkarlsen.com\"\n","stream":"stderr","time":"2018-08-07T13:27:21.02386206Z"}
{"log":"time=\"2018-08-07T13:27:21Z\" level=info msg=\"Lock Annotation: kube-system/kured:weave.works/kured-node-lock\"\n","stream":"stderr","time":"2018-08-07T13:27:21.023876817Z"}
{"log":"time=\"2018-08-07T13:27:21Z\" level=info msg=\"Reboot Sentinel: /var/run/reboot-required every 1h0m0s\"\n","stream":"stderr","time":"2018-08-07T13:27:21.023889431Z"}
{"log":"time=\"2018-08-07T14:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T14:26:17.381148603Z"}
{"log":"time=\"2018-08-07T14:26:17Z\" level=warning msg=\"Reboot blocked: 1 active alerts: [deployment_replicas_mismatch]\"\n","stream":"stderr","time":"2018-08-07T14:26:17.439989295Z"}
{"log":"time=\"2018-08-07T15:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T15:26:17.381452887Z"}
{"log":"time=\"2018-08-07T15:26:22Z\" level=warning msg=\"Reboot blocked: 1 active alerts: [deployment_replicas_mismatch]\"\n","stream":"stderr","time":"2018-08-07T15:26:22.38692619Z"}
{"log":"time=\"2018-08-07T16:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T16:26:17.381447461Z"}
{"log":"time=\"2018-08-07T16:26:17Z\" level=warning msg=\"Reboot blocked: 1 active alerts: [deployment_replicas_mismatch]\"\n","stream":"stderr","time":"2018-08-07T16:26:17.387051589Z"}
{"log":"time=\"2018-08-07T17:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T17:26:17.381804597Z"}
{"log":"time=\"2018-08-07T17:26:17Z\" level=warning msg=\"Reboot blocked: 1 active alerts: [deployment_replicas_mismatch]\"\n","stream":"stderr","time":"2018-08-07T17:26:17.387159836Z"}
{"log":"time=\"2018-08-07T18:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T18:26:17.381839419Z"}
{"log":"time=\"2018-08-07T18:26:17Z\" level=warning msg=\"Lock already held: app03.lan.davidkarlsen.com\"\n","stream":"stderr","time":"2018-08-07T18:26:17.953898535Z"}
{"log":"time=\"2018-08-07T19:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T19:26:17.532210049Z"}
{"log":"time=\"2018-08-07T19:26:17Z\" level=warning msg=\"Reboot blocked: 1 active alerts: [deployment_replicas_mismatch]\"\n","stream":"stderr","time":"2018-08-07T19:26:17.778564028Z"}
{"log":"time=\"2018-08-07T20:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T20:26:17.51315201Z"}
{"log":"time=\"2018-08-07T20:26:17Z\" level=warning msg=\"Reboot blocked: 1 active alerts: [deployment_replicas_mismatch]\"\n","stream":"stderr","time":"2018-08-07T20:26:17.950332758Z"}
{"log":"time=\"2018-08-07T21:26:17Z\" level=info msg=\"Reboot required\"\n","stream":"stderr","time":"2018-08-07T21:26:17.502508634Z"}
{"log":"time=\"2018-08-07T21:26:19Z\" level=info msg=\"Acquired reboot lock\"\n","stream":"stderr","time":"2018-08-07T21:26:19.041613032Z"}
{"log":"time=\"2018-08-07T21:26:19Z\" level=info msg=\"Draining node app02.lan.davidkarlsen.com\"\n","stream":"stderr","time":"2018-08-07T21:26:19.041639516Z"}
{"log":"time=\"2018-08-07T21:26:27Z\" level=info msg=\"node \\\"app02.lan.davidkarlsen.com\\\" cordoned\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:27.608428496Z"}
{"log":"time=\"2018-08-07T21:26:29Z\" level=warning msg=\"WARNING: Deleting pods with local storage: anchore-anchore-engine-anchore-engine-worker-55bc984d7-km7zx, flux-584d78b89f-xhrch, minio-minio-799cd646f-ks7qv, monocular-monocular-monocular-api-6747bb55c-httx5; Ignoring DaemonSet-managed pods: auditbeat-auditbeat-ffd58, datadog-datadog-agent-datadog-clqsm, calico-node-4kjvx, kube-proxy-6hmhz, kured-q7jkr, fluent-bit-thql4, monitoring-prometheus-node-exporter-xf4rj\" cmd=/usr/bin/kubectl std=err\n","stream":"stderr","time":"2018-08-07T21:26:29.263488783Z"}
{"log":"time=\"2018-08-07T21:26:30Z\" level=info msg=\"pod \\\"metrics-server-metrics-server-658d69ddf7-hwh7p\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:30.129399317Z"}
{"log":"time=\"2018-08-07T21:26:30Z\" level=info msg=\"pod \\\"minio-minio-799cd646f-ks7qv\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:30.132816317Z"}
{"log":"time=\"2018-08-07T21:26:30Z\" level=info msg=\"pod \\\"anchore-anchore-engine-anchore-engine-worker-55bc984d7-km7zx\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:30.143296645Z"}
{"log":"time=\"2018-08-07T21:26:30Z\" level=info msg=\"pod \\\"flux-584d78b89f-xhrch\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:30.143492166Z"}
{"log":"time=\"2018-08-07T21:26:30Z\" level=info msg=\"pod \\\"logging-elasticsearch-data-1\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:30.143637461Z"}
{"log":"time=\"2018-08-07T21:26:30Z\" level=info msg=\"pod \\\"anchore-anchore-engine-postgresql-5cd6586d5b-mp6d8\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:30.144007446Z"}
{"log":"time=\"2018-08-07T21:26:30Z\" level=info msg=\"pod \\\"logging-elasticsearch-master-1\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:30.144136576Z"}
{"log":"time=\"2018-08-07T21:26:31Z\" level=info msg=\"pod \\\"anchore-anchore-engine-anchore-engine-core-645cd6b7fd-2w4fs\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:31.950339088Z"}
{"log":"time=\"2018-08-07T21:26:31Z\" level=info msg=\"pod \\\"monitoring-prometheus-alertmanager-cbcc46d55-nkxkh\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:31.950422791Z"}
{"log":"time=\"2018-08-07T21:26:34Z\" level=info msg=\"pod \\\"kube-ops-view-kube-ops-view-kube-ops-view-6db67848c4-dfgqg\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:34.623470381Z"}
{"log":"time=\"2018-08-07T21:26:36Z\" level=info msg=\"pod \\\"monocular-monocular-monocular-ui-58f9f95864-qr7fm\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:36.040687921Z"}
{"log":"time=\"2018-08-07T21:26:36Z\" level=info msg=\"pod \\\"flux-memcached-5f8b4c7dc8-l6b5t\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:36.199982135Z"}
{"log":"time=\"2018-08-07T21:26:36Z\" level=info msg=\"pod \\\"tiller-deploy-5c688d5f9b-v2hbx\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:36.200033975Z"}
{"log":"time=\"2018-08-07T21:26:36Z\" level=info msg=\"pod \\\"kube-system-kubernetes-dashboard-proxy-64f7674f88-c97m4\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:36.200053208Z"}
{"log":"time=\"2018-08-07T21:26:36Z\" level=info msg=\"pod \\\"kube-system-heapster-heapster-56c646d674-87ptg\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:36.688737532Z"}
{"log":"time=\"2018-08-07T21:26:36Z\" level=info msg=\"pod \\\"logging-elasticsearch-client-5978d8f465-dr6ft\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:36.688776421Z"}
{"log":"time=\"2018-08-07T21:26:37Z\" level=info msg=\"pod \\\"monocular-monocular-monocular-prerender-6ffdcd79c4-cfs8n\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:37.402532532Z"}
{"log":"time=\"2018-08-07T21:26:37Z\" level=info msg=\"pod \\\"monitoring-prometheus-server-75bfb9f66-wxmq5\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:37.402572959Z"}
{"log":"time=\"2018-08-07T21:26:37Z\" level=info msg=\"pod \\\"monitoring-prometheus-blackbox-exporter-7775df5698-cw49s\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:37.403119108Z"}
{"log":"time=\"2018-08-07T21:26:37Z\" level=info msg=\"pod \\\"hubot-hubot-d7dc4978c-f48nz\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:37.403141351Z"}
{"log":"time=\"2018-08-07T21:26:37Z\" level=info msg=\"pod \\\"monocular-monocular-monocular-api-6747bb55c-httx5\\\" evicted\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:37.403189515Z"}
{"log":"time=\"2018-08-07T21:26:37Z\" level=info msg=\"node \\\"app02.lan.davidkarlsen.com\\\" drained\" cmd=/usr/bin/kubectl std=out\n","stream":"stderr","time":"2018-08-07T21:26:37.403202192Z"}
{"log":"time=\"2018-08-07T21:26:37Z\" level=info msg=\"Commanding reboot\"\n","stream":"stderr","time":"2018-08-07T21:26:37.406949235Z"}
{"log":"time=\"2018-08-07T21:26:42Z\" level=warning msg=\"Error notifying slack: Post https://hooks.slack.com/services/obfuscated/obfuscated/obfuscated: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\"\n","stream":"stderr","time":"2018-08-07T21:26:42.407467437Z"}
{"log":"time=\"2018-08-07T21:27:16Z\" level=warning msg=\"Failed to set wall message, ignoring: Connection reset by peer\" cmd=/bin/systemctl std=err\n","stream":"stderr","time":"2018-08-07T21:27:16.609804186Z"}
{"log":"time=\"2018-08-07T21:27:16Z\" level=warning msg=\"Failed to reboot system via logind: Transport endpoint is not connected\" cmd=/bin/systemctl std=err\n","stream":"stderr","time":"2018-08-07T21:27:16.609844005Z"}
I have configured the flag --blocking-pod-selector=runtime=long to prevent kured from rebooting a node while certain pods are running (according to the documentation). Now the kured pods are in CrashLoopBackOff status and the logs are:
Error: unknown flag: --blocking-pod-selector
Usage:
kured [flags]
Flags:
--alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
--ds-name string name of daemonset on which to place lock (default "kured")
--ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")
-h, --help help for kured
--lock-annotation string annotation in which to record locking node (default "weave.works/kured-node-lock")
--period duration reboot check period (default 1h0m0s)
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-sentinel string path to file whose existence signals need to reboot (default "/var/run/reboot-required")
--slack-hook-url string slack hook URL for reboot notfications
--slack-username string slack username for reboot notfications (default "kured")
time="2019-03-07T16:57:24Z" level=fatal msg="unknown flag: --blocking-pod-selector"
This suggests the flag isn't supported. I'm using version 1.1.0 of kured.
The cordon and drain process is initiated but the reboot command fails. This then kills the container, which causes every node to be cordoned and drained every hour without the reboot ever occurring.
Infrastructure
Azure AKS
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
Pod
Name: kured-9hwl9
Namespace: core
Priority: 0
PriorityClassName: <none>
Node: 0/10.4.0.6
Start Time: Wed, 07 Nov 2018 16:24:18 +0000
Labels: app=kured
controller-revision-hash=2895661024
pod-template-generation=1
release=
Annotations: <none>
Status: Running
IP: 10.244.2.127
Controlled By: DaemonSet/kured
Containers:
kured:
Container ID: docker://bfe49a8a161d4f6e349ed30eaaa3894fb5332f690e00df2d6ec30f0bf3f0f25e
Image: quay.io/weaveworks/kured:1.1.0
Image ID: docker-pullable://quay.io/weaveworks/kured@sha256:9cb1aa3ffc06bd97c3a449eb69790fbda763c9c88195e293a03a3adfbfe4b512
Port: <none>
Command:
/usr/bin/kured
Args:
--ds-name=kured
--ds-namespace=core
State: Running
Started: Thu, 08 Nov 2018 14:11:10 +0000
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 08 Nov 2018 13:39:10 +0000
Finished: Thu, 08 Nov 2018 14:11:05 +0000
Ready: True
Restart Count: 22
Environment:
KURED_NODE_ID: (v1:spec.nodeName)
Mounts:
/var/run from hostrun (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9s5f (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
hostrun:
Type: HostPath (bare host directory volume)
Path: /var/run
HostPathType:
default-token-j9s5f:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j9s5f
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
Normal Pulled 40m (x23 over 22h) kubelet, aks-qube-27091152-0 Container image "quay.io/weaveworks/kured:1.1.0" already present on machine
Normal Created 40m (x23 over 22h) kubelet, aks-qube-27091152-0 Created container
Normal Started 40m (x23 over 22h) kubelet, aks-qube-27091152-0 Started container
Logs
time="2018-11-08T14:11:05Z" level=info msg="Commanding reboot"
time="2018-11-08T14:11:05Z" level=warning msg="nsenter: can't execute '/bin/systemctl': No such file or directory" cmd=/usr/bin/nsenter std=err
time="2018-11-08T14:11:05Z" level=fatal msg="Error invoking reboot command: exit status 127"
Hi,
I've just opened a PR - helm/charts#6470 - to add a helm chart for kured, with a view to making kured easy to helm-install.
I thought it'd be sensible for me to create an issue here for two reasons:
I have noticed a bunch of pull requests fixing multiple issues that have not been merged into the project for a while. Is it possible to have better repo maintenance? Thanks!
Hello there,
As a possible feature, can you add an option so that reboots can be constrained to a specific time window?
This way I would be able to have the reboots, let's say, happening only at night time.
Thank you
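A sketch of what such a time-window option could look like as DaemonSet args; the flag names below are purely illustrative and did not exist in kured at the time of this request:
args:
- --period=1h
- --reboot-window-start=22:00   # hypothetical: earliest time a reboot may begin
- --reboot-window-end=06:00     # hypothetical: latest time a reboot may begin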
kubectl drain should not, in theory, kill the kured pod, as it's a DaemonSet; we rely on this behaviour, because after the drain operation is complete we need to command the reboot. We have, however, experienced the kured pod being killed during drain when the embedded version of kubectl is too different to the server (specifically, with kubectl 1.7.x against server 1.9.x), resulting in a never-ending cycle of lock/drain/restart/unlock without the reboot actually occurring.
Possible fixes: a terminationGracePeriodSeconds long enough that we can complete the reboot (problem: how long is long enough?).
In case of multiple nodes asking to restart, pods will be moved more often than needed, because they can start on a node that will be rebooted next.
Using a PreferNoSchedule taint on nodes waiting to be rebooted would prefer scheduling pods onto already-rebooted nodes.
link: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
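For illustration, the taint kured could place on a node while it waits for the reboot lock might look like this on the Node spec (the taint key is an assumption made up for this sketch):
spec:
  taints:
  - key: weave.works/kured-reboot-pending   # hypothetical key applied while the node awaits its reboot
    effect: PreferNoSchedule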
Feature request: Please provide images for other architectures (more than amd64) as part of your release.
kured needs to run on every node, and for my cluster that means amd64, arm, and arm64. I can build those myself, but it would be better if other similar users could also benefit from some centralised effort to support other archs. Thanks :)
Just a quick note to let you know that I created a daemon to reboot nodes after some uptime.
https://github.com/barpilot/kured-toujours
It's really simple and alpha for now, but it can be useful for some people.
It's mainly used to periodically reboot every node and avoid some deadlocks with Docker, mounts, etc., with the power of kured.
The current locking approach requires the kured job to have the RBAC privilege to update its own DaemonSet. This means a compromised kured job could redefine itself to (e.g.) further elevate host access (*).
Better (from an RBAC pov) would be to use a "harmless" resource type to store the lock annotation. A ConfigMap (empty/dedicated for this purpose) would be ideal.
(*) I acknowledge that kured also needs other scary privileges (patch nodes) in order to conduct a drain/uncordon, and these won't be improved by this suggestion. Small steps.
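If the lock moved to a dedicated ConfigMap, the lock-related RBAC could shrink to something like the following sketch (the ConfigMap name is an assumption):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kured-lock
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["kured-lock"]   # hypothetical dedicated lock object
  verbs: ["get", "update"]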
Currently, for a cluster running Istio, node draining can be blocked by Kubernetes due to the PodDisruptionBudget policy configuration. When this occurs, kured is blocked until the lock times out.
Kured should ignore this error and reboot the node anyway.
time="2019-07-24T14:28:47Z" level=warning msg="error when evicting pod \"istio-galley-75466f5dc7-cz7sd\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget." cmd=/usr/bin/kubectl std=err time="2019-07-24T14:28:52Z" level=warning msg="error when evicting pod \"istio-telemetry-66fbcd998b-ts27t\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget." cmd=/usr/bin/kubectl std=err time="2019-07-24T14:28:52Z" level=warning msg="error when evicting pod \"istio-sidecar-injector-6f4c67c6cd-62clr\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget." cmd=/usr/bin/kubectl std=err
It may be operationally interesting to have a finer-grained lock than the global lock on the DaemonSet, in case a specific node needs to stay up for one reason or another.
Since this change in Docker, the use of docker login needs to be changed from
docker login -u $USER -p $PASS
to
echo $PASS | docker login -u $USER --password-stdin
(Since TravisCI attaches a terminal it will block forever. @stefanprodan brought this up.)
I know there is integration with Prometheus to hold off reboots while alerts are firing. Is there any way to get the same functionality with Sysdig?
This morning I'm getting this error when trying to pull the kured docker image.
sudo docker pull quay.io/weaveworks/kured:master-5731b98
Error response from daemon: Get https://quay.io/v2/weaveworks/kured/manifests/master-5731b98: unknown: Namespace weaveworks has been disabled. Please contact a system administrator.
See https://quay.io/repository/weaveworks/kured/manifest/sha256:bee4241779a29a7f7f6a8e5de7a8a5d4042317236ab8ee6d60d50832ad9e55ed?tab=vulnerabilities for quay.io's automated assessment of the most recent public build (oddly-tagged, but someone has already posted that issue: #33)
Might some of these vulnerabilities be helped by using newer versions of the dependencies listed in https://github.com/weaveworks/kured/blob/master/Gopkg.toml ?
The use of a file such as /var/run/reboot-required is specific to Ubuntu-family distros; such a flag does not get set for RHEL/CentOS distros using Yum/RPM for package management. In those environments, the yum-utils package is installed, and then the needs-restarting -r command can be run to detect the need for a reboot: its exit code is 1 when a reboot is needed.
I suggest adding an optional argument to kured that provides a command to be executed. If the command exists and its exit code is non-zero, then trigger a reboot the same as if the sentinel file existed. The default value for the option, if provided, would be needs-restarting -r.
Reference: How can I check from the command line if a reboot is required on rhel or centos
An alternative approach would be to include a short script along the lines of the following that runs when installed on a RHEL/CentOS system (grouped so the sentinel is only touched when needs-restarting is present and reports a reboot is needed):
[[ -x /bin/needs-restarting ]] && { needs-restarting -r >/dev/null || touch /var/run/reboot-required; }
Hello,
I was wondering if there is a way to specify Prometheus SSL certificates along with prometheus-url. Right now, kured is encountering the following error when trying to contact Prometheus:
kured-2wksk kured time="2018-12-27T09:04:02Z" level=warning msg="Reboot blocked: prometheus query error: Get https://monitoring-prometheus:9090/api/v1/query?query=ALERTS&time=2018-12-27T09%3A04%3A02.438586828Z: x509: certificate signed by unknown authority"
It would be great to be able to define a timeout for pod eviction.
If drain did not succeed within the timeout, kured would stop trying to evict pods and release the lock.
This would be handy when pod disruption budgets do not allow eviction.
Regards.
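A sketch of how such a timeout could be exposed as a DaemonSet arg; the flag name below is illustrative only:
args:
- --period=1h
- --drain-timeout=10m   # proposed: stop evicting pods and release the lock after 10 minutes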
In our project we stop our VMs overnight (9 PM to 7 AM) to save money. This configuration applies to the VMs of our AKS clusters as well.
Last Friday at 4/5 AM we got a message in our Slack channel that kured had rebooted the cluster nodes. That overlaps with the time frame when the nodes shouldn't be up. I checked in the Azure Portal and the VMs had been stopped successfully the day before. They were down when the reboot happened.
Does anyone have any idea what might have happened?
AKS-Version: 1.13.5
Helm Chart Version: 1.3.1
Kured version: 1.2.0
I'm using kured on Azure with an ACS Engine generated cluster, and I can see that nodes are being drained and refilled, but it looks like they are not being rebooted.
For example, a reboot-required file was set at 23:43 on April 13th for node k8s-agents-27478824-4:
$ ls -al
...
-rw-r--r-- 1 root root 0 Apr 13 23:43 reboot-required
...
And I see Kured triggering: draining and refilling nodes with pods:
$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
cassandra cassandra-cassandra-0 1/1 Running 1 3d 10.30.0.37 k8s-agents-27478824-3
cassandra cassandra-cassandra-1 0/1 Pending 0 6s
...
Sadly, this seems to happen EVERY hour without fail. Digging into this it looks like this is because the nodes are actually not being rebooted:
$ last reboot
reboot system boot 4.13.0-1011-azur Fri Apr 13 23:17 still running
reboot system boot 4.13.0-1011-azur Sun Apr 8 19:21 still running
(Note that the last reboot time is before the timestamp of the reboot-required)
Is there something I need to do with Kured in order to tell it how to reboot nodes etc.? Or is this a bug?
Running K8s 1.11.2 on Azure Kubernetes Service and getting the errors below with RBAC applied (across all pods in the daemonset), using the YAML manifests from the master branch of the kured repo.
time="2018-10-19T13:13:20Z" level=info msg="Kubernetes Reboot Daemon: master-b86c60f"
time="2018-10-19T13:13:20Z" level=info msg="Node ID: aks-node"
time="2018-10-19T13:13:20Z" level=info msg="Lock Annotation: kured/kube-system:weave.works/kured-node-lock"
time="2018-10-19T13:13:20Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 24h0m0s"
time="2018-10-19T13:13:20Z" level=fatal msg="Error testing lock: daemonsets.extensions \"kube-system\" not found"
kubectl get clusterrole kured
NAME AGE
kured 37m
kubectl get clusterrolebinding kured
NAME AGE
kured 38m
kubectl get role kured -n kube-system
NAME AGE
kured 41m
kubectl get rolebinding kured -n kube-system
NAME AGE
kured 42m
kured is now referenced via https://docs.microsoft.com/en-us/azure/aks/concepts-security#node-security, so it would be good to get it working on AKS :)
An interesting feature for me would be the ability to shut down the node instead of rebooting it. Running on AWS, I would like to let the ASG replace my instance once it has been shut down.
Hi,
I would like to use Microsoft Teams instead of Slack for drain/reboot notifications. The implementation would be quite similar, as it also uses a simple webhook with a custom JSON structure.
Would you accept a PR for that?
thanks
Just in case someone tries the route of running kured on RancherOS and gets these errors:
time="2019-03-15T14:42:22Z" level=warning msg="nsenter: can't execute '/usr/bin/test': No such file or directory" cmd=/usr/bin/nsenter std=err
time="2019-03-15T14:42:52Z" level=warning msg="nsenter: can't execute '/usr/bin/test': No such file or directory" cmd=/usr/bin/nsenter std=err
time="2019-03-15T14:42:52Z" level=info msg="Reboot not required"
time="2019-03-15T14:43:22Z" level=warning msg="nsenter: can't execute '/usr/bin/test': No such file or directory" cmd=/usr/bin/nsenter std=err
time="2019-03-15T14:43:52Z" level=warning msg="nsenter: can't execute '/usr/bin/test': No such file or directory" cmd=/usr/bin/nsenter std=err
time="2019-03-15T14:43:52Z" level=info msg="Reboot not required"
time="2019-03-15T14:44:22Z" level=warning msg="nsenter: can't execute '/usr/bin/test': No such file or directory" cmd=/usr/bin/nsenter std=err
This is because RancherOS has system-dockerd as its PID 1, which does not have much inside and does not have the sentinel file (it would be in the os-console system container):
1 root system-dockerd --restart=false --log-opt max-file=2 --log-opt max-size=25m --pidfile /var/run/system-docker.pid --userland-proxy=false --bip 172.18.42.1/16 --config-file /etc/docker/system-docker.json --exec-root /var/run/system-docker --group root --host unix:///var/run/system-docker.sock --graph /var/lib/system-docker --storage-driver overlay2
It would be nice if kured supported an annotation on the node to have it rebooted (I know there is one to prevent a reboot, but none to force one; only the sentinel file).
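For illustration, a node carrying such a force-reboot annotation might look like this (the annotation key is hypothetical, mirroring the existing weave.works/kured-node-lock naming):
apiVersion: v1
kind: Node
metadata:
  name: app03.lan.davidkarlsen.com
  annotations:
    weave.works/kured-force-reboot: "true"   # hypothetical: treated like the sentinel file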
There are times when we want to reboot nodes but an alert is blocking the reboot, and we don't want to have to restart kured just to add that alert to the filter regex. If alerts are fetched via Prometheus Alertmanager, the alert can be silenced in Alertmanager and the reboot will occur.
Pull request #42 adds this feature
Let me know what else is needed to accept this PR
Thanks
Proposal: allow passing an Icinga URL and host/service for alerts. Kured would query the Icinga API, and if there is an active alert on the specified host/service, the node will not be rebooted.
Useful for users (like us) who use Icinga/Nagios for monitoring.
The Docker image tags available at https://quay.io/repository/weaveworks/kured?tab=tags are challenging: based on the tag name or information given, I can hardly identify what version is running. This is especially difficult when upgrading a Kubernetes cluster to a new version; in that case I would like to update kured to a newer version as well, preferably one matching the respective kubectl version. Versioned Docker tags would help identify the right version of kured to deploy.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5", GitCommit:"cce11c6a185279d037023e02ac5249e14daa22bf", GitTreeState:"clean", BuildDate:"2017-12-07T16:05:18Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl -n kube-system annotate ds kured weave.works/kured-node-lock='{"nodeID":"manual"}'
The DaemonSet "kured" is invalid: metadata.annotations: Invalid value: "weave.works~1kured-node-lock": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')
This would facilitate a fail-safe RebootRequired alert that would warn us if the reboot daemon is failing.
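A minimal sketch of such an alerting rule, assuming a kured_reboot_required gauge that is 1 while a node's sentinel file is present (the metric name is an assumption in this sketch):
groups:
- name: kured
  rules:
  - alert: RebootRequired
    expr: max(kured_reboot_required) != 0
    for: 24h
    labels:
      severity: warning
    annotations:
      summary: A node has needed a reboot for over 24 hours; the reboot daemon may be failing.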