Comments (16)
Aha. I had a cluster of two nodes. I had flannel, CoreDNS, NGINX Ingress controller and a simple whoami service installed. Two replicas.
I had this command active while running the upgrade:
while true; do curl 'http://whoami.anton-johansson.local'; sleep 0.1; done
from kubernetes-the-right-way.
Manifest for the whoami service:
---
kind: Namespace
apiVersion: v1
metadata:
  name: whoami
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: whoami
  namespace: whoami
  labels:
    app.kubernetes.io/name: whoami
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: whoami
  template:
    metadata:
      labels:
        app.kubernetes.io/name: whoami
    spec:
      securityContext:
        runAsUser: 1000
      containers:
        - name: whoami
          image: containous/whoami:v1.0.1
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          args:
            - '-port'
            - '12345'
---
kind: Service
apiVersion: v1
metadata:
  name: whoami
  namespace: whoami
  labels:
    app.kubernetes.io/name: whoami
spec:
  selector:
    app.kubernetes.io/name: whoami
  ports:
    - port: 8080
      targetPort: 12345
      protocol: TCP
---
kind: Ingress
apiVersion: networking.k8s.io/v1beta1
metadata:
  name: whoami
  namespace: whoami
  labels:
    app.kubernetes.io/name: whoami
  annotations:
    kubernetes.io/ingress.class: external
spec:
  rules:
    - host: whoami.anton-johansson.local
      http:
        paths:
          - path: /
            backend:
              serviceName: whoami
              servicePort: 8080
Oh, that explains why I thought I was going crazy then. Nice find!
I've decided to perform the upgrade on each worker node individually, where I can perform pre- and post-maintenance actions, like so:
# Upgrade all masters, in an orderly fashion
$ ansible-playbook --inventory my-inventory --extra-vars "serial_all=1" --limit "etcd,masters" ~/projects/kubernetes-the-right-way/install.yml
# Upgrade each worker individually
$ kubectl cordon k8s-node-1
$ ./do-something-to-failover-metallb k8s-node-1
$ ansible-playbook --inventory my-inventory --limit "k8s-node-1" ~/projects/kubernetes-the-right-way/install.yml
$ kubectl uncordon k8s-node-1
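In principle, those manual pre- and post-steps could also live inside a small wrapper play instead of being run by hand. A rough sketch; the `upgrade` role name and the `do-something-to-failover-metallb` script are placeholders, not actual features of the repo:

```yaml
# Hypothetical wrapper play: cordon and fail over before the upgrade
# roles run on a node, uncordon afterwards. serial: 1 processes one
# node at a time.
- hosts: nodes
  serial: 1
  pre_tasks:
    - name: Cordon the node before upgrading
      command: kubectl cordon {{ inventory_hostname }}
      delegate_to: localhost
    - name: Fail over MetalLB away from this node (hypothetical script)
      command: ./do-something-to-failover-metallb {{ inventory_hostname }}
      delegate_to: localhost
  roles:
    - upgrade   # placeholder for the actual install/upgrade roles
  post_tasks:
    - name: Uncordon the node after upgrading
      command: kubectl uncordon {{ inventory_hostname }}
      delegate_to: localhost
```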
Here's an example from the pause Ansible module:
# Pause for 5 minutes to build app cache.
- pause:
    minutes: 5

# Pause until you can verify updates to an application were successful.
- pause:

# A helpful reminder of what to look out for post-update.
- pause:
    prompt: "Make sure org.foo.FooOverload exception is not present"

# Pause to get some sensitive input.
- pause:
    prompt: "Enter a secret"
    echo: no
I'm not sure how well this would deal with serial_all though.
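For what it's worth, a play that runs with serial: 1 executes all of its tasks once per batch, so a pause inside it should fire once per node, which is roughly the behaviour wanted here. A minimal sketch (hosts and task layout are made up for illustration):

```yaml
# Hypothetical play: with serial: 1, the pause prompts before
# each node's upgrade tasks, one node at a time.
- hosts: nodes
  serial: 1
  tasks:
    - pause:
        prompt: "About to upgrade {{ inventory_hostname }}. Fail over MetalLB now, then press enter"
    # ...the actual upgrade tasks for this node would follow here...
```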
I haven't been able to replicate the problem. I even tried stopping containerd completely on two nodes: one running an NGINX ingress controller, and the other a simple web app. I could still access the web app. However, I did run into problems when restarting kubelet, which seems to cause downtime. I think this is due to how service networking is provisioned through the kubelets. In either case, containers (or processes) are not restarted when kubelet or containerd is restarted. Pod connectivity is affected, unfortunately.
Hmm, you're probably right. But it's probably not kubelet. kube-proxy is more likely, as that's the component handling the network routing.
Strange that you cannot reproduce, though... How did you test your pods during the upgrade process?
That makes sense. I didn't do an upgrade, just took down containerd and kubelet. I'll do an upgrade and see what happens. Can you share any particular curl command to run during the upgrade? I would appreciate it.
Wait wait wait wait... I have already had this up for discussion before (#48) and you fixed it by restarting services in handlers and waiting for services to come back up (#52).
This issue is related to something else. I'm using MetalLB (layer2 mode) to assign external IP addresses for my services. MetalLB layer2 mode isn't active/active, so only one destination is being routed to at any given time. When I stop kube-proxy on that node, it takes about a second for MetalLB to redirect traffic to another node that has the target pod, causing downtime during that second.
I think it's safe to close this issue; however, I'm gonna have to research a way to gracefully handle these issues, by telling MetalLB that a node is going into maintenance.
Do you have any ideas or suggestions on the subject?
EDIT:
I found metallb/metallb#494, which seems to describe the issue and a possible solution. It requires quite a bit more from the upgrade process, though...
EDIT 2:
Now I'm thoroughly confused. When upgrading a cluster today at work, I realized that MetalLB was the cause of the downtime (as described above). When I ran it at home a couple of days ago, I did not have MetalLB and still experienced issues. That I cannot explain. I'll see if it happens next time, when doing the same for v1.16.7.
@amimof Since we use MetalLB and it gives us a short period of downtime, I need a way of handling this when running the playbook.
Alternative 1:
Simply stop and await user input before starting to upgrade each node. This allows me to manually and gracefully force MetalLB to announce IP addresses on another host before proceeding. This is the easy alternative, and could easily be made configurable (don't await user input by default).
Alternative 2:
Allow hooking into the playbook somehow: run a certain command before the Ansible roles of the node are executed, and also allow running a command after the Ansible roles of the node. This would allow me to automatically force MetalLB to switch host. However, I'm not sure how this would be done. Maybe two new roles, pre and post, which can somehow be configured with the inventory file?
Do you have any thoughts?
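Alternative 2 could perhaps be sketched as tiny roles that just shell out to commands defined in the inventory. All names here (pre_hook, post_hook, the role path) are made up for illustration; nothing like this exists in the repo today:

```yaml
# roles/pre/tasks/main.yml (hypothetical): run an operator-supplied
# command before this node's upgrade roles execute.
- name: Run pre-upgrade hook if one is configured
  command: "{{ pre_hook }} {{ inventory_hostname }}"
  delegate_to: localhost
  when: pre_hook is defined

# In the inventory, per group or host, something like:
# [nodes:vars]
# pre_hook=./do-something-to-failover-metallb
# post_hook=./verify-node-back-in-rotation
```

A matching post role would do the same with post_hook after the node's roles have run.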
I've tried reproducing my problem by shutting down kube-proxy, and even kubelet and containerd as well, but my services just seem to work anyway. At first, I thought they would suffer a small downtime when MetalLB switched over announcement to another node, but it just doesn't happen like it did when actually upgrading the Kubernetes version.
I think I need to dig a bit more into MetalLB to understand how everything works before proceeding here. I'll also see if it happens when upgrading our cluster to 1.16.7.
So I have been doing some testing and I managed to reproduce the problem, but only when upgrading, which led me to believe that it has something to do with how Kubernetes handles versions of various components. I upgraded a 1.15.10 cluster to 1.16.7.
Then I discovered that if kubelet is restarted with a version that is newer (or older) than the one before it, it causes the pods on that node to restart. But if kubelet is restarted without replacing the binary, nothing happens. At this stage I have no idea why this happens.
@amimof Any thoughts on how I can perform this in a "handled" way? A simple pause (wait for input) would suffice; then I could manually fail over my MetalLB IP address.
An automatic hook would be ideal, but I'm not sure how that would be built.
I can't seem to find any solution to this, unfortunately. Would it be possible to do your MetalLB failovers before the upgrade, and only run the upgrade on parts of the cluster at a time? A pause function is basically an anti-pattern, automation-wise.
Yeah, I don't see any good solution for a fully automated process here. I have some ideas for a tool that I can use to fail over (cordon the node being upgraded, delete the pod on the node if it's currently the MetalLB "master", and uncordon). MetalLB will then announce from another node and I'm free to run the upgrade.
One idea would be to allow running an executable as a pre-step before each node is upgraded? Then you can do whatever you want: sleep, ask for input, force a failover, or whatever you like.
Again, not sure how this works with anything other than serial_all=1, but still.
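One reading of that cordon/delete/uncordon idea, expressed as Ansible tasks. The metallb-system namespace and the app=metallb,component=speaker labels match MetalLB's stock manifests but may differ per installation, and whether cordoning alone moves the announcement depends on the MetalLB version, so treat this as a sketch:

```yaml
- name: Cordon the node being upgraded
  command: kubectl cordon {{ inventory_hostname }}
  delegate_to: localhost

- name: Delete the MetalLB speaker pod on this node to force a failover
  command: >
    kubectl -n metallb-system delete pod
    -l app=metallb,component=speaker
    --field-selector spec.nodeName={{ inventory_hostname }}
  delegate_to: localhost

- name: Uncordon the node again
  command: kubectl uncordon {{ inventory_hostname }}
  delegate_to: localhost
```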
I guess you could define two node groups in your inventory and run the upgrade twice. One for each group. And you could do the cordon/uncordon in between each run.
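That two-group approach might look something like this with a YAML inventory; the group and host names here are made up:

```yaml
# inventory.yml (hypothetical): split the workers so one half can be
# upgraded while MetalLB announces from the other half.
all:
  children:
    nodes_a:
      hosts:
        k8s-node-1:
    nodes_b:
      hosts:
        k8s-node-2:

# Then run the upgrade twice, handling the failover in between:
#   ansible-playbook --inventory inventory.yml --limit "nodes_a" install.yml
#   (cordon / fail over / verify, then uncordon)
#   ansible-playbook --inventory inventory.yml --limit "nodes_b" install.yml
```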