
Comments (5)

freeekanayaka commented on May 24, 2024

I can't reproduce this. I've added a unit test in canonical/go-dqlite#67 which performs exactly the steps you listed, and it seems to work fine.

I've also tried with the dqlite-demo and it also works:

free@x1:~$ rm -rf /tmp/dqlite-demo
free@x1:~$ dqlite-demo start 1 &
[1] 54920
free@x1:~$ dqlite-demo start 2 &
[2] 54929
free@x1:~$ dqlite-demo start 3 &
[3] 54938
free@x1:~$ dqlite-demo add 2
free@x1:~$ dqlite-demo add 3
free@x1:~$ dqlite-demo update foo bar
done
free@x1:~$ kill -TERM %1
free@x1:~$ kill -TERM %2
[1]   Done                    dqlite-demo start 1
free@x1:~$ kill -TERM %3
[2]-  Done                    dqlite-demo start 2
free@x1:~$ dqlite-demo start 2 &
[4] 54994
[3]   Done                    dqlite-demo start 3
free@x1:~$ dqlite-demo start 3 &
[5] 55011
free@x1:~$ dqlite-demo update foo egg
done
free@x1:~$ dqlite-demo query foo
bar

This is with all libs built from master, but it shouldn't make a difference.

If you are using the dqlite-demo tool as test backend (since I remember you were doing that at some point), are you sure you correctly added the second and third node to the cluster? As in:

dqlite-demo add 2
dqlite-demo add 3

from dqlite.

ibuildthecloud commented on May 24, 2024

I haven't tested with dqlite-demo, this is based on testing with dqlite in k3s. As long as I know this should work I'll keep debugging. I'm sure it's some issue on my side.

The thing that is tricky with running k8s on dqlite is that there is a lot of health check and leader election code in k8s. So when I kill the dqlite leader, the followers seem to always commit suicide because the leader election heartbeat fails. So losing the leader causes all servers to restart. After restarting, the two surviving followers never seem to elect a leader.

If I kill a non leader everything works as expected because there is no significant interruption in the ability to perform a write.


freeekanayaka commented on May 24, 2024

I'm confused by your description.

I assume that with "leader election in k8s" you mean the logic to elect a leader for components such as the controller-manager, right? That leader election logic should be completely oblivious of what happens in etcd or kine/dqlite (don't know all the details of the latter, but I will treat it as semantically 100% equivalent to etcd): it just uses an API that the storage backend exposes. The storage backend is supposed to be HA, so whatever happens to its leader, it should not affect leader election in k8s: a new leader will be elected in the storage backend and the k8s code using those leader election APIs should retry whatever it was attempting to do (grab a lock/lease or whatever).

In dqlite followers don't commit suicide, they just start a new election if they don't hear back from the leader within a timeout. So I assume that when you write "followers seem to always commit suicide" you mean k8s followers (e.g. a non-leader controller-manager). It'd be interesting to understand under which circumstances they decide to shut down the process: as long as you don't kill a majority of dqlite nodes, whatever dqlite request they were (indirectly) performing can be retried and will eventually succeed.


ibuildthecloud commented on May 24, 2024

@freeekanayaka Sorry, that last statement was confusing, but I think you understood it correctly. There is leader election in k8s that is done by just writing a storage heartbeat. If dqlite loses quorum, writes fail and then k8s leader election dies. In general, if a leader fails to write its heartbeat, the process dies.

But I figured out what is going on and it's completely different than I thought. Sorry for all the confusion, I'm just testing a lot of abusive failure and recovery scenarios.

What has happened is that I have nodes A, B, and C. All three are running, but what I didn't realize is that if I connect to each node, the members reported are as follows:

A => [A, B, C]
B => [A, B, C]
C => []

So C is not actually in the cluster. What I was doing was testing killing nodes: A was the leader, and I assumed C was not. If I killed C, things survived. If I killed A, the cluster went down. Obviously killing A was losing quorum.

My question now is why C isn't in the cluster, and how I can detect a situation where one member is not replicating the log. At one point C was working fine, but at some point I obviously broke it. I've been deleting, adding, and killing nodes in fairly chaotic ways, and also breaking quorum quite a bit. If I look at the data dir for C, I just have metadata1 and metadata2.


freeekanayaka commented on May 24, 2024

> @freeekanayaka Sorry, that last statement was confusing, but I think you understood it correctly. There is leader election in k8s that is done by just writing a storage heartbeat. If dqlite loses quorum, writes fail and then k8s leader election dies. In general, if a leader fails to write its heartbeat, the process dies.

Yes, the write fails and you get back an error from whatever db.Exec() or tx.Exec() call you invoked. At that point though I'd expect either kine or k8s to notice that this error is driver.ErrBadConn and retry. Assuming that a majority of dqlite nodes is still online, this failed write will only be a short glitch: in a matter of a few milliseconds (the exact amount depends mostly on the value of the dqlite.WithNetworkLatency() option passed to dqlite.New()) a new leader will be elected and a retry will succeed. I presume the k8s component at hand doesn't immediately commit suicide, but only after a certain number of retries?

> But I figured out what is going on and it's completely different than I thought. Sorry for all the confusion, I'm just testing a lot of abusive failure and recovery scenarios.

> What has happened is that I have nodes A, B, and C. All three are running, but what I didn't realize is that if I connect to each node, the members reported are as follows:
>
>     A => [A, B, C]
>     B => [A, B, C]
>     C => []
>
> So C is not actually in the cluster. What I was doing was testing killing nodes: A was the leader, and I assumed C was not. If I killed C, things survived. If I killed A, the cluster went down. Obviously killing A was losing quorum.

Right, so this is essentially what I suspected in my first comment on this issue: one of the nodes was not actually part of the cluster.

> My question now is why C isn't in the cluster, and how I can detect a situation where one member is not replicating the log.

Well, there's nothing to detect. From the moment your call to client.Add() returns successfully, you can be sure that the node that you've just added will be part of the cluster and replicate data or act as leader. From the moment you call client.Remove() that node won't be part of the cluster. Your application is in control of membership, so in that sense there's nothing to detect.

> At one point C was working fine, but at some point I obviously broke it. I've been deleting, adding, and killing nodes in fairly chaotic ways, and also breaking quorum quite a bit. If I look at the data dir for C, I just have metadata1 and metadata2.

If C was ever part of the cluster (i.e. its ID and address were registered with client.Add()) and there was a time when both C and the current leader were alive at the same time, then C would have received data. If it only has those two files, it means either it was never part of the cluster or its data directory was wiped.

from dqlite.
