Comments (11)
Sorry, there isn't a lot of info in this issue. I think I'm asking if there is somewhere I should look to troubleshoot a bit further. If I can't make any progress I'll try a small use case to reproduce. Right now I'm running this in the full context of k3s.
from dqlite.
I figured out that
Line 652 in b7c20af
from dqlite.
Yeah, error propagation between dqlite and raft is currently sub-optimal. I'm working to improve the situation and provide better diagnostic messages for situations like this.
Hope that hacking around the code helps you to spot what's wrong, otherwise let me know.
from dqlite.
On line https://github.com/canonical/raft/blob/25805d9a7c73c4e401e87ff7df67e50efa5db03a/src/raft.c#L136
recovery checks that the node is in the unavailable state. In my situation the node is in state RAFT_FOLLOWER, so the recovery fails.
The full scenario is this.
- Start two nodes.
- Kill node two.
- Node one (because of my application) will kill itself because quorum is lost
- All nodes are shut off
- Start node one in "recovery" mode.
- Node.Recover() fails because node is in FOLLOWER state.
All I'm doing effectively in the code is
node, err := dqlite.New(id, advertiseAddress, dbDir,
	dqlite.WithBindAddress(bindAddress),
	dqlite.WithDialFunc(dial),
	dqlite.WithNetworkLatency(20*time.Millisecond))
node.Start()
node.Recover(singleNode)
from dqlite.
So right now, with the "UNAVAILABLE" check removed from libraft, everything works as expected.
from dqlite.
Okay, the intended way to use Node.Recover() is before Node.Start(); see the docstring of the low-level dqlite_node_recover() C API:
* 1. Make sure no dqlite node in the cluster is running.
(which translates to "make sure that Node.Start() wasn't called yet").
I think there is some work to do both in improving documentation and reporting better errors.
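The ordering rule described above can be illustrated with a toy model. This is only a sketch: `toyNode`, `nodeState`, and `errBusy` are hypothetical names invented for the example and are not part of dqlite or raft. It assumes nothing beyond what the thread states, namely that recovery is refused once the node has left the unavailable state.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeState is a toy stand-in for the raft states mentioned in the thread.
type nodeState int

const (
	stateUnavailable nodeState = iota // node created but not started
	stateFollower                     // node started and participating
)

// errBusy is a hypothetical error for this sketch, not a dqlite error value.
var errBusy = errors.New("recover: node is already running")

// toyNode models only the ordering rule; it is not the dqlite Node type.
type toyNode struct {
	state   nodeState
	members []uint64
}

// Start moves the node out of the unavailable state.
func (n *toyNode) Start() { n.state = stateFollower }

// Recover rewrites the membership, but only while the node is not running.
func (n *toyNode) Recover(members []uint64) error {
	if n.state != stateUnavailable {
		return errBusy
	}
	n.members = append([]uint64(nil), members...)
	return nil
}

func main() {
	// Wrong order: Start() first, so Recover() is refused.
	wrong := &toyNode{}
	wrong.Start()
	fmt.Println("start then recover:", wrong.Recover([]uint64{1}))

	// Intended order: Recover() first, then Start().
	right := &toyNode{}
	err := right.Recover([]uint64{1})
	right.Start()
	fmt.Println("recover then start:", err)
}
```

In this model the first call sequence returns the error and the second returns nil, which mirrors the failure reported earlier in the thread when Node.Recover() was called after Node.Start().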
from dqlite.
Note that removing the "UNAVAILABLE" check would make the API unsafe. It might turn out to work in your specific case only by chance.
from dqlite.
@freeekanayaka I tried not starting the node and I got a connection error. I think maybe I was doing something wrong. I'll try again. Thanks.
from dqlite.
@freeekanayaka Thanks, I found the issue. Prior to doing the reset I was contacting the node to find the surviving member's ID. Due to my lack of understanding of IDs, I didn't want to assume on the client side that I had the right ID for the surviving node. I'm guessing I don't have to do that? In my use case I only support resetting the cluster to one member. In that situation I assume I can just hard code the ID to 1? Actually, is ID 1 required?
from dqlite.
@freeekanayaka Thanks, I found the issue. Prior to doing the reset I was contacting the node to find the surviving member's ID.
I'm not sure I understand what you mean here.
Due to my lack of understanding of IDs I didn't want to assume on the client side I had the right ID for the surviving node. I'm guessing I don't have to do that?
I'm not clear about this either.
In my use case I only support resetting the cluster to one member. In that situation I assume I can just hard code the ID to 1? Actually is ID 1 required?
The use case of resetting the cluster to one member should indeed be one of the most frequent, and should also be pretty straightforward to handle. As long as you:
- Use 1 for the id parameter of dqlite.New()
- Use 1 for the ID attribute of the single dqlite.NodeInfo object that you pass to Node.Recover()
you should be good, regardless of what ID the node actually had before.
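A minimal sketch of the single-member reset described above, written against a small stand-in interface so that it runs without the dqlite library. The names `recoverer`, `nodeInfo`, `resetToSingleNode`, `recordingNode`, and the address `127.0.0.1:9001` are assumptions made for illustration; with the real library the node would come from dqlite.New() with an id of 1, and the member list would be a slice of dqlite.NodeInfo.

```go
package main

import "fmt"

// nodeInfo mirrors only the two fields used here; it stands in for
// dqlite.NodeInfo and is not the real type.
type nodeInfo struct {
	ID      uint64
	Address string
}

// recoverer is a hypothetical interface covering the two calls discussed
// in the thread, in the shape this sketch assumes.
type recoverer interface {
	Recover(cluster []nodeInfo) error
	Start() error
}

// singleNodeID is the ID suggested in the thread for a one-member reset.
const singleNodeID = 1

// resetToSingleNode forces a one-member configuration first, then starts
// the node, following the order recommended earlier in the thread.
func resetToSingleNode(n recoverer, address string) error {
	cluster := []nodeInfo{{ID: singleNodeID, Address: address}}
	if err := n.Recover(cluster); err != nil {
		return fmt.Errorf("recover: %w", err)
	}
	if err := n.Start(); err != nil {
		return fmt.Errorf("start: %w", err)
	}
	return nil
}

// recordingNode is a fake that records the order of calls it receives.
type recordingNode struct {
	calls   []string
	cluster []nodeInfo
}

func (r *recordingNode) Recover(c []nodeInfo) error {
	r.calls = append(r.calls, "Recover")
	r.cluster = c
	return nil
}

func (r *recordingNode) Start() error {
	r.calls = append(r.calls, "Start")
	return nil
}

func main() {
	n := &recordingNode{}
	if err := resetToSingleNode(n, "127.0.0.1:9001"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n.calls, n.cluster[0].ID)
}
```

Keeping the ID in one constant means the value passed when creating the node and the value in the member list cannot drift apart, which is the only requirement stated above.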
from dqlite.
Thanks, everything is working now.
from dqlite.