To implement a custom I/O backend ( struct raft_io ), a

IOBE: no procedure provided to prepare the first (configuration) log entry (raft issue, closed)

canonical commented on August 23, 2024

IOBE: no procedure provided to prepare the first (configuration) log entry

from raft.

Comments (6)

freeekanayaka commented on August 23, 2024

@amiloradovsky you're right.

The reason I didn't yet add encode/decode functions to the public API is that I'm not entirely sure if the encoding should instead be customizable.

However, since the public API of this raft library has not yet reached a stable 1.0 version, I think it's fine to expose the function you need in order to get you unblocked, with the understanding that interfaces might change between now and the final 1.0 version.

I think I asked you already, but I didn't get a reply: would you mind elaborating on why you need to implement a custom raft_io backend? That's one of the hardest parts of the library and a lot of work to get right. I'd like to understand what use cases are currently not met by the stock backend based on libuv.

amiloradovsky commented on August 23, 2024

@freeekanayaka Ah, yes, sorry for the silence. There are a lot of reasons, small and big:

* The libuv-based IOBE seems flakier than the rest of the library: GCC and Valgrind warnings point there; when the application crashes, it seems to be due to uninitialized values in there (hard to follow).
* Also, I need to use (C)ZMQ in the other parts of the system, so it would be nice to be able to just drop libuv as a (relatively big) dependency.
* The libuv IOBE persistent storage needs to be cleared (`rm <dir>/*`) before each run, or it won't bootstrap.
* The burden of aligning the message size is put on the user (me) and not handled automatically somewhere under the hood.
* It doesn't support IPv6, or any protocols other than TCP/IPv4.
* I'm not interested in either Windows or the mobile OSes, and would prefer less complexity.
* Also, libuv is distributed under a soup of BSD-like licenses; I'd prefer something more standard, like ASL or MPL.
* etc. etc.

Presently, I still use the provided libuv IOBE, because my own implementation needs (a lot of?) debugging (for instance, it can't elect a leader, even though all the messages seem to be sent and received, and the ticks are on time) and further polishing (use CZMQ frames instead of manual pointer shifts), and this isn't a priority right now. Nevertheless, I'm planning to finish it later, to get a smaller and more closed system, or to be able to compare my own implementation against another (for testing etc.).

As for the practical application, it's for an in-cluster synchronization for a storage system.

freeekanayaka commented on August 23, 2024

> @freeekanayaka Ah, yes, sorry for the silence.

No worries! Thanks for following up.

> * The libuv-based IOBE seems flakier than the rest of the library: GCC and Valgrind warnings point there; when the application crashes, it seems to be due to uninitialized values in there (hard to follow).

Yeah, I don't think there are any memory leaks or actual memory issues in practice, since the libuv IOBE is fairly well unit tested and has been used extensively in production LXD systems for quite a while now.

That being said, I agree that the IOBE needs more work to get up to the same quality level as the core system. One reason is how complicated it turns out to be to actually manage asynchronous I/O, especially in C. I'm still learning myself; expect improvements in the libuv backend in the coming weeks and months.

> * Also, I need to use (C)ZMQ in the other parts of the system, so it would be nice to be able to just drop libuv as a (relatively big) dependency.

That's understandable.

> * The libuv IOBE persistent storage needs to be cleared (`rm <dir>/*`) before each run, or it won't bootstrap.

I don't understand this. You surely don't want to bootstrap a single raft instance more than once? So why do you need to `rm <dir>/*`? I'm confused.

> * The burden of aligning the message size is put on the user (me) and not handled automatically somewhere under the hood.

I'm not sure I understand this either; would you mind elaborating?

> * It doesn't support IPv6, or any protocols other than TCP/IPv4.

Yeah, my wish would be to at least implement TLS support.

> * I'm not interested in either Windows or the mobile OSes, and would prefer less complexity.

Sounds good too. Although libuv on Linux is really a thin and neat wrapper over epoll and doesn't add much complexity imo.

> * Also, libuv is distributed under a soup of BSD-like licenses; I'd prefer something more standard, like ASL or MPL.

This really depends on the use you have to make of it. BSD is a fairly liberal license, so I'd be surprised if it creates any issue for your project. However I understand wanting clarity of licenses for your dependencies.

> As for the practical application, it's for an in-cluster synchronization for a storage system.

That sounds very cool. I'm happy to support you in whatever you need, as I'd really like to see another consumer of this raft library, beyond dqlite.

One thing I'd ideally like to do, if I had time, would be to break down the libuv IOBE into separate reusable components and abstractions. So, for example, when one wants to implement a new backend, only things like low-level transport mechanisms and disk persistence would need to be implemented. But I think that's quite a bit of work to do right.

My suggestion would probably be to stick to the libuv IOBE for now, if you can. Then, when its quality and internal code become more mature over the next weeks or months, you might take the chance to basically copy-paste it and replace all libuv-specific things with whatever you want. That would probably give us a better idea of which parts need to be abstracted in order to make it easier to implement new backends.

amiloradovsky commented on August 23, 2024

@freeekanayaka As for removing the persistent data, that's simply the solution I've found to make bootstrap happy. Maybe I should just check first, and bootstrap only if the directory is empty.
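One way the "check first" idea could look is sketched below. This is only an illustration under stated assumptions: `dir_is_empty` is a hypothetical helper (not part of libraft), it relies on POSIX `opendir`/`readdir`, and it assumes that an empty data directory is a good enough signal that the node was never bootstrapped.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper, not part of libraft.
 * Returns 1 if the directory has no entries besides "." and "..",
 * 0 if it contains anything else, and -1 on error (errno is set). */
static int dir_is_empty(const char *path)
{
    DIR *dir;
    struct dirent *entry;
    int result = 1;
    int saved_errno;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL) {
        return -1; /* e.g. ENOENT: the data directory does not exist */
    }

    errno = 0; /* readdir() reports errors only through errno */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") != 0 &&
            strcmp(entry->d_name, "..") != 0) {
            result = 0; /* a real entry: some state is already there */
            break;
        }
    }
    if (entry == NULL && errno != 0) {
        result = -1; /* readdir() failed part-way through */
    }

    /* Preserve errno across closedir() so the caller sees the real cause. */
    saved_errno = errno;
    closedir(dir);
    errno = saved_errno;
    return result;
}
```

With such a helper, the application would call raft_bootstrap only when `dir_is_empty(dir)` returns 1, and skip it otherwise.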

As for the alignment, I just can't add an entry whose size isn't a multiple of 8. Sure, that's justifiable by the memory allocation & access issues. But that should be handled elsewhere.
Currently, I have to place the actual length of the message in the first few octets, then the message itself (unaligned), then zero the rest, and only then can I submit it to be applied.
I guess these problems should be resolved implicitly, behind the public API. So one could just call raft_apply with a buffer of any length, not necessarily multiple of something, and it would be only advisory to have the buffer size already aligned (no need for another *alloc and mem(p)cpy).

I have a few minor fixes for the warnings. Just to be sure everything's okay there, and nothing really important is drowning in the flood. Plus, e.g. Nixpkgs has -Werror enabled by default.

And also it may be a good idea to bump the version number a little. So I could update the Guix port (they don't like when packages are versioned by commit ids).

And finally, I need a liveness monitoring mechanism implemented somewhere, and thought it could be fitted relatively easily into libraft itself:
Not only the leader has to send the heartbeats to the followers, but also the followers have to send a heartbeat to the leader, or it will add an entry saying the silent node is out (and that in turn will lead to a call back, provided by the FSM). But it's likely the topic for another "issue" (feature request).

freeekanayaka commented on August 23, 2024

> @freeekanayaka As for removing the persistent data, that's simply the solution I've found to make bootstrap happy. Maybe I should just check first, and bootstrap only if the directory is empty.

Yeah, the idea is that you run raft_bootstrap once to set the initial cluster configuration, and then you don't need to run it anymore, since the configuration is stored in the log. In many cases applications will have their own out-of-band mechanism to detect whether raft_bootstrap was already called (for instance, they initialize other files at the first start, so if those files are present they can infer that raft_bootstrap was already invoked). However, if you don't have such a mechanism or don't want to put one in place, when you restart a node you can always safely invoke raft_bootstrap, and if it returns RAFT_CANTBOOTSTRAP you just ignore the error and go on. That's what the code in example/server.c does.

> As for the alignment, I just can't add an entry whose size isn't a multiple of 8. Sure, that's justifiable by the memory allocation & access issues. But that should be handled elsewhere.
> Currently, I have to place the actual length of the message in the first few octets, then the message itself (unaligned), then zero the rest, and only then can I submit it to be applied.
> I guess these problems should be resolved implicitly, behind the public API. So one could just call raft_apply with a buffer of any length, not necessarily multiple of something, and it would be only advisory to have the buffer size already aligned (no need for another *alloc and mem(p)cpy).

Yeah maybe. As you figured, I've opted to require the alignment so that the IOBE implementation can do zero copy I/O. However, if the application is fine having some more overhead that can be relaxed.

When deciding on an encoding format for your application, or for your IOBE, please consider that network or disk bandwidth is usually not an issue (because unless you have huge data to transfer you'll be constrained more by latency than by bandwidth), so it makes sense to use 8 full bytes as a header for your message metadata (e.g. length, plus any other things you might need now or in the future), followed by the actual message payload. So the minimum size of a message would be 16 bytes and everything would be aligned (header and payload). The performance of your system will most probably be the same.

Hope I have understood what you meant.
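The framing described above (an 8-byte header followed by a payload padded to a multiple of 8) could be sketched roughly as follows. The names `frame_pad`, `frame_encode` and `frame_decode` are hypothetical, the little-endian length encoding is an assumption made for the example, and plain `calloc` is used for illustration only; none of this is part of libraft.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FRAME_HEADER_SIZE 8u /* bytes reserved for metadata (here: the length) */
#define FRAME_ALIGNMENT 8u

/* Round n up to the next multiple of 8. */
static size_t frame_pad(size_t n)
{
    return (n + FRAME_ALIGNMENT - 1u) & ~((size_t)FRAME_ALIGNMENT - 1u);
}

/* Build [8-byte little-endian length][payload][zero padding].
 * Returns a heap buffer the caller must free(), or NULL if out of memory;
 * *total receives the buffer size, which is always a multiple of 8. */
static void *frame_encode(const void *payload, size_t len, size_t *total)
{
    size_t size = FRAME_HEADER_SIZE + frame_pad(len);
    uint8_t *buf;
    unsigned i;

    assert(total != NULL);
    assert(payload != NULL || len == 0);

    buf = calloc(1, size); /* calloc zeroes the trailing padding */
    if (buf == NULL) {
        return NULL;
    }
    for (i = 0; i < FRAME_HEADER_SIZE; i++) {
        buf[i] = (uint8_t)((uint64_t)len >> (8u * i));
    }
    if (len > 0) {
        memcpy(buf + FRAME_HEADER_SIZE, payload, len);
    }
    *total = size;
    return buf;
}

/* Return a pointer to the payload inside buf and store its real length
 * in *len, or return NULL if the frame is malformed. No copy is made. */
static const void *frame_decode(const void *buf, size_t total, size_t *len)
{
    const uint8_t *bytes = buf;
    uint64_t n = 0;
    unsigned i;

    assert(buf != NULL && len != NULL);

    if (total < FRAME_HEADER_SIZE) {
        return NULL;
    }
    for (i = 0; i < FRAME_HEADER_SIZE; i++) {
        n |= (uint64_t)bytes[i] << (8u * i);
    }
    if (n > total - FRAME_HEADER_SIZE) {
        return NULL; /* the header claims more bytes than the frame holds */
    }
    *len = (size_t)n;
    return bytes + FRAME_HEADER_SIZE;
}
```

A 5-byte message such as "hello" becomes a 16-byte frame, so the buffer handed to raft_apply already satisfies the multiple-of-8 requirement. How the buffer should be allocated and who owns it afterwards depends on the library's own rules, which this sketch does not cover.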

> I have a few minor fixes for the warnings. Just to be sure everything's okay there, and nothing really important is drowning in the flood. Plus, e.g. Nixpkgs has -Werror enabled by default.
>
> And also it may be a good idea to bump the version number a little. So I could update the Guix port (they don't like when packages are versioned by commit ids).

Ok, I have a few changes in the pipeline too. Please make a PR with those fixes when done, so I'll merge everything and then cut a new release.

> And finally, I need a liveness monitoring mechanism implemented somewhere, and thought it could be fitted relatively easily into libraft itself:
> Not only the leader has to send the heartbeats to the followers, but also the followers have to send a heartbeat to the leader, or it will add an entry saying the silent node is out (and that in turn will lead to a call back, provided by the FSM). But it's likely the topic for another "issue" (feature request).

This is interesting. I think most applications will indeed need such a mechanism, even if it's not required by raft. However, it feels like it should be left either as some sort of optional extension or maybe as a separate system built on top of libraft.

amiloradovsky commented on August 23, 2024

@freeekanayaka Thanks. I'll see what I can do to handle the bootstrap properly, and align the messages' size from the start.

As for the (a)liveness, I guess there need to be not one but two new entry types: one for when a node in the configuration starts sending the heartbeats, becomes actually present, and one for when it ends doing so. I need to think more about how it should be done right.
