Comments (9)

freeekanayaka commented on August 23, 2024

I am currently working on making a good and safe abstraction/wrapper of libraft in Rust that I will probably release under the name canonical-raft.

The state of replicated systems in Rust is worse than in Go, where the most popular library for replicating state over the network is hashicorp/raft. Its power comes mainly from its ease of use: you only have to implement a single, simple FSM interface and you are done.

I would love to be able to make that easy in Rust, it would highly empower the language!

Yeah, there's no particularly good and mature Rust implementation of Raft that I know of. I had to pick C because this lib is mainly used for replicating SQLite, which is written in C, but I had also considered Rust back then.

As a second step, I would like to create a pure Rust implementation of a raft_io backend that I could ship with the library. It would be easier to integrate into Rust's async ecosystem than the libuv-based one, and it would be more cross-platform too.
Do you think it could be possible?

This is interesting. One of my long-term wishes is actually to use this C implementation as the basis for a pure Rust one, which would be a very close "translation" of this code. The design would be the same: single thread with fully asynchronous I/O. I suspect it might be easier than it sounds.

Starting to implement the raft_io backend sounds like a perfectly sane incremental approach.

There's a big gotcha here, though, that I'd like to point out. I've learned a lot of lessons with this initial raft implementation, and while I believe the current API is quite reasonable, I'd like to change it and get to what would be a final stable 1.0 API.

The API I have in mind would probably be even more friendly to cross-language interoperability, because the core would not use callbacks as in the current API, but would rely more on I/O queues that the I/O backend would consume (a bit like io_uring or etcd/raft, if you will). I'd also like to decouple most of the persistence logic from libuv, so a custom I/O backend could reuse it (although that part would still be outside of core).

I've started working on this in the decouple-from-uv branch (https://github.com/freeekanayaka/raft/tree/decouple-from-uv). There shouldn't be much change in the internal logic, as it's mostly an API-level exercise. I don't think I'll make fast progress on this plan, though, because I'm working on it in my spare time right now.

Sorry for the long explanation, but the takeaway is that if you want to implement a stable I/O backend for this library, you might want to consider waiting (if you are not in a hurry). In addition to that, after I complete the work on the 1.0 API I have in mind, I would very highly recommend anyone implementing a custom I/O backend to use the backend-independent bits of the disk persistence and format that I'd like to put in place.

All that being said, you could of course code your raft_io now, but if you want to keep the same disk format/design as the current libuv backend (which again I recommend, see also below) that will be a lot of work. Also, you'll have to be prepared to adapt your code to the new 1.0 API down the road.

One last thing: I have seen that hashicorp/raft comes with an official MDBStore backend. As I am the author of heed, a safe, typed LMDB wrapper in Rust, I will probably build an equivalent store on top of LMDB using heed.
Does that make sense to you?

In my opinion the disk persistence story is one point where this raft library shines over hashicorp/raft. I believe the stock storage implementation of hashicorp/raft uses boltdb, and there's the additional LMDB implementation. Both of them are key-value stores, and as such they are actually not such a great fit for implementing a raft storage backend IMO, essentially because they add functionality that you don't need and that probably hurts performance (I don't have numbers). Also, boltdb and most probably also LMDB are not using asynchronous and non-blocking disk I/O, something which is very important these days to get high performance in a multi-core SSD-backed world.

The storage implementation of the libuv backend in this raft library is based on the original design of the Raft author (see here), which is also the design of the other very popular Go implementation, etcd/raft.

The reason why hashicorp/raft went for boltdb or LMDB is probably just that they were off-the-shelf components that could quickly be re-used.

I would like to thank you for the great work you have done here!
I have been searching for a simple Raft or Paxos library to make our search engine MeiliSearch replicated for a long time now!

Thank you too! I looked at your project and it looks amazing :)

from raft.

Kerollmops commented on August 23, 2024

This is interesting. One of my long-term wishes is actually to use this C implementation as the basis for a pure Rust one, which would be a very close "translation" of this code. The design would be the same: single thread with fully asynchronous I/O. I suspect it might be easier than it sounds.

If you do your Rust port, please don't use a specific async runtime and make your library generic over it. It seems like you already think this way and I'm glad to see that :)

Sorry for the long explanation, but the takeaway is that if you want to implement a stable I/O backend for this library, you might want to consider waiting (if you are not in a hurry).

That makes sense. I think the first release of canonical-raft will be backed by the libuv-driven implementation anyway.

In addition to that, after I complete the work on the 1.0 API I have in mind, I would very highly recommend anyone implementing a custom I/O backend to use the backend-independent bits of the disk persistence and format that I'd like to put in place.

How do you think it will work? How will you design it? Will the user be able to provide functions that take, for example, a file descriptor and a raft_buffer? I just want to be sure the design can fit into the Rust async ecosystem (like mio or anything driven by epoll/kqueue...).

In my opinion the disk persistence story is one point where this raft library shines over hashicorp/raft. I believe the stock storage implementation of hashicorp/raft uses boltdb, and there's the additional LMDB implementation. [...] Also, boltdb and most probably also LMDB are not using asynchronous and non-blocking disk I/O, something which is very important these days to get high performance in a multi-core SSD-backed world.

That makes complete sense to me. I'll keep your disk persistence work. It is really kind of you to advise me on that!


Just a little explanation of why I am not a big fan of using libuv as the async loop: it means that you have to install a third-party library to compile and run your program. I would prefer a batteries-included library.

I found out that libuv is a user-space library, so a vendored version of it could be used. The last missing bit is to make the AIO dependency optional, which would make the library available on every platform without any third-party dependencies.


freeekanayaka commented on August 23, 2024

This is interesting. One of my long-term wishes is actually to use this C implementation as the basis for a pure Rust one, which would be a very close "translation" of this code. The design would be the same: single thread with fully asynchronous I/O. I suspect it might be easier than it sounds.

If you do your Rust port, please don't use a specific async runtime and make your library generic over it. It seems like you already think this way and I'm glad to see that :)

Yes, I'm with you indeed. Although I understand why async/await might be appealing and even appropriate for some cases, I think it can be problematic in some others. The core library would be plain old function calls, with no callbacks, but rather queues that consumers of the library would need to read explicitly, making "scheduling" fully manual.

Sorry for the long explanation, but the takeaway is that if you want to implement a stable I/O backend for this library, you might want to consider waiting (if you are not in a hurry).

That makes sense. I think the first release of canonical-raft will be backed by the libuv-driven implementation anyway.

In addition to that, after I complete the work on the 1.0 API I have in mind, I would very highly recommend anyone implementing a custom I/O backend to use the backend-independent bits of the disk persistence and format that I'd like to put in place.

How do you think it will work? How will you design it? Will the user be able to provide functions that take, for example, a file descriptor and a raft_buffer? I just want to be sure the design can fit into the Rust async ecosystem (like mio or anything driven by epoll/kqueue...).

I don't have a precise design yet, but it would be pretty much like the core: plain function calls, with absolutely no system-related constructs (file descriptors or anything like that). There would be data structures holding data to be written to or read from disk, with all the related encoding and parsing logic that translates core-level abstractions (such as a log entry) into their serialized representation on disk. But all actual system I/O and the program flow/scheduling around it would be left to the specific backend implementation (epoll/kqueue etc.).
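A rough sketch of what such backend-independent encoding could look like, assuming a made-up layout (8-byte little-endian term, 4-byte little-endian payload length, then the payload). The names `toy_entry`, `toy_entry_encode` and `toy_entry_decode` are hypothetical, and this is not libraft's actual on-disk format; the point is only that the functions touch plain memory and never a file descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical core-level log entry (not libraft's own entry type). */
struct toy_entry {
    uint64_t term;    /* Term in which the entry was created. */
    const void *data; /* Opaque command payload. */
    uint32_t len;     /* Payload length in bytes. */
};

#define TOY_HEADER_SIZE 12 /* Assumed layout: 8 bytes term + 4 bytes length. */

/* Serialize the entry into buf. Returns the number of bytes written,
   or 0 if the buffer is too small. No system calls are involved. */
size_t toy_entry_encode(const struct toy_entry *e, uint8_t *buf, size_t cap) {
    assert(e != NULL && buf != NULL);
    size_t need = TOY_HEADER_SIZE + (size_t)e->len;
    if (cap < need) return 0;
    for (int i = 0; i < 8; i++) buf[i] = (uint8_t)(e->term >> (8 * i));
    for (int i = 0; i < 4; i++) buf[8 + i] = (uint8_t)(e->len >> (8 * i));
    if (e->len > 0) memcpy(buf + TOY_HEADER_SIZE, e->data, e->len);
    return need;
}

/* Parse one entry from buf. On success e->data points into buf and the
   number of bytes consumed is returned; 0 means the input is truncated. */
size_t toy_entry_decode(const uint8_t *buf, size_t n, struct toy_entry *e) {
    assert(buf != NULL && e != NULL);
    if (n < TOY_HEADER_SIZE) return 0;
    e->term = 0;
    e->len = 0;
    for (int i = 0; i < 8; i++) e->term |= (uint64_t)buf[i] << (8 * i);
    for (int i = 0; i < 4; i++) e->len |= (uint32_t)buf[8 + i] << (8 * i);
    if (n - TOY_HEADER_SIZE < e->len) return 0;
    e->data = buf + TOY_HEADER_SIZE;
    return TOY_HEADER_SIZE + (size_t)e->len;
}
```

A backend would then hand the encoded buffer to whatever write mechanism it prefers (libuv, epoll plus AIO, a Rust async runtime), which is the part that stays backend-specific.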

In my opinion the disk persistence story is one point where this raft library shines over hashicorp/raft. I believe the stock storage implementation of hashicorp/raft uses boltdb, and there's the additional LMDB implementation. [...] Also, boltdb and most probably also LMDB are not using asynchronous and non-blocking disk I/O, something which is very important these days to get high performance in a multi-core SSD-backed world.

That makes complete sense to me. I'll keep your disk persistence work. It is really kind of you to advise me on that!

You're welcome!

Just a little explanation of why I am not a big fan of using libuv as the async loop: it means that you have to install a third-party library to compile and run your program. I would prefer a batteries-included library.

Yeah, that would be nice, but libuv is really a thin layer for cross-platform async I/O (essentially epoll, kqueue and Windows async I/O). If I wasn't currently using libuv I would probably end up re-inventing it, for cross-platform support. I understand that this raft lib is currently Linux-specific, but as shown by the Windows port PR, it's not too difficult to make it work on other OSs.

Note that libuv was born to make Node.js cross-platform.

I found out that libuv is a user-space library, so a vendored version of it could be used. The last missing bit is to make the AIO dependency optional, which would make the library available on every platform without any third-party dependencies.

Yes, I'll fix syscall.c/h shortly.


Kerollmops commented on August 23, 2024

The core library would be plain old function calls, with no callbacks, but rather queues that consumers of the library would need to read explicitly, making "scheduling" fully manual.

So how will the user be able to read from the queue without blocking the thread?

I don't have a precise design yet, but it would be pretty much like the core: plain function calls, with absolutely no system-related constructs (file descriptors or anything like that). There would be data structures holding data to be written to or read from disk, with all the related encoding and parsing logic that translates core-level abstractions (such as a log entry) into their serialized representation on disk. But all actual system I/O and the program flow/scheduling around it would be left to the specific backend implementation (epoll/kqueue etc.).

This seems nice and easy to use, therefore easy to wrap in other languages.

If I wasn't currently using libuv I would probably end up re-inventing it, for cross-platform support.

Yeah, I understand that this library is mandatory. That's why I will find a way to ship it with the Rust library, so that we avoid having to install it with apt, brew or anything else, and only depend on cargo.

Note that libuv was born to make Node.js cross-platform.

I didn't know that. I think mio is the same kind of abstraction, but Rust-specific.

I found out that libuv is a user-space library, so a vendored version of it could be used. The last missing bit is to make the AIO dependency optional, which would make the library available on every platform without any third-party dependencies.

Yes, I'll fix syscall.c/h shortly.

I would be happy if you did that. I'm currently rsyncing my work to a Linux machine to run cargo check on it, because my editor doesn't show compilation errors: it doesn't like the linux/aio import 😃


I have some questions about the current libraft design:

Why does raft_malloc exist in the library? Can I define my own allocator using the raft_heap struct and wrap the Rust global allocator safely?

In Rust we can override the global allocator. If I understand correctly, I will need to define a raft_heap struct that internally calls the alloc, dealloc... functions to make the program safe. By doing so I can avoid thinking about the raft_malloc and raft_free functions, as the global allocator does the same job. Am I right?

Raft            Rust
malloc          alloc (aligned to a word)
free            dealloc
calloc          alloc_zeroed
realloc         realloc (aligned to a word)
aligned_alloc   alloc

And the last tricky point: what about memory alignment? Every allocation must be aligned in Rust, otherwise it is considered undefined behavior. What if I receive a raft_malloc call without any memory layout information? Do I use the alignment of a whole word (u64 or u32) to be safe? It seems the subject has already been touched on.

The other problem is that Rust needs a Layout to dealloc things... This could probably work: when raft_free calls come in, the alignment would be set to the max alignment. But what if I get a raft_aligned_alloc followed by a raft_free on this allocation? I do not know the original alignment... Do I need to ignore the alignment when asked by raft?


freeekanayaka commented on August 23, 2024

The core library would be plain old function calls, with no callbacks, but rather queues that consumers of the library would need to read explicitly, making "scheduling" fully manual.

So how will the user be able to read from the queue without blocking the thread?

The struct raft object and associated core APIs would essentially form a state machine that you would manually drive from your backend implementation (which, differently from now, would have no required "interface" like struct raft_io, and would be completely decoupled from struct raft).

For instance:

/* Your backend code completes receiving a full `struct raft_message` from
   the network (serialization and transport concerns are all left to the backend,
   as currently happens with struct raft_io). Typically you would have an event
   loop that invokes some callback or handler when it finishes reading a full
   message from a socket. At that point the backend code does something like
   this:
 */
void my_backend_message_handler(struct raft *raft, struct raft_message *message) {
    struct raft_event *event;
    int rv;
    /* Tell the raft state machine that a new message has arrived. The state machine
       will update its internal state accordingly and push any I/O or other asynchronous
       action that should be taken into some queues that the backend can consume. */
    rv = raft_recv(raft, message);
    if (rv != 0) {
        /* handle errors */
    }
    /* Consume all new I/O events that have been pushed to the queue
       and handle them asynchronously, e.g. by submitting non-blocking
       network or disk writes. */
    while ((event = raft_pull(raft, RAFT_IO)) != NULL) {
        struct raft_io *io = (struct raft_io *)event; /* This is an I/O event. */
        switch (io->type) {
            case RAFT_PERSIST_ENTRIES:
                /* Submit a write request to persist some new log entries to disk. */
                my_backend_write_entries((struct raft_persist_entries *)io);
                break;
            case RAFT_SEND_MESSAGE:
                /* Submit a network request to send a serialized message. */
                my_backend_send_message((struct raft_send_message *)io);
                break;
            /* etc. */
        }
    }

    /* Consume all new FSM events in the same way. */
    while ((event = raft_pull(raft, RAFT_FSM)) != NULL) {
        struct raft_fsm *fsm = (struct raft_fsm *)event; /* This is an FSM-related event. */
        switch (fsm->type) {
            case RAFT_APPLY_COMMAND:
                /* Apply a new command to the user's state machine. */
                my_fsm_apply_command((struct raft_command *)fsm);
                break;
            /* etc. */
        }
    }
}

This is more verbose than the current raft_io interface, but more flexible and leaves full control to the backend. Essentially the idea is to invert the control flow: instead of having struct raft call
the backend, it's the backend that calls struct raft, which merely uses queues to let the
backend know what needs to be done.
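
To make the inversion concrete, here is a self-contained toy sketch. It assumes nothing about the eventual libraft API: `toy_raft`, `toy_recv` and `toy_pull` are hypothetical names. The core only appends events to an in-memory queue, and the pull function returns NULL as soon as the queue is empty, so the backend never blocks on it.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event_type { TOY_PERSIST_ENTRIES, TOY_SEND_MESSAGE };

struct toy_event {
    enum toy_event_type type;
    int arg; /* Illustrative payload, e.g. an entry index or a peer ID. */
};

#define TOY_QUEUE_CAP 8

/* Hypothetical core state: a log counter plus a ring buffer of pending events. */
struct toy_raft {
    int last_index;
    struct toy_event queue[TOY_QUEUE_CAP];
    size_t head;  /* Index of the oldest pending event. */
    size_t count; /* Number of pending events. */
};

static int toy_push(struct toy_raft *r, enum toy_event_type type, int arg) {
    if (r->count == TOY_QUEUE_CAP) return -1; /* Queue full. */
    size_t tail = (r->head + r->count) % TOY_QUEUE_CAP;
    r->queue[tail].type = type;
    r->queue[tail].arg = arg;
    r->count++;
    return 0;
}

/* Feed an incoming message to the core. No I/O happens here: the core only
   updates its state and records what the backend should do next. */
int toy_recv(struct toy_raft *r, int from_peer) {
    assert(r != NULL);
    r->last_index++;
    if (toy_push(r, TOY_PERSIST_ENTRIES, r->last_index) != 0) return -1;
    return toy_push(r, TOY_SEND_MESSAGE, from_peer);
}

/* Pop the oldest pending event, or return NULL immediately if there is none.
   The returned pointer stays valid until later pushes reuse its slot. */
struct toy_event *toy_pull(struct toy_raft *r) {
    assert(r != NULL);
    if (r->count == 0) return NULL;
    struct toy_event *e = &r->queue[r->head];
    r->head = (r->head + 1) % TOY_QUEUE_CAP;
    r->count--;
    return e;
}
```

A backend would call `toy_recv` from its own event-loop handler and then drain the queue with `while ((e = toy_pull(&r)) != NULL)`, submitting each action asynchronously.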

I don't have a precise design yet, but it would be pretty much like the core: plain function calls, with absolutely no system-related constructs (file descriptors or anything like that). There would be data structures holding data to be written to or read from disk, with all the related encoding and parsing logic that translates core-level abstractions (such as a log entry) into their serialized representation on disk. But all actual system I/O and the program flow/scheduling around it would be left to the specific backend implementation (epoll/kqueue etc.).

This seems nice and easy to use, therefore easy to wrap in other languages.

Yep, that'd be the goal.

If I wasn't currently using libuv I would probably end up re-inventing it, for cross-platform support.

Yeah, I understand that this library is mandatory. That's why I will find a way to ship it with the Rust library, so that we avoid having to install it with apt, brew or anything else, and only depend on cargo.

Note that libuv was born to make Node.js cross-platform.

I didn't know that. I think mio is the same kind of abstraction, but Rust-specific.

Pretty much.

I found out that libuv is a user-space library, so a vendored version of it could be used. The last missing bit is to make the AIO dependency optional, which would make the library available on every platform without any third-party dependencies.

Yes, I'll fix syscall.c/h shortly.

I would be happy if you did that. I'm currently rsyncing my work to a Linux machine to run cargo check on it, because my editor doesn't show compilation errors: it doesn't like the linux/aio import 😃

Ah thanks. I actually had pushed #131 before reading your offer. Next time :)

I have some questions about the current libraft design:

Why does raft_malloc exist in the library? Can I define my own allocator using the raft_heap struct and wrap the Rust global allocator safely?

Yes, the raft_heap interface is precisely used for customizing your memory allocator. By default malloc and friends are used, but you can implement anything you want. This interface will likely not change in the final 1.0 API (except perhaps minor details, see below the possible aligned_free()).

In Rust we can override the global allocator. If I understand correctly, I will need to define a raft_heap struct that internally calls the alloc, dealloc... functions to make the program safe. By doing so I can avoid thinking about the raft_malloc and raft_free functions, as the global allocator does the same job. Am I right?

The contract of the raft_malloc, raft_free etc. interfaces is exactly the same contract of their POSIX equivalents. So as long as your implementation of them meets those expectations, you're good.
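
As a minimal sketch of such a custom allocator, assuming an interface shaped like a table of function pointers that each receive a user-data pointer: `toy_heap` below is a hypothetical stand-in, not the actual `struct raft_heap` definition (check raft.h for that), and the `*_fn` field names are made up. It forwards to the C library, which already satisfies the malloc/free contract, and counts live allocations so leaks can be spotted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface; the field names are assumptions. */
struct toy_heap {
    void *data; /* User data passed back to every callback. */
    void *(*malloc_fn)(void *data, size_t size);
    void (*free_fn)(void *data, void *ptr);
    void *(*calloc_fn)(void *data, size_t nmemb, size_t size);
    void *(*realloc_fn)(void *data, void *ptr, size_t size);
};

struct toy_stats {
    long live; /* Allocations handed out and not yet freed. */
};

static void *counting_malloc(void *data, size_t size) {
    struct toy_stats *s = data;
    void *p = malloc(size);
    if (p != NULL) s->live++;
    return p;
}

static void counting_free(void *data, void *ptr) {
    struct toy_stats *s = data;
    if (ptr != NULL) s->live--; /* free(NULL) is a no-op, as with free(3). */
    free(ptr);
}

static void *counting_calloc(void *data, size_t nmemb, size_t size) {
    struct toy_stats *s = data;
    void *p = calloc(nmemb, size);
    if (p != NULL) s->live++;
    return p;
}

static void *counting_realloc(void *data, void *ptr, size_t size) {
    struct toy_stats *s = data;
    void *p = realloc(ptr, size);
    /* realloc(NULL, n) behaves like malloc(n), so it adds a live allocation. */
    if (ptr == NULL && p != NULL) s->live++;
    return p;
}

/* Build a heap whose callbacks update the given statistics. */
struct toy_heap toy_heap_counting(struct toy_stats *stats) {
    assert(stats != NULL);
    struct toy_heap h = {stats, counting_malloc, counting_free,
                         counting_calloc, counting_realloc};
    return h;
}
```

A Rust wrapper could follow the same shape, with each callback forwarding to the global allocator instead of the C library.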

Raft            Rust
malloc          alloc (aligned to a word)
free            dealloc
calloc          alloc_zeroed
realloc         realloc (aligned to a word)
aligned_alloc   alloc

And the last tricky point: what about memory alignment? Every allocation must be aligned in Rust, otherwise it is considered undefined behavior. What if I receive a raft_malloc call without any memory layout information? Do I use the alignment of a whole word (u64 or u32) to be safe? It seems the subject has already been touched on.

If you use a full word alignment you'll be good, yes.

The other problem is that Rust needs a Layout to dealloc things... This could probably work: when raft_free calls come in, the alignment would be set to the max alignment. But what if I get a raft_aligned_alloc followed by a raft_free on this allocation? I do not know the original alignment... Do I need to ignore the alignment when asked by raft?

So, the struct raft_heap->aligned_alloc() method is only needed by the libuv backend for performing direct disk I/O. If you are not going to use the libuv backend you don't even need
to implement it.

However, since you are going to use libuv, you do need it. If it would make integration with
Rust easier, perhaps I could add a new struct raft_heap->aligned_free() method that would
be always used internally by raft for releasing memory allocated by aligned_alloc and would
be passed the same alignment value originally passed at allocation time.

It feels it would indeed make things easier also for the Windows port. What do you think?
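
A sketch of how that pairing could look, assuming a hypothetical `toy_aligned_free` that receives the same alignment as the matching allocation. It is implemented here on top of POSIX `posix_memalign`, which happens not to need the alignment at release time; an allocator that does need it (such as Rust's `dealloc`, which takes a `Layout`) could use the extra argument. The function names are illustrative, not libraft's.

```c
#define _POSIX_C_SOURCE 200809L /* For posix_memalign. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes aligned to alignment, which must be a power of two
   and a multiple of sizeof(void *), as posix_memalign requires.
   Returns NULL on failure. */
void *toy_aligned_alloc(void *data, size_t alignment, size_t size) {
    void *p = NULL;
    (void)data; /* User data is unused in this sketch. */
    if (posix_memalign(&p, alignment, size) != 0) return NULL;
    return p;
}

/* Release memory obtained from toy_aligned_alloc. The caller passes back
   the alignment used at allocation time; this sketch only sanity-checks it,
   while a layout-aware allocator would need it to free correctly. */
void toy_aligned_free(void *data, size_t alignment, void *ptr) {
    (void)data;
    assert(alignment != 0);
    assert(ptr == NULL || (uintptr_t)ptr % alignment == 0);
    free(ptr);
}
```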


Kerollmops commented on August 23, 2024

The struct raft object and associated core APIs would essentially form a state machine that you would manually drive from your backend implementation (which, differently from now, would have no required "interface" like struct raft_io, and would be completely decoupled from struct raft).

OK, that makes a lot of sense: the loop is inverted. The user has to run their own event loop and propagate the events to and from the raft state machine, instead of treating the raft implementation as a black box that calls your functions whenever it wants to!

I don't know if you have already thought about I/O error propagation to the raft state machine, but it's not important as this is not yet relevant :)

This is more verbose than the current raft_io interface, but more flexible and leaves full control to the backend. Essentially the idea is to invert the control flow: instead of having struct raft call the backend, it's the backend that calls struct raft, which merely uses queues to let the
backend know what needs to be done.

Yup, that would be nice in a sense, but I would like it to be easy to integrate, unlike the current PingCAP Raft implementation where you have to run your own tick loop. Otherwise I would have to wrap it in something like an async timer-based system that can easily be plugged into an async event loop.

I would be happy if you did that. I'm currently rsyncing my work to a Linux machine to run cargo check on it, because my editor doesn't show compilation errors: it doesn't like the linux/aio import 😃

Ah thanks. I actually had pushed #131 before reading your offer. Next time :)

BTW, I managed to make the library run on OSX. The only problem I have is that the uv_timer callback is never called... It is probably related to the libuv version or some similar issue. Sorry to tell you this, but I'm starting to hate libuv ;( Working on this...

Yes, the raft_heap interface is precisely used for customizing your memory allocator. By default malloc and friends are used, but you can implement anything you want. This interface will likely not change in the final 1.0 API (except perhaps minor details, see below the possible aligned_free()).
[...]
The contract of the raft_malloc, raft_free etc. interfaces is exactly the same contract of their POSIX equivalents. So as long as your implementation of them meets those expectations, you're good.

That's nice. This way I will be able to make libuv and libraft allocate memory using the Rust global allocator defined by the user, which will be easy to integrate into any Rust program 🎉

I'll just need to read carefully when and where I need to drop allocated memory and when the Raft implementation does it by itself. That will very likely be a big source of leaks and use-after-free errors, so I'll probably have to rely on valgrind or something similar.

So, the struct raft_heap->aligned_alloc() method is only needed by the libuv backend for performing direct disk I/O. If you are not going to use the libuv backend you don't even need
to implement it.

However, since you are going to use libuv, you do need it. If it would make integration with
Rust easier, perhaps I could add a new struct raft_heap->aligned_free() method that would
be always used internally by raft for releasing memory allocated by aligned_alloc and would
be passed the same alignment value originally passed at allocation time.

It feels it would indeed make things easier also for the Windows port. What do you think?

Yeah, I think it would be easier, and above all safer, to have this aligned_free function. This way I will be able to free aligned memory with the right original alignment and avoid undefined behavior. What about aligned_realloc and aligned_calloc: does libuv need them at all?

Raft            Rust
malloc          alloc (aligned to a word)
free            dealloc (aligned to a word)
calloc          alloc_zeroed
realloc         realloc (aligned to a word)
aligned_alloc   alloc
aligned_free    dealloc

Thank you for your help, that is really kind of you, I really appreciate that 👍

A little note on where I am with the Rust safe wrapper:

  • I managed to make it compile on OSX by faking the fallocate function using ftruncate.
  • I am fighting with libuv: I installed it with the nix package manager, but it complains about missing framework headers (Raaaaahhh OSX 🤯). So I am using the sources from the libuv-sys2 Rust crate.
  • The first time I managed to run it on OSX, the servers (run by the cluster example) ran but server_apply_cb was never called.
  • Now the cluster can't even elect a leader. That's a little bit frustrating, but probably related to network errors and libuv.
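
The fallocate workaround from the first point can be sketched roughly like this. `toy_fallocate` is a hypothetical helper, not a function from libraft. On Linux it uses `posix_fallocate`; elsewhere it falls back to `ftruncate`, which sets the file length but, unlike a real preallocation, does not guarantee that disk blocks are reserved.

```c
#define _POSIX_C_SOURCE 200809L /* For posix_fallocate and ftruncate. */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Make sure the file behind fd is at least size bytes long (size > 0).
   Returns 0 on success and -1 on failure. The file is never shrunk. */
int toy_fallocate(int fd, off_t size) {
    assert(fd >= 0 && size > 0);
#if defined(__linux__)
    /* Reserves disk space for the range [0, size). Note that
       posix_fallocate returns an error number instead of setting errno. */
    return posix_fallocate(fd, 0, size) == 0 ? 0 : -1;
#else
    /* Fallback: only extends the length; blocks may be allocated lazily,
       so a later write can still fail with ENOSPC. */
    struct stat st;
    if (fstat(fd, &st) != 0) return -1;
    if (st.st_size >= size) return 0; /* Already large enough. */
    return ftruncate(fd, size) == 0 ? 0 : -1;
#endif
}
```

The difference matters for a log-structured store: with the fallback, running out of disk space shows up at write time rather than at allocation time.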


stgraber commented on August 23, 2024

@Kerollmops hey there, I was just wondering how that effort is going and if there's anything you need from us?
The issue hasn't been updated in over a year, so I'm wondering if there's anything actionable for us or if we should close it.


Kerollmops commented on August 23, 2024

Hey @stgraber,

We don't have much time for this project at MeiliSearch for now, and I think we will go with a full-Rust solution. I would like to thank you all for the work you have done here, but I think you can close this issue.

Have a great day!


stgraber commented on August 23, 2024

thanks!
