stakker's People

Contributors

joseluis, uazu

stakker's Issues

Means to make actor calls from another thread

The basic idea is to be able to do something like remote_call!([cx, actor], method(args));. However cx would have to be some kind of inter-thread object that forwards calls to the queue on the Stakker thread. Also actor can't be an Actor reference because that's not Send. So it requires something like a RemoteDeferrer and a RemoteActor. A RemoteDeferrer could be something like an Arc<Mutex<(Waker, Vec)>>. To avoid too much contention on a single mutex, it may be preferable to allow several underlying queues to be created, e.g. to create a different one for each thread it is sent to.

An alternative if only a few distinct calls are to be made would be to have a RemoteFwd, which is created from a Fwd and allows data to be forwarded to the Stakker thread. This would be something like an inter-thread stream with the destination fixed at creation time.

There is a question of how to handle failure. If the destination goes away (e.g. the destination actor fails), should the sender in another thread be notified, to allow it to clean up and terminate? The PipedThread implementation handles this case: if the actor holding the reference to the PipedThread dies, the thread is notified so that it can clean up and terminate. So probably RemoteActor and RemoteFwd should have a means to query whether the destination has gone away. Perhaps even make it hard for the caller NOT to notice, to force it to handle this case.

There is also the question of how this fits into future support for remote calls from further away than other threads, e.g. remote calls across machines. However, it may be that remote calls from other machines will be handled with proxy actors, in which case it will be a completely different API, unrelated to this one.

Please add comments here if you have a requirement for something like this and what kind of solution would be preferable. (See also #23)

More beginner friendly tutorial?

I have never used an "Actor Model" before, and so I am new to this form of design...
I have been designing something in Rust, but I keep on hitting up against the Rust borrow checker.
Your crate looks promising, but confusing, so I hope maybe I can use what you have built.
I am sorry, I don't know a better way to reach you other than via this "Issues" section. I hope this is okay.

I have been building an emulator. The emulator consists of multiple emulated chips which need to communicate with each other.
Ordinarily, an emulator's components are written in a tightly coupled form, where one chip holds a direct reference to another chip. However, I want to retain the ability for each chip to work independently of the others.
When a chip wants to interact with another chip, it should send a message to the motherboard, and the motherboard should route the message to the appropriate location.
The benefit of this design is the modularity and flexibility of being able to simulate a real computer. For example, the motherboard manufacturer might have chosen to leave some of the chip's pins disconnected. The chip is unaware of the missing pin connections and will attempt to send out a message; the motherboard takes care of the missing routing. The chip can continue to work as intended, regardless of how everything else connects to it. I would like to build this emulation.

At first, I built this emulator with all chips in separate threads using channel message passing.
This worked up to a point. The problem is that these emulated chips run at a fairly high speed with only 200 nanoseconds to spare.
After some testing, I realized that multithreading is not the solution. Simply sending a message and blocking on a reply took over 500 nanoseconds.
I realized I need to rewrite the emulator to run single threaded without channels. However, pretty quickly I ran into issues with the borrow checker. With all of the chips referencing the motherboard, and the motherboard referencing all of the chips back, I kept running into circular reference problems. No matter what I tried, the borrow checker kept being angry at me.

Finally, I found out about RefCell... which I want to avoid... It would be better if all references are checked at compile time.
That is how I discovered your stakker and qcell crates.
Considering that I am looking for all of these chips to communicate with each other through messages and be run at specific intervals, your stakker crate seems like a good fit for my use-case. However, I am quite confused how to use it. It has multiple functions and macros which I am not used to.
Reading through your guide, I see that all messages are FnOnce. I am especially confused about this.

Currently I have a large set of enums designating each message. Each chip can process a message by matching on the enum values and performing the corresponding behavior.

For example, the CPU chip can have an enum message Msg::GetMemoryByte(addr, &mut byte)
This message gets routed by the motherboard to the Memory chip. The memory chip reads the enum and sets the mutable byte.
When the memory chip is done, the CPU chip gets another enum message that the byte has been set.

Another example: the Timer chip has finished ticking and now wants to alert that time has elapsed. It sends out a message Msg::TimerElapsed{timer_id: 2}. The motherboard sees this message. Ordinarily, it would route it to the CPU, however the manufacturers decided to only connect timer_id 1 to the CPU. timer_id 2 is left disconnected. So, the message will be dropped.

This is how the chips communicate, independently of each other.

Reading through your guide, I feel as if my enum solution is somewhat inefficient and your FnOnce solution should be much better. However, I remain quite confused about how to use it. Would you be able to tell me how my examples above could be reformulated to use FnOnce instead?

Thank you for your time.

Consider small-Ret optimisation to avoid allocation

Ret is boxed, which means it needs to allocate memory (except for ret_nop!). It would be nicer if small Ret instances could be handled without allocation. Then a whole round-trip to another actor (call! and ret!) could be done with no allocations at all.

A simple callback to an actor method with no captured data apart from the actor ref should fit in the same space as the current Ret. This should be possible by taking the FnOnce apart, as is done to serialize FnOnce for the FnOnce queue. The upside is saving a malloc and free. The downside is needing unsafe code, plus more code both on creation and on calling through a Ret, to switch between the two alternatives (based on the size of the FnOnce). The existing safe implementation can be kept for the 'no-unsafe' feature.

Currently Ret is two usizes. It could be expanded to 3 or 4 usizes which would allow capturing more data without allocation, but at the cost of making all Ret instances bigger. 3 might be optimal if the application has a lot of callbacks that capture the actor ref and some index to represent the context the callback is related to. It might be worth it for some applications, so maybe it could be a cargo feature.

Call and refer to actors that implement a trait

Hi Jim,

I've been trying out several actor model implementations, including Stakker, and wasn't able to figure out how to refer to an actor that implements a trait.

https://github.com/anacrolix/eratosthenes/blob/8bdb8c9ed5bd9e924bd483e2f480dd87f9fc7359/rust/stakker/src/main.rs#L104

On that line, I want to initialize a Link, with a Printer, instead of another Link, but I get this error:

$ cargo check
    Checking eratosthenes-stakker v0.1.0 (/Users/anacrolix/src/eratosthenes/rust/stakker)
error[E0308]: mismatched types
   --> src/main.rs:104:16
    |
104 |     let tail = actor!(system, Link::init_tail(printer), ret_nop!());
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Link`, found struct `Printer`
    |
    = note: expected struct `stakker::actor::ActorOwn<Link>`
               found struct `stakker::actor::ActorOwn<Printer>`
    = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

I expect it has to do with Link::next, which is of type ActorOwn<Link>. Actually I'd be happy with a reference to an actor that merely implements Next, but I'm not sure how I'd go about expressing that. (I don't think ActorOwn<Next> will work.)

To experiment with it, check out that repo and run cargo check in rust/stakker.

Cannot initialise actor using fully qualified init method

With the following code

let thing = actor!(
    stakker,
    some_crate::Struct::init(),
    ret_nop!()
);

Compiler fails with error:

error: no rules expected the token `::`
  --> tests/test.rs:37:35
   |
37 |             some_crate::Struct::init(),
   |                               ^^ no rules expected this token in macro call

I assume that the actor! macro does not expect anything apart from the struct's name and method, which is quite limiting.

An Actor implementing multiple traits?

In my emulator, I am making each chip actor run independently of the others, without referencing each other directly.
I do this by having them reference only a parent trait actor any time they want to call! other actors.

The parent actor implementing this trait will call! the other child actors.
Ideally, I would like this to be just one central motherboard actor, implementing all traits of all chips, with all chips stored in one struct.
But it seems like actor_of_trait is written in a way that forces me to choose just one trait.
Is it possible for me to make an actor which implements multiple traits, and pass this one central motherboard to all chip actors?
Otherwise, it looks like I will need a separate board actor implementing the trait for each chip.

Actor coroutines

An actor coroutine would run detached from the actor, but would receive the actor's &mut Self and cx: &mut Cx<'_, Self> references whenever it is resumed, i.e. the code runs within a normal actor context. So the idea is that actor code within a coroutine could do anything that an actor method could do, but in a sequential style. This also allows calls to other actors to appear synchronous in the code (although it is still all asynchronous underneath). So a call to another actor would return, directly in the code, the same value that a Ret handler receives right now. (The ? operator could perhaps be used to handle failures.)

Actor coroutines would be behind an Rc (or similar), and would be held in memory by the internal Ret handler -- or whatever suspended the coroutine and will resume it later. So they would get dropped if nothing is ever going to resume them again (i.e. actor coroutines would need to be ready to be dropped without completing). There could be two types of coroutines, one which terminates the actor when it completes, and another which just runs to completion without affecting the actor lifetime. It would be possible to run several actor coroutines at the same time, each held in memory by whatever is going to resume it. Cleanup is straightforward -- if whatever would resume the coroutine is dropped, the coroutine is also dropped. If the actor is terminated, then resuming the coroutine would just do nothing (similarly to how outstanding calls to an actor are dropped if it has terminated), and any Ret instances would send back None.

Actor coroutines would also enable convenient offloading of processing to background threads, because that could be represented as a wrapped block of code which effectively yields the coroutine until the background processing is complete. The apparently-synchronous nature of the coroutine code would make this clearer and more convenient in the source.

This feature is blocked on a Rust generator feature that would allow borrows to be passed into the coroutine.

Look into a no_std variant of stakker

I just attempted to use stakker in a wasm project.

I got back an error that wasm doesn't support std::time. Looking online I see there are several workarounds, like using the instant crate, but I cannot use these because the Stakker::new function requires a std::time::Instant as input.

Clarify timer contract

The general contract of a timer is that it will be called not-before its expiry time, and hopefully soon after. (In general, OS scheduling might mean that timer execution may be delayed according to the current load.)

However there is another question, about the order of execution of timers. The current implementation guarantees that timers are called in order of expiry time, but only to the implementation resolution. It makes no guarantees about the order of timer execution when several timers expire at the same Instant (within the lowest unit of time resolution used by the timer queue implementation).

The question is whether it would be worthwhile guaranteeing strict execution order by time, and then by submission order for timers with the same expiry Instant. The trouble is that time can stall for a while on the same value if Instant::now() skips backwards. Any code assuming that two timers started one after the other will execute in that order would be randomly broken by such a stall. If guaranteeing order doesn't add much overhead, then it would make things more deterministic, and make weird corner cases test the same each time. However, if the overhead would be too great, then the current behaviour needs to be documented.

Glommio: Consider writing interface code to run on top of it

Glommio has a similar single-threaded approach to Stakker for handling load, i.e. shard it or load-balance it at a high level, instead of load-balancing across threads at a low level. However Glommio takes things a good deal further, and its maintainers have put in the necessary work to interface to Hyper and so on. However it only works on recent Linux kernels (5.8+) and supports no other platforms, so it is rather a niche runtime. Adding Stakker to it would be a niche on a niche, i.e. to support Glommio users who also prefer the actor model instead of async/await. However the combination could be really high performance. Also interfacing to Glommio would be a good learning experience.

Realistically, I think this is not going to happen unless someone has a specific interest in this and is willing to fund it, since supporting all platforms would be preferable. However maybe it's possible to copy ideas and workarounds from Glommio where they have solved problems of getting crates from the general async/await ecosystem to work single-threaded.

ret_some_do! updating variables issue

Consider the following code:

use stakker::*;
use std::time::Instant;

struct Board;

impl Board {
  pub fn init(_: CX![]) -> Option<Self> {
    Some(Self)
  }
  
  pub fn get_byte(&self, _: CX![], ret: Ret<u8>) {
    ret!([ret], 123)
  }
}

struct ExampleStruct {
  value: u8,
}

fn main() {
  let mut stakker0 = Stakker::new(Instant::now());
  let stakker = &mut stakker0;

  let board = actor!(stakker, Board::init(), ret_nop!());
  
  let mut example_struct = ExampleStruct { value: 9 };
  
  let ret = ret_some_do!(move |result| {
    example_struct.value = result
  });
  call!([board], get_byte(ret));
  stakker.run(Instant::now(), false);

  println!("{}", example_struct.value);
}

It has the following compiler error:

error[E0596]: cannot borrow `cb` as mutable, as it is not declared as mutable
  --> src\main.rs:28:13
   |
28 |     let ret = ret_some_do!(move |result| {
   |  _____________^
29 | |     example_struct.value = result
30 | |   });
   | |    ^
   | |    |
   | |____cannot borrow as mutable
   |      help: consider changing this to be mutable: `mut cb`
   |

Is there a better way for me to do this?
I like the potential of ret_some_do! because it lets me avoid using cx in my deeper modules, which don't pass cx down...

Pass long path name to macro

Working off the tutorial:

actor!(stakker, Light::init(), ret_nop!());

This works because Light is defined locally.

If I have Light defined in a separate module, I get an error with this code:

actor!(stakker, submodule::Light::init(), ret_nop!());

The current workaround:

use submodule::Light;
actor!(stakker, Light::init(), ret_nop!());

Real-world benchmarks

It would be good to do some real-world benchmarks to demonstrate Stakker's single-threaded approach versus other solutions, to show pros and cons.

Really any kind of real-world load would be good for a benchmark, but a suggestion from @d4h0 was a Websocket client or server:

  • It appears that tungstenite crate should work on top of Stakker
  • Probably the test can be load-balanced externally across several Stakker threads
  • More ideas in this discussion

ActorOwnAnon type

This is to encapsulate an ActorOwn, but not expose the type. This lets someone keep a list of mixed actors that need freeing without type problems.

Allow returning `Self` instead of `Option<Self>` for Prep methods that always go to Ready

Option provides a From implementation for T such that Option::from(val) on both T and Option<T> will result in Option<T>. So all Prep calls could be wrapped in Option::from(...) to handle both. This needs testing to check that it causes no issues. It also requires some consideration to be sure that it won't confuse coders who might forget that there's a way to delay moving to the Ready state.

Timer queue and time/duration overhaul

Timers currently use a BtreeMap which is theoretically efficient at scale, especially when there are thousands of timers, or when the thread is heavily overloaded. However a BtreeMap generates a lot of code, and is probably overkill. It would be better to have something tuned to this application. So these changes are proposed:

  • New type MonoTime (for monotonic time) which is time since base Instant as a u64 in ns, allowing Stakker to run for 500+ years at a time. Support most things that Instant does. Conversion to/from Instant is supported via a Core reference to get the base time. Support creation directly from a time in ns to support virtual time without reference to an Instant
  • New type MonoDur for monotonic duration as a u64 in ns. Support easy conversion from f64 in seconds. Support most things that Duration does, and conversion to/from Duration
  • Switch all times and durations in API to use these types (or Into<...> these types), i.e. so Stakker can work purely with these types internally
  • Create timer benchmarks (based on estimated realistic distribution of timers at various scales) and run against the current BtreeMap implementation
  • Rewrite the timer queue as an N-ary heap with an associated slab-style array to support deletion, and benchmark to check that this is an improvement

This will be a breaking API change, so it will mean going to 0.3. However API changes will be kept to a minimum and where code uses type inference it might not even notice. Justifications for design decisions:

  • Instant is problematic because the representation is different on different platforms, and calculations may be relatively heavy (e.g. macOS). Also, there's no way to construct an Instant zero. You always have to work relative to 'now' even if you're working in virtual time. Also secs/ns time in Duration is inconvenient to calculate with.
  • Using ns time is easy to calculate with (add/sub). Current Stakker timer code internally uses a compromise time representation with discontinuities.
  • Converting secs/ns to ns is just a mul and add (fast).
  • Converting back from ns to secs/ns (as used by Instant and Duration) requires a divide and modulus which is slow, but there shouldn't be much need to convert back within the Stakker system. Just maybe on the edges
  • Encouraging const conversion of f64 to MonoDur means easy representation of durations in the code, which are converted to ns at compile-time
  • N-ary heap can be optimised to cache line size. It will produce a shallow tree. It should perform well for single timer fetches. It doesn't support partition to grab a whole chunk of timers like BtreeMap does. However the code will be many times smaller.
  • Weak point of heap is O(N) deletion, which is worked around by having a slab-like vec associated with it where the callbacks are stored, where deletion can occur

Are there any benchmarks?

Hi,

Stakker looks fascinating!

I'm wondering if you have planned to publish benchmarks that compare Stakker with other options.

I think that could help with the marketing of the crate (if the results are good... 😄).

Also: Is my assumption correct, that basically everything has to be re-written for Stakker (HTTP servers/clients, etc.)? Or would it be easy to reuse crates that are written on top of "pure" Mio, for example?
