rsimmonsjr / axiom
Implementation of a highly-scalable and ergonomic actor model for Rust
License: Other
This should allow the user to send to all system actors in a try fashion and then be told if there were any errors sending to those actors.
Currently the code checks to see if receivable == 1 to determine if it needs to schedule the actor. If the actor has messages already then the dispatcher thread will have put it in the channel. I am wondering if try_send will race the dispatcher threads and an actor with a message not get scheduled. If it's in the channel twice that is fine, though not the best for performance, but that should be preferable to not being in the channel when it has messages. The bad part is that if it is out of sync it won't get back in sync, because the actor will have more than one receivable message and won't be in the channel. Perhaps an AtomicBool should be used instead to track whether the actor is scheduled.
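Something like the following could track it (a minimal sketch; the struct and method names are illustrative, not the crate's current API):

use std::sync::atomic::{AtomicBool, Ordering};

struct ScheduleFlag {
    scheduled: AtomicBool,
}

impl ScheduleFlag {
    /// Returns true only for the caller that flips the flag from false to
    /// true, so exactly one thread enqueues the actor in the work channel.
    fn try_claim(&self) -> bool {
        self.scheduled
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Called by the dispatcher after draining; if more messages arrived in
    /// the meantime the sender will claim the flag again and re-enqueue.
    fn release(&self) {
        self.scheduled.store(false, Ordering::Release);
    }
}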
If a user tries to deserialize a remote ActorId and the remote is not connected, the code currently panics. It should, instead, merely return a deserialization error.
The local name should not be a UUID but something human-readable that a developer can set when they create the actor and then look up afterwards.
Some actors just react to messages and don't hold state. They shouldn't have to declare a processor with a placeholder state.
Currently shutdown is a bit brutal; messages are stranded in the channels. This ticket is to add an additional shutdown process that makes sure all messages are processed and sends Stop to all actors so that they can wrap up work before the threads exit.
This is just a usability thing since the types can easily ditch the prefix. It is also worth considering deleting the Secc prefix on all of the other sub types.
A monitor is a special actor that receives messages from the system and allows one actor to know about the life status of another actor. One possible implementation is to expand SystemMsg to include the message ActorStopped(Arc<ActorId>) to enable the receiver to know which actor was shut down. When an actor is monitoring another actor, the system will track all monitoring actor ids in the actor and then send the message to all of those actors. Of course, if the system is hard killed one cannot be sure the monitor message will be received. In later implementations across a network this should take advantage of location transparency.
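A sketch of the expanded enum (the Start and Stop variants are shown only for context and the exact set is an assumption; ActorId is the crate's existing type):

use std::sync::Arc;

pub enum SystemMsg {
    /// Sent to an actor when it starts.
    Start,
    /// Tells an actor to shut down gracefully.
    Stop,
    /// Proposed: notifies a monitoring actor that the given actor stopped.
    ActorStopped(Arc<ActorId>),
}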
Axiom uses SECC but not the other way around. It should be in its own crate. Once SECC is super stable after being used by Axiom it should be made its own repository.
send_new and try_send_new will wrap the message inefficiently in a double Arc. This needs a new API to make it efficient.
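One possible shape for that API, sketched against the MessageContent enum shown later in this document (the content field name on Message is an assumption):

use std::sync::Arc;

impl Message {
    /// Reuses the caller's Arc as the content rather than wrapping the
    /// value in a second Arc; Arc<T> coerces to Arc<dyn ActorMessage>.
    pub fn from_arc<T: ActorMessage + 'static>(value: Arc<T>) -> Self {
        Message {
            content: MessageContent::Local(value),
        }
    }
}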
This could happen if an actor was already stopped but the user still had an ActorId in serialized form and then tried to deserialize it. Right now that would probably panic and take down the system.
Currently the threads are just flat terminated. It would be better if they were gracefully shut down and informed the remotes that they are shut down. This could be wrapped up in a protocol for system-to-system communication.
Rather than doing only one message at a time, it would be more efficient if a dispatcher thread would perform work up to a certain configurable time limit. The default might be 1 millisecond. If a message takes less than that, the dispatcher thread should handle the next message for the actor, if any, and the next and so on until it reaches the timeout.
While adding this issue the developer will need to add a configurable time_slice to set the time in nanoseconds, and then a configurable time_slice_max which will serve as an upper bound for the time a message executes; beyond this level the system should log a warning.
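A sketch of what those knobs might look like on the configuration struct (the defaults are illustrative):

pub struct ActorSystemConfig {
    /// Time slice, in nanoseconds, that a dispatcher thread spends working
    /// one actor before moving on.
    pub time_slice: u64,
    /// Upper bound, in nanoseconds, for a single message's execution;
    /// beyond this the system should log a warning.
    pub time_slice_max: u64,
}

impl Default for ActorSystemConfig {
    fn default() -> Self {
        ActorSystemConfig {
            time_slice: 1_000_000,      // the suggested 1 millisecond default
            time_slice_max: 10_000_000, // illustrative only
        }
    }
}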
The actor system should maintain a map of UUIDs to ActorIds where the key is the id field inside the ActorId. This will enable a user to look up an actor by its ID even when the actor is remote, because UUID v4 values have an incredibly small chance of colliding.
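A minimal sketch, assuming the uuid crate and that ActorId exposes its id field (the registry type and lock choice are illustrative):

use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use uuid::Uuid;

struct ActorRegistry {
    by_uuid: RwLock<HashMap<Uuid, Arc<ActorId>>>,
}

impl ActorRegistry {
    fn register(&self, aid: Arc<ActorId>) {
        self.by_uuid.write().unwrap().insert(aid.id, aid);
    }

    fn find(&self, id: &Uuid) -> Option<Arc<ActorId>> {
        self.by_uuid.read().unwrap().get(id).cloned()
    }
}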
Currently errors will mostly just panic. This should be changed to be more robust and tolerant.
I'm used to Erlang's pattern of feeding an actor message into a pattern match, and the chains of if let Some(x) = msg.content_as::<Foo>() feel clunky in comparison. I know there's only so much one can do with TypeId but I was wondering if you had any thoughts on whether it would be possible to have a nicer pattern?
Another concern I have that I don't know the answer to is whether TypeId is stable between compiler versions, or even between multiple builds of the same program. If not, having multiple separate programs in the same cluster could lead to their TypeIds being incompatible...
To make Status more consistent the variants should have a consistent cognitive theme:
Processed should be renamed Consume.
Skipped should be renamed Skip.
Note that it might entail renaming Status as well, because the new names are really verbs. Maybe Action.
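The renamed enum would look something like this (a sketch; any variants beyond these two are out of scope here):

pub enum Action {
    /// Formerly Processed: the actor consumed the message.
    Consume,
    /// Formerly Skipped: the actor skipped the message, leaving it in
    /// the channel.
    Skip,
}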
Right now errors in connect will mostly just panic. They should be made more robust and tolerant.
From #53:
TypeId should not be depended on across binaries compiled on different machines; that is not recommended. When running an Axiom cluster (after I get that finished) the recommendation will be to deploy the same compiled binary on all nodes.
I want to throw down some thoughts I have about TypeId and see what you think of them, so that I can get them straight in my own head, and hopefully someone will find this useful someday down the line.
First, I'm coming from ROS, which is designed to allow multiple programs to operate together, even written in different languages, and basically implements a message-based RPC system complete with message definition files similar to gRPC or Cap'n Proto. I sure as heck don't want to build something as complicated as that, but there are certain advantages that come out of it that I do kinda want:
Your recommendation for using the same binary for every node in a cluster makes sense for something scaling horizontally like a web service, but is less convenient for something with lots of asymmetric parts like a robot system. If rebuilding your debugger program requires rebuilding and re-deploying your whole system, and that system is fundamentally stateful, that gets annoying and slow.
So I guess my question is, what can we reliably do with TypeId, and what are the exact constraints? How far can we rely on TypeId::of::<u32>() == whatever being accurate? Obviously if everything is built into one statically-linked binary, all TypeIds will line up with each other. And the TypeId docs say "...it is worth noting that the hashes and ordering will vary between Rust releases. Beware of relying on them inside of your code!", so that's the other extreme. But "same compiler" is pretty easy to guarantee, and the docs don't say a whole lot beyond that. So, will TypeId comparisons be valid if:
Currently in the test cases there is a usage of the downcast function that is very manual and has poor ergonomics:
fn handle(&mut self, aid: Arc<ActorId>, msg: &Arc<Message>) -> Status {
    dispatch(self, aid.clone(), msg.clone(), &StructActor::handle_op)
        .or_else(|| dispatch(self, aid.clone(), msg.clone(), &StructActor::handle_i32))
        .or_else(|| {
            dispatch(
                self,
                aid.clone(),
                msg.clone(),
                move |state: &mut StructActor, aid: Arc<ActorId>, msg: &u8| -> Status {
                    assert_eq!(3, aid.received());
                    assert_eq!(7 as u8, *msg);
                    state.count += *msg as usize;
                    assert_eq!(29 as usize, state.count);
                    Status::Processed
                },
            )
        })
        .unwrap()
}
This is unfortunately necessary due to Rust's mechanics with Any, but it could be improved with a macro that would take the state, the aid, the msg, and then a list of handler functions (which could be closures) and generate the code above.
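A minimal sketch of such a macro, assuming the existing dispatch helper (the macro name is illustrative):

macro_rules! dispatch_handlers {
    ($state:expr, $aid:expr, $msg:expr, $first:expr $(, $rest:expr)* $(,)?) => {
        dispatch($state, $aid.clone(), $msg.clone(), $first)
            $(.or_else(|| dispatch($state, $aid.clone(), $msg.clone(), $rest)))*
            .unwrap()
    };
}

With that, the handle function above collapses to a single dispatch_handlers!(self, aid, msg, &StructActor::handle_op, &StructActor::handle_i32, ...) call, with closures allowed as trailing handlers.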
Currently sending to remote actors is not enabled. Once the system has the ability to connect in the cluster, this ability should be added. The ActorSender enum should be enhanced to allow the message to be serialized, sent to the recipient, decoded on the other side, and handed to a Local sender where the actor lives.
Rather than have a mechanism by which the Actor System reads config from a file, I would like to create a struct with the configuration options and allow the user to instantiate this struct however they want when passing it to the actor system. There should also be a set of defaults so that if the user passes no config structure, or only a partially filled one, the system will configure itself with the defaults.
The preferred way of implementation would be to use the builder pattern as in:
let config = ActorSystem::config().poll_ms(20);
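A partial sketch of the builder approach, showing just poll_ms (the field, default, and the idea that ActorSystem::config() simply returns ActorSystemConfig::default() are all illustrative; the time-slice fields sketched earlier would live here too):

pub struct ActorSystemConfig {
    pub poll_ms: u64,
}

impl ActorSystemConfig {
    pub fn default() -> Self {
        ActorSystemConfig { poll_ms: 10 }
    }

    /// Builder-style setter that consumes and returns the config. In Rust a
    /// field and a method may share a name, which keeps the call site tidy.
    pub fn poll_ms(mut self, value: u64) -> Self {
        self.poll_ms = value;
        self
    }
}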
Currently if you try to send to an actor and your request to send is beaten by the actor stopping and going out of the hashmap, the system would crash. Instead it should gracefully handle this and return an error message. This can happen when an actor is killed or stops gracefully.
Because of the needs of SystemActor and other issues, there is a need to refactor the ActorId to get rid of the thread-local.
Once std::any::type_name is stabilized, add the name to the message to be used for debugging.
In order to enable this application to be monitored, it is worth considering integration of Prometheus APM into the library. This may take the form of a secondary crate such as axiom_prometheus to make it optional, but some of the metrics should be structured for APM monitoring.
There should be a new spawn function that allows a user to spawn an actor with overrides for the mailbox size and other parameters we care to add in the future. The ActorConfig object should also be part of the ActorSystemConfig that is passed at system startup.
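A sketch of the override-aware spawn (the ActorConfig field and the spawn_with_config name are assumptions for illustration):

pub struct ActorConfig {
    /// Mailbox (message channel) capacity for this actor.
    pub mailbox_size: u16,
}

impl Default for ActorConfig {
    fn default() -> Self {
        ActorConfig { mailbox_size: 32 }
    }
}

A hypothetical call site would then read:

let aid = system.spawn_with_config(ActorConfig { mailbox_size: 128 }, state, handler);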
When sending messages there is a need to serialize and deserialize when the message crosses network boundaries. This ticket is geared towards finding a means to do this. For this to work, Message might very well be converted to an envelope struct instead of a type. This is part of enabling remote actors.
This is an issue for efficiency and simplification. Currently the MessageContent type looks like the following:
pub enum MessageContent {
    /// The message is a local message.
    Local(Arc<dyn ActorMessage + 'static>),
    /// The message is from remote and has the given hash of a [`std::any::TypeId`] and the
    /// serialized content.
    Remote(Vec<u8>),
}
Note that the local message is holding the content inside an inner Arc. It would be much nicer if we could get rid of the inner Arc if possible, because it would reduce complexity and indirection.
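One possible simplification, sketched as an idea rather than a decided design: the enclosing Message is already shared behind an Arc, so the Local variant could own its content directly via a Box:

pub enum MessageContent {
    /// The message is a local message, owned directly by the envelope.
    Local(Box<dyn ActorMessage + 'static>),
    /// The message is from remote and holds the serialized content.
    Remote(Vec<u8>),
}

The trade-off is that a bare MessageContent could no longer derive Clone, but since messages travel as Arc<Message> that may never be needed.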
SECC currently has no tests around skipping functions and they need to be added.
Currently if an actor panics while processing a message the whole system will go down. In reality only that actor should go down. This should involve using some means of panic unwinding in the receiver thread in order to shut down the offending actor but keep going.
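A minimal sketch of the unwinding approach inside the dispatcher thread (actor.handle and system.stop are stand-ins for the real calls; AssertUnwindSafe is needed because the closure captures mutable state):

use std::panic::{catch_unwind, AssertUnwindSafe};

let result = catch_unwind(AssertUnwindSafe(|| {
    actor.handle(aid.clone(), &message)
}));
match result {
    Ok(_status) => {
        // The actor returned normally; process the Status as usual.
    }
    Err(_) => {
        // The actor panicked: shut down only the offending actor and
        // let the dispatcher thread keep working other actors.
        system.stop(aid.clone());
    }
}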
When you return ResetSkip it should first dequeue the current message before resetting the pointers.
With the changes from the serialization branch it should be possible now to implement

aid.send(Message::new(x));

given that aid is an ActorId.
When the actor is stopping it should be able to return a reason for that stop via a trait. This reason should implement Debug and Display and be structured so that monitoring actors can decide what to do about the monitored actor that was stopped.
I'm making a pubsub-ish thing and it would be nice if I could store ActorIds in a deterministic order. A PR will come if you want it.
Using a thread builder, give a human readable name to each of the dispatcher threads. This will improve debugging.
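This is straightforward with std::thread::Builder; a sketch (the name format and thread count are illustrative):

use std::thread;

for index in 0..4 {
    thread::Builder::new()
        .name(format!("axiom-dispatcher-{}", index))
        .spawn(|| {
            // The dispatcher loop would run here.
        })
        .expect("failed to spawn dispatcher thread");
}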
Currently the Actor has to be looked up in a hashtable when it is scheduled and that is inefficient and unnecessary. The ActorSender::Local should be refactored to store the actor and avoid the lookup.
The philosophers.rs example occasionally locks up but it was merged because there were so many pertinent changes in the core code. This continues the work on that example.
Currently the actor doesn't know who sent the message so replying becomes a problem.
The secc::receive_await_timeout obeys the timeout but the send does not. This should be added and both functions should be tested to make sure they work as expected.
Actors should track how long their messages take to process and how much time they spend in the channel, and use that to warn the user when they are sending messages that take too long to process. The threshold for warning should be added to the configuration object for the actor system.
Logging using https://github.com/rust-lang-nursery/log should be integrated into the system and should replace all of the output currently produced with println!().
It would be potentially useful to know how long a message has been in the channel. SECC should implement that by tracking the difference in microseconds between enqueue and dequeue time in the SeccNode and then rolling those numbers up into an average when a message is received, in order to report timing metrics. At the same time, any other timing metrics should be explored and implemented, such as "time waiting for capacity" and "time waiting for messages".
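A sketch of the enqueue-time stamping (the enqueued field is an assumption about SeccNode's layout; pointers and other fields are omitted):

use std::time::Instant;

struct SeccNode<T> {
    value: Option<T>,
    enqueued: Option<Instant>,
}

/// Computes the time, in microseconds, the value has waited since enqueue;
/// the receive path would feed this into a rolling average.
fn time_in_channel<T>(node: &SeccNode<T>) -> Option<u128> {
    node.enqueued.map(|t| t.elapsed().as_micros())
}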
If the actor can't be scheduled, temporarily or before the timeout, it currently panics. It should handle this gracefully.
Currently actors go on forever until the system shuts down and that is obviously not optimal. The following capabilities need to be integrated:
A SystemMsg::Stop that an actor can process to shut down gracefully.
An ActorSystem::stop() to force the behavior above immediately.

This is an ergonomics issue as it's unnecessary when there is the module prefix if needed.
The API has changed, so the README must change as well.