rsocket / rsocket Goto Github PK
RSocket Protocol Definition
Home Page: http://rsocket.io
License: Apache License 2.0
Currently, flags appear in the 8 bits between VERSION and FRAME TYPE in the frame header. At first I thought this meant flags should be identified by the version; that reading is supported by the existence of an I flag to ignore the frame (type) if not understood. But the M flag is per frame type, and the remaining 6 bits are flags available for each frame type to define.
Why are you placing flags before the frame type? Are you trying to align things on 16-bit boundaries? If so, why? I don't think you need to unless you are trying to match this to some underlying protocol or link layer.
(The I flag appears to be pointless—if the receiver can't understand a frame then of course it has to ignore it.)
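To make the layout under discussion concrete, here is a decoding sketch assuming a 32-bit leading word of 16-bit VERSION, 8 flag bits, then an 8-bit FRAME TYPE. The field widths and flag bit positions are illustrative assumptions, not the spec.

```java
// Decoding sketch: assumed layout [VERSION(16) | FLAGS(8) | FRAME TYPE(8)].
final class FrameHeader {
    static final int FLAG_IGNORE = 0x80;   // hypothetical position of the I flag
    static final int FLAG_METADATA = 0x40; // hypothetical position of the M flag

    static int version(int header)   { return (header >>> 16) & 0xFFFF; }
    static int flags(int header)     { return (header >>> 8) & 0xFF; }
    static int frameType(int header) { return header & 0xFF; }

    static boolean canIgnoreIfNotUnderstood(int header) {
        return (flags(header) & FLAG_IGNORE) != 0;
    }
}
```

Note that in this layout the per-frame-type flags and the frame type itself are decoded independently of the version, which is the ambiguity the question is getting at.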
The current text for the Setup Error Data inside the SETUP ERROR frame is:
Setup Error Data: includes payload describing connection capabilities of the endpoint sending the Setup header. Error Data MUST be a UTF-8 encoded string.
This doesn't make sense to me since I assume the SETUP ERROR frame is sent back in response to a SETUP frame (not SETUP header). Why would the responder know the connection capabilities of the requestor (sender of the SETUP frame)?
Poll for removing Initial Request N from Request Subscription and Request Stream.
It might be easier and cleaner to remove the initial request N field from the request and simply send a Request N right after the request. False Start behavior will cover the error case.
Thoughts?
There's a specific situation where a client may have all its leases "empty".
(This is kind of a pathological use case, but it could happen during/after a big outage.)
The question is: do we want to let the client still send traffic to the servers or not?
I tend to think it would be better if the client just rejects the request (i.e. rejects the associated request from its upstream).
e.g. in a situation like this:
-> C
A -> B -> C
-> C
if B has exhausted all its leases from C, then I would propose that B reject the request from A that triggers the message to C (that rejection being retryable by A)
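A minimal sketch of the policy proposed above: when an intermediary (B) has no remaining lease allowance from its downstream (C), it refuses to forward and signals a retryable rejection upstream. `LeaseGate` and its method names are illustrative, not part of the spec.

```java
// Tracks the request allowance granted by the downstream's LEASE frame.
final class LeaseGate {
    private int remainingRequests;

    LeaseGate(int initialAllowance) { this.remainingRequests = initialAllowance; }

    // A new LEASE frame overwrites the previous allowance.
    void onLease(int allowance) { this.remainingRequests = allowance; }

    // Returns true if the request may be forwarded downstream; false means
    // the caller should send a retryable rejection back upstream.
    boolean tryAcquire() {
        if (remainingRequests <= 0) {
            return false;
        }
        remainingRequests--;
        return true;
    }
}
```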
The motivation provides some useful context. One question that comes to mind here is around layering. It appears that the protocol is currently described as a single layer that includes flow control, envelope/metadata, and several alternative interaction models. This is in some ways similar to where AMQP started. We ultimately ended up dividing the protocol into multiple layers because certain elements (e.g. flow control, envelopes, addressing/routing) ended up being fundamental to all interaction patterns, whereas a given endpoint rarely needs to use/understand the full variety of interaction patterns that can benefit from those more fundamental layers. By dividing the protocol into these layers, we found it became both easier and more efficient to implement useful general purpose intermediaries (i.e. proxies and routers) as they only need to concern themselves with those lower layers.
This may not matter much if you intend to use this as purely a client/server protocol, however if you intend this to form the basis of larger application-level networks, this question may be more relevant. For example if an incoming request ultimately triggers a number of dependent interactions of different types, I imagine you probably want the ability to correlate the dependent interactions back to the trigger. This implies the ability to share and propagate context across different styles of interaction.
Similarly, flow control generally needs to propagate from end to end, e.g. if service A uses service B which uses service C, then B may well be in the position of proxying service C's capacity to service A.
The text describing connection establishment has some contradictory statements in it, which I don't think is intended.
For example, the following two sentences contradict:
The server-side Requester may send requests immediately upon receiving a SETUP frame that it accepts.
If a client-side Requester sets the L flag in the SETUP frame, the server side Requester MUST wait for a LEASE frame from the client-side Responder before it can send a request.
And the client- and server-side behaviors are inconsistent even though they are both requesters:
The client-side Requester that has set the L flag in the SETUP frame MUST wait for the server-side Responder to send a LEASE frame before it can send Requests.
The server-side Requester may send requests immediately upon receiving a SETUP frame that it accepts.
The text-based sequences are clearer, and do not conflict knowing they are different scenarios.
Are the first 8 bits your version or the R (which right now has no defined value) followed by a frame length? A frame length that fits within the lower 24 bits also means that the first 8 bits of a frame-lengthed frame will look the exact same as the first 8 bits of a frame-not-lengthed frame, because your initial version is zero.
If you ever want to increment the version in order to support a different frame header field length you won't be able to. But that might be important.
Suggest making the version the very first thing. Magic strings are also sometimes used to differentiate between random bytes and the start of a message, which might be even more important considering you could start reading almost anywhere in the binary stream coming across the wire.
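The ambiguity can be shown directly. Assuming the leading 32-bit word is either [R(8) | FRAME LENGTH(24)] or [VERSION(16) | ...] as the issue reads the draft, the first byte is zero in both cases whenever the version is small and the length fits in 24 bits:

```java
// Compares the leading byte of the two candidate layouts.
final class LeadingByte {
    static int withFrameLength(int frameLength) {
        int word = frameLength & 0x00FFFFFF; // R bits left as zero
        return word >>> 24;
    }

    static int withoutFrameLength(int version) {
        int word = (version & 0xFFFF) << 16;
        return word >>> 24;
    }
}
```

With the initial version being 0, a receiver reading the first byte cannot tell the two layouts apart, which is the argument for a leading version field or magic string.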
The protocol definition states an ERROR frame is used to communicate a stream ERROR from a responder to a requestor.
However, unlike the SETUP ERROR frame, there are no error codes defined. Instead it is a free-form UTF-8 encoded string. Is this really for application errors sent back instead of a RESPONSE frame? (That's what @benjchristensen says to me.) If so, application level errors probably shouldn't exist within a Reactive Socket level structure. And they could continue to use MIME type encoded data to provide an application level response that is interpreted by the application as an error.
If these are actually stream errors, e.g. the frame sent from the requester is malformed or unrecognized, then this should provide strict error semantics at the Reactive Socket level, as is done for the SETUP ERROR frame.
We need the ability to convert an ERROR to a message readable by humans ... such as an Exception in Java. Thus, the byte[] being sent in an ERROR frame needs to be String bytes so it can be turned back into a String.
This means we must also define the encoding type: UTF-8
Why not support the MimeType encoding for ERROR?
a) because the error could have been caused by the encoder trying to encode into the MimeType
b) because the thing processing the error may have no knowledge of the MimeType or processor
c) because this would require a consistent MimeType to always deliver the error in the same way, which goes against the principle of allowing the MimeType to be defined by the application tier.
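Per the rationale above, ERROR data would be plain UTF-8 rather than MIME-typed data, so decoding it never depends on application-level codecs (which may themselves be the source of the error). A sketch of what that round trip looks like:

```java
import java.nio.charset.StandardCharsets;

// Encodes/decodes ERROR frame data as a fixed UTF-8 string, independent of
// the application-defined MIME types used for normal payloads.
final class ErrorData {
    static byte[] encode(String message) {
        return message.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] errorData) {
        return new String(errorData, StandardCharsets.UTF_8);
    }
}
```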
I noticed in the docs here https://github.com/ReactiveSocket/reactivesocket/blob/master/Protocol.md#error-codes that error codes have RESERVED as both 0x0000 AND 0xFFFF. Is this intentional?
Need to write down the answer to "why" we are doing this. The early start of the DesignPrinciples.md file tells mostly "what"; we need to add "why". In particular, answer why other existing solutions were not chosen.
Some use cases have come up that may warrant another interaction model. The current 4 models all support a single REQUEST frame only. Bi-directional communication is happening under the covers, and can be achieved by an application using the 4 different REQUEST types in both directions. However, this does mean certain use cases would require the application to associate unidirectional streams, which somewhat defeats the point of having interaction models on top of a multiplexed bi-directional transport.
The two use cases I've come across both have similar semantics as follows:
The goal is to avoid re-fetching the initial burst of all data again each time the client changes the subscription.
Using just request/stream would mean the client cancels and re-requests each time, and gets a new burst of data on each change.
Another option is to maintain multiple separate request/stream connections that are more granular. This can indeed work in some cases, but if there is overlap amongst the criteria, it will still require canceling and re-requesting in some cases that are not purely additive.
The most efficient would be a custom application protocol on top that uses REQUEST_RESPONSE to send criteria, then REQUEST_SUBSCRIPTION to receive the events. This however is building a non-trivial behavior again on top of ReactiveSocket. It's no different than using HTTP GET to mutate server-side state and HTTP SSE to receive the changing output.
Thus, the proposal is to add a 5th interaction type, REQUEST_CHANNEL that allows multiple request frames and multiple response frames (potentially infinite or finite).
From an application perspective, it would look like a channel with the ability to write and read ReactiveSocket Frames.
// server
Publisher<Frame> handleRequestChannel(Publisher<Frame> requests) {
    // subscribe to the stream of incoming request Frames ...
    // write response Frames to the returned output Publisher
}
This is almost easy to add ... the one trick I can think of is thus far we have only had to have REQUEST_N semantics flowing from RESPONDER to REQUESTER, and Lease behavior going from RESPONDER to REQUESTER.
In this case, it would make more sense to have REQUEST_N semantics go both directions, since it is a ReactiveStream Publisher that would represent the input. The Lease would control if REQUEST_CHANNEL can occur, but once the channel is established, then REQUEST_N would be used in both directions.
The protocol as specified seems to be agnostic to the underlying processing model for the request/response, in that you don't specify an API for producing/consuming the streams. Do you plan to specify a higher-level Rx-based DSL or API?
A note to make sure we get this working.
How should the Reactive Socket stack handle receipt of corrupted frames? Either that results in parsing failures or messed up data lengths that cause incorrect parsing and the next byte no longer being the start of the next frame?
When this happens, what is told to the application? How should the next frame be properly identified to recover the communication channel?
The protocol definition only specifies the underlying protocol is reliable, which only refers to delivery and non-delivery of data. It does not specify that it also provides data integrity.
Discuss and document how this protocol will interact in stateless microservice architectures while having persistent (stateful) connections.
How will push notification subscriptions and streaming responses behave with autoscaling, failure, load shedding, load balancing, etc?
Should we retain the ReactiveSocket name or find another?
Want to give a chance for broader debate and better ideas before we have production releases and are stuck with it.
Reasoning behind ReactiveSocket is:
This library is directly related to several projects where "Reactive" is an important part of their name and architectural pattern. Thus the choice was made to retain this relationship in the name. Specifically:
ReactiveSocket implements, uses, or follows the principles in these projects and libraries, thus the name.
Potential problems with the name are:
Benefits of the name:
Please weigh in with your thoughts and alternatives if you have them.
If we do change the name, I'd want:
We should reconsider having metadata on the REQUEST_N frame. What will an application do with that? How would we expose it to an app?
Looking at the header format, https://github.com/ReactiveSocket/reactivesocket/blob/master/Protocol.md#frame-header-format, it looks like the first bit R is reserved. Is there a motivation for this? Should this be left as 0 by the encoder?
The spec specifies that the stream IDs must be unique for a connection and any of the peer (client and server from the point of view of connection establishment) can initiate a stream and hence provide the unique ID.
How would an implementation make sure that the ID is unique? Should we prescribe any setup exchange like ID offset for the peer?
I'm working on the JS implementation of this protocol. In the docs, I do not see explicit values for Flag. Does this information exist somewhere?
There are only 6 bits of unused flags that can accommodate changes introduced in future versions of this protocol. It will be good to have more "reserved" bits in the frame header (32 extra reserved). Thoughts?
Today the protocol supports keep alive frames but does not provide any guidelines on whether it should be enabled by default or not. So, it is left up to the implementations or the eventual users to decide whether it should be enabled or not for a connection.
There are a few cases, as explained below, when in absence of keep-alive, a request will be in a state of silent failure. All scenarios below are when the connection is in half open/closed state and hence data is exchanged in one direction but not the other.
REQUEST_STREAM or REQUEST_SUBSCRIPTION requests: if, in the middle of these requests, the connection is half closed, then the RQ will not be aware of this state change and will potentially wait for the response forever.
These edge cases (and possibly more) are subtle and are all solved by having keep-alive turned on for the connection; however, we do not have keep-alive on by default. So, should we mandate that keep-alive is on by default, with a setup option that disables it, as opposed to requiring explicit enabling?
What does it mean to not include the frame length in the header? Are those 31 bits all zero or is the header 31 bits shorter?
We had this discussion with @NiteshKant.
The question is: should the lease contain a duration/deadline or not, knowing that a server can overwrite its previous lease with a new one?
It's a tricky question. The problem with a deadline is that the propagation delay between the server and the client makes the deadline very imprecise. This leads to the recommendation of having the window >> network latency.
On the other hand, an overloaded server shouldn't need to do anything to implicitly reduce the flow of incoming requests; just letting the lease expire is the perfect solution.
I tend to prefer the solution with the duration, but I can be convinced.
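A sketch of the duration-based variant argued for above: validity is measured from the time the LEASE frame is *received*, which sidesteps clock skew and the propagation-delay imprecision of an absolute deadline. The class and field names are illustrative.

```java
// A lease that grants `allowance` requests for `ttlMillis` after receipt.
final class TimedLease {
    private final long receivedAtMillis;
    private final long ttlMillis;
    private int remaining;

    TimedLease(long receivedAtMillis, long ttlMillis, int allowance) {
        this.receivedAtMillis = receivedAtMillis;
        this.ttlMillis = ttlMillis;
        this.remaining = allowance;
    }

    // Returns true if a request may be sent under this lease right now.
    boolean tryUse(long nowMillis) {
        if (nowMillis - receivedAtMillis > ttlMillis || remaining <= 0) {
            return false; // expired or exhausted: an overloaded server does nothing
        }
        remaining--;
        return true;
    }
}
```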
Does the protocol support Request Streams which span multiple frames?
For example does the protocol support the following example use case:
RQ needs to request bookmarks from a bookmark service (RS) for a given customerId and a list of 1000 titleIds.
RQ sends an initial frame containing the customerId and the first 100 titleIds.
RS immediately starts streaming back the bookmarks for a subset of the initial request.
RQ sends a frame containing 800 titleIds while the initial responses are sent from RS.
RS continues to stream back bookmarks.
RQ sends a final frame containing the final 100 titleIds.
RS streams the remaining bookmarks and a completion notification.
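The exchange above maps naturally onto a channel-style interaction where the requester spreads its 1000 titleIds across several request frames. A sketch of the client-side chunking step (the frame/payload encoding itself is out of scope here, and `ChunkedRequest` is a hypothetical helper):

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedRequest {
    // Splits the full id list into per-frame batches of at most `size` ids.
    static <T> List<List<T>> chunk(List<T> ids, int size) {
        List<List<T>> frames = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            frames.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return frames;
    }
}
```

Each batch would then be written as one request frame on the same stream while responses interleave in the other direction.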
After discussion with @tmontgomery @stevegury and @robertroeser we decided that it is too confusing to the application layer to allow METADATA frames to be pushed on streams >1. It still makes sense for stream 0 to push METADATA back and forth asynchronously, but at the application layer we will just have NEXT frames with data/metadata.
The protocol documentation suggests that the protocol is unreliable. Please state it explicitly. If it is a reliable protocol, state how.
The specification doesn't mention anything about cancelling on stream 0.
I think that we should either call out that's invalid, or specify the behavior.
I think it should be equivalent to closing the connection, thereby making it invalid.
How long should server wait between KEEPALIVEs initiated by the client before it assumes that the client is dead?
How long should the client wait for the server to echo KEEPALIVE back before it assumes the server is dead?
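A common answer to both questions above is to declare the peer dead after N consecutive missed keepalive intervals. Both N and the interval would be implementation or setup choices; the protocol does not specify them, and the class below is only a sketch.

```java
// Declares the peer dead after `maxMissed` keepalive intervals with no frame.
final class KeepAliveMonitor {
    private final long intervalMillis;
    private final int maxMissed;
    private long lastReceivedMillis;

    KeepAliveMonitor(long intervalMillis, int maxMissed, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.maxMissed = maxMissed;
        this.lastReceivedMillis = nowMillis;
    }

    // Called whenever a KEEPALIVE (or its echo) arrives.
    void onKeepAlive(long nowMillis) { lastReceivedMillis = nowMillis; }

    boolean peerIsDead(long nowMillis) {
        return nowMillis - lastReceivedMillis > intervalMillis * maxMissed;
    }
}
```

The same monitor works on both sides: the server tracks client-initiated KEEPALIVEs, and the client tracks the server's echoes.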
I think it is somewhat implicit that on a stream either the requester or the responder can send DATA, REQUEST_N, CANCEL and other frames for a true full-duplex interaction per stream, but this is not obvious, especially for someone coming from a ReactiveStreams/RxJava background, as the streams there are unidirectional, i.e. only the producer sends onNext, onComplete, onError and the subscriber sends requestN.
I think it will help to clarify typical interactions in various request types. Also, this can be a good place to define what is illegal in any interaction.
The more I think about our current model of defining types of interaction (Fire n forget, Request-Response, Request-Stream, Request-Subscription), it appears to me that we are being too prescriptive. We may finally end up addressing all cases by incrementally adding things like REQUEST_CHANNEL (#30) but the question is, does it belong to the protocol?
One way to answer this question would be as to what value add does it have. I believe the semantics of what kind of interaction it is, are defined while creating an endpoint and does not change while actually invoking that endpoint. e.g.: An endpoint defined as request and response does not convert to request stream upon invocation. So, is passing this type over the wire meaningful?
Irrespective of whether we define these interaction types, I think there are two important constructs this protocol is establishing today:
If we instead define the protocol as the above two (eliminating the specific interaction types), the protocol will be more flexible and will not force people to fit their requirements into specific interaction types. The interaction types will be defined by the usage like:
// request/response
reactiveSocket.request() /*Initiate new stream*/
  .write(request data)
  .continueWith(response)
  .take(1)

// fire-and-forget
reactiveSocket.request() /*Initiate new stream*/
  .write(request data)

// request/stream
reactiveSocket.request() /*Initiate new stream*/
  .write(request data)
  .continueWith(response)
  .take(n)

// request/subscription
reactiveSocket.request() /*Initiate new stream*/
  .write(request data)
  .continueWith(response)

// channel
reactiveSocket.request() /*Initiate new stream*/
  .write(request data subject)
  .continueWith(response)
I do believe though that such interaction models are useful while defining client and server libraries, such as developed under reactivesocket-java.
I'm curious why you've chosen to support these two scenarios with two distinct flow control mechanisms rather than one. At first blush they both appear to operate by granting/revoking credit, and while they have slightly different encodings for credit (absolute vs cumulative) and slightly different means for revoking credit (ttl vs cancel), I suspect there is probably a single mechanism that could accommodate both scenarios. This could be important depending on desired topologies, e.g. if you want backpressure to propagate smoothly through a given endpoint then it is important to consider what is involved in order to bridge between different flow control mechanisms.
The frame header states the frame length field is included depending on the underlying protocol. I don't think this is a good idea. It makes building and parsing the frame more difficult with lower-level dependencies that may or may not be something the protocol stack can know. It also means the receiver has to figure out if the frame length is there or not. And it means your protocol cannot be used across multiple underlying protocols (which is how the Internet actually works).
Want to explore whether we should be more prescriptive in what metadata is exchanged. An example of this is GRPC: https://github.com/grpc/grpc-common/blob/master/PROTOCOL-HTTP2.md#requests It is very explicit about what metadata is exchanged, forcing client/server to agree upon metadata allowing certain behavior, such as timeouts. It also defines more explicit error codes.
Is there a reason for us to embrace any of these ideas?
Right now we have defined the default schema as having the URI in "Data": https://github.com/ReactiveSocket/reactivesocket/blob/master/Schema.md#data
The problem with this is that a routing layer (such as Zuul) must decode the Data payload, just to get the URI, even if it ignores the rest.
Should we instead put the URI in the metadata?
We would change the default schema to:

Data:
  Fields:
    Key: body [OPTIONAL]
    Value: Object

Metadata:
  Fields:
    Key: uri [REQUIRED]
    Value: String
However, this creates another problem. It couples the metadata and data together, which defeats the purpose of having different mime types for them?
Maybe we make "metadata" the required fields, and leave "data" completely open? Like this:

Fields:
  Key: uri [REQUIRED]
  Value: String
  Key: body [OPTIONAL]
  Value: Object
@benjchristensen tells me that the implementation intends to hide processing of multiple frames so that the application only receives the final result of all the metadata and payload data put back together.
For payload data, which is arbitrary bytes, I assume all the data is concatenated and delivered as one big binary blob. But what about metadata? The concept of metadata in Reactive Socket itself implies some sort of meaning or encoding. (If there isn't any, then metadata should be defined as just a separate binary block for separate delivery to the app, as kind of a convenience method by which the app can get two binary blobs instead of one that it has to split into two.)
The behavior of delivery to the application needs to be defined.
Any side of the communication channel may send a LEASE frame at any time to change the number of requests considered acceptable for receipt. The time duration starts from the time an endpoint receives the new LEASE frame. The actual time at which a received LEASE frame is processed may be a long time after the LEASE frame was sent.
At the same time, the defined behavior when receiving a request that exceeds the LEASE granted by the receiver is to ignore and drop that request.
Assume the current lease granted is for 1 million messages over the next 24 hours. And this LEASE frame was created at time 0 and received and processed at time 1. The sender then makes 500 requests. Assume the receiver then generates a LEASE frame granting 1 message over the next 5 seconds at time 2, and this updated LEASE frame is received and processed at time 20. Between time 1 - 19, the sender sent 500 requests. But the receiver will only accept requests it receives between time 1 - 2. All the rest will be dropped.
How does the sender know if the messages got through or not? As far as it knows, all 500 of them were sent within the 1 million lease window. But the receiver disagrees. Should the sender be waiting for a response? Should the sender time out and resend the same request? What does that mean if the request is not idempotent?
The "Handling the Unexpected" section of the protocol definition states the application is required to handle all this. But the protocol itself prevents the application from having all the knowledge it needs in order to make decisions.
The current protocol spec suggests that the absence of a response from the server implies acceptance of the setup. I think we can break it down per use case:
Broadly, this makes an acknowledgment from the server optional for the client. The current spec is limiting in this regard, in that it cannot support a handshake at all, as opposed to making it optional.
The lease mechanism in the protocol allows clients that are connected to multiple servers to make a smart choice about which server to send a request (i.e. client side load balancing). If you wish, you can provide an analogous capability for servers. When multiple clients connect to a single server (or the fan-in/merge case), a server is faced with a choice of how to allocate its available capacity between all of the connected clients. This can be a tough problem. Distributing capacity evenly may not always work well because 90% of your load may come from 10% of your clients. In other words not all of the clients may have the same level of demand for that capacity. If the protocol doesn't model/communicate that demand, then the server has less information about how best to distribute its available capacity.
By modeling demand in the system (demand roughly corresponds to request backlog or potential for request backlog) you can create a strongly symmetric situation where servers have the same ability that clients do to make smart choices. In this sort of system, requests flow "downstream" through your network towards areas of greater capacity, and capacity flows "upstream" through your network towards areas of greater demand.
It turns out this sort of thing is quite useful in messaging protocols, since at the layer where everything is a one-way message flow, "demand" can be easily mapped to the message backlog at a given sender, and this is information the receiver doesn't have unless the sender chooses to share it.
Fragments are referred to in a few places. From the frame definitions I am led to believe that a fragment is another frame (e.g. two response frames, the first with F-flag set, the second with C-flag set). But the term fragment is not defined. Is a fragment just another frame? Then it should just say frame.
Each message (up and down) needs to support metadata. This is similar to service tokens, headers or cookies in other protocols such as HTTP.
Example use cases are:
At this point we have two concepts defined in the Protocol:
I may be missing use cases, but I think these two are redundant.
As it is mentioned in the spec, If the streams are ordered, a preceding metadata frame can define metadata for the next data frame. OTOH, the metadata in a frame can be used to just define metadata without any data (the metadata push frame semantics).
It appears to me that we can do away with one of these concepts.
Perhaps we should be more clear on what happens if a requestor doesn't respect the lease?
Applicable sections:
It says the responder MUST ignore the request. Are we consciously NOT returning an error message? If so, perhaps we should spell that out (and why)?
The "unexpected" section also isn't clear to me on these two:
Lack of LEASE frames that stops new Requests is an application concern and SHALL NOT be handled by the protocol.
Does that confirm that the responder will just ignore the requests?
Request frames indicate the payload data should contain a service ID and request parameters. The encoding of these two things are undefined. I suspect in fact these two things are part of the application protocol running on top of Reactive Socket and they should not even be mentioned in the Reactive Socket protocol.
Capture and document the design and architectural principles in https://github.com/ReactiveSocket/reactivesocket/blob/master/DesignPrinciples.md
I would like to propose adding a new concept: an application-level ping (that can also be used as a keepalive).
This can be implemented as a specific REQUEST_RESPONSE / RESPONSE or as a new message type (I don't have a strong opinion yet).
This would allow us to do quick and efficient failure detection (e.g. a phi accrual failure detector).
I have a few clarifications w.r.t the stream identifiers defined in the protocol.
Stream IDs are generated by a client.
Doesn't this impede the server-initiated requests use case, or am I missing something here?
Stream ID generation SHOULD start at 1 and increase contiguously for each request type used by the client.
Is this suggesting that I cannot skip stream IDs, i.e. I cannot have a stream ID of 1, then 5, then 10, etc.? I think in general it is an OK restriction; however, it would mean that there is a globally (for the connection) synchronized counter for stream IDs in the client code. If, from the protocol point of view, we are not inferring the status of one stream by virtue of the start of another stream (like in HTTP/2), we can actually relax this requirement to: a stream ID must be locally unique for a connection. If this is done, a client can shard the ID space and eliminate the globally synchronized counter.
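One way to get locally unique IDs without a single global counter, along the lines of the relaxation suggested above, is to shard the ID space by offset and stride (similar in spirit to HTTP/2's odd/even split between client and server). This is an illustration of the idea, not anything the spec defines.

```java
import java.util.concurrent.atomic.AtomicLong;

// Generates stream IDs from a disjoint slice of the ID space: IDs are
// offset, offset + stride, offset + 2*stride, ... Shards with different
// offsets (and a common stride) never collide.
final class ShardedStreamIds {
    private final AtomicLong counter = new AtomicLong();
    private final long offset;
    private final long stride;

    ShardedStreamIds(long offset, long stride) {
        this.offset = offset;
        this.stride = stride;
    }

    long next() {
        return offset + counter.getAndIncrement() * stride;
    }
}
```

For example, `new ShardedStreamIds(1, 2)` and `new ShardedStreamIds(2, 2)` give two shards that each count contiguously within their own slice.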
Now that we have requestChannel, we also need to support REQUEST_N going both directions.
This will require us differentiating which direction the frame is flowing, otherwise we won't know which stream it applies to since requests are bi-directional and both sides start at streamId 1.
Talked with @tmontgomery briefly about this and it seemed like a bit would need to exist on the REQUEST_N frame to state whether it originated from the requestor or responder.
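A sketch of what that direction bit could look like on the REQUEST_N frame: one flag distinguishing requester-originated from responder-originated frames, so a receiver can resolve which stream the frame applies to even though both sides start at streamId 1. The flag position is hypothetical, pending the actual spec change.

```java
// Encodes/decodes a hypothetical direction bit in the REQUEST_N flags.
final class RequestN {
    static final int FLAG_FROM_RESPONDER = 0x01; // bit position is an assumption

    static int encodeFlags(boolean fromResponder) {
        return fromResponder ? FLAG_FROM_RESPONDER : 0;
    }

    static boolean isFromResponder(int flags) {
        return (flags & FLAG_FROM_RESPONDER) != 0;
    }
}
```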