
mjpt777 avatar mjpt777 commented on May 18, 2024

The first subscriber gets the initial term window. Once the window is full then back pressure is applied.

How exactly this behaviour should work is something I have been mulling over for some time.

from aeron.

tmontgomery avatar tmontgomery commented on May 18, 2024

The default flow control strategy for multicast uses an aggressive scheme where the sender only cares about the most forward receiver. Also, it doesn't track number of receivers or consider slow receivers.

Thus the behavior is that the first subscriber "unblocks" the sender. Subsequent receivers may miss messages as they wait for a SETUP frame to come down. Under load, this means loss.

If additional semantics of coordinating initial setup are needed, then a custom flow control strategy is the desired way to make this work.


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Edit (after having read some of the wiki):
I do not completely understand how to set the retransmission buffer size and in which ways it interacts with the various windows. So, questions:

  1. How is the size of the buffer available for retransmission determined (sender backlog), and does this interact with the term window size?

  2. In a scenario where some publishers stream high volume all day long, it's quite common to want subscribers to just get the first message that is not part of a fragmented message, as a replay might kill first subscribers in case of large "term windows" [<= unsure how big these might get].


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

If I read the source correctly, there is a problem when trying to "unlock" a sender using custom flow control:
no receivers => no status messages => flow control is not invoked => no chance for custom flow control to manipulate positionLimit along the lines of

public long onStatusMessage( .. ) {
     return Long.MAX_VALUE;
}

(It would fail anyway, as the diff to the current position is calculated using an int.)
What I'd like to achieve is to never block the sender, regardless of whether there are subscribers or not (so even more aggressive than the current flow control). If a subscriber can't keep up, it will have to drop. Any ideas?
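The int-cast failure mentioned above is easy to demonstrate in isolation. Here is a minimal, self-contained sketch (the helper below is hypothetical, not actual Aeron code) showing how a Long.MAX_VALUE limit turns into a negative window once the difference is narrowed to an int:

```java
public class WindowOverflowDemo
{
    // Mirrors an available-window calculation that narrows long to int
    // (hypothetical helper for illustration, not actual Aeron code).
    static int availableWindowAsInt(final long positionLimit, final long senderPosition)
    {
        return (int)(positionLimit - senderPosition); // lossy narrowing to 32 bits
    }

    public static void main(final String[] args)
    {
        // Long.MAX_VALUE - 1024 truncated to 32 bits comes out negative,
        // so the sender would compute a negative window and stall.
        System.out.println(availableWindowAsInt(Long.MAX_VALUE, 1024L)); // -1025
    }
}
```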


tmontgomery avatar tmontgomery commented on May 18, 2024

There is no retransmission buffer per se. How much is available for retransmit at any one time depends on term buffer rotation. In practice, this can vary between 1 and 2 term sizes depending on location in the stream and cleaning. Notice that flow control interacts here and slows things down if they are too fast for a receiver to consume. This tries to keep the term in range for NAKing and retransmission.

The term window is used as a flow control construct. It is essentially how much space is available to be immediately sent between either publisher to driver, driver to driver, or driver to subscribers. It's not really for retransmission, it's for flow control.

In terms of "unlocking" a sender without an SM: currently, no. But if you wanted to do it, it would be possible a number of ways. Again, I wouldn't do this, since flow control is actually a good thing.

Here is a possible way to do it... have DriverConductor periodically call into the strategy as a "tick". i.e. call a method on a periodic basis that could be used as a token bucket rate limiter. This could then move the position ahead in a way to keep things moving at any specific rate desired.

Check how the conductor handles liveness checks for a way to do the handling of the periodic call.

It might not be bad to have a periodic update for the strategy anyway...
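A minimal, self-contained sketch of that tick idea (all names here are hypothetical, not actual Aeron driver APIs): a token bucket the conductor could call on its duty cycle to advance the sender's limit at a fixed byte rate, even when no status messages arrive:

```java
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the "periodic tick" idea: a token bucket a driver conductor
 * could call periodically to advance the sender's position limit at a
 * fixed byte rate. Hypothetical names, not actual Aeron driver APIs.
 */
public class RateLimitedFlowControlSketch
{
    private final long bytesPerSecond;
    private final long burstCapacity;
    private long positionLimit = 0;
    private long lastTickNs;

    public RateLimitedFlowControlSketch(final long bytesPerSecond, final long burstCapacity, final long nowNs)
    {
        this.bytesPerSecond = bytesPerSecond;
        this.burstCapacity = burstCapacity;
        this.lastTickNs = nowNs;
    }

    /** Called on the conductor duty cycle; advances the limit by elapsed-time credit. */
    public long onTick(final long nowNs, final long senderPosition)
    {
        final long elapsedNs = nowNs - lastTickNs;
        lastTickNs = nowNs;
        final long credit = (bytesPerSecond * elapsedNs) / TimeUnit.SECONDS.toNanos(1);
        // Never let the limit run more than burstCapacity ahead of the sender.
        positionLimit = Math.min(positionLimit + credit, senderPosition + burstCapacity);
        return positionLimit;
    }
}
```

Time is injected rather than read from the clock, which keeps the strategy deterministic and easy to test, mirroring how the conductor already drives liveness checks.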


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Thanks for clarification, very much appreciated.

BUT:
I think there are (some if not many) use cases where this behaviour is a no-go:

  1. Think of a system of high volume realtime data streams. You have subscriber processes which perform real-time computation on those streams. You don't want message loss on each hiccup (so reliable UDP is favourable), but in case receivers can't keep up, they have to accept the loss and resync their computation (so basically just have some gaps in their computed output stream data series then).
    The main sender of input data should never get blocked under any circumstances as, erm, it's realtime data :-).

  2. There are subscribers which can keep up with the data rate but may suffer full GCs by accident. By ensuring large "retransmission buffers", one can assure they don't get unrecoverable message loss even when full GC'ing (though this happens twice a year).

Sidenote 1: When choosing a multicast address, Aeron still defaults to UnicastSenderFlowControl
Sidenote 2: I came up with the following idea to achieve zero-sender blocking:

Set the initial position limit to Long.MAX_VALUE and ignore onStatusMessage calls:

public class NonBlockingFlowControl implements SenderFlowControl {

    @Override
    public long onStatusMessage(int termId, int completedTermOffset, int receiverWindowSize, InetSocketAddress address) {
        return -1; // sentinel: ignore status messages, never move the limit
    }

    @Override
    public long initialPositionLimit(int initialTermId, int termBufferCapacity) {
        return Long.MAX_VALUE; // never block the sender
    }
}

In addition, the computation in DriverPublication becomes:

private int sendData()
{
    final int bytesSent;
    final long lastSentPosition = senderPosition.position();
    final long availableWindow = (senderLimit.get() - lastSentPosition); // don't int cast here
    final int scanLimit = (int)Math.min(availableWindow, mtuLength); // but there

and

public void updatePositionLimitFromStatusMessage(final long limit)
{
    if ( limit < 0 ) // no limit set
        return;

Would this screw up Aeron?


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

There's probably a typo in Configuration.java defaulting MulticastFlowControl to UnicastFlowControl (shouldn't this be MaxMulticast*FlowControl?) ...


tmontgomery avatar tmontgomery commented on May 18, 2024

Thanks for the catch. That was a holdover from a refactor, I think. I'll make a note to clean it up.

Setting the initial position limit should be OK. It effectively disables flow control between drivers. But the sender may overrun the receiver, so you might see errors from that. Overruns are dropped, so you may need to up the receiver window size to compensate.

Couple thoughts on the points you raise.

Your first point is, essentially, most market data systems using snapshot and delta. Works fine. Can build that with Aeron just fine... some additional points on how flow control handles spikes in a second, though.

Your second point is also the same. Depends on how flow control handles spikes.

Aeron's flow control is deliberately designed to support very large windows for various reasons without back pressuring until it all is full. As SMs come back very frequently, the only real time a publisher is flow controlled is when it has exceeded the fastest receiver by over a full term size. As the term size can be set to pretty high values (as long as they fit in memory), this means that spikes of well over several gigabytes should be supportable. This is very different than TCP where the window is small. In Aeron, the windows are large and can only back pressure when full.

In short, if you actually don't want to backpressure the publisher, then use a big term size as you would for handling loss recovery, but think more in terms of how far ahead you could possibly want the publisher to be before it should be held back (as a last resort).
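To make "how far ahead" concrete, here is a back-of-envelope helper (my own sketch, not an Aeron API; it leans on the fact that term lengths are powers of two) for picking a term length from the worst burst you want to absorb before back pressure could kick in:

```java
public class TermSizing
{
    /**
     * Back-of-envelope sizing: the smallest power-of-two term length that
     * covers the worst burst to absorb without back pressure. This is an
     * illustrative sketch, not an Aeron API; the 64 KB floor is arbitrary.
     */
    static long termLengthForBurst(final long burstBytes)
    {
        long length = 64 * 1024; // arbitrary small floor for illustration
        while (length < burstBytes)
        {
            length <<= 1; // term lengths are powers of two
        }
        return length;
    }
}
```

For example, absorbing a 300 MB burst (say 10 MB/s against a receiver stalled for 30 s) would call for a 512 MB term.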

@mjpt777 has a great blog post about flow control and back pressure. I'm sure you've seen it. Having written these systems with and without flow control in the past, I prefer flow control, with big windows as a fallback.


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Hm .. sorry to nag you, I see your points, however consider:

  • if I have huge term windows (e.g. 1-2GB), the first subscriber will get a big load of old data as it gets a replay of the full term window (see original complaint above). It's common to have the publisher start and then subscribers join an hour or so later.
  • Stalling the sender does not really make sense (for my use case) as receivers are forced to consume a full window of outdated events; one would rather have them 'drop' and resync to current data, or have the sender backlog work like a ring buffer and overwrite the oldest parts of buffered events (that's how wllm and fastcast work).

Of course I have read most of @mjpt777's material; on the other hand, I have also spent some years dealing with backpressuring/controlling high load event systems/clusters ... the current behaviour of blocking the sender if no receivers are present just won't work for us (especially with large term windows).

Anyway, I really would like to embrace Aeron as it seems to have excellent latency behaviour, outclassing other libs. The code base looks clean and well organized. Like ;-)

BTW my tweak above does not work; the sender still stalls if no receivers are present. To be continued ..


RichardWarburton avatar RichardWarburton commented on May 18, 2024

Hi Ruediger,

If the goal is to throw away data until you have subscribers, isn't that a concern that can be handled in an application protocol? Your publisher would write an INIT message at the start and then throw away any data until it receives a JOIN message from any machine which is subscribing to the feed. Now that it knows you've got subscribers, it can start publishing messages as normal. This satisfies the use case without changing Aeron's flow control, and the only added complexity is that the subscribers whose joins you care about also need to be publishers.
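As a toy model of this handshake (all names invented for illustration; no Aeron APIs involved), the publisher-side gating might look like:

```java
/**
 * Toy model of the application-level handshake suggested above: the
 * publisher discards payloads until at least one JOIN has been seen on
 * a control channel. Everything here is invented for illustration; it
 * is not an Aeron API.
 */
public class JoinGatedPublisher
{
    private int joinedSubscribers = 0;

    public void onJoin()
    {
        joinedSubscribers++;
    }

    public void onLeave()
    {
        joinedSubscribers--;
    }

    /** Returns true if the payload would be published, false if discarded. */
    public boolean offer(final byte[] payload)
    {
        if (joinedSubscribers <= 0)
        {
            return false; // no one listening yet: throw the data away
        }
        // ... hand off to the real transport here ...
        return true;
    }
}
```

The JOIN/LEAVE messages would travel on a reverse publication from each subscriber, which is the added complexity Richard mentions.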


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Hi Richard,

Hm .. I've been into Aeron for just a few hours .. if there is a way for a publisher to detect subscribers, your proposal would solve the issue.

  1. Can you give me a quick hint on how to detect this?
  2. What happens if subscribers vanish and come back later? Can I reliably track this from the sender side?


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Still: consider that the subscribers are not under my control (e.g. controlled by third parties). In the morning there are only some 2 slow subscribers, so the data stream is pressured back and starts lagging significantly (of course it will catch up from time to time, as the sender will skip messages if blocked, but with large windows there will be plenty of outdated stuff being sent).
Now another party joins and initially gets a whole bunch of outdated data (e.g. 1GB) until it catches up. Assuming it is then the fastest receiver, this will lead to a drop of the initial 2 slow subscribers all of a sudden. So subscribers are capable of side-effecting each other.


tmontgomery avatar tmontgomery commented on May 18, 2024

The protocol spec should allow a sender to send without having an SM. The implementation right now may not, however. But I think a modification to allow this to occur with a custom flow control strategy has some merit.

@mjpt777 @RichardWarburton what are your thoughts on allowing the ability for a flow control strategy that can send in the absence of receivers? Perhaps one that is just rate limited?


tmontgomery avatar tmontgomery commented on May 18, 2024

Opening back up for comments and visibility.


RichardWarburton avatar RichardWarburton commented on May 18, 2024

I think it's a reasonable addition. Implementing it will certainly ensure that flow control is decoupled from other parts of the system ;)

I think Ruediger has also suggested another feature here - the ability to receive notifications of subscribers joining a stream. I can see that being useful for people wanting to monitor/report on what is going on with Aeron. Do we want to consider adding that as well?


tmontgomery avatar tmontgomery commented on May 18, 2024

I've got battle scars from receiver notifications. They can't really be made reliable in the multicast case unless receivers are tracked, which might itself be a good flow control mechanism. However, as the protocol does some of the job of making them best-effort, maybe they become more usable. Interested in @mjpt777's views as well.


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

From my experience it's very hard and cumbersome to make publishers aware of subscribers. I agree with @tmontgomery. It will increase the complexity of reliable multicast and might be too unreliable to have a real use (except maybe monitoring).

I tried with a NonBlockingMulticastFlowControl (see my branch); what's missing is ensuring the high-water mark is automatically increased (currently the sender gets blocked once the term is full).
Unfortunately we had a not-so-good release on Monday, so I am busy 23x7 fixing things this week :/ ..

From a broader view: backpressure of course is useful; however, in many systems there are ~'master publishers' which have kind of a higher priority, so they should not be backpressured by less important processes.

Additionally, I think one should not get too obsessed with advanced flow control. Simple things like (adaptive/heuristic) rate limiting work pretty well and (most importantly) are predictable once you have 50+ nodes in a cluster.


tmontgomery avatar tmontgomery commented on May 18, 2024

For larger receiver sets, over 25 or so, I would not use flow control. And I would (and have) used FEC (both proactive and NAK-based) for receiver sets of that size and much, much larger. It's a different set of requirements and environment. Aeron isn't well suited for that at this point. It might be later, though.

I'll take a look at your branch and have a think on what else we might need to change or make less strict in the protocol. I'm busy with the holidays & prep for YOW! next week. So, won't be immediate.


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Being 'busy with holidays' sounds like a good thing to have :-)
I meant 50+ receiving processes running on 10..15 boxes, so probably still doable with plain NAK (at least with the hardware setup I tested).


mjpt777 avatar mjpt777 commented on May 18, 2024

I've thought it is odd that a publisher can offer up until the publication term limit and have this delivered to the first subscriber but not subsequent subscribers. I'm coming round to thinking that a publication should be clamped by flow control until at least one subscriber connects.

I think we need to consider flow control in a wider context when we introduce local IPC connected subscribers. They can be much faster consumers than remote consumers.

I think the default should be to gate on all subscribers using the slowest for pace setting. This will be the least surprising for most users. We could then add strategies that allow slow consumers to be discarded but need to consider how those applications get notified that they have fallen behind and allow hooks for optional replay from a persistence service or restart at current position.


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Hi Martin,

Disclaimer: I can only talk for our use cases ..
"
I've thought it is odd that a publisher can offer up until the publication term limit and have this delivered to the first subscriber but not subsequent subscribers. I'm coming round to thinking that a publication should be clamped by flow control until at least one subscriber connects.
"
This would also work fine, as all subscribers would be treated equal.

"
I think the default should be to gate on all subscribers using the slowest for pace setting. This will be the least surprising for most users.
"

I disagree; the default should be the most aggressive flow control policy. With "wait for slowest", any misbehaved subscriber would slow down the whole cluster (e.g. a Java subscriber with a memory leak, full-GC'ing frequently but not dying). The same applies to network issues/failures.
Any kind of failover will likely create hiccups in throughput under high load. By sizing the term window appropriately, one defines the range of accepted "peak slowdown/fall-behind" for subscribers. If they fall behind more than that, forcing subscriber failure / unrecoverable loss (with notification) is favourable [cluster resilience].
It really depends on the use case, though. E.g. if an application is multicasting large file data, things look different.


mjpt777 avatar mjpt777 commented on May 18, 2024

So every time you have a subscriber fail just outside the NAK window for fastest subscriber then you initiate recovery, or restart, for the subscriber that stumbled?


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Yes, that's common. Additionally, the publisher has a rate limit (which can be dynamically adjusted by the messaging layer if there are too many drops, consensus-like), so there is a kind of QOS contract between publishers and subscribers.
Applying unicast patterns to multicast communication does not work well imo, as backpressure put onto the sender affects all subscribers, which is unwanted. E.g. a new node starting up (unjitted) will slow down the whole cluster.
Frequently a subscriber can recover from message loss by "resyncing" (e.g. when doing calculations on a time series window, it just resubscribes). Sender backlogs of 1-2GB are not uncommon. As Aeron holds one buffer per machine (as far as I understood), it could have great savings compared to messaging products which hold buffers per process.
This applies to realtime systems only, as you cannot backpressure "the market" (well, if somebody manages to do this then it's called a 'crash' ;-) )


RichardWarburton avatar RichardWarburton commented on May 18, 2024

Different users of Aeron have different requirements that can be accommodated with a common core. Are we all in agreement that flow control should be decoupled from the rest of the implementation and pluggable?

There are different strategies that can be catalogued:

  • Just apply a fixed rate limit.
  • Slow everyone down to ensure every subscriber keeps up. I think this is Todd's existing optimal multicast strategy. I think there are plenty of situations where people will want this kind of thing.
  • Slow everyone down to ensure n subscribers keep up, allowing m other subscribers to fall behind. Here the n are critical systems which should influence backpressure and m are things like monitoring or debugging services which shouldn't slow the other subscribers down.

Am I missing any strategies here?
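The third strategy could be sketched as a position-limit calculation that paces off the slowest critical receiver and lets the rest fall behind (all names and shapes here are invented for illustration, not Aeron code):

```java
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the "n critical, m best-effort" strategy: back pressure is
 * driven only by a designated set of critical receivers; everyone else
 * may fall behind. Positions are stream byte positions. Invented for
 * illustration, not an Aeron API.
 */
public class CriticalSetLimit
{
    static long positionLimit(
        final Map<String, Long> receiverPositions,
        final Set<String> criticalReceivers,
        final long windowLength)
    {
        long minCritical = Long.MAX_VALUE;
        long maxAny = 0;
        for (final Map.Entry<String, Long> entry : receiverPositions.entrySet())
        {
            maxAny = Math.max(maxAny, entry.getValue());
            if (criticalReceivers.contains(entry.getKey()))
            {
                minCritical = Math.min(minCritical, entry.getValue());
            }
        }
        // Pace off the slowest critical receiver; with no critical set,
        // fall back to the aggressive max-of-all behaviour.
        final long base = (minCritical == Long.MAX_VALUE) ? maxAny : minCritical;
        return base + windowLength;
    }
}
```

An empty critical set degenerates to the existing aggressive multicast strategy, and making the critical set equal to all receivers gives Martin's gate-on-slowest default, so this one shape covers all three bullets.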


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Agreed so far; however, as far as I understood Todd, the current default strategy for multicast is to keep the send rate up with the fastest subscriber, so anything slower will be dropped.

"
Slow everyone down to ensure n subscribers keep up, allowing m other subscribers to fall behind. Here the n are critical systems which should influence backpressure and m are things like monitoring or debugging services which shouldn't slow the other subscribers down.
"

Currently n = 1; I wanted n = 0; Martin proposed n = M (M = number of subscribers).


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Just another question: is having multiple senders on a stream/topic supported?


mjpt777 avatar mjpt777 commented on May 18, 2024

Yes, you can have multiple publishers to the same channel and stream; however, they will be different sessions. Subscribers can register for notifications when a new session connects for a given channel and stream.


tmontgomery avatar tmontgomery commented on May 18, 2024

@mjpt777 and I are at YOW! and we've talked a little bit about this. More discussion to follow. However, I think we both think that a fixed rate limiter as a strategy makes total sense.


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

That's good news :-)


strangelydim avatar strangelydim commented on May 18, 2024

I ran into the behavior reported in the original issue today... and was similarly perplexed. It seems odd that the behavior a new receiver gets differs depending on whether it was the first receiver or a subsequent one. Not what I would have expected to see.


tmontgomery avatar tmontgomery commented on May 18, 2024

After some discussion, the plan is to disallow Publication.offer to succeed unless at least one subscriber is known.


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

Actually this seems like an elegant and low effort solution. Sounds like a good idea to me :)


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

On the other hand, can a sender distinguish this from backpressure being applied by slow receivers?


tmontgomery avatar tmontgomery commented on May 18, 2024

We might be able to. The initial thought was to mask it so that it just looks like a window of 0. Are you thinking the app should be informed that this is different from normal flow control?


mjpt777 avatar mjpt777 commented on May 18, 2024

When we change the offer to return the position, 0 can mean not yet connected, -1 is back pressure, and any positive value is the new position. Would that work?


RuedigerMoeller avatar RuedigerMoeller commented on May 18, 2024

@mjpt777 & @tmontgomery
I think this could work. From a sender's perspective, if there are no subscribers, the rejected message can be discarded. In case of a block caused by backpressure, a sender might want to either "alarm" or apply some kind of adaptive netting to reduce the send rate.


mikeb01 avatar mikeb01 commented on May 18, 2024

@mjpt777 As a variant on your suggestion, perhaps negative values could be a series of error codes, e.g.:

  • -1 Not connected
  • -2 Back pressure
  • -3 some other error...

Feels more consistent and extensible.


tmontgomery avatar tmontgomery commented on May 18, 2024

0 for most POSIX stuff means success, so I would reserve it for returning position 0 (as in the first message to a stream). Then, as @mikeb01 suggests, negative values for different errors.

Then the normal pattern of error checks is like it is with most syscalls for POSIX.

if (Publication.offer(...) < 0)
{
     // handle error
}
