Comments (22)
Subscribe on topic URI prefix.
Currently, clients can only subscribe to full URIs.
There needs to be a distinction between subscribe to full or prefix.
Distinction could be a magic ending character in URIs or an explicit flag in SUBSCRIBE message.
from wamp-proto.
Not sure why you're not too keen on this, Tobias. Performance impact? It's a feature that is necessary if
- you have a dynamic list of topics whose names cannot be guessed/enumerated, but should be programmatically subscribable
(unless you address that using discoverability
). But more important than (non-)discoverability, if you have tens or hundreds of thousands of topics (which is typical e.g. for price quotes from multiple financial exchanges, depending on how you structure the topics) it might become more efficient to do pattern matching than matching through subscription lists where every client subscribes to hundreds or thousands of topics (to see 20 properties of say, each of 50 stocks: 50 x 20 = 1000)
Sure, if you structure the topics differently, say bundling the properties clients are most often interested in into 3 or 4 topics, you can reduce the number of subscriptions for every client from 1000 to 50, to stay in the example above.
But say I want to get only the last price of every trade made on every stock (say there are 10000 different stocks), with a pattern it looks e.g. like `/stocks/*/last-trade-price', without wildcards, you have a rather long list of subscriptions...
I do understand that prefix-matching is very easy to implement efficiently, and arbitrary matching needs some more sophistication, but not implementing a more general form of pattern matching makes the aforementioned scenario more inefficient.
Because a topic hierarchy as in URIs is one-dimensional, one has to decide which component to put last, and in case of prefix-matching only, you might be forced to make a tradeoff which can have a great negative impact on performance - or you disallow subscribing to more than _x_ topics.
If flexible pattern matching gets added, it does not necessarily mean that normal, literal subscriptions are negatively affected by this performance-wise, even if the flexible pattern matcher is (in an initial implementation) slow. And even that slow pattern matcher might be more efficient than having a couple of clients subscribing to 1000 topics each.
So it would be a win-win situation: Literal subscribers aren't affected (if you don't count the 2 or so clock cycles spent on the if-branch), and wildcarders can rock.
from wamp-proto.
P.S. Implementing flexible pattern matching will require a tradeoff between
- quick subscription
- quick message filtering/routing
But that looks like an easy one to me. Subscribing, even if it takes many milliseconds, doesn't need to hold up existing message flow, but if you put as much effort as possible into the subscription step to create efficient data structures, message filtering/routing will get faster. And we are not even talking about content-based matching here. It's just pattern matching on a one-dimensional, simple string.
from wamp-proto.
N.B. I find the approach of MQTT - the protocol that you refer to - defining multi-level and single-level wildcards, useful: http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html#appendix-a
The latter is how MQTT is doing it. However, they are not implementing real "set semantics" on top of wildcarded topics:
Which they however consider a bug in a particular implementation, according to that same thread:
As Nick already mentioned in another email, overlapping subscriptions
should not result in a message being received more than one. This is
however an outstanding bug in mosquitto
From a subscriber's point of view, I agree that it is probably better if duplicates are not sent in case of double subscriptions due to wildcards, and the way I'm (still somewhat vaguely) imagining an implementation of the matching logic inside the message broker, it would even be slightly harder (i.e. slower) to figure out if it should be sent twice in case of said event to finance/stock/ibm/closingprice
, when client subscribed to both
finance/stock/ibm/#
andfinance/stock/ibm/closingprice
from wamp-proto.
I don't agree. I don't think that arbitrary wildcards can be implemented efficiently, which is the reason the MQTT implementation does not implement it (properly). If you disagree I'd be interested in example code for the following:
class Subscriptions:
def addSubscriber(self, sessionId, topicPattern):
return
def lookupSubscribers(self, topic):
return []
def removeSubscriber(self, sessionId, topicPattern):
return
The run-times of these functions need to be at most proportional to the length of topic
/topicPattern
and logarithmic in the number of subscriptions. lookupSubscribers
must return a session ID at most once. removeSubscriber
must have proper set semantics.
I think it would be more practical to come up and discuss implementations of above. Since I at least am not interested in an academic protocol definition which cannot be implemented efficiently.
from wamp-proto.
There'll have to be a tradeoff, as mentioned in my #10 (comment) . You have to decide whether you want to make addSubscriber
and removeSubscriber
fast, or lookupSubscribers
. You can't get both. And the faster you want to make lookupSubscribers, the slower the other two guys get.
And as lookupSubscribers needs to be called on every message, I'd vote to make that one fast. Luckily, addSubscriber and removeSubcriber are not only called much more rarely than lookupSubscribers, but they could also be implemented offline, i.e. they don't need to run on the msg processing thread or block it, and even though they might take half a second to finish, the only ones affected are clients while doing a subscription, because they have to wait (of course not necessarily blocking, but they'd wait for the first message to arrive). As soon as the addition is finished, it'll be swapped in in an atomic operation. That a(n asynchronously finishing) removeSubscriber call takes time would probably not be an issue for clients. For reasons of efficiency, the messaging systems that I have worked sometimes send a couple of messages after unsubscribe
has already returned, which has never been an issue, as long as the client was prepared for it.
That's my theory. lookupSubscribers would work on some kind of tree structure and there are a bunch of trees around that specialize not only in prefix searches - something that naturally works great with a lot of tree structures - but that specialize on approximate searches etc. I'd have to do some research though to back this up. I hope I'll find the time - figuring out how to solve this problem efficiently has been on my list for some time now...
from wamp-proto.
FYI, this is how the AMQP implementation in Red Hat's Enterprise MRG does it:
Binding keys for a Topic exchanges can include wildcard characters: a “#” matches one or more words, a “*” matches a single word. Typical bindings use binding keys like #.news (all news items), usa.# (all items in the USA), or usa.weather (all USA weather items).
Some example topic subscriptions from their docs:
usa.#
#.news
#.weather
europe.#
from wamp-proto.
2 quick comments:
-
You can have your topics structured like
/stocks/last-trade-price/*
and then use prefix pattern matching to subscribe for all stocks at once. -
As said, seeing is believing: if you can come up with an implementation like described above (efficient, correct set semantics, sane memory consumption) that would be a start. I am not opposed to any change, it's just that the current approach has proven to work, the prefix-based will work, but I am not convinced that a general regular pattern matching would work.
from wamp-proto.
You can have your topics structured like /stocks/last-trade-price/* and then use prefix pattern matching to subscribe for all stocks at once.
I want to let clients subscribe for the last-prices of all products and at the same time for all properties from one product, depending on what that. (Maybe one and the same client will rarely do both at the same time, but client A wants this, client B wants that)
the current approach has proven to work, the prefix-based will work, but I am not convinced that a general regular pattern matching would work.
I agree, performance is crucial in messaging, and I'll try to offer some algorithm, time permits. Having said that, let me repeat: Offering a general pattern matching variant does not have to make it slower for those users who don't use that feature, but it does offer the feature to those who want it.
So you would basically be saying: Here's your feature, it's slower than a subscription by fixed string or prefix, but if you insist... use it.
AMQP and other messaging solutions allow matching by message content or meta-data (key/value pairs). And if you look at what the CEP / stream event processing guys are doing, they filter, slice and dice and aggregate etc. using sql-like where clauses. Of course that has to be slower than a O(1) hash lookup, but it's a feature that is very useful for some. Granted, the CEP guys often do not care too much about reliability, it's often a bit probabalistic, but not always. Nor is matching by meta-data in messaging solutions.
from wamp-proto.
Here is a naive implementation:
Wildcards are allowed for hierarchical parts of an URI, i.e. /stocks/*/last-trade-price
is ok, but not /stocks/In*Foo*Bar/last-trade-price
. Lets assume *
is a special char not allowed in URIs normally.
Let addSubscriber
and removeSubscriber
just maintain a simple hashmap from topic/topicpattern => list of session IDs. Nothing magic here.
Now lookupSubscribers
called for topic /stock/IBM/last-trade-price
needs to do lookup in that hashmap for:
/stock/IBM/last-trade-price
/*/IBM/last-trade-price
/stock/*/last-trade-price
/stock/IBM/*
/*/*/last-trade-price
/stock/*/*
/*/IBM/*
/*/*/*
to create a union of session IDs that will receive the event. The union'ed list needs to be processed for duplicates, i.e. when a client has subscribed for both of
/stock/IBM/last-trade-price
/stock/*/last-trade-price
The union'ed list needs to be processed for correct set semantics in case a client has subscribed to
/stock/*/last-trade-price
and unsubscribed from
/stock/IBM/last-trade-price
Doing so requires 2 hashmaps (subscriptions and unsubscriptions). But that would only be sufficient when unsubscriptions always override subscriptions (otherwise the order of subscribe/unsubscribe becomes relevant).
Performance problems:
- the lookup for all wildcards has run-time no longer linear in the length of topic, but exponential is the number of part of the topic
- the lookup (because of the dups removal and set semantics) has run-time > linear in the number of determined receivers, since the set operations might involve sorting
from wamp-proto.
CEP is very interesting area of course. But that would be a new dimension of scope for WAMP: being a protocol to access a full-blown CEP platform instead of: the minimal viable Web-based message brokering protocol. Do you have practical experience with CEP platforms?
from wamp-proto.
I'm by no means suggesting the addition of CEP features to WAMP. That would make it indeed a whole different beast.
What I wanted to say though, is that there are matching engines there matching using crazy rules, more complex seeming than my, ahem, modest request ;), and meant to be used with gazillions of messages per second.
from wamp-proto.
I have started an experimental implementation of this on
https://github.com/tavendo/AutobahnPython/tree/prefixsubscribe
https://github.com/tavendo/AutobahnPython/tree/prefixsubscribe/examples/wamp/pubsub/prefixsub
from wamp-proto.
Hi,
I haven't tested/reviewed the experimental implentation, but I was also considering the subscription with wildcards in arbitrary places (but not publishing with wildcards).
I think that an implemenation that "caches" the list of the topics that match the subscribed patterns could be quite efficient. Instead of just maintaining a simple hashmap from topic/topicpattern => list of session IDs, it could maintain 2 hasmaps: topic => sessionIDs, and "used patterns" => list of topics that match.
(using the pattern as the key, and forgetting the hierarchical parts)
Pros:
- when using arbitrary wildcards, the lookup for the topics is linear with the number of the existing topics (but it is only required once for each different pattern used by any interested client).
- the lookup for the sessionId of the subscribed clients is O(1), because I'm not considering publishing to many topics with wildcards.
Cons:
- when creating/deleting a topic, a search of in the "used patterns" keys should be done to add/remove the topic in the lists that match with the pattern.
For example:
(sorry, I don't know python)
class Subscriptions:
patterns = {} # (pattern => listOfMatchingTopicsURLs)
topicSubscriptions = {} # (exactTopicURI => listOfSubscribedSessionIds)
def addSubscriber(self, sessionId, topicPattern):
if(topicPattern contains "the * wildcard")
topics = patterns[topicPattern]; # topics represents an temporal list
if(topics == null) # doesn't exist the cached list of topics that match the pattern, yet
topics = lookupAllTopicsThatMatch(topicPattern); # may be an empty list
patterns[topicPattern] = topics
foreach(topic in topics) # iterate for all the exact topic URLs that match the pattern
subscribeClientToExactTopic(sessionId, topic) # only in case it is not already subscribed
else # no wildcards
topic = topicPattern; # topic is an exact topic URL
subscribeClientToExactTopic(sessionId, topic) # only in case it is not already subscribed
return
def lookupSubscribers(self, topic):
return topicSubscriptions[topic]
def removeSubscriber(self, sessionId, topicPattern):
# similar to addSubscriber, but caling unsubscribeClientFromExactTopic
return
from wamp-proto.
Wildcard publications really is out of scope (for now), but your suggestion for efficient wildcard subscribe via caching is interesting. I have implemented arbitrary regular expression subscription (see the "prefixsubscribe" branch).
Now, one problem arising (already with prefix subscribe) is, that the client now also needs to do pattern matching for incoming events. This is because events (currently) only contain the concrete topic of the event (as specified when publishing), not the pattern that lead to a dispatch to a respective subscriber.
(Re-)implementing all the pattern matching logic contained in the broker also in the clients is I think something we should try to avoid.
We might need to extend the EVENT message, which currently is dead simple:
[8, "http://example.com/foobar", "Hello 2"]
For patterned subscribes, we might need to send something like
[8, "http://example.com/foobar", "Hello 2", "^http://.*/foobar$"]
This will lead to further code restructurings, since currently the broker serialized a dispatched message just once for all subscribers to receive. With above, it needs to serialize per matched pattern. Sure, no problem.
Anyway, I find it interesting how a harmless looking feature like "pattern based subscribe" leads to many consequences. Here is another one: what if I subscribe to "^http://.*/foobar$" and unsubscribe from "http://example.com/foobar" without subscribing before. Do I receive "http://example.com/foobar" events or not? In other word: correct set semantics crossing topic patterns?
from wamp-proto.
Regular expressions would be great, but they can also be confusing because expected results may vary when using different programming languages/regex engines. So, I would propose simple wildcard rules (i.e: * to represent many characters), that should be implemented the same way at both sides (client and server).
In the client side, it could use a different map:
- "patternHandlers" (the keys are topicPatterns, but the values are the listOfFunctionHandlers):
At subscription/unsubscription time, if the topicPattern contains some wildcards, it should add/remove the handler in the "patternHandlers".
When receiving the EVENT message (as defined in WAMP v1: [8, "http://example.com/foobar", "Hello 2"]), call the exact topicURL handlers, and also, if there are any "used pattern" subscriptions (paternHandlers.keys), call the pattern handlers that match the received topicURL (only affects efficiency in the clients that use this feature).
For example:
class Client:
patternHandlers = {}
def whenEventMessageIsReceived(event):
topicURL = event[1];
handlers = getNormalTopicHandlers(topicURL)
if(patternHandlers is not empty):
handlers = handlers.copy();
foreach(topicPattern in patternHandlers.keys):
if(topicURL matches topicPattern):
addPatternHandlersToSet(patternHandlers[topicPattern], handlers) # only the handlers that aren't repeated
callEventHandlers(event, handlers)
I haven't understood the last question about "semantics crossing topic patterns"...
Doesn't the server verify unsubscriptions, and avoids EVENT messages to clients that had unsubscribed from particular topics?
from wamp-proto.
What I meant with "set semantics" is the following.
Suppose the following sequence of client Subs/Unsubs:
SUBSCRIBE "http://example.com/foobar"
UNSUBSCRIBE "http://example.com/foobar"
After that, the client won't receive events for topic "http://example.com/foobar" .. of course.
SUBSCRIBE "http://*/foobar"
SUBSCRIBE "http://example.com/foobar"
After that, the client will receive events for topic "http://example.com/foobar" .. each event only once.
SUBSCRIBE "http://*/foobar"
SUBSCRIBE "http://example.com/foobar"
UNSUBSCRIBE "http://example.com/foobar"
After that, the client will still receive events for topic "http://example.com/foobar".
SUBSCRIBE "http://*/foobar"
UNSUBSCRIBE "http://example.com/foobar"
After that, the client will still receive events for topic "http://example.com/foobar".
SUBSCRIBE "http://*/foobar"
SUBSCRIBE "http://example.com/foobar"
UNSUBSCRIBE "http://example.com/foobar"
UNSUBSCRIBE "http://*/foobar"
After that, the client will not receive events for topic "http://example.com/foobar" any more.
Hence, there is no set semantics between SUBSCRIBE and UNSUBSCRIBE accross different topics or topics patterns. Order is insignificant .. and each SUBSCRIBE must be followed by an UNSUBSCRIBE for the exact same topic or topic pattern to cancel.
from wamp-proto.
- Regarding regular expressions: I agree, since there is no established and widely implemented standard, we should probably keep it simple .. only "" as path components (that is between 2 "/") will be allowed. And "" must be matched by a non-empty string without "/".
- If we do not send the matching pattern that lead to the dispatching of an event within the EVENT message, the client needs to do the matching again. Whats your opinion on that?
I was thinking of extending EVENT as follows:
[8, <topic URI>, <event payload>] OR
[8, <topic URI>, <event payload>, <event extra>]
where <event extra>
is a dict that could be
{"publisher": <WAMP session ID>} OR
{"matched": "wildcard:http://*/foobar"} OR
{"publisher": <WAMP session ID>, "matched": "prefix:http://example.com/event#"} OR
...
The sending of the publisher ID and/or the macthing pattern could be requested by the client as options in SUBSCRIBE.
from wamp-proto.
The fact is that I'm only thinking the subscription with wildcards as a simple utility to reduce the number of subscription messages, and to allow generic subscription to topics names that the client doesn't know.
So, in relation to the previously commented sequences of client Subs/Unsubs:
SUBSCRIBE "http://*/foobar"
UNSUBSCRIBE "http://example.com/foobar"
What I expect is a different behaviour: that the server subscribes the client to each of the topics that match, and the unsubscription makes the client to stop receiving the events from the topic "http://example.com/foobar".
In the case of 2 subs/1 unsub:
SUBSCRIBE "http://*/foobar"
SUBSCRIBE "http://example.com/foobar"
UNSUBSCRIBE "http://example.com/foobar"
I recognize It could be useful that the client still receives events from the topic "http://example.com/foobar"
(although I didn't implemented this way, a reference count of the subscriptions between the client-to-topic could be used to differentiate from the previous Sub/Unsub sequence).
Finally, I'm also agree with the other sequences and conclussions.
On the other side, I think the EVENT message shouldn't have to include the macthing pattern (to much repeated data in the payload).
But maybe a "metaevent" message sent once by the server with the topics that match the subscribed pattern could help to optimize the matching in the client:
class Client:
patternHandlers = {}
matchingTopicPatterns = {}
def whenMetaEventMessageIsReceived(metaevent):
sid = metaevent[3].sessionId;
pattern = metaevent[1]; # instead of an exact topic name
if(sid == theClientSessionId && pattern.contains(wildcards)):
sub = metaevent[2].endsWith("/sub#joined"); # boolean
unsub = metaevent[2].endsWith("/sub#left"); # boolean
topics = metaevent[3].topics;
foreach(topic in topics):
if(sub) matchingTopicPatterns[topic].add(pattern);
else if(unsub) matchingTopicPatterns[topic].remove(pattern);
return;
def whenEventMessageIsReceived(event):
topicURL = event[1];
handlers = getNormalTopicHandlers(topicURL)
if(matchingTopicPatterns[topicURL] is not empty):
handlers = handlers.copy();
foreach(topicPattern in matchingTopicPatterns[topicURL]):
addPatternHandlersToSet(patternHandlers[topicPattern], handlers) # only the handlers that aren't repeated
callEventHandlers(event, handlers)
from wamp-proto.
In hindsight, the main "problem" above is how to communicate the pattern that matched an event to the subscriber when it receives an event.
WAMPv2 solves this as follows
-
Subscriber subscribes to some topic or pattern via
options
[SUBSCRIBE, Request|id, Topic|uri, Options|dict]
-
The broker now acknowledges the subscription
[SUBSCRIBED, Request|id, Subscription|id]
-
Later, when an event is delivered, the broker sends:
[EVENT, Subscription|id, Topic|uri, Event|any]
The broker not only communicates the (concrete) topic
under which is the event was published, but also the subscription
under which the subscriber receives that event.
from wamp-proto.
Does it mean that the EVENT & METAEVENT messages need to be generated/serialized for each client?
(because each client has a different "subscription|id", doesn't?)
from wamp-proto.
Good question, but no (I would not have proposed it if that would be needed): The subscription ID is decided by the broker, and the broker will hand out the same subscription ID for the same subscribed topic or topic pattern. So the event needs to be serialized only once (per matching topic pattern, not per receiving subscriber).
from wamp-proto.
Related Issues (20)
- Add acknowledge option to CALL message? HOT 17
- How to serialize datetime type? HOT 17
- Web site: Add stuff to "Users and Resources" section HOT 10
- Best practice for error answer? HOT 1
- Unspecified Behavior for Callee Leaving During CALL HOT 19
- Unclear behavior for caller leaving during CALL request HOT 7
- Propagation of frozen options in progressive call invocations HOT 6
- Error URI for expired progressive call race condition HOT 3
- Is `wamp.close.goodbye_and_out` deprecated in favor of `wamp.close.normal`? HOT 5
- `wamp.error.no_such_session` is not documented HOT 1
- `wamp.registration.match` matching algorithm is undefined HOT 3
- Should subscriptions to meta events fire the `wamp.subscription.on_subscribe` event? HOT 1
- Tolerating old CALL request IDs when progressive invocations are supported HOT 4
- Add Payload to Abort HOT 28
- Add `REGISTER.Options.forward_timeout` option HOT 2
- Broken Applet on wamp-proto.org
- Equivalent of 500 Internal Server Error in WAMP HOT 5
- Incorrect maximum raw socket limit HOT 1
- Error URI for Rate Limiting Purposes HOT 25
- Add AsyncAPI to comparison on web site
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from wamp-proto.