Code Monkey home page Code Monkey logo

Comments (22)

oberstet avatar oberstet commented on June 1, 2024

Subscribe on topic URI prefix.

Currently, clients can only subscribe to full URIs.

There needs to be a distinction between subscribe to full or prefix.

Distinction could be a magic ending character in URIs or an explicit flag in SUBSCRIBE message.

from wamp-proto.

EugenDueck avatar EugenDueck commented on June 1, 2024

Not sure why you're not too keen on this, Tobias. Performance impact? It's a feature that is necessary if

  • you have a dynamic list of topics whose names cannot be guessed/enumerated, but should be programmatically subscribable

(unless you address that using discoverability). But more important than (non-)discoverability, if you have tens or hundreds of thousands of topics (which is typical e.g. for price quotes from multiple financial exchanges, depending on how you structure the topics) it might become more efficient to do pattern matching than matching through subscription lists where every client subscribes to hundreds or thousands of topics (to see 20 properties of say, each of 50 stocks: 50 x 20 = 1000)

Sure, if you structure the topics differently, say bundling the properties clients are most often interested in into 3 or 4 topics, you can reduce the number of subscriptions for every client from 1000 to 50, to stay in the example above.

But say I want to get only the last price of every trade made on every stock (say there are 10000 different stocks), with a pattern it looks e.g. like `/stocks/*/last-trade-price', without wildcards, you have a rather long list of subscriptions...

I do understand that prefix-matching is very easy to implement efficiently, and arbitrary matching needs some more sophistication, but not implementing a more general form of pattern matching makes the aforementioned scenario more inefficient.

Because a topic hierarchy as in URIs is one-dimensional, one has to decide which component to put last, and in case of prefix-matching only, you might be forced to make a tradeoff which can have a great negative impact on performance - or you disallow subscribing to more than _x_ topics.

If flexible pattern matching gets added, it does not necessarily mean that normal, literal subscriptions are negatively affected by this performance-wise, even if the flexible pattern matcher is (in an initial implementation) slow. And even that slow pattern matcher might be more efficient than having a couple of clients subscribing to 1000 topics each.

So it would be a win-win situation: Literal subscribers aren't affected (if you don't count the 2 or so clock cycles spent on the if-branch), and wildcarders can rock.

from wamp-proto.

EugenDueck avatar EugenDueck commented on June 1, 2024

P.S. Implementing flexible pattern matching will require a tradeoff between

  • quick subscription
  • quick message filtering/routing

But that looks like an easy one to me. Subscribing, even if it takes many milliseconds, doesn't need to hold up existing message flow, but if you put as much effort as possible into the subscription step to create efficient data structures, message filtering/routing will get faster. And we are not even talking about content-based matching here. It's just pattern matching on a one-dimensional, simple string.

from wamp-proto.

EugenDueck avatar EugenDueck commented on June 1, 2024

N.B. I find the approach of MQTT - the protocol that you refer to - defining multi-level and single-level wildcards, useful: http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html#appendix-a

The latter is how MQTT is doing it. However, they are not implementing real "set semantics" on top of wildcarded topics:

Which they however consider a bug in a particular implementation, according to that same thread:

As Nick already mentioned in another email, overlapping subscriptions
should not result in a message being received more than one. This is
however an outstanding bug in mosquitto

From a subscriber's point of view, I agree that it is probably better if duplicates are not sent in case of double subscriptions due to wildcards, and the way I'm (still somewhat vaguely) imagining an implementation of the matching logic inside the message broker, it would even be slightly harder (i.e. slower) to figure out if it should be sent twice in case of said event to finance/stock/ibm/closingprice, when client subscribed to both

  • finance/stock/ibm/# and
  • finance/stock/ibm/closingprice

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

I don't agree. I don't think that arbitrary wildcards can be implemented efficiently, which is the reason the MQTT implementation does not implement it (properly). If you disagree I'd be interested in example code for the following:

class Subscriptions:

   def addSubscriber(self, sessionId, topicPattern):
      return

   def lookupSubscribers(self, topic):
      return []

   def removeSubscriber(self, sessionId, topicPattern):
      return

The run-times of these functions need to be at most proportional to the length of topic/topicPattern and logarithmic in the number of subscriptions. lookupSubscribers must return a session ID at most once. removeSubscriber must have proper set semantics.

I think it would be more practical to come up and discuss implementations of above. Since I at least am not interested in an academic protocol definition which cannot be implemented efficiently.

from wamp-proto.

EugenDueck avatar EugenDueck commented on June 1, 2024

There'll have to be a tradeoff, as mentioned in my #10 (comment) . You have to decide whether you want to make addSubscriber and removeSubscriber fast, or lookupSubscribers. You can't get both. And the faster you want to make lookupSubscribers, the slower the other two guys get.

And as lookupSubscribers needs to be called on every message, I'd vote to make that one fast. Luckily, addSubscriber and removeSubcriber are not only called much more rarely than lookupSubscribers, but they could also be implemented offline, i.e. they don't need to run on the msg processing thread or block it, and even though they might take half a second to finish, the only ones affected are clients while doing a subscription, because they have to wait (of course not necessarily blocking, but they'd wait for the first message to arrive). As soon as the addition is finished, it'll be swapped in in an atomic operation. That a(n asynchronously finishing) removeSubscriber call takes time would probably not be an issue for clients. For reasons of efficiency, the messaging systems that I have worked sometimes send a couple of messages after unsubscribe has already returned, which has never been an issue, as long as the client was prepared for it.

That's my theory. lookupSubscribers would work on some kind of tree structure and there are a bunch of trees around that specialize not only in prefix searches - something that naturally works great with a lot of tree structures - but that specialize on approximate searches etc. I'd have to do some research though to back this up. I hope I'll find the time - figuring out how to solve this problem efficiently has been on my list for some time now...

from wamp-proto.

EugenDueck avatar EugenDueck commented on June 1, 2024

FYI, this is how the AMQP implementation in Red Hat's Enterprise MRG does it:

Binding keys for a Topic exchanges can include wildcard characters: a “#” matches one or more words, a “*” matches a single word. Typical bindings use binding keys like #.news (all news items), usa.# (all items in the USA), or usa.weather (all USA weather items).

Some example topic subscriptions from their docs:

usa.#
#.news
#.weather
europe.#

Source: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2/html/Messaging_User_Guide/chap-Messaging_User_Guide-Exchanges.html#sect-Messaging_User_Guide-Exchange_Types-Topic_Exchange

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

2 quick comments:

  1. You can have your topics structured like /stocks/last-trade-price/* and then use prefix pattern matching to subscribe for all stocks at once.

  2. As said, seeing is believing: if you can come up with an implementation like described above (efficient, correct set semantics, sane memory consumption) that would be a start. I am not opposed to any change, it's just that the current approach has proven to work, the prefix-based will work, but I am not convinced that a general regular pattern matching would work.

from wamp-proto.

EugenDueck avatar EugenDueck commented on June 1, 2024

You can have your topics structured like /stocks/last-trade-price/* and then use prefix pattern matching to subscribe for all stocks at once.

I want to let clients subscribe for the last-prices of all products and at the same time for all properties from one product, depending on what that. (Maybe one and the same client will rarely do both at the same time, but client A wants this, client B wants that)

the current approach has proven to work, the prefix-based will work, but I am not convinced that a general regular pattern matching would work.

I agree, performance is crucial in messaging, and I'll try to offer some algorithm, time permits. Having said that, let me repeat: Offering a general pattern matching variant does not have to make it slower for those users who don't use that feature, but it does offer the feature to those who want it.

So you would basically be saying: Here's your feature, it's slower than a subscription by fixed string or prefix, but if you insist... use it.

AMQP and other messaging solutions allow matching by message content or meta-data (key/value pairs). And if you look at what the CEP / stream event processing guys are doing, they filter, slice and dice and aggregate etc. using sql-like where clauses. Of course that has to be slower than a O(1) hash lookup, but it's a feature that is very useful for some. Granted, the CEP guys often do not care too much about reliability, it's often a bit probabalistic, but not always. Nor is matching by meta-data in messaging solutions.

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

Here is a naive implementation:

Wildcards are allowed for hierarchical parts of an URI, i.e. /stocks/*/last-trade-price is ok, but not /stocks/In*Foo*Bar/last-trade-price. Lets assume * is a special char not allowed in URIs normally.

Let addSubscriber and removeSubscriber just maintain a simple hashmap from topic/topicpattern => list of session IDs. Nothing magic here.

Now lookupSubscribers called for topic /stock/IBM/last-trade-price needs to do lookup in that hashmap for:

 /stock/IBM/last-trade-price
 /*/IBM/last-trade-price   
 /stock/*/last-trade-price   
 /stock/IBM/*
 /*/*/last-trade-price   
 /stock/*/*   
 /*/IBM/*   
 /*/*/*   

to create a union of session IDs that will receive the event. The union'ed list needs to be processed for duplicates, i.e. when a client has subscribed for both of

 /stock/IBM/last-trade-price
 /stock/*/last-trade-price   

The union'ed list needs to be processed for correct set semantics in case a client has subscribed to

 /stock/*/last-trade-price   

and unsubscribed from

 /stock/IBM/last-trade-price

Doing so requires 2 hashmaps (subscriptions and unsubscriptions). But that would only be sufficient when unsubscriptions always override subscriptions (otherwise the order of subscribe/unsubscribe becomes relevant).

Performance problems:

  • the lookup for all wildcards has run-time no longer linear in the length of topic, but exponential is the number of part of the topic
  • the lookup (because of the dups removal and set semantics) has run-time > linear in the number of determined receivers, since the set operations might involve sorting

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

CEP is very interesting area of course. But that would be a new dimension of scope for WAMP: being a protocol to access a full-blown CEP platform instead of: the minimal viable Web-based message brokering protocol. Do you have practical experience with CEP platforms?

from wamp-proto.

EugenDueck avatar EugenDueck commented on June 1, 2024

I'm by no means suggesting the addition of CEP features to WAMP. That would make it indeed a whole different beast.

What I wanted to say though, is that there are matching engines there matching using crazy rules, more complex seeming than my, ahem, modest request ;), and meant to be used with gazillions of messages per second.

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

I have started an experimental implementation of this on

https://github.com/tavendo/AutobahnPython/tree/prefixsubscribe
https://github.com/tavendo/AutobahnPython/tree/prefixsubscribe/examples/wamp/pubsub/prefixsub

from wamp-proto.

jmarine avatar jmarine commented on June 1, 2024

Hi,
I haven't tested/reviewed the experimental implentation, but I was also considering the subscription with wildcards in arbitrary places (but not publishing with wildcards).

I think that an implemenation that "caches" the list of the topics that match the subscribed patterns could be quite efficient. Instead of just maintaining a simple hashmap from topic/topicpattern => list of session IDs, it could maintain 2 hasmaps: topic => sessionIDs, and "used patterns" => list of topics that match.
(using the pattern as the key, and forgetting the hierarchical parts)

Pros:

  • when using arbitrary wildcards, the lookup for the topics is linear with the number of the existing topics (but it is only required once for each different pattern used by any interested client).
  • the lookup for the sessionId of the subscribed clients is O(1), because I'm not considering publishing to many topics with wildcards.

Cons:

  • when creating/deleting a topic, a search of in the "used patterns" keys should be done to add/remove the topic in the lists that match with the pattern.

For example:
(sorry, I don't know python)

class Subscriptions:

   patterns = {}   # (pattern => listOfMatchingTopicsURLs)
   topicSubscriptions = {}   # (exactTopicURI => listOfSubscribedSessionIds)

   def addSubscriber(self, sessionId, topicPattern):
      if(topicPattern contains "the * wildcard")
        topics = patterns[topicPattern];   # topics represents an temporal list
        if(topics == null)  # doesn't exist the cached list of topics that match the pattern, yet
          topics = lookupAllTopicsThatMatch(topicPattern);  # may be an empty list
          patterns[topicPattern] = topics
        foreach(topic in topics)  # iterate for all the exact topic URLs that match the pattern
          subscribeClientToExactTopic(sessionId, topic)  # only in case it is not already subscribed
      else  # no wildcards
          topic = topicPattern;  # topic is an exact topic URL
          subscribeClientToExactTopic(sessionId, topic)  # only in case it is not already subscribed
      return

   def lookupSubscribers(self, topic):
      return topicSubscriptions[topic]

   def removeSubscriber(self, sessionId, topicPattern):
      # similar to addSubscriber, but caling unsubscribeClientFromExactTopic
      return

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

Wildcard publications really is out of scope (for now), but your suggestion for efficient wildcard subscribe via caching is interesting. I have implemented arbitrary regular expression subscription (see the "prefixsubscribe" branch).

Now, one problem arising (already with prefix subscribe) is, that the client now also needs to do pattern matching for incoming events. This is because events (currently) only contain the concrete topic of the event (as specified when publishing), not the pattern that lead to a dispatch to a respective subscriber.

(Re-)implementing all the pattern matching logic contained in the broker also in the clients is I think something we should try to avoid.

We might need to extend the EVENT message, which currently is dead simple:

 [8, "http://example.com/foobar", "Hello 2"]

For patterned subscribes, we might need to send something like

[8, "http://example.com/foobar", "Hello 2", "^http://.*/foobar$"]

This will lead to further code restructurings, since currently the broker serialized a dispatched message just once for all subscribers to receive. With above, it needs to serialize per matched pattern. Sure, no problem.

Anyway, I find it interesting how a harmless looking feature like "pattern based subscribe" leads to many consequences. Here is another one: what if I subscribe to "^http://.*/foobar$" and unsubscribe from "http://example.com/foobar" without subscribing before. Do I receive "http://example.com/foobar" events or not? In other word: correct set semantics crossing topic patterns?

from wamp-proto.

jmarine avatar jmarine commented on June 1, 2024

Regular expressions would be great, but they can also be confusing because expected results may vary when using different programming languages/regex engines. So, I would propose simple wildcard rules (i.e: * to represent many characters), that should be implemented the same way at both sides (client and server).

In the client side, it could use a different map:

  • "patternHandlers" (the keys are topicPatterns, but the values are the listOfFunctionHandlers):

At subscription/unsubscription time, if the topicPattern contains some wildcards, it should add/remove the handler in the "patternHandlers".

When receiving the EVENT message (as defined in WAMP v1: [8, "http://example.com/foobar", "Hello 2"]), call the exact topicURL handlers, and also, if there are any "used pattern" subscriptions (paternHandlers.keys), call the pattern handlers that match the received topicURL (only affects efficiency in the clients that use this feature).
For example:

class Client:
  patternHandlers = {}

  def whenEventMessageIsReceived(event):
    topicURL = event[1];
    handlers = getNormalTopicHandlers(topicURL)
    if(patternHandlers is not empty):
      handlers = handlers.copy();
      foreach(topicPattern in patternHandlers.keys):
        if(topicURL matches topicPattern):
          addPatternHandlersToSet(patternHandlers[topicPattern], handlers)  # only the handlers that aren't repeated
    callEventHandlers(event, handlers)

I haven't understood the last question about "semantics crossing topic patterns"...
Doesn't the server verify unsubscriptions, and avoids EVENT messages to clients that had unsubscribed from particular topics?

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

What I meant with "set semantics" is the following.

Suppose the following sequence of client Subs/Unsubs:

  SUBSCRIBE "http://example.com/foobar"
  UNSUBSCRIBE "http://example.com/foobar"

After that, the client won't receive events for topic "http://example.com/foobar" .. of course.

  SUBSCRIBE "http://*/foobar"
  SUBSCRIBE "http://example.com/foobar"

After that, the client will receive events for topic "http://example.com/foobar" .. each event only once.

  SUBSCRIBE "http://*/foobar"
  SUBSCRIBE "http://example.com/foobar"
  UNSUBSCRIBE "http://example.com/foobar"

After that, the client will still receive events for topic "http://example.com/foobar".

  SUBSCRIBE "http://*/foobar"
  UNSUBSCRIBE "http://example.com/foobar"

After that, the client will still receive events for topic "http://example.com/foobar".

  SUBSCRIBE "http://*/foobar"
  SUBSCRIBE "http://example.com/foobar"
  UNSUBSCRIBE "http://example.com/foobar"
  UNSUBSCRIBE "http://*/foobar"

After that, the client will not receive events for topic "http://example.com/foobar" any more.

Hence, there is no set semantics between SUBSCRIBE and UNSUBSCRIBE accross different topics or topics patterns. Order is insignificant .. and each SUBSCRIBE must be followed by an UNSUBSCRIBE for the exact same topic or topic pattern to cancel.

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024
  1. Regarding regular expressions: I agree, since there is no established and widely implemented standard, we should probably keep it simple .. only "" as path components (that is between 2 "/") will be allowed. And "" must be matched by a non-empty string without "/".
  2. If we do not send the matching pattern that lead to the dispatching of an event within the EVENT message, the client needs to do the matching again. Whats your opinion on that?

I was thinking of extending EVENT as follows:

 [8, <topic URI>, <event payload>] OR
 [8, <topic URI>, <event payload>, <event extra>]

where <event extra> is a dict that could be

{"publisher": <WAMP session ID>} OR
{"matched": "wildcard:http://*/foobar"} OR
{"publisher": <WAMP session ID>, "matched": "prefix:http://example.com/event#"} OR
...

The sending of the publisher ID and/or the macthing pattern could be requested by the client as options in SUBSCRIBE.

from wamp-proto.

jmarine avatar jmarine commented on June 1, 2024

The fact is that I'm only thinking the subscription with wildcards as a simple utility to reduce the number of subscription messages, and to allow generic subscription to topics names that the client doesn't know.

So, in relation to the previously commented sequences of client Subs/Unsubs:

  SUBSCRIBE "http://*/foobar"
  UNSUBSCRIBE "http://example.com/foobar"

What I expect is a different behaviour: that the server subscribes the client to each of the topics that match, and the unsubscription makes the client to stop receiving the events from the topic "http://example.com/foobar".

In the case of 2 subs/1 unsub:

  SUBSCRIBE "http://*/foobar"
  SUBSCRIBE "http://example.com/foobar"
  UNSUBSCRIBE "http://example.com/foobar"

I recognize It could be useful that the client still receives events from the topic "http://example.com/foobar"
(although I didn't implemented this way, a reference count of the subscriptions between the client-to-topic could be used to differentiate from the previous Sub/Unsub sequence).

Finally, I'm also agree with the other sequences and conclussions.

On the other side, I think the EVENT message shouldn't have to include the macthing pattern (to much repeated data in the payload).
But maybe a "metaevent" message sent once by the server with the topics that match the subscribed pattern could help to optimize the matching in the client:

class Client:
  patternHandlers = {}
  matchingTopicPatterns = {}

  def whenMetaEventMessageIsReceived(metaevent):
    sid = metaevent[3].sessionId;
    pattern = metaevent[1];  # instead of an exact topic name
    if(sid == theClientSessionId && pattern.contains(wildcards)):
      sub = metaevent[2].endsWith("/sub#joined"); # boolean
      unsub = metaevent[2].endsWith("/sub#left"); # boolean
      topics = metaevent[3].topics; 
      foreach(topic in topics):
        if(sub) matchingTopicPatterns[topic].add(pattern);
        else if(unsub) matchingTopicPatterns[topic].remove(pattern);
    return;

  def whenEventMessageIsReceived(event):
    topicURL = event[1];
    handlers = getNormalTopicHandlers(topicURL)
    if(matchingTopicPatterns[topicURL] is not empty):
      handlers = handlers.copy();
      foreach(topicPattern in matchingTopicPatterns[topicURL]):
          addPatternHandlersToSet(patternHandlers[topicPattern], handlers)  # only the handlers that aren't repeated
    callEventHandlers(event, handlers)

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

In hindsight, the main "problem" above is how to communicate the pattern that matched an event to the subscriber when it receives an event.

WAMPv2 solves this as follows

  1. Subscriber subscribes to some topic or pattern via options

     [SUBSCRIBE,            Request|id, Topic|uri, Options|dict]
    
  2. The broker now acknowledges the subscription

      [SUBSCRIBED,          Request|id, Subscription|id]
    
  3. Later, when an event is delivered, the broker sends:

      [EVENT,               Subscription|id, Topic|uri, Event|any]
    

The broker not only communicates the (concrete) topic under which is the event was published, but also the subscription under which the subscriber receives that event.

from wamp-proto.

jmarine avatar jmarine commented on June 1, 2024

Does it mean that the EVENT & METAEVENT messages need to be generated/serialized for each client?
(because each client has a different "subscription|id", doesn't?)

from wamp-proto.

oberstet avatar oberstet commented on June 1, 2024

Good question, but no (I would not have proposed it if that would be needed): The subscription ID is decided by the broker, and the broker will hand out the same subscription ID for the same subscribed topic or topic pattern. So the event needs to be serialized only once (per matching topic pattern, not per receiving subscriber).

from wamp-proto.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.