Code Monkey home page Code Monkey logo

Comments (16)

hugo-ricateau avatar hugo-ricateau commented on May 26, 2024 1

what are the practical differences between a view and a list?

The view only references its source dict and hence, does not create any duplicated storage (unlike the list, which allocates its own storage; (partially) duplicating the one of the original dict). You can think of the view as a reusable iterator; therefore, what comes to my mind in terms of "performance" improvements is:

  • to avoid allocating unnecessary memory (especially for large collections);
  • to avoid considering the whole collection when only iterating until a given condition;
  • to avoid retaining strong references that may prevent garbage collection.

I think it should be quite easy to implement a view object instead of a list

Sure, this can indeed be easily achieved by inheriting from KeysView, ValuesView & ItemsView (located in collections.abc). Let me know if I can help in implementing that feature.

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024 1

Hello,

Thanks for these explanations. I think I get it now ;-)

I indeed prefer the first option, which seems cleaner and more appropriate. However, since it requires to maintain the intervals sorted and, as such, requires a lot of changes, I think we can opt for the second option as a "temporary workaround". At some point, I hope to find time and courage to write an IntervalDict based on a kind of self-balancing red-black-tree like structure, so to achieve better performances at the same time. That won't be easy (esp. to keep the current API, since a tree-like structure requires to store atomic intervals only) and I don't have short plan for this now ;-)

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024 1

I was looking at the implementation of xView in Python (see https://github.com/python/cpython/blob/master/Lib/_collections_abc.py#L729), and I've the impression that we cannot simply reuse the code you suggested in #34 (comment).

For example, the __contains__ method of ValuesView (see https://github.com/python/cpython/blob/master/Lib/_collections_abc.py#L776) is, I think, unlikely to work out of the box, since keys in IntervalDict can be Interval or values, and in the case of an Interval, the IntervalDict.get method returns a new IntervalDict. I need to check, but I guess that if we have an IntervalDict and we instantiate a ValuesView from it, checking for containment in this view will fail since the keys that will be tested (see line 777) will be Interval instances, and as such, any value returned on line 778 will be an IntervalDict (where parameter value should, intuitively, be a value), making the comparison on line 779 fails.

Am I right, or did I miss something? :-)

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024 1

Hello,

I found some time this afternoon to have a look at this issue. I've created a PR, see #35.
@hugo-ricateau Could I ask you to have a look at it? I think it properly addresses this issue.

Thank you!

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024 1

Closed by #35

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Hello,

Thanks for reporting this!

With the exception of the "automatic updated content" of a view, what are the practical differences between a view and a list? I think it should be quite easy to implement a view object instead of a list for .items, .values and .keys, but I do not see how this can lead to performance improvements ;-)

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Thanks for these explanations!

Feel free to submit a PR with these new "features" if you want, but keep me informed if you plan do to this, otherwise, I'll try to find some time to implement it by myself (I cannot say when it will be implemented, but be sure "it's not for soon" since I've a lot of other tasks to do before ;-))

from portion.

hugo-ricateau avatar hugo-ricateau commented on May 26, 2024

I don't have any bandwidth for now, but I'll try to submit something in the next weeks; I'll let you know when I'll start working on it.

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Do you have pointers to existing subclasses of DictView so I can have a look at them?

from portion.

hugo-ricateau avatar hugo-ricateau commented on May 26, 2024

I had a quick look at portion/dict.py, and the fastest solution to implement is probably something like

from collections.abc import KeysView, ValuesView, ItemsView

class IntervalDict(MutableMapping):
    def keys(self):
        return KeysView(OrderedDict(sorted(self._items, key=_sort)))

    def values(self):
        return ValuesView(OrderedDict(sorted(self._items, key=_sort)))

    def items(self):
        return ItemsView(OrderedDict(sorted(self._items, key=_sort)))

However, it would be considerably better to directly use an OrderedDict instead of a list as the internal storage structure (i.e. for . _items).

Another option, even though it's somehow hacky (the @property for _mapping), could be something like

from collections.abc import KeysView, ValuesView, ItemsView

class IntervalDictBaseView:
    def __init__(self, mapping):
        self.__mapping = mapping

    @property
    def _mapping(self):
        return sorted(self.__mapping._items, key=_sort)

class IntervalDictKeysView(IntervalDictBaseView, KeysView):
    pass

class IntervalDictValuesView(IntervalDictBaseView, ValuesView):
    pass

class IntervalDictItemsView(IntervalDictBaseView, ItemsView):
    pass

class IntervalDict(MutableMapping):
    def keys(self):
        return IntervalDictKeysView(self)

    def values(self):
        return IntervalDictValuesView(self)

    def items(self):
        return IntervalDictItemsView(self)

but I think that the first option with an OrderedDict storage would definitely be preferable.

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Thank you!

I prefer the first solution, but in that case, the ordering won't be kept in the views when the IntervalDict is modified.

One possibility is indeed to change the internal structure to a sorted list (I don't see why an OrderedDict will be required here). Moreover, having an ordered list internally will be helpful to achieve better performance in some cases. That's something I planned a long time ago - to switch to an interval-tree-like structure internally - but I couldn't find time to work on it.

from portion.

hugo-ricateau avatar hugo-ricateau commented on May 26, 2024

I prefer the first solution, but in that case, the ordering won't be kept in the views when the IntervalDict is modified.

My previous post was erroneous; the first option is not working as it is. It has the same drawback as the current implementation (with a list): the view is not kept up-to-date with respect to its IntervalDict (and it's not only about ordering).

One possibility is indeed to change the internal structure to a sorted list (I don't see why an OrderedDict will be required here).

What do you refer to by "sorted list"? As list always respect ordering; it feels like I misunderstood your answer.

You should consider using an OrderedDict for the following reason: view objects are referencing a mapping; therefore, either the internal storage is a mapping and you can instantiate the view with it; or the internal storage is not a mapping and you thus have to insert a mapping in between the storage and the view, breaking the "always up-to-date" feature...
So, first option will only work if the internal storage is a mapping (maintained sorted at all time); if you want to use a non mapping storage you should consider the second option, that allows to insert the mapping object in between the storage and the view on the fly.

If you think that the class can be implemented with an OrderedDict storage, first option is probably preferable; otherwise, the second option will be the only working one.

Let me know if you start working on it so we can synchronise ourself and avoid work duplication.

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Perhaps I misunderstood something ;-) As far as I know, OrderedDict is a mapping in which the order is based on "when" an item was inserted. In the IntervalDictclass, it is expected that .items(), .values() and .keys() return a list of things sorted by "interval" (basically, by their lower bound).

At the time I wrote IntervalDict, I wanted to have a deterministic order for these three methods (notably to allow to test the outputted content and for convenience). I somewhat arbitrarily decided to sort them by "keys in ascending order" mainly because the insertion order for disjunction of intervals is sometimes a bit difficult to maintain/compute (e.g. assume dict for [1, 3] -> 'a'; add [2] -> 'b' in it; result is [1,2[ | ]2, 3] -> 'a' and [2] -> 'b' where [1,2[ |]2, 3] is "newly created" but should be listed first since it's the result of something applied on the very first interval inserted in this dict). This is even worse in some cases, since an IntervalDict stores intervals and not atomic intervals. Consider for instance:

Create with [1, 3] -> 'a';
Insert [5] -> 'b';
Insert [7, 8] -> 'a';

Internally, this will be stored as [1, 3] | [7, 8] -> 'a' + [5] -> 'b'. Insertion order will be difficult to provide in this case.

That's mainly why the order in which items are returned in keys(), values() and items() is based on the lower bound of each key, so it is deterministic (and, to some extent, convenient).

from portion.

hugo-ricateau avatar hugo-ricateau commented on May 26, 2024

I totally agree that .keys(), .values() & .items() should be sorted by interval (for both determinism and convenience). I was certainly not clear enough in my previous posts; so, let me try to explain what I have in mind differently.

I think that preliminary was clear to you, but let me recall it anyway (just to be sure :) ).
View objects are instantiated with a reference to a mapping (that is subsequently used to retrieve the data while iterating over the view); this is how is implemented the "always up-to-date" aspect of the views.

Now, there are two options under review:

  • The first one implements the views as

    from collections.abc import KeysView, ValuesView, ItemsView
    
    class IntervalDict(MutableMapping):
        def keys(self):
            return KeysView(self.<storage>)
    
        def values(self):
            return ValuesView(self.<storage>)
    
        def items(self):
            return ItemsView(self.<storage>)

    where self.<storage> must thus be a mapping.
    Note that if the storage is not a mapping, you'll have to create a mapping object to link the view to the storage. However, such a mapping being surely static, the view won't remain up-to-date when the IntervalDict will change (that was my mistake in the previous comment); this is equivalent to the current implementation.
    Thus, as the storage must have a manageable order and must be a mapping, the natural candidate is an OrderedDict (even though it is not the "when the item is inserted" that is of interest for us, but the ability to manage its ordering).
    Note that, with this option, the storage must remain sorted at any moment.

  • The second option sets aside the necessity for the storage to be a mapping: the (custom) view objects keep a reference on the IntervalDict itself, instead of its storage. Thanks to the _mapping @property, every time a custom view object is iterated, a fresh mapping object is created to establish the link between the view (its base class) and the storage.
    Note that the storage container can be sorted in the course of this operation and thus it is not required that it remains sorted at any moment.

Looks like you have a preference for the first option (and I do agree ;) ). My point is that this (probably) implies to rework the whole class; this is why it may be worth considering the second option anyway. But you are most likely to estimate the amount of work necessary ;)

from portion.

hugo-ricateau avatar hugo-ricateau commented on May 26, 2024

Indeed, you're right; so none of options 1 or 2 will work out-of-the-box. It might be necessary to override __contains__ (and possibly __iter__ as well) in custom views. So the solution is probably in between the two suggested options; where changing the internal storage structure is not a requirement (and should certainly be skipped as a first step, event though this might improve performances and should definitely be reworked in the futur).

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

I started thinking about this. I think we need, at least, to:

  • Change __iter__ in IntervalDict so that it does not rely on keys() (we can reuse the current definition of keys() to do so) (see https://github.com/AlexandreDecan/portion/blob/master/portion/dict.py#L293). This will allow views to reuse __iter__ with no recursive calls.
  • KeysView (see https://github.com/python/cpython/blob/master/Lib/_collections_abc.py#L729) can be reused as is.
  • ValuesView (see https://github.com/python/cpython/blob/master/Lib/_collections_abc.py#L772) will require __contains__ to be adapted. We can rely on IntervalDict.find which does the job. For __iter__, we can reuse the current implementation of IntervalDict.values().
  • ItemsView (see https://github.com/python/cpython/blob/master/Lib/_collections_abc.py#L747) needs also both methods to be redefined. Again, the __iter__ can probably be implemented by taking the code from IntervalDict.items(). The __contains__ however is a bit trickier since it depends whether given key is a value or an interval. I have the impression that it would be sufficient, for both cases, to compare self._mapping[key] == IntervalDict([(key, value)]) but since checking for equality between IntervalDict involves IntervalDict.items(), it is likely that it will end in recursive calls since __eq__ for ItemsView is based on Set.__eq__ which in turn is based on Set.__le__ which in turn checks for elem not in other (which means comparing keys, and not items in our case). If I'm right, that means we need something like sorted(self._mapping[key]._items) == sorted(IntervalDict([(key, value)])._items instead (or, if we do not want to rely on the internal API, to distinguish the cases where key is a value and key is an interval, and for the second case, to check that self._mapping[key] exists, is an IntervalDict containing a single key, that this key is equal to the one provided, and that the values correspond. Another possibility is simply to rewrite IntervalDict.__eq__ so it does not rely on items() anymore, making the implementation of ItemsView easier :-)

Does it seem right? I think we'll need a lot of unit tests to be sure that we do not miss something ;-)

from portion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.