Comments (15)
Hello,
Thanks for this feedback! While I understand your concern, I don't really get why the consequences of choosing "(+inf, -inf)" for the empty interval are an issue. Could you please clarify?
The choice to have list(P.empty()) == [P.empty()]
was made for consistency. As a P.Interval
instance is a disjunction of atomic intervals, it makes sense (at least to me ;-)) that iterating on a P.Interval
returns a non-empty list of atomic intervals. In the specific case of the empty interval, it (non-specifically) returns a list containing a single atomic interval: the empty one.
from portion.
The basic concern with (+inf,-inf) is that this violates the ordering of bounds. In all other cases, one can rely on that, thus it would be great if that could be a guarantee by the library.
With respect to the iteration, I do not see what the benefit is of guaranteeing that iteration provides non-empty list of atomic intervals. Is there any example of an operation/query on intervals which can be coded more conveniently/elegantly with the current convention? On the other hand, I have already provided examples of cases, where the current convention makes it less convenient/elegant, e.g. the computation of the total volume. Or say you would want to plot that set using any plotting library; I could draw a rectangle for each atomic interval and simply iterate; with the current convention, I need an extra test for the empty interval. In general, the current convention simply violates a number of invariants which would both feel natural and are otherwise fulfilled for all intervals.
from portion.
So there are two different things here:
-
Guaranteeing that iteration provides non-empty list of atomic intervals. I see no reason why
list(P.empty()) == []
since the contract is that iterating on an interval provides a list of atomic intervals. It makes sense to havelist(P.empty()) == [P.empty()]
the same way thatlist(i) == [i]
holds for any atomic intervali
. -
I agree that having
i.lower <= i.upper
for all intervals, including the empty one, would be nice. However, and unless you have some ideas in mind, I don't see how to address this. Currently, the fact that an empty interval equals(+inf, -inf)
is very convenient for most operations (including comparing intervals, intersecting or joining intervals, etc.). I'm not against changing the convention, but I don't see which one should be adopted instead.
I'm open to concrete suggestions :-)
from portion.
-
list(P.empty()) == []
does agree with the contract that it provides a list of atomic intervals, same as it holds forlist(P.empty()) == [P.empty()]
. Only the former however holds for a contract that it provides the minimal number of atomic intervals. If you do not include minimal in the contract, thenlist(P.openclosed(0,2)) == [P.openclosed(0,1),P.openclosed(1,2)]
would be valid too. Having minimal thus specifies a unique representation, which IMHO is makes the contract better without reducing its applicability. Exactly because empty intervals are somewhat special beasts often requiring special attention, I believe one would prefer the guarantee that there are none in the iteration, given that there is no need for them. On the other hand, what is the usefulness of guaranteeing non-emptyness of the returned list? -
Indeed,
(+inf,-inf)
is convenient for some operations, not so much for others (particularly the interval size). From a pure mathematical perspective, I don't care about the representation, as long as any and all representations of the empty intervals mathematically behave as the empty interval does. Now, you haven't specified thatlower
/upper
should correspond to the mathematical notion of the set-theoretic infimum/supremum, even that it currently does so for all but the empty interval. Thus, it is really first and foremost a matter of specification here. Implementation-wise, I might have chosen a separate class for the empty set/interval, one class for the non-empty mathematical interval/ atomic interval, and one for the mathematical set/composite interval. This would give you the freedom to specify mathematical behaviour for any property you like, which tend to be somewhat special for the empty interval. I understand however, that this might be a breaking change. The particular case for the broken interval size could be solved by using e.g. (+inf,+inf) instead, though not sure if that doesn't trigger other edge cases.
To conclude: I propose to change the iteration behaviour. At this point, I do not propose change the value of the lower
/upper
attributes of the empty interval. I do believe that the library could profit from a mathematical/set-theoretic specification of each attribute, but that will lead to bike-shedding, and for that I'm not involved enough in this library.
from portion.
Ok, so let's keep point 2 for later ;)
Considering point 1, I'm still not convinced by the "benefits" of such a breaking change. I agree with you that the empty interval is a special beast and could even be implemented as a special object. However, following Python's philosophy, I tend to consider that special cases aren't special enough to break the rules. Here, the rule is that, whatever i
is, list(i)
returns the underlying list of atomic intervals composing i
. If i
is empty, then list(i)
returns a list containing the empty interval, following all other cases.
I also wonder what would happen to slices and (more importantly) to indexing intervals if we go for list(P.empty()) == []
. Indeed, raising IndexError
for P.empty()[0]
implies that one has to check for i.empty
before accessing i[0]
. That adds an additional test, which is what you wanted to avoid by having list(P.empty()) == []
notably for volume). It also breaks the rule (and probably many others) that i.lower, i.upper == i[0].lower, i[-1].upper
, something that is expected for all intervals, regardless whether they are empty, finite or infinite...
from portion.
Here, the rule is that, whatever
i
is,list(i)
returns the underlying list of atomic intervals composingi
.
I believe with composing you mean the representation of the current implementation? Because there is no sensible mathematical definition which would consider the empty interval an element of the decomposition of any set of the real line.
In the end, we want the library to help us answer questions about the intervals/sets we're dealing with. Questions which would be fixed by the proposed changes:
- The number of intervals: Going from -inf to +inf, how often do I enter and leave the set? Answer:
len(list(i))
. - Composition of intervals: If
i
andj
are disjunct and non-touching, thenlist(i | j) = list(i) + list(j)
.
As for i.lower, i.upper == i[0].lower, i[-1].upper
: Where does this come up in the wild other than in the implementation of the library, which is supposed to deal with this even when its complicated so the user does not have to? For the empty set, .lower
already doesn't make much sense, and neither do I see the mathematical specification of i[0]
for the empty set. If you have one, I would still prefer to have a i.first
and i.last
attribute which would then be specified as P.empty() if i.empty else i[0]
. Note that even for arbitrary python lists/sets, the empty list does not respect the above identity: min(s),max(s) == list(sorted(s))[0],list(sorted(s))[-1]
unless for the empty list/set, where, again, that identity has no meaning.
from portion.
I guess maybe my misunderstanding is the fundamental object handled by the library. For me, it is the mathematical concept of a set of anything isomorphic to the real line. As such, a natural decomposition is the minimal decomposition. Needing special consideration for identities like i.lower, i.upper == i[0].lower, i[-1].upper
is natural there, because "the first/last interval" has simply no meaning for the empty set - none of the questions about those intervals have a sensible answer other than "is undefined".
On the other hand, you seem to prefer to think of P.interval
as a non-empty list of intervals, with less focus on set-theoretic properties. In that sense, the library answers questions about that list rather than questions about some sub-set of the reals.
If that is the intent you have for this library, then I have my answer: I have misunderstood the purpose of this library, within that purpose, iteration is correct as is. If that is your intent, then I will stop wasting your time, and we can close this.
from portion.
I get your point, and I agree that the way we currently deal with empty intervals is perhaps (certainly?) misleading from a theoretical point of view. As you said, portion
focuses more on "practicability" (for whatever it means ;-)) than on theoretic exactitude. Historically, there were two separate classes for atomic and non-atomic intervals, so the question did not really arise, even though it already poses some "questions" as can be seen here: https://github.com/AlexandreDecan/portion/blob/master/CHANGELOG.md#180-2018-12-15 , i.e., the change that introduced len(P.empty()) == 1
and list(P.empty()) == [P.empty()]
, even before I merged the separate classes for atomic and non-atomic intervals).
However, I do believe now than the proposed change won't affect much practicability while solving the issue you raised ;) At the same time, the proposed change has several implications on the code, and will definitely be a breaking change (hence implying a new major version to comply with semantic versioning, and implying I should carefully consider all other potential breaking changes that could improve the library to do all of them at once in a single new major version).
I think (but I should think more about it ^^) we have two options to implement the change:
- Keep
self._intervals
as is whenself
is empty, and simply change__iter__
to return the empty list,__getitem__
to raiseIndexError
, and__len__
to be 0 when the interval is empty. In that case, we keep the invariant thatself._intervals
is non-empty (which is convenient for most operations) and internally, the empty interval keeps its representation(+inf, -inf)
while externally, it complies with more set-theoretic properties. - Change
self._intervals
to the empty list, so that its content is consistent with__iter__
,__getitem__
and__len__
and implement some checks for emptiness in most operations to deal with the special case of empty intervals (that's a bit of work, but I'm fine with this idea). We still keep the convention that the empty interval resolves to(+inf, -inf)
(by defining this convention in.left
,.lower
,.upper
and.right
, unless you have more appropriate values to suggest?). That's a bit inconsistent to some extent, but I don't see what else can be done here...
I'm not sure which one is better. (1) is easier to implement but makes "iterating on the empty interval" the special case. (2) seems more appropriate, but makes the four accessors the special cases for empty intervals... Honestly, I don't really like these two options, so any suggestion to create a (3) is welcome ;-) Anyway, before I decide to go for the change or not, I need to balance the benefits (set-theoretic compatibility) and the drawbacks (breaking change & potential inconsistency between internals and the public API).
from portion.
I actually haven't thought about the best implementation at all yet. After all, it is what the C++ community likes to call "an implementation detail". As long as you have a clear specification what comprises the API and how that API behaves, you can change the implementation with every release if you like. So as you said, until you have decided if a change of the API is in place, the implementation can wait.
That said, usually the most efficient implementation in terms of maintainability is the one that most naturally represents the intended concept. For the change to the iteration behaviour as a decomposition into non-empty intervals, that would point towards (2)
.
Indeed, from a quick search for _interval
through the source, the only implementation changes that would be needed are to .left
, .lower
, .right
, .upper
, .from_atomic
and potentially .atomic
; exactly those for which the most natural set-theoretic meaning is special for the empty interval. Here you will have to decide to change the API or stick to the current API (and potentially introduce new API points for the natural set-theoretic properties). Potential set-theoretic concepts to model are:
- infimum: Undefined for the empty interval; either return None or raise an Exception
- The largest among all minima (a value not bigger than any element of the set): The current spec of
lower
. - left-openness/-closedness: The empty interval as well as the left-infinite interval are usually defined as both open and closed. Options are introducing separate
.left_open
and.left_closed
, and/or changing.left
to returnNone
, a specialP.BOTH
or raising an Exception. atomic
could be redefined as implying non-empty, which would give some nice consistency; but one could also invent a new name for that concept.
from portion.
Thanks for the feedback. I started looking at the required changes in the code. I naturally went for option (2). I pushed an empty branch with the changes. The diff can be seen here: https://github.com/AlexandreDecan/portion/compare/empty?expand=1
And indeed, I had to change .left
, .lower
, .upper
, .right
as well as move the check for emptiness (based on bounds) from .empty
to .from_atomic
and implement .empty
using self._intervals
(more elegant). I also had to "adapt" .overlaps
and .__and__
since I relied on the non-emptiness of self._intervals
when iterating on atomic intervals in these methods. As you said, .atomic
had to be changed as well. I went for the convention that the empty interval is atomic (I implicitly consider atomic as the opposite of being a disjunction of intervals so it makes sense to consider the empty interval as being atomic). The remaining other changes were all to fix tests that explicitly considered the empty interval as being a list of a single, special interval. But that's something we may decide to change...
I will have deeper a look at the link you provided. I have no plan (currently) to add methods for more theoretic concepts (i.e., supremum, infimum, etc. since they can be easily implemented using left, lower, upper, right; and are a bit out of the scope of the library). Actually, most of the concepts there can lead to a new library built on top of portion
(and that should be maintained by someone that has a deeper mathematical background on this topic ;-)).
from portion.
I like it! I think iteration becomes more natural this way. The set-theoretic special cases for empty intervals are less important to me; if a user's code relies on i.left
/ i.lower
, chances are that they will have to special-case for the empty interval anyway. Making those return None
would potentially follow the spirit of the python Zen of "Errors should never go unnoticed" in that failure to special case will more likely cause an Exception, but from a usability perspective it doesn't make much difference if the test reads if i.lower is None
or if it reads if i.empty
.
from portion.
I actually agree it is likely that someone will check for emptiness if needed (the same way they will check for infinities if that's something that can happen in their code). Given this, I think if i.empty:
should be preferred to if i.lower is None:
(hence, keeping infinities for the empty interval would "force/lead" the former rather than the latter).
I don't really like the idea of returning None
since one of the assumption of this library is that all bounds are comparable, and None
isn't (I know it's a "special case", but if we go for a special handling of a special case, I would prefer raising an exception than using None
). Raising an Exception
is likely to make the calling code uglier and less readable than using None
or keeping the current values... I still need to think about this (interestingly, returning None
or raising an exception means I won't have to provide a pseudo-meaningful behaviour for some operations such as comparisons, containment, etc. I'll have to consider this as well...).
Considering the breaking nature of the change in the empty branch, and since I won't have much time to work on portion
to bundle other potential breaking changes in a new release (such as typed/specialized intervals, explicit support for discrete intervals, customizable domain/infinities), I think I'll distribute the change as part of patch (actually, a minor update since I've already other changes ready for that update) and "sell" the change as a fix (that's the case), while providing an easy way (still need to work on this) to keep the old behaviour in case the new one breaks something in one's code. I don't think many users will be affected, but if that's the case, I prefer to provide them an easy way to configure portion
to keep the previous behaviour since the breaking change is introduced by us, rather than asking them to change their own code at many different places after a patch or minor update that is supposedly assumed to be backward compatible...
Oh, and I've another question for you: you said "atomic could be redefined as implying non-empty, which would give some nice consistency; but one could also invent a new name for that concept". Why do you suggest that atomic
should/could be False
for empty intervals?
from portion.
Regarding empty -> not atomic; in the end, it's a matter of definition. A possible "classification" of the possible sets with respect to their behaviour, I see three main classes:
- The empty set/interval: Many assumptions break here.
- non-empty (mathematical) intervals: These are fully described by the four properties
.left
,.lower
,.right
,.upper
. Mathematicians might want to further distinguish degenerate and non-degenerate intervals, but for most casesi.lower == i.upper
does seem to do the right thing already. - composite sets; if I care for internal structure, then I will look for the disjunction into intervals (class 2).
With .atomic
and .empty
we have all we need to classify, no matter which choice we take for the definition of atomic
. However, if we make .atomic
exactly the indicator for class two, you could use it directly to decide that all you need is .lower
and .upper
. Furthermore, it matches nicely with the iteration: Iteration gives you the minimal decomposition into atomic intervals. The contract is then very explicit in that you can rely on lower
/upper
from what you get out of the iterator. In general, "narrow contracts" are usually considered good (there is more which you can rely on). But then, it's just a matter of i.atomic and not i.empty
, so if you want to minimise API changes, that would be fine too.
from portion.
Ok, thanks! I'll keep .atomic
as a synonym of not a disjunction so the empty interval will be considered as atomic.
I also changed how comparisons are made in presence of empty intervals (and I spotted a bug when comparing values with intervals at the same time).
I don't know when I'll release these changes (currently 2.3.0-dev on GitHub, not published on PyPI).
Thanks again for having raised the "issue" with iterating on an interval, and the interesting subsequent discussions we had ;-)
from portion.
Thank you for maintaining this library!
from portion.
Related Issues (20)
- Is this library performant enough to work with (non-atomic) intervals which span integers between 1 and 1 billion? HOT 1
- Using an external comparator? HOT 3
- iterate is broken with subclasses of Interval HOT 12
- Add __format__ method to Interval (improvement) HOT 16
- Thoughts about text-annotation use case and Pandas Ext. API HOT 5
- interval diameter (length, width, measure, range, or size) HOT 8
- Add join / merge method HOT 2
- "compatible version" specifier in setup.py confuses poetry HOT 2
- AttributeError: module 'portion.interval' has no attribute 'empty' HOT 3
- IntervalMultiDict HOT 3
- Add example for pandas in README HOT 5
- importlib error with create_api HOT 2
- Error to import interval, inf, imath from interval HOT 1
- importlib.machinery error with create_api HOT 6
- Enclosure Calculation Bug HOT 1
- Empty Calculation Bug HOT 3
- The performance issues of interval calculations in large quantities. HOT 1
- mass/Lebesgue measure? HOT 7
- Value out of range HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from portion.