Comments (13)
Hello,
Thank you for this feedback! I understand that floating-point artifacts could be an issue in these cases. However, dealing with these issues is slightly out-of-scope of portion
since we do not make any assumption about the objects being compared. For some of them, a tolerance threshold is not required, while for some other, the tolerance is not even a single value. Moreover, supporting such a threshold means that objects require to support subtraction (or more generally, a notion of distance between two of these objects) which is a strong assumption unfortunately :-/
What I can suggest you however is to implement a tiny wrapper for your objects, in which you override the various comparison methods (e.g. __eq__
obviously, but also, __lt__
, __gt__
, etc. if needed, depending on which ones are already implemented in your objects) such that their computation can take into account the "tolerance threshold" before delegating the comparison to the wrapped object.
I hope this helps you! Meanwhile, I'll think about the possibility to support such wrap/unwrap operations on the fly in portion
, since it is also something that could be useful for #29.
from portion.
Hallo,
Thanks for the prompt reply. I understand that the beauty of portion is generality, and I can see how subtraction can be a limitation. Unfortunately, I do not have control of the Time class from astropy to implement a custom __eq__
(with tolerances) and I wouldn't want to use a subclass just for that. (Is this what you meant?)
So, I will have to guard against them in my implementation of TimeInterval, which uses portion under the hood. Thanks anyway! :)
from portion.
Hello,
Not necessarily a subclass, but a thin wrapper like this:
class Wrapper:
def __init__(self, wrapped):
self.wrapped = wrapped
def __eq__(self, other):
# fancy stuff here to support comparison
It should also be possible to define __getattr__
to delegate all attributes/methods access to the wrapped object to make it more convenient to use them. Obviously, all of this requires to wrap/unwrap objects before using them with portion
:-/ (that could be inconvenient though).
from portion.
Hello,
Did the workaround work? Can I close this issue for now? :-) It's on my todo-list to find a way to deal with this kind of situations, but I've no short plan to work on it :(
from portion.
from portion.
It aint pretty (I had a specific use case in mind and made the minimal changes i needed to make it work) but I forked your library and made some changes to support using math.isclose for floating point operations. I had to drop all support for intervals based on types that can not be coerced into a floating point and needed to make some tweaks to _Pinf to make it work seamlessly with float('inf')
Essentially, this
# Item is a value
for i in self._intervals:
left = (item >= i.lower) if i.left == Bound.CLOSED else (item > i.lower)
right = (item <= i.upper) if i.right == Bound.CLOSED else (item < i.upper)
if left and right:
return True
return False
became this.
# Item is a value
for i in self._intervals:
left = op.ge(item, i.lower) if i.left is Bound.CLOSED else op.gt(item, i.lower)
right = op.le(item, i.upper) if i.right is Bound.CLOSED else op.lt(item, i.upper)
if left and right:
return True
return False
where op.[ge,le,lt,gt,eq] can be replaced with functions like this, or replaced with the functions from import operator as op
for default behavior.
def lt(a, b, rel_tol=__rel_tol__, abs_tol=__abs_tol__):
a, b = float(a), float(b) # Catches any type that cant be coerced, not strictly needed.
return (a < b) and (not eq(a, b, rel_tol=rel_tol, abs_tol=abs_tol))
It should still pass your unit tests, albeit with some tests removed or modified to remove items that are based on strings.
Just food for thought, I like your library and thought I would kick back an idea for ya.
https://github.com/eehusky/portion
Cheers!
from portion.
Thanks for this! It's very interesting and I would really like to support this directly in portion without loss of generality. That's definitely something I'll try to address in a next major release.
BTW, was it easy to deal with portion codebase? Are there things that were not easy to deal with? I think at least I should try now to make any extension as easy as possible ;-)
from portion.
Yeah, once I found the right corner I needed to peel back it took me a couple hours of messing around. I was using sympy sets but wanted something a little lighter and this fit the bill perfectly once I found it...'set' is a very overloaded term when googling python stuff.
I haven't backported removing all the non applicable unit tests with strings, but I think the last commit I have should allow you to add custom operators pretty easily. By default two operator classes come built in (classic and floating point). Adding one for dates shouldnt be too difficult if you have a notion of how you want them to work.
https://github.com/eehusky/portion/blob/master/portion/fuzzy_operator.py
This was added to the top level of the interval.py module. After importing the library you can change the module level comparisons with the two functions. I set it up to default to classic <,<=,>=,==,!= so it should still work as you intended.
# From interval.py
from .fuzzy_operator import FuzzyOperator, ClassicOperator
op = ClassicOperator()
def set_operator(operator):
global op
op = operator
def set_tolerance(rel_tol, abs_tol):
global op
if not isinstance(op,FuzzyOperator):
op = FuzzyOperator()
op.rel_tol = rel_tol
op.abs_tol = abs_tol
This has the obvious caveat of only module level granularity and not per instance, but crossing that bridge was going to be way more work than I wanted to put in :) I think you could do something like importing the module multiple times and having each import with a separate operator class. Maybe something like this?
import portion as P
P.set_tolerance(1e-11, 1e-13)
P.closed(1, 2) | P.closed(3, 4) | P.closed(2, 3)
import portion as Q
Q.closed(1, 2) | Q.closed(3, 4) | Q.closed(2, 3)
I wasn't thinking when i started this and now I wish I hadn't run autopep8 on the source code and messed with the formatting in interval.py
Some of the changes like switching to is
for checking bound types was done with the intent of preventing me from goofing up.
<,<=,>=,>, ==,!=
Only used to compare intervals to intervals. Which end up getting delegated to the operator classes.is
CheckingBound.Open
orBound.Closed
op.[eq,ne,le,lt,gt,ge]
is only used for comparing items in intervals.
from portion.
Thanks for your answer! I'll have a look at the code of your fork and see what I can do to ease the implementation of custom comparisons in portion
directly. It is very likely that I will ask you to review/test the changes before releasing this, if you agree :-)
from portion.
The fork is working correctly as long as the values do not define a new order (i.e. if lt(a, b) , then a < b has to hold). This prevents, for example, defining an order that is exactly the opposite of the existing one (e.g. lt(a,b) = a > b; le(a,b) = a >= b, gt(a,b) = a < b, and so on). The reason it won't work is because we rely on sort
, min
and max
at some places:
https://github.com/eehusky/portion/blob/master/portion/interval.py#L134
For min and max, it is easy to rewrite them using lt/gt. For sort, it requires to define a new key function that should be built from a cmp function (not something very hard to do, though).
I have to be honest and admit that integrating all this represents a lot of changes in the code, and above all a lot of risk of error in the event of future changes.
I still believe that using "manually" a wrap/unwrap mechanism on values provided to intervals still seems the best approach to me (the "safest" from the point of view of portion
, anyway ^^) to manage these cases.
from portion.
To be fair, that same statement could be said now, I cant redefine the ordering of the values of the intervals. I guess I'm not following the logic behind why one would replace operators with ones that are defined backwards but okay.
But in the interest of being helpful, you could replace the calls to min/max/and sort with replacements in the new operators class (change sort to sorted). However....
The thing about min,max, and sort is that they aren't really impacted by the inexactness of floating point.
In the case of min,max, there can only be a single value returned from a collection of values that are not identical, so different tolerances have no impact. If two values are exactly equal only a single value will be returned.
Same goes for sort, the only thing that will happen if you sort floating points with a relative tolerance is that values that are ~= have the chance to be ordered differently. Most like in order of access. However, in this particular case it wont matter because those same values that are ~= are going to be close enough together to be merged into a single interval.
The point of these changes isnt to support reversing intervals, its to allow you to declare that two relatively close values can be considered the same (typically relatively close means a few ULPs). Or, in the case of the original issue, maybe that two dates are equal if they are within a few seconds?
- Disclaimer, I'm coming at this from a mathematics point of view so defining < as > is madness.
from portion.
Hello,
I completely agree with you :-) I know that the proposed solution aims to solve the problem of "close comparisons", and it does it well ;)
With portion, I try to be as generic as possible, that's why I'm aiming at a solution a little more "universal" that would also allow to redefine the semantics of <, >, ==, <= and >= as well. I agree that it makes little sense, a priori, to "reverse the order" of values. It was a somewhat artificial example, but having the possibility to redefine the order allows, for example, to support objects that are not natively comparable, or to be able to sort values on the basis of another criterion (for example, book titles that would normally be sorted by lexicographical order, but that one wishes to sort by date of publication).
Another problem in the proposed solution is the use of a "module" variable, which shares its state globally. This means that it is not possible to define this variable twice (and thus to manipulate two different "interval types" at the same time). Even if you import the module twice, the variable will be shared (unless you do an absolute import and a relative import, or play with the Python import machine, which would be a bit risky). Having said that, it's a "lesser evil", and I think I'll go for a solution based on this idea, because it's simpler to implement without touching the code too much (the alternative would be to have Intervals' that you could parametrize at creation time, and create subtypes on the fly with
type()' to specialize the intervals on certain types of values. I have a local branch with such an approach, but it poses other problems).
I'll do some testing as soon as I have some time, to support your use-case (as well as the "toy examples" above) directly in portion, probably by defining a module variable (and probably a function like cmp' that the user can redefine. The advantage of
cmp' is that that's all it takes for ``sort'' to still work properly, and it also works in min' and
max', even though the latter two were not fundamentally problematic to bypass ;-)
from portion.
Hello again,
I was working on this when I found a problematic example, and I need your lights to see how to better address it.
Let's assume we consider for the example that two floats are "close enough" if their absolute difference is, let's say, 0.1.
Let's create the following interval P.closed(0, 0.1)
. Should we assume that this interval is a singleton, or not?
If we assume it is not a singleton, then I believe its repr
should be [0,0.1]
which may seem strange given that the two bounds are considered equal (this is even more "obvious" if we consider [0, 0.0000000000001]
for which I clearly expect [0]
instead).
If we assume it is a singleton, then its repr
should be either [0]
or [1]
, but which one to choose? In the first case (the other is symmetrical) it means that [0,0.1] | [0.15,1]
will resolve to [0] | [0.15,1]
where one would expect it to resolve to [0,1]
(since 0.1 and 0.15 are "close enough").
from portion.
Related Issues (20)
- Get number of times multiple intervals overlap HOT 3
- Iteration of empty intervals is inconsistent. in general "empty := (+inf,-inf)" is problematic HOT 15
- Is this library performant enough to work with (non-atomic) intervals which span integers between 1 and 1 billion? HOT 1
- Using an external comparator? HOT 3
- iterate is broken with subclasses of Interval HOT 12
- Add __format__ method to Interval (improvement) HOT 16
- Thoughts about text-annotation use case and Pandas Ext. API HOT 5
- interval diameter (length, width, measure, range, or size) HOT 8
- Add join / merge method HOT 2
- "compatible version" specifier in setup.py confuses poetry HOT 2
- AttributeError: module 'portion.interval' has no attribute 'empty' HOT 3
- IntervalMultiDict HOT 3
- Add example for pandas in README HOT 5
- importlib error with create_api HOT 2
- Error to import interval, inf, imath from interval HOT 1
- importlib.machinery error with create_api HOT 6
- Enclosure Calculation Bug HOT 1
- Empty Calculation Bug HOT 3
- The performance issues of interval calculations in large quantities. HOT 1
- mass/Lebesgue measure? HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from portion.