Code Monkey home page Code Monkey logo

Comments (13)

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Hello,

Thank you for this feedback! I understand that floating-point artifacts could be an issue in these cases. However, dealing with these issues is slightly out-of-scope of portion since we do not make any assumption about the objects being compared. For some of them, a tolerance threshold is not required, while for some other, the tolerance is not even a single value. Moreover, supporting such a threshold means that objects require to support subtraction (or more generally, a notion of distance between two of these objects) which is a strong assumption unfortunately :-/

What I can suggest you however is to implement a tiny wrapper for your objects, in which you override the various comparison methods (e.g. __eq__ obviously, but also, __lt__, __gt__, etc. if needed, depending on which ones are already implemented in your objects) such that their computation can take into account the "tolerance threshold" before delegating the comparison to the wrapped object.

I hope this helps you! Meanwhile, I'll think about the possibility to support such wrap/unwrap operations on the fly in portion, since it is also something that could be useful for #29.

from portion.

egemenimre avatar egemenimre commented on May 26, 2024

Hallo,

Thanks for the prompt reply. I understand that the beauty of portion is generality, and I can see how subtraction can be a limitation. Unfortunately, I do not have control of the Time class from astropy to implement a custom __eq__ (with tolerances) and I wouldn't want to use a subclass just for that. (Is this what you meant?)

So, I will have to guard against them in my implementation of TimeInterval, which uses portion under the hood. Thanks anyway! :)

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Hello,

Not necessarily a subclass, but a thin wrapper like this:

class Wrapper:
  def __init__(self, wrapped):
    self.wrapped = wrapped
  def __eq__(self, other): 
    # fancy stuff here to support comparison

It should also be possible to define __getattr__ to delegate all attributes/methods access to the wrapped object to make it more convenient to use them. Obviously, all of this requires to wrap/unwrap objects before using them with portion :-/ (that could be inconvenient though).

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Hello,

Did the workaround work? Can I close this issue for now? :-) It's on my todo-list to find a way to deal with this kind of situations, but I've no short plan to work on it :(

from portion.

egemenimre avatar egemenimre commented on May 26, 2024

from portion.

eehusky avatar eehusky commented on May 26, 2024

It aint pretty (I had a specific use case in mind and made the minimal changes i needed to make it work) but I forked your library and made some changes to support using math.isclose for floating point operations. I had to drop all support for intervals based on types that can not be coerced into a floating point and needed to make some tweaks to _Pinf to make it work seamlessly with float('inf')

Essentially, this

# Item is a value
for i in self._intervals:
    left = (item >= i.lower) if i.left == Bound.CLOSED else (item > i.lower)
    right = (item <= i.upper) if i.right == Bound.CLOSED else (item < i.upper)
    if left and right:
        return True
return False

became this.

# Item is a value
for i in self._intervals:
    left = op.ge(item, i.lower) if i.left is Bound.CLOSED else op.gt(item, i.lower)
    right = op.le(item, i.upper) if i.right is Bound.CLOSED else op.lt(item, i.upper)
    if left and right:
        return True
return False

where op.[ge,le,lt,gt,eq] can be replaced with functions like this, or replaced with the functions from import operator as op for default behavior.

def lt(a, b, rel_tol=__rel_tol__, abs_tol=__abs_tol__):
    a, b = float(a), float(b)  # Catches any type that cant be coerced, not strictly needed.
    return (a < b) and (not eq(a, b, rel_tol=rel_tol, abs_tol=abs_tol))

It should still pass your unit tests, albeit with some tests removed or modified to remove items that are based on strings.

Just food for thought, I like your library and thought I would kick back an idea for ya.

https://github.com/eehusky/portion

Cheers!

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Thanks for this! It's very interesting and I would really like to support this directly in portion without loss of generality. That's definitely something I'll try to address in a next major release.

BTW, was it easy to deal with portion codebase? Are there things that were not easy to deal with? I think at least I should try now to make any extension as easy as possible ;-)

from portion.

eehusky avatar eehusky commented on May 26, 2024

Yeah, once I found the right corner I needed to peel back it took me a couple hours of messing around. I was using sympy sets but wanted something a little lighter and this fit the bill perfectly once I found it...'set' is a very overloaded term when googling python stuff.

I haven't backported removing all the non applicable unit tests with strings, but I think the last commit I have should allow you to add custom operators pretty easily. By default two operator classes come built in (classic and floating point). Adding one for dates shouldnt be too difficult if you have a notion of how you want them to work.

https://github.com/eehusky/portion/blob/master/portion/fuzzy_operator.py

This was added to the top level of the interval.py module. After importing the library you can change the module level comparisons with the two functions. I set it up to default to classic <,<=,>=,==,!= so it should still work as you intended.

# From interval.py
from .fuzzy_operator import FuzzyOperator, ClassicOperator

op = ClassicOperator()

def set_operator(operator):
    global op
    op = operator

def set_tolerance(rel_tol, abs_tol):
    global op
    if not isinstance(op,FuzzyOperator):
        op = FuzzyOperator()

    op.rel_tol = rel_tol
    op.abs_tol = abs_tol

This has the obvious caveat of only module level granularity and not per instance, but crossing that bridge was going to be way more work than I wanted to put in :) I think you could do something like importing the module multiple times and having each import with a separate operator class. Maybe something like this?

import portion as P
P.set_tolerance(1e-11, 1e-13)
P.closed(1, 2) | P.closed(3, 4) | P.closed(2, 3)

import portion as Q
Q.closed(1, 2) | Q.closed(3, 4) | Q.closed(2, 3)

I wasn't thinking when i started this and now I wish I hadn't run autopep8 on the source code and messed with the formatting in interval.py

Some of the changes like switching to is for checking bound types was done with the intent of preventing me from goofing up.

  • <,<=,>=,>, ==,!= Only used to compare intervals to intervals. Which end up getting delegated to the operator classes.
  • is Checking Bound.Open or Bound.Closed
  • op.[eq,ne,le,lt,gt,ge] is only used for comparing items in intervals.

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Thanks for your answer! I'll have a look at the code of your fork and see what I can do to ease the implementation of custom comparisons in portion directly. It is very likely that I will ask you to review/test the changes before releasing this, if you agree :-)

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

The fork is working correctly as long as the values do not define a new order (i.e. if lt(a, b) , then a < b has to hold). This prevents, for example, defining an order that is exactly the opposite of the existing one (e.g. lt(a,b) = a > b; le(a,b) = a >= b, gt(a,b) = a < b, and so on). The reason it won't work is because we rely on sort, min and maxat some places:
https://github.com/eehusky/portion/blob/master/portion/interval.py#L134

For min and max, it is easy to rewrite them using lt/gt. For sort, it requires to define a new key function that should be built from a cmp function (not something very hard to do, though).

I have to be honest and admit that integrating all this represents a lot of changes in the code, and above all a lot of risk of error in the event of future changes.

I still believe that using "manually" a wrap/unwrap mechanism on values provided to intervals still seems the best approach to me (the "safest" from the point of view of portion, anyway ^^) to manage these cases.

from portion.

eehusky avatar eehusky commented on May 26, 2024

To be fair, that same statement could be said now, I cant redefine the ordering of the values of the intervals. I guess I'm not following the logic behind why one would replace operators with ones that are defined backwards but okay.

But in the interest of being helpful, you could replace the calls to min/max/and sort with replacements in the new operators class (change sort to sorted). However....


The thing about min,max, and sort is that they aren't really impacted by the inexactness of floating point.
In the case of min,max, there can only be a single value returned from a collection of values that are not identical, so different tolerances have no impact. If two values are exactly equal only a single value will be returned.

Same goes for sort, the only thing that will happen if you sort floating points with a relative tolerance is that values that are ~= have the chance to be ordered differently. Most like in order of access. However, in this particular case it wont matter because those same values that are ~= are going to be close enough together to be merged into a single interval.

The point of these changes isnt to support reversing intervals, its to allow you to declare that two relatively close values can be considered the same (typically relatively close means a few ULPs). Or, in the case of the original issue, maybe that two dates are equal if they are within a few seconds?

  • Disclaimer, I'm coming at this from a mathematics point of view so defining < as > is madness.

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Hello,

I completely agree with you :-) I know that the proposed solution aims to solve the problem of "close comparisons", and it does it well ;)

With portion, I try to be as generic as possible, that's why I'm aiming at a solution a little more "universal" that would also allow to redefine the semantics of <, >, ==, <= and >= as well. I agree that it makes little sense, a priori, to "reverse the order" of values. It was a somewhat artificial example, but having the possibility to redefine the order allows, for example, to support objects that are not natively comparable, or to be able to sort values on the basis of another criterion (for example, book titles that would normally be sorted by lexicographical order, but that one wishes to sort by date of publication).

Another problem in the proposed solution is the use of a "module" variable, which shares its state globally. This means that it is not possible to define this variable twice (and thus to manipulate two different "interval types" at the same time). Even if you import the module twice, the variable will be shared (unless you do an absolute import and a relative import, or play with the Python import machine, which would be a bit risky). Having said that, it's a "lesser evil", and I think I'll go for a solution based on this idea, because it's simpler to implement without touching the code too much (the alternative would be to have Intervals' that you could parametrize at creation time, and create subtypes on the fly with type()' to specialize the intervals on certain types of values. I have a local branch with such an approach, but it poses other problems).

I'll do some testing as soon as I have some time, to support your use-case (as well as the "toy examples" above) directly in portion, probably by defining a module variable (and probably a function like cmp' that the user can redefine. The advantage of cmp' is that that's all it takes for ``sort'' to still work properly, and it also works in min' and max', even though the latter two were not fundamentally problematic to bypass ;-)

from portion.

AlexandreDecan avatar AlexandreDecan commented on May 26, 2024

Hello again,

I was working on this when I found a problematic example, and I need your lights to see how to better address it.
Let's assume we consider for the example that two floats are "close enough" if their absolute difference is, let's say, 0.1.

Let's create the following interval P.closed(0, 0.1). Should we assume that this interval is a singleton, or not?
If we assume it is not a singleton, then I believe its repr should be [0,0.1] which may seem strange given that the two bounds are considered equal (this is even more "obvious" if we consider [0, 0.0000000000001] for which I clearly expect [0] instead).

If we assume it is a singleton, then its repr should be either [0] or [1], but which one to choose? In the first case (the other is symmetrical) it means that [0,0.1] | [0.15,1] will resolve to [0] | [0.15,1] where one would expect it to resolve to [0,1] (since 0.1 and 0.15 are "close enough").

from portion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.