Comments (16)
That's something I already considered but I don't remember exactly why I decided not to implement it. :-)
Oh I remember : that's because format can also be used to define some padding/alignment and there's no easy way to apply this to infinities, leading to unexpected string rendering and "inconsistencies".
Yes, but I'm pretty sure we can figure out something.
For the record, my first motivation to subclass Interval
was to add such a method, because when working with datetime intervals formatting quickly yields to unreadable output.
Antoine
from portion.
If we can parse format specifier then we could try to apply it to infinities as well. But I don't see how currently :-) help is welcome on this.
from portion.
Not sure there is a unique / perfect solution. Probably some design choice has to be made.
Not sure either that it is mandatory to try to implement all formating directives (e.g. alignment). I see them as indications subject to best effort, when it make sense, that can even be dropped. For instance if a string is to long for a length indication, it is not truncated. Another example is datetime formating that simply doesn't support padding or alignment directives.
So, how the spec
argument of __format__
is used basically depends on the object it is invoked on and the use case motivation.
My motivation with datetime is that having (especially non atomic) intervals with the default datetime representation is just unreadable.
def __format__(self, spec):
return P.to_string(self, lambda v: ('{:'+spec+'}').format(v))
The previous proposal is simple and fullfills this purpose.
If one wants to deal with aligments in a consistent way, one must find a way to split the spec
in subpieces that would be applied over interval elements. What about using a separator? For instance:
f"{i:|4d|4s|^15s|}"
where |
is the separator that breaks the spec in 3 parts, first parts applies to values that are not infinities, second part to infinities, and last part to overall result.
For instance:
def __format__(self, spec):
# FIXME: handle null spec
# FIXME: handle exception because __format__ might not be defined for v
parts = spec[1:].split(spec[0])
pinf = '+inf'.__format__(parts[1]) if len(parts) > 1 else '+pinf'
ninf = '-inf'.__format__(parts[1]) if len(parts) > 1 else '-pinf'
result = P.to_string(self,
lambda v: v.__format__(parts[0]),
pinf=pinf,
ninf=ninf)
if len(parts) > 2:
result = result.__format__(parts[2])
return result
Here, the separator is chosen by the user (first character after the semicolon). Another choice would be to use a predefined separator in which case it should not appear after the first semicolon. This would be slightly more readable, The drawback is that it can not be part of the spec for sub parts.
Best regards,
Antoine
from portion.
I don't really like the idea of using a separator. Formatting a string is already something sometimes complicated (because one has to remember the specifiers) and if we're going to support __format__
for intervals, then we should try to stick as much as possible to its "usual" behaviour for other, built-in, objects.
I like the simplicity of your first proposal:
def __format__(self, spec):
return P.to_string(self, lambda v: ('{:'+spec+'}').format(v))
What would be nice is to be allow to extract all parts of spec
that could be applied on infinities (that is, mostly padding/alignment) and to apply them on infinities. I don't know if there's some parser available in the Python standard library for doing so. If there's one, then it might be easy to implement something along this line.
from portion.
This could be a starting point:
https://peps.python.org/pep-3101/#user-defined-formatting (I still need to read it)
from portion.
What would be nice is to be allow to extract all parts of
spec
that could be applied on infinities (that is, mostly padding/alignment) and to apply them on infinities. I don't know if there's some parser available in the Python standard library for doing so. If there's one, then it might be easy to implement something along this line.
Yes, but it seems to be impossible to have a spec that could apply at the same time to strings (infinities) and objects (of unknown type) in the interval. See datetime
again: there is no way to specify a padding nor alignment (extra char in the spec are simply printed as is). What we are looking for is a "union" spec that would apply to arbitrary types... Seems complicated and beyond the scope of your library I think. Maybe the PEP you're pointing at...
I can' say I like the separator idea... quick and dirty solution. I might use the separator in my subclasses as a convenience (I didn't so far).
Maybe the reasonable choice is to do nothing (as you did) or simply provide the first proposal I made (with a warning about the limits en the doc).
Best regards,
Antoine
from portion.
I'll have a look at how format
and its specifier is expected to be provided, and see if I can come up with a solution that works on all bounds (including infinities). If not, then I think it's indeed safer to do nothing (I prefer to avoid a "partial" solution, since in any case, to_string
can be used anyway to cover all cases).
from portion.
I'm affraid there is no way to have a universal specifier, as specifiers (and how they are interpreted) depends on objet they are used with.
Of course to_string
can be used, but the format protocol and the syntactic sugar it provides is much more compact/easy.
Anyway the idea of separator that came up because of this thread will solve (even if not perfect) some cases I have in a much more compact way. I will implement it in some of my subclasses of intervals!
Best regards.
from portion.
I'm affraid there is no way to have a universal specifier, as specifiers (and how they are interpreted) depends on objet they are used with.
Ideed, but the PEP defines a kind of "minimal set of specifiers" ([[fill]align][sign][#][0][minimumwidth][.precision][type]
). Not everything makes sense for infinities, but it would be nice to be able to extract what makes sense for them.
Notice that I discussed this morning with another Pythonista, and he told me something interesting: none of the object containers in Python (e.g. list, dict, etc.) actually supports defining the format of nested objects through __format__
. If we do so for interval, we somehow bring something unexpected since, for example, one would use __format__
to align the whole interval to the right of a 20-characters long string, while another one would use it to align each bound to the right of a 20-characters long string. The latter is what we initially define as the behaviour of __format__
while the former is what is usually expected from it. That may be confusing, and he even suggested to apply the former behaviour since we have a to_string
function that can be used to apply the latter.
from portion.
This could be a starting point:
https://peps.python.org/pep-3101/#user-defined-formatting (I still need to read it)
Isn't the PEP you're referring to the one that introduced Formater class and already is in the language (python 2.6 and 3.0)?
Since meaning of format specifier depends on the object it is applied to, I really can't see how a unique specifier could be processed in order to apply some (relevant) parts of it to infinities (str), and other parts to be applied to unknown objects.
Consider some align requirements and intervals with datetimes: how is it possible to guess that you must delete the align spec when applying to a value, and keep for infinities. You can only do it if you have the knowledge of what format specifier is supported by non infinities.
Here is a refined idea with separator, taking advantage of #
char as "alternate" specifier:
- a simple specifier applies to upper/lower bound values (but not infinities). The underlying assumption is that the format will be understood by all values: this will be the case - probably the most common one - where all values have the same type. This is my initial proposal.
- the alternate form start with
#
, followed by a separator (user chosen), followed by multiple separated format string that applies (in this order) to upper/lower bound values, then infinities, then the whole interval representation. For instance:f"{i:#/6.2f/^6/^24}"
. Values are expected to be floats, infinities are centered among 6 chars, and the whole interval centered again.
Simple, powerfull enough for most use cases... Much more compact and handy than using to_string
for most cases also. Adopted for my subclasses!
Some examples:
>>> i=closed(1,2)
>>> j=closed(1,inf)
>>> k=closed(datetime(2000,1,1), datetime.now())
>>> i
[1,2]
>>> j
[1,+inf)
>>> k
[datetime.datetime(2000, 1, 1, 0, 0),datetime.datetime(2023, 1, 30, 17, 55, 55, 608774)]
>>> f"{i:08d}"
'[00000001,00000002]'
>>> f"{j:08d}"
'[00000001,+inf)'
>>> f"{i:#/08d//^30}"
' [00000001,00000002] '
>>> f"{j:#/08d/^8/^30}"
' [00000001, +inf ) '
>>> f"{k:#/ %Y-%m-%d //^30}"
' [ 2000-01-01 , 2023-01-30 ] '
>>> f"{i:#/#010x}"
'[0x00000001,0x00000002]'
Here is the code that would do it (not heavily tested). Anybody shall feel free to reuse it:
class I(P.Interval):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# Can owerwrite on a per class (or even instance) basis
DISJ_STR = " | "
SEP_STR = ","
LEFT_OPEN_STR = "("
LEFT_CLOSED_STR = "["
RIGHT_OPEN_STR = ")"
RIGHT_CLOSED_STR = "]"
PINF_STR = "+inf" # "+oo"
NINF_STR = "-inf" # "-oo"
EMPTY_STR = "()"
DEFAULT_CONV = str
def to_str(self,
conv: Callable = repr,
*,
disj: Optional[str] = None,
sep: Optional[str] = None,
left_open: Optional[str] = None,
left_closed: Optional[str] = None,
right_open: Optional[str] = None,
right_closed: Optional[str] = None,
pinf: Optional[str] = None,
ninf: Optional[str] = None,
empty: Optional[str] = None) -> str:
"""
"""
left_open = self.LEFT_OPEN_STR if left_open is None else left_open
right_open = self.RIGHT_OPEN_STR if right_open is None else right_open
if self.empty:
return (empty if empty is not None
else self.EMPTY_STR if self.EMPTY_STR
else left_open + right_open)
left_closed = self.LEFT_CLOSED_STR if left_closed is None else left_closed
right_closed = self.RIGHT_CLOSED_STR if right_closed is None else right_closed
sep = self.SEP_STR if sep is None else sep
pinf = self.PINF_STR if pinf is None else pinf
ninf = self.NINF_STR if ninf is None else ninf
result = []
for i in self._intervals:
if i.lower == i.upper:
result.append(left_closed + conv(i.lower) + right_closed)
else:
result.append(
(left_open if i.left == P.OPEN else left_closed)
+ (ninf if i.lower == -P.inf else conv(i.lower))
+ sep
+ (pinf if i.upper == P.inf else conv(i.upper))
+ (right_open if i.right == P.OPEN else right_closed)
)
return (disj if disj is not None else self.DISJ_STR).join(result)
def __format__(self, spec: str) -> str:
"""
Default object formatter.
The format specifier is applied to lower and upper bound values,
provided they are not infinities. Since their is not known controled
by the Interval class, it is up to the user to provide a compatible
format specifier.
An alternate specifier form (starting with '#') can be used in order
to control how infinities and the overall result are formated.
The '#' opening char must be followed by a (user chosen) separator.
Every string between separators is then a format specifier that apply
to the lower and upper bound values (first specifier), infinities
(second specifier applicable to str) and the overall result (third
specifier, applicable to str). Second and third specifiers are
optional.
Examples:
>>> i=closed(1,2)
>>> i
[1,2]
>>> j=closed(1,inf)
>>> j
[1,+inf)
>>> k=closed(datetime(2000,1,1), datetime.now())
>>> k
[datetime.datetime(2000, 1, 1, 0, 0),datetime.datetime(2023, 1, 30,
17, 55, 55, 608774)]
>>> f"{i:08d}"
'[00000001,00000002]'
>>> f"{j:08d}"
'[00000001,+inf)'
>>> f"{i:#/08d/^8/^30}"
' [00000001,00000002] '
>>> f"{j:#/08d/^8/^30}"
' [00000001, +inf ) '
>>> f"{k:#/ %Y-%m-%d //^30}"
' [ 2000-01-01 , 2023-01-30 ] '
>>> f"{i:#/#010x}"
'[0x00000001,0x00000002]'
"""
pinf = self.PINF_STR
ninf = self.NINF_STR
# Format specifier applies to values, but not to infinities
if not spec.startswith('#'):
spec = '{:'+spec+'}'
return self.to_str(conv = lambda v: spec.format(v))
if len(spec) == 1:
raise ValueError("Invalid format specifier (missing separator)")
# Alternate form: multiple format specifier that apply to values,
# infinities and the whole result. First char after '#' is a separator
# used to split spec into sub specs. For instance '#/8.2f/^8'
subspec = spec[2:].split(spec[1])
if subspec[0]:
if subspec[0][0] == '!':
value_fmt = '{'+subspec[0]+'}'
else:
value_fmt = '{:'+subspec[0]+'}'
conv = lambda v: value_fmt.format(v)
else:
conv = repr
if len(subspec) > 1 and subspec[1]:
pinf = ('{:'+subspec[1]+'}').format(pinf)
ninf = ('{:'+subspec[1]+'}').format(ninf)
result = self.to_str(conv=conv, ninf=ninf, pinf=pinf)
if len(subspec) > 2 and subspec[2]:
return ('{:'+subspec[2]+'}').format(result)
return result
def __str__(self):
return self.to_str(conv=self.DEFAULT_CONV)
def __repr__(self):
# EQUIV TO return self.to_str(conv=repr) but without flexibility
# and all constants hardcoded. Faster.
if self.empty:
return "()"
return " | ".join(
"["+repr(i.lower)+"]" if i.lower == i.upper
else (("(" if i.left == P.OPEN else "[")
+ repr(i.lower)
+ ","
+ repr(i.upper)
+ (")" if i.right == P.OPEN else "]"))
for i in self._intervals
)
from portion.
And a few speed tests...
# Create Interval with a given number of atomic intervals
def make_interval(cls=P.Interval, n=10):
return cls(*(P.closed(k, k+1) for k in range(0, 2*n, 2)))
# Original class
i0=make_interval(n=0)
i10=make_interval(n=10)
i100=make_interval(n=100)
i1000=make_interval(n=1000)
i10000=make_interval(n=10000)
# Derived class
j0=make_interval(cls=I,n=0)
j10=make_interval(cls=I,n=10)
j100=make_interval(cls=I,n=100)
j1000=make_interval(cls=I,n=1000)
j10000=make_interval(cls=I,n=10000)
Comparing repr
implementation
>>> %timeit repr(i0)
... 192 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit repr(j0)
... 189 ns ± 3.33 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit repr(i10)
... 8.31 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit repr(j10)
... 6.07 µs ± 273 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit repr(i100)
... 77.7 µs ± 519 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit repr(j100)
... 53 µs ± 311 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit repr(i10000)
... 8.07 ms ± 496 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit repr(j10000)
... 5.4 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
repr
is a little bit faster: 30 % speedup- probably because it doesn't call
format
with'{}'
.
Comparing to_string
implementation
>>> %timeit P.to_string(i0)
... 560 ns ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit P.to_string(i10)
... 55.8 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit P.to_string(i100)
... 535 µs ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> %timeit P.to_string(i1000)
... 5.4 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit P.to_string(i10000)
... 57.8 ms ± 4.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit j0.to_str()
... 418 ns ± 3.87 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit j10.to_str()
... 10.8 µs ± 471 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit j100.to_str()
... 98.9 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit j1000.to_str()
... 1.01 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> %timeit j10000.to_str()
... 9.82 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
- orignal
to_string
implementation is way slower thenrepr
: about 7 time slower to_str
method is much faster thanto_string
function: 5 to 6 time faster- it is worth to duplicate
repr
code, even ifto_str
method could be called instead, as it faster with hardcoded strings
Best regards,
Antoine
from portion.
Thanks for sharing the snippets and for these benchmarks. That's interesting, even if I'm not surprised that calling a method is faster than calling the to_string
function that cannot benefit from Interval
internals to speed up its task.
You said:
Since meaning of format specifier depends on the object it is applied to, I really can't see how a unique specifier could be processed in order to apply some (relevant) parts of it to infinities (str), and other parts to be applied to unknown objects. Consider some align requirements and intervals with datetimes: how is it possible to guess that you must delete the align spec when applying to a value, and keep for infinities. You can only do it if you have the knowledge of what format specifier is supported by non infinities.
I agree :) My point was rather to try to extract from a specifier what can be applied to infinities, and to use it as-is for the object. This is: use the specifier as-is for the bounds that are not infinities, and extract what can be applied to infinities (e.g. padding, alignment, etc.) and apply it on infinities. Anyway, given the potential confusion that could arise because of __format__
as I was told, I'm a bit reluctant to implement it that way :-)
I've added the "information" label to this issue since I believe it will be helpful for people looking something similar. I'll definitely link to this issue in the future "Recipes" section of the documentation :-)
from portion.
I've changed our __repr__
and to_string
so to avoid the use of format
and to concatenate strings instead (even if the speed up is small, it's still worth to have it ;-)). I credit you for this in the changelog since without your benchmarks, I couldn't have spotted that one ;-)
from portion.
I've added the "information" label to this issue since I believe it will be helpful for people looking something similar. I'll definitely link to this issue in the future "Recipes" section of the documentation :-)
Hi Alexandre,
Glad you found it usefull.
Something puzzles me however. You find it usefull to get rid of unnecessary format
calls: they yield a 1.3 speed improvement, and you changed your code.
But you discard as almost irrelevant a 7 times (!) speedud just saying it was expected. Of course it was expected, but 7 times speedup...! where there is absolutely no benefit for the user to stick to the "functionnal style" (simply have def to_string(i): return i->to_str()
if you want it to be called without having a method call). Interesting case of confirmation bias.
Best regards,
Antoine
from portion.
Hi,
The speed-up for __repr__
comes for free while the changes you suggest for to_string
/to_str
requires additional code (and contracts) in Interval
, so that to_string
can implicitly use the internals of Interval
. Given the difference is at most a few ms even for very large intervals (and a few ns for the most usual cases), I haven't made this change (and I know that the difference is even lower for __repr__
, but as I said, the difference comes for free).
You said "interesting case of confirmation bias". I don't know if that's the case or not, but I always try to weigh the pros and cons of every decision I make about portion
and its API. These decisions are not necessarily the best ones, but I am the one who will continue to maintain portion
, to add new features, to fix bugs, etc. that is: the one that will have to live with the decisions and their consequences. For the current case, status-quo is the decision I have made and that I am ready to assume :)
from portion.
These decisions are not necessarily the best ones, but I am the one who will continue to maintain
portion
, to add new features, to fix bugs, etc. that is: the one that will have to live with the decisions and their consequences.
Sure, Alexandre: your code, your design, your choices, your decisions... And I would even say, and I truly mean it (trust me!), the best decisions overall are the one you make :-)
But yet... you have 25 lines of code in io.py to implement to_string
and 16 lines of code in interval.py to implement __repr__
. What they do is exactly the same thing, because __repr__
is basically a call to to_string
with default arguments. Just keep one and you'll get less code to test, less code to maintain, DRY. I'm not even speaking of coding style nor speed. This is a library maintainer point of view.
Now as a library user, the implicit contract is that a call to to_string
(with default arguments) should be undistinguishable from a call to __repr__
(well in fact, to_string
should probably be mapped to __str__
also, but it's another matter). The point is they are not (see issue #77 that has been deleted) : I wouldn't even have noticed there was two implementations if I didn't step into it and started to look into the code to understand why I got an exception. Still not speaking of coding style nor speed.
Now, it is obvious that the difference between the 2 is that one of them involves many unnecessary object instanciations (hence being far more slower) while the other do not. And that any two of them could be a call to the other one (or, better, they both should call a unique method). So, there is not need to change the API. And the speedup also comes for free.
Having said that, I don't see any benefit to keep 2 implementations, and to keep the slowest one.
There is probably one and I am missing something. But what? This is just curiosity.
And still, your code, your decision!
And thanks for it!
Best regards,
Antoine
from portion.
Related Issues (20)
- Get number of times multiple intervals overlap HOT 3
- Iteration of empty intervals is inconsistent. in general "empty := (+inf,-inf)" is problematic HOT 15
- Is this library performant enough to work with (non-atomic) intervals which span integers between 1 and 1 billion? HOT 1
- Using an external comparator? HOT 3
- iterate is broken with subclasses of Interval HOT 12
- Thoughts about text-annotation use case and Pandas Ext. API HOT 5
- interval diameter (length, width, measure, range, or size) HOT 8
- Add join / merge method HOT 2
- "compatible version" specifier in setup.py confuses poetry HOT 2
- AttributeError: module 'portion.interval' has no attribute 'empty' HOT 3
- IntervalMultiDict HOT 3
- Add example for pandas in README HOT 5
- importlib error with create_api HOT 2
- Error to import interval, inf, imath from interval HOT 1
- importlib.machinery error with create_api HOT 6
- Enclosure Calculation Bug HOT 1
- Empty Calculation Bug HOT 3
- The performance issues of interval calculations in large quantities. HOT 1
- mass/Lebesgue measure? HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from portion.