Comments (9)
Hello,
Thank your for raising this issue!
The library has not been specifically designed to handle large amounts of intervals, but there is nothing that should prevent it from doing so ;-) So I'll try to investigate where is/are the bottleneck(s) :)
Could you please indicate what's the version of python-intervals
you're using? Also, would it be possible to share the code used to create the "performance report" you posted, so I can run it locally and use it to detect the bottlenecks and to see what are the changes that have a positive impact on the execution time?
Thank you!
from portion.
Or, if you cannot share this code (regardless of why ;-)), give some examples of intervals you try to invert. It seems, according to your message, that each interval is composed of hundreds of atomic ones. Is it right?
In that case, it is very likely that the forecoming 2.0.0 release will already exhibit some speed-up when computing the complement of such intervals. In 1.x.x, complement of an union (i.e. an Interval
composed of many AtomicInterval
instances) is computed by returning the intersection of the complements of each atomic interval which is very inefficient as the number of underlying atomic intervals increase. In the forecoming release, I compute the complement by iterating on pairs of consecutive atomic intervals, computing the "gap" between these pairs, and unioning them. This highly reduce the number of intermediate atomic intervals to compute (n+1
for n
atomics, where it was 2n
before), the number of method calls (no need to call an operation on each atomic interval), and the costly intersection at the end of the process (which was probably the most costly part of the operation).
from portion.
I've just tested the following snippet with 1.10.0
and the forecoming 2.0.0
:
import intervals as I
i = I.Interval(*(I.closed(i, i+1) for i in range(0, 10000, 2)))
assert ~~i == i
The last line was timeit
-ed both in 1.10.0 and 2.0.0. On my laptop, I got 93.2 ms ± 984 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) for 2.0.0. After 10 minutes, this line was still running with 1.10.0 :-)
So it seems the change I made for __invert__
in fc32c52 is an efficient one ;-)
from portion.
The version that we have is 1.10.0
, but it seems that you already solved the problem with the latest commit, so we will use that version instead. As a side note, I'm impressed by the fast response and investigation. That's true public service spirit! Thank you for your work on that lib.
from portion.
You're welcome ;-)
Note that 2.0.0 is not yet released on PyPI. I plan to release it in the next few days, but I still have some things to do before (e.g. I would like to add stub files for type annotations, I would like to write a migration guide from 1.x.x to 2.0.0 and now I would like to check for performance/bottleneck issues ^^).
TL;DR: do not consider the master branch of 2.0.0 to be stable at the moment ;-)
Also, I'm currently considering changing the name of python-intervals
(1) to avoid having "python" in it (not useful), and (2) mainly to avoid any confusion with the intervals
library that is also distributed on PyPI.
from portion.
I confirm you shouldn't rely on 2.0.0 for now, I've just found a bug in __contains__
. I'll try to write a test tomorrow and to fix this (and also to speed up intersection and containment if I've time ^^). For instance, I think (I.closed(0,1) | I.closed(2, 3)).contains(I.closed(0,0.5) | I.closed(2.5, 3))
returns False
where it should return True
. That's a regression w.r.t. 1.10.0.
from portion.
Ok thanks for the info. We'll work around the interval inversion for now until v2
from portion.
I've just released 2.0.0. Notice that python-intervals
is now distributed as portion
.
The list of changes is available on https://github.com/AlexandreDecan/portion/blob/master/CHANGELOG.md, and there are some backwards incompatible ones (the ones that are likely to break are i.is_empty()
and i.is_atomic()
). I would be happy to help if you have issues with this new release.
Performance-wise, I created a small informal report here : #21
from portion.
Ok, we will do the update and adapt the code. It's still fresh so that should be no problem.
from portion.
Related Issues (20)
- Get number of times multiple intervals overlap HOT 3
- Iteration of empty intervals is inconsistent. in general "empty := (+inf,-inf)" is problematic HOT 15
- Is this library performant enough to work with (non-atomic) intervals which span integers between 1 and 1 billion? HOT 1
- Using an external comparator? HOT 3
- iterate is broken with subclasses of Interval HOT 12
- Add __format__ method to Interval (improvement) HOT 16
- Thoughts about text-annotation use case and Pandas Ext. API HOT 5
- interval diameter (length, width, measure, range, or size) HOT 8
- Add join / merge method HOT 2
- "compatible version" specifier in setup.py confuses poetry HOT 2
- AttributeError: module 'portion.interval' has no attribute 'empty' HOT 3
- IntervalMultiDict HOT 3
- Add example for pandas in README HOT 5
- importlib error with create_api HOT 2
- Error to import interval, inf, imath from interval HOT 1
- importlib.machinery error with create_api HOT 6
- Enclosure Calculation Bug HOT 1
- Empty Calculation Bug HOT 3
- The performance issues of interval calculations in large quantities. HOT 1
- mass/Lebesgue measure? HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from portion.