Code Monkey home page Code Monkey logo

interval-diff's Introduction

interval-diff

Implementation of a vectorised interval set difference operation in numpy.

Consider two sets of intervals A and B such that

A     : (----)  (----)  (----)  (----)         (----) (------)
B     :    (---------------)      (------)  (----)      (----)

We'd like to calculate A \ B where A \ B is

A \ B : (--)               (-)  (-)              (--) ()

Installation

Clone the repo to your system and install

$ git clone https://github.com/BlakeJC94/interval-diff
$ cd interval-diff
$ pip install .

Quickstart

To use this algorithm, simply import and use it:

>>> import numpy as np
>>> from interval_diff.vectorised import interval_difference
>>> intervals_a = np.array(
...     [
...         (100, 200),
...         (300, 400),
...         (500, 600),
...         (700, 800),
...         (1000, 1100),
...         (1250, 1400),
...     ]
... )
>>> intervals_b = np.array(
...     [
...         (150, 580),
...         (720, 890),
...         (930, 1070),
...         (1300, 1400),
...     ]
... )
>>> result = interval_difference(intervals_a, intervals_b)
>>> print(result)
# [[ 100.  150.]
#  [ 580.  600.]
#  [ 700.  720.]
#  [1070. 1100.]
#  [1250. 1300.]]

This algorithm is also made to work for pandas Dataframes where the interval starts/ends are located in the columns 'start'/'end'. Any additional metadata in other columns is preserved.

>>> import pandas as pd
>>> intervals_a = pd.DataFrame(intervals_a, columns=['start', 'end'])
>>> intervals_a['tag'] = ["Q", "W", "E", "R", "Q", "R"]
>>> intervals_b = pd.DataFrame(intervals_b, columns=['start', 'end'])
>>> result = interval_difference(intervals_a, intervals_b)
>>> print(result)
#     start     end tag
# 0   100.0   150.0   Q
# 1   580.0   600.0   E
# 2   700.0   720.0   R
# 3  1070.0  1100.0   Q
# 4  1250.0  1300.0   R

To visualise the intervals, the included plotly function can be used

>>> from interval_diff.vis import plot_intervals
>>> fig = plot_intervals(
...     [intervals_a, intervals_b, result],
...     names=["A", "B", "A \ B"],
...     colors=["blue", "red", "blue"],
... )
>>> fig.show()

Intervals figure

To run a quick benchmark, use the CLI interface

$ interval-diff
100%|█████████████████████████████████████████████████████| 21/21 [00:23]
-------------------------------------------------------------------------
 [np] Intervals (3 samples)  | Non-vec mean (s)    | Vec mean (s)
-------------------------------------------------------------------------
 20                          | 0.000102            | 0.000159
 100                         | 0.000916            | 0.000177
 500                         | 0.017802            | 0.000369
 1000                        | 0.063827            | 0.000594
 2000                        | 0.239962            | 0.001068
 5000                        | 1.479928            | 0.002769
 10000                       | 5.907383            | 0.005583

Results will be written to results_np.csv

Performance on pandas dataframes can be benchmarks as well with the --dataframe (or -d) flag:

$ interval-diff -d
100%|█████████████████████████████████████████████████████| 21/21 [00:23]
-------------------------------------------------------------------------
 [pd] Intervals (3 samples)  | Non-vec mean (s)    | Vec mean (s)
-------------------------------------------------------------------------
 20                          | 0.001687            | 0.001906
 100                         | 0.002548            | 0.001858
 500                         | 0.018257            | 0.002021
 1000                        | 0.064690            | 0.002327
 2000                        | 0.254618            | 0.002722
 5000                        | 1.514108            | 0.004365
 10000                       | 5.988335            | 0.007742

Results will be written to results_pd.csv

Parameters can be configured with the --n-intervals (-n) and --n-samples (-k) flags like so:

$ interval-diff --n-intervals 10000 15000 20000 --n-samples 5
100%|█████████████████████████████████████████████████████| 15/15 [03:43]
-------------------------------------------------------------------------
 [np] intervals (5 samples)  | Non-vec mean (s)    | Vec mean (s)
-------------------------------------------------------------------------
 10000                       | 6.215123            | 0.005356
 15000                       | 13.867577           | 0.007547
 20000                       | 24.554746           | 0.009645

Contributing

Pull requests are most welcome!

  • Code is styled using black
    • Included in dev requirements
  • Code is linted with pylint (pip install pylint)
  • Requirements are managed using pip-tools (pip install pip-tools)
    • Add dependencies by adding packages to setup.py and running ./scripts/compile-requirements.sh
    • Add dev dependencies to setup.py under extras_require and run ./scripts/compile-requirements-dev.sh
  • Semantic versioning is used in this repo
    • Major version: rare, substantial changes that break backward compatibility
    • Minor version: most changes - new features, models or improvements
    • Patch version: small bug fixes and documentation-only changes

Virtual environment handling by pyenv is preferred. Run ./scripts/create-pyenv.sh for a quickstart

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.