Code Monkey home page Code Monkey logo

fastrunningmedian.jl's People

Contributors

artemsolod avatar firionus avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

artemsolod

fastrunningmedian.jl's Issues

Add support for output pre-allocation

It would be great to add a running_median!function to support pre-allocation of the outputs, because the gc time could account for 50% for a large amount of series.
Or, add support for multiple series and enable multi-threads?

Reduce size of final library

  • Do we ship the benchmark Jupyter Notebook? That could be in its own branch.
  • What about the test fixtures? Can we not ship them? Or reduce their size somehow?

Add fast implementation for small window sizes

See the R/C implementation of runmed for reference

The Stuetzle implementation for small windows can be found at https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/src/Srunmed.c

ToDo

  • Initial implementation with Stuetzle-like searching for updated median or lower/upper (see branch smallwin_v2)
  • Change partialsort-calls to pivot counting
  • Refactor Pivot Counting more clearly
  • Create higher level API that iterates over array
  • Fuzz tests like for double heap (including things like rand(1:2) or rand(1:3) with different window sizes like 1, 2, 3, 4, 5, ... to test for certain edge cases)
  • Benchmark against SortFilters, our own ArrayFilter implementation (?), SkipList implementation (?), DoubleHeap implementation, R implementation (see #20)
  • Formulize pivot counting approach mathematically and explain algorithm better
    • Example: To be a valid median, certain counting conditions (formulate nicely) need to hold. If they don't, the next smaller/bigger element is the new median.
  • Bind into existing high level API (without breaking it)
  • Whole lot of QA stuff, Refactoring

nice work, may I use it

I would like to replace my default rolling median (which is rolling(VectorizedStatistics.vmedian, ..)) with a call to yours that may be post-adjusted to comport with my api.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

New Benchmark Comparison with Logarithmic Violin Plots

While the logarithmic scale is great, I think we could remove the big O lines, they don't really add much. Also, the libraries should be offset so one can judge the error bars (violin distributions). This probably also means limiting the amount of window sizes somewhat more.

Add Test Cases for OffsetArrays

Since #10 allows OffsetArrays, these should be tested to work properly.

Also, we should test for AbstractSparseVector that don't start at 1, aren't regularly spaced, etc.

Accept More Element Data Types

Split from #21

The algorithm might benefit from accepting more data types, like

  • Union{Float64, Missing} (handle like NaN)
  • DateTime
  • Unitful (does it work already?)
  • Other ideas?

Support for NaN input

There are many missing/corrupted data in real world cases, adding support for NaN in inputs would be great.

Don't return updated median from stateful API

Currently grow!, shrink! and roll! return the new median. This may not be needed in all cases and results in performance overhead of acessing the field or calculating the mean of the two top values on the heaps.

Allow roll! with non-full window

  • Remove error and replace with workaround of shrink! followed by grow!
  • Test
  • Benchmark
  • Do efficient implementation of it. Will require switching from CircularBuffer to something else that can be circular at less than full capacity without performance hit. Maybe CircularDeque would work?

Adjust Custom Array Indices to Match Data Position

E.g. with OffsetArray input, we could shift the index depending on tapering and whether the window length is even or odd, keeping output properly aligned with input. Especially cool for plotting when using no tapering.

While this would be a cool feature, I don’t know how such a feature would be generic for different custom indexing types. And therefore, I‘d consider this more of a user responsibility, and more of an optional feature.

Please comment if you‘d still like it or want to contribute towards it.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.