The fastrunningmedian.jl from firionus

Add support for output pre-allocation

It would be great to add a running_median! function that supports pre-allocated output, because GC time can account for 50% of the runtime when processing a large number of series.
Alternatively, add support for multiple series and enable multithreading?
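
A minimal sketch of what an in-place variant could look like, to make the request concrete. The name naive_running_median! and its signature are assumptions for illustration, not the package's API, and the median is computed naively per window rather than with the package's algorithm. The point is only that the caller owns the output buffer and can reuse it across series.

```julia
using Statistics

# Hypothetical in-place variant (not the package's API): writes the median of
# every full window of length `w` into the caller-provided `out`.
function naive_running_median!(out::AbstractVector, x::AbstractVector, w::Integer)
    n = length(x)
    1 <= w <= n || throw(ArgumentError("window size must be in 1:length(x)"))
    length(out) == n - w + 1 ||
        throw(DimensionMismatch("out must have length $(n - w + 1)"))
    buf = Vector{eltype(x)}(undef, w)      # scratch buffer, reused for every window
    i0 = firstindex(x)
    for (k, j) in enumerate(eachindex(out))
        copyto!(buf, 1, x, i0 + k - 1, w)  # copy the k-th window into the buffer
        out[j] = median!(buf)              # median! may reorder buf, which is fine here
    end
    return out
end

x = [1.0, 5.0, 2.0, 8.0, 3.0]
out = Vector{Float64}(undef, 3)            # allocated once, reusable for further series
naive_running_median!(out, x, 3)
```

With this shape, a loop over many series of equal length only needs one out vector.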

Move Tapering Examples to Open Format and Make Public

Reduce size of final library

Do we ship the benchmark Jupyter Notebook? That could be in its own branch.
What about the test fixtures? Can we not ship them? Or reduce their size somehow?

Add tapering at beginning only for compatibility with RollingWindows.jl

tapering=:beginning_only or tapering=:start
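
One possible reading of this option, sketched naively: the window grows by one element per step until it reaches the full size, then rolls, and nothing is tapered at the end, so the output has the same length as the input. The function name median_taper_start is hypothetical, and whether this matches the RollingWindows.jl convention exactly is an assumption.

```julia
using Statistics

# Hypothetical helper, not the package's API: the window grows at the start
# until it holds `w` values, then rolls; there is no tapering at the end.
function median_taper_start(x::AbstractVector, w::Integer)
    w >= 1 || throw(ArgumentError("window size must be positive"))
    first_i = firstindex(x)
    return [median(view(x, max(first_i, i - w + 1):i)) for i in eachindex(x)]
end

tapered = median_taper_start([4, 1, 3, 2, 5], 3)
# windows: [4], [4, 1], [4, 1, 3], [1, 3, 2], [3, 2, 5]
```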

Implement
Test
Docstrings
Update Spreadsheet

Add fast implementation for small window sizes

See the R/C implementation of runmed for reference

The Stuetzle implementation for small windows can be found at https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/src/Srunmed.c

ToDo

Initial implementation with Stuetzle-like searching for updated median or lower/upper (see branch smallwin_v2)
Change partialsort-calls to pivot counting
Refactor Pivot Counting more clearly
Create higher level API that iterates over array
Fuzz tests like for double heap (including things like rand(1:2) or rand(1:3) with different window sizes like 1, 2, 3, 4, 5, ... to test for certain edge cases)
Benchmark against SortFilters, our own ArrayFilter implementation (?), SkipList implementation (?), DoubleHeap implementation, R implementation (see #20)
Formalize the pivot counting approach mathematically and explain the algorithm better
- Example: To be a valid median, certain counting conditions (formulate nicely) need to hold. If they don't, the next smaller/bigger element is the new median.
Bind into existing high level API (without breaking it)
- Use something like method = PivotCounting() and PivotCounting <: RunningMedianAlgorithm. This is how DifferentialEquations does it (https://diffeq.sciml.ai/stable/basics/common_solver_opts/#solver_options)
Whole lot of QA stuff, Refactoring
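
A rough sketch tying three of the items above together: the counting condition, the method = PivotCounting() dispatch pattern, and a small fuzz check. Everything here is an assumption for illustration: the DoubleHeap type name, the function running_median_sketch, and the naive search that stands in for real pivot counting. It handles odd window sizes only and assumes 1-based indexing.

```julia
using Statistics

abstract type RunningMedianAlgorithm end
struct DoubleHeap <: RunningMedianAlgorithm end     # placeholder type name
struct PivotCounting <: RunningMedianAlgorithm end

# Counting condition for an odd-length window: `m` is the median if at most
# half of the other elements are smaller and at most half are larger.
function is_valid_median(window, m)
    half = (length(window) - 1) ÷ 2
    return count(<(m), window) <= half && count(>(m), window) <= half
end

# Reference path: this sketch simply calls Statistics.median.
window_median(::DoubleHeap, window) = median(window)

# Naive stand-in for pivot counting: try each element until the condition holds.
function window_median(::PivotCounting, window)
    isodd(length(window)) || throw(ArgumentError("sketch handles odd windows only"))
    for m in window
        is_valid_median(window, m) && return float(m)
    end
    error("no valid median found (input is not totally ordered, e.g. contains NaN)")
end

# The algorithm is selected by passing an instance, as in method = PivotCounting().
function running_median_sketch(x, w; method::RunningMedianAlgorithm = DoubleHeap())
    return [window_median(method, view(x, i:i+w-1)) for i in 1:length(x)-w+1]
end

# Small fuzz check with few distinct values to provoke ties.
for w in (1, 3, 5), _ in 1:100
    x = rand(1:3, 20)
    @assert running_median_sketch(x, w; method = PivotCounting()) ==
            running_median_sketch(x, w; method = DoubleHeap())
end
```

Because the keyword only selects a method via dispatch, existing calls that omit it keep their current behavior.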

Nice work, may I use it?

I would like to replace my default rolling median (which is rolling(VectorizedStatistics.vmedian, ..)) with a call to yours that may be post-adjusted to comport with my API.

Re-run Benchmarks

As suggested in #10 (comment)

Check for performance regressions compared to v0.1

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

Move how_to_build_docs to Markdown and Improve Formatting

Remove Manifest.toml

Libraries should not have a Manifest.toml (https://discourse.julialang.org/t/does-manifest-toml-belong-in-the-repository/12029)

New Benchmark Comparison with Logarithmic Violin Plots

While the logarithmic scale is great, I think we could remove the big O lines, since they don't really add much. Also, the libraries should be offset so one can judge the error bars (violin distributions). This probably also means limiting the number of window sizes somewhat more.

Add Aqua.jl

https://github.com/JuliaTesting/Aqua.jl

Add Test Cases for OffsetArrays

Since #10 allows OffsetArrays, these should be tested to work properly.

Also, we should test AbstractSparseVector inputs that don't start at 1, aren't regularly spaced, etc.

Update Dependencies in Main and Test

Separate reset! and grow!

Update Aqua.jl

Update Dependencies

for library, test, docs and benchmark

Accept More Element Data Types

Split from #21

The algorithm might benefit from accepting more data types, like

Union{Float64, Missing} (handle like NaN)
DateTime
Unitful (does it work already?)
Other ideas?
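
For the first item, a small sketch of how missing values could be mapped onto NaN before the data reaches a Float64-only algorithm. This uses only Base functions; whether the package should do this conversion internally is left open.

```julia
# Replace every `missing` with NaN; the result is a plain Vector{Float64}.
x = Union{Float64, Missing}[1.0, missing, 3.0]
y = coalesce.(x, NaN)
```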

Support for NaN input

Real-world data often contains missing or corrupted values, so support for NaN in the input would be great.
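
One possible semantics, sketched naively: ignore NaN values inside each window and return NaN only when a window contains nothing else. The names nanmedian and running_nanmedian are hypothetical, the code assumes 1-based indexing, and other policies (for example propagating NaN) would be just as defensible.

```julia
using Statistics

# Hypothetical helper: median of the non-NaN values, NaN if none remain.
function nanmedian(window)
    valid = filter(!isnan, window)
    return isempty(valid) ? NaN : median(valid)
end

# Naive running median over full windows that skips NaN inside each window.
function running_nanmedian(x::AbstractVector{<:AbstractFloat}, w::Integer)
    return [nanmedian(view(x, i:i+w-1)) for i in 1:length(x)-w+1]
end

result = running_nanmedian([1.0, NaN, 3.0, 2.0, NaN], 3)
# windows: [1, NaN, 3], [NaN, 3, 2], [3, 2, NaN]
```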

Explore Skip List Algorithm

See https://stackoverflow.com/a/10696252

Is there literature on that? Could it be faster? Does it allow for arbitrary percentiles like SortFilters?

Use JET.jl to Look for Type Warnings

I already found some spurious allocations due to an accidental Any return type.

Don't return updated median from stateful API

Currently grow!, shrink! and roll! return the new median. This may not be needed in all cases and incurs the performance overhead of accessing the field or calculating the mean of the two top values on the heaps.
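
A toy illustration of the proposed split between updating and querying. ToyWindow and current_median are hypothetical names, and the window is a plain Vector rather than the package's heaps; only the function names grow!, shrink! and roll! come from the text above.

```julia
using Statistics

# Toy stand-in for the stateful type; it keeps the window in a plain Vector.
struct ToyWindow
    values::Vector{Float64}
    capacity::Int
end
ToyWindow(capacity::Integer) = ToyWindow(Float64[], capacity)

# The mutating functions return nothing, so no median is computed on update.
function grow!(w::ToyWindow, x)
    length(w.values) < w.capacity || error("window is full")
    push!(w.values, x)
    return nothing
end

function shrink!(w::ToyWindow)
    isempty(w.values) && error("window is empty")
    popfirst!(w.values)
    return nothing
end

function roll!(w::ToyWindow, x)
    length(w.values) == w.capacity || error("window is not full")
    popfirst!(w.values)
    push!(w.values, x)
    return nothing
end

# The median is only computed when the caller asks for it.
current_median(w::ToyWindow) = median(w.values)

w = ToyWindow(3)
grow!(w, 1.0); grow!(w, 5.0); grow!(w, 2.0)
roll!(w, 8.0)                 # window is now [5.0, 2.0, 8.0]
m = current_median(w)
```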

Make Tests Deterministic

Move random elements to fixture files
Maybe check for some performance improvements
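
A minimal sketch of the fixture idea: random input is generated once, written to a file, and read back on every later run, so the tests see identical data each time. The helper name load_or_create_fixture and the one-value-per-line file format are assumptions for illustration.

```julia
# Hypothetical fixture helper: create the file on first use, then always read it.
function load_or_create_fixture(path::AbstractString, n::Integer)
    if !isfile(path)
        open(path, "w") do io
            foreach(v -> println(io, v), rand(1:3, n))
        end
    end
    return parse.(Int, readlines(path))
end

# A temporary directory stands in for the repository's fixture folder.
path = joinpath(mktempdir(), "fuzz_input.txt")
first_read = load_or_create_fixture(path, 50)
second_read = load_or_create_fixture(path, 50)   # reads the same data back
```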

Allow roll! with non-full window

Remove error and replace with workaround of shrink! followed by grow!
Test
Benchmark
Implement it efficiently. This will require switching from CircularBuffer to something else that can be circular at less than full capacity without a performance hit. Maybe CircularDeque would work?
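
The workaround from the first step can be sketched on a toy window, here a plain Vector used as a FIFO instead of the package's data structure. Rolling then works at any fill level because it is just a shrink followed by a grow.

```julia
# Toy operations on a Vector acting as the window (not the package's types).
grow!(win::Vector, x) = push!(win, x)      # append the newest value
shrink!(win::Vector) = popfirst!(win)      # drop the oldest value

# roll! no longer requires a full window: the length simply stays the same.
function roll!(win::Vector, x)
    isempty(win) && throw(ArgumentError("cannot roll an empty window"))
    shrink!(win)
    grow!(win, x)
    return win
end

win = [1.0, 2.0]        # imagine a capacity of 5, so the window is not full
roll!(win, 3.0)
```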

Please comment if you'd still like it or want to contribute towards it.
