firionus / fastrunningmedian.jl
Efficient running median for Julia
License: MIT License
It would be great to add a running_median! function that supports pre-allocated outputs, because GC time can account for 50% of the runtime when processing a large number of series.
Or add support for multiple series and enable multithreading?
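The pre-allocation request above can be illustrated with a small sketch. It is written in Python rather than Julia, and it is not the package's API: the name running_median_into, the sorted-window approach, and the choice to emit only full windows (no tapering) are all assumptions made for illustration. The point is only that the caller owns the output buffer and can reuse it across many series.

```python
from bisect import bisect_left, insort


def running_median_into(out, values, window):
    """Write the median of every full window of `values` into `out`.

    Hypothetical helper: `out` is allocated once by the caller and must
    have length len(values) - window + 1 (no tapering at the edges).
    """
    n_out = len(values) - window + 1
    if window < 1 or n_out < 1:
        raise ValueError("window must be between 1 and len(values)")
    if len(out) != n_out:
        raise ValueError("out must have length len(values) - window + 1")

    buf = sorted(values[:window])  # sorted copy of the current window
    mid = window // 2
    for j in range(n_out):
        if j > 0:
            # Slide the window: drop the oldest value, insert the newest.
            del buf[bisect_left(buf, values[j - 1])]
            insort(buf, values[j + window - 1])
        if window % 2:
            out[j] = buf[mid]
        else:
            out[j] = (buf[mid - 1] + buf[mid]) / 2
    return out


# One output buffer, reused for several series of equal length.
out = [0.0] * 3
for series in ([1, 5, 2, 8, 3], [4, 4, 9, 1, 0]):
    running_median_into(out, series, 3)
```

Reusing one buffer avoids a fresh output allocation per series, which is the cost the issue attributes to garbage collection.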
tapering=:beginning_only or tapering=:start
See the R/C implementation of runmed for reference
The Stuetzle implementation for small windows can be found at https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/src/Srunmed.c
smallwin_v2
method = PivotCounting() and PivotCounting <: RunningMedianAlgorithm. This is how DifferentialEquations does it (https://diffeq.sciml.ai/stable/basics/common_solver_opts/#solver_options)
I would like to replace my default rolling median (which is rolling(VectorizedStatistics.vmedian, ..)) with a call to yours that may be post-adjusted to comport with my API.
As suggested in #10 (comment)
Check for performance regressions compared to v0.1
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
Libraries should not have a Manifest.toml (https://discourse.julialang.org/t/does-manifest-toml-belong-in-the-repository/12029)
While the logarithmic scale is great, I think we could remove the big O lines, since they don't really add much. Also, the libraries should be offset so one can judge the error bars (violin distributions). This probably also means limiting the number of window sizes somewhat more.
Since #10 allows OffsetArrays, these should be tested to work properly.
Also, we should test for AbstractSparseVector that don't start at 1, aren't regularly spaced, etc.
for library, test, docs and benchmark
Split from #21
The algorithm might benefit from accepting more data types, like
Real-world data often contains missing or corrupted values, so support for NaN in inputs would be great.
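One possible policy for the NaN request above is to ignore NaN values inside each window and to return NaN only when a window contains no valid value. The sketch below is in Python, not Julia, and the function name running_median_skipnan is hypothetical. It recomputes each window from scratch for clarity, so it shows the semantics only and not an efficient algorithm.

```python
import math
from statistics import median


def running_median_skipnan(values, window):
    """Median of each full window, ignoring NaN entries (assumed policy)."""
    out = []
    for start in range(len(values) - window + 1):
        valid = [v for v in values[start:start + window] if not math.isnan(v)]
        # A window made up entirely of NaN has no median, so emit NaN.
        out.append(median(valid) if valid else math.nan)
    return out


nan = math.nan
result = running_median_skipnan([1.0, nan, 3.0, nan, nan, nan], 3)
```

Other policies are equally defensible, for example propagating NaN to every window that contains one, so the choice would need to be an explicit option.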
See https://stackoverflow.com/a/10696252
Is there literature on that? Could it be faster? Does it allow for arbitrary percentiles like SortFilters?
I already found some spurious allocations due to an accidental Any return type
Currently grow!, shrink! and roll! return the new median. This may not be needed in all cases and results in the performance overhead of accessing the field or calculating the mean of the two top values on the heaps.
shrink! followed by grow!
Also, the index variable j for the output array could be replaced by a stateful iterator, as is already done for the input values.
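The idea of not returning the median from every update can be sketched with a plain two-heap structure in which the update and the query are separate calls. This is a Python illustration under stated assumptions, not the package's implementation: the class name MedianHeaps is hypothetical, and only growing is shown, because shrinking and rolling need removal of the oldest element, which plain binary heaps do not support directly.

```python
import heapq


class MedianHeaps:
    """Two heaps holding the lower and upper half of the inserted values."""

    def __init__(self):
        self._low = []   # max-heap of the lower half, stored as negated values
        self._high = []  # min-heap of the upper half

    def grow(self, x):
        """Insert x. Deliberately returns nothing; call median() when needed."""
        if self._low and x > -self._low[0]:
            heapq.heappush(self._high, x)
        else:
            heapq.heappush(self._low, -x)
        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        """Compute the median on demand from the two heap tops."""
        if not self._low:
            raise ValueError("no elements")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2


heaps = MedianHeaps()
for x in (5, 1, 3):
    heaps.grow(x)      # no median is computed during the updates
m3 = heaps.median()    # queried once, after several updates
heaps.grow(8)
m4 = heaps.median()
```

With this split, a sequence of updates pays for the mean of the two heap tops only when the caller actually asks for the median.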
https://github.com/JuliaFolds/Transducers.jl
Probably not the biggest priority, but would be nice to integrate with bigger ecosystem
E.g. with OffsetArray input, we could shift the index depending on tapering and whether the window length is even or odd, keeping the output properly aligned with the input. This is especially cool for plotting when using no tapering.
While this would be a cool feature, I don't know how it could be made generic across different custom indexing types. I'd therefore consider it more of a user responsibility and an optional feature.
Please comment if you'd still like it or want to contribute towards it.