Code Monkey home page Code Monkey logo

Comments (14)

KronosTheLate avatar KronosTheLate commented on June 23, 2024 1

I am not a fan of my earlier suggestions, and I really see the value of having ultimate controll over what is calculated and filtered by, and in what order. I believe we can achieve everything we both want by the following proposal:

The proposal

  • The starting point is a function findpeaks, a name which is more consistent with the package name Peaks.jl than the current findmaxima.
  • The return of findpeaks is a named tuple (let's say we bind it to pks for later reference), with fields inds and data.
  • There will be a set of functions that calculate and filter by features, such as peakproms!, peakheights!, peakwidths!, peakseps!.
  • Calling e.g. peakproms! on pks the then adds a field proms by Base.merge, and if min and/or max keywords are set, the contents of the named tuple will be filtered accordingly.
  • The min and max keywords will be consistent for the feature calculating/filtering functions.

This all means that the named tuple will in essence just bundle up the contents to facilitate passing the right things into the functions that filer by and calculate features. This should add negligible overhead and require no new types, while simplifying the API significantly and keep ultimate control over order of filtering. The filtering/calculating functions can also still have strict set true or false at each step for ultimate controll, and the fields and functions are consistent: peakproms adds the field proms, peakwidths adds widths (along with leftedge and rightedge), etc. The filtering/calculating functions all follow the same prefix in their name, and almost the same methods: Taking the named tuple pks as the first argument, and filtering by min and max, with only minor differences in potential additional keywords.

To take it one step further, we could make it so that if a named tuple is not passed to the filtering/calculating functions as the first argument, a function is returned that performs the action of adding the feature and filtering. This would allow easy chaining of operations:

pks = findpeaks(signal) |> peakproms!(min=3) |> peakwidths!(max=2)

That is almost as elegant as a single function with keyword arguments, but without giving up any control, performance, and without redundant calculations - all the strengths of the current API, as I see it.

A key benefit of this approach is also that you do not need to consult the docs every time to check the order of the e.g. 4 returned vectors of peakwidths, instead relying on named fields of a named tuple. There is no overhead as far as I know, and named tuples can still be indexed and destructured, almost making it a superset of the old API.

How do you feel about this approach?

Note - A function sortpeaks! would be nice to have. A rough draft is something like

"""
    sortpeaks!(pks, property)
    sortpeaks!(pks, property; inds, rev)

Sort the peaks `pks` by the property `property`. 
To discard some of the elements of the sorted peaks, supply the keyword argument `inds` to 
specify which indices in the sorted peaks you want to keep. For more detail, see `partialsortperm`, which is used internally.

To sort in reverse (descending order), set the keyword argument `rev` to `true`.
"""
function sortpeaks!(pks::NamedTuple, property::Symbol; inds=eachindex(first(pks)), rev::Bool=false)
    # check if pks has field `property`
    mask = partialsortperm(pks.property, 1:n; rev)
    for i in eachindex(pks)
        pks[i] = pks[i][mask][end-n+1 : end]
    end
    return pks
end

This function would allow to only keep the 5 widest peaks, or the 3 highest peaks, which is non-trivial without it due to having multiple vectors in pks that should be synchronized.

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024 1

I have been unable to squeeze in the time to make a proposal for the API I am imagining. In a few days, I am leaving my computer behind for the summer (until about 2 weeks into september). It is crazy how a few hours of voluntary coding-work translates to weeks and months of calendar time...

I expect to give this a go, then, if you do not beat me to it. But I just wanted to tell you, so that you know that I will not be working on this for a long time.

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024 1

Yes, the vectors can be mutated. I was thinking more of adding fields to the named tuple when I talked of mutating.

I think you may be right - a certain inout type will have a known output type. The output type is different depending on if the field existed previously or not, but as that is reflected in the input type, it is likely not an issue. NamedTuples it is.

I will still be away from my computer for a month, but I do like having the exact plan ready ^_^

from peaks.jl.

halleysfifthinc avatar halleysfifthinc commented on June 23, 2024 1

No rush! I appreciate your interest and contributions.

from peaks.jl.

halleysfifthinc avatar halleysfifthinc commented on June 23, 2024

I acknowledge there are some minor inconveniences due to the API design, but I don't think the alternatives I'm aware of (e.g. a struct) would be better overall.

  • Single function with filtering by kwargs VS separate filtering functions
    • tl;dr: Separate filtering functions allow controlling the order (wrt. filtering by prominence, width, height) and behavior (strict or not) of operations, which isn't possible if the filtering is done in the main function (ie in a fixed order).
    • In my experience, there are often peaks I wish to filter/remove that have some similar characteristics (prominence, width, or height) to other peaks I wish to keep. With separate filtering functions, I can choose a filtering order/behavior that removes unwanted peaks using dissimilar characteristics before filtering by the similar characteristics (e.g. and can choose a more relaxed threshold for the similar characteristics, now that there are no bad apples that would slip through). In contrast, a combined filtering function would potentially apply the filter of similar characteristics first, and would apply the same strictness setting to all filters.
    • I'm open to a PR that adds a function with all the filtering functionality built-in, but it would need to use an existing filter order (ie following MATLAB and/or scipy, hopefully they agree).
  • Returning peaks as a struct
    • Unnecessarily complex imo, and would require a lot of care to avoid introducing a lot of extra allocations (some would be unavoidable). Working with indices directly is always going to be the simplest and most broadly composable.

redefining pks with the return values from peakproms and peakwidths to be rather weird and un-ergonomic

Providing mutating ! functions and copying, non-mutating functions, as I do with peakproms, peakwidths, etc, is a common Julian style. Having both functions offers maximum flexibility/composability.

It would also be nice to be able to provide the x-values, and get the corresponding x-values for the peaks and widths in terms of x-values, rather than having to work with indices always.

I don't understand what you mean here.

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024

I don't understand what you mean here.

On this point, I was hoping to be able to provide a vector of x-values, wher peak of index i would have an x-coordinate of xvals[i], and automatically get peaks width and location in terms of x-coordinated. I have however changed my mind, as I take your point that this is less efficient, and complicated the API significantly. Perhaps a design filosophy note could be added, stating that
"This package only operates with the vector containing the signal, and it's indices. Not the corresponding values of a vector containing the independent variable. This simplifies the API, and increases efficiency. If the corresponding independent variables of peaks and their widths is needed, this is best done manually outside the functions provided in this package."

Or perhaps a much shorter of a similar message. I realize that this message is heavily taylored to what I needed to hear, and that it may not be that relevant for other users.

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024

tl;dr: Separate filtering functions allow controlling the order (wrt. filtering by prominence, width, height) and behavior (strict or not) of operations, which isn't possible if the filtering is done in the main function (ie in a fixed order).

This is an excellent point, and one that I have not at all thought of. My current best idea of an API is a single function findpeaks that would have somewhat complicated internals, but that would allow all that we want/need in terms of functionality and API:
It would take a keyword argument filters, a named tuple that specifies filtering options (minprom, maxprom, window (w), minwidth, maxwidth, minheight, maxheight). The order of the filters would determine the order of the filtering to allow full control. I then imagine relheight and strict to be seperate keyword arguments.

The return value could be a named tuple, containing (peakkinds, widths, proms, heights, widths, leftedges, rightedges). If the user does not want to spent computation resources calculating any of the quantities, specify a related filer as NaN, and the return will be be NaN for e.g. the widths, leftedges and rightedges. This would maintain the return type Vector{<:Number} in any case, maintaining type stability.

This requires only knowledge of one function, and much less code to perform filtering. This is a very rough idea, and I am very curious if you feel like some solution in this direction is at all possible, and if you see the value add I do from not having separate functions, and from not having to pass around the relevant vectors outside the functions.

from peaks.jl.

halleysfifthinc avatar halleysfifthinc commented on June 23, 2024

That sounds very functional, and would also solve my only minor gripe about the current API, which is that the order of the output arguments and input arguments prevents direct splatting peakproms into peakwidths, e.g.:

pks, proms = peakproms(argmaxima(x), y))
peakwidths(pks, y, proms)

making it a superset of the old API

👌 Nice.

I don't have time to get to this in the near future, but if you put together a PR, I will certainly review it.

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024

Nice. I will see if I find the time to put together a proposal soon. I will initially make the current functionality internal by prefixing _'s and not exporting, and building the new API on top of that as a first approximation.

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024

I have realized that adding fields to a named tuple is not possible, so mutating versions of the functions could not be made. Instead, the user will have to rebind the original variable name to the output each time a feature is added.

Also, adding fields changes the type. It is possble this makes type stability impossible, but I am not sure.

Dictionaries could solve both problems, at the cost of adding a dependance and giving up the amazing field-access syntax pks.proms, replacing it with the rather tedious `pks["proms"]. I am wondering if using a Dictionary (the benefit over Dict is that order is preserved) while fields are being added, and finally advising users to create a NamedTuple from it.

My current though is that it is better to live with unmutability and potential type instability, for the field access syntax and more simplicity.

from peaks.jl.

halleysfifthinc avatar halleysfifthinc commented on June 23, 2024

Mutating a tuple isn't possible, but same as with structs, mutable fields in a immutable struct/object can be mutated; I think that is what would be needed here? I don't believe that using a vector in a tuple copies the vector, so it should be possible to accept a NamedTuple (containing index, values, etc vectors) as an input argument, mutate those vectors, and then return a new NamedTuple (eg containing old and new data, such as peak proms) without making any copies of the original/mutated vectors?

I'm not sure type stability is a concern, since that only applies at the function return type level, and if switching to a NamedTuple design, the returned value/type would be consistent for a given function/input eltype.

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024

I have a draft up and running at https://github.com/KronosTheLate/Peaks.jl/tree/api_rework !

It is still missing a lot of detail, which I tried to collect into a "ToDo" section in the README. Most notably, I have just hidden the old functions by prefixing a "_", and called them internally in the new functions. I do however feel like the user-facing functions work as I imagine, and feel ready to discuss the proposed API with you (item 1 in the ToDo). If/when we get the API right, we move on to the following bullet points, and after that I believe the API will be reworked into something better!

Some example code to get started:

julia> using Peaks   # Using the api_rework branch

julia> x = Float64[1, 2, 3, 5, 5, 0, 2, 0, 8, 9, 8, 9, 0];  # If `x` is a vector of eltype int, I get an error. Should be looked into

julia> function test1(x)
           pks = findpeaks(x)
           pks = peakproms!(pks, min=0.5)
           pks = peakwidths!(pks)
           pks = peakheights!(pks)
           return pks
       end
test1 (generic function with 1 method)

julia> function test2(x)
           pks = findpeaks(x) |> peakproms!(min=0.5) |> peakwidths!() |> peakheights!()
           return pks
       end
test2 (generic function with 1 method)

julia> test1(x)
(data = [1.0, 2.0, 3.0, 5.0, 5.0, 0.0, 2.0, 0.0, 8.0, 9.0, 8.0, 9.0, 0.0], inds = [4, 7, 10, 12], proms = [4.0, 2.0, 1.0, 1.0], widths = [2.4000000000000004, 1.0, 1.0, 0.5555555555555554], edges = [(3.0, 5.4), (6.5, 7.5), (9.5, 10.5), (11.5, 12.055555555555555)], heights = [5.0, 2.0, 9.0, 9.0])

julia> values(test1(x)) .== values(test2(x))
(true, true, true, true, true, true)

from peaks.jl.

halleysfifthinc avatar halleysfifthinc commented on June 23, 2024

That example looks promising! Please open a draft PR so that I can review/add comments & suggestions.

Ideally, I'd prefer to have both interfaces (the old and the new) available if possible (for backwards compatibility, etc).

from peaks.jl.

KronosTheLate avatar KronosTheLate commented on June 23, 2024

Please open a draft PR

I am a little busy these days with school, and https://juliapackagecomparisons.github.io/. I will get around to it at some point ^_^

from peaks.jl.

Related Issues (19)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.