Comments (11)

tbreloff commented on May 18, 2024

👍 Cool... this is certainly the type of thing I was thinking of.

It would also be great to add some "plotting recipes" that work with various stats. For example, it would be really cool to be able to do a PCA/SVD on a scatter plot and add overlays like the image from Wikipedia:

image

from onlinestats.jl.

tbreloff commented on May 18, 2024

I was just peeking through your plotmethods code... what do you think about adding a keyword arg ontrace::Function to the tracefit! method, and changing it to call that function after each update instead of adding to and returning an array of values:

function tracefit!(o::OnlineStat, b::Integer, data...; batch::Bool = false, ontrace::Function = o -> nothing)
    b = @compat Int(b)
    n = nrows(data[1])
    i = 1
    s = state(o)
    #result = [copy(o)]
    while i <= n
        rng = i:min(i + b - 1, n)
        batch_data = map(x -> rows(x, rng), data)
        batch ? updatebatch!(o, batch_data...) : update!(o, batch_data...)
        #push!(result, copy(o))
        ontrace(o)
        i += b
    end
    #result
    return
end

# then implement the current functionality like (untested code):
o = Mean()
result = Mean[]
tracefit!(o, 1, rand(10); ontrace = o->push!(result, copy(o)))

The advantage here is that you can use the same tracefit! method to add to a plot or do something else other than return a bunch of copies of an object.

Although, I think this interface could be cleaned up further, so maybe add a new method with this functionality for now?

joshday commented on May 18, 2024

Since my end goal was making plots, I like this idea.

b should be changed to a keyword argument. As a positional argument it just interrupts the call. In general tracefit! needs a rewrite.

joshday commented on May 18, 2024

What about something like this? It updates an OnlineStat (with all of data by default, but you can pass a different batch size), and calls a function after each update.

I think it could replace onlinefit!, tracefit!, and traceplot!. EDIT: Maybe not replace traceplot!, but traceplot! would call this function.

function update_do!(o::OnlineStat, data...;
        b::Integer = size(data[1], 1),
        dothis::Function = x -> nothing,
        batch::Bool = false
    )
    b = @compat Int(b)
    n = size(data[1], 1)
    i = 1
    while i <= n
        rng = i:min(i + b - 1, n)
        batch_data = map(x -> rows(x, rng), data)
        batch ? updatebatch!(o, batch_data...) : update!(o, batch_data...)
        i += b
        dothis(o)
    end
end

julia> o = OnlineStats.Mean();

julia> OnlineStats.update_do!(o, randn(100), b = 50, dothis = o -> println(nobs(o)))
50
100

tbreloff commented on May 18, 2024

I certainly like it more as part of the core update function, but would argue that having both update! and update_do! doesn't change anything. Also while we're at it, how do you feel about a name change to fit!?

Here's a possible abstraction for callbacks, designed as functors:

julia> abstract OnlineCallback

julia> immutable DoEvery <: OnlineCallback
           b::Int
           f::Function
       end

julia> cb = DoEvery(5, o->println("callback for $o."))
DoEvery(5,(anonymous function))

julia> Base.call(cb::DoEvery, i::Integer, o) = mod1(i,cb.b)==cb.b ? cb.f(o) : nothing
call (generic function with 1501 methods)

julia> for i in 1:20
           cb(i,i^2)
       end
callback for 25.
callback for 100.
callback for 225.
callback for 400.
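
The transcript above uses the syntax of the Julia version current at the time of the discussion. On Julia 1.x, `abstract` became `abstract type ... end`, `immutable` became `struct`, and `Base.call` overloads were replaced by defining a method on the instance itself. A minimal sketch of the same functor in that syntax follows; the type parameter `F` and the `seen` vector are additions for illustration, not part of the proposal.

```julia
# Sketch in Julia 1.x syntax; the names mirror the proposal above.
abstract type OnlineCallback end

struct DoEvery{F} <: OnlineCallback
    b::Int   # fire on every b-th call
    f::F     # function to run when firing
end

# Making the instance callable replaces the old `Base.call` overload.
(cb::DoEvery)(i::Integer, o) = mod1(i, cb.b) == cb.b ? cb.f(o) : nothing

# Collect the values the callback sees instead of printing them.
seen = Int[]
cb = DoEvery(5, o -> push!(seen, o))
for i in 1:20
    cb(i, i^2)
end
```

After the loop, `seen` holds the same four values the original transcript printed: 25, 100, 225, and 400.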

Then a rough draft for altering the method, assuming we can add a parameter which defines whether a stat should update batch or not:

immutable NoCallback <: OnlineCallback end
Base.call(cb::NoCallback, args...) = nothing

abstract BatchMode
immutable Batch <: BatchMode end
immutable Online <: BatchMode end

function fit!(o::OnlineStat{Online}, x, y;
              # left untyped: NoCallback is an OnlineCallback, not a Function
              onupdate = NoCallback())
    for i in 1:size(x,1)
        fit!(o, row(x,i), row(y,i))
        onupdate(i, o)
    end
end

function fit!(o::OnlineStat{Batch}, x, y;
              # left untyped: NoCallback is an OnlineCallback, not a Function
              onupdate = NoCallback(),
              b::Integer = default_batch_size(o))
    n = size(x, 1)
    i = 0
    while i < n
        rng = i+1:min(i + b, n)
        batchfit!(o,  rows(x,rng), rows(y,rng))
        i += b
        onupdate(i, o)
    end
end
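
The rough draft above can be made runnable on Julia 1.x roughly as follows. This is only a sketch under several assumptions: `RowCount` and `BatchCount` are hypothetical toy stats, `nocallback` stands in for `NoCallback()`, the per-row step is named `update!` here to avoid the recursive `fit!` call, and the callback receives the index of the last row consumed rather than `i + b`.

```julia
abstract type BatchMode end
struct Batch <: BatchMode end
struct Online <: BatchMode end

# The mode is a type parameter, so `fit!` can dispatch on it.
abstract type OnlineStat{M<:BatchMode} end

# Hypothetical toy stats, one per mode.
mutable struct RowCount <: OnlineStat{Online}
    n::Int
end
mutable struct BatchCount <: OnlineStat{Batch}
    nbatches::Int
end

update!(o::RowCount, x, y) = (o.n += 1; o)
batchfit!(o::BatchCount, x, y) = (o.nbatches += 1; o)
default_batch_size(::BatchCount) = 4

# Default callback that does nothing.
nocallback(i, o) = nothing

# Online stats: one update and one callback per row.
function fit!(o::OnlineStat{Online}, x::AbstractMatrix, y::AbstractVector;
              onupdate = nocallback)
    for i in 1:size(x, 1)
        update!(o, view(x, i, :), y[i])
        onupdate(i, o)
    end
    return o
end

# Batch stats: one update and one callback per block of up to b rows.
function fit!(o::OnlineStat{Batch}, x::AbstractMatrix, y::AbstractVector;
              onupdate = nocallback, b::Integer = default_batch_size(o))
    n = size(x, 1)
    i = 0
    while i < n
        rng = (i + 1):min(i + b, n)
        batchfit!(o, view(x, rng, :), view(y, rng))
        i = last(rng)
        onupdate(i, o)
    end
    return o
end

x, y = rand(10, 2), rand(10)
online_calls = Int[]
batch_calls = Int[]
rc = fit!(RowCount(0), x, y; onupdate = (i, o) -> push!(online_calls, i))
bc = fit!(BatchCount(0), x, y; onupdate = (i, o) -> push!(batch_calls, i))
```

With ten rows, the online method calls back at rows 1 through 10, while the batch method with the default size of 4 calls back at rows 4, 8, and 10.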

joshday commented on May 18, 2024

It would be nice to extend fit!. I think update! is a better verb ("fit a variance" doesn't sound right), but maybe it's too general and prone to name conflicts. I could go either way.

I'm worried that OnlineCallback/BatchMode is too elegant. With a few small changes to update_do! (including a better name), it seems like all that functionality could be included.

Do we just need a more general update! method that incorporates callbacks and batch updates?

tbreloff commented on May 18, 2024

It's always bugged me that there are different verbs for batch update vs singleton updates. Singleton updates are just a batch size of 1, no?
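
To illustrate the point about batch size, here is a minimal sketch in which a single update rule serves both cases. `RunningMean` and the vector-only `fit!` signature are hypothetical and written only for this example; they are not the OnlineStats API.

```julia
# Hypothetical running-mean stat.
mutable struct RunningMean
    mean::Float64
    n::Int
end
RunningMean() = RunningMean(0.0, 0)

# One update rule for any batch size, including a batch of one.
function update!(o::RunningMean, batch::AbstractVector{<:Real})
    m = length(batch)
    m == 0 && return o
    o.n += m
    o.mean += (sum(batch) - m * o.mean) / o.n
    return o
end

# Walk the data in blocks of b observations; b = 1 is the singleton case.
function fit!(o::RunningMean, data::AbstractVector{<:Real}; b::Integer = 1)
    n = length(data)
    i = 1
    while i <= n
        rng = i:min(i + b - 1, n)
        update!(o, view(data, rng))
        i += b
    end
    return o
end

data = collect(1.0:10.0)
singleton = fit!(RunningMean(), data)          # blocks of 1
batched   = fit!(RunningMean(), data; b = 4)   # blocks of 4, 4, 2
```

Both calls consume all ten observations and arrive at the same mean up to floating-point rounding, which is the sense in which a singleton update is a batch of size 1. For stats whose batch update is only an approximation, the two paths would not necessarily agree.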

I mostly agree with you about fit vs update. I'm just trying to reconcile the verbs here with the ones that may come out of JuliaML/LossFunctions.jl#3. I really want to be able to chain these things together into a pipeline, and having different verbs (which mean essentially the same thing) will ruin that. Maybe the primary verb there should be update as well (we could even consider leaving off the !, since the verb makes clear that it mutates), and just define:

module LearnBase
  update(o, args...) = @not_implemented
  transform(o, args...) = @not_implemented

  const fit = update
  const solve = update
  const predict = transform
end

And anyone can choose the one that feels natural to them. Although maybe we should actually reference any verbs from StatsBase instead.
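
As a rough illustration of what the aliasing would mean in practice, the sketch below uses a hypothetical module `LearnBaseSketch` (a stand-in, not the real LearnBase) and a toy `Counter` type, with a plain `error` call in place of the `@not_implemented` macro. Because `const fit = update` binds a second name to the same function object, a method added to `update` is reachable through every alias.

```julia
module LearnBaseSketch   # hypothetical stand-in, not the real LearnBase
    update(o, args...) = error("update not implemented for $(typeof(o))")
    const fit = update
    const solve = update
end

# Toy type used only for this example.
mutable struct Counter
    n::Int
end

# Extending `update` is enough; the aliases see the new method too.
LearnBaseSketch.update(c::Counter, xs) = (c.n += length(xs); c)

c = Counter(0)
LearnBaseSketch.fit(c, [1, 2, 3])
LearnBaseSketch.solve(c, [4, 5])
```

All three names refer to one function whose name is still `update`, so error messages and `methods` listings mention `update` regardless of which alias was called.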

Evizero commented on May 18, 2024

Personally I dislike fit as well. It doesn't sound right for many situations. I think it's just an artefact for historic reasons that originated from JuliaStats. To me the most natural would be train.

But from a community perspective I think that it's probably best if we just adopt the language standard of Julia. Using synonyms seems like an ugly solution and just makes the code less readable for outsiders. Let's not make it more complicated than it needs to be and just use fit if something learns from data.

joshday commented on May 18, 2024

Thanks for the link! A unified framework is enough to sell me on fit!.

Evizero commented on May 18, 2024

Great to have you on board. There is a lot to learn from your package and I think we are at a good point in time to attempt such a design coordination. I think combined we cover enough of a scope that others would likely follow (provided we succeed in this endeavour, which I am confident about).

joshday commented on May 18, 2024

#42 is relevant. All plotting methods should go in the plots.jl file which gets included with Requires.
