Comments (11)
👍 Cool... this is certainly the type of thing I was thinking of.
It would also be great to add some "plotting recipes" that work with various stats. For example, it would be really cool to be able to do a PCA/SVD on a scatter plot and add overlays like the image from Wikipedia.
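To make the PCA/SVD overlay idea concrete, here is a minimal sketch of the numbers such a recipe would need: the center of the point cloud, the principal directions, and the spread along each one. It only uses the `LinearAlgebra` and `Statistics` standard libraries; `principal_axes` is a hypothetical helper name, not something from OnlineStats or any plotting package.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not part of OnlineStats): compute what a PCA overlay
# on a scatter plot would draw. Rows of X are observations, columns are variables.
function principal_axes(X::AbstractMatrix{<:Real})
    n = size(X, 1)
    center = vec(mean(X, dims = 1))      # mean of each column
    F = svd(X .- center')                # SVD of the centered data
    stddevs = F.S ./ sqrt(n - 1)         # standard deviation along each axis
    return center, F.V, stddevs          # columns of F.V are the axis directions
end

# Points lying exactly on the line y = 2x
t = collect(-2.0:1.0:2.0)
X = hcat(t, 2 .* t)
center, axes, stddevs = principal_axes(X)
```

An overlay would then draw an arrow from `center` along each column of `axes`, scaled by the matching entry of `stddevs`. The sign of each direction is arbitrary, so a recipe should not depend on it.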
from onlinestats.jl.
I was just peeking through your plotmethods code... What do you think about adding a Function keyword arg, ontrace::Function, to the tracefit! method, and changing it to call the function instead of adding to and returning an array of values:
function tracefit!(o::OnlineStat, b::Integer, data...; batch::Bool = false, ontrace::Function = nop)
    b = @compat Int(b)
    n = nrows(data[1])
    i = 1
    s = state(o)
    #result = [copy(o)]
    while i <= n
        rng = i:min(i + b - 1, n)
        batch_data = map(x -> rows(x, rng), data)
        batch ? updatebatch!(o, batch_data...) : update!(o, batch_data...)
        #push!(result, copy(o))
        ontrace(o)
        i += b
    end
    #result
    return
end

# then implement the current functionality like (untested code):
o = Mean()
result = Mean[]
tracefit!(o, 1, rand(10); ontrace = o->push!(result, copy(o)))
The advantage here is that you can use the same tracefit! method to add to a plot or do something other than return a bunch of copies of an object.
Although, I think this interface could be cleaned up further, so maybe add a new method with this functionality for now?
from onlinestats.jl.
Since my end goal was making plots, I like this idea.
b should be changed to a keyword argument. As a positional argument it just looks so disruptive. In general, tracefit! needs a rewrite.
from onlinestats.jl.
What about something like this? It updates an OnlineStat (with all of data by default, but you can pass a different batch size), and calls a function after each update.
I think it could replace onlinefit!, tracefit!, and traceplot!. EDIT: Maybe not replace traceplot!, but traceplot! would call this function.
function update_do!(o::OnlineStat, data...;
        b::Integer = size(data[1], 1),
        dothis::Function = x -> nothing,
        batch::Bool = false
    )
    b = @compat Int(b)
    n = size(data[1], 1)
    i = 1
    while i <= n
        rng = i:min(i + b - 1, n)
        batch_data = map(x -> rows(x, rng), data)
        batch ? updatebatch!(o, batch_data...) : update!(o, batch_data...)
        i += b
        dothis(o)
    end
end

julia> o = OnlineStats.Mean();

julia> OnlineStats.update_do!(o, randn(100), b = 50, dothis = o -> println(nobs(o)))
50
100
from onlinestats.jl.
I certainly like it more as part of the core update function, but would argue that having both update! and update_do! doesn't change anything. Also, while we're at it, how do you feel about a name change to fit!?
Here's a possible abstraction for callbacks, designed as functors:
julia> abstract OnlineCallback

julia> immutable DoEvery <: OnlineCallback
           b::Int
           f::Function
       end

julia> cb = DoEvery(5, o->println("callback for $o."))
DoEvery(5,(anonymous function))

julia> Base.call(cb::DoEvery, i::Integer, o) = mod1(i,cb.b)==cb.b ? cb.f(o) : nothing
call (generic function with 1501 methods)

julia> for i in 1:20
           cb(i,i^2)
       end
callback for 25.
callback for 100.
callback for 225.
callback for 400.
Then a rough draft for altering the method, assuming we can add a parameter which defines whether a stat should update batch or not:
immutable NoCallback <: OnlineCallback end
Base.call(cb::NoCallback, args...) = nothing

abstract BatchMode
immutable Batch <: BatchMode end
immutable Online <: BatchMode end

# onupdate is left untyped: NoCallback() is an OnlineCallback, not a Function,
# so a ::Function annotation would reject the default value.
function fit!(o::OnlineStat{Online}, x, y;
              onupdate = NoCallback())
    for i in 1:size(x,1)
        fit!(o, row(x,i), row(y,i))
        onupdate(i, o)
    end
end

function fit!(o::OnlineStat{Batch}, x, y;
              onupdate = NoCallback(),
              b::Integer = default_batch_size(o))
    n = size(x, 1)
    i = 0
    while i < n
        rng = i+1:min(i + b, n)
        batchfit!(o, rows(x,rng), rows(y,rng))
        i += b
        onupdate(i, o)
    end
end
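For anyone reading this on current Julia (1.x), where `abstract`, `immutable`, and overloading `Base.call` are no longer available, the callback idea above might look roughly like the sketch below, using callable structs. `OnlineCallback`, `DoEvery`, and `NoCallback` follow the names in this comment; `RunningMean`, `observe!`, and `fit_with_callback!` are hypothetical stand-ins made up for the example and are not OnlineStats API.

```julia
abstract type OnlineCallback end

# Call `f(o)` on every b-th update.
struct DoEvery{F} <: OnlineCallback
    b::Int
    f::F
end

# Default callback that does nothing.
struct NoCallback <: OnlineCallback end

# Callable structs take the place of the old Base.call overloads.
(cb::DoEvery)(i::Integer, o) = i % cb.b == 0 ? cb.f(o) : nothing
(cb::NoCallback)(args...) = nothing

# Hypothetical stand-in for an OnlineStat: a running mean.
mutable struct RunningMean
    n::Int
    mean::Float64
end
RunningMean() = RunningMean(0, 0.0)

# Hypothetical single-observation update.
function observe!(o::RunningMean, x::Real)
    o.n += 1
    o.mean += (x - o.mean) / o.n
    return o
end

# Update one observation at a time and invoke the callback after each update.
function fit_with_callback!(o::RunningMean, xs; onupdate = NoCallback())
    for (i, x) in enumerate(xs)
        observe!(o, x)
        onupdate(i, o)
    end
    return o
end

seen = Int[]
o = fit_with_callback!(RunningMean(), 1:20;
                       onupdate = DoEvery(5, s -> push!(seen, s.n)))
```

After the call, `seen` holds the observation counts at which the callback fired (every fifth update). The same `onupdate` keyword accepts a plain two-argument function, since it is not restricted to `OnlineCallback`.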
from onlinestats.jl.
It would be nice to extend fit!. I think update! is a better verb (fitting a variance doesn't sound right), but maybe it's too general and prone to name conflicts. I could go either way.
I'm worried that OnlineCallback/BatchMode is too elegant. With a few small changes to update_do! (including a better name), it seems like all that functionality could be included.
Do we just need a more general update! method that incorporates callbacks and batch updates?
from onlinestats.jl.
It's always bugged me that there are different verbs for batch update vs singleton updates. Singleton updates are just a batch size of 1, no?
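As a small illustration of the "singleton is just batch size 1" point, here is a sketch where one batched loop covers both cases. `BatchMean`, `mergebatch!`, and `fit_batches!` are hypothetical names invented for the example; they are not the package's types or methods.

```julia
# Hypothetical running-mean stat used only for this illustration.
mutable struct BatchMean
    n::Int
    mean::Float64
end
BatchMean() = BatchMean(0, 0.0)

# Fold a whole batch into the running mean in one step.
function mergebatch!(o::BatchMean, xs::AbstractVector{<:Real})
    m = length(xs)
    m == 0 && return o
    o.n += m
    o.mean += (sum(xs) / m - o.mean) * (m / o.n)
    return o
end

# One loop for everything: b = 1 gives singleton updates,
# b = length(xs) gives a single full-batch update.
function fit_batches!(o::BatchMean, xs::AbstractVector{<:Real};
                      b::Integer = max(length(xs), 1))
    b > 0 || throw(ArgumentError("batch size b must be positive"))
    n = length(xs)
    i = 1
    while i <= n
        rng = i:min(i + b - 1, n)
        mergebatch!(o, view(xs, rng))
        i += b
    end
    return o
end

xs = collect(1.0:10.0)
single = fit_batches!(BatchMean(), xs; b = 1)   # ten updates of one value each
triple = fit_batches!(BatchMean(), xs; b = 3)   # batches of 3, 3, 3, 1
full   = fit_batches!(BatchMean(), xs)          # one batch of ten
```

All three end up with the same mean and observation count. For stats whose batch update is not exactly equivalent to repeated single updates, the two code paths would of course give different results, so this only shows the interface, not a general guarantee.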
I mostly agree with you about fit vs update. I'm just trying to reconcile the verbs here with the ones that may come out of JuliaML/LossFunctions.jl#3. I really want to be able to chain these things together into a pipeline, and having different verbs (which mean essentially the same thing) will ruin that. Maybe the primary verb there should be update as well (could even consider leaving off the ! since the verb is clear that it mutates), and just define:
module LearnBase
    update(o, args...) = @not_implemented
    transform(o, args...) = @not_implemented
    const fit = update
    const solve = update
    const predict = transform
end
And anyone can choose the one that feels natural to them. Although maybe we should actually reference any verbs from StatsBase instead.
from onlinestats.jl.
Personally, I dislike fit as well. It doesn't sound right for many situations. I think it's just a historical artefact that originated from JuliaStats. To me the most natural would be train.
But from a community perspective I think that it's probably best if we just adopt the language standard of Julia. Using synonyms seems like an ugly solution and just makes the code less readable for outsiders. Let's not make it more complicated than it needs to be and just use fit if something learns from data.
from onlinestats.jl.
Thanks for the link! A unified framework is enough to sell me on fit!.
from onlinestats.jl.
Great to have you on board. There is a lot to learn from your package, and I think we are at a good point in time to attempt such a design coordination. I think combined we cover enough of a scope that others would likely follow (provided we succeed in this endeavour, which I am confident about).
from onlinestats.jl.
#42 is relevant. All plotting methods should go in the plots.jl file which gets included with Requires.
from onlinestats.jl.