Discussions related to the future of Machine Learning in Julia
juliaml / meta Goto Github PK
License: MIT License
It seems like the name change to JuliaLearn didn't change the URL. Which name do people prefer? Is it worth the hassle?
Maybe it's because I'm unsure of what's going into ObjectiveFunctions.jl, but do we want a Penalties.jl? My thinking is that penalty implementations are rather straightforward, and we could get something into METADATA sooner than if they are attached to ObjectiveFunctions, which may take more thought to get right.
Side note: My summer work is ending, so I'll actually start contributing soon. I'd be happy to work on penalties, wherever they live.
Hi ML gurus,
Forgive me if this is too forward, but I'd like to share an observation and make a request.
One thing that I think Julia needs is a robust, easy-to-use machine learning toolkit focused on end users who are doing data analysis but aren't necessarily ML experts. I'm one of these people. For example, I need to run classification and logistic regression on a set of data to see whether a hypothesis I have is valid. I've taken some ML and stats courses but am nowhere near an expert in these fields, and frankly, the existing Julia packages seem to be geared towards ML researchers, not users. I can't really find what I need in the multiple packages within the JuliaML organization.
As a result I'm having to use sklearn to do my data analysis, which makes me sad as a Julia evangelist. Sklearn has everything I need in a toolkit, though: it's easy to understand and to use, offers some customization (but not so much that I'm tearing my hair out trying to understand the options), and gives me reasonably-fast results that my colleagues seem to understand.
My request is this: could there be some effort made to come up with something like sklearn in native Julia? I make this request knowing full well that I can't contribute much other than ideas and feedback, but it sure would make things easier for those of us who just need to do some data analysis without knowing about or using the most cutting-edge ML algorithms, and want to do it in Julia.
@tbreloff Would there be any interest in moving MachineLearningMetrics.jl into JuliaML? I would even consider renaming it MLMetrics, which is more succinct.
I'm sitting here with @bisraelsen at NIPS, and discussing the need for good Bayesian Optimization and Gaussian Processes. Some quick thoughts:
I'm just putting this issue out there so we don't forget about the conversation! Anyone with interest/thoughts, please post here.
cc: @nowozin
Can we start prototyping the MLModels package? Did we decide that this package would contain both definitions of Transformations and implementations of deriv, prox, grad, etc.?
I put a prototype repo here: https://github.com/ahwillia/MLTransformations.jl
(I would be happy to prototype within JuliaML, but wasn't sure whether you would prefer I start elsewhere and migrate packages in once they are a bit more mature. I imagine I will end up deleting this repo or at least renaming it.)
High-level notes:
- apply/apply! rather than transform/transform!, but I'm not attached to this. My reasoning was (a) apply is shorter, and (b) it frees you to say transform = MyTransformation(); apply!(transform, data), which seems natural to me.
- fit_apply and fit_apply!, which is similar to the scikit-learn API.
- invert!(transform, data)
This is a flavor:
type IdentityTransform <: InvertibleTransformation end
type LogTransform <: InvertibleTransformation end
type ExpTransform <: InvertibleTransformation end
type LogisticTransform <: InvertibleTransformation end
type LogitTransform <: InvertibleTransformation end
apply!(::IdentityTransform, x) = x
apply!(::LogTransform, x) = map!(log,x)
apply!(::ExpTransform, x) = map!(exp,x)
invert!(::IdentityTransform, x) = x
invert!(::LogTransform, x) = map!(exp,x)
invert!(::ExpTransform, x) = map!(log,x)
get_inverse(::IdentityTransform) = IdentityTransform()
get_inverse(::LogTransform) = ExpTransform()
And for a "Learnable Transformation":
type Standardize{T<:Real} <: LearnableTransformation
    shift::ShiftTransform{T}
    scale::ScaleTransform{T}
end
function fit!{T}(transform::Standardize{T}, x)
    transform.shift = ShiftTransform(mean(x))
    transform.scale = ScaleTransform(one(T)/std(x))
end
function apply!(transform::Standardize, x)
apply!(transform.shift, x)
apply!(transform.scale, x)
end
function invert!(transform::Standardize, x)
invert!(transform.scale, x)
invert!(transform.shift, x)
end
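To make the round trip concrete, here is a minimal runnable sketch of the proposed fit!/apply!/invert! API in current Julia syntax. I've flattened the fields to plain numbers instead of the ShiftTransform/ScaleTransform wrappers so the snippet is self-contained; that simplification is mine, not part of the proposal.

```julia
# Self-contained sketch of the proposed fit!/apply!/invert! round trip.
# Fields are plain numbers here for brevity (an assumption, not the API).
using Statistics: mean, std

mutable struct Standardize{T<:Real}
    shift::T   # subtracted from the data
    scale::T   # multiplies the shifted data
end
Standardize() = Standardize(0.0, 1.0)

function fit!(t::Standardize, x)
    t.shift = mean(x)
    t.scale = one(t.scale) / std(x)
    return t
end

apply!(t::Standardize, x)  = (x .= (x .- t.shift) .* t.scale; x)
invert!(t::Standardize, x) = (x .= x ./ t.scale .+ t.shift; x)

x  = [1.0, 2.0, 3.0, 4.0]
t  = fit!(Standardize(), x)
x2 = copy(x)
apply!(t, x2)    # standardized in place: mean ≈ 0, std ≈ 1
invert!(t, x2)   # round trip recovers the original data
```

Note how apply! followed by invert! is the identity on the data, which is the property the InvertibleTransformation pairs above are meant to guarantee.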
Thoughts on whether this is too verbose? I'm just brainstorming here.
This has been discussed repeatedly, but it's important to get right if we want widespread adoption. Some references:
JuliaML/LossFunctions.jl#12
JuliaML/LossFunctions.jl#3
JuliaNLSolvers/Optim.jl#87
JuliaStats/Roadmap.jl#15
JuliaStats/Roadmap.jl#4
JuliaStats/Roadmap.jl#20
(there are more linked in those issues, and I'm sure I missed a bunch of good conversations)
I recommend a quick skim over those discussions before commenting, if you can find the time.
It's important to remember all the various things we'd like to support with the core abstractions, so we can evaluate when a concept applies and when it doesn't:
And there are some opposing perspectives within these classes:
All verbs need not be implemented by all transformations, but when there's potential for overlap, we should do our best to generalize.
The generalization here is that the object knows how to produce y in y = f(x). This could be the logit function, or a previously fitted linear regression, or a decision tree. Options:
I continue to be a fan of transform, with the caveat that we may wish to have the shorthand such that anything that can transform can be called as a functor.
I think using Base.rand here is generally going to be fine, so I don't think we need this as one of our core verbs.
I've started leaning towards learn, partially for the symmetry with LearnBase, but also because it is not so actively used in either stats (fit) or ML (train), so it could be argued that it's more general.
I think solve/optimize should be reserved for higher-level optimization algorithms, and update could be reserved for lower-level model updating.
I personally feel everything should be a Transformation, though I can see the argument that aggregations, distributions, and others don't belong. A mean is a function, but really it's a CenterTransformation that uses a "mean function" to transform data.
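As a toy illustration of that view, here is a hypothetical CenterTransformation. The type name comes from the discussion above; the learn!/transform signatures and everything else below are my assumptions, not a settled API.

```julia
# Hypothetical sketch: a transformation that "learns" the value of a
# center function (mean, median, ...) and subtracts it from the data.
using Statistics: mean, median

mutable struct CenterTransformation{F,T<:Real}
    center_fun::F   # e.g. mean or median
    center::T       # learned during fitting
end
CenterTransformation(f) = CenterTransformation(f, 0.0)

learn!(t::CenterTransformation, x) = (t.center = t.center_fun(x); t)
transform(t::CenterTransformation, x) = x .- t.center

t = learn!(CenterTransformation(mean), [1.0, 2.0, 3.0])
transform(t, [1.0, 2.0, 3.0])   # → [-1.0, 0.0, 1.0]
```

The same wrapper works for median or any other center function, which is exactly the sense in which the aggregation becomes a transformation.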
Can a transformation take zero inputs? If that's the case, then I could argue a generative model might take zero inputs and generate an output, transforming nothing into something.
If we think of "directed graphs of transformations", then I want to be able to connect a Normal distribution into that graph... we just have the flexibility that the Normal distribution can be a "source" in the same way the input data is a "source".
With this analysis, AbstractTransformation is the core type, and we should make every attempt to avoid new types until we require them to solve a conflict.
There are many things that we could query regarding attributes of our transformations:
I would like to see these things eventually implemented as traits, but in the meantime we'll need methods to ask these questions.
I think we agree that LearnBase will contain the core abstractions... enough that someone can create new models/transformations/solvers without importing lots of concrete implementations of things they don't need.
We need homes for concrete implementations of:
StatsBase contains a ton of assorted methods, types, and algorithms. StatsBase is too big for it to be a dependency of LearnBase (IMO), and LearnBase is too new to expect that StatsBase would depend on it. So I think we should have a package which depends on both LearnBase and StatsBase, and "links" the abstractions together when it's possible/feasible. In some cases this might be as easy as defining things like:
StatsBase.fit!(t::AbstractTransformation, args...; kw...) = LearnBase.learn!(t, args...; kw...)
What are the other packages that we should consider linking with?
cc: @Evizero @ahwillia @joshday @cstjean @andreasnoack @cmcbride @StefanKarpinski @ninjin @simonbyrne @pluskid
(If I forgot to cc someone that you think should be involved, please cc them yourself)
Hi there,
I'm actively developing and maintaining a package on Bayesian nonparametrics in Julia:
BayesianNonparametrics.jl
which was presented at a NIPS workshop in 2015, and I actively develop a package for sum-product networks, a deep probabilistic model:
SumProductNetworks.jl
which I constantly extend and work on with several other colleagues.
Is there interest in including these packages, or one of them, in JuliaML? Both packages implement recent ML techniques and will continue to be maintained.
Cheers,
Martin
I'm starting to work on a framework for reinforcement learning in https://github.com/tbreloff/Reinforce.jl, and I was thinking it would be nice to include the core abstractions in LearnBase so that I could have standalone environments or agents without requiring a dependency on Reinforce.jl.
I propose adding this to LearnBase:
abstract AbstractEnvironment
abstract AbstractAgent
# `r, s, A = observe!(env)` should return `(reward, state, actions)`
# Note: most environments will not implement this directly
function observe!(env::AbstractEnvironment)
reward!(env), state!(env), actions(env)
end
# `r = reward!(env)` returns the current reward, optionally updating it first
function reward end
function reward! end
# `s = state!(env)` returns the current state, optionally updating it first
function state end
function state! end
# `A = actions(env)` returns a list/set/description of valid actions
function actions end
# `a = action(agent, r, s, A)` should take in the last reward `r`, current state `s`,
# and set of valid actions `A`, then return an action `a`
function action end
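To sanity-check the abstractions, here is a toy environment and agent wired through the proposed verbs, in current Julia syntax. Everything below the two abstract types (CountEnv, FirstActionAgent, episode!) is illustrative and made up for this sketch, not part of the proposal.

```julia
abstract type AbstractEnvironment end
abstract type AbstractAgent end

# Toy environment: every step costs -1 and increments a counter.
mutable struct CountEnv <: AbstractEnvironment
    state::Int
end
reward!(env::CountEnv)  = (env.state += 1; -1.0)
state!(env::CountEnv)   = env.state
actions(env::CountEnv)  = (:left, :right)
observe!(env::CountEnv) = (reward!(env), state!(env), actions(env))

# Trivial agent that always picks the first valid action.
struct FirstActionAgent <: AbstractAgent end
action(::FirstActionAgent, r, s, A) = first(A)

function episode!(env, agent, nsteps)
    total = 0.0
    for _ in 1:nsteps
        r, s, A = observe!(env)
        action(agent, r, s, A)   # toy agent; the action has no effect here
        total += r
    end
    return total
end

env = CountEnv(0)
total = episode!(env, FirstActionAgent(), 5)
# total == -5.0 and env.state == 5 after five steps
```

The point is only that the observe!/action loop composes: an environment and an agent that know nothing about each other can run an episode through the shared verbs alone.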
Any thoughts/objections?
I'd like to maintain a "best of breed" list of the most mature/maintained packages to accomplish various tasks (decision trees, data handling, post-fit analysis, or whatever else). Ideally we could also produce a "workbench" type of package which re-exports lots of good packages, pylab-style. Someone could do:
Pkg.add("MLWorkbench")
using MLWorkbench
and have all the best vetted packages ready to go.
Would there be any interest in moving OpenAIGymAPI.jl into JuliaML? Rather than calling OpenAI Gym directly, it provides a wrapper around Gym's API.
I'm thinking about moving MLPlots here. The idea is that MLPlots is a collection of recipes for ML visualization, and recipes are loaded dynamically (with the Requires package) based on what other packages you are using. As an example, you can visualize a neural net state in OnlineAI, or plot the history from the ValueHistories package.
Any reservations/issues with the move? Please keep the package in mind when you're building visualizations that others might want to use.
I think we can register LearnBase early, while we're still working through the design of the rest of JuliaML, as then we can add it to the REQUIRE files. The reason not to register is primarily if we think the design will change drastically from what it is now, which I don't expect. Thoughts?
I was thinking about the structure of JuliaML today. I don't foresee that I'll have enough bandwidth to properly complete the "tom branch on Transformations", and I'd really like to fill in the missing pieces of JuliaML. Losses seems pretty much done to me... nice work guys, and thanks @Evizero for getting LearnBase registered.
I think Transformations could revert to being a relatively small package which defines the forward (value) and backward (deriv) methods/types for some common ML functions: logit, softsign, relu, etc, as well as some more complex transformations: affine, convolutions, pooling, ANN layers/nets, etc.
I'm still unconvinced that Penalties should be separate from Losses... but we can always combine later. This houses "losses on parameters". Does it include anything else?
I think ObjectiveFunctions could reexport Losses, Penalties, and Transformations, and provide some conveniences for dealing with common functions of all of them. An empirical risk minimization convenience would live here, for example.
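To give a feel for what such an empirical risk minimization convenience might compute, here is a sketch with stand-in loss and penalty definitions. The real pieces would come from Losses and Penalties; every name below (sqloss, ridge, empirical_risk) is an assumption for illustration.

```julia
# Stand-ins for a loss on predictions and a penalty on parameters.
sqloss(ŷ, y) = (ŷ - y)^2          # squared loss
ridge(λ, w)  = λ * sum(abs2, w)   # L2 penalty

# Empirical risk of a linear model w on data (X, y):
# mean loss over observations plus the penalty on the parameters.
function empirical_risk(w, X, y; λ = 0.1)
    ŷ = X * w
    sum(sqloss.(ŷ, y)) / length(y) + ridge(λ, w)
end

X = [1.0 0.0; 0.0 1.0]
w = [1.0, 2.0]
y = [1.0, 2.0]
empirical_risk(w, X, y)   # loss term is 0, so risk = 0.1 * (1 + 4) = 0.5
```

The convenience package would presumably assemble this sum from any Loss, any Penalty, and any Transformation producing ŷ, which is the composition argument for reexporting all three.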
I also think it would be valuable to add a dependence on one of the automatic differentiation libraries and make it easy to build a differentiable graph of the components from these packages.
I feel like this can be our playground for "what we wish for Optim". Maybe eventually the efforts can be unified properly. I want to have conveniences for iterating, helpers for gradient calcs (Adam, etc), helpers for hyperparameter tuning/adaption, etc. I'd like to use the IterationManagers interface, and expand on some of the discussions there around composable managers/states.
The meta-package. This should install and set up the packages listed above, and probably a few more that are common: Plots/PlotRecipes, ValueHistories, MLMetrics, etc. One should be able to:
Pkg.add("Learn")
using Learn
and the entire JuliaML ecosystem is ready.
I was asked today how one would start contributing to JuliaML and if there is a list somewhere. This is my attempt to start such a list.
Because we are evolving as we go, it must be quite difficult for a potential new contributor to see what is going on and where to start, especially since our packages have discussion threads that take hours to read. Yet we have functionality that is clear and stable enough to offer tasks appropriate for a first contribution. It seems like a good idea to list them in a central place. I think the following entries should make clear what I have in mind.
Please post issues that you think would fit into this list. cc: @tbreloff @joshday @ahwillia
This package is intended to provide a fast backend for computing loss functions. As such, we would like it to be as complete as possible, even if that means including losses that are seldom used in practice.
Penalties/regularization (e.g. Ridge and LASSO) for machine learning. This may be merged with Losses.jl at some point.
Penalties currently implemented are Ridge (L2Penalty), Lasso (L1Penalty), elastic net, and SCAD.
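For reference, the value functions of the first three penalties follow directly from their standard definitions. Here is a sketch: the type names mirror the text, but the signatures and the Float64 fields are my assumptions, not the package's actual API.

```julia
# Standard penalty definitions (illustrative, not the package API).
struct L1Penalty; λ::Float64 end                       # Lasso
struct L2Penalty; λ::Float64 end                       # Ridge
struct ElasticNetPenalty; λ::Float64; α::Float64 end   # convex mix of L1 and L2

value(p::L1Penalty, w) = p.λ * sum(abs, w)
value(p::L2Penalty, w) = p.λ * sum(abs2, w) / 2
value(p::ElasticNetPenalty, w) =
    p.λ * (p.α * sum(abs, w) + (1 - p.α) * sum(abs2, w) / 2)

w = [3.0, -4.0]
value(L1Penalty(1.0), w)                # 7.0
value(L2Penalty(1.0), w)                # 12.5
value(ElasticNetPenalty(1.0, 0.5), w)   # 9.75
```

SCAD is omitted here because its piecewise definition is longer; the point is only that these value functions are a few lines each, which supports the earlier argument that a standalone Penalties.jl could mature quickly.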
Tons of help needed
I want to keep a running list of machine learning packages in Julia that are being actively developed/maintained. My reasoning is that (a) we have a lot to learn from each other's efforts, and (b) it would be great to coordinate efforts going forward.
Part (b) is of course a bit tricky - having packages that are "standalone" is desirable in many ways. But our vision is that it is also desirable to have these packages (or at least a subset of them) play nicely with each other. The hope is that Learn.jl will provide a consistent API with backends to different optimization and machine learning packages. We'd like the scope to be as broad as possible, while still being simple enough to be useful and maintainable.
Note to collaborators: please update and edit this list as you see fit. Also consider cc-ing the authors of those packages so that they are aware.
Note to others: we'd love to hear your thoughts and updates on your latest projects. We're happy to add your package to this list; please get in touch with us! https://gitter.im/JuliaML/chat
These should help guide the development of Transformations.jl
Note: There are also a lot of excellent statistical modeling packages (e.g. MultivariateStats.jl, and others). I view these packages as being a slightly different focus from this project -- namely they provide a library of canned methods with a consistent API. I tried to pick packages that aim to create an internal _framework_ for specifying and fitting a large class of models.