Comments (15)
Thanks for the feedback, everyone; the project seems quite well thought out. Perhaps a blog post will help garner cooperation from the broader Julia community.
Glad to know many of these things are or have been considered.
@fkiraly, regarding PPL integration, I did mean primarily for prediction, and in that vein the reference you linked seems quite intriguing.
@ablaom When you reconsider StatsModels.jl, please do note that a monster PR by @kleinschmidt is pending which will bring far more expressiveness and composability to the modeling interface.
@quinn Thanks indeed for those details. After taking another look, I have switched MLJ from Query.jl to Tables.jl, which is serving our needs well for now.
Thanks for your reply. That all makes sense.
Re PPLs, it would be nice to be able to wrap a Turing.jl model for use in a pipeline without having to expose its learnable parameters. PyMC3 has a scikit-learn wrapper that does exactly that.
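For concreteness, here is a rough, hypothetical sketch of the kind of wrapper meant here (the wrapper type and the `wrapper_fit`/`wrapper_predict` names are invented for illustration and belong to no existing package): only sampler settings are exposed as hyperparameters, while the learnable parameters stay hidden inside the fitted chain.

```julia
using Turing, Statistics

# Hypothetical wrapper: the only exposed "hyperparameter" is a sampler setting.
mutable struct TuringLinearRegressor
    n_samples::Int
end

# The learnable (latent) parameters α, β, σ live inside the model/chain.
@model function linreg(x, y)
    α ~ Normal(0, 10)
    β ~ Normal(0, 10)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    for i in eachindex(y)
        y[i] ~ Normal(α + β * x[i], σ)
    end
end

# "fit" = run the sampler; the returned chain plays the role of a fitresult.
wrapper_fit(m::TuringLinearRegressor, x, y) =
    sample(linreg(x, y), NUTS(), m.n_samples)

# "predict" = posterior-mean point prediction; a probabilistic variant would
# return the full posterior predictive distribution instead.
function wrapper_predict(chain, xnew)
    α̂, β̂ = mean(chain[:α]), mean(chain[:β])
    return α̂ .+ β̂ .* xnew
end
```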
Also, regarding JuliaML: it was initially meant as a framework for complex ML pipelines, with rich abstractions and nested models as transformations. Here is some info to that effect: https://github.com/JuliaML/LearnBase.jl/blob/master/src/LearnBase.jl
And @Evizero can say more.
I think I more or less agree with @tlienart; some other comments:
- Probabilistic programming is essentially the modern, object-oriented formulation of Bayesian MCMC (sometimes variational, too). As such, it falls into the category of a method class within the Bayesian framework. Bayesians are, in contemporary practice, not very clear on the separation of fitting versus model application (e.g., prediction) versus evaluation that is reflected in sklearn's fit/predict design, while the PP world is extremely advanced in modular/abstract model building.
Where the task is prediction, the two can be made to fit together; see, e.g., the discussion in section 1.3.4 of
https://arxiv.org/abs/1801.00753
for more details on my opinion on the issue, though the fit is not immediate, as the sklearn and common Bayesian modelling interface designs are differently focused (the latter on model-class-specific model building). I'd point to
https://github.com/alan-turing-institute/skpro
as a design study in how such an interface could look; it is conditional on a probabilistic prediction interface, which sklearn does not have but MLJ now does :-) (see the small illustration after this list).
I agree with @tlienart that this is an exciting research area (with direct real-world applications, e.g., in finance or medicine) - though I disagree with his conclusion: I think that makes it especially worthwhile to work on! Though maybe not as one of the current MLJ priorities, for pragmatic reasons.
- We've discussed abstract model specification and model description in #22 - I think the high-tech solution would synergize really well with a formula specification interface such as in statsmodels or R. However, again it might not be a priority given limited resources...
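As a small illustration of the probabilistic prediction interface mentioned above, here is a minimal sketch (it assumes the DecisionTree.jl glue code is installed; the `@load` syntax has varied across MLJ versions):

```julia
using MLJ

X, y = @load_iris                                   # built-in toy dataset
Tree = @load DecisionTreeClassifier pkg=DecisionTree
mach = machine(Tree(), X, y)
fit!(mach)

ŷ = predict(mach, X)     # a vector of UnivariateFinite distributions,
                         # not point predictions
pdf.(ŷ, "virginica")     # per-class probabilities
predict_mode(mach, X)    # collapse to point predictions when needed
```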
Generally, we're always happy for suggestions/designs/contributions!
The Turing Institute is also always looking for competent people; please consider formally applying here:
https://www.turing.ac.uk/work-turing/research-associates
https://www.turing.ac.uk/work-turing/research-assistants
(part time arrangements are possible)
Anthony or Franz will probably give a more detailed and better answer, but here's my take on your points, from what I understand of the MLJ plan:
- Unless I misunderstand it, the JuliaML ecosystem does not provide anything that offers composable models, which is the core of the MLJ idea (another way to frame it: sklearn pipelines, but much better). However, JuliaML provides tools that could be used in MLJ, such as metrics, which may replace some of the things that are currently in MLJ.
- I don't think there's a plan to integrate probabilistic programming in MLJ, and I also don't think it would be a good idea, given that both the philosophy and the use case are quite different. AFAIK, PPLs are more in the realm of research, whereas MLJ hopes to be a practical ML library that can be used for real applications and ideally work on heterogeneous/distributed architectures. But it could be good to interface with Turing.
- Automatic hyperparameter tuning is definitely something that will be looked into, AFAIK.
- I think that's already the plan.
- An interface to StatsModels.jl is probably a good idea.
- Cool!
- Also a good idea, though I think it's probably too early at this point.
PS: you may want to join the Slack channel if you haven't already!
Here are the right links: https://github.com/JuliaML/Transformations.jl and http://www.breloff.com/transformations/
Renamed the topic to be more descriptive of its content.
Many thanks @datnamer for your comments and enthusiasm! Just a few things to add to the other comprehensive responses:
- MLJ already has a flexible API for building learning networks beyond simple linear pipelines, for exporting them as stand-alone models, and for tuning their nested parameters (a minimal sketch follows this list). While "architecture" search should be possible, the immediate priority would be to improve the usability of the existing interface, for example by providing standard architectures (linear pipelines, stacks, etc.) out-of-the-box, and to add to the existing tuning strategies (to include AD gradient descent for pure-Julia models).
- MLJ is indeed attempting to be "data agnostic", and there are two generic tabular data interfaces we have looked at: the Tables.jl interface you refer to, and the Query.jl iterable tables interface (defined in TableTraits.jl). @ayush1999 and I have played around with these, but it is still not absolutely clear to me which is best. At the moment we are using iterable tables, although we currently have a small intermediate interface in MLJBase that could allow us to change our minds later. An important requirement is integration with CategoricalArrays.jl; some other requirements have been discussed here, another here. What we have now works but could be improved.
- Will have another look at StatsModels.jl.
- There are only a few models for which the MLJ interface has been implemented, and a priority is to implement the existing MLJ framework for new models. Any help in this area is particularly appreciated.
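To make the first point concrete, here is a minimal sketch of a learning network in the current API (the data and model choices are placeholders, and details may change as the interface evolves):

```julia
using MLJ

X, y = @load_boston                       # placeholder regression data
Xs, ys = source(X), source(y)             # entry points of the network

stand = machine(Standardizer(), Xs)
W = transform(stand, Xs)                  # a node, evaluated lazily

Ridge = @load RidgeRegressor pkg=MLJLinearModels
rgs = machine(Ridge(), W, ys)
ŷ = predict(rgs, W)                       # terminal node of the network

fit!(ŷ)    # trains every machine the node depends on, in order
ŷ()        # call the node to compute predictions

# Once exported as a stand-alone model, nested hyperparameters remain
# addressable for tuning, e.g. (hypothetical field name):
# r = range(composite, :(ridge_regressor.lambda), lower=0.01, upper=10.0)
```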
On point 4, note that the Tables.jl interface is a superset of the iterable tables set of sources/sinks, i.e. any iterable table is also a Tables.jl-compatible source. This change was made to help simplify the ecosystem and to allow package developers (and use cases exactly like this one) to rely only on Tables.jl and get everything else for free. Happy to help answer any other questions regarding using Tables.jl.
Thanks. To clarify: every object `X` for which `TableTraits.isiterabletable(X)` is true also implements the Tables.jl interface? But surely not all of it, as column access is not universally adopted by iterable tables, as I understand it. Do you have a link to a relevant discussion?
@ablaom, let me clarify. Every object `X` that implements/satisfies `TableTraits.isiterabletable(X)` also automatically implements/satisfies the Tables.jl interface. This is because Tables.jl first checks whether an object implements Tables.jl itself; if not, it checks whether the object satisfies `TableTraits.isiterabletable`, and, if so, it knows how to provide the Tables.jl interface for that object. Tables.jl contains fallbacks that ensure any object that is a "table" can be accessed by both `Tables.rows` and `Tables.columns`, allowing users/package developers to call `Tables.rows` or `Tables.columns` as is most convenient for their package, without needing to worry about whether the input table supports one or the other. Hopefully that helps? Let me know if you have any other questions.
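To illustrate the consumer side of that contract, here is a minimal sketch (`column_means` is just an invented example function):

```julia
using Tables

# Accepts any Tables.jl-compatible source; `Tables.columns` works even for
# row-oriented sources, thanks to the fallbacks described above.
function column_means(table)
    cols = Tables.columns(table)
    return Dict(name => sum(Tables.getcolumn(cols, name)) /
                        length(Tables.getcolumn(cols, name))
                for name in Tables.columnnames(cols))
end

# A named tuple of vectors is itself a valid table:
column_means((a = [1.0, 2.0, 3.0], b = [4.0, 5.0, 6.0]))
```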
A little late to the party, but here is a related effort w.r.t. point 3: https://github.com/DrChainsaw/NaiveGAflux.jl
I'm currently just using it to play around with very loosely restricted search spaces, so there is perhaps not much to offer yet in terms of simplifying tuning, as you'll most likely end up tuning the tuning parameters :)
If someone has a favourite search space and thinks it would be useful to add an MLJTuning plugin for it, I could probably make an effort to package it up, as long as it doesn't require me to sift through Python code of the same type as what is used in the examples in the OP.
@DrChainsaw Looks like you're doing pretty cool things here, and it would be great to get some integration at some point. We have a basic implementation of MLJ's model interface for Flux, which we are polishing at the moment. Consider it just a POC for now (https://github.com/alan-turing-institute/MLJFlux.jl).
> If someone has a favourite search space
Well, I guess this is the million-dollar question. I'm no expert and would be happy to hear your own thoughts. Do you think, if you know, that what auto-keras and tpot do is worth emulating?
Thanks @ablaom
When I started, I was perhaps naively thinking that the search space should not be the million-dollar question if you just had a framework which allowed for arbitrary mutation of the network architecture. Turns out things might be just a little bit more complicated than that, but I still haven't satisfied my curiosity on the subject. Once I do (or maybe before) I will certainly look into integration. From what I have seen, it looks like MLJ has answers for a lot of the API questions I have not yet wanted to address.
I understand the neural network stuff in MLJ is in quite an early phase, but do you foresee that packages of Flux-based tuning methods should depend on MLJFlux, or would they depend on the more basic packages if they want to integrate with the MLJ APIs?
I'm certainly no expert either, and I have not used tpot or auto-keras myself. I think I have seen people claim some success with them. From just browsing the code on GitHub, it was not super easy to find out what the default search space looks like.
Reimplementing an existing and well-tested method for NAS is probably a safe choice, though. I'm not looking into doing this at the moment, but if someone who reads this would like to tackle it, I'm happy to help make use of the NaiveXX packages. If the method relies on modifying existing network architectures, one should be able to save a lot of effort by using them, and I dare say that they are quite well tested.
> I understand the neural network stuff in MLJ is in quite an early phase, but do you foresee that packages of Flux-based tuning methods should depend on MLJFlux, or would they depend on the more basic packages if they want to integrate with the MLJ APIs?
You would depend on MLJFlux. The idea of this package is simply to encapsulate any given set of instructions for building a supervised Flux neural network into an MLJ model (which is just a struct storing hyperparameters). In that way all the MLJ meta-algorithms (evaluation and hyperparameter optimisation) can be applied. However, competing with the flexibility implied by this remit is a desire to make deep learning models look and feel more like traditional models, to make them accessible to users outside of NLP and images. There is some friction here because the different communities value different things, but we are going to have a stab at something and see what people think.
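For the curious, the encapsulation currently looks roughly like the sketch below (hedged: MLJFlux is a POC, and the builder signature may well change):

```julia
using MLJ, MLJFlux, Flux

# The "instructions for building a network" live in a builder struct, whose
# fields are ordinary hyperparameters, visible to MLJ's tuning machinery.
mutable struct TwoLayer <: MLJFlux.Builder
    width::Int
end

MLJFlux.build(b::TwoLayer, rng, n_in, n_out) =
    Flux.Chain(Flux.Dense(n_in => b.width, Flux.relu),
               Flux.Dense(b.width => n_out))

clf = MLJFlux.NeuralNetworkClassifier(builder=TwoLayer(32), epochs=10)
# `clf` is now an ordinary MLJ model: machine(clf, X, y), evaluate!,
# TunedModel, etc. all apply.
```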