
Comments (15)

datnamer commented on May 18, 2024

Thanks for the feedback everyone; the project seems quite well thought out. Perhaps a blog post would help garner cooperation from the broader Julia community.

Glad to know many of these things are or have been considered.

@fkiraly, regarding PPL integration, I did mean primarily for prediction, and in that vein the reference you linked seems quite intriguing.

@ablaom When you reconsider StatsModels.jl, please do note that a monster PR by @kleinschmidt is pending which will bring far more expressiveness and composability to the modeling interface.

ablaom commented on May 18, 2024

@quinnj Thanks indeed for those details. After taking another look, I have switched MLJ from Query to Tables, which is serving our needs well for now.

datnamer commented on May 18, 2024

Thanks for your reply. That all makes sense.

Re PPL, it would be nice to be able to wrap a Turing.jl model for use in a pipeline without having to expose learnable parameters. PyMC3 has a scikit-learn wrapper to do that.
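
For illustration, here is a hypothetical sketch of what such a wrapper could look like in Julia. The names `TuringRegressor`, `wrapper_fit`, and `wrapper_predict` are purely illustrative, not an existing API; the point is that the Turing.jl model's latent parameters stay hidden behind a fit/predict pair:

```julia
# Hypothetical sketch only: hide a Turing.jl model behind fit/predict so a
# pipeline never sees the learnable (latent) parameters.
using Turing, Statistics

@model function linreg(x, y)
    α ~ Normal(0, 10)
    β ~ Normal(0, 10)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    y .~ Normal.(α .+ β .* x, σ)
end

struct TuringRegressor end  # sampler settings etc. would live here as hyperparameters

function wrapper_fit(::TuringRegressor, x, y)
    chain = sample(linreg(x, y), NUTS(), 500)          # posterior over α, β, σ
    return (α = mean(chain[:α]), β = mean(chain[:β]))  # expose point summaries only
end

wrapper_predict(fitted, xnew) = fitted.α .+ fitted.β .* xnew
```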

Also, regarding JuliaML: it was initially meant as a framework for complex ML pipelines with rich abstractions and nested models as transformations. Here is some info to that effect: https://github.com/JuliaML/LearnBase.jl/blob/master/src/LearnBase.jl

And @Evizero can say more.

fkiraly commented on May 18, 2024

I think I more or less agree with @tlienart; some other comments:

  1. Probabilistic programming is essentially the modern, object-oriented formulation of Bayesian MCMC (and sometimes variational inference). As such, it falls into the category of a method class within the Bayesian framework. Bayesians are, in contemporary practice, not very clear on the separation of fitting versus model application (e.g., prediction) versus evaluation that is reflected in sklearn's fit/predict design, while the PP world is extremely advanced in modular/abstract model building.

Where the task is prediction, the two can be made to fit together; see, e.g., the discussion in section 1.3.4 of
https://arxiv.org/abs/1801.00753
for more details on my opinion on the issue, though it is not immediate, as the sklearn and common Bayesian modelling interface designs are differently focused (the latter on model-class-specific model building). I'd point to
https://github.com/alan-turing-institute/skpro
as a design study in what such an interface could look like; it is conditional on a probabilistic learning interface, which sklearn does not have but MLJ now does :-)
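
For readers unfamiliar with what MLJ's probabilistic interface looks like: `predict` returns distribution objects rather than point values. A minimal sketch, assuming DecisionTree.jl and its MLJ interface are installed:

```julia
using MLJ

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree
mach = fit!(machine(Tree(), X, y))

yhat = predict(mach, X)   # a vector of UnivariateFinite distributions
pdf.(yhat, "virginica")   # per-observation class probabilities
mode.(yhat)               # collapse to point predictions when needed
```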

I agree with @tlienart that this is an exciting research area (with direct real-world applications, e.g., in finance or medicine) - though I disagree with the conclusion: I think that makes it especially worthwhile to work on! Though maybe not as one of the current MLJ priorities, for pragmatic reasons.

  2. We've discussed abstract model specification and model description in #22 - I think the high-tech solution would synergize really well with a formula specification interface such as in statsmodels or R (a sketch of what that looks like follows below). However, again, it might not be a priority given limited resources...
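
For concreteness, the kind of formula interface meant here already exists in Julia via StatsModels.jl (the analogue of R's formula mini-language); a small sketch, assuming DataFrames.jl and GLM.jl are installed:

```julia
using DataFrames, GLM  # GLM re-exports the @formula macro from StatsModels

df = DataFrame(y = randn(100), x1 = randn(100), x2 = randn(100))
lm(@formula(y ~ x1 + x2 + x1 & x2), df)  # main effects plus an interaction term
```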

Generally, we're always happy for suggestions/designs/contributions!

The Turing Institute is also always looking for competent people; please consider formally applying here:
https://www.turing.ac.uk/work-turing/research-associates
https://www.turing.ac.uk/work-turing/research-assistants
(part time arrangements are possible)

tlienart commented on May 18, 2024

Anthony or Franz will probably give a more detailed and better answer but here's my take on your points from what I understand of the MLJ plan:

  1. Unless I misunderstand it, the JuliaML ecosystem does not provide something that offers composable models, which is the core of the MLJ idea (another way to frame it: sklearn pipelines, but much better - see the sketch after this list). However, JuliaML provides tools that could be used in MLJ, such as metrics, which may replace some of the things that are in MLJ.
  2. I don't think there's a plan to integrate probabilistic programming in MLJ, and I also don't think it would be a good idea, given that the philosophy is very different and so is the use case. AFAIK PPLs are more in the realm of research, whereas MLJ hopes to be a practical ML library that can be used for real applications and ideally work on heterogeneous/distributed architectures. But it could be good to interface with Turing.
  3. Automatic HP tuning is definitely something that will be looked into, AFAIK.
  4. I think that's already the plan.
  5. An interface to StatsModels is probably a good idea.
  6. Cool!
  7. Also a good idea, though I think it's probably too early at this point.
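
To make the composability point in item 1 concrete, here is a rough sketch in MLJ's pipeline syntax (a hedge: the design was still settling when this was written, so treat this as illustrative of the idea rather than the exact API of the day):

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree
pipe = Standardizer() |> Tree()   # the pipeline is itself a model

# Because the composite is just another model, the usual meta-algorithms apply:
X, y = @load_iris
evaluate(pipe, X, y; resampling = CV(nfolds = 3), measure = cross_entropy)
```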

PS: you may want to join the Slack channel if you haven't already 😄

datnamer commented on May 18, 2024

Here are the right links: https://github.com/JuliaML/Transformations.jl and http://www.breloff.com/transformations/

fkiraly commented on May 18, 2024

Renamed the topic to be more descriptive of its content.

ablaom commented on May 18, 2024

Many thanks @datnamer for your comments and enthusiasm! Just a few things to add to the other comprehensive responses:

  1. MLJ already has a flexible API for building learning networks beyond simple linear pipelines, for exporting them as stand-alone models, and for tuning their nested parameters (see the sketch after this list). While "architecture" search should be possible, the immediate priority would be to improve usability of the existing interface, for example by providing standard architectures (linear pipelines, stacks, etc.) out-of-the-box, and to add to the existing tuning strategies (to include AD gradient descent for pure Julia models).

  2. MLJ is indeed attempting to be "data agnostic", and there are two generic tabular data interfaces we have looked at: the Tables.jl interface you refer to, and the Query.jl iterable tables interface (defined in TableTraits.jl). @ayush1999 and I have played around with these, but it is still not absolutely clear to me which is best. At the moment we are using iterable tables, although we currently have a small intermediate interface in MLJBase that could allow us to change our minds later. An important requirement is integration with CategoricalArrays.jl; some other requirements have been discussed here, and another here. What we have now works but could be improved.

  3. Will have another look at StatsModels.jl.
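
As a concrete illustration of point 1, here is a rough sketch of how a nested hyperparameter of a pipeline can be exposed to MLJ's tuning machinery (hedged: written against later MLJ syntax, and the auto-generated nested field name may differ between versions; inspect `pipe` to confirm it):

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree
pipe = Standardizer() |> Tree()

# Expose a nested hyperparameter of the composite to the tuner:
r = range(pipe, :(decision_tree_classifier.max_depth); lower = 1, upper = 5)
tuned = TunedModel(model = pipe, ranges = r, tuning = Grid(),
                   resampling = CV(nfolds = 3), measure = cross_entropy)

X, y = @load_iris
mach = fit!(machine(tuned, X, y))
report(mach).best_model  # the winning nested-parameter configuration
```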

There are only a few models for which the MLJ interface has been implemented, and a priority is to implement the existing MLJ interface for more models. Any help in this area is particularly appreciated.

quinnj commented on May 18, 2024

On point 4: note that the Tables.jl interface is a superset of the iterable-tables set of sources/sinks, i.e. any iterable table is also a Tables.jl-compatible source. This change was made to help simplify the ecosystem and to allow package developers (and use cases exactly like this one) to rely only on Tables.jl and get everything else for free. Happy to help answer any other questions regarding using Tables.jl.

ablaom commented on May 18, 2024

Thanks. To clarify: every object X for which TableTraits.isiterabletable(X) is true also implements the Tables.jl interface? But surely not all of it, as column access is not universally adopted by iterable tables, as I understand it. Do you have a link to a relevant discussion?

quinnj commented on May 18, 2024

@ablaom, let me clarify. Every object X that implements/satisfies TableTraits.isiterabletable(X) also automatically implements/satisfies the Tables.jl interface. This is because Tables.jl checks whether an object implements Tables.jl itself first; if not, it checks whether the object satisfies TableTraits.isiterabletable, and if so, it knows how to provide the Tables.jl interface for that object. Tables.jl contains fallbacks that ensure any object that is a "table" can be accessed by both Tables.rows and Tables.columns. This allows users/package developers building on Tables.jl to use Tables.rows or Tables.columns as is most convenient for their package, without needing to worry about whether the input table supports one or the other. Hopefully that helps? Let me know if you have any other questions.
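
A small example of what this buys a package developer in practice, using only Tables.jl functions that exist today (a named tuple of vectors satisfies the interface out of the box):

```julia
using Tables

tbl = (a = [1, 2, 3], b = ["x", "y", "z"])  # a NamedTuple of vectors is a table

Tables.istable(tbl)          # true
cols = Tables.columns(tbl)   # column-oriented access
Tables.getcolumn(cols, :a)   # [1, 2, 3]

# Row-oriented access works on the same object, via the fallbacks described above:
for row in Tables.rows(tbl)
    @show Tables.getcolumn(row, :a) Tables.getcolumn(row, :b)
end
```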

DrChainsaw commented on May 18, 2024

A little late to the party, but here is a related effort w.r.t. point 3: https://github.com/DrChainsaw/NaiveGAflux.jl

I'm currently just using it to play around with very loosely restricted search spaces, so there is perhaps not much to offer yet in terms of simplifying tuning, as you'll most likely end up tuning the tuning parameters :)

If someone has a favourite search space and thinks it would be useful to add an MLJTuning plugin for it, I could probably make an effort to create a package, as long as it doesn't require me to sift through Python code of the same type as what is used in the examples in the OP.

ablaom commented on May 18, 2024

@DrChainsaw Looks like you're doing pretty cool things here, and it would be great to get some integration at some point. We have a basic implementation of MLJ's model interface for Flux, which we are polishing at the moment. Consider it just a POC for now (https://github.com/alan-turing-institute/MLJFlux.jl).

> If someone has a favourite search space

Well, I guess this is the million-dollar question. I'm no expert and would be happy to hear your own thoughts. Do you think, if you know, that what auto-keras and TPOT do is worth emulating?

DrChainsaw commented on May 18, 2024

Thanks @ablaom

When I started, I was perhaps naively thinking that the search space should not be the million-dollar question if you just had a framework which allowed for arbitrary mutation of the network architecture. Turns out things might be just a little bit more complicated than that, but I still haven't satisfied my curiosity on the subject. Once I do (or maybe before), I will certainly look into integration. From what I have seen, it looks like MLJ has answers for a lot of the API questions I have not yet wanted to address.

I understand the neural network stuff in MLJ is in quite an early phase, but do you foresee that packages of Flux-based tuning methods should depend on MLJFlux, or would they depend on the more basic packages if they want to integrate with the MLJ APIs?

I'm certainly no expert either, and I have not used TPOT or auto-keras myself. I think I have seen people claim some success with them. From just browsing the code on GitHub, it was not super easy to find out what the default search space looks like.

Reimplementing an existing and well-tested method for NAS is probably a safe choice though. I'm not looking into doing this at the moment, but if someone who reads this would like to tackle it, I'm happy to help make use of the NaiveXX packages. If the method relies on modification of existing network architectures, one should be able to save a lot of effort by using them, and I dare say that they are quite well tested.

ablaom commented on May 18, 2024

> I understand the neural network stuff in MLJ is in quite an early phase, but do you foresee that packages of Flux-based tuning methods should depend on MLJFlux, or would they depend on the more basic packages if they want to integrate with the MLJ APIs?

You would depend on MLJFlux. The idea of this package is simply to encapsulate any given set of instructions for building a supervised Flux neural network into an MLJ model (which is just a struct storing hyperparameters). In that way, all the MLJ meta-algorithms (evaluation and hyperparameter optimisation) can be applied. However, competing with the flexibility implied by this remit is a desire to make deep-learning models look and feel more like traditional models, to make them accessible to users outside of NLP and images. There is some friction here, because the different communities value different things, but we are going to have a stab at something and see what people think.
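
To make that concrete, here is a minimal sketch of the intended usage, written against the present-day MLJFlux API (a hedge: the package was only a proof of concept when this was written, so details may differ):

```julia
using MLJ, MLJFlux

# The model is just a struct of hyperparameters; `builder` encapsulates the
# instructions for constructing the underlying Flux chain.
clf = MLJFlux.NeuralNetworkClassifier(
    builder = MLJFlux.Short(n_hidden = 32),
    epochs = 10,
)

X, y = @load_iris
mach = machine(clf, X, y)

# All the MLJ meta-algorithms then apply, e.g. evaluation:
evaluate!(mach, resampling = CV(nfolds = 3), measure = cross_entropy)
```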
