
LearnBase.jl's Introduction

WARNING

This package has been discontinued. Most functionality has been moved to MLUtils.jl.

LearnBase.jl's People

Contributors

ahwillia, carlolucibello, darsnack, dfdx, evizero, joshday, juliatagbot, juliohm, neveritt, staticfloat, tbreloff, tk3369, wookay

LearnBase.jl's Issues

Long term re-org

Ref #48 (comment)

  1. Rename LearnBase.jl ➡️ LearnAPI.jl (similar to StatsAPI.jl and DataAPI.jl)
  2. Create MLBase.jl as an umbrella over MLDataPattern.jl, MLLabelUtils.jl, LossFunctions.jl, and PenaltyFunctions.jl

For (2), do we want an umbrella package or a consolidation of code? Right now, I prefer the former, so that people who only need one piece keep a small dependency footprint. But maybe after those packages get cleaned up, they will be trivially small.

Refactoring of codebase

Dear all,

In this issue I would like to discuss a refactoring of LearnBase.jl to accommodate more general problems under transfer learning settings. Before I can do this, I would like to get your feedback on a few minor changes. These changes should facilitate a holistic view of the interface, and should help shape the workflow that developers are expected to follow (see #28).

Below are a few suggested improvements that I would like to consider.

Suggested improvements

  1. Split the main LearnBase.jl file into smaller source files with more specific concepts. For example, I'd like to review the Cost interface in a separate file called costs.jl. Similarly, we could move the data orientation interface to a separate file orientation.jl and include these two files in LearnBase.jl.

  2. Can we get rid of all exports in the module? I understand that this module is intended for use by developers who would import LearnBase; const LB = LearnBase in their code. Exporting all the names in LearnBase.jl can lead to problems downstream: for example, LossFunctions.jl did not export the abstract SupervisedLoss type, so users of LossFunctions.jl also had to import LearnBase.jl just to get access to the name. My suggestion is to define the interface without exports, and then let each package in JuliaML export the relevant concepts (see the sketch after this list).

  3. The interface for learning models is currently spread across several Julia ecosystems. In most cases, there are two functions that developers need to implement (e.g. fit/predict, model/update, fit/transform). I would like to do a literature review of the existing approaches and generalize them to transfer learning settings. This generalization shouldn't force users to subtype their models from some Model type. A traits-based interface is ideal for developers who want to plug in their models after the fact, and for developers interested in fitting entire pipelines (e.g. AutoMLPipeline.jl).
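
To make (1) and (2) concrete, here is a minimal sketch; the package name MyLossPackage and the choice of SupervisedLoss as the re-exported name are illustrative assumptions, not a settled design:

# LearnBase itself would define the interface in smaller files (costs.jl,
# orientation.jl, ...) included from LearnBase.jl, and would export nothing.

# A downstream package then imports LearnBase and re-exports only the names
# its own users need:
module MyLossPackage

import LearnBase
const LB = LearnBase

using LearnBase: SupervisedLoss   # bring the name into scope...
export SupervisedLoss             # ...and re-export it for end users

end # module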

I would like to start addressing (1) and (2) in the following weeks. In order to address (3) I need more time to investigate and brainstorm a more general interface.

Remove dependencies: StatsBase and Distributions

My PR to move params from Distributions to StatsBase now has 8 commits and 22 comments...

I think this is as good a time as any to revisit the idea of going back to zero dependencies, which was our original intent when we created LearnBase. We essentially only have StatsBase and Distributions in our REQUIRE file for nobs and params/params!. Does anyone have a strong opinion on defining these ourselves and just not exporting them? Or other solutions?
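
The zero-dependency option could be as small as the sketch below, where LearnBase only owns the bare generic functions and downstream packages attach methods (the example method at the end is hypothetical):

# Inside the LearnBase module: define the generic functions ourselves,
# unexported, with no methods and no StatsBase/Distributions dependency.
function nobs end     # number of observations in a data container
function params end   # parameters of a model or distribution
function params! end  # in-place variant

# A downstream package would then add methods, e.g.
# LearnBase.nobs(x::AbstractMatrix) = size(x, 2)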

Add documentation

As we evolve the interface, it is quite important to have clear and precise documentation for the currently implemented concepts. The docstrings already do a great job of explaining the concepts, but we need official documentation built with Documenter.jl, sharing our motivations for the interface design and how these concepts interact.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Packages that plan to depend on LearnBase in the short term

To get this merged into METADATA, I would like to present a list of packages that we know will depend on LearnBase and its derivatives soon after registration.

Already in METADATA

Ready soon

In development

Please adapt the list accordingly. If you are not sure about a specific package, let us omit it from the list for now.

cc: @tbreloff @ahwillia @joshday

Importing StatsBase?

I somehow completely missed this discussion. I know we had a lengthy conversation at JuliaCon about why we weren't going to import StatsBase. Can we add here for posterity what changed?

ObsDim is not exported and out of sync with MLLabelUtils?

When trying to use MLDataPattern, I keep getting an error from MLLabelUtils that "ObsDim is not defined." This is because, after the refactor, LearnBase no longer exports ObsDim. It also no longer re-exports nobs from StatsBase.

Can we get LearnBase in sync with the other JuliaML packages? And what do we want exported and what do we leave out?
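
Until the export question is settled, one possible workaround (my assumption, not an agreed-upon fix) is for the downstream package to bring the names into scope explicitly:

# e.g. near the top of MLLabelUtils or MLDataPattern:
using LearnBase: ObsDim    # observation-dimension types, no longer exported
import StatsBase: nobs     # nobs is no longer re-exported by LearnBase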

What needs to be defined here

I transferred as little code to this package as I think is absolutely necessary.

Let the discussion on what is missing / should be changed / should be added begin.

To start off: I chose to define only the base class Loss here in LearnBase and will define ModelLoss and ParameterLoss in MLModels instead. The motivation is that anyone implementing something that falls into the ModelLoss/ParameterLoss framework probably needs to import MLModels anyway. For example, there are a lot of property functions there, such as isnemitski, that are useful or in some cases even needed to implement an algorithm properly (at least in some cases with SVMs).
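
Roughly, the split I have in mind looks like the sketch below; the concrete names and supertypes are open for discussion:

# In LearnBase: only the root abstract type.
abstract type Loss end

# In MLModels (which does import LearnBase): the more specific hierarchy,
# alongside property functions such as isnemitski.
abstract type ModelLoss <: LearnBase.Loss end
abstract type ParameterLoss <: LearnBase.Loss end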

Example using LearnBase

I know JuliaML is in a "get your hands dirty" state, but it would be really nice to have an explanation of how to build a model the "JuliaML" way. If it's not clear yet, maybe open a discussion on how we would like that to look.

The JuliaML ecosystem is currently focused on providing tools that can later be used to create models. Nevertheless, there is not a single example showing how to use those tools to build a model, or how to use the resulting model with the provided tools (MLDataUtils, for example).

I would like to do a couple of things:

  • port an implementation of a Perceptron in a way that is coherent with the ecosystem (a rough sketch follows this list).

  • help to build a simple tutorial showing how to use the tools (and the model) in a real (if tiny) example.
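
To make the first point concrete, here is a rough sketch of a Perceptron under a fit/predict convention, with observations along the last dimension as in MLDataPattern; the function names and keyword arguments are my own assumptions, not an agreed-upon JuliaML interface:

using LinearAlgebra: dot

struct Perceptron
    w::Vector{Float64}
    b::Float64
end

# Train on features X (nfeatures × nobs) and labels y in {-1, +1}.
function fit(::Type{Perceptron}, X::AbstractMatrix, y::AbstractVector; epochs = 10, lr = 1.0)
    w, b = zeros(size(X, 1)), 0.0
    for _ in 1:epochs, i in 1:size(X, 2)
        if y[i] * (dot(w, view(X, :, i)) + b) <= 0   # misclassified observation
            w .+= lr * y[i] .* view(X, :, i)
            b  += lr * y[i]
        end
    end
    return Perceptron(w, b)
end

predict(p::Perceptron, X::AbstractMatrix) = sign.(X' * p.w .+ p.b)

# Usage on toy data:
# X = randn(2, 100); y = sign.(X[1, :] .- X[2, :])
# model = fit(Perceptron, X, y)
# accuracy = sum(predict(model, X) .== y) / length(y)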

I am writing tutorials for myself, but I would like to produce something more readable, like the MLDataPattern documentation. I have no idea how to build that, though (is it Markdown? I see the .rst extension and have no idea how to start producing pretty documentation like that).

LearnBase equivalent for StatsBase.nobs

For MLDataUtils we need some kind of function that returns how many data points are in a dataset. Right now I use StatsBase.nobs there. It would be useful to introduce the function here, though, since I don't want packages to depend on MLDataUtils just for two function definitions.

As I see it we have three choices

  1. Make LearnBase depend on StatsBase, see #1
  2. Define a different nobs, which seems like a recipe for trouble (see the sketch after this list).
  3. Come up with a new function name.
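
To illustrate why (2) gets awkward: a separately defined LearnBase.nobs would be a different function from StatsBase.nobs, so a data container that wants to work in both ecosystems has to implement both (the type below is hypothetical):

import StatsBase, LearnBase   # assuming LearnBase defined its own nobs

struct MyDataset
    X::Matrix{Float64}
end

# Two unrelated generic functions that happen to share a name:
StatsBase.nobs(d::MyDataset) = size(d.X, 2)
LearnBase.nobs(d::MyDataset) = size(d.X, 2)

# A bare nobs call also becomes ambiguous if both packages export it
# and are loaded with `using`.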

Thoughts?

cc: @ahwillia @tbreloff @joshday

Distributions dependency

We need to find a better solution to the params problem in #14. LearnBase is not the place for such a heavy dependency. Maybe we can move it closer to the package that needs it, @tbreloff?

Properly display derived types of `AbstractSet` in REPL or IJulia

I encountered this issue when playing with Reinforce.jl, but the root cause is in LearnBase.

Issue description:
When a concrete LearnBase type derived from AbstractSet is displayed automatically in the REPL or IJulia (i.e., with no semicolon at the end), an error like the following is raised:

Error showing value of type LearnBase.DiscreteSet{Array{Int64,1}}:
ERROR: MethodError: no method matching iterate(::LearnBase.DiscreteSet{Array{Int64,1}})
...(a lot more, omitted here)

How to reproduce

julia> using LearnBase
julia> ds = LearnBase.DiscreteSet([1, 2, 3])

Note that if you suppress the output with a semicolon and then print it manually with print(ds), no error occurs and the printed result is LearnBase.DiscreteSet{Array{Int64,1}}([1, 2, 3]).

Reason for the error
The reason is that when a variable is displayed automatically in the REPL or IJulia, the display function is used; that is, if you print the output with display(ds), the same error is induced. For subtypes of AbstractSet, the default display method tries to iterate over each element, but Julia provides no default iterate implementation for AbstractSet. (see documentation)

Possible fixes
Two obvious fixes are possible:

  1. Add a Base.iterate method for each relevant type in LearnBase.
    Example: if we dispatch Base.iterate for DiscreteSet by iterating DiscreteSet.items, the displayed output of the above ds is
LearnBase.DiscreteSet{Array{Int64,1}} with 3 elements:
  1
  2
  3

However, an iteration method may make little sense for LearnBase.IntervalSet.

  2. Support display by implementing the MIME show method for relevant types. (see documentation)
    Example:
Base.show(io::IO, ::MIME"text/plain", set::LearnBase.IntervalSet) = print(io, "$(typeof(set)):\n  ", "lo = $(set.lo)\n  ", "hi = $(set.hi)\n")

will display a LearnBase.IntervalSet(-1.0, 1.0) as

LearnBase.IntervalSet{Float64}:
  lo = -1.0
  hi = 1.0

My suggestions are:

  • Implement proper MIME show methods for all subtypes of AbstractSet affected by this issue.
  • For those subtypes that have iteration semantics (like DiscreteSet), also implement Base.iterate. Another benefit is that, with iteration support, those types can be used naturally in a for loop.

I can make a PR if you think the above suggestion is reasonable.
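
For reference, a minimal sketch of fix (1) for DiscreteSet, assuming it stores its elements in an items field as described above:

# Forward iteration (and length, which the set display also uses) to the items field.
Base.iterate(s::LearnBase.DiscreteSet) = iterate(s.items)
Base.iterate(s::LearnBase.DiscreteSet, state) = iterate(s.items, state)
Base.length(s::LearnBase.DiscreteSet) = length(s.items)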

100% test coverage

@tbreloff there are a small handful of untested lines in your new code. Could you maybe add some tests for them when you have a chance?
