
PartialLeastSquaresRegressor.jl's Introduction

PartialLeastSquaresRegressor.jl

The PartialLeastSquaresRegressor.jl package provides Partial Least Squares regression methods, implementing the PLS1, PLS2, and Kernel PLS2 NIPALS algorithms. It is intended primarily for regression. However, it can also be applied to classification tasks: binarize the targets (one indicator column per class) and fit KPLS to the resulting multiple targets.
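For illustration, here is a minimal sketch of that binarize-then-regress idea on hypothetical toy data. The sketch assumes the KPLS MLJ interface accepts a multi-column indicator target table; the data and helper names are invented for the example:

using MLJBase, MLJModels
KPLSRegressor = @load KPLSRegressor pkg=PartialLeastSquaresRegressor

# hypothetical toy data: 3 classes, 4 continuous features
X = MLJBase.table(randn(120, 4))
labels = rand(["a", "b", "c"], 120)

# binarize: one indicator target column per class
classes = sort(unique(labels))
Y = MLJBase.table(hcat([Float64.(labels .== c) for c in classes]...); names=Symbol.(classes))

mach = machine(KPLSRegressor(), X, Y)
fit!(mach)

# predict the indicator columns and take the arg-max column as the class
scores = MLJBase.matrix(predict(mach, X))
yhat = [classes[argmax(row)] for row in eachrow(scores)]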

Install

The package can be installed with the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add PartialLeastSquaresRegressor

Or, equivalently, via the Pkg API:

julia> import Pkg; Pkg.add("PartialLeastSquaresRegressor")

Using

PartialLeastSquaresRegressor is used with the MLJ machine learning framework. Here are a few examples that show the package's functionality:

Example 1

using MLJBase, RDatasets, MLJModels
PLSRegressor = @load PLSRegressor pkg=PartialLeastSquaresRegressor

# loading data and selecting some features
data = dataset("datasets", "longley")[:, 2:5]

# unpacking the target
y, X = unpack(data, ==(:GNP))

# loading the model
regressor = PLSRegressor(n_factors=2)

# building a pipeline with scaling on data
pipe = Standardizer() |> regressor
model = TransformedTargetModel(pipe, transformer=Standardizer())

# a simple hold out
(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.7, rng=123, multi=true)

mach = machine(model, Xtrain, ytrain)

fit!(mach)
yhat = predict(mach, Xtest)

mae(yhat, ytest) |> mean
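As an alternative to the manual holdout, MLJ's evaluate can resample and score the composite model in one call; a minimal sketch mirroring the 70/30 split above:

# same model and data as above; Holdout reproduces the 70/30 split
evaluate(model, X, y,
         resampling=Holdout(fraction_train=0.7, rng=123),
         measure=mae)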

Example 2

using MLJBase, RDatasets, MLJTuning, MLJModels
KPLSRegressor = @load KPLSRegressor pkg=PartialLeastSquaresRegressor

# loading data and selecting some features
data = dataset("datasets", "longley")[:, 2:5]

# unpacking the target
y, X = unpack(data, ==(:GNP), colname -> true)

# loading the model
pls_model = KPLSRegressor()

# defining hyperparameters for tuning
r1 = range(pls_model, :width, lower=0.001, upper=100.0, scale=:log)

# attaching tune
self_tuning_pls_model = TunedModel(model = pls_model,
                                   resampling = CV(nfolds = 10),
                                   tuning = Grid(resolution = 100),
                                   range = [r1],
                                   measure = mae)

# putting into the machine
self_tuning_pls = machine(self_tuning_pls_model, X, y)

# fitting with tuning
fit!(self_tuning_pls, verbosity=0)

# getting the report
report(self_tuning_pls)
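The tuned machine can then be queried for the best hyperparameters found; a short sketch using standard MLJ accessors:

# best model found by the grid search
best = fitted_params(self_tuning_pls).best_model
@show best.width

# performance achieved by the best model during tuning
report(self_tuning_pls).best_history_entry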

What is Implemented

  • A fast linear algorithm for single targets (PLS1 - NIPALS)
  • A linear algorithm for multiple targets (PLS2 - NIPALS)
  • A nonlinear algorithm for multiple targets (Kernel PLS2 - NIPALS)
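
For intuition about what the NIPALS variants compute, here is a minimal, self-contained sketch of the PLS1 NIPALS loop on centered data; it is an illustration of the algorithm, not the package's internal code:

using LinearAlgebra

# PLS1 NIPALS on centered X (n×p) and centered y (length n), a factors
function pls1_nipals(X::Matrix{Float64}, y::Vector{Float64}, a::Int)
    p = size(X, 2)
    W = zeros(p, a); P = zeros(p, a); b = zeros(a)
    for j in 1:a
        w = X' * y; w ./= norm(w)   # weight vector
        t = X * w                   # latent scores
        tt = dot(t, t)
        P[:, j] = (X' * t) ./ tt    # loadings
        b[j] = dot(t, y) / tt       # per-factor regression coefficient
        X = X - t * P[:, j]'        # deflate X ...
        y = y - b[j] .* t           # ... and y
        W[:, j] = w
    end
    # overall coefficients on the centered data: β = W (PᵀW)⁻¹ b
    return W, P, b
end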

Model Description

  • PLSRegressor - linear PLS MLJ model (PLS1 for a single target, PLS2 for multiple targets)

    • n_factors::Int = 10 - The number of latent variables used to explain the data.
  • KPLSRegressor - kernel PLS MLJ model

    • n_factors::Int = 10 - The number of latent variables used to explain the data.
    • kernel::AbstractString = "rbf" - The kernel type; currently the nonlinear RBF (Gaussian) kernel.
    • width::AbstractFloat = 1.0 - The width of the RBF kernel.
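
For example, assuming the keyword names match the descriptions above, the models can be configured at construction:

regressor  = PLSRegressor(n_factors=3)                           # linear PLS (PLS1/PLS2)
kregressor = KPLSRegressor(n_factors=3, kernel="rbf", width=0.5) # kernel PLS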

References

  • PLS1 and PLS2 based on

  • A Kernel PLS2 based on

  • NIPALS: Nonlinear Iterative Partial Least Squares

    • Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In P.R. Krishnaiaah (Ed.). Multivariate Analysis. (pp.391-420) New York: Academic Press.
  • SIMPLS: more efficient, optimal result

    • Supports multivariate Y
    • De Jong, S. (1993). SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18: 251–263.

License

PartialLeastSquaresRegressor.jl is free software: you can redistribute it and/or modify it under the terms of the MIT "Expat" License. A copy of this license is provided in LICENSE.

PartialLeastSquaresRegressor.jl's People

Contributors

ablaom, edniemeyer, filipebraida, github-actions[bot], lalvim, zgornel

PartialLeastSquaresRegressor.jl's Issues

Info about upcoming removal of packages in the General registry

As described in https://discourse.julialang.org/t/ann-plans-for-removing-packages-that-do-not-yet-support-1-0-from-the-general-registry/ we are planning to remove packages that do not support Julia 1.0 from the General registry. This package has been detected as not supporting 1.0 and is thus slated for removal. The removal of packages from the registry will happen approximately a month after this issue is opened.

To transition to the new Pkg system using Project.toml, see https://github.com/JuliaRegistries/Registrator.jl#transitioning-from-require-to-projecttoml.
To then tag a new version of the package, see https://github.com/JuliaRegistries/Registrator.jl#via-the-github-app.

If you believe this package has erroneously been detected as not supporting 1.0 or have any other questions, don't hesitate to discuss it here or in the thread linked at the top of this post.

Output of fitted_params(mach) and report(mach) is a bit confusing

Hi, I have a few questions regarding the outputs of f = fitted_params(mach) and r = report(mach) on a trained machine:

  • What are the objects in the fitted_params(mach) output? After navigating this repo I found that the first element is f[1].W, the second is f[1].b, and the third is f[1].P; but this is not very clear and definitely not straightforward. It would be nice for the package docs to describe how to access these objects and what they are; there is a lot of inconsistency in terminology out there, and it is not easy to know what they actually represent.
  • Is report(mach) expected to return nothing?
  • What is the best way to report feature importance with the matrices available (W, b, P)? (A sketch follows below.) Alternatively, it would be nice to have some metric of feature importance after fitting a model. (https://learnche.org/pid/latent-variable-modelling/projection-to-latent-structures/coefficient-plots-in-pls)
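
As a partial answer to the feature-importance question above, here is a hedged sketch for a single-target fit, assuming (as found by the reporter) that f[1].W holds the weight vectors, f[1].P the loadings, and f[1].b the per-factor coefficients:

using LinearAlgebra

f = fitted_params(mach)
W, b, P = f[1].W, f[1].b, f[1].P

# standard PLS1 identity: β = W (PᵀW)⁻¹ b maps the per-factor
# coefficients b back to per-feature coefficients on the (scaled) inputs
β = W * ((P' * W) \ b)

# |β| as a crude feature-importance ranking
sortperm(abs.(vec(β)), rev=true)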

ERROR: Unsatisfiable requirements detected for package PLSRegressor [fba1ee03]:

julia> Pkg.add("PLSRegressor")
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package PLSRegressor [fba1ee03]:
PLSRegressor [fba1ee03] log:
├─possible versions are: 1.0.1 or uninstalled
├─restricted to versions * by an explicit requirement, leaving only versions 1.0.1
└─restricted by julia compatibility requirements to versions: uninstalled — no versions left
Stacktrace:
[1] #propagate_constraints!#61(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:1007
[2] propagate_constraints! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:948 [inlined]
[3] #simplify_graph!#121(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:1462
[4] simplify_graph! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:1462 [inlined] (repeats 2 times)
[5] resolve_versions!(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}, ::Nothing) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:371
[6] resolve_versions! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:315 [inlined]
[7] #add_or_develop#63(::Array{Base.UUID,1}, ::Symbol, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:1172
[8] #add_or_develop at ./none:0 [inlined]
[9] #add_or_develop#17(::Symbol, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:59
[10] #add_or_develop at ./none:0 [inlined]
[11] #add_or_develop#16 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:36 [inlined]
[12] #add_or_develop at ./none:0 [inlined]
[13] #add_or_develop#13 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:34 [inlined]
[14] #add_or_develop at ./none:0 [inlined]
[15] #add_or_develop#12(::Base.Iterators.Pairs{Symbol,Symbol,Tuple{Symbol},NamedTuple{(:mode,),Tuple{Symbol}}}, ::Function, ::String) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:33
[16] #add_or_develop at ./none:0 [inlined]
[17] #add#22 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:64 [inlined]
[18] add(::String) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:64
[19] top-level scope at none:0

PLS2 regressor worse than baseline for uniform random data

I tried to fit uninformative data (random, uniform, and centered) with PLS2, and the regressor was unable to learn the baseline (note that I am using the MLJ interface from #10).

regressor = PLS(n_factors=1)

X = rand(1000, 5) .- 0.5
y = rand(1000, 2) .- 0.5
plsmachine = MLJ.machine(regressor, MLJ.table(X), MLJ.table(y))
MLJ.fit!(plsmachine)

pred = MLJ.predict(plsmachine)
yhat = MLJ.matrix(pred)

# Error of the model
println(sum((y .- yhat).^2))  # 249.26
# Baseline prediction yhat = 0
println(sum(y.^2))  # 166.33

I would expect the error to be no worse for the PLS2 model here, since by learning all internal parameters to be zero it would always return [0, 0] as output and match the baseline prediction.

The scikit-learn version, on the other hand, works as expected. It doesn't quite learn all parameters to be zero, but the final error matches the baseline's.

Example 1 does not work

Example 1 in the README has issues:

julia> regressor = PLSRegressor(n_factors=2)
ERROR: UndefVarError: PLSRegressor not defined

This is easily fixed by adding
using PartialLeastSquaresRegressor: PLSRegressor
but maybe you really should export that type from the package?

Then a bit later this happens:

julia> pls_model = @pipeline Standardizer regressor target=Standardizer

ERROR: LoadError: The `@pipeline` macro is deprecated. For pipelines without target transformations use pipe syntax, as in `ContinuousEncoder() |> Standardizer() |> my_classifier`. For details and advanced optioins, query the `Pipeline` docstring. To wrap a supervised model in a target transformation, use `TransformedTargetModel`, as in `TransformedTargetModel(my_regressor, target=Standardizer())`
in expression starting at REPL[16]:1

We would love to use the package, but we need a working example.

In the long term, I recommend using Literate.jl to show working examples, because they are then tested as part of CI, whereas examples in a README are not. But in the short run, could you please fix the README? Here is one Literate example:
https://jefffessler.github.io/ScoreMatching.jl/dev/generated/examples/01-overview/
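
For reference, a version of the failing lines that should work under the suggestions above (explicit import plus the new pipeline syntax; recent MLJBase spells the keyword transformer=, while the deprecation message quotes the older target=):

using MLJBase, MLJModels, RDatasets
using PartialLeastSquaresRegressor: PLSRegressor

data = dataset("datasets", "longley")[:, 2:5]
y, X = unpack(data, ==(:GNP))

regressor = PLSRegressor(n_factors=2)

# new pipeline syntax replacing the deprecated @pipeline macro
pipe = Standardizer() |> regressor
model = TransformedTargetModel(pipe, transformer=Standardizer())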

check_constant_cols doesn't work on Adjoint

julia> PLSRegressor.fit(rand(3,3)', rand(3,3),nfactors=1)
PLSRegressor.PLS2Model{Float64}([-0.24304924768468017; -0.7652241603951436; 0.5961199942523806], [0.5000828333756707; 0.8403325816466939; 0.20918487513671646], [-1.014428813230312; -0.5140513192948163; 1.5284801325251283], [-0.5833408395500677; -0.6270781681635975; 0.6347112774551629], 1, [0.38126711006014835 0.7374788535159716 0.5745868740183244], [0.4833154788820418 0.3610395095534553 0.5343679246337361], [0.416986758761901 0.25893432832014807 0.23456254998816195], [0.4338424499764614 0.2871044566173785 0.1631054938683769], 3, 3, true)
julia> PLSRegressor.fit(rand(3,3), rand(3,3)',nfactors=1)
ERROR: MethodError: no method matching check_constant_cols(::Adjoint{Float64,Array{Float64,2}})
Closest candidates are:
  check_constant_cols(::Array{T,2}) where T<:AbstractFloat at /home/tyler/.julia/packages/PLSRegressor/w4SF2/src/utils.jl:31
  check_constant_cols(::Array{T,1}) where T<:AbstractFloat at /home/tyler/.julia/packages/PLSRegressor/w4SF2/src/utils.jl:32
Stacktrace:
 [1] fit(::Array{Float64,2}, ::Adjoint{Float64,Array{Float64,2}}; nfactors::Int64, copydata::Bool, centralize::Bool, kernel::String, width::Float64) at /home/tyler/.julia/packages/PLSRegressor/w4SF2/src/method.jl:27
 [2] top-level scope at REPL[445]:1
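
A possible workaround until the signatures are widened to AbstractMatrix is to materialize the adjoint first; a sketch:

# Matrix(...) turns the lazy Adjoint into a plain Array, which matches
# the existing check_constant_cols(::Array) methods
Y = Matrix(rand(3, 3)')
PLSRegressor.fit(rand(3, 3), Y, nfactors=1)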

Port to MLJ.jl

Hi and thank you for this package!
Have you considered porting it to MLJ.jl?

Regressors failing for some kinds of data

For some data sets, training fails. Given the MethodError thrown, this looks like a bug to me:

julia> using MLJBase, PartialLeastSquaresRegressor

julia> X, y = @load_boston;

julia> machine(PartialLeastSquaresRegressor.PLSRegressor(), X, y) |> fit!
[ Info: Training machine(PLSRegressor(n_factors = 1), ).
┌ Error: Problem fitting the machine machine(PLSRegressor(n_factors = 1), ). 
└ @ MLJBase ~/.julia/packages/MLJBase/wnJff/src/machines.jl:617
[ Info: Running type checks... 
[ Info: Type checks okay. 
ERROR: MethodError: no method matching check_constant_cols(::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
Closest candidates are:
  check_constant_cols(::Matrix{T}) where T<:AbstractFloat at /Users/anthony/.julia/packages/PartialLeastSquaresRegressor/OrIoJ/src/utils.jl:26
  check_constant_cols(::Vector{T}) where T<:AbstractFloat at /Users/anthony/.julia/packages/PartialLeastSquaresRegressor/OrIoJ/src/utils.jl:27
Stacktrace:
 [1] fit(m::PartialLeastSquaresRegressor.PLSRegressor, verbosity::Int64, X::NamedTuple{(:Crim, :Zn, :Indus, :NOx, :Rm, :Age, :Dis, :Rad, :Tax, :PTRatio, :Black, :LStat), NTuple{12, SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Y::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
   @ PartialLeastSquaresRegressor ~/.julia/packages/PartialLeastSquaresRegressor/OrIoJ/src/mlj_interface.jl:65
 [2] fit_only!(mach::Machine{PartialLeastSquaresRegressor.PLSRegressor, true}; rows::Nothing, verbosity::Int64, force::Bool)
   @ MLJBase ~/.julia/packages/MLJBase/wnJff/src/machines.jl:615
 [3] fit_only!
   @ ~/.julia/packages/MLJBase/wnJff/src/machines.jl:568 [inlined]
 [4] #fit!#52
   @ ~/.julia/packages/MLJBase/wnJff/src/machines.jl:683 [inlined]
 [5] fit!
   @ ~/.julia/packages/MLJBase/wnJff/src/machines.jl:681 [inlined]
 [6] |>(x::Machine{PartialLeastSquaresRegressor.PLSRegressor, true}, f::typeof(fit!))
   @ Base ./operators.jl:858
 [7] top-level scope
   @ REPL[162]:1
 [8] top-level scope
   @ ~/.julia/packages/CUDA/fAEDi/src/initialization.jl:52
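
A possible workaround while this stands is to copy the table's columns into plain arrays before binding the machine; a sketch, assuming X is a NamedTuple column table as in the report:

using MLJBase, PartialLeastSquaresRegressor

X, y = @load_boston

# @load_boston yields columns as SubArray views; copying them into plain
# Vector storage sidesteps the missing check_constant_cols method
Xdense = map(collect, X)   # NamedTuple of copied columns
ydense = collect(y)

machine(PartialLeastSquaresRegressor.PLSRegressor(), Xdense, ydense) |> fit!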
