juliastats / lasso.jl Goto Github PK
View Code? Open in Web Editor NEWLasso/Elastic Net linear and generalized linear models
License: Other
Lasso/Elastic Net linear and generalized linear models
License: Other
I can play a role as well.
More generally the Julia ecosystem would be a much better place if there was a way to coordinate all penalized linear models into one package including:
SubsetSelection (paper) by @jeanpauphilet & co.
MIP-Boost forthcoming
Global Search Regression
There should be a setting to set the maximum number of iterations for fit()
. Hard-to-solve problems may require more than the hard-coded 10000 maximum iterations.
Running the following
`
using Lasso
X = [1 2; 3 4]
y = zeros(2)
fit(LassoPath, X, y)
`
leads to the error message
ERROR: coordinate descent failed to converge in 10000 iterations at λ = NaN
Having done some debugging, the problem is the following: when computing the λ values that make up the path, λmax is computed to be 0, resulting in λ being a vector of NaNs. The correct thing to output is that the null model is the only Lasso solution.
The easiest way to fix this is to have a check for λ being a vector of all NaNs right after line 490 in Lasso.jl. This is not a good solution, though, since we're relying on the fact that λmax = 0 implies λ is a vector of all NaNs; λ may be a vector of all NaNs due to another reason.
I think the best solution will be to modify build_model so that, in addition to the 4 values it currently returns, it also returns λmax. Then, we can have a check for λmax == 0 in line 491.
Lasso.jl/src/cross_validation.jl
Lines 27 to 36 in 5571896
For example,
using Lasso
X = rand(100, 5)
y = rand([0., 1.], 100)
path = fit(LassoPath, X, y)
cross_validate_path(path)
produces an error UndefVarError: y not defined
. Should the default value for gen
be changed to gen=Kfold(length(path.m.rr.y), 10)
(or similar using a function to grab y
from the RegularizationPath
)?
In StatsBase 0.32.1 uweights(n)::AbstractVector{Int}
.
That causes this to error
Sometimes when solving a model or more commonly when doing cross validation I get a BoundsError
m = selectmodel(path, MinCV1se(path));
BoundsError: attempt to access 306-element Array{Float64,1} at index [307]
Stacktrace:
[1] getindex at ./array.jl:744 [inlined]
[2] update_coef! at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:189 [inlined]
[3] cycle!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Bool) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:246
[4] cdfit!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Symbol) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:598
[5] #fit!#14(::Bool, ::Int64, ::Float64, ::Float64, ::Bool, ::Symbol, ::Float64, ::typeof(fit!), ::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:827
[6] (::StatsBase.var"#kw##fit!")(::NamedTuple{(:irls_tol,),Tuple{Float64}}, ::typeof(fit!), ::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at ./none:0
[7] #fit#3(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Array{Float64,1}, ::Bool, ::Bool, ::Type, ::Bool, ::Float64, ::Bool, ::Int64, ::Nothing, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}, ::typeof(fit), ::Type{LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Normal{Float64}, ::IdentityLink) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/Lasso.jl:472
[8] (::StatsBase.var"#kw##fit")(::NamedTuple{(:λ, :wts, :offset, :standardize),Tuple{Array{Float64,1},Array{Float64,1},Array{Float64,1},Bool}}, ::typeof(fit), ::Type{LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Normal{Float64}, ::IdentityLink) at ./none:0
[9] #cross_validate_path#48(::Array{Float64,1}, ::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:standardize,),Tuple{Bool}}}, ::typeof(cross_validate_path), ::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,false,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}, ::Array{Float64,2}, ::Array{Float64,1}, ::MinCV1se) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/cross_validation.jl:73
[10] #cross_validate_path at ./none:0 [inlined]
[11] #cross_validate_path#49 at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/cross_validation.jl:113 [inlined]
[12] cross_validate_path at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/cross_validation.jl:109 [inlined]
[13] segselect at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/segselect.jl:109 [inlined]
[14] coef at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/segselect.jl:86 [inlined]
[15] selectmodel(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,false,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}, ::MinCV1se) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/segselect.jl:191
[16] top-level scope at In[1355]:1
The bounds error seems to occur randomly which given how cross validation works makes sense.
I'm using Julia 0.51 and Pkg.status("Lasso") = 0.1.0
y = x -> 1. + 0.*x + 3x^2 + 0.x^3 # choose true function
# training data
R = rand(20)*pi/2
Y = y.(R)
# select features
X = hcat(R.^0, R, R.^2, R.^3) # matrix with 4 columns
# L2 regression with no regularization works
X\Y # ~= [1.,0.,3.,0.]
# but Lasso does not
fit(LassoPath,X,Y)
> LassoPath (100 solutions for 4 predictors in 99 iterations):
λ pct_dev ncoefs
[1] NaN 0.0 0
[2] NaN NaN 4
[3] NaN NaN 4
[4] NaN NaN 4
[5] NaN NaN 4
[6] NaN NaN 4
# As λ = NaN, which is strange I tried specifying
λs = reverse([1e-7,1e-6,1e-4, 1e-3,1e-2,1e-1,1])
fit(LassoPath,X,Y; λ=λs)
> LassoPath (7 solutions for 4 predictors in 7 iterations):
λ pct_dev ncoefs
[1] 1.0 NaN 4
[2] 0.1 NaN 4
[3] 0.01 NaN 4
[4] 0.001 NaN 4
[5] 0.0001 NaN 4
[6] 1.0e-6 NaN 4
[7] 1.0e-7 NaN 4
....
Am I doing something wrong?
When fitting models with a large number of variables, Lasso.jl and GLMnet return different paths, and the difference grows bigger as the number of variables is bigger.
An example to illustrate this:
using Lasso, GLMNet, Statistics
# fits identical models in Lasso and GLMNet from mock data
# and returns the mean absolute difference of the betas of both models
function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
data = rand(nrow, ncol)
outcome = mean(data[:, 1:n_col_contributing], dims = 1)[:,1] .> rand(nrow)
presence_matrix = [1 .- outcome outcome]
l = Lasso.fit(LassoPath, data, outcome, Binomial())
g = GLMNet.glmnet(data, presence_matrix, Binomial())
lcoefs = Vector(l.coefs[:,end])
gcoefs = g.betas[:, end]
mean(abs, lcoefs .- gcoefs)
end
# 1000 records, 5 variables that all contribute to outcome
lasso_glmnet_dif(1000, 5, 5) # order of magnitude 1e-9
# 1000 records, 100 variables of which 5 contribute to the outcome
lasso_glmnet_dif(1000, 1000, 5) # around 0.05
The context for this problem is that I'm working on a julia implementation of maxnet, where a big-ish model matrix is generated (100s of columns) and a lasso path is used to select the most important ones.
This is with Lasso v0.0.3:
julia> fit(LassoPath, [1.0 2.0; 3.0 4.0], [0.0, 1.0], Binomial())
ERROR: `*` has no method matching *(::Array{Float64,1}, ::Array{Float64,1})
in fit at /home/user/.julia/v0.3/Lasso/src/Lasso.jl:283
in fit at /home/user/.julia/v0.3/Lasso/src/Lasso.jl:212
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass.
Tests fail.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("Lasso")' log
INFO: Cloning cache of Lasso from git://github.com/simonster/Lasso.jl.git
INFO: Installing ArrayViews v0.6.2
INFO: Installing Distributions v0.7.3
INFO: Installing GLM v0.4.6
INFO: Installing Lasso v0.0.3
INFO: Installing NumericFuns v0.2.3
INFO: Installing PDMats v0.3.3
INFO: Installing Reexport v0.0.2
INFO: Installing StatsBase v0.6.15
INFO: Package database updated
>>> 'Pkg.test("Lasso")' log
INFO: Computing test dependencies for Lasso...
INFO: Installing DataArrays v0.2.15
INFO: Installing DataFrames v0.6.6
INFO: Installing Docile v0.5.7
INFO: Installing FactCheck v0.2.7
INFO: Installing GLMNet v0.0.4
INFO: Installing GZip v0.2.15
INFO: Installing SortingAlgorithms v0.0.5
INFO: Building GLMNet
INFO: Testing Lasso
ERROR: LoadError: LoadError: LoadError: LoadError: LoadError: syntax: local declaration in global scope
in include at ./boot.jl:254
in include_from_node1 at ./loading.jl:133
in anonymous at no file:110
in include at ./boot.jl:254
in include_from_node1 at ./loading.jl:133
in reload_path at ./loading.jl:157
in _require at ./loading.jl:69
in require at ./loading.jl:55
in include at ./boot.jl:254
in include_from_node1 at ./loading.jl:133
in reload_path at ./loading.jl:157
in _require at ./loading.jl:69
in require at ./loading.jl:52
in include at ./boot.jl:254
in include_from_node1 at ./loading.jl:133
in include at ./boot.jl:254
in include_from_node1 at loading.jl:133
in process_options at ./client.jl:304
in _start at ./client.jl:404
while loading /home/vagrant/.julia/v0.4/DataFrames/src/abstractdataframe/show.jl, in expression starting on line 49
while loading /home/vagrant/.julia/v0.4/DataFrames/src/DataFrames.jl, in expression starting on line 79
while loading /home/vagrant/.julia/v0.4/GLMNet/src/GLMNet.jl, in expression starting on line 2
while loading /home/vagrant/.julia/v0.4/Lasso/test/lasso.jl, in expression starting on line 1
while loading /home/vagrant/.julia/v0.4/Lasso/test/runtests.jl, in expression starting on line 3
================================[ ERROR: Lasso ]================================
failed process: Process(`/home/vagrant/julia/bin/julia --check-bounds=yes --code-coverage=none --color=no /home/vagrant/.julia/v0.4/Lasso/test/runtests.jl`, ProcessExited(1)) [1]
================================================================================
INFO: Removing DataArrays v0.2.15
INFO: Removing DataFrames v0.6.6
INFO: Removing Docile v0.5.7
INFO: Removing FactCheck v0.2.7
INFO: Removing GLMNet v0.0.4
INFO: Removing GZip v0.2.15
INFO: Removing SortingAlgorithms v0.0.5
ERROR: Lasso had test errors
in error at ./error.jl:21
in test at pkg/entry.jl:746
in anonymous at pkg/dir.jl:31
in cd at file.jl:22
in cd at pkg/dir.jl:31
in test at pkg.jl:71
in process_options at ./client.jl:280
in _start at ./client.jl:404
>>> End of log
What' s the preferred way of saving model for later use? Currently using JLD2 save the model seems to save a file with a huge size that doesn't feel right for me.
age=sh["F2:F3778"]
sex=sh["D2:D3778"]
wage=sh["I2:I3778"]
work=sh["H2:H3778"]
family_size=sh["C2:C3778"]
data = (;age,sex,wage,work,family_size);
X=[ones(n) age sex family_size work]
beta = [:beta1,:beta2,:beta3,:beta4,:beta5,:beta6 ]
beta_ml=zeros(6,1)
xf = XLSX.readxlsx("HW1.xlsx")
n=3777
sh = xf["Sheet1"]
age=sh["F2:F3778"]
sex=sh["D2:D3778"]
wage=sh["I2:I3778"]
work=sh["H2:H3778"]
family_size=sh["C2:C3778"]
data = (;age,sex,wage,work,family_size);
X=[ones(n) age sex family_size work]
beta = [:beta1,:beta2,:beta3,:beta4,:beta5,:beta6 ]
beta_ml=zeros(6,1)
result=optimize(beta->sum(.-log(1/sqrt(2πbeta[6]^2)) .+ 1/(2 * beta[6]^2) .* ((wage.-beta[5]*(beta[1]+beta[2]*age+beta[3]*sex+beta[4]*family_size).*work.^((beta[1]+beta[2]*age+beta[3]*sex+beta[4]family_size).-1))./(beta[5])(1
.+(beta[1]+beta[2]*age+beta[3]*sex+beta[4]*family_size).log(work)).(work.^(beta[1]+beta[2]*age+beta[3]*sex+beta[4]*family_size).-1)).^2),[0.0,0.0,0.0,0.0,0.0,0.0])
beta_ml=Optim.minimizer(result)
Very nice package!
Unfortunately I'm getting an error when trying to run fused lasso or trend filtering (LassoPath works like a charm):
Example code:
using Lasso
n = 100;
p = 10000;
X = rand(n,p) - 0.5;
y = rand(n) - 0.5;
lambda = norm(X'*y, Inf)/2
f = fit(LassoPath, X,y);fit(FusedLasso, y, lambda)
ERROR: FusedLasso not definedfit(TrendFilter, y, 0, lambda)
ERROR: TrendFilter not defined
I apologize in advance if I'm doing something silly! (I'm still finding my sea-legs with Julia)
Hi, this package is causing problems for me because it restricts Reexport to 0.2 in latest tag. In master, this has been updated to version 1.
Could we please get a tag of this so that packages that depend on it can update properly?
julia> y = Vector{Union{Float64, Missing}}()
Union{Missing, Float64}[]
julia> append!(y, [1,2,3])
3-element Vector{Union{Missing, Float64}}:
1.0
2.0
3.0
julia> X = rand(3,3)
3×3 Matrix{Float64}:
0.830955 0.821036 0.235061
0.028721 0.326948 0.560919
0.905771 0.673137 0.986464
julia> fit(LassoPath, X, y)
ERROR: StackOverflowError:
Simon:
This is just a heads-up that I plan to create a PR to switch the package to Julia v0.6 constructions and change the minimum version of Julia. Please let me know if I am duplicating other efforts.
The following code leads to an error:
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.5.1 (2020-08-25)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
(@v1.5) pkg> status Lasso
Status `~/.julia/environments/v1.5/Project.toml`
[b4fcebef] Lasso v0.5.2
julia> using Lasso
julia> lambdas=[0.001];
julia> X=rand(100,13500);
julia> Y=rand(100);
julia> res=fit(LassoPath, X, Y;λ=lambdas)
ERROR: BoundsError: attempt to access 200-element Array{Float64,1} at index [201]
Stacktrace:
[1] getindex at ./array.jl:809 [inlined]
[2] update_coef! at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:189 [inlined]
[3] cycle!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Bool) at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:246
[4] cdfit!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Symbol) at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:598
[5] fit!(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}; verbose::Bool, cd_maxiter::Int64, cd_tol::Float64, irls_tol::Float64, stopearly::Bool, criterion::Symbol, minStepFac::Float64) at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:827
[6] fit(::Type{LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Normal{Float64}, ::IdentityLink; wts::Array{Float64,1}, offset::Array{Float64,1}, α::Float64, nλ::Int64, λminratio::Float64, λ::Array{Float64,1}, standardize::Bool, intercept::Bool, algorithm::Type{T} where T, dofit::Bool, irls_tol::Float64, randomize::Bool, maxncoef::Int64, penalty_factor::Nothing, standardizeω::Bool, fitargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}) at /home/peter/.julia/packages/Lasso/HtY0O/src/Lasso.jl:491
[7] top-level scope at REPL[6]:1
using Lasso
ERROR: LoadError: UndefVarError: wrkwt! not defined
in include_from_node1(::String) at ./loading.jl:488
in eval(::Module, ::Any) at ./boot.jl:234
in require(::Symbol) at ./loading.jl:415
while loading /home/jakub/.julia/v0.5/Lasso/src/Lasso.jl, in expression starting on line 18
I'm needing to run logistic regression, ideally with the ability to be multinomial. I noticed that you ( @simonster ) also built a wrapper for GLMnet; have you had any plans on adding this feature to either? If not, how difficult an extension do you think it would be? I'm willing to contribute, but not entirely sure where to start.
A PkgEval run for a Julia pull request which changes the generated numbers for rand(a:b)
indicates that the tests of this package might fail in Julia 1.5 (and on Julia current master branch).
Also, you might be interested in using the new StableRNGs.jl registered package, which provides guaranteed stable streams of random numbers across Julia releases.
Apologies if this is a false positive. Cf. https://github.com/JuliaCI/NanosoldierReports/blob/ab6676206b210325500b4f4619fa711f2d7429d2/pkgeval/by_hash/52c2272_vs_47c55db/logs/Lasso/1.5.0-DEV-87d2a04de3.log
It seems to me that the Lasso model-fitting API is not designed for the user to simply pick a particular choice of lambda and fit a model with that particular choice. Having looked at the source code I can see that the way to do this would be to fit a LassoPath and specify the chosen lambda value as a vector containing only that value. This is a bit clunky: I only want to fit one model yet I'm using the LassoPath API.
I suggest having a fit(LassoModel, some stuff; \lambda) method which allows to do this. It should fit a LassoPath only at the chosen lambda value, then construct the LassoModel object from this. We could also give it a default value of 1.0, like sklearn, so that the user can just run fit(LassoModel, X, y) to quickly fit a 'default' lasso model.
Maintainers of this project: If you give me the green light, I can go ahead and submit a PR.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
I'm attempting to use this in a multidimensional context in which the number of observations is smaller than the number of variables. I'm getting some errors I don't understand. Here's a demo:
julia> using Lasso, GLM
julia> m = fit(LassoModel, rand(5, 16), rand(5))
LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}(Error showing value of type LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}:
ERROR: DimensionMismatch("matrix is not square: dimensions are (5, 17)")
Stacktrace:
[1] inv!(::LinearAlgebra.Cholesky{Float64,Array{Float64,2}}) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/LinearAlgebra.jl:221
[2] inv at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/cholesky.jl:597 [inlined]
[3] invchol at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:200 [inlined]
[4] vcov(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:216
[5] stderror(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:224
[6] #coeftable#1(::Float64, ::typeof(coeftable), ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/lm.jl:184
[7] coeftable at /home/tim/.julia/packages/GLM/ceUyW/src/lm.jl:183 [inlined]
[8] show(::IOContext{REPL.Terminals.TTYTerminal}, ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:227
[9] _show_default(::IOContext{REPL.Terminals.TTYTerminal}, ::Any) at ./show.jl:347
[10] show_default at ./show.jl:330 [inlined]
[11] show at ./show.jl:327 [inlined]
[12] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}) at ./multimedia.jl:47
[13] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:132
[14] display(::REPL.REPLDisplay, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:136
[15] display(::Any) at ./multimedia.jl:323
[16] #invokelatest#1 at ./essentials.jl:709 [inlined]
[17] invokelatest at ./essentials.jl:708 [inlined]
[18] print_response(::IO, ::Any, ::Bool, ::Bool, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:156
[19] print_response(::REPL.AbstractREPL, ::Any, ::Bool, ::Bool) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:141
[20] (::REPL.var"#do_respond#38"{Bool,REPL.var"#48#57"{REPL.LineEditREPL,REPL.REPLHistoryProvider},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:719
[21] #invokelatest#1 at ./essentials.jl:709 [inlined]
[22] invokelatest at ./essentials.jl:708 [inlined]
[23] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/LineEdit.jl:2306
[24] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:1045
[25] run_repl(::REPL.AbstractREPL, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:201
[26] (::Base.var"#768#770"{Bool,Bool,Bool,Bool})(::Module) at ./client.jl:382
[27] #invokelatest#1 at ./essentials.jl:709 [inlined]
[28] invokelatest at ./essentials.jl:708 [inlined]
[29] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:366
[30] exec_options(::Base.JLOptions) at ./client.jl:304
[31] _start() at ./client.jl:460
julia> stderror(m)
ERROR: DimensionMismatch("matrix is not square: dimensions are (5, 17)")
Stacktrace:
[1] inv!(::LinearAlgebra.Cholesky{Float64,Array{Float64,2}}) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/LinearAlgebra.jl:221
[2] inv at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/cholesky.jl:597 [inlined]
[3] invchol at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:200 [inlined]
[4] vcov(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:216
[5] stderror(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:224
[6] #stderror#31 at /home/tim/.julia/packages/StatsModels/9cHwk/src/statsmodel.jl:28 [inlined]
[7] stderror(::LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}) at /home/tim/.julia/packages/StatsModels/9cHwk/src/statsmodel.jl:28
[8] top-level scope at REPL[3]:1
Note that the first error is triggered upon display, the computation of m
succeeds.
Contrast this with
julia> m = fit(LassoPath, rand(5, 16), rand(5))
LassoPath (79) solutions for 17 predictors in 461 iterations):
───────────────────────────────────
λ pct_dev ncoefs
───────────────────────────────────
[1] 0.0795677 0.0 0
[2] 0.0759513 0.0548964 2
[3] 0.0724992 0.112467 2
[4] 0.069204 0.164934 2
[5] 0.0660585 0.21274 2
[6] 0.0630561 0.256267 2
[7] 0.0601901 0.295989 2
[8] 0.0574543 0.332152 2
[9] 0.054843 0.365227 2
[10] 0.0523503 0.399728 3
[11] 0.0499709 0.451952 3
[12] 0.0476996 0.499779 4
[13] 0.0455316 0.544217 4
[14] 0.0434621 0.5847 4
[15] 0.0414867 0.621608 4
[16] 0.039601 0.655211 4
[17] 0.0378011 0.685854 4
[18] 0.036083 0.713747 4
[19] 0.034443 0.739184 4
[20] 0.0328775 0.762356 4
[21] 0.0313832 0.783462 4
[22] 0.0299567 0.802699 4
[23] 0.0285952 0.820229 4
[24] 0.0272955 0.836205 4
[25] 0.0260548 0.850761 4
[26] 0.0248706 0.86401 4
[27] 0.0237402 0.876093 4
[28] 0.0226612 0.887102 4
[29] 0.0216312 0.897126 4
[30] 0.020648 0.906278 4
[31] 0.0197095 0.914592 4
[32] 0.0188137 0.922191 4
[33] 0.0179586 0.9291 4
[34] 0.0171423 0.935397 4
[35] 0.0163632 0.941135 4
[36] 0.0156195 0.946364 4
[37] 0.0149095 0.951128 4
[38] 0.0142319 0.955474 4
[39] 0.013585 0.959426 4
[40] 0.0129676 0.963036 4
[41] 0.0123782 0.966317 4
[42] 0.0118156 0.969307 4
[43] 0.0112785 0.972036 4
[44] 0.0107659 0.974516 4
[45] 0.0102766 0.976783 4
[46] 0.00980948 0.978844 4
[47] 0.00936363 0.980727 4
[48] 0.00893803 0.982438 4
[49] 0.00853179 0.983995 4
[50] 0.008144 0.985419 4
[51] 0.00777385 0.986715 4
[52] 0.00742051 0.987895 4
[53] 0.00708324 0.98897 4
[54] 0.0067613 0.98995 4
[55] 0.00645398 0.990844 4
[56] 0.00616064 0.991655 4
[57] 0.00588063 0.992398 4
[58] 0.00561335 0.993073 4
[59] 0.00535821 0.99369 4
[60] 0.00511467 0.994249 4
[61] 0.0048822 0.99476 4
[62] 0.0046603 0.995225 4
[63] 0.00444848 0.99565 4
[64] 0.00424629 0.996036 4
[65] 0.00405329 0.996389 4
[66] 0.00386906 0.99671 4
[67] 0.00369321 0.997003 4
[68] 0.00352534 0.997269 4
[69] 0.00336511 0.997511 4
[70] 0.00321216 0.997732 4
[71] 0.00306617 0.997933 4
[72] 0.0029268 0.998118 4
[73] 0.00279378 0.998285 4
[74] 0.00266679 0.998437 4
[75] 0.00254558 0.998576 4
[76] 0.00242988 0.998703 4
[77] 0.00231944 0.998817 4
[78] 0.00221402 0.998923 4
[79] 0.00211339 0.999018 4
───────────────────────────────────
julia> m.coefs[:,end]
16-element SparseArrays.SparseVector{Float64,Int64} with 4 stored entries:
[2 ] = -0.112423
[3 ] = -0.103032
[6 ] = 0.218529
[9 ] = 0.263868
but for which it's not possible to compute the error:
julia> stderror(m)
ERROR: vcov is not defined for LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] vcov(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at /home/tim/.julia/packages/StatsBase/DyWPR/src/statmodels.jl:133
[3] stderror(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at /home/tim/.julia/packages/StatsBase/DyWPR/src/statmodels.jl:126
[4] top-level scope at REPL[6]:1
Hey people,
maybe this is related to #54 , but LassoPath
seems to struggle for small λ's
, in particular for λ=0
(least squares solution). Here is a MWE:
using Lasso
using StatsBase
using Random
##
Random.seed!(1234)
# time series length
N = 5
# create a random time series of length N
s = rand(N)
# create a Data Matrix
Θ = ones(N,15)
Θ[1:50] .= 0
shuffle!(Θ)
# λ 1e-1 works
Lf = fit(LassoPath, Θ, s; standardize = false, intercept = false, λ=[1e-1])
# λ 1e-2 fails
Lf = fit(LassoPath, Θ, s; standardize = false, intercept = false, λ=[1e-2])
# ERROR: BoundsError: attempt to access 10×15 Array{Float64,2} at index [11, 12]
In Matlab's lasso
this is no problem and λ=0
yields the least squares solution.
What am I missing here? Any help is appreciated,
Cheers
I am not able to set one given value of lambda for the Lasso fit.
I am not sure if the lambda parameter is one lambda per covariate, or just a vector of lambdas to try (if we want only one lambda, we would still need to put in an array, because an array is expected). Either way, I get an error.
using DataFrames, Lasso
dat = readtable("test.txt", header=true)
y = convert(Array{Float64,1},dat[:,1]) ##1st column: response
X = convert(Array{Float64,2},dat[:,2:end]) ##10,000 SNPs
f2=@time fit(LassoPath,X,y,Bernoulli(),LogitLink())
## 11.347873 seconds (6.19 M allocations: 363.455 MiB, 1.41% gc time)
##Bernoulli LassoPath (100 solutions for 10000 predictors in 1387 iterations):
## λ pct_dev ncoefs
## [1] 0.0893361 0.0 0
## [2] 0.0852756 0.00204586 1
## [3] 0.0813997 0.00391051 1
## [4] 0.0777 0.00655943 3
## [5] 0.0741684 0.0114346 5
## [6] 0.0707973 0.0185944 8
## [7] 0.0675795 0.0293905 12
## [8] 0.0645079 0.0414231 16
## [9] 0.0615759 0.0549327 19
## [10] 0.0587772 0.0695958 24
## ...
## [91] 0.00135783 0.975783 415
## [92] 0.00129611 0.976892 416
## [93] 0.0012372 0.977948 419
## [94] 0.00118097 0.978963 420
## [95] 0.00112729 0.979923 423
## [96] 0.00107606 0.980836 418
## [97] 0.00102715 0.981715 422
## [98] 0.000980462 0.98255 421
## [99] 0.000935899 0.983345 420
##[100] 0.000893361 0.984107 427
## using the smalles lambda:
f3=@time fit(LassoPath,X,y,Bernoulli(),LogitLink(),λ=[0.000893361],nλ=1)
#ERROR: maximum number of coefficients 1038 exceeded at λ = 0.000893361 (λωj=0.000893361)
#Stacktrace:
#[1] cycle!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Bool) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:237
#[2] cdfit!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Symbol) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:596
#[3] #fit!#23(::Bool, ::Int64, ::Int64, ::Float64, ::Float64, ::Symbol, ::Float64, ::Function, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:701
#[4] (::StatsBase.#kw##fit!)(::Array{Any,1}, ::StatsBase.#fit!, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at ./<missing>:0
#[5] #fit#1(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Array{Float64,1}, ::Bool, ::Bool, ::Type{T} where T, ::Bool, ::Float64, ::Bool, ::Int64, ::Void, ::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at /Users/Clauberry/.julia/v0.6/Lasso/src/Lasso.jl:338
#[6] (::StatsBase.#kw##fit)(::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at ./<missing>:0
## one lambda per covariate:
f3=@time fit(LassoPath,X,y,Bernoulli(),LogitLink(),λ=fill(0.000893361,size(X,2)),nλ=1)
#ERROR: maximum number of coefficients 1038 exceeded at λ = 0.000893361 (λωj=0.000893361)
#Stacktrace:
#[1] cycle!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Bool) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:237
#[2] cdfit!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Symbol) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:596
#[3] #fit!#23(::Bool, ::Int64, ::Int64, ::Float64, ::Float64, ::Symbol, ::Float64, ::Function, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:701
#[4] (::StatsBase.#kw##fit!)(::Array{Any,1}, ::StatsBase.#fit!, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at ./<missing>:0
#[5] #fit#1(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Array{Float64,1}, ::Bool, ::Bool, ::Type{T} where T, ::Bool, ::Float64, ::Bool, ::Int64, ::Void, ::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at /Users/Clauberry/.julia/v0.6/Lasso/src/Lasso.jl:338
#[6] (::StatsBase.#kw##fit)(::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at ./<missing>:0
The error is strange because for that exact value of lambda, the initial fit would include 427 covariates in the model (less than the limit of 1038).
I attach a test data file.
test.txt
Hello, I was trying to use v0.6.3, during precompilation, this shows up
ERROR: LoadError: MethodError: no method matching names(::Base.Broadcast.BroadcastFunction{Irrational{:log4π}})
Closest candidates are:
names(::Module; all, imported) at reflection.jl:98
I used an older version 0.6.2 and everything was fine, so I think maybe you have something that needs to be fixed in 0.6.3
Hi there, can anyone explain why this simple example gives the wrong solution?
using Lasso
N = 10;
X = eye(N)[:,1:(N-3)]
y = zeros(N);
y[1] = 1;
path = fit(LassoPath,X,y);
betas = collect(path.coefs[:,end]);
# the solution should be approximately betas = [1, 0, 0, 0, 0, 0, 0], but it's not!
The relative error norm(betas - [1, 0, 0, 0, 0, 0, 0]) = 0.029
which is almost 3%, so it is too high! Note that the least squares solution norm(X\y - [1, 0, 0, 0, 0, 0, 0]) = 0.0
, up too machine precision. Setting α=0.1
does not change the results.
Any ideas?
The URL of this package does not match that stored in METADATA.jl.
cc: @andreasnoack
currently Lasso could not be compiled on Version 0.7.0 (2018-08-08 06:46 UTC).
julia> using Lasso
[ Info: Precompiling Lasso [b4fcebef-c861-5a0f-a7e2-ba9dc32b180a]
ERROR: LoadError: UndefVarError: df not defined
Stacktrace:
[1] getproperty(::Module, ::Symbol) at ./sysimg.jl:13
[2] top-level scope at none:0
[3] include at ./boot.jl:317 [inlined]
[4] include_relative(::Module, ::String) at ./loading.jl:1038
[5] include(::Module, ::String) at ./sysimg.jl:29
[6] top-level scope at none:2
[7] eval at ./boot.jl:319 [inlined]
[8] eval(::Expr) at ./client.jl:399
[9] top-level scope at ./none:3
in expression starting at /Users/xua/.julia/packages/Lasso/GBbsl/src/Lasso.jl:360
ERROR: Failed to precompile Lasso [b4fcebef-c861-5a0f-a7e2-ba9dc32b180a] to /Users/xua/.julia/compiled/v0.7/Lasso/Djwuj.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] macro expansion at ./logging.jl:313 [inlined]
[3] compilecache(::Base.PkgId, ::String) at ./loading.jl:1185
[4] _require(::Base.PkgId) at ./logging.jl:311
[5] require(::Base.PkgId) at ./loading.jl:852
[6] macro expansion at ./logging.jl:311 [inlined]
[7] require(::Module, ::Symbol) at ./loading.jl:834
Lasso.jl/src/cross_validation.jl
Lines 27 to 36 in 5571896
So that parameters, such as irls_maxiter
can be set during CV. E.g.
function cross_validate_path(path::RegularizationPath; gen=Kfold(length(y),10),
select=:CVmin, fitargs...)
m = path.m
y = m.rr.y
offset = m.rr.offset
Xstandardized = m.pp.X
cross_validate_path(path, Xstandardized, y; gen=gen, select=select, offset=offset,
standardize=false, fitargs...)
end
A researcher came to me with a weird problem.
It seems that the first model they get from selectmodel
is invalid -- calling show
on ti makes it throw an error.
but if they call selectmodel
again
not only does the new model the get back show
correctly
but the first one starts working right too!
using Lasso
X = rand(9,8)
y = rand(9)
# this is to run this step: https://github.com/JuliaStats/Lasso.jl/blob/9d099f512afb6042b1094714e0ab280cf10f3863/src/segselect.jl#L276
# I set λ to a single variable to simplify things
path = fit(LassoPath, X,y; λ=[0.005])
julia> # select model to replicate this step: https://github.com/JuliaStats/Lasso.jl/blob/9d099f512afb6042b1094714e0ab280cf10f3863/src/segselect.jl#L278
m1 = selectmodel(path, MinAICc());
julia> show(m1)
ERROR: ArgumentError: FDist: the condition ν1 > zero(ν1) && ν2 > zero(ν2) is not satisfied.
Stacktrace:
[1] macro expansion at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/utils.jl:6 [inlined]
[2] Type at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/univariate/continuous/fdist.jl:30 [inlined]
[3] Type at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/univariate/continuous/fdist.jl:35 [inlined]
[4] Type at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/univariate/continuous/fdist.jl:37 [inlined]
[5] #coeftable#1(::Float64, ::typeof(coeftable), ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /Users/oxinabox/.julia/packages/GLM/ci8Mp/src/lm.jl:186
[6] coeftable at /Users/oxinabox/.julia/packages/GLM/ci8Mp/src/lm.jl:183 [inlined]
[7] show(::Base.TTY, ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /Users/oxinabox/.julia/packages/GLM/ci8Mp/src/linpred.jl:227
[8] show(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at ./show.jl:325
[9] top-level scope at REPL[51]:1
julia> m2 = selectmodel(path, MinAICc());
julia> show(m2)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}:
Coefficients:
──────────────────────────────────────────────────────────────────────
Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────
x1 -0.110053 10.0199 -0.0109834 0.9913 -20.0845 19.8644
x2 -0.116832 1.45942 -0.0800535 0.9364 -3.02614 2.79248
x3 0.478205 0.864751 0.552997 0.5820 -1.24565 2.20205
x4 -0.0911054 1.31273 -0.0694015 0.9449 -2.70799 2.52578
x5 0.0 1.72254 0.0 1.0000 -3.43382 3.43382
x6 0.285712 1.76747 0.161651 0.8720 -3.23767 3.80909
x7 0.114095 0.68929 0.165525 0.8690 -1.25998 1.48817
x8 0.380283 0.732736 0.51899 0.6054 -1.0804 1.84097
x9 -0.164557 1.53326 -0.107325 0.9148 -3.22106 2.89195
──────────────────────────────────────────────────────────────────────
julia> show(m1)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}:
Coefficients:
──────────────────────────────────────────────────────────────────────
Estimate Std. Error t value Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────
x1 -0.110053 10.0199 -0.0109834 0.9913 -20.0845 19.8644
x2 -0.116832 1.45942 -0.0800535 0.9364 -3.02614 2.79248
x3 0.478205 0.864751 0.552997 0.5820 -1.24565 2.20205
x4 -0.0911054 1.31273 -0.0694015 0.9449 -2.70799 2.52578
x5 0.0 1.72254 0.0 1.0000 -3.43382 3.43382
x6 0.285712 1.76747 0.161651 0.8720 -3.23767 3.80909
x7 0.114095 0.68929 0.165525 0.8690 -1.25998 1.48817
x8 0.380283 0.732736 0.51899 0.6054 -1.0804 1.84097
x9 -0.164557 1.53326 -0.107325 0.9148 -3.22106 2.89195
──────────────────────────────────────────────────────────────────────
I have verified this with selectmodel(path, MinBIC())
and selectmodel(path, MinCVmse(path, 5))
also.
Here is the line that is erroring in GLM.jl
GLM.jl: https://github.com/JuliaStats/GLM.jl/blob/master/src/lm.jl#L186
but I think this is not GLMs fault, I think it is right to be erroring?
and seems to lead to downgrading of other packages.
My entire design matrix cannot fit in memory.
Much like SGDRegressor.partial_fit()
in scikit-learn (see here), can I use Lasso.jl
to fit in epochs, feeding batches of data at a time? I realize that this will likely not converge to the same parameters as if the data could all fit in memory.
Maybe one way to train in batches would be to modify criterion
in fit()
to stop after a certain number of iterations?
julia> using DataFrames, Lasso
julia> df = DataFrame(x=randn(100), y=3randn(100) .+ 1);
julia> fit(LassoModel, @formula(x ~ 1 + y), df)
ERROR: ArgumentError: Model type LassoModel doesn't support intercept specified in formula x ~ 1 + y
Stacktrace:
[1] apply_schema(t::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, schema::StatsModels.Schema, Mod::Type{LassoModel})
@ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/schema.jl:288
[2] ModelFrame(f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}}; model::Type{LassoModel}, contrasts::Dict{Symbol, Any})
@ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:84
[3] kwcall(::NamedTuple{(:model, :contrasts), Tuple{UnionAll, Dict{Symbol, Any}}}, ::Type{ModelFrame}, f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}})
@ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:73
[4] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:85
[5] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame)
@ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:78
[6] top-level scope
@ REPL[7]:1
Why can I not manually specify an intercept like @formula(x ~ 1 + y)
? The documentation ?@formula
says:
1
,0
, and-1
indicate the presence (for1
) or absence (for0
and-1
) of an intercept column.
So 1
is a valid intercept specification, like in R. This @formula
also works in GLM.lm
.
If I write @formula(x ~ y)
, Lasso.jl will automatically fit a model with an intercept:
julia> fit(LassoModel, @formula(x ~ y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}
x ~ y
Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.
Coefficients:
──────────────
Estimate
──────────────
x1 -0.132743
x2 0.0497596
──────────────
I assume the first coefficient is the intercept and the second one is multiplied by y
, so the model is:
x = -0.132743 + 0.0497596 * y
So, intercepts are supported, but I can't manually specify that I want an intercept.
Let's fit a model without an intercept. I specify this with the 0
in @formula(x ~ 0 + y)
.
julia> fit(LassoModel, @formula(x ~ 0 + y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}
x ~ 0 + y
Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.
Coefficients:
──────────────
Estimate
──────────────
x1 -0.132743
x2 0.0497596
──────────────
It seems like the package ignored the zero in the formula, fitted an intercept -0.132743
anyway and produced the same model as above, even though the @formula
is different. R's glmnet
supports fitting without an intercept since 2013.
It would be nice if it were possible to specify the intercept in the formula.
Consider this line:
Lasso.jl/src/coordinate_descent.jl
Line 619 in ea74151
It would be much better and more kosher to use throw()
along with a proper exception.
As an alternative, which I actually think makes more sense, you could just throw a warning instead.
In the Lasso.md, a method of fit
is introduced as follows:
fit(GammaLassoPath, X, y, d=Normal(), l=canonicallink(d); ...)
fits a linear or generalized linear (concave) gamma lasso path given the design matrixX
and responsey
.
Is it possible to provide an example in the documentation or mention the acceptable shape of X
and y
? I.e., X
should be in size of y
should be a vector of length y
should be equal to the nrows(X)
or ncols(X)
.
P.S.: In my case study, I have a X
of size y
of length X
or X'
as the second argument. I expect to get a matrix of coefficients of size
Are there plans to create CRAN glmnet-like features and documentation?
Features that are helpful from glmnet:
Thanks
julia> fit(LassoPath, hcat(ones(1000), rand(1000, 7)), rand(1000))
ERROR: ArgumentError: start and stop must be finite, got NaN and NaN
Stacktrace:
[1] _linspace(::Float64, ::Float64, ::Int64) at ./twiceprecision.jl:342
[2] linspace(::Float64, ::Float64, ::Int64) at ./twiceprecision.jl:338
[3] computeλ(::Array{Float64,1}, ::Float64, ::Float64, ::Int64, ::Void) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:213
[4] build_model(::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Normal{Float64}, ::GLM.IdentityLink, ::Lasso.CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Void, ::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Void, ::Bool, ::Float64) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:242
[5] #fit#1(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Void, ::Bool, ::Bool, ::Type{T} where T, ::Bool, ::Float64, ::Bool, ::Int64, ::Void, ::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Normal{Float64}, ::GLM.IdentityLink) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:328
[6] fit(::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Normal{Float64}, ::GLM.IdentityLink) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:297 (repeats 2 times)
I'm not exactly sure what causes this but it seems like this error shouldn't occur. I would expect just a failure to converge?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.