juliastats / lasso.jl Goto Github PK

View Code? Open in Web Editor NEW

141.0 14.0 31.0 1.12 MB

Lasso/Elastic Net linear and generalized linear models

License: Other

Julia 97.90% R 2.10%

lasso julia regularized-linear-regression l1

lasso.jl's People

Contributors

Stargazers

Watchers

lasso.jl's Issues

Feature requests

Is there any chance your package can work w/ Regression Tables by @jmboehm?
Glmnet now includes Relaxed Lasso, which is known to have nice properties.
It would be awesome if your package included Relaxed Lasso :)

I can play a role as well.

More generally the Julia ecosystem would be a much better place if there was a way to coordinate all penalized linear models into one package including:
SubsetSelection (paper) by @jeanpauphilet & co.
MIP-Boost forthcoming
Global Search Regression

Document maximum iterations settings.

There should be a setting to set the maximum number of iterations for fit(). Hard-to-solve problems may require more than the hard-coded 10000 maximum iterations.

Package fails when response is zero vector.

Running the following

`
using Lasso

X = [1 2; 3 4]
y = zeros(2)
fit(LassoPath, X, y)
`

leads to the error message

ERROR: coordinate descent failed to converge in 10000 iterations at λ = NaN

Having done some debugging, the problem is the following: when computing the λ values that make up the path, λmax is computed to be 0, resulting in λ being a vector of NaNs. The correct thing to output is that the null model is the only Lasso solution.

The easiest way to fix this is to have a check for λ being a vector of all NaNs right after line 490 in Lasso.jl. This is not a good solution, though, since we're relying on the fact that λmax = 0 implies λ is a vector of all NaNs; λ may be a vector of all NaNs due to another reason.

I think the best solution will be to modify build_model so that, in addition to the 4 values it currently returns, it also returns λmax. Then, we can have a check for λmax == 0 in line 491.

cross_validate_path gen keyword arg does not work

Lasso.jl/src/cross_validation.jl

Lines 27 to 36 in 5571896

    
           # convenience function to use the same data as in original path 
        
           function cross_validate_path(path::RegularizationPath;                                  # fitted path 
        
                                                                gen=Kfold(length(y),10),           # folds generator (see MLBase) 
        
                                                                select=:CVmin)                     # :CVmin or :CV1se 
        
               m = path.m 
        
               y = m.rr.y 
        
               offset = m.rr.offset 
        
               Xstandardized = m.pp.X 
        
               cross_validate_path(path,Xstandardized,y;gen=gen,select=select,offset=offset,standardize=false) 
        
           end

For example,

using Lasso

X = rand(100, 5)
y = rand([0., 1.], 100)

path = fit(LassoPath, X, y)
cross_validate_path(path)

produces an error UndefVarError: y not defined. Should the default value for gen be changed to gen=Kfold(length(path.m.rr.y), 10) (or similar using a function to grab y from the RegularizationPath)?

Accept weights that are not `AbstractVector{<:AbstractFloat}`

In StatsBase 0.32.1 uweights(n)::AbstractVector{Int}.

That causes this to error

Intermittent Bounds Error

Sometimes when solving a model or more commonly when doing cross validation I get a BoundsError

m = selectmodel(path, MinCV1se(path));
BoundsError: attempt to access 306-element Array{Float64,1} at index [307]

Stacktrace:
 [1] getindex at ./array.jl:744 [inlined]
 [2] update_coef! at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:189 [inlined]
 [3] cycle!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Bool) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:246
 [4] cdfit!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Symbol) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:598
 [5] #fit!#14(::Bool, ::Int64, ::Float64, ::Float64, ::Bool, ::Symbol, ::Float64, ::typeof(fit!), ::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/coordinate_descent.jl:827
 [6] (::StatsBase.var"#kw##fit!")(::NamedTuple{(:irls_tol,),Tuple{Float64}}, ::typeof(fit!), ::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at ./none:0
 [7] #fit#3(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Array{Float64,1}, ::Bool, ::Bool, ::Type, ::Bool, ::Float64, ::Bool, ::Int64, ::Nothing, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}, ::typeof(fit), ::Type{LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Normal{Float64}, ::IdentityLink) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/Lasso.jl:472
 [8] (::StatsBase.var"#kw##fit")(::NamedTuple{(:λ, :wts, :offset, :standardize),Tuple{Array{Float64,1},Array{Float64,1},Array{Float64,1},Bool}}, ::typeof(fit), ::Type{LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Normal{Float64}, ::IdentityLink) at ./none:0
 [9] #cross_validate_path#48(::Array{Float64,1}, ::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:standardize,),Tuple{Bool}}}, ::typeof(cross_validate_path), ::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,false,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}, ::Array{Float64,2}, ::Array{Float64,1}, ::MinCV1se) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/cross_validation.jl:73
 [10] #cross_validate_path at ./none:0 [inlined]
 [11] #cross_validate_path#49 at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/cross_validation.jl:113 [inlined]
 [12] cross_validate_path at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/cross_validation.jl:109 [inlined]
 [13] segselect at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/segselect.jl:109 [inlined]
 [14] coef at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/segselect.jl:86 [inlined]
 [15] selectmodel(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,false,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}, ::MinCV1se) at /Users/lstagner/.julia/packages/Lasso/7IQvf/src/segselect.jl:191
 [16] top-level scope at In[1355]:1

The bounds error seems to occur randomly which given how cross validation works makes sense.

fit LassoPath currently not working?

I'm using Julia 0.51 and Pkg.status("Lasso") = 0.1.0

y = x ->  1. + 0.*x + 3x^2 + 0.x^3 # choose true function

# training data
R = rand(20)*pi/2
Y = y.(R)

# select features
X = hcat(R.^0, R, R.^2, R.^3) # matrix with 4 columns

# L2 regression with no regularization works
X\Y # ~= [1.,0.,3.,0.]

# but Lasso does not
fit(LassoPath,X,Y)
> LassoPath (100 solutions for 4 predictors in 99 iterations):
         λ pct_dev ncoefs
  [1]  NaN     0.0      0
  [2]  NaN     NaN      4
  [3]  NaN     NaN      4
  [4]  NaN     NaN      4
  [5]  NaN     NaN      4
  [6]  NaN     NaN      4

# As λ = NaN, which is strange I tried specifying  
λs = reverse([1e-7,1e-6,1e-4, 1e-3,1e-2,1e-1,1])
fit(LassoPath,X,Y; λ=λs)
> LassoPath (7 solutions for 4 predictors in 7 iterations):
          λ pct_dev ncoefs
[1]     1.0     NaN      4
[2]     0.1     NaN      4
[3]    0.01     NaN      4
[4]   0.001     NaN      4
[5]  0.0001     NaN      4
[6]  1.0e-6     NaN      4
[7]  1.0e-7     NaN      4

....

Am I doing something wrong?

Divergence from GLMnet when using a matrix with many variables

When fitting models with a large number of variables, Lasso.jl and GLMnet return different paths, and the difference grows bigger as the number of variables is bigger.

An example to illustrate this:

using Lasso, GLMNet, Statistics

# fits identical models in Lasso and GLMNet from mock data
# and returns the mean absolute difference of the betas of both models
function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
    data = rand(nrow, ncol)
    outcome = mean(data[:, 1:n_col_contributing], dims = 1)[:,1] .> rand(nrow)
    presence_matrix = [1 .- outcome outcome]

    l = Lasso.fit(LassoPath, data, outcome, Binomial())
    g = GLMNet.glmnet(data, presence_matrix, Binomial())

    lcoefs = Vector(l.coefs[:,end])
    gcoefs = g.betas[:, end]

    mean(abs, lcoefs .- gcoefs)
end

# 1000 records, 5 variables that all contribute to outcome
lasso_glmnet_dif(1000, 5, 5) # order of magnitude 1e-9
# 1000 records, 100 variables of which 5 contribute to the outcome
lasso_glmnet_dif(1000, 1000, 5) # around 0.05

The context for this problem is that I'm working on a julia implementation of maxnet, where a big-ish model matrix is generated (100s of columns) and a lasso path is used to select the most important ones.

fit for LassoPath results in no method matching *(::Array{Float64,1}, ::Array{Float64,1})

This is with Lasso v0.0.3:

julia> fit(LassoPath, [1.0 2.0; 3.0 4.0], [0.0, 1.0], Binomial())
ERROR: `*` has no method matching *(::Array{Float64,1}, ::Array{Float64,1})
 in fit at /home/user/.julia/v0.3/Lasso/src/Lasso.jl:283
 in fit at /home/user/.julia/v0.3/Lasso/src/Lasso.jl:212

[PkgEval] Lasso may have a testing issue on Julia 0.4 (2015-06-23)

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.

On Julia 0.4

On 2015-06-22 the testing status was Tests pass.
On 2015-06-23 the testing status changed to Tests fail.

This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.

Test log:

>>> 'Pkg.add("Lasso")' log
INFO: Cloning cache of Lasso from git://github.com/simonster/Lasso.jl.git
INFO: Installing ArrayViews v0.6.2
INFO: Installing Distributions v0.7.3
INFO: Installing GLM v0.4.6
INFO: Installing Lasso v0.0.3
INFO: Installing NumericFuns v0.2.3
INFO: Installing PDMats v0.3.3
INFO: Installing Reexport v0.0.2
INFO: Installing StatsBase v0.6.15
INFO: Package database updated

>>> 'Pkg.test("Lasso")' log
INFO: Computing test dependencies for Lasso...
INFO: Installing DataArrays v0.2.15
INFO: Installing DataFrames v0.6.6
INFO: Installing Docile v0.5.7
INFO: Installing FactCheck v0.2.7
INFO: Installing GLMNet v0.0.4
INFO: Installing GZip v0.2.15
INFO: Installing SortingAlgorithms v0.0.5
INFO: Building GLMNet
INFO: Testing Lasso
ERROR: LoadError: LoadError: LoadError: LoadError: LoadError: syntax: local declaration in global scope
 in include at ./boot.jl:254
 in include_from_node1 at ./loading.jl:133
 in anonymous at no file:110
 in include at ./boot.jl:254
 in include_from_node1 at ./loading.jl:133
 in reload_path at ./loading.jl:157
 in _require at ./loading.jl:69
 in require at ./loading.jl:55
 in include at ./boot.jl:254
 in include_from_node1 at ./loading.jl:133
 in reload_path at ./loading.jl:157
 in _require at ./loading.jl:69
 in require at ./loading.jl:52
 in include at ./boot.jl:254
 in include_from_node1 at ./loading.jl:133
 in include at ./boot.jl:254
 in include_from_node1 at loading.jl:133
 in process_options at ./client.jl:304
 in _start at ./client.jl:404
while loading /home/vagrant/.julia/v0.4/DataFrames/src/abstractdataframe/show.jl, in expression starting on line 49
while loading /home/vagrant/.julia/v0.4/DataFrames/src/DataFrames.jl, in expression starting on line 79
while loading /home/vagrant/.julia/v0.4/GLMNet/src/GLMNet.jl, in expression starting on line 2
while loading /home/vagrant/.julia/v0.4/Lasso/test/lasso.jl, in expression starting on line 1
while loading /home/vagrant/.julia/v0.4/Lasso/test/runtests.jl, in expression starting on line 3

================================[ ERROR: Lasso ]================================

failed process: Process(`/home/vagrant/julia/bin/julia --check-bounds=yes --code-coverage=none --color=no /home/vagrant/.julia/v0.4/Lasso/test/runtests.jl`, ProcessExited(1)) [1]

================================================================================
INFO: Removing DataArrays v0.2.15
INFO: Removing DataFrames v0.6.6
INFO: Removing Docile v0.5.7
INFO: Removing FactCheck v0.2.7
INFO: Removing GLMNet v0.0.4
INFO: Removing GZip v0.2.15
INFO: Removing SortingAlgorithms v0.0.5
ERROR: Lasso had test errors
 in error at ./error.jl:21
 in test at pkg/entry.jl:746
 in anonymous at pkg/dir.jl:31
 in cd at file.jl:22
 in cd at pkg/dir.jl:31
 in test at pkg.jl:71
 in process_options at ./client.jl:280
 in _start at ./client.jl:404


>>> End of log

preferred way of saving model

What' s the preferred way of saving model for later use? Currently using JLD2 save the model seems to save a file with a huge size that doesn't feel right for me.

matrix is not square,how to fix it?

age=sh["F2:F3778"]
sex=sh["D2:D3778"]
wage=sh["I2:I3778"]
work=sh["H2:H3778"]
family_size=sh["C2:C3778"]
data = (;age,sex,wage,work,family_size);
X=[ones(n) age sex family_size work]
beta = [:beta1,:beta2,:beta3,:beta4,:beta5,:beta6 ]
beta_ml=zeros(6,1)
xf = XLSX.readxlsx("HW1.xlsx")
n=3777
sh = xf["Sheet1"]
age=sh["F2:F3778"]
sex=sh["D2:D3778"]
wage=sh["I2:I3778"]
work=sh["H2:H3778"]
family_size=sh["C2:C3778"]
data = (;age,sex,wage,work,family_size);
X=[ones(n) age sex family_size work]
beta = [:beta1,:beta2,:beta3,:beta4,:beta5,:beta6 ]
beta_ml=zeros(6,1)
result=optimize(beta->sum(.-log(1/sqrt(2πbeta[6]^2)) .+ 1/(2 * beta[6]^2) .* ((wage.-beta[5]*(beta[1]+beta[2]*age+beta[3]*sex+beta[4]*family_size).*work.^((beta[1]+beta[2]*age+beta[3]*sex+beta[4]family_size).-1))./(beta[5])(1
.+(beta[1]+beta[2]*age+beta[3]*sex+beta[4]*family_size).log(work)).(work.^(beta[1]+beta[2]*age+beta[3]*sex+beta[4]*family_size).-1)).^2),[0.0,0.0,0.0,0.0,0.0,0.0])
beta_ml=Optim.minimizer(result)

fused lasso + trend filtering not working with Julia v 0.3.6 (2015-01-08 22:33 UTC) x86-64-linux-gnu

Very nice package!

Unfortunately I'm getting an error when trying to run fused lasso or trend filtering (LassoPath works like a charm):

Example code:

using Lasso
n = 100;
p = 10000;
X = rand(n,p) - 0.5;
y = rand(n) - 0.5;
lambda = norm(X'*y, Inf)/2
f = fit(LassoPath, X,y);

fit(FusedLasso, y, lambda)
ERROR: FusedLasso not defined

fit(TrendFilter, y, 0, lambda)
ERROR: TrendFilter not defined

I apologize in advance if I'm doing something silly! (I'm still finding my sea-legs with Julia)

please tag a new release for Pkg resolver

Hi, this package is causing problems for me because it restricts Reexport to 0.2 in latest tag. In master, this has been updated to version 1.

Could we please get a tag of this so that packages that depend on it can update properly?

StackOverflow with type Vector{Union{Float64, Missing}} even if no missing values present

julia> y = Vector{Union{Float64, Missing}}()
Union{Missing, Float64}[]

julia> append!(y, [1,2,3])
3-element Vector{Union{Missing, Float64}}:
 1.0
 2.0
 3.0

julia> X = rand(3,3)
3×3 Matrix{Float64}:
 0.830955  0.821036  0.235061
 0.028721  0.326948  0.560919
 0.905771  0.673137  0.986464

julia> fit(LassoPath, X, y)
ERROR: StackOverflowError:

PR to switch to Julia v0.6 constructions

Simon:

This is just a heads-up that I plan to create a PR to switch the package to Julia v0.6 constructions and change the minimum version of Julia. Please let me know if I am duplicating other efforts.

Bug: BoundsError: attempt to access 200-element Array{Float64,1} at index [201]

The following code leads to an error:


               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.1 (2020-08-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.5) pkg> status Lasso
Status `~/.julia/environments/v1.5/Project.toml`
  [b4fcebef] Lasso v0.5.2

julia> using Lasso

julia> lambdas=[0.001];

julia> X=rand(100,13500);

julia> Y=rand(100);

julia> res=fit(LassoPath, X, Y;λ=lambdas)
ERROR: BoundsError: attempt to access 200-element Array{Float64,1} at index [201]
Stacktrace:
 [1] getindex at ./array.jl:809 [inlined]
 [2] update_coef! at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:189 [inlined]
 [3] cycle!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Bool) at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:246
 [4] cdfit!(::Lasso.SparseCoefficients{Float64}, ::NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}, ::Float64, ::Symbol) at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:598
 [5] fit!(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}; verbose::Bool, cd_maxiter::Int64, cd_tol::Float64, irls_tol::Float64, stopearly::Bool, criterion::Symbol, minStepFac::Float64) at /home/peter/.julia/packages/Lasso/HtY0O/src/coordinate_descent.jl:827
 [6] fit(::Type{LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Normal{Float64}, ::IdentityLink; wts::Array{Float64,1}, offset::Array{Float64,1}, α::Float64, nλ::Int64, λminratio::Float64, λ::Array{Float64,1}, standardize::Bool, intercept::Bool, algorithm::Type{T} where T, dofit::Bool, irls_tol::Float64, randomize::Bool, maxncoef::Int64, penalty_factor::Nothing, standardizeω::Bool, fitargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{,Tuple{}}}) at /home/peter/.julia/packages/Lasso/HtY0O/src/Lasso.jl:491
 [7] top-level scope at REPL[6]:1

not loading with julia 0.5

using Lasso
ERROR: LoadError: UndefVarError: wrkwt! not defined
in include_from_node1(::String) at ./loading.jl:488
in eval(::Module, ::Any) at ./boot.jl:234
in require(::Symbol) at ./loading.jl:415
while loading /home/jakub/.julia/v0.5/Lasso/src/Lasso.jl, in expression starting on line 18

Multinomial Logistic regression

I'm needing to run logistic regression, ideally with the ability to be multinomial. I noticed that you ( @simonster ) also built a wrapper for GLMnet; have you had any plans on adding this feature to either? If not, how difficult an extension do you think it would be? I'm willing to contribute, but not entirely sure where to start.

possible test failure in upcoming Julia version 1.5

A PkgEval run for a Julia pull request which changes the generated numbers for rand(a:b) indicates that the tests of this package might fail in Julia 1.5 (and on Julia current master branch).

Also, you might be interested in using the new StableRNGs.jl registered package, which provides guaranteed stable streams of random numbers across Julia releases.

Apologies if this is a false positive. Cf. https://github.com/JuliaCI/NanosoldierReports/blob/ab6676206b210325500b4f4619fa711f2d7429d2/pkgeval/by_hash/52c2272_vs_47c55db/logs/Lasso/1.5.0-DEV-87d2a04de3.log

Fitting a LassoModel with a particular choice of lambda

It seems to me that the Lasso model-fitting API is not designed for the user to simply pick a particular choice of lambda and fit a model with that particular choice. Having looked at the source code I can see that the way to do this would be to fit a LassoPath and specify the chosen lambda value as a vector containing only that value. This is a bit clunky: I only want to fit one model yet I'm using the LassoPath API.

I suggest having a fit(LassoModel, some stuff; \lambda) method which allows to do this. It should fit a LassoPath only at the chosen lambda value, then construct the LassoModel object from this. We could also give it a default value of 1.0, like sklearn, so that the user can just run fit(LassoModel, X, y) to quickly fit a 'default' lasso model.

Maintainers of this project: If you give me the green light, I can go ahead and submit a PR.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

"matrix not square" error

I'm attempting to use this in a multidimensional context in which the number of observations is smaller than the number of variables. I'm getting some errors I don't understand. Here's a demo:

julia> using Lasso, GLM

julia> m = fit(LassoModel, rand(5, 16), rand(5))
LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}(Error showing value of type LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}:
ERROR: DimensionMismatch("matrix is not square: dimensions are (5, 17)")
Stacktrace:
 [1] inv!(::LinearAlgebra.Cholesky{Float64,Array{Float64,2}}) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/LinearAlgebra.jl:221
 [2] inv at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/cholesky.jl:597 [inlined]
 [3] invchol at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:200 [inlined]
 [4] vcov(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:216
 [5] stderror(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:224
 [6] #coeftable#1(::Float64, ::typeof(coeftable), ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/lm.jl:184
 [7] coeftable at /home/tim/.julia/packages/GLM/ceUyW/src/lm.jl:183 [inlined]
 [8] show(::IOContext{REPL.Terminals.TTYTerminal}, ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:227
 [9] _show_default(::IOContext{REPL.Terminals.TTYTerminal}, ::Any) at ./show.jl:347
 [10] show_default at ./show.jl:330 [inlined]
 [11] show at ./show.jl:327 [inlined]
 [12] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}) at ./multimedia.jl:47
 [13] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:132
 [14] display(::REPL.REPLDisplay, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:136
 [15] display(::Any) at ./multimedia.jl:323
 [16] #invokelatest#1 at ./essentials.jl:709 [inlined]
 [17] invokelatest at ./essentials.jl:708 [inlined]
 [18] print_response(::IO, ::Any, ::Bool, ::Bool, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:156
 [19] print_response(::REPL.AbstractREPL, ::Any, ::Bool, ::Bool) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:141
 [20] (::REPL.var"#do_respond#38"{Bool,REPL.var"#48#57"{REPL.LineEditREPL,REPL.REPLHistoryProvider},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:719
 [21] #invokelatest#1 at ./essentials.jl:709 [inlined]
 [22] invokelatest at ./essentials.jl:708 [inlined]
 [23] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/LineEdit.jl:2306
 [24] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:1045
 [25] run_repl(::REPL.AbstractREPL, ::Any) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:201
 [26] (::Base.var"#768#770"{Bool,Bool,Bool,Bool})(::Module) at ./client.jl:382
 [27] #invokelatest#1 at ./essentials.jl:709 [inlined]
 [28] invokelatest at ./essentials.jl:708 [inlined]
 [29] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:366
 [30] exec_options(::Base.JLOptions) at ./client.jl:304
 [31] _start() at ./client.jl:460

julia> stderror(m)
ERROR: DimensionMismatch("matrix is not square: dimensions are (5, 17)")
Stacktrace:
 [1] inv!(::LinearAlgebra.Cholesky{Float64,Array{Float64,2}}) at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/LinearAlgebra.jl:221
 [2] inv at /home/tim/src/julia-1/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/cholesky.jl:597 [inlined]
 [3] invchol at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:200 [inlined]
 [4] vcov(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:216
 [5] stderror(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /home/tim/.julia/packages/GLM/ceUyW/src/linpred.jl:224
 [6] #stderror#31 at /home/tim/.julia/packages/StatsModels/9cHwk/src/statsmodel.jl:28 [inlined]
 [7] stderror(::LassoModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}}) at /home/tim/.julia/packages/StatsModels/9cHwk/src/statsmodel.jl:28
 [8] top-level scope at REPL[3]:1

Note that the first error is triggered upon display, the computation of m succeeds.

Contrast this with

julia> m = fit(LassoPath, rand(5, 16), rand(5))
LassoPath (79) solutions for 17 predictors in 461 iterations):
───────────────────────────────────
               λ    pct_dev  ncoefs
───────────────────────────────────
 [1]  0.0795677   0.0             0
 [2]  0.0759513   0.0548964       2
 [3]  0.0724992   0.112467        2
 [4]  0.069204    0.164934        2
 [5]  0.0660585   0.21274         2
 [6]  0.0630561   0.256267        2
 [7]  0.0601901   0.295989        2
 [8]  0.0574543   0.332152        2
 [9]  0.054843    0.365227        2
[10]  0.0523503   0.399728        3
[11]  0.0499709   0.451952        3
[12]  0.0476996   0.499779        4
[13]  0.0455316   0.544217        4
[14]  0.0434621   0.5847          4
[15]  0.0414867   0.621608        4
[16]  0.039601    0.655211        4
[17]  0.0378011   0.685854        4
[18]  0.036083    0.713747        4
[19]  0.034443    0.739184        4
[20]  0.0328775   0.762356        4
[21]  0.0313832   0.783462        4
[22]  0.0299567   0.802699        4
[23]  0.0285952   0.820229        4
[24]  0.0272955   0.836205        4
[25]  0.0260548   0.850761        4
[26]  0.0248706   0.86401         4
[27]  0.0237402   0.876093        4
[28]  0.0226612   0.887102        4
[29]  0.0216312   0.897126        4
[30]  0.020648    0.906278        4
[31]  0.0197095   0.914592        4
[32]  0.0188137   0.922191        4
[33]  0.0179586   0.9291          4
[34]  0.0171423   0.935397        4
[35]  0.0163632   0.941135        4
[36]  0.0156195   0.946364        4
[37]  0.0149095   0.951128        4
[38]  0.0142319   0.955474        4
[39]  0.013585    0.959426        4
[40]  0.0129676   0.963036        4
[41]  0.0123782   0.966317        4
[42]  0.0118156   0.969307        4
[43]  0.0112785   0.972036        4
[44]  0.0107659   0.974516        4
[45]  0.0102766   0.976783        4
[46]  0.00980948  0.978844        4
[47]  0.00936363  0.980727        4
[48]  0.00893803  0.982438        4
[49]  0.00853179  0.983995        4
[50]  0.008144    0.985419        4
[51]  0.00777385  0.986715        4
[52]  0.00742051  0.987895        4
[53]  0.00708324  0.98897         4
[54]  0.0067613   0.98995         4
[55]  0.00645398  0.990844        4
[56]  0.00616064  0.991655        4
[57]  0.00588063  0.992398        4
[58]  0.00561335  0.993073        4
[59]  0.00535821  0.99369         4
[60]  0.00511467  0.994249        4
[61]  0.0048822   0.99476         4
[62]  0.0046603   0.995225        4
[63]  0.00444848  0.99565         4
[64]  0.00424629  0.996036        4
[65]  0.00405329  0.996389        4
[66]  0.00386906  0.99671         4
[67]  0.00369321  0.997003        4
[68]  0.00352534  0.997269        4
[69]  0.00336511  0.997511        4
[70]  0.00321216  0.997732        4
[71]  0.00306617  0.997933        4
[72]  0.0029268   0.998118        4
[73]  0.00279378  0.998285        4
[74]  0.00266679  0.998437        4
[75]  0.00254558  0.998576        4
[76]  0.00242988  0.998703        4
[77]  0.00231944  0.998817        4
[78]  0.00221402  0.998923        4
[79]  0.00211339  0.999018        4
───────────────────────────────────

julia> m.coefs[:,end]
16-element SparseArrays.SparseVector{Float64,Int64} with 4 stored entries:
  [2 ]  =  -0.112423
  [3 ]  =  -0.103032
  [6 ]  =  0.218529
  [9 ]  =  0.263868

but for which it's not possible to compute the error:

julia> stderror(m)
ERROR: vcov is not defined for LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] vcov(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at /home/tim/.julia/packages/StatsBase/DyWPR/src/statmodels.jl:133
 [3] stderror(::LassoPath{LinearModel{GLM.LmResp{Array{Float64,1}},CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Nothing}},Float64}) at /home/tim/.julia/packages/StatsBase/DyWPR/src/statmodels.jl:126
 [4] top-level scope at REPL[6]:1

Bug: Bounds error thrown for small regularization parameters

Hey people,

maybe this is related to #54 , but LassoPath seems to struggle for small λ's, in particular for λ=0 (least squares solution). Here is a MWE:

using Lasso
using StatsBase
using Random

##

Random.seed!(1234)
# time series length
N = 5
# create a random time series of length N
s = rand(N)

# create a Data Matrix
Θ = ones(N,15)
Θ[1:50] .= 0
shuffle!(Θ)

# λ 1e-1 works
Lf = fit(LassoPath, Θ, s; standardize = false, intercept = false, λ=[1e-1])

# λ 1e-2 fails
Lf = fit(LassoPath, Θ, s; standardize = false, intercept = false, λ=[1e-2])

# ERROR: BoundsError: attempt to access 10×15 Array{Float64,2} at index [11, 12]

In Matlab's lasso this is no problem and λ=0 yields the least squares solution.

What am I missing here? Any help is appreciated,

Cheers

cannot set a given lambda for fit

I am not able to set one given value of lambda for the Lasso fit.
I am not sure if the lambda parameter is one lambda per covariate, or just a vector of lambdas to try (if we want only one lambda, we would still need to put in an array, because an array is expected). Either way, I get an error.

using DataFrames, Lasso
dat = readtable("test.txt", header=true)
y = convert(Array{Float64,1},dat[:,1]) ##1st column: response
X = convert(Array{Float64,2},dat[:,2:end]) ##10,000 SNPs
f2=@time fit(LassoPath,X,y,Bernoulli(),LogitLink())
## 11.347873 seconds (6.19 M allocations: 363.455 MiB, 1.41% gc time)
##Bernoulli LassoPath (100 solutions for 10000 predictors in 1387 iterations):
##                 λ    pct_dev ncoefs
##  [1]    0.0893361        0.0      0
##  [2]    0.0852756 0.00204586      1
##  [3]    0.0813997 0.00391051      1
##  [4]       0.0777 0.00655943      3
##  [5]    0.0741684  0.0114346      5
##  [6]    0.0707973  0.0185944      8
##  [7]    0.0675795  0.0293905     12
##  [8]    0.0645079  0.0414231     16
##  [9]    0.0615759  0.0549327     19
## [10]    0.0587772  0.0695958     24
## ...
## [91]   0.00135783   0.975783    415
## [92]   0.00129611   0.976892    416
## [93]    0.0012372   0.977948    419
## [94]   0.00118097   0.978963    420
## [95]   0.00112729   0.979923    423
## [96]   0.00107606   0.980836    418
## [97]   0.00102715   0.981715    422
## [98]  0.000980462    0.98255    421
## [99]  0.000935899   0.983345    420
##[100]  0.000893361   0.984107    427

## using the smalles lambda:
f3=@time fit(LassoPath,X,y,Bernoulli(),LogitLink(),λ=[0.000893361],nλ=1)
#ERROR: maximum number of coefficients 1038 exceeded at λ = 0.000893361 (λωj=0.000893361)
#Stacktrace:
 #[1] cycle!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Bool) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:237
 #[2] cdfit!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Symbol) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:596
 #[3] #fit!#23(::Bool, ::Int64, ::Int64, ::Float64, ::Float64, ::Symbol, ::Float64, ::Function, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:701
 #[4] (::StatsBase.#kw##fit!)(::Array{Any,1}, ::StatsBase.#fit!, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at ./<missing>:0
 #[5] #fit#1(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Array{Float64,1}, ::Bool, ::Bool, ::Type{T} where T, ::Bool, ::Float64, ::Bool, ::Int64, ::Void, ::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at /Users/Clauberry/.julia/v0.6/Lasso/src/Lasso.jl:338
 #[6] (::StatsBase.#kw##fit)(::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at ./<missing>:0
 
## one lambda per covariate:
f3=@time fit(LassoPath,X,y,Bernoulli(),LogitLink(),λ=fill(0.000893361,size(X,2)),nλ=1)
#ERROR: maximum number of coefficients 1038 exceeded at λ = 0.000893361 (λωj=0.000893361)
#Stacktrace:
 #[1] cycle!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Bool) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:237
 #[2] cdfit!(::Lasso.SparseCoefficients{Float64}, ::Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Symbol) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:596
 #[3] #fit!#23(::Bool, ::Int64, ::Int64, ::Float64, ::Float64, ::Symbol, ::Float64, ::Function, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at /Users/Clauberry/.julia/v0.6/Lasso/src/coordinate_descent.jl:701
 #[4] (::StatsBase.#kw##fit!)(::Array{Any,1}, ::StatsBase.#fit!, ::Lasso.LassoPath{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},Lasso.NaiveCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}},Float64}) at ./<missing>:0
 #[5] #fit#1(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Array{Float64,1}, ::Bool, ::Bool, ::Type{T} where T, ::Bool, ::Float64, ::Bool, ::Int64, ::Void, ::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at /Users/Clauberry/.julia/v0.6/Lasso/src/Lasso.jl:338
 #[6] (::StatsBase.#kw##fit)(::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Bernoulli{Float64}, ::GLM.LogitLink) at ./<missing>:0

The error is strange because for that exact value of lambda, the initial fit would include 427 covariates in the model (less than the limit of 1038).

I attach a test data file.
test.txt

0.6.3 precompilation error

Hello, I was trying to use v0.6.3, during precompilation, this shows up
ERROR: LoadError: MethodError: no method matching names(::Base.Broadcast.BroadcastFunction{Irrational{:log4π}})
Closest candidates are:
names(::Module; all, imported) at reflection.jl:98
I used an older version 0.6.2 and everything was fine, so I think maybe you have something that needs to be fixed in 0.6.3

Low precision because of early stopping (when using automatic λ)

Hi there, can anyone explain why this simple example gives the wrong solution?

using Lasso

N = 10;
X = eye(N)[:,1:(N-3)]
y = zeros(N);
y[1] = 1;

path = fit(LassoPath,X,y);
betas = collect(path.coefs[:,end]);
# the solution should be approximately betas = [1, 0, 0, 0, 0, 0, 0], but it's not!

The relative error norm(betas - [1, 0, 0, 0, 0, 0, 0]) = 0.029 which is almost 3%, so it is too high! Note that the least squares solution norm(X\y - [1, 0, 0, 0, 0, 0, 0]) = 0.0, up too machine precision. Setting α=0.1 does not change the results.

Any ideas?

Error tagging new release

The URL of this package does not match that stored in METADATA.jl.
cc: @andreasnoack

package could not be compiled on Julia 0.7

currently Lasso could not be compiled on Version 0.7.0 (2018-08-08 06:46 UTC).

julia> using Lasso
[ Info: Precompiling Lasso [b4fcebef-c861-5a0f-a7e2-ba9dc32b180a]
ERROR: LoadError: UndefVarError: df not defined
Stacktrace:
 [1] getproperty(::Module, ::Symbol) at ./sysimg.jl:13
 [2] top-level scope at none:0
 [3] include at ./boot.jl:317 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1038
 [5] include(::Module, ::String) at ./sysimg.jl:29
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:319 [inlined]
 [8] eval(::Expr) at ./client.jl:399
 [9] top-level scope at ./none:3
in expression starting at /Users/xua/.julia/packages/Lasso/GBbsl/src/Lasso.jl:360
ERROR: Failed to precompile Lasso [b4fcebef-c861-5a0f-a7e2-ba9dc32b180a] to /Users/xua/.julia/compiled/v0.7/Lasso/Djwuj.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] macro expansion at ./logging.jl:313 [inlined]
 [3] compilecache(::Base.PkgId, ::String) at ./loading.jl:1185
 [4] _require(::Base.PkgId) at ./logging.jl:311
 [5] require(::Base.PkgId) at ./loading.jl:852
 [6] macro expansion at ./logging.jl:311 [inlined]
 [7] require(::Module, ::Symbol) at ./loading.jl:834

Allow additional fitting arguments to be passed in cross_validate_path

Lasso.jl/src/cross_validation.jl

Lines 27 to 36 in 5571896

    
           # convenience function to use the same data as in original path 
        
           function cross_validate_path(path::RegularizationPath;                                  # fitted path 
        
                                                                gen=Kfold(length(y),10),           # folds generator (see MLBase) 
        
                                                                select=:CVmin)                     # :CVmin or :CV1se 
        
               m = path.m 
        
               y = m.rr.y 
        
               offset = m.rr.offset 
        
               Xstandardized = m.pp.X 
        
               cross_validate_path(path,Xstandardized,y;gen=gen,select=select,offset=offset,standardize=false) 
        
           end

So that parameters, such as irls_maxiter can be set during CV. E.g.

function cross_validate_path(path::RegularizationPath; gen=Kfold(length(y),10),
                                   select=:CVmin, fitargs...)
    m = path.m
    y = m.rr.y
    offset = m.rr.offset
    Xstandardized = m.pp.X
    cross_validate_path(path, Xstandardized, y; gen=gen, select=select, offset=offset,
                        standardize=false, fitargs...)
end

The first time selectmodel is called it returns an invalid model, the second time fixes it

A researcher came to me with a weird problem.
It seems that the first model they get from selectmodel
is invalid -- calling show on ti makes it throw an error.
but if they call selectmodel again
not only does the new model the get back show correctly
but the first one starts working right too!

Setup

using Lasso

X = rand(9,8)
y = rand(9)

# this is to run this step: https://github.com/JuliaStats/Lasso.jl/blob/9d099f512afb6042b1094714e0ab280cf10f3863/src/segselect.jl#L276
#   I set λ to a single variable to simplify things
path = fit(LassoPath, X,y; λ=[0.005])

demo of errors

julia> # select model to replicate this step: https://github.com/JuliaStats/Lasso.jl/blob/9d099f512afb6042b1094714e0ab280cf10f3863/src/segselect.jl#L278
       m1 = selectmodel(path, MinAICc());

julia> show(m1)
ERROR: ArgumentError: FDist: the condition ν1 > zero(ν1) && ν2 > zero(ν2) is not satisfied.
Stacktrace:
 [1] macro expansion at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/utils.jl:6 [inlined]
 [2] Type at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/univariate/continuous/fdist.jl:30 [inlined]
 [3] Type at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/univariate/continuous/fdist.jl:35 [inlined]
 [4] Type at /Users/oxinabox/.julia/packages/Distributions/wRw5p/src/univariate/continuous/fdist.jl:37 [inlined]
 [5] #coeftable#1(::Float64, ::typeof(coeftable), ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /Users/oxinabox/.julia/packages/GLM/ci8Mp/src/lm.jl:186
 [6] coeftable at /Users/oxinabox/.julia/packages/GLM/ci8Mp/src/lm.jl:183 [inlined]
 [7] show(::Base.TTY, ::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at /Users/oxinabox/.julia/packages/GLM/ci8Mp/src/linpred.jl:227
 [8] show(::LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}) at ./show.jl:325
 [9] top-level scope at REPL[51]:1

julia> m2 = selectmodel(path, MinAICc());

julia> show(m2)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}:

Coefficients:
──────────────────────────────────────────────────────────────────────
      Estimate  Std. Error     t value  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────
x1  -0.110053    10.0199    -0.0109834    0.9913  -20.0845    19.8644
x2  -0.116832     1.45942   -0.0800535    0.9364   -3.02614    2.79248
x3   0.478205     0.864751   0.552997     0.5820   -1.24565    2.20205
x4  -0.0911054    1.31273   -0.0694015    0.9449   -2.70799    2.52578
x5   0.0          1.72254    0.0          1.0000   -3.43382    3.43382
x6   0.285712     1.76747    0.161651     0.8720   -3.23767    3.80909
x7   0.114095     0.68929    0.165525     0.8690   -1.25998    1.48817
x8   0.380283     0.732736   0.51899      0.6054   -1.0804     1.84097
x9  -0.164557     1.53326   -0.107325     0.9148   -3.22106    2.89195
──────────────────────────────────────────────────────────────────────

julia> show(m1)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}}:

Coefficients:
──────────────────────────────────────────────────────────────────────
      Estimate  Std. Error     t value  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────
x1  -0.110053    10.0199    -0.0109834    0.9913  -20.0845    19.8644
x2  -0.116832     1.45942   -0.0800535    0.9364   -3.02614    2.79248
x3   0.478205     0.864751   0.552997     0.5820   -1.24565    2.20205
x4  -0.0911054    1.31273   -0.0694015    0.9449   -2.70799    2.52578
x5   0.0          1.72254    0.0          1.0000   -3.43382    3.43382
x6   0.285712     1.76747    0.161651     0.8720   -3.23767    3.80909
x7   0.114095     0.68929    0.165525     0.8690   -1.25998    1.48817
x8   0.380283     0.732736   0.51899      0.6054   -1.0804     1.84097
x9  -0.164557     1.53326   -0.107325     0.9148   -3.22106    2.89195
──────────────────────────────────────────────────────────────────────

I have verified this with selectmodel(path, MinBIC()) and selectmodel(path, MinCVmse(path, 5))
also.

Here is the line that is erroring in GLM.jl
GLM.jl: https://github.com/JuliaStats/GLM.jl/blob/master/src/lm.jl#L186

but I think this is not GLMs fault, I think it is right to be erroring?

compats are outdated

and seems to lead to downgrading of other packages.

Lasso.jl for big-ish data

My entire design matrix cannot fit in memory.
Much like SGDRegressor.partial_fit() in scikit-learn (see here), can I use Lasso.jl to fit in epochs, feeding batches of data at a time? I realize that this will likely not converge to the same parameters as if the data could all fit in memory.

Maybe one way to train in batches would be to modify criterion in fit() to stop after a certain number of iterations?

Model type LassoModel doesn't support intercept

Code that doesn't work

julia> using DataFrames, Lasso

julia> df = DataFrame(x=randn(100), y=3randn(100) .+ 1);

julia> fit(LassoModel, @formula(x ~ 1 + y), df)
ERROR: ArgumentError: Model type LassoModel doesn't support intercept specified in formula x ~ 1 + y
Stacktrace:
 [1] apply_schema(t::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, schema::StatsModels.Schema, Mod::Type{LassoModel})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/schema.jl:288
 [2] ModelFrame(f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}}; model::Type{LassoModel}, contrasts::Dict{Symbol, Any})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:84
 [3] kwcall(::NamedTuple{(:model, :contrasts), Tuple{UnionAll, Dict{Symbol, Any}}}, ::Type{ModelFrame}, f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:x, :y), Tuple{Vector{Float64}, Vector{Float64}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/modelframe.jl:73
 [4] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:85
 [5] fit(::Type{LassoModel}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, ::DataFrame)
   @ StatsModels ~/.julia/packages/StatsModels/fK0P3/src/statsmodel.jl:78
 [6] top-level scope
   @ REPL[7]:1

Why can I not manually specify an intercept like @formula(x ~ 1 + y)? The documentation ?@formula says:

1, 0, and -1 indicate the presence (for 1) or absence (for 0 and -1) of an intercept column.

So 1 is a valid intercept specification, like in R. This @formula also works in GLM.lm.

Code that works

If I write @formula(x ~ y), Lasso.jl will automatically fit a model with an intercept:

julia> fit(LassoModel, @formula(x ~ y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

I assume the first coefficient is the intercept and the second one is multiplied by y, so the model is:

x = -0.132743 + 0.0497596 * y

So, intercepts are supported, but I can't manually specify that I want an intercept.

More code that doesn't work

Let's fit a model without an intercept. I specify this with the 0 in @formula(x ~ 0 + y).

julia> fit(LassoModel, @formula(x ~ 0 + y), df)
StatsModels.TableRegressionModel{LassoModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, MinAICc}, Matrix{Float64}}

x ~ 0 + y

Coefficients:
LassoModel using MinAICc(2) segment of the regularization path.

Coefficients:
──────────────
      Estimate
──────────────
x1  -0.132743
x2   0.0497596
──────────────

It seems like the package ignored the zero in the formula, fitted an intercept -0.132743 anyway and produced the same model as above, even though the @formula is different. R's glmnet supports fitting without an intercept since 2013.

It would be nice if it were possible to specify the intercept in the formula.

Versions

Julia v1.9-beta2
Lasso v0.7.0

Declare error exception or throw warning instead of just using error()

Consider this line:

Lasso.jl/src/coordinate_descent.jl

Line 619 in ea74151

    
           error("coordinate descent failed to converge in $maxiter iterations at λ = $λ")

It would be much better and more kosher to use throw() along with a proper exception.

As an alternative, which I actually think makes more sense, you could just throw a warning instead.

Improve documentation

In the Lasso.md, a method of fit is introduced as follows:

fit(GammaLassoPath, X, y, d=Normal(), l=canonicallink(d); ...)
fits a linear or generalized linear (concave) gamma lasso path given the design matrix X and response y.

Is it possible to provide an example in the documentation or mention the acceptable shape of X and y? I.e., X should be in size of $n\times m$, and y should be a vector of length $m$. I have a problem using the method since I don't know what is the acceptable size of these two arguments. I believe they should have something in common for example the length of y should be equal to the nrows(X) or ncols(X).

P.S.: In my case study, I have a X of size $d\times w$, and a y of length $d$. I don't know if I should pass X or X' as the second argument. I expect to get a matrix of coefficients of size $n\times d$ or $d\times n$.

upgrades and documentation

Are there plans to create CRAN glmnet-like features and documentation?

Features that are helpful from glmnet:

k-fold CV
plotting - cv MSE and lambda plots (including degrees of freedom on the top x-axis too).

Thanks

`linspace` error with bad inputs

julia> fit(LassoPath, hcat(ones(1000), rand(1000, 7)), rand(1000))
ERROR: ArgumentError: start and stop must be finite, got NaN and NaN
Stacktrace:
 [1] _linspace(::Float64, ::Float64, ::Int64) at ./twiceprecision.jl:342
 [2] linspace(::Float64, ::Float64, ::Int64) at ./twiceprecision.jl:338
 [3] computeλ(::Array{Float64,1}, ::Float64, ::Float64, ::Int64, ::Void) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:213
 [4] build_model(::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Normal{Float64}, ::GLM.IdentityLink, ::Lasso.CovarianceCoordinateDescent{Float64,true,Array{Float64,2},Lasso.RandomCoefficientIterator,Void}, ::Float64, ::Void, ::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Void, ::Bool, ::Float64) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:242
 [5] #fit#1(::Array{Float64,1}, ::Array{Float64,1}, ::Float64, ::Int64, ::Float64, ::Void, ::Bool, ::Bool, ::Type{T} where T, ::Bool, ::Float64, ::Bool, ::Int64, ::Void, ::Array{Any,1}, ::StatsBase.#fit, ::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Normal{Float64}, ::GLM.IdentityLink) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:328
 [6] fit(::Type{Lasso.LassoPath}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Normal{Float64}, ::GLM.IdentityLink) at /Users/ericdavies/.julia/v0.6/Lasso/src/Lasso.jl:297 (repeats 2 times)

I'm not exactly sure what causes this but it seems like this error shouldn't occur. I would expect just a failure to converge?

	# convenience function to use the same data as in original path
	function cross_validate_path(path::RegularizationPath; # fitted path
	gen=Kfold(length(y),10), # folds generator (see MLBase)
	select=:CVmin) # :CVmin or :CV1se
	m = path.m
	y = m.rr.y
	offset = m.rr.offset
	Xstandardized = m.pp.X
	cross_validate_path(path,Xstandardized,y;gen=gen,select=select,offset=offset,standardize=false)
	end