hazimehh / l0learn

Efficient Algorithms for L0 Regularized Learning

License: Other

R 29.78% Shell 0.01% M4 1.32% C++ 43.54% C 0.16% Python 0.19% Jupyter Notebook 24.99%

machine-learning sparse-regression compressed-sensing feature-selection regularization sparse-modeling l0-regularization l0learn

l0learn's Issues

Excessive memory usage in L0Learn.cvfit with large n, small p

Hello,

Thanks for putting these methods together in an R package. I'm interested in using it on a problem in which I have very large n (on the order of 2 million) and small p (on the order of 50).

When I attempt to run L0Learn.cvfit memory usage explodes (several hundred gigabytes are used before R crashes). What's more concerning is that the memory does not appear to be released when the function successfully finishes even after calling gc(). I'm assuming that there's some sort of memory leak in the C++ portion of your code.

I'm wondering if it's possibly related to your use of a .Call wrapper in line 39 of cvfit.R. Since you export the C++ function L0LearnCV you should be able to directly access the function without the need for this wrapper.

I've included a reproducible example below.

n=2e6
set.seed(1) 
X = matrix(rnorm(n*100),nrow=n,ncol=100)
B = c(rep(1,5),rep(0,95))
e = rnorm(n)
y = X%*%B + e

mod <- L0Learn.cvfit(x=X,y=y,penalty="L0L2",intercept=FALSE,nLambda=50)
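As a rough size check for the example above (a sketch, not a diagnosis of the reported behaviour): a dense double-precision matrix needs 8 bytes per entry, so X alone takes about 1.5 GiB. The helper name dense_gib is made up for this illustration.

```r
# Hypothetical helper: size of a dense n x p double matrix in GiB
# (8 bytes per entry, ignoring R's small per-object overhead).
dense_gib <- function(n, p) n * p * 8 / 2^30

x_gib <- dense_gib(2e6, 100)   # dimensions of X in the example above
print(round(x_gib, 2))
```

Memory use in the hundreds of gigabytes is therefore far more than a handful of copies of X would account for.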

L1L2 and L0L1L2

Hi,

Nice package! Thanks.

Are there plans to implement L1L2 (elastic net) and L0L1L2? (I do not know whether the latter makes sense.)

Clarification on the solution path

Is it correct to say that the L0Learn package can generate a solution path like glmnet does, for example using the code below?

# code to support the claim
data <- GenSynthetic(n=500,p=15,k=10,seed=1)
X = data$X
y = data$y
fit1 <- L0Learn.fit(X, y, penalty="L0", algorithm="CD", nLambda=50, scaleDownFactor = 1-1e-16)

fit1$lambda
coef(fit1, lambda = 0.058) # extract the coef at lambda = 0.058, though 0.058 is not among the computed lambda knots
coef(fit1, lambda = fit1$lambda[[1]])

Also, to compute the maximum lambda, is it correct that it uses equation (31) in the paper,

$$M^{i} = \frac{1}{2(1+2\lambda_2)} \max_{j \in S^c} \big( |\langle r, X_j \rangle| - \lambda_1 \big)^2$$

where S is the support of $\beta^{(i)}$? What does $S^c$ denote?
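A minimal sketch of the quantity in the equation as written, assuming $S^c$ is read as the complement of the support S (the columns currently outside the model) and r is the current residual vector. The function name and arguments are hypothetical, and this is not taken from the package source.

```r
# Hypothetical helper: evaluates
#   M = 1 / (2 * (1 + 2 * lambda2)) * max_{j in S^c} (|<r, X_j>| - lambda1)^2
# where S^c is ASSUMED to be the set of column indices outside `support`.
max_lambda_bound <- function(X, r, support, lambda1 = 0, lambda2 = 0) {
  s_c <- setdiff(seq_len(ncol(X)), support)        # complement of the support
  if (length(s_c) == 0) return(NA_real_)           # nothing left to add
  inner <- abs(as.numeric(crossprod(X[, s_c, drop = FALSE], r)))  # |<r, X_j>|
  max((inner - lambda1)^2) / (2 * (1 + 2 * lambda2))
}

# Toy data: identity design, so <r, X_j> is simply r[j].
X <- diag(3)
r <- c(3, -4, 1)
m <- max_lambda_bound(X, r, support = 2)   # columns 1 and 3 remain: max(9, 1) / 2
print(m)
```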

Setting seed in cvfit should not affect global seed

This line in L0Learn.cvfit() introduces unintended side effects:

L0Learn/R/cvfit.R

Line 40 in fdfe13b

set.seed(seed)

In particular, it affects the RNG seed in the global environment so that if, say, someone is carrying out a simulation and calls L0Learn.cvfit(), it can throw off the whole simulation.

Better to do something like this:

original_seed <- .GlobalEnv$.Random.seed
on.exit(.GlobalEnv$.Random.seed <- original_seed)
set.seed(seed)
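The snippet above assumes .Random.seed already exists in the global environment, which is not the case in a fresh session before any random number has been drawn. A slightly more defensive sketch of the same idea follows; the wrapper name with_local_seed is hypothetical and not part of L0Learn.

```r
# Hypothetical wrapper: evaluate `expr` under a fixed seed, then put the
# caller's RNG state back exactly as it was (including "no seed yet").
with_local_seed <- function(seed, expr) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  expr   # lazily evaluated argument: forced here, after set.seed()
}

# The outer random stream is the same with or without the inner call.
set.seed(42); draw_a <- runif(1)
set.seed(42); inner <- with_local_seed(1, rnorm(5)); draw_b <- runif(1)
print(identical(draw_a, draw_b))
```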

Bug in coef

Hi,

In a real-data setting I ran into this error:

Error in object$beta[[gammaindex]][, indices, drop = FALSE] : 
  incorrect number of dimensions

It can be reproduced with this example:

set.seed(1)

X = matrix(rnorm(1000), ncol = 20)
y = rnorm(50)

fit = L0Learn.fit(X, y, penalty = "L0", maxSuppSize = 2)

l = min(fit$lambda[[1]])
coef(fit, l)

Allow for observation weights to be able to fit e.g. identity link Poisson GLMs (with or without nonnegativity constraints)

Would it be possible to support an extra weights argument for observation weights? Right now I can fit an L0Learn model with observation weights by multiplying both the covariate matrix X and the dependent variable y by sqrt(weights). The only problem is that the automatic tuning of the lambda values then no longer works, because it should be based on the weighted root mean square error,

weighted.rmse <- function(observed, predicted, weights){
    sqrt(sum((observed-predicted)^2*weights)/sum(weights))
}

and for that you would need to know the original observed values y (as opposed to y*sqrt(weights)) and the weights, which requires a separate argument. It could default to weights=rep(1,nrow(X)).

I am asking because I would like to fit an identity link Poisson model. To approximate such a model I would use weights=1/variance=1/(y+1), which essentially amounts to a single step of the standard IRLS algorithm for fitting GLMs (http://bwlewis.github.io/GLM/), that is, approximating the GLM by a weighted least squares analysis. This also requires nonnegativity constraints, so I am using your branch with nonnegativity constraints. Is there any chance those nonnegativity constraints could be specified by an argument rather than requiring a separate branch with the same name? That is a bit of an annoyance to use.
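The rescaling trick described above can be checked on unpenalized least squares, where base R offers both routes. This is only a sketch on simulated data: it uses lm.wfit and lm.fit from the stats package rather than L0Learn, and the simulated coefficients are arbitrary.

```r
set.seed(1)
n <- 200; p <- 3
X <- matrix(runif(n * p), n, p)                        # nonnegative covariates
y <- rpois(n, lambda = as.numeric(X %*% c(5, 2, 1)))   # counts with an identity-link mean
w <- 1 / (y + 1)                                       # weights from the text: 1 / (y + 1)

# Route 1: weighted least squares with explicit observation weights.
beta_weighted <- lm.wfit(X, y, w)$coefficients

# Route 2: ordinary least squares after scaling row i of X and y[i] by sqrt(w[i]).
# X * sqrt(w) recycles sqrt(w) down the columns, so each row is scaled by its own weight.
beta_rescaled <- lm.fit(X * sqrt(w), y * sqrt(w))$coefficients

# The error on the original scale needs the unscaled y and the weights.
weighted.rmse <- function(observed, predicted, weights) {
  sqrt(sum((observed - predicted)^2 * weights) / sum(weights))
}
err <- weighted.rmse(y, as.numeric(X %*% beta_weighted), w)

print(all.equal(unname(beta_weighted), unname(beta_rescaled)))
```

The two coefficient vectors agree, but the error measure is only available when y and the weights are passed separately, which is the point of the request.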

Thanks a lot for considering this!

Control of number of lambda values in L0Learn.fit with algorithm = "CDPSI"

Is there a way to precisely control the number of lambda values in the fitting output?

When I run the L0Learn.fit call below, the number of lambda values and the convergence flags vary with the data set. For example, although nLambda = 50 is passed in the arguments, the lambda output does not contain 50 values.

Also, for some lambda values converged is FALSE. What does FALSE indicate?

fit = L0Learn.fit(x, y, penalty="L0", algorithm="CDPSI", nLambda=50, maxSuppSize = max(50,ncol(x),nrow(x)))
# I would like exactly 50 lambda values to be fitted with, so I used the combination of arguments nLambda and maxSuppSize above

# Some output from the fit above
$beta
     [,1]
[1,] ?   

$lambda
$lambda[[1]]
 [1] 8.76580e-02 5.16886e-02 3.16239e-02 2.40184e-02 1.41623e-02 7.46217e-03 4.65557e-03 3.72445e-03
 [9] 3.08345e-03 1.75683e-03 9.18510e-04 7.34808e-04 3.92448e-04 5.06849e-07 4.21151e-12 3.09958e-17
[17] 1.06243e-22 6.32945e-28 2.19079e-32 2.67651e-36 6.16308e-38 8.28652e-39 2.27869e-39 2.21473e-40
[25] 2.87073e-41 4.52241e-42 1.31918e-42 5.74783e-43 7.50083e-44 2.66276e-44 1.06700e-44 5.45064e-45
[33] 1.63368e-45 5.86101e-46 2.10022e-46 1.68946e-46 8.16552e-47 4.49811e-47 2.56446e-47 6.42752e-48
[41] 1.00124e-48 5.36576e-49

$a0
$a0[[1]]
 [1]   0.2592565   0.1708613  -0.2503479  23.7212300  85.8548066 112.7671879 113.1016513 134.6013585
 [9] 168.6231844 169.2002953 169.4254580 163.0167145 165.2270487 165.5510734 165.5420496 165.5611895
[17] 165.5631310 165.5631310 165.5634426 165.5631896 165.5632278 165.5632348 165.5632328 165.5632325
[25] 165.5632324 165.5632322 165.5632322 165.5632323 165.5632322 165.5632322 165.5632322 165.5632323
[33] 165.5632322 165.5632323 165.5632322 165.5632323 165.5632323 165.5632322 165.5632322 165.5632322
[41] 165.5632322 165.5632322

$converged
$converged[[1]]
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[17]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[33]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
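To see at a glance how many lambda values came back and which of them are flagged as not converged, something like the following can be used. It runs on a hand-built stand-in with the same lambda and converged layout as the output above (one list element per gamma value); the object fit_like and its shortened values are hypothetical.

```r
# Hypothetical stand-in for a fitted object: lists with one element per gamma.
fit_like <- list(
  lambda    = list(c(8.8e-02, 5.2e-02, 3.2e-02, 2.4e-02, 1.4e-02, 7.5e-03)),
  converged = list(c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE))
)

# Number of lambda values actually returned for each gamma.
n_lambda <- lengths(fit_like$lambda)

# Positions and values of the lambdas flagged as not converged.
bad_index  <- lapply(fit_like$converged, function(flag) which(!flag))
bad_lambda <- Map(function(lam, idx) lam[idx], fit_like$lambda, bad_index)

print(n_lambda)
print(bad_lambda)
```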

Control of lambda0 and the corresponding support size?

From the L0Learn.fit results, it looks like lambda0 is computed from the data, so the corresponding support size cannot be controlled directly; even the scaleDownFactor argument does not help. Is there a way to control the support size, say from 1 to 20? Or is this planned for a future update, or not considered a big concern?

Thanks

# code used as an example
data <- GenSynthetic(n=500,p=1000,k=50,seed=1)
X = data$X
y = data$y
fit1 <- L0Learn.fit(X, y, penalty="L0", algorithm="CD", nLambda=50, scaleDownFactor = 1-1e-16)
print(fit1)

Outputs of L0Learn.cvfit not documented

Thanks for this great package! I realize this is likely work in progress, but I could not work out from the current documentation what to do with the output of L0Learn.cvfit. It seems to have some undocumented values (like cvMeans and cvSDs), and I wasn't sure how to extract the "optimal" value of lambda.
Thanks for any help/pointers!
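A sketch of how an "optimal" lambda could be read off, assuming cvMeans holds one vector (or one-column matrix) of cross-validation errors per gamma value and fit$lambda holds the matching lambda sequences, which is the layout other snippets on this page index into. The helper pick_best and the toy numbers are hypothetical; the example runs on plain lists rather than a real cvfit object.

```r
# Hypothetical helper: find the (gamma, lambda) pair with the smallest CV error.
#   cv_means: list with one numeric vector of CV errors per gamma
#   lambda:   list with the matching lambda sequences
pick_best <- function(cv_means, lambda) {
  best_per_gamma <- vapply(cv_means, function(m) min(as.numeric(m)), numeric(1))
  g <- which.min(best_per_gamma)               # best gamma index
  l <- which.min(as.numeric(cv_means[[g]]))    # best lambda index within that gamma
  list(gamma_index = g, lambda_index = l,
       lambda = lambda[[g]][l], cv_error = best_per_gamma[[g]])
}

# Toy stand-ins for cvfit$cvMeans and cvfit$fit$lambda.
cv_means <- list(c(3.0, 2.0, 2.5), c(4.0, 1.5, 1.8))
lambda   <- list(c(0.10, 0.05, 0.01), c(0.20, 0.10, 0.05))

best <- pick_best(cv_means, lambda)
print(best)
```

With a real fit, the same call would presumably be pick_best(cvfit$cvMeans, cvfit$fit$lambda), under the layout assumption stated above.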

L0 and L0L2 results from the CDPSI(1) method are confusing

The L0 and L0L2 results from the CDPSI(1) are confusing to me. It could be either my usage of the package or my poor understanding of the L0 and L0L2 methods. So I am posting to ask for some help in order to avoid mistakenly using the package.

First, please correct me if I am wrong: I used the arguments algorithm="CDPSI" and maxSwaps=1 to run the CDPSI(1) method mentioned in the paper.

fit_l0 = L0Learn.fit(x, y, penalty="L0", algorithm="CDPSI", nLambda=50, maxIters = 1e4, maxSwaps=1, maxSuppSize = max(50,ncol(x),nrow(x)))

The graphs below compare the number of nonzero coefficients and the risk for the L0 and L0L2 penalties in the package. The tuning parameters are selected on validation data sets.

In this setting, the plots show that L0L2 has a larger risk than L0, and that L0L2 sometimes selects fewer nonzero coefficients than L0. Why would L0L2 select fewer nonzero coefficients?

(plots updated based on comments)

R code to generate the plots (updated based on comments):

# https://github.com/hazimehh/L0Learn/issues/31
library(L0Learn)

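# Risk of each fitted model (one per column of betahat):
#   (betahat - beta)' Sigma (betahat - beta) + intercept^2
# The first row of betahat holds the intercepts.
compute_risk = function(betahat, beta_ture, Sigma){
  betahat0 = betahat[1,]
  betahat = betahat[-1, , drop = FALSE]  # keep a matrix even with a single column

  delta = betahat - beta_ture
  risk = diag(t(delta) %*% Sigma %*% delta)
  risk = risk + betahat0^2
  return (risk)
}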
l0l2_nonzero_path = l0_nonzero_path = l0l2_risk_path = l0_risk_path = c()

rep = 30

for (i in 1:rep) {
  # simulate x and y
  nval = n = 100
  p = 10
  rho = 0.35
  snr=3;
  s=5
  x = matrix(rnorm(n*p),n,p)
  xval = matrix(rnorm(n*p),n,p)

  # Introduce autocorrelation
  if (rho != 0) {
    inds = 1:p
    Sigma = rho^abs(outer(inds, inds, "-"))
    obj = svd(Sigma)
    Sigma.half = obj$u %*% (sqrt(diag(obj$d))) %*% t(obj$v)
    x = x %*% Sigma.half
    xval = xval %*% Sigma.half
  }

  beta = rep(0,p)
  s = min(s,p)
  beta[1:s] = 1
  # set snr
  vmu = as.numeric(t(beta) %*% beta)
  sigma = sqrt(vmu/snr)
  y = as.numeric(x %*% beta + rnorm(n)*sigma)
  yval = as.numeric(xval %*% beta + rnorm(nval)*sigma)

  # fit_L0
  fit_l0 = L0Learn.fit(x, y, penalty="L0", algorithm="CDPSI", nLambda=50, maxIters = 1e4, maxSuppSize = min(50,ncol(x),nrow(x)))
  fit_l0$lambda
  fit_l0$converged
  betahat_l0 = as.matrix(coef(fit_l0))
  pred = predict(fit_l0, xval)
  error_val_l0 = colMeans(as.matrix((pred - matrix(yval,ncol=1))^2))

  id = which.min(error_val_l0) # selection based on validation error

  nonzero = colSums(betahat_l0!=0)[id]
  l0_nonzero_path = c(l0_nonzero_path, nonzero)

  risk_val_l0 = compute_risk (betahat = betahat_l0, beta_ture = beta, Sigma)
  l0_risk_path = c(l0_risk_path, risk_val_l0[id])


  # fit_L0L2
  # nGamma = 10, gammaMax = 10, gammaMin = 1e-04,
  fit_l0l2 = L0Learn.fit(x, y, penalty="L0L2", algorithm="CDPSI", nLambda=50, maxIters = 1e4, maxSuppSize = min(50,ncol(x),nrow(x)))
  betahat_l0l2 = as.matrix(coef(fit_l0l2))
  pred_l0l2 = predict(fit_l0l2, xval)
  error_val_l0l2 = colMeans(as.matrix((pred_l0l2 - matrix(yval,ncol=1))^2))

  id = which.min(error_val_l0l2) # selection based on validation error

  nonzero = colSums(betahat_l0l2!=0)[id]
  l0l2_nonzero_path = c(l0l2_nonzero_path, nonzero)

  risk_val_l0l2 = compute_risk (betahat = betahat_l0l2, beta_ture = beta, Sigma)
  l0l2_risk_path = c(l0l2_risk_path, risk_val_l0l2[id])
}

# results
par(mfrow=c(1,2))
plot(l0_risk_path, ylim = c(min(l0_risk_path, l0l2_risk_path), max(l0_risk_path, l0l2_risk_path))
     , ylab = "Risk Error", xlab = "index of replication, L0 in blue circle, L0L2 in red cross"
     , main = "parameter selected based on minimum validation error", col = "blue")
points(l0l2_risk_path, pch = 3, col = "red" )

plot(l0_nonzero_path, ylim = c(min(l0_nonzero_path, l0l2_nonzero_path), max(l0_nonzero_path, l0l2_nonzero_path))
     , ylab = "Nonzero coefficient", xlab = "index of replication, L0 in blue circle, L0L2 in red cross"
     , main = "parameter selected based on minimum validation error", col = "blue")
points(l0l2_nonzero_path, pch = 3, col = "red" )
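The correlated design and the risk used in the script can be checked in isolation. The sketch below builds the same Sigma with entries rho^|i-j|, forms its symmetric square root (using eigen in place of the svd call in the script; for a symmetric positive definite matrix the two give the same square root), and evaluates the quadratic-form risk for one made-up coefficient vector. All values are illustrative.

```r
p <- 5; rho <- 0.35
Sigma <- rho^abs(outer(1:p, 1:p, "-"))         # Sigma[i, j] = rho^|i - j|

# Symmetric square root via the eigendecomposition (swapped in for svd).
eig <- eigen(Sigma, symmetric = TRUE)
Sigma_half <- eig$vectors %*% diag(sqrt(eig$values)) %*% t(eig$vectors)
max_err <- max(abs(Sigma_half %*% Sigma_half - Sigma))   # should be near 0

# Rows of z %*% Sigma_half have covariance Sigma when z has iid N(0, 1) entries.
set.seed(1)
x <- matrix(rnorm(1000 * p), 1000, p) %*% Sigma_half

# Risk of one made-up estimate: (beta_hat - beta)' Sigma (beta_hat - beta).
beta_true <- c(1, 1, 0, 0, 0)
beta_hat  <- c(0.9, 1.1, 0, 0.2, 0)
delta <- beta_hat - beta_true
risk <- as.numeric(t(delta) %*% Sigma %*% delta)

print(max_err)
print(risk)
```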

Readme images

Tuning the parameter, gamma, using the function L0Learn.cvfit

After running the function L0Learn.cvfit, there are values equal to 0 in the results from cvfit$cvMeans, which is confusing, because the error should always be greater than 0.

> cvfit$cvMeans[1,1]
[[1]]
 [1] 39.474119 39.473877 39.473055 39.472827 39.472379 39.472067 39.471837 39.471278 39.471120
[10] 39.470902 39.470746 39.470734 39.470730 39.470731 39.470726 39.470726 39.470725 39.470725
[19] 39.470725 39.470725 39.470718 39.470717 39.470710 39.470712 39.470720 39.470723 39.470721
[28] 39.470721 39.470716 39.470718 39.470718 39.470718 39.470715 39.470713 39.470714 39.470715
[37] 39.470717 39.470714 39.470712 39.470711 39.470711 39.470710 39.470707 31.497491 23.998097
[46] 16.108749 16.108748  8.329246  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[55]  0.000000  0.000000  0.000000  0.000000  0.000000

The code used to generate the above results

# simulate x, y
n = 5000
p = 1000
s = 10
rho = 0.35
x = matrix(rnorm(n*p),n,p)
inds = 1:p
Sigma = rho^abs(outer(inds, inds, "-"))
obj = svd(Sigma)
Sigma.half = obj$u %*% (sqrt(diag(obj$d))) %*% t(obj$v)
x = x %*% Sigma.half
beta = rep(0,p)
beta[1:s] = 1
vmu = as.numeric(t(beta) %*% Sigma %*% beta)
sigma = sqrt(vmu)
y = as.numeric(x %*% beta + rnorm(n)*sigma)
# set range of gamma
xty = base::crossprod(x,y)
gamma_max = max(abs(xty)) # another choice based on proof is sqrt(sum((xty)^2))
gamma_min = 1e-5 * gamma_max
nGamma = 50
# run the function
cvfit = L0Learn::L0Learn.cvfit(x, y, nFolds=5, seed=1, penalty="L0L2", nGamma=nGamma, gammaMin=gamma_min, gammaMax=gamma_max, maxSuppSize=100)
cvfit$cvMeans[1,1]

OpenMP not found on macOS

Thanks for the great package. I am on macOS 10.13.4 with clang6 installed as the compiler. I encountered some warnings during installation (detailed below).

Among these warnings (most of which I don't quite understand), the 'OpenMP unavailable' one seems a bit weird to me, since I have no problem with OpenMP in R packages like 'RcppArmadillo'.

* installing *source* package ‘L0Learn’ ...
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether /usr/local/clang6/bin/clang++ accepts -g... yes
checking how to run the C++ preprocessor... /usr/local/clang6/bin/clang++ -E
checking for macOS... found
configure: WARNING: OpenMP unavailable and turned off.
configure: creating ./config.status
config.status: creating src/Makevars
** libs
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CD.cpp -o CD.o
CD.cpp:6:41: warning: field 'ActiveSetNum' will be initialized after field 'CyclingOrder' [-Wreorder]
    Tol{P.Tol}, ActiveSet{P.ActiveSet}, ActiveSetNum{P.ActiveSetNum}, CyclingOrder{P.CyclingOrder}
                                        ^
1 warning generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL0.cpp -o CDL0.o
CDL0.cpp:13:10: warning: unused variable 'SecondPass' [-Wunused-variable]
    bool SecondPass = false;
         ^
In file included from CDL0.cpp:1:
../include/CDL0.h:11:24: warning: private field 'ytX' is not used [-Wunused-private-field]
        arma::rowvec * ytX; // new imp
                       ^
../include/CDL0.h:12:48: warning: private field 'D' is not used [-Wunused-private-field]
        std::map<unsigned int, arma::rowvec> * D; //new imp
                                               ^
3 warnings generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012.cpp -o CDL012.o
CDL012.cpp:14:10: warning: unused variable 'SecondPass' [-Wunused-variable]
    bool SecondPass = false;
         ^
1 warning generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012Cons.cpp -o CDL012Cons.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012KSwapsExh.cpp -o CDL012KSwapsExh.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012Logistic.cpp -o CDL012Logistic.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012LogisticSwaps.cpp -o CDL012LogisticSwaps.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012SquaredHinge.cpp -o CDL012SquaredHinge.o
In file included from CDL012SquaredHinge.cpp:1:
../include/CDL012SquaredHinge.h:1:9: warning: 'CCDL012SquaredHinge_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
#ifndef CCDL012SquaredHinge_H
        ^~~~~~~~~~~~~~~~~~~~~
../include/CDL012SquaredHinge.h:2:9: note: 'CDL012SquaredHinge_H' is defined here; did you mean 'CCDL012SquaredHinge_H'?
#define CDL012SquaredHinge_H
        ^~~~~~~~~~~~~~~~~~~~
        CCDL012SquaredHinge_H
CDL012SquaredHinge.cpp:48:16: warning: unused variable 'Oldobjective' [-Wunused-variable]
        double Oldobjective = objective;
               ^
2 warnings generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012SquaredHingeSwaps.cpp -o CDL012SquaredHingeSwaps.o
In file included from CDL012SquaredHingeSwaps.cpp:2:
../include/CDL012SquaredHinge.h:1:9: warning: 'CCDL012SquaredHinge_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
#ifndef CCDL012SquaredHinge_H
        ^~~~~~~~~~~~~~~~~~~~~
../include/CDL012SquaredHinge.h:2:9: note: 'CDL012SquaredHinge_H' is defined here; did you mean 'CCDL012SquaredHinge_H'?
#define CDL012SquaredHinge_H
        ^~~~~~~~~~~~~~~~~~~~
        CCDL012SquaredHinge_H
1 warning generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL012Swaps.cpp -o CDL012Swaps.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL1.cpp -o CDL1.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CDL1Relaxed.cpp -o CDL1Relaxed.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c CrossValidation.cpp -o CrossValidation.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c Grid.cpp -o Grid.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c Grid1D.cpp -o Grid1D.o
Grid1D.cpp:201:20: warning: unused variable 'thr' [-Wunused-variable]
            double thr = sqrt(2 * P.ModelParams[0] * (Lipconst)) + P.ModelParams[1]; // pass this to class? we're calc this twice now
                   ^
1 warning generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c Grid2D.cpp -o Grid2D.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c IHTL0.cpp -o IHTL0.o
In file included from IHTL0.cpp:1:
../include/IHTL0.h:1:9: warning: 'IHTLO_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
#ifndef IHTLO_H
        ^~~~~~~
../include/IHTL0.h:2:9: note: 'IHTL0_H' is defined here; did you mean 'IHTLO_H'?
#define IHTL0_H
        ^~~~~~~
        IHTLO_H
1 warning generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c MakeCD.cpp -o MakeCD.o
In file included from MakeCD.cpp:6:
../include/IHTL0.h:1:9: warning: 'IHTLO_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
#ifndef IHTLO_H
        ^~~~~~~
../include/IHTL0.h:2:9: note: 'IHTL0_H' is defined here; did you mean 'IHTLO_H'?
#define IHTL0_H
        ^~~~~~~
        IHTLO_H
In file included from MakeCD.cpp:10:
../include/CDL012SquaredHinge.h:1:9: warning: 'CCDL012SquaredHinge_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
#ifndef CCDL012SquaredHinge_H
        ^~~~~~~~~~~~~~~~~~~~~
../include/CDL012SquaredHinge.h:2:9: note: 'CDL012SquaredHinge_H' is defined here; did you mean 'CCDL012SquaredHinge_H'?
#define CDL012SquaredHinge_H
        ^~~~~~~~~~~~~~~~~~~~
        CCDL012SquaredHinge_H
MakeCD.cpp:40:1: warning: control may reach end of non-void function [-Wreturn-type]
}
^
3 warnings generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c Normalize.cpp -o Normalize.o
Normalize.cpp:7:18: warning: unused variable 'p' [-Wunused-variable]
    unsigned int p = X.n_cols;
                 ^
1 warning generated.
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c RInterface.cpp -o RInterface.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include" -I/usr/local/include  "-I../include" -DARMA_DONT_USE_OPENMP -fPIC  -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
/usr/local/clang6/bin/clang++ -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/clang6/lib -o L0Learn.so CD.o CDL0.o CDL012.o CDL012Cons.o CDL012KSwapsExh.o CDL012Logistic.o CDL012LogisticSwaps.o CDL012SquaredHinge.o CDL012SquaredHingeSwaps.o CDL012Swaps.o CDL1.o CDL1Relaxed.o CrossValidation.o Grid.o Grid1D.o Grid2D.o IHTL0.o MakeCD.o Normalize.o RInterface.o RcppExports.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/usr/local/gfortran/lib -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /Library/Frameworks/R.framework/Versions/3.5/Resources/library/L0Learn/libs
** R
** byte-compile and prepare package for lazy loading
** help
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:75:15
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:78:13
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:80:10
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:82:12
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:85:14
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:87:13
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:88:16
Warning: bad markup (extra space?) at L0Learn.cvfit.Rd:90:17
Warning: bad markup (extra space?) at L0Learn.fit.Rd:71:10
Warning: bad markup (extra space?) at L0Learn.fit.Rd:73:12
Warning: bad markup (extra space?) at L0Learn.fit.Rd:76:14
Warning: bad markup (extra space?) at L0Learn.fit.Rd:78:13
Warning: bad markup (extra space?) at L0Learn.fit.Rd:79:16
Warning: bad markup (extra space?) at L0Learn.fit.Rd:81:17
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (L0Learn)

Allowing for positivity or box constraints on fitted coefficients?

Not sure if this would be possible at all - but do you think it might be possible to support positivity or box constraints on the fitted coefficients, similar to what is possible using the arguments lower.limits and upper.limits in glmnet?

Support for sparse covariate matrices?

Is support for sparse covariate matrices (of class dgCMatrix from the Matrix package) planned by any chance, similar to the way glmnet supports them? And would doing so allow for some further speed optimizations?

Tried installing L0Learn in a jupyter notebook

Hi! I was trying to install L0Learn using the instructions given
pip install l0learn

but I get the following error
ERROR: Could not find a version that satisfies the requirement l0learn (from versions: none)
ERROR: No matching distribution found for l0learn

any help would be much appreciated. Thank you!

Prediction results on test data from L0 is much worse than L0L2 or LASSO

Hi, I was running a comparison of the prediction errors between models L0, L0L2, and LASSO. I found that the prediction results from L0 are much worse than L0L2 or LASSO. I was curious if it is due to my misuse of the L0 method, the nature of the L0 method or a bug in the L0 method?

I also checked the coefficients from the L0 method: after cross-validation it selects all the predictors.

The prediction error results are

> c(lasso, l0, l0l2)
[1] 22.19890 72.50530 24.21283

The R code is

library(L0Learn);library(MASS); library(glmnet)
data(Boston)

y = Boston[,"medv"]
x = Boston[,-which(colnames(Boston) %in% "medv")]
x_orig = scale(x)
form <- x_orig ~ .^2 -1
x_expand = model.matrix(form, data = data.frame(x_orig))
x = x_expand

foldid = sample(rep(seq(5), length=length(y)))
y.tr <- y[which(foldid!=1)]
x.tr <- x[which(foldid!=1),]
y.ts <- y[which(foldid==1)]
x.ts <- x[which(foldid==1),]

# fit L0
cvfit = L0Learn.cvfit(x.tr, y = matrix(y.tr, ncol=1), nFolds=5, penalty="L0", algorithm = "CDPSI")
cvfit_l0 = cvfit
optimalLambdaIndex  = which.min(cvfit$cvMeans[[1]])
coef_l0 = as.numeric(coef(cvfit$fit, lambda = cvfit$fit$lambda[[1]][optimalLambdaIndex]))
optimalLambdaIndex
pred_l0 = predict(cvfit, newx = x.ts, lambda=optimalLambdaIndex)
pred_l0 = as.vector(pred_l0)
l0 <- mean( (y.ts-pred_l0)^2 )

# fit L0L2
cvfit = L0Learn.cvfit(x.tr, y = matrix(y.tr, ncol=1), nFolds=5, penalty="L0L2", nGamma=20, gammaMin=0.001, gammaMax=10, maxSuppSize=50, algorithm = "CDPSI")
cvfit_l0l2 = cvfit
optimalGammaIndex = which.min(sapply(cvfit$cvMeans, min))
optimalLambdaIndex = which.min(cvfit$cvMeans[[optimalGammaIndex]])
optimalLambda = cvfit$fit$lambda[[optimalGammaIndex]][optimalLambdaIndex]
optimalLambda
pred_l0l2 = predict(cvfit, newx = x.ts, lambda=optimalLambda, gamma=cvfit$fit$gamma[optimalGammaIndex])
pred_l0l2 = as.vector(pred_l0l2)
l0l2 <- mean( (y.ts-pred_l0l2)^2 )

# fit LASSO
cv_las <- cv.glmnet(x.tr,y.tr, nfolds=5)
fit_las <- glmnet(x.tr,y.tr,lambda = cv_las$lambda)
pred_las = predict(fit_las, newx = x.ts)[,which.min(cv_las$cvm)]
lasso <- mean( (y.ts-pred_las)^2 )

c(lasso, l0, l0l2)
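One detail worth double-checking in the code above: the L0 branch passes optimalLambdaIndex (a position in the grid) as the lambda argument of predict, while the L0L2 branch passes the lambda value itself. Whether this explains the gap is not confirmed here; the sketch below only illustrates the difference between the two on made-up numbers.

```r
# Hypothetical lambda grid and cross-validation errors.
lambda_grid <- c(0.5, 0.1, 0.02, 0.004)
cv_means    <- c(9.1, 4.2, 3.7, 5.0)

best_index  <- which.min(cv_means)        # position in the grid
best_lambda <- lambda_grid[best_index]    # the penalty value at that position

# The index is a count, not a penalty level, and can lie far outside the grid.
index_outside_grid <- best_index > max(lambda_grid)

print(c(index = best_index, lambda = best_lambda))
```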

Pre-specifying lambda grid does not seem to work

With both the CRAN and GitHub versions of L0Learn.fit and L0Learn.cvfit, if I specify autoLambda=FALSE and lambdaGrid = as.list(lambdas) with a custom vector lambdas, I get back a fit in which only the first lambda value is used. Would you happen to have a working example of using a custom lambda grid?
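One thing that may be relevant, independent of the package: as.list() on a numeric vector produces one list element per value, whereas list() wraps the whole vector in a single element. It is an assumption here, not something this issue confirms, that lambdaGrid is read as one lambda sequence per list element; under that reading, as.list(lambdas) would supply sequences of length one. The values below are made up.

```r
lambdas <- c(0.1, 0.05, 0.01)    # hypothetical custom grid

per_value <- as.list(lambdas)    # three elements, each a single number
wrapped   <- list(lambdas)       # one element holding the whole sequence

print(lengths(per_value))
print(lengths(wrapped))
```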
