This repo implements a projected coordinate-descent Newton (CDN) solver for L1-regularized logistic regression in Cython.
The basic CDN algorithm is presented in Yuan et al. (2010) and Yuan et al. (2012). The only modification here is the addition of a projection step, which allows bounds to be imposed on the model coefficients, such as constraining them to be positive.
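To make the projection idea concrete, here is a minimal sketch of a single projected coordinate update, assuming the standard CDN one-variable subproblem from Yuan et al. (2010). The function name `cdn_step_projected` and its signature are hypothetical, for illustration only; they are not the repo's API.

```python
import numpy as np

def cdn_step_projected(w_j, grad_j, hess_j, lam,
                       lower=-np.inf, upper=np.inf):
    """One projected CDN coordinate update (hypothetical helper).

    Solves the one-dimensional second-order subproblem for the L1
    term by soft-thresholding (Yuan et al., 2010), then projects the
    updated coordinate onto [lower, upper] to enforce the bounds.
    """
    # Unconstrained CDN direction via soft-thresholding
    if grad_j + lam <= hess_j * w_j:
        d = -(grad_j + lam) / hess_j
    elif grad_j - lam >= hess_j * w_j:
        d = -(grad_j - lam) / hess_j
    else:
        d = -w_j  # the L1 term shrinks this coordinate to exactly zero
    # Projection onto the box [lower, upper]
    return min(max(w_j + d, lower), upper)
```

With `lower=0`, any update that would push a coefficient negative is clipped back to zero, which is how a positivity constraint interacts with the usual CDN step.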
The API closely resembles the scikit-learn implementation of LogisticRegression, which calls liblinear, which in turn uses newGLMNET (Yuan et al., 2012). This Cython implementation of CDN is perhaps 10 times slower than scikit-learn's call to liblinear, but it allows placing constraints on the weights, an option that was available in the original GLMNET package but not in liblinear or scikit-learn.
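For reference, this is the unconstrained scikit-learn baseline that the implementation mirrors (L1 penalty, liblinear solver, sparse input). The dataset here is synthetic and only illustrative.

```python
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data, converted to sparse form
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X = csr_matrix(X)

# L1-penalized logistic regression via liblinear (no bound constraints)
clf = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
clf.fit(X, y)
```

The bounded solver below follows the same fit/predict pattern, with `lower`/`upper` keywords added.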
Requirements:

- python3
- numpy
- scipy
- cython
To use this code, first compile it with Cython:

```
python setup.py build_ext --inplace
```
You can then run a test using `test_lrb.py` (add `--both` to compare against the scikit-learn results, or `-h` to see more options).
To embed in Python code, the API is essentially the same as scikit-learn's, e.g.

```python
import lrb

model = lrb.LogisticRegressionBounded()
model.fit(X, y)
predictions = model.predict(X)
```
To add bounds on the weights, use the `lower` or `upper` keywords, e.g.

```python
model = lrb.LogisticRegressionBounded(lower=0)
```
This implementation currently supports only binary classification (not multi-class), and `X` must be passed as a scipy.sparse matrix (not a dense array). Also, only L1 regularization is implemented (not L2).
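Since the solver requires sparse input, dense data must be converted first. A minimal example using `scipy.sparse.csr_matrix` (CSR is the usual row-oriented choice for training data):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small dense feature matrix with some zero entries
X_dense = np.array([[0.0, 1.0],
                    [2.0, 0.0],
                    [0.0, 3.0]])

# Convert to compressed sparse row format before calling fit()
X = csr_matrix(X_dense)
```

The conversion stores only the nonzero entries; `X.toarray()` recovers the original dense matrix when needed.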
Note that at the moment, the intercept is being regularized (as is the case in liblinear and scikit-learn). I may change this in the future...
TODO:

- handle dense data
- remove bounds restriction on intercept
- replace convergence test with something better
- Yuan et al. A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification. JMLR 11 (2010).
- Yuan et al. An Improved GLMNET for L1-regularized Logistic Regression. JMLR 13 (2012).