A Java implementation of linear models for regression and classification on in-memory data sets. I needed to reacquaint myself with Java for one of my graduate classes, having last used it two years ago, so I figured this would be a good way to refresh my skills while doing something fun at the same time.
Although I may not be able to make significant progress during the school year, I do intend to finish this project, since it will be a nice exercise for refreshing my knowledge of some linear supervised learning methods.
Inspired by the scikit-learn SGDClassifier.
The loss functions I wish to make available:
- Least-squares
Supports l1, l2, and naive elastic-net regularization. See the optimization methods listed below for details on how each is solved.
- Logistic
Gives a logistic regression classifier and also supports l1, l2, and naive elastic-net regularization.
- Hinge
Gives a linear support vector classifier and supports all previously mentioned regularization methods.
- Huber
We take δ = 1 in this case. Supports all previously mentioned regularization methods.
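As a rough sketch of what these four losses could look like in Java, here is one possible set of pointwise definitions together with their (sub)derivatives with respect to the prediction. The class name `Losses` and all method names are hypothetical and not taken from the project. The sketch also assumes labels in {-1, +1} for the classification losses and a factor of 1/2 on the least-squares loss, neither of which is stated above.

```java
/**
 * Hypothetical helper (not part of the project): pointwise losses and their
 * (sub)derivatives with respect to the prediction p = w . x.
 * Assumption: classification labels y are -1 or +1.
 */
final class Losses {
    private Losses() {}

    // Least-squares: 0.5 * (p - y)^2. The 0.5 factor is an assumed convention.
    static double squared(double p, double y) {
        double r = p - y;
        return 0.5 * r * r;
    }

    static double squaredGrad(double p, double y) {
        return p - y;
    }

    // Logistic: log(1 + exp(-y * p)), written in a numerically stable form.
    static double logistic(double p, double y) {
        double z = -y * p;
        // log(1 + exp(z)) = max(z, 0) + log1p(exp(-|z|))
        return Math.max(z, 0.0) + Math.log1p(Math.exp(-Math.abs(z)));
    }

    static double logisticGrad(double p, double y) {
        // d/dp log(1 + exp(-y p)) = -y / (1 + exp(y p))
        return -y / (1.0 + Math.exp(y * p));
    }

    // Hinge: max(0, 1 - y * p). Not differentiable at y * p = 1.
    static double hinge(double p, double y) {
        return Math.max(0.0, 1.0 - y * p);
    }

    // One valid subgradient: -y inside the margin, 0 otherwise.
    static double hingeSubgrad(double p, double y) {
        return (y * p < 1.0) ? -y : 0.0;
    }

    // Huber with delta = 1: quadratic for |r| <= 1, linear beyond that.
    static double huber(double p, double y) {
        double r = Math.abs(p - y);
        return (r <= 1.0) ? 0.5 * r * r : r - 0.5;
    }

    // Derivative of the Huber loss: the residual clipped to [-1, 1].
    static double huberGrad(double p, double y) {
        return Math.max(-1.0, Math.min(1.0, p - y));
    }
}
```

Keeping each loss next to its derivative means a gradient-based solver only needs the pair of functions and can stay agnostic about which loss it is minimizing.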
The optimization methods I intend to implement:
- Batch [sub]gradient descent
Will support l2 regularization, plus l1 and naive elastic-net regularization using iterative soft thresholding [1]. For those who are interested, there is a derivation of the soft thresholding operator from the proximal mapping in the first answer to this question on the Mathematics Stack Exchange.
- Stochastic [sub]gradient descent
Will also support l2 regularization, plus l1 and naive elastic-net regularization using iterative soft thresholding.
Both methods solve the primal formulation of the problem by operating directly on the loss functional. I have not yet decided whether to implement any methods for solving the dual problem.
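To make the soft-thresholding idea concrete, here is a minimal sketch of one proximal-gradient step for the least-squares loss. The class name `ProximalSketch`, the method names, and the parameters `eta`, `l1`, and `l2` are all hypothetical. The sketch assumes the objective (1/2n)‖Xw − y‖² + (l2/2)‖w‖² + l1‖w‖₁ and folds the l2 term into the gradient before soft-thresholding. That is only one of several reasonable ways to handle the naive elastic net, and not necessarily the one this project will use.

```java
/**
 * Hypothetical sketch (not the project's code) of iterative soft thresholding
 * for least squares with l1, l2, or naive elastic-net penalties.
 */
final class ProximalSketch {
    private ProximalSketch() {}

    // Soft-thresholding operator: sign(v) * max(|v| - t, 0).
    static double softThreshold(double v, double t) {
        return Math.signum(v) * Math.max(Math.abs(v) - t, 0.0);
    }

    // Least-squares gradient averaged over the given rows. Passing every row
    // index gives the batch gradient; passing one index gives a stochastic one.
    static double[] gradient(double[][] x, double[] y, double[] w, int[] rows) {
        double[] grad = new double[w.length];
        for (int i : rows) {
            double residual = -y[i];
            for (int j = 0; j < w.length; j++) {
                residual += x[i][j] * w[j];
            }
            for (int j = 0; j < w.length; j++) {
                grad[j] += residual * x[i][j] / rows.length;
            }
        }
        return grad;
    }

    // One step: gradient step on the smooth part (loss plus l2 term), followed
    // by soft thresholding with threshold eta * l1 to handle the l1 term.
    static double[] proximalStep(double[] w, double[] grad,
                                 double eta, double l1, double l2) {
        double[] next = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double v = w[j] - eta * (grad[j] + l2 * w[j]);
            next[j] = softThreshold(v, eta * l1);
        }
        return next;
    }
}
```

Setting `l1 = 0` reduces the step to plain gradient descent with l2 regularization, and setting `l2 = 0` gives the usual l1-only iteration, so the same two methods cover all three penalties under these assumptions.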