lenskit / lenskit

LensKit recommender toolkit.

Home Page: http://lenskit.org

License: Other

Groovy 8.27% Java 91.22% Python 0.23% Shell 0.04% R 0.23% JavaScript 0.01%

java recsys

lenskit's Issues

Make sparse vector framework copy-on-write.

(originally reported in Trac by @elehack on 2011-04-05 22:36:41)

We would like to make the sparse vector framework copy-on-write.

The following value sharing strategy will be safe:

Key arrays are always shared.
Sparse vectors have an atomic integer tracking the number of copies of the value array. This is initialized to 1 for a fresh array.
Making a MutableSparseVector from another vector always creates a fresh, unshared copy.
Making an ImmutableSparseVector from another vector copies the references to the value array and counter (sharing the values rather than duplicating them) and increments the atomic counter.
Finalizing an ImmutableSparseVector decrements the counter.
Modifying a MutableSparseVector that is shared (counter > 1) results in copying the value array, decrementing the counter, and creating a new counter initialized to 1.

We can also allow sharing between mutable sparse vectors when one is created from another, as they will not be allowed to cross thread boundaries.
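A minimal sketch of the counter-based sharing scheme above, assuming a hypothetical `CowVector` class with sorted `long` keys (not LensKit's actual `SparseVector` API). `shallowCopy()` stands in for creating a vector that shares another's values. The finalization step is omitted because Java finalizers are deprecated and unreliable.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical copy-on-write vector; names are illustrative, not LensKit's API.
class CowVector {
    private final long[] keys;        // key arrays are always shared (never mutated)
    private double[] values;          // possibly shared value array
    private AtomicInteger shareCount; // number of vectors sharing `values`

    CowVector(long[] keys, double[] values) {
        this.keys = keys;
        this.values = values;
        this.shareCount = new AtomicInteger(1); // fresh array: counter starts at 1
    }

    private CowVector(long[] keys, double[] values, AtomicInteger count) {
        this.keys = keys;
        this.values = values;
        this.shareCount = count;
    }

    /** Shares keys, values, and counter with a new vector, incrementing the counter. */
    CowVector shallowCopy() {
        shareCount.incrementAndGet();
        return new CowVector(keys, values, shareCount);
    }

    /** Sets a value, first copying the value array if it is shared (counter > 1). */
    void set(long key, double value) {
        int idx = Arrays.binarySearch(keys, key);
        if (idx < 0) {
            throw new IllegalArgumentException("key not in domain: " + key);
        }
        if (shareCount.get() > 1) {
            values = Arrays.copyOf(values, values.length);
            shareCount.decrementAndGet();      // release our share of the old array
            shareCount = new AtomicInteger(1); // new, unshared array
        }
        values[idx] = value;
    }

    /** Returns the value for a key, or NaN if the key is not in the domain. */
    double get(long key) {
        int idx = Arrays.binarySearch(keys, key);
        return idx >= 0 ? values[idx] : Double.NaN;
    }
}
```

With this design a shared vector pays for an array copy only on its first write after sharing.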

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:53.246648+00:00, last updated: 2013-02-01T22:35:40.926617+00:00

Support data updates

(originally reported in Trac by @elehack on 2011-03-16 22:34:46)

LensKit needs support for handling data updates in a consistent and well-defined fashion. This is necessary in particular for continuous update evaluation a la #13.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:44.755007+00:00, last updated: 2013-02-01T22:35:44.661125+00:00

Update design document

(originally reported in Trac by @elehack on 2011-03-16 22:24:57)

We need a design document explaining how LensKit's architecture works and why.

This document has largely been written; we just need to review it and make sure that it's up to date.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:32.658057+00:00, last updated: 2013-02-01T22:35:45.688634+00:00

Support parallel tests

(originally reported in Trac by @elehack on 2011-03-16 22:38:13)

Consider supporting multiple evaluations being run in parallel (multiple algorithms or multiple folds).

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:50.162828+00:00, last updated: 2013-02-01T22:35:44.539723+00:00

Support serializing and deserializing recommenders

(originally reported in Trac by @elehack on 2011-03-16 22:26:52)

We want to be able to serialize and deserialize recommenders (or their models) so that a recommender can be built and then used by a different process.

It would also be interesting to explore alternative models (e.g. memory-mapped similarity matrices).

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:36.048971+00:00, last updated: 2013-02-01T22:35:45.578125+00:00

Complete API documentation

(originally reported in Trac by @elehack on 2011-03-16 22:18:33)

A lot of APIs are not well-documented yet. We need thorough API documentation.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:37:02.246349+00:00, last updated: 2013-05-31T22:23:14.291139+00:00

Support time-based splitting for cross-fold evaluation

(originally reported in Trac by @elehack on 2011-03-16 22:23:29)

The cross-fold evaluator should support time-based splitting instead of random splitting (e.g. try to predict the most recent third of the ratings).
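A minimal sketch of one possible time-based split, assuming a global cutoff: all ratings are sorted by timestamp and the most recent fraction becomes the test set. A per-user variant is equally plausible. `TimedRating` and `TemporalSplit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical timestamped rating; not LensKit's Rating interface.
class TimedRating {
    final long user;
    final long item;
    final double value;
    final long timestamp;

    TimedRating(long user, long item, double value, long timestamp) {
        this.user = user;
        this.item = item;
        this.value = value;
        this.timestamp = timestamp;
    }
}

class TemporalSplit {
    final List<TimedRating> train = new ArrayList<>();
    final List<TimedRating> test = new ArrayList<>();

    /** Sorts by timestamp and holds out the most recent testFraction of ratings. */
    static TemporalSplit split(List<TimedRating> ratings, double testFraction) {
        if (testFraction < 0 || testFraction > 1) {
            throw new IllegalArgumentException("testFraction must be in [0, 1]");
        }
        List<TimedRating> sorted = new ArrayList<>(ratings);
        sorted.sort(Comparator.comparingLong(r -> r.timestamp));
        int nTest = (int) Math.round(sorted.size() * testFraction);
        int cut = sorted.size() - nTest;
        TemporalSplit s = new TemporalSplit();
        s.train.addAll(sorted.subList(0, cut));
        s.test.addAll(sorted.subList(cut, sorted.size()));
        return s;
    }
}
```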

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:31.110415+00:00, last updated: 2013-02-01T22:35:45.815275+00:00

Placeholder

Placeholder for deleted ticket.

Note: This issue has been automatically migrated from Bitbucket
Created by Anonymous on 2013-04-01T00:00:00.0+00:00

Implement JDBC data source

(originally reported in Trac by @elehack on 2011-04-04 17:36:24)

It would be useful to have a JDBC-based implementation of RatingDataAccessObject. It should be configurable w.r.t. what tables & fields to use.
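A minimal sketch of a configurable JDBC rating source using only `java.sql`. The class `JdbcRatingSource` and its methods are hypothetical; it does not implement `RatingDataAccessObject`, whose methods are not reproduced here. Table and column names are spliced into the SQL, so they must come from trusted configuration; the user ID is bound as a statement parameter.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDBC-backed rating source with configurable table and column names.
class JdbcRatingSource {
    private final String table;
    private final String userColumn;
    private final String itemColumn;
    private final String ratingColumn;

    JdbcRatingSource(String table, String userColumn, String itemColumn, String ratingColumn) {
        this.table = table;
        this.userColumn = userColumn;
        this.itemColumn = itemColumn;
        this.ratingColumn = ratingColumn;
    }

    /** SQL for one user's ratings; identifiers must come from trusted configuration. */
    String userRatingsQuery() {
        return "SELECT " + itemColumn + ", " + ratingColumn
                + " FROM " + table + " WHERE " + userColumn + " = ?";
    }

    /** Loads item ID -> rating for one user; a later row for the same item overwrites an earlier one. */
    Map<Long, Double> userRatings(Connection conn, long userId) throws SQLException {
        Map<Long, Double> ratings = new LinkedHashMap<>();
        try (PreparedStatement stmt = conn.prepareStatement(userRatingsQuery())) {
            stmt.setLong(1, userId);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    ratings.put(rs.getLong(1), rs.getDouble(2));
                }
            }
        }
        return ratings;
    }
}
```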

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:42.122763+00:00, last updated: 2013-02-01T22:35:41.541898+00:00

Support damping in the user mean predictor

(originally reported in Trac by @elehack on 2011-03-24 18:09:32)

The user mean predictor does not currently support damping.
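One common damping scheme shrinks each user's mean toward the global mean by adding `damping` pseudo-ratings at the global mean. The sketch below assumes that scheme; the class name and the choice of the global mean as the prior are illustrative assumptions.

```java
// Hypothetical helper; assumes damping toward the global mean rating.
class DampedUserMean {
    /**
     * (sum of user's ratings + damping * globalMean) / (count + damping).
     * With damping = 0 this is the plain user mean.
     */
    static double dampedMean(double[] userRatings, double globalMean, double damping) {
        double sum = 0;
        for (double r : userRatings) {
            sum += r;
        }
        int n = userRatings.length;
        if (n == 0 && damping == 0) {
            return globalMean; // avoid 0/0 for an empty, undamped profile
        }
        return (sum + damping * globalMean) / (n + damping);
    }
}
```

Users with few ratings stay close to the global mean, while users with many ratings are dominated by their own data.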

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:28.001089+00:00, last updated: 2013-02-01T22:35:42.970424+00:00

Move algorithms to separate projects

(originally reported in Trac by @elehack on 2011-03-16 22:44:10)

Separate the algorithms into lenskit-knn and lenskit-svd projects that depend on lenskit-core so that algorithms can be loaded independently, and to stand as examples of how to package new algorithms.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:12.914258+00:00, last updated: 2013-02-01T22:35:43.896767+00:00

Make rating normalization a strategy

(originally reported in Trac by @elehack on 2011-03-24 17:56:17)

Right now, the only normalization we support in item-item and user-user CF is subtracting baselines. It would be useful to abstract this into a strategy so we can do things like z-score normalization.
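A minimal sketch of normalization as a strategy, assuming hypothetical `Normalizer` and `Transform` interfaces: the strategy builds a per-user transform that can map ratings into normalized space and back. Mean-centering stands in for baseline subtraction here; z-score normalization is the second implementation.

```java
import java.util.Arrays;

/** Hypothetical strategy: builds a per-user transform from that user's ratings. */
interface Normalizer {
    Transform forUser(double[] userRatings);
}

/** Maps raw ratings into normalized space and back. */
interface Transform {
    double apply(double rating);
    double unapply(double normalized);
}

/** Subtracts the user's mean rating (a simple stand-in for baseline subtraction). */
class MeanCenteringNormalizer implements Normalizer {
    public Transform forUser(double[] userRatings) {
        double mean = Arrays.stream(userRatings).average().orElse(0.0);
        return new Transform() {
            public double apply(double rating) { return rating - mean; }
            public double unapply(double normalized) { return normalized + mean; }
        };
    }
}

/** z-score: subtract the user's mean, then divide by the (population) standard deviation. */
class ZScoreNormalizer implements Normalizer {
    public Transform forUser(double[] userRatings) {
        double mean = Arrays.stream(userRatings).average().orElse(0.0);
        double variance = Arrays.stream(userRatings)
                .map(r -> (r - mean) * (r - mean))
                .average().orElse(0.0);
        // Fall back to 1 for users whose ratings are all equal, avoiding division by zero.
        double sd = variance > 0 ? Math.sqrt(variance) : 1.0;
        return new Transform() {
            public double apply(double rating) { return (rating - mean) / sd; }
            public double unapply(double normalized) { return normalized * sd + mean; }
        };
    }
}
```

Item-item and user-user CF would then take a `Normalizer` as a configurable component instead of hard-coding baseline subtraction.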

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:24.439719+00:00, last updated: 2013-02-01T22:35:43.098551+00:00

Placeholder

Placeholder for deleted ticket.

Note: This issue has been automatically migrated from Bitbucket
Created by Anonymous on 2013-04-01T00:00:00.0+00:00

Implement equals and hashCode for rating implementations

(originally reported in Trac by @elehack on 2011-03-24 17:47:42)

Right now our rating implementations fail to implement .equals() and .hashCode().
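A minimal sketch of a consistent `equals`/`hashCode` pair for a rating value class. `SimpleRating` and its fields are hypothetical; the point is that both methods use the same fields and compare the `double` the same way.

```java
import java.util.Objects;

// Hypothetical immutable rating class; field set is an assumption.
final class SimpleRating {
    private final long userId;
    private final long itemId;
    private final double value;
    private final long timestamp;

    SimpleRating(long userId, long itemId, double value, long timestamp) {
        this.userId = userId;
        this.itemId = itemId;
        this.value = value;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleRating)) return false;
        SimpleRating other = (SimpleRating) o;
        return userId == other.userId
                && itemId == other.itemId
                // Double.compare agrees with Double.hashCode on NaN and signed zeros.
                && Double.compare(value, other.value) == 0
                && timestamp == other.timestamp;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals.
        return Objects.hash(userId, itemId, value, timestamp);
    }
}
```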

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:23.746940+00:00, last updated: 2013-02-01T22:35:43.213464+00:00

Support RecLab integration

(originally reported in Trac by @elehack on 2011-03-16 22:32:26)

Build necessary infrastructure to use LensKit recommenders with RecLab.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:44.128606+00:00, last updated: 2013-02-01T22:35:44.771635+00:00

Implement basket-based recommendation

(originally reported in Trac by @elehack on 2011-03-16 22:19:28)

So far, we do not have any support for basket-based recommendation. The APIs are there, but no data sources or recommenders actually do it.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:24.655224+00:00, last updated: 2013-02-01T22:35:46.144753+00:00

Implement conditional probability similarity function

(originally reported in Trac by @elehack on 2011-03-16 22:20:49)

We should support Karypis's conditional probability asymmetric similarity function.
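A minimal sketch, assuming one commonly cited form of the conditional probability similarity: sim(i, j) = Freq(i and j) / (Freq(i) * Freq(j)^alpha), where Freq counts users who rated or purchased an item and alpha in [0, 1] down-weights popular items j. With alpha = 0 this is P(j | i). The exact form should be checked against Karypis's paper; the class name is hypothetical.

```java
import java.util.Set;

// Hypothetical asymmetric similarity; formula is an assumption to verify against the source paper.
class ConditionalProbabilitySimilarity {
    private final double alpha;

    ConditionalProbabilitySimilarity(double alpha) {
        this.alpha = alpha;
    }

    /** sim(i -> j) given the sets of users who interacted with each item. */
    double similarity(Set<Long> usersOfI, Set<Long> usersOfJ) {
        if (usersOfI.isEmpty() || usersOfJ.isEmpty()) {
            return 0.0;
        }
        long both = usersOfI.stream().filter(usersOfJ::contains).count();
        return both / (usersOfI.size() * Math.pow(usersOfJ.size(), alpha));
    }
}
```

Because Freq(i) and Freq(j) play different roles, sim(i, j) and sim(j, i) generally differ, which is what makes the function asymmetric.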

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:27.483717+00:00, last updated: 2013-07-19T21:51:59.761117+00:00

Provide more flexible holdouts for train-test extraction

(originally reported in Trac by @elehack on 2011-03-25 16:04:08)

In addition to our fractional holdout scheme for splitting query and probe sets for user-based cross-fold evaluation, it would be useful to support leave-N-out and retain-N protocols.

All that's required to finish this is support for retain-N protocols.
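A minimal sketch of the two per-user protocols, assuming a hypothetical `Holdout` helper that shuffles one user's ratings and returns `[query, probe]`. Leave-N-out puts N ratings in the probe set; retain-N keeps N ratings in the query set and probes on the rest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical per-user holdout helpers. Each returns List.of(query, probe).
final class Holdout {
    private Holdout() {}

    /** Leave-N-out: N random ratings go to the probe set; the rest form the query set. */
    static <T> List<List<T>> leaveNOut(List<T> ratings, int n, Random rng) {
        List<T> shuffled = shuffled(ratings, rng);
        int probeSize = Math.min(n, shuffled.size());
        List<T> probe = new ArrayList<>(shuffled.subList(0, probeSize));
        List<T> query = new ArrayList<>(shuffled.subList(probeSize, shuffled.size()));
        return List.of(query, probe);
    }

    /** Retain-N: N random ratings form the query set; everything else is probed. */
    static <T> List<List<T>> retainN(List<T> ratings, int n, Random rng) {
        List<T> shuffled = shuffled(ratings, rng);
        int querySize = Math.min(n, shuffled.size());
        List<T> query = new ArrayList<>(shuffled.subList(0, querySize));
        List<T> probe = new ArrayList<>(shuffled.subList(querySize, shuffled.size()));
        return List.of(query, probe);
    }

    private static <T> List<T> shuffled(List<T> ratings, Random rng) {
        List<T> copy = new ArrayList<>(ratings);
        Collections.shuffle(copy, rng);
        return copy;
    }
}
```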

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:29.120420+00:00, last updated: 2013-03-05T18:29:43.403522+00:00

Develop Event abstraction

(originally reported in Trac by @elehack on 2011-04-05 14:44:19)

In order to support basket-based recommenders, we need the ability to present unary purchase or click data in the data source. To support this, we will introduce the concept of Events, which are stored in the DAO.

Each user has a sequence of Events associated with Items (non-item events may or may not be supported); this sequence is the user's history or profile.
Ratings are events.
Recommenders often operate on summaries of the user's history - e.g. a rating vector extracted from the most recent rating of each rated item.
Events can be untimestamped (so if all you have is a rating matrix, it shows up as a set of ratings without timestamps). Untimestamped events are assumed to occur in the present.

The upshot of this is that we no longer need a hierarchy of DAO objects: we can have a single DAO that provides users, items, and history queries. This also makes all data sources available uniformly, so more exotic recommenders can operate on combinations of purchases and ratings, for example. Subclasses can, of course, introduce additional information such as item metadata.
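A minimal sketch of the proposed abstraction, using hypothetical interface names (`Event`, `Rating`, `EventDAO`) rather than LensKit's eventual API. It also shows one history summary mentioned above: a rating vector built from the most recent rating of each item. Untimestamped events are treated as happening after every timestamped one, an assumption standing in for "in the present".

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/** A user's interaction with an item; the timestamp may be absent. */
interface Event {
    long getUserId();
    long getItemId();
    OptionalLong getTimestamp();
}

/** Ratings are just events that carry a value. */
interface Rating extends Event {
    double getValue();
}

/** One DAO for all data: users, items, and each user's event history. */
interface EventDAO {
    Collection<Long> getUsers();
    Collection<Long> getItems();
    List<Event> getUserHistory(long userId);
}

/** Minimal rating implementation for illustration. */
final class BasicRating implements Rating {
    private final long user, item;
    private final double value;
    private final Long timestamp; // null = untimestamped

    BasicRating(long user, long item, double value, Long timestamp) {
        this.user = user; this.item = item; this.value = value; this.timestamp = timestamp;
    }
    public long getUserId() { return user; }
    public long getItemId() { return item; }
    public double getValue() { return value; }
    public OptionalLong getTimestamp() {
        return timestamp == null ? OptionalLong.empty() : OptionalLong.of(timestamp);
    }
}

final class HistorySummaries {
    /** Item -> value of the most recent rating; non-rating events (e.g. purchases) are skipped. */
    static Map<Long, Double> latestRatings(List<? extends Event> history) {
        Map<Long, Rating> latest = new HashMap<>();
        for (Event e : history) {
            if (!(e instanceof Rating)) continue;
            Rating r = (Rating) e;
            Rating prev = latest.get(r.getItemId());
            if (prev == null || time(r) >= time(prev)) latest.put(r.getItemId(), r);
        }
        Map<Long, Double> vector = new HashMap<>();
        latest.forEach((item, r) -> vector.put(item, r.getValue()));
        return vector;
    }

    // Untimestamped events sort after all timestamped ones ("in the present").
    private static long time(Event e) {
        return e.getTimestamp().orElse(Long.MAX_VALUE);
    }
}
```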

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:47.002449+00:00, last updated: 2013-02-01T22:35:41.165673+00:00

Enable evaluation of baseline predictors

(originally reported in Trac by @elehack on 2011-03-22 20:46:30)

Baseline predictors currently exist almost as their own little entities. We need to be able to evaluate them like any other predictor.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:20.855760+00:00, last updated: 2013-02-01T22:35:43.441992+00:00

Implement gradient descent baseline

(originally reported in Trac by @elehack on 2011-03-16 22:40:55)

Implement the gradient-descent-fitted baseline with regularization described by Koren in "Factorization Meets the Neighborhood". For this ticket, we aren't implementing the whole SVD++ or Asymmetric-SVD model; we're just learning the regularized baseline.

This will be LeastSquaresPredictor in the org.grouplens.lenskit.baseline package in the lenskit-core project.
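A minimal sketch of the regularized baseline, predicting globalMean + b_u + b_i and fitting the biases by stochastic gradient descent on the squared error plus lambda * (b_u^2 + b_i^2). The class `RegularizedBaseline` and its hyperparameter names are hypothetical; this is not the planned `LeastSquaresPredictor`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical regularized baseline: prediction = globalMean + userBias + itemBias.
class RegularizedBaseline {
    private final double globalMean;
    private final Map<Long, Double> userBias = new HashMap<>();
    private final Map<Long, Double> itemBias = new HashMap<>();

    /** users[k], items[k], values[k] describe the k-th training rating. */
    RegularizedBaseline(long[] users, long[] items, double[] values,
                        double learningRate, double lambda, int epochs) {
        double sum = 0;
        for (double v : values) sum += v;
        globalMean = values.length > 0 ? sum / values.length : 0.0;

        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int k = 0; k < values.length; k++) {
                double bu = userBias.getOrDefault(users[k], 0.0);
                double bi = itemBias.getOrDefault(items[k], 0.0);
                double err = values[k] - (globalMean + bu + bi);
                // Gradient step on err^2 + lambda * (bu^2 + bi^2); the factor 2 is folded into learningRate.
                userBias.put(users[k], bu + learningRate * (err - lambda * bu));
                itemBias.put(items[k], bi + learningRate * (err - lambda * bi));
            }
        }
    }

    /** Unknown users or items fall back to a bias of 0. */
    double predict(long user, long item) {
        return globalMean + userBias.getOrDefault(user, 0.0) + itemBias.getOrDefault(item, 0.0);
    }
}
```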

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:10.665857+00:00, last updated: 2013-02-05T21:16:27.708650+00:00

Implement jointly-derived neighborhood weighting

(originally reported in Trac by @elehack on 2011-03-22 16:57:00)

Implement Koren's jointly derived neighborhood interpolation weights.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:19.477803+00:00, last updated: 2013-02-01T22:35:43.554457+00:00

Support reading the Netflix data set

(originally reported in Trac by @elehack on 2011-03-16 22:29:58)

Some people still have copies of the Netflix data set; it would be nice to be able to process it with LensKit.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:43.073434+00:00, last updated: 2013-02-01T22:35:45.117116+00:00

Mean damping decreases recommender quality

(originally reported in Trac by @elehack on 2011-03-28 20:38:18)

FunkSVD's recommendation quality drops markedly when the MeanDamping parameter is enabled and set to 25 with the item-user mean predictor. This needs to be debugged; I suspect the problem is in ItemUserMeanPredictor.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:37.180451+00:00, last updated: 2013-02-01T22:35:41.997519+00:00

Rename SimilarityMatrixBuildStrategy

(originally reported in Trac by @elehack on 2011-03-29 17:44:59)

Having the SimilarityMatrixBuildStrategy family of classes alongside the SimilarityMatrixBuilder family is confusing. To fix this, we will rename SimilarityMatrixBuildStrategy to ItemItemModelBuildStrategy. Together with renaming ItemItemRecommenderBuilder to ItemItemModelBuilder, this will make for a much clearer source tree layout.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:38.191254+00:00, last updated: 2013-02-01T22:35:41.887240+00:00

Provide wrappers to integrate with Mahout

(originally reported in Trac by @elehack on 2011-03-16 22:31:36)

It could be interesting to allow Mahout recommenders to be used in LensKit and vice-versa. It at least has interesting evaluation and comparison potential.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:43.760915+00:00, last updated: 2013-02-01T22:35:44.887908+00:00

Support aggregating statistics per-user

(originally reported in Trac by @elehack on 2011-03-25 16:04:50)

Support aggregating statistics like MAE and RMSE per-user rather than just overall.
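A minimal sketch of per-user aggregation, assuming a hypothetical `PerUserErrors` accumulator: it keeps running sums per user, reports each user's MAE and RMSE, and averages per-user RMSE so every user counts equally (unlike overall RMSE, which weights users by their number of test ratings).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-user error accumulator.
class PerUserErrors {
    // user -> {sum of |error|, sum of error^2, count}
    private final Map<Long, double[]> acc = new HashMap<>();

    void add(long user, double predicted, double actual) {
        double err = predicted - actual;
        double[] a = acc.computeIfAbsent(user, u -> new double[3]);
        a[0] += Math.abs(err);
        a[1] += err * err;
        a[2] += 1;
    }

    double userMAE(long user) {
        double[] a = acc.get(user);
        return a[0] / a[2];
    }

    double userRMSE(long user) {
        double[] a = acc.get(user);
        return Math.sqrt(a[1] / a[2]);
    }

    /** Mean of per-user RMSE values; NaN if no data. */
    double meanUserRMSE() {
        return acc.keySet().stream().mapToDouble(this::userRMSE).average().orElse(Double.NaN);
    }
}
```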

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:30.218995+00:00, last updated: 2013-02-01T22:35:42.743459+00:00

Write scripts to run the evaluator

(originally reported in Trac by @elehack on 2011-03-24 14:03:48)

Running the evaluator is somewhat complex, particularly with the algorithms split out into separate programs. We need a set of scripts to make this easy. I am not sure what form they should take (Ant, Maven, or shell scripts), but we need something.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:22.305557+00:00, last updated: 2013-02-01T22:35:43.336753+00:00

Add exclusion set to RatingRecommender (and other recommenders?)

(originally reported in Trac by @mludwig on 2011-03-25 21:23:11)

The purpose of this is to make it possible to include or exclude a user's rated items from the recommendations. It is not reasonable to make callers pass in a candidate set consisting of the complement of the user's rated items.

Instead, we need to provide both an include set and an exclude set of items. Items in the exclude set will never be returned, even if they are in the include set. Items not in the include set will not be returned. If the include set is empty (the default), it is treated as the universe of items. The default exclude set is the user's rated items, although an explicitly passed empty set or null is still treated as empty (meaning "don't exclude anything").

This creates a little bit of an inconsistency between include/exclude default behavior, but we can document this or improve it.

Another option is to provide an include set and an exclusion predicate, but this provides the same capabilities as the system above.
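A minimal sketch of the include/exclude semantics proposed above, using a hypothetical `CandidateSelection` helper. The default exclusion of rated items is modeled as a separate method, standing in for a recommender method called without an exclude set.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper implementing the proposed include/exclude rules.
final class CandidateSelection {
    private CandidateSelection() {}

    /**
     * include: null or empty means every item in the universe.
     * exclude: always wins over include; null or empty means exclude nothing.
     */
    static Set<Long> select(Set<Long> universe, Set<Long> include, Set<Long> exclude) {
        Set<Long> result = new TreeSet<>(include == null || include.isEmpty() ? universe : include);
        if (exclude != null) {
            result.removeAll(exclude);
        }
        return result;
    }

    /** Used when the caller supplies no exclude set: exclude the user's rated items. */
    static Set<Long> selectWithDefaultExclusion(Set<Long> universe, Set<Long> include,
                                                Set<Long> ratedItems) {
        return select(universe, include, ratedItems);
    }
}
```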

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:32.023931+00:00, last updated: 2013-02-01T22:35:42.483440+00:00

Upgrade site configuration to Maven 3

(originally reported in Trac by @elehack on 2011-03-25 22:31:40)

We would like to upgrade to Maven 3 to take advantage of the new reporting configuration.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:33.505606+00:00, last updated: 2013-02-01T22:35:42.373687+00:00

Document LensKit recommender designs.

(originally reported in Trac by @elehack on 2011-03-16 22:46:59)

For milestone 1.0, we need to have some good documentation. This tracker ticket records specific documentation tasks for 1.0.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:15.789898+00:00, last updated: 2013-02-05T21:47:54.992869+00:00

Implement Slope-One recommenders

(originally reported in Trac by @elehack on 2011-03-16 22:29:28)

It'd be good to implement Slope-One recommenders someday.
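A minimal sketch of the weighted Slope One scheme over in-memory ratings: for each item pair (j, i) it stores the average rating difference among users who rated both, and predicts item j as the average of (deviation + user's rating of i), weighted by the number of co-rating users. `WeightedSlopeOne` is a hypothetical class, not a LensKit component.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical weighted Slope One predictor; input is user -> (item -> rating).
class WeightedSlopeOne {
    // For items j and i: summed (r_uj - r_ui) over co-rating users, and the number of such users.
    private final Map<Long, Map<Long, Double>> diffSums = new HashMap<>();
    private final Map<Long, Map<Long, Integer>> coCounts = new HashMap<>();

    WeightedSlopeOne(Map<Long, Map<Long, Double>> ratingsByUser) {
        for (Map<Long, Double> userRatings : ratingsByUser.values()) {
            for (Map.Entry<Long, Double> rj : userRatings.entrySet()) {
                for (Map.Entry<Long, Double> ri : userRatings.entrySet()) {
                    if (rj.getKey().equals(ri.getKey())) continue;
                    diffSums.computeIfAbsent(rj.getKey(), k -> new HashMap<>())
                            .merge(ri.getKey(), rj.getValue() - ri.getValue(), Double::sum);
                    coCounts.computeIfAbsent(rj.getKey(), k -> new HashMap<>())
                            .merge(ri.getKey(), 1, Integer::sum);
                }
            }
        }
    }

    /** Predicts a user's rating of item j; NaN if none of the user's items co-occurs with j. */
    double predict(Map<Long, Double> userRatings, long j) {
        Map<Long, Double> sums = diffSums.getOrDefault(j, Map.of());
        Map<Long, Integer> counts = coCounts.getOrDefault(j, Map.of());
        double num = 0, den = 0;
        for (Map.Entry<Long, Double> ri : userRatings.entrySet()) {
            Integer c = counts.get(ri.getKey());
            if (c == null) continue; // no user rated both i and j
            double dev = sums.get(ri.getKey()) / c; // average deviation of j relative to i
            num += (dev + ri.getValue()) * c;
            den += c;
        }
        return den > 0 ? num / den : Double.NaN;
    }
}
```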

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:40.038150+00:00, last updated: 2013-02-01T22:35:45.229460+00:00

Add type parameter to Parameter

(originally reported in Trac by @elehack on 2011-03-16 22:42:56)

The Parameter annotation should get a type parameter to specify the type expected by the parameter. This would enable parameter type-checking in an annotation processor and aid with documentation.
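Java annotation types cannot declare generic type parameters, so in practice the "type parameter" would be a `Class`-valued element. The sketch below assumes `Parameter` is a meta-annotation placed on individual parameter annotations; both annotations shown are hypothetical, not LensKit's actual declarations.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical meta-annotation marking a parameter annotation and declaring its value type. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.ANNOTATION_TYPE)
@interface Parameter {
    /** The type a value bound to this parameter must have. */
    Class<?> value();
}

/** Example parameter annotation: a neighborhood size whose values must be Integers. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD})
@Parameter(Integer.class)
@interface NeighborhoodSize {}
```

An annotation processor, or a reflective check at configuration time, could then read `value()` and reject, say, a `String` bound to `@NeighborhoodSize`.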

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:12.171813+00:00, last updated: 2013-02-01T22:35:44.007590+00:00

Implement parallel item-item similarity computation strategy

(originally reported in Trac by @elehack on 2011-03-16 22:22:19)

Item-item similarity computation can be parallelized by writing a new similarity matrix build strategy. We should do this.
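A minimal sketch of a parallel build, assuming dense item vectors and cosine similarity for simplicity (real rating data would be sparse). `ParallelItemSimilarity` is a hypothetical class; each parallel task computes one row, so tasks never write to the same memory.

```java
import java.util.stream.IntStream;

// Hypothetical parallel build strategy: each parallel task fills one row of the matrix.
class ParallelItemSimilarity {
    /**
     * items[i] is item i's rating vector over all users (0 = unrated).
     * Returns the full cosine similarity matrix.
     */
    static double[][] compute(double[][] items) {
        int n = items.length;
        double[][] sims = new double[n][n];
        // Rows are independent and each task writes only sims[i], so no locking is needed.
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < n; j++) {
                sims[i][j] = cosine(items[i], items[j]);
            }
        });
        return sims;
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        if (normA == 0 || normB == 0) return 0.0; // treat empty vectors as dissimilar
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

For a symmetric measure such as cosine, computing only j > i would halve the work, at the cost of unevenly sized tasks.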

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:28.772959+00:00, last updated: 2013-02-01T22:35:45.926735+00:00

Write more extensive usage guide

(originally reported in Trac by @elehack on 2011-03-16 22:17:45)

The Getting Started guide is good, but we need a more extensive user guide to cover setting up LensKit, writing recommenders, etc.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:35:44.177522+00:00, last updated: 2013-02-01T22:35:46.337593+00:00

Implement cache updates for CachingNeighborhoodFinder

(originally reported in Trac by @elehack on 2011-04-04 15:54:56)

The CachingNeighborhoodFinder class needs to use the rating update listener capabilities to update itself.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:41.424757+00:00, last updated: 2013-02-01T22:35:41.655756+00:00

Support TemporalMAE and ProfileMAE

(originally reported in Trac by @elehack on 2011-03-16 22:30:36)

The evaluator needs to support Burke's TemporalMAE and ProfileMAE metrics.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:43.416145+00:00, last updated: 2013-02-01T22:35:45.014660+00:00

Consider renaming the mean damping

(originally reported in Trac by @elehack on 2011-03-16 22:40:13)

The mean damping parameter needs to be reviewed for sane naming. It might be somewhat confusing.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:52.597584+00:00, last updated: 2013-02-01T22:35:44.326551+00:00

Review rating prediction/recommendation interface

(originally reported in Trac by @elehack on 2011-04-05 14:26:36)

The current prediction/recommendation interface for rating-based systems takes the user as a long ID and a SparseVector of their ratings. This may couple the interface too closely to LensKit, making bridges (#14, #15) to other systems more difficult to write. Further, it does not allow the prediction or recommendation method to use the timestamps of the user's ratings, or to deal with duplicate ratings.

Therefore, we should consider changing the interfaces to use Collection<Rating> as the rating input.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:43.807165+00:00, last updated: 2013-02-01T22:35:41.274590+00:00

Support generating variants of an algorithm

(originally reported in Trac by @elehack on 2011-03-16 22:39:23)

We want to be able to easily generate multiple variants of an algorithm in the evaluator. In particular, we want to be able to vary a parameter and record that parameter's particular values in the output so we can plot e.g. RMSE vs. neighborhood size.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:51.539327+00:00, last updated: 2013-02-01T22:35:44.434178+00:00

Drop the RecommenderService class

(originally reported in Trac by @elehack on 2011-03-28 19:13:28)

LensKit will have a much cleaner API if we drop the RecommenderService class and allow recommenders and predictors to be directly injected.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:35.733505+00:00, last updated: 2013-02-01T22:35:42.255617+00:00

Deal properly with multiple ratings

(originally reported in Trac by @elehack on 2011-03-16 22:28:52)

If the data set has multiple ratings for the same user-item pair, something intelligent should happen. Most of the current code should be OK with this, but we need to audit the code for good behavior and design APIs as appropriate to allow multiple ratings to be dealt with sanely.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:38.699012+00:00, last updated: 2013-02-01T22:35:45.369663+00:00

Refactor statistic aggregation into separate class

(originally reported in Trac by @elehack on 2011-03-25 16:05:43)

Create statistic aggregator classes to abstract mean and RMS accumulation.
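A minimal sketch of the proposed aggregators behind a hypothetical `StatAccumulator` interface: a running mean and a running root-mean-square, each keeping only a sum and a count.

```java
/** Hypothetical common interface for streaming statistics. */
interface StatAccumulator {
    void add(double x);
    double value(); // NaN if nothing has been added
}

/** Running arithmetic mean. */
class MeanAccumulator implements StatAccumulator {
    private double sum;
    private long count;

    public void add(double x) { sum += x; count++; }

    public double value() { return count == 0 ? Double.NaN : sum / count; }
}

/** Running root-mean-square. */
class RMSAccumulator implements StatAccumulator {
    private double sumSquares;
    private long count;

    public void add(double x) { sumSquares += x * x; count++; }

    public double value() { return count == 0 ? Double.NaN : Math.sqrt(sumSquares / count); }
}
```

MAE is then a `MeanAccumulator` fed absolute errors, and RMSE an `RMSAccumulator` fed raw errors.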

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:30.951936+00:00, last updated: 2013-02-01T22:35:42.623591+00:00

Implement recommender for SVD

(originally reported in Trac by @elehack on 2011-03-28 20:13:43)

We need a RatingRecommender implementation for the SVD recommenders.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:36.441028+00:00, last updated: 2013-02-01T22:35:42.113345+00:00

Implement class parameter auto-binding

(originally reported in Trac by @elehack on 2011-03-16 22:42:10)

It would be possible to make RecommenderModuleComponent automatically bind class parameters. We should think about doing this.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:11.325949+00:00, last updated: 2013-02-01T22:35:44.120179+00:00

Fix SparseVector hashCode method

(originally reported in Trac by @elehack on 2011-03-16 22:45:05)

The SparseVector class should have a proper hashCode() method.
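A minimal sketch, assuming a hypothetical `SimpleSparseVector` that stores sorted, unique keys alongside their values: hashing both arrays with `Arrays.hashCode` stays consistent with an `equals` built on `Arrays.equals`.

```java
import java.util.Arrays;

// Hypothetical sparse vector storing sorted, unique keys alongside their values.
final class SimpleSparseVector {
    private final long[] keys;
    private final double[] values;

    SimpleSparseVector(long[] keys, double[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        // Assumes keys are already sorted and unique, so equal vectors have identical arrays.
        this.keys = keys.clone();
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleSparseVector)) return false;
        SimpleSparseVector other = (SimpleSparseVector) o;
        // Arrays.equals(double[], double[]) compares elements as Double.equals does,
        // which matches Arrays.hashCode(double[]) below.
        return Arrays.equals(keys, other.keys) && Arrays.equals(values, other.values);
    }

    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(keys) + Arrays.hashCode(values);
    }
}
```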

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:14.804078+00:00, last updated: 2013-02-01T22:35:43.784727+00:00

Introduce ImmutableSparseVector class

(originally reported in Trac by @elehack on 2011-04-05 22:35:08)

We want a separate ImmutableSparseVector class that extends SparseVector and represents immutable - as opposed to read-only - sparse vectors. ImmutableSparseVectors are also safe to share across thread boundaries. Eventually, then, we can introduce value sharing to avoid excess copying of vectors.
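A minimal sketch of the read-only versus immutable distinction, using simplified dense vectors and hypothetical class names. A read-only view shares storage with a mutable vector, so later writes show through it; an immutable copy owns a private array in a final field, so it never changes and can be shared across threads.

```java
// Hypothetical base type exposing read access only.
abstract class ReadableVector {
    abstract double get(int i);
    abstract int size();
}

/** Mutable vector; readOnlyView() shares storage, so later writes show through. */
final class MutableVec extends ReadableVector {
    private final double[] data;

    MutableVec(double[] data) { this.data = data.clone(); }

    double get(int i) { return data[i]; }
    int size() { return data.length; }
    void set(int i, double v) { data[i] = v; }

    ReadableVector readOnlyView() {
        return new ReadableVector() {
            double get(int i) { return MutableVec.this.get(i); }
            int size() { return MutableVec.this.size(); }
        };
    }

    ImmutableVec immutableCopy() { return new ImmutableVec(data); }
}

/** Immutable: final class, private copied array in a final field, no mutators. */
final class ImmutableVec extends ReadableVector {
    private final double[] data;

    ImmutableVec(double[] data) { this.data = data.clone(); }

    double get(int i) { return data[i]; }
    int size() { return data.length; }
}
```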

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:51.605893+00:00, last updated: 2013-02-01T22:35:41.050916+00:00

Implement PLSI

(originally reported in Trac by @elehack on 2011-03-16 22:27:19)

We would like to have a PLSI recommender in LensKit. References:

The PLSI recommender should be in a new Maven project lenskit-prob.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:38:37.682862+00:00, last updated: 2013-02-01T22:35:45.472453+00:00

Support normalization in user-user CF

(originally reported in Trac by @elehack on 2011-04-04 15:11:59)

Currently, user-user CF is entirely unnormalized. This needs to be fixed.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:39.657595+00:00, last updated: 2013-02-01T22:35:41.765732+00:00

Finish implementing algorithm graphing mojo

(originally reported in Trac by @elehack on 2011-04-04 18:13:28)

The algorithm graphing mojo needs to be finished to support multiple algorithm files.

Note: This issue has been automatically migrated from Bitbucket
Created by grouplens on 2013-02-01T21:55:43.115305+00:00, last updated: 2013-02-01T22:35:41.402987+00:00
