cosanlab / neighbors

A package to perform collaborative filtering on emotion datasets.

Home Page: https://cosanlab.github.io/neighbors

License: MIT License

Python 100.00%

collaborative-filtering

neighbors's Introduction

Neighbors

A Python package for collaborative filtering on social datasets

Installation

Pip (official releases): pip install neighbors
Github (bleeding edge): pip install git+https://github.com/cosanlab/neighbors.git

Getting started

The best way to learn how to use the package is by checking out the documentation site which contains usage tutorials as well as API documentation for all package functionality.

Quick Demo Usage

from neighbors.models import NNMF_sgd
from neighbors.utils import create_user_item_matrix, estimate_performance

# Assuming data is 3 column pandas df with 'User', 'Item', 'Rating'
# convert it to a (possibly sparse) user x item matrix
mat = create_user_item_matrix(df)

# Initialize a model
model = NNMF_sgd(mat)

# Fit
model.fit()

# If data are time-series optionally fit model using dilation
# to leverage auto-correlation and improve performance
model.fit(dilate_by_nsamples=60)

# Visualize results
model.plot_predictions()

# Estimate algorithm performance using
# Repeated refitting with random masking (dense data)
# Or cross-validation (sparse data)
group_results, user_results = estimate_performance(NNMF_sgd, mat)
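
To make the long-to-wide conversion in the demo concrete, here is a minimal sketch using plain pandas. It is not the implementation of `create_user_item_matrix`; the toy data and the `long_to_wide` helper are assumptions made purely for illustration.

```python
import pandas as pd

# Toy long-format ratings (hypothetical data): one row per (User, Item) pair
df = pd.DataFrame(
    {
        "User": ["u1", "u1", "u2", "u3", "u3"],
        "Item": ["a", "b", "a", "b", "c"],
        "Rating": [10.0, 40.0, 25.0, 60.0, 5.0],
    }
)


def long_to_wide(long_df):
    """Pivot long-format ratings into a user x item matrix (hypothetical helper)."""
    # Pairs that were never rated become NaN, which is what makes the matrix sparse
    return long_df.pivot(index="User", columns="Item", values="Rating")


mat = long_to_wide(df)
print(mat)
```

Each row of the result is a user and each column an item; the NaN cells are the ratings a collaborative filtering model would try to predict.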

Algorithms

Currently supported algorithms include:

Mean - a baseline model
KNN - k-nearest neighbors
NNMF_mult - non-negative matrix factorization trained via multiplicative updating
NNMF_sgd - non-negative matrix factorization trained via stochastic gradient descent
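
As a rough illustration of what "multiplicative updating" means, the sketch below factorizes a small ratings matrix with the standard masked multiplicative update rules in NumPy. It is a generic textbook-style version written for this README, not the package's `NNMF_mult` code; the function names, defaults, and toy data are all assumptions.

```python
import numpy as np


def nnmf_mult(X, n_factors=2, n_iter=200, eps=1e-9, seed=0):
    """Masked non-negative matrix factorization via multiplicative updates.

    X is a user x item array with NaN marking missing ratings.
    Returns non-negative factors W (users x factors) and H (factors x items).
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)            # True where a rating was observed
    X0 = np.where(mask, X, 0.0)    # zero-fill so missing cells drop out of the sums
    W = rng.random((X.shape[0], n_factors))
    H = rng.random((n_factors, X.shape[1]))
    for _ in range(n_iter):
        # Each factor is rescaled by (observed target) / (current reconstruction),
        # so entries that start non-negative stay non-negative
        H *= (W.T @ X0) / (W.T @ (mask * (W @ H)) + eps)
        W *= (X0 @ H.T) / ((mask * (W @ H)) @ H.T + eps)
    return W, H


def observed_rmse(X, W, H):
    """Root-mean-square error computed over the observed cells only."""
    mask = ~np.isnan(X)
    return float(np.sqrt(np.mean((X[mask] - (W @ H)[mask]) ** 2)))


# Hypothetical 4 users x 4 items matrix with four missing ratings
X = np.array(
    [
        [5.0, 3.0, np.nan, 1.0],
        [4.0, np.nan, 1.0, 1.0],
        [1.0, 1.0, np.nan, 5.0],
        [np.nan, 1.0, 5.0, 4.0],
    ]
)
W, H = nnmf_mult(X)
predictions = W @ H  # dense matrix: missing cells are now filled in
print(observed_rmse(X, W, H))
```

The SGD variant targets the same factorization but, in general, updates the factors from one observed rating at a time using gradient steps rather than whole-matrix ratios.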

neighbors's Issues

README intro missing import statement

from neighbors.models import NNMF_sgd
from neighbors.utils create_user_item_matrix, estimate_performance

In your example code in the README, the second import statement is missing the word 'import'.

Double check code for NNMF_sgd since refactor.

Everything else seems to be working well.

Speed up SGD with Parallelization

Could potentially add joblib and parallelize iterations.

Add support for sparse matrices

This would be a good idea to add at some point.

Question about estimate_performance()

I am running the estimate_performance() function on sparse data and am curious how the n_mask_items keyword argument works.

From docstrings
n_mask_items (int/float, optional): how much randomly sparsify dense data each iteration; Defaults to masking out 20% of observed

This keyword makes sense for dense data, but how does it work with sparse data? Is it ignored?
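
For context, here is one way that masking a fraction of the observed values could apply to data that is already sparse. This is only an illustrative NumPy sketch and says nothing about what estimate_performance() actually does; `mask_observed` and its arguments are hypothetical.

```python
import numpy as np


def mask_observed(X, frac=0.2, seed=0):
    """Hide a random fraction of the observed (non-NaN) cells of X.

    Returns the sparsified copy plus a boolean array marking the held-out cells.
    """
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))      # (row, col) of every observed rating
    n_hide = int(round(frac * len(observed)))
    picked = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    held_out = np.zeros(X.shape, dtype=bool)
    held_out[picked[:, 0], picked[:, 1]] = True
    X_train = X.copy()
    X_train[held_out] = np.nan                # held-out ratings become "missing"
    return X_train, held_out


# Hypothetical sparse input: 20 cells, 4 of them already missing
X = np.arange(20, dtype=float).reshape(4, 5)
X[0, 1] = X[1, 3] = X[2, 0] = X[3, 4] = np.nan

X_train, held_out = mask_observed(X, frac=0.25)
print(int(held_out.sum()), "of", int((~np.isnan(X)).sum()), "observed ratings held out")
```

Under this reading the fraction is taken relative to the observed ratings, so cells that were already missing are never selected.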

Need to add time series options

Correct sorting issue with create_sub_by_item_matrix

change:
ratings = pd.DataFrame(columns=df.Item.unique(),index=df['Subject'].unique())

to:

columnNames = sorted(df.Item.unique())
ratings = pd.DataFrame(columns=columnNames,index=df['Subject'].unique())
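
A small, self-contained illustration of why the sort matters (toy data, assumed column names as in the snippet above): `unique()` returns values in order of first appearance, so without sorting the column order depends on how the rows happen to be arranged.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Subject": ["s1", "s1", "s2", "s2"],
        "Item": ["b", "a", "c", "a"],
        "Rating": [1, 2, 3, 4],
    }
)

# unique() keeps first-appearance order, so this follows the row order of df
unsorted_cols = list(df.Item.unique())
# sorting gives the same column order no matter how the rows are arranged
sorted_cols = sorted(df.Item.unique())

ratings = pd.DataFrame(columns=sorted_cols, index=df["Subject"].unique())
print(unsorted_cols, sorted_cols)
```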

Need to add cross-validation methods

Add ability to pass any kernel

Currently we force a boxcar, but it would be a trivial extension to allow any kernel shape to be passed into the convolution.
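
A rough sketch of the idea, assuming the smoothing is a 1-D convolution along the time axis. The `smooth_with_kernel` helper is hypothetical and is not the package's code; it only shows that the boxcar is one special case of a kernel passed to `np.convolve`.

```python
import numpy as np


def smooth_with_kernel(signal, kernel):
    """Convolve a 1-D series with an arbitrary kernel (hypothetical helper)."""
    kernel = np.asarray(kernel, dtype=float)
    kernel = kernel / kernel.sum()  # normalize so the overall scale is preserved
    return np.convolve(signal, kernel, mode="same")


n_samples = 5
boxcar = np.ones(n_samples)                 # flat kernel: the current fixed choice
t = np.arange(n_samples) - (n_samples - 1) / 2
gaussian = np.exp(-0.5 * t**2)              # one possible alternative shape

signal = np.zeros(11)
signal[5] = 1.0                             # a single observed rating in time

print(smooth_with_kernel(signal, boxcar))    # spread evenly over 5 samples
print(smooth_with_kernel(signal, gaussian))  # weight concentrated near the rating
```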

unflatten_dataframe produces warnings on repeated calls, e.g. with estimate_performance

This results in a "fragmented dataframe" warning. Could refactor the lines around here: https://github.com/cosanlab/neighbors/blob/master/neighbors/utils.py#L283
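
For reference, pandas usually raises this warning when many columns are inserted into a DataFrame one at a time. The sketch below shows the general pattern and the usual alternative of building all columns in a single step. It is a generic example with made-up data, not a patch for the linked lines.

```python
import numpy as np
import pandas as pd

n_users, n_items = 4, 150
rng = np.random.default_rng(0)
columns = {f"item_{j}": rng.random(n_users) for j in range(n_items)}

# Pattern that tends to trigger the warning: one insert per column
# out = pd.DataFrame(index=range(n_users))
# for name, values in columns.items():
#     out[name] = values

# Alternative: collect the columns first, then assemble the frame in one step
out = pd.concat(
    [pd.Series(values, name=name) for name, values in columns.items()],
    axis=1,
)
print(out.shape)
```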

Documentation

We should add Sphinx documentation similar to the nltools package. Read the Docs seems to work well. @infiniteline interested in taking a crack at this?