Recommenders

This repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

  • Prepare Data: Preparing and loading data for each recommender algorithm
  • Model: Building models using various recommender algorithms such as Alternating Least Squares (ALS), Singular Value Decomposition (SVD), etc.
  • Evaluate: Evaluating algorithms with offline metrics
  • Model Select and Optimize: Tuning and optimizing hyperparameters for recommender models
  • Operationalize: Operationalizing models in a production environment on Azure

Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting train/test data. Implementations of several state-of-the-art algorithms are provided for self-study and customization in your own applications.
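As a rough illustration of the train/test splitting task mentioned above, the following standard-library sketch shuffles interaction rows and cuts them at a given ratio. The function name `random_split` and the toy `ratings` data are hypothetical and chosen for this example only; this is not the `reco_utils` implementation.

```python
import random


def random_split(rows, ratio=0.75, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration; not part of reco_utils.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * ratio))
    return shuffled[:cut], shuffled[cut:]


# Made-up (user, item, rating) interactions: 10 users x 4 items = 40 rows.
ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(4)]
train, test = random_split(ratings, ratio=0.75)
print(len(train), len(test))  # prints: 30 10
```

A purely random split like this ignores time order and per-user coverage, so some users may end up with no rows in the training set. Other splitting strategies may suit those cases better.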

Getting Started

Please see the setup guide for more details on setting up your machine locally, on Spark, or on Azure Databricks.

To set up on your local machine:

  1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
  2. Clone the repository:
    git clone https://github.com/Microsoft/Recommenders
    
  3. Run the script that generates the conda file, then create a conda environment:
    cd Recommenders
    ./scripts/generate_conda_file.sh
    conda env create -n reco -f conda_bare.yaml  
    
  4. Activate the conda environment and register it with Jupyter:
    conda activate reco
    python -m ipykernel install --user --name reco --display-name "Python (reco)"
    
  5. Start the Jupyter notebook server:
    cd notebooks
    jupyter notebook
    
  6. Run the SAR Python CPU MovieLens notebook under the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".

Notebooks

We provide several notebooks to show how recommendation algorithms can be designed, evaluated and operationalized.

The Quick-Start and Modeling notebooks showcase how to utilize the following algorithms to build a recommender system:

Algorithms

The table below lists recommender algorithms available in the repository at the moment.

| Algorithm | Environment | Type | Description |
| --- | --- | --- | --- |
| Classic Recommenders | | | |
| Surprise/Singular Value Decomposition (SVD) | Python | Collaborative Filtering | General purpose algorithm for smaller datasets |
| Alternating Least Squares (ALS) | Spark | Collaborative Filtering | General purpose algorithm for larger datasets, optimized with Spark |
| Microsoft Recommenders | | | |
| Smart Adaptive Recommendations (SAR) | Python / Spark | Collaborative Filtering | Generalized algorithm that uses item similarities and can easily adapt to new users |
| Vowpal Wabbit Family (VW) | Python / Online | Collaborative, Content-based Filtering | Fast online learning algorithms, well suited to scenarios where user features or context change constantly, such as real-time bidding |
| eXtreme Deep Factorization Machine (xDeepFM) | Python / GPU | Hybrid | Deep learning model combining implicit and explicit features |
| Deep Knowledge-Aware Network (DKN) | Python / GPU | Content-based Filtering | Deep learning model incorporating a knowledge graph and article embeddings to provide news or article recommendations |
| Deep Learning Recommenders | | | |
| Neural Collaborative Filtering (NCF) | Python / GPU | Collaborative Filtering | General algorithm built using a multi-layer perceptron |
| Restricted Boltzmann Machines (RBM) | Python / GPU | Collaborative Filtering | Generative neural network algorithm built to learn the underlying probability distribution for user/item affinity |
| FastAI Embedding Dot Bias (FAST) | Python / GPU | Collaborative Filtering | General purpose algorithm embedding dot biases for users and items |
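Several of the collaborative filtering entries above rely on similarities between items. The sketch below shows one simple way such a similarity can be computed and used, with Jaccard similarity over the sets of users who interacted with each item. It is an illustration under our own assumptions, not the SAR implementation in this repository; the names `jaccard_item_similarity` and `recommend` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(user_items):
    """Map each co-occurring item pair (a, b), a < b, to its Jaccard similarity."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = {}
    for a, b in combinations(sorted(item_users), 2):
        both = len(item_users[a] & item_users[b])
        if both:  # keep only pairs that share at least one user
            sims[(a, b)] = both / len(item_users[a] | item_users[b])
    return sims


def recommend(user, user_items, sims, top_n=3):
    """Score unseen items by summed similarity to the user's seen items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
        elif b in seen and a not in seen:
            scores[a] += s
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Made-up implicit feedback: which items each user interacted with.
user_items = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"A"},
}
sims = jaccard_item_similarity(user_items)
print(recommend("u4", user_items, sims))  # prints: ['B', 'C']
```

Because scores depend only on the items a user has already seen, a user with a single interaction (like `u4` here) can still receive recommendations without retraining anything.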

In addition, we provide a comparison notebook that illustrates how different algorithms can be evaluated and compared. In this notebook, the MovieLens 1M dataset is randomly split into train/test sets at a 75/25 ratio, and a recommendation model is trained with each of the collaborative filtering algorithms below. We use empirical parameter values reported in the literature. For ranking metrics we use k = 10 (top 10 results). We ran the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode.
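To make the ranking metrics at k = 10 concrete, here is a minimal sketch of precision@k and recall@k for a single user. It assumes precision is hits divided by k and recall is hits divided by the number of held-out relevant items, which is a common convention but not necessarily the exact definition used in the comparison notebook. The function name and sample data are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Return (precision@k, recall@k) for one user.

    recommended: items ranked best-first.
    relevant: set of held-out items the user actually interacted with.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up ranking of 11 items; only the first 10 count when k = 10.
recommended = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"]
relevant = {"b", "e", "z", "k"}
p, r = precision_recall_at_k(recommended, relevant, k=10)
print(p, r)  # prints: 0.2 0.5
```

A dataset-level score would typically average these per-user values over all users in the test set.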

Preliminary Comparison

| Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ALS | 0.002020 | 0.024313 | 0.030677 | 0.009649 | 0.860502 | 0.680608 | 0.406014 | 0.411603 |
| SVD | 0.010915 | 0.102398 | 0.092996 | 0.025362 | 0.888991 | 0.696781 | 0.364178 | 0.364178 |
| FastAI | 0.023022 | 0.168714 | 0.154761 | 0.050153 | 0.887224 | 0.705609 | 0.371552 | 0.374281 |
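The rating metrics in the table (RMSE and MAE) compare predicted ratings with held-out true ratings. The following sketch shows the standard definitions on made-up numbers; the function names are ours and the values are unrelated to the results above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ratings for four (user, item) pairs.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.0, 3.0, 4.0, 3.0]
print(rmse(actual, predicted))  # sqrt(6 / 4), about 1.2247
print(mae(actual, predicted))   # prints: 1.0
```

RMSE penalizes large errors more heavily than MAE, which is why the single error of 2 in this example pushes RMSE above MAE.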

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

| Build Type | Branch | Status | Branch | Status |
| --- | --- | --- | --- | --- |
| Linux CPU | master | Status | staging | Status |
| Linux GPU | master | Status | staging | Status |
| Linux Spark | master | Status | staging | Status |

NOTE: the tests are executed every night. We use pytest to test the Python utilities in reco_utils and papermill to test the notebooks.
