RecSys-Notes

Classic papers and resources on recommender systems, along with Python implementations (focusing on PyTorch).

What should we consider when making recommendations? From Netflix:

We want our recommendations to be accurate in that they are relevant to the tastes of our members, but they also need to be diverse so that we can address the spectrum of a member’s interests versus only focusing on one. We want to be able to highlight the depth in the catalog we have in those interests and also the breadth we have across other areas to help our members explore and even find new interests. We want our recommendations to be fresh and responsive to the actions a member takes, such as watching a show, adding to their list, or rating; but we also want some stability so that people are familiar with their homepage and can easily find videos they’ve been recommended in the recent past.

Covered Model & Performance

| Model | Key Idea | Recommended Hyperparameter | Criteo Test AUC | Implementation |
| --- | --- | --- | --- | --- |
| Factorization Machine | Use embeddings and dot products to model low-order interactions explicitly | | 0.792564 after one epoch | Paper, PyTorch |
| Field-aware Factorization Machine | Model interactions between different fields differently | | | |
| Deep Factorization Machine | Use FM to model low-order interactions explicitly and a DNN to model high-order interactions implicitly | DNN: 3 * 400 | 0.801416 after two epochs | Paper, PyTorch |
| Deep Cross Network | Use a Cross Net to model bit-level interactions between feature embeddings explicitly and a DNN to model high-order interactions implicitly | Cross: 6; DNN: 2 * 1024 | 0.801345 after three epochs | Paper, PyTorch |
| Extreme Deep Factorization Machine | Introduce a Compressed Interaction Network to enhance the Cross Net, capturing feature interactions at the vector level instead of the bit level | CIN: 3 * 200; DNN: 4 * 400 | 0.804545 after two epochs | Paper, PyTorch |
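
As a rough illustration of the "embedding and dot product" idea behind the Factorization Machine, the sketch below computes the pairwise interaction term two ways: a naive double loop over all feature pairs, and the well-known linear-time reformulation 0.5 * sum_f ((sum_i v_if)^2 - sum_i v_if^2). It is not the code from this repo. It assumes every active feature has value 1 (one-hot fields), and the function names and example sizes are made up for illustration.

```python
import random


def fm_pairwise_naive(embeddings):
    """Sum of dot products over all pairs of embeddings: O(n^2 * k)."""
    total = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
    return total


def fm_pairwise_fast(embeddings):
    """Same quantity in O(n * k), one embedding dimension at a time."""
    k = len(embeddings[0])
    total = 0.0
    for f in range(k):
        column = [v[f] for v in embeddings]
        square_of_sum = sum(column) ** 2
        sum_of_squares = sum(x * x for x in column)
        total += square_of_sum - sum_of_squares
    return 0.5 * total


# Hypothetical example: 5 active fields, embedding size 4.
random.seed(0)
embeddings = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
naive = fm_pairwise_naive(embeddings)
fast = fm_pairwise_fast(embeddings)
```

Both functions should agree up to floating-point error. The second form is the one usually vectorized in tensor libraries, because it needs only a sum over the field axis rather than an explicit loop over pairs.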

Data Preparation

Criteo Data

Criteo data can be downloaded from the Kaggle Display Advertising Challenge dataset. To prepare the data, follow these steps.

  • Git clone this repo to your local environment and change directory to your local repo

  • Create the raw-data directory: mkdir ./Data/crieto/criteo_raw_artifact

  • Extract the Criteo archive dac.tar.gz and move train.txt and test.txt to ./Data/crieto/criteo_raw_artifact

  • Run the following commands in a shell:

    cd ./Data/crieto
    python3 split.py
    python3 prepare.py

    Note that the current implementation of prepare.py keeps all of the prepared data in memory, which may not be feasible on machines with limited memory. A workaround is to write the prepared data out in partitions.
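
One possible shape for the partitioning workaround is sketched below. It is not part of prepare.py: `write_partitions`, its parameters, and the `part-00000.txt` naming are hypothetical, and the `transform` hook only stands in for whatever per-row processing the real script performs. The sketch streams the input file and holds at most one partition of rows in memory at a time.

```python
import os
from itertools import islice


def write_partitions(src_path, out_dir, rows_per_partition=1_000_000,
                     transform=lambda line: line):
    """Stream src_path and write it out as fixed-size partition files.

    Returns the list of partition paths that were written.
    `transform` is a placeholder for per-row preparation logic.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(src_path, "r", encoding="utf-8") as src:
        part = 0
        while True:
            # Pull at most rows_per_partition lines from the file iterator.
            chunk = list(islice(src, rows_per_partition))
            if not chunk:
                break
            path = os.path.join(out_dir, f"part-{part:05d}.txt")
            with open(path, "w", encoding="utf-8") as dst:
                dst.writelines(transform(line) for line in chunk)
            paths.append(path)
            part += 1
    return paths
```

A call such as `write_partitions("./Data/crieto/criteo_raw_artifact/train.txt", "./Data/crieto/partitions")` would then produce a series of smaller files that can be loaded one at a time during training. The output directory in that call is an assumption, not a path the repo defines.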
