xCLiMF

Python implementation of Extended Collaborative Less-is-More Filtering (xCLiMF), an evolution of CLiMF that supports multiple levels of relevance data. Both algorithms are variants of latent factor collaborative filtering (CF) that optimise a lower bound of the smoothed reciprocal rank of "relevant" items in ranked recommendation lists.
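
To make the target metric concrete, here is a minimal sketch of the plain (unsmoothed) mean reciprocal rank, the MRR figure reported in the experiments. It is only an illustration and is not taken from xclimf.py; the function names and the dictionary-based inputs are assumptions.

```python
def reciprocal_rank(ranked_items, relevant_items):
    """Return 1/position of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant_items:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevant_by_user):
    """Average the reciprocal rank over all users that have a ranking.

    rankings: dict mapping user -> list of items, best first (hypothetical format)
    relevant_by_user: dict mapping user -> set of relevant items
    """
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, relevant_by_user.get(user, set()))
        for user, ranked in rankings.items()
    )
    return total / len(rankings)


rankings = {"u1": ["a", "b", "c"], "u2": ["b", "a", "c"]}
relevant = {"u1": {"b"}, "u2": {"b"}}
mrr = mean_reciprocal_rank(rankings, relevant)  # (1/2 + 1/1) / 2
```

CLiMF and xCLiMF do not optimise this quantity directly, because the rank position is not differentiable; they optimise a lower bound of a smoothed version of it.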

References

CLiMF: Learning to Maximize Reciprocal Rank with Collaborative Less-is-More Filtering. Yue Shi, Martha Larson, Alexandros Karatzoglou, Nuria Oliver, Linas Baltrunas, Alan Hanjalic. ACM RecSys 2012.

The CLiMF implementation on which this xCLiMF implementation is based: https://github.com/gamboviol/climf (that implementation has this bug: gamboviol/climf#2)

xCLiMF: Optimizing Expected Reciprocal Rank for Data with Multiple Levels of Relevance. Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic. ACM RecSys 2013.

An xCLiMF implementation that was consulted: https://github.com/gpoesia/xclimf (with this bug: gpoesia/xclimf#1)

Experiments

  1. Ran a grid search on the MovieLens 20M dataset (https://grouplens.org/datasets/movielens/20m/). The best cross-validation MRR was 0.008, using D=15, lambda=0.001, gamma=0.0001.

     python -u grid_search.py --dataset ../ml-20m/ratings.csv --sep , --skipfl
    
  2. Ran xCLiMF on the MovieLens 20M dataset with the hyperparameters tuned by the grid search, but got a math range error.

     python -u xclimf.py --dataset ../ml-20m/ratings.csv --sep , --skipfl --dim 15 --lambda 0.001 --gamma 0.0001
    
  3. After debugging, found some differences between the paper and the implementation. Tried exactly the same experimental protocol described in the paper, using random disjoint ratings for each user in the training and testing datasets (a protocol that I strongly disagree with). Got the math range error again.

     python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 0.001 --dim 10 --seltype random
    
  4. Found that some random initializations of the latent features U and V cause large values in the first step of gradient ascent. These values cause the math range error in the objective function. Using a gamma as small as 1e-7 gave an MRR of 0.028, but the objective function is not ascending.

     python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 1e-7 --dim 10 --seltype random
    
  5. Tried gamma 1e-6 and got a worse MRR: 0.021

     python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 1e-6 --dim 10 --seltype random
    
  6. Tried normalizing ratings as r/max(r), using gamma as described in the paper, and got an MRR of 0.068

     python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 0.001 --dim 10 --seltype random --norm
    
  7. Since MRR was rising slowly, I tried a bigger gamma of 0.1, but got divide by zero and math domain errors

     python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 0.1 --dim 10 --seltype random --norm
    
  8. Repeated gamma 0.001 with 500 iterations, intending to stop at the maximum objective, but it was never reached. The objective kept increasing, while MRR on the train and test datasets stopped increasing at iteration 100:

     python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 0.001 --dim 10 --seltype random --norm --iters 500
    

    [Figures: objective, train MRR, test MRR]

  9. Tried a bigger step size gamma of 0.01, but stopped after 50 iterations. MRR was getting slightly worse each iteration. The last value was 0.08.

     python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 0.01 --dim 10 --seltype random --norm --iters 100
    

    [Figures: objective, train MRR, test MRR]

  10. In this experiment I used my original experimental protocol: keep only the top-rated items for each user and randomly select the training and testing items from those. MRR now stabilized at 0.24 from iteration 10 onward:

    python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 0.001 --dim 10 --norm --iters 100
    
  11. Using the top 20 ratings for each user gave a worse MRR (0.13) than using the top 5 ratings:

    python xclimf.py --dataset data/ml-1m/ratings.dat --sep :: --lambda 0.001 --gamma 0.001 --dim 10 --norm --iters 100 --topktrain 20
    
  12. Comparing with ALS:

  • MRR: 0.01

    python als_spark.py --dataset data/ml-1m/ratings.dat --sep :: --iters 100 --topktrain 5 --dim 100
    
  • MRR: 0.006

    python als_spark.py --dataset data/ml-1m/ratings.dat --sep :: --iters 100 --topktrain 5 --dim 200 --lambda 0.01
    
  • MRR: 0.004

    python als_spark.py --dataset data/ml-1m/ratings.dat --sep :: --iters 100 --topktrain 5 --dim 200 --lambda 0.0001
    
  • MRR: 0.003

    python als_spark.py --dataset data/ml-1m/ratings.dat --sep :: --iters 100 --topktrain 5 --dim 100 --lambda 0.1
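
Two of the preprocessing steps mentioned in the experiments, normalizing ratings as r/max(r) and keeping each user's top-rated items before a random train/test split, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in xclimf.py: the function names, the `(item, rating)` tuple format, the use of the maximum over the given list, and the `k_train`/`k_test` parameters are all hypothetical.

```python
import random


def normalize_ratings(item_ratings):
    """Scale ratings into (0, 1] by dividing by the largest rating.

    item_ratings: list of (item, rating) tuples with positive ratings.
    Assumption: the maximum is taken over the list that is passed in.
    """
    max_rating = max(rating for _, rating in item_ratings)
    return [(item, rating / max_rating) for item, rating in item_ratings]


def split_top_items(item_ratings, k_train, k_test, rng):
    """Keep the k_train + k_test highest-rated items, then split them at random."""
    top = sorted(item_ratings, key=lambda pair: pair[1], reverse=True)
    top = top[: k_train + k_test]
    rng.shuffle(top)  # random assignment of the top items to train or test
    return top[:k_train], top[k_train:]


user_items = [("a", 5), ("b", 4), ("c", 3), ("d", 2), ("e", 1)]
normalized = normalize_ratings(user_items)
train, test = split_top_items(normalized, k_train=2, k_test=1, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the split reproducible, which makes runs with different gamma values easier to compare.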
    

Problems

  • Math range errors occur frequently
  • "Numerical result out of range" errors occur occasionally
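
In Python, `math.exp` raises `OverflowError: math range error` when its argument is too large, and `math.log(0)` raises `ValueError: math domain error`. Whether that is the exact source of the errors listed above is an assumption, but if the logistic function is evaluated naively as `1 / (1 + math.exp(-x))`, a large negative `x` is enough to trigger the overflow. The sketch below shows one common way to evaluate the logistic function and its logarithm without overflow; the helper names are hypothetical and are not part of xclimf.py.

```python
import math


def stable_sigmoid(x):
    """Logistic function g(x) = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        # exp(-x) is in (0, 1], so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, exp(x) is in [0, 1), so it cannot overflow either.
    z = math.exp(x)
    return z / (1.0 + z)


def stable_log_sigmoid(x):
    """log(g(x)) computed directly, so log(0) is never evaluated."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


# Inputs far outside the range where the naive formula works.
big_positive = stable_sigmoid(1000.0)
big_negative = stable_sigmoid(-1000.0)
log_big_negative = stable_log_sigmoid(-1000.0)
```

This only removes overflow in the function evaluation itself. It does not stop U and V from growing when the step size gamma is too large, so a small gamma or normalized ratings may still be needed.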

Running with real data

To see all options:

python xclimf.py -h

A typical run looks like this:

python xclimf.py --dataset data/ml-100k/u.data

Running tests

py.test -s
