cpmf: Collection of Parallel Matrix Factorization

Prerequisites

Required

picojson is required to parse config.json. Clone it into vendor/picojson:

$ git clone https://github.com/kazuho/picojson.git vendor/picojson

Optional

To use MassiveThreads as the task-parallel library, install it with the following commands:

$ git clone https://github.com/massivethreads/massivethreads.git vendor/massivethreads
$ cd vendor/massivethreads
$ ./configure --prefix=/usr/local
$ make && make install

If you install to a prefix other than /usr/local, also update MYTH_PATH in the Makefile to match.

Converting MovieLens data

Use scripts/convert_movielens.py to convert MovieLens rating data into the cpmf input format.

To convert the MovieLens 100K dataset:

$ python scripts/convert_movielens.py PATH/ml-100k/u.data > input/ml-100k

To convert the MovieLens 1M dataset:

$ python scripts/convert_movielens.py PATH/ml-1m/ratings.dat --separator :: > input/ml-1m

To convert the MovieLens 10M dataset:

$ python scripts/convert_movielens.py PATH/ml-10M100K/ratings.dat --separator :: > input/ml-10m
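The parsing side of this conversion can be sketched in Python, the language of the bundled script. This is not scripts/convert_movielens.py: it only reads MovieLens lines of the form user<sep>item<sep>rating<sep>timestamp (tab-separated in ml-100k/u.data, "::"-separated in the 1M and 10M ratings.dat files) and, as an assumption, renumbers the 1-based, non-contiguous MovieLens IDs to contiguous 0-based indices. The actual cpmf output format is defined by the script and is not reproduced here; parse_ratings and remap_ids are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[int, int, float]  # (user, item, rating)


def parse_ratings(lines: Iterable[str], separator: str = "\t") -> List[Triple]:
    """Parse MovieLens lines "user<sep>item<sep>rating<sep>timestamp".

    The timestamp is dropped; blank lines are skipped.
    """
    triples: List[Triple] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item, rating = line.split(separator)[:3]
        triples.append((int(user), int(item), float(rating)))
    return triples


def remap_ids(triples: List[Triple]) -> Tuple[List[Triple], int, int]:
    """Hypothetical step: map raw IDs to contiguous 0-based indices.

    Returns the remapped triples plus the number of distinct users and items.
    """
    users: Dict[int, int] = {}
    items: Dict[int, int] = {}
    out: List[Triple] = []
    for u, i, r in triples:
        uu = users.setdefault(u, len(users))  # first time seen -> next free index
        ii = items.setdefault(i, len(items))
        out.append((uu, ii, r))
    return out, len(users), len(items)
```

For example, parse_ratings(open("ratings.dat"), separator="::") would read the 1M file; the 10M file also contains half-star ratings such as 3.5, which is why ratings are parsed as floats.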

Parallel methods

Select the parallelization method with the DPARALLEL variable in the Makefile.

FPSGD

In FPSGD, the rating matrix is divided into a grid of blocks, and threads concurrently process blocks that share no row or column with one another, so no two threads update the same user or item factors at the same time.
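The block scheduling described above can be sketched in Python for readability. Assumptions: ratings are (user, item, rating) triples with 0-based indices, the grid is p × p, and among the free blocks the scheduler picks the least-processed one to keep work balanced. BlockScheduler, sgd_block, and fpsgd are hypothetical names, not cpmf's code. Python threads only illustrate the scheduling rule; the GIL prevents a real speedup.

```python
import random
import threading
from typing import List, Optional, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating), 0-based


class BlockScheduler:
    """Hands out grid blocks so that no two busy blocks share a row or column stripe."""

    def __init__(self, p: int) -> None:
        self.p = p
        self.busy_rows: set = set()
        self.busy_cols: set = set()
        self.counts = [[0] * p for _ in range(p)]  # how often each block was processed
        self.lock = threading.Lock()

    def acquire(self) -> Optional[Tuple[int, int]]:
        with self.lock:
            free = [(self.counts[i][j], i, j)
                    for i in range(self.p) for j in range(self.p)
                    if i not in self.busy_rows and j not in self.busy_cols]
            if not free:
                return None
            _, i, j = min(free)  # prefer the least-processed free block
            self.busy_rows.add(i)
            self.busy_cols.add(j)
            return i, j

    def release(self, i: int, j: int) -> None:
        with self.lock:
            self.busy_rows.discard(i)
            self.busy_cols.discard(j)
            self.counts[i][j] += 1


def sgd_block(block: List[Rating], P, Q, lr: float, reg: float) -> None:
    """Plain SGD over one block: fit r ~ P[u] . Q[v] with L2 regularization."""
    for u, v, r in block:
        err = r - sum(a * b for a, b in zip(P[u], Q[v]))
        for k in range(len(P[u])):
            pu, qv = P[u][k], Q[v][k]
            P[u][k] += lr * (err * qv - reg * pu)
            Q[v][k] += lr * (err * pu - reg * qv)


def rmse(ratings: List[Rating], P, Q) -> float:
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[v]))) ** 2 for u, v, r in ratings)
    return (se / len(ratings)) ** 0.5


def fpsgd(ratings, n_users, n_items, k=4, p=4, threads=2,
          block_updates=64, lr=0.01, reg=0.05, seed=0):
    # Each worker holds at most one block, so with threads <= p a free block always exists.
    assert threads <= p
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    # Partition ratings into a p x p grid by user stripe and item stripe.
    blocks = [[[] for _ in range(p)] for _ in range(p)]
    for u, v, r in ratings:
        blocks[u * p // n_users][v * p // n_items].append((u, v, r))
    sched = BlockScheduler(p)
    budget = [block_updates]  # total number of block updates left to perform
    budget_lock = threading.Lock()

    def worker() -> None:
        while True:
            with budget_lock:
                if budget[0] == 0:
                    return
                budget[0] -= 1
            i, j = sched.acquire()  # never None because threads <= p
            sgd_block(blocks[i][j], P, Q, lr, reg)
            sched.release(i, j)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return P, Q
```

Because concurrently processed blocks cover disjoint user stripes and disjoint item stripes, the two threads never write the same rows of P or Q, which is why no per-rating locking is needed.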

To use FPSGD, set DPARALLEL = -DFPSGD.

  • Reference

    Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin, "A fast parallel SGD for matrix factorization in shared memory systems," RecSys '13.

dcMF (with Intel Cilk or MassiveThreads)

dcMF is our proposed method for parallelizing matrix factorization: it recursively divides the rating matrix into four smaller blocks and dynamically assigns the resulting tasks to threads.
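The recursive decomposition described above can be sketched as follows. Assumption: within each 2×2 split, the two diagonal quadrants run concurrently and then the two anti-diagonal quadrants, which is one way to guarantee that concurrent tasks never touch the same rows or columns; the exact ordering and cut-off dcMF uses are not stated here. Python threads stand in for cilk_spawn/cilk_sync or MassiveThreads tasks, and dc_process and min_size are hypothetical names.

```python
import threading
from typing import Callable, Tuple

Region = Tuple[int, int, int, int]  # rows [r0, r1), columns [c0, c1)


def dc_process(region: Region, leaf: Callable[[Region], None], min_size: int) -> None:
    """Recursive 2x2 divide-and-conquer in the style of cilk_spawn/cilk_sync.

    Quadrants A11 and A22 share no rows or columns, so they run concurrently;
    after both finish, A12 and A21 (also disjoint) run concurrently.
    """
    r0, r1, c0, c1 = region
    if r1 - r0 <= min_size or c1 - c0 <= min_size:
        leaf(region)  # small enough: process this block sequentially (e.g. plain SGD)
        return
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    a11, a12 = (r0, rm, c0, cm), (r0, rm, cm, c1)
    a21, a22 = (rm, r1, c0, cm), (rm, r1, cm, c1)
    for first, second in ((a11, a22), (a12, a21)):
        # "Spawn" one quadrant on a new thread, run the other inline, then "sync".
        t = threading.Thread(target=dc_process, args=(first, leaf, min_size))
        t.start()
        dc_process(second, leaf, min_size)
        t.join()
```

In an actual MF run, leaf would apply SGD to the ratings inside its region. Creating an OS thread per spawn is expensive; lightweight task runtimes such as Cilk and MassiveThreads are what make this fine-grained recursion practical.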

To use dcMF, set DPARALLEL = -DTP_BASED.

Then choose the task-parallel library with DTP: set DTP = -DTP_CILK for Intel Cilk, or DTP = -DTP_MYTH for MassiveThreads.

  • Reference

    Y. Nishioka and K. Taura, "Scalable task-parallel SGD on matrix factorization in multicore architectures," IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015.

How to use

Just make and run!

$ make
$ ./mf train config.json

cpmf's Issues

Add a subcommand to split data into training and test sets

Add a subcommand (e.g. cv_split) that splits the data into a training set and a test set for cross-validation.

After converting MovieLens data to cpmf format,

$ python scripts/convert_movielens.py PATH/ml-1m/ratings.dat --separator :: > input/ml-1m

users would then split the data into a training set and a test set with the following command:

$ ./mf cv_split input/ml-1m --training_size 0.8

or

$ ./mf cv_split input/ml-1m --test_size 0.2
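The proposed cv_split subcommand does not exist yet; its splitting logic could look like the Python sketch below. Assumptions: the split is a uniformly random shuffle of individual rating records with a fixed seed, exactly one of training_size or test_size is given as a fraction, and cv_split here is a hypothetical function mirroring the proposed flags rather than any existing cpmf code.

```python
import random
from typing import List, Optional, Sequence, Tuple


def cv_split(records: Sequence[str], training_size: Optional[float] = None,
             test_size: Optional[float] = None, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split rating records into (training, test) lists.

    Exactly one of training_size / test_size must be given, as a fraction in (0, 1).
    """
    if (training_size is None) == (test_size is None):
        raise ValueError("give exactly one of training_size or test_size")
    frac = training_size if training_size is not None else 1.0 - test_size
    if not 0.0 < frac < 1.0:
        raise ValueError("split fraction must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = round(len(shuffled) * frac)
    return shuffled[:n_train], shuffled[n_train:]
```

With this design, --training_size 0.8 and --test_size 0.2 yield the same split for the same seed. A purely random per-rating split can leave some users or items with no ratings in the training set, which a real implementation may want to handle.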
