Big Data - T13 - DeepRec

Final Project for UF EEL6935 Spring 2018

Flow chart

pic10

Overview Architecture

pic11

Dataset

Netflix Prize open competition dataset

We mainly use the file training_set.tar from the raw dataset. It is a tar archive of a directory containing 17,770 files, one per movie. The first line of each file contains the movie ID followed by a colon. Each subsequent line corresponds to one customer's rating and its date, in the following format:

CustomerID,Rating,Date

MovieIDs range from 1 to 17,770 sequentially.
CustomerIDs range from 1 to 2,649,429, with gaps. There are 480,189 users.
Ratings are on a five star (integral) scale from 1 to 5.
Dates have the format YYYY-MM-DD.
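
The file layout described above can be read with a few lines of Python. This is only a minimal sketch, not the project's own parser: the names `Rating` and `parse_movie_file` are hypothetical, and the two sample lines are illustrative input in the documented format.

```python
import io
from datetime import date
from typing import Iterator, NamedTuple, TextIO


class Rating(NamedTuple):
    """One rating row (hypothetical helper type, not part of the project)."""
    customer_id: int
    movie_id: int
    rating: int
    day: date


def parse_movie_file(handle: TextIO) -> Iterator[Rating]:
    """Yield one Rating per data line of a single per-movie file."""
    # The first line holds the movie ID followed by a colon, e.g. "1:".
    header = handle.readline().strip()
    if not header.endswith(":"):
        raise ValueError(f"expected 'MovieID:' header, got {header!r}")
    movie_id = int(header[:-1])
    # Every following line is CustomerID,Rating,Date with Date as YYYY-MM-DD.
    for line in handle:
        line = line.strip()
        if not line:
            continue
        customer, rating, day = line.split(",")
        yield Rating(int(customer), movie_id, int(rating), date.fromisoformat(day))


# Illustrative input in the documented format.
sample = io.StringIO("1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n")
rows = list(parse_movie_file(sample))
```

Keeping the movie ID from the header and attaching it to every row is what makes the later three-column CustomerID,MovieID,Rating output possible.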

  • Data preprocessing
    Place the raw data in the repository root folder, then extract it:
tar -xvf nf_prize_dataset.tar.gz
tar -xf download/training_set.tar

We split the movie rating data into three sets (training, validation, and test) according to the dates of the ratings. In each set, the files have three columns: CustomerID,MovieID,Rating.
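
A date-based split of this kind can be sketched as follows. The cutoff dates `VALID_START` and `TEST_START` and the function names are assumptions made for illustration; the actual boundaries used by data_utils/dataprocessing.py are not stated here.

```python
from datetime import date

# Hypothetical cutoffs, chosen only for illustration.
VALID_START = date(2005, 9, 1)
TEST_START = date(2005, 11, 1)


def assign_split(day, valid_start=VALID_START, test_start=TEST_START):
    """Return 'train', 'valid', or 'test' for a rating date."""
    if day < valid_start:
        return "train"
    if day < test_start:
        return "valid"
    return "test"


def split_ratings(rows):
    """Group (customer_id, movie_id, rating, day) rows by split.

    The date is used only to pick the split and is dropped afterwards,
    leaving CustomerID,MovieID,Rating triples in each set.
    """
    splits = {"train": [], "valid": [], "test": []}
    for customer_id, movie_id, rating, day in rows:
        splits[assign_split(day)].append((customer_id, movie_id, rating))
    return splits


example_rows = [
    (10, 1, 4, date(2004, 3, 2)),
    (11, 1, 5, date(2005, 9, 15)),
    (10, 2, 3, date(2005, 12, 1)),
]
example_splits = split_ratings(example_rows)
```

Splitting by time rather than at random means the model is validated and tested on ratings made after the ones it was trained on.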

From the root folder, run the script below to process the original Netflix dataset and print basic statistics (the number of ratings and the number of users).

python ./data_utils/dataprocessing.py training_set Netflix

picture 2

After that, the raw data is converted into three-column files in the format
CustomerID,MovieID,Rating
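
As a rough illustration of this output step, the sketch below writes rating triples in the three-column format and computes the two statistics mentioned above. The helpers `write_three_columns` and `basic_stats` are hypothetical and are not taken from the project's script.

```python
import csv
import io


def write_three_columns(rows, handle):
    """Write (customer_id, movie_id, rating) triples as CustomerID,MovieID,Rating lines."""
    writer = csv.writer(handle, lineterminator="\n")
    for customer_id, movie_id, rating in rows:
        writer.writerow([customer_id, movie_id, rating])


def basic_stats(rows):
    """Count the ratings and the distinct users in a list of triples."""
    rows = list(rows)
    return {
        "ratings": len(rows),
        "users": len({customer_id for customer_id, _, _ in rows}),
    }


triples = [(10, 1, 4), (11, 1, 5), (10, 2, 3)]
buffer = io.StringIO()
write_three_columns(triples, buffer)
output_text = buffer.getvalue()
stats = basic_stats(triples)
```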

  • Model and data test
    To check that the data layer and the autoencoder model work correctly, run the unit tests:
python -m unittest test/data_layer_tests.py
python -m unittest test/test_model.py

Train

python train.py --gpu_ids 0 --path_to_train_data Netflix/N3M_TRAIN --path_to_eval_data Netflix/N3M_VALID --hidden_layers 128,256,256 --non_linearity_type selu --batch_size 128 --logdir model_save --drop_prob 0.8 --optimizer momentum --lr 0.005 --weight_decay 0 --aug_step 1 --noise_prob 0 --num_epochs 100 --summary_frequency 1000

Evaluate

python evaluate.py --path_to_train_data Netflix/N3M_TRAIN --path_to_eval_data Netflix/N3M_TEST --hidden_layers 128,256,256 --non_linearity_type selu --save_path model_save/model.epoch_99 --drop_prob 0.8 --predictions_path preds.txt

Result

finalresult
