
# Assignment: PA 1
##### Written by: Noranda Brown
##### Version: 2014.1.18


### Data
The data set, ml-100k, consists of 100,000 ratings of 1682 movies from 943 users, downloaded from http://www.grouplens.org. The main data set, u.data, consists of 100,000 rows, each containing 4 tab-separated items: user_id, movie_id, rating, and timestamp.

### Program
Class: MovieData

initialize - Initializes the instance variable @movies as a hash.

load_data - Reads in data from the original ml-100k/u.data file and stores movie objects in a hash.
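A minimal sketch of the parsing step that load_data performs, assuming the tab-separated layout described above. To stay self-contained it reads from any IO object and groups plain hashes by movie_id rather than building Movie objects. The helper name load_ratings and the sample rows are made up for illustration.

```ruby
require "stringio"

# Hypothetical helper: parses u.data-style rows (user_id, movie_id, rating,
# timestamp separated by tabs) and groups the ratings by movie_id.
def load_ratings(io)
  movies = Hash.new { |hash, movie_id| hash[movie_id] = [] }
  io.each_line do |line|
    next if line.strip.empty?
    user_id, movie_id, rating, timestamp = line.split("\t").map(&:to_i)
    movies[movie_id] << { user_id: user_id, rating: rating, timestamp: timestamp }
  end
  movies
end

# Made-up sample rows in the same layout as u.data.
sample = StringIO.new("1\t10\t4\t100\n2\t20\t5\t200\n3\t10\t2\t300\n")
movies = load_ratings(sample)
p movies[10].map { |r| r[:rating] } # => [4, 2]
```

Against the real file this would be called as `File.open("ml-100k/u.data") { |file| load_ratings(file) }`.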

popularity(movie_id) - Returns a number (0 - 100) that indicates the popularity of a movie (higher numbers are more popular). Note: Popularity is calculated by summing the ratings for the movie and normalizing the result. This takes into account not only the average rating but also the number of times the movie has been reviewed, because the average rating weighted by the number of reviews is exactly the sum of the ratings: (sum_ratings / num_ratings) * num_ratings = sum_ratings, which is then normalized.
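The README does not say how the sum is normalized. The sketch below assumes one plausible choice: dividing each movie's sum by the largest sum in the data set so that the most popular movie scores 100. The function signature (taking a hash of movie_id => array of ratings) is hypothetical. The sample data shows the intended effect: a movie rated 5 once ends up below a movie rated 3 four times.

```ruby
# Hypothetical sketch: popularity = sum of a movie's ratings, normalized so the
# movie with the largest sum scores 100. Dividing by the maximum sum is an
# assumed normalization; the README does not specify one.
def popularity(movie_id, ratings_by_movie)
  sums = ratings_by_movie.transform_values(&:sum)
  max_sum = sums.values.max
  return 0 if max_sum.nil? || max_sum.zero?
  (100.0 * sums.fetch(movie_id, 0) / max_sum).round
end

# movie_id => array of ratings (made-up sample data)
ratings_by_movie = { 1 => [5, 4, 5], 2 => [5], 3 => [3, 3, 3, 3] }
puts popularity(1, ratings_by_movie) # => 100 (sum 14)
puts popularity(2, ratings_by_movie) # => 36  (sum 5: average 5.0, one review)
puts popularity(3, ratings_by_movie) # => 86  (sum 12: average 3.0, four reviews)
```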

popularity_list - Generates a list of all movie_ids ordered by decreasing popularity.
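popularity_list can be sketched as a sort on the same quantity. Dividing every sum by the same positive constant does not change their order, so this hypothetical version sorts on the raw rating sums. It breaks ties by movie_id so the result is deterministic.

```ruby
# Hypothetical sketch: movie_ids ordered by decreasing popularity. Sorting on
# the raw rating sums gives the same order as sorting on any score obtained by
# dividing every sum by the same positive constant. Ties are broken by movie_id.
def popularity_list(ratings_by_movie)
  ratings_by_movie
    .map { |movie_id, ratings| [movie_id, ratings.sum] }
    .sort_by { |movie_id, sum| [-sum, movie_id] }
    .map(&:first)
end

# movie_id => array of ratings (made-up sample data)
ratings_by_movie = { 1 => [5, 4, 5], 2 => [5], 3 => [3, 3, 3, 3] }
p popularity_list(ratings_by_movie) # => [1, 3, 2]
```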

similarity(user1, user2) - Generates a number (0 - 100) which indicates the similarity in movie preference between user1 and user2 (higher numbers indicate greater similarity).
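The README does not specify how similarity is computed. As one possible choice (an assumption, not necessarily the author's), the sketch below compares two users only on movies both have rated and maps the mean absolute rating difference onto 0 - 100. MovieLens ratings run from 1 to 5, so the largest possible difference is 4. Each user is represented as a hypothetical hash of movie_id => rating.

```ruby
# Hypothetical similarity measure (not necessarily the one used in MovieData):
# compare two users on the movies they have both rated. Ratings run from 1 to 5,
# so the largest possible difference per movie is 4.
def similarity(ratings1, ratings2)
  common = ratings1.keys & ratings2.keys
  return 0 if common.empty? # no movies in common: treat as not similar
  total_diff = common.sum { |movie_id| (ratings1[movie_id] - ratings2[movie_id]).abs }
  mean_diff = total_diff / common.size.to_f
  (100 * (1 - mean_diff / 4.0)).round
end

alice = { 10 => 5, 20 => 3, 30 => 4 } # movie_id => rating (made-up data)
bob   = { 10 => 4, 20 => 3, 40 => 1 }
puts similarity(alice, bob) # => 88 (mean difference 0.5 over movies 10 and 20)
```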

most_similar(u, number_of_users = 5) - Returns a list of the number_of_users users (default = 5) whose tastes are most similar to those of user u.

other_users(u) - Private method that returns a list of all users except u.
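most_similar and other_users can be sketched together: drop u from the list of users, score every remaining user, and keep the highest scores. In this sketch the similarity function is passed in as a block so it stays independent of any particular measure. The users argument and the toy "taste" scores are hypothetical.

```ruby
# Hypothetical sketch of most_similar(u, number_of_users = 5). The similarity
# measure is supplied as a block; `users` is the full list of user_ids.
def most_similar(u, users, number_of_users = 5, &similarity)
  others = users.reject { |user| user == u } # the role of other_users(u)
  # Score every other user and keep the number_of_users highest scores,
  # returned in descending order of similarity.
  others.max_by(number_of_users) { |user| similarity.call(u, user) }
end

# Toy similarity for illustration: users with closer "taste" scores are more similar.
taste = { 1 => 10, 2 => 12, 3 => 30, 4 => 9, 5 => 50 }
closest = most_similar(1, taste.keys, 2) { |a, b| -(taste[a] - taste[b]).abs }
p closest # => [4, 2]
```

This makes one similarity call per other user, so the running time grows with the number of users.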

Class: Movie

initialize(movie_id) - Initializes instance variables @movie_id, @user_ratings, and @sum_ratings.

add_rating(user_id, rating, timestamp) - Adds a user_id, rating and timestamp to a movie.

sum_ratings - Returns the sum of all ratings for a movie.

user_rated?(user_id) - Returns true if user_id has rated a movie and false otherwise.

user_rating(user_id) - Returns the user rating from user_id for a movie.

user_list - Returns a list of users that have reviewed a movie.

Class: UserRating

initialize(user_id, rating, timestamp) - Initializes instance variables @user_id, @rating, and @timestamp.
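A minimal sketch of Movie and UserRating that follows the method descriptions above. Two details are assumptions. First, user_rating returns the numeric rating, or nil when the user has not rated the movie. Second, a repeat rating from the same user replaces the earlier one, so @sum_ratings is not double-counted.

```ruby
# Sketch of the UserRating and Movie classes; internals are assumptions.
class UserRating
  attr_reader :user_id, :rating, :timestamp

  def initialize(user_id, rating, timestamp)
    @user_id = user_id
    @rating = rating
    @timestamp = timestamp
  end
end

class Movie
  attr_reader :movie_id, :sum_ratings

  def initialize(movie_id)
    @movie_id = movie_id
    @user_ratings = {} # user_id => UserRating
    @sum_ratings = 0   # running total, kept up to date by add_rating
  end

  # Records a rating; a repeat rating from the same user replaces the old one.
  def add_rating(user_id, rating, timestamp)
    previous = @user_ratings[user_id]
    @sum_ratings -= previous.rating if previous
    @user_ratings[user_id] = UserRating.new(user_id, rating, timestamp)
    @sum_ratings += rating
  end

  def user_rated?(user_id)
    @user_ratings.key?(user_id)
  end

  # Numeric rating from user_id, or nil if that user has not rated the movie.
  def user_rating(user_id)
    @user_ratings[user_id]&.rating
  end

  # user_ids of everyone who has rated the movie.
  def user_list
    @user_ratings.keys
  end
end

movie = Movie.new(10)
movie.add_rating(1, 4, 100) # made-up user_ids and timestamps
movie.add_rating(2, 5, 200)
movie.add_rating(1, 3, 300) # user 1 changes their rating from 4 to 3
puts movie.sum_ratings      # => 8
p movie.user_list           # => [1, 2]
```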

### Questions

  1. Describe an algorithm to predict the ranking that a user U would give to a movie M assuming the user hasn’t already ranked the movie in the dataset.
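        def predict_rating(user_id, movie_id)
          movie = @movies[movie_id]
          # Keep only those of the 20 most similar users who have rated the movie
          raters = most_similar(user_id, 20).select { |user| movie.user_rated?(user) }
          return nil if raters.empty?
          sum = raters.inject(0) { |total, user| total + movie.user_rating(user) }
          (sum.to_f / raters.size).round
        end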

This algorithm predicts U's rating for M as the average of the ratings that U's 20 most similar users gave to M, counting only those users who have actually rated M.

  2. Do your algorithms scale? What factors determine the execution time of your “most_similar” and “popularity_list” algorithms?

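    While this algorithm would work, it would not scale very well, primarily because of the most_similar method, which must calculate the similarity between the user and every other user before selecting the closest ones. Its execution time therefore grows with the number of users (and with the cost of each similarity calculation), so the more users there are, the less efficient the algorithm becomes.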
