saurabhmathur96 / movie-recommendations



Recommend movies to users by RBMs, TruncatedSVD, Stochastic SVD and Variational Inference

Jupyter Notebook 99.92% Python 0.08%

recommend-movies svd movie-recommendation restricted-boltzmann-machine svd-matrix-factorisation cosine-similarity

movie-recommendations's Introduction

Movie Recommendations

A recommender system is one that seeks to predict the "rating" or "preference" a user would give to an item.

Methods

Cosine Similarity Notebook

Since the one-hot representation of movies is too sparse, we can create a dense representation using Principal Component Analysis (PCA). On this dense representation, we can recommend similar movies using the cosine similarity metric.
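The PCA-then-cosine idea can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes a small dense 0/1 feature matrix with one row per movie (the `features` values and the helper name `similar_movies` are hypothetical) and uses scikit-learn's `PCA` and `cosine_similarity`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity


def similar_movies(features, movie_id, n_components=2, top_k=3):
    """Return the indices of the top_k movies most similar to movie_id.

    features: dense (n_movies, n_features) 0/1 array, one row per movie.
    """
    # Project the sparse one-hot rows onto a few principal components.
    dense = PCA(n_components=n_components).fit_transform(features)
    # Cosine similarity between the target movie and every movie.
    sims = cosine_similarity(dense[movie_id:movie_id + 1], dense)[0]
    # Most similar first, skipping the target movie itself.
    order = np.argsort(-sims)
    return [int(i) for i in order if i != movie_id][:top_k]


# Hypothetical one-hot features (e.g. genres) for five movies.
features = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
])
print(similar_movies(features, 0, top_k=1))  # [1]
```

Movies 0 and 1 share identical features, so their projections coincide and their cosine similarity is 1. On real data the number of components is a tuning choice.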

Truncated SVD Notebook

In this method, we create representations of both movies and users by keeping the top-n factors from their Singular Value Decompositions. Next, using the (movie, user) representation pairs, we train a regression model to predict the corresponding rating value.
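A rough sketch of this pipeline, under stated assumptions: the ratings live in a users x movies matrix with 0 meaning "not rated", user vectors come from a `TruncatedSVD` of that matrix, and movie vectors come from a `TruncatedSVD` of its transpose. The notebook's actual regression model is not described here, so scikit-learn's `LinearRegression` stands in. The helper `fit_svd_regressor` and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LinearRegression


def fit_svd_regressor(R, n_factors=2, seed=0):
    """R: (n_users, n_movies) ratings with 0 meaning 'not rated'."""
    R = csr_matrix(R, dtype=float)
    # Top-n factors for users (rows of R) and for movies (rows of R.T).
    user_vecs = TruncatedSVD(n_components=n_factors, random_state=seed).fit_transform(R)
    movie_vecs = TruncatedSVD(n_components=n_factors, random_state=seed).fit_transform(R.T)
    # One training example per observed (user, movie) rating.
    users, movies = R.nonzero()
    X = np.hstack([user_vecs[users], movie_vecs[movies]])
    y = np.asarray(R[users, movies]).ravel()
    model = LinearRegression().fit(X, y)

    def predict(user, movie):
        # Concatenate the user and movie representations, as in training.
        x = np.hstack([user_vecs[user], movie_vecs[movie]]).reshape(1, -1)
        return float(model.predict(x)[0])

    return predict


# Hypothetical 4 users x 5 movies ratings matrix.
R = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 2],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]])
predict = fit_svd_regressor(R)
print(predict(0, 2))  # predicted rating of movie 2 by user 0
```

Concatenation is only one way to combine the pair. An element-wise product of the two vectors is another common choice, and a non-linear regressor may fit better.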

Restricted Boltzmann Machines Notebook

The Restricted Boltzmann Machine (RBM) is a two-layer generative stochastic neural network whose only connections run between the visible and hidden layers. Here, the RBM is trained with Contrastive Divergence, an approximation of the log-likelihood gradient, to estimate the distribution of ratings given the movie ratings of a user.
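A minimal NumPy sketch of one Contrastive Divergence (CD-1) update follows. It assumes binary visible units, for example a hypothetical "liked / not liked" encoding. The notebook's own encoding of 1 to 5 ratings may differ, so treat this as an illustration of CD-1 rather than the repository's implementation. The names `cd1_step` and `reconstruct` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 update, in place, on a batch of binary visible vectors v0."""
    # Positive phase: hidden probabilities and samples driven by the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_hid)
    # Data statistics minus reconstruction statistics approximate the
    # log-likelihood gradient.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    # Reconstruction error is only a rough progress signal, not the objective.
    return float(np.mean((v0 - p_v1) ** 2))


def reconstruct(v, W, b_vis, b_hid):
    """Mean-field reconstruction: probability that each movie is 'liked'."""
    return sigmoid(sigmoid(v @ W + b_hid) @ W.T + b_vis)


rng = np.random.default_rng(0)
# Hypothetical binary 'liked' matrix: 20 users x 6 movies.
V = (rng.random((20, 6)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((6, 3))
b_vis, b_hid = np.zeros(6), np.zeros(3)
for epoch in range(200):
    err = cd1_step(V, W, b_vis, b_hid, lr=0.1, rng=rng)
scores = reconstruct(V[:1], W, b_vis, b_hid)  # rank a user's unrated movies by these
```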

SVD Matrix Factorisation Notebook

This method embeds movies in a vector space using a stochastic estimate of the matrix factorisation. Each movie's embedding can be viewed as a representation of its features, so we can make recommendations by comparing embeddings with a similarity metric.
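One common reading of "stochastic estimation" is fitting the factors by stochastic gradient descent over the observed ratings (Funk-style SVD). The notebook may instead use a randomized SVD, so treat the following as an assumption-laden sketch rather than the repository's method. `sgd_mf`, `most_similar` and the rating triples are hypothetical. The movie embeddings are the rows of `Q`, compared by cosine similarity.

```python
import numpy as np


def sgd_mf(ratings, n_users, n_movies, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit R ~ P @ Q.T by SGD over observed (user, movie, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user factors
    Q = 0.1 * rng.standard_normal((n_movies, k))  # movie embeddings
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, m, r = ratings[idx]
            err = r - P[u] @ Q[m]
            p_u = P[u].copy()  # use the pre-update user vector for Q's step
            P[u] += lr * (err * Q[m] - reg * P[u])
            Q[m] += lr * (err * p_u - reg * Q[m])
    return P, Q


def most_similar(Q, movie, top_k=1):
    """Movies whose embeddings have the highest cosine similarity to `movie`."""
    norms = np.linalg.norm(Q, axis=1)
    sims = Q @ Q[movie] / (norms * norms[movie] + 1e-12)
    return [int(i) for i in np.argsort(-sims) if i != movie][:top_k]


# Hypothetical (user, movie, rating) triples.
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 0, 4), (1, 1, 5), (1, 4, 1),
           (2, 2, 1), (2, 3, 5), (2, 4, 4), (3, 1, 1), (3, 3, 4), (3, 4, 5)]
P, Q = sgd_mf(ratings, n_users=4, n_movies=5)
rmse = np.sqrt(np.mean([(r - P[u] @ Q[m]) ** 2 for u, m, r in ratings]))
print(rmse, most_similar(Q, 0))
```

The regularisation term `reg` keeps the factors small. Without it, a model with this many parameters per rating would simply memorise the training triples.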

Probabilistic Matrix Factorization Repository

A Bayesian approach to factorizing the ratings matrix using Variational Inference. As a result, each rating prediction is a Gaussian distribution whose variance represents the model's uncertainty.
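To show where the predictive variance comes from, here is a hedged sketch. It assumes a mean-field posterior with independent diagonal Gaussians over a user vector u and a movie vector v, plus Gaussian observation noise, and computes the exact mean and variance of r = u·v + noise. Strictly, a product of Gaussians is not Gaussian, so reporting N(mean, var) is a moment-matched approximation. The function name `predictive_moments` is hypothetical.

```python
import numpy as np


def predictive_moments(mu_u, var_u, mu_v, var_v, noise_var):
    """Mean and variance of r = u . v + eps under independent Gaussian posteriors.

    mu_*/var_*: per-dimension posterior means and variances (mean-field,
    diagonal); noise_var: observation noise variance.
    """
    mu_u, var_u, mu_v, var_v = map(np.asarray, (mu_u, var_u, mu_v, var_v))
    mean = float(mu_u @ mu_v)
    # Var[x*y] = mu_x^2 var_y + mu_y^2 var_x + var_x var_y for independent x, y;
    # the k products are independent, so their variances add.
    var = float(np.sum(mu_u**2 * var_v + mu_v**2 * var_u + var_u * var_v) + noise_var)
    return mean, var


mean, var = predictive_moments([1.0, 2.0], [0.1, 0.1], [3.0, 0.5], [0.2, 0.2], 0.5)
print(mean, var)  # mean 4.0, variance approximately 2.465
```

With zero posterior variance the prediction reduces to the point estimate u·v, and only the noise variance remains.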

Miscellaneous

Movie Sentiment: Analysing a movie review's text to determine whether it is positive or negative. Find the repository here.
Anime Finder: A cosine similarity based anime recommendation engine with a web-based interface. Find the repository here.
Book Recommendations: An experiment using the Truncated SVD method to recommend books. Find the notebook here.

movie-recommendations's People

Forkers

yxing555 baizhongliu ml-solutions ivorobyev sandeep102297 xuewang2019 harshinir4 tirimula margaretnm sthavir123

movie-recommendations's Issues

Implement CF-NADE

CF-NADE is the current state of the art. It was developed by Yin Zheng et al. at Hulu Research and uses deep learning to make recommendations.

Please include the link to user_movie_ratings.mtx

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Please see the error:

IndexError Traceback (most recent call last)
in ()
10
11 for movie_id in (10,100,1000):
---> 12 print(movie_id,predict_rating(similarities, user_ratings, movie_id))

in predict_rating(model, ratings, movie_id, n)
9
10 rated_movies = ratings.keys()
---> 11 similar_movies = model[movie_id, rated_movies].argsort()[:-1]
12 top_n = [ratings.keys()[i] for i in similar_movies[:n]]
13

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

Cosine Similarity
import numpy as np

def predict_rating(model, ratings, movie_id, n=5):
    # model    = movie-movie similarity matrix (dense ndarray)
    # movie_id = target movie id
    # ratings  = dict of movie_id: rating for one user
    # n        = number of most similar rated movies to use

    # In Python 3, dict.keys() returns a view, which NumPy cannot use as an
    # index (this raises the IndexError above), so convert it to a list.
    rated_movies = list(ratings.keys())
    # argsort() is ascending, so reverse it ([::-1]) to put the most similar first.
    similar_movies = model[movie_id, rated_movies].argsort()[::-1]
    top_n = [rated_movies[i] for i in similar_movies[:n]]

    # Average rating weighted by similarity
    scores = sum(model[movie_id, m] * ratings[m] for m in top_n)
    prediction = float(scores) / sum(model[movie_id, m] for m in top_n)
    return prediction

# R (sparse user x movie ratings matrix) and similarities are defined in
# earlier cells of the notebook.
user_id = 10
movies_rated = np.where(R[user_id].todense() > 0)[1].tolist()
movie_ratings = R[user_id, movies_rated].todense().tolist()[0]

user_ratings = dict(zip(movies_rated, movie_ratings))
# Reference values included in the report:
# 10 2.77208508312
# 100 1.77135318293
# 1000 2.35262213947

for movie_id in (10, 100, 1000):
    print(movie_id, predict_rating(similarities, user_ratings, movie_id))

A problem about the RBM based recommendation

In your implementation of RBM-based CF, I see that you feed the rating data in directly, with ratings ranging from 1 to 5. I think this should correspond to a Gaussian visible layer because the data is not binary, which is different from the original formulation by Ruslan Salakhutdinov and Geoffrey Hinton. However, your reconstruction, v_2, seems to be binary...

This confused me because I think that in an RBM, v_2 should try to reconstruct the input data, so I don't understand why a real-valued input with a binary reconstruction works. Can you give some explanation of this?
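To make the distinction in this question concrete, here is a hedged sketch of the reconstruction step for two visible-unit types. It assumes unit-variance Gaussian visible units, which in practice usually also means standardizing the ratings first. Bernoulli units produce 0/1 samples, while Gaussian units produce a real-valued mean. For reference, the RBM-CF model of Salakhutdinov, Mnih and Hinton uses neither: it uses a K-way softmax over the rating values for each movie. `reconstruct_visible` is a hypothetical helper, not code from this repository.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruct_visible(h, W, b_vis, visible="bernoulli", rng=None):
    """Reconstruct the visible layer from hidden states h (batch x n_hidden)."""
    pre = h @ W.T + b_vis
    if visible == "bernoulli":
        # p(v_i = 1 | h) = sigmoid(pre_i); a *sampled* reconstruction is 0/1.
        rng = rng if rng is not None else np.random.default_rng()
        return (rng.random(pre.shape) < sigmoid(pre)).astype(float)
    if visible == "gaussian":
        # Unit-variance Gaussian units: p(v_i | h) = N(pre_i, 1), so the mean
        # reconstruction is real-valued, like the 1-5 ratings fed in.
        return pre
    raise ValueError(f"unknown visible unit type: {visible!r}")
```

With Bernoulli units, a real-valued input such as 4.0 is treated as if it were a binary state, so a binary v_2 cannot reproduce it. This mismatch is the one the question points out.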