
rc_for_yelp's Introduction

Final Project - Personalization Theory

Authors: Bertrand Thia-Thiong-Fat (bt2513), Jeremy Yao (jy3015), Paul Doan (pqd2001)

Directories and files

Please look at the requirements file to learn about dependencies and useful packages to reproduce our results.

We split our work into 6 notebooks. Each notebook is self-contained and can be run independently of the others.

The titles describe the content of each notebook:

  1. Introduction
  2. Data Preprocessing
  3. Baseline Model
  4. Content-based Model
  5. Deep Learning Model
  6. Conclusion

Finally, the datasets used to test our models during this study are also provided.

The Objectives

Context

We place ourselves in the position of Senior Data Scientists at a company that recommends local businesses. We focus on one business objective: accurately predicting the latest rating of every active user of the Yelp website. Accurately predicting a user's most recent rating gives a better understanding of their current preferences, which in turn lets us recommend other businesses the user may be interested in. This is why we concentrate on prediction accuracy: it helps us understand consumer preferences and derive valuable insights for Yelp's business.

We will study different models and compare them to suggest the best available tool for Yelp. Our work attempts to predict customer ratings accurately and does not address the cold start problem.

In the end, we will decide whether the resulting algorithm can be used in a real setting at Yelp, or whether more studies and more data are needed to provide an effective and reliable recommender system.
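The objective above (predicting each user's latest rating) suggests a "leave-last-out" split, where the most recent rating of every user is held out for testing. The snippet below is a minimal sketch of that idea, not the code from our notebooks: the tuple layout `(user_id, business_id, stars, date)`, the function name `leave_last_out`, and the choice to keep single-rating users entirely in the training set are all assumptions made for illustration.

```python
from collections import defaultdict


def leave_last_out(ratings):
    """Split ratings into (train, test).

    `ratings` is an iterable of (user_id, business_id, stars, date) tuples,
    where `date` is an ISO-formatted string (hypothetical layout).
    The most recent rating of each user goes to `test`, the rest to `train`.
    """
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        # ISO dates ("YYYY-MM-DD") sort chronologically as plain strings.
        rows.sort(key=lambda r: r[3])
        if len(rows) < 2:
            # Assumption: a user with a single rating has no history to
            # learn from, so that rating stays in the training set.
            train.extend(rows)
            continue
        train.extend(rows[:-1])
        test.append(rows[-1])
    return train, test


# Toy data, invented for the example.
ratings = [
    ("u1", "b1", 4, "2018-01-01"),
    ("u1", "b2", 5, "2018-03-01"),
    ("u1", "b3", 2, "2018-02-01"),
    ("u2", "b1", 3, "2017-05-01"),
    ("u2", "b3", 1, "2017-06-01"),
    ("u3", "b2", 4, "2018-04-01"),
]
train, test = leave_last_out(ratings)
```

With this split, a model is trained on `train` and scored only on the held-out latest ratings in `test`, which mirrors the business objective stated above.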

Content

To make the data more tractable, we will reduce the size of the datasets while striving to obtain unbiased samples. We will reduce the original data to approximately 500k ratings. As a next step, we will develop 3 different recommender system models. We will start with a user-based collaborative filtering model, which will also act as a baseline for comparison with the other models. We will then create a collective factorization algorithm and develop a deep learning model.

We will conclude our analysis by comparing the different models and methods for recommending relevant local businesses.
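To illustrate what a user-based collaborative filtering baseline does, here is a minimal pure-Python sketch. It is not the implementation from the Baseline Model notebook: the cosine similarity, the neighbourhood size `k`, the fallback to the user's mean rating, and the names `cosine` and `predict` are assumptions chosen for the example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {business_id: stars}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(user_ratings, user, item, k=2):
    """Predict the rating `user` would give to `item`.

    The prediction is the similarity-weighted mean of the ratings given to
    `item` by the k most similar other users. If no neighbour with positive
    similarity rated the item, fall back to the user's own mean rating.
    """
    target = user_ratings[user]
    neighbours = sorted(
        (
            (cosine(target, other), other[item])
            for name, other in user_ratings.items()
            if name != user and item in other
        ),
        reverse=True,
    )[:k]
    weight = sum(sim for sim, _ in neighbours)
    if weight == 0:
        return sum(target.values()) / len(target)
    return sum(sim * stars for sim, stars in neighbours) / weight


# Toy data, invented for the example.
user_ratings = {
    "u1": {"b1": 5, "b2": 4},
    "u2": {"b1": 5, "b2": 4, "b3": 2},
    "u3": {"b1": 1, "b3": 5},
}
pred = predict(user_ratings, "u1", "b3")
fallback = predict(user_ratings, "u1", "b9")
```

Here `u1` rates like `u2`, so the prediction for `b3` is pulled towards `u2`'s low rating rather than `u3`'s high one. On real data of this size, a sparse-matrix implementation would be needed instead of nested dictionaries.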

The data

Our full dataset can be found here: https://www.yelp.com/dataset/challenge

Quantitative results

See the individual notebooks. They walk through our workflow, from problem definition to results, limitations and future work.

References and useful links

scikit-learn documentation: https://scikit-learn.org/stable/documentation.html

Keras documentation (used to create and train our AutoEncoder models): https://keras.io

Kuchaiev, Oleksii, and Boris Ginsburg. "Training deep autoencoders for collaborative filtering." arXiv preprint arXiv:1708.01715 (2017).
