sample-collaborative-filt-recsyst's Introduction

Item-based Collaborative Filtering Recommender System

Project Description

Business Context: You are hired to work on Data Science and AI for an e-commerce company named "Terra Store." Terra Store wants to enhance its marketing strategy by predicting customer purchase behavior from historical data. The company wants to build an AI-powered application that provides insights into which products a customer is likely to purchase next.

Setup

Prerequisite Packages (Dependencies)

  • numpy==1.26.4
  • pandas==2.2.0
  • scikit-learn==1.4.1
  • scipy==1.12.0
  • streamlit==1.31.1

Environment

  • CPU: AMD Ryzen 5 5600H
  • RAM: 16,384 MB
  • OS: Windows 11 64-bit (10.0, Build 22621)

Dataset

Check folder datasets/ for more info.

Methodology

Internal documentation can be found within the code (see docs/main_ipynb.pdf).

  1. All datasets were loaded and preprocessed (e.g., missing values, if any, were imputed).
  2. All tables were joined and then pivoted to create the raw user-item matrix (user_item_matrix) from which item distances are computed.
  3. Cosine similarity was used to measure the similarity between items.
  4. For ratings prediction,
    • The similarity scores (sim_arr) were filtered using the filtering array to keep only the scores for products that the user has rated.
    • Then, the top max_neighbor most similar products were selected based on their similarity scores.
    • The user's ratings for these top max_neighbor products were retrieved from the user_item_matrix.
    • Finally, the predicted rating was calculated as the weighted average of the user's ratings for the top similar products, where the weights are the similarity scores. The weights were then normalized by their sum to ensure they add up to 1.
  5. With these predicted ratings, the algorithm can recommend products for a single user (see the function get_recommendation()).
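
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the toy ratings table, the column names (user_id, product_id, rating), the helper predict_rating(), and the signature of get_recommendation() shown here are hypothetical. Only the names sim_arr, max_neighbor, user_item_matrix, and get_recommendation come from the description, and unrated products are assumed to be encoded as 0.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy data standing in for the joined tables (hypothetical column names).
ratings = pd.DataFrame({
    "user_id":    ["u1", "u1", "u1", "u2", "u2", "u3", "u3", "u3"],
    "product_id": ["A",  "B",  "C",  "A",  "B",  "B",  "C",  "D"],
    "rating":     [5,    4,    1,    4,    5,    2,    5,    4],
})

# Step 2: pivot into a user x item matrix; 0 marks "not rated" (an assumption).
user_item_matrix = ratings.pivot_table(
    index="user_id", columns="product_id", values="rating", fill_value=0
)

# Step 3: item-item cosine similarity (items are columns, so transpose first).
item_sim = pd.DataFrame(
    cosine_similarity(user_item_matrix.T.to_numpy()),
    index=user_item_matrix.columns,
    columns=user_item_matrix.columns,
)


def predict_rating(user, item, max_neighbor=2):
    """Step 4: similarity-weighted average of the user's own ratings."""
    user_ratings = user_item_matrix.loc[user].to_numpy(dtype=float)
    sim_arr = item_sim.loc[item].to_numpy(dtype=float)
    # Filtering array: products the user has rated, excluding the target itself.
    rated = (user_ratings > 0) & (user_item_matrix.columns != item)
    sims, vals = sim_arr[rated], user_ratings[rated]
    # Keep the max_neighbor most similar rated products.
    top = np.argsort(sims)[::-1][:max_neighbor]
    weights = sims[top]
    total = weights.sum()
    if total == 0:
        return float("nan")  # no usable neighbours for this user/item pair
    # Dividing by the sum of weights normalizes them to add up to 1.
    return float(np.dot(weights, vals[top]) / total)


def get_recommendation(user, top_n=2, max_neighbor=2):
    """Step 5: rank the products the user has not rated by predicted rating."""
    row = user_item_matrix.loc[user]
    unrated = row[row == 0].index
    preds = {item: predict_rating(user, item, max_neighbor) for item in unrated}
    return pd.Series(preds, dtype=float).sort_values(ascending=False).head(top_n)
```

In this sketch, predict_rating("u1", "D") averages u1's ratings for the two rated products most similar to D, and get_recommendation("u2") ranks the products that u2 has not rated yet. Because the weights are non-negative and normalized, each prediction stays within the range of the user's own ratings.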

How to Run the App (Streamlit)

  1. Create the Python virtual environment (VENV) first: python -m venv <VENV_NAME>.
  2. Activate your VENV and install all packages from the requirements.txt file (e.g., pip install -r requirements.txt).
  3. To run the app, execute this command: streamlit run main.py.

App Display

Note that the red highlighted box marks the user's inputs.

Remarks

  • Due to the limited datasets, the results may seem inaccurate (or even be NaN).
  • In future work, kNN could be used to measure the similarity distance between items.
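
Both remarks can be illustrated with a short sketch. It assumes a hypothetical item x user rating matrix and uses scikit-learn's NearestNeighbors with the cosine metric as one possible way to do the kNN lookup. The NaN case shown is only one plausible cause (all similarity weights being zero, so the weighted average becomes 0 / 0); the source does not state where the NaN values actually come from.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical item x user rating matrix (rows = items, 0 = not rated).
item_vectors = np.array([
    [5.0, 4.0, 0.0],   # item 0
    [4.0, 5.0, 2.0],   # item 1
    [1.0, 0.0, 5.0],   # item 2
    [0.0, 0.0, 4.0],   # item 3
])

# Brute-force kNN with cosine distance (distance = 1 - cosine similarity).
knn = NearestNeighbors(n_neighbors=3, metric="cosine", algorithm="brute")
knn.fit(item_vectors)

# Query the neighbours of item 3; here the first hit is item 3 itself.
distances, indices = knn.kneighbors(item_vectors[[3]])
neighbor_ids = indices[0][1:]
neighbor_sims = 1.0 - distances[0][1:]

# One plausible source of NaN: every similarity weight is zero,
# so the normalized weighted average evaluates to 0 / 0.
weights = np.array([0.0, 0.0])
user_ratings = np.array([4.0, 5.0])
with np.errstate(invalid="ignore"):
    prediction = np.dot(weights, user_ratings) / weights.sum()
```

Here neighbor_ids holds the items closest to item 3, and prediction evaluates to NaN. With sparse data, guarding against a zero weight sum (or falling back to a default such as the user's mean rating) is one way to avoid such values.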

Contact

Contributors

nicholasdominic
