Product Genius

Product Genius enables consumers to make data-driven decisions about products. Most online shopping sites display a product's average rating, which can be misleading for products with few reviews. Product Genius implements its own rating system, using Bayesian averaging, to present a more robust metric to the consumer. Product Genius also aggregates useful information from customer reviews, using machine learning to extract the most relevant terms from positive and negative reviews of a product. Finally, users can search within a product's reviews and favorite the most relevant ones, to assist with side-by-side product comparisons.

Product Genius is deployed on Heroku: https://productgenius.herokuapp.com/

Technology Stack

Frontend: HTML, CSS, JavaScript, jQuery, jQCloud, Chart.js, Bootstrap, Jinja2
Backend: Python, Flask, PostgreSQL, SQLAlchemy, NumPy, scikit-learn

Product Genius has 74% test coverage; see tests.py.

Search for Products

Users can search for products on the homepage or from the navbar. When the user clicks Search, the form input is passed to the PostgreSQL database, which runs a full-text search: it ignores case, stems the query, and returns the most relevant products. Relevance is determined by ranking matches in the product title above matches in the product description. A GIN index keeps the search efficient. You can read more about full-text search in the PostgreSQL documentation.
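
The search described above can be sketched as a weighted tsvector column with a GIN index, queried through ts_rank. This is a minimal sketch and not the project's actual code: the table name products, the columns asin, title, description and search_vector, the index name, and the helper build_product_search are all assumptions made for illustration. The %(name)s placeholders assume a psycopg2-style database driver.

```python
# Hypothetical schema: a "products" table with "asin", "title" and
# "description" columns. None of these names come from the project itself.
SEARCH_VECTOR_DDL = """
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Weight 'A' (title) outranks weight 'B' (description) in ts_rank.
UPDATE products SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B');

-- The GIN index lets PostgreSQL answer @@ matches without scanning every row.
CREATE INDEX idx_products_search ON products USING GIN (search_vector);
"""


def build_product_search(query, limit=20):
    """Return (sql, params) for a ranked full-text product search.

    plainto_tsquery with the 'english' configuration lowercases and stems
    the user's input, so "Running Shoes" also matches "run" and "shoe".
    """
    sql = """
        SELECT asin, title,
               ts_rank(search_vector, plainto_tsquery('english', %(q)s)) AS rank
        FROM products
        WHERE search_vector @@ plainto_tsquery('english', %(q)s)
        ORDER BY rank DESC
        LIMIT %(limit)s
    """
    return sql, {"q": query, "limit": limit}
```

The returned pair can be handed to a cursor's execute method. Passing the query as a parameter rather than formatting it into the string keeps user input out of the SQL text.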

Product Genius scores using Bayesian Averaging

Users can view the Product Genius score on the product details page. Most websites report a product's average review score, which is a poor metric when a product has only a few reviews: two five-star ratings are not the same as 500. Product Genius uses Bayesian averaging to provide a score that better represents customer satisfaction. The Product Genius score is a weighted average of a product's actual reviews and a prior expectation that an unreviewed product would rate, on average, a 3. Read more about Bayesian averages here.
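
As a rough illustration of the weighting, a Bayesian average can be computed by treating the prior as a fixed number of imaginary 3-star reviews. The prior mean of 3 comes from the description above; the prior weight of 10 and the function name bayesian_average are assumptions chosen for this sketch, not values taken from the project.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=10):
    """Blend observed ratings with a prior of `prior_weight` pseudo-reviews.

    prior_weight=10 is an illustrative assumption: the larger it is, the
    more reviews a product needs before its score moves away from the prior.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


# Two five-star reviews barely move the score away from the prior...
print(round(bayesian_average([5, 5]), 2))      # 3.33
# ...while 500 five-star reviews almost completely outweigh it.
print(round(bayesian_average([5] * 500), 2))   # 4.96
```

A product with no reviews scores exactly the prior mean, and the score approaches the plain average as reviews accumulate.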

Extracting Review Keywords with Naive Bayes

Rather than reading lengthy product descriptions, specs, and reviews, users can quickly get a summary of the good and bad about a product by looking at the review keywords. The keyword clouds are populated by a Naive Bayes model built with scikit-learn. Reviews were considered positive if they rated a product 4 or 5, and negative if they rated it 1 or 2. After the classifier was trained to predict positive and negative reviews, the 10 words with the highest likelihood in each class were extracted. The model was cross-validated with k-fold cross-validation on a sample of 50 products, yielding over 90% precision and recall. The model code can be found in keyword_extraction.py.
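
The idea of ranking words by their per-class likelihood can be sketched in plain Python. This is a simplified stand-in written for illustration, not the code in keyword_extraction.py: it computes the Laplace-smoothed word likelihoods that a multinomial Naive Bayes model estimates, without training a full classifier. The function names, the tiny stop-word list, and the sample reviews are all assumptions. In scikit-learn, the comparable per-class values are exposed by MultinomialNB as feature_log_prob_.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real model would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def top_keywords(reviews, top_n=10, alpha=1.0):
    """Return the top_n most likely words for each class.

    reviews is an iterable of (rating, text) pairs. Ratings of 4-5 are
    positive and 1-2 are negative; 3-star reviews belong to neither class
    in the description above, so they are skipped.
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for rating, text in reviews:
        if rating >= 4:
            counts["positive"].update(tokenize(text))
        elif rating <= 2:
            counts["negative"].update(tokenize(text))

    vocab = set(counts["positive"]) | set(counts["negative"])
    keywords = {}
    for label, word_counts in counts.items():
        # Laplace smoothing: every vocabulary word gets alpha extra counts.
        total = sum(word_counts.values()) + alpha * len(vocab)
        log_likelihood = {
            w: math.log((word_counts[w] + alpha) / total) for w in vocab
        }
        # Sort by likelihood, breaking ties alphabetically for stable output.
        ranked = sorted(vocab, key=lambda w: (-log_likelihood[w], w))
        keywords[label] = ranked[:top_n]
    return keywords


sample_reviews = [
    (5, "Great battery, great screen"),
    (4, "Great value"),
    (3, "Okay product"),
    (1, "Broke quickly, broke again"),
    (2, "Poor battery"),
]
print(top_keywords(sample_reviews, top_n=2))
```

Ranking by raw likelihood favors words that are simply frequent, so a word common to both classes can appear in both lists. That is one reason stop-word removal matters in this approach.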

Advanced review search

Scanning through hundreds of product reviews is time-consuming! Product Genius allows users to search directly within product reviews for the terms they care about. Like the product search, the advanced review search handles case and stemming, and ranks results by relevance. A review is considered more relevant if the term appears in its title rather than in its body. The search call is made via AJAX, and a jQuery highlight plugin highlights the user's search query in the results.

Favoriting Products and Reviews

To use this feature, users must create an account or sign in prior to searching.

When the user clicks on Favorite next to a product, or the heart next to a review, that information is stored in the database. If a user favorites a review, the product is automatically favorited. If a user unfavorites a product, all of their favorited reviews are removed as well.
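
The two cascading rules above can be sketched with an in-memory structure. This is only an illustration of the logic: the real application stores favorites in the database, and the class name Favorites, its methods, and the sample IDs are hypothetical.

```python
class Favorites:
    """In-memory sketch of one user's favorite products and reviews."""

    def __init__(self):
        self.products = set()   # favorited product ids
        self.reviews = {}       # product id -> set of favorited review ids

    def favorite_product(self, product_id):
        self.products.add(product_id)

    def favorite_review(self, product_id, review_id):
        # Favoriting a review automatically favorites its product.
        self.products.add(product_id)
        self.reviews.setdefault(product_id, set()).add(review_id)

    def unfavorite_review(self, product_id, review_id):
        # Removing a review leaves the product itself favorited.
        self.reviews.get(product_id, set()).discard(review_id)

    def unfavorite_product(self, product_id):
        # Unfavoriting a product also removes all of its favorited reviews.
        self.products.discard(product_id)
        self.reviews.pop(product_id, None)


favorites = Favorites()
favorites.favorite_review("P1", "R7")
print("P1" in favorites.products)   # True
favorites.unfavorite_product("P1")
print(favorites.reviews)            # {}
```

In a relational schema, the second rule would typically be handled by deleting the user's review favorites for that product in the same transaction as the product favorite.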

A user can view their favorite products and reviews by visiting their profile page.

Set up

Clone or fork this repo:

https://github.com/michberr/ProductGenius.git

Create and activate a virtual environment inside your project directory:

virtualenv env
source env/bin/activate

Install the requirements:

pip install -r requirements.txt

Set up the database:

psql product_genius < product_genius.sql

Note: The code and data that seeded the original database can be found in seed.py and /data.

Run the app:

python server.py

Navigate to localhost:5000/ to begin researching products!

Future Plans

Features I plan to implement in the future include testing for bimodal product rating distributions, additional sorting and filtering features for products, and improvements to the existing keyword extraction model using stemming and corrections for class imbalance.

The Author

Product Genius was created by Michelle Berry, a software engineer and data scientist in San Francisco. Michelle completed the Software Engineering Fellowship at Hackbright Academy and holds graduate and undergraduate degrees in Earth Systems and Human Biology from Stanford University.

Learn more about Michelle at her LinkedIn.
