Amazon Vine Analysis

Project Overview

The goal of this project is to analyze the Amazon customer review dataset. PySpark is used to extract the dataset, transform the data, and connect to an AWS RDS instance. After transformation, the data is loaded into a PostgreSQL database that is managed through pgAdmin. The dataset is then examined with Pandas to see whether there is any bias favouring positive ratings from "Amazon Vine" users.

Resources

Data Source

The data comes from Amazon's US Reviews Dataset, which is published as a list of per-category datasets. From this list, the Video Game dataset is chosen for analysis. The full Video Game dataset can be downloaded from the same source.

Software

  • Visual Studio Code
  • Postgres
  • AWS
  • Pandas
  • Spark
  • Google Colab

What is Amazon Vine?

Through the Amazon Vine programme, a curated group of Amazon customers known as "Vine Voices" can review items from manufacturers and publishers. Vine Voices are selected based on a number of factors, including the quantity and quality of their reviews. They receive free products and are required to write reviews in return. According to Amazon's Vine help manual, Voices are not otherwise compensated, and the company appreciates "honest opinions regarding the product."

Extract, Transform, Load (ETL)

In Amazon_Reviews_ETL.ipynb, four DataFrames are created from the Video Game dataset: Customers_df, Products_df, Review_id_df, and Vine_df. After connecting to the AWS RDS instance, each of these DataFrames is written to an existing table in the PostgreSQL database, which can be inspected in pgAdmin. For security reasons, the password and URL used to set up the RDS connection have been removed; you must enter your own information in that section of the notebook.
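As a rough illustration of the settings you need to supply, the sketch below assembles a PostgreSQL JDBC URL and a properties dictionary from environment variables. The function name `build_jdbc_settings`, the environment variable names, and the placeholder values are hypothetical and are not taken from Amazon_Reviews_ETL.ipynb.

```python
import os


def build_jdbc_settings(endpoint, database, user, password, port=5432):
    """Return a (jdbc_url, properties) pair for a PostgreSQL connection.

    All argument values are supplied by the caller; nothing here comes
    from the original notebook.
    """
    jdbc_url = f"jdbc:postgresql://{endpoint}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",  # standard PostgreSQL JDBC driver class
    }
    return jdbc_url, properties


# Hypothetical environment variable names; the fallbacks are placeholders only.
jdbc_url, config = build_jdbc_settings(
    endpoint=os.environ.get("RDS_ENDPOINT", "<your-rds-endpoint>"),
    database=os.environ.get("RDS_DATABASE", "<your-database>"),
    user=os.environ.get("RDS_USER", "<your-user>"),
    password=os.environ.get("RDS_PASSWORD", "<your-password>"),
)
print(jdbc_url)
```

In PySpark, values like these are typically passed to `DataFrame.write.jdbc(url=..., table=..., mode=..., properties=...)`. Reading the credentials from the environment keeps them out of the notebook itself.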

For the purpose of this project, only the vine_table is necessary; it is exported using PySpark under the same name, vine_table.

Determining Review Bias

To determine whether there is any review bias, PySpark is used to filter the data and create new DataFrames. This portion of the analysis is found in Vine_Review_Analysis.ipynb.

The vine_table is read in as a DataFrame, vine_df.

In the first filter (Filter #1), vine_df is filtered to keep only rows where the number of total votes is greater than or equal to 20. Doing this helps pick reviews that are more likely to be helpful and avoids division-by-zero errors in the next step. The result is saved as a new DataFrame.

A second filter (Filter #2) is then applied to the result of the previous filter (Filter #1) to create a new DataFrame that retrieves all rows where the number of helpful votes divided by the total votes is greater than or equal to 50%.
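The two filters can be sketched in plain Python, without PySpark, to show the logic and why the order matters. The sample rows are invented, and the field names (`total_votes`, `helpful_votes`, and so on) are assumptions about the table's columns rather than names confirmed by the notebook.

```python
# Invented sample rows; field names are assumed, not taken from the notebook.
reviews = [
    {"review_id": "R1", "star_rating": 5, "helpful_votes": 18, "total_votes": 20, "vine": "Y"},
    {"review_id": "R2", "star_rating": 3, "helpful_votes": 4, "total_votes": 25, "vine": "N"},
    {"review_id": "R3", "star_rating": 5, "helpful_votes": 0, "total_votes": 0, "vine": "N"},
    {"review_id": "R4", "star_rating": 4, "helpful_votes": 15, "total_votes": 30, "vine": "N"},
]


def filter_min_votes(rows, min_votes=20):
    """Filter #1: keep rows with at least `min_votes` total votes."""
    return [r for r in rows if r["total_votes"] >= min_votes]


def filter_helpful_ratio(rows, min_ratio=0.5):
    """Filter #2: keep rows where helpful_votes / total_votes >= min_ratio.

    Assumes every row has total_votes > 0, which Filter #1 guarantees.
    """
    return [r for r in rows if r["helpful_votes"] / r["total_votes"] >= min_ratio]


first_filter = filter_min_votes(reviews)            # drops R3 (0 total votes)
second_filter = filter_helpful_ratio(first_filter)  # drops R2 (4 / 25 = 16%)
print([r["review_id"] for r in second_filter])
```

Applying Filter #2 directly to the unfiltered rows would raise a `ZeroDivisionError` on R3, which is the division-by-zero case that Filter #1 guards against.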

Finally, two more DataFrames are created to split the result of Filter #2 into reviews written as part of the Vine program (paid) and reviews not part of the Vine program (unpaid). From these final DataFrames, the following metrics are calculated for each group:

  • The total number of reviews.
  • The number of 5-star reviews.
  • The percentage of 5-star reviews (Paid and Unpaid).
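A minimal plain-Python sketch of the split and the three metrics is shown below. The rows are invented, and the `vine` flag values `"Y"` and `"N"` and the `star_rating` field name are assumptions about how the table encodes this information.

```python
def five_star_summary(rows):
    """Return the total reviews, 5-star reviews, and 5-star percentage."""
    total = len(rows)
    five_star = sum(1 for r in rows if r["star_rating"] == 5)
    # Guard against an empty group so the percentage is always defined.
    pct = 100.0 * five_star / total if total else 0.0
    return {"total": total, "five_star": five_star, "pct_five_star": round(pct, 2)}


# Invented rows standing in for the output of Filter #2.
filtered = [
    {"star_rating": 5, "vine": "Y"},
    {"star_rating": 4, "vine": "Y"},
    {"star_rating": 5, "vine": "N"},
    {"star_rating": 3, "vine": "N"},
    {"star_rating": 1, "vine": "N"},
]

paid = [r for r in filtered if r["vine"] == "Y"]
unpaid = [r for r in filtered if r["vine"] == "N"]

paid_summary = five_star_summary(paid)
unpaid_summary = five_star_summary(unpaid)
print(paid_summary)
print(unpaid_summary)
```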

Results

For the Video Game dataset:

  • There are only 94 Vine reviews.
    • 48 Vine reviews gave 5 stars.
    • Approximately 51.06% of Vine reviews were 5-star reviews.
  • There are 40,471 non-Vine reviews.
    • 15,663 non-Vine reviews gave 5 stars.
    • Approximately 38.70% of non-Vine reviews were 5-star reviews.
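The percentages follow directly from the counts listed above, as this short check shows. The helper name `pct` is introduced here only for illustration.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts as reported in the results above.
vine_total, vine_five_star = 94, 48
non_vine_total, non_vine_five_star = 40_471, 15_663

vine_pct = pct(vine_five_star, vine_total)
non_vine_pct = pct(non_vine_five_star, non_vine_total)
print(f"Vine: {vine_pct:.2f}%  non-Vine: {non_vine_pct:.2f}%")
```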

Summary

Based on this analysis, there appears to be a positivity bias among Video Game reviews in the Vine program. While only 38.70% of regular reviews gave 5 stars, 51.06% of Vine reviews gave 5 stars, a difference of roughly 12 percentage points.

It should be emphasised, however, that this dataset does not represent any single product. It covers a wide range of hardware, software, and add-ons for various video game consoles, so the analysis is performed on the dataset as a whole rather than on specific products. Additionally, only 94 of the 40,565 reviews examined (about 0.23%) were Vine reviews. So few reviews are unlikely to have an impact on the overall evaluation of the products available for purchase on Amazon.
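The 0.23% share can be reproduced from the review counts reported in the Results section:

```python
# Counts as reported in the Results section.
vine_reviews = 94
non_vine_reviews = 40_471

total_reviews = vine_reviews + non_vine_reviews
vine_share_pct = round(100 * vine_reviews / total_reviews, 2)
print(total_reviews, vine_share_pct)
```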

Recommendations for Further Analysis

Another analysis that could be performed on this dataset is to compare the average star rating of Vine reviews with the average star rating of non-Vine reviews. If Vine reviewers turn out to give higher star ratings on average than customers outside the Vine program, this may be further evidence of positivity bias.
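A minimal sketch of that comparison, using Python's standard library and invented ratings, might look like the following. The two rating lists are illustrative only and are not drawn from the dataset.

```python
from statistics import mean

# Invented star ratings for illustration; real values would come from
# the paid (Vine) and unpaid (non-Vine) DataFrames.
vine_ratings = [5, 4, 5]
non_vine_ratings = [5, 3, 1, 4]

vine_mean = mean(vine_ratings)
non_vine_mean = mean(non_vine_ratings)

# A positive difference means Vine reviews rate higher on average.
difference = vine_mean - non_vine_mean
print(f"Vine: {vine_mean:.2f}  non-Vine: {non_vine_mean:.2f}  diff: {difference:+.2f}")
```

A higher Vine average on its own would only be suggestive; with as few Vine reviews as this dataset contains, a small number of ratings can shift the mean noticeably.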
