amazon-confidence-interval's People

Contributors

aeciorc, chrismbryant, dependabot[bot], kgryte, musicin3d

amazon-confidence-interval's Issues

Assess the difference between two stats methods

For each Amazon product, there exists a distribution of star ratings (number of ratings per star value). To compute the probability of a positive experience with the product (i.e. the satisfaction probability), we'd like to assign 4 and 5 stars the binary label "positive" and 1, 2, and 3 stars the binary label "not positive". Given this assignment, we assume that each rating can be viewed as an independent Bernoulli trial with a fixed (but unknown) probability of success. Under this assumption, we can use the Beta distribution to compute a confidence interval on our measurement in a straightforward way.
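
As a rough sketch of how that calculation could look (illustrative only, not code from this repo; it assumes stdlib's standalone beta quantile package and a uniform Beta(1, 1) prior):

    // Treat 4- and 5-star ratings as successes in n Bernoulli trials and take an
    // equal-tailed interval from the resulting Beta posterior.
    const betaQuantile = require('@stdlib/stats-base-dists-beta-quantile');

    function satisfactionInterval(starFractions, totalRatings, level = 0.95) {
        // starFractions: fraction of ratings per star value, e.g. [0.1, 0, 0, 0.4, 0.5]
        const successes = (starFractions[3] + starFractions[4]) * totalRatings;
        const failures = totalRatings - successes;
        const a = successes + 1; // uniform Beta(1, 1) prior
        const b = failures + 1;
        const tail = (1 - level) / 2;
        return {
            satisfaction: a / (a + b),
            lower: betaQuantile(tail, a, b),
            upper: betaQuantile(1 - tail, a, b)
        };
    }

For example, satisfactionInterval([0.1, 0, 0, 0.4, 0.5], 345) reports a satisfaction of roughly 0.9 together with a 95% interval around it.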

However, it is not feasible to access the full histogram of star ratings for a full page of products immediately on page load (Amazon locks you out if you fetch these results too quickly, and rate limiting results in a poor user experience). Instead, the only variables readily accessible to us are the average star rating and the number of ratings for each product. Given these two pieces of information, we can obtain an alternative satisfaction probability and confidence interval by linearly scaling the average star rating in the range [1, 5] to a success probability in the range [0, 1], then building a confidence interval from that proportion in the same way as before.
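
A matching sketch of that proxy method (again illustrative): map the average star rating onto [0, 1], treat it as a success proportion, and build the same kind of Beta interval.

    const betaQuantile = require('@stdlib/stats-base-dists-beta-quantile');

    function proxyInterval(averageStars, totalRatings, level = 0.95) {
        const p = (averageStars - 1) / 4;      // linear map from [1, 5] to [0, 1]
        const successes = p * totalRatings;    // pseudo-count of "positive" ratings
        const failures = totalRatings - successes;
        const tail = (1 - level) / 2;
        return {
            satisfaction: (successes + 1) / (totalRatings + 2),
            lower: betaQuantile(tail, successes + 1, failures + 1),
            upper: betaQuantile(1 - tail, successes + 1, failures + 1)
        };
    }

For the example histogram [0.1, 0, 0, 0.4, 0.5], the average rating is 4.2, so the proxy gives p = 0.8 while the true proportion of 4-or-5-star ratings is 0.9; that gap is exactly what this issue asks us to assess.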

What are the consequences of choosing the "average star rating" as a proxy for "proportion of 4 or 5 star ratings"? Can the Beta-distribution-derived confidence interval still be trusted?

Higher granularity in scraped ratings

@aeciorc To compute the confidence score, we'll need the fraction of ratings that came from each star value, ideally as an array like [0.1, 0, 0, 0.4, 0.5], with the values corresponding to the percentage distribution you see when you hover over the star rating (i.e. [1 star, 2 stars, ..., 5 stars]). Can your code be modified to retrieve this info?
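
For illustration, the calculation side could consume something like the following (the function name and the "%" string format are assumptions about what the scraper actually sees):

    // Convert the per-star percentages from the hover popover into fractions that
    // sum to 1, ordered [1 star, 2 stars, ..., 5 stars].
    function toStarFractions(hoverPercentages) {
        const values = hoverPercentages.map((s) => parseFloat(s) / 100);
        const total = values.reduce((sum, v) => sum + v, 0);
        return values.map((v) => (total > 0 ? v / total : 0)); // guard against rounding drift
    }

    // toStarFractions(['10%', '0%', '0%', '40%', '50%']) -> [0.1, 0, 0, 0.4, 0.5]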

Declarative injection?

I'm curious why we're using programmatic injection. I noticed that we are manually re-injecting after navigation events, and we've made it handle being rerun. As I understand it, declarative injection automatically injects the content script into each page that matches the pattern. It seems to me that would remove some boilerplate. [How] does this project benefit from doing it programmatically?
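
For comparison, the declarative version would be roughly this manifest.json entry (the match pattern and script file name are guesses, not taken from the actual manifest):

    "content_scripts": [
        {
            "matches": ["*://www.amazon.com/*"],
            "js": ["content.js"],
            "run_at": "document_idle"
        }
    ]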

Run computations remotely?

If the statistical calculations are computationally expensive, we could run them on a server. We could even cache the results for frequently accessed products.

This would also allow us to do the scraping server-side, which means we could instantly deploy updates to the scraping process. I've been working on a scraping library backed by crawler. I've been refining the API as I work on two other projects that depend on it. It's nearing its first release.

I'd be willing to host the service during alpha. If the costs aren't too bad, I wouldn't mind donating it from then on.

(Out of scope) Bonus: If we want to get fancy, we could fall back to a cheaper calculation directly in the extension if the server is unavailable. We could have a configuration option that disables the server-side lookup, relying solely on in-extension calculations.
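
A minimal sketch of that fallback, assuming a hypothetical endpoint and settings flag:

    // Try the remote service first; on any failure (or if the user disabled the
    // server-side lookup in settings), fall back to the in-extension calculation.
    async function getScore(productId, computeLocally, useServer = true) {
        if (useServer) {
            try {
                const res = await fetch(`https://scores.example.com/score/${productId}`);
                if (res.ok) {
                    return await res.json();
                }
            } catch (err) {
                // Server unreachable; fall through to the local calculation.
            }
        }
        return computeLocally(productId);
    }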

Ideal user experience

What are your thoughts on the ideal user experience for the extension? For a mathematically challenged user like me, I think it would ideally be a simple numerical score near the ratings that could be compared across listings to inform the decision to buy one or the other. Hovering over the score could reveal the details and some explanation of how it's calculated and what it means, e.g. "This listing has a score of 98, which means that based on the experiences of 345 buyers, you are 98% likely to give it 5 stars, 90% likely to give it at least 4 stars (...)".

I'm curious to hear everyone else's opinions.

test.html doesn't work anymore

I've attempted to fix this. I added type="module" to the script tag for calculations.js, but then it failed to find the stdlib file. I tried the full path to the file, but then Chrome complained that the file didn't provide a default export. I'm pretty sure Webpack does some extra work to make it easier on us. I thought about skipping the import altogether, but conditional imports are highly experimental and seemingly unsupported in Chrome. We may just need to delete test.html and find another way to experiment.

Game Theory considerations

This is just a thought, and I'm not sure it's relevant at the current scale, but I wanted to get it out there.

If this extension were to become popular and be used by a large number of Amazon users, everyone choosing the same recommended seller could overload that seller's logistics capacity (delivery, available stock, etc.) and therefore lead to a worse overall experience.

I assume this would balance out if reviews come in quickly enough, because the extension would then recommend a different seller, but it might be something worth considering, especially if multiple sellers are almost equally recommendable.

Add a license (MIT?)

This project is unlicensed. No one else is allowed to create derivative works, which is technically what we're doing every time we push updates.
