amazon-confidence-interval's People

Contributors

aeciorc, chrismbryant, dependabot[bot], kgryte, musicin3d

amazon-confidence-interval's Issues

Assess the difference between two stats methods

For each Amazon product, there exists a distribution of star ratings (number of ratings per star value). To compute the probability of a positive experience with the product (i.e. the satisfaction probability), we'd like to assign 4 and 5 stars the binary label "positive" and 1, 2, and 3 stars the binary label "not positive". Given this assignment, we assume that each rating can be viewed as an independent Bernoulli trial with a fixed (but unknown) probability of success. Under this assumption, we can use the Beta distribution to compute a confidence interval on our measurement in a straightforward way.
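
As a rough sketch of how that calculation could look (illustrative only, not code from this repo; it assumes stdlib's standalone beta quantile package and a uniform Beta(1, 1) prior):

    // Treat 4- and 5-star ratings as successes in n Bernoulli trials and take an
    // equal-tailed interval from the resulting Beta posterior.
    const betaQuantile = require('@stdlib/stats-base-dists-beta-quantile');

    function satisfactionInterval(starFractions, totalRatings, level = 0.95) {
        // starFractions: fraction of ratings per star value, e.g. [0.1, 0, 0, 0.4, 0.5]
        const successes = (starFractions[3] + starFractions[4]) * totalRatings;
        const failures = totalRatings - successes;
        const a = successes + 1; // uniform Beta(1, 1) prior
        const b = failures + 1;
        const tail = (1 - level) / 2;
        return {
            satisfaction: a / (a + b),
            lower: betaQuantile(tail, a, b),
            upper: betaQuantile(1 - tail, a, b)
        };
    }

For example, satisfactionInterval([0.1, 0, 0, 0.4, 0.5], 345) reports a satisfaction of roughly 0.9 together with a 95% interval around it.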

However, it is not feasible to access the full histogram of star ratings for a full page of products immediately on page load (Amazon locks you out if you fetch these results too quickly, and rate limiting results in a poor user experience). Instead, the only variables readily accessible to us are the average star rating and the number of ratings for each product. Given these two pieces of information, we can obtain an alternative satisfaction probability and confidence interval by linearly scaling the average star rating in the range [1, 5] to a success probability in the range [0, 1], then building a confidence interval from that proportion in the same way as before.
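
A matching sketch of that proxy method (again illustrative): map the average star rating onto [0, 1], treat it as a success proportion, and build the same kind of Beta interval.

    const betaQuantile = require('@stdlib/stats-base-dists-beta-quantile');

    function proxyInterval(averageStars, totalRatings, level = 0.95) {
        const p = (averageStars - 1) / 4;      // linear map from [1, 5] to [0, 1]
        const successes = p * totalRatings;    // pseudo-count of "positive" ratings
        const failures = totalRatings - successes;
        const tail = (1 - level) / 2;
        return {
            satisfaction: (successes + 1) / (totalRatings + 2),
            lower: betaQuantile(tail, successes + 1, failures + 1),
            upper: betaQuantile(1 - tail, successes + 1, failures + 1)
        };
    }

For the example histogram [0.1, 0, 0, 0.4, 0.5], the average rating is 4.2, so the proxy gives p = 0.8 while the true proportion of 4-or-5-star ratings is 0.9; that gap is exactly what this issue asks us to assess.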

What are the consequences of choosing the "average star rating" as a proxy for "proportion of 4 or 5 star ratings"? Can the Beta-distribution-derived confidence interval still be trusted?

Higher granularity in scraped ratings

@aeciorc To compute the confidence score, we'll need the fraction of ratings that came from each star value, ideally as an array like [0.1, 0, 0, 0.4, 0.5], with the values corresponding to the percentage distribution you see when you hover over the star rating (i.e. [1 star, 2 stars, ..., 5 stars]). Can your code be modified to retrieve this info?
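
For illustration, the calculation side could consume something like the following (the function name and the "%" string format are assumptions about what the scraper actually sees):

    // Convert the per-star percentages from the hover popover into fractions that
    // sum to 1, ordered [1 star, 2 stars, ..., 5 stars].
    function toStarFractions(hoverPercentages) {
        const values = hoverPercentages.map((s) => parseFloat(s) / 100);
        const total = values.reduce((sum, v) => sum + v, 0);
        return values.map((v) => (total > 0 ? v / total : 0)); // guard against rounding drift
    }

    // toStarFractions(['10%', '0%', '0%', '40%', '50%']) -> [0.1, 0, 0, 0.4, 0.5]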

Declarative injection?

I'm curious why we're using programmatic injection. I noticed that we are manually re-injecting after navigation events, and we've made it handle being rerun. As I understand it, declarative injection automatically injects the content script into each page that matches the pattern. It seems to me that would remove some boilerplate. [How] does this project benefit from doing it programmatically?
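
For comparison, the declarative version would be roughly this manifest.json entry (the match pattern and script file name are guesses, not taken from the actual manifest):

    "content_scripts": [
        {
            "matches": ["*://www.amazon.com/*"],
            "js": ["content.js"],
            "run_at": "document_idle"
        }
    ]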

Run computations remotely?

If the statistical calculations are computationally expensive, we could run them on a server. We could even cache the results for frequently accessed products.

This would also allow us to do the scraping server-side, which means we could instantly deploy updates to the scraping process. I've been working on a scraping library backed by crawler. I've been refining the API as I work on two other projects that depend on it. It's nearing its first release.

I'd be willing to host the service during alpha. If the costs aren't too bad, I wouldn't mind donating it from then on.

(Out of scope) Bonus: If we want to get fancy, we could fall back to a cheaper calculation directly in the extension if the server is unavailable. We could have a configuration option that disables the server-side lookup, relying solely on in-extension calculations.
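
A minimal sketch of that fallback, assuming a hypothetical endpoint and settings flag:

    // Try the remote service first; on any failure (or if the user disabled the
    // server-side lookup in settings), fall back to the in-extension calculation.
    async function getScore(productId, computeLocally, useServer = true) {
        if (useServer) {
            try {
                const res = await fetch(`https://scores.example.com/score/${productId}`);
                if (res.ok) {
                    return await res.json();
                }
            } catch (err) {
                // Server unreachable; fall through to the local calculation.
            }
        }
        return computeLocally(productId);
    }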

Ideal user experience

What are your thoughts on the ideal user experience for the extension? For a mathematically challenged user like me, I think it would ideally be a simple numerical score near the ratings that could be compared across listings to inform the decision to buy one or the other. Hovering over the score could reveal the details and some explanation of how it's calculated and what it means, e.g. "This listing has a score of 98, which means that based on the experiences of 345 buyers, you are 98% likely to give it 5 stars, 90% likely to give it at least 4 stars (...)".

I'm curious to hear everyone else's opinions.

test.html doesn't work anymore

I've attempted to fix this. I added type="module" to the script tag for calculations.js, but then it failed to find the stdlib file. I tried the full path to the file, but then Chrome complained that the file didn't provide a default export. I'm pretty sure Webpack does some extra work to make it easier on us. I thought about skipping the import altogether, but conditional imports are highly experimental and seemingly unsupported in Chrome. We may just need to delete test.html and find another way to experiment.

Game Theory considerations

This is just a thought, and I'm not sure it's relevant at the current scale, but I wanted to get it out there.

If this extension were to become popular and be used by a large number of Amazon users, everyone choosing the same recommended seller could overload that seller's logistics capacity (delivery, available stock, etc.) and therefore lead to a worse overall experience.

I assume this would balance out if reviews come in quickly enough, because the extension would then recommend a different seller, but it might be something worth considering, especially if multiple sellers are almost equally recommendable.

Add a license (MIT?)

This project is unlicensed. No one else is allowed to create derivative works, which is technically what we're doing every time we push updates.
