Comments (6)
@DanielGoldfarb thanks for bringing this to my attention. I've just deployed an update to the application which has IP-based rate limiting as you suggested. I'm hoping that this will mitigate similar situations in the future.
Based on the application logs, there was a PHP client that was repeatedly fetching recent download counts for official Python packages associated with cloud providers, namely AWS, Azure, and Aliyun (Alibaba Cloud). There are roughly 100 packages for these cloud providers, and the client was requesting all of them several times per minute, resulting in server timeouts.
FWIW I built this project in about a week between jobs a while back and haven't really touched the code since then. It's admittedly in rough shape and I'd like to improve it so that others can run it locally and contribute. Improvements have been on my TODO list for a while, and I'll make a more serious effort when I have more free time.
from pypistats.org.
I just noticed this issue reported previously. Is there no way to throttle API requests, and/or detect a huge number of API requests from the same client and respond with a failure and a message about getting the data directly from BigQuery instead?
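As a rough illustration of the "respond with a failure and a message" idea, here is a minimal, framework-agnostic sketch. The function name `too_many_requests`, the message text, and the `(status, headers, body)` return shape are all assumptions made for this example; they are not taken from the pypistats.org code. It relies only on the standard HTTP 429 status code and the `Retry-After` header.

```python
from typing import Dict, Tuple

# Hypothetical message text; the wording is an assumption for this sketch.
BIGQUERY_HINT = (
    "Too many requests from this client. For bulk download statistics, "
    "please query the public PyPI dataset on Google BigQuery instead."
)


def too_many_requests(retry_after_seconds: int) -> Tuple[int, Dict[str, str], str]:
    """Build a (status, headers, body) triple for a throttled request."""
    headers = {
        # Tells well-behaved clients how long to wait before retrying.
        "Retry-After": str(retry_after_seconds),
        "Content-Type": "text/plain; charset=utf-8",
    }
    return 429, headers, BIGQUERY_HINT
```

Most web frameworks can turn a triple like this into a response, so the same helper could be called from whatever code decides that a client has exceeded its limit.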
If the problem really is what was reported previously, that is, someone is hammering the API when they can and should be going directly to BigQuery instead, it is definitely possible to throttle the rate of requests and prevent such inadvertent denial of service attacks.
I don't know exactly how to implement it, but I do know, for example, that I sometimes use the Alpha Vantage API to get market data. The API requires me to provide a uniquely generated key along with my requests (and obtaining a key requires an email address). For the free version of the API, if the same key makes 5 requests in under a minute, then all further requests immediately return an error for the remainder of the minute (after which another 5 requests are permitted in the next minute). This greatly limits the load on the servers. (The paid version also throttles, but allows a faster request rate.) I'm sure most, if not all, of these APIs do some kind of throttling to prevent [even unintended] denial of service.
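For illustration, the per-key limit described above (5 requests per minute, then errors until the minute is over) could be sketched as a fixed-window counter. This is only a sketch under stated assumptions: the class name `FixedWindowLimiter`, its parameters, and the in-memory dictionary are hypothetical and are not how Alpha Vantage or pypistats.org actually implement throttling. The key could just as well be a client IP address instead of an API key.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(
        self,
        limit: int = 5,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # key -> (index of the window last seen, requests counted in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        """Return True if this request is within the limit, False if it should be rejected."""
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._counts.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self._counts[key] = (window, count + 1)
        return True


# Usage: reject the request (for example with HTTP 429) when allow() is False.
limiter = FixedWindowLimiter(limit=5, window_seconds=60.0)
if not limiter.allow("some-api-key-or-client-ip"):
    print("rate limit exceeded")
```

The counter here lives in one process's memory and never forgets old keys, so a deployment with several worker processes would likely need a shared store and some cleanup; that part is left out of the sketch.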
If someone reading this knows how to implement such a throttle, it will certainly be appreciated if you can do so and provide a pull request. Or if you can provide me with enough information/examples of similar code, then I may be able to implement it myself.
On the other hand, if this is really just some sort of hosting/server issue, I'm not sure how to even begin to investigate and/or fix that.
This appears to have gotten better starting yesterday afternoon. So the problem lasted about a week. I still think relying on the kindness of API clients to behave well, and not hammer the API, is not a healthy software approach. There needs to be some kind of a throttle built into the API itself. Will leave this open and see if/when it happens again.
@crflynn Thanks!
https://pypistats.org/ has been unavailable since at least yesterday. It returns a "502 Bad Gateway" error.
I am not able to ping the server either:
ζ ping pypistats.org
PING pypistats.org (3.234.152.95) 56(84) bytes of data.
^C
--- pypistats.org ping statistics ---
104 packets transmitted, 0 received, 100% packet loss, time 104373ms