Comments (7)
I captured the CloudFront logs for most of a day and did some analysis. A very small number of IP addresses account for the vast majority of our served traffic. Most of these are in AWS, and they keep requesting the same deb files. I would bet these are VPC gateways for large sites with frequent or ephemeral node creation.
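For anyone who wants to repeat this kind of analysis, here is a minimal sketch of a per-IP tally. It is not the script used above. It assumes CloudFront standard logs (tab-separated rows preceded by a `#Fields:` header that names `c-ip` and `sc-bytes`); the function name `top_clients` is made up for illustration.

```python
from collections import Counter


def top_clients(log_text, n=15):
    """Return the n busiest client IPs as (ip, request_count, bytes_served)."""
    requests = Counter()
    bytes_served = Counter()
    fields = []
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            # The header lists column names separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        # Data rows are tab-separated, in the order given by the header.
        row = dict(zip(fields, line.split("\t")))
        ip = row["c-ip"]
        requests[ip] += 1
        size = row.get("sc-bytes", "")
        bytes_served[ip] += int(size) if size.isdigit() else 0
    return [(ip, count, bytes_served[ip]) for ip, count in requests.most_common(n)]
```

Sorting by request count is one choice; ranking by `bytes_served` instead may be closer to what drives the bill.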
If we recognize that most of our traffic is within AWS, then we can plan hosting around that. Notably, transfer from S3 to an endpoint within the same region is free. This leads to a possible solution.
We create a bucket per region, replicate our package content, and then bounce users to these direct S3 URLs. There are, of course, implementation questions...
Possible Implementations:
| What | Pro | Con |
|---|---|---|
| Lambda | Lots of control; probably inexpensive | Served from US; need to write it |
| CloudFront Function | Easily fits what we're doing today | Limited functionality |
| Lambda@Edge | Global | Need to write it; limited languages |
| S3 Multi-Region Access Points (MRAP) | AWS implementation of what I'd write | Unclear if custom URLs are supported; unclear price |
| Route53 Geo Load Balancing | Simple | Unclear if we can do the custom URLs for S3 |
| Just use S3 from us-east-1 | Simple | Need to rename bucket; can't easily do other regions |
I suspect the best route is for me to write the Lambda. It gets us off CloudFront, Go is supported, etc. But that's going to take more than a couple of days. So as a stopgap, I've implemented a CloudFront function to redirect users from our top 15 IP addresses to the bucket closest to them.
Getting there involved a bunch of other pieces:
- There are now 3 additional buckets serving packages
- S3 is configured to replicate between them
- CloudFront has a viewer request function to generate URLs for the top N users
- I enabled S3 storage metrics; they look pretty cheap
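The redirect decision itself is small. Below is a Python sketch of the logic only; the deployed version is a CloudFront function, and its code is not shown in this thread. The bucket names, IP addresses, and the `redirect_for` helper are all hypothetical placeholders. The URL uses S3's virtual-hosted style (`https://<bucket>.s3.<region>.amazonaws.com/<key>`).

```python
# Hypothetical bucket names, one per region (not the real osquery buckets).
REGION_BUCKETS = {
    "us-east-1": "example-packages-us-east-1",
    "us-west-2": "example-packages-us-west-2",
    "eu-west-1": "example-packages-eu-west-1",
}

# Hypothetical "top N" client IPs mapped to the region they appear to be in.
HEAVY_CLIENTS = {
    "203.0.113.7": "us-east-1",
    "198.51.100.23": "eu-west-1",
}


def redirect_for(client_ip, path):
    """Return a direct S3 URL for known heavy clients, or None to serve normally."""
    region = HEAVY_CLIENTS.get(client_ip)
    if region is None:
        return None  # Everyone else keeps going through the CDN.
    bucket = REGION_BUCKETS[region]
    # path is assumed to start with "/" and to match the object key in the bucket.
    return f"https://{bucket}.s3.{region}.amazonaws.com{path}"
```

A caller would turn a non-`None` result into an HTTP redirect with that URL in the `Location` header. Keeping the IP table as plain data makes it easy to regenerate from a fresh log analysis.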
from foundation.
Another approach here is to use GitHub. There is some prior art in making an apt repo on GitHub Pages:
from foundation.
CloudSmith offers OSS package hosting sponsorships, but we'd have to contact them: https://help.cloudsmith.io/docs/open-source-hosting-policy
GitHub Packages is free for OSS. Can we use this? https://docs.github.com/en/packages/learn-github-packages/introduction-to-github-packages ah....maybe only for Chocolatey / NuGet https://docs.github.com/en/packages/learn-github-packages/introduction-to-github-packages#supported-clients-and-formats
from foundation.
> CloudSmith offers OSS package hosting sponsorships, but we'd have to contact them: https://help.cloudsmith.io/docs/open-source-hosting-policy

Always good to have more options! Their quoted bandwidth caps are orders of magnitude below our current bandwidth. (This is also true for packagecloud and the various other places I've seen.) This may be different if my S3 redirects pan out.
> GitHub Packages is free for OSS. Can we use this? https://docs.github.com/en/packages/learn-github-packages/introduction-to-github-packages ah....maybe only for Chocolatey / NuGet https://docs.github.com/en/packages/learn-github-packages/introduction-to-github-packages#supported-clients-and-formats

I want this to be useful, but the supported repo types are very lacking.
from foundation.
> > CloudSmith offers OSS package hosting sponsorships, but we'd have to contact them: https://help.cloudsmith.io/docs/open-source-hosting-policy
>
> Always good to have more options! Their quoted bandwidth caps are orders of magnitude below our current bandwidth. (This is also true for packagecloud and the various other places I've seen.) This may be different if my S3 redirects pan out.

Yes, but the way I read it, those limits are the ones guaranteed for any open-source project, and if you accept a sponsorship agreement of some kind there would be other unstated limits (perhaps negotiable up to where we need them). Not sure if osquery is high-profile enough for them, but we could ask.
We definitely want to get this hosting bill off your wallet, it's not sustainable and is kind of an existential risk to the project if you suddenly have to cut out.
I think we want to encourage these top downloaders to manage their own package cache. We could write a tutorial that explains how to create private mirrors of package repos so that they can point their ephemeral VMs to that instead of constantly re-downloading from our S3 bucket. Maybe we can even pitch it as a cost saver for them too, if it means less inbound network traffic cost to them. What if we could rate limit by IP address or IP range, whichever would be effective? Not for everyone, just for the repeat downloaders. Eventually they will notice that they should be caching.
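To make the rate-limiting idea concrete, here is a minimal in-memory sketch of a per-IP sliding-window limiter. It is an illustration, not a proposal for a specific AWS service: the `PerClientLimiter` class and its parameters are invented for this example, and a real deployment would need state shared across whatever serves the requests.

```python
import time
from collections import defaultdict, deque


class PerClientLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of allowed requests

    def allow(self, client_ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # Over the limit; the caller could answer with HTTP 429.
        hits.append(now)
        return True
```

Applying it only to the repeat downloaders would mean checking the client against a list first, so ordinary users never hit the limiter.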
from foundation.
Wondering if you have more stats:
- size of the actual repo, in terms of distinct objects & byte count
- data from the Fastly engagement showing how much origin traffic there was
I'll send an email to @directionless making some introductions.
from foundation.
I'm not great at updating tickets...
I thought a bit, and realized that something seemed very off. The number of requests we see (about 2.5 million/day) is very very high for osquery packaging. So I went digging into the actual users of the data. Is it actually credible that we see ~30 computers a second download osquery?
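The "~30 a second" figure is just the daily request count spread evenly over a day. A quick check, where the only input is the 2.5 million/day number above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_second(requests_per_day):
    """Average request rate, assuming requests are spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


rate = requests_per_second(2_500_000)
print(f"{rate:.1f} requests/second")  # prints "28.9 requests/second"
```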
Anyhow, I discovered that there is a single consumer that is responsible for the vast bulk of the traffic. I don't know much about it, other than it's a VPC in AWS us-east-1, and they keep downloading the Ubuntu x86 package.
Because S3 data transfer within the same AWS region is free, this leads to a simple solution. For the busy clients, we can redirect them directly to the bucket. I went and set up some redirect magic in CloudFront to bounce the top 10 AWS IPs to direct bucket links. Our monthly bill is now much more manageable, though probably still a bit higher than desired.
I think I can get it even lower by moving my redirecting project into Lambda and completely moving away from CloudFront.
from foundation.