Comments (16)
I think I need more context on this one — the S3 archiver or Vortex or both (both write events to S3)? Which Redshift data? For which customers?
from astronomer.
I updated the description to clarify - we don't want any trace of customer data left in our S3 after the data is loaded to Redshift.
from astronomer.
Okay, so for the Vortex bucket only delete it after the load by Clickstream DAG has succeeded, right? And for failed load tasks, should we keep it for some amount of time (24 hours? 7 days?) so we can retry or drop the data?
from astronomer.
What about just setting an expiration policy on the s3 bucket to automatically delete objects after X days?
from astronomer.
I agree with this @cwurtz - simplest thing to solve that issue
from astronomer.
Yea, definitely.
from astronomer.
I'll close this out with S3 bucket policies on the 2 buckets today. I'm going to set a default expiration date of 9999 days for now just to have it in place.
@schnie @ryw Can one of you give me the official # of days we want to dial this down to? We could do it now, or scope this issue to just having the policy set up, knowing that we can change it in 1 click when we want to drop the data. I don't want to take deleting our primary source of truth lightly.
from astronomer.
@tedmiston - can this be closed?
from astronomer.
Yep, closing as done. @timbrunk
I've created lifecycle rules in the two buckets known to have clickstream events (astronomer-clickstream-prod, astronomer-workflows).
Per my previous comment, the lifespan is set high as a placeholder, so sometime before 5/25 we should decide what to set that value to permanently. I created a follow up issue for that here https://github.com/astronomerio/team/issues/140 so we don't forget.
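For reference, a minimal sketch of what an expiration lifecycle rule like the one described above could look like if set through boto3 instead of the console. The rule ID, the empty prefix (whole bucket), and the helper names are assumptions for illustration, not the actual rules that were created. The `client` argument is expected to be a `boto3.client("s3")`.

```python
def expiration_rule(days, rule_id="expire-clickstream-events", prefix=""):
    """Build a lifecycle configuration that expires objects after `days` days.

    rule_id and prefix are hypothetical defaults; an empty prefix applies
    the rule to every object in the bucket.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_expiration(client, bucket, days):
    """Apply the rule to one bucket.

    Note: this call replaces any lifecycle configuration already on the bucket.
    """
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=expiration_rule(days),
    )
```

Dialing the placeholder down later would then be a matter of calling `apply_expiration(client, "astronomer-clickstream-prod", days)` with the agreed value.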
from astronomer.
We need to delete buckets astronomer-archive and astronomer-archive-dev. I don't have permissions to delete, @schnie can you do it?
from astronomer.
@ryw I have the ability to delete everything in the buckets. Do we need to make a backup before deleting these for good or just blow them out?
(When I did this one, I stuck to the scope above that everyone already agreed to of just adding lifecycle policies for this issue.)
On the Metrics page, astronomer-archive-dev has the same # of objects today as 10 days ago but astronomer-archive looks like something is still writing to it at least as far as the graph shows right now.
P.S. It's confusing from GitHub to un-assign people after tickets are done since it makes the issue disappear from our completed lists but without sending a notification.
from astronomer.
Delete both buckets please. I turned off the process tonight that was writing to astronomer-archive, and we can't keep that data.
from astronomer.
Sure thing. astronomer-archive-dev is now emptied and deleted. astronomer-archive is emptying now, just waiting on queued delete tasks to finish. It looks like this could take an hour or more. I'll check it again EOD.
from astronomer.
Alright, the astronomer-archive delete job either timed out or is still running server-side, but we can't tell which from the AWS Console.
Apparently deleting a multi-TB bucket takes a while. Ours has 130M objects. I see posts suggesting that with s3nukem we can delete up to 10k objects/minute.
I just tried the hack in the last post of adding a 1-day lifecycle policy to jumpstart it. I'll check this again over the weekend to see where it's at.
- https://serverfault.com/questions/679989/most-efficient-way-to-batch-delete-s3-files
- https://robertlathanh.com/2010/07/s3nukem-delete-large-amazon-s3-buckets/
- https://www.reddit.com/r/aws/comments/8b04x6/how_to_delete_large_s3_buckets_easily/
- http://www.heystephenwood.com/2012/08/how-to-delete-large-s3-buckets-easily.html
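For scale: at 10k objects/minute, 130M objects works out to about 13,000 minutes, or roughly 9 days, which is why the lifecycle route is attractive. As an alternative to the tools linked above, here is a rough sketch of a scripted batch delete using boto3's `delete_objects`, which accepts up to 1,000 keys per request. The helper names are hypothetical, `client` is assumed to be a `boto3.client("s3")`, and this does not handle versioned buckets or per-key errors in the response.

```python
from itertools import islice


def chunked(keys, size=1000):
    """Yield lists of at most `size` keys (1000 is the delete_objects limit)."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def iter_keys(client, bucket):
    """Lazily list every object key in the bucket, one page at a time."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def empty_bucket(client, bucket):
    """Issue batched deletes for all keys; returns the number of keys requested."""
    requested = 0
    for batch in chunked(iter_keys(client, bucket)):
        client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
        )
        requested += len(batch)
    return requested
```

The returned count is the number of deletes requested, not confirmed, so a follow-up listing is still needed to verify the bucket is empty.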
from astronomer.
Alright, so my 1-day lifecycle trick worked. However, something is still actively writing to the astronomer-archive bucket. Hundreds of new files were created today. I'm spinning off a separate issue https://github.com/astronomerio/astronomer-cloud/issues/224 for that extra work and will ask in Slack.
from astronomer.