
Comments (16)

tedmiston avatar tedmiston commented on May 30, 2024

I think I need more context on this one — the S3 archiver or Vortex or both (both write events to S3)? Which Redshift data? For which customers?

from astronomer.

ryw avatar ryw commented on May 30, 2024

I updated the description to clarify - we don't want any trace of customer data left in our S3 after the data is loaded to Redshift.


tedmiston avatar tedmiston commented on May 30, 2024

Okay, so for the Vortex bucket, we only delete the data after the Clickstream DAG's load has succeeded, right? And for failed load tasks, should we keep the data for some amount of time (24 hours? 7 days?) so we can retry the load or drop the data?


schnie avatar schnie commented on May 30, 2024


cwurtz avatar cwurtz commented on May 30, 2024

What about just setting an expiration policy on the S3 bucket to automatically delete objects after X days?


ryw avatar ryw commented on May 30, 2024

I agree with this @cwurtz - simplest thing to solve that issue


schnie avatar schnie commented on May 30, 2024

Yea, definitely.


tedmiston avatar tedmiston commented on May 30, 2024

I'll close this out with S3 lifecycle policies on the 2 buckets today. I'm going to set a default expiration period of 9999 days for now just to have it in place.

@schnie @ryw Can one of you guys give me the official # of days we want to dial this down to? We could do it now, or scope this issue to just having the policy set up, knowing that we can change it in 1 click when we want to drop the data. I don't want to take deleting our primary source of truth lightly.


timbrunk avatar timbrunk commented on May 30, 2024

@tedmiston - can this be closed?


tedmiston avatar tedmiston commented on May 30, 2024

Yep, closing as done. @timbrunk

I've created lifecycle rules in the two buckets known to have clickstream events (astronomer-clickstream-prod, astronomer-workflows).

Per my previous comment, the lifespan is set high as a placeholder, so sometime before 5/25 we should decide what to set that value to permanently. I created a follow-up issue for that at https://github.com/astronomerio/team/issues/140 so we don't forget.
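For anyone reproducing this outside the AWS Console, here is a minimal sketch of what such a lifecycle rule could look like as the configuration document that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID and the empty prefix (meaning every object in the bucket) are assumptions for illustration, not the values actually used here; only the two bucket names and the 9999-day placeholder come from this thread.

```python
def expiration_config(days, rule_id="expire-clickstream-events", prefix=""):
    """Build an S3 lifecycle configuration that expires objects after `days` days.

    `rule_id` and `prefix` are hypothetical defaults; an empty prefix matches
    every object in the bucket.
    """
    if days < 1:
        raise ValueError("S3 expiration must be at least 1 day")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# The two buckets named in this thread, with the 9999-day placeholder.
BUCKETS = ["astronomer-clickstream-prod", "astronomer-workflows"]
configs = {bucket: expiration_config(9999) for bucket in BUCKETS}

# With boto3 installed and credentials configured, each config could be applied with:
#   import boto3
#   s3 = boto3.client("s3")
#   for bucket, config in configs.items():
#       s3.put_bucket_lifecycle_configuration(
#           Bucket=bucket, LifecycleConfiguration=config
#       )
```

Dialing the retention down later would then just mean calling `expiration_config` with a smaller number of days and re-applying it.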


ryw avatar ryw commented on May 30, 2024

We need to delete buckets astronomer-archive and astronomer-archive-dev — I don't have permissions to delete, @schnie can you do it?


tedmiston avatar tedmiston commented on May 30, 2024

@ryw I have the ability to delete everything in the buckets. Do we need to make a backup before deleting these for good or just blow them out?

(When I did this one, I stuck to the scope everyone had already agreed to above: just adding lifecycle policies for this issue.)

On the Metrics page, astronomer-archive-dev has the same # of objects today as 10 days ago, but astronomer-archive looks like something is still writing to it, at least as far as the graph shows right now.

P.S. It's confusing in GitHub when people are un-assigned after tickets are done, since it makes the issue disappear from our completed lists without sending a notification.


ryw avatar ryw commented on May 30, 2024

Delete both buckets please. I turned off the process tonight that was writing to astronomer-archive, and we can't keep that data.


tedmiston avatar tedmiston commented on May 30, 2024

Sure thing. astronomer-archive-dev is now emptied and deleted. astronomer-archive is emptying now; I'm just waiting on queued delete tasks to finish. It looks like this could take an hour or more. I'll check it again EOD.


tedmiston avatar tedmiston commented on May 30, 2024

Alright, the astronomer-archive delete job either timed out or is still running server-side, but we can't tell which from the AWS Console.

Apparently deleting a multi-TB bucket takes a while. Ours has 130M objects. I see posts suggesting that with s3nukem we can delete up to 10k objects/minute.

I just tried the hack in the last post of adding a 1-day lifecycle policy to jumpstart it. I'll check this again over the weekend to see where it's at.
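For a rough sense of scale, the two figures quoted above (130M objects, up to 10k objects/minute) can be turned into a best-case duration. This is back-of-the-envelope arithmetic only: it assumes the quoted peak rate is sustained the whole time, which a real delete job may not achieve.

```python
OBJECT_COUNT = 130_000_000        # objects in astronomer-archive, per this thread
MAX_DELETES_PER_MINUTE = 10_000   # upper bound quoted for s3nukem

# Best case: the peak delete rate is sustained for the entire run.
minutes = OBJECT_COUNT / MAX_DELETES_PER_MINUTE
hours = minutes / 60
days = hours / 24

print(f"{minutes:,.0f} minutes = {hours:.1f} hours = {days:.1f} days")
# prints: 13,000 minutes = 216.7 hours = 9.0 days
```

So even at the quoted maximum, a client-side delete would run for about nine days, which is why handing the work to a server-side lifecycle rule is attractive.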


tedmiston avatar tedmiston commented on May 30, 2024

Alright, so my 1-day lifecycle trick worked. However, something is still actively writing to the astronomer-archive bucket. Hundreds of new files were created today. I'm spinning off a separate issue https://github.com/astronomerio/astronomer-cloud/issues/224 for that extra work and will ask in Slack.

