savingoyal commented on May 17, 2024

Got it. We have a similar issue open at #16.

jinnovation commented on May 17, 2024

As of today, Metaflow appears to support GCP. 🔥

savingoyal commented on May 17, 2024

@manesioz Curious, what does your tech stack look like on GCP? Kubernetes + GCS + Airflow?

manesioz commented on May 17, 2024

We actually run Airflow on Cloud Composer, and our data lake is in BigQuery. We're currently considering migrating to Kubernetes.

barrywhart commented on May 17, 2024

At Mailchimp, we also use Cloud Dataflow.

We could potentially contribute to the effort to support GCP. In particular, we have a battle-tested @retry decorator that retries according to Google Cloud's documented policy: https://cloud.google.com/apis/design/errors.

We would be happy to share this code for inclusion in Metaflow. Our decorator incorporates a fork of the Apache 2.0-licensed retrying package, which appears to be unmaintained at this point. The fork was necessary because on GCP there is a case where the wait period between retries depends on the type of error, which retrying did not support:

For 429 RESOURCE_EXHAUSTED errors, the client may retry at the higher level with minimum 30s delay. Such retries are only useful for long running background jobs.
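A minimal sketch of that idea in Python: a retry decorator whose wait period is chosen per error type, so a 429 RESOURCE_EXHAUSTED error waits at least 30 seconds as the quoted policy asks. This is not Mailchimp's decorator and not Metaflow's own @retry. The ApiError class, the gcp_wait_seconds policy, and the exponential backoff used for 503 UNAVAILABLE are illustrative assumptions, and every other error is treated as non-retryable.

```python
import functools
import random
import time
from typing import Callable, Optional


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code from a GCP API call."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def gcp_wait_seconds(exc: Exception, attempt: int) -> Optional[float]:
    """Return the delay before the next attempt, or None if exc is not retryable."""
    code = getattr(exc, "code", None)
    if code == 429:
        # RESOURCE_EXHAUSTED: the quoted policy asks for a minimum 30s delay.
        return 30.0
    if code == 503:
        # UNAVAILABLE: assumed exponential backoff from 1s, plus up to 1s of jitter.
        return 2.0 ** (attempt - 1) + random.uniform(0.0, 1.0)
    # Everything else is treated as non-retryable in this sketch.
    return None


def retry(max_attempts: int = 5,
          wait_for: Callable[[Exception, int], Optional[float]] = gcp_wait_seconds,
          sleep: Callable[[float], None] = time.sleep):
    """Retry the wrapped call, asking wait_for how long to wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    delay = wait_for(exc, attempt)
                    if delay is None or attempt == max_attempts:
                        raise  # not retryable, or out of attempts
                    sleep(delay)
        return wrapper
    return decorator


# Usage: record delays instead of sleeping so the example runs instantly.
delays = []
calls = {"n": 0}


@retry(max_attempts=3, sleep=delays.append)
def flaky_call():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ApiError(429, "quota exceeded")
    return "ok"


print(flaky_call(), delays)  # ok [30.0]
```

Injecting sleep keeps tests fast, and swapping in a different wait_for function is one way a single decorator could carry more than one provider's retry policy.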

savingoyal commented on May 17, 2024

@barrywhart We would be happy to engage on a POC. @jaychia already has a PR out for GCS integration.

barrywhart commented on May 17, 2024

@savingoyal: We are not currently using Metaflow, but I see some potential for using it in some cases as an alternative to Airflow (complex!) and bash scripts (which may not always be powerful enough for our needs). So I want to help, but I also need to timebox my involvement for now.

Can you point me to the GCS PR? Any thoughts on how the package might accommodate multiple @retry implementations? Could it literally just be a different decorator in a different module, or is there a need for a single, "polymorphic" @retry decorator?

jaychia commented on May 17, 2024

#153 is the GCS PR; please feel free to contribute or comment.

The Metaflow S3 datastore does its own error handling for storage-client retries internally (it retries N times if an error that isn't Metaflow-related is thrown). I replicated that logic for the GCS datastore. See:
https://github.com/Netflix/metaflow/pull/153/files#diff-88a07e3f313e3d7fec566c156ed68baeR28-R51
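A rough sketch of the retry logic described above, written only from the one-line description (retry N times when the error is not Metaflow-related) and not from the code in the linked diff. MetaflowLikeError and with_retries are hypothetical names; a real datastore would catch Metaflow's own exception types and would likely add a backoff delay.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class MetaflowLikeError(Exception):
    """Hypothetical stand-in for errors raised by Metaflow itself."""


def with_retries(op: Callable[[], T], max_tries: int = 3, delay: float = 0.0) -> T:
    """Call op(), retrying up to max_tries times on errors that are not Metaflow's."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return op()
        except MetaflowLikeError:
            raise  # framework errors are not transient, so surface them at once
        except Exception:
            if attempt == max_tries:
                raise  # out of tries: re-raise the last storage-client error
            time.sleep(delay)


# Usage: an upload that fails twice with a transient error, then succeeds.
attempts = []


def flaky_upload() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient storage error")
    return "uploaded"


print(with_retries(flaky_upload, max_tries=3))  # uploaded
```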

Also, tenacity is a great retrying package that should be able to handle the custom retry logic you mentioned (a wait period that depends on the type of error).

candalfigomoro commented on May 17, 2024

Is PR #153 still relevant?
