Comments (9)
Got it. We have a similar issue open at #16.
from metaflow.
As of today, Metaflow now appears to support GCP. 🔥
from metaflow.
@manesioz Curious, what does your tech stack look like on GCP? Kubernetes + GCS + Airflow?
from metaflow.
We actually run Airflow on Cloud Composer, and our data lake is in BigQuery. We're currently considering migrating to Kubernetes.
from metaflow.
At Mailchimp, we also use Cloud Dataflow.
We could potentially contribute to the effort to support GCP. In particular, we have a battle-tested @retry decorator that retries according to Google Cloud's documented policy: https://cloud.google.com/apis/design/errors.
We would be happy to share this code for inclusion in Metaflow. Our decorator incorporates a fork of the Apache 2.0 licensed retrying package, which appears to be unmaintained at this point. The fork was necessary because on GCP there is a case where the wait period between retries depends on the type of error, which retrying did not support:
"For 429 RESOURCE_EXHAUSTED errors, the client may retry at the higher level with minimum 30s delay. Such retries are only useful for long running background jobs."
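To make the error-dependent wait concrete, here is a minimal standard-library sketch. It is not the Mailchimp decorator or the retrying fork mentioned above: `ApiError`, `wait_seconds`, the set of retryable codes, and the exponential backoff used for non-429 errors are all assumptions made for illustration. Only the 30s minimum delay for 429 comes from the quoted guidance.

```python
import functools
import time


class ApiError(Exception):
    """Hypothetical stand-in for a cloud client error carrying an HTTP status code."""

    def __init__(self, code, message=""):
        super().__init__(message or f"API error {code}")
        self.code = code


def wait_seconds(error, attempt, base=1.0, cap=60.0):
    """Pick the delay before the next retry based on the type of error."""
    if error.code == 429:
        # RESOURCE_EXHAUSTED: the quoted guidance asks for a minimum 30s delay.
        return 30.0
    # Assumed policy for other retryable errors: capped exponential backoff.
    return min(cap, base * 2 ** attempt)


def retry(max_attempts=5, retryable_codes=(429, 503), sleep=time.sleep):
    """Retry the wrapped call, waiting an error-dependent period between attempts."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except ApiError as error:
                    last_attempt = attempt == max_attempts - 1
                    # Give up on non-retryable errors or when attempts run out.
                    if error.code not in retryable_codes or last_attempt:
                        raise
                    sleep(wait_seconds(error, attempt))

        return wrapper

    return decorator
```

The `sleep` parameter is injectable so the waiting behaviour can be checked without actually pausing, for example by passing `sleep=some_list.append` and inspecting the recorded delays.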
from metaflow.
@barrywhart We would be happy to engage on a POC. @jaychia already has a PR out for GCS integration.
from metaflow.
@savingoyal: We are not currently using Metaflow, but I see some potential for using it in some cases as an alternative to Airflow (complex!) and bash scripts (may not always be powerful enough for our needs). So I want to help, but I also need to time-box my involvement for now.
Can you point me to the GCS PR? Any thoughts on how the package might accommodate multiple @retry implementations? Could it literally just be a different decorator in a different module, or is there a need for a single, "polymorphic" @retry decorator?
from metaflow.
#153 - please feel free to contribute or comment
The Metaflow S3 datastore handles storage-client retries internally: it retries N times when an error that isn't Metaflow-related is thrown. I replicated that logic for the GCS datastore. See:
https://github.com/Netflix/metaflow/pull/153/files#diff-88a07e3f313e3d7fec566c156ed68baeR28-R51
Also, tenacity is a great retrying package that should be able to do the custom retry logic that you mentioned (wait period depends on type of error).
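For readers who have not opened the PR, the datastore-style retry described above (retry a fixed number of times unless the error comes from the framework itself) could look roughly like the sketch below. This is not the code from #153 or from the S3 datastore; `FrameworkError`, `call_with_retries`, and its parameters are hypothetical names chosen for illustration.

```python
import time


class FrameworkError(Exception):
    """Hypothetical stand-in for an error raised by the framework itself."""


def call_with_retries(operation, retries=3, delay=0.0, sleep=time.sleep):
    """Run operation(); on non-framework errors, retry up to `retries` more times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return operation()
        except FrameworkError:
            # Framework-level errors are assumed not to be transient: re-raise at once.
            raise
        except Exception as error:
            # Anything else is treated as a storage-client hiccup worth retrying.
            last_error = error
            if attempt < retries:
                sleep(delay)
    # All attempts failed: surface the last storage-client error.
    raise last_error
```

A per-error wait policy, like the 429 case discussed earlier in the thread, could be layered on by computing `delay` from the caught exception instead of passing a constant.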
from metaflow.
Is PR #153 still relevant?
from metaflow.