Comments (5)
Hey @shchur, I wasn't able to reproduce this...attaching a screenshot of a successful run with @batch
Could you perhaps:
- post the contents of 0.runtime_stderr.log and 0.runtime_stdout.log
- try with a newer metaflow version, i.e. 2.10.8, since you use 2.10.3
from metaflow.
Thank you for a quick response! I tried using 2.10.8 and encountered the same error. Here are the contents of s3://****-metaflows3bucket-vqcend9aqn9z/HelloFlow/1704868945391404/hello/2/:
0.runtime_stderr.log:
[MFLOG|0|2024-01-10T06:46:03.311728Z|runtime|91c794bc-1325-4cc5-98d7-3939daef119f] Data store error:
[MFLOG|0|2024-01-10T06:46:03.311985Z|runtime|735f3f38-9a60-4978-aefb-6f69aff0d75a] No completed attempts of the task was found for task 'HelloFlow/1704868945391404/hello/2'
[MFLOG|0|2024-01-10T06:46:03.691040Z|runtime|7f210482-078e-4217-abc9-92b1901c9e3c]
[MFLOG|0|2024-01-10T06:46:04.086122Z|runtime|e0e3f263-4ef9-404f-8196-134901d480d3]Task failed.
0.runtime_stdout.log:
[MFLOG|0|2024-01-10T06:42:38.449017Z|runtime|dfcf4833-1cab-43fd-996e-4edefcc22977][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:43:08.649344Z|runtime|e2a6331b-df2c-42ec-b401-3c0463190c3a][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:43:38.852722Z|runtime|9ebac194-f505-491e-8a17-e45a25339148][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:44:09.018890Z|runtime|085c5b85-e151-4248-92b5-49cff2390b1f][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:44:39.160499Z|runtime|4d066689-d3e0-45a1-a9ee-6aaad1fe506d][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:45:09.384003Z|runtime|de486574-58a1-4c38-b52f-326f3237b4bc][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:45:39.528610Z|runtime|ee0bd5f6-3739-4242-bab8-4eedccdb964e][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:45:45.104157Z|runtime|f36f4539-ceaa-490d-a4a0-afb0ec2ec232][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status STARTING)...
[MFLOG|0|2024-01-10T06:46:00.746162Z|runtime|a324752b-a551-46e4-9db0-f347123d424c][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status FAILED)...
0.runtime
{"return_code": 1, "killed": false, "success": false}
0.attempt.json
{"time": 1704868953.193214}
I suspect that the problem lies in the AWS configuration, but I'm not sure how to get to its root cause. I used the CloudFormation template without any modifications, and stack creation finished with CREATE_COMPLETE status.
Are there any additional log statements that I could add to the Metaflow code to get a more informative error message / understand which exact operation is failing?
from metaflow.
I found the source of the problem: my working directory included a folder called metaflow, which crashed the metaflow command executed during env setup.
mkdir: cannot create directory ‘metaflow’: File exists
It might be helpful to check for presence of this folder before submitting the job and raise an informative error message to the user.
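A minimal sketch of what such a check could look like. This is not Metaflow code: `check_no_conflicting_dir` and `ConflictingDirectoryError` are hypothetical names, and where the check would run is an assumption. If the conflicting folder lives in the container's working directory rather than on the submitting machine, the check would have to run inside the container before the code package is unpacked.

```python
import os
import tempfile


class ConflictingDirectoryError(RuntimeError):
    """Hypothetical error type for this sketch; not part of Metaflow."""


def check_no_conflicting_dir(workdir, name="metaflow"):
    """Raise a descriptive error if `workdir` already contains an entry called `name`."""
    path = os.path.join(workdir, name)
    # lexists also catches files and dangling symlinks, which would break mkdir too.
    if os.path.lexists(path):
        raise ConflictingDirectoryError(
            "Found an existing %r in %r. Environment setup creates a directory "
            "with this name, so the task would fail with "
            "'mkdir: cannot create directory'. Rename or remove it, or use a "
            "different working directory." % (name, workdir)
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        check_no_conflicting_dir(workdir)  # passes: nothing there yet
        os.mkdir(os.path.join(workdir, "metaflow"))
        try:
            check_no_conflicting_dir(workdir)
        except ConflictingDirectoryError as exc:
            print(exc)
```

The point of the design is only that the failure surfaces as a readable message instead of a bare "Task failed." in 0.runtime_stderr.log.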
As for debugging jobs on AWS Batch, I was able to find the detailed log with error message in the Amazon Elastic Container Service console under Clusters.
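For anyone who prefers to pull the same container log without the console, here is a rough sketch using boto3. It assumes the job writes to the default AWS Batch log group `/aws/batch/job` (a custom job definition may use another one), and `fetch_batch_job_logs` is a hypothetical helper, not something Metaflow provides. The clients are passed in as arguments so the function can be tried with stubs.

```python
def fetch_batch_job_logs(job_id, batch_client, logs_client,
                         log_group="/aws/batch/job"):
    """Return the container log lines for one AWS Batch job.

    `batch_client` and `logs_client` are expected to be boto3 clients for
    the "batch" and "logs" services.
    """
    jobs = batch_client.describe_jobs(jobs=[job_id])["jobs"]
    if not jobs:
        raise LookupError("No Batch job found with id %s" % job_id)

    stream = jobs[0].get("container", {}).get("logStreamName")
    if not stream:
        # The job never got as far as starting a container (e.g. still RUNNABLE).
        return []

    # Only the first page of events is read here; long logs need pagination
    # via the nextForwardToken field of the response.
    response = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=stream,
        startFromHead=True,
    )
    return [event["message"] for event in response["events"]]


# Usage (needs AWS credentials with read access to Batch and CloudWatch Logs):
#
#   import boto3
#   lines = fetch_batch_job_logs(
#       "9814be19-a252-4491-b6a7-3fe6b13bfa69",  # the id in brackets in 0.runtime_stdout.log
#       boto3.client("batch"),
#       boto3.client("logs"),
#   )
#   print("\n".join(lines))
```

The id in square brackets in the stdout log above appears to be the Batch job id, which is what `describe_jobs` expects.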
My problem is solved, but I'm keeping this issue open in case you want to add an informative error message for the edge case.
Thank you for building and maintaining such an amazing framework!
from metaflow.
@shchur I wasn't able to reproduce this, can you please share your directory structure?
Mine looks like the following:
and I submit the flow with python hello.py run --with batch
from metaflow.
@madhur-ob The problem was that I built my Docker image used by Metaflow in the same directory. In other words, if the metaflow folder is present in the working directory of the Docker image, the Batch job crashes because it cannot unpack the archive.
from metaflow.