Comments (5)
Hello @cloud-np,
Thank you for reporting this potential issue. We have investigated the scenarios you described and we believe it is related to the following cases:
- Some workflow tasks may have additional input data that does not necessarily come from the parent task (e.g., input_file_for_t3 for task t3 in the image below). In this case, the sum of input file sizes for task t3 will be larger than the sum of output file sizes for task t1.
- In other cases, workflow tasks may generate output data that is not necessarily used by a child task (e.g., output_file_for_t2 for task t2 in the image below). In this case, the sum of output file sizes for task t2 will be larger than the sum of input file sizes for task t4.
Could you please confirm this is the behavior you have observed?
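A minimal Python sketch of the two cases described in this comment. The sizes and the names t1_out and t2_out are hypothetical and only serve the illustration; plain dictionaries stand in for the workflow objects, so nothing here uses the WfCommons API.

```python
# Hypothetical file sizes in bytes, keyed by file name.
t1_outputs = {"t1_out": 100}
# t3 reads t1's output plus an extra input that no parent produced.
t3_inputs = {"t1_out": 100, "input_file_for_t3": 40}

# t2 writes one file its child reads plus an extra output no child uses.
t2_outputs = {"t2_out": 200, "output_file_for_t2": 60}
t4_inputs = {"t2_out": 200}


def total_size(files):
    """Sum the sizes of every file attached to a task."""
    return sum(files.values())


def shared_size(parent_outputs, child_inputs):
    """Sum only the files that the parent writes and the child reads."""
    return sum(
        size for name, size in parent_outputs.items() if name in child_inputs
    )


# Whole-task totals differ because of the extra files.
print(total_size(t1_outputs), total_size(t3_inputs))
print(total_size(t2_outputs), total_size(t4_inputs))
# Restricting the sum to shared files removes the difference.
print(shared_size(t1_outputs, t3_inputs), shared_size(t2_outputs, t4_inputs))
```

Comparing whole-task totals mixes in files that never travel along the edge, while restricting the sum to shared file names isolates what the parent actually hands to the child.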
from wfcommons.
Makes sense, and thanks for the example as well; it made things really clear!
Still, I am not so sure, because I only add up the files that come from one specific parent at a time.
For example T[1] parent --> T[2] child
T[1] sends 2390 to T[2]
T[2] receives 2709 from T[1]
In the code above, I try to add up the input files in the child that have the same names as output files from the parent.
Not sure if I am missing something.
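The per-parent comparison described here could look roughly like the sketch below. It is not the code from the original report; compare_edge is a hypothetical helper, the file names and sizes are made up, and dictionaries of name to size stand in for the task's files.

```python
def compare_edge(parent_outputs, child_inputs):
    """Compare one parent -> child edge using only files both sides name.

    Returns (bytes sent, bytes received, names whose sizes disagree).
    """
    shared = parent_outputs.keys() & child_inputs.keys()
    sent = sum(parent_outputs[name] for name in shared)
    received = sum(child_inputs[name] for name in shared)
    differing = sorted(
        name for name in shared if parent_outputs[name] != child_inputs[name]
    )
    return sent, received, differing


# Hypothetical data: "b" is recorded with a different size on each side,
# "c" is never read by the child, and "x" does not come from this parent.
parent_outputs = {"a": 10, "b": 5, "c": 7}
child_inputs = {"a": 10, "b": 6, "x": 3}

sent, received, differing = compare_edge(parent_outputs, child_inputs)
print(sent, received, differing)
```

If sent and received still differ after restricting to shared names, the list of differing names points at the specific files whose recorded sizes disagree between the two tasks.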
@cloud-np, I have investigated it further and could not reproduce your error.
This example code (issue_23.py.zip) generates 100 workflows (you can choose between Montage or SoyKB) and tests whether the file sizes (from one task's output to another task's input) match. If a match fails, the program stops with exit code 1. I ran this program several times and no mismatch was found.
Could you please send me a JSON file from one of the workflows you generated that is causing the issue?
You can write the JSON file using the write_json() method from the workflow object.
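For readers without the attachment, a check of this kind could be sketched as follows. This is not the content of issue_23.py; the task layout (a name, a list of parent names, and files tagged with a link of "input" or "output") is an assumption made for the illustration and may not match the JSON that write_json() produces. The helper names are hypothetical.

```python
def find_mismatches(tasks):
    """List (parent, child, file, output size, input size) for every file
    whose size differs between a parent's output and a child's input."""
    by_name = {task["name"]: task for task in tasks}
    mismatches = []
    for child in tasks:
        inputs = {
            f["name"]: f["size"] for f in child["files"] if f["link"] == "input"
        }
        for parent_name in child["parents"]:
            for f in by_name[parent_name]["files"]:
                if f["link"] != "output" or f["name"] not in inputs:
                    continue  # not a file passed along this edge
                if f["size"] != inputs[f["name"]]:
                    mismatches.append(
                        (parent_name, child["name"], f["name"],
                         f["size"], inputs[f["name"]])
                    )
    return mismatches


def exit_code(tasks):
    """Return 1 if any shared file size mismatches, otherwise 0."""
    return 1 if find_mismatches(tasks) else 0


# Hypothetical two-task workflow with one consistent shared file.
tasks = [
    {"name": "t1", "parents": [],
     "files": [{"link": "output", "name": "f.dat", "size": 100}]},
    {"name": "t2", "parents": ["t1"],
     "files": [{"link": "input", "name": "f.dat", "size": 100},
               {"link": "input", "name": "extra.dat", "size": 40}]},
]
print(exit_code(tasks))
```

A script could pass the returned value to sys.exit() so that a mismatch stops the run with exit code 1, as described above.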
Hi @cloud-np, do you have any update on this issue? Thank you!
I will be closing this issue as the error could not be reproduced. Please re-open it if you experience the same issue again.