WfCommons: A Framework for Enabling Scientific Workflow Research and Development

Home Page: https://wfcommons.org

License: GNU General Public License v3.0

scientific-workflows simulation reproducible-research workflow distributed-systems workflow-simulator scheduling-simulator hpc workflow-management-system workflow-generator

wfcommons's Introduction




This Python package provides a collection of tools for:

  • Analyzing instances of actual workflow executions;
  • Producing recipe structures for workflow generation;
  • Generating synthetic realistic workflow instances; and
  • Generating realistic workflow benchmark specifications.
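The instances these tools consume and produce are JSON documents. As a toy illustration (an invented two-task fragment, not the full WfFormat schema; the task and file names here are made up), the sketch below builds a minimal instance using the task fields found in WfCommons JSON files and checks that parent/child declarations are symmetric:

```python
# Two-task toy instance using WfFormat-style task fields
# (name/parents/children/files/link/size). Illustrative fragment
# only, not the full WfFormat schema.
instance = {
    "tasks": [
        {"name": "split_1", "parents": [], "children": ["filter_1"],
         "files": [{"link": "output", "name": "a.dat", "size": 100}]},
        {"name": "filter_1", "parents": ["split_1"], "children": [],
         "files": [{"link": "input", "name": "a.dat", "size": 100}]},
    ]
}

def check_parent_child_symmetry(tasks):
    """Return (parent, child) pairs declared on only one side."""
    by_name = {t["name"]: t for t in tasks}
    mismatched = []
    for task in tasks:
        for child in task["children"]:
            if task["name"] not in by_name[child]["parents"]:
                mismatched.append((task["name"], child))
    return mismatched

print(check_parent_child_symmetry(instance["tasks"]))  # -> []
```

Real instance files also carry fields such as type, runtime, and cores on each task, as the JSON fragments in the issues further down show.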


Installation

WfCommons is available on PyPI. WfCommons requires Python 3.8+ and has been tested on Linux and macOS.

Installation using pip

While pip can be used to install WfCommons, we suggest the following approach for reliable installation when many Python environments are available:

$ python3 -m pip install wfcommons

Retrieving the latest unstable version

If you want to use the latest unstable version of WfCommons, which contains brand-new features (but may also contain bugs, as stabilization work is still underway), you can install it directly from the GitHub repository.

Cloning from WfCommons's GitHub repository:

$ git clone https://github.com/wfcommons/wfcommons
$ cd wfcommons
$ pip install .

Optional Requirements

Graphviz

WfCommons uses pygraphviz for generating visualizations for the workflow task graph. If you want to enable this feature, you will have to install the graphviz package (version 2.16 or later). You can install graphviz easily on Linux with your favorite package manager, for example for Debian-based distributions:

sudo apt-get install graphviz libgraphviz-dev

and for RedHat-based distributions:

sudo yum install python-devel graphviz-devel

On macOS you can use the brew package manager:

brew install graphviz

Then you can install pygraphviz by running:

python3 -m pip install pygraphviz

pydot

WfCommons uses pydot for reading and writing DOT files. If you want to enable this feature, you will have to install the pydot package:

python3 -m pip install pydot

Get in Touch

The main channel to reach the WfCommons team is via the support email: [email protected].

Bug Report / Feature Request: our preferred channel to report a bug or request a feature is via
WfCommons's Github Issues Track.

Citing WfCommons

When citing WfCommons, please use the following paper. You should also consider reading it, as it provides a recent and general overview of the framework.

@article{wfcommons,
    title = {{WfCommons: A Framework for Enabling Scientific Workflow Research and Development}},
    author = {Coleman, Tain\~{a} and Casanova, Henri and Pottier, Lo\"{i}c and Kaushik, Manav and Deelman, Ewa and Ferreira da Silva, Rafael},
    journal = {Future Generation Computer Systems},
    volume = {128},
    pages = {16--27},
    doi = {10.1016/j.future.2021.09.043},
    year = {2022},
}

wfcommons's People

Contributors

ftschirpke, henricasanova, jaredraycoleman, john-dobbs, lpottier, rafaelfsilva, schastel-perso, tainagdcoleman, wrwilliams


wfcommons's Issues

Simplify PegasusLogsParser by removing the legacy flag

Is your feature request related to a problem? Please describe.
When using PegasusLogsParser, the user currently has to know whether the submit directory was generated with Pegasus 4.x or 5.x (5.x uses YAML, while 4.x uses a custom XML-based format).
We could improve PegasusLogsParser to automatically detect the submit directory version and act accordingly.

Describe the solution you'd like
Remove the legacy flag
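A minimal sketch of how that auto-detection could look, assuming (per this issue) that Pegasus 5.x submit directories contain a YAML braindump.yml while 4.x ones contain braindump.txt; the helper name detect_pegasus_version and the filename heuristic are illustrative assumptions, not the actual WfCommons implementation:

```python
from pathlib import Path
import tempfile

def detect_pegasus_version(submit_dir) -> str:
    """Guess the Pegasus major version from the submit directory layout.

    Heuristic (an assumption for illustration, based on this issue):
    5.x submit directories contain a YAML braindump.yml, while 4.x
    ones contain a braindump.txt.
    """
    d = Path(submit_dir)
    if (d / "braindump.yml").is_file():
        return "5.x"
    if (d / "braindump.txt").is_file():
        return "4.x"
    raise FileNotFoundError(f"no braindump file found in {submit_dir}")

# Demo on a temporary directory laid out like a 5.x submit directory:
demo = Path(tempfile.mkdtemp())
(demo / "braindump.yml").touch()
version = detect_pegasus_version(demo)
print(version)  # -> 5.x
```

With a check like this inside the parser, the legacy flag would indeed become unnecessary.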

Montage workflow: no transfer file between mImgtbl and mAdd jobs

Hi @rafaelfsilva

I am not sure if there is a bug in WorkflowHub or not. When I generate a Montage DAG
based on the structure presented here: https://pegasus.isi.edu/workflow_gallery/gallery/montage/index.php
there should be a link between the mImgtbl and mAdd jobs.
I can see in the generated JSON file that mAdd is listed as a child of mImgtbl;
however, the mImgtbl output file does not appear as an input file for mAdd:

...
 {
    "name": "mImgtbl_00000131",
    ...
    "children": [
        "mAdd_00000132"
    ],
    "files": [
        ...
        {
            "link": "output",
            "name": "509e9372-a8f4-4be5-bef5-6cfc5dcc34f9.tbl",
            "size": 2594
        }
    ]
}
...

i.e., there is no 509e9372-a8f4-4be5-bef5-6cfc5dcc34f9.tbl listed as an input file for the mAdd_00000132 job.
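One way to surface this kind of inconsistency is to scan an instance for parent output files that a dependent child does not list among its inputs. A standard-library sketch (find_dangling_outputs is a hypothetical helper, not part of WfCommons; it assumes the simple pattern described in this issue, where a child consumes its parent's outputs, whereas real workflows may route different outputs to different children):

```python
def find_dangling_outputs(tasks):
    """Return (parent, child, filename) triples where a parent's
    output file is not listed among the child's input files."""
    by_name = {t["name"]: t for t in tasks}
    dangling = []
    for task in tasks:
        outputs = {f["name"] for f in task["files"] if f["link"] == "output"}
        for child in task["children"]:
            inputs = {f["name"] for f in by_name[child]["files"]
                      if f["link"] == "input"}
            for name in outputs - inputs:
                dangling.append((task["name"], child, name))
    return dangling

# Toy reproduction of the symptom: the parent's output is missing
# from the child's inputs (file names here are invented).
tasks = [
    {"name": "mImgtbl_00000131", "children": ["mAdd_00000132"],
     "files": [{"link": "output", "name": "x.tbl", "size": 2594}]},
    {"name": "mAdd_00000132", "children": [],
     "files": [{"link": "input", "name": "other.fits", "size": 10}]},
]
print(find_dangling_outputs(tasks))
# -> [('mImgtbl_00000131', 'mAdd_00000132', 'x.tbl')]
```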

Bug when using PegasusLogsParser with a Pegasus submit directory <= 4.9

WfCommons Information

  • WfCommons version: master
  • Python Version: 3.9.10

Describe the bug
When using PegasusLogsParser with a Pegasus submit directory <= 4.9, I obtain the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/wfcommons/wfinstances/logs/pegasus.py", line 78, in build_workflow
    self._parse_braindump()
  File "/usr/local/lib/python3.9/site-packages/wfcommons/wfinstances/logs/pegasus.py", line 96, in _parse_braindump
    raise OSError(f'Unable to find braindump file: {braindump_file}')
OSError: Unable to find braindump file: /braindump.txt

Desktop (please complete the following information):

  • OS: macOS
  • Version: 12.2.1

Edges in the Montage and Soykbr workflows are not correct

WfCommons Information

  • WfCommons version: 0.7, master branch
  • Python Version: 3.8.10

Describe the bug
Some of the edges in Montage and Soykbr do not have the correct weight.

To Reproduce
The sum of the sizes of the "input" files of a child in the Montage or Soykbr workflows does not equal the sum of the sizes of the parent's "output" files.

Expected behavior
The sum of the input file sizes of the child and the sum of the output file sizes of the parent should be equal, because they describe the same edge.

Screenshots
Example of creating a montage workflow. (Similar to the one I am using in my program)

Desktop (please complete the following information):

  • OS: Windows 10 using wsl with Ubuntu 20.04

Additional context
Found this problem when trying to construct the critical path of the workflow.
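For critical-path construction, the weight of a parent-to-child edge is naturally the total size of the files the parent outputs and the child inputs. The sketch below (standard library only; edge_weights is a hypothetical helper, and the two-task demo with invented file names mirrors the symptom) computes these weights, where a zero weight on a declared edge is exactly the inconsistency reported here:

```python
def edge_weights(tasks):
    """Map (parent, child) -> total bytes transferred along the edge,
    i.e. the sizes of files that are an output of the parent and an
    input of the child. A zero weight on a declared edge signals the
    kind of inconsistency reported in this issue."""
    by_name = {t["name"]: t for t in tasks}
    weights = {}
    for task in tasks:
        out = {f["name"] for f in task["files"] if f["link"] == "output"}
        for child in task["children"]:
            shared = [f["size"] for f in by_name[child]["files"]
                      if f["link"] == "input" and f["name"] in out]
            weights[(task["name"], child)] = sum(shared)
    return weights

# Demo: the child's input does not match any parent output,
# so the declared edge carries zero weight.
tasks = [
    {"name": "mImgtbl", "children": ["mAdd"],
     "files": [{"link": "output", "name": "img.tbl", "size": 2594}]},
    {"name": "mAdd", "children": [],
     "files": [{"link": "input", "name": "region.hdr", "size": 304}]},
]
print(edge_weights(tasks))  # -> {('mImgtbl', 'mAdd'): 0}
```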

Naming issue for .yml files during Seismology/Montagev3 execution preventing PythonLogsParser from working out of the box

WorkflowHub Information

  • WorkflowHub version: 5.0
  • Python Version: 3.6.9

Describe the bug
Seismology and Montage v3 are not producing yml/txt files that can work with PythonLogsParser out of the box. Seismology produces a braindump.yml file, but that needs to be renamed to braindump.txt to work with legacy=True. Legacy shouldn't be required as it was run on version 5.0. Montage v3 produces both braindump.yml and montage-workflow.yml, the latter of which needs to be renamed to workflow.yml for the parser to run.

To Reproduce
After running either seismology or montagev3 workflows, you can attempt to create a json using the process at https://docs.workflowhub.org/en/latest/parsing_logs.html#pegasus-wms . The script with legacy=False needs a workflow.yml file, or if legacy=True it needs a braindump.txt file. Neither is created by default on execution of the workflow.

Expected behavior
Seismology and Montage producing workflow.yml files during execution.

Desktop (please complete the following information):
Ubuntu 18.04.04

Generate WfInstances from dot files

Request from Svetlana Kulagina:

This Recipe can be generated with wfchef from an executable file. What if I don't have an executable, but rather have a .dot description of the workflow? Can I somehow generate a recipe from it? Is there maybe a way to manually transform the .dot file into the recipe?
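A first step toward such a recipe would be extracting the task graph from the .dot file. The toy sketch below uses only the standard library and handles just simple `a -> b;` edge lines (an illustration only; a real converter should use a proper DOT parser such as pydot, and would still need task runtimes and file sizes to build a full recipe):

```python
import re
from collections import defaultdict

# Matches simple DOT edge statements like: a -> b;  or  "a" -> "b" [color=red];
EDGE_RE = re.compile(r'^\s*"?([\w.]+)"?\s*->\s*"?([\w.]+)"?\s*(\[[^\]]*\])?\s*;')

def dot_edges_to_dag(dot_text):
    """Extract parents/children maps from simple 'a -> b;' DOT lines.
    Toy parser: ignores subgraphs, node statements, and chained edges."""
    parents, children = defaultdict(list), defaultdict(list)
    for line in dot_text.splitlines():
        m = EDGE_RE.match(line)
        if m:
            src, dst = m.group(1), m.group(2)
            children[src].append(dst)
            parents[dst].append(src)
    return dict(parents), dict(children)

dot = """digraph wf {
  mProject1 -> mDiff1;
  mProject2 -> mDiff1;
  mDiff1 -> mConcatFit;
}"""
parents, children = dot_edges_to_dag(dot)
print(parents["mDiff1"])  # -> ['mProject1', 'mProject2']
```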

Generated output DAG JSON structure not similar to WorkflowHub

Hi,
Previously I used WorkflowHub to generate some real-world workflow applications; now it seems it has been merged into the wfcommons package.
I did some simple tests and the generated output JSON file is not clear.
Let me explain this with a simple example. For EpigenomicsRecipe, we have:

        "jobs": [
            {
                "name": "fastqSplit_00000001",
                "type": "compute",
                "runtime": 878.473,
                "parents": [],
                "children": [
                    "filterContams_00000002",
                    "filterContams_00000006",
                    "filterContams_00000010"
                ],
                "files": [
                    {
                        "link": "input",
                        "name": "06252281-89da-4385-b6cd-025b55f91d56.sfq",
                        "size": 57233202
                    },
                    {
                        "link": "output",
                        "name": "314ac45e-0b2b-447d-b7b3-e44806bcd60a.sfq",
                        "size": 12060453
                    },
                    {
                        "link": "output",
                        "name": "03c46ee5-7d81-48e8-b738-6a52a3f02044.sfq",
                        "size": 10733270
                    },
                    {
                        "link": "output",
                        "name": "3f8c84bf-2c61-4a30-a891-4aeda1de6fd2.sfq",
                        "size": 12346046
                    }
                ],
                "cores": 1
            },
            ...
            ...
            {
                "name": "filterContams_00000002",
                "type": "compute",
                "runtime": 12.196,
                "parents": [
                    "fastqSplit_00000001"
                ],
                "children": [
                    "sol2sanger_00000003"
                ],
                "files": [
                    {
                        "link": "input",
                        "name": "314ac45e-0b2b-447d-b7b3-e44806bcd60a.sfq",
                        "size": 12060453
                    },
                    {
                        "link": "output",
                        "name": "2b441ab8-e098-46d1-834f-dc11513ee8ec.sfq",
                        "size": 2747304
                    }
                ],
                "cores": 1
            },

As can be seen, the file 314ac45e-0b2b-447d-b7b3-e44806bcd60a.sfq is marked as an output file of task fastqSplit_00000001 and as an input file of task filterContams_00000002.

However, using wfcommons we don't get this structure. For example:

            {
                "name": "fastqSplit_00000021",
                "type": "compute",
                "runtime": 878.473,
                "parents": [],
                "children": [
                    "filterContams_00000022",
                    "filterContams_00000023",
                    "filterContams_00000024",
                    "filterContams_00000025",
                    "filterContams_00000026",
                    "filterContams_00000027",
                    "filterContams_00000028",
                    "filterContams_00000029",
                    "filterContams_00000030"
                ],
                "files": [
                    {
                        "link": "input",
                        "name": "a22a4e96-1955-4395-8049-aad709e7e2c0.sfq",
                        "size": 367561779
                    },
                    {
                        "link": "output",
                        "name": "5c162d9d-72b6-443b-982b-4c503cbafa0a.sfq",
                        "size": 11562952
                    }
                ],
                "cores": 1
            },
            ...
            ...
            {
                "name": "filterContams_00000022",
                "type": "compute",
                "runtime": 40.919,
                "parents": [
                    "fastqSplit_00000021"
                ],
                "children": [
                    "sol2sanger_00000004"
                ],
                "files": [
                    {
                        "link": "input",
                        "name": "f1fa831a-c037-4f0d-b468-87cecff9004d.sfq",
                        "size": 5489748
                    },
                    {
                        "link": "output",
                        "name": "4c0098e8-ea3b-435b-967d-cbe3d6a5c06e.sfq",
                        "size": 1360589
                    }
                ],
                "cores": 1
            },

As you can see, for task fastqSplit_00000021, its child filterContams_00000022 has the input file f1fa831a-c037-4f0d-b468-87cecff9004d.sfq, but that file is not listed as an output file of task fastqSplit_00000021.

Is this a bug @tainagdcoleman? I think that, as in WorkflowHub, each file name should be listed twice: once as an output file of the parent and once as an input file of the child.
