common-workflow-lab / galaxy

This project forked from galaxyproject/galaxy

Fork of Galaxy (http://galaxyproject.org/) attempting to implement the CWL spec.

Home Page: https://www.commonwl.org

License: Other

Shell 0.51% JavaScript 15.42% HTML 0.11% CSS 0.95% Python 59.90% Mako 2.25% Perl 0.27% Makefile 0.05% Lua 0.01% Vue 8.94% Jupyter Notebook 3.54% Common Workflow Language 0.02% Dockerfile 0.03% SCSS 0.72% Smarty 0.02% R 0.10% TypeScript 7.16% Sass 0.01%
bh2019europe biohackeu20

galaxy's Introduction

Galaxy Logo

The latest information about Galaxy can be found on the Galaxy Community Hub.

Community support is available at Galaxy Help.

Galaxy Quickstart

Galaxy requires Python 3.8. To check your Python version, run:

$ python -V
Python 3.8.18

Start Galaxy:

$ sh run.sh

Once Galaxy completes startup, you should be able to view Galaxy in your browser at: http://localhost:8080

For more installation details please see: https://getgalaxy.org/

Documentation is available at: https://docs.galaxyproject.org/

Tutorials on how to use Galaxy, perform scientific analyses with it, develop Galaxy and its tools, and administer a Galaxy server are at: https://training.galaxyproject.org/

Tools

Tools can be either installed from the Tool Shed or added manually. For details please see the tutorial. Note that not all dependencies for the tools provided in the tool_conf.xml.sample are included. To install them please visit "Manage dependencies" in the admin interface.

Issues and Galaxy Development

Please see CONTRIBUTING.md.

galaxy's People

Contributors

ahmedhamidawan, anuprulez, assuntad23, bernt-matthias, bgruening, blankenberg, carlfeberhard, dannon, davebx, davelopez, electronicblueberry, fubar2, gregvonkuster, guerler, heisner-tillman, hexylena, itisalirh, jdavcs, jmchilton, jxtx, kanwei, martenson, mvdbeek, natefoo, nerdinacan, nsoranzo, nuwang, olegzharkov, pcm32, vjalili

galaxy's Issues

More Sophisticated Galaxy Docker Semantics - dockerLoad

Extend Galaxy XML syntax as follows:

<container type="docker" load="import_url">imageID</container>

doc: "Specify an HTTP URL from which to download a Docker image using docker load."

CWLtool implementation

        elif "dockerLoad" in dockerRequirement:
            cmd = ["docker", "load"]
            _logger.info(str(cmd))
            if not dry_run:
                if os.path.exists(dockerRequirement["dockerLoad"]):
                    _logger.info("Loading docker image from %s", dockerRequirement["dockerLoad"])
                    with open(dockerRequirement["dockerLoad"], "rb") as f:
                        loadproc = subprocess.Popen(cmd, stdin=f, stdout=sys.stderr)
                else:
                    loadproc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=sys.stderr)
                    _logger.info("Sending GET request to %s", dockerRequirement["dockerLoad"])
                    req = requests.get(dockerRequirement["dockerLoad"], stream=True)
                    n = 0
                    for chunk in req.iter_content(1024*1024):
                        n += len(chunk)
                        _logger.info("\r%i bytes" % (n))
                        loadproc.stdin.write(chunk)
                    loadproc.stdin.close()
                rcode = loadproc.wait()
                if rcode != 0:
                    raise process.WorkflowException("Docker load returned non-zero exit status %i" % (rcode))
                found = True

Implement test case for env-tool1.

Demonstrates using the job description to populate environment variables.

Parsing is in place and the test case is written (test_cwl_env_tool1), but the implementation remains a little tricky.
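As a rough illustration of what "using the job description to populate environment variables" involves, here is a deliberately simplified sketch. It assumes an `envDef`-style mapping of variable names to strings that contain only `$(inputs.<name>)` references. The `resolve_env` helper and its regular expression are hypothetical, not Galaxy or cwltool code, and real CWL expressions would need a proper evaluator.

```python
import re

# Matches simple parameter references such as $(inputs.in); anything more
# complex (for example JavaScript expressions) is out of scope for this sketch.
_INPUT_REF = re.compile(r"\$\(inputs\.([A-Za-z_][A-Za-z0-9_]*)\)")


def resolve_env(env_def, job):
    """Return a dict of environment variables with input references filled in.

    env_def: mapping of variable name -> template string (assumed shape).
    job: mapping of input name -> value, as in a CWL job description.
    """
    def substitute(match):
        name = match.group(1)
        if name not in job:
            raise KeyError("job does not define input %r" % name)
        return str(job[name])

    return {var: _INPUT_REF.sub(substitute, template) for var, template in env_def.items()}


# Hypothetical tool definition and job values, for illustration only.
env = resolve_env({"TEST_ENV": "$(inputs.in)"}, {"in": "hello test env"})
print(env)
```

The resulting dict could then be merged into the job's environment before the command line runs; how Galaxy would actually wire that in is the tricky part this issue refers to.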

More Sophisticated Galaxy Docker Semantics - dockerPull

Galaxy currently queries for the image using the dockerPull value as the imageId.

This is what CWLTool defaults to as well, but Galaxy should be updated to allow an image ID that differs from the pull target.

if "dockerImageId" not in dockerRequirement and "dockerPull" in dockerRequirement:
        dockerRequirement["dockerImageId"] = dockerRequirement["dockerPull"]

How CWLTool uses imageId for search:

for ln in subprocess.check_output(["docker", "images", "--no-trunc", "--all"]).splitlines():
        try:
            m = re.match(r"^([^ ]+)\s+([^ ]+)\s+([^ ]+)", ln)
            sp = dockerRequirement["dockerImageId"].split(":")
            if len(sp) == 1:
                sp.append("latest")
            # check for repository:tag match or image id match
            if ((sp[0] == m.group(1) and sp[1] == m.group(2)) or dockerRequirement["dockerImageId"] == m.group(3)):
                found = True
                break
        except ValueError:
            pass
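The matching rule in that excerpt can be isolated into a pure function that is easy to test without Docker. The sketch below is an assumption-laden restatement, not the cwltool code: `image_is_present` is a hypothetical name, it takes the text output of `docker images` as a string, and the sample listing is made up.

```python
import re

# Repository, tag and image ID are the first three whitespace-separated columns.
_IMAGE_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)")


def image_is_present(image_id, docker_images_output):
    """Return True if image_id matches a repository:tag pair or an image ID."""
    parts = image_id.split(":")
    if len(parts) == 1:
        # No tag given: assume "latest", as the excerpt above does.
        parts.append("latest")
    for line in docker_images_output.splitlines():
        m = _IMAGE_LINE.match(line)
        if m is None:
            continue  # blank or malformed line
        if (parts[0] == m.group(1) and parts[1] == m.group(2)) or image_id == m.group(3):
            return True
    return False


# Made-up listing in the general shape of `docker images --no-trunc` output.
SAMPLE = (
    "REPOSITORY   TAG      IMAGE ID\n"
    "debian       wheezy   sha256:aaa111\n"
    "busybox      latest   sha256:bbb222\n"
)
print(image_is_present("busybox", SAMPLE))
print(image_is_present("debian", SAMPLE))
```

Like the original, this splits on every colon, so references that include a registry port would need extra handling.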

So the extension in Galaxy land is likely to keep the XML body as the imageID and optionally allow overriding the pull location:

<container type="docker" pull="dockerPull">imageID</container>
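A minimal sketch of how the proposed element might be read, assuming the XML body is the image ID and `pull` is an optional override. `parse_docker_container` is a hypothetical helper, not part of Galaxy's tool parser, and the registry name in the example is invented.

```python
import xml.etree.ElementTree as ET


def parse_docker_container(xml_text):
    """Return (image_id, pull_target) for a <container type="docker"> element."""
    element = ET.fromstring(xml_text)
    if element.tag != "container" or element.get("type") != "docker":
        raise ValueError("expected a container element with type='docker'")
    image_id = (element.text or "").strip()
    if not image_id:
        raise ValueError("container element has no image ID")
    # Fall back to the image ID when no separate pull location is given,
    # mirroring the cwltool default quoted earlier in this issue.
    pull_target = element.get("pull") or image_id
    return image_id, pull_target


default = parse_docker_container('<container type="docker">busybox:latest</container>')
override = parse_docker_container(
    '<container type="docker" pull="registry.example.org/busybox:1.0">busybox:latest</container>'
)
print(default)
print(override)
```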

More Sophisticated Galaxy Docker Semantics - dockerImport

Extend Galaxy XML syntax as follows:

<container type="docker" import="import_url">imageID</container>

doc: "Provide HTTP URL to download and gunzip a Docker image using `docker import`."

CWLtool implementation

        elif "dockerImport" in dockerRequirement:
            cmd = ["docker", "import", dockerRequirement["dockerImport"], dockerRequirement["dockerImageId"]]
            _logger.info(str(cmd))
            if not dry_run:
                subprocess.check_call(cmd, stdout=sys.stderr)
                found = True

Optimize CWL tool loading

CWL tools can take over 2 seconds to load on a local SSD disk. Comparable Galaxy tools load in milliseconds.

galaxy.tools.parser.factory INFO 2016-02-04 09:43:39,420 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:41,689 TOOL ID is cat1-tool
galaxy.tools.parser.factory INFO 2016-02-04 09:43:41,690 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:44,106 TOOL ID is cat2-tool
galaxy.tools.parser.factory INFO 2016-02-04 09:43:44,107 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:46,412 TOOL ID is cat3-tool
galaxy.tools.parser.factory INFO 2016-02-04 09:43:46,413 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:48,709 TOOL ID is cat4-tool
galaxy.tools.parser.factory INFO 2016-02-04 09:43:48,710 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:51,018 TOOL ID is bwa-mem-tool
galaxy.tools.parser.factory INFO 2016-02-04 09:43:51,019 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:53,478 TOOL ID is env-tool1
galaxy.tools.parser.factory INFO 2016-02-04 09:43:53,478 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:55,793 TOOL ID is env-tool2
galaxy.tools.parser.factory INFO 2016-02-04 09:43:55,794 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
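To put numbers on the log above, the gap between each "Loading CWL tool" message and the following "TOOL ID is" message can be computed from the timestamps. This is only a proxy for load time, and the helper names below are hypothetical. The sketch assumes the log layout shown here: logger name, level, date, time, message.

```python
from datetime import datetime

LOADING = "Loading CWL tool"
TOOL_ID = "TOOL ID is "


def parse_timestamp(line):
    """Extract the 'YYYY-MM-DD HH:MM:SS,mmm' timestamp from a log line."""
    fields = line.split()
    return datetime.strptime(fields[2] + " " + fields[3], "%Y-%m-%d %H:%M:%S,%f")


def load_times(lines):
    """Return {tool_id: seconds} between a 'Loading' line and its 'TOOL ID' line."""
    times = {}
    started = None
    for line in lines:
        if LOADING in line:
            started = parse_timestamp(line)
        elif TOOL_ID in line and started is not None:
            tool_id = line.split(TOOL_ID, 1)[1].strip()
            times[tool_id] = (parse_timestamp(line) - started).total_seconds()
            started = None
    return times


# First two lines of the log above.
log = [
    "galaxy.tools.parser.factory INFO 2016-02-04 09:43:39,420 Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.",
    "galaxy.tools.parser.cwl WARNING 2016-02-04 09:43:41,689 TOOL ID is cat1-tool",
]
times = load_times(log)
print(times)
```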

Implement Any

An initial and entirely terrible first crack at this is in abbae34. It will be rebased into the main CWL commit.

Implement secondary files.

Collect Output:

  • Ensure relocate iterates over secondary output files.
    {'path': '/home/john/workspace/galaxy/database/job_working_directory/000/42/input.txt', 'size': 1, 'secondaryFiles': [{'path': '/home/john/workspace/galaxy/database/job_working_directory/000/42/input.txt.idx1', 'class': 'File'}], 'class': 'File', 'checksum': 'sha1$b6589fc6ab0dc82cf12099d1c2d40ab994e8410c'}
  • Persist dataset name to output map; CWL doesn't know about the ID, only the name.
  • Move each secondaryFile <f> to <job_working_directory>/dataset_<id>_files/__secondaryFiles__/f.

Ensure Input:

  • Check for secondary files when preparing the job.
  • Link in any secondary files that are referenced (open questions: where are they, how can I tell, and how can I link them?).

Redo MultipleInputFeatureRequirement toward Merge Options

This was the first attempt at multi-input scattering: ae11f56. Now that I'm staring at count-lines 6 and 7, it is clear to me that replacing CollectionInfo with a new type (say, a tracker of scattering and input collections) is wrong. We should use CollectionInfo and all of its utilities for tracking, and just feed it "ephemeral" collections created only for tool execution. This has some distinct advantages over the first approach:

  • We can track these collections in the DB at some point for tracking the structure of what we did.
  • We can build arbitrary ephemeral collections in response to scatter input expressions. e.g. scatter-valuefrom-wf1.
  • We can handle non-scattering operations with Ephemeral collections as well (such as supplying multiple lists to a single list input in a merged fashion) - e.g. count-lines7.
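For reference, the two merge behaviours that CWL's `linkMerge` defines for multiple input sources can be sketched on plain Python lists. This is only an illustration of the semantics; it does not use CollectionInfo or ephemeral collections, and the file names are made up.

```python
def merge_nested(sources):
    """One entry per input link, each kept as-is (lists stay nested)."""
    return list(sources)


def merge_flattened(sources):
    """Concatenate list sources and append single-item sources."""
    merged = []
    for source in sources:
        if isinstance(source, list):
            merged.extend(source)
        else:
            merged.append(source)
    return merged


# Two list-valued sources feeding a single list input.
sources = [["a.txt", "b.txt"], ["c.txt"]]
nested = merge_nested(sources)
flattened = merge_flattened(sources)
print(nested)
print(flattened)
```

An ephemeral collection built for tool execution would play the role of the returned list here.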

More Sophisticated Galaxy Docker Semantics - Dockerfile

Galaxy tools should allow building from a specified Dockerfile.

        elif "dockerFile" in dockerRequirement:
            dockerfile_dir = tempfile.mkdtemp()
            with open(os.path.join(dockerfile_dir, "Dockerfile"), "w") as df:
                df.write(dockerRequirement["dockerFile"])
            cmd = ["docker", "build", "--tag=%s" % dockerRequirement["dockerImageId"], dockerfile_dir]
            _logger.info(str(cmd))
            if not dry_run:
                subprocess.check_call(cmd, stdout=sys.stderr)
                found = True

Extend Galaxy XML syntax as follows:

<container type="docker" dockerfile="Dockerfile">imageID</container>

Cwl Tools are Window-in-Window

When CWL tools are clicked on in the tool panel, a weird window-in-window thing happens. This only happens for CwlTools, not for normal Tools; tool.tool_type is different.

Already made one fix (1315239) to prevent the tool from just submitting.

[Screenshot from 2015-10-21 01:04:25]

Grepping around, I can't find any place where the behavior dispatches on tool_type. Any clues, @guerler?

Implement StepInputExpressionRequirements.

Needed by bcbio workflow and other GA4GH workflows.

http://www.commonwl.org/v1.0/Workflow.html#StepInputExpressionRequirement

  • step-valuefrom-wf.cwl ./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_61
  • step-valuefrom2-wf.cwl ./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_62
  • step-valuefrom3-wf.cwl ./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_63

Will create a follow-up issue on implementing this over scattering. It looks like at least the bcbio workflows do not require scattering and valueFrom on the same step; the vision chipseq workflows do require this, however.

Expression Tool Follow Ups

  • Ensure Galaxy options for expression evaluation are respected in runtime actions for expression tools.
  • Ensure problems encountered during evaluation of expression tools are handled well: the job should fail with a reasonable message. Write a test case.

File-driven Galaxy Workflows (Rough pass)

  • Refactor Galaxy's workflow import stuff to abstract representation.
    • Merge build_workflow_from_dict and update_workflow_from_dict for creating model.Workflow(). (galaxyproject#1735)
    • Allow creation from a file (.ga at first).
  • Allow toolbox to create workflows from files.
  • Track transient file path and expose in the API.

Implement features required for all reference tools, jobs, and workflows.

  • binding-test.cwl
  • bwa-mem-job.json
  • bwa-mem-tool.cwl
  • cat-job.json
  • cat-n-job.json (1e2385b)
  • cat1-tool.cwl
  • cat2-tool.cwl (3be2d69)
  • cat3-tool.cwl
  • cat4-tool.cwl (a255442)
  • cat5-tool.cwl
  • count-lines1-wf.cwl
  • count-lines2-wf.cwl
  • count-lines3-job.json
  • count-lines3-wf.cwl
  • count-lines4-job.json
  • count-lines4-wf.cwl
  • count-lines5-wf.cwl
  • count-lines6-job.json
  • count-lines6-wf.cwl
  • count-lines7-wf.cwl
  • count-lines8-wf.cwl
  • count-lines9-wf.cwl
  • echo-tool.cwl
  • empty.json
  • env-job.json (e5c5f9b)
  • env-tool1.cwl
  • env-tool2.cwl (e5c5f9b)
  • env-wf1.cwl
  • env-wf2.cwl
  • foaf.rdf
  • formattest-job.json
  • formattest.cwl
  • formattest2-job.json
  • formattest2.cwl
  • formattest3.cwl
  • hello.txt
  • index.py
  • metadata.cwl
  • number.txt
  • optional-output.cwl
  • params.cwl
  • params2.cwl
  • params_inc.yml
  • parseInt-job.json (f4b5e8e)
  • parseInt-tool.cwl (f4b5e8e)
  • rename-job.json
  • record-output-job.json
  • record-output.cwl
  • ref.fasta
  • rename-job.json (ac74d46)
  • rename.cwl (ac74d46)
  • revsort-job.json
  • revsort.cwl
  • revtool.cwl
  • scatter-job1.json
  • scatter-job2.json
  • scatter-wf1.cwl
  • scatter-wf2.cwl
  • scatter-wf3.cwl
  • scatter-wf4.cwl
  • search-job.json
  • search.cwl
  • search.py
  • shelltest.cwl
  • sorttool.cwl
  • template-tool.cwl
  • tmap-job.json
  • tmap-tool.cwl
  • wc-job.json (0a89778)
  • wc-tool.cwl (0a89778)
  • wc2-tool.cwl (6a0460e)
  • wc3-tool.cwl (692c46f)
  • wc4-tool.cwl (692c46f)

More Sophisticated Galaxy Docker Semantics - output_directory

Extend Galaxy XML syntax as follows:

<container type="docker" output_directory="directory">imageID</container>

CWL definition of attribute:

    - name: dockerOutputDirectory
      type: ["null", "string"]
      doc: |
        Set the designated output directory to a specific location inside the
        Docker container.
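One way the attribute could translate into a container invocation is by choosing where the job's output directory is mounted inside the container. The sketch below is an assumption, not Galaxy's or cwltool's actual command construction: `docker_run_command` and the `/job_output` default are hypothetical, and only the standard `docker run` options `--rm`, `--volume` and `--workdir` are used.

```python
def docker_run_command(image_id, host_output_dir, output_directory=None,
                       default_output_directory="/job_output"):
    """Build a docker run argument list with the output directory mounted.

    output_directory: value of the proposed output_directory attribute, if set.
    default_output_directory: hypothetical fallback location in the container.
    """
    container_dir = output_directory or default_output_directory
    return [
        "docker", "run", "--rm",
        # Mount the host output directory at the designated container path.
        "--volume=%s:%s:rw" % (host_output_dir, container_dir),
        "--workdir=%s" % container_dir,
        image_id,
    ]


cmd = docker_run_command("busybox:latest", "/tmp/job/42/outputs", output_directory="/data/out")
print(" ".join(cmd))
```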

Handle Expressions

Draft 3 will not have generic expressions; it will only have JavaScript. So while I would still like to target "draft 2" here, I'm just going to assume all expressions are JavaScript expressions.

Refactor Work In Branch

Running issue of junk I want to refactor.

  • Move all CWL tool tests to their own file, update commit message.
  • Relabel OutputInstance fields.
