tool-registry-service-schemas's Introduction

Schemas for the GA4GH Tool Registry API

This repository is the home for the schema for the GA4GH Tool Registry API. The goal of the API is to provide a standardized way to describe the availability of tools and workflows. In this way, we can have multiple repositories that share tools and workflows of various types that are described in workflow languages (e.g. WDL, CWL, Nextflow, Galaxy, Snakemake), have their dependencies embedded as containers (e.g. Docker, Singularity) or suitable alternatives (e.g., Conda), and have a consistent way to interact, search, and retrieve information from these various registries. The end goal is to make it much easier to share scientific tools and workflows, enhancing our ability to make research reproducible, sharable, and transparent.

See the human-readable Reference Documentation. You can also explore the specification in the Swagger Editor; if you are working from a non-develop branch version, load the JSON manually. Documentation generated by gh-openapi-docs for the development branch can be previewed here.

The Global Alliance for Genomics and Health (GA4GH) is an international coalition, formed to enable the sharing of genomic and clinical data.

Cloud Work Stream

The Cloud Work Stream is focused on creating specific standards for defining, sharing, and executing portable workflows and self-contained tasks, and accessing data across clouds.

We work with many different Driver Projects to develop, enhance, test, and use the Cloud Work Stream APIs.

What is the Tool Registry API Schema?

This is the home of the schema for the GA4GH Tool Registry API. The GA4GH Tool Registry API is a standard for listing and describing available tools (both stand-alone, self-contained tools and workflows in CWL, WDL, Nextflow, Galaxy or Snakemake) in a given registry. This defines a minimal, common API describing tools that we propose for support by multiple tool/workflow registries like Dockstore, BioContainers, and Agora for the purposes of exchange, indexing, and searching.

This repo uses the HubFlow scheme, which is closely based on GitFlow. In practice, this means that the master branch contains the last production release of the schema, whereas the develop branch contains the latest development changes, which will end up in the next production release. As of February 2022, the develop branch contains work which will accumulate and evolve into a 2.1 production release.

Our current iteration focuses on a read-only API due to potentially different views and approaches to registration/security.

Key features of the current API:

  • Read-only API
  • Serve tool and workflow resources via specifically designed schemas that encourage rich metadata annotation and help enable software FAIRification
  • Download individual workflow descriptor files or an archive of all workflow and accessory files (e.g., test files)
  • Allow integrators to interrogate the language versions of these workflows (e.g. CWL 1.1, CWL 1.2 or Nextflow DSL2) to identify compatible workflows
  • Get specific versions of workflows and tools, potentially with immutable versions with checksums on their files
  • Assign globally unique TRS URIs to specific versions of tool and workflow resources
  • Provides more structure than a simple unformatted list of tools but it is also a standard for registries to implement as opposed to a registry implementation itself
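
The checksum feature in the list above can be illustrated with a minimal Python sketch. It assumes a registry publishes a hex digest for each file; the function name `matches_checksum` and the default of SHA-256 are illustrative assumptions, not something this README specifies.

```python
import hashlib


def matches_checksum(content: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the digest of `content` equals `expected_hex`.

    `algorithm` is any name accepted by hashlib.new(); SHA-256 is an
    assumption made for this sketch.
    """
    digest = hashlib.new(algorithm, content).hexdigest()
    return digest == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical descriptor content; the "published" digest stands in
    # for a value that a registry would report alongside the file.
    descriptor = b"cwlVersion: v1.2\nclass: CommandLineTool\n"
    published = hashlib.sha256(descriptor).hexdigest()
    print(matches_checksum(descriptor, published))
    print(matches_checksum(descriptor + b"# edited\n", published))
```

A client could run a check like this after downloading a descriptor to confirm that a supposedly immutable version has not changed.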

Open questions:

Questions TRS currently does not (comprehensively) address include the following:

  • How do we track authorship? Should we track authorship of the tool metadata, the Docker image, the underlying algorithm, or all of the above?
  • How to describe indexing and external services like an external SPARQL service?
  • How to better interoperate with the GA4GH Workflow Execution Service (WES) and Task Execution Service (TES) APIs for triggering workflow and tool runs?

How to view

See the swagger editor to view our schema in progress.

How to contribute changes

Take cues for now from the CONTRIBUTING.md document.

At the very least, create an issue in our GitHub tracker.

Even better, fork the codebase, fix the issue, and create a pull request back to the project along with your ticket.

Adding registries

To add a registry that supports the GA4GH Registry API:

  1. fork the repo
  2. modify registry.json
  3. submit a pull request back to the project
  4. we will confirm the site is valid then accept your pull request

Cross indexing Tool Registry sites

See our registry.json for a list of known registries that conform to the Tool Registry API standard.

License

See the LICENSE

tool-registry-service-schemas's People

Contributors

briandoconnor, cjllanwarne, coverbeck, denis-yuen, dependabot-preview[bot], dependabot[bot], dmohs, garyluu, geoffjentry, mr-c, osallou, patmagee, pditommaso, susheel, uniqueg, ypriverol

tool-registry-service-schemas's Issues

documentation feedback from committee meeting

There were the following questions from the steering committee meeting that should result in documentation improvements

  • the docs might reference Docker specifically at times and do not make it clear that we believe that the standard as it exists should support Singularity and other Docker alternatives
  • should emphasize that this is a standard for computationally actionable tools. This makes it different from a simple unformatted list of tools, and it also means that this is a standard for registries to implement as opposed to a registry implementation itself
  • Highlight 4053fcd which describes our process for guaranteeing ID uniqueness when exchanging tools

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-28

Secondary files/import support

WDL descriptors or CWL descriptors can often use external files. The current endpoint is too restrictive and only returns one descriptor at a time.

As an idea from @tetron
Instead, the API could return, basically, a virtual directory similar to https://repository.cloudera.com/cloudera/cloudera-repos/org/ with the other CWL files hosted in subdirectories or alongside the main descriptor (... would result in undefined behaviour) The cwl-runner provides an implementation that could be used to test this.

Confusing description for descriptor type

The type path parameter for the /tools/{id}/versions/{version_id}/{type}/descriptor is required, but the description implies that it is optional ("If not specified..."):

The output type of the descriptor. If not specified, it is up to the
underlying implementation to determine which output type to return.
Plain types return the bare descriptor while the "non-plain" types
return a descriptor wrapped with metadata. Allowable values include
"CWL", "WDL", "NFL", "PLAIN_CWL", "PLAIN_WDL", "PLAIN_NFL".

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Epic: GA4GH TRS Approval
┆Issue Number: TRS-21
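
To make the plain versus "non-plain" distinction from the description quoted in this issue concrete, here is a minimal Python sketch. It uses only the six allowable values listed in that description; the helper name `split_descriptor_type` is hypothetical.

```python
from typing import Tuple

# Allowable values as quoted in the parameter description.
ALLOWED_TYPES = {"CWL", "WDL", "NFL", "PLAIN_CWL", "PLAIN_WDL", "PLAIN_NFL"}
PLAIN_PREFIX = "PLAIN_"


def split_descriptor_type(value: str) -> Tuple[str, bool]:
    """Split a `type` path parameter into (language, is_plain).

    Plain types return the bare descriptor; non-plain types return the
    descriptor wrapped with metadata.
    """
    if value not in ALLOWED_TYPES:
        raise ValueError(f"unsupported descriptor type: {value!r}")
    if value.startswith(PLAIN_PREFIX):
        return value[len(PLAIN_PREFIX):], True
    return value, False
```

Because the value sits in the URL path, a helper like this never sees a missing type, which is the inconsistency the issue points out.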

Singularity compatibility

Following up on dockstore/dockstore#1049

Let's make sure that tool registry schema is compatible with Singularity, in particular I think this means the following although @vsoch or @pimpim may have additional pointers

  1. References to a "Dockerfile" or a "Docker image" can be generalized, maybe to "container specification" and "container image". In the comments it can be explained that examples are a Dockerfile or a Singularity Recipe http://singularity.lbl.gov/docs-recipes
  2. A once over can be done to make sure that IDs are generic enough to accommodate their container naming convention which looks like <registry>/<namespace>/<container>:<digest>
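
As a rough illustration of the naming convention in point 2, the following Python sketch splits a reference of the form <registry>/<namespace>/<container>:<digest> into its parts. The names `ContainerRef` and `parse_container_ref` and the example reference are hypothetical, and the sketch assumes exactly three slash-separated path components, which real container references may not always have.

```python
from typing import NamedTuple


class ContainerRef(NamedTuple):
    registry: str
    namespace: str
    container: str
    digest: str


def parse_container_ref(ref: str) -> ContainerRef:
    """Parse '<registry>/<namespace>/<container>:<digest>'."""
    # Split on the last colon so the digest is whatever follows it.
    path, sep, digest = ref.rpartition(":")
    if not sep or not digest:
        raise ValueError(f"missing ':<digest>' in {ref!r}")
    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <registry>/<namespace>/<container> in {ref!r}")
    return ContainerRef(parts[0], parts[1], parts[2], digest)


if __name__ == "__main__":
    print(parse_container_ref("registry.example.org/mylab/aligner:abc123"))
```

An ID field that can hold the whole string unchanged would be "generic enough" in the sense above; the parser only shows which components such an ID carries.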

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆friendlyId: TRS-10

Tool Classes

Dear All:

As you may know, BioContainers is implementing the GA4GH API. We have well-organized tools and their corresponding containers, and we would like to extend some of the definitions of the current API. For example, it would be great to extend the definition of tool classes and to have an example of tool classes in the documentation. We see three main classes in BioContainers:

  • Tool: A non-annotated tool in BioContainers. By "non-annotated" I mean that we don't have a proper descriptor file associated with it (CWLTool). We want to use this generic category for all the tools in BioContainers; if a tool gets annotated with the corresponding CWL, it will be updated to CWLTool.

  • Multi-Tool: This category is used for containers ("tools") that contain multiple tools. We think this class is really useful because our users are interested in using containers with multiple tools. We also have a mechanism in place to create multi-tool containers automatically.

  • Service: I know most of the things deployed in GA4GH are workflows, but BioContainers is moving forward with some services. We would be interested in creating a Service class.

In our implementation we will provide the following classes:

[
  {
    "description": "CommandLineTool",
    "id": "0",
    "name": "CommandLineTool"
  },
  {
    "description": "Workflow",
    "id": "1",
    "name": "Workflow"
  },
  {
    "description": "Tool",
    "id": "2",
    "name": "Tool"
  },
  {
    "description": "Multi-Tool",
    "id": "3",
    "name": "Multi-Tool"
  },
  {
    "description": "Service",
    "id": "4",
    "name": "Service"
  }
]

Is this possible @denis-yuen @bwalsh

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-2

basePath inclusion of ga4gh

The basePath is currently set as
basePath: /api/ga4gh/v1

There is a high-level discussion about whether this is needed, as it might confuse people outside the genomics field. On the other hand, the ga4gh prefix makes it easily identifiable as a member of a family of tools.

After similar discussions in WES, the agreed basePath setting was
basePath: '/ga4gh/wes/v1'
so I'd suggest the actual change would be to
basePath: '/ga4gh/trs/v2'

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-13

Propose that `ToolDescriptor.descriptor` be removed in favor of `ToolDescriptor.url`

Propose that the spec change to delete ToolDescriptor.descriptor since ToolDescriptor.url provides what is likely the canonical source for the contents of the ToolDescriptor. It would also cut down on response message size significantly.

Alternative: make ToolDescriptor.descriptor optional in the response if ToolDescriptor.url is provided.

Galaxy support

Are we going for this before June and 2.0?

┆Issue is synchronized with this Jira Story
┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service
┆Epic: Galaxy in Dockstore
┆Issue Number: TRS-16

Some comments around the API

Dear @ALL:

I have some comments for the API before doing the PR:

  • In the /api/v2/tools entry point we shouldn't retrieve the complete tool object including the full version information; we should retrieve only the version IDs there. In a second call, the user can retrieve the specific information for the versions by calling the tools/{id}/versions entry point. This avoids retrieving long JSON responses containing almost the entire data of the registry.

  • Author should everywhere be a list of authors or maintainers. We have cases where the tools and containers are maintained by more than one author.

  • Major change: I think it would be great to have a structure like the following:

Tool -> ToolVersion -> ContainerImages

The tool is the tool/workflow/service, etc., with a unique identifier across the entire resource. The version is a version of the tool (e.g. blast 2.0). The container images are then the final containers for each version, for example blast-2.0-python or blast-2.0-deb. This will enable our users to follow the normal release cycles of tools and versions.

In the current approach versions + container images are collapsed in the same category.
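
The proposed Tool -> ToolVersion -> ContainerImages hierarchy could be modelled roughly as follows. This is only a sketch of the proposal in this issue, not the schema itself; the class and field names are illustrative assumptions, and the blast example is the one given above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContainerImage:
    name: str  # e.g. "blast-2.0-python"


@dataclass
class ToolVersion:
    version: str  # e.g. "2.0"
    images: List[ContainerImage] = field(default_factory=list)


@dataclass
class Tool:
    id: str  # unique across the entire resource
    versions: List[ToolVersion] = field(default_factory=list)

    def image_names(self, version: str) -> List[str]:
        """Return the image names registered for one version of the tool."""
        for candidate in self.versions:
            if candidate.version == version:
                return [image.name for image in candidate.images]
        raise KeyError(version)


blast = Tool(
    id="blast",
    versions=[
        ToolVersion(
            version="2.0",
            images=[ContainerImage("blast-2.0-python"), ContainerImage("blast-2.0-deb")],
        )
    ],
)
```

Keeping images one level below versions means a new container build for blast 2.0 adds an image entry instead of a new version.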

Minor questions:

The features related to the checker, signed, etc. could be moved to a different call for those services that will provide information for them:

"checker_url": "string",
"has_checker": true,
"signed": true,
"verified": true,
"verified_source": "string",

it can be something like:

tools/{id}/verified/

Can someone explain to me the purpose of those fields?

Best Regards
Yasset

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-7

Should be able to fetch bundles with TRS API

Feature Request

Right now if you want to fetch a tool with n descriptor files via the TRS API, you need to make n+1 API calls to fetch them all.

  1. Call GET /tools/{id}/versions/{version_id}/{type}/files
  2. Iterate through the files in the response and fetch them one-by-one.

As workflows grow in complexity, e.g., see gatk-sv-clinical, which has ~70 WDL files, this becomes increasingly inefficient.

It would be nice if you could fetch all the descriptors, perhaps optionally including the test parameter files, with one API call.

┆Issue is synchronized with this Jira Story
┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service
┆Issue Number: TRS-29

How to refer to this in user-facing documentation

It gets a bit tiresome to say things like "Enable resolution using a GA4GH compliant tool registry" or the slightly more correct "Enable resolution using the GA4GH Tool Registry API"

Using just "GA4GH tool registry" implies that there is only one tool registry and that the GA4GH operates/owns it, which isn't true.

What do you all think?

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-11

ToolFile.file_type CONTAINERFILE could be CONTAINER_FILE

There is some inconsistency in naming: TEST_FILE, but CONTAINERFILE.
Note that in ToolVersion.containerfile there is no underscore either.

There is also no consistency in plurality:

ToolVersion.images
ToolVersion.descriptor_type
ToolVersion.verified_source

Cheers,

Dmitry

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-26

Frozen versions of TRS spec

To provide backward-compatibility support for TRS servers/clients, we need to provide a frozen-in-time version of the spec. I suggest we keep each major/minor version in the /openapi folder of the repo, locked against any further edits.

TRS v1 --> /openapi/v1/tool_registry_service.swagger.yaml
TRS latest --> /openapi/tool_registry_service.swagger.yaml

OR

TRS v1 --> /openapi/tool_registry_service.v1.swagger.yaml
TRS latest --> /openapi/tool_registry_service.swagger.yaml

P.S. I would also suggest dropping swagger from the specification filename

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Epic: GA4GH TRS Approval
┆Issue Number: TRS-18

Descriptor merging (from PRC Meeting)

From the first review by the PRC in the GA4GH Approval Process

In order to get all necessary information and files from a TRS server, the client needs to make quite a few requests (listed below). Would it be possible to merge c, d and e into one call? Or at least make it one call to get all descriptors.

a /tools/{id}
b /tools/{id}/versions
c /tools/{id}/versions/{version_id}/{type}/files
d /tools/{id}/versions/{version_id}/{type}/descriptor
e /tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path} (one file per request, could have many files)
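
To show how the request count grows, here is a small Python sketch that only builds the URLs for calls a to e; it performs no network access. The function name `plan_requests`, the example base URL, and the choice to percent-encode the tool and version IDs are assumptions made for illustration.

```python
from typing import List
from urllib.parse import quote


def plan_requests(
    base_url: str,
    tool_id: str,
    version_id: str,
    descriptor_type: str,
    relative_paths: List[str],
) -> List[str]:
    """Return the URLs a client would request, in the order a to e."""
    # IDs may contain slashes, so encode them fully (an assumption here).
    tool = f"{base_url.rstrip('/')}/tools/{quote(tool_id, safe='')}"
    version = f"{tool}/versions/{quote(version_id, safe='')}"
    typed = f"{version}/{descriptor_type}"
    urls = [
        tool,                   # a
        f"{tool}/versions",     # b
        f"{typed}/files",       # c
        f"{typed}/descriptor",  # d
    ]
    # e: one request per secondary file; slashes in the path are kept.
    urls += [f"{typed}/descriptor/{quote(path)}" for path in relative_paths]
    return urls


if __name__ == "__main__":
    for url in plan_requests(
        "https://registry.example.org/ga4gh/trs/v2",
        "my/tool",
        "1.0",
        "WDL",
        ["tasks/align.wdl", "tasks/sort.wdl"],
    ):
        print(url)
```

With k secondary files the plan contains 4 + k requests, which is the overhead that merging c, d and e would remove.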

┆Issue is synchronized with this Jira Story
┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service
┆Issue Number: TRS-27

Unclear what to do with query parameters

The /tools endpoint allows the user to pass various query parameters to filter by: https://ga4gh.github.io/tool-registry-service-schemas/preview/2.0.0/docs/html5/index.html#_parameters

but for description, for example, it just says "The description of the tool.". What should the implementing service do with this? An exact string match? Partial?

(Also, is this the right repo to raise issues with the validator?)

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-39

Verified support

The API should have some form of "verified"/"blessed" support for both Tools and maybe even specific versions of a tool. (Kind of like how Twitter handles can be verified as real people).

The GA4GH standard would probably just note that this system exists (and maybe some metadata that could be used to "prove" that someone was legitimate, for example a link to a GitHub profile) while the actual implementation of the tool-registry would have more freedom to define how verification happens.

Ideas courtesy of @geoffjentry and @kellrott

ToolVersion.url example correction

Hello,

In the latest openapi.yaml

ToolVersion
    url:
        description: The URL for this tool in this registry.
        example: http://agora.broadinstitute.org/tools/123456/1

Shouldn't it be "http://agora.broadinstitute.org/tools/123456/versions/1" ?

Thank you,

Dmitry

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-25

URL ambiguity

We use the property "url" in two different ways which can be confusing.

  1. Some objects have a self-link style url (https://softwareengineering.stackexchange.com/questions/230157/what-is-the-point-of-link-rel-self-in-a-rest-api)
    such as
    https://github.com/ga4gh/tool-registry-service-schemas/blob/develop/src/main/resources/swagger/ga4gh-tool-discovery.yaml#L478

  2. Some objects have an optional url to the original source for content
    (i.e. Dockstore would pass along the URL to the original GitHub content, something like Agora which stores content in itself would not populate this)
    such as https://github.com/ga4gh/tool-registry-service-schemas/blob/develop/src/main/resources/swagger/ga4gh-tool-discovery.yaml#L619

Pain point encountered by @jaeddy

Endpoints for specific image versions

Currently the API defines an endpoint /tools/{registry-id} which returns a "versions" field, which is an array of ToolVersion objects. These provide a global-id but the format is not defined. Suggest that it would be more in line with REST and Linked Data principles if individual versions had well-defined URL endpoints, e.g. /tools/{registry-id}/version/{version-id}.

Consistent specification location for GA4GH Cloud standards

WES, TES, and DRS store their specifications at /openapi/<specification_name>.swagger.yaml. It would be great if TRS followed the same approach to make things consistent across all the GA4GH Cloud standards.

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Epic: GA4GH TRS Approval
┆Issue Number: TRS-19

List tools endpoint should allow filtering by descriptor type

The list tools endpoint, Dockstore implementation for reference, should allow filtering by descriptor type.

All platforms that I'm aware of support only one descriptor type, e.g., Terra supports only WDL, Seven Bridges supports only CWL, and Galaxy supports only Galaxy. If a platform wants to fetch all tools that it can run from a TRS implementation, it has to do the filtering client-side, which is a mild hassle. The TRS implementation also loads and sends back a lot more data than needed.
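
The client-side filtering described above might look roughly like this in Python. The sketch assumes each tool in the list response is a dict with a `versions` list and that each version carries a `descriptor_type` list of strings; that shape and the function name `tools_with_descriptor_type` are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List


def tools_with_descriptor_type(
    tools: Iterable[Dict[str, Any]], wanted: str
) -> List[Dict[str, Any]]:
    """Keep tools that have at least one version in the wanted language."""
    wanted = wanted.upper()
    selected = []
    for tool in tools:
        for version in tool.get("versions", []):
            if wanted in version.get("descriptor_type", []):
                selected.append(tool)
                break  # one matching version is enough
    return selected


if __name__ == "__main__":
    # Hypothetical response body from the list tools endpoint.
    response = [
        {"id": "aligner", "versions": [{"descriptor_type": ["WDL"]}]},
        {"id": "caller", "versions": [{"descriptor_type": ["CWL"]}]},
        {"id": "qc", "versions": [{"descriptor_type": ["CWL", "WDL"]}]},
    ]
    print([tool["id"] for tool in tools_with_descriptor_type(response, "wdl")])
```

The whole list still has to be downloaded before this filter runs, which is why a server-side query parameter would save both work and bandwidth.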

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-35

Consider properties for exposing version of a language

For example, a workflow might be written in CWL draft-3 vs v1.0. And WDL also has versions.
Instead of relying on knowledge of the language, we should expose a property that simply indicates what version a workflow or tool is written in (at least on a version by version basis)

Idea/pain point by @jaeddy

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-8

Tool Descriptor relative path is not discoverable

I do not think the values for the following path are discoverable:

/tools/{id}/versions/{version-id}/{type}/descriptor/{relative-path}

I assume the intent is to be able to navigate to subsumed CWL files that are encoded in a main CWL file of a workflow description. Is that correct? If so then it should be documented.

Can we make this field optional in the schema

  • meta_version: The version of this tool in the registry. Iterates when fields like the description, author, etc. are updated.

I think it is great that this property is available, but it shouldn't be required.

Github.io page does not have recent TRS developments

The page does not indicate that v2 has been released.

TRS V2 is an ongoing version of the schema that is currently undergoing beta development and is under review by GA4GH as an official standard

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-38

entries can describe multiple images

CWL and WDL workflows can use multiple Docker images.
The current (2.0.0-beta.2) schema assumes one Docker image, similar to CWL CommandLineTools which doesn't make sense anymore now that we're using this to describe workflows as well.

See #58

┆Issue is synchronized with this Jira Task
┆Project Name: GA4GH tool-registry-service
┆Issue Number: TRS-14

Naming issue

Might as well mention that there is some ambiguity in the way this is named. In the terminology of https://github.com/ga4gh/workflow-execution-schemas/blob/master/Definitions.md a tool registry schema actually registers both workflows and tools.


  1. A suggestion from @junjun-zhang is possibly functions. Seems overloaded with functions in programming languages

  2. Thing Registry Schema is tempting in that it retains the acronym. But overloaded with the amusing Managing Things with the Thing Registry

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-9

GA4GH TRS Approval

We need to monitor the TRS approval. We also should get the tickets in here.

┆Issue is synchronized with this Jira Epic
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-1

URL for Version

Would be nice to have a version string e.g. "V1" in the URL for the API. Also, maybe put all "tools" under /GA4GH//tools instead.

That way you could have:

/GA4GH//repositories
.../tools
.../execute
.../write

Dependabot couldn't find a package.json for this project

Dependabot couldn't find a package.json for this project.

Dependabot requires a package.json to evaluate your project's current JavaScript dependencies. It had expected to find one at the path: /package.json.

If this isn't a JavaScript project, or if it is a library, you may wish to disable updates for it from within Dependabot.

View the update logs.

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-32

Documentation Process Improvements

Several improvements for the documentation generation process

  • Not have a bunch of swagger-ui, just have a bunch of links pointed to the swagger-ui which then imports the corresponding swagger-ui
  • Fix the delete script so that it handles branches with slashes in their names
  • Make it easier for forked repos to pass Travis. Docs currently always change (because of timestamp) and will then require a deploy key that forked repos don't have and will fail.

┆Issue is synchronized with this Jira Task
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-12

Propose to remove `ToolDockerfile.descriptor`

Much like issue #21, having both a descriptor and a url seems redundant, and the url should be the canonical source.

Both of these seem like optimizations for a particular implementation of the Dockstore UI.

Dependabot couldn't find a build.gradle for this project

Dependabot couldn't find a build.gradle for this project.

Dependabot requires a build.gradle to evaluate your project's current Java dependencies. It had expected to find one at the path: /build.gradle.

If this isn't a Java project, or if it is a library, you may wish to disable updates for it from within Dependabot.

View the update logs.

┆Issue is synchronized with this Jira Story
┆containerName: GA4GH tool-registry-service
┆Issue Number: TRS-37
