elixir-cloud-aai / protes Goto Github PK

View Code? Open in Web Editor NEW

5.0 25.0 6.0 487 KB

Proxy service for injecting middleware into GA4GH TES requests

License: Apache License 2.0

Dockerfile 0.76% Python 99.24%

api-gateway ga4gh task-execution hacktoberfest

protes's Introduction

proTES

Synopsis

proTES is a robust and scalable Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES) API gateway that may play a pivotal role in augmenting the capabilities of your GA4GH Cloud ecosystem by offering flexible middleware injection for effectively federating atomic, containerized workloads across on premise, hybrid and multi-cloud environments composed of GA4GH TES nodes.

Description

proTES gateway may serve as a crucial component in federated compute networks based on the GA4GH Cloud ecosystem. Its primary purpose is to provide centralized features to a federated network of independently operated GA4GH TES instances. As such, it can serve, for example, as a compatibility layer, a load balancer workload distribution layer, a public entry point to an enclave of independent compute nodes, or a means of collecting telemetry.

When TES requests are received, proTES applies a configured middlewares before forwarding the requests to appropriate TES instances in the network. A plugin system makes it easy to write and inject middlewares tailored to specific requirements, such as for access control, request/response processing or validation, or the selection of suitable endpoints considering data use restrictions and client preferences.

Built-in middleware plugins

Currently, there are two plugins shipped with proTES that each serve as proof-of-concept examples for different task distribution scenarios:

Load balancing: The pro_tes.middleware.task_distribution.random plugin evenly (actually: randomly!) distributes workloads across a network of TES endpoints
Bringing compute to the data: The pro_tes.middleware.task_distribution.distance plugin selects TES endpoints to relay incoming requests to in such a way that the distance the (input) data of a task has to travel across the network of TES endpoints is minimized.

Implementation notes

proTES is a Flask microservice that supports [OAuth2][res-oauth2]-based authorization out of the box (bearer authentication) and stores information about incoming and outgoing tasks in a NoSQL database ([MongoDB][res-mongodb]). Based on our FOCA microservice archetype, it is highly configurable in a declarative (YAML-based!) manner. Forwarded tasks are tracked asynchronously via a RabbitMQ broker and Celery workers that can be easily scaled up. Both a Helm chart and a Docker Compose configuration are provided for easy deployment in native cloud-based production and development environments, respectively.

Installation

For production-grade Kubernetes-based deployment, see separate instructions. For testing/development purposes, you can use the instructions described below.

Requirements

Ensure you have the following software installed:

Docker (18.06.1-ce, build e68fc7a)
docker-compose (1.22.0, build f46880fe)
Git (1.8.3.1)

Note: These indicated versions are those that were used for developing/testing. Other versions may or may not work.

Prerequisites

Create data directory and required subdiretories

export PROTES_DATA_DIR=/path/to/data/directory
mkdir -p $PROTES_DATA_DIR/{db,specs}

Note: If the PROTES_DATA_DIR environment variable is not set, proTES will require the following default directories to be available:

../data/pro_tes/db

../data/pro_tes/specs

Clone repository

git clone https://github.com/elixir-europe/proTES.git

Traverse to app directory

cd proTES

Configure (optional)

The following user-configurable files are available:

Deploy

Build/pull and run services

docker-compose up -d --build

Visit Swagger UI

firefox http://localhost:8080/ga4gh/tes/v1/ui

Note: Host and port may differ if you have changed the configuration or use an HTTP server to reroute calls to a different host.

Contributing

This project is a community effort and lives off your contributions, be it in the form of bug reports, feature requests, discussions, ideas, fixes, or other code changes. Please read these guidelines if you want to contribute. And please mind the code of conduct for all interactions with the community.

Versioning

The project adopts the [semantic versioning][semver] scheme for versioning. Currently the service is in beta stage, so the API may change and even break without further notice. However, once we deem the service stable and "feature complete", the major, minor and patch version will shadow the supported TES version, with the build version representing proTES-internal updates.

License

This project is covered by the Apache License 2.0 also shipped with this repository.

Contact

proTES is part of ELIXIR Cloud & AAI, a multinational effort at establishing and implementing FAIR data sharing and promoting reproducible data analyses and responsible data handling in the life sciences.

If you have suggestions for or find issue with this app, please use the issue tracker. If you would like to reach out to us for anything else, you can join our Slack board, start a thread in our Q&A forum, or send us an email.

protes's People

Contributors

Stargazers

Watchers

Forkers

soumyadipde alexandersenf ayush5120 byzantine26 sohamratnaparkhi sivangbagri

protes's Issues

Deploy proTES

Configure with all available TES endpoints:

TESK (Czech Republic, Finland, Greece)
Funnel (OpenPBS/Czech Republic, Slurm/CSC)

Kubernetes deployment fails

When deploying the app via Kubenetes on OpenShift, the Celery worker pod fails with an error message indicating that the API for starting Celery workers changed, from pattern celery worker -A celeryapp to celery -A celeryapp worker.

Update Celery command in Celery worker deployment in Helm chart accordingly.

Upon fixing that issue by rearranging the command locally, another issue appears, indicating a lack of permissions when FOCA wants to write the modified specs back to storage.

Add latest TES specs

Is your feature request related to a problem? Please describe.
The TES specs in this repo are somewhat outdated and should be replaced with the latest version.

Describe the solution you'd like
The latest version can be found here, already in YAML format:
https://github.com/ga4gh/task-execution-schemas/blob/d55bf880062442288afc95665aa0e21fbba77b20/openapi/task_execution.swagger.yaml

Describe alternatives you've considered
N/A

Additional context
N/A

Enforce immutability for incoming task document

The incoming task document serves as a log/reference to what the user sent, which is needed for accounting/audit purposes. Therefore, only the parts that are provided by the client should be immutable, and the outgoing request can be modified according to middleware configuration.

Implement endpoint "GET /tasks/{id}:delete"

Is your feature request related to a problem? Please describe.
proTES should respond to all TES requests in the most reasonable or helpful way in a spec-conformant way.

In the case of the GET /tasks/{id}:cancel endpoint, this means that the cancellation request should be forwarded to the TES instance that the task was sent to.

Describe the solution you'd like
The task with the specified ID should be cancelled by relaying the request to the TES instance to which the task was sent, if available, which should be extracted from the database. If the task is not known, a 404 response should be returned. If the TES supposed to run the task is not available, a 500 response should be returned. The solution is almost identical to that for the POST /runs/{run_id}/cancel endpoint in WES-ELIXIR, implemented in files:

Describe alternatives you've considered
N/A

Additional context
Depends on issue #19 being resolved.

Use environment variables for MongoDB and RabbitMQ

Is your feature request related to a problem? Please describe.
MongoDB and RabbitMQ are currently hard coded in the YAMLs under pro_tes/config.
This makes dynamic deployments on OpenShift/K8S environments impossible.

Describe the solution you'd like
Source the MongoDB and RabbitMQ variable from environment variables, while using the hard coded ones as defaults.

bug: Celery worker does not have access to database

Currently, the Celery app returned by Foca.create_celery_app() does not have the necessary configuration to access the database configured for the app.

Fix by bumping FOCA to v0.12.0

Implement endpoint "GET /tasks/{id}"

Is your feature request related to a problem? Please describe.
proTES should respond to all TES requests in the most reasonable or helpful way in a spec-conformant way.

In the case of the GET /tasks/{id} endpoint, this simply means to return the requested task's info from the database.

Describe the solution you'd like
Return the task info from the database, if available. The solution is almost identical to that for the GET /runs/{run_id} endpoint in WES-ELIXIR, implemented in files:

Describe alternatives you've considered
N/A

Additional context
Depends on issue #19 being resolved.

Fix type checker issues

This needs to be done in two phases:

Fix all issues that are not the result of this known issue in FOCA: elixir-cloud-aai/foca#144
Include mypy in the CI workflow

When these are addressed, do not close this issue; rather, re-add the status: blocked label until the above-mentioned issue is fixed in FOCA, then:

Upgrade FOCA
Fix the remaining issues

Simplify CI workflow

The current CI workflow is overly complicated, including several steps that are not required. Refactor the workflow to remove unnecessary steps and increase consistency with other GitHub Actions workflows in the organization.

Fix default page size when listing tasks

According to the specification, the default page size is supposed to be 256. However, in the current default configuration, it is set to 5.

Increase the default configuration to 256, as required by the specification.

Return proTES task ID rather than external task ID

Currently POSTing a task returns the external task ID, i.e., the task ID of the TES service that the task was forwarded to. However, proTES also mints its own internal task ID for each incoming task. It is this ID that should be returned to the client, and which should be used to GET tasks and listed in the GET /tasks response.

Deploy at CSC

Is your feature request related to a problem? Please describe.

For implementing an end-to-end test of WES-ELIXIR > proTES > TEStribute > TESK or mock-TES, instances of all services should be deployed in a publicly accessible location.

Describe the solution you'd like

Use docker-compose to deploy service at VM provided by CSC Finland.

Describe alternatives you've considered

N/A

Additional context

N/A

Config param to choose task distribution logic

Check #119 (comment) for context.

A good solution would keep the middleware handler truly generic. Each middleware should follow one of a number of abstract signatures. At least the following ones should be covered:

Change the request itself (tesTask in, tesTask out)
Change the request destination, i.e., task distribution logic: tesTask and list of known TES instances in, ranked list of TES instances out

The user would then list a number of middleware functions (either part of proTES or external ones) in the config, for these two (and possibly other, if we can think of any) signatures/sections. The handler would then apply these in the listed order. For example:

middleware:
    mutate_request:  # signature 1. above
        - some_external_middlware.harmonized_tes_requests
        - some_external_middlware.tes_v1_0_0_to_v1_1_0
    mutate_destination:  # signature 2. above
        - some_external_middleware.discard_unauthorized_tes_instances
        - pro_tes.middleware.task_distribution.distance

Here, the handler would mutate the request by first harmonizing incoming requests, then converting any v1.0.0 requests to v1.1.0 (obviously, these are just examples of what middleware could hypothetically do). The handler would also set the destination by first discarding any TES instances that the user anyway doesn't have access to, then use the built-in distance task distribution logic to find the available/accessible TES that is closest to the input data.

Ideally, the handler would also check that the signature of the provided functions fit.

Other than that, there should be no specific logic in the handler (which there currently is), and no mention of any specific, hard-coded middleware (which there also is).

It might be useful to check if/how abstract classes/interfaces are done in Python to specify and validate the function signatures.

Fix docstrings

Follow Google-styled docstring standards.
Check for accuracy and provide detailed docstring for functions/methods in the whole codebase.

Update & simplify Dockerfile

The current Dockerfile uses an older version of the FOCA image. Update to latest.

Also, simplify the Dockerfile, as there are a lot of unnecessary instructions in there.

Add Flower task monitoring

Describe the solution you'd like
Flower allows monitoring Celery background tasks via a web dashboard. Add the service to the docker-compose configuration file.

Replace boilerplate code with FOCA archetype

A lot of the code shared between this service and others in the organizationhas been moved to the FOCA archetype.

Duplicate code should be removed from this service and, where necessary, the remaining code should be refactored to make use of FOCA.

Obsolete MarkupSafe version causes building error

Describe the bug
The version of MarkupSafe currently listed in the requirements.txt file (v.1.0) uses a deprecated module from setuptools, and this causes an ImportError that breaks the building of proTES.

To Reproduce
Steps to reproduce the behavior:

docker build . in the root of this repo
error issued while installing requirements

Expected behavior
The Docker image building should run smoothly.

Screenshots

Collecting MarkupSafe==1.0
  Downloading MarkupSafe-1.0.tar.gz (14 kB)
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-rvrehuys/MarkupSafe/setup.py'"'"'; __file__='"'"'/tmp/pip-install-rvrehuys/MarkupSafe/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-rvrehuys/MarkupSafe/pip-egg-info
         cwd: /tmp/pip-install-rvrehuys/MarkupSafe/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-rvrehuys/MarkupSafe/setup.py", line 6, in <module>
        from setuptools import setup, Extension, Feature
    ImportError: cannot import name 'Feature'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
The command '/bin/sh -c cd /app   && pip install -r requirements.txt   && cd /' returned a non-zero code: 1

Software
Add versions of relevant software, e.g.

macOS 10.15.4
Docker 19.03.8

Additional context
The issue can be fixed quite easily, simply replacing MarkupSafe==1.0 with MarkupSafe==1.1.1 in the requirements file should do the job.

Tasks not filtered by name prefix

A GET request to /tasks?name_prefix=foo&view=MINIMAL should return only tasks whose identifiers start with foo. However, it appears that all tasks are returned, not filtered at all.

To reproduce

POST one or more tasks to a given proTES instance
Store URL to proTES TES API in variable TES (e.g., https://my.tes.org/ga4gh/tes/v1)

Run the following command

curl -X GET "${TES}/tasks?name_prefix=foo&page_size=10&view=MINIMAL" -H "accept: application/json"

Expected behavior

Should only list tasks whose IDs start with foo (probably none at all).

Actual behavior

Lists all tasks, including those whose identifiers do not start with foo.

Fix naming of TES task request objects

Description :

The Tes task request, when forwarded to proTES, undergoes certain operations on the request body such as updating the task state, adding a task identifier, logging information, and specifying the TES endpoint where the task is forwarded, etc. All of these operations are currently performed on the task_incoming object, which is an instance of the TesTask class and where the final results of the executed task are stored.
The task_outgoing object, on the other hand, is used to save and store the incoming request in the database without modifying it. This is done to keep a record of what the client had originally passed when the task was received.

Problem :

The naming of TES task requests, that is task_incoming and task_outgoing is currently confusing.

Solution :

Rather than using the terms task_incomingand task_outgoing, it may be more appropriate to refer to one as simply task and the other as task_original. The task copy would be the working version that gets modified and retains all the necessary information to be returned to the user or client, while the task_original would be the one that remains unchanged.

Error handling issues in calculate_distance function of distance-based task distribution module

Extension of issue #134
The calculate_distance function in the distance-based task distribution module has an issue with exception handling, leading to unexpected behavior or errors. Specifically:

ip_distance function is called to calculate distances between all IPs. However, if it raises a ValueError, the exception is ignored, which could lead to incorrect results or unexpected behavior if the distances_full variable is used later in the code.
Similarly, a KeyError exception is ignored when trying to access the distances_full dictionary, which might lead to incorrect outputs.

To solve this issue handle these exceptions more robustly, for example, by logging the exception or raising an error to alert the user that something has gone wrong.

Versioning in one place

Is your feature request related to a problem? Please describe.
Currently the app version is defined/hardcoded in several places. This is likely to cause inconsistencies.

Describe the solution you'd like
There should be one definite place to set the app version, possibly via an environment variable.

Describe alternatives you've considered
N/A

Additional context
N/A

Documentation contains broken links

The link bibliography at the bottom of README.md appears to be broken, causing several inline links not to be rendered correctly. Moreover, the documented port to access the API after docker-compose deployment is wrong (is 7878 but should be 8080).

Add unit tests

The goal is a code coverage of 100% through unit tests alone.

Remove unused configuration parser

pro_tes/config/config_parser.py is not used anymore (using FOCA instead) and can therefore be removed, along with the entire subpackage.

Add unit tests

Is your feature request related to a problem? Please describe.

Add extensive unit tests for the entire app.

Describe the solution you'd like

Write unit tests (preferentially using pytest) for every class/method/function, aiming for 100% code coverage.

Describe alternatives you've considered
N/A

Several issues with TES/input URI processing in distance-based task distribution logic

When working with FTP in Funnel, we need to supply FTP credentials through the URLs. In the current implementation, input and TES URIs/URLs are parsed with urllib.parse.urlparse. Out of the resulting fragments, the netloc is then passed to socket.gethostbyname to get the IP of that host. However, the netloc extracted via urllib.parse.urlparse retains (basic) authorization credentials (e.g., user:[email protected]), but socket.gethostbyname is not able to parse these and throws a socket.gaierror. This exception is caught, but is handled in such a way that the list of input URIs is either incomplete or empty. In the latter case, this leads to an error in pro_tes.middleware.task_distribution.distance.task_distribution because no TES/input URL/URI combinations can be compiled (a situtation that is not handled).

To address this issue fully, the following should be done:

Remove auth credentials from URLs/URIs before determining IPs
Return a custom error amounting to 400 if input URIs cannot be parsed gethostbyname
Return a custom error amounting to 500 if TES URIs cannot be parsed by gethostbyname
Add comprehensive type hints to module
The distance-based logic should not fail if, for some reason, no TES/input IP combinations can be constructed; in that case, proTES should fall back to random distribution
Write unit tests for all code in the module and run in the CI

Task state set incorrectly when best TES instance fails

Description :

proTES tries to send the task to the best available TES which is done using a ranked list of TES instances, but when the submission of the task to the best available TES fails, it tries to forward the task to the next available instances.

Problem:

When submission of the task to the best TES instance fails the state is set to SYSTEM_ERROR.Ideally, it should not set the state to SYSTEM_ERROR until the task submission fails on all the available TES instances.

Process TEStribute results

Is your feature request related to a problem? Please describe.

After integrating TEStribute (#16), its results need to be processed. In particular, the resulting list of ranked services should be looped over and the incoming TES request modified such that it uses the recommended DRS object IDs for input files. It is open for discussion how this is to be handled for output files (probably not at all, for the moment). The TES request should then be forwarded to the recommended TES instance.

Describe the solution you'd like

DRS object IDs should be replaced according to TEStributes recommendations in the original TES request and the request should then be forwarded to the recommended TES instance. If any of the services is unavailable, the next combination should be tried.

Describe alternatives you've considered

N/A

Additional context

Depends on #16

Note that TEStribute currently works slightly beyond the GA4GH specs, i.e., it makes an assumption of data repository services that is not warranted by their specification and it amends a property in a model of one endpoint and adds another required endpoint to the TES specs, thus making it depend on specifically tuned TES implementations for now. For this reason, the use of the task distribution logic middleware must be optional (set a switch and necessary config parameters in the config file.

Set up CD

Service should be automatically redeployed when dev branch changes.

Store geolocations of TES instances

Is your feature request related to a problem? Please describe.

The distance-based task distribution is quite slow. A major reason is that the geolocations for each TES instance are retrieved from a remote service, one by one, for each individual task, via a costly HTTP call.

Describe the solution you'd like

On the execution of the first task with the distance-based task distribution logic, fetch the geolocations of all TES instances from the remote service and store them in a dedicated database collection, using the host (e.g,. csc-tesk-noauth.rahtiapp.fi) as a key. Then, for any successive task, retrieve the geolocations of all TES instances in the list from the database collection with a single database call. Should there be any TES instances missing in the database (because they were added in the meantime; probably a rare occurence), fetch the missing geolocations from the remote service and add them to the database collection.

In this way, for the vast majority of calls, only input URI geolocations will need to be fetched from the remote service.

Describe alternatives you've considered

In addition to storing the geolocations of all encountered TES instances, it may be worthwhile to check if the geolocations for multiple input files can be fetched with a single request. This may further reduce the cost and limit the number of remote calls to one, even with multiple input files.

Implement endpoint "POST /tasks"

Is your feature request related to a problem? Please describe.
proTES should respond to all TES requests in the most reasonable or helpful way in a spec-conformant way.

In the case of the POST /tasks endpoint, this basically means that the task should be forwarded to a suitable TES endpoint after passing through some middleware.

Describe the solution you'd like
An ID for the task should be generated and stored in the database, together with metadata. The task should then be placed on a message broker and its ID returned to the caller. Afterwards, the task should be asynchronously picked up by a worker, passing through middleware (only generically implemented here) and eventually relayed to a TES endpoint. The task's database entry should then be updated with the TES instance URL and the returned TES ID (i.e., there will be two TES IDs, proTES's internal one that was returned to the caller, as well as the one generated by the TES instance that actually carries out the computation). In lieu of a callback mechanism that would allow the compute TES to proactively update task state changes during processing, TES should then be continuously polled for status changes, which should be monitored by a task monitor daemon. The entire process is almost entirely identical to that for the POST /runs endpoint in WES-ELIXIR, implemented in files:

https://github.com/elixir-europe/WES-ELIXIR/blob/dev/wes_elixir/ga4gh/wes/server.py
https://github.com/elixir-europe/WES-ELIXIR/blob/dev/wes_elixir/ga4gh/wes/endpoints/run_workflow.py
Several more files in the WES-ELIXIR repo can be used as guides for this, such as those required for the RabbitMQ broker, Celery workers and task monitor.

Describe alternatives you've considered
N/A

Additional context
Depends on issue #17 being resolved.

Replace config handler with external one

Linked with #55
Replace the config handler using foca package.

Middleware called before task is created

The middleware is called before the task is even created, which results in a modified incoming task request and we can never store the unmodified copy of the original request
The ideal way is that the incoming task request is stored first in the database and then the middleware is called

Implement endpoint "GET /tasks"

Is your feature request related to a problem? Please describe.
proTES should respond to all TES requests in the most reasonable or helpful way in a spec-conformant way.

In the case of the GET /tasks endpoint, this simply means to return the tasks that proTES itself has seen and which are stored in the database.

Describe the solution you'd like
Return the list of available tasks from the database. The solution is almost identical to that for the GET /runs endpoint in WES-ELIXIR, implemented in files:

Describe alternatives you've considered
N/A

Additional context
Depends on issue #19 being resolved.

Automatic semantic versioning

Is your feature request related to a problem? Please describe.
The app version has to be manually raised whenever the app changes. This is likely to be forgotten and leads to cluttering of the commit history.

Describe the solution you'd like
Ideally, versions should be automatically bumped depending on the commit/merge, possibly by setting webhooks.

Describe alternatives you've considered
N/A

Additional context
N/A

Task/workflow state change callback mechanism

Is your feature request related to a problem? Please describe.
Identical to elixir-cloud-aai/cwl-WES/issues/57 from WES-ELIXIR.

Describe the solution you'd like
Resolve together with elixir-cloud-aai/cwl-WES/issues/57 from WES-ELIXIR, as both services/apps will be affected.

Describe alternatives you've considered
N/A

Additional context
N/A

Implement endpoint "GET /tasks/service-info"

Is your feature request related to a problem? Please describe.
proTES should respond to all TES requests in the most reasonable or helpful way in a spec-conformant way.

In the case of the GET /tasks/service-info endpoint, this simply means to return the service info from the database.

Describe the solution you'd like
Add service info parameters to the config file and load them to the database upon starting the service (refresh upon restart). Upon receiving the request, return the service info from the database. The solution is almost identical to that for the /service-info endpoint in WES-ELIXIR, implemented in files:

Describe alternatives you've considered
N/A

Additional context
Depends on issue #17 being resolved.

Kubernetes deployment

Is your feature request related to a problem? Please describe.

For production, it should be possible to deploy proTES on Kubernetes.

Describe the solution you'd like

Create a new deployment subdirectory that contains documentation on how to deploy proTES on Kubernetes and contains the necessary YAML files for deployment.

Describe alternatives you've considered

Not applicable.

Additional context

The same YAML templates should be usable for both vanilla Kubernetes and OpenShift. We can have a set of common files and then files specific to OpenShift (e.g. Route) and Kubernetes (e.g. NGINX ingress setup).

Return execution trace in task log

Currently, there is no way for clients to trace the route the task took from proTES, and so they have no chance of knowing where a given task was actually executed. This may or may not be desirable.

Implement a config param that when set to True adds information to the log on the trace that a given task request took. The information should extend the tesTaskLog model and account for the possibility that multiple gateways may be included in a call chain.

One solution might be the following recursive extension of tesTaskLog:

    tesTaskLog:
      ...
      properties:
        ...
        forwarded_to:
          $ref: '#/components/schemas/tesNextTes'
      description: TaskLog describes logging information related to a Task.

with:

    tesNextTes:
      required:
      - url
      - id
      type: object
      properties:
        url:
          type: string
          description: TES server to which the task was forwarded.
          example: https://my.tes.instance/
        id:
          type: string
          description: Task identifier assigned by the TES server to which the task was forwarded.
          example: job-0012345
        forwarded_to:
          $ref: '#/components/schemas/tesNextTes'
      description: Describes the TES server to which the task was forwarded, if applicable.

Add auth for RabbitMQ

Is your feature request related to a problem? Please describe.
Currently neither the app itself nor the deployment use authentication for the RabbitMQ broker.

Describe the solution you'd like
For security reasons, secure RabbitMQ similar to what is already being done for MongoDB.

Describe alternatives you've considered
N/A

Additional context
N/A

Allow users to configure security definitions

Is your feature request related to a problem? Please describe.
If I want to build proTES without providing (or providing at a later time) an OpenAPI file, I won't be able to do it because it is currently required in app.py, since add_security_definitions=True.

Describe the solution you'd like
A very easy solution to this would be to get that flag from the app_config.yaml file, where the authorization_required key is already available, thus changing the above-mentioned line in app.py to add_security_definitions=get_conf(config, 'security', 'authorization_required').
This is consistent with what is done for other arguments of register_openapi.

Describe alternatives you've considered
At the moment I'm solving this by simply replacing the original app.py file with my own version that has that simple code change already in place, but it is quite annoying.

Additional context
N/A

Update package setup configuration

The package setup configuration is quite outdated. Refactor and replace values.

test: unit test for tasks module

Is your feature request related to a problem? Please describe.
This issue is in connection to issue #15
The current code-base does not have a test for tasks module.

This is a Python module that has scripts defining a Celery task named task__track_task_progress that is responsible for relaying a task run request to a remote TES (Task Execution Service) API, and then tracking the progress of the task.

Within the task, there is code for creating a database client, updating the state of the task to INITIALIZING, fetching the task log, and then tracking the task progress by continuously polling the remote TES API. The task state is updated in the database as the task progresses, and finally, once the task has finished, the document in the database is updated to reflect the final state and the output logs.

Describe the solution you'd like
The test will have following test:

mocks the DbDocumentConnector and tes.HTTPClient classes and the Flask app instance.
tests whether the update_task_state and get_document methods are called with the correct arguments
test whether the upsert_fields_in_root_object method is called with the correct root and arguments after the task has finished.
tests whether the get_task method of tes.HTTPClient is called with the correct arguments and whether the state of the task is updated correctly in the database.

Additional context
The tests will be added to test folder.

ci: include Docker image build & publishing job

Depends on #102

Integrate optional task distribution logic middleware

Is your feature request related to a problem? Please describe.

One of the main benefits of having TES requests intercepted by proTES is that it allows for the distribution of tasks over a network of TES instances, regardless of whether the workflow engine that emitted the TES request contains such a feature. Through the integration of a task distribution logic, proTES will be able to select the most advantageous TES instance for a given task, according to cost and/or clock time considerations.

Describe the solution you'd like

The proof-of-concept task distribution app TEStribute has recently been developed.

A publicly accessible API service of TEStribute can be integrated by service calls to its single endpoint /rank-services. Refer to TEStribute's documentation for details.

Describe alternatives you've considered

If no publicly accessible TEStribute API service is available, the package can also be imported and used in the following manner (refer to TEStribute documentation for details):

from TEStribute import rank_services

rank_services(...)

Additional context

Note that TEStribute currently works slightly beyond the GA4GH specs, i.e., it makes an assumption of data repository services that is not warranted by their specification and it amends a property in a model of one endpoint and adds another required endpoint to the TES specs, thus making it only with specifically tuned TES implementations for now. For this reason, the use of the task distribution logic middleware must be optional (set a switch and necessary config parameters in the config file.

Validate bearer token through OIDC/OAuth endpoint

Is your feature request related to a problem? Please describe.
Identical to elixir-cloud-aai/cwl-WES/issues/109 from WES-ELIXIR.

Describe the solution you'd like
Wait until elixir-cloud-aai/cwl-WES/issues/109 is resolved in WES-ELIXIR, then migrate solution to this repo.

Describe alternatives you've considered
N/A

Additional context
N/A

Production grade Flask deployment

Is your feature request related to a problem? Please describe.
The current Flask application is not production grade, and should be used for development environments only.

Describe the solution you'd like
Re-structure the proTES code base in order to support running a Gunicorn app on top of Flask.

Submitted tasks stay in state QUEUE on OpenShift deployment

On the remote TES(K) instances, the tasks are actually completed, so the issue must be in proTES.

Need to figure out details, but one reason may be that the external (TESK) task ID (of the form task- followed by 8-character string composed of numbers and lower-case characters) rather than the proTES task ID (7-character string composed of numbers and upper-case characters) is returned for POST /tasks.

elixir-cloud-aai / protes Goto Github PK

protes's Introduction

proTES

Synopsis

Description

Built-in middleware plugins

Implementation notes

Installation

Requirements

Prerequisites

Configure (optional)

Deploy

Contributing

Versioning

License

Contact

protes's People

Contributors

Stargazers

Watchers

Forkers

protes's Issues

Recommend Projects

Recommend Topics

Recommend Org