
galaxy-helm's Introduction

Galaxy Helm Chart (v5)

Galaxy is a data analysis platform focusing on accessibility, reproducibility, and transparency of primarily bioinformatics data. This repo contains a Helm chart for easily deploying Galaxy on top of Kubernetes. The chart allows application configuration changes, updates, upgrades, and rollbacks.

Supported software versions

  • Kubernetes 1.27+
  • Helm 3.5+

Kubernetes cluster

You will need kubectl (instructions) and Helm (instructions) installed.

Running Galaxy locally in a dev environment

For testing and development purposes, an easy option to get Kubernetes running is to install Rancher Desktop. Once you have it installed, you will also need to set up an ingress controller. Rancher uses Traefik as the default one, so disable it first by unchecking Enable Traefik on the Kubernetes Settings page. Then deploy the NGINX ingress controller:

helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace

Dependency charts

This chart relies on the features of other charts for common functionality (PostgreSQL for the database, RabbitMQ for the job messaging queue, and the CSI-S3 and Galaxy-CVMFS-CSI charts for mounting reference data):

In a production setting, especially if the intention is to run multiple Galaxy instances in a single cluster, we recommend installing the dependency charts separately once per cluster, and installing Galaxy with --set postgresql.deploy=false --set s3csi.deploy=false --set cvmfs.deploy=false --set rabbitmq.deploy=false.
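For example (assuming the cloudve chart repository has been added as described below and the dependency charts are already installed cluster-wide; the release name and namespace are illustrative):

helm install --create-namespace -n galaxy my-galaxy cloudve/galaxy \
  --set postgresql.deploy=false \
  --set s3csi.deploy=false \
  --set cvmfs.deploy=false \
  --set rabbitmq.deploy=false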


Installing the chart

Using the chart from the packaged chart repo

  1. The chart is automatically packaged, versioned and uploaded to a helm repository on each accepted PR. Therefore, the latest version of the chart can be downloaded from this repository.
helm repo add cloudve https://raw.githubusercontent.com/CloudVE/helm-charts/master/
helm repo update
  2. Install the chart with the release name my-galaxy. It is not advisable to install Galaxy in the default namespace, so create and use a dedicated namespace:
helm install --create-namespace -n galaxy my-galaxy cloudve/galaxy

Using the chart from GitHub repo

  1. Clone this repository and add required dependency charts:
git clone https://github.com/galaxyproject/galaxy-helm.git
cd galaxy-helm/galaxy
helm dependency update
  2. Install the chart with the release name my-galaxy. See the Data persistence section below for guidance on the persistence flag that is suitable for your Kubernetes environment.
helm install --create-namespace -n galaxy my-galaxy . --set persistence.accessMode="ReadWriteOnce"

In a few minutes, Galaxy will be available at the /galaxy/ URL of your Kubernetes cluster. If you are running the development Kubernetes setup described above, Galaxy will be available at http://localhost/galaxy/ (note the trailing slash).
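While waiting, you can watch the pods start up; for example, assuming the galaxy namespace used above:

kubectl get pods -n galaxy -w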

Uninstalling the chart

To uninstall/delete the my-galaxy deployment, run:

helm delete -n galaxy my-galaxy

If some RabbitMQ and Postgres resources remain after 10 minutes or more, you should be able to delete them with:

kubectl delete RabbitmqCluster/my-galaxy-rabbitmq-server
kubectl delete statefulset/galaxy-my-galaxy-postgres

If the above does not get rid of the RabbitmqCluster, you might need to remove its finalizer by editing the resource:

kubectl edit RabbitmqCluster/my-galaxy-rabbitmq-server

and removing the finalizer in:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  annotations:
    meta.helm.sh/release-name: my-galaxy
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2022-12-19T16:54:33Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-12-19T17:41:40Z"
  finalizers:
  - deletion.finalizers.rabbitmqclusters.rabbitmq.com

Then remove the Postgres secret:

kubectl delete secrets/standby.galaxy-my-galaxy-postgres.credentials.postgresql.acid.zalan.do

Note as well that if you enabled persistence, Postgres and Galaxy will leave their PVCs behind; whether to delete them depends on your use case.
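For example, to inspect and remove leftover PVCs (substitute your own namespace and the PVC names reported by the first command):

kubectl get pvc -n <namespace>
kubectl delete pvc <pvc-name> -n <namespace>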

Configuration

The following table lists the configurable parameters of the Galaxy chart. The current default values can be found in the values.yaml file.

Parameter Description
nameOverride Override the name of the chart used to prefix resource names. Defaults to {{.Chart.Name}} (e.g., galaxy)
fullnameOverride Override the full name used to prefix resource names. Defaults to {{.Release.Name}}-{{.Values.nameOverride}}
image.pullPolicy Galaxy image pull policy (see the Kubernetes documentation for details)
image.repository The repository and name of the Docker image for Galaxy, searches Docker Hub by default
image.tag Galaxy Docker image tag (generally corresponds to the desired Galaxy version)
imagePullSecrets Secrets used to access a Galaxy image from a private repository
persistence.enabled Enable persistence using PVC
persistence.size PVC storage request for the Galaxy volume, in GB
persistence.accessMode PVC access mode for the Galaxy volume
persistence.annotations.{} Dictionary of annotations to add to the persistent volume claim's metadata
persistence.existingClaim Use existing Persistent Volume Claim instead of creating one
persistence.storageClass Storage class to use for provisioning the Persistent Volume Claim
persistence.name Name of the PVC
persistence.mountPath Path where to mount the Galaxy volume
useSecretConfigs Enable Kubernetes Secrets for all config maps
configs.{} Galaxy configuration files and values for each of the files. The provided value represents the entire content of the given configuration file
jobs.priorityClass.enabled Assign a priorityClass to the dispatched jobs.
jobs.rules Galaxy dynamic job rules. See values.yaml
jobs.priorityClass.existingClass Use an existing priorityClass to assign if jobs.priorityClass.enabled=true
refdata.enabled Whether or not to mount cloud-hosted Galaxy reference data and tools.
refdata.type s3fs or cvmfs, determines the CSI to use for mounting reference data. s3fs is the default and recommended for the time being.
s3csi.deploy Deploy the CSI-S3 Helm Chart. This is an optional dependency, and for production scenarios it should be deployed separately as a cluster-wide resource.
cvmfs.deploy Deploy the Galaxy-CVMFS-CSI Helm Chart. This is an optional dependency, and for production scenarios it should be deployed separately as a cluster-wide resource
cvmfs.enabled Enable use of CVMFS in configs, and deployment of CVMFS Persistent Volume Claims for Galaxy
cvmfs.pvc.{} Persistent Volume Claim to deploy for CVMFS repositories. See values.yaml for examples.
setupJob.ttlSecondsAfterFinished Sets ttlSecondsAfterFinished for the initialization jobs. See the Kubernetes documentation for more details.
setupJob.downloadToolConfs.enabled Download configuration files and the tools directory from an archive via a job at startup
setupJob.downloadToolConfs.archives.startup A URL to a tar.gz publicly accessible archive containing AT LEAST conf files and XML tool wrappers. Meant to be enough for Galaxy handlers to startup.
setupJob.downloadToolConfs.archives.running A URL to a tar.gz publicly accessible archive containing AT LEAST confs, tool wrappers, and tool scripts but excluding test data. Meant to be enough for Galaxy handlers to run jobs.
setupJob.downloadToolConfs.archives.full A URL to a tar.gz publicly accessible archive containing the full tools directory, including each tool's test data. Meant to be enough to run automated tool-tests, fully mimicking CVMFS repositories
setupJob.downloadToolConfs.volume.mountPath Path at which to mount the unarchived confs in each handler (should match the path set in the tool confs)
setupJob.downloadToolConfs.volume.subPath Name of subdirectory on Galaxy's shared filesystem to use for the unarchived configs
setupJob.createDatabase Deploy a job to create a Galaxy database from scratch (does not affect subsequent upgrades, only first startup)
ingress.path Path where Galaxy application will be hosted
ingress.annotations.{} Dictionary of annotations to add to the ingress's metadata at the deployment level
ingress.hosts Hosts for the Galaxy ingress
ingress.canary.enabled This will create an additional ingress for detecting activity on Galaxy. Useful for autoscaling on activity.
ingress.enabled Enable Kubernetes ingress
ingress.tls Ingress configuration with HTTPS support
service.nodePort If service.type is set to NodePort, then this can be used to set the port at which Galaxy will be available on all nodes' IP addresses
service.port Kubernetes service port
service.type Kubernetes Service type
serviceAccount.annotations.{} Dictionary of annotations to add to the service account's metadata
serviceAccount.create The serviceAccount will be created if it does not exist.
serviceAccount.name The service account to use.
rbac.enabled Enable Galaxy job RBAC. This will grant the service account the necessary permissions/roles to view jobs and pods in this namespace. Defaults to true.
webHandlers.{} Configuration for the web handlers (See table below for all options)
jobHandlers.{} Configuration for the job handlers (See table below for all options)
workflowHandlers.{} Configuration for the workflow handlers (See table below for all options)
resources.limits.memory The maximum memory that can be allocated.
resources.requests.memory The requested amount of memory.
resources.limits.cpu The maximum CPU that can be allocated.
resources.limits.ephemeral-storage The maximum ephemeral storage that can be allocated.
resources.requests.cpu The requested amount of CPU (as time or number of cores)
resources.requests.ephemeral-storage The requested amount of ephemeral storage
securityContext.fsGroup The group for any files created.
tolerations Define the taints that are tolerated.
extraFileMappings.{} Add extra files mapped as configMaps or Secrets at arbitrary paths. See values.yaml for examples.
extraInitCommands Extra commands that will be run during initialization.
extraInitContainers.[] A list of extra init containers for the handler pods
extraVolumeMounts.[] List of volumeMounts to add to all handlers
extraVolumes.[] List of volumes to add to all handlers
postgresql.enabled Enable the postgresql condition in the requirements.yml.
influxdb.username Influxdb user name.
influxdb.url The connection URL of the influxdb
influxdb.enabled Enable the influxdb used by the metrics scraper.
influxdb.password Password for the influxdb user.
metrics.podAnnotations.{} Dictionary of annotations to add to the metrics deployment's metadata at the pod level
metrics.image.repository The location of the galaxy-metrics-scraping image to use.
metrics.image.pullPolicy Define the pull policy, that is, when Kubernetes will pull the image.
metrics.podSpecExtra.{} Dictionary to add to the metrics deployment's pod template under spec
metrics.image.tag The image version to use.
metrics.annotations.{} Dictionary of annotations to add to the metrics deployment's metadata at the deployment level
metrics.enabled Enable metrics gathering. The influxdb setting must be specified when using this setting.
nginx.conf.client_max_body_size Requests larger than this size will result in a 413 Payload Too Large.
nginx.image.tag The Nginx version to pull.
nginx.image.repository Where to obtain the Nginx container.
nginx.image.pullPolicy When Kubernetes will pull the Nginx image from the repository.
nginx.galaxyStaticDir Location at which to copy Galaxy static content in the NGINX pod init container, for direct serving. Defaults to /galaxy/server/static
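As an illustrative sketch (not a recommendation), a values file overriding a few of the parameters above might look like the following; the tag, size, and storage class shown are placeholders:

image:
  tag: "23.1"              # placeholder Galaxy version
persistence:
  size: 50Gi
  storageClass: nfs-client # placeholder storage class name
webHandlers:
  replicaCount: 2

Such a file could then be applied with, e.g., helm install -n galaxy my-galaxy cloudve/galaxy -f my-values.yaml (file name illustrative).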

Handlers

Galaxy defines three handler types: jobHandlers, webHandlers, and workflowHandlers. All three handler types share common configuration options.

Parameter Description
replicaCount The number of handlers to be spawned.
startupDelay Delay in seconds for handler startup. Used to offset handlers and avoid race conditions at first startup
annotations Dictionary of annotations to add to this handler's metadata at the deployment level
podAnnotations Dictionary of annotations to add to this handler's metadata at the pod level
podSpecExtra Dictionary to add to this handler's pod template under spec
startupProbe Probe used to determine if a pod has started. Other probes wait for the startup probe. See table below for all probe options
livenessProbe Probe used to determine if a pod should be restarted. See table below for all probe options
readinessProbe Probe used to determine if the pod is ready to accept workloads. See table below for all probe options

Probes

Kubernetes uses probes to determine the state of a pod. Pods are not considered to have started up, and hence other probes are not run, until the startup probes have succeeded. Pods that fail the livenessProbe will be restarted and work will not be dispatched to the pod until the readinessProbe returns successfully. A pod is ready when all of its containers are ready.

Liveness and readiness probes share the same configuration options.

Parameter Description
enabled Enable/Disable the probe
initialDelaySeconds How long to wait before starting the probe.
periodSeconds How frequently Kubernetes will check the probe.
failureThreshold The number of times Kubernetes will retry the probe before giving up.
timeoutSeconds How long Kubernetes will wait for the probe to respond before timing out.

Examples

jobHandlers:
  replicaCount: 2
  livenessProbe:
    enabled: false
  readinessProbe:
    enabled: true
    initialDelaySeconds: 300
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 3

Additional Configurations

Extra File Mappings

The extraFileMappings field can be used to inject files to arbitrary paths in the nginx deployment, as well as any of the job, web, or workflow handlers, and the init jobs.

The contents of the file can be specified directly in the values.yaml file with the content attribute.

The tpl flag will determine whether these contents are run through the helm templating engine.

Note: when running with tpl: true, brackets ({{ }}) not meant for Helm should be escaped. One way of escaping is: {{ '{{ my-non-helm-content }}' }}

extraFileMappings:
  /galaxy/server/static/welcome.html:
    applyToWeb: true
    applyToJob: false
    applyToWorkflow: false
    applyToNginx: true
    applyToSetupJob: false
    tpl: false
    content: |
      <!DOCTYPE html>
      <html>...</html>
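To illustrate the escaping note above, here is a hedged sketch of a templated file; the path and content are arbitrary examples, not files the chart expects:

extraFileMappings:
  /galaxy/server/static/example.txt:
    applyToWeb: true
    tpl: true
    content: |
      Release name: {{ .Release.Name }}
      Literal braces kept as-is: {{ '{{ not-a-helm-template }}' }}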

NOTE: for security reasons, Helm will not load files from outside the chart, so the path must be a relative path to a location inside the chart directory. This will change when helm#3276 is resolved. In the interim, files can be loaded from external locations by:

  1. Creating a symbolic link in the chart directory to the external file, or
  2. Using --set-file to specify the contents of the file, e.g.: helm upgrade --install galaxy cloudve/galaxy -n galaxy --set-file extraFileMappings."/galaxy/server/static/welcome\.html".content=/home/user/data/welcome.html --set extraFileMappings."/galaxy/server/static/welcome\.html".applyToWeb=true

Alternatively, if too many applyTo flags would need to be set on the command line, they can be declared under extraFileMappings for that file in your values.yaml (in addition to the --set-file on the CLI), with no content: part, since the content is supplied through --set-file:

extraFileMappings:
  /galaxy/server/static/welcome.html:
    applyToJob: false
    applyToWeb: true
    applyToSetupJob: false
    applyToWorkflow: false
    applyToNginx: false
    tpl: false

Setting parameters on the command line

Specify each parameter using the --set key=value[,key=value] argument to helm install or helm upgrade. For example,

helm install my-galaxy . --set persistence.size=50Gi

The above command sets the Galaxy persistent volume size to 50Gi.

Setting Galaxy configuration file values requires the key name to be escaped. In this example, we are upgrading an existing deployment.

helm upgrade my-galaxy . --set "configs.galaxy\.yml.brand"="Hello World"

You can also set the Galaxy configuration file in its entirety with:

helm install my-galaxy . --set-file "configs.galaxy\.yml"=/path/to/local/galaxy.yml

To unset an existing file and revert to the container's default version:

helm upgrade my-galaxy . --set "configs.job_conf\.xml"=null

Alternatively, any number of YAML files that specify the values of the parameters can be provided when installing the chart. For example,

helm install my-galaxy . -f values.yaml -f new-values.yaml

To unset a config file in a values file, use the YAML null type:

configs:
  job_conf.xml: ~

Data persistence

By default, the Galaxy handlers store all user data under the /galaxy/server/database/ path in each container. This path can be changed via the persistence.mountPath variable. Persistent Volume Claims (PVCs) are used to persist the data across deployments. It is possible to specify an existing PVC via persistence.existingClaim. Alternatively, a value for persistence.storageClass can be supplied to designate a desired storage class for dynamic provisioning of the necessary PVCs. If neither value is supplied, the default storage class for the K8s cluster will be used.

For multi-node scenarios, we recommend a storage class that supports ReadWriteMany, such as the nfs-provisioner, as the data must be available to all nodes in the cluster.

In single-node scenarios, you must use --set persistence.accessMode="ReadWriteOnce".
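Expressed in a values file, a minimal single-node sketch (the size is a placeholder) would be:

persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 10Gi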

Note about persistent deployments and restarts

If you wish to make your deployment persistent or restartable (bring the deployment down, keep the state on disk, then bring it up again later), you should create PVCs for Galaxy and Postgres and use the persistence.existingClaim variable to point to them, as explained in the previous section. In addition, you must set the postgresql.galaxyDatabasePassword variable; otherwise, it will be autogenerated and will not match when the deployment is restored.
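A hedged sketch of the relevant values (the claim name and password are placeholders you would replace with your own):

persistence:
  existingClaim: my-galaxy-data-pvc          # pre-created PVC for Galaxy
postgresql:
  galaxyDatabasePassword: "a-password-you-manage-yourself"

How the Postgres PVC itself is referenced depends on the postgresql dependency chart; consult its values for the equivalent existing-claim setting.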

Making Interactive Tools work on localhost

In general, Interactive Tools should work out of the box as long as you have a wildcard DNS mapping to *.its.<host_name>. To make Interactive Tools work on localhost, you can use dnsmasq or similar to handle wildcard DNS mappings for *.localhost.

For Linux: follow these instructions to configure dnsmasq: https://superuser.com/a/1718296
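As a rough sketch of what those instructions amount to (the file location varies by distribution and is an assumption, not a verbatim excerpt):

# e.g. /etc/dnsmasq.d/localhost-wildcard.conf
address=/localhost/127.0.0.1
# then restart dnsmasq, e.g.: sudo systemctl restart dnsmasq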

For macOS:

  $ brew install dnsmasq
  $ cp /usr/local/opt/dnsmasq/dnsmasq.conf.example /usr/local/etc/dnsmasq.conf
  # edit /usr/local/etc/dnsmasq.conf and set:
  #   address=/localhost/127.0.0.1
  $ sudo brew services start dnsmasq
  $ sudo mkdir /etc/resolver
  $ sudo touch /etc/resolver/localhost
  # edit /etc/resolver/localhost and set:
  #   nameserver 127.0.0.1
  $ sudo brew services restart dnsmasq

This should make all *.localhost and *.its.localhost map to 127.0.0.1, and ITs should work with a regular helm install on localhost.

Horizontal scaling

The Galaxy application can be horizontally scaled for the web, job, or workflow handlers by setting the desired values of the webHandlers.replicaCount, jobHandlers.replicaCount, and workflowHandlers.replicaCount configuration options.
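For example, to scale an existing release (the release name and replica counts are illustrative):

helm upgrade my-galaxy cloudve/galaxy -n galaxy --reuse-values \
  --set webHandlers.replicaCount=3 \
  --set jobHandlers.replicaCount=2 \
  --set workflowHandlers.replicaCount=2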

Cron jobs

Two cron jobs are defined by default: one to clean up Galaxy's database and one to clean up the tmp directory. By default, these jobs run at 02:05 (the database maintenance script) and 02:15 (tmp directory cleanup). Users can change the times at which the cron jobs run by changing the schedule field in the values.yaml file:

cronJobs:
  maintenance:
    schedule: "30 6 * * *" # Execute the cron job at 6:30 UTC

or by specifying the schedule on the command line when installing Galaxy:

# Schedule the maintenance job to run at 06:30 on the first day of each month
helm install galaxy -n galaxy galaxy/galaxy --set cronJobs.maintenance.schedule="30 6 1 * *"

To disable a cron job after Galaxy has been deployed, simply set the enabled flag for that job to false:

helm upgrade galaxy -n galaxy galaxy/galaxy --reuse-values --set cronJobs.maintenance.enabled=false

Run a CronJob manually

Cron jobs can be invoked manually with tools such as OpenLens, or from the command line with kubectl:

kubectl create job --namespace <namespace> <job name> --from cronjob/galaxy-cron-maintenance 

This will run the cron job regardless of the schedule that has been set.

Note: the name of the cron job will be {{ .Release.Name }}-cron-<job name> where the <job name> is the name (key) used in the values.yaml file.
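For example, for a release named my-galaxy in the galaxy namespace (the job name run-maintenance-now is arbitrary):

kubectl create job --namespace galaxy run-maintenance-now --from=cronjob/my-galaxy-cron-maintenance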

CronJob configuration

The following fields can be specified when defining cron jobs.

Name Definition Required
enabled true or false. If false the cron job will not be run. Default is true Yes
schedule When the job will be run. Use tools such as crontab.guru for assistance determining the proper schedule string Yes
defaultEnv true or false. See the galaxy.podEnvVars macro in _helpers.tpl for the list of variables that will be defined. Default is false No
extraEnv Define extra environment variables that will be available to the job No
securityContext Specifies a securityContext for the job. Typically used to set runAsUser No
image Specify the Docker container used to run the job No
command The command to run Yes
args Any command line arguments that should be passed to the command No
extraFileMappings Allow arbitrary files to be mounted from config maps No

Notes

If specifying the Docker image, both the repository and tag MUST be specified.

  image:
    repository: quay.io/my-organization/my-image
    tag: "1.0"  

The extraFileMappings block is similar to the global extraFileMappings except the file will only be mounted for that cron job. The following fields can be specified for each file.

Name Definition Required
mode The file mode (permissions) assigned to the file No
tpl If set to true the file contents will be run through Helm's templating engine. Defaults to false No
content The contents of the file Yes

See the example cron job included in the values.yaml file for a full example.
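In the meantime, a minimal hedged sketch of how the fields above might fit together (the job name, schedule, image, and command are hypothetical; defer to the example in values.yaml for the authoritative structure):

cronJobs:
  my-cleanup:
    enabled: true
    schedule: "0 3 * * *"
    defaultEnv: true
    image:
      repository: quay.io/my-organization/my-image
      tag: "1.0"
    command: /bin/sh
    args: ["-c", "echo placeholder cleanup task"]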

galaxy-helm's People

Contributors

actions-user, afgane, almahmoud, andersla, astrovsky01, blankenberg, dannon, guerler, ilveroluca, jdavcs, ksuderman, luke-c-sargent, mapk-amazon, nuwang, pcm32, sneumann, viktoriaas, vjalili, yegortokmakov

galaxy-helm's Issues

Better handling of mutable configs

xref: #118
My current plan is to eliminate the manual listings in: https://github.com/galaxyproject/galaxy-helm/blob/master/galaxy/templates/_helpers.tpl#L80 and have two different sections in values: configs and managedConfigs (galaxyproject/galaxy#9444)

managedConfigs will not be editable via pure upgrades (i.e. via configmap changes alone, because those won't propagate since the init container does not re-copy them), i.e. not via CloudMan/Helmsman, at least at the beginning. We could implement logic to make them editable there as well, but we would need to start from the copy in the mutable config directory, rather than the configmap, and make sure the upgrade replaces files in the mutable config directory. Also, the init container will automatically deal with all managedConfigs and put them all in the mutable config directory, rather than listing them by name as is done now.

Database dumps for 20.05

Hi,
I am trying to set up a new database from scratch, but the migrations always fail at some point using the same postgresql release version as in your chart.

  • Is there a new way of skipping the migrations in 20.05 (that was mentioned here: #139 (comment) )?
  • if not, could you add a SQL dump for the 20.05 version?
    Thanks!

First crash causes this dump: https://pastebin.com/kZbxKCQC
Then the deployment-job reloads and seems to work after a large number of restarts (>20)?

[2.0] Helm upgrade breaks run

The following has happened twice (without any other attempt being successful):

When attempting to do a helm upgrade, the running Galaxy container fails with:

[pid: 561|app: 0|req: 16/29] 172.17.0.1 () {36 vars in 411 bytes} [Wed Sep 12 08:28:18 2018] GET /galaxy/ => generated 47534 bytes in 11 msecs (HTTP/1.1 200) 3 headers in 272 bytes (1 switches on core 3)
172.17.0.1 - - [12/Sep/2018:08:28:24 +0000] "GET /galaxy/ HTTP/1.1" 200 - "-" "kube-probe/1.10"
[pid: 560|app: 0|req: 14/30] 172.17.0.1 () {36 vars in 411 bytes} [Wed Sep 12 08:28:24 2018] GET /galaxy/ => generated 47534 bytes in 11 msecs (HTTP/1.1 200) 3 headers in 272 bytes (1 switches on core 2)
galaxy.queue_worker DEBUG 2018-09-12 08:28:29,152 [p:561,w:2,m:0] [ToolConfWatcher.thread] Executing toolbox reload on 'main.web.2'
Exception in thread ToolConfWatcher.thread:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "lib/galaxy/tools/toolbox/watcher.py", line 138, in check
  File "lib/galaxy/webapps/galaxy/config_watchers.py", line 24, in <lambda>
  File "lib/galaxy/queue_worker.py", line 92, in reload_toolbox
  File "lib/galaxy/queue_worker.py", line 111, in _get_new_toolbox
  File "lib/galaxy/tools/__init__.py", line 230, in __init__
  File "lib/galaxy/tools/fetcher.py", line 7, in __init__
  File "lib/galaxy/tools/fetcher.py", line 11, in __resolvers_dict
  File "lib/galaxy/util/plugin_config.py", line 18, in plugins_dict
  File "lib/galaxy/util/submodules.py", line 10, in submodules
  File "lib/galaxy/util/submodules.py", line 28, in __submodule_names
OSError: [Errno 2] No such file or directory: 'lib/galaxy/tools/locations'

galaxy.queue_worker DEBUG 2018-09-12 08:28:29,884 [p:560,w:1,m:0] [ToolConfWatcher.thread] Executing toolbox reload on 'main.web.1'
Exception in thread ToolConfWatcher.thread:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "lib/galaxy/tools/toolbox/watcher.py", line 138, in check
  File "lib/galaxy/webapps/galaxy/config_watchers.py", line 24, in <lambda>
  File "lib/galaxy/queue_worker.py", line 92, in reload_toolbox
  File "lib/galaxy/queue_worker.py", line 111, in _get_new_toolbox
  File "lib/galaxy/tools/__init__.py", line 230, in __init__
  File "lib/galaxy/tools/fetcher.py", line 7, in __init__
  File "lib/galaxy/tools/fetcher.py", line 11, in __resolvers_dict
  File "lib/galaxy/util/plugin_config.py", line 18, in plugins_dict
  File "lib/galaxy/util/submodules.py", line 10, in submodules
  File "lib/galaxy/util/submodules.py", line 28, in __submodule_names
OSError: [Errno 2] No such file or directory: 'lib/galaxy/tools/locations'

172.17.0.1 - - [12/Sep/2018:08:28:30 +0000] "GET /galaxy/ HTTP/1.1" 500 - "-" "kube-probe/1.10"
Error - <class 'mako.exceptions.TemplateLookupException'>: Cant locate template for uri '/js-app.mako'
URL: http://172.17.0.8/galaxy/
File 'lib/galaxy/web/framework/middleware/error.py', line 154 in __call__
File '/export/venv/local/lib/python2.7/site-packages/paste/recursive.py', line 85 in __call__
  return self.application(environ, start_response)
File 'lib/galaxy/web/framework/middleware/statsd.py', line 40 in __call__
File '/export/venv/local/lib/python2.7/site-packages/paste/httpexceptions.py', line 640 in __call__
  return self.application(environ, start_response)
File 'lib/galaxy/web/framework/base.py', line 142 in __call__
File 'lib/galaxy/web/framework/base.py', line 221 in handle_request
File 'lib/galaxy/webapps/galaxy/controllers/root.py', line 111 in index
File 'lib/galaxy/web/base/controller.py', line 324 in template
File 'lib/galaxy/web/framework/webapp.py', line 919 in fill_template
File 'lib/galaxy/web/framework/webapp.py', line 927 in fill_template_mako
File '/export/venv/local/lib/python2.7/site-packages/mako/lookup.py', line 247 in get_template
  return self._check(uri, self._collection[uri])
File '/export/venv/local/lib/python2.7/site-packages/mako/lookup.py', line 349 in _check
  "Cant locate template for uri %r" % uri)
TemplateLookupException: Cant locate template for uri '/js-app.mako'


CGI Variables
-------------
  DOCUMENT_ROOT: '/usr/share/nginx/html'
  HTTP_ACCEPT_ENCODING: 'gzip'
  HTTP_CONNECTION: 'close'
  HTTP_HOST: '172.17.0.8:80'
  HTTP_USER_AGENT: 'kube-probe/1.10'
  PATH_INFO: '/'
  REMOTE_ADDR: '172.17.0.1'
  REMOTE_PORT: '55488'
  REQUEST_METHOD: 'GET'
  REQUEST_URI: '/galaxy/'
  SCRIPT_NAME: '/galaxy'
  SERVER_PORT: '80'
  SERVER_PROTOCOL: 'HTTP/1.1'
  UWSGI_SCHEME: 'http'


WSGI Variables
--------------
  application: <paste.recursive.RecursiveMiddleware object at 0x7fcbfd230990>
  controller_action_key: u'web.root.index'
  is_api_request: False
  paste.cookies: (<SimpleCookie: >, '')
  paste.expected_exceptions: [<class 'paste.httpexceptions.HTTPException'>]
  paste.httpexceptions: <paste.httpexceptions.HTTPExceptionHandler object at 0x7fcbfc3197d0>
  paste.recursive.forward: <paste.recursive.Forwarder from /galaxy>
  paste.recursive.include: <paste.recursive.Includer from /galaxy>
  paste.recursive.include_app_iter: <paste.recursive.IncluderAppIter from /galaxy>
  paste.recursive.script_name: '/galaxy'
  paste.throw_errors: True
  request_id: 'd4301300b66511e8ad210242ac110008'
  uwsgi.core: 1
  uwsgi.node: 'sweet-llama-galaxy-stable-f49b4564f-c9kbn'
  uwsgi.version: '2.0.17'
  webob._parsed_query_vars: (GET([]), '')
  wsgi process: 'Multi process AND threads (?)'
  wsgi.file_wrapper: <built-in function uwsgi_sendfile>
------------------------------------------------------------

I reckon that this must be associated with the fact that both the new and old Galaxy are looking at the same PVC.

Installation of tools via ephemeris produces a timeout

Installing tools via ephemeris using:

shed-tools install --install_resolver_dependencies \
    --toolsfile $lock_file \
    --galaxy $GALAXY_INSTANCE \
    --api_key $API_KEY 2>&1 | tee -a $name.log

produces the following time outs:

18:07:05 (12/46) Installing repository cutadapt from lparsons to section "RNA-Seq" at revision 660cffd8d92a (TRT: 0:12:18.154861)
18:07:50 Timeout during install of cutadapt, extending wait to 1h
18:07:50 	repository cutadapt installed successfully (in 0:01:20.257473) at revision 660cffd8d92a
18:07:50 (13/46) Installing repository cutadapt from lparsons to section "RNA-Seq" at revision e4691e1589d3 (TRT: 0:13:38.412541)
18:09:23 Timeout during install of cutadapt, extending wait to 1h
18:09:23 	repository cutadapt installed successfully (in 0:01:10.167707) at revision e4691e1589d3
18:09:23 (14/46) Installing repository multiqc from iuc to section "RNA-Seq" at revision 3d93dd18d9f8 (TRT: 0:14:48.580530)
18:10:08 Timeout during install of multiqc, extending wait to 1h
18:10:17 	repository multiqc installed successfully (in 0:01:19.812481) at revision 3d93dd18d9f8
18:10:17 (15/46) Installing repository multiqc from iuc to section "RNA-Seq" at revision b2f1f75d49c4 (TRT: 0:16:08.393306)

They seem to be harmless, but I wonder if this is somehow related to the lack of an nginx setup at the front?

The tools on the tool panel don't look good though (and we use this same setup to provision other Galaxy spin-ups), but maybe that is because the process is still ongoing?

Is it harmful to have more than a single web handler when requesting these installations? Maybe I should drop the --install_resolver_dependencies flag in this case? I know that, based on the logs, the conda packages are not being installed as expected. I will try tools once they are all installed and see whether they work out of the box with the mulled containers as I expect.

Make a release that is backwards compatible until k8s 1.13 and then move on to remove deprecations

You might have seen that GKE is moving to k8s 1.16 which will drop a number of deprecated API object versions. We should merge the easiest PRs currently waiting, make a release that we can point to for k8s versions before 1.16, and start moving on to be ahead of the change on GKE.

Hello Google Kubernetes Engine Customer,

We are writing to let you know that upstream Kubernetes open source is removing deprecated APIs in v1.16. Once Google Kubernetes Engine (GKE) upgrades your clusters to v1.16, you will no longer be able to use the deprecated API versions.

What do I need to know?

GKE will gradually upgrade clusters to Kubernetes v1.16:

clusters subscribed to the regular release channel will begin upgrades on or after April 9, 2020
clusters subscribed to the stable release channel and non-release-channel clusters will be upgraded later this year; a reminder will be sent before these clusters are upgraded to v1.16
For more control over when an auto-upgrade can occur (or must not occur), you can configure maintenance windows and exclusions.

As the Kubernetes API evolves they get periodically reorganized or upgraded. When APIs evolve, the old API is deprecated and eventually removed.

The v1.16 release won’t serve deprecated versions of the following APIs, in favor of newer and more stable API versions:

DaemonSet versions extensions/v1beta1 and apps/v1beta2 are deprecated

use apps/v1, available since Kubernetes 1.9
Deployment versions extensions/v1beta1, apps/v1beta1, and apps/v1beta2 are deprecated

use apps/v1, available since Kubernetes 1.9
ReplicaSet versions extensions/v1beta1, apps/v1beta1, and apps/v1beta2 are deprecated

use apps/v1, available since Kubernetes 1.9
StatefulSet versions apps/v1beta1 and apps/v1beta2 are deprecated

use apps/v1, available since Kubernetes 1.9
NetworkPolicy version extensions/v1beta1 is deprecated

use networking.k8s.io/v1, available since Kubernetes 1.8
PodSecurityPolicy version extensions/v1beta1 is deprecated

use policy/v1beta1, available since Kubernetes 1.10
As of Kubernetes 1.16, API clients will no longer be able to use the deprecated API versions, and manifests using those versions will no longer be able to be applied.

Note that all previously persisted objects remain functional. They can be read and updated via the new API versions, before and after the Kubernetes 1.16 upgrade.

What do I need to do?

Migrate to use the current API versions before your clusters are upgraded to Kubernetes v1.16 to ensure your API clients and resource manifests can access and update API resources without interruption. More information is available from the Kubernetes project.

Postgres and rabbitmq leaving PVCs bound behind after helm delete

Issuing helm delete <release-name> is leaving behind postgres and (now) rabbitmq bound PVCs:

(base) C02WF18BHV2H:galaxy-helm pmoreno$ kubectl get pvc
NAME                                         STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-bailing-kangaroo-galaxy-postgres-0      Bound         pvc-f0b722b8-7cf2-4f22-b798-4b508eb1beaa   8Gi        RWO            standard       8d
data-bailing-kangaroo-galaxy-rabbitmq-0      Bound         pvc-6f344881-1dd7-41e2-9b83-a01b1121e7df   8Gi        RWO            standard       8d
data-bald-jaguar-galaxy-postgres-0           Bound         pvc-b2bc1dfe-4c92-41c5-abba-5dbbaaa86b2e   8Gi        RWO            standard       8d
data-bald-jaguar-galaxy-rabbitmq-0           Bound         pvc-8dbccbf8-68a3-4456-be85-8a701a5ba144   8Gi        RWO            standard       8d
data-bald-olm-galaxy-postgres-0              Bound         pvc-daf09629-467b-4fcc-8d0d-2f08970093c8   8Gi        RWO            standard       8d

Maybe there are some defaults in the values that we should set for those dep charts, or be explicit in the docs that PVCs will be kept and how to avoid it.

Init containers copy and upgrades

This way of adding files https://github.com/galaxyproject/galaxy-helm/blob/master/galaxy/templates/_helpers.tpl#L84 causes a problem when upgrading from 20.05 to the dev image, which includes changes in tools/data_source/upload.py. The instance was booted up with the 20.05 image, so the old file had been saved to database/tools/data_source/upload.py; when the new image booted up with the updated file, it updated it under tools/data_source/upload.py (which gave the confusing impression that it had been updated), but did not overwrite the saved file since it already existed.
An immediate solution could be to just let that specific directory update on each launch. A more complex solution, which might give us a good backbone for future work, would be a post-upgrade Helm hook that copies from the new image to the NFS after, and only if, all probes come up. Just copying all the time would leave the possibility of a mismatch: since the copy happens as part of the init container, if the new main container is, for example, stuck in a crash loop, there would be a mismatch between the NFS (newer file) and the still-running pod (older file) until the error is resolved and the upgrade succeeds.
I would probably deal with this issue at the same time as #123

[2.0] welcome.html is not being served correctly

On the new helm chart setup, content is served under the http://<host:port>/galaxy/ prefix. In this setting, adding a welcome.html page as described in docker-galaxy-stable (as done here) means Galaxy points the browser to http://<host:port>/galaxy/web/welcome.html, while nginx appears to serve the page at http://<host:port>/web/welcome.html.

Helm Chart: Pulling it all together

Hi,

Thought of creating a new issue to discuss how we can pull all this work together in time for GCC, for integration with CloudMan. The original issues have become a bit stale now but for reference: #1, #2 and #4.

Have taken a stab at assembling some tasks/questions that (I think) remain, but may be off base on a few, so please correct/edit.

  1. Merge final chart into master and remove simple chart etc. so we have a final set of charts to work on.
  2. Currently, the Docker images come from various sources, including pcm32 and phnmnl registries. Consolidate all docker images for the helm charts into galaxyproject's registry.
  3. Currently, the chart is using a custom Postgres container+service definitions. Transition to the official Postgres helm chart? This seems safer since it's likely to get HA modes and better health checks and maintenance as we go along.
  4. Move proftpd to separate, self-contained chart?
  5. New chart for Pulsar? (probably not for this iteration though).
  6. Have a standardized mechanism for exposing all Galaxy config settings.
  7. Rolling upgrade support for the Galaxy charts. Do upgrades work smoothly at present? We seem to need helm post-install/upgrade jobs for database migration.
  8. Galaxy pod scale up/down support. Can we also have multiple Galaxy pods running for a zero-downtime rolling upgrade of Galaxy?
  9. Add an option to switch between the HTCondor and Kubernetes Job runners (that is, integrate the work contributed by Microsoft/AnswerALS), with possible support for future options (e.g. Pulsar, Slurm).
  10. Perhaps have a separate chart for Galaxy alone, and an umbrella chart that pulls everything together?

ping: @abhi2cool @afgane @bgruening @jmchilton @natefoo @rc-ms @pcm32

Refactoring on secret breaks helm upgrade at same version 3.2.0 or helm install on top of pre-existing PVCs

I had an instance provisioned with the last commit I got merged a few days before #109 was merged. On an upgrade operation against the current master you get:

$ kubectl get pods
NAME                                                   READY   STATUS                       RESTARTS   AGE
lopsided-wildebeest-galaxy-job-0-596c85969c-nwwlz      0/1     Init:0/3                     0          2m18s
lopsided-wildebeest-galaxy-postgres-0                  0/1     CreateContainerConfigError   0          106s
lopsided-wildebeest-galaxy-web-688f57b96-zqhrp         0/1     Init:0/2                     0          2m18s
lopsided-wildebeest-galaxy-web-6c454d48b6-d6b5f        0/1     Running                      0          8d
lopsided-wildebeest-galaxy-web-6c454d48b6-vdrgt        0/1     Running                      0          8d
lopsided-wildebeest-galaxy-web-6c454d48b6-x5mxh        0/1     Running                      0          8d
lopsided-wildebeest-galaxy-workflow-7cfcbf96b5-8br67   1/1     Running                      0          9d
lopsided-wildebeest-galaxy-workflow-7fc6796db7-8d8sn   0/1     Init:0/2                     0          2m18s

On inspection, this is probably due to the refactoring of the secret.

  Warning  Failed     12s (x4 over 53s)  kubelet, 192.168.0.29  Error: secrets "lopsided-wildebeest-galaxy-secrets" not found

Given this refactoring, we should probably produce a new version so that whoever was using 3.2.0 is safe at least on that version.

Add support for CVMFS CSI

Given there can be many job containers running on a given K8S node, it does not seem to make sense for each of them to have a CVMFS client for accessing Galaxy's reference data. This would complicate the job container and lead to very redundant network traffic, with each container retrieving its own copy of the data.

It seems CVMFS CSI is intended to solve this issue by creating a volume on a node and making it available to the containers running on the node: https://gitlab.cern.ch/cloud-infrastructure/cvmfs-csi

Has anyone worked with this perhaps? Or some other CSI implementation as a concept? Is my understanding captured above even correct?

CVMFS disabled still leaves breaking config in galaxy.yml

Starting the helm chart with CVMFS disabled still leaves (with chart versions 3.2.0 to 3.4.2 at least) a config map with galaxy.yml values that break because they point to the expected CVMFS path. I think it used to be that providing a galaxy.yml file in the Helm configs replaced the galaxy.yml in the config map, but now apparently a merge is going on. The deployment fails with:

No handlers could be found for logger "__main__"
Traceback (most recent call last):
  File "/galaxy/server/scripts/galaxy-main", line 299, in <module>
    main()
  File "/galaxy/server/scripts/galaxy-main", line 295, in main
    app_loop(args, log)
  File "/galaxy/server/scripts/galaxy-main", line 142, in app_loop
    attach_to_pools=args.attach_to_pool,
  File "/galaxy/server/scripts/galaxy-main", line 108, in load_galaxy_app
    **kwds
  File "/galaxy/server/lib/galaxy/app.py", line 115, in __init__
    self._configure_tool_data_tables(from_shed_config=False)
  File "/galaxy/server/lib/galaxy/config/__init__.py", line 1077, in _configure_tool_data_tables
    config_filename=self.config.tool_data_table_config_path)
  File "/galaxy/server/lib/galaxy/tools/data/__init__.py", line 80, in __init__
    self.load_from_config_file(single_config_filename, self.tool_data_path, from_shed_config=False)
  File "/galaxy/server/lib/galaxy/tools/data/__init__.py", line 117, in load_from_config_file
    tree = util.parse_xml(filename)
  File "/galaxy/server/lib/galaxy/util/__init__.py", line 236, in parse_xml
    root = tree.parse(fname, parser=ElementTree.XMLParser(target=DoctypeSafeCallbackTarget()))
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 647, in parse
    source = open(source, "rb")
IOError: [Errno 2] No such file or directory: '/cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml'

The only way to overcome this, without changing the chart version, is to make sure that the provided galaxy.yml overrides the values injected with CVMFS paths. I had to add the following to my local helm config:

configs:
  galaxy.yml:
    ...
    builds_file_path: "{{.Values.persistence.mountPath}}/tool-data/shared/ucsc/builds.txt"
    tool_data_table_config_path: "/galaxy/server/config/tool_data_table_conf.xml.sample"

That seems to fix it, but we need to check that running with CVMFS disabled works as expected. Also, not setting it explicitly to enabled: false still causes the CVMFS PVCs to be created (and hence Galaxy never starts).

The initdb secret cannot be created

Hi,
when trying to install galaxy-helm for the first time on Kubernetes 1.18 bare-metal, I get the following message: Secret "galaxy-galaxy-initdb" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
Can you either look into that or provide another way to initialize postgresql?
Thanks!

PS: the init2 restore file is extremely long; when using kubectl apply it ends up in the annotation.

galaxy deployment's preStop command fails

The preStop hook in the galaxy pod is failing to run, so I'm getting these events when I try to pull it down:

50s         Warning   FailedPreStopHook   Pod                     Exec lifecycle hook ([./run.sh --stop]) for Container "galaxy-stable" in Pod "quiet-parrot-galaxy-stable-687dbbc84d-7v5mj_default(7472cdb1-56dd-11e9-9f14-080027a959c5)" failed - error: command './run.sh --stop' exited with 126: , message: "OCI runtime exec failed: exec failed: container_linux.go:344: starting container process caused \"exec: \\\"./run.sh\\\": stat ./run.sh: no such file or directory\": unknown\r\n"

I think the path to run.sh specified in the lifecycle.preStop.exec.command should be relative to the export directory (which is specified as the workingDir), so "./galaxy-central/run.sh".

Initial proposal

We should create the community maintained Galaxy deployment on Kubernetes based on:

  • A Helm Chart for Galaxy.
  • A minimal Galaxy docker image
    • Which relies on ansible-galaxy-extras for installation of Galaxy
    • That can accept injection of settings on the galaxy.ini file at runtime through environment variables (the image needs to have a way of enacting this into the ini file).
    • From ubuntu:16.04? Apparently has issues with nginx Galaxy plugin according to @bgruening, might need to be 14.04.
  • Uses Postgresql in a separate Pod.
    • User authentication and database connection details are injected at runtime through Kubernetes Secrets/Config Maps.
    • Galaxy Communicates to Postgresql through Service.
    • Uses a standard Postgresql docker image (9.3 tested at PhenoMeNal)
  • Uses nginx on the same pod as galaxy, separate container
    • Uses standard nginx container
    • Config injected at runtime through config maps.
  • Guide on how to make your own flavour of the Galaxy container, based on the minimal one, and following the paradigm that bgruening/docker-galaxy-stable has.
    • Can change settings on runtime as well.
    • Tools XML wrappers are part of the flavours.
  • Galaxy ReplicationController accessible through a Service.

Helm is a package manager for Kubernetes which allows parametrised deployments with sensible defaults. That means other people can use the same Helm setup but change, through parameters, whatever we decide can be changed, like the docker container used for Galaxy.

That for now. Will add more soon.

underlying postgres chart not responding to livenessProbes and readinessProbes changes

When I set:

postgresql:
  imageTag: "9.6.5_for_18.05"
  livenessProbe:
    initialDelaySeconds: 1500
  readinessProbe:
    initialDelaySeconds: 1500

the probes don't change. Probably an underlying issue of the postgres chart. Apparently stable/postgres has advanced a lot but is now geared for postgres 10, so we might want to be careful with that change to keep support for older versions.

Heads up for job not being able to write to container space and producing failures.

Most of my usage of Galaxy-k8s in the past relied on tools being run as root, which is of course not ideal. In the current chart that is no longer the case (very good!), however that might bring unexpected issues when running some tools, such as the one seen here. In that case I had to set a profile for the tools (18.01), but I wonder if there is something we can do within this setup to avoid having to do that and be instantly compatible with other tools that might exhibit this behaviour.

Document all new values in README table

Some values were added and modified without reflecting the change in the README config table.

  • Paper-cut: Going through values and documenting the latest set of values as of now.
  • Paper-cut: Add a contribution default checklist for PRs
  • Long-term: Add a CI check to make sure any PR-ed values are reflected in the README.

Make it easier to override tool mappings

container_mapper_rules.yml allows you to manually remap tools to specific container overrides, or reassign their default resource allocations. However, because the rules are structured as an array, it is difficult to override the defaults in the helm chart when deploying in production. Therefore, turning it into a key-based structure would greatly simplify administration in practice:

For example:

     mappings:
       - tool_ids:
           - toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.5+galaxy6
         container:
           docker_container_id_override: cloudve/jbrowse:1.16.5
       - tool_ids:
           - sort1
           - Grouping1
         container:
           docker_container_id_override: {{ .Values.image.repository }}:{{ .Values.image.tag }}
           resource_set: small
       - tool_ids:
           - toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/bwameth/bwameth/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/featurecounts/featurecounts/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/valet/valet/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/varscan_somatic/varscan_somatic/.*
           - toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_bam2wig/.*
         container:
           resource_set: medium

Would become something like:

     mappings:
       jbrowse:
         tool_ids:
           - toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.5+galaxy6
         container:
           docker_container_id_override: cloudve/jbrowse:1.16.5
       galaxy_tools:
         tool_ids:
           - sort1
           - Grouping1
         container:
           docker_container_id_override: {{ .Values.image.repository }}:{{ .Values.image.tag }}
           resource_set: small
       medium_group:
         tool_ids:
           - toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/bwameth/bwameth/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/featurecounts/featurecounts/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/hisat2/hisat2/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/valet/valet/.*
           - toolshed.g2.bx.psu.edu/repos/iuc/varscan_somatic/varscan_somatic/.*
           - toolshed.g2.bx.psu.edu/repos/nilesh/rseqc/rseqc_bam2wig/.*
         container:
           resource_set: medium

403 error minikube local deployment with galaxy-stable-phenomenal-18.01-minikube.yaml

logs-from-galaxy-k8s-in-f1-galaxy-stable-788f4fb5f8-vgn75.txt

Hi all,

I used galaxy-stable-phenomenal-18.01-minikube.yaml and helm to deploy galaxy-k8s on my Windows 10 computer:

helm install -f galaxy-stable-phenomenal-18.01-minikube.yaml --version 1.5.5 galaxy-helm-repo/galaxy-stable

The created pods ran on the local cluster but could not be connected to. The log file is attached.

Any idea what has gone wrong? Many thanks.

Set additional Galaxy settings

id_secret: this is urgent and should be an auto-generated secret.
galaxy_infrastructure_url should point to the service_name of Galaxy
galaxy_infrastructure_web_port may need to be set.

Install not prefixed with release name

Currently, the galaxy installation is not prefixed with the release name, which means that only one instance of the chart can be installed per namespace. It would be good to support the standard convention of prefixing by release name so that we can support multiple galaxy installations. This will be particularly useful in the future when multiple users have private Galaxy instances, as well as during testing, so that previous artefacts don't affect the current installation. Low priority item for now.

Adding custom tools

Is there an elegant way to add custom tools in a (somewhat-)declarative way? I naively attempted to define a simple custom tool in the extraFileMappings resource:

# overrides.yaml
configs:
  tool_conf.xml: |
    ... copy/paste from galaxy-helm/galaxy/values.yaml ...
    <section id="mytool" name="My Tool">
      <tool file="mytool/mytool.xml" />
    </section>

extraFileMappings:
  /galaxy/server/tools/mytool/mytool.xml:
    useSecret: false
    applyToJob: true
    applyToWeb: true
    content:

  /galaxy/server/tools/mytool/mytool.py:
    useSecret: false
    applyToJob: true
    applyToWeb: true
    content:

And then set their "content" values from the corresponding files (using Helm 3.1.x):

helm upgrade galaxy ./galaxy-helm/galaxy --values overrides.yaml \
  --set-file 'extraFileMappings./galaxy/server/tools/mytool/mytool\.xml.content=relative/path/to/mytool\.xml' \
  --set-file 'extraFileMappings./galaxy/server/tools/mytool/mytool\.py.content=relative/path/to/mytool\.py'

However, configMaps created from extraFileMappings do not appear to be mounted in the k8s Job resource that's created to run the tool.

TODOs for integration with docker-galaxy-stable compose containers

These are the main actionables I need to go through to have a first container based on galaxy-stable compose containers working with the helm deployment, as discussed in #2. More to come here as I go through them and find dependent tasks.

  • Fix pykube version on ansible-galaxy-extras to 0.15.0
  • Create proposal(1) on docker-galaxy-stable for intermediate image between galaxy-base and galaxy-web, from where galaxy-k8s could stem from.
  • Create base galaxy-k8s image (2) based on proposal (1).
  • Make ansible/run_galaxy_config.sh work on /galaxy-central/ on image (2). Alternatively, replace its functionality.
    • Inject workflows given.
    • Setup database.
    • Setup admin user.
  • Handle galaxy.ini.injected from user on image (2)
  • Add missing k8s_* vars on job_conf jinja template on ansible-galaxy-central.
  • Write down what seem to be our minimal dependencies (or link to such revision).
  • Try using GALAXY_RELEASE related vars as in https://github.com/bgruening/docker-galaxy-stable/blob/master/galaxy/Dockerfile#L14 to set repo and branch to use for build.
  • Evaluate whether this strategy serves the purpose of sharing tools with jobs pods: https://github.com/bgruening/docker-galaxy-stable#integrating-non-tool-shed-tools-into-the-container--toc
  • Use this approach for the image (2) personalization: https://github.com/bgruening/docker-galaxy-stable#personalize-your-galaxy--toc
  • Handle missing datatypes.
  • Helm needs to use both a galaxy-init and a galaxy-web derived images for a minimal working example.
  • Sort out issue with larger parts of job_conf.xml that we need injected and that are not easily injectable through variables.
    • Deal with handlers in job_conf and galaxy.ini
      • Add to the job_conf the handlers and leave them commented in the phenomenal current setup.
      • Figure out handlers setup in compose images.
    • Dynamic destinations.
    • Container destinations.
  • Harmonize eventually directory structure.

RFC: Branch organization and repo rename

I'd like to propose a branch (re)organization on this repo to the following:

  • Create/use version branches: v1.x, v2.x, etc. These branches would contain previous versions of the chart that are not really compatible with the version currently being developed. Note that these would not be chart releases, as indicated by the flexible 1.x number. The actual chart releases are captured in the Chart.yaml file.
  • Use master branch as the default branch. This is a standard across GitHub and would require least changes to contributors' pipeline.
  • Cleanup stale and unused branches

In addition, I'd like to suggest we rename this repo to galaxy-helm to make it more indicative of the actual repo content.

  • Rename repo to galaxy-helm

Best way to have an install / update rolling job for specific set of tools?

At this point we have deployments on multiple setups, including non-cloud ones. So we keep all the tools that we want installed in our instances in a yaml file in a git repo, from where ephemeris can install them from a toolshed. For our non-cloud deployments, we have some centralised CI that updates tools on each instance whenever there are changes in the git repo with the yaml files.

On spin-up of a new instance with the helm chart, I would like to have a k8s job that runs once Galaxy is operational, possibly using a master API key, to run ephemeris using a desired git repo and install those tools. Then I would like to have that running with some periodicity, I guess through a cronjob, to have tools updated as new revisions come.

As such, I see this as a Job (for the first run) and a Cron Job (for subsequent runs), but of course I don't like the duplication arising there.
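
A minimal sketch of the CronJob half, assuming a generic Python image, a hypothetical Secret named galaxy-api-key holding the master API key, and a tool list mounted at /config/tools.yml (volume definitions omitted); batch/v1 assumes a recent cluster, older ones used batch/v1beta1:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: galaxy-tool-updates
spec:
  schedule: "0 3 * * 0"            # weekly; adjust as needed
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: ephemeris
              image: python:3.11-slim   # any image where ephemeris can be pip-installed
              command: ["/bin/sh", "-c"]
              args:
                - |
                  pip install ephemeris
                  shed-tools install -g http://my-galaxy/galaxy -a "$GALAXY_API_KEY" -t /config/tools.yml
              env:
                - name: GALAXY_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: galaxy-api-key   # hypothetical Secret
                      key: api-key

For the very first run, the same pod template can be wrapped in a plain Job, or the CronJob can simply be triggered once by hand with kubectl create job --from=cronjob/galaxy-tool-updates initial-install, which avoids duplicating the spec.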

When running with multiple web processes, I have noticed that after installing a number of tools, not all web processes see all of them, and restarting the processes fixes this. I guess this would require the newly implemented rollout restarts for deployments, which I presume would need to be triggered via the service account once the job successfully installs/updates the tools.

How would you do this? Would you use Jobs and CronJobs, or a different strategy?

Thanks!

Ingress & PriorityClass versions are not compatible with k8s 1.13

Using current master. I'm running with the following versions:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-23T14:21:36Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

$ helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}

Using a values file for my settings:

webHandlers:
  replicaCount: 3

jobHandlers:
  replicaCount: 1

persistence:
  existingClaim: galaxy-pvc-nfs

service:
  type: NodePort
  port: 30700

In the values file I have additional configs for galaxy.yaml, but I doubt they would have any impact.

When executing inside the galaxy directory:

helm install --values galaxy-config-helmv3.yaml .

I get:

Error: validation failed: [unable to recognize "": no matches for kind "Ingress" in version "networking.k8s.io/v1beta1", unable to recognize "": no matches for kind "PriorityClass" in version "scheduling.k8s.io/v1"]
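
For reference, a sketch (not what the chart currently does) of how the Ingress template could fall back to the older API group that 1.13 still serves, using Helm's built-in Capabilities object; PriorityClass would need the same treatment with scheduling.k8s.io/v1beta1:

{{- if .Capabilities.APIVersions.Has "networking.k8s.io/v1beta1" }}
apiVersion: networking.k8s.io/v1beta1
{{- else }}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
# ... rest of the Ingress template unchanged ...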

Thanks!

ftp upload support

Currently the Galaxy Helm chart v3 does not provide support for an FTP upload mechanism, which was available in v2. Unless more modern Galaxy versions have a better way of uploading large files, we should probably add this support again. Unfortunately, there is no existing proftpd chart to simply re-use as a dependency, so we need to decide whether to write a chart for that and use it as a dependency, or simply add all the logic to the current chart, partly reusing what was there for proftpd in chart v2.

Galaxy-stable chart doesn't start up with default settings.

How to recreate:
I checked out the develop branch and tested against minikube. I did not make any changes; I simply ran a helm install .

The galaxy-stable chart installs without a hitch. However, when attempting to access the Galaxy service via the NodePort, it simply returned a 404. The galaxy container's logs revealed the error:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
It appears that the database connection is not being overridden by GALAXY_CONFIG_DATABASE_CONNECTION as expected.

I then exec'd into the container and manually ran the following:

export GALAXY_CONFIG_DATABASE_CONNECTION=postgresql://galaxy:change-me@postgresql-for-galaxy/galaxydb?client_encoding=utf8
/usr/bin/startup

Still the same error.

Finally, I edited galaxy-central/config/galaxy.ini and set the database connection manually there. It then proceeded to start up and connected to the Postgres database instead of SQLite. However, it ran all the migrations from scratch, indicating that it's not a database with the migrations already included.

It then failed with the following:

Traceback (most recent call last):
  File "/export/galaxy-central/lib/galaxy/webapps/galaxy/buildapp.py", line 58, in paste_app_factory
    app = galaxy.app.UniverseApplication( global_conf=global_conf, **kwargs )
  File "/export/galaxy-central/lib/galaxy/app.py", line 185, in __init__
    self.job_manager = manager.JobManager( self )
  File "/export/galaxy-central/lib/galaxy/jobs/manager.py", line 23, in __init__
    self.job_handler = handler.JobHandler( app )
  File "/export/galaxy-central/lib/galaxy/jobs/handler.py", line 33, in __init__
    self.dispatcher = DefaultJobDispatcher( app )
  File "/export/galaxy-central/lib/galaxy/jobs/handler.py", line 764, in __init__
    self.job_runners = self.app.job_config.get_job_runner_plugins( self.app.config.server_name )
  File "/export/galaxy-central/lib/galaxy/jobs/__init__.py", line 632, in get_job_runner_plugins
    rval[id] = runner_class( self.app, runner[ 'workers' ], **runner.get( 'kwds', {} ) )
  File "/export/galaxy-central/lib/galaxy/jobs/runners/kubernetes.py", line 44, in __init__
    assert KubeConfig is not None, K8S_IMPORT_MESSAGE
AssertionError: The Python pykube package is required to use this feature, please install it or correct the following error:
ImportError No module named pykube.config
root@torpid-kudu-galaxy-stable-bbcf5b445-4
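
As a workaround sketch while the image itself is sorted out, pykube (pinned to 0.15.0, as in the task list earlier in this document) could be installed into the Python environment Galaxy runs from inside the container:

# run inside the running container, in Galaxy's Python environment
pip install pykube==0.15.0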

Is this a problem with an out-of-date container? @pcm32 Where can I view the Dockerfile that was used to build pcm32/galaxy-stable-k8s? I checked https://hub.docker.com/r/pcm32/galaxy-stable-k8s/ but it does not display the Dockerfile.

The docker images used by minikube were listed as:

pcm32/galaxy-stable-k8s                                          latest              a4e5ec729667        6 months ago        898MB
pcm32/galaxy-stable-k8s-init                                     latest              01867b81731e        7 months ago        1.41GB

Problem setting file through --set-file

I'm trying to upgrade an existing installation where I'm adding some setup for dynamic destinations like this:

helm upgrade --values=galaxy-config-helmv3.yaml \
--set-file "jobs.rules.k8s_destinations\.py"=files/k8s_destinations.py falling-moth \
--version 3.2.0 ~/Development/galaxy-helm/galaxy

However, the config map where the job rules are set doesn't seem to like it, failing with:

UPGRADE FAILED
ROLLING BACK
Error: render error in "galaxy/templates/configmap-galaxy-rules.yaml": template: galaxy/templates/configmap-galaxy-rules.yaml:14:17: executing "galaxy/templates/configmap-galaxy-rules.yaml" at <$entry>: wrong type for value; expected string; got map[string]interface {}
Error: UPGRADE FAILED: render error in "galaxy/templates/configmap-galaxy-rules.yaml": template: galaxy/templates/configmap-galaxy-rules.yaml:14:17: executing "galaxy/templates/configmap-galaxy-rules.yaml" at <$entry>: wrong type for value; expected string; got map[string]interface {}

I have tried both the relative path and the absolute path to the additional file, with no difference. The file is valid Python code (but I would guess that makes no difference to Helm). Anything obvious that I'm doing wrong? Thanks!

Helm version 2.13.1 on macOS.
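
For comparison, a sketch of the equivalent values-file form, assuming jobs.rules is a plain map of file name to file content (which is what the escaped key in --set-file suggests):

jobs:
  rules:
    k8s_destinations.py: |
      # paste the contents of files/k8s_destinations.py here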

Handling pre-installed data

Currently, the table creation script is run on database startup (https://github.com/bgruening/docker-galaxy-stable/blob/master/compose/galaxy-postgres/init-galaxy-db.sql.in). How does this work with pre-loaded data, e.g. preloaded workflows?

Do we have to consider a solution like the one below:
https://www.3pillarglobal.com/insights/how-to-initialize-a-postgres-docker-million-records?cn-reloaded=1

Or can we extend the postgres image to have pre-installed data, and have it push that data out to the target volume?
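
A sketch of the second option, relying on the stock postgres image behaviour of executing any *.sql or *.sh files in /docker-entrypoint-initdb.d/ once, when the data directory is first initialised (the file names here are placeholders):

FROM postgres:9.6
# Runs only on first initialisation, i.e. when the data directory is empty.
COPY init-galaxy-db.sql /docker-entrypoint-initdb.d/
COPY preload-data.sql /docker-entrypoint-initdb.d/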

admin.email isn't added to galaxy_conf.admin_users

Using the galaxy-stable chart version 2.0.2. When I configure a new user for my deployment through the admin: values (username, email, etc.), it doesn't get added to the galaxy_conf.admin_users list. I would expect this to happen automatically, since the new user account is configured in a section called admin :-)
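
As a workaround sketch until the chart wires this up automatically, the same address can be listed explicitly under galaxy_conf (the email here is a placeholder):

admin:
  email: admin@example.org
galaxy_conf:
  admin_users: admin@example.org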

Building Galaxy Interactive Environment with Kubernetes on my own server

I am trying to deploy galaxy:19.05 with Kubernetes following this project, and it works.

But after setting up the interactive environments following https://docs.galaxyproject.org/en/master/admin/special_topics/interactive_environments.html, an error shows up when I try to use the GIEs on the website.

error: GIEs are a fairly complex deployment and sometimes need maintenance. It seems that at the moment they are not launching, we are investigating.

I think it may be because sshfs was not set up in the right way, but I don't know how to set it up in the correct container (there is a series of containers running Galaxy together).

Does anybody know how to solve this problem? Thanks so much!

[2.0] db-connection config map doesn't get deleted on helm delete

Using the new chart specification:

 helm list
NAME             	REVISION	UPDATED                 	STATUS	CHART              	NAMESPACE
existing-platypus	1       	Tue Jul 10 09:44:57 2018	FAILED	galaxy-stable-2.0.0	default  
C02WF18BHV2H:galaxy-kubernetes pmoreno$ helm delete existing-platypus
release "existing-platypus" deleted
C02WF18BHV2H:galaxy-kubernetes pmoreno$ helm install -f ../container-galaxy-sc-tertiary/helm-configs/tertiary-portals-galaxy-18.05-minikube.yaml ./galaxy-stable
Error: configmaps "db-connection" already exists

Might be an issue of the chart not starting correctly the first time though... will monitor.
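
In the meantime, a manual cleanup sketch (assuming the release lives in the default namespace):

kubectl delete configmap db-connection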

Syncing with galaxy-docker-project

I'd like to start/document a discussion about what it will take to integrate and/or interchange resources from this repo with those from the https://github.com/bgruening/docker-galaxy-stable/tree/master/compose repo.

A few things come to mind and please add others:

  • For plugging in alternate containers, the README here says the alternate container "needs to be a Galaxy container compliant with our setup" - can we start by documenting what that setup is and then perhaps generalizing/modifying it as needed to accommodate containers from the galaxy-docker-project?
  • Allow conditional use and integration of containers for additional services, primarily FTP but also HTCondor and Slurm, ideally using containers available from the above mentioned repo, particularly bgruening/docker-galaxy-stable#347
  • Leverage env vars to enable runtime configuration changes (possibly implying integration with confd to react to those changes).

I'm sure there's more, but those seem like the minimal set given my current familiarity with the two efforts. Please comment and let's see if and how this can be accomplished.

[2.0] Breaks HTTP access to history or welcome page depending on how it is started

I'm trying the new 2.0 chart spec with a completely PhenoMeNal-agnostic setup. If I set ingress to false, without changing any other defaults, it shows the following error in red on the Galaxy main page:

{
  "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
  "onLine": true,
  "version": "18.05",
  "xhr": {
    "readyState": 4,
    "responseText": "{\"err_msg\": \"History is not accessible by user\", \"err_code\": 403002}",
    "responseJSON": {
      "err_msg": "History is not accessible by user",
      "err_code": 403002
    },
    "status": 403,
    "statusText": "Forbidden"
  },
  "options": {
    "parse": true,
    "data": "keys=size%2Cnon_ready_jobs%2Ccontents_active%2Chid_counter",
    "emulateHTTP": false,
    "emulateJSON": false,
    "textStatus": "error",
    "errorThrown": "Forbidden"
  },
  "url": "http://192.168.99.100:30700/galaxy/api/histories/f2db41e1fa331b3e?keys=size%2Cnon_ready_jobs%2Ccontents_active%2Chid_counter",
  "model": {
    "model_class": "History",
    "id": "f2db41e1fa331b3e",
    "name": "Unnamed history",
    "state": "new",
    "deleted": false,
    "contents_active": {
      "active": 0,
      "deleted": 0,
      "hidden": 0
    },
    "contents_states": {},
    "hid_counter": 1,
    "update_time": "2018-07-10T08:52:30.118Z",
    "user_id": null,
    "importable": false,
    "tags": [],
    "contents_url": "/galaxy/api/histories/f2db41e1fa331b3e/contents",
    "slug": null,
    "username_and_slug": null,
    "url": "/galaxy/api/histories/f2db41e1fa331b3e",
    "genome_build": "?",
    "create_time": "2018-07-10T08:52:30.118Z",
    "published": false,
    "annotation": null,
    "size": 0,
    "purged": false,
    "nice_size": "(empty)"
  },
  "user": {
    "id": null,
    "username": "(anonymous user)",
    "total_disk_usage": 0,
    "nice_total_disk_usage": "0 bytes",
    "quota_percent": null,
    "is_admin": false
  }
}

This happens regardless of whether you try to access Galaxy on /galaxy or without a prefix. When you access it on /galaxy, the errors above show an additional /galaxy added (so you see some /galaxy/galaxy URLs). We never had such behaviour on 1.x.

If I then change my Helm config to set ingress.path to an empty value, I get a different, apparently less severe error:

{
  "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
  "onLine": true,
  "version": "18.05",
  "xhr": {
    "readyState": 4,
    "responseText": "<html>\r\n  <head><title>Not Found</title></head>\r\n  <body>\r\n    <h1>Not Found</h1>\r\n    <p>The resource could not be found.\r\n<br/>No route for /galaxy/api/histories/1cd8e2f6b131e891\r\n<!--  --></p>\r\n    <hr noshade>\r\n    <div align=\"right\">WSGI Server</div>\r\n  </body>\r\n</html>\r\n",
    "status": 404,
    "statusText": "Not Found"
  },
  "options": {
    "parse": true,
    "data": "keys=size%2Cnon_ready_jobs%2Ccontents_active%2Chid_counter",
    "emulateHTTP": false,
    "emulateJSON": false,
    "textStatus": "error",
    "errorThrown": "Not Found"
  },
  "url": "http://192.168.99.100:30700/galaxy/api/histories/1cd8e2f6b131e891?keys=size%2Cnon_ready_jobs%2Ccontents_active%2Chid_counter",
  "model": {
    "model_class": "History",
    "id": "1cd8e2f6b131e891",
    "name": "Unnamed history",
    "state": "new",
    "deleted": false,
    "contents_active": {
      "active": 0,
      "deleted": 0,
      "hidden": 0
    },
    "contents_states": {},
    "hid_counter": 1,
    "update_time": "2018-07-10T08:56:16.700Z",
    "user_id": null,
    "importable": false,
    "tags": [],
    "contents_url": "/galaxy/api/histories/1cd8e2f6b131e891/contents",
    "slug": null,
    "username_and_slug": null,
    "url": "/galaxy/api/histories/1cd8e2f6b131e891",
    "genome_build": "?",
    "create_time": "2018-07-10T08:56:16.700Z",
    "published": false,
    "annotation": null,
    "size": 0,
    "purged": false,
    "nice_size": "(empty)"
  },
  "user": {
    "id": null,
    "username": "(anonymous user)",
    "total_disk_usage": 0,
    "nice_total_disk_usage": "0 bytes",
    "quota_percent": null,
    "is_admin": false
  }
}

This is when accessing the URL without any prefix. Notice that the contents URLs still have the "/galaxy" part. I wonder if some configuration of the containers is involved as well.

I created this deployment with this Helm config:

# Settings for the init image
init:
  image:
    repository: pcm32/galaxy-sc-tertiary-init
    tag: v18.05
    pullPolicy: Always
  force_copy: "__venv__,__config__,__galaxy-central__"

image:
  repository: pcm32/galaxy-web-k8s
  tag: v18.05
  pullPolicy: Always

admin:
  email: [email protected]
  password: "change-me"
  api_key: askdhaskjdhqwkdnqdq
  username: admin

galaxy_conf:
  admin_users: [email protected]
  allow_user_creation: true
  allow_user_deletion: true

job_conf: {}

persistence:
  minikube:
    hostPath: "/data/galaxy-18.05-tertiary"

service:
  type: NodePort

ingress:
  enabled: false
  path: ""


postgresql:
  imageTag: "9.6.5_for_18.05"
  persistence:
    subPath: "postgres-tertiary"

proftpd:
  service:
    type: NodePort

Any ideas @nuwang? These containers were created from stock docker-galaxy-stable (master) and ansible-galaxy-extras on 18.05, plus an additional commit (https://github.com/pcm32/docker-galaxy-stable/tree/feature/k8s_container_building). The additional init image simply places some config files for job conf, tool conf and tools.

Getting jobs to run on k8s native

Based on the template from https://github.com/bgruening/docker-galaxy-stable/blob/master/compose/.env_k8_native, @nuwang updated the settings for the Galaxy configs to match https://github.com/galaxyproject/galaxy-kubernetes/blob/update_chart_conventions/galaxy-stable/values.yaml#L63, with the intent of getting Galaxy jobs to run on the same k8s cluster as Galaxy. However, jobs just keep going to the local runner; from the Galaxy log: Persisting job destination (destination id: local_no_container).
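
For what it's worth, a minimal job_conf.xml sketch of what the Galaxy side needs for jobs to leave the local runner, using the Kubernetes runner class shipped with Galaxy; the deployment-specific k8s_* destination params are deliberately elided:

<job_conf>
    <plugins>
        <plugin id="k8s" type="runner" load="galaxy.jobs.runners.kubernetes:KubernetesJobRunner"/>
    </plugins>
    <destinations default="k8s_default">
        <destination id="k8s_default" runner="k8s">
            <!-- deployment-specific k8s_* params (namespace, persistent volume claims, ...) go here -->
        </destination>
    </destinations>
</job_conf>

If the log still reports local_no_container, the default destination Galaxy loads is not this one, so checking which job_conf the container actually picks up would be a reasonable first step.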

@pcm32 any advice on what might be misconfigured?

Cannot unset welcome.html set in extraFileMappings to prefer the one in the container

I have a container based on galaxy/galaxy-k8s:20.01 where I have our branding in welcome.html. However, I cannot seem to unset the one added by the chart so that the one in the container is preferred. I have tried:

extraFileMappings: {}
extraFileMappings:
  /galaxy/server/static/welcome.html: ~

and

extraFileMappings:
  /galaxy/server/static/welcome.html:
    useSecret: false
    applyToJob: false
    applyToWeb: true
    content: ~

but I always end up with the one added by the chart (you can tell by the modification date of the file).

I was expecting the first one to work... any advice?

Start up error for job handler container only

Using the current CI values on minikube, I'm getting this error on startup, but only for the job handler deployment:

galaxy.jobs DEBUG 2020-08-30 11:24:14,481 Loading job configuration from /galaxy/server/config/job_conf.xml
galaxy.jobs ERROR 2020-08-30 11:24:14,481 Problem parsing the XML in file /galaxy/server/config/job_conf.xml, please correct the indicated portion of the file and restart Galaxy. '>=' not supported between instances of 'NoneType' and 'tuple'
Traceback (most recent call last):
  File "/galaxy/server/lib/galaxy/jobs/__init__.py", line 333, in __init__
    self._configure_from_dict(job_config_dict)
  File "/galaxy/server/lib/galaxy/jobs/__init__.py", line 372, in _configure_from_dict
    self._set_default_handler_assignment_methods()
  File "/galaxy/server/lib/galaxy/web_stack/handlers.py", line 148, in _set_default_handler_assignment_methods
    self.app.application_stack.init_job_handling(self)
  File "/galaxy/server/lib/galaxy/web_stack/__init__.py", line 117, in init_job_handling
    self._init_job_handler_assignment_methods(job_config, base_pool)
  File "/galaxy/server/lib/galaxy/web_stack/__init__.py", line 84, in _init_job_handler_assignment_methods
    self._set_default_job_handler_assignment_methods(job_config, base_pool)
  File "/galaxy/server/lib/galaxy/web_stack/__init__.py", line 537, in _set_default_job_handler_assignment_methods
    if ((dialect.name == 'postgresql' and dialect.server_version_info >= (9, 5))
TypeError: '>=' not supported between instances of 'NoneType' and 'tuple'
Failed to initialize Galaxy application

I think I started seeing this error after rebasing onto the latest master.

Figure out how to reasonably allocate job resources

Per-tool resource request limits are needed. However, it's not a simple table we can compose and use. The current usegalaxy.* efforts use a combination of dynamic rules and tables, which are not straightforward to apply or translate.

Reference setups (usegalaxy.*): Main, .au
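
As a starting point for the "tables" half, a hedged job_conf sketch of per-tool mappings to destinations with different resource profiles (the destination and tool ids are placeholders, and the actual resource params depend on the runner); the dynamic-rule half would replace the static mapping with a python rule:

<destinations default="small">
    <destination id="small" runner="k8s">
        <!-- modest default requests/limits -->
    </destination>
    <destination id="big_mem" runner="k8s">
        <!-- larger memory requests/limits for heavy tools -->
    </destination>
</destinations>
<tools>
    <tool id="hisat2" destination="big_mem"/>
</tools>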
