
Comments (4)

rc-ms commented on August 11, 2024

Thanks for this update Nuwan. Great suggestions. I'm going to figure out if there's a way to fund this work in time for GCC. I'll keep you posted on progress.


pcm32 commented on August 11, 2024

Hi everyone,

First, let me tell you that both the PhenoMeNal Public and Publicdev instances have been running with the new Helm chart and containers based on the community images (plus our init flavouring) since around February or so. I also managed to do a deployment on a completely Google-native Kubernetes cluster (as opposed to KubeNow or Kubespray deployed on GCP).

Let me go point by point:

  1. Merge final chart into master and remove simple chart etc. so we have a final set of charts to work on.

I was about to do this, but my last trials with Galaxy community images that I hadn't built myself didn't go well; those containers need to be built with particular environment variables set. I will add a bash script to the codebase for building the community images (the same approach I use for our images) with adequate settings so that they can be used. I can do a merge to develop now, and possibly a release (to master) once that documentation and script have been tested by others.

  2. Currently, the Docker images come from various sources, including the pcm32 and phnmnl registries. Consolidate all Docker images for the Helm charts into galaxyproject's registry.

The script that I mention sorts that out.

  3. Currently, the chart uses custom Postgres container and service definitions. Transition to the official Postgres Helm chart? This seems safer since it's likely to get HA modes, better health checks, and ongoing maintenance as we go along.

There is a reason for this: the container includes an init script so that the schema is loaded from a Postgres dump instead of through the very slow Galaxy migration scripts. We could move to a Postgres chart, but this is not currently urgent, and since man power is limited I would leave it for later. Pull requests are welcome, of course. Rather than moving to the default Postgres chart (which AFAIK doesn't support HA), I have been toying with the idea of moving to an HA Postgres setup like the one provided by stolon. My current issue with it is that it uses a shared filesystem for the different instances, which somewhat defeats the purpose performance-wise.
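For reference, if and when we do switch, pulling in the community PostgreSQL chart would mostly amount to declaring it as a chart dependency. A minimal sketch, assuming Helm 2's requirements.yaml mechanism; the version pin and the postgresql.enabled condition key are illustrative choices, not something already in the chart:

```yaml
# requirements.yaml -- hedged sketch, not the chart's current contents.
dependencies:
  - name: postgresql
    version: "0.15.0"                                    # illustrative pin
    repository: "https://kubernetes-charts.storage.googleapis.com"
    condition: postgresql.enabled                        # allows opting out
```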

  4. Move proftpd to a separate, self-contained chart?

I would give this even less priority. It makes sense when you either have someone else building the chart (as in the Postgres case) or have a separate use for it; if neither is the case, I fail to see a good justification for it.

  5. New chart for Pulsar? (probably not for this iteration though).

What do you mean exactly here?

  6. Have a standardized mechanism for exposing all Galaxy config settings.

Yes, I have thought about this but haven't implemented it. There are constructs in Helm that we could use to iterate over values within a specific scope so that they are exposed automatically as environment variables that Galaxy can read.
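As a rough sketch of that idea (the galaxy.config map in values.yaml is hypothetical, and the GALAXY_CONFIG_ prefix assumes the docker-galaxy-stable convention mentioned further down):

```yaml
# Deployment template snippet -- iterate over a hypothetical galaxy.config
# map in values.yaml and expose each entry as an environment variable.
env:
{{- range $key, $value := .Values.galaxy.config }}
  - name: GALAXY_CONFIG_{{ $key | upper }}
    value: {{ $value | quote }}
{{- end }}
```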

  7. Rolling upgrade support for the Galaxy charts. Do upgrades work smoothly at present? We seem to need Helm post-install/upgrade jobs for database migration.

If there are no schema changes in the DB, this should work out of the box, provided of course that you have a container for the following version. I think there is an env var that you can set for Galaxy to do auto-migrations; with that it could work as it is. Having said that, there are sometimes breaking changes in Galaxy in terms of how it deals with its config, like the move from .ini to .yaml config files or the change in the main log file, which required adjustments to the chart at some point.
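If we do end up needing an explicit migration step, a Helm post-upgrade hook Job is the usual shape for it. A hedged sketch; the image value keys and the manage_db.sh invocation are assumptions for illustration, not what the chart does today:

```yaml
# templates/db-migration-job.yaml -- hedged sketch of a post-upgrade hook.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-db-migration
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migration
          # Assumed value keys; adjust to the chart's actual image settings.
          image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
          # Assumed invocation of Galaxy's schema migration script.
          command: ["sh", "-c", "sh manage_db.sh upgrade"]
```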

  8. Galaxy pod scale up/down support. Can we also have multiple Galaxy pods running for a zero-downtime rolling upgrade of Galaxy?

Given the current setup of the Galaxy community images, this (pod scaling) is currently infeasible. We need to move the community images to separate containers for web and handlers (or the ability to run them in those roles) and drop the supervisor layer, so that the container orchestrator becomes the process manager. Then you should easily be able to scale the web servers or the handlers as needed (since they run in individual pods). Until everything stops being secluded in the same container, this idea won't fly. What should happen is that the handlers all sit behind a Kubernetes service, instead of the web/master having to be aware of each handler individually. I understand that there is support for this now in 18.01. My move of the charts to be 18.01-compatible was a bit hampered by some missing wheels that we needed for some modules we use in PhenoMeNal, but I expect to resume that soon.

This ^^^ is in my view the most pressing thing to do from the architecture point of view, and if you have man power, I would put it here, in the form of pull requests to the docker-galaxy-stable/compose repo. This would truly make Galaxy completely scalable within Kubernetes (currently it is scalable in terms of dispatching jobs, but if the Galaxy instance reaches the limit on the number of jobs it can track/manage, we are in trouble).
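To make the target concrete, this is roughly the shape it could take once web and handler roles live in separate containers. Every name, image, and port below is illustrative, not something the chart or the community images provide today:

```yaml
# Hypothetical handler Deployment, scaled independently of the web pods,
# with a Service in front so the web/master never addresses handlers directly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: galaxy-job-handler
spec:
  replicas: 3                        # scale handlers up/down as load demands
  selector:
    matchLabels:
      app: galaxy-job-handler
  template:
    metadata:
      labels:
        app: galaxy-job-handler
    spec:
      containers:
        - name: handler
          image: galaxyproject/galaxy-handler:latest   # hypothetical image
---
apiVersion: v1
kind: Service
metadata:
  name: galaxy-job-handler
spec:
  selector:
    app: galaxy-job-handler
  ports:
    - port: 80                       # illustrative port
```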

  9. Add an option to switch between the HTCondor and Kubernetes job runners (that is, integrate the work contributed by Microsoft/AnswerALS), with possible support for future options (e.g. Pulsar, Slurm).

The option is currently there; it is configurable through an environment variable. However, I personally have no use for it and haven't tested it, nor have I received any PRs from people integrating the HTCondor containers into the Helm chart.

  10. Perhaps have a separate chart for Galaxy alone, and an umbrella chart that pulls everything together?

I would suggest leaving this for later; I think it is over-engineering things a bit, and we have other, more pressing things to do with limited man power.


nuwang commented on August 11, 2024

@rc-ms Thanks RC, that'd be awesome, even if it's after GCC. For GCC, we may be able to get by just by limiting scope.

@pcm32 Thanks for the detailed update. A few comments.

I can do a merge to develop now, and possibly a release (to master) once that documentation and script have been tested by others.

This would be great. We can start issuing PRs on a common code base afterwards.

Pull requests are welcome, of course. Rather than moving to the default Postgres chart (which AFAIK doesn't support HA), I have been toying with the idea of moving to an HA Postgres setup like the one provided by stolon.

I'm happy to give this a go. I've seen stolon but, as you said, no Postgres chart with HA support has made it into the official repo yet. Still, we'll gain some advantages by going with the official chart for now, picking up any improvements to liveness/readiness probes and other community-contributed fixes.

I would give this even less priority. It makes sense when you either have someone else building the chart (as in the Postgres case) or have a separate use for it; if neither is the case, I fail to see a good justification for it.

I think that makes sense for this iteration. In the future, there will likely be modularity benefits and, eventually, the possibility of replacing it with an official, third-party-maintained chart. We could use a kind of "monorepo" until that happens.

New chart for Pulsar? (probably not for this iteration though).

What do you mean exactly here?

Having a separate chart for Galaxy-Pulsar has more to do with future applications, and we can/should drop it for this immediate iteration.

Yes, I have thought about this but haven't implemented it. There are constructs in Helm that we could use to iterate over values within a specific scope so that they are exposed automatically as environment variables that Galaxy can read.

This is an important piece, so we'd be happy to pitch in and make this happen. However, I guess the challenge is: which mechanism is best? We've discussed this issue before, but I don't think we ever settled on a final mechanism. A few possibilities that come to mind:

  1. Use a ConfigMap and map each value as an env var to the Galaxy container, prefixed with GALAXY_CONFIG_, which the Galaxy stable container recognizes.

  2. Use a ConfigMap to map galaxy.ini (and indeed any other ini files we want) directly as volumes. That way, we can just mount them into the appropriate location; see the sketch after this list. However, this may not be compatible with the current operation of docker-galaxy-stable?

  3. Install confd in docker-galaxy-stable?
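For option 2, a minimal sketch of what mounting a config file from a ConfigMap could look like. The ConfigMap name, the file contents, and the mount path are assumptions for illustration, not what docker-galaxy-stable expects today:

```yaml
# Hypothetical ConfigMap holding galaxy.ini, mounted into the Galaxy container.
apiVersion: v1
kind: ConfigMap
metadata:
  name: galaxy-config
data:
  galaxy.ini: |
    [app:main]
    brand = Galaxy on Kubernetes
---
# Pod spec fragment: mount the file over the container's config path
# (the path below is illustrative).
spec:
  containers:
    - name: galaxy
      volumeMounts:
        - name: galaxy-config
          mountPath: /galaxy-central/config/galaxy.ini
          subPath: galaxy.ini
  volumes:
    - name: galaxy-config
      configMap:
        name: galaxy-config
```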

Given the current setup of the Galaxy community images, this (pod scaling) is currently infeasible. We need to move the community images to separate containers for web and handlers (or the ability to run them in those roles) and drop the supervisor layer, so that the container orchestrator becomes the process manager.

If we simply duplicate the pod, and point it at the same database, will that not work?


nuwang commented on August 11, 2024

Closing because most of the work is done.

