I'd like to use this thread to discuss and present issues around implementing the Answ


Galaxy / Kubernetes / HTCondor implementation on Azure (galaxy-helm, closed)

galaxyproject commented on August 11, 2024

from galaxy-helm.

Comments (19)

bgruening commented on August 11, 2024

@rc-ms can you tell me how you run the containers, which containers and a few more details to reproduce it?

rc-ms commented on August 11, 2024

Hi @bgruening, apparently @abhi2cool got it running. I'm hoping he'll publish details of our setup here soon.

bgruening commented on August 11, 2024

@rc-ms, that's good to know. My setup is also working as expected. Let me know if we need to change something.

rc-ms commented on August 11, 2024

Hello there.

Abhik has published a Helm Chart version of our Galaxy cluster. We have run it on Azure but want to confirm others can as well. Here's the Helm Chart:

https://github.com/abhi2cool/galaxy-kubernetes-htc-condor/tree/helmv1/helm/galaxy

If you have an Azure account and have installed the Azure CLI tools, enter the following in a terminal:

```
az acs create --orchestrator-type kubernetes --resource-group [your-resource-group] --name [k8s-cluster-name] --agent-count 1 --generate-ssh-keys
```

You should be connected to the cluster automatically after creation, but you can also connect to a pre-existing cluster with the following command:

```
az acs kubernetes get-credentials --resource-group [your-resource-group] --name [your-k8s-cluster]
```

pcm32 commented on August 11, 2024

Hi @rc-ms and @abhi2cool, did you first try the galaxy-stable Helm chart available on this other branch? Since most of the files in abhi2cool's repository are copies of our original chart (rather than work from scratch), I don't think it makes sense to diverge here. I don't personally see the point of deploying a Condor cluster on top of Kubernetes when the latter can do the scheduling, but he can add it to that branch with the appropriate conditionals if he really needs it. @abhi2cool's files are derived from version 0.3.3 of my galaxy chart, which is a few months out of date (for instance, it doesn't include the latest RBAC fixes, among others).

There is documentation in particular here. I'm currently making sure it works with my latest additions to the Galaxy k8s runner, which handle memory and CPU constraints for jobs. This is essential for running a production-grade Galaxy on Kubernetes (otherwise there is a serious chance of choking your cluster).
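In Kubernetes itself, per-job memory and CPU constraints are expressed as resource requests and limits on a container. The sketch below writes a minimal Job manifest to show that shape; the job name, image, command and the specific values are hypothetical placeholders, and this is plain Kubernetes, not necessarily what the Galaxy Kubernetes runner generates.

```shell
# Write a hypothetical Kubernetes Job manifest that bounds one tool's resources.
# Name, image, command and values are placeholders, not Galaxy runner defaults.
cat > galaxy-tool-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: galaxy-tool-example
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tool
          image: busybox
          command: ["sh", "-c", "echo running tool"]
          resources:
            requests:        # what the scheduler reserves on a node
              cpu: "1"
              memory: 2Gi
            limits:          # hard ceiling enforced while the job runs
              cpu: "2"
              memory: 4Gi
EOF
```

It could then be submitted with `kubectl apply -f galaxy-tool-job.yaml`. Requests are what the scheduler reserves when placing the pod, so nodes are not overcommitted; limits cap what a running job can use (CPU above the limit is throttled, memory above it gets the container OOM-killed), which keeps one heavy tool from starving the rest of the cluster.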

rc-ms commented on August 11, 2024

Hi Pablo, thanks for the prompt response. We went with HTCondor since we were advised to. @afgane and @nuwang, can you advise here? Also, @abhi2cool pushed those files by mistake. I think he's updating them and will also open a PR.

nuwang commented on August 11, 2024

@pcm32 @rc-ms The overall plan was to have @abhi2cool's work be a PR against the work you've done, Pablo, and be made part of the primary Galaxy Helm Chart at: https://github.com/galaxyproject/galaxy-kubernetes. The PR was meant to include a few enhancements, among them support for ProFTPD and HTCondor. We thought it would be good to have HTCondor as an option to allow user choice and to hedge our bets until the Kubernetes job runner becomes more mature. The original plan was to add support for SLURM too, as we've used it extensively, but we then fell back to HTCondor since it's a bit more cloud-friendly when it comes to autoscaling etc. Ideally, the desired job runner would be a configurable option at chart install time (and we are still hoping SLURM will be an option too). This would allow a gradual transition to the Kubernetes job runner in the long term, while offering more familiar, "battle-tested" job runners in the short term. Once Abhik's PR is ready, it would be great if you could review it and suggest any enhancements.

pcm32 commented on August 11, 2024

@nuwang, in that case @abhi2cool should be looking at the branch I mentioned (https://github.com/galaxyproject/galaxy-kubernetes/tree/feature/sync_with_galaxy_stable/galaxy-stable), where all the work on supporting non-PhenoMeNal containers has been carried out. It already has support for ProFTPD, and I'm more than happy to review a PR that adds Condor support with the appropriate conditionals. That branch also includes a number of improvements over 0.3.3 (I bumped that series to 0.4.x, as many things changed, including the Helm variable nomenclature).

So please, @abhi2cool, work on the sync_with_galaxy_stable branch, using the galaxy-stable chart (not plain galaxy), ideally through pull requests so that I can review the commits. Thanks!

pcm32 commented on August 11, 2024

Regarding the maturity of Kubernetes (and the runner), that is of course for each to consider for their own use case. What I can say is that we have used it in PhenoMeNal as the job scheduler/dispatcher for more than a year now, increasingly in high-load scenarios. But I don't in any way oppose having an optional setup for spawning Condor containers inside k8s. I just didn't add it to the mentioned branch because we (as in PhenoMeNal) don't need it.

abhi2cool commented on August 11, 2024

Hi all,
The Helm Chart version of the Galaxy cluster is available at https://github.com/abhi2cool/galaxy-kubernetes-htc-condor.
This work is basically an attempt to replicate the Galaxy Docker Compose/Swarm implementation on Kubernetes on Azure, using the identical containers and images (we went the HTCondor way, so no SLURM :P).
We are also not using PVCs; instead, we have dedicated a particular node to storage and use that node's local file system for all intents and purposes.
The cluster comes with fully scalable HTCondor worker nodes, implemented through a replication controller.
I would really appreciate it if you could go through this work and post your feedback.
Thanks

rc-ms commented on August 11, 2024

Hola @pcm32 and @bgruening. I wanted to respond with an update and, ideally, some context on how we ended up in a different place with our version. As written and referenced by @afgane and @nuwang, this doc outlines the core deliverables and implementation direction.

In addition, we wanted to set up a solution that enabled the following:

- 60 TB of attached storage, administrable from the Galaxy server, for jobs
- Full FTP support from within the Galaxy interface
- HTCondor-based implementation (per recommendations/guidance from the team)
- Dynamically scalable jobs across multiple nodes
- Helm Chart based implementation (instead of SLURM)
- Implementation and testing using the AnswerALS ATAC-seq and RNA-seq reference datasets and workflows

The biggest challenge was storage: specifically, storage accessible from Galaxy through some kind of POSIX filesystem that would both scale to 60 TB of addressable storage and perform well. When the work started, no FTP solution was available (that we were aware of), so we ended up building our own FTP implementation. We cycled through a variety of options, eventually deciding to mount (or otherwise attach) NFS volumes directly to the clusters in post-configuration steps outside of the Helm chart.
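One common way to hand an NFS export like this to Kubernetes, so that a Helm chart can mount it instead of relying on out-of-chart mounts, is a static PersistentVolume plus a matching PersistentVolumeClaim. The sketch below writes both objects to a file; the server address, export path and size are hypothetical, and the galaxy-pv / galaxy-pvc names simply mirror those in the helm status output in this thread.

```shell
# Hypothetical values: replace with your NFS server address, export path and size.
NFS_SERVER="10.0.0.4"
NFS_PATH="/export/galaxy"
SIZE="20Gi"

# Write an NFS-backed PersistentVolume and a claim bound to it by name.
# ReadWriteMany lets the Galaxy pod and every job pod mount the same files.
cat > galaxy-nfs-pv.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: galaxy-pv
spec:
  capacity:
    storage: ${SIZE}
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: ${NFS_SERVER}
    path: ${NFS_PATH}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: galaxy-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: galaxy-pv
  resources:
    requests:
      storage: ${SIZE}
EOF
```

The file could then be applied with `kubectl apply -f galaxy-nfs-pv.yaml`. `storageClassName: ""` together with `volumeName` stops the claim from being dynamically provisioned by a default storage class and binds it to this volume. For NFS, the capacity is used to match claims to volumes rather than enforced as a quota, so a 60 TB export would simply be declared with a matching size.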

My hope is that we haven't strayed too far afield from your work, so that we can find an easy way to merge our changes back into your repo.

Thanks again,

rc-ms commented on August 11, 2024

Hello Pablo (@pcm32), I'm in the process of trying to get both @abhi2cool's implementation and yours running, sadly with limited success in each case. I think I was able to launch my cluster and deploy your chart successfully, but I can't access the Galaxy instance. Could you elaborate on this section of your doc? https://github.com/galaxyproject/galaxy-kubernetes#sqlite-local-deploy-on-minikube. I couldn't get the direct URL to load (you mention "normally 192.168.99.100", port 30700). You can see in the output below that those ports show up in the statuses, but I can't get Galaxy to render in my browser. Any thoughts? Thank you, sir!

```
rc-cola:k8sGalaxy rc$ helm status coiled-newt
LAST DEPLOYED: Tue Jan 23 11:57:24 2018
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/PersistentVolume
NAME       CAPACITY  ACCESSMODES  RECLAIMPOLICY  STATUS  CLAIM               STORAGECLASS  REASON  AGE
galaxy-pv  20Gi      RWX          Retain         Bound   default/galaxy-pvc                        7h

==> v1/PersistentVolumeClaim
NAME        STATUS  VOLUME     CAPACITY  ACCESSMODES  STORAGECLASS  AGE
galaxy-pvc  Bound   galaxy-pv  20Gi      RWX                        7h

==> v1/Service
NAME            CLUSTER-IP    EXTERNAL-IP  PORT(S)         AGE
galaxy-svc-k8s  10.0.115.151               8080:30700/TCP  7h

==> v1/ReplicationController
NAME        DESIRED  CURRENT  READY  AGE
galaxy-k8s  1        1        1      7h
```

I ran these commands after checking the browser URLs. Still no dice:

```
rc-cola:Projects rc$ kubectl port-forward galaxy 8080:30700
Error from server (NotFound): pods "galaxy" not found
rc-cola:Projects rc$ kubectl port-forward galaxy-svc-k8s 8080:30700
Error from server (NotFound): pods "galaxy-svc-k8s" not found
rc-cola:Projects rc$
```

pcm32 commented on August 11, 2024

Hi @rc-ms, you are probably using the branch that is supposed to work only with Galaxy containers that are "like" the PhenoMeNal one. Last time I checked, @abhi2cool's work was based on that branch as well. You essentially have two choices here:

1. Try the other branch I mentioned above. That means checking it out and running `helm install ./galaxy-stable` with a proper config file.
2. Make a temporary Galaxy container that looks like PhenoMeNal's (https://github.com/phnmnl/container-galaxy-k8s-runtime/blob/develop/Dockerfile) so that you can use the branch you are currently using. You would then set that new image when installing from that branch of the Helm chart.

Unfortunately I'm super busy this week with our release process, a hackathon and a family member in hospital, but I will try my best to resume work next week on the newer branch, which can use standard Galaxy containers.

pcm32 commented on August 11, 2024

You shouldn't need to do those port exposures, and Kubernetes nodes won't normally let you access 8080: only a specific range of ports is exposed on the nodes (and 30700 is within it).
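To make the port discussion concrete: in the `PORT(S)` column of the helm status output above, `8080:30700/TCP` means service port 8080 (reachable inside the cluster) and NodePort 30700 (opened on every node). By default Kubernetes allocates NodePorts from 30000-32767, which is the range referred to here. A small self-contained sketch of that reading, with hypothetical helper names:

```shell
# Default NodePort range (kube-apiserver --service-node-port-range).
NODEPORT_MIN=30000
NODEPORT_MAX=32767

# Split a PORT(S) entry such as "8080:30700/TCP" into the in-cluster
# service port and the port opened on every node (hypothetical helper).
parse_ports() {
  local entry="${1%%/*}"        # drop "/TCP"          -> "8080:30700"
  SERVICE_PORT="${entry%%:*}"   # text before the ":"  -> "8080"
  NODE_PORT="${entry##*:}"      # text after the ":"   -> "30700"
}

# Succeeds if $1 lies inside the default NodePort range.
in_nodeport_range() {
  [ "$1" -ge "$NODEPORT_MIN" ] && [ "$1" -le "$NODEPORT_MAX" ]
}

parse_ports "8080:30700/TCP"
echo "service port: $SERVICE_PORT, node port: $NODE_PORT"
if in_nodeport_range "$NODE_PORT"; then
  echo "port $NODE_PORT is opened on each node (if the node address is reachable)"
fi
if ! in_nodeport_range "$SERVICE_PORT"; then
  echo "port $SERVICE_PORT is only reachable inside the cluster or via port-forward"
fi
```

This also explains the `NotFound` errors: `kubectl port-forward` addresses a pod unless told otherwise, and pods created by the `galaxy-k8s` replication controller get a generated name suffix (`kubectl get pods` lists them). The remote side of `LOCAL:REMOTE` is a port inside the pod, not the NodePort. Note too that 192.168.99.100 is typically a minikube VM address; on a cloud cluster the node address would be whatever `kubectl get nodes -o wide` reports, provided it is reachable from your machine.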

rc-ms commented on August 11, 2024

Thanks, Pablo. I'm going to take a stab at your suggestions (FYI, I was trying to deploy your containers). So far I've managed to install everything successfully from a deployment perspective, but I'm unable to connect to the Galaxy instance itself (I did get the nginx homepage, though). Have you deployed these containers on any cloud providers, or are you using your own local infrastructure?

pcm32 commented on August 11, 2024

Can you paste the helm install command that you're using? Thanks!

pcm32 commented on August 11, 2024

We deploy PhenoMeNal with the galaxy chart (not galaxy-stable) on GCE, AWS, Azure and various OpenStack instances (EBI, de.NBI, Uppsala, etc.). In any case, the Helm chart should be (and I think it is) completely cloud-provider agnostic. The only current requirements are a Kubernetes cluster (probably 1.4 or above; I haven't tested 1.9, but I see no reason for it not to work) and a shared file system that the k8s cluster can access (how to provide it is a decision independent of the Helm chart, which only expects to find a persistent volume).
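As a quick check of the first requirement, a cluster's version can be compared against the rough 1.4 minimum (for the second, `kubectl get pv` lists the persistent volumes a chart could bind to). A sketch of the version comparison, using a hypothetical `version_at_least` helper and a hard-coded version string standing in for the Server Version reported by `kubectl version`:

```shell
# Succeed if version $1 >= version $2, e.g. version_at_least 1.9.2 1.4.
# Uses `sort -V` (GNU coreutils version sort); other sort implementations may lack -V.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical value; in practice, take it from the Server Version of `kubectl version`.
SERVER_VERSION="v1.9.2"
if version_at_least "${SERVER_VERSION#v}" "1.4"; then
  echo "Kubernetes ${SERVER_VERSION} meets the ~1.4 minimum"
else
  echo "Kubernetes ${SERVER_VERSION} is older than 1.4"
fi
```

Version sort is used rather than a plain string comparison because it orders `1.10` after `1.9`, which lexical ordering gets wrong.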

pcm32 commented on August 11, 2024

@rc-ms, just to update you: I tested things yesterday (https://github.com/galaxyproject/galaxy-kubernetes/tree/feature/sync_with_galaxy_stable/galaxy-stable) and the deployment is working fine. I have improved a few things; I probably need to make sure the documentation covers everything, as that is probably the main missing bit.

nuwang commented on August 11, 2024

I'll close this since it's stale now.
