clearml-server-helm's People

Contributors

alegma, allegroai-git, jkhenning, sapir-allegro

clearml-server-helm's Issues

Question regarding data and backups

Hello,

I have installed trains-server using helm on my k8s cluster (AKS).

I have been wondering: how can I back up the data from trains-server?

Is it possible to set some sort of a cron job that will export the data and save it somewhere on Azure Files?

Also, I have noticed that upgrading might require a --purge. If I do that, I assume all my data will be gone, won't it?

Readme.md

Important:

If you previously deployed a trains-server, you may encounter errors. If so, you must first delete old deployment using the following command:

  helm delete --purge trains-server
After running the helm delete command, you can run the helm install command.

What would be the best practice for backups and restoring data?

Thank you!
Shaked
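
One way the cron-job idea above could look is a Kubernetes CronJob that archives the server's data directory onto a volume backed by Azure Files. This is only a sketch under stated assumptions: the job name, the schedule, the claim name `trains-backup-azurefile`, and the hostPath location `/opt/trains/data` are illustrative and not taken from the chart.

```yaml
# Sketch only: nightly file-level archive of the trains-server data directory.
# Assumptions (hypothetical, not from the chart): the CronJob name, the PVC
# "trains-backup-azurefile", and that the data sits on a hostPath at
# /opt/trains/data on the node where this job is scheduled.
apiVersion: batch/v1   # older clusters (before Kubernetes 1.21) need batch/v1beta1
kind: CronJob
metadata:
  name: trains-data-backup
spec:
  schedule: "0 2 * * *"        # every day at 02:00
  concurrencyPolicy: Forbid    # do not start a new run while one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: busybox:1.36
              command:
                - /bin/sh
                - -c
                # Write a timestamped tarball of /data into /backup
                - tar czf /backup/trains-`date +%Y%m%d-%H%M%S`.tar.gz -C /data .
              volumeMounts:
                - name: data
                  mountPath: /data
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: data
              hostPath:
                path: /opt/trains/data          # assumed data location
            - name: backup
              persistentVolumeClaim:
                claimName: trains-backup-azurefile   # hypothetical Azure Files claim
```

Note that copying database files while MongoDB and Elasticsearch are running may not produce a consistent snapshot, so stopping the services first or using each database's own dump tooling is likely safer.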

Ingress and Limitation

Hello everyone,

I would like to know whether it is possible to serve the application behind a single ingress host such as "myclearml.mydomain.com", with the API server and file server exposed under paths such as myclearml.mydomain.com/apiserver and myclearml.mydomain.com/fileserver, instead of separate subdomains such as apiserver.mydomain.com. Some applications cannot do this, and I would like to know before I request it from my provider.

Thank you

[JustANit] Revise Helm Charts

Just a nit: the Helm charts seem like an afterthought and are a mess. Did anyone follow the conventions other charts use, including a well-structured and commented values.yaml? Obviously, as is, the Helm chart fails to run because storage and claims were left out. I'll fork this and create a set of Helm charts that are more robust. If you want individuals to adopt this, and want TRAINS to become more of a standard or see wide use, things should be structured in a way that ensures a successful first launch from the Helm chart. I always challenge my own devs to do a clean deploy to ensure success on every first launch.

DB

Subdomains and ingress

Hey,

To connect the services (app, files, api) to our nginx LB, we use an ingress that looks like this:

# Source: trains/templates/ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: release-name-trains
  labels:
    app.kubernetes.io/name: trains
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/managed-by: Tiller
  annotations:
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "180"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "180"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "180"

spec:
  tls:
    - hosts:
        - "app.trains-prod.example.com"
        - "files.trains-prod.example.com"
        - "api.trains-prod.example.com"
      secretName: tls-secret-prod.app.trains-prod-example.com
  rules:
    - host: "app.trains-prod.example.com"
      http:
        paths:
          - path: /
            backend:
              serviceName: webserver-service
              servicePort: 80
    - host: "api.trains-prod.example.com"
      http:
        paths:
          - path: /
            backend:
              serviceName: apiserver-service
              servicePort: 8008
    - host: "files.trains-prod.example.com"
      http:
        paths:
          - path: /
            backend:
              serviceName: fileserver-service
              servicePort: 8081

This integrates with our certificate manager (letsencrypt) as well.

I was thinking, would it make sense to add a PR that supports something like:

helm install.... --set ingress.enabled=true --set ingress.annotations=THE_ANNOTATION --set ingress.app_host=app.trains-prod.example.com  --set ingress.files_host=files.trains-prod.example.com --set ingress.api_host=api.trains-prod.example.com 

Which would automatically generate the above YAML?
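
For illustration, the proposed flags could also be expressed as a values file. This is a sketch of the proposal only: the `ingress.*` keys are the ones suggested above and are not confirmed to exist in the chart, and the file name used below is hypothetical.

```yaml
# Hypothetical values fragment mirroring the proposed --set flags.
# The key names (enabled, annotations, app_host, files_host, api_host)
# come from the proposal above, not from the current chart.
ingress:
  enabled: true
  annotations:
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "180"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "180"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "180"
  app_host: app.trains-prod.example.com
  files_host: files.trains-prod.example.com
  api_host: api.trains-prod.example.com
```

A file like this would be passed with `-f values-ingress.yaml`. That tends to be easier than `--set` for annotations, because the dots in annotation keys have to be escaped on the command line.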

Thank you
Shaked

EDIT:

It seems important to add the timeouts; otherwise the nginx LB might sometimes return a 504 timeout.

Multiple problems with the Helm chart

Unfortunately, the Trains chart is far from being useful. Here are just a few of the problems I had with it:

  •  `- name: agent-data
      hostPath:
        path: /opt/trains/agent` - k8s has PVCs for that
    
  • creating storage classes manually - nobody does that, they are provided by k8s (e.g. to use AWS EBS autoprovisioning)

  • impossible to reduce PVC size or Elasticsearch CPU requests for development purposes

  • for HTTP stuff, ingresses are typically used, not zillions of services on separate ports

  • elasticsearch, mongo, etc - there are existing Helm charts for them which can be included, no need to reinvent the wheel

  • for many admins, Helm is now passé, long live Kustomize

  • and many many more...
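
As a sketch of the first point in the list, the hostPath volume could be swapped for a claim that the cluster provisions dynamically. The claim name `trains-agent-data` and the 10Gi size are illustrative assumptions, not values from the chart.

```yaml
# Sketch: a dynamically provisioned claim instead of the hostPath volume.
# Assumed (hypothetical) names and sizes: "trains-agent-data", 10Gi.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trains-agent-data
spec:
  accessModes:
    - ReadWriteOnce
  # storageClassName is omitted on purpose so the cluster's default
  # storage class (for example one backed by AWS EBS) is used.
  resources:
    requests:
      storage: 10Gi
---
# Pod spec fragment (not a complete resource) that would reference
# the claim in place of the hostPath entry:
volumes:
  - name: agent-data
    persistentVolumeClaim:
      claimName: trains-agent-data
```

Exposing the requested size as a chart value would also cover the complaint about not being able to shrink PVCs for development.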

Configuring user authentication

I'm trying to deploy ClearML via Helm (or more precisely Flux/Helm Operator) and need to enable user authentication. Unfortunately, the documentation and the Helm chart are not quite clear on how to do that.

When running via pip or in Docker, it was just a matter of modifying the /opt/clearml/config/apiserver.conf file. In the Helm chart, the whole /opt/clearml/config directory seems to be on a persistent volume claim. There is a way to use FlexVolume and Azure Key Vault by mounting another volume to /opt/clearml/secrets and adding it to TRAINS_CONFIG_DIR. I can imagine using .Values.apiserver.volumeMounts to mount another volume from a sealed secret, but TRAINS_CONFIG_DIR can't be modified from values.yaml in any way other than use_secrets_flexvolume.

IMHO, in the apiserver deployment TRAINS_CONFIG_DIR could be set to /opt/clearml/config:/opt/clearml/secrets by default, allowing the user to mount any other volume with the secrets.
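
A minimal sketch of what that suggestion might look like in the apiserver pod spec, assuming the configuration loader reads every directory listed in TRAINS_CONFIG_DIR. The container name, the volume name `secrets`, and the Secret name `apiserver-auth` are hypothetical.

```yaml
# Pod spec fragment (sketch, not the chart's current template).
# Assumed names: container "apiserver", volume "secrets",
# Secret "apiserver-auth" (for example created from a sealed secret).
containers:
  - name: apiserver
    env:
      - name: TRAINS_CONFIG_DIR
        # Both directories listed, as suggested above
        value: /opt/clearml/config:/opt/clearml/secrets
    volumeMounts:
      - name: secrets
        mountPath: /opt/clearml/secrets
        readOnly: true
volumes:
  - name: secrets
    secret:
      secretName: apiserver-auth
```

With something like this in place, the authentication settings could live in the Secret, and the persistent volume holding /opt/clearml/config would not need manual edits.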

One possibility (for now, without modifying the Helm chart) would be to deploy the app without ingress, modify the config file in the persistent volume manually, and then redeploy with ingress enabled. But that's a manual step I'd like to avoid.
