
minio-operator's Introduction

MinIO Operator

Overview

This charm encompasses the Kubernetes operator for MinIO (see Charmhub).

The MinIO operator is a Python script that wraps the latest released MinIO, providing lifecycle management for each application and handling events such as install, upgrade, integrate, and remove.

Install

To install MinIO, run:

juju deploy minio

For more information, see https://juju.is/docs

MinIO console

The MinIO console is available on port 9001. To change this port, use the console-port configuration option:

juju config minio console-port=9999

For more information, see the MinIO Console documentation.

Operation Modes

MinIO can be operated in the following modes:

  • server (default): MinIO stores data "locally", handling all aspects of data storage within the deployed workload and its in-cluster storage
  • gateway: MinIO acts as a gateway to a separate blob storage service (such as Amazon S3), providing an access layer to your data for in-cluster workloads

Example using gateway mode

This charm supports using the following backing data storage services:

  • s3
  • azure

To install MinIO in gateway mode for s3, run:

juju deploy minio minio-s3-gateway \
    --config mode=gateway \
    --config gateway-storage-service=s3 \
    --config access-key=<aws_s3_access_key> \
    --config secret-key=<aws_s3_secret_key>

To install MinIO in gateway mode for azure, run:

juju deploy minio minio-azure-gateway \
    --config mode=gateway \
    --config gateway-storage-service=azure \
    --config access-key=<azurestorageaccountname> \
    --config secret-key=<azurestorageaccountkey>

If your storage service uses a private endpoint, specify it via the storage-endpoint-service configuration option. This option is not needed when using the public S3 or Azure endpoints.
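
For example, using the option name as given above (the endpoint value is a placeholder):

juju config minio-s3-gateway storage-endpoint-service=<private-endpoint-url>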

By default, the backing storage credentials are also used as the credentials to connect to the MinIO gateway itself. If you do not want to share your data storage service credentials with users, you can create users in the MinIO console with proper permissions for them.

For more information, see: https://docs.min.io/docs/minio-multi-user-quickstart-guide.html

The access-key and secret-key credentials differ between Azure and AWS. Errors caused by improper credentials are visible in the container logs.

For more information, see: https://docs.min.io/docs/minio-gateway-for-azure.html and https://docs.min.io/docs/minio-gateway-for-s3.html

Charm Release Versioning

Note: Rather than versioning this charm by the workload itself, releases for this charm are versioned with ckf-x.y, indicating the Charmed Kubeflow version they're released with.

minio-operator's People

Contributors

barteus, beliaev-maksim, ca-scribner, colmbhandal, dnplas, dparv, i-chvets, jardon, johnsca, kimwnasptd, knkski, kwmonroe, misohu, natalian98, neoaggelos, nohaihab, orfeas-k, phoevos, renovate[bot], variabledeclared


minio-operator's Issues

The charm does not restart MinIO if someone changes `access-key`/`secret-key` via `juju config`

When credentials are updated via juju config minio access-key=somethingNew, the credential secret in k8s is updated. However, this update does not automatically reach the minio workload's pod, because a change to a secret used by a pod does not cause the pod to restart and reload the secret.

Possible solutions:

  • have the charm initiate a restart of the pod whenever the config changes. This could be achieved directly via lightkube or maybe juju, or indirectly by using a spec annotation that includes a hash of the config
  • similar to the above, this solution treats the configmap as immutable: on any config change, create a new configmap and then modify the existing deployment to point to the new one. k8s will only scale down if the new configmap is valid
  • use an existing controller like Reloader that automatically restarts workloads when secrets are refreshed.

Note that any solution restarting minio might result in downtime/interruption: the actual minio workload will go down. This should be documented properly. There might be more nuanced solutions, too (can we communicate with the running minio workload to tell it to update the credentials? That sounds like a possible sidecar thing).
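
Until the charm handles this automatically, a manual restart after a credential change is a possible workaround. A minimal sketch, assuming the workload runs as a StatefulSet named minio in the kubeflow namespace:

juju config minio access-key=somethingNew
kubectl -n kubeflow rollout restart statefulset minio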

`minio-service` added for Charmed Kubeflow 1.8 should be moved to kfp-api to avoid bugs with deploying multiple minios at once

Bug Description

[the minio-service added to minio charm to fix an upstream kfp bug]

While prepping for the KF 1.8 release, minio-operator #151 added a new service called minio-service to the minio charm. This service was added because upstream Kubeflow Pipelines has an issue where it hard-codes the minio service name.

I propose we revert minio-operator #151 and instead fix the bug in kfp by adding this svc/minio-service to the kfp-api charm. The main reasons are:

  • as-is, we cannot deploy two instances of the minio charm in the same model (both MinIO instances will deploy a service of the same name, at best with one overwriting the other). This is an issue because sometimes we have a minio for kubeflow + another minio for mlflow
  • as-is, the service that fixes a kfp bug is added for everyone, not just kfp. If we deploy mlflow+minio, the service is added anyway even though it isn't needed

In general, the minio-service is something not needed by minio, just to fix a kfp bug, so it doesn't feel right imo to put it in the minio charm


Add integration test for upgrades

To avoid an issue with upgrades like we found when solving #78, we should have an integration test that:

  • deploys minio
  • puts some data in the storage
  • updates minio
  • confirms the data is still there
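
A manual approximation of these steps, assuming the MinIO client (mc) is installed and the unit IP and credentials are known (values in angle brackets are placeholders):

juju deploy minio --channel <current-stable>
mc alias set local http://<minio-unit-ip>:9000 <access-key> <secret-key>
mc mb local/upgrade-test
echo "upgrade-canary" | mc pipe local/upgrade-test/canary.txt
juju refresh minio --channel <new-channel>
mc cat local/upgrade-test/canary.txt   # should still print "upgrade-canary"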

Have an action that returns the secret-key of MinIO

Context

Right now the secret-key config value of MinIO defaults to "". This means that MinIO will create a long random password and use it as the secret-key:
https://github.com/canonical/minio-operator/blob/track/ckf-1.8/src/charm.py#L219-L234

Users should have a way in this case to be able to see the secret-key value that was autogenerated by the Charm.

Note that this change is only focused on the "interface" (actions) that users can use to get the value of secret-key. This could go hand in hand with #167, but can be a separate effort as well.

What needs to get done

  1. Have an action that returns the value of the secret-key that MinIO will use, which is either:
    • the value of the config, or
    • the value that MinIO generated
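
For illustration, invoking such an action could look like the following (the action name get-credentials is hypothetical and not yet defined by the charm; Juju 3.x syntax):

juju run minio/0 get-credentials
# hypothetical output:
#   secret-key: <configured-or-autogenerated-value>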

Definition of Done

  1. Ensure the action returns the value, whether it was generated or not

Change in Minio credentials does not propagate to 'mlpipeline-minio-artifact' in user namespace

I'm not sure which component should be addressed here. I'm filing it against minio because the change starts from here.

When the MinIO credentials are changed, the mlpipeline-minio-artifact secret in the kubeflow namespace is updated, but the copy in the user's namespace is not.
A single user installation is also impacted.

Result
It is not possible to run workflows using KFP.

This step is in Error state with this message: Error (exit code 1): failed to put file: The Access Key Id you provided does not exist in our records.

Reproduce:
Deploy Kubeflow and after all is working change the access-key / secret-key for minio.

Workaround:
Copy the mlpipeline-minio-artifact secret from the kubeflow namespace to the user namespace.
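
A sketch of this workaround, assuming kubectl and jq are available and <user-namespace> is the affected profile namespace:

kubectl get secret mlpipeline-minio-artifact -n kubeflow -o json \
  | jq 'del(.metadata.namespace, .metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp)' \
  | kubectl apply -n <user-namespace> -f -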

Missing `secret-key` config value validation

It is not clear from the charm docs that the secret-key needs to be at least 8 characters long.
It is also not obvious from Juju's point of view what is actually happening in the charm.

Ideally, the charm would validate the config value and, for example, go into a blocked state rather than letting the actual service go down.

Reproduce

juju config minio secret-key=minio

Logs

I cannot access MinIO website

$ juju status | grep minio
minio                      res:oci-image@1755999    waiting      1  minio                    ckf-1.7/stable  186  10.152.183.165  no       
mlflow-minio               res:oci-image@1755999    active       1  minio                    ckf-1.7/edge    186  10.152.183.108  no       
minio/0*                      error     idle   10.1.149.251  9000/TCP,9001/TCP  crash loop backoff: back-off 2m40s restarting failed container=minio pod=minio-0_kubeflow(1980c8fe-8cb3-4099-b9eb-2c6...
mlflow-minio/0*               active    idle   10.1.150.41   9000/TCP,9001/TCP
$ microk8s.kubectl get pods -n kubeflow | grep minio
minio-operator-0                                1/1     Running            0              40d
mlflow-minio-operator-0                         1/1     Running            0              67m
mlflow-minio-0                                  1/1     Running            0              66m
minio-0                                         0/1     CrashLoopBackOff   5 (105s ago)   6m24s
$ microk8s.kubectl logs -n kubeflow minio-0
Defaulted container "minio" out of: minio, juju-pod-init (init)
ERROR Unable to validate credentials inherited from the shell environment: Invalid credentials
      > Please provide correct credentials
      HINT:
        Access key length should be at least 3, and secret key length at least 8 characters
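
One way to recover is to set a secret-key that satisfies the length requirement, for example (assuming openssl is available):

juju config minio secret-key=$(openssl rand -hex 16)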

`minio` charm does not refresh relations if relation is removed

If minio is related over the object-storage relation to something that does not fulfil the relation contract (for example, it does not send the SDI versions data), it will be stuck like:

minio/0*   waiting   idle   10.1.208.162  9000/TCP,9001/TCP  List of <ops.model.Relation object-storage:0> versions not found for apps: kfp-ui

If we then break the relation (juju remove-relation minio kfp-ui), the charm status remains as seen above. This is probably because we don't observe the relation broken event, or don't account for how the relation-broken event may or may not have the departing application's data in it.

The minio charm does work properly when we re-relate it to something that functions properly, and it likely also recovers on a config-changed event.

Note: this issue likely affects other charms as well

Use JuJu application-secrets for MinIO credentials

Context

Right now the secret-key config value of MinIO defaults to "". This means that MinIO will create a long random password and use it as the secret-key:
https://github.com/canonical/minio-operator/blob/track/ckf-1.8/src/charm.py#L219-L234

These values are currently generated by the charm and handled as config options. We should move to using Juju application secrets for storing them, to make their handling more secure.

We propose that we currently go with application secrets (and not user secrets) since we would not expect users for now to need to update the credential values of MinIO.

What needs to get done

  1. Convert the secret-key and access-key from config options to application secrets
  2. Keep the logic of autogenerating the values, so MinIO can generate secure ones

For this work, though, we might need to keep in mind the effort of being compliant with the s3-interface (#160).

Definition of Done

  1. Secrets are used for the sensitive values of MinIO
  2. Spike for exploring if the Charm should generate values by default (we believe yes, but let's get feedback from DP team)
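
For illustration, once the credentials live in application secrets, a model admin could inspect them with standard Juju 3.x commands (the secret ID is a placeholder):

juju secrets
juju show-secret <secret-id> --reveal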

Support the `s3` charm relation interface

Context

It currently is not possible to integrate s3 requirers with minio using the s3 charm relation interface. Adding support for this interface to the minio-k8s charm would make it much easier to integrate requirers (like vault-k8s) with minio directly, improving user experience.

Reference

test_prometheus_data_set unit test failed

After upgrading the Prometheus scrape library to v0.30 in PR #109, the following error appears in unit tests:

Traceback (most recent call last):
  File "/home/runner/work/minio-operator/minio-operator/tests/unit/test_charm.py", line 410, in test_prometheus_data_set
    assert json.loads(harness.get_relation_data(rel_id, harness.model.app.name)["scrape_jobs"])[0][
KeyError: 'scrape_jobs'

update config hash unit tests

The config hash unit tests no longer pass after upgrading ops from 1.2 to 1.4. The update_config unset functionality changed from removing the config value to resetting it to the config.yaml default.
A new solution is needed for removing configs.

Add `ingress` relation to expose console, api

#36 enabled accessing the minio console through a specified port on the container. This should be exposed through an ingress relation to make it easier to access.

The minio API can also be exposed through the ingress, although I'm not sure how that would work once authentication is pulled into the loop.
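
Once such a relation exists, usage might look like the following sketch (nginx-ingress-integrator is one possible ingress provider; the minio endpoint name is hypothetical):

juju deploy nginx-ingress-integrator
juju relate minio:ingress nginx-ingress-integrator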

Make charm's images configurable in track/<last-version> branch

Description

The goal of this task is to make all images configurable so that when this charm is deployed in an airgapped environment, all image resources are pulled from an arbitrary local container image registry (avoiding pulling images from the internet).
This serves as a tracking issue for the required changes and backports to the latest stable track/* Github branch.

TL;DR

Mark the following as done

  • Required changes (in metadata.yaml, config.yaml, src/charm.py)
  • Test on airgap environment
  • Publish to /stable

Required changes

WARNING: No breaking changes should be backported into the track/<version> branch. A breaking change is anything that requires extra steps to refresh from the previous /stable beyond a plain juju refresh. Please avoid these situations at all costs.

The following files have to be modified and/or verified to enable image configuration:

  • metadata.yaml - the container image(s) of the workload containers have to be specified in this file. This only applies to sidecar charms. Example:
containers:
  training-operator:
    resource: training-operator-image
resources:
  training-operator-image:
    type: oci-image
    description: OCI image for training-operator
    upstream-source: kubeflow/training-operator:v1-855e096
  • config.yaml - in case the charm deploys containers that are used by resource(s) the operator creates. Example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: seldon-config
  namespace: {{ namespace }}
data:
  predictor_servers: |-
    {
        "TENSORFLOW_SERVER": {
          "protocols" : {
            "tensorflow": {
              "image": "tensorflow/serving", <--- this image should be configurable
              "defaultImageVersion": "2.1.0"
              },
            "seldon": {
              "image": "seldonio/tfserving-proxy",
              "defaultImageVersion": "1.15.0"
              }
            }
        },
...
  • tools/get-images.sh - a bash script that returns a list of all the images used by this charm. In the case of a multi-charm repo, it is located at the root of the repo and gathers images from all charms in it (a minimal sketch follows this list).

  • src/charm.py - verify that nothing inside the charm code is calling a subprocess that requires internet connection.
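
A minimal sketch of what tools/get-images.sh could look like, assuming yq v4 is available and all workload images are declared as upstream-source entries in metadata.yaml (an illustration, not the actual script):

#!/bin/bash
# Print every OCI image referenced as a charm resource in metadata.yaml.
yq '.resources[] | select(.type == "oci-image") | .["upstream-source"]' metadata.yaml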

Testing

  1. Spin up an airgap environment following canonical/bundle-kubeflow#682 and canonical/bundle-kubeflow#703 (comment)

  2. Build the charm making sure that all the changes for airgap are in place.

  3. Deploy the charms manually and observe the charm go to active and idle.

  4. Additionally, run integration tests or simulate them. For instance, creating a workload (like a PytorchJob, a SeldonDeployment, etc.).

Publishing

After completing the changes and testing, this charm has to be published to its stable risk in Charmhub. For that you must wait for the charm to be published to /edge, which is the revision to be promoted to /stable. Use the workflow dispatch for this (Actions>Release charm to other tracks...>Run workflow).

Suggested changes/backports

minio revisions>57 cannot be deployed in charmed kubernetes

Observed behaviour

minio ckf-1.6/beta hangs in a WaitingStatus for a long time, and the storage attached to the unit also remains in a pending status. This causes minio to never become active.

juju status
minio/0*    waiting   idle    waiting for container

Steps to reproduce

juju add-model minio-test
juju deploy minio --channel ckf-1.6/beta
juju status

Environment

  • Charmed Kubernetes 1.22 on AWS
  • RBAC and Metallb enabled
  • Node constraints (kubernetes workers): kubernetes-worker cores=8 mem=32G root-disk=100G

Workaround

Remove the application and deploy an older version

juju remove-application minio
juju deploy minio --channel latest/stable

The share file option uses the Pod internal IP to share, which is inaccessible outside of the cluster

Go to the “Object Browser” tab, select the bucket and the file you want to create a link for, and click on it to see its details. Select the share button in the top right and copy the share link from the popup window.

Expected
Use the IP/URL which will be accessible outside the cluster.

Workaround
Replace the first part of the URL with "localhost" (when using port-forward) or with the external IP where MinIO was exposed.
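
For example, assuming the charm's Kubernetes service is named minio and lives in the kubeflow namespace:

kubectl -n kubeflow port-forward svc/minio 9000:9000
# then replace the pod IP and port in the copied share link with localhost:9000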

Minio failed to upgrade 1.6 to 1.7

Minio failed to upgrade with error message:

ERROR Juju on containers does not support updating storage on a statefulset.
The new charm's metadata contains updated storage declarations.
You'll need to deploy a new charm rather than upgrading if you need this change.
 not supported (not supported)

Jira

Should Minio provide relation to create a bucket?

In working on canonical/mlflow-operator#34, it was discussed whether a charm that wants to relate to and use the object store should be responsible (and thus carry the code/tests) for creating its own bucket, or whether the minio charm should provide this somehow. This idea should be elaborated on.

Pros: it removes complexity from all the charms that would use object storage. Without a central function for this in the minio charm, every other charm that needs a bucket (e.g. mlflow) needs the logic to create its own if it does not exist.

Cons: not sure, but I think there are some. Not sure how this would map to multi-user scenarios (it isn't as complex for our single-user minio, but if we want per-user buckets etc this gets much more complex).
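
For reference, this is roughly the bucket-creation boilerplate each consuming charm currently has to carry, shown here with the MinIO client for illustration (service name, credentials and bucket name are placeholders):

mc alias set minio http://<minio-service>:9000 <access-key> <secret-key>
mc mb --ignore-existing minio/<bucket-name>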

Add charm for minio console (web portal for minio)

Logging this here as it relates to minio, although it does not really affect this charm specifically.

We could add an additional charm to provide the minio console (similar to the Argo server charm)

`minio` fails to build during integration tests

Seems like minio cannot be built during integration tests, resulting in the following error message:

RuntimeError: Failed to build charm .:
Packing the charm.
Launching environment to pack for base name='ubuntu' channel='20.04' architectures=['amd64'] (may take a while the first time but it's reusable)
Packing the charm
Packing the charm.
Building charm in '/root'
Running step PULL for part 'charm'
Running step BUILD for part 'charm'
Parts processing error: Failed to run the build script for part 'charm'.
Failed to build charm for bases index '0'.

For details and logs see here and follow the latest CI runs of #134

This seems to be affecting the publish job of that PR as well.

Move backup/restore to the Charm

Context

Similarly to canonical/mlmd-operator#80.

Right now our backup/restore guide has the following issues:

  1. It has manual commands the user needs to run to push the data to S3 (rclone)
  2. Users need to download binaries to parse the secret-key and download the data locally (rclone)
  3. The data needs to go via the host-machine that runs the rclone command, for which we use kubectl port-forward

We should move all of this logic to the Charm, to ensure users don't need to install any binaries and the data will go directly from the Charm to S3
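
For context, the manual flow the guide describes looks roughly like this (it assumes rclone remotes named minio and s3 are already configured, and uses an example bucket name):

kubectl -n kubeflow port-forward svc/minio 9000:9000 &
rclone sync minio:mlpipeline s3:<backup-bucket>/mlpipeline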

What needs to get done

  1. Include all needed binaries (or python libraries) for backup to the Charm
  2. Have an action that pushes the buckets of MinIO to another S3

Definition of Done

  1. Have a spike to confirm our understanding of rclone sync, and which files it copies
  2. Have a spike to discuss if we want to copy all files, or ones that fit a timeframe (could get field input)
  3. The action can be executed in an airgap environment
  4. Users don't need to run any other commands from their machine
  5. The data will go directly to the S3 from the Charm

HA storage for MinIO

I understand the MinIO charm does not support being clustered (from an application point of view).
I also understand that for HA storage (from a storage point of view) you could use gateway mode + S3 (be it a public cloud or Ceph).

However, do we have a supported or tested way to do HA storage when deploying on-premises (no Ceph involved)?

I know that for COS Lite we have used the Mayastor MicroK8s add-on,
which has this bug at the moment.
