charmed-kubeflow-chisme's Introduction

Charmed Kubeflow Chisme

Chisme: a Spanish word for gossip, or a story worth telling to your friends.

This repository is for chisme within the Charmed Kubeflow team's codebase. It is a collection of helpers for use in the Charmed Operators maintained by the Charmed Kubeflow team, and by anyone else who benefits from them.

Contents

  • Exceptions: A collection of standard Exceptions for use when writing charms.
  • Kubernetes: Helpers for interacting with Kubernetes
  • Lightkube: Helpers specific to using or extending Lightkube
  • Pebble: Helpers for managing pebble when writing charms
  • Reusable Charm Components: The Component abstraction that encapsulates any piece of logic for a Charm, a reusable reconcile function CharmReconciler that executes Components, and a collection of Components for things like running Pebble containers or deploying Kubernetes resources
  • Rock: Utilities for testing ROCKs
  • Status Handling: Helpers for working with Charm Status objects
  • Testing: Utilities for testing Charms
  • Types: Reusable typing definitions, useful for adding type hints

Publishing to PyPI

To publish a new release to PyPI:

  1. Update setup.cfg to the new version and commit it to the repo via a completed PR
  2. Apply local git tag according to the format X.X.X (semantic versioning) on the main branch
  3. Push tag to the repo. Example: git push origin 0.0.8
  4. GitHub Action will create a new release on GitHub
  5. Edit release via GitHub UI and click publish
  6. GitHub Action will automatically publish the same commit to the PyPI repository

charmed-kubeflow-chisme's People

Contributors

beliaev-maksim, ca-scribner, dnplas, domfleischmann, i-chvets, kimwnasptd, misohu, nohaihab, orfeas-k, phoevos

charmed-kubeflow-chisme's Issues

base charm: Enhance the pebble component to accept the contents of a file

Context

ContainerFileTemplate currently accepts only the path to a file, but we've encountered use cases where we want to pass the file contents instead, for example when the content is generated in the charm code, such as when creating certs.
At the moment, the workaround is to write the content to a temp file and pass that file's path to ContainerFileTemplate, which then reads the temp file back in.
Example of the workaround.

What needs to get done

Modify the pebble component so that files_to_push accepts file contents as well as file paths.

Definition of Done

  1. we can pass the file content to files_to_push
  2. tests are added for passing file content
  3. refactor the charms where we are doing the workaround mentioned in the context.
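
One possible shape for this change, as a minimal sketch only: a file spec that carries either a source path or inline content. FileToPush and resolve_content are hypothetical names used for illustration; they are not the existing ContainerFileTemplate API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class FileToPush:
    """Hypothetical spec for a file to push: give exactly one of source_path or content."""

    destination_path: str
    source_path: Optional[str] = None
    content: Optional[str] = None

    def __post_init__(self):
        # Reject specs that give both sources or neither.
        if (self.source_path is None) == (self.content is None):
            raise ValueError("Provide exactly one of source_path or content")

    def resolve_content(self) -> str:
        """Return the text to push, reading from disk only when a path was given."""
        if self.content is not None:
            return self.content
        return Path(self.source_path).read_text()
```

With a spec like this, a charm that generates certs in code could pass them directly as content, and the temp-file round trip would no longer be needed.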

Publish a new release

Context

PR #81 sent changes with new helpers for testing rocks

What needs to get done

Publish a new release of chisme following the instructions in the README.md
Unpin the dependency in the tests of the oidc rock repo (https://github.com/canonical/oidc-authservice-rock)

Definition of Done

A new version is published to PyPI and the library is not pinned in any of our repos.

Implement functionality for custom images for air gap deployment

Context

We want to unify implementation for handling custom images in the charms. We can implement key functions like parse_image_config (example here) and get_images (example here).

Note: There might be two versions of get_images: one that does not split the images into name and tag, and one that does.

What needs to get done

Implement the following functions so that they can be shared across charms:

  • parse_image_config
  • get_images

Make sure the functions are properly tested.

Use the functions in target charms. List (may not be complete please double check):

Definition of Done

  1. Functions are implemented, tested, and merged
  2. Function from chisme is used in the charms
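
As a rough sketch of what the shared helpers could look like (not the implementations linked above): the code below assumes the custom-images config is a JSON object mapping image keys to image references, which may differ from the format the charms actually use. split_image is a hypothetical helper illustrating the name/tag variant mentioned in the note; it does not handle digest references.

```python
import json
from typing import Dict, Tuple


def parse_image_config(config: str) -> Dict[str, str]:
    """Parse a config string mapping image keys to image references.

    Assumes a JSON object; the format used by the charms may differ.
    """
    if not config:
        return {}
    parsed = json.loads(config)
    if not isinstance(parsed, dict):
        raise ValueError("Image config must be a mapping of key to image")
    return parsed


def get_images(
    default_images: Dict[str, str], custom_images: Dict[str, str]
) -> Dict[str, str]:
    """Return the defaults, overridden by non-empty custom images with a known key."""
    images = dict(default_images)  # copy so the caller's defaults are not mutated
    for key, image in custom_images.items():
        if image and key in images:
            images[key] = image
    return images


def split_image(image: str) -> Tuple[str, str]:
    """Split 'name:tag' into (name, tag); the tag is '' when absent.

    Only a colon in the last path segment counts as a tag separator, so a
    registry port such as 'registry.local:5000/app' is left intact.
    """
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return image, ""
    name, _, tag = image.rpartition(":")
    return name, tag
```

In this sketch, unknown keys and empty values in the custom config are silently ignored; a real implementation might prefer to log a warning for them.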

Add a feature to `KubernetesResourceHandler` to reuse Jinja templates

Context

The current implementation of KubernetesResourceHandler has a limitation: it does not support Jinja tags that reference one template from within another, for example the include tag and the import tag.
Because of this limitation, we resort to duplicating files as a workaround. For an example, see this PR and compare katib-config.yaml.j2 and katib-config-configmap.yaml.j2 in the katib-controller templates.

What needs to get done

  1. Look into the usage of Jinja's tags
  2. Implement this feature in KubernetesResourceHandler
  3. Add an API in the base charm's kubernetes component
  4. Write tests for KRH and the base charm

Definition of Done

chisme's KubernetesResourceHandler and the base charm's kubernetes component allow reusing of jinja templates, tests included.

Update `PebbleServiceComponent` to allow for empty returns of `get_layer`

Sometimes we may run pebble containers that have their own services/environment, etc, set up and do not need us to change anything. Currently, the PebbleServiceComponent would not handle this properly as:

  • get_layer() is an abstract method that must be implemented
  • get_layer() and the tools that use it (_update_layer(), get_services_not_active()) were not written with a null return in mind

The current workaround is to define everything in get_layer(), even if you didn't need to change anything.

It is not valid to return an empty layer from get_layer() because the update_layer code makes the assumption that we wrote everything in a single layer.
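
To illustrate the kind of change this would need, here is a minimal sketch of a service check that tolerates a missing layer. It uses plain Python data instead of the Pebble API, and both the function signature and the container_services mapping are assumptions for illustration, not the current PebbleServiceComponent code.

```python
from typing import Dict, List, Optional


def get_services_not_active(
    layer_services: Optional[List[str]], container_services: Dict[str, bool]
) -> List[str]:
    """Return the names of services that should be active but are not.

    layer_services: service names from the charm's layer, or None when the
        charm defines no layer and relies on the container's own services.
    container_services: hypothetical map of service name -> whether it is active.
    """
    # With no layer from the charm, fall back to every service the container knows.
    expected = list(container_services) if layer_services is None else layer_services
    return [name for name in expected if not container_services.get(name, False)]
```

The same idea would apply to _update_layer(): when get_layer() returns nothing, the update step would be skipped rather than treated as an error.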

add a `block_until_ready` to Kubernetes actions like apply or maybe delete

Sometimes, it is useful to do something in Kubernetes (e.g. apply) and then wait for those resources to be ready before continuing. We have sometimes done this in the past by locally implementing a watcher loop around some lightkube code.

It would be nice if this option was available, perhaps as an option to KRH.apply(... block_until_ready=True) and similar commands? If we implement this, we could do something like apply the resources as normal and then loop around them waiting until they all show a ready status.
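
The "apply, then loop until ready" idea could be sketched as a generic polling helper. This is an illustration under assumptions, not a proposed KRH API: is_ready is a hypothetical callback that would wrap whatever lightkube check decides a resource is ready, and resources are represented as plain strings.

```python
import time
from typing import Callable, Iterable, List


def block_until_ready(
    resources: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> None:
    """Poll until is_ready(resource) is true for every resource, else raise TimeoutError."""
    pending: List[str] = list(resources)
    deadline = time.monotonic() + timeout
    while True:
        # Keep only the resources that are still not ready.
        pending = [r for r in pending if not is_ready(r)]
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Resources not ready after {timeout}s: {pending}")
        time.sleep(interval)
```

A timeout matters here because a charm hook that blocks forever is harder to debug than one that fails with a list of the resources that never became ready.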

`deepdiff>6.2.1` causes `error: can't find Rust compiler`

It seems like charms installing chisme's deepdiff dependency will end in error with the following message:

::    ::             running build_rust                                                                                          
::    ::             error: can't find Rust compiler                                                                             
::    ::                                                                                                                         
::    ::             If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.      

Workaround

pin deepdiff==6.2.1 in charms

Solution

pin deepdiff in the setup.cfg file

`ErrorWithStatus` may not be setting the unit status correctly

When deploying the kubeflow bundle, at some point I got the following error in the juju debug-log:

unit-training-operator-0: 17:29:52 ERROR unit.training-operator/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 207, in <module>
    main(TrainingOperatorCharm)
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/main.py", line 436, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/framework.py", line 866, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/framework.py", line 931, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 189, in _on_install
    self._check_container_connection()
  File "./src/charm.py", line 135, in _check_container_connection
    raise ErrorWithStatus("Pod startup is not complete", MaintenanceStatus)
charmed_kubeflow_chisme.exceptions._with_status.ErrorWithStatus: Pod startup is not complete

The error suggests the unit should be in MaintenanceStatus, but instead was in ErrorStatus. Although this does not prevent the unit from going to active and idle, this behaviour is not what we are expecting.

Steps to reproduce

  1. Deploy this bundle
  2. Watch the logs for training-operator
  3. Watch the status of training-operator
  4. For a brief moment, the unit is in error status rather than maintenance
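
The log line "Uncaught exception while in charm code" suggests the exception escaped the event handler. Raising ErrorWithStatus does not set the unit status by itself; something has to catch it and assign the status it carries. The sketch below shows that pattern with stand-in classes so that it runs without ops or chisme installed. MaintenanceStatus, ErrorWithStatus and FakeUnit here are simplified assumptions modelled on the real classes, not copies of them.

```python
class MaintenanceStatus:
    """Stand-in for ops.model.MaintenanceStatus, so the sketch runs without ops."""

    name = "maintenance"

    def __init__(self, message: str = ""):
        self.message = message


class ErrorWithStatus(Exception):
    """Stand-in modelled on chisme's ErrorWithStatus: a message plus a status type."""

    def __init__(self, msg: str, status_type):
        super().__init__(msg)
        self.msg = msg
        self.status_type = status_type

    @property
    def status(self):
        # Build the status object the unit should be set to.
        return self.status_type(self.msg)


class FakeUnit:
    """Stand-in for the charm's unit; only holds a status."""

    status = None


def check_container_connection() -> None:
    raise ErrorWithStatus("Pod startup is not complete", MaintenanceStatus)


def on_install(unit: FakeUnit) -> None:
    """Catch the exception and turn it into a unit status instead of letting it escape."""
    try:
        check_container_connection()
    except ErrorWithStatus as err:
        unit.status = err.status
        return
```

If the handler in the traceback lets the exception propagate in the same way as check_container_connection() does here, the hook fails and the unit is reported in error until a later hook succeeds, which would match the brief error state described above.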

_get_missing_kubernetes_resources() will not validate applied namespaced k8s resources without a namespace

Context

KubernetesComponent's function _get_missing_kubernetes_resources() returns the desired resources that the Component expects to be present in the Kubernetes cluster but are not. However, it cannot verify namespaced K8s resources that are provided without a namespace, so it reports such a resource as missing even though it was applied successfully. E.g. if we provide a template with a Role (which is a namespaced resource) but without a namespace in its metadata, Kubernetes will assign a (default?) namespace to it when applying it. As a result, _get_missing_kubernetes_resources() will not locate the applied resource.

Reproduce

Using a charm that follows Reconcile style (aka Base charm class):

  • add a namespaced resource in its template but remove its namespace (e.g. remove metadata.namespace from this role)
  • run a simple integration test (deploying the charm)
  • watch the charm being in the following state
Unit           Workload  Agent  Address     Ports  Message
kfp-viewer/0*  blocked   idle   10.1.4.207         [kubernetes:auth-and-crds] Not all resources found in cluster.  This may be transient if we haven't tried to deploy t...

Proposed Solution

We should either:

  • state explicitly that _get_missing_kubernetes_resources() expects namespaced resources (if present) to always come with a metadata.namespace field in order to be able to validate their existence in the cluster, or
  • modify _get_missing_kubernetes_resources() to take into account the case of a namespaced resource without a metadata.namespace and then try to locate that resource in any namespace.
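
The second option could look roughly like the sketch below. It models the cluster as a set of (kind, name, namespace) tuples rather than calling lightkube, and the function name and types are assumptions for illustration. Cluster-scoped resources are not modelled.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (kind, name, namespace); namespace is None when the template omits it.
Resource = Tuple[str, str, Optional[str]]


def get_missing_resources(
    desired: Iterable[Resource], in_cluster: Set[Tuple[str, str, str]]
) -> List[Resource]:
    """Return the desired resources that are not found in the cluster.

    A desired resource with namespace None matches the same kind and name
    in any namespace.
    """
    missing = []
    for kind, name, namespace in desired:
        if namespace is None:
            found = any(k == kind and n == name for k, n, _ in in_cluster)
        else:
            found = (kind, name, namespace) in in_cluster
        if not found:
            missing.append((kind, name, namespace))
    return missing
```

Matching in any namespace can produce false positives when an unrelated resource with the same kind and name exists elsewhere, which is one argument for the first option.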

refactor `apply_many` and `delete_many` so they (optionally?) do not fail fast when hitting an error

As is, apply_many and delete_many will immediately fail on the first lightkube error. We should enable them to (either by default or optionally) complete all the operations they are able to before raising an exception.

For example, if you use delete_many on a list of resources where the first in the list does not exist, delete_many will attempt to delete the first, see an ApiError(404), and fail immediately without trying to delete the other resources. This is particularly troubling for charm remove hooks where we want to attempt to delete everything, but are OK if some things are already gone.
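
A non-fail-fast variant might be sketched as follows. This is not the current chisme signature: delete_one, ignore and ManyErrors are hypothetical names, and the per-resource delete is passed in as a callback so that the sketch runs without lightkube.

```python
from typing import Callable, Iterable, List, Tuple


class ManyErrors(Exception):
    """Hypothetical aggregate error holding every (resource, exception) failure."""

    def __init__(self, failures: List[Tuple[str, Exception]]):
        super().__init__(f"{len(failures)} operation(s) failed")
        self.failures = failures


def delete_many(
    resources: Iterable[str],
    delete_one: Callable[[str], None],
    ignore: Callable[[Exception], bool] = lambda exc: False,
) -> None:
    """Try to delete every resource, then raise once with all non-ignored failures."""
    failures: List[Tuple[str, Exception]] = []
    for resource in resources:
        try:
            delete_one(resource)
        except Exception as exc:
            # Record the failure and keep going instead of stopping at the first error.
            if not ignore(exc):
                failures.append((resource, exc))
    if failures:
        raise ManyErrors(failures)
```

In a remove hook, ignore could be a predicate that returns True for a 404 ApiError, so resources that are already gone do not count as failures.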
