kubeflow-tensorboards-operator's Introduction

Kubeflow Tensorboards Operator

Overview

This charm encompasses the Kubernetes Python operator for Kubeflow Tensorboards (see CharmHub).

The Kubeflow Tensorboards operator is a Python script that wraps the latest released version of Kubeflow Tensorboards, providing lifecycle management and handling events such as install, upgrade, integrate, and remove.

More information on Tensorboard can be found on the upstream website.

Usage

To get started, install the Kubeflow lite bundle (see the Quick start guide).

Then deploy Tensorboard Controller and Tensorboard Web App with:

juju deploy tensorboard-controller --channel=latest/edge --trust
juju deploy tensorboards-web-app --channel=latest/edge --trust

Then create the following relations:

juju relate istio-pilot:ingress tensorboards-web-app:ingress
juju relate istio-pilot:gateway-info tensorboard-controller:gateway-info
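
The deploy and relate steps above can also be scripted. The following is a minimal sketch, assuming the juju CLI is on the PATH and a Kubeflow model is already selected; the helper names build_commands and run_all are hypothetical and not part of this charm. By default it only prints the commands, so nothing is deployed unless dry_run is set to False.

```python
import shlex
import subprocess

CHANNEL = "latest/edge"

# Charms and relations exactly as listed in the steps above.
CHARMS = ["tensorboard-controller", "tensorboards-web-app"]
RELATIONS = [
    ("istio-pilot:ingress", "tensorboards-web-app:ingress"),
    ("istio-pilot:gateway-info", "tensorboard-controller:gateway-info"),
]


def build_commands(channel=CHANNEL):
    """Return the juju commands as argument lists, deploys first."""
    commands = [
        ["juju", "deploy", charm, f"--channel={channel}", "--trust"]
        for charm in CHARMS
    ]
    commands += [["juju", "relate", a, b] for a, b in RELATIONS]
    return commands


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing juju command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Keeping the commands as argument lists avoids shell quoting issues and makes it easy to swap the channel for a different track.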

For a quick example of using Tensorboard, see https://charmed-kubeflow.io/docs/kubeflow-basics

For more information on Juju, see https://juju.is/docs

Looking for a fully supported platform for MLOps?

Canonical Charmed Kubeflow is a state-of-the-art, fully supported MLOps platform that helps data scientists collaborate on AI innovation on any cloud, from concept to production. It is offered by Canonical, the publishers of Ubuntu.

Charmed Kubeflow is free to use: the solution can be deployed in any environment without constraints, paywall or restricted features. Data labs and MLOps teams only need to train their data scientists and engineers once to work consistently and efficiently on any cloud – or on-premise.

Charmed Kubeflow offers a centralised, browser-based MLOps platform that runs on any conformant Kubernetes – offering enhanced productivity, improved governance and reducing the risks associated with shadow IT.

Learn more about deploying and using Charmed Kubeflow at https://charmed-kubeflow.io.

Key features

  • Centralised, browser-based data science workspaces: familiar experience
  • Multi user: one environment for your whole data science team
  • NVIDIA GPU support: accelerate deep learning model training
  • Apache Spark integration: empower big data driven model training
  • Ideation to production: automate model training & deployment
  • AutoML: hyperparameter tuning, architecture search
  • Composable: edge deployment configurations available

What’s included in Charmed Kubeflow

  • LDAP Authentication
  • Jupyter Notebooks
  • Work with Python and R
  • Support for TensorFlow, PyTorch, MXNet, XGBoost
  • TFServing, Seldon-Core
  • Katib (AutoML)
  • Apache Spark
  • Argo Workflows
  • Kubeflow Pipelines

Why engineers and data scientists choose Charmed Kubeflow

  • Maintenance: Charmed Kubeflow offers up to two years of maintenance on select releases
  • Optional 24/7 support available, contact us here for more information
  • Optional dedicated fully managed service available, contact us here for more information or learn more about Canonical’s Managed Apps service.
  • Portability: Charmed Kubeflow can be deployed on any conformant Kubernetes, on any cloud or on-premise

Documentation

Please see the official docs site for complete documentation of the Charmed Kubeflow distribution.

Bugs and feature requests

If you find a bug in our operator or want to request a specific feature, please file a bug here: https://github.com/canonical/kubeflow-tensorboards-operator/issues

License

Charmed Kubeflow is free software, distributed under the Apache Software License, version 2.0.

Contributing

Canonical welcomes contributions to Charmed Kubeflow. Please check out our contributor agreement if you're interested in contributing to the distribution.

Security

Security issues in Charmed Kubeflow can be reported through Launchpad. Please do not file GitHub issues about security issues.

kubeflow-tensorboards-operator's Issues

tensorboard charms offline deployment fails "local charm missing OCI images. .."

Same as canonical/kfp-operators#351
Affects both charms:

  • tensorboards-web-app
  • tensorboard-controller

$ juju deploy --trust --debug ./tensorboard-controller_840554c.charm --resource oci-image=$AWS_ECR_URL/kubeflownotebookswg/tensorboard-controller:v1.7.0

22:20:45 INFO  juju.cmd supercommand.go:56 running juju [2.9.45 afb8ee760af71d0bca8c3e4e0dc28af2dabc9b1d gc go1.20.8]
22:20:45 DEBUG juju.cmd supercommand.go:57   args: []string{"juju", "deploy", "--trust", "--debug", "./tensorboard-controller_840554c.charm", "--resource", "oci-image=701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflownotebookswg/tensorboard-controller:v1.7.0"}
22:20:45 DEBUG juju.jujuclient proxy.go:65 unmarshalled proxy config for "kubernetes-port-forward"
22:20:45 INFO  juju.juju api.go:86 connecting to API addresses: [10.100.189.154:17070]
22:20:45 DEBUG juju.api apiclient.go:624 starting proxier for connection
22:20:45 DEBUG juju.api apiclient.go:628 tunnel proxy in use at localhost on port 34245
22:20:45 DEBUG juju.api apiclient.go:1151 successfully dialed "wss://localhost:34245/api"
22:20:45 INFO  juju.api apiclient.go:686 connection established to "wss://localhost:34245/api"
22:20:45 DEBUG juju.jujuclient proxy.go:65 unmarshalled proxy config for "kubernetes-port-forward"
22:20:45 INFO  juju.juju api.go:86 connecting to API addresses: [10.100.189.154:17070]
22:20:45 DEBUG juju.api apiclient.go:624 starting proxier for connection
22:20:45 DEBUG juju.api apiclient.go:628 tunnel proxy in use at localhost on port 39319
22:20:45 DEBUG juju.api apiclient.go:1151 successfully dialed "wss://localhost:39319/model/79be7fe7-68f5-488b-8a9e-231e6a0fafbb/api"
22:20:45 INFO  juju.api apiclient.go:686 connection established to "wss://localhost:39319/model/79be7fe7-68f5-488b-8a9e-231e6a0fafbb/api"
22:20:45 DEBUG juju.core.charm computedseries.go:27 series "focal" for charm "tensorboard-controller" with format 2, Kubernetes true
22:20:45 DEBUG juju.core.charm computedseries.go:27 series "focal" for charm "tensorboard-controller" with format 2, Kubernetes true
22:20:45 DEBUG juju.api monitor.go:35 RPC connection died
22:20:45 DEBUG juju.api monitor.go:35 RPC connection died
ERROR local charm missing OCI images for: tensorboard-controller-image
22:20:45 DEBUG cmd supercommand.go:537 error stack: 
github.com/juju/juju/cmd/juju/application/deployer.(*factory).validateResourcesNeededForLocalDeploy:677: local charm missing OCI images for: tensorboard-controller-image
github.com/juju/juju/cmd/juju/application/deployer.(*factory).maybeReadLocalCharm:406: 
github.com/juju/juju/cmd/juju/application/deployer.(*factory).GetDeployer:71: 
github.com/juju/juju/cmd/juju/application.(*DeployCommand).Run:909: 

$ juju deploy --trust --debug ./tensorboards-web-app_d2c0149.charm --resource oci-image=$AWS_ECR_URL/kubeflownotebookswg/tensorboards-web-app:v1.7.0

22:23:35 INFO  juju.cmd supercommand.go:56 running juju [2.9.45 afb8ee760af71d0bca8c3e4e0dc28af2dabc9b1d gc go1.20.8]
22:23:35 DEBUG juju.cmd supercommand.go:57   args: []string{"juju", "deploy", "--trust", "--debug", "./tensorboards-web-app_d2c0149.charm", "--resource", "oci-image=701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflownotebookswg/tensorboards-web-app:v1.7.0"}
22:23:35 DEBUG juju.jujuclient proxy.go:65 unmarshalled proxy config for "kubernetes-port-forward"
22:23:35 INFO  juju.juju api.go:86 connecting to API addresses: [10.100.189.154:17070]
22:23:35 DEBUG juju.api apiclient.go:624 starting proxier for connection
22:23:35 DEBUG juju.api apiclient.go:628 tunnel proxy in use at localhost on port 33553
22:23:35 DEBUG juju.api apiclient.go:1151 successfully dialed "wss://localhost:33553/api"
22:23:35 INFO  juju.api apiclient.go:686 connection established to "wss://localhost:33553/api"
22:23:35 DEBUG juju.jujuclient proxy.go:65 unmarshalled proxy config for "kubernetes-port-forward"
22:23:35 INFO  juju.juju api.go:86 connecting to API addresses: [10.100.189.154:17070]
22:23:35 DEBUG juju.api apiclient.go:624 starting proxier for connection
22:23:35 DEBUG juju.api apiclient.go:628 tunnel proxy in use at localhost on port 38889
22:23:35 DEBUG juju.api apiclient.go:1151 successfully dialed "wss://localhost:38889/model/79be7fe7-68f5-488b-8a9e-231e6a0fafbb/api"
22:23:35 INFO  juju.api apiclient.go:686 connection established to "wss://localhost:38889/model/79be7fe7-68f5-488b-8a9e-231e6a0fafbb/api"
22:23:35 DEBUG juju.core.charm computedseries.go:27 series "focal" for charm "tensorboards-web-app" with format 2, Kubernetes true
22:23:35 DEBUG juju.core.charm computedseries.go:27 series "focal" for charm "tensorboards-web-app" with format 2, Kubernetes true
22:23:35 DEBUG juju.api monitor.go:35 RPC connection died
22:23:35 DEBUG juju.api monitor.go:35 RPC connection died
ERROR local charm missing OCI images for: tensorboards-web-app-image
22:23:35 DEBUG cmd supercommand.go:537 error stack: 
github.com/juju/juju/cmd/juju/application/deployer.(*factory).validateResourcesNeededForLocalDeploy:677: local charm missing OCI images for: tensorboards-web-app-image
github.com/juju/juju/cmd/juju/application/deployer.(*factory).maybeReadLocalCharm:406: 
github.com/juju/juju/cmd/juju/application/deployer.(*factory).GetDeployer:71: 
github.com/juju/juju/cmd/juju/application.(*DeployCommand).Run:909: 
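
In both logs the error names a resource ending in -image (tensorboard-controller-image, tensorboards-web-app-image), while the commands pass --resource oci-image=... One possible reading, which is an assumption and not confirmed in this issue, is that the resource name given on the command line does not match the name the charm declares. The sketch below imitates that kind of check in plain Python; the metadata dictionary is a hypothetical stand-in for a parsed metadata.yaml, the registry host is a placeholder, and missing_oci_resources is an illustrative helper, not Juju's actual code.

```python
def missing_oci_resources(metadata, provided):
    """Return names of oci-image resources declared in metadata but not provided.

    metadata: dict shaped like a parsed metadata.yaml (hypothetical sample below).
    provided: dict of resource name -> image reference, as passed via --resource.
    """
    declared = metadata.get("resources", {})
    return sorted(
        name
        for name, spec in declared.items()
        if spec.get("type") == "oci-image" and name not in provided
    )


# Hypothetical metadata; the resource name is taken from the error message.
SAMPLE_METADATA = {
    "resources": {
        "tensorboard-controller-image": {"type": "oci-image"},
    }
}

if __name__ == "__main__":
    # Placeholder registry host, for illustration only.
    image = "registry.example/kubeflownotebookswg/tensorboard-controller:v1.7.0"
    # Resource name used in the failing command.
    print(missing_oci_resources(SAMPLE_METADATA, {"oci-image": image}))
    # Resource name reported in the error message.
    print(missing_oci_resources(SAMPLE_METADATA, {"tensorboard-controller-image": image}))
```

If this reading holds, passing the resource under the name shown in the error message would be the first thing to try; that has not been verified here.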

`tensorboards-web-app` fails to build during integration tests

 RuntimeError: Failed to build charm .:
Packing the charm.
Launching environment to pack for base name='ubuntu' channel='20.04' architectures=['amd64'] (may take a while the first time but it's reusable)
Packing the charm
Packing the charm.
Building charm in '/root'
Running step PULL for part 'charm'
Running step BUILD for part 'charm'
Parts processing error: Failed to run the build script for part 'charm'.
Failed to build charm for bases index '0'.
Full execution log: '/home/runner/.local/state/charmcraft/log/charmcraft-20230724-093116.474552.log'

For details and logs, see the latest integration test job for #83

"No healthy upstream" when using PVC

When creating a new tensorboard with a PVC mounted at mnt/tfboard, I get the error "no healthy upstream" when trying to connect to the board.

If I shell into the container, I can see that TensorBoard itself is running fine:

microk8s.kubectl exec -it $(microk8s.kubectl get pods -nrob | grep ^demo- | awk '{ print $1 }' | tail -n 1) -nrob -- bash
root@demo-7498d9d94-zk5gw:/# curl -s localhost:6006/ | head -n 17
<!doctype html><!--
@license
Copyright 2016 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--><meta charset="utf-8">
<title>TensorBoard</title>

tests(twa): Replace istio in integration tests

Context

During canonical/bundle-kubeflow#621, it was agreed to test web app charms by using the unit/k8s-svc IP directly instead of going through Istio's ingress gateway. The Istio charms should therefore be removed from the integration tests. However, the tensorboards-web-app charm needs the ingress relation in order to be unblocked and go to active, so Istio cannot simply be dropped: it has to be replaced by a test entity that can mock that relation.

By removing istio, tests will also be independent of istio charms' failures.

What needs to get done

  1. Remove the Istio charms from the tensorboards-web-app integration tests
  2. Replace them with a mock charm that offers the ingress relation to the charm

Definition of Done

Charm's integration tests pass without deploying istio charms

Make charm's images configurable in track/<last-version> branch

Description

The goal of this task is to make all images configurable so that when this charm is deployed in an airgapped environment, all image resources are pulled from an arbitrary local container image registry (avoiding pulling images from the internet).
This serves as a tracking issue for the required changes and backports to the latest stable track/* GitHub branch.

TL;DR

Mark the following as done

  • Required changes (in metadata.yaml, config.yaml, src/charm.py)
  • Test on airgap environment
  • Publish to /stable

Required changes

WARNING: No breaking changes should be backported into the track/<version> branch. A breaking change is anything that requires extra steps, beyond a plain juju refresh, to refresh from the previous /stable. Please avoid these situations at all costs.

The following files have to be modified and/or verified to enable image configuration:

  • metadata.yaml - the container image(s) of the workload containers have to be specified in this file. This only applies to sidecar charms. Example:
containers:
  training-operator:
    resource: training-operator-image
resources:
  training-operator-image:
    type: oci-image
    description: OCI image for training-operator
    upstream-source: kubeflow/training-operator:v1-855e096
  • config.yaml - applies when the operator creates resource(s) that in turn deploy containers; the images those resources reference should be exposed as configuration options. Example of such a resource:
apiVersion: v1
kind: ConfigMap
metadata:
  name: seldon-config
  namespace: {{ namespace }}
data:
  predictor_servers: |-
    {
        "TENSORFLOW_SERVER": {
          "protocols" : {
            "tensorflow": {
              "image": "tensorflow/serving", <--- this image should be configurable
              "defaultImageVersion": "2.1.0"
              },
            "seldon": {
              "image": "seldonio/tfserving-proxy",
              "defaultImageVersion": "1.15.0"
              }
            }
        },
...
  • tools/get-images.sh - is a bash script that returns a list of all the images that are used by this charm. In the case of a multi-charm repo, this is located at the root of the repo and gathers images from all charms in it.

  • src/charm.py - verify that nothing inside the charm code calls a subprocess that requires an internet connection.
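
To make the list above more concrete, here is a minimal sketch in plain Python of two of the ideas: listing workload images from metadata, and rendering a resource with its image taken from configuration instead of being hard-coded. The dictionaries stand in for parsed metadata.yaml and charm config, the option names tensorflow-image and tensorflow-image-version are hypothetical, and the local registry host is a placeholder. This is an illustration under those assumptions, not the charm's actual implementation.

```python
import json

# Hypothetical stand-in for a parsed metadata.yaml (values from the example above).
METADATA = {
    "resources": {
        "training-operator-image": {
            "type": "oci-image",
            "upstream-source": "kubeflow/training-operator:v1-855e096",
        }
    }
}

# Hypothetical config options with their upstream defaults.
DEFAULT_CONFIG = {
    "tensorflow-image": "tensorflow/serving",
    "tensorflow-image-version": "2.1.0",
}


def workload_images(metadata):
    """List the upstream-source image of every oci-image resource."""
    return [
        spec["upstream-source"]
        for spec in metadata.get("resources", {}).values()
        if spec.get("type") == "oci-image" and "upstream-source" in spec
    ]


def render_predictor_servers(config):
    """Build the predictor_servers JSON with the image read from config."""
    servers = {
        "TENSORFLOW_SERVER": {
            "protocols": {
                "tensorflow": {
                    "image": config["tensorflow-image"],
                    "defaultImageVersion": config["tensorflow-image-version"],
                }
            }
        }
    }
    return json.dumps(servers, indent=4)


if __name__ == "__main__":
    print(workload_images(METADATA))
    # In an airgapped setup the image would point at a local registry (placeholder host).
    airgapped = dict(DEFAULT_CONFIG, **{"tensorflow-image": "registry.local/tensorflow/serving"})
    print(render_predictor_servers(airgapped))
```

Reading the image from configuration means an airgapped deployment only needs a different config value, with no change to the charm code.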

Testing

  1. Spin up an airgap environment following canonical/bundle-kubeflow#682 and canonical/bundle-kubeflow#703 (comment)

  2. Build the charm making sure that all the changes for airgap are in place.

  3. Deploy the charms manually and observe the charm go to active and idle.

  4. Additionally, run the integration tests or simulate them, for instance by creating a workload (such as a PytorchJob or a SeldonDeployment).

Publishing

After completing the changes and testing, this charm has to be published to its stable risk in Charmhub. To do that, first wait for the charm to be published to /edge; that revision is the one to promote to /stable. Use the workflow dispatch for the promotion (Actions>Release charm to other tracks...>Run workflow).

Suggested changes/backports
