envoy-operator's Introduction

Envoy Operator

Overview

This charm encompasses the Kubernetes Python operator for Envoy (see CharmHub).

The Envoy operator is a Python script that wraps the latest released version of Envoy, providing lifecycle management and handling events such as install, upgrade, integrate, and remove.

Install

To install Envoy, run:

juju deploy envoy

For more information, see https://juju.is/docs

envoy-operator's People

Contributors

beliaev-maksim, ca-scribner, dnplas, i-chvets, kimwnasptd, knkski, misohu, natalian98, nohaihab, orfeas-k, renovate[bot], wrfitch

envoy-operator's Issues

Deployment in an air-gapped environment gets stuck installing

Bug Description

While trying to deploy in an air-gapped environment, I am seeing a couple of issues with multiple charms.

As far as I understand, there are at least some dependencies like this:
kfp-metadata-writer -> mlmd -> envoy

envoy is stuck in "installing", and I could not find any useful logs.

To Reproduce

# [..] Deploy other charms offline as per kubeflow-bundle repo

juju deploy --trust --debug ./envoy envoy --resource oci-image=10.10.11.39:32000/gcr.io/ml-pipeline/metadata-envoy:2.0.2

# [..] Add relations as per kubeflow-bundle repo
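
The image reference in the deploy command above appears to be the upstream image name prefixed with the address of a local registry. A minimal sketch of that convention follows; the helper names `mirrored_image` and `deploy_command` are hypothetical and are not part of Juju or of this charm. The flags are the ones shown in the report.

```python
def mirrored_image(registry: str, upstream: str) -> str:
    """Prefix an upstream image reference with a local registry address."""
    return f"{registry.rstrip('/')}/{upstream.lstrip('/')}"


def deploy_command(charm_path: str, app: str, image: str) -> list[str]:
    """Build the argument list for the deploy command used in the report."""
    return [
        "juju", "deploy", "--trust", "--debug",
        charm_path, app,
        "--resource", f"oci-image={image}",
    ]


# Values taken from the report; the registry address is specific to that setup.
image = mirrored_image("10.10.11.39:32000", "gcr.io/ml-pipeline/metadata-envoy:2.0.2")
print(" ".join(deploy_command("./envoy", "envoy", image)))
```

Building the reference in one place makes it easier to check that every charm in the bundle points at the local registry rather than at an upstream one that is unreachable offline.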

Environment

Kubeflow 1.8/stable
Microk8s 1.28-strict/stable
Juju 3.1.7/stable

Air Gapped

Relevant Log Output

# juju status | grep -v active
Model     Controller  Cloud/Region              Version  SLA          Timestamp
kubeflow  lxd-mgmt    microk8s-train/localhost  3.1.7    unsupported  06:04:11Z

App                        Version                         Status       Scale  Charm                    Channel  Rev  Address         Exposed  Message
envoy                                                      maintenance      1  envoy                               0                  no       installing charm software
istio-pilot                                                waiting          1  istio-pilot                         2  10.152.183.244  no       installing agent
kfp-metadata-writer                                        waiting          1  kfp-metadata-writer                 0  10.152.183.26   no       installing agent
kfp-profile-controller                                     waiting          1  kfp-profile-controller              0  10.152.183.27   no       installing agent
kfp-ui                                                     waiting          1  kfp-ui                              0  10.152.183.127  no       installing agent
mlmd                       .../tfx-oss-public/ml_metad...  waiting          1  mlmd                                0  10.152.183.212  no       List of <ops.model.Relation grpc:25> versions not found for apps: envoy
oidc-gatekeeper                                            waiting          1  oidc-gatekeeper                     0  10.152.183.84   no       installing agent

Unit                          Workload     Agent      Address       Ports          Message
envoy/0*                      maintenance  executing                               (leader-elected) installing charm software
istio-pilot/0*                waiting      idle       10.1.195.250                 Execution handled 1 errors.  See logs for details.
kfp-metadata-writer/0*        blocked      idle       10.1.195.245                 [relation:grpc] Expected data from exactly 1 related applications - got 0.
kfp-profile-controller/0*     maintenance  idle       10.1.195.197                 Reconciling charm: executing component container:kfp-profile-controller
kfp-ui/0*                     waiting      idle       10.1.195.231                 [container:ml-pipeline-ui] Waiting for Pebble services (ml-pipeline-ui).  If this persists, it could be a blocking co...
mlmd/0*                       waiting      idle       10.1.195.212  8080/TCP       List of <ops.model.Relation grpc:25> versions not found for apps: envoy
oidc-gatekeeper/0*            blocked      idle       10.1.195.225                 Failed to replan

# kk logs envoy-operator-0
Defaulted container "juju-operator" out of: juju-operator, juju-init (init)
2024-02-27 04:53:19 INFO juju.cmd supercommand.go:56 running jujud [3.1.7 0cd207d999fef1fc8b965c410e9f58fafe7ee335 gc go1.21.5]
2024-02-27 04:53:19 DEBUG juju.cmd supercommand.go:57   args: []string{"/var/lib/juju/tools/jujud", "caasoperator", "--application-name=envoy", "--debug"}
2024-02-27 04:53:19 DEBUG juju.agent agent.go:593 read agent config, format "2.0"
2024-02-27 04:53:19 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.7 have already been run.
2024-02-27 04:53:19 INFO juju.cmd.jujud caasoperator.go:205 caas operator application-envoy start (3.1.7 [gc])
2024-02-27 04:53:19 DEBUG juju.cmd.jujud runner.go:402 start "api"
2024-02-27 04:53:19 INFO juju.cmd.jujud runner.go:578 start "api"
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "caas-units-manager" manifold worker started at 2024-02-27 04:53:19.415579542 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "clock" manifold worker started at 2024-02-27 04:53:19.416484295 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-gate" manifold worker started at 2024-02-27 04:53:19.416651233 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "agent" manifold worker started at 2024-02-27 04:53:19.417316281 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "caas-units-manager" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.introspection worker.go:135 introspection worker listening on "@jujud-application-envoy"
2024-02-27 04:53:19 DEBUG juju.cmd.jujud runner.go:410 "api" started
2024-02-27 04:53:19 DEBUG juju.worker.introspection worker.go:161 stats worker now serving
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-flag" manifold worker started at 2024-02-27 04:53:19.426088114 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "caas-units-manager" manifold worker started at 2024-02-27 04:53:19.426203073 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.apicaller connect.go:129 connecting with old password
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "api-config-watcher" manifold worker started at 2024-02-27 04:53:19.428977713 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "migration-fortress" manifold worker started at 2024-02-27 04:53:19.436578166 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.api apiclient.go:1172 successfully dialed "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.api apiclient.go:707 connection established to "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.worker.apicaller connect.go:163 [c41734] "application-envoy" successfully connected to "10.10.11.54:17070"
2024-02-27 04:53:19 DEBUG juju.api monitor.go:35 RPC connection died
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "api-caller" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.apicaller connect.go:129 connecting with old password
2024-02-27 04:53:19 DEBUG juju.api apiclient.go:1172 successfully dialed "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.api apiclient.go:707 connection established to "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.worker.apicaller connect.go:163 [c41734] "application-envoy" successfully connected to "10.10.11.54:17070"
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "api-caller" manifold worker started at 2024-02-27 04:53:19.496843869 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "caas-units-manager" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "caas-units-manager" manifold worker started at 2024-02-27 04:53:19.50550414 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "migration-minion" manifold worker started at 2024-02-27 04:53:19.507115444 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrader" manifold worker started at 2024-02-27 04:53:19.507267948 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "log-sender" manifold worker started at 2024-02-27 04:53:19.507510505 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-runner" manifold worker started at 2024-02-27 04:53:19.509256653 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "upgrade-steps-runner" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "migration-inactive-flag" manifold worker started at 2024-02-27 04:53:19.512624978 +0000 UTC
2024-02-27 04:53:19 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2024-02-27 04:53:19 DEBUG juju.worker.caasupgrader upgrader.go:128 current agent binary version: 3.1.7
2024-02-27 04:53:19 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2024-02-27 04:53:19 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "charm-dir" manifold worker started at 2024-02-27 04:53:19.522822714 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:618 "operator" manifold worker stopped: fortress operation aborted
stack trace:
github.com/juju/juju/worker/fortress.init:43: fortress operation aborted
github.com/juju/juju/worker/fortress.Occupy:60:
github.com/juju/juju/cmd/jujud/agent/engine.Housing.Decorate.occupyStart.func1:93:
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "secret-drain-worker" manifold worker started at 2024-02-27 04:53:19.523128737 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "api-address-updater" manifold worker started at 2024-02-27 04:53:19.523210402 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.logger logger.go:65 initial log config: "<root>=DEBUG"
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "logging-config-updater" manifold worker started at 2024-02-27 04:53:19.523368699 +0000 UTC
2024-02-27 04:53:19 INFO juju.worker.logger logger.go:120 logger worker started
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "proxy-config-updater" manifold worker started at 2024-02-27 04:53:19.524860739 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.logger logger.go:93 reconfiguring logging from "<root>=DEBUG" to "<root>=INFO"
2024-02-27 04:53:19 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2024-02-27 04:53:19 INFO juju.worker.caasoperator.charm bundles.go:81 downloading local:focal/envoy-0 from API server
2024-02-27 04:53:19 INFO juju.downloader download.go:109 downloading from local:focal/envoy-0
2024-02-27 04:53:19 INFO juju.downloader download.go:92 download complete ("local:focal/envoy-0")
2024-02-27 04:53:19 INFO juju.downloader download.go:172 download verified ("local:focal/envoy-0")
2024-02-27 04:53:23 INFO juju.worker.caasoperator caasoperator.go:430 operator "envoy" started
2024-02-27 04:53:23 INFO juju.worker.caasoperator.runner runner.go:578 start "envoy/0"
2024-02-27 04:53:23 INFO juju.worker.leadership tracker.go:194 envoy/0 promoted to leadership of envoy
2024-02-27 04:53:23 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-envoy-0
2024-02-27 04:53:23 INFO juju.worker.caasoperator.uniter.envoy/0 uniter.go:363 unit "envoy/0" started
2024-02-27 04:53:23 INFO juju.worker.caasoperator.uniter.envoy/0 uniter.go:689 resuming charm install
2024-02-27 04:53:23 INFO juju.worker.caasoperator.uniter.envoy/0.charm bundles.go:81 downloading local:focal/envoy-0 from API server
2024-02-27 04:53:23 INFO juju.downloader download.go:109 downloading from local:focal/envoy-0
2024-02-27 04:53:23 INFO juju.downloader download.go:92 download complete ("local:focal/envoy-0")
2024-02-27 04:53:24 INFO juju.downloader download.go:172 download verified ("local:focal/envoy-0")
2024-02-27 04:53:27 INFO juju.worker.caasoperator.uniter.envoy/0 uniter.go:389 hooks are retried true
2024-02-27 04:53:27 INFO juju.worker.caasoperator.uniter.envoy/0 resolver.go:165 found queued "install" hook
2024-02-27 04:53:28 INFO juju-log Running legacy hooks/install.
2024-02-27 04:53:29 WARNING juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
2024-02-27 04:53:30 INFO juju.worker.caasoperator.uniter.envoy/0.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch)
2024-02-27 04:53:30 INFO juju.worker.caasoperator.uniter.envoy/0 resolver.go:165 found queued "leader-elected" hook
2024-02-27 04:53:31 WARNING juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
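
Most of the output above is DEBUG and INFO noise. As a rough aid for reading it, the sketch below keeps only lines at chosen levels. It assumes each line starts with a date, a time, and an upper-case level, as in the log above; the function name `filter_log` is hypothetical, and the sample lines are copied from the log.

```python
import re

# Date, time, level, then the rest of the line (module, file:line, message).
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<rest>.*)$"
)


def filter_log(lines, levels=("WARNING", "ERROR")):
    """Return (time, level, rest) tuples for lines logged at the given levels."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not match (for example stack trace lines) are skipped.
        if match and match.group("level") in levels:
            hits.append((match.group("time"), match.group("level"), match.group("rest")))
    return hits


sample = [
    '2024-02-27 04:53:19 INFO juju.downloader download.go:92 download complete ("local:focal/envoy-0")',
    '2024-02-27 04:53:19 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""',
    "stack trace:",
    "2024-02-27 04:53:28 INFO juju-log Running legacy hooks/install.",
    "2024-02-27 04:53:31 WARNING juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.",
]

for time, level, rest in filter_log(sample):
    print(time, level, rest[:60])
```

Applied to the full log, this leaves only the proxy updater warning and the two `juju-log` warnings, which matches the reporter's observation that the log contains little that explains the hang.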

Additional Context

No response

Debug `envoy` in airgapped environments in `track/2.0`

Context

#72 reports envoy not working in an airgapped environment. We need to debug this, fix the issues, and release the fixes to track/2.0.

It is unclear which version of envoy was used in the previous failed CKF 1.8 deployments. It's possible that:

  • track/2.0 does actually work in airgapped environments
  • our current main charm now works in airgapped environments, thanks to other recent changes.

If neither works, we need to debug the problem and implement a fix for track/2.0.

What needs to get done

  1. investigate if envoy is broken for airgapped environments
  2. fix the issue and land the fix in track/2.0

Definition of Done

  1. track/2.0 works in airgapped

Difficulty fetching oci-image creates transient blocked status

When running the build_and_deploy integration tests in this PR, if raise_on_blocked is set to True when waiting for idle, the test fails. This is due to a transient blocked status, which shows up in the next test in the file, which hangs on "getting oci-image" for around 5 minutes. Since a blocked status means "I require human intervention", this might just need updating to a maintenance/waiting status.
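
The distinction suggested above can be sketched as a small decision function. This is an illustration only, not the charm's actual code: the `Status` dataclass, the function `status_for_image_fetch`, and the `user_fixable` flag are all hypothetical. In a real charm the results would presumably map to the status classes in `ops.model`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Status:
    name: str      # "active", "waiting", or "blocked"
    message: str


def status_for_image_fetch(error: Optional[str], user_fixable: bool) -> Status:
    """Choose a unit status after trying to fetch the oci-image resource.

    Blocked is reserved for problems that need human intervention; a fetch
    that may succeed on a later retry is reported as waiting instead.
    """
    if error is None:
        return Status("active", "")
    if user_fixable:
        return Status("blocked", f"cannot fetch oci-image: {error}")
    return Status("waiting", f"getting oci-image: {error}")


# A slow or temporarily failing fetch does not trip raise_on_blocked-style checks.
print(status_for_image_fetch("resource not yet available", user_fixable=False))
# A missing resource that only the operator can supply is still blocked.
print(status_for_image_fetch("resource not attached", user_fixable=True))
```

With this split, a test that waits for idle with `raise_on_blocked` set to True would only fail on errors that really do need an operator.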

Test Logs:

Grafana integration leaves no data dashboards

Bug Description

I deployed COS and integrated it with envoy:

$   juju status --relations | grep envoy
envoy                                  res:oci-image@cc06b3e    active       1  envoy                     2.0/stable           101  10.152.183.122  no       
grafana-agent-envoy                    0.35.2                   active       1  grafana-agent-k8s         latest/stable         58  10.152.183.22   no       
envoy/5*                                  active       idle   192.168.13.250   9090,9901/TCP  
grafana-agent-envoy/0*                    active       idle   192.168.212.194                 
cos-loki:logging                                                   grafana-agent-envoy:logging-consumer                     loki_push_api             regular  
cos-prometheus:receive-remote-write                                grafana-agent-envoy:send-remote-write                    prometheus_remote_write   regular  
envoy:grafana-dashboards                                           cos-grafana:grafana-dashboard                            grafana_dashboard         regular  joining  
envoy:grafana-dashboards                                           grafana-agent-envoy:grafana-dashboards-consumer          grafana_dashboard         regular  
envoy:metrics-endpoint                                             grafana-agent-envoy:metrics-endpoint                     prometheus_scrape         regular  
grafana-agent-envoy:grafana-dashboards-provider                    cos-grafana:grafana-dashboard                            grafana_dashboard         regular  
grafana-agent-envoy:peers                                          grafana-agent-envoy:peers                                grafana_agent_replica     peer     
istio-pilot:ingress                                                envoy:ingress                                            ingress                   regular  
mlmd:grpc                                                          envoy:grpc                                               grpc                      regular  

As you can see, many dashboards show no data:
[Screenshot from 2024-03-04 12-15-19]

To Reproduce

  1. Deploy COS with Juju
  2. Deploy Kubeflow 1.8 with Juju
  3. Integrate them

Environment

Relevant Log Output

-

Additional Context

No response

envoy responds with upstream connect error: connection termination

When KFP-UI sends a request to envoy in order to fetch ML metadata from metadata-grpc-service, we see the following error:

Cannot find context with {"typeName":"system.PipelineRun","contextName":"2b2eae10-e8fc-4427-8142-7e8c30fcfb27"}: upstream connect error or disconnect/reset before headers. reset reason: connection termination

This message comes from the envoy service, which is probably not configured properly.

Debug

Applying the upstream envoy Deployment, Service, and VirtualService solved the issue, which confirms that our envoy configuration is the cause.

update the grpc relation to use the mlops-libs k8s_service_info library

Context

Previous versions of this charm use an SDI-backed implementation of the grpc relation with mlmd. For the Charmed Kubeflow 1.9 release, mlmd's relation handling is changing to use the mlops-libs k8s_service_info library. We need to update that here as well to keep them compatible.

What needs to get done

  1. update the relation handling here to use the mlops-libs k8s_service_info library

Definition of Done

  1. charm has the new relation handling and is demonstrated working. If this is implemented before mlmd is upgraded, we might need to demonstrate this using a dummy charm of an intermediate implementation of mlmd because we have no other charms that use this library yet.

Publish envoy-operator charm to `metadata-envoy` instead of `envoy`

Context

This charm is for KFP's metadata-envoy component and not a general envoy component. During the update for the CKF 1.8 release, we changed the image from envoyproxy/envoy:v1.12.2 to gcr.io/ml-pipeline/metadata-envoy:2.0.2. This made sense, since the charm is not really a configurable, general-purpose envoy charm; rather, it configures envoy to imitate KFP's metadata-envoy functionality.

At the same time, there are two charms in charmhub:

Proposal

We should archive the envoy charm in Charmhub and start publishing under metadata-envoy. This way, we'll avoid confusion and also make it explicit that this is not a generalized envoy charm.

Sidecar rewrite: envoy

Context

We are rewriting all of our charms to use the sidecar with base charm pattern instead of the old podspec pattern.

What needs to get done

Rewrite the charm using sidecar with base charm pattern.

Definition of Done

Charm is rewritten with sidecar with base charm pattern.
All of the tests are rewritten and passing.
