
namespace-node-affinity-operator's Issues

charm is constantly logging errors

Bug Description

The charm gets into a state where it constantly logs TLS errors while staying active/idle. It is not working as expected: it does not inject the configuration specified in its settings_yaml config into the pods in the corresponding namespaces.

There is no proper visibility into the state of this charm outside of the logs. It would be nice if the charm's workload status reflected its actual state, and if it could forward its logs to COS (Loki).
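
One way to narrow down whether the TLS errors come from a stale webhook certificate is to compare what the webhook serves with what is registered on the API server. This is only a sketch; the secret name (namespace-node-affinity-webhook-certs) and the kubeflow namespace are assumptions taken from the other issues on this page and may differ in your deployment:

    # Dump the secret that the webhook mounts (assumed name/namespace):
    kubectl get secret -n kubeflow namespace-node-affinity-webhook-certs -o yaml

    # Decode one of the certificate fields from the output above and inspect it
    # (the exact data key depends on how the charm stores the certificate):
    echo "<base64-cert-data>" | base64 -d | openssl x509 -noout -subject -issuer -dates

    # Compare against the CA bundle the API server uses to verify the webhook:
    kubectl get mutatingwebhookconfigurations -o yaml | grep -B5 -A2 caBundle

If the served certificate and the registered CA bundle do not match, handshake errors like the ones below are expected.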

To Reproduce

N/A

Environment

namespace-node-affinity                                    active       1  namespace-node-affinity  0.1/beta              5  REDACTED    no

Relevant Log Output

2024/01/29 1835 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1838 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1840 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1842 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1857 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1857 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1857 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1836 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1805 http: TLS handshake error from REDACTED remote error: tls: bad certificate

Additional Context

settings_yaml config

      controller-k8s: |
        nodeSelectorTerms:
          - matchExpressions:
            - key: kubeflowserver
              operator: In
              values:
              - true
      kubeflow: |
        nodeSelectorTerms:
          - matchExpressions:
            - key: kubeflowserver
              operator: In
              values:
              - true
      metallb: |
        nodeSelectorTerms:
          - matchExpressions:
            - key: kubeflowserver
              operator: In
              values:
              - true

namespace-node-affinity is not working as expected with DaemonSet

Bug Description

The nodeSelector term injected into DaemonSet pods will be ignored because, as the Kubernetes documentation states:

if multiple nodeSelectorTerms are associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified nodeSelectorTerms can be satisfied.

To Reproduce

  1. juju deploy metallb --channel 1.28/stable --trust --config namespace=metallb
  2. juju deploy namespace-node-affinity --trust
  3. kubectl label namespaces metallb namespace-node-affinity=enabled
  4. Create settings.yaml:
~$ cat settings.yaml 
metallb: |
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflowserver
        operator: In
        values:
        - true

  5. SETTINGS_YAML=$(cat settings.yaml)
     juju config namespace-node-affinity settings_yaml="$SETTINGS_YAML"
  6. kubectl delete pods -n metallb metallb-0 (the operator pod for the metallb app).
  7. kubectl get pods -n metallb metallb-0 -o yaml; in the yaml of the newly created pod we can see:

  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: metallb
    uid: 5214cbc3-20b5-41ba-b7f9-69f2c16da6ca
  resourceVersion: "84536"
  uid: c0bf5e77-8b82-43e8-8670-e8ee3ecd43f9
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubeflowserver
            operator: In
            values:
            - "true"
  8. Now pick another pod, speaker-ngkg8, which was spawned by the charm pod before the deletion:
kubectl get pods -n metallb speaker-ngkg8 -o yaml
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: speaker
    uid: 9332e100-c788-421d-b6a7-08751831a22d
  resourceVersion: "83722"
  uid: 1fffc42d-6d24-446f-bade-8ca1d7de6a15
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - vm-0
  containers:
  9. kubectl delete pods -n metallb speaker-ngkg8
  10. kubectl get pods -n metallb speaker-k59pk -o yaml, and check this newly created pod:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: speaker
    uid: 9332e100-c788-421d-b6a7-08751831a22d
  resourceVersion: "84818"
  uid: d334c13c-fb14-487d-8cbd-b16352a42e21
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - vm-0
        - matchExpressions:
          - key: kubeflowserver
            operator: In
            values:
            - "true"
  containers:

We can see the nodeSelector term is injected. However, this newly created pod will still end up on vm-0, because:

If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified nodeSelectorTerms can be satisfied.

In other words, multiple nodeSelectorTerms within nodeAffinity are evaluated with OR logic: if any one of the terms is satisfied, the Pod can be scheduled on that node. In this case, the DaemonSet's own matchFields term (metadata.name In vm-0) is satisfied on its own, so the injected matchExpressions term is effectively ignored.
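
For reference, here is a minimal sketch of the shape the affinity would need for the injected requirement to actually constrain the DaemonSet pod: because requirements within a single nodeSelectorTerm are ANDed, the injected matchExpressions would have to be merged into the existing term rather than appended as a new one. This is not what the charm currently does; it only illustrates the semantics:

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            # a single term: matchFields AND matchExpressions must both hold
            - matchFields:
              - key: metadata.name
                operator: In
                values:
                - vm-0
              matchExpressions:
              - key: kubeflowserver
                operator: In
                values:
                - "true"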

Environment

juju version: 2.9.43
kubernetes: v1.24.17

Relevant Log Output

N/A

Additional Context

No response

'juju remove-application' does NOT remove the webhook pod.

Bug Description

'juju remove-application' only removes the operator pod.
It does NOT remove the webhook pod.

To Reproduce

  1. juju deploy namespace-node-affinity --trust

You will see two pods started: namespace-node-affinity-0 and namespace-node-affinity-pod-webhook-*****

  2. juju remove-application namespace-node-affinity --destroy-storage --force --no-wait

You will see namespace-node-affinity-0 is deleted, but namespace-node-affinity-pod-webhook-***** is still running.
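
As a stop-gap, the leftover webhook resources can be cleaned up by hand. This is only a sketch; the resource kinds and names are assumptions based on the pod name pattern above and may differ in your model:

    # Find whatever the charm left behind in the model's namespace:
    kubectl get deployments,pods -n <model-namespace> | grep namespace-node-affinity-pod-webhook
    kubectl get mutatingwebhookconfigurations | grep namespace-node-affinity

    # Then delete the leftovers by the names found above, e.g.:
    kubectl delete deployment -n <model-namespace> namespace-node-affinity-pod-webhook
    kubectl delete mutatingwebhookconfiguration <name-found-above>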

Environment

juju version: 2.9.43
kubernetes: v1.24.17

Relevant Log Output

model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:58 INFO juju.worker.caasupgrader abort check blocked until version event received
model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:58 INFO juju.worker.caasupgrader unblocking abort check
model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:59 INFO juju.worker.muxhttpserver starting http server on [::]:17071
model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:59 INFO juju.worker.caasadmission ensuring model k8s webhook configurations
controller-0: 12:36:17 INFO juju.worker.caasapplicationprovisioner.runner start "namespace-node-affinity"
controller-0: 12:36:21 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 1
controller-0: 12:36:22 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 1
unit-namespace-node-affinity-0: 12:36:29 INFO juju.cmd running containerAgent [2.9.43 3cb3f8beac4a0b05e10bdfb8014f5666118a269d gc go1.20.4]
unit-namespace-node-affinity-0: 12:36:29 INFO juju.cmd.containeragent.unit start "unit"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.upgradesteps upgrade steps for 2.9.43 have already been run.
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.probehttpserver starting http server on [::]:65301
unit-namespace-node-affinity-0: 12:36:29 INFO juju.api connection established to "wss://controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070/model/7c837905-ac2e-48e7-8d3e-29546adc5dfc/api"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.apicaller [7c8379] "unit-namespace-node-affinity-0" successfully connected to "controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.api connection established to "wss://controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070/model/7c837905-ac2e-48e7-8d3e-29546adc5dfc/api"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.apicaller [7c8379] "unit-namespace-node-affinity-0" successfully connected to "controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.migrationminion migration phase is now: NONE
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.logger logger worker started
unit-namespace-node-affinity-0: 12:36:29 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.caasupgrader abort check blocked until version event received
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.caasupgrader unblocking abort check
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.leadership namespace-node-affinity/0 promoted to leadership of namespace-node-affinity
unit-namespace-node-affinity-0: 12:36:30 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-namespace-node-affinity-0
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.uniter unit "namespace-node-affinity/0" started
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.uniter resuming charm install
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.uniter.charm downloading ch:amd64/focal/namespace-node-affinity-18 from API server
unit-namespace-node-affinity-0: 12:36:30 INFO juju.downloader downloading from ch:amd64/focal/namespace-node-affinity-18
unit-namespace-node-affinity-0: 12:36:30 INFO juju.downloader download complete ("ch:amd64/focal/namespace-node-affinity-18")
unit-namespace-node-affinity-0: 12:36:30 INFO juju.downloader download verified ("ch:amd64/focal/namespace-node-affinity-18")
unit-namespace-node-affinity-0: 12:36:31 INFO juju.worker.uniter hooks are retried true
unit-namespace-node-affinity-0: 12:36:32 INFO juju.worker.uniter found queued "install" hook
unit-namespace-node-affinity-0: 12:36:32 INFO unit.namespace-node-affinity/0.juju-log Running legacy hooks/install.
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Generating RSA private key, 2048 bit long modulus (2 primes)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install .................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install ......................................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install e is 65537 (0x010001)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Generating RSA private key, 2048 bit long modulus (2 primes)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install ...................................................................................................................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install .........................................................................................................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install e is 65537 (0x010001)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Signature ok
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install subject=C = GB, ST = Canonical, L = Canonical, O = Canonical, OU = Canonical, CN = 127.0.0.1
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Getting CA Private Key
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log Starting main
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log _check_leader
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log _deploy_k8s_resources
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log Rendering manifests
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log Reconcile completed successfully
unit-namespace-node-affinity-0: 12:36:34 INFO juju.worker.uniter.operation ran "install" hook (via hook dispatching script: dispatch)
unit-namespace-node-affinity-0: 12:36:34 INFO juju.worker.uniter found queued "leader-elected" hook
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log Starting main
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log _check_leader
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log _deploy_k8s_resources
unit-namespace-node-affinity-0: 12:36:35 INFO unit.namespace-node-affinity/0.juju-log Rendering manifests
unit-namespace-node-affinity-0: 12:36:35 INFO unit.namespace-node-affinity/0.juju-log Reconcile completed successfully
unit-namespace-node-affinity-0: 12:36:35 INFO juju.worker.uniter.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log Starting main
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log _check_leader
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log _deploy_k8s_resources
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log Rendering manifests
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log Reconcile completed successfully
unit-namespace-node-affinity-0: 12:36:36 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-namespace-node-affinity-0: 12:36:36 INFO juju.worker.uniter found queued "start" hook
unit-namespace-node-affinity-0: 12:36:37 INFO unit.namespace-node-affinity/0.juju-log Running legacy hooks/start.
unit-namespace-node-affinity-0: 12:36:37 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:37 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)

controller-0: 12:37:23 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 0
controller-0: 12:37:23 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 0
unit-namespace-node-affinity-0: 12:37:23 WARNING juju.worker.uniter.operation we should run a leader-deposed hook here, but we can't yet
controller-0: 12:37:24 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:24 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:25 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:25 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:26 INFO juju.worker.caasapplicationprovisioner.runner stopped "namespace-node-affinity", err: cannot scale dying application to 0: application "namespace-node-affinity" not found
controller-0: 12:37:26 ERROR juju.worker.caasapplicationprovisioner.runner exited "namespace-node-affinity": cannot scale dying application to 0: application "namespace-node-affinity" not found
controller-0: 12:37:26 INFO juju.worker.caasapplicationprovisioner.runner restarting "namespace-node-affinity" in 3s
controller-0: 12:37:29 INFO juju.worker.caasapplicationprovisioner.runner start "namespace-node-affinity"
controller-0: 12:37:29 INFO juju.worker.caasapplicationprovisioner.runner stopped "namespace-node-affinity", err: <nil>

Additional Context

No response

Charm goes to Active/Idle before workload is active

The charm will go to Active/Idle before the workload has successfully stood up. This could result in a race condition (charm active before workload active) that resolves itself, but it can also hide real issues: if there is a problem with the deployment (for example, the image does not exist and the Pod goes to ImagePullBackOff), the charm will still report Active/Idle.
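
A quick way to observe the mismatch (a sketch only; the namespace and name filters are assumptions):

    # What Juju reports for the charm:
    juju status namespace-node-affinity

    # What is actually running in the model's namespace; the webhook pod can sit
    # in ImagePullBackOff here while the charm above still shows active/idle:
    kubectl get pods -n <model-namespace> | grep namespace-node-affinity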

Chicken-and-egg issue: no clear way to apply affinity rules to the webhook pod itself

Bug Description

In a test Microk8s environment, I was able to deploy this charm and have it, upon deletion of existing pods, apply the specified affinity rules to new pods.

However, the namespace-node-affinity-pod-webhook-* pod cannot be deleted in the same way, for likely obvious reasons - I suspect it is the pod which is performing the work of injecting affinity rules, and cannot apply them to itself.

Is there some way that we could apply the same affinity rules to the webhook pod itself, e.g. through the operator pod? (I haven't written this type of Kubernetes charm, but it seems like the operator pod spawns the webhook pod? In that case, perhaps the rules need to be injected by the operator pod when spawning the webhook pod. Just an idea.)
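
Until the charm handles this itself, one possible workaround is to patch the webhook's Deployment directly so its pods carry the same affinity. This is only a sketch; the Deployment name and namespace are assumptions based on the pod name, and the charm's reconcile loop may well revert the patch:

    kubectl patch deployment namespace-node-affinity-pod-webhook -n <model-namespace> \
      --type merge -p '
    spec:
      template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubeflowserver
                    operator: In
                    values:
                    - "true"'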

To Reproduce

  1. Deploy a model with this charm.
  2. Set the settings_yaml config to point at the same model where this charm is deployed.
  3. Delete the namespace-node-affinity-0 pod. (It will receive the modified affinity rules.)
  4. Delete the namespace-node-affinity-pod-webhook-* pod. (It will not receive the modified affinity rules.)

Environment

Juju controller: 2.9.43
K8s: Microk8s v1.28.7
namespace-node-affinity charm version: revision 18, latest/stable

Relevant Log Output

N/A

Additional Context

No response

webhook pod is not recreated during upgrade

During an upgrade from 1.6 to 1.7, the namespace-node-affinity-pod-webhook pod was not recreated, even though the namespace-node-affinity-webhook-certs secret was refreshed. As a result, the old secrets are still mounted in the pod, and the webhook fails with the following error message in the log:

"failed calling webhook "namespace-node-affinity-pod-webhook.default.svc": failed to call webhook: Post "https://namespace-node-affinity-pod-webhook.kubeflow.svc:443/mutate?timeout=5s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "127.0.0.1")"

Deleting the webhook pod was enough to fix the issue.
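
For reference, the manual fix amounts to something like the following (the kubeflow namespace comes from the error message above; the exact pod name will differ):

    # Find the webhook pod and delete it; on restart it mounts the refreshed
    # namespace-node-affinity-webhook-certs secret:
    kubectl get pods -n kubeflow | grep namespace-node-affinity-pod-webhook
    kubectl delete pod -n kubeflow <namespace-node-affinity-pod-webhook-pod-name>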
