namespace-node-affinity-operator's Introduction

Namespace Node Affinity Operator

This Charm deploys a modified version of the Namespace Node Affinity Kubernetes MutatingWebhook.

The Namespace Node Affinity webhook allows a user to add a given set of node affinities and/or tolerations to all pods deployed in a namespace. This is useful, for example, when your cluster has some nodes with specific labels (e.g. nodes labeled control-plane) and you want all workloads in a Kubernetes namespace to be scheduled only on those nodes and not on any others in the cluster. The tool is described in more detail in the upstream README.md.

Usage

This charm is deployed using the Juju command line tool as follows:

juju deploy namespace-node-affinity --trust
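
After deploying, you can check that the unit settles; it should eventually report active/idle:

juju status namespace-node-affinity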

By default, the webhook is not configured to modify pods in any namespace. To add namespaces to its scope, the user must:

  • provide the settings_yaml charm config
  • label every namespace that should be in scope with namespace-node-affinity=enabled

Both settings can be changed while the charm is running; the webhook always uses the most up-to-date values.
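
For example, the live value can be read back or replaced at any time (NEW_SETTINGS_YAML below is just a placeholder for your updated YAML string):

juju config namespace-node-affinity settings_yaml
juju config namespace-node-affinity settings_yaml="$NEW_SETTINGS_YAML"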

Defining settings_yaml

The settings_yaml config is a YAML string in the format described upstream. For example, we can configure the tool to:

  • apply a node affinity to pods in testing-ns-a requiring nodes with the label control-plane=true, but skip any pod that carries the label ignoreme: ignored
  • apply a node affinity to pods in testing-ns-b requiring nodes with the label other-key: other-value

by setting the charm config:

cat <<EOF > settings.yaml

testing-ns-a: |
  nodeSelectorTerms:
    - matchExpressions:
      - key: control-plane
        operator: In
        values:
        - true
  excludedLabels:
    ignoreme: ignored
testing-ns-b: |
  nodeSelectorTerms:
    - matchExpressions:
      - key: other-key
        operator: In
        values:
        - other-value
EOF
SETTINGS_YAML=$(cat settings.yaml)
juju config namespace-node-affinity settings_yaml="$SETTINGS_YAML"
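
With the configuration above, a pod subsequently created in testing-ns-a should (based on the upstream behaviour, and mirroring the pod specs shown in the issues below) have a nodeAffinity block injected roughly like this:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: control-plane
            operator: In
            values:
            - "true"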

Setting the namespace labels

We must apply the label namespace-node-affinity=enabled to every namespace this tool should act on (this is a requirement of the upstream tool itself, not of the charm; it may be dropped in future as it feels like a redundant setting). For example, you can do:

kubectl label ns testing-ns-a namespace-node-affinity=enabled
kubectl label ns testing-ns-b namespace-node-affinity=enabled
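
You can verify which namespaces are in scope by filtering on the label:

kubectl get namespaces -l namespace-node-affinity=enabled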

Development

When debugging this charm, it is sometimes useful to send AdmissionReview JSON payloads to the webhook pod in the same format the Kubernetes API server would send, to check whether the webhook is behaving correctly. This tool was used for that purpose during charm development and might be useful.
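
As a rough sketch of that workflow (assuming the charm is deployed in a model named kubeflow, and using the /mutate path that appears in the issue logs below; adjust the names to your environment):

# Forward the webhook service to localhost.
kubectl -n kubeflow port-forward svc/namespace-node-affinity-pod-webhook 8443:443 &

# Build a minimal AdmissionReview for a pod in an enabled namespace.
cat <<'EOF' > review.json
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "00000000-0000-0000-0000-000000000000",
    "kind": {"group": "", "version": "v1", "kind": "Pod"},
    "resource": {"group": "", "version": "v1", "resource": "pods"},
    "namespace": "testing-ns-a",
    "operation": "CREATE",
    "object": {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": {"name": "test-pod", "namespace": "testing-ns-a"},
      "spec": {"containers": [{"name": "test", "image": "busybox"}]}
    }
  }
}
EOF

# -k skips TLS verification, since the webhook serves a self-signed certificate.
curl -sk -H "Content-Type: application/json" -d @review.json https://localhost:8443/mutate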


namespace-node-affinity-operator's Issues

Webhook pod not recreated during upgrade

During an upgrade from 1.6 to 1.7, the namespace-node-affinity-pod-webhook pod was not recreated even though the namespace-node-affinity-webhook-certs secret was refreshed.
This left the old secret mounted in the pod, so the webhook could not work, with the following error message in the log:

"failed calling webhook "namespace-node-affinity-pod-webhook.default.svc": failed to call webhook: Post "https://namespace-node-affinity-pod-webhook.kubeflow.svc:443/mutate?timeout=5s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "127.0.0.1")"

Deleting the webhook pod was enough to fix the issue.
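
For reference, a minimal sketch of that workaround (the pod name suffix is a placeholder, and kubeflow is the model namespace from the error above):

# Find and delete the webhook pod so it is recreated with the refreshed certificate secret mounted.
kubectl -n kubeflow get pods | grep namespace-node-affinity-pod-webhook
kubectl -n kubeflow delete pod namespace-node-affinity-pod-webhook-<suffix>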

namespace-node-affinity is not working as expected with DaemonSet

Bug Description

The nodeSelectorTerm injected into DaemonSet pods is effectively ignored because, per the Kubernetes documentation,

if multiple nodeSelectorTerms are associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified nodeSelectorTerms can be satisfied.

To Reproduce

  1. juju deploy metallb --channel 1.28/stable --trust --config namespace=metallb
  2. juju deploy namespace-node-affinity --trust
  3. kubectl label namespaces metallb namespace-node-affinity=enabled
  4. Create settings.yaml:
~$ cat settings.yaml 
metallb: |
  nodeSelectorTerms:
    - matchExpressions:
      - key: kubeflowserver
        operator: In
        values:
        - true

SETTINGS_YAML=$(cat settings.yaml)

  5. juju config namespace-node-affinity settings_yaml="$SETTINGS_YAML"
  6. kubectl delete pods -n metallb metallb-0 (the operator pod for the metallb app)
  7. kubectl get pods -n metallb metallb-0 -o yaml; in the YAML of the newly created pod we can see:

  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: metallb
    uid: 5214cbc3-20b5-41ba-b7f9-69f2c16da6ca
  resourceVersion: "84536"
  uid: c0bf5e77-8b82-43e8-8670-e8ee3ecd43f9
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubeflowserver
            operator: In
            values:
            - "true"
  8. Now pick another pod, speaker-ngkg8, which is owned by the speaker DaemonSet (spawned by the charm). Before deletion:
kubectl get pods -n metallb speaker-ngkg8 -o yaml
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: speaker
    uid: 9332e100-c788-421d-b6a7-08751831a22d
  resourceVersion: "83722"
  uid: 1fffc42d-6d24-446f-bade-8ca1d7de6a15
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - vm-0
  containers:
  9. kubectl delete pods -n metallb speaker-ngkg8
  10. kubectl get pods -n metallb speaker-k59pk -o yaml; the newly created pod shows:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: speaker
    uid: 9332e100-c788-421d-b6a7-08751831a22d
  resourceVersion: "84818"
  uid: d334c13c-fb14-487d-8cbd-b16352a42e21
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - vm-0
        - matchExpressions:
          - key: kubeflowserver
            operator: In
            values:
            - "true"
  containers:

We can see the nodeSelectorTerm is injected. However, the newly created pod will still land on vm-0, because (quoting the Kubernetes docs):

If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified nodeSelectorTerms can be satisfied. In other words, multiple nodeSelectorTerms within nodeAffinity are evaluated using OR logic; if any one of the terms is satisfied, the Pod can be scheduled on that node.

In this case, the injected second term is effectively ignored, as illustrated by the sketch below.
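
For the injected expression to actually constrain the speaker pod, it would have to be merged into the DaemonSet's existing term rather than appended as a new one, since the fields and expressions inside a single nodeSelectorTerm are ANDed. A hypothetical merged form would look like:

# Hypothetical merged term: both conditions must now hold for a node to match.
nodeSelectorTerms:
- matchFields:
  - key: metadata.name
    operator: In
    values:
    - vm-0
  matchExpressions:
  - key: kubeflowserver
    operator: In
    values:
    - "true"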

Environment

juju version: 2.9.43
kubernetes: v1.24.17

Relevant Log Output

N/A

Additional Context

No response

'juju remove-application' does NOT remove the webhook pod.

Bug Description

'juju remove-application' only removes the operator pod.
It does NOT remove the webhook pod.

To Reproduce

  1. juju deploy namespace-node-affinity --trust

You will see two pods started: namespace-node-affinity-0 and namespace-node-affinity-pod-webhook-*****

  2. juju remove-application namespace-node-affinity --destroy-storage --force --no-wait

You will see namespace-node-affinity-0 is deleted, but namespace-node-affinity-pod-webhook-***** is still running.
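
A possible manual cleanup after removing the application (assuming the leftover pod is backed by a Deployment of the same name; substitute your model's namespace):

# List what the webhook left behind, then delete its owning workload.
kubectl -n <model-namespace> get all | grep namespace-node-affinity-pod-webhook
kubectl -n <model-namespace> delete deployment namespace-node-affinity-pod-webhook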

Environment

juju version: 2.9.43
kubernetes: v1.24.17

Relevant Log Output

model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:58 INFO juju.worker.caasupgrader abort check blocked until version event received
model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:58 INFO juju.worker.caasupgrader unblocking abort check
model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:59 INFO juju.worker.muxhttpserver starting http server on [::]:17071
model-7c837905-ac2e-48e7-8d3e-29546adc5dfc: 12:35:59 INFO juju.worker.caasadmission ensuring model k8s webhook configurations
controller-0: 12:36:17 INFO juju.worker.caasapplicationprovisioner.runner start "namespace-node-affinity"
controller-0: 12:36:21 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 1
controller-0: 12:36:22 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 1
unit-namespace-node-affinity-0: 12:36:29 INFO juju.cmd running containerAgent [2.9.43 3cb3f8beac4a0b05e10bdfb8014f5666118a269d gc go1.20.4]
unit-namespace-node-affinity-0: 12:36:29 INFO juju.cmd.containeragent.unit start "unit"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.upgradesteps upgrade steps for 2.9.43 have already been run.
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.probehttpserver starting http server on [::]:65301
unit-namespace-node-affinity-0: 12:36:29 INFO juju.api connection established to "wss://controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070/model/7c837905-ac2e-48e7-8d3e-29546adc5dfc/api"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.apicaller [7c8379] "unit-namespace-node-affinity-0" successfully connected to "controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.api connection established to "wss://controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070/model/7c837905-ac2e-48e7-8d3e-29546adc5dfc/api"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.apicaller [7c8379] "unit-namespace-node-affinity-0" successfully connected to "controller-service.controller-myk8scloud-localhost.svc.cluster.local:17070"
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.migrationminion migration phase is now: NONE
unit-namespace-node-affinity-0: 12:36:29 INFO juju.worker.logger logger worker started
unit-namespace-node-affinity-0: 12:36:29 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.caasupgrader abort check blocked until version event received
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.caasupgrader unblocking abort check
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.leadership namespace-node-affinity/0 promoted to leadership of namespace-node-affinity
unit-namespace-node-affinity-0: 12:36:30 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-namespace-node-affinity-0
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.uniter unit "namespace-node-affinity/0" started
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.uniter resuming charm install
unit-namespace-node-affinity-0: 12:36:30 INFO juju.worker.uniter.charm downloading ch:amd64/focal/namespace-node-affinity-18 from API server
unit-namespace-node-affinity-0: 12:36:30 INFO juju.downloader downloading from ch:amd64/focal/namespace-node-affinity-18
unit-namespace-node-affinity-0: 12:36:30 INFO juju.downloader download complete ("ch:amd64/focal/namespace-node-affinity-18")
unit-namespace-node-affinity-0: 12:36:30 INFO juju.downloader download verified ("ch:amd64/focal/namespace-node-affinity-18")
unit-namespace-node-affinity-0: 12:36:31 INFO juju.worker.uniter hooks are retried true
unit-namespace-node-affinity-0: 12:36:32 INFO juju.worker.uniter found queued "install" hook
unit-namespace-node-affinity-0: 12:36:32 INFO unit.namespace-node-affinity/0.juju-log Running legacy hooks/install.
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Generating RSA private key, 2048 bit long modulus (2 primes)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install .................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install ......................................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install e is 65537 (0x010001)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Generating RSA private key, 2048 bit long modulus (2 primes)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install ...................................................................................................................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install .........................................................................................................................+++++
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install e is 65537 (0x010001)
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Signature ok
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install subject=C = GB, ST = Canonical, L = Canonical, O = Canonical, OU = Canonical, CN = 127.0.0.1
unit-namespace-node-affinity-0: 12:36:33 WARNING unit.namespace-node-affinity/0.install Getting CA Private Key
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log Starting main
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log _check_leader
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log _deploy_k8s_resources
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log Rendering manifests
unit-namespace-node-affinity-0: 12:36:33 INFO unit.namespace-node-affinity/0.juju-log Reconcile completed successfully
unit-namespace-node-affinity-0: 12:36:34 INFO juju.worker.uniter.operation ran "install" hook (via hook dispatching script: dispatch)
unit-namespace-node-affinity-0: 12:36:34 INFO juju.worker.uniter found queued "leader-elected" hook
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log Starting main
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log _check_leader
unit-namespace-node-affinity-0: 12:36:34 INFO unit.namespace-node-affinity/0.juju-log _deploy_k8s_resources
unit-namespace-node-affinity-0: 12:36:35 INFO unit.namespace-node-affinity/0.juju-log Rendering manifests
unit-namespace-node-affinity-0: 12:36:35 INFO unit.namespace-node-affinity/0.juju-log Reconcile completed successfully
unit-namespace-node-affinity-0: 12:36:35 INFO juju.worker.uniter.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log Starting main
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log _check_leader
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log _deploy_k8s_resources
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log Rendering manifests
unit-namespace-node-affinity-0: 12:36:36 INFO unit.namespace-node-affinity/0.juju-log Reconcile completed successfully
unit-namespace-node-affinity-0: 12:36:36 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-namespace-node-affinity-0: 12:36:36 INFO juju.worker.uniter found queued "start" hook
unit-namespace-node-affinity-0: 12:36:37 INFO unit.namespace-node-affinity/0.juju-log Running legacy hooks/start.
unit-namespace-node-affinity-0: 12:36:37 INFO unit.namespace-node-affinity/0.juju-log _gen_certs_if_missing
unit-namespace-node-affinity-0: 12:36:37 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)

controller-0: 12:37:23 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 0
controller-0: 12:37:23 INFO juju.worker.caasapplicationprovisioner.namespace-node-affinity scaling application "namespace-node-affinity" to desired scale 0
unit-namespace-node-affinity-0: 12:37:23 WARNING juju.worker.uniter.operation we should run a leader-deposed hook here, but we can't yet
controller-0: 12:37:24 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:24 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:25 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:25 WARNING juju.worker.caasapplicationprovisioner.namespace-node-affinity update units application "namespace-node-affinity" not found
controller-0: 12:37:26 INFO juju.worker.caasapplicationprovisioner.runner stopped "namespace-node-affinity", err: cannot scale dying application to 0: application "namespace-node-affinity" not found
controller-0: 12:37:26 ERROR juju.worker.caasapplicationprovisioner.runner exited "namespace-node-affinity": cannot scale dying application to 0: application "namespace-node-affinity" not found
controller-0: 12:37:26 INFO juju.worker.caasapplicationprovisioner.runner restarting "namespace-node-affinity" in 3s
controller-0: 12:37:29 INFO juju.worker.caasapplicationprovisioner.runner start "namespace-node-affinity"
controller-0: 12:37:29 INFO juju.worker.caasapplicationprovisioner.runner stopped "namespace-node-affinity", err: <nil>

Additional Context

No response

charm is constantly logging errors

Bug Description

The charm gets into a state where it continuously logs TLS errors while staying active/idle. It is not working as expected and is not injecting the configuration specified in its settings_yaml config into pods in the corresponding namespaces.

There is no proper visibility into the state of this charm outside of its logs. It would be nice if the charm workload status reflected its actual state and if it could forward its logs to COS (Loki).
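
A diagnostic sketch for this situation: compare the CA bundle registered in the webhook configuration against the certificate stored in the namespace-node-affinity-webhook-certs secret (resource names other than the secret are assumptions; adjust them to your model):

# Find the webhook configuration and inspect the CA it advertises to the API server.
kubectl get mutatingwebhookconfigurations | grep namespace-node-affinity
kubectl get mutatingwebhookconfiguration <name> -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | openssl x509 -noout -subject -dates
# Inspect the certificate material the webhook pod actually mounts.
kubectl -n <model-namespace> get secret namespace-node-affinity-webhook-certs -o yaml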

To Reproduce

N/A

Environment

namespace-node-affinity                                    active       1  namespace-node-affinity  0.1/beta              5  REDACTED    no

Relevant Log Output

2024/01/29 1835 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1838 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1840 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1842 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1857 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1857 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1857 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1836 http: TLS handshake error from REDACTED remote error: tls: bad certificate
2024/01/29 1805 http: TLS handshake error from REDACTED remote error: tls: bad certificate

Additional Context

settings_yaml config

      controller-k8s: |
        nodeSelectorTerms:
          - matchExpressions:
            - key: kubeflowserver
              operator: In
              values:
              - true
      kubeflow: |
        nodeSelectorTerms:
          - matchExpressions:
            - key: kubeflowserver
              operator: In
              values:
              - true
      metallb: |
        nodeSelectorTerms:
          - matchExpressions:
            - key: kubeflowserver
              operator: In
              values:
              - true

Chicken-and-egg issue: no clear way to apply affinity rules to the webhook pod itself

Bug Description

In a test Microk8s environment, I was able to deploy this charm and have it, upon deletion of existing pods, apply the specified affinity rules to new pods.

However, the namespace-node-affinity-pod-webhook-* pod cannot be deleted in the same way, for likely obvious reasons - I suspect it is the pod which is performing the work of injecting affinity rules, and cannot apply them to itself.

Is there some way that we could apply the same affinity rules to the webhook pod itself, e.g. through the operator pod? (I haven't written this type of Kubernetes charm, but it seems like the operator pod spawns the webhook pod? In which case, perhaps the rules need to be injected by the operator pod when spawning the webhook pod? Just an idea.)
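
One possible stopgap, assuming the webhook pod is backed by a Deployment named namespace-node-affinity-pod-webhook, would be to patch the affinity onto that Deployment directly (the label key/value here just mirror the README example):

# Merge-patch the webhook Deployment so its pod template carries the same node affinity.
PATCH='{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"control-plane","operator":"In","values":["true"]}]}]}}}}}}}'
kubectl -n <model-namespace> patch deployment namespace-node-affinity-pod-webhook --type merge -p "$PATCH"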

To Reproduce

  1. Deploy a model with this charm.
  2. Set the settings_yaml config to point at the same model where this charm is deployed.
  3. Delete the namespace-node-affinity-0 pod. (It will receive the modified affinity rules.)
  4. Delete the namespace-node-affinity-pod-webhook-* pod. (It will not receive the modified affinity rules.)

Environment

Juju controller: 2.9.43
K8s: Microk8s v1.28.7
namespace-node-affinity charm version: revision 18, latest/stable

Relevant Log Output

N/A

Additional Context

No response

Charm goes to Active/Idle before workload is active

The charm goes to Active/Idle before the workload has successfully come up. This can result in a race condition (charm active before workload active) that resolves itself, but it can also hide real issues: if there is a problem with the deployment (e.g. the image does not exist and the Pod goes to ImagePullBackOff), the charm will still report Active/Idle.
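
Until this is addressed, the workload can be verified directly instead of relying on charm status, for example:

# Check that the webhook pod is actually Running and Ready, independent of the charm status.
kubectl -n <model-namespace> get pods | grep namespace-node-affinity-pod-webhook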
