
kupid's People

Contributors

andreasburger, ashwani2k, ccwienk, dependabot[bot], dkistner, gardener-robot-ci-1, gardener-robot-ci-2, gardener-robot-ci-3, petersutter, raphaelvogel, renormalize, rfranzke, shreyas-s-rao, timebertt, timuthy


kupid's Issues

Drop Kupid in favor of an alternative (OPA Gatekeeper or Kyverno...)

What would you like to be added:

Readme says:

The OPA Gatekeeper allows to define policy to validate and mutate any kubernetes resource. Technically, this can be used to dynamically inject anything, including scheduling policy into pods. But this is too big a component to introduce just to dynamically inject scheduling policy. Besides, the policy definition as code is undesirable in this context because the policy itself would be non-declarative and hard to validate while deploying the policy.

However, this doesn't seem to justify building our own component (which is currently unmaintained?) compared to the relatively low effort of reusing a well-established project from the community.

This repository could basically be a few yaml files instead of thousands of lines of code.
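For illustration, a rough sketch of what such a replacement could look like as a Kyverno mutate policy; the policy name, label selector, and injected toleration below are assumptions, not an agreed design:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-etcd-scheduling-constraints   # hypothetical name
spec:
  rules:
  - name: add-etcd-toleration
    match:
      resources:
        kinds:
        - StatefulSet
        selector:
          matchLabels:
            app: etcd-statefulset             # hypothetical label
    mutate:
      patchStrategicMerge:
        spec:
          template:
            spec:
              tolerations:
              - key: dedicated                # hypothetical taint key
                operator: Equal
                value: etcd
                effect: NoSchedule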

Why is this needed:

  • relieve us from unnecessary maintenance effort (see open PRs; the repository requires regular dependency updates, ref #32, and has open dependabot vulnerability alerts)
  • OPA Gatekeeper would open the door to many other mechanisms (e.g. mutating specific shoot control planes)

Pending tasks for gardener integration

What would you like to be added:

  • A general solution for reserving excess capacity (both on common and dedicated nodes). A tentative solution is here; a generic sketch of the pattern follows at the end of this issue.
  • A general solution in the landscape deployment to create dedicated worker pools for etcd and to reserve excess capacity.

Why is this needed:
To complete the integration with gardener.
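For reference, a common pattern for reserving excess capacity is a deployment of pause pods with a low (negative) priority so they are preempted whenever real workloads need the room. This is only a generic sketch, not the tentative solution linked above; all names, replica counts, and resource requests are assumptions:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: excess-capacity-reservation   # hypothetical name
value: -5
globalDefault: false
description: Low priority class for excess capacity reservation pods
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: excess-capacity-reservation   # hypothetical name
spec:
  replicas: 2                         # sized per worker pool as needed
  selector:
    matchLabels:
      app: excess-capacity-reservation
  template:
    metadata:
      labels:
        app: excess-capacity-reservation
    spec:
      priorityClassName: excess-capacity-reservation
      terminationGracePeriodSeconds: 5
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "1"                  # the amount of capacity to reserve
            memory: 1Gi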

Mutating webhook should handle only relevant requests

What would you like to be added:

I would like Kupid's mutating webhook to handle only the requests that are relevant to it, by using an objectSelector in the webhook configuration. The object selector can be set based on the PodSchedulingPolicies (PSPs) and ClusterPodSchedulingPolicies (CPSPs) that Kupid uses to mutate these resources.

Why is this needed:

Today Kupid receives every request in the cluster, while it only needs to mutate specific resources (like the etcd StatefulSet) based on resource labels. Filtering with an objectSelector would lower Kupid's resource consumption (by sparing it irrelevant requests) and reduce log volume by getting rid of unnecessary Handling request... logs, as sketched below.
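A minimal sketch of what the objectSelector could look like in kupid's MutatingWebhookConfiguration; the webhook name, service details, and the opt-in label are assumptions, and in practice the selector would have to be derived from the label selectors of the PodSchedulingPolicies and ClusterPodSchedulingPolicies in use:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: kupid                                   # hypothetical name
webhooks:
- name: mutations.kupid.gardener.cloud          # hypothetical name
  objectSelector:
    matchLabels:
      kupid.gardener.cloud/inject: "true"       # hypothetical opt-in label
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    resources: ["statefulsets"]
    operations: ["CREATE", "UPDATE"]
  clientConfig:
    service:
      name: kupid                               # hypothetical service
      namespace: kupid
  sideEffects: None
  admissionReviewVersions: ["v1"]
  failurePolicy: Ignore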

Improve error logging

What would you like to be added:
Log errors at a level that is convenient to configure without flooding the logs.

Why is this needed:
To help debug issues without flooding the logs.

Add healthz endpoint and metrics

What would you like to be added:
Add support for healthz endpoint and expose metrics.

Why is this needed:
To enable a livenessProbe and to collect relevant metrics.
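For illustration, once a healthz endpoint exists it could be wired into the kupid container roughly as follows; the ports and paths are assumptions:

# Fragment of the kupid container spec in the deployment (ports/paths assumed):
livenessProbe:
  httpGet:
    path: /healthz
    port: 8081
  initialDelaySeconds: 10
  periodSeconds: 10
ports:
- name: metrics
  containerPort: 8080    # assumed metrics port, to be scraped by Prometheus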

Kupid should not generate the MutatingWebhookConfiguration with a rule that mutates Jobs on update

What happened:
Define a Job whose pod template carries certain labels.
Define a cluster- or namespace-scoped PodSchedulingPolicy in Kupid whose label selector matches the pods created by the above Job definition.

Once the Job has completed, change the PodSchedulingPolicy by changing its label selector.
Try to delete the Job that was created before the PodSchedulingPolicy was changed.
Once the new policy comes into effect, it prevents the existing Job from being deleted, because the webhook tries to update the Job's spec.template (an immutable field) according to the new policy.
We see the following errors in the KCM logs:

I1207 11:25:10.408731       1 garbagecollector.go:529] remove DeleteDependents finalizer for item [batch/v1/Job, namespace: shoot--dev--test-ash-3, name: a3fc17-compact-job, uid: 728ec193-2a8e-4f8a-befb-1fa9526ef7f8]
E1207 11:25:10.431940       1 garbagecollector.go:309] error syncing item &garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"batch/v1", Kind:"Job", Name:"a3fc17-compact-job", UID:"728ec193-2a8e-4f8a-befb-1fa9526ef7f8", BlockOwnerDeletion:(*bool)(0xc001d277fa)}}}: Job.batch "a3fc17-compact-job" is invalid: spec.template: Invalid value: core.PodTemplateSpec: field is immutable

This leaves the Job orphaned unless it is deleted manually.

What you expected to happen:
Kupid should not update the spec of an existing Job, since a Job runs to completion and its pod template is immutable.

How to reproduce it (as minimally and precisely as possible):
Follow the steps above.

Anything else we need to know:
This happens only when the earlier Job is not deleted before the change to the PodSchedulingPolicy is made.

Environment:
k8s - v1.19.5
kupid - v0.1.6
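One possible mitigation, assuming kupid should keep mutating Jobs at creation time at all, is to restrict the generated webhook rule for Jobs to CREATE operations so that completed Jobs are never re-mutated on update. A sketch of the relevant rule only, not kupid's actual generated configuration:

webhooks:
- name: mutations.kupid.gardener.cloud   # hypothetical name
  rules:
  - apiGroups: ["batch"]
    apiVersions: ["v1"]
    resources: ["jobs"]
    operations: ["CREATE"]               # no UPDATE: a Job's pod template is immutable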

Missing priorityClass for kupid-extension

What happened:
The kupid extension deployment lacks a priority class. If the extension runs in a cluster with limited capacity, existing Shoots which require the kupid extension can't be reconciled, as other components (e.g. control plane components) might have a higher priority.

As for all other extensions, I recommend using a priority class with value 1000000000.
Ref: https://github.com/gardener/gardener-extension-provider-aws/blob/master/charts/gardener-extension-provider-aws/templates/priorityclass.yaml#L5
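Concretely, a priority class along these lines, where the name is an assumption and the value follows the recommendation above; the extension deployment would then reference it via spec.template.spec.priorityClassName:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gardener-extension-kupid   # hypothetical name
value: 1000000000
globalDefault: false
description: Priority class for the kupid extension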

/assign

Allow configuring QPS and burst via helm chart

What would you like to be added:
Users should be able to configure QPS and burst via the kupid helm chart.

Why is this needed:
Currently, kupid provides flags that allow users to set the QPS and burst for the manager's client configuration, but these are not exposed via the helm chart. The helm chart needs to be enhanced to allow setting these values.
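A sketch of how this could surface in the chart's values.yaml; the key names below are assumptions, not the chart's current schema:

# values.yaml (hypothetical keys)
clientConnection:
  qps: 100      # passed through to kupid's QPS flag
  burst: 150    # passed through to kupid's burst flag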

Handling failures in applying scheduling policies

What would you like to be added:
Currently, Kupid ignores failures when applying the scheduling policies.
Either we should fail the admission when a policy cannot be applied, so that there is no side effect of a pod being scheduled on workers not allowed by the scheduling policies, or
we should log such errors and raise alerts so that the operator becomes aware of such pod scheduling and has a chance to react to the anomaly (see the failurePolicy sketch at the end of this issue).

Why is this needed:
In a real scenario this has pros and cons:
Pro

  • It ensures that critical components like etcd get scheduled, even if not on the worker pool the policy describes but rather wherever the scheduler places them.

Cons

  • If etcd is scheduled onto an unintended worker, that worker's health can affect etcd's availability, and etcd might get rolled even though nothing is wrong with etcd itself. This would not have happened had the etcd pod been deployed on the intended worker as per the policy definitions.
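For webhook-level failures (e.g. the webhook not being reachable), this trade-off maps to the failurePolicy of the generated webhook configuration; failures inside kupid's own handler would need the equivalent choice in code. A sketch of the webhook-level knob, not kupid's actual generated config:

webhooks:
- name: mutations.kupid.gardener.cloud   # hypothetical name
  failurePolicy: Fail      # reject admission if the scheduling policy cannot be applied
  # failurePolicy: Ignore  # admission proceeds without the policy (the behaviour described above)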

Support for k8s 1.22+

What would you like to be added:
Support for Kubernetes 1.22+

Why is this needed:
With v1.22, Kubernetes dropped the beta versions of the ValidatingWebhookConfiguration and MutatingWebhookConfiguration APIs in admissionregistration.k8s.io/v1beta1, as these have moved to v1. The same applies to the CustomResourceDefinition API in apiextensions.k8s.io/v1beta1, which has also graduated to v1. More details can be found in the k8s 1.22 API changes page.

/assign
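For illustration, the webhook configuration change largely amounts to the apiVersion bump below plus the fields that became mandatory in v1 (the names are hypothetical); the CustomResourceDefinition move from apiextensions.k8s.io/v1beta1 to v1 additionally requires a structural schema:

apiVersion: admissionregistration.k8s.io/v1   # was admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: kupid                                 # hypothetical name
webhooks:
- name: mutations.kupid.gardener.cloud        # hypothetical name
  sideEffects: None                           # required in v1
  admissionReviewVersions: ["v1"]             # required in v1
  clientConfig:
    service:
      name: kupid                             # hypothetical service
      namespace: kupid
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    resources: ["statefulsets"]
    operations: ["CREATE", "UPDATE"]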

Log the first mutation done to affinity rules by kupid

What happened:
When Kupid mutates the affinity rules of a resource that originally has no affinity rules, that mutation is not logged. Logging it is required for better diagnostics.

What you expected to happen:
All mutations (if any) that kupid makes to a resource's node affinity should be logged.

How to reproduce it (as minimally and precisely as possible):
Create a new StatefulSet and ensure that it is a target for kupid to inject affinity rules into. You will see that kupid injects the rules defined in the ClusterPodSchedulingPolicy resource, but it does not log this first mutation.

Support full strategic merge patch

What would you like to be added:
If there is a conflict between the scheduling criteria being injected (potentially from more than one policy) and what is already present in the target pod spec/template, the merge kupid currently performs is ad hoc.

It is desirable to support a full strategic merge patch in such cases.

Why is this needed:
Consistency with Kubernetes best practices.
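An illustrative conflict, with hypothetical pool names: the target pod template already pins a worker pool while a matching policy injects a different one. Today the outcome of merging the two is ad hoc; a full strategic merge patch would make it well-defined:

# Already present in the target pod spec/template:
spec:
  nodeSelector:
    worker.gardener.cloud/pool: general
# Injected by a matching (Cluster)PodSchedulingPolicy:
spec:
  nodeSelector:
    worker.gardener.cloud/pool: etcd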
