
cluster-resource-override-admission-operator

Overview

This operator manages the OpenShift ClusterResourceOverride admission webhook server.

Scheduling is based on the resources a container requests, while quota and hard limits refer to resource limits, which can be set higher than the requested resources. The difference between request and limit determines the level of overcommit; for instance, if a container is given a memory request of 1Gi and a memory limit of 2Gi, it is scheduled based on the 1Gi request being available on the node, but it could use up to 2Gi, so its memory is 200% overcommitted.
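
For illustration, a container spec like the following generic sketch (not a file from this repo) is scheduled against its 1Gi request but may consume up to its 2Gi limit:

apiVersion: v1
kind: Pod
metadata:
  name: overcommit-example
spec:
  containers:
  - name: app
    image: {application image}
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"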

If OpenShift Container Platform administrators would like to control the level of overcommit and manage container density on nodes, ClusterResourceOverride Admission Webhook can be configured to override the ratio between request and limit set on developer containers. In conjunction with a per-project LimitRange specifying limits and defaults, this adjusts the container limit and request to achieve the desired level of overcommit.
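
A per-project LimitRange of roughly this shape (a sketch with illustrative values, not a file from this repo) supplies the defaults and bounds that the override interacts with:

apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: {project}
spec:
  limits:
  - type: Container
    default:          # limits applied when a container specifies none
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:   # requests applied when a container specifies none
      cpu: "100m"
      memory: "256Mi"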

The ClusterResourceOverride admission webhook server itself lives in the cluster-resource-override-admission repository.

Prerequisites

Getting Started

A quick way to test your changes is to build the operator binary and run it directly from the command line.

# change to the root folder of the repo

# build the operator binary
make build

# the operator owns a CRD, so register the CRD
kubectl apply -f artifacts/olm/manifests/clusterresourceoverride/1.0.0/clusterresourceoverride.crd.yaml 

# make sure you have a cluster up and running
# create a namespace where the operator binary will manage its resource(s)
kubectl create ns cro

# before you run the operator binary, make sure you have the following
# OPERAND_IMAGE: this points to the image of ClusterResourceOverride admission webhook server.
# OPERAND_VERSION: the version of the operand.

# run the operator binary
OPERAND_IMAGE=quay.io/{openshift}/clusterresourceoverride@sha256:{image digest} OPERAND_VERSION=1.0.0 \
bin/cluster-resource-override-admission-operator start --namespace=cro --kubeconfig=${KUBECONFIG} --v=4

Now, to install the ClusterResourceOverride admission webhook server, simply create a custom resource of the ClusterResourceOverride type.

apiVersion: operator.autoscaling.openshift.io/v1
kind: ClusterResourceOverride
metadata:
  name: cluster
spec:
  podResourceOverride:
    spec:
      memoryRequestToLimitPercent: 50
      cpuRequestToLimitPercent: 25
      limitCPUToMemoryPercent: 200
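
Here, memoryRequestToLimitPercent overrides each container's memory request to 50% of its memory limit, cpuRequestToLimitPercent overrides the CPU request to 25% of the CPU limit, and limitCPUToMemoryPercent overrides the CPU limit to 200% of the memory limit, where a value of 100 scales 1Gi of RAM to 1 CPU core.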

This repo ships with an example CR; you can apply that YAML directly as well.

kubectl apply -f artifacts/example/clusterresourceoverride-cr.yaml
# or 
# make create-cro-cr

The operator watches for custom resources of the ClusterResourceOverride type and ensures that the ClusterResourceOverride admission webhook server is installed into the same namespace as the operator. You can check the current state of the admission webhook via the status of the `cluster` custom resource:

kubectl get clusterresourceoverride cluster -o yaml
apiVersion: operator.autoscaling.openshift.io/v1
kind: ClusterResourceOverride
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operator.autoscaling.openshift.io/v1","kind":"ClusterResourceOverride","metadata":{"annotations":{},"name":"cluster"},"spec":{"podResourceOverride":{"spec":{"cpuRequestToLimitPercent":25,"limitCPUToMemoryPercent":200,"memoryRequestToLimitPercent":50}}}}
  creationTimestamp: "2024-07-24T17:49:53Z"
  generation: 1
  name: cluster
  resourceVersion: "61017"
  uid: f75c6946-a556-429f-a1f6-99fe31368cb8
spec:
  podResourceOverride:
    spec:
      cpuRequestToLimitPercent: 25
      limitCPUToMemoryPercent: 200
      memoryRequestToLimitPercent: 50
status:
  certsRotateAt: null
  conditions:
  - lastTransitionTime: "2024-07-24T17:50:03Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2024-07-24T17:50:02Z"
    status: "False"
    type: InstallReadinessFailure
  hash:
    configuration: 577fe3d2b05619ac326571a3504857e3e7e70a275c941e3397aa9db5c1a1d3a4
  image: quay.io/macao/clusterresourceoverride:dev
  resources:
    apiServiceRef:
      apiVersion: apiregistration.k8s.io/v1
      kind: APIService
      name: v1.admission.autoscaling.openshift.io
      resourceVersion: "60981"
      uid: 79385b4e-43bb-4b14-b145-d50399db4ad8
    configurationRef:
      apiVersion: v1
      kind: ConfigMap
      name: clusterresourceoverride-configuration
      namespace: clusterresourceoverride-operator
      resourceVersion: "60858"
      uid: a50a2095-bab9-4834-8be3-633028c35f9e
    deploymentRef:
      apiVersion: apps/v1
      kind: Deployment
      name: clusterresourceoverride
      namespace: clusterresourceoverride-operator
      resourceVersion: "61013"
      uid: 4f21e033-e806-48e0-b053-0bbb9d6f688d
    mutatingWebhookConfigurationRef:
      apiVersion: admissionregistration.k8s.io/v1
      kind: MutatingWebhookConfiguration
      name: clusterresourceoverrides.admission.autoscaling.openshift.io
      resourceVersion: "61016"
      uid: 4fd5e040-9445-48de-9ea6-0d7028e5ab5c
    serviceRef:
      apiVersion: v1
      kind: Service
      name: clusterresourceoverride
      namespace: clusterresourceoverride-operator
      resourceVersion: "60876"
      uid: 1e343ec2-8c71-4cbc-a236-8c9c3a791db2
  version: 1.0.0

Test Pod Resource Override

The ClusterResourceOverride admission webhook enforces an opt-in approach: only objects belonging to a namespace that carries the following label are subject to override; all other objects are ignored.

apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    clusterresourceoverrides.admission.autoscaling.openshift.io/enabled: "true"

  • Create a namespace with the appropriate label (or label an existing one directly, as shown after the commands below).
  • Create a Pod in that namespace. The Pod's resource requests and limits are overridden according to the webhook server's configuration.

kubectl apply -f artifacts/example/test-namespace.yaml
kubectl apply -f artifacts/example/test-pod.yaml
# or
# make create-test-pod
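
Alternatively, opt in an existing namespace by applying the label directly:

kubectl label namespace test clusterresourceoverrides.admission.autoscaling.openshift.io/enabled=true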

The original Pod request has the following resources section:

spec:
  containers:
    - name: hello-openshift
      image: openshift/hello-openshift
      resources:
        limits:
          memory: "512Mi"
          cpu: "2000m"

The admission webhook intercepts the original Pod creation request and overrides the resources according to the configuration:

spec:
  containers:
  - image: openshift/hello-openshift
    name: hello-openshift
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 250m
        memory: 256Mi
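
These values follow from the configuration above: limitCPUToMemoryPercent: 200 recomputes the CPU limit from the 512Mi memory limit (0.5 cores at the 1Gi-per-core scale, doubled to 1 core, replacing the requested 2000m), cpuRequestToLimitPercent: 25 sets the CPU request to 25% of that new limit (250m), and memoryRequestToLimitPercent: 50 sets the memory request to half of the 512Mi limit (256Mi).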

Deploy

You can also deploy the operator on an OpenShift cluster:

  • Build the operator binary
  • Build and push operator image
  • Apply Kubernetes manifests

# change to the root folder of the repo

# build operator binary
make build

# build and push operator image
make local-image LOCAL_OPERATOR_IMAGE="{operator image}"
make local-push LOCAL_OPERATOR_IMAGE="{operator image}"

# deploy on local cluster
# LOCAL_OPERAND_IMAGE: operand image
# LOCAL_OPERATOR_IMAGE: operator image
make deploy-local LOCAL_OPERAND_IMAGE="{operand image}" LOCAL_OPERATOR_IMAGE="{operator image}"

To delete the operator and its resources:

make undeploy-local

Deploy via OLM

There are three steps:

  • Package the OLM manifests into an operator registry bundle image and push it to an image registry.
  • Make the above operator catalog source available to your OpenShift cluster.
  • Deploy the operator via OLM.

Before you package the OLM manifests, make sure the CSV file artifacts/olm/manifests/clusterresourceoverride/1.0.0/clusterresourceoverride.v1.csv.yaml points to the right operator and operand images.

# build and push the image
make operator-registry OLM_IMAGE_REGISTRY=docker.io/{your org}/clusterresourceoverride-registry IMAGE_TAG=dev

# make your catalog source available to the cluster
kubectl apply -f artifacts/olm/catalog-source.yaml

# wait for the CatalogSource object to be in 'READY' state.
# one way to make sure is to check the 'status' block of the CatalogSource object
kubectl -n clusterresourceoverride-operator get catalogsource clusterresourceoverride-catalog -o yaml

# or, you can query to check if your operator has been registered
kubectl -n clusterresourceoverride-operator get packagemanifests | grep clusterresourceoverride 

# at this point, you can install the operator from OperatorHub UI.
# if you want to do it from the command line, then execute the following:

# create an `OperatorGroup` object to associate it with the operator namespace
# and then create a Subscription object.
kubectl apply -f artifacts/olm/operator-group.yaml
kubectl apply -f artifacts/olm/subscription.yaml

# install the ClusterResourceOverride admission webhook server by creating a custom resource
kubectl apply -f artifacts/example/clusterresourceoverride-cr.yaml 
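
For reference, the OperatorGroup and Subscription applied above look roughly like the following sketch (the names, namespace, and channel are assumptions; the files under artifacts/olm/ are authoritative):

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: clusterresourceoverride-operator
  namespace: clusterresourceoverride-operator
spec:
  targetNamespaces:
  - clusterresourceoverride-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: clusterresourceoverride
  namespace: clusterresourceoverride-operator
spec:
  channel: stable
  name: clusterresourceoverride
  source: clusterresourceoverride-catalog
  sourceNamespace: clusterresourceoverride-operator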

E2E Tests

To run local E2E tests, you need to have a running OpenShift cluster and the following environment variables set:

  • KUBECONFIG: path to the kubeconfig file.
  • LOCAL_OPERATOR_IMAGE: operator image.
  • LOCAL_OPERAND_IMAGE: operand image.

make e2e-local KUBECONFIG={path to kubeconfig} LOCAL_OPERATOR_IMAGE={operator image} LOCAL_OPERAND_IMAGE={operand image}

Contributors

awgreene, deads2k, dobbymoodge, ehashman, haubenr, iam-veeramalla, jkyros, joelsmith, jrvaldes, locriandev, maxcao13, openshift-bot, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, qiwang19, rphillips, sjenning, sosiouxme, tkashem, ximinhan, yselkowitz


Issues

e2e tests run against OCP 4.12+ throw warnings about missing pod security settings

When running the e2e tests against OCP 4.12+ clusters the following warnings are being generated:

kubectl -n clusterresourceoverride-operator rollout status -w deployment/clusterresourceoverride-operator
deployment "clusterresourceoverride-operator" successfully rolled out
export GO111MODULE=on
GO111MODULE=on GOFLAGS=-mod=vendor go test -v -count=1 -timeout=15m ./test/e2e/... --kubeconfig=/ocp4-workdir/auth/kubeconfig --namespace=clusterresourceoverride-operator
=== RUN   TestDynamicClient
W0220 11:08:19.409509    6778 warnings.go:70] would violate PodSecurity "restricted:latest": seccompProfile (pod or container "test-dynamic-client" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W0220 11:08:19.414411    6778 warnings.go:70] would violate PodSecurity "restricted:latest": seccompProfile (pod or container "test-dynamic-client" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W0220 11:08:19.421444    6778 warnings.go:70] would violate PodSecurity "restricted:latest": seccompProfile (pod or container "test-dynamic-client" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
--- PASS: TestDynamicClient (0.05s)

This happens due to the pod definition in test/e2e/dynamic_test.go missing the required security settings.
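
A minimal sketch of pod security settings that satisfy the restricted profile (the field names are standard Kubernetes securityContext API; the container image is a placeholder):

spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault       # silences the warnings above
  containers:
  - name: test-dynamic-client
    image: {test image}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]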

Daemonset is not upgraded after operator upgrade

The operator (4.8.0-202205121606) was deployed via OperatorHub with manual approval on OCP v4.8.35.
A simple ClusterResourceOverride with spec.podResourceOverride.spec.cpuRequestToLimitPercent = 25 was deployed.
The cluster was upgraded to v4.9.43.
The operator was upgraded to v4.9.0-202208231335.
The operator itself was upgraded; however, the daemonset for CRO was not.
The operator logged the following error messages:
E0907 18:39:38.798875 1 worker.go:67] error syncing '/cluster': waiting for daemonset pods to be available name=clusterresourceoverride, requeuing

The issue was fixed by manually deleting the daemonset and letting the operator recreate it.

I think daemonset recreation/update should be handled automatically by the operator.

Until the operator was upgraded and the daemonset recreated, all deployments (including scaling operations) failed with the following error:
Error creating: Internal error occurred: failed calling webhook "clusterresourceoverrides.admission.autoscaling.openshift.io": failed to call webhook: an error on the server ("Internal Server Error: "/apis/admission.autoscaling.openshift.io/v1/clusterresourceoverrides?timeout=5s": the server could not find the requested resource") has prevented the request from succeeding

The codegen target failed

[alukiano@alukiano-laptop cluster-resource-override-admission-operator] (master) make codegen
docker build -t cro:codegen -f Dockerfile.codegen .
Sending build context to Docker daemon  91.08MB
Step 1/8 : FROM golang:1.15
 ---> 4873f85e381b
Step 2/8 : WORKDIR /go/src/github.com/openshift/cluster-resource-override-admission-operator
 ---> Using cache
 ---> cc6885c76fc9
Step 3/8 : COPY Makefile Makefile
 ---> Using cache
 ---> d8110341fd34
Step 4/8 : COPY pkg pkg
 ---> Using cache
 ---> 5ce67e78d6ba
Step 5/8 : COPY vendor vendor
 ---> c3467131c71e
Step 6/8 : COPY boilerplate.go.txt boilerplate.go.txt
 ---> 0af46defedbf
Step 7/8 : RUN chmod a+x vendor/k8s.io/code-generator/generate-internal-groups.sh
 ---> Running in 2f07e49daecf
chmod: cannot access 'vendor/k8s.io/code-generator/generate-internal-groups.sh': No such file or directory
The command '/bin/sh -c chmod a+x vendor/k8s.io/code-generator/generate-internal-groups.sh' returned a non-zero code: 1
make: *** [Makefile:87: codegen] Error 1

When limits or requests are both set to 1m or 1Mi, they get rewritten as 0, and the pod fails to start

This is my CR; note that limitCPUToMemoryPercent isn't set:

apiVersion: operator.autoscaling.openshift.io/v1
kind: ClusterResourceOverride
metadata:
    name: cluster
spec:
  podResourceOverride:
    spec:
      cpuRequestToLimitPercent: 5
      memoryRequestToLimitPercent: 5
      #limitCPUToMemoryPercent: 200

Example of the pod description after deploying with the resource override operator enabled for this namespace:

    Args:
      cp -rpf /tmp/Sample_data /tmp/sampledata;
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:        0
      memory:     0
    Environment:  <none>
    Mounts:
      /tmp/sampledata from sampledata-pv-volume (rw)

Events with the namespace labeled:

Warning  FailedCreate  20m    job-controller  Error creating: Pod "sample-data-job-bxxl9" is invalid: [spec.containers[0].resources.requests: Invalid value: "1m": must be less than or equal to cpu limit, spec.containers[0].resources.requests: Invalid value: "1Mi": must be less than or equal to memory limit]

There are no issues deploying 1m/1Mi limits without the override operator in the middle.
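
A plausible explanation (an assumption, not confirmed in the issue): with both percentages set to 5, the webhook computes 5% of 1m and 5% of 1Mi, quantities too small to represent, which round down to zero and produce the invalid zero values shown above.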

ART builds failing due to OLM image references mismatch

ART builds for the operator are failing due to:

2020-01-20 12:02:37,547 ERROR [containers/clusterresourceoverride-operator] Unable to find openshift/ose-clusterresourceoverride-rhel7-operator in image-references data for clusterresourceoverride-operator

Your operator image references, e.g.

- name: clusterresourceoverride-rhel7-operator
  from:
    kind: DockerImage
    name: quay.io/openshift/clusterresourceoverride-rhel7-operator:4.4
- name: clusterresourceoverride-rhel7

refer to two images:

  • clusterresourceoverride-rhel7-operator
  • clusterresourceoverride-rhel7

This is causing ART image builds to fail, since the names need to match what is in ocp-build-data (minus the openshift/ose- prefix).
