
kube-reqsizer's Introduction

A Kubernetes controller to optimize pod requests. A VPA Alternative.

kube-reqsizer is a Kubernetes controller that measures the usage of pods over time and optimizes (reduces/increases) their requests based on the average usage.

When all required conditions are met, the controller calculates the resulting requirements based on all the samples taken so far for a pod and its peers in the same deployment controller. It then goes "upstream" to the parent controller of that pod, for example a Deployment, and updates the relevant containers of the pods inside it as a reconciliation, as if its desired state were the new state with the new requirements.

Note: This is an alternative to the Vertical Pod Autoscaler. The intended use of this project is to provide a simpler, more straightforward installation and mechanism, without CRDs, that can work alongside the Horizontal Pod Autoscaler.

Deploy - Helm

helm repo add kube-reqsizer https://elementtech.github.io/kube-reqsizer/
helm repo update
helm install kube-reqsizer kube-reqsizer/kube-reqsizer

Core Values:

enabledAnnotation: true
sampleSize: 1
minSeconds: 1
enableIncrease: true
enableReduce: true
maxMemory: 0
minMemory: 0
maxCPU: 0
minCPU: 0
minCPUIncreasePercentage: 0
minCPUDecreasePercentage: 0
minMemoryIncreasePercentage: 0
minMemoryDecreasePercentage: 0
cpuFactor: 1
memoryFactor: 1
logLevel: info
concurrentWorkers: 10
persistence:
  enabled: true
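
For example, to override some of the values above at install time (the value keys follow the list above; the specific numbers are illustrative only, not recommendations):

helm install kube-reqsizer kube-reqsizer/kube-reqsizer \
  --set sampleSize=10 \
  --set minSeconds=120 \
  --set persistence.enabled=false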

Prerequisites

  • The metrics server must be deployed in your cluster. Read more about Metrics Server. This controller uses the metrics.k8s.io extension API group (/apis/metrics.k8s.io/v1beta1).
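
To verify that the metrics API is available in your cluster, you can run standard kubectl commands (output format may vary by version):

# Check that the metrics.k8s.io/v1beta1 API group is served
kubectl get --raw /apis/metrics.k8s.io/v1beta1 | head

# Confirm that pod metrics are actually being collected
kubectl top pods -A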

Usage

kube-reqsizer has the following custom flags:

# Enable an annotation filter for pod scraping.
# If "true", the controller will only set requests of controllers whose
# PODS or NAMESPACE have the annotation set to "true".
# If "false", it will ignore annotations and work
# on all pods in the cluster unless
# they are annotated with "false".

# reqsizer.jatalocks.github.io/optimize=true
# reqsizer.jatalocks.github.io/optimize=false
--annotation-filter bool (default true)

# The sample size to create an average from when reconciling.
--sample-size int (default 1)

# Minimum seconds between pod restarts.
# This ensures the controller will not restart a pod if the minimum time
# has not passed since it has started.
--min-seconds float (default 1)

# Allow controller to reduce/increase requests
--enable-increase (default true)
--enable-reduce (default true)

# Min and Max CPU (m) and Memory (Mi) the controller can set a pod request to. 0 is infinite
--max-cpu int (default 0)
--max-memory int (default 0)
--min-cpu int (default 0)
--min-memory int (default 0)

# Min CPU and Memory (%) the controller will count as a condition to resize requests.
# For Example:
# If reqsizer wants to increase from 90m to 100m, that's a 10% increase. 
# It will ignore it if min-cpu-increase-percentage is more than 10.
# If reqsizer wants to decrease from 100m to 10m, that's a 90% decrease. 
# It will ignore it if min-cpu-decrease-percentage is more than 90.
--min-cpu-increase-percentage int (default 0)
--min-memory-increase-percentage int (default 0)
--min-cpu-decrease-percentage int (default 0)
--min-memory-decrease-percentage int (default 0)

# Multiply requests when reconciling
--cpu-factor float (default 1)
--memory-factor float (default 1)

# How many pods to sample in parallel. This may affect the controller's stability.
--concurrent-workers (default 10)

# Persistence using Redis
--enable-persistence (default false)
--redis-host (default "localhost")
--redis-port (default "6379")
--redis-password (default "")
--redis-db (default 0)
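
Putting a few of these together, an illustrative invocation (flag names are taken from the list above; the values are examples only):

go run main.go \
  --sample-size 10 \
  --min-seconds 3600 \
  --max-cpu 2000 \
  --min-memory 64 \
  --min-cpu-decrease-percentage 20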

Annotations

If annotation-filter is true:

reqsizer.jatalocks.github.io/optimize=true  # Optimize Pod/Namespace
reqsizer.jatalocks.github.io/optimize=false # Ignore Pod/Namespace

There are Pod/Namespace annotations available regardless of annotation-filter:

reqsizer.jatalocks.github.io/optimize=false # Ignore Pod/Namespace when optimizing entire cluster
reqsizer.jatalocks.github.io/mode=average   # Default Mode. Optimizes based on average. If omitted, mode is average
reqsizer.jatalocks.github.io/mode=max       # Sets the request to the MAXIMUM of all sample points
reqsizer.jatalocks.github.io/mode=min       # Sets the request to the MINIMUM of all sample points
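
For example, the annotations can be applied with kubectl (the namespace and pod names below are placeholders; note that annotations set directly on a pod do not survive pod recreation, so for workloads it is usually more practical to put them in the controller's pod template):

# Opt an entire namespace in:
kubectl annotate namespace my-namespace reqsizer.jatalocks.github.io/optimize=true

# Exclude a single pod even though its namespace is opted in:
kubectl annotate pod my-pod -n my-namespace reqsizer.jatalocks.github.io/optimize=false

# Size a pod's requests to the maximum of its samples instead of the average:
kubectl annotate pod my-pod -n my-namespace reqsizer.jatalocks.github.io/mode=max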

Disclaimer

sample-size is the number of data points the controller stores in cache before computing an average for the pod. After a request resize, the cache is cleared and a new average is calculated based on the sample size. If min-seconds have not yet passed since the pod was scheduled, the controller will keep sampling the pod until min-seconds has been reached, and only then zero the samples and restart the cache.

Monitoring - Prometheus

Metric                        Type    Description
kube_reqsizer_cpu_offset      Gauge   Milli-cores that have been increased/removed since startup. Can be a positive/negative value.
kube_reqsizer_memory_offset   Gauge   Mebibytes (Mi) that have been increased/removed since startup. Can be a positive/negative value.
kube_reqsizer_cache_size      Gauge   Number of pod controllers currently in cache.
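
To eyeball these metrics without a Prometheus server, you can port-forward to the controller and grep its metrics endpoint (the namespace, deployment name, and metrics port below are assumptions; adjust them to your installation):

kubectl -n kube-reqsizer port-forward deploy/kube-reqsizer 8080:8080 &
curl -s http://localhost:8080/metrics | grep kube_reqsizer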

Edge Cases

  1. All samples in a certain cycle report 0 (less than 1):
    1. mode=average: The controller will ignore the pod and not reconcile.
    2. mode=min: The controller will put 1m or 1Mi as a value.
    3. mode=max: The controller will ignore the pod and not reconcile.
  2. One or more of the samples in a certain cycle reports 0 (less than 1):
    1. mode=average: Will take the 0 into consideration.
    2. mode=min: Will consider the 0 as 1.
    3. mode=max: Will ignore the sample.
  3. annotation-filter is true (optimize=false is as strong as deny):
    1. A namespace has optimize=false but a pod has optimize=true:
      1. The controller will ignore the pod and not reconcile.
    2. A namespace has optimize=true but a pod has optimize=false:
      1. The controller will ignore the pod and not reconcile.

Limitations

  • Does not work with CRD controllers (such as Argo Rollouts)

Development

Getting Started

You’ll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster. Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).

Running on the cluster

  1. Run the controller:
go run main.go

OR

  1. Build and push your image to the location specified by IMG:
make docker-build docker-push IMG=<some-registry>/kube-reqsizer:tag
  2. Deploy the controller to the cluster with the image specified by IMG:
make deploy IMG=<some-registry>/kube-reqsizer:tag

Undeploy controller

Undeploy the controller from the cluster:

make undeploy

Support

Buy Me A Coffee

kube-reqsizer's People

Contributors

amitai-devops, elementtech


kube-reqsizer's Issues

Optimization is not working - Azure AKS - v1.25.6

Hi Team,

First of all, it looks like a new tool and it can play an important role as well.

I just quickly tested it in Azure AKS v1.25.6. Below are my findings/comments:

  1. First, a small correction in the helm install command - We should use the name as well while installing.

helm install kube-reqsizer/kube-reqsizer --> helm install kube-reqsizer kube-reqsizer/kube-reqsizer

  2. I've deployed a basic application in the default namespace with high CPU/memory requests to test whether kube-reqsizer would optimize it or not. I waited for 22 mins, but it was still the same.

  3. Logs FYR

I0530 15:58:39.252063 1 request.go:601] Waited for 1.996392782s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/argocd
I0530 15:58:49.252749 1 request.go:601] Waited for 1.995931495s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/argocd
I0530 15:58:59.450551 1 request.go:601] Waited for 1.994652278s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/argocd
I0530 15:59:09.450621 1 request.go:601] Waited for 1.994074539s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kube-system
I0530 15:59:19.450824 1 request.go:601] Waited for 1.99598317s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kubescape
I0530 15:59:29.650328 1 request.go:601] Waited for 1.993913908s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/tigera-operator
I0530 15:59:39.650831 1 request.go:601] Waited for 1.996110718s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kubescape
I0530 15:59:49.850897 1 request.go:601] Waited for 1.995571438s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kube-system
I0530 16:00:00.049996 1 request.go:601] Waited for 1.994819712s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/calico-system
I0530 16:00:10.050864 1 request.go:601] Waited for 1.991681441s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/default

  4. How much time will it take to optimize? Will it restart the pod automatically?

  5. I haven't customized any values, just used the below commands to install.

helm repo add kube-reqsizer https://jatalocks.github.io/kube-reqsizer/
helm repo update
helm install kube-reqsizer kube-reqsizer/kube-reqsizer

"Learning period" for reqsized pods

Add "Learning period" for reqsized pods - meaning add a configuration parameter that enables in watch(no resizing) mode for that period.

That period should be calculated from when monitoring of that object starts, e.g. when a pod is created after reqsizer is already running in the cluster, reqsizer will start counting the "learning period" for that pod.

Controller Pod Theoretically becomes OOM after a long time

This might be because of the caching mechanism and large TTL. It would be best to manage the Cache to the point TTL is not needed and the controller makes sure the memory doesn't grow too large.
Also, the average mechanism saves a very large number. There might be a way to mitigate this.
This is not relevant for persistence mode.

Reconcile Errors

Hi, I noticed some reconcile errors in the logs for some pods, please advise.
These pods' namespaces are already annotated with reqsizer.jatalocks.github.io/optimize: "true"

[screenshots of the errors]

Document dependencies

What APIs does this controller need in order to work?

For example, the Kubernetes vertical pod autoscaler lists several, including the metrics.k8s.io extension API group: /apis/metrics.k8s.io/v1beta1/....

Does reqsizer support Istio?

Hi

In my cluster I am using Istio, and I also have Prometheus up and running.

In the logs of the reqsizer I see the following

[screenshot of the logs]

Also, Prometheus shows the metric counters at 0.

[screenshots of the Prometheus metrics]

Did I miss something in the configuration? Please advise.

Persistent Cache for Metrics history

Use a local persistent cache, like Redis, as an extra dependency of the Helm chart, with the ability to enable it when necessary via a flag in the controller.

Not updating deployment resource requests

Even after having the following log lines:

1.7075074599651477e+09	INFO	controllers.Pod	Sample Size and Minimum Time have been reached	{"pod": "staging/hermes-decoder-8cc5f5c4f-5z284"}
1.707507459965185e+09	INFO	controllers.Pod	hermes-decoder Comparing CPU: 25m <> 50m	{"pod": "staging/hermes-decoder-8cc5f5c4f-5z284"}
1.7075074599651916e+09	INFO	controllers.Pod	hermes-decoder Comparing Memory: 50Mi <> 150Mi	{"pod": "staging/hermes-decoder-8cc5f5c4f-5z284"}
1.7075074609485767e+09	INFO	controllers.Pod	Pod Requests Will Change	{"pod": "staging/hermes-decoder-8cc5f5c4f-5z284"}
1.7075074365512218e+09	INFO	controllers.Pod	Sample Size and Minimum Time have been reached	{"pod": "dev/vulcan-domains-stark-856df9cb4c-bcl4t"}
1.7075074365513973e+09	INFO	controllers.Pod	vulcan-domains-stark Comparing CPU: 25m <> 70m	{"pod": "dev/vulcan-domains-stark-856df9cb4c-bcl4t"}
1.7075074365514233e+09	INFO	controllers.Pod	vulcan-domains-stark Comparing Memory: 50Mi <> 150Mi	{"pod": "dev/vulcan-domains-stark-856df9cb4c-bcl4t"}
1.7075074375490775e+09	INFO	controllers.Pod	Pod Requests Will Change	{"pod": "dev/vulcan-domains-stark-856df9cb4c-bcl4t"}

The related deployments are unchanged:

# hermes-decoder yaml redacted
        resources:
          limits:
            cpu: "1"
            memory: 2000Mi
          requests:
            cpu: 50m
            memory: 150Mi
# vulcan-domains-stark yaml redacted
        resources:
          limits:
            cpu: "2"
            memory: 600Mi
          requests:
            cpu: 70m
            memory: 150Mi

Any ideas why deployment resources are not updated?

BUG: min-seconds is flaky and not working properly

Currently, min-seconds is calculated as the seconds difference between the Timestamp of sample number 1 to the Timestamp of the last sample.

This is bad because if a Deployment controller has multiple pods, and one of the pods has been alive longer than another, then min-seconds will be reached and other pods that should not have been restarted will be, because they have been alive for too little time.

Solution:

Every time the controller wants to reconcile a pod, the min-seconds check should use the minimum time that any pod in its parent controller has been alive, including the other pods, instead of calculating the timestamp difference between samples.

Reconciliation loop too slow

Currently, as intended, the Kubernetes reconciliation loop works as a Watch mechanism on pods. It would be better to add a Timer to the loop, every few seconds, in order to insert faster and more accurate Sample points.

Latency Metric

Hi

After testing your tool, I can now determine, after N samples over a long period, the average consumption of my components.
Based on that I can decide which AWS machine type I need, but sometimes after I scale down the machine we face latency issues.

Can we add a new metric to collect samples of request latency, so we can compare the average before and after scaling down?

@jatalocks

Invalid subdomain in annotation key

The annotation key auto.request.operator/optimize: "true" is odd, because the DNS domain auto.request.operator is invalid.

It's conventional for software to use a DNS domain that exists (or is controlled by a party that could make it exist) and that is related to the project that defines the key.

Error when trying to get stats of pods that are not in a running state

When trying to get stats from a pod that is not running, i.e. in status "OOMKilled", "Completed", or "ContainerStatusUnknown":

1.7075000655863886e+09	ERROR	controllers.Pod	failed to get stats from pod	{"pod": "dev/hotstorage-semantic-data-cronjob-28457520-smrv9"}
github.com/jatalocks/kube-reqsizer/controllers.(*PodReconciler).Reconcile
	/workspace/controllers/pod_controller.go:149
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.7075000655867434e+09	ERROR	Reconciler error	{"controller": "pod", "controllerGroup": "", "controllerKind": "Pod", "Pod": {"name":"hotstorage-semantic-data-cronjob-28457520-smrv9","namespace":"dev"}, "namespace": "dev", "name": "hotstorage-semantic-data-cronjob-28457520-smrv9", "reconcileID": "8222a6cf-2169-4a40-b9b3-e99ea85421da", "error": "the server could not find the requested resource"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
