HPA docs · HPA v1 API ref · HPA v2 API ref · kubectl autoscale commands · Minikube docs · CDK8s+ docs
Experimenting with K8s HorizontalPodAutoscaler (HPA) by completing the recommended walkthroughs and logging notes in this README along the way:
- HorizontalPodAutoscaler walkthrough.
- Autoscale on multiple metrics and custom metrics walkthrough
- Experimenting with K8s HorizontalPodAutoscaler
- 🧭 Table of contents
- 🚀 Quick start
- ➕ Useful commands
- 📰 Deploy Kubernetes Dashboard
- 🔩 How HPA works
- 📈 Defining metrics on resources
- 🚩 HPA flags
- ✅ Requirements
- 📛 Changing HPA's target resource names
- 🆕 Autoscaling v2
- 🚥 Pod conditions
- 🔎 Support for metrics APIs
- 🔑 Aggregation layer
- ⚖️ Quantities
- 💡 Possible APIs
- ⬆️ Migrating to HPA
- ❓ Questions
## 🚀 Quick start

1. Start Minikube with 2 nodes:

   ```sh
   minikube start --nodes 2
   ```
2. Apply the metrics server:

   ```sh
   kubectl apply -f src/metrics-server.yaml
   ```
3. Apply the PHP Apache application:

   ```sh
   kubectl apply -f src/php-apache.yaml
   ```
4. Apply the HorizontalPodAutoscaler:

   ```sh
   kubectl apply -f src/hpa.yaml
   ```
5. [Open new terminal] Increase the load on the PHP Apache application:

   ```sh
   kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
   ```
6. [Open new terminal] Watch the HorizontalPodAutoscaler scale up:

   ```sh
   kubectl get hpa -w
   ```
7. Stop the load generator (terminal used in step 5) with `<Ctrl> + C`.
8. Watch the HorizontalPodAutoscaler scale down (terminal used in step 6).

9. View the HPA status:

   ```sh
   kubectl describe hpa php-apache
   ```
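For reference, a minimal HorizontalPodAutoscaler manifest along the lines of what `src/hpa.yaml` might contain — the exact contents of the file in this repo are an assumption; the 50% CPU target and 1–10 replica range follow the official walkthrough:

```yaml
# Hypothetical sketch of src/hpa.yaml: scale the php-apache Deployment
# between 1 and 10 replicas, targeting 50% average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```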
## 📰 Deploy Kubernetes Dashboard

1. Apply the service:

   ```sh
   kubectl apply -f src/dashboard/service.yaml
   ```
2. Apply the admin-user:

   ```sh
   kubectl apply -f src/dashboard/admin-user.yaml
   ```
3. Get the token:

   ```sh
   kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')
   ```
4. Start the proxy:

   ```sh
   kubectl proxy
   ```
5. Open the dashboard.
## 🔩 How HPA works

The HPA controller periodically queries the metrics API for the current CPU utilization of the pods in the deployment (every 15 seconds by default).
The algorithm for scaling is:

```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```
The control plane skips any scaling action if the ratio is sufficiently close to 1.0 (within a globally configurable tolerance, 0.1 by default).
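To sanity-check the formula and the tolerance rule, here is a small arithmetic sketch using hypothetical numbers (3 replicas averaging 80% CPU against a 50% target — these values are illustrative, not from this repo):

```sh
# Hypothetical numbers: 3 replicas at 80% average CPU, target 50%.
current_replicas=3; current_metric=80; desired_metric=50

# ceil(a / b) for positive integers is (a + b - 1) / b.
desired=$(( (current_replicas * current_metric + desired_metric - 1) / desired_metric ))
echo "desiredReplicas = $desired"   # ceil(3 * 80 / 50) = ceil(4.8) = 5

# Tolerance check: scaling is skipped when |ratio - 1.0| <= 0.1, i.e. when
# 0.9 <= currentMetric / desiredMetric <= 1.1 (written in integer form below).
if [ $(( current_metric * 10 )) -ge $(( desired_metric * 9 )) ] \
   && [ $(( current_metric * 10 )) -le $(( desired_metric * 11 )) ]; then
  echo "within tolerance: no scaling"
else
  echo "outside tolerance: scale to $desired"
fi
```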
All Pods with a deletion timestamp set (objects with a deletion timestamp are in the process of being shut down / removed) are ignored, and all failed Pods are discarded.
## 📈 Defining metrics on resources

Metric target fields, by API version:

- `targetAverageValue` (autoscaling/v2beta1)
- `targetAverageUtilization` (autoscaling/v2beta1)
- `averageUtilization` (autoscaling/v2)

Utilization is the ratio between the current usage of a resource and the requested resources of the pod.
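As an illustration, the autoscaling/v2 shape of a metrics block using both a utilization target and an average-value target (the specific numbers are hypothetical):

```yaml
# Illustrative autoscaling/v2 metrics block: 50% average CPU utilization
# plus a 500Mi average memory value across the target's pods.
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi
```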
## 🚩 HPA flags

- `--horizontal-pod-autoscaler-initial-readiness-delay` - default is 30 seconds - determines whether to set aside certain CPU metrics for the first 30 seconds of the pod's life.
- `--horizontal-pod-autoscaler-cpu-initialization-period` - default is 5 minutes - once a pod has become ready, any transition to ready is considered the first if it occurred within this configurable time since it started.
- `--horizontal-pod-autoscaler-downscale-stabilization` - default is 5 minutes - the period since the last downscale before another downscale can be performed in response to a new scale event.
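These flags are set on kube-controller-manager. On a Minikube cluster like the one in the quick start, they can be passed through `--extra-config`; a non-runnable sketch (the 1-minute window is an arbitrary example value, and recreating the cluster is required):

```sh
# Example only: shorten the downscale stabilization window to 1 minute.
minikube start --nodes 2 \
  --extra-config=controller-manager.horizontal-pod-autoscaler-downscale-stabilization=1m0s
```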
## 📛 Changing HPA's target resource names

API objects should follow the same constraints as DNS subdomain names:

- contain no more than 253 characters
- contain only lowercase alphanumeric characters, '-' or '.'
- start with an alphanumeric character
- end with an alphanumeric character
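The constraints above can be collapsed into a single regular expression; a hypothetical helper that mirrors the rules as listed (the full RFC 1123 subdomain grammar is slightly stricter about dot-separated labels):

```sh
# Hypothetical validator for the name rules listed above.
is_valid_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]{0,251}[a-z0-9])?$'
}

is_valid_name "php-apache" && echo "php-apache: valid"
is_valid_name "Php_Apache" || echo "Php_Apache: invalid"
```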
This can be done in the following way:

1. Add the new name to the HPA target config.
2. Change the resource name.
3. Remove the old name from the HPA target config.
## 🆕 Autoscaling v2

- Supports custom metrics.
- Supports specifying multiple metrics to scale on.
- Allows setting a `behavior` for scaling up and down.
- Status conditions can be seen via `kubectl describe hpa <name>` (docs):
  - `AbleToScale` - indicates whether or not the HPA is able to fetch and update scales, as well as whether or not any backoff-related conditions would prevent scaling.
  - `ScalingActive` - indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and is able to calculate desired scales.
  - `ScalingLimited` - indicates that the desired scale was capped by the maximum or minimum of the HorizontalPodAutoscaler.
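As an illustration of the `behavior` field (the values are hypothetical, not taken from this repo's manifests):

```yaml
# Hypothetical autoscaling/v2 behavior block: scale down at most 1 pod per
# minute after a 5-minute stabilization window; scale up without delay.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0
```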
## 🚥 Pod conditions

Useful to know since HPA scales depending on pod readiness. Docs

- `PodScheduled` - the Pod has been scheduled to a node.
- `PodHasNetwork` - (alpha feature; must be enabled explicitly) the Pod sandbox has been successfully created and networking configured.
- `ContainersReady` - all containers in the Pod are ready.
- `Initialized` - all init containers have completed successfully.
- `Ready` - the Pod is able to serve requests and should be added to the load balancing pools of all matching Services.
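These conditions appear under `status.conditions` on the Pod object; a trimmed, illustrative excerpt of what `kubectl get pod <name> -o yaml` reports for a healthy pod:

```yaml
# Illustrative excerpt of a Pod's status for a pod that is fully ready.
status:
  conditions:
    - type: PodScheduled
      status: "True"
    - type: Initialized
      status: "True"
    - type: ContainersReady
      status: "True"
    - type: Ready
      status: "True"
```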
## 🔎 Support for metrics APIs

By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these APIs, cluster administrators must ensure that:
1. The API aggregation layer is enabled.
2. The corresponding APIs are registered:
   - For resource metrics, this is the `metrics.k8s.io` API, generally provided by metrics-server. It can be launched as a cluster add-on.
   - For custom metrics, this is the `custom.metrics.k8s.io` API. It's provided by "adapter" API servers from metrics solution vendors. Check with your metrics pipeline to see if there is a Kubernetes metrics adapter available. See boilerplate to get started.
   - For external metrics, this is the `external.metrics.k8s.io` API. It may be provided by the custom metrics adapters provided above.
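A quick way to check which of these APIs are registered — a sketch only, since it requires a running cluster such as the Minikube one from the quick start:

```sh
# List registered API groups and filter for the metrics APIs.
kubectl api-versions | grep metrics
# Query the resource metrics API directly through the apiserver.
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
```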
## 🔑 Aggregation layer

Configuring the aggregation layer allows the Kubernetes apiserver to be extended with additional APIs, which are not part of the core Kubernetes APIs. Docs

Note: I was not required to configure this for the metrics-server to work. Instead, I disabled TLS validation by adding a command to the container spec:

```yaml
command:
  - /metrics-server
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
```
## ⚖️ Quantities

All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using a special whole-number notation known in Kubernetes as a quantity. For example, the quantity 10500m would be written as 10.5 in decimal notation. The metrics APIs will return whole numbers without a suffix when possible, and will generally return quantities in milli-units otherwise. This means you might see your metric value fluctuate between 1 and 1500m, or 1 and 1.5 when written in decimal notation.
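The milli-unit convention is easy to mimic; a hypothetical helper (not part of any Kubernetes tooling) that converts a milli-suffixed quantity to its decimal form:

```sh
# Hypothetical converter: "10500m" -> 10.5, plain whole numbers pass through.
quantity_to_decimal() {
  case "$1" in
    *m) awk "BEGIN { print ${1%m} / 1000 }" ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

quantity_to_decimal 10500m   # prints 10.5
quantity_to_decimal 1500m    # prints 1.5
quantity_to_decimal 2        # prints 2
```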
## 💡 Possible APIs

We will need an API to create the following:

- `HorizontalPodAutoscaler` resource - the HPA object
- `Metric` enum - the metric to scale on
- `ScalingPolicy` construct - the scaling policy object (used in autoscaling/v2's `behavior` field)
- Possibly add a `maintenanceMode` option to `Pod`/`Container` resources (to prevent scaling on them). This would be useful for pods that are used for maintenance tasks (e.g. database migrations). See Implicit maintenance-mode deactivation docs
## ⬆️ Migrating to HPA

Migrating Deployments and StatefulSets to horizontal autoscaling docs - When an HPA is enabled, it is recommended to remove `spec.replicas` of the Deployment and/or StatefulSet from their manifest(s). If this isn't done, any time a change to that object is applied, for example via `kubectl apply -f deployment.yaml`, Kubernetes will scale the current number of Pods to the value of the `spec.replicas` key. This may not be desired and could be troublesome when an HPA is active.
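One way to drop `spec.replicas` without triggering a scale on the next apply — a sketch that requires a running cluster, using `php-apache` purely as this repo's example name:

```sh
# Open the last-applied-configuration annotation in an editor; delete the
# spec.replicas line there, save, then also remove it from the manifest on disk.
kubectl apply edit-last-applied deployment/php-apache
```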
## ❓ Questions

- Should we be focused on v2 or v1 of the HPA API?