HPA docs · HPA v1 API ref · HPA v2 API ref · kubectl autoscale commands · Minikube docs · CDK8s+ docs
Experimenting with K8s HorizontalPodAutoscaler (HPA) by completing the recommended walkthroughs and logging notes in this README along the way:
- HorizontalPodAutoscaler walkthrough.
- Autoscale on multiple metrics and custom metrics walkthrough
- Experimenting with K8s HorizontalPodAutoscaler
- 🧭 Table of contents
- 🚀 Quick start
- ➕ Useful commands
- 📰 Deploy Kubernetes Dashboard
- 🔩 How HPA works
- 📈 Defining metrics on resources
- 🚩 HPA flags
- ✅ Requirements
- 📛 Changing HPA's target resource names
- 🆕 Autoscaling v2
- 🚥 Pod conditions
- 🔎 Support for metrics APIs
- 🔑 Aggregation layer
- ⚖️ Quantities
- 💡 Possible APIs
- ⬆️ Migrating to HPA
- ❓ Questions
## 🚀 Quick start

1. Start Minikube with 2 nodes:

   ```sh
   minikube start --nodes 2
   ```
2. Apply the metrics server:

   ```sh
   kubectl apply -f src/metrics-server.yaml
   ```
3. Apply the PHP Apache application:

   ```sh
   kubectl apply -f src/php-apache.yaml
   ```
4. Apply the HorizontalPodAutoscaler:

   ```sh
   kubectl apply -f src/hpa.yaml
   ```
5. [Open new terminal] Increase the load on the PHP Apache application:

   ```sh
   kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
   ```
6. [Open new terminal] Watch the HorizontalPodAutoscaler scale up:

   ```sh
   kubectl get hpa -w
   ```
7. Stop the load generator (terminal used in step 5) with `<Ctrl> + C`.
8. Watch the HorizontalPodAutoscaler scale down (terminal used in step 6).

9. View the HPA status:

   ```sh
   kubectl describe hpa php-apache
   ```
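For reference, a minimal HorizontalPodAutoscaler manifest along the lines of what `src/hpa.yaml` might contain — the exact contents of the file in this repo are an assumption; the 50% CPU target and 1–10 replica range follow the official walkthrough:

```yaml
# Hypothetical sketch of src/hpa.yaml: scale the php-apache Deployment
# between 1 and 10 replicas, targeting 50% average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```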
## 📰 Deploy Kubernetes Dashboard

1. Apply the service:

   ```sh
   kubectl apply -f src/dashboard/service.yaml
   ```
2. Apply the admin-user:

   ```sh
   kubectl apply -f src/dashboard/admin-user.yaml
   ```
3. Get the token:

   ```sh
   kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')
   ```
4. Start the proxy:

   ```sh
   kubectl proxy
   ```
5. Open the dashboard.
## 🔩 How HPA works

The HPA controller periodically queries the metrics API for the current CPU utilization of the pods in the deployment (every 15 seconds by default).
The algorithm for scaling is:

```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```
The control plane skips any scaling action if the ratio is sufficiently close to 1.0 (within a globally configurable tolerance, 0.1 by default).
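To sanity-check the formula and the tolerance rule, here is a small arithmetic sketch using hypothetical numbers (3 replicas averaging 80% CPU against a 50% target — these values are illustrative, not from this repo):

```sh
# Hypothetical numbers: 3 replicas at 80% average CPU, target 50%.
current_replicas=3; current_metric=80; desired_metric=50

# ceil(a / b) for positive integers is (a + b - 1) / b.
desired=$(( (current_replicas * current_metric + desired_metric - 1) / desired_metric ))
echo "desiredReplicas = $desired"   # ceil(3 * 80 / 50) = ceil(4.8) = 5

# Tolerance check: scaling is skipped when |ratio - 1.0| <= 0.1, i.e. when
# 0.9 <= currentMetric / desiredMetric <= 1.1 (written in integer form below).
if [ $(( current_metric * 10 )) -ge $(( desired_metric * 9 )) ] \
   && [ $(( current_metric * 10 )) -le $(( desired_metric * 11 )) ]; then
  echo "within tolerance: no scaling"
else
  echo "outside tolerance: scale to $desired"
fi
```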
All Pods with a deletion timestamp set (objects with a deletion timestamp are in the process of being shut down / removed) are ignored, and all failed Pods are discarded.
## 📈 Defining metrics on resources

Metric target fields, by API version:

- `targetAverageValue` (autoscaling/v2beta1)
- `targetAverageUtilization` (autoscaling/v2beta1)
- `averageUtilization` (autoscaling/v2)

Utilization is the ratio between the current usage of a resource and the requested resources of the pod.
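As an illustration, the autoscaling/v2 shape of a metrics block using both a utilization target and an average-value target (the specific numbers are hypothetical):

```yaml
# Illustrative autoscaling/v2 metrics block: 50% average CPU utilization
# plus a 500Mi average memory value across the target's pods.
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi
```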
## 🚩 HPA flags

- `--horizontal-pod-autoscaler-initial-readiness-delay` - default is 30 seconds - determines whether to set aside certain CPU metrics for the first 30 seconds of the pod's life.
- `--horizontal-pod-autoscaler-cpu-initialization-period` - default is 5 minutes - once a pod has become ready, any transition to ready is considered the first if it occurred within this configurable time since it started.
- `--horizontal-pod-autoscaler-downscale-stabilization` - default is 5 minutes - the period since the last downscale before another downscale can be performed in response to a new scale event.
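These flags are set on kube-controller-manager. On a Minikube cluster like the one in the quick start, they can be passed through `--extra-config`; a non-runnable sketch (the 1-minute window is an arbitrary example value, and recreating the cluster is required):

```sh
# Example only: shorten the downscale stabilization window to 1 minute.
minikube start --nodes 2 \
  --extra-config=controller-manager.horizontal-pod-autoscaler-downscale-stabilization=1m0s
```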
## 📛 Changing HPA's target resource names

API objects should follow the same constraints as DNS subdomain names:

- contain no more than 253 characters
- contain only lowercase alphanumeric characters, '-' or '.'
- start with an alphanumeric character
- end with an alphanumeric character
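The constraints above can be collapsed into a single regular expression; a hypothetical helper that mirrors the rules as listed (the full RFC 1123 subdomain grammar is slightly stricter about dot-separated labels):

```sh
# Hypothetical validator for the name rules listed above.
is_valid_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]{0,251}[a-z0-9])?$'
}

is_valid_name "php-apache" && echo "php-apache: valid"
is_valid_name "Php_Apache" || echo "Php_Apache: invalid"
```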
This can be done in the following way:

1. Add the new name to the HPA target config.
2. Change the resource name.
3. Remove the old name from the HPA target config.
## 🆕 Autoscaling v2

- Supports custom metrics.
- Supports specifying multiple metrics to scale on.
- Allows setting a `behavior` for scaling up and down.
- Status conditions can be seen via `kubectl describe hpa <name>` (docs):
  - `AbleToScale` - indicates whether or not the HPA is able to fetch and update scales, as well as whether or not any backoff-related conditions would prevent scaling.
  - `ScalingActive` - indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and is able to calculate desired scales.
  - `ScalingLimited` - indicates that the desired scale was capped by the maximum or minimum of the HorizontalPodAutoscaler.
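As an illustration of the `behavior` field (the values are hypothetical, not taken from this repo's manifests):

```yaml
# Hypothetical autoscaling/v2 behavior block: scale down at most 1 pod per
# minute after a 5-minute stabilization window; scale up without delay.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0
```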
## 🚥 Pod conditions

Useful to know since HPA scales depending on pod readiness. Docs

- `PodScheduled` - the Pod has been scheduled to a node.
- `PodHasNetwork` - (alpha feature; must be enabled explicitly) the Pod sandbox has been successfully created and networking configured.
- `ContainersReady` - all containers in the Pod are ready.
- `Initialized` - all init containers have completed successfully.
- `Ready` - the Pod is able to serve requests and should be added to the load balancing pools of all matching Services.
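These conditions appear under `status.conditions` on the Pod object; a trimmed, illustrative excerpt of what `kubectl get pod <name> -o yaml` reports for a healthy pod:

```yaml
# Illustrative excerpt of a Pod's status for a pod that is fully ready.
status:
  conditions:
    - type: PodScheduled
      status: "True"
    - type: Initialized
      status: "True"
    - type: ContainersReady
      status: "True"
    - type: Ready
      status: "True"
```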
## 🔎 Support for metrics APIs

By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these APIs, cluster administrators must ensure that:
1. The API aggregation layer is enabled.
2. The corresponding APIs are registered:
   - For resource metrics, this is the `metrics.k8s.io` API, generally provided by metrics-server. It can be launched as a cluster add-on.
   - For custom metrics, this is the `custom.metrics.k8s.io` API. It's provided by "adapter" API servers from metrics solution vendors. Check with your metrics pipeline to see if there is a Kubernetes metrics adapter available. See boilerplate to get started.
   - For external metrics, this is the `external.metrics.k8s.io` API. It may be provided by the custom metrics adapters provided above.
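A quick way to check which of these APIs are registered — a sketch only, since it requires a running cluster such as the Minikube one from the quick start:

```sh
# List registered API groups and filter for the metrics APIs.
kubectl api-versions | grep metrics
# Query the resource metrics API directly through the apiserver.
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
```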
## 🔑 Aggregation layer

Configuring the aggregation layer allows the Kubernetes apiserver to be extended with additional APIs, which are not part of the core Kubernetes APIs. Docs

Note: I was not required to configure this for the metrics-server to work. Instead, I disabled TLS validation by adding a command to the container spec:

```yaml
command:
  - /metrics-server
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
```
## ⚖️ Quantities

All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using a special whole-number notation known in Kubernetes as a quantity. For example, the quantity 10500m would be written as 10.5 in decimal notation. The metrics APIs will return whole numbers without a suffix when possible, and will generally return quantities in milli-units otherwise. This means you might see your metric value fluctuate between 1 and 1500m, or 1 and 1.5 when written in decimal notation.
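The milli-unit convention is easy to mimic; a hypothetical helper (not part of any Kubernetes tooling) that converts a milli-suffixed quantity to its decimal form:

```sh
# Hypothetical converter: "10500m" -> 10.5, plain whole numbers pass through.
quantity_to_decimal() {
  case "$1" in
    *m) awk "BEGIN { print ${1%m} / 1000 }" ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

quantity_to_decimal 10500m   # prints 10.5
quantity_to_decimal 1500m    # prints 1.5
quantity_to_decimal 2        # prints 2
```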
## 💡 Possible APIs

We will need an API to create the following:

- `HorizontalPodAutoscaler` resource - the HPA object
- `Metric` enum - the metric to scale on
- `ScalingPolicy` construct - the scaling policy object (used in autoscaling/v2's `behavior` field)
- Possibly add a `maintenanceMode` option to `Pod`/`Container` resources (to prevent scaling on them). This would be useful for pods that are used for maintenance tasks (e.g. database migrations). See Implicit maintenance-mode deactivation docs
## ⬆️ Migrating to HPA

Migrating Deployments and StatefulSets to horizontal autoscaling docs - When an HPA is enabled, it is recommended to remove `spec.replicas` of the Deployment and/or StatefulSet from their manifest(s). If this isn't done, any time a change to that object is applied, for example via `kubectl apply -f deployment.yaml`, Kubernetes will scale the current number of Pods to the value of the `spec.replicas` key. This may not be desired and could be troublesome when an HPA is active.
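One way to drop `spec.replicas` without triggering a scale on the next apply — a sketch that requires a running cluster, using `php-apache` purely as this repo's example name:

```sh
# Open the last-applied-configuration annotation in an editor; delete the
# spec.replicas line there, save, then also remove it from the manifest on disk.
kubectl apply edit-last-applied deployment/php-apache
```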
## ❓ Questions

- Should we be focused on v2 or v1 of the HPA API?