
Node Refiner Operator

Description

In long-running (production) clusters, we can often improve resource efficiency by rescheduling workloads away from under-utilized nodes.

Part of this job can be performed by the cluster autoscaler (CA) if we look at each node separately: the CA respects a configurable utilization threshold, and as long as a node's utilization metric stays above that threshold the node is kept; otherwise the CA attempts to remove the node from the cluster.

The above-mentioned solution might sound like it solves the problem of under-utilized nodes, but let's consider another scenario.

Pitfall

Let's assume the utilization threshold is set to 50%. This means that if either the RAM or the CPU usage of a node is above 50%, the node stays.

Node Utilization

In this figure we can see that even though the utilization target of over 50% is met, there are wasted resources that could be reclaimed without affecting the delivery of our workloads.
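
To make the pitfall concrete, here is a minimal Go sketch of the "node stays if either resource is above the threshold" rule. The type and function names are illustrative and not taken from the operator's code:

```go
package main

import "fmt"

// nodeUsage holds the fraction of a node's allocatable resources in use.
// Illustrative names; the operator's internal types may differ.
type nodeUsage struct {
	CPU    float64 // fraction of allocatable CPU in use
	Memory float64 // fraction of allocatable memory in use
}

// utilization mirrors the "max of CPU and memory" rule described above:
// the node is kept if either resource is above the threshold.
func utilization(u nodeUsage) float64 {
	if u.CPU > u.Memory {
		return u.CPU
	}
	return u.Memory
}

func main() {
	n := nodeUsage{CPU: 0.60, Memory: 0.15}
	threshold := 0.50

	fmt.Printf("utilization=%.2f keep=%v\n", utilization(n), utilization(n) > threshold)
	// Output: utilization=0.60 keep=true
	// The node is kept although 40% of its CPU and 85% of its memory sit idle.
}
```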

Proposed Solution

We introduce the notion of excess nodes. By calculating the excess resources of each node in the cluster and combining them, we get a holistic view of how many resources can potentially be saved.

Excess Utilization
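
A minimal sketch of this aggregation, under the simplifying assumption that spare capacity can be summed along a single resource dimension (the function and names are illustrative, not the operator's code):

```go
package refiner

// excessNodes estimates how many whole nodes' worth of spare capacity the
// cluster currently has, by summing the spare allocatable resources across
// all nodes and dividing by the average node capacity. This is an
// illustrative sketch, not the operator's actual implementation: real code
// tracks CPU and memory separately and respects scheduling constraints.
func excessNodes(allocatable, requested []float64) int {
	if len(allocatable) == 0 || len(allocatable) != len(requested) {
		return 0
	}
	var spare, total float64
	for i := range allocatable {
		spare += allocatable[i] - requested[i]
		total += allocatable[i]
	}
	avgNodeCapacity := total / float64(len(allocatable))
	return int(spare / avgNodeCapacity) // whole nodes' worth of spare capacity
}
```

For example, four nodes with 16 allocatable units each and 10, 6, 4, and 4 units requested leave 40 spare units, i.e. two whole nodes' worth of spare capacity.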

Node Refiner Process

  1. Node Refiner (NR) determines the node with the largest potential to be terminated (the one with the lowest utilization metrics) and elects it as a candidate node to drain.
  2. NR cordons the node so that no new pods are scheduled on it while the operator evicts the pods already running there.
  3. The pods are gracefully evicted in parallel. If any pod has conditions that do not allow eviction, the draining process halts and the node is uncordoned.
  4. If all the pods are successfully evicted, NR leaves the node cordoned; the cluster autoscaler should then detect that the node is under-utilized and delete it.
  5. The process is repeated as long as the number of excess nodes stays above a configurable threshold (a minimal sketch of the cordon-and-evict step follows this list).
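
Below is a minimal sketch of the cordon-and-evict step using client-go. It assumes a recent client-go with the policy/v1 eviction helper, evicts pods sequentially rather than in parallel, and omits details such as DaemonSet handling, PodDisruptionBudget retries, and the uncordon-on-failure path described above; it is not the operator's actual code.

```go
package refiner

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cordonAndDrain is an illustrative sketch of steps 2-4 above.
func cordonAndDrain(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	// Cordon: mark the node unschedulable so no new pods land on it.
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Unschedulable = true
	if _, err := cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// List the pods currently bound to the node.
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}

	// Evict each pod through the eviction API so that PodDisruptionBudgets
	// and graceful termination are respected. (NR evicts in parallel;
	// sequential here for brevity.)
	for i := range pods.Items {
		pod := &pods.Items[i]
		eviction := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
		}
		if err := cs.CoreV1().Pods(pod.Namespace).EvictV1(ctx, eviction); err != nil {
			// This is where NR would halt draining and uncordon the node.
			return fmt.Errorf("evicting %s/%s: %w", pod.Namespace, pod.Name, err)
		}
	}
	// On success the node stays cordoned; the cluster autoscaler removes it.
	return nil
}
```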

NR Conditions

Each cluster might need its own set of conditions to run NR efficiently. The following settings let the behavior of NR be tuned from relaxed (run workloads on a larger number of nodes) to aggressive (run workloads on the smallest possible number of nodes), depending on the cluster's needs; an illustrative mapping of these settings to a configuration struct follows the table.

| Configuration | Config Map Variable | Description | Default Value |
| --- | --- | --- | --- |
| CalculationLoopFrequency | Not configured | Time between recalculations of the cluster utilization metrics | 1m |
| DefaultMinimumTimeSinceLastAddition | time_since_last_addition | Grace period after a node is added to the cluster, ensuring that no draining takes place before the cluster stabilizes its resources | 60m |
| DefaultTimeGap | time_gap | Default time between node drains, or, if a node fails to drain, the time before another retry takes place | 10m |
| DefaultMinimumNodes | minimum_nodes | The minimum number of nodes that should be in the cluster | 2 |
| DefaultMinimumNonTaintedNodes | minimum_non_tainted_nodes | The minimum number of non-tainted nodes that should be in the cluster | 2 |
| DefaultExcessNodes | excess_nodes_threshold | If the number of excess nodes in the cluster exceeds this number, a scale-down takes place | 2 |
| DrainerEnabled | drainer_enabled | Flag enabling the drainer to take any action; set to False for "dry run" mode | True |
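
As an illustration, the settings above map naturally onto a configuration struct like the following. The struct and field names are assumptions made for this sketch; only the config map keys and default values come from the table:

```go
package refiner

import "time"

// Config mirrors the settings in the table above. The struct and field names
// are illustrative; only the config map keys and defaults are taken from the
// table.
type Config struct {
	CalculationLoopFrequency time.Duration // recalculation interval (not configurable)
	TimeSinceLastAddition    time.Duration // config map key: time_since_last_addition
	TimeGap                  time.Duration // config map key: time_gap
	MinimumNodes             int           // config map key: minimum_nodes
	MinimumNonTaintedNodes   int           // config map key: minimum_non_tainted_nodes
	ExcessNodesThreshold     int           // config map key: excess_nodes_threshold
	DrainerEnabled           bool          // config map key: drainer_enabled ("dry run" when false)
}

// defaultConfig returns the default values listed in the table.
func defaultConfig() Config {
	return Config{
		CalculationLoopFrequency: 1 * time.Minute,
		TimeSinceLastAddition:    60 * time.Minute,
		TimeGap:                  10 * time.Minute,
		MinimumNodes:             2,
		MinimumNonTaintedNodes:   2,
		ExcessNodesThreshold:     2,
		DrainerEnabled:           true,
	}
}
```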

Summary

Node Refiner (NR) collects information about the cluster by aggregating metrics from all nodes and pods to build an overview of the cluster's utilization. By analyzing this information, it can make an informed decision on whether some of the existing nodes should be removed.

It also allows us to pick the node with the lowest utilization metrics, ensuring that we remove the node that causes the smallest disturbance to the availability of our cluster. NR calculates the metrics and, when a removal is about to occur, drains the node gracefully; the deletion of the node itself is outsourced to the Cluster Autoscaler. We made that decision for two reasons:

  1. To avoid duplicating code, since the cluster autoscaler already provides this functionality.
  2. The CA notifies many subscribers of a node-removal event (e.g. Gardener), which keeps us from falling into a limbo of removing and re-adding nodes; we therefore found it most stable to let the CA handle that part of the process.

Process Summary

  1. Gather information about the existing pods in the cluster and analyze their requests and usage.
  2. Analyze the cluster capacity and whether it can satisfy the pods' requirements with fewer nodes.
  3. Analyze the individual nodes and check whether any of them can be evicted.
  4. Drain the under-utilized node gracefully (a sketch of this overall loop follows the list).
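
Putting the pieces together, the control loop might look roughly like the sketch below. The clusterState type and helper functions are hypothetical placeholders for the steps above, and the Config struct is reused from the configuration sketch earlier:

```go
package refiner

import (
	"context"
	"time"
)

// clusterState and the helpers below are hypothetical placeholders for the
// steps described above; they are not the operator's real API.
type clusterState struct {
	nodeCount        int
	lastNodeAddition time.Time
	lastDrain        time.Time
}

func gatherClusterState(ctx context.Context) clusterState        { return clusterState{} } // step 1
func computeExcessNodes(s clusterState) int                       { return 0 }             // step 2
func drainLeastUtilizedNode(ctx context.Context, s clusterState) {}                        // steps 3-4

// runLoop sketches how the steps and the configuration gates fit together.
func runLoop(ctx context.Context, cfg Config) {
	ticker := time.NewTicker(cfg.CalculationLoopFrequency)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}

		state := gatherClusterState(ctx)

		// Stabilization and retry gates from the config map.
		if time.Since(state.lastNodeAddition) < cfg.TimeSinceLastAddition ||
			time.Since(state.lastDrain) < cfg.TimeGap {
			continue
		}

		// Only act when there is enough spare capacity and enough nodes left.
		if computeExcessNodes(state) <= cfg.ExcessNodesThreshold ||
			state.nodeCount <= cfg.MinimumNodes {
			continue
		}

		// Drain the least-utilized node; skipped entirely in "dry run" mode.
		if cfg.DrainerEnabled {
			drainLeastUtilizedNode(ctx, state)
		}
	}
}
```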

Documentation

Contributing

Please check the Development Conventions and Guidelines document.

Code of Conduct

Everyone participating in this joint project is welcome as long as our Code of Conduct is adhered to.

Support

Feel free to open new issues for feature requests, bugs or general feedback on the GitHub issues page of this project.

Licensing

Copyright (2022) SAP SE and node-refiner contributors. Please see our LICENSE for copyright and license information. Detailed information including third-party components and their licensing/copyright information is available via the REUSE tool.

node-refiner's Issues

Pod Affinity, Anti-affinity detection

Is your feature request related to a problem? Please describe.

The current way of handling the minimum number of nodes is to set a minimum value in the config map that the node refiner operator reads. Sometimes the minimum required number of nodes is not obvious, especially when pod affinity and anti-affinity specifications are in play. These specifications can have the side effect that the configured minimum number of nodes is lower than the number of nodes the cluster actually needs to satisfy the affinity specifications.

If the utilization is low, the node refiner will try to scale the cluster down while the cluster autoscaler scales it back up to satisfy the specifications, leading to a cycle of scaling down and up.

Describe the solution you'd like

  1. Automatically read the affinity and anti-affinity specifications and calculate the minimum number of nodes required to satisfy them.
  2. Use the higher of the automatically calculated value and the pre-set number in the config map as the effective minimum (a small sketch of this logic follows).
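
As an illustration of the proposed logic, here is a small Go sketch. The workload type is a hypothetical, simplified view of a Deployment or StatefulSet and is not part of the operator's code:

```go
package refiner

// workload is a hypothetical, simplified view of a Deployment/StatefulSet:
// its replica count and whether its pods declare required anti-affinity on
// the kubernetes.io/hostname topology key (i.e. replicas must spread across
// distinct nodes).
type workload struct {
	replicas             int
	hostnameAntiAffinity bool
}

// minimumNodes sketches the proposed logic: every workload whose replicas
// must land on distinct hosts imposes a lower bound on the node count, and
// the result never drops below the configured minimum from the config map.
func minimumNodes(configured int, workloads []workload) int {
	min := configured
	for _, w := range workloads {
		if w.hostnameAntiAffinity && w.replicas > min {
			min = w.replicas
		}
	}
	return min
}
```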

Describe alternatives you've considered

In practice, when trying the node refiner out, this issue never actually came up, but it came to my attention while working on the project; implementing this detection would nonetheless be an improvement.

It could also turn out that other causes of such scaling cycles surface after this is implemented.
