Kubernetes clusters are typically using just 13% of CPU: CIOs could save a small fortune

“Overprovisioning CPU and memory will keep the lights on, but it is costly. Underprovisioning them risks CPU throttling and out-of-memory kills, which cause applications to perform poorly or even crash.”

Kubernetes clusters are typically using just 13% of the CPUs provisioned to power them and just 20% of memory on average, according to analysis of 4,000 clusters by CAST AI – suggesting rampant overprovisioning. 

Optimisation could save CIOs a small fortune, the company suggested in a report today – after analysing customers running on AWS, Azure, and GCP managed Kubernetes services between January and December of 2023. 

(With Gartner predicting that spending on public cloud services will hit $678 billion in 2024 and FinOps continuing to rise up agendas, avoiding overspend on cloud services is a priority for many CIOs and CTOs. Cost savings through Kubernetes optimisation can be significant: AI company Anthropic, for example, last year slashed its AWS bill by 40% using Karpenter.)

To CAST AI co-founder and CPO Laurent Gil, the findings suggested that companies are still “grappling with the complexity of manually managing cloud-native infrastructure” – with the company’s report noting that on Kubernetes, workloads are sized using requests and limits, which are set for CPU and memory: “Optimizing them is like walking a tightrope. 

“Overprovisioning CPU and memory will keep the lights on, but it is costly. 

“Underprovisioning them risks CPU throttling and out-of-memory kills, which cause applications to perform poorly or even crash. When teams do not fully understand what their container resource requirements are, they often play it safe and provision a lot more CPU and memory than needed.

“This is where automated workload rightsizing comes in,” he said. (CAST AI, which provides a Kubernetes cost optimisation platform, claims that open source alternatives add more configuration complexity to an already complex orchestration layer, and that its plug-and-play commercial platform is the simpler way to cut cloud spend.)
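
To make the idea of rightsizing concrete, here is a minimal Python sketch. It is not CAST AI's method, which the report summary does not describe. It assumes a simple rule of thumb (take a high percentile of observed CPU usage and add a safety margin), and the function name, parameters and sample figures are all hypothetical.

```python
def recommend_cpu_request(usage_millicores, percentile=95, headroom_percent=20):
    """Suggest a CPU request in millicores from observed usage samples.

    Illustrative assumption: request = nearest-rank percentile of usage
    plus a fixed percentage of headroom, rounded up.
    """
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile, computed with integer ceiling division.
    rank = max(1, -(-percentile * len(ordered) // 100))
    baseline = ordered[rank - 1]
    # Add headroom and round up to a whole millicore.
    return -(-baseline * (100 + headroom_percent) // 100)


# Hypothetical samples for a container that currently requests 1000m (one CPU).
samples = [100, 120, 130, 110, 150, 140, 125, 135, 115, 400]
current_request = 1000
suggested = recommend_cpu_request(samples, percentile=90, headroom_percent=20)
print(f"current request: {current_request}m, suggested request: {suggested}m")
```

Choosing a lower percentile or less headroom trims the request further but raises the risk of the CPU throttling described above, which is the tightrope the report refers to.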

The report’s findings are based on CAST AI’s analysis of 4,000 clusters running on Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure (Azure) between January 1 and December 31, 2023, before they were optimised by the company’s automation platform. 

It further notes that for larger clusters containing 1,000 to 30,000 CPUs, organizations on average only utilize 17% of provisioned CPUs.
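
As a rough illustration of what a 17% utilisation figure implies, the arithmetic can be sketched in Python. The function name is hypothetical, the cluster sizes are simply the two ends of the range quoted above, and treating utilisation as a flat percentage of provisioned CPUs is a simplifying assumption.

```python
def idle_cpus(provisioned_cpus, utilisation_percent):
    """Return how many provisioned CPUs sit unused at a given utilisation."""
    used = provisioned_cpus * utilisation_percent / 100
    return provisioned_cpus - used


# The two ends of the "larger cluster" range, at the reported 17% average.
for size in (1_000, 30_000):
    print(f"{size} CPUs provisioned: about {idle_cpus(size, 17):.0f} idle")
```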

In short, the report concludes that the biggest drivers of waste are:

"Overprovisioning: Allocating more computing resources than necessary to an application or system.
"Unwarranted headroom: Requests for the number of CPUs are set too high.
"Low Spot instance usage: Many companies are reluctant to use Spot instances due to concerns over perceived instability.
"Low usage of “custom instance size” on GKE – It is difficult to choose the best CPU and memory ratio, unless the selection of custom instances is dynamic and automated."

Its report is here