AI firm Anthropic slashed its AWS bill 40% by using Karpenter

"If spot instances weren’t available, we had no way to tell Cluster Autoscaler to fall back to on-demand Instances, and we ended up stuck in a few loops…" others note.


AI company Anthropic slashed its AWS bill by 40% using Karpenter, an open source tool that AWS itself gifted to the community.

That’s according to the company’s lead engineer, Nova DasSarma.

Speaking at AWS’s annual re:Invent conference, they explained that Anthropic works primarily on Amazon Elastic Kubernetes Service (Amazon EKS) – “from training to data to inference, all of these run on top of Kubernetes.”

Anthropic uses a range of AWS instances for its AI models – and adopting Karpenter helped it cut its AWS bill by 40%, they said.

“Year-over-year we had a 40% cost reduction over using on-demand instances by being able to utilize flexible instance types in Karpenter, as well as spot instances more generally; no Cluster Autoscaler tweak required,” DasSarma said. “A Cluster Autoscaler can often get stuck in various ways and Karpenter just doesn't do that. It stays snappy.”

What's Karpenter?

AWS open-sourced Karpenter in 2021 under a permissive Apache 2.0 licence, despite what must be some subsequent hurt to its margins. In short, it's like Cluster Autoscaler, but better: it lets users optimise where their Kubernetes clusters run for improved latency and cost performance. It does this by:

- Watching for pods (the smallest deployable units of K8s compute) that the Kubernetes scheduler has marked as unschedulable;
- Evaluating the scheduling constraints requested by those pods;
- Provisioning nodes that meet the pods' requirements;
- Removing those nodes when they are no longer needed. (A configuration sketch follows below.)
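Karpenter is configured declaratively. As a rough sketch of what drives that provision-and-remove cycle – assuming Karpenter's current v1 NodePool API (older releases used a Provisioner resource with different field names) and illustrative resource names – a minimal NodePool might look like this:

```yaml
# Minimal illustrative NodePool (Karpenter v1 API; field names differ on
# older versions). Karpenter launches nodes matching these requirements
# for unschedulable pods, and the disruption block lets it consolidate
# and remove under-used nodes again.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                # illustrative name
spec:
  template:
    spec:
      nodeClassRef:            # AWS launch settings live in an EC2NodeClass
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # illustrative name
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "1000"                # cap total CPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```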

Grafana: Cluster Autoscaler got us "stuck in some loops"

Among the other companies that have spoken highly of it is Grafana.

The monitoring software firm’s Paula Julve and Logan Ballard blogged on November 9, 2023 that “When we first landed on AWS in 2022 and began using Amazon Elastic Kubernetes Service (Amazon EKS), we went with Cluster Autoscaler (CA) as our autoscaling tool of choice. It’s open; it’s simple; it’s been battle tested by countless other people before us.”

“However… we quickly began running into a number of obstacles that limited the efficiency and flexibility of our EKS clusters,” they added – not least CA’s inherent limitations. CA lets users “define a diverse range of instance types for your node groups (pools of compute). However, if you list multiple types for your group, CA will only run calculations for one of them in order to determine how many nodes it needs to scale up.

See also: The future of Kubernetes at AWS: Slack, Anthropic lead a "Karpenter" love-in

“It will then request that many instances to AWS, but you have no control over which instance types you’re actually getting. You may end up with capacity that does not match your actual needs, leading to more readjusting,” the two wrote – among the worst pain points being that if you request cheaper AWS spot instances, “it didn’t check for availability.

“If Spot Instances weren’t available, we had no way to tell CA to fall back to On-Demand Instances, and we ended up stuck in a few loops…” 
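That spot-to-on-demand fallback is exactly what Karpenter's requirements model expresses: list both capacity types in a NodePool and Karpenter will prefer the cheaper spot capacity, launching on-demand nodes when spot is unavailable. A sketch of the relevant fragment (same v1 field names as above):

```yaml
# Sketch: allow both capacity types in one NodePool, so spot shortages
# fall back to on-demand rather than looping.
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
```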

Anthropic and AWS: Lots of instances...

Explaining Anthropic’s AWS and Kubernetes use, DasSarma said: “We use Karpenter to provision spot EC2 instances for both Dask and Spark workloads. All of those are on EKS; [we take] data from those raw data buckets, tokenize that, bring it back into tokenization buckets in S3 and then back from there to accelerated EC2 instances like P4ds, the new P5s and Trainium instances as well. Those are all inside of EKS...
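One common way to point batch workloads like these at spot capacity is a nodeSelector on Karpenter's well-known capacity-type label, leaving Karpenter to launch matching spot instances. A hypothetical Spark executor pod, purely as a sketch (the image tag and resource numbers are invented, not Anthropic's):

```yaml
# Hypothetical pod: the nodeSelector requests spot capacity, so Karpenter
# provisions a spot EC2 instance large enough for the resource requests.
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor-example   # illustrative name
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot
  containers:
    - name: executor
      image: apache/spark:3.5.0  # illustrative tag
      resources:
        requests:
          cpu: "4"
          memory: 16Gi
```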

"We then utilize S3 to store our model checkpoints. We actually don't use PVs [persistent volumes] for those because we find that we can get enough performance directly out of S3, which is pretty exciting,” DasSarma said.

For data processing, as suggested, Anthropic uses multiple instance classes: “Historically, we would use Cluster Autoscaler and we would put together an R5 instance group or something like that for our specific Spark jobs – and we found that oftentimes, we would run out of capacity in a particular way, or we would want to scale something with a different ratio of CPU to memory. Karpenter means that we can express those constraints in the way that we think about them and not have to think about the individual instance types,” Anthropic’s DasSarma said.
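Expressing “a different ratio of CPU to memory” rather than a list of instance types might look like the following NodePool requirements fragment, using Karpenter's well-known AWS labels (the thresholds are invented for illustration; instance-memory is measured in MiB):

```yaml
# Sketch: ask for memory-heavy nodes by shape instead of naming R5/R6 etc.
requirements:
  - key: karpenter.k8s.aws/instance-memory
    operator: Gt
    values: ["65535"]            # more than 64 GiB of memory
  - key: karpenter.k8s.aws/instance-cpu
    operator: Lt
    values: ["33"]               # at most 32 vCPUs: high memory:CPU ratio
```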

“We're also able to use spot, on-demand and reserved instances together here and that helps optimize cost… year-over-year we had a 40% cost reduction over using on-demand instances by being able to utilize flexible instance types in Karpenter, as well as spot instances more generally; no Cluster Autoscaler tweak required. A Cluster Autoscaler can often get stuck in various ways and Karpenter just doesn't do that. It stays snappy.”
