We’ve established an intention to minimize distractions so we can stay focused on delivering value.
Last week’s news about the economy, SVB’s collapse, and general weakness in the tech and banking sectors indicates that we’re in the right frame of mind. To maximize efficiency, we return to the fundamentals of our roles, resist unnecessary diversions, only take on projects with significant impact, and consistently strive for strong ROI on our time and money investments.
Last week’s post detailed how to use GKE to simplify and optimize workload-level resources; this post brings cluster-level optimizations into focus for GKE Standard users. Note that this post isn’t necessary for Autopilot users because Google manages cluster scaling, node security, and reliability on GKE Autopilot.
Please review the previous post in this series before attempting optimizations in this post.
https://z3sty.com/gke-tuning-and-optimizations-to-reduce-cost-post-1/
Here’s the high level strategy we’ve established to tune our clusters:
Cost Optimization Process Overview
Phase 1- Workload Level Efficiencies (previous blog)
- Review Metrics: requested vs actual CPU and memory utilization.
- Identify candidate workloads for optimization
- Explore service usage patterns
- Rightsize the workload
- Revisit pod level metrics to validate more efficient utilization
- Summary: GKE Autopilot
Phase 2- Cluster Level Efficiencies (this blog)
- Review node metrics for requested vs actual utilization
- Review if cluster scales up and down appropriately based on load
- Tune cluster autoscaler and/or node auto provisioner.
- Review node level metrics and observe improvements
- Efficient use of Committed Use Discounts (CUDs)
- Spot instances for appropriate workloads
Phase 3- Extended visibility
- Central collection of resource utilization and recommendations
- Dashboard for multiple project and cluster visibility
Cluster Level Efficiencies
Overview
In the last post, we completed a first pass at ‘right sizing’ applications, so we’re in great shape to discuss cluster-level tuning, which is where the savings come from in GKE Standard environments.
Review node metrics for requested vs actual utilization
The most useful screen for cluster-level utilization is under the “Kubernetes Engine / Clusters” menu. At the top, click the “Cost Optimization” tab. At the bottom of this page is a list of clusters with visualizations representing utilization: dark green is actual usage, lighter green represents requested resources, and gray represents total capacity. In the image below, one can see that actual usage is low across clusters. Also note that only the ‘k8s’ cluster shows gray, because the other two clusters are Autopilot clusters.
The ‘gray’ bar in the image above represents unused resources that we can optimize.
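If you prefer the command line, a rough version of the same comparison is available with kubectl. This is a minimal sketch, assuming kubectl is pointed at the cluster and node metrics are available:
# Actual CPU and memory usage per node
kubectl top nodes
# Requested resources per node, to compare against allocatable capacity
kubectl describe nodes | grep -A 8 "Allocated resources"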
Verify cluster autoscaling
The cluster autoscaler handles provisioning and removal of GKE nodes within a node pool, so we should check whether it’s enabled. Under Kubernetes Engine / Clusters / Nodes, check whether autoscaling is enabled for each node pool: click one of the node pools and check whether Autoscaling is on or off. In our case, shown below, it’s off. It should be enabled so that the cluster, and its cost, scale with demand.
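This can also be checked from the command line. A quick sketch, where the cluster, node pool, and region values are placeholders:
gcloud container node-pools describe $NODE_POOL_NAME \
--cluster=$CLUSTER_NAME \
--region=$COMPUTE_REGION \
--format="value(autoscaling.enabled)"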
Enable the Cluster AutoScaler
The cluster autoscaler works on a per-node-pool basis. It adds nodes when a pod is unschedulable due to a lack of resources. Scale-down events are triggered when a node has more than 50% unused capacity. Pods can be consolidated onto fewer nodes to ‘bin pack’ more efficiently, freeing up nodes so that excess capacity can be deprovisioned.
Edit the node pool to enable autoscaling. For more details on enabling autoscaling on node pools, refer to:
https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler#gcloud
A sample gcloud command to enable autoscaling on an existing node pool looks like this:
gcloud container clusters update $CLUSTER_NAME \
--enable-autoscaling \
--node-pool=$NODE_POOL_NAME \
--min-nodes=$MIN_NODES \
--max-nodes=$MAX_NODES \
--region=$COMPUTE_REGION
Min and max nodes should be specified to prevent running up a large bill due to unexpected bugs or behavior.
See this guide for additional details.
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#how_cluster_autoscaler_works
By default, the cluster autoscaler checks every 10 seconds whether the cluster needs to scale up or down.
Is the Cluster AutoScaler Working?
While the cluster autoscaler’s operation is pretty straightforward, there are quite a few Kubernetes features that can keep it from working the way we think it should. Cluster metrics provide insight into cluster scaling events.
Scaling events can be queried in Cloud Logging using the guide below.
https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler-visibility
More often than not, logs and metric events tell us exactly why the cluster’s unable to scale up or down.
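As a starting point, here’s a sketch of querying those events with gcloud; the log name comes from the visibility guide above, and the project, cluster, and time window values are placeholders:
# Recent cluster autoscaler decision events (scale-ups, scale-downs, noScaleUp/noScaleDown reasons)
gcloud logging read "
  resource.type=\"k8s_cluster\"
  resource.labels.cluster_name=\"$CLUSTER_NAME\"
  logName=\"projects/$PROJECT_ID/logs/container.googleapis.com%2Fcluster-autoscaler-visibility\"
" --limit=20 --freshness=1d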
Scaling warnings are displayed as banners in the cluster view. Open the GKE Clusters page, click a cluster, and note whether there are any banners. Optionally click the Cost Optimization tab and note whether there are any banners there as well. A sample warning is shown below with a clickable link to Cloud Logging, where the logs can be explored in detail.
Features That Can Interfere With Cluster Autoscaler
Below are common features that can prevent the cluster autoscaler from scaling properly.
- Affinity and Anti-Affinity rules– Affinity and anti-affinity rules are useful for setting hard and soft rules for how pods should be scheduled. They can ensure that a pod that works closely with another pod is scheduled on the same host to reduce latency. Anti-affinity can also make sure that two pods aren’t scheduled on the same host, such as when you want them spread out for failover reasons. Keep in mind that these definitions can prevent pods from being scheduled, in which case error banners should show up in the console and log files.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity
- Local Storage (PVC)– Pods can be configured to use their host machine for storage, but this prevents the pod from being moved because of the data dependency, and the cluster autoscaler doesn’t support evicting such pods. Unless a specific application needs local storage, it’s generally preferable to avoid storing data on the host machine. If you do need local storage, know that the cluster autoscaler won’t be able to scale down those nodes.
- Annotations such as “cluster-autoscaler.kubernetes.io/safe-to-evict”: “false”– This annotation tells the cluster autoscaler that it’s not safe to evict the pod, which prevents a scale-down event and the opportunity to save money.
- PodDisruptionBudgets (PDBs)
- kube-system pods- A common issue is that nodes running kube-system pods can’t scale down unless PDBs are defined for them. Creating PDBs that specify and loosen minimum availability permits nodes to scale down safely.
- Strict PodDisruptionBudgets can prevent pods from being relocated– As noted above, PDBs are recommended for all applications since they inform the control plane how to safely roll out upgrades, handle scaling events, and maintain application uptime. When they’re too restrictive, especially in smaller clusters, PDBs can prevent cluster scaling. A sample PDB is sketched after this list.
https://kubernetes.io/docs/tasks/run-application/configure-pdb/
- Taints and Tolerations- Useful for ensuring that some hosts are available and reserved for special workloads. It’s common to use these to reserve special resources like GPUs or high-speed disk, or to enforce multi-tenant isolation. They can also unintentionally prevent the cluster autoscaler from bin packing efficiently.
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
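As mentioned above, here’s a minimal sketch of a PDB that keeps an application available while still letting the autoscaler drain nodes. The name and the app label are hypothetical placeholders, and minAvailable should reflect what your service actually requires:
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb              # hypothetical name
  namespace: default
spec:
  minAvailable: 1            # keep at least one replica up during voluntary disruptions
  selector:
    matchLabels:
      app: web               # hypothetical label; match your Deployment's pod labels
EOF
Setting minAvailable below the replica count is what gives the autoscaler room to evict and reschedule pods during a scale-down.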
Autoscaler Profiles
The default cluster autoscaler profile uses ‘Balanced’ mode and can be verified under “Kubernetes Engine / Clusters” as highlighted below.
The ‘balanced’ profile should work best for most users, as it strikes a balance between saving money and keeping your cluster operationally stable. If you run applications that tolerate being relocated and have fast start-up times, consider testing the ‘optimize-utilization’ profile, which scales the cluster down more aggressively.
This link has details on the two profiles for cluster autoscaling.
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#autoscaling_profiles
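If you decide to try the more aggressive profile, switching is a one-line change. A sketch, with the cluster name and region as placeholders:
gcloud container clusters update $CLUSTER_NAME \
--autoscaling-profile=optimize-utilization \
--region=$COMPUTE_REGION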
Node Auto Provisioning
Node Auto Provisioning is a feature that’s unique to GKE, and one that Google itself leverages on Autopilot clusters. When new workloads are unschedulable due to insufficient cluster capacity, node auto-provisioning creates node pools using the GCE instance type that best fits the workloads. This feature greatly helps with ‘bin packing’ and efficient use of cluster capacity.
Follow the process here to enable the Node Auto Provisioner.
https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#operation
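A sample command to enable it on an existing Standard cluster might look like the following; the CPU and memory ceilings here are placeholder limits that cap how much capacity auto-provisioning is allowed to create, so size them to your budget:
gcloud container clusters update $CLUSTER_NAME \
--enable-autoprovisioning \
--min-cpu=1 --max-cpu=64 \
--min-memory=1 --max-memory=256 \
--region=$COMPUTE_REGION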
Review node level metrics and observe improvements
Try to keep changes small and observe behavior so that the effect of each change is understood. Leverage metric data and dashboards to validate that the expected behavior is observed. Return often to the cluster-level Cost Optimization page to ensure that it’s trending in the right direction. 💰💰💰
Additional cost saving options
If you’ve made it this far, you should have pretty efficient GKE cluster(s). In this section, we’ll cover a few additional levers that may stretch our GKE dollar even further.
Committed Use Discounts (CUDs)
Autopilot
Autopilot simplifies the CUD process because you can purchase 1-year and 3-year commitments for the vCPU, GPU, and memory requests that are configured via Kubernetes manifests such as Deployments, StatefulSets, and Pods. With Autopilot, there’s no need to commit to a specific node type.
With workload tuning complete, metrics now help estimate the amount of vCPU, GPU, and memory that we can commit to, entitling us to an extra 20% discount for a 1-year commitment or 45% for a 3-year commitment.
Standard Clusters
With GKE Standard mode, we choose our instance types and region for each node pool, and billing is based on the GCE instance type. CUDs for Standard clusters are slightly more complex because we must commit to an instance type and a region.
Flex CUDs, introduced toward the end of 2022, add more flexibility: the commitment is no longer tied to a single region or instance type. Talk with your account team about moving toward Autopilot or GKE Flex CUDs for additional flexibility.
https://cloud.google.com/compute/docs/instances/committed-use-discounts-overview#spend_based
Additional details for each pricing model can be found at these links:
Autopilot-
https://cloud.google.com/kubernetes-engine/pricing
GKE Standard-
https://cloud.google.com/kubernetes-engine/pricing
https://cloud.google.com/products/calculator
Spot Instances
Spot instances are compute machines that are priced lower than standard GCE instances; the tradeoff is no guarantee of availability. The typical discount for Spot instances is at least 60% below standard instance pricing and sometimes as high as 91%, and pricing changes can occur at most once per month. Spot instances are great to consider when workloads are stateless, batch jobs, fault tolerant, and able to handle disruption. Everything related to GKE workload and cluster autoscaling works the same as with standard instance types. Node selectors or affinity rules, along with taints and tolerations, should be configured to keep stateful workloads from being scheduled on Spot instances; a sample Spot node pool is sketched below.
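Here’s a minimal sketch of adding a Spot node pool to an existing Standard cluster. The pool name is a placeholder, and GKE labels Spot nodes with cloud.google.com/gke-spot=true so that tolerant workloads can opt in with a nodeSelector:
gcloud container node-pools create spot-pool \
--cluster=$CLUSTER_NAME \
--region=$COMPUTE_REGION \
--spot \
--enable-autoscaling --min-nodes=0 --max-nodes=$MAX_NODES
# Tolerant workloads can then target these nodes with:
#   nodeSelector:
#     cloud.google.com/gke-spot: "true"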
Summary Phase 2: Cluster Level Efficiencies
Congratulations on making it through our cost optimization exercise. I’ve seen GKE customers with more than 50% savings by going through this exercise. If you’d like a hand to review your environments, as always, reach out to your account team or to me.
Keep an eye out for the final post which will focus on expanding our visibility across clusters and projects.