Back to Blog Hub

Karpenter vs Cluster Autoscaler — How We Run Both in Production to Maximise EKS Cost Efficiency

May 18, 2026 28 min read Compute FinOps

AWS Series | Part 13 — Building secure, cost-optimised, cloud-native infrastructure on AWS.

Karpenter vs Cluster Autoscaler EKS Architecture

TL;DR Comparison

Feature Cluster Autoscaler Karpenter
What it scales Node Groups (pre-defined) Individual nodes (directly via EC2 API)
Instance selection Fixed — whatever the node group uses Dynamic — picks best available instance
Spot support Via separate Spot node groups Native — diversifies across families
Scale-to-zero ❌ Min 1 node per group ✅ Full scale-to-zero
Provisioning speed 3–5 minutes 30–60 seconds
Bin packing ❌ Node group level only ✅ Per-pod, right-sized nodes
Graviton support Manual node group per arch ✅ Automatic amd64 + arm64
Consolidation ❌ Threshold-based only ✅ Active bin-packing
Configuration Annotations + ASG config NodePool + EC2NodeClass CRDs
Fallback to On-Demand ✅ Via dedicated node group ✅ Via NodePool capacity-type
Best for Stable On-Demand baseline Dynamic Spot + cost optimisation

Introduction — The Honest Architecture

Most blog posts about Karpenter frame it as a Cluster Autoscaler replacement. That is not always true — and it was not true for us.

At Rabobank, the fraud detection platform runs both. Cluster Autoscaler manages a stable On-Demand Managed Node Group that acts as the guaranteed baseline. Karpenter manages two Spot NodePools that handle the majority of the workload. When Spot capacity is unavailable in our AZs — which happens — Karpenter falls back to On-Demand provisioning within its own NodePool. The Cluster Autoscaler On-Demand group is the last resort for system-critical workloads that must never land on Spot.

The result: a meaningful reduction in EC2 spend — driven by Spot discounts, scale-to-zero overnight on Karpenter NodePools, and continuous bin-packing consolidation — with full resilience when Spot is unavailable.

This post documents the complete architecture — how both tools work, why they complement rather than replace each other, the exact Helm and YAML configuration, and the lessons from running this in a regulated production environment.

Why this matters in production: In a regulated financial environment, cost and compliance are not opposites. The three-tier model gives you Spot economics on 80% of workloads while keeping the compliance-critical path on guaranteed On-Demand capacity — documented, auditable, and isolated from Karpenter's disruption policies.

1. The Architecture — Three Compute Tiers

Before diving into either tool, understand the three-tier model this architecture implements:

Tier 1: System MNG (Managed Node Group — On-Demand, fixed)
  Managed by: Nothing — fixed size, always-on
  Hosts: DaemonSets, CloudWatch agent, Karpenter itself, Cluster Autoscaler itself
  Why: System components cannot tolerate Spot interruption or Karpenter eviction

Tier 2: On-Demand MNG (Managed Node Group — Cluster Autoscaler)
  Managed by: Cluster Autoscaler
  Hosts: Latency-sensitive fraud scoring pods + workloads tainted for On-Demand
  Why: Guaranteed capacity, no interruption risk, predictable baseline

Tier 3: Spot NodePools (Karpenter — two pools)
  Managed by: Karpenter
  Hosts: 80% of workloads — batch processors, enrichment services, general APIs
  Why: 60–70% cost saving vs On-Demand, Karpenter handles diversification + fallback
Traffic → EKS Cluster
           ├── [System MNG]     kube-system, karpenter, cluster-autoscaler
           ├── [On-Demand MNG]  fraud-scoring, payments-critical (Cluster Autoscaler)
           └── [Spot NodePools] everything else (Karpenter)
                   ↓
           Spot unavailable?
                   ↓
           Karpenter falls back to On-Demand within NodePool constraints
Why keep Cluster Autoscaler at all? Karpenter's On-Demand fallback is fast and reliable. But in a regulated environment, having a dedicated Cluster Autoscaler-managed On-Demand group for the most critical workloads provides an additional isolation layer — those pods are explicitly tainted to only land on On-Demand MNG nodes, and no Karpenter disruption policy can touch them.

2. Cluster Autoscaler — The On-Demand Baseline

Our On-Demand Node Group Configuration

# System Node Group — always-on, fixed size, Karpenter and CA run here
resource "aws_eks_node_group" "system" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "system"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  instance_types = ["m5.large"]   # 2 vCPU, 8 GB — sized for DaemonSets only

  scaling_config {
    desired_size = 2   # One per AZ — never changes
    min_size     = 2
    max_size     = 4   # Small burst for cluster events
  }

  taint {
    key    = "node-role"
    value  = "system"
    effect = "NO_SCHEDULE"
  }

  labels = { "node-role" = "system" }

  update_config { max_unavailable = 1 }

  tags = {
    Name                                              = "system-node-group"
    "k8s.io/cluster-autoscaler/enabled"             = "true"
    "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
  }
}

# On-Demand Application Node Group — managed by Cluster Autoscaler
resource "aws_eks_node_group" "on_demand_app" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "on-demand-app"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  instance_types = ["m5.xlarge", "m5a.xlarge", "m5n.xlarge"]

  scaling_config {
    desired_size = 2   # Two On-Demand nodes always warm for critical path
    min_size     = 2   # Never scale below 2 — one per AZ for HA
    max_size     = 10
  }

  taint {
    key    = "workload-type"
    value  = "on-demand-critical"
    effect = "NO_SCHEDULE"
  }

  labels = {
    "node-role"     = "on-demand-app"
    "workload-type" = "on-demand-critical"
  }

  update_config { max_unavailable = 1 }

  tags = {
    Name                                              = "on-demand-app-node-group"
    "k8s.io/cluster-autoscaler/enabled"             = "true"
    "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
    "karpenter.sh/managed"                          = "false"
  }
}
Why this matters in production: Setting min_size = 2 on the On-Demand MNG is non-negotiable in a regulated environment. It ensures one node per AZ survives any single-AZ failure — Cluster Autoscaler will never scale below this floor, giving you a guaranteed HA baseline for the critical path regardless of what Karpenter is doing on the Spot pools.

Cluster Autoscaler Deployment via Helm

resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  version    = "9.37.0"
  namespace  = "kube-system"

  values = [yamlencode({
    autoDiscovery = {
      clusterName = aws_eks_cluster.main.name
    }

    awsRegion = var.region

    rbac = {
      serviceAccount = {
        annotations = {
          "eks.amazonaws.com/role-arn" = aws_iam_role.cluster_autoscaler.arn
        }
      }
    }

    extraArgs = {
      "scale-down-delay-after-add"       = "10m"
      "scale-down-unneeded-time"         = "15m"
      "scale-down-utilization-threshold" = "0.5"
      "skip-nodes-with-system-pods"      = "true"
      "expander"                         = "least-waste"
      "balance-similar-node-groups"      = "true"
      "scan-interval"                    = "30s"

      # CRITICAL: Tell CA to ignore nodes managed by Karpenter
      "node-group-auto-discovery" = "asg:tag=k8s.io/cluster-autoscaler/enabled=true,k8s.io/cluster-autoscaler/${var.cluster_name}=owned"
    }

    tolerations = [{
      key      = "node-role"
      value    = "system"
      operator = "Equal"
      effect   = "NoSchedule"
    }]

    affinity = {
      nodeAffinity = {
        requiredDuringSchedulingIgnoredDuringExecution = {
          nodeSelectorTerms = [{
            matchExpressions = [{
              key      = "node-role"
              operator = "In"
              values   = ["system"]
            }]
          }]
        }
      }
    }
  })]
}

Cluster Autoscaler IAM Policy

resource "aws_iam_policy" "cluster_autoscaler" {
  name = "cluster-autoscaler-policy"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeScalingActivities",
          "autoscaling:DescribeTags",
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup",
          "ec2:DescribeImages",
          "ec2:DescribeInstanceTypes",
          "ec2:DescribeLaunchTemplateVersions",
          "ec2:GetInstanceTypesFromInstanceRequirements",
          "eks:DescribeNodegroup"
        ]
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_role" "cluster_autoscaler" {
  name = "cluster-autoscaler"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.eks.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${aws_iam_openid_connect_provider.eks.url}:sub" = "system:serviceaccount:kube-system:cluster-autoscaler"
          "${aws_iam_openid_connect_provider.eks.url}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "cluster_autoscaler" {
  role       = aws_iam_role.cluster_autoscaler.name
  policy_arn = aws_iam_policy.cluster_autoscaler.arn
}

3. Karpenter — Spot NodePools with On-Demand Fallback

Karpenter Installation via Helm

# SQS queue for Spot interruption handling
resource "aws_sqs_queue" "karpenter_interruption" {
  name                      = "karpenter-interruption-${var.cluster_name}"
  message_retention_seconds = 300
  tags                      = { Name = "karpenter-interruption" }
}

# EventBridge rules — feed Spot interruption events to SQS
resource "aws_cloudwatch_event_rule" "karpenter_spot_interruption" {
  for_each = {
    spot_interruption  = "EC2 Spot Instance Interruption Warning"
    instance_rebalance = "EC2 Instance Rebalance Recommendation"
    instance_state     = "EC2 Instance State-change Notification"
  }

  name          = "karpenter-${each.key}"
  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = [each.value]
  })
}

resource "aws_cloudwatch_event_target" "karpenter_interruption" {
  for_each = aws_cloudwatch_event_rule.karpenter_spot_interruption

  rule = each.value.name
  arn  = aws_sqs_queue.karpenter_interruption.arn
}

# Pod Identity association — Karpenter controller auth
resource "aws_eks_pod_identity_association" "karpenter" {
  cluster_name    = aws_eks_cluster.main.name
  namespace       = "karpenter"
  service_account = "karpenter"
  role_arn        = aws_iam_role.karpenter_controller.arn
}

# Karpenter Helm release
resource "helm_release" "karpenter" {
  name                = "karpenter"
  repository          = "oci://public.ecr.aws/karpenter"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
  chart               = "karpenter"
  version             = "1.0.6"
  namespace           = "karpenter"
  create_namespace    = true

  values = [yamlencode({
    settings = {
      clusterName       = aws_eks_cluster.main.name
      clusterEndpoint   = aws_eks_cluster.main.endpoint
      interruptionQueue = aws_sqs_queue.karpenter_interruption.name
    }

    serviceAccount = {
      name        = "karpenter"
      annotations = {}   # Pod Identity handles auth — no IRSA annotation needed
    }

    replicas = 2   # HA — two Karpenter controller pods

    podDisruptionBudget = { maxUnavailable = 1 }

    controller = {
      resources = {
        requests = { cpu = "100m", memory = "256Mi" }
        limits   = { cpu = "1000m", memory = "1Gi" }
      }
    }

    # Karpenter MUST run on the system node group
    # If it runs on a Karpenter-managed node, it can evict itself
    affinity = {
      nodeAffinity = {
        requiredDuringSchedulingIgnoredDuringExecution = {
          nodeSelectorTerms = [{
            matchExpressions = [{
              key      = "node-role"
              operator = "In"
              values   = ["system"]
            }]
          }]
        }
      }
    }

    tolerations = [{
      key      = "node-role"
      value    = "system"
      operator = "Equal"
      effect   = "NoSchedule"
    }]
  })]

  depends_on = [aws_eks_pod_identity_association.karpenter]
}
Why this matters in production: The SQS interruption queue is the single most impactful Karpenter config decision. Without it, Karpenter only detects Spot termination when the node disappears — after the 2-minute warning has already elapsed. With the queue, Karpenter receives the warning immediately and begins graceful drain before the instance is reclaimed, giving your pods the full 2-minute window to shut down cleanly.

EC2NodeClass — Shared Node Configuration

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023   # Amazon Linux 2023 — IMDSv2 enforced by default

  role: karpenter-node-role

  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-eks-cluster

  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-eks-cluster

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true

  # IMDSv2 — containers cannot reach instance metadata
  metadataOptions:
    httpEndpoint:            enabled
    httpPutResponseHopLimit: 1        # Blocks container access to IMDS
    httpTokens:              required  # IMDSv2 required — no v1 fallback

  tags:
    Environment:       production
    ManagedBy:         karpenter
    kubernetes.io/cluster/prod-eks-cluster: owned

NodePool 1 — Spot General Purpose (80% of workloads)

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    metadata:
      labels:
        node-pool: spot-general
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1
        kind: EC2NodeClass
        name: default

      requirements:
        # Spot first — fall back to On-Demand if no Spot available
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # Ordered: Spot preferred, OD fallback

        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]   # Compute, General, Memory

        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["2", "4", "8", "16"]

        # Include Graviton (arm64) for better Spot availability + price
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]

        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]

      startupTaints:
        - key: node.cloudprovider.kubernetes.io/uninitialized
          effect: NoSchedule

  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: "20%"

  limits:
    cpu:    "500"
    memory: "2000Gi"

  weight: 10

NodePool 2 — On-Demand Latency-Sensitive (via Karpenter)

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-latency
spec:
  template:
    metadata:
      labels:
        node-pool: on-demand-latency
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1
        kind: EC2NodeClass
        name: default

      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]

        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m"]

        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["4", "8", "16"]

        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]

      taints:
        - key: workload-type
          value: latency-sensitive-karpenter
          effect: NoSchedule

  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
    budgets:
      - nodes: "10%"

  limits:
    cpu:    "200"
    memory: "800Gi"

  weight: 20

4. Preventing Conflicts — The Critical Configuration

Running Cluster Autoscaler and Karpenter in parallel requires careful configuration to prevent them from interfering with each other. This is where most dual-autoscaler setups go wrong.

Rule 1 — Cluster Autoscaler Must Not Touch Karpenter Nodes

Cluster Autoscaler uses ASG tags for node group discovery. Karpenter-managed nodes are NOT in any ASG — they are standalone EC2 instances. CA will not see them. But you must ensure CA only manages the tagged node groups:

# In the CA Helm values — explicit node group auto-discovery filter
extraArgs = {
  "node-group-auto-discovery" = "asg:tag=k8s.io/cluster-autoscaler/enabled=true,k8s.io/cluster-autoscaler/${cluster_name}=owned"
}
# Only ASGs with BOTH tags are managed by CA
# Karpenter nodes have neither tag — CA ignores them completely

Rule 2 — Karpenter Must Not Consolidate MNG Nodes

Tag your Managed Node Group instances to signal Karpenter should not consolidate them:

resource "aws_eks_node_group" "on_demand_app" {
  labels = {
    "node-role"            = "on-demand-app"
    "karpenter.sh/managed" = "false"   # Karpenter reads this and skips consolidation
  }
}

Additionally, the pods on the On-Demand MNG have a taint (workload-type: on-demand-critical) that Karpenter's NodePools do not tolerate. Even if Karpenter tried to consolidate, it could not reschedule those pods.

Rule 3 — Karpenter Must Run on System MNG, Not Its Own Nodes

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role
              operator: In
              values: ["system"]
tolerations:
  - key: node-role
    value: system
    operator: Equal
    effect: NoSchedule

If Karpenter runs on a node it manages, it can evict itself during consolidation — taking down the autoscaler and leaving the cluster unable to scale. This is a catastrophic failure mode that is easy to prevent and hard to recover from.

Rule 4 — Set Pod Priorities to Guide Scheduling

# High priority — latency-sensitive scoring pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
description: "Fraud scoring and payment-critical services"
---
# Default — general workloads on Spot
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 0
globalDefault: true
description: "Default priority — Spot NodePool workloads"
Why this matters in production: Pod priority is the mechanism that ensures fraud-scoring pods survive Spot interruption events. When Karpenter drains a Spot node, high-priority pods are rescheduled first — they get new nodes before lower-priority batch workloads. Without explicit PriorityClasses, all pods compete equally during mass eviction and the scoring path can go down.

5. Pod Assignment — Who Goes Where

Critical Path Pods → On-Demand MNG (Cluster Autoscaler)

# values-prod.yaml for fraud scoring service
# This pod ONLY lands on the Cluster Autoscaler On-Demand MNG
tolerations:
  - key: "workload-type"
    value: "on-demand-critical"
    operator: Equal
    effect: NoSchedule

nodeSelector:
  node-role: on-demand-app

priorityClassName: high-priority

Batch and General Pods → Spot NodePool (Karpenter)

# values-prod.yaml for enrichment service
# No special toleration — lands on Karpenter Spot NodePool by default
priorityClassName: default-priority

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2000m"
    memory: "2Gi"

Latency-Sensitive but Not MNG Critical → Karpenter On-Demand NodePool

# values-prod.yaml for real-time feature service
tolerations:
  - key: "workload-type"
    value: "latency-sensitive-karpenter"
    operator: Equal
    effect: NoSchedule

nodeSelector:
  node-pool: on-demand-latency

priorityClassName: high-priority

6. The Cost Story — Where the Savings Come From

I will not put specific percentage savings in this section — every cluster is different, and a number I cannot stand behind is worse than no number at all. What I can tell you is where the savings mechanism is and how to calculate it for your own workload.

The Three Saving Levers

Lever 1 — Spot discount vs On-Demand
Spot instances typically run at 60–70% discount vs On-Demand for the same instance type. Karpenter's diversification across instance families and sizes increases Spot availability — reducing interruptions and allowing more workloads to run on Spot consistently.

Example calculation for your cluster:
On-Demand price (m5.xlarge, eu-west-1): $0.192/hr
Spot price (same, approximate):          $0.058/hr
Discount:                                ~70%

If 70% of your node-hours move from On-Demand to Spot:
Monthly saving per node = ($0.192 - $0.058) × 730h = $97.82/month
For 10 Spot nodes:                                   = $978/month

Lever 2 — Scale to zero overnight
Cluster Autoscaler cannot scale below min_size. If your min_size is 3 across two On-Demand node groups, you pay for 3 nodes 24/7 regardless of demand. Karpenter NodePools have no floor — they scale to zero when no pods need them.

Example: 3 × m5.xlarge On-Demand idling overnight (8pm–8am, weekends):
  Hours idle per month ≈ 12h/day × 22 weekdays + 48h weekend = 312h/month
  Cost of idle baseline: 3 × $0.192 × 312h = $179.71/month saved

Lever 3 — Bin-packing via consolidation
Cluster Autoscaler removes underutilised nodes but does not actively consolidate. Two nodes at 30% utilisation stay as two nodes. Karpenter runs continuous consolidation — it detects that all pods from Node A can fit on Node B, evicts Node A, and terminates it. This is the lever that is hardest to quantify upfront but often produces the largest saving in practice.

How to Measure Your Own Saving

Before enabling Karpenter, run this for two weeks with Cluster Autoscaler:

# Average node CPU utilisation across the cluster
kubectl top nodes | awk 'NR>1 {sum+=$3; count++} END {print sum/count "%"}'

# Count nodes by type
kubectl get nodes -L karpenter.sh/capacity-type,node.kubernetes.io/instance-type \
  --no-headers | awk '{print $6, $7}' | sort | uniq -c

# Total EC2 spend — use AWS Cost Explorer filtered by:
# Service: EC2, Tag: kubernetes.io/cluster/your-cluster-name
The honest framing: The cost saving from this architecture is real and significant — Spot discounts alone justify the operational investment for most clusters above 5 nodes. But the exact number depends on your workload shape, your AZ Spot availability, and how aggressively you tune the consolidation policy. Measure before and after. The tooling is free.

7. Spot Interruption Handling — How Karpenter Protects Workloads

When AWS sends a Spot interruption warning (2-minute notice), Karpenter:

  1. Receives the event via SQS (from EventBridge)
  2. Immediately cordons the node — no new pods scheduled
  3. Begins draining pods gracefully — respects PodDisruptionBudget
  4. Finds or provisions replacement capacity (next cheapest Spot or On-Demand fallback)
  5. Reschedules pods on new node
  6. Terminates the interrupted node
# PodDisruptionBudget — apply to every Deployment running on Spot
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: enrichment-service-pdb
  namespace: fraud-detection
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: enrichment-service
# Graceful shutdown — application handles SIGTERM before node reclaim
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: enrichment-service
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]   # Wait for LB deregistration
Why the preStop sleep matters: Karpenter drains pods by sending SIGTERM and waiting terminationGracePeriodSeconds. But the ALB or NLB may still be routing traffic to the pod for a few seconds after SIGTERM is received. The preStop sleep creates a gap that allows the load balancer to complete deregistration before the application starts shutting down. Without it, you drop in-flight requests during Spot interruption handling.

8. Observability — Knowing What Each Autoscaler Is Doing

With two autoscalers running, you need visibility into which one is acting and why.

resource "aws_cloudwatch_dashboard" "autoscaling" {
  dashboard_name = "eks-autoscaling"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          title  = "Karpenter — Nodes Launched vs Terminated"
          metrics = [
            ["Karpenter", "nodes_created_total", "NodePool", "spot-general"],
            ["Karpenter", "nodes_terminated_total", "NodePool", "spot-general"]
          ]
          period = 60
          stat   = "Sum"
        }
      },
      {
        type = "metric"
        properties = {
          title  = "Karpenter — Pod Scheduling Latency (p99)"
          metrics = [
            ["Karpenter", "scheduler_scheduling_duration_seconds", { stat = "p99" }]
          ]
        }
      },
      {
        type = "metric"
        properties = {
          title  = "Cluster Autoscaler — Scale Events"
          metrics = [
            ["ContainerInsights", "cluster_autoscaler_scaled_up_nodes_count",
              "ClusterName", var.cluster_name],
            ["ContainerInsights", "cluster_autoscaler_scaled_down_nodes_count",
              "ClusterName", var.cluster_name]
          ]
        }
      }
    ]
  })
}
# Alert if Karpenter cannot provision nodes
resource "aws_cloudwatch_metric_alarm" "karpenter_provisioning_failure" {
  alarm_name          = "karpenter-provisioning-failure"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "provisioner_scheduling_errors_total"
  namespace           = "Karpenter"
  period              = 60
  statistic           = "Sum"
  threshold           = 5
  alarm_description   = "Karpenter failing to provision nodes — check Spot availability"
  alarm_actions       = [aws_sns_topic.platform_alerts.arn]
}

# Alert if pending pods persist
resource "aws_cloudwatch_metric_alarm" "persistent_pending_pods" {
  alarm_name          = "persistent-pending-pods"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 5
  metric_name         = "cluster_failed_node_count"
  namespace           = "ContainerInsights"
  period              = 60
  statistic           = "Average"
  threshold           = 0
  alarm_description   = "Pods pending for >5 minutes — autoscaler may be failing"
  alarm_actions       = [aws_sns_topic.platform_alerts.arn]
}
Why this matters in production: Without separate dashboards for each autoscaler, a scale-up event from CA and a consolidation event from Karpenter happening simultaneously look like random node churn. The CloudWatch dashboard with both autoscaler metrics side-by-side is what lets you tell the difference between expected behaviour and a conflict between the two systems.

9. Common Mistakes & Anti-Patterns

Mistake 1: Running Karpenter on a Node It Manages

Karpenter can consolidate the node it runs on, evicting itself. The cluster loses its autoscaler silently — pods stop scheduling, nodes pile up, no one knows why. Always use the System MNG taint and affinity to pin Karpenter to nodes it does not manage.

Mistake 2: Same Discovery Tags on MNG and Karpenter Nodes

If your Cluster Autoscaler discovery tags also match Karpenter-launched nodes, CA will try to manage them — calling ASG APIs that do not apply to standalone EC2 instances. This causes CA errors and log spam that masks real issues. Use explicit tag filters in CA's node-group-auto-discovery setting.

Mistake 3: No PodDisruptionBudget on Spot Workloads

Karpenter respects PDBs during consolidation and interruption handling. Without a PDB, Karpenter may evict all replicas of a service simultaneously during a consolidation event — causing an outage. Apply minAvailable: 1 PDBs to every Deployment that runs on Spot.

Mistake 4: Karpenter Spot-Only NodePool With No On-Demand Fallback

A spot-only NodePool (capacity-type: spot, no on-demand) means when Spot capacity is unavailable in all instance types in your AZs, pods remain pending indefinitely. Always include ["spot", "on-demand"] in the capacity-type requirement.

Mistake 5: No Startup Taint

Without the node.cloudprovider.kubernetes.io/uninitialized startup taint, pods can be scheduled on a Karpenter node before the VPC CNI has fully initialised networking — causing the pod to fail to get an IP and restart. Add the startup taint to all NodePools.

Mistake 6: Forgetting the SQS Interruption Queue

Karpenter without the interruption queue only detects Spot interruptions when the node disappears — 2 minutes after the warning. With the SQS queue, Karpenter gets the warning immediately and begins draining gracefully. The difference is whether pods are gracefully rescheduled or abruptly killed.

Mistake 7: Setting min_size = 0 on the On-Demand MNG

A min_size = 0 On-Demand MNG means CA can scale it to zero. If Karpenter is simultaneously provisioning On-Demand fallback nodes, you briefly have zero guaranteed On-Demand capacity. For regulated environments, keep min_size = 2 on the On-Demand MNG to maintain an HA baseline.

Architecture Decision Matrix

Requirement Cluster Autoscaler Only Karpenter Only Both (Our Pattern)
Guaranteed On-Demand baseline ✅ Via min_size ⚠️ Via NodePool OD fallback ✅ MNG min_size
Spot cost optimisation ⚠️ Manual Spot MNG ✅ Automatic diversification ✅ Karpenter NodePools
Scale to zero overnight ❌ min_size floor ✅ Full scale-to-zero ✅ Spot pools scale to zero
Provisioning speed ❌ 3-5 min ✅ 30-60 sec ✅ Karpenter for Spot
Bin-packing / consolidation ❌ No ✅ Continuous ✅ Karpenter handles Spot
Spot interruption handling ❌ Manual ✅ Automatic via SQS ✅ Karpenter handles Spot
Isolated critical path ✅ Dedicated MNG ⚠️ Taint-based only ✅ Dedicated MNG + CA
Compliance (no Spot on critical) ✅ Via node group ⚠️ Via tolerations ✅ MNG enforces it
Operational complexity ✅ Low ⚠️ Medium ⚠️ Medium (worth it)
Cost at scale ❌ High (idle On-Demand) ✅ Lowest ✅ Near-lowest

The Golden Rule

"Cluster Autoscaler and Karpenter are not competitors — they serve different purposes and work best together. Use Cluster Autoscaler to manage a stable, guaranteed On-Demand Managed Node Group for your most critical, latency-sensitive workloads that cannot tolerate Spot interruption. Use Karpenter to manage everything else — Spot for cost, On-Demand fallback for resilience, continuous consolidation for efficiency. The combination gives you the compliance safety of guaranteed On-Demand capacity and the cost efficiency of right-sized, bin-packed Spot compute. Running only one of them leaves either money on the table or operational risk on the table."

Tags: #AWS #EKS #Karpenter #ClusterAutoscaler #Spot #FinOps #Kubernetes #DevSecOps #IaC #Terraform #Helm

Ankush Panday

Specializing in highly scalable AWS infrastructure and automated quality engineering.

Connect on LinkedIn