Safe, versioned Worker Deployments on Kubernetes: Now with autoscaling!

AUTHOR
Brandon Chavis
DATE
Mar 30, 2026
DURATION
7 MIN

When you update Workflow or Activity code in Temporal, you can’t just do a rolling update like you might for stateless microservices. In Durable Execution systems, long-running Workflows need to finish on the code version they started with, which means you end up running multiple Worker versions simultaneously. This pattern is sometimes called rainbow deployments: think blue/green, just with more colors. Managing it manually means juggling multiple Kubernetes Deployments, calling Temporal’s versioning API, timing version retirement, and cleaning up old resources.

The Temporal Worker Controller is an open-source Kubernetes operator that automates this coordination. Push a new image, and it creates a new versioned Kubernetes Deployment, works with Temporal to progressively ramp traffic to the new version, and automatically sunsets old versions when they’re safe to remove.

Until now, the controller handled Worker Versioning, but didn’t support autoscaling of your Worker versions. Different versions carry different amounts of work: a version actively receiving traffic needs more capacity than one being drained for deprecation. Without autoscaling support, there was no way to attach Kubernetes scaling configurations to each versioned Deployment. Today, we’re closing that gap.

The Worker Controller now lets you attach Kubernetes resources, such as Horizontal Pod Autoscalers (HPAs) or PodDisruptionBudgets (PDBs), to your versioned worker deployments automatically. Additionally, we’ve added Worker version labels to the approximate_backlog_count metric so that it’s possible to independently scale each versioned Deployment based on its own specific backlog depth.

How it works

The Temporal Worker Controller manages Worker Versioning by creating a separate Kubernetes Deployment for each Temporal Worker Deployment Version (Build ID). When you push a new image, the Controller stands up a new versioned Kubernetes Deployment, progressively shifts traffic, and eventually sunsets old versions.

Previously, this created a problem for autoscaling. Because the Controller manages the naming of each Kubernetes Deployment and Build ID, there was no way to specify which Kubernetes Deployment a user-created HPA spec should target. There was also no way to scale each version independently based on its own backlog on the Temporal Server.

Now, the Worker Controller lets you attach your HPA configurations and specify the metrics that trigger scaling actions: built-in Kubernetes metrics, or external metrics from Temporal Cloud such as approximate_backlog_count, labeled by Worker version and ingested into Kubernetes as a custom metric.

WorkerResourceTemplate

WorkerResourceTemplate is a new custom resource that acts as a template for any namespaced Kubernetes resource you want attached to your Workers. Define the resource once, and the Controller handles the rest:

  • One copy per version. The Controller creates a separate instance of your resource for each Worker version that has running Pods. Roll out v3 of your Worker? The Controller creates an HPA targeting v3’s Deployment automatically.
  • Auto-wiring. Set scaleTargetRef: {} in your template, and the Controller injects the correct versioned Deployment reference for each copy. No need to know or hardcode Deployment names.
  • Automatic cleanup. When a version is sunset and its Deployment is deleted, the Worker Controller removes the associated HPA or PDB with it.
  • Works with any autoscaler. HPA or custom CRDs: anything with a scaleTargetRef or matchLabels field gets auto-injection. You choose the scaling strategy that fits your workload.

Here are some examples of how this resource looks in practice:

Autoscale based on CPU metrics

apiVersion: temporal.io/v1alpha1
kind: WorkerResourceTemplate
metadata:
  name: my-worker-hpa
  namespace: my-namespace
spec:
  temporalWorkerDeploymentRef:
    name: my-worker

  template:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    spec:
      # Empty scaleTargetRef tells the controller to auto-inject the
      # correct versioned Deployment reference for each Build ID.
      scaleTargetRef: {}
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70

Autoscale based on backlog and slot utilization

# Setting up the Custom Metrics used in this example requires additional 
# configuration in your Kubernetes cluster.
# See temporal-worker-controller/internal/demo/README.md#metric-based-hpa-scaling-demo
apiVersion: temporal.io/v1alpha1
kind: WorkerResourceTemplate
metadata:
  name: helloworld-hpa-backlog
  namespace: default
spec:
  temporalWorkerDeploymentRef:
    name: helloworld

  template:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    spec:
      # Empty scaleTargetRef tells the controller to auto-inject the 
      # correct versioned Deployment reference for each Build ID.  
      scaleTargetRef: {}
      minReplicas: 1
      maxReplicas: 20
      metrics:
        # Metric 1: slot utilization, a scale-down guard that prevents scaling down while workers are busy.
        - type: External
          external: 
            metric: 
              name: temporal_slot_utilization 
              selector: 
                matchLabels:
                  worker_type: "ActivityWorker"
                  # temporal_worker_deployment_name: <auto-injected>
                  # temporal_worker_build_id: <auto-injected>
                  # temporal_namespace: <auto-injected>
            target: 
              type: Value
              value: "750m"

        # Metric 2: backlog count, scale up when tasks are queued but not yet picked up.
        - type: External
          external:
            metric:
              name: temporal_backlog_count_by_version
              selector:
                matchLabels:
                  task_type: "Activity"
                  # temporal_worker_deployment_name: <auto-injected>
                  # temporal_worker_build_id: <auto-injected>
                  # temporal_namespace: <auto-injected>
            target:
              type: AverageValue
              averageValue: "1"
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
        scaleDown:
          stabilizationWindowSeconds: 120

Apply a WorkerResourceTemplate once, and every Worker version gets its own HPA: correctly targeted, correctly labeled, and automatically cleaned up when the version is retired.
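The same mechanism extends beyond autoscalers. As a sketch (the names here are illustrative, and the empty selector relies on the matchLabels auto-injection described above), a per-version PodDisruptionBudget template might look like:

```yaml
apiVersion: temporal.io/v1alpha1
kind: WorkerResourceTemplate
metadata:
  name: my-worker-pdb        # illustrative name
  namespace: my-namespace
spec:
  temporalWorkerDeploymentRef:
    name: my-worker

  template:
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    spec:
      # Keep at least one Pod of each Worker version available during
      # voluntary disruptions (node drains, cluster upgrades).
      minAvailable: 1
      # Empty matchLabels tells the controller to inject the label
      # selector for each versioned Deployment's Pods.
      selector:
        matchLabels: {}
```

With this in place, a node drain can't take down every Pod of a version that is still processing Workflows.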

Built on Kubernetes patterns, not around them

A design principle we held throughout this work: don’t reinvent what Kubernetes already does well. The Worker Controller doesn’t contain its own autoscaling logic. Instead, it provides the plumbing: per-version, lifecycle-managed resource instances that let battle-tested Kubernetes components like HPA do what they do best.

This approach means:

  • You keep your existing autoscaling expertise and tooling.

  • You can use any metric source your autoscaler supports: CPU, memory, custom Prometheus metrics, or external metrics like Task Queue backlog depth.

  • Monitoring and debugging work the same way: kubectl get hpa shows each version’s scaling status, just like any other HPA.

  • Standard Kubernetes metrics pipelines continue to work as expected. Use Prometheus to scrape external metrics and present them to HPA as a custom metric.
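To illustrate that last point, here is a rough sketch of a prometheus-adapter rule that would surface a scraped backlog metric to the HPA as an external metric. The metric and label names are assumptions based on the example above; see the controller's metric-based HPA scaling demo for a working configuration.

```yaml
# prometheus-adapter config sketch (assumed metric/label names).
externalRules:
  - seriesQuery: 'approximate_backlog_count'
    resources:
      # Assumes the scraped series carries a Kubernetes namespace label.
      overrides:
        namespace: {resource: "namespace"}
    name:
      # Expose it under the name the HPA example queries.
      as: "temporal_backlog_count_by_version"
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

Because the backlog metric keeps its Worker version labels, each versioned HPA selects only its own series via matchLabels.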


Security by default

WorkerResourceTemplate includes a validating webhook that enforces guardrails out of the box:

  • No privilege escalation. The webhook performs SubjectAccessReviews to verify that you have permission to create the embedded resource type. You can’t use WorkerResourceTemplate to create resources your RBAC wouldn’t otherwise allow.

  • Banned resource kinds. Workload types like Deployments, StatefulSets, Jobs, and Pods are blocked by default to prevent misuse.

  • Controller RBAC is explicit. The Helm chart defaults to granting the controller permission for HPAs and PDBs only. You opt in to additional resource types as needed.
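Opting in comes down to granting the controller's ServiceAccount the corresponding RBAC. A sketch, with assumed names (the actual ServiceAccount and namespace come from your Helm install; VerticalPodAutoscaler is just an example resource type):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: worker-controller-extra-resources   # illustrative name
rules:
  # Example: allow the controller to manage VerticalPodAutoscalers.
  - apiGroups: ["autoscaling.k8s.io"]
    resources: ["verticalpodautoscalers"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: worker-controller-extra-resources
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: worker-controller-extra-resources
subjects:
  - kind: ServiceAccount
    name: temporal-worker-controller   # assumed ServiceAccount name
    namespace: temporal-system         # assumed install namespace
```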

Getting started

New to the Worker Controller? Start with the getting started guide to set up automated versioned Deployments for your Temporal Workers on Kubernetes. Once you have a TemporalWorkerDeployment managing your Workers, adding autoscaling with WorkerResourceTemplate is straightforward.

Already using the Worker Controller? To add WorkerResourceTemplate, you’ll need:

  1. The latest version of the Temporal Worker Controller Helm chart (Helm chart version v0.23.0, appVersion v1.5.0)
  2. cert-manager installed in your cluster (required for the validating webhook’s TLS)
  3. A TemporalWorkerDeployment already managing your Workers

From there, create a WorkerResourceTemplate that references your TemporalWorkerDeployment and includes the resource template you want attached. The worker resource template documentation has the full reference, including examples for HPAs, PDBs, and RBAC configuration.

What’s next

The Temporal Worker Controller automates the hardest part of running Temporal Workers on Kubernetes: safe, versioned Deployments that coordinate with the Temporal Server. With WorkerResourceTemplate, autoscaling now comes along for the ride: each version scales independently, and cleanup is automatic.

If you’re managing Temporal Workers on Kubernetes, give the Worker Controller a try. And if you’re already using it, add a WorkerResourceTemplate and let us know how it works for your workloads.
