Tips for running Temporal on Kubernetes

AUTHORS
Cubby Sivasithamparam
DATE
Dec 15, 2025
DURATION
9 MIN

Kubernetes has become the foundation for deploying and operating cloud-native applications at scale. It excels at keeping your application topology healthy by replacing failed pods, managing rollouts, and handling networking and service discovery.

However, Kubernetes stops at the infrastructure boundary. It does not manage what happens inside your applications. It will not handle retries when a call fails, recover in-progress state after a crash, or coordinate multi-step workflows across services.

Why Temporal?#

Temporal provides durability as a service. It gives developers a programming model that automatically handles state persistence, retries, and recovery, so Workflows continue reliably even when infrastructure is constantly changing. All you need to do is write the happy path of your business logic, and Temporal takes care of the rest.

Running Temporal on Kubernetes combines operational and logical resilience. Kubernetes keeps workloads alive and healthy, while Temporal ensures that application logic remains consistent and durable. Organizations rely on Temporal to guarantee reliable execution — even when infrastructure is unstable or services fail mid-operation.

In this post, we share practical tips for deploying and operating Temporal on Kubernetes to achieve predictable latency, safe upgrades, and efficient scale.

Run the Temporal Service on Kubernetes with Helm charts#

Temporal provides an official Helm chart that makes it simple to deploy the Temporal Server components (Frontend, History, Matching, Worker) to Kubernetes. The chart can also install just the Temporal Server, configured to connect to dependencies (such as a Cassandra, MySQL, or PostgreSQL database) that you may already have available in your environment.

Only the portions of the Helm chart that configure and manage Temporal itself are considered production-ready. The bundled configurations for Cassandra, Elasticsearch, Prometheus, and Grafana are minimal development configurations and should be reconfigured for a production deployment.

If you already use Helm for production deployments, you can use our Helm chart with these important modifications (sketched in the example values after this list):

  • Disable bundled dependencies: Turn off all the included add-ons (Cassandra, Prometheus, etc.) that come with the chart.

  • Deploy only Temporal services: Configure the chart to install just the core Temporal components.

  • Use external dependencies: Connect to your existing, externally managed databases and infrastructure instead.
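A values override implementing these modifications might look roughly like the following. The key names follow the public temporalio/helm-charts values at the time of writing, so treat them as illustrative and verify them against the values.yaml of the chart version you install:

# values.production.yaml -- illustrative override, not a drop-in file
cassandra:
  enabled: false          # disable the bundled dev Cassandra
prometheus:
  enabled: false          # bring your own monitoring stack
grafana:
  enabled: false
elasticsearch:
  enabled: false          # or point visibility at an external Elasticsearch

server:
  config:
    persistence:
      default:
        driver: "sql"
        sql:
          driver: "postgres12"
          host: "postgres.example.internal"        # your externally managed database
          port: 5432
          database: "temporal"
          user: "temporal"
          existingSecret: "temporal-default-store" # password pulled from a Secret
      # configure persistence.visibility the same way for the visibility store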

Upgrading is also an important consideration for managing Helm charts, and it’s crucial to update your database schema before running a Helm upgrade. This prevents schema mismatches that can lead to downtime or failed migrations. We recommend automating this step in your upgrade process for consistency and safety.
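One way to automate it is to run the schema tool from the temporalio/admin-tools image as a Job (or a Helm pre-upgrade hook) ahead of helm upgrade. The sketch below assumes a PostgreSQL default store; the tool flags, plugin name, and schema path vary by database and Temporal version, so check the schema upgrade docs before relying on it:

# Illustrative schema-update Job; adapt names, credentials, and paths.
apiVersion: batch/v1
kind: Job
metadata:
  name: temporal-schema-update
  namespace: temporal
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: update-schema
          image: temporalio/admin-tools:<your-temporal-version>
          env:
            - name: SQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: temporal-default-store
                  key: password
          command: ["temporal-sql-tool"]
          args:
            - "--plugin=postgres12"
            - "--endpoint=postgres.example.internal"
            - "--port=5432"
            - "--user=temporal"
            - "--password=$(SQL_PASSWORD)"   # expanded by Kubernetes from the env var above
            - "--database=temporal"
            - "update-schema"
            - "--schema-dir=/etc/temporal/schema/postgresql/v12/temporal/versioned"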

If you’re exposing the Temporal Web UI, it’s recommended to configure authentication and environment variables early, especially when integrating with SSO or embedding the UI into your own platform. Doing this upfront avoids surprises later when you want to scale access to multiple users.
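As a sketch of what that can look like with an OIDC provider, here are environment variables on the Web UI container. The variable names follow the temporal-ui image's documented auth settings; the issuer, client ID, and callback URL are placeholders for your own identity provider:

env:
  - name: TEMPORAL_AUTH_ENABLED
    value: "true"
  - name: TEMPORAL_AUTH_PROVIDER_URL
    value: "https://your-idp.example.com/"   # OIDC issuer (placeholder)
  - name: TEMPORAL_AUTH_CLIENT_ID
    value: "temporal-ui"
  - name: TEMPORAL_AUTH_CALLBACK_URL
    value: "https://temporal-ui.example.com/auth/sso/callback"
  - name: TEMPORAL_AUTH_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: temporal-ui-oidc
        key: client-secret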

Deploy Temporal Workers on Kubernetes#

Temporal Workers are long-running processes that poll for Tasks and execute your Workflow and Activity code. Kubernetes is an ideal platform for running Workers because it handles exactly what Workers need: automatic restarts on failure, horizontal scaling based on workload, and seamless deployments with zero downtime.

As your Workflow execution demands grow, Kubernetes lets you scale Worker pods independently from your other services, ensuring your Task Queues never bottleneck your system.

Basic deployment approach#

1. Containerize your Worker: First, create a Dockerfile for your Worker. Here’s an example for Python from the Quick Launch — Deploying your Workers on Amazon EKS guide:

# Use Python 3.11 slim image as base
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Install the Temporal Python SDK dependency
RUN pip install --no-cache-dir temporalio

# Copy application code
COPY . .

# Set Python to run in unbuffered mode
ENV PYTHONUNBUFFERED=1

# Run the worker
CMD ["python", "worker.py"]

Important note: You should dockerize your Worker code (which includes both your Workflow definitions and the Worker process that polls for Tasks) together, not separately. Create one Docker image containing both your Workflow code and Worker.

2. Deploy to Kubernetes: Create a Kubernetes Deployment manifest in YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-app
  namespace: your-namespace
  labels:
    app: your-app
spec:
  selector:
    matchLabels:
      app: your-app
  replicas: 1
  template:
    metadata:
      labels:
        app: your-app
    spec:
      serviceAccountName: your-app
      containers:
        - name: your-app
          image: <your-ecr-image-name>
          env:
            - name: TEMPORAL_ADDRESS
              valueFrom:
                configMapKeyRef:
                  name: temporal-worker-config
                  key: TEMPORAL_ADDRESS
            - name: TEMPORAL_NAMESPACE
              valueFrom:
                configMapKeyRef:
                  name: temporal-worker-config
                  key: TEMPORAL_NAMESPACE
            - name: TEMPORAL_TASK_QUEUE
              valueFrom:
                configMapKeyRef:
                  name: temporal-worker-config
                  key: TEMPORAL_TASK_QUEUE
            - name: TEMPORAL_API_KEY
              valueFrom:
                secretKeyRef:
                  name: temporal-secret
                  key: TEMPORAL_API_KEY
          resources:
            limits:
              cpu: "0.5"
              memory: "512Mi"
            requests:
              cpu: "0.2"
              memory: "256Mi"

You'll then need to apply the deployment.yaml file to your EKS cluster (for example, with kubectl apply -f deployment.yaml).

3. Use ConfigMaps for non-sensitive configuration and Secrets for sensitive data like API keys, as shown in the deployment guide.
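For reference, the ConfigMap and Secret that the Deployment above pulls from might look like this, with placeholder values you'd replace with your own endpoint, Namespace, Task Queue, and API key:

apiVersion: v1
kind: ConfigMap
metadata:
  name: temporal-worker-config
  namespace: your-namespace
data:
  TEMPORAL_ADDRESS: "<your-temporal-endpoint>:7233"
  TEMPORAL_NAMESPACE: "<your-namespace>"
  TEMPORAL_TASK_QUEUE: "<your-task-queue>"
---
apiVersion: v1
kind: Secret
metadata:
  name: temporal-secret
  namespace: your-namespace
type: Opaque
stringData:
  TEMPORAL_API_KEY: "<your-api-key>"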

When you deploy the Temporal Service and Workers in the same Kubernetes Cluster, Workers running inside the Cluster can simply connect to the Frontend service using the Cluster-internal DNS (for example, temporal-frontend.temporal:7233).

For Workers outside the Cluster (such as in another VPC, another cloud account, or on-prem), you’ll need to expose the Frontend via a load balancer or Ingress (for example, using AWS NLB/ALB annotations) so the gRPC port 7233 is reachable externally.
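A minimal sketch of that on EKS is a LoadBalancer Service in front of the Frontend pods. The NLB annotation and the selector labels below are assumptions based on common AWS controller and Helm chart conventions, so verify both against your own setup:

apiVersion: v1
kind: Service
metadata:
  name: temporal-frontend-external
  namespace: temporal
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: temporal
    app.kubernetes.io/component: frontend
  ports:
    - name: grpc-rpc
      port: 7233
      targetPort: 7233
      protocol: TCP

If you go this route, put TLS (ideally mTLS) in front of the Frontend before exposing it beyond your network.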

Some tips#

  1. Use standard Kubernetes patterns: Workers are just regular applications — apply your existing K8s best practices when deploying them.

  2. Consider the Worker Controller: For advanced version management, explore the Temporal Worker Controller for zero-disruption deployments. This Kubernetes operator enables you to safely deploy new Worker versions by ensuring old Workflows complete on existing Workers while new Workflows start on updated Workers.

  3. Use Worker Versioning: This feature helps you manage different Worker builds, formally called Worker Deployment Versions. It lets you ramp traffic gradually to a new Worker Deployment Version, verify a Deployment Version with tests before sending it production traffic, or roll back instantly when you detect that a new Deployment Version is broken.

Autoscaling Temporal Workers on Kubernetes#

Once your Workers are deployed to Kubernetes, the next challenge is scaling them. The key insight: traditional CPU and memory metrics often mislead when it comes to Temporal workloads.

Why standard metrics fall short#

You might see perfectly healthy CPU usage while your Task latency climbs. This happens because Workers spend significant time waiting on external operations, such as database queries, API calls, or coordination with the Temporal Server. During these waits, CPU stays low even though Workers are busy and can’t accept new tasks. Your autoscaler thinks everything’s fine while users experience degraded service.

Metrics that actually matter#

  • Task Queue Backlog gives you a direct count via the DescribeTaskQueueEnhanced API. If 100 Tasks are waiting and each Worker handles 10 concurrent Tasks, you need 10 Workers. Simple math, clear signal.

  • Schedule-to-Start latency is your primary signal. This measures the time from when a Task is scheduled until a Worker picks it up. It's exposed as activity_schedule_to_start_latency and workflow_task_schedule_to_start_latency; rising latency directly indicates that Tasks are waiting in the Queue and it's time to scale up.

  • Worker Task slots (temporal_worker_task_slots_available and temporal_worker_task_slots_used) tell you if Workers are at capacity. Calculate utilization as: (used / (used + available)) * 100.

Critical considerations#

  1. Avoid premature scale-down. A backlog of zero doesn’t mean Workers are idle; they might be actively processing Tasks. Always check temporal_worker_task_slots_used alongside backlog metrics.
  2. Ensure graceful shutdown so Workers complete in-flight Tasks before termination. Set appropriate termination grace periods in your Pod specs.
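For example, a Pod spec excerpt that pairs a SIGTERM-aware Worker with a generous grace period (the 300 seconds here is illustrative; size it to your longest-running Activities plus your Worker's configured shutdown timeout):

spec:
  terminationGracePeriodSeconds: 300   # time allowed for in-flight Tasks to finish after SIGTERM
  containers:
    - name: your-app
      image: <your-ecr-image-name>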

Rather than trying to scale manually, it’s even easier to use KEDA-based autoscaling to automatically scale Temporal Workers based on a Task Queue’s backlog. This is particularly useful for handling variable workload demands efficiently in a Kubernetes environment. You can see the full demo here and check out the repository to get started.
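As a rough sketch, a KEDA ScaledObject targeting the Worker Deployment from earlier might look like this. The temporal trigger and its metadata keys follow recent KEDA releases; treat them as assumptions and confirm the exact fields in the KEDA docs:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: your-app-scaler
  namespace: your-namespace
spec:
  scaleTargetRef:
    name: your-app              # the Worker Deployment defined above
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: temporal
      metadata:
        endpoint: temporal-frontend.temporal:7233
        namespace: default
        taskQueue: your-task-queue
        targetQueueSize: "10"   # roughly: backlogged Tasks per replica before adding another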

Planning capacity and tuning for production#

Scaling a Temporal Cluster isn’t a one-size-fits-all process. Every team has different workload patterns, business requirements, and operational constraints. When you first deploy Temporal, you’ll find it configured with development-level defaults: fine for getting started, but not ready for production traffic.

The key is to approach scaling iteratively. We recommend a continuous cycle: gradually increase your load through testing, watch where the system struggles, make targeted adjustments, then repeat.

Always load test on dedicated test Clusters before applying changes to production. Each iteration teaches you something new about how your specific workloads behave under pressure. Maybe you’ll discover you need more CPU headroom, or perhaps you’re over-provisioned and can scale down to save costs.

The hidden performance killer: CPU throttling#

As you tune your Cluster, you’ll likely encounter a subtle but critical issue with request latency. The Temporal Server needs low, consistent latencies to maintain high throughput, and Kubernetes can sometimes act counterintuitively against this. When a container tries to use more CPU than its limit allows within a 0.1-second window, Kubernetes throttles it.

GOMAXPROCS controls how many OS threads Go uses for concurrent execution. By default, Go detects all CPU cores on the host machine. If your Pod has a 2-core CPU limit but Go sees 64 cores on the node, it will try to use all 64, leading to CPU throttling.

The solution: set GOMAXPROCS to match your Pod’s CPU limit. For example, if you’ve allocated two cores, set GOMAXPROCS=2. This alignment prevents throttling, stabilizes latencies, and can actually reduce overall CPU usage while improving performance.
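In a container spec, that alignment is just an env var sitting next to the CPU limit, for example:

resources:
  limits:
    cpu: "2"                 # two cores for this container
    memory: "4Gi"
  requests:
    cpu: "2"
    memory: "4Gi"
env:
  - name: GOMAXPROCS
    value: "2"               # match the CPU limit so Go doesn't over-schedule threads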

The good news? From Temporal 1.21.0 onward, this happens automatically if you haven’t configured it yourself.

You’re ready to get started!#

Now you’re ready to deploy and operate Temporal on Kubernetes with confidence. By following these patterns, you’ll build a resilient foundation that scales with your business needs. Remember to iterate on your capacity planning as your workload patterns evolve, and leverage Kubernetes-native tools like KEDA to automate scaling decisions.

If you want to focus on building applications rather than worrying about clusters, Temporal Cloud handles all the operational complexity for you. Get started for free today or talk to our team about your use case.
