Deploying Temporal Workers to Google Cloud Run

Authors: Rick Ross, Brandon Chavis
Date: Jan 06, 2025
Duration: 10 min

4/14/2026 update: Google Cloud Run has launched Worker Pools, a new resource type designed specifically for continuous background work and a better fit for running Temporal Workers on Cloud Run. The Cloud Run team has also introduced CREMA (Cloud Run External Metrics Autoscaling), which can scale Worker Pools based on external metrics, even scaling to zero! This blog post updates guidance on Worker deployment best practices based on the latest capabilities in Cloud Run.


Unlike Kubernetes, Google Cloud Run makes it trivial to deploy container-based applications. Have a web API that you want to deploy? Package it up in a container and let Cloud Run take over provisioning the underlying infrastructure, load balancer, and DNS endpoint, and your application is ready to receive traffic. As traffic to your popular API goes up and down, Cloud Run automatically scales based on the inbound traffic and CPU utilization. It’s truly amazing how easy it is to deploy, run, and scale web-based applications.

Temporal Worker applications, however, operate differently: they long-poll Temporal Cloud and process Tasks as they become available. This is a much better match for Worker Pools, which were purpose-built for continuous background work. Unlike Cloud Run Services, Worker Pools do not expose a public port or sit behind a load balancer, and they do not scale automatically on their own.

To scale Worker Pools, Google provides Cloud Run External Metrics Autoscaling (CREMA), which leverages Kubernetes-based Event Driven Autoscaling (KEDA) to scale Worker Pools based on external metrics such as Approximate Backlog Count from a Temporal Task Queue. It follows the specification outlined here.

With some configuration and a sidecar container, getting a Temporal Worker running on Cloud Run using Worker Pools is straightforward. This article will focus on the required Cloud Run configuration and sidecar container aspects, and configuring CREMA to scale appropriately. It will not focus on handling mTLS secrets or other application-specific details. A complete example that includes application details can be found here.

Configuring your application UI#

After you have packaged your application into a container, you use a Cloud Run Service YAML file to tell Cloud Run how to run it. The YAML file includes details such as the Service Account, the container image, startup and liveness probes, and how much memory and CPU your application requires.

You also specify the command required to start your application, the port that will be mapped to a public endpoint, and any volumes and volume mounts. Volumes are used to provide the mTLS certificates required to authenticate to your Temporal Cloud namespace. You can find a working example of a YAML file here.
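As a rough sketch of that shape (every name, image, path, and secret below is a placeholder, not taken from the linked example), a Cloud Run Service YAML looks like this:

```yaml
# Hypothetical sketch of a Cloud Run Service definition.
# All names, images, paths, and the secret reference are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: temporal-ui-app
spec:
  template:
    spec:
      serviceAccountName: ui-app-sa@PROJECT.iam.gserviceaccount.com
      containers:
        - image: us-central1-docker.pkg.dev/PROJECT/repo/ui-app:latest
          command: ["/app/server"]
          ports:
            - containerPort: 8080      # mapped to the public endpoint
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
          volumeMounts:
            - name: mtls-certs
              mountPath: /certs        # mTLS certs for Temporal Cloud
              readOnly: true
      volumes:
        - name: mtls-certs
          secret:
            secretName: temporal-mtls-certs
```

The working example linked above is the authoritative version; this sketch only illustrates where each of the pieces mentioned in the text lives in the file.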

Configuring your Worker#

To deploy a Temporal Worker to Cloud Run, use a Worker Pool YAML file that provides the details necessary to deploy and run your Worker. You can think of the Worker Pool configuration as a pared-down version of the Service configuration. You still need to specify a Service Account, your Worker application container image, memory, CPU, and the command to start the Worker. Volumes and volume mounts are used as well. You can find a working example of a YAML file here.
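A Worker Pool can also be deployed imperatively rather than from YAML. The sketch below uses placeholder names and images; the command is still under gcloud beta, so check `gcloud beta run worker-pools deploy --help` for the current flag set before relying on it:

```shell
# Hypothetical names; adjust image, region, and service account to your project.
gcloud beta run worker-pools deploy temporal-worker \
  --image=us-central1-docker.pkg.dev/PROJECT/repo/worker:latest \
  --region=us-central1 \
  --service-account=worker-sa@PROJECT.iam.gserviceaccount.com \
  --memory=1Gi \
  --cpu=1
```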

Sidecar container#

Sidecar containers are a common deployment pattern in the Kubernetes ecosystem: a secondary application deployed alongside the primary one to provide additional functionality. Kubernetes service meshes like Istio and Linkerd use sidecars to control traffic in and out of the application. Cloud Run also allows you to attach sidecar containers.

Temporal Workers can be configured to emit metrics, which are exposed on a Prometheus scrape endpoint. Since the Worker cannot make these metrics publicly visible, a sidecar container is deployed to read the metrics endpoint and forward the metrics to Google Cloud Managed Service for Prometheus using the OpenTelemetry Collector.

OpenTelemetry Collector#

OpenTelemetry is an observability framework and toolkit that is designed to manage telemetry data such as traces, metrics, and logs. It is vendor- and tool-agnostic. The OpenTelemetry Collector acts like a proxy to receive, process, and export data to a supported platform. In addition to supporting Google Cloud Managed Service for Prometheus, other exporters include Datadog, Splunk, and Google Cloud Pubsub. A full list of exporters is available here.

Collector configuration#

The OpenTelemetry Collector uses a configuration file that specifies the receivers, processors, exporters, and the service section. The receivers section needs to look similar to this:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'temporal-metrics-app'
          scrape_interval: 5s
          metrics_path: '/prometheus'
          static_configs:
            - targets: ['127.0.0.1:8081']

The two important lines are metrics_path, which is the path used to read the application’s metrics, and targets, which indicates the IP address and port number of the application. Notice that the IP refers to localhost, and the port must match the port of the application that exposes the metrics.

In Cloud Run, sidecar containers share the same network namespace and communicate with each other using localhost and the corresponding port.
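For intuition about what the sidecar actually sees on that localhost scrape, here is a stdlib-only sketch that parses Prometheus text-format output into name/value pairs. The sample metric line is illustrative, not captured from a real Worker, and the parser ignores labels and timestamps for simplicity:

```python
def parse_prometheus_text(payload: str) -> dict[str, float]:
    """Parse simple Prometheus text-format lines into {metric_name: value}."""
    metrics = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP, and TYPE lines
            continue
        name_part, _, value = line.rpartition(" ")
        # Strip any {label="..."} suffix from the metric name.
        name = name_part.split("{", 1)[0]
        metrics[name] = float(value)
    return metrics

# Illustrative scrape output; the value is made up.
sample = """\
# HELP temporal_long_request_total Long poll requests issued by the Worker
# TYPE temporal_long_request_total counter
temporal_long_request_total{operation="PollWorkflowTaskQueue"} 42
"""
print(parse_prometheus_text(sample))  # {'temporal_long_request_total': 42.0}
```

The real Collector does far more (label handling, counters vs. gauges, metadata), but every scrape ultimately starts as plain text like this fetched over localhost.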

For the exporters section, the configuration is trivial:

exporters:
  googlemanagedprometheus:

Simply defining googlemanagedprometheus is sufficient. The OpenTelemetry Collector supports multiple exporters (and receivers too), so if you wanted to send the metrics to an additional destination, or to somewhere other than Google Managed Service for Prometheus, you would need to add the appropriate configuration here.
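The service section is what ties the two together: without a pipeline that names both the receiver and the exporter, the Collector will not forward anything. A minimal sketch, matching the names used above, looks like this:

```yaml
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [googlemanagedprometheus]
```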

Other sections were used as well but have been left out for brevity. The full configuration file is available here.

Viewing the metrics#

Once the application is deployed and Workers have been triggered, metrics will be sent to Google Managed Service for Prometheus. To view them, open the Google Cloud Console and navigate to Monitoring > Metrics Management. In the filter, type “temporal” and hit enter. The list will be filtered to the metrics emitted by the SDK.

[Image: gcr-1]

Find the metric temporal_long_request_total/counter in the list. Click the menu on the right side that looks like three vertically stacked dots, and choose View in Metrics Explorer.

[Image: gcr-2]

Click in the box that shows a date and time range and select the last 12 hours. You should see a graph that looks similar to this:

[Image: gcr-3]

Feel free to experiment with adding additional metrics. The Temporal documentation on SDK metrics provides detailed information on metrics, their type, and other key information. Key metrics for tuning performance on Workers can be found here.

Scaling with CREMA#

Worker Pools use manual scaling by default. Without an autoscaler, you’d need to manually set instance counts. CREMA solves this by reading your Temporal Task Queue backlog and scaling the Worker Pool accordingly.

How it works#

  1. CREMA polls Temporal Cloud every 15 seconds to check Task Queue backlog.
  2. When backlog / targetQueueSize > current instances, CREMA scales up.
  3. When the Queue empties, CREMA scales Workers back down (or to zero).

[Image: gcr-4]

The scaling formula#

CREMA uses KEDA (Kubernetes Event-Driven Autoscaling) scalers under the hood. KEDA’s scaling algorithm applies the same ratio-based formula across all its scalers:

Desired replicas = ceil(metricValue / targetValue)

For Temporal:

  • metricValue = Task Queue backlog depth
  • targetValue = targetQueueSize

Example: ceil(25 / 5) = 5 Worker instances.

targetQueueSize is the key tuning parameter. It controls how many queued Tasks each Worker instance is expected to handle. Lower values scale more aggressively (more responsive, higher cost). Higher values pack more work onto fewer instances (more efficient, slower to react). The right value depends on your Worker’s concurrency settings, what your Activities do, and your Cloud Run instance size. Start with a reasonable value, observe Queue depth and processing latency, and adjust.
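The interplay between the formula, the activation threshold, and the replica bounds can be made concrete with a small stdlib-only sketch. The default values mirror the CREMA config later in this article, but the comparison semantics are a simplification, not KEDA's exact implementation:

```python
import math

def desired_replicas(backlog: int, target_queue_size: int,
                     min_replicas: int = 0, max_replicas: int = 10,
                     activation_target: int = 1) -> int:
    """Replica count a KEDA-style ratio scaler would request (simplified)."""
    if backlog < activation_target:
        # Below the activation threshold: fall back to minReplicaCount
        # (zero in the CREMA config below, so the pool scales to zero).
        return min_replicas
    raw = math.ceil(backlog / target_queue_size)
    return max(min_replicas, min(raw, max_replicas))

print(desired_replicas(25, 5))    # 5, matching the example above
print(desired_replicas(0, 5))     # 0: an empty backlog scales to zero
print(desired_replicas(500, 5))   # 10: clamped to maxReplicaCount
```

The clamping is why maxReplicaCount matters: a sudden backlog spike requests only as many instances as the ceiling allows, no matter how deep the Queue gets.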

| Parameter | Example | Notes |
| --- | --- | --- |
| targetQueueSize | 5 | Tasks per replica. Tune to your workload: depends on Worker concurrency, Activity duration, and instance size |
| activationTargetQueueSize | 1 | Threshold to scale from zero. Set to 1 to spin up the first instance as soon as any Task appears |
| queueTypes | workflow,activity | Monitor both Task types for an accurate backlog. Activity-only misses initial Workflow starts |
| selectUnversioned | "true" | Required for Workers without Build IDs (most deployments) |
| pollingInterval | 15 | Seconds between backlog checks. Lower = more responsive, higher = fewer API calls |

CREMA configuration#

CREMA uses a YAML config stored in Google Cloud Parameter Manager. The key section is the Temporal KEDA scaler trigger:

apiVersion: crema/v1
kind: CremaConfig
spec:
  pollingInterval: 15

  triggerAuthentications:
    - metadata:
        name: temporal-cloud-auth
      spec:
        gcpSecretManager:
          secrets:
            - parameter: apiKey
              id: temporal-api-key
              version: latest

  scaledObjects:
    - spec:
        scaleTargetRef:
          name: projects/%PROJECT_ID%/locations/%REGION%/workerPools/temporal-worker
        # Change to 1 to always have an instance running
        # otherwise will scale down to zero
        minReplicaCount: 0
        maxReplicaCount: 10
        triggers:
          - type: temporal
            metadata:
              endpoint: %TEMPORAL_ENDPOINT%
              namespace: %TEMPORAL_NAMESPACE%
              taskQueue: MetricQueue
              targetQueueSize: "5"
              activationTargetQueueSize: "1"
              queueTypes: workflow,activity
              selectUnversioned: "true"
            authenticationRef:
              name: temporal-cloud-auth
        advanced:
          horizontalPodAutoscalerConfig:
            behavior:
              scaleUp:
                stabilizationWindowSeconds: 0
                policies:
                  - type: Percent
                    value: 100
                    periodSeconds: 15
              scaleDown:
                stabilizationWindowSeconds: 60
                policies:
                  - type: Pods
                    value: 2
                    periodSeconds: 30

Deploying CREMA#

CREMA itself is deployed as a Cloud Run Service (it exposes HTTP port 8080 for metrics):

# Upload config to Parameter Manager
gcloud parametermanager parameters versions create 1 \
  --location=global \
  --parameter=crema-config \
  --payload-data-from-file=./crema-config.yaml

# Deploy CREMA
gcloud beta run deploy crema-autoscaler \
  --image=us-central1-docker.pkg.dev/cloud-run-oss-images/crema-v1/autoscaler:1.0 \
  --region=us-central1 \
  --service-account=crema-sa@PROJECT.iam.gserviceaccount.com \
  --no-allow-unauthenticated \
  --no-cpu-throttling \
  --min-instances=1 \
  --set-env-vars="CREMA_CONFIG=projects/%PROJECT%/locations/global/parameters/crema-config/versions/1"

CREMA needs three IAM roles:

  • roles/run.developer to scale the Worker Pool
  • roles/secretmanager.secretAccessor to read the Temporal API key
  • roles/parametermanager.parameterViewer to read its own config
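Assuming the service account name from the deploy command above, the grants might look like the following sketch. Project-level bindings are shown for brevity; in production, scope them to the specific Worker Pool, secret, and parameter instead:

```shell
PROJECT=my-project   # placeholder
SA=crema-sa@${PROJECT}.iam.gserviceaccount.com

for ROLE in roles/run.developer \
            roles/secretmanager.secretAccessor \
            roles/parametermanager.parameterViewer; do
  gcloud projects add-iam-policy-binding "${PROJECT}" \
    --member="serviceAccount:${SA}" \
    --role="${ROLE}"
done
```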

The example that accompanies this article includes infrastructure automation using Pulumi. It creates a new project, enables the appropriate APIs, creates service accounts, and then deploys the UI application, the Worker in a Worker Pool, the sidecar metrics container, and the CREMA autoscaler. For more information on reproducing this setup, see the README.

Configuration gotchas#

From our testing, a few things to watch for:

  1. selectUnversioned: "true" is required for Workers without Build IDs. Without it, the scaler reports 0 backlog even when Tasks are queued.
  2. CREMA must stay running. Set min-instances=1 on the CREMA service. If it scales to zero, nobody is watching the Queue.
  3. Activity backlog is 0 until Workflows execute. If you only monitor queueTypes: activity, the scaler won’t react to new Workflow starts. Use workflow,activity for the full picture.
  4. Worker Pool paths use lowercase workerPools in the scaleTargetRef.name path (e.g., projects/%PROJECT_ID%/locations/%REGION%/workerPools/NAME).

Wrapping it all up#

When should you use Cloud Run, and when should you use Kubernetes? The answer to this question is not that simple because it depends on a number of factors. How much experience does your team have with Kubernetes? How quickly will you need to scale? How many distinct Temporal Workers will you be running? Are you using a microservices architecture?

Our general recommendation, without knowing the specifics of your requirements, skills, and objectives, is to start with Cloud Run. If you outgrow Cloud Run, move to GKE Autopilot, and if you run into a limitation on Autopilot, use GKE Standard.

Hopefully, this post, plus some help from a sidecar container, has prepared you to deploy Temporal Workers to Cloud Run, scale them with CREMA, and view SDK metrics. For a full working example, be sure to check out the repository here.

What types of Temporal Workers will you be deploying on Cloud Run? Let us know in our Community Slack Channel!
