Benchmarking Latency: Temporal Cloud vs. Self-Hosted Temporal


Rob Holland and Meagan Speare

Developer Relations and Product Marketing

Developers often worry Temporal will add latency to their applications. Temporal provides a variety of features and Workflow design patterns to help you meet the latency requirements of most apps, but ultimately, your minimum latency will depend on the network latency between your Workers and the Temporal Service, and how well the Temporal Service is tuned.

You might assume your application latency will be higher if you use Temporal Cloud rather than self-host Temporal. After all, with Temporal Cloud, the Temporal Service will no longer be located on the same infrastructure as your Workers, increasing network latency.

In reality, the opposite is true: end-to-end application latency is significantly lower when using Temporal Cloud compared to self-hosted Temporal. Temporal Cloud offers important architectural advancements to reduce latency, including a custom persistence layer. The latency improvements in Temporal Cloud’s architecture are so effective that they eclipse the cost of the higher network latency incurred when talking to Temporal Cloud.

We’ve frequently noticed latency improvements when customers migrate from self-hosted Temporal to Temporal Cloud. To quantify these observations, we benchmarked application-side metrics against a self-hosted Temporal Service and Temporal Cloud, with application Workers hosted in the same region.

The results demonstrate lower latency in Temporal Cloud compared to the self-hosted instance, supporting what we often tell customers: Temporal Cloud is the best choice for internet scale and low latency workloads.

Benchmark Overview and Setup

We measured five application-side SDK metrics, which we chose because they contribute to application latency. Temporal’s SDKs emit these metrics by default.

We set up the benchmarking infrastructure using a latency benchmarking framework, which is available here. The framework builds a Kubernetes cluster and deploys Omes, our tool for benchmarking and load testing. Omes includes Temporal Workers and a scenario runner to simulate an application environment.

To evaluate a wide range of SDK metrics, we used the “throughput_stress” Omes scenario, which exercises a comprehensive set of Temporal primitives. The Activities used by the “throughput_stress” workflow are lightweight, requiring little CPU. Because this benchmark is focused on latency rather than throughput, the scenario was configured to skip sleeps and to run only one workflow at a time. With this configuration, the workflow’s end-to-end latency can serve as an effective metric for comparing Temporal Service instances.

During the benchmark, all pods, nodes, and databases were below 80% CPU utilization.

To test a self-hosted Temporal Service, the tool installs a Temporal Service backed by a MySQL database.

All of the measurements in this post were recorded in clusters running in the AWS us-west-2 region.
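The tables below summarize each metric as p50 and p90 latencies. As an illustrative aside, here is a minimal sketch of how such percentiles can be computed from raw latency samples using Python’s standard library (the sample values are hypothetical, not benchmark data):

```python
import statistics

# Hypothetical round-trip latency samples, in milliseconds
samples = [18.2, 21.5, 19.8, 20.4, 22.7, 45.3, 21.1, 23.9, 19.1, 26.8]

# statistics.quantiles with n=100 yields the 1st..99th percentiles;
# index 49 is the 50th percentile (p50), index 89 the 90th (p90).
percentiles = statistics.quantiles(samples, n=100)
p50, p90 = percentiles[49], percentiles[89]

print(f"p50: {p50:.1f} ms, p90: {p90:.1f} ms")
```

In practice you would not compute these by hand: the Temporal SDKs emit these metrics by default, and the benchmarking framework aggregates them for you.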

Metric 1: WorkflowEndtoEnd Latency


What it measures: Workflow_EndtoEnd_Latency measures total execution time, from schedule to completion, for a single Workflow Execution.

Why we benchmarked it: This metric can be used to quickly compare the performance of instances running the same workload, as it shows the latency of a full Execution.


                       p50 Latency   p90 Latency
Self-Hosted Temporal   750 ms        950 ms
Temporal Cloud         376 ms        476 ms

Metric 2: StartWorkflowExecution Latency


What it measures: the round-trip time for requests to start a Workflow. To start a Workflow, your application contacts the Temporal Service. The Temporal Service durably persists records representing the Workflow, then acknowledges the request with a response to your application. Because the request has been durably persisted, the Temporal Service does not have to wait until the Workflow has started executing to respond.

Why we benchmarked it: This metric is important because applications tend to start Workflows frequently, often as a result of inline handling of a web request. Keeping this latency low is particularly important to avoid holding up web requests.
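As a rough illustration of what this metric captures, the sketch below times a single round trip with `time.perf_counter`. Here `start_workflow_request` is a hypothetical stand-in for the SDK’s actual gRPC call; the real SDKs record this metric for you automatically.

```python
import time

def start_workflow_request():
    """Hypothetical stand-in for the SDK's StartWorkflowExecution call.
    The real call returns once the Temporal Service has durably
    persisted the start request, not when the Workflow begins running."""
    time.sleep(0.02)  # simulate a ~20 ms round trip

start = time.perf_counter()
start_workflow_request()
latency_ms = (time.perf_counter() - start) * 1000

print(f"StartWorkflowExecution round trip: {latency_ms:.1f} ms")
```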


                       p50 Latency   p90 Latency
Self-Hosted Temporal   23.8 ms       42.6 ms
Temporal Cloud         17.7 ms       23.8 ms

Metric 3: SignalWorkflowExecution Latency


What it measures: the round-trip time for requests to Signal a Workflow. As with Start Workflow requests, these must be persisted by the Temporal Service to ensure they will not be lost.

Why we benchmarked it: Applications use Signals to inform Workflows of external events. These Signals are often delivered as the result of some action in a UI or receiving an event on a message queue. Low latency here helps keep the application responsive and improves message queue efficiency.


                       p50 Latency   p90 Latency
Self-Hosted Temporal   17.5 ms       23.5 ms
Temporal Cloud         7.64 ms       9.76 ms

Metric 4: RespondWorkflowTaskCompleted Latency


What it measures: the response time from Workers to the Temporal Service when a Workflow Task is completed. As Workflows make progress, the Workers must communicate with the Temporal Service, detailing which actions should be taken next. This may be starting a new Child Workflow, scheduling an Activity, or simply setting a Timer to wake the Workflow up again later.

Why we benchmarked it: Workflow throughput is impacted by how quickly Workers can communicate with the Temporal Service. If the Temporal Service responds more quickly, then Worker performance improves, and subsequently application performance.


                       p50 Latency   p90 Latency
Self-Hosted Temporal   23.9 ms       51.5 ms
Temporal Cloud         17.8 ms       24.7 ms

Metric 5: RespondActivityTaskCompleted Latency


What it measures: the response time from Workers to the Temporal Service when an Activity Task is completed.

Why we benchmarked it: Workflow throughput is impacted by how quickly Workers can communicate with the Temporal Service. Activities are used to perform a single, well-defined action such as calling another service or processing data. If the Temporal Service responds more quickly, then Worker performance improves, and subsequently application performance.


                       p50 Latency   p90 Latency
Self-Hosted Temporal   23.9 ms       61.7 ms
Temporal Cloud         17.3 ms       30.8 ms

Analysis of the latency differences

This benchmark supports what we’ve observed anecdotally after customer migrations: Temporal Cloud provides lower application-side latency than self-hosted Temporal. The lower latencies on Temporal Cloud across the board resulted in reduced end-to-end latency for our test workflow. Temporal Cloud completed the workflow nearly twice as fast as the self-hosted instance, finishing in 50.1% of the time at both p50 and p90.
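The 50.1% figure follows directly from the Metric 1 table; a quick arithmetic check:

```python
# End-to-end latencies from the Metric 1 table (milliseconds)
p50_cloud, p50_self = 376, 750
p90_cloud, p90_self = 476, 950

# Temporal Cloud completed the workflow in ~50.1% of the
# self-hosted time at both percentiles.
ratio_p50 = p50_cloud / p50_self   # 376 / 750 ≈ 0.501
ratio_p90 = p90_cloud / p90_self   # 476 / 950 ≈ 0.501

print(f"p50: {ratio_p50:.1%}, p90: {ratio_p90:.1%}")
```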

These latency improvements can be attributed to Temporal Cloud’s custom persistence layer, which includes more efficient sharding, a write-ahead log (WAL), and tiered data storage. We designed this architecture specifically for high throughput and large scale. As is apparent in this benchmark, the benefits of the custom persistence layer far outweigh any incurred network latency when using Temporal Cloud.

What this benchmark means for production applications

Thousands of users currently run applications in production with both self-hosted Temporal and Temporal Cloud. Our recommendation: a self-hosted instance may be appropriate for smaller or less critical applications, but for latency-sensitive, large-scale, or business-critical applications, you should use Temporal Cloud.

A final consideration is that Temporal Cloud offers better price-performance than self-hosted Temporal. The self-hosted Temporal Service in this benchmark was well-tuned and never overloaded (it was below 80% CPU at all times). In practice, this is not always the case. It can be labor-intensive to scale the Temporal Service for a high-throughput use case: you must scale your database (Postgres, MySQL, or Cassandra) and manage four additional independent services. The database and all services must be resourced properly: provisioned for peak load to avoid bottlenecks, and deployed in a highly available manner. These infrastructure and operational costs are steep compared to the consumption-based costs of Temporal Cloud.

Learn more and run the benchmark yourself

We’ve provided details on how to reproduce this benchmark here. As with any benchmark, results may vary depending on your workload and scale.

Here are some other helpful resources to learn more: