There are many applications for a highly scalable, fault-oblivious system like Temporal. Here are just some of the most popular, to jog your imagination.
- Microservices Orchestration
- Distributed Transactions
- Infrastructure Provisioning
- Monitoring and Polling
- Data Pipelines
- Long Running Processes
- DSL Workflows
- Scalable Actors
It is common to break a large application into microservices structured around the application's distinct business capabilities:
- Each microservice is owned by a separate team that makes its own technology choices and runs its own release process.
- Each microservice typically has its own storage and interacts with other services via a well-defined API.
- Application/Product developers need to call multiple microservices to achieve the desired outcome.
- Every time you cross a system boundary, the chance of failure grows, because the failure probabilities of the individual calls compound.
- Product developers spend a lot of effort implementing queues, timeouts and retries to ensure that API calls eventually succeed, preserving business rules across multiple independent sub-domains.
- Service interdependencies can be remarkably complicated. Processes can run asynchronously or in parallel, some tasks need information from other systems, and the next steps often depend on the outcome of previous activities.
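The compounding effect of crossing many system boundaries can be made concrete with some quick arithmetic. The sketch below is back-of-the-envelope only. It assumes that calls fail independently, which is a simplification, and it uses hypothetical availability figures. It also shows why retries help so much.

```python
# Back-of-the-envelope sketch, assuming independent failures (a simplification).
from math import prod


def chain_success(probabilities):
    """Probability that every call in a chain succeeds."""
    return prod(probabilities)


def success_with_retries(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts


# Five hypothetical services, each succeeding 99.9% of the time.
chain = [0.999] * 5
print(f"{chain_success(chain):.4f}")  # prints 0.9950

# Allowing up to three attempts per call makes the whole chain far more reliable.
retried = [success_with_retries(0.999, 3)] * 5
print(chain_success(retried) > chain_success(chain))  # prints True
```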
Temporal solves this by acting as a central orchestrator that provides "reliability on rails" to every team.
- Retries, Timeouts, Sagas: It guarantees that Workflow code eventually completes, has built-in support for exponential Activity retries, and simplifies the coding of the compensation logic with native Saga pattern support. You can define retries, rollbacks, or even a human intervention step in the case of failure.
- Observability: Temporal records the entire event history of each Workflow. Contrast this with ad-hoc orchestration based on queues, where getting the current status of each request is virtually impossible.
- Scale: Temporal seamlessly scales to a large number of Workflows running in parallel.
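Exponential Activity retries are governed by a small set of parameters. In Temporal, a retry policy includes an initial interval, a backoff coefficient, a maximum interval, and a maximum number of attempts. The sketch below is a simplified, standard-library model of how such parameters produce a backoff schedule; it is not the Temporal SDK. The `RetryPolicy` dataclass, the `call_with_retries` helper, and the default values are hypothetical.

```python
# Illustrative sketch only, not the Temporal SDK: a simplified model of an
# exponential retry policy. RetryPolicy and call_with_retries are hypothetical.
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    initial_interval: float = 1.0     # seconds before the first retry
    backoff_coefficient: float = 2.0  # multiplier applied after each retry
    maximum_interval: float = 100.0   # cap on any single delay
    maximum_attempts: int = 5         # total attempts, including the first one

    def delays(self):
        """Yield the delay before each retry (maximum_attempts - 1 values)."""
        delay = self.initial_interval
        for _ in range(self.maximum_attempts - 1):
            yield min(delay, self.maximum_interval)
            delay *= self.backoff_coefficient


def call_with_retries(fn, policy, sleep=time.sleep):
    """Call fn(), sleeping between failed attempts; re-raise the last error."""
    delays = policy.delays()
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # attempts exhausted: surface the last failure
            sleep(delay)


# Usage: a call that fails twice with a transient error, then succeeds.
attempts = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"

slept = []  # record delays instead of actually sleeping
print(call_with_retries(flaky, RetryPolicy(), sleep=slept.append))  # prints ok
print(slept)  # prints [1.0, 2.0]
```

Injecting `sleep` keeps the example fast and testable. The cap from `maximum_interval` stops delays from growing without bound.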
Here are two real-world examples of Temporal-powered service orchestration scenarios:
- Using Temporal Workflows to spin up Kubernetes by Banzai Cloud
- Improving the User Experience with Uber’s Customer Obsession Ticket Routing Workflow and Orchestration Engine
Provisioning resources depends on a series of potentially long-running operations with many possibilities for intermittent failures. While existing deployment tools support simple operations, many scenarios may still require a custom provisioning flow:
- Automatic infrastructure provisioning for a new customer in multi-tenant environments.
- Particularly large deployments, where tens or even hundreds of thousands of resources must be configured.
- Provisioning of custom resources that are not supported by off-the-shelf tools.
- Complex configuration logic that is determined at deployment time.
It's beneficial to have a single workflow engine manage all the various tasks: spinning up the cluster, long-term monitoring, managing upgrades, database schema migrations, and automated staged rollouts of new features.
Some provisioning operations may take dozens of minutes or even hours to complete. Ad-hoc solutions may fail in the middle and leave the system in an undefined state.
Temporal Workflows can express complex decision trees using a general-purpose programming language. Support for long-running operations, polling, responding to events, and automatic retries provides excellent building blocks for a robust provisioning flow. If a lengthy provisioning Workflow fails in the middle, Temporal handles the error and resumes the flow from where it left off.
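The idea of resuming a failed flow from where it left off can be pictured with a toy model. The result of each completed step goes into a durable history, and a rerun skips the steps already recorded there. This only illustrates the concept. In Temporal, the platform itself persists and replays the Workflow's event history. The `run_provisioning` function and the step names are hypothetical.

```python
# Toy model only, not how Temporal is implemented: a dict stands in for the
# durable history that survives a crash. Step names are hypothetical.
def run_provisioning(steps, history):
    """Run (name, action) steps in order, skipping steps already in `history`."""
    for name, action in steps:
        if name in history:
            continue  # completed before the failure; do not redo it
        history[name] = action()  # record the result only after success
    return history


# Usage: the second step fails once, as if a host crashed mid-flow.
calls = []
fail_once = {"join_cluster"}

def make_step(name):
    def action():
        calls.append(name)
        if name in fail_once:
            fail_once.discard(name)
            raise RuntimeError(f"{name} failed")
        return f"{name}: done"
    return action

steps = [(n, make_step(n)) for n in ("create_vms", "join_cluster", "install_autoscaler")]
history = {}
try:
    run_provisioning(steps, history)
except RuntimeError:
    pass  # simulated crash; `history` keeps the steps that completed
run_provisioning(steps, history)  # resumes at join_cluster
print(calls)
```

`create_vms` runs only once even though the flow was started twice.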
Temporal can route Activity execution to a specific process or host, which is useful for many provisioning scenarios.
Many resource management operations require locking to ensure that only one mutation is executed on any given resource at a time. Temporal provides a strong uniqueness guarantee via the operation identifier (the Workflow ID): within a Namespace, only one open Workflow Execution can exist for a given ID. This primitive enables the implementation of locking behavior in a fault-tolerant and scalable manner.
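Here is a minimal sketch of how a uniqueness guarantee turns into a lock. The in-memory `ExecutionRegistry` is a hypothetical stand-in for the server-side guarantee and is not a Temporal API. It rejects a second open execution with an ID that is already running. If you use the resource name as the ID, mutations on that resource are serialized.

```python
# Hypothetical in-memory stand-in for "at most one open execution per ID";
# not a Temporal API.
import threading


class AlreadyRunningError(Exception):
    pass


class ExecutionRegistry:
    def __init__(self):
        self._open = set()
        self._lock = threading.Lock()  # makes check-and-insert atomic

    def start(self, execution_id):
        with self._lock:
            if execution_id in self._open:
                raise AlreadyRunningError(execution_id)
            self._open.add(execution_id)

    def complete(self, execution_id):
        with self._lock:
            self._open.discard(execution_id)


registry = ExecutionRegistry()
# Use the resource name as the ID so only one mutation per resource runs at a time.
registry.start("db-cluster-42")
try:
    registry.start("db-cluster-42")
except AlreadyRunningError:
    print("rejected: a mutation is already running on this resource")
registry.complete("db-cluster-42")
registry.start("db-cluster-42")  # allowed again once the first one completes
```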
CI/CD. Implementing CI/CD pipelines and deploying applications to containers or virtual or physical machines is a non-trivial process. The logic has to deal with complex requirements around rolling upgrades, canary deployments, and rollbacks. Temporal is a perfect platform for building a deployment solution because it provides all the necessary guarantees and abstractions, allowing developers to focus on business logic.
Temporal can assist and augment existing DevOps automation, deployments, load testing, orchestration of real-time analytics, builds, and integration testing.
Managed deployments. Imagine that you have to create a self-operating database similar to Amazon RDS. Multiple projects use Temporal to automate the management and automatic recovery of products such as MySQL, Elasticsearch, Apache Cassandra, and HashiCorp Consul.
Kubernetes provisioning. Kubernetes deployments involve managing components on multiple levels. A Workflow could create virtual machines, join them into a Kubernetes cluster, and install components such as an autoscaler. If you support this Workflow on multiple cloud providers, you have to deal with each provider's specifics while still keeping the pipeline for these infrastructure management tasks resilient.
Here are some real-world use cases of infrastructure provisioning:
- Using Temporal Workflows to spin up Kubernetes, a blog post by Banzai Cloud
- Using Temporal to orchestrate cluster lifecycle in HashiCorp Consul, a video by HashiCorp
Most businesses have to deal with managing complex monetary transactions and transfers, including:
- Handling consumers' subscriptions, installment payments, and communications in a reliable and timely manner.
- Integrating with multiple payment systems and shopping platform backends.
- Detecting suspicious and fraudulent activities.
Similar to microservices orchestration, such workflows need a way to deliver transactional consistency, but across multiple third-party vendors. Each of these third-party systems is prone to failures, delays, or intermittent availability issues. Despite these challenges, the entire process represents a long-running transaction that must eventually complete in a predictable way.
In some cases, instead of trying to complete the process by retrying indefinitely, compensation (rollback) logic should be executed. The Saga pattern is one way to standardize compensation APIs.
Temporal provides an extensive toolset for dealing with the unpredictability of external services through reliable and transparent mechanisms: built-in execution guarantees, exponential Activity retries, and timeouts.
Temporal offers native Saga pattern support out of the box: simply define a compensation action for each Workflow Activity. That way, when a failure happens in one of the downstream services, compensation actions run for each of the Activities that previously succeeded.
Consider a Workflow that orchestrates two Activity calls: booking a hotel and reserving a flight. If the first Activity fails (after all configured retries are exhausted), the Workflow returns immediately.
However, if the first Activity (the hotel booking) succeeds but the second one (the flight reservation) fails, you need to cancel the already booked hotel to avoid unwanted charges. The Workflow's error handling therefore cancels the hotel reservation before the Workflow completes.
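The hotel-and-flight compensation flow described above can be sketched as follows. This is plain Python, not the Temporal SDK. The `Saga` helper and the `book_hotel`, `reserve_flight`, and `cancel_hotel` callables are hypothetical. Compensations run in reverse order of registration, which is the usual convention for sagas.

```python
# Illustrative sketch only, not the Temporal SDK. Saga, book_trip, and the
# activity callables are hypothetical.
class Saga:
    """Collects compensation actions and runs them in reverse order on failure."""

    def __init__(self):
        self._compensations = []

    def add_compensation(self, fn, *args):
        self._compensations.append((fn, args))

    def compensate(self):
        for fn, args in reversed(self._compensations):
            fn(*args)


def book_trip(book_hotel, reserve_flight, cancel_hotel):
    saga = Saga()
    hotel_id = book_hotel()  # if this fails, there is nothing to undo
    saga.add_compensation(cancel_hotel, hotel_id)
    try:
        flight_id = reserve_flight()
    except Exception:
        saga.compensate()  # cancel the hotel to avoid unwanted charges
        raise
    return hotel_id, flight_id


# Usage: the flight reservation fails, so the hotel booking is cancelled.
cancelled = []

def failing_flight():
    raise RuntimeError("no seats")

try:
    book_trip(lambda: "hotel-1", failing_flight, cancelled.append)
except RuntimeError:
    pass
print(cancelled)  # prints ['hotel-1']
```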
Most business applications rely on data processing pipelines of some sort:
- ETL process that moves data between databases.
- Machine learning training solution.
- Data aggregation and analytics.
- Staging data from a transactional database to a warehouse.
Many of these jobs are not pure data manipulation programs. They also need to enrich the data and tie relevant services together. For example, processing a record may require external API calls that can fail and potentially take a long time.
It is common to have large data sets partitioned across many hosts or databases, or billions of files in distributed storage. Running a myriad of data processing jobs in parallel is a hard engineering problem: you have to track their individual statuses, schedule them on available workers, and ensure that all the subtasks succeed.
Temporal provides hard guarantees around the durability of data and seamlessly deals with long-running operations, retries, and intermittent failures. Temporal handles the distributed nature of these systems automatically.
Temporal is an ideal solution for implementing a full scan of a dataset in a scalable and resilient way. The standard pattern is to run an Activity (or multiple parallel Activities for partitioned data sets) that performs the scan and heartbeats its progress back to Temporal. In the case of a host failure, the operation is retried on a different host and continues execution from the last reported progress.
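The heartbeat-and-resume pattern can be modeled like this. In Temporal, details recorded with an Activity heartbeat are made available to the next attempt of that Activity. In the sketch, a plain dict plays the role of the service that remembers the last reported progress. The `scan` function and its `heartbeat` and `last_progress` parameters are hypothetical and are not the SDK API.

```python
# Toy model of heartbeat-based resumption; not the Temporal SDK API.
# `heartbeat` and `last_progress` are hypothetical stand-ins.
def scan(records, process, heartbeat, last_progress=None):
    """Process records in order, resuming after `last_progress` if given."""
    start = 0 if last_progress is None else last_progress + 1
    for i in range(start, len(records)):
        process(records[i])
        heartbeat(i)  # report the index of the last fully processed record


progress_store = {}  # plays the role of the service remembering progress
processed = []
crash_at = {3}  # record that triggers a one-time "host failure"

def process(record):
    if record in crash_at:
        crash_at.discard(record)
        raise RuntimeError("host lost")
    processed.append(record)

def heartbeat(i):
    progress_store["last"] = i

records = list(range(6))
for attempt in range(2):  # in Temporal, the retry could run on a different host
    try:
        scan(records, process, heartbeat, progress_store.get("last"))
        break
    except RuntimeError:
        pass  # the next attempt continues from the last reported progress

print(processed)  # prints [0, 1, 2, 3, 4, 5]
```

No record is processed twice, because the retry starts right after the last index that was reported.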
One crucial feature of Temporal is its ability to route task execution to a specific process or host. It is often useful to control how ML models and other large files are distributed across hosts. For example, if an ML model is partitioned by city, requests should be routed to the hosts that hold the corresponding city's model.
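Routing of this kind can be pictured as one queue per partition, with each worker polling only the queue for the model it has loaded. Temporal expresses this with Task Queues. The sketch below is an in-memory model, not the SDK API. The `Router` class and the `model-<city>` naming scheme are hypothetical.

```python
# In-memory model of per-partition routing; not the Temporal SDK API.
# Router and the "model-<city>" queue names are hypothetical.
from collections import defaultdict, deque


class Router:
    """One queue per city; each worker polls only its own queue."""

    def __init__(self):
        self.queues = defaultdict(deque)

    @staticmethod
    def queue_for(city):
        return f"model-{city.lower()}"

    def submit(self, city, request):
        self.queues[self.queue_for(city)].append(request)

    def poll(self, queue_name):
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None


router = Router()
router.submit("Paris", "predict eta #1")
router.submit("Tokyo", "predict eta #2")

# A worker that has the Paris model loaded only ever sees Paris requests.
print(router.poll("model-paris"))  # prints predict eta #1
print(router.poll("model-paris"))  # prints None
```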
Many business processes naturally have a long duration and may run for hours, days, months, or even years:
- Expense approval process requiring manual intervention from a human.
- A process of labeling data for ML where an expert fills in metadata via a user interface.
- Fraud detection system, where workflows react to events generated by consumer behavior.
- Customer loyalty program where the workflow accumulates reward points and applies them when requested.
- User nurturing process that sends educational emails based on a schedule and past user activity.
It's typical to use a distributed asynchronous event-driven architecture in such scenarios. Unfortunately, this means that the code is now scattered across multiple handlers and does not resemble the structure of the original business process.
Temporal provides a holistic approach for implementing long-running Workflows. Workflow execution is paused and resumed by the runtime as required. Developers no longer have to focus on edge cases and boilerplate code that handles long periods of inactivity, and can instead concentrate exclusively on business logic.
Temporal has direct support for asynchronous events (called signals). Temporal's simple programming model handles much of the complexity around state persistence and ensures, through built-in retries, that external actions are executed.
A sophisticated Workflow may branch into several paths that run in parallel and later join back together, even if each branch takes days to run. With Temporal, it's straightforward to express complicated decision paths, since everything is represented as code rather than via a GUI or JSON configuration.
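The shape of such a Workflow is: wait for an external event, then fan out into parallel branches and join them. That control flow can be sketched with Python's `asyncio`. The sketch models only the control flow. In Temporal, the service would persist the waiting state, and the signal would be sent through the Temporal client. The `ExpenseWorkflow` class and its methods are hypothetical.

```python
# Control-flow sketch with asyncio; not the Temporal SDK.
# ExpenseWorkflow and its methods are hypothetical.
import asyncio


class ExpenseWorkflow:
    def __init__(self):
        self._decision = None
        self._decided = asyncio.Event()

    def approve(self, approved):
        """Signal handler: records a human decision (may arrive days later)."""
        self._decision = approved
        self._decided.set()

    async def run(self, amount):
        await self._decided.wait()  # park until the approval signal arrives
        if not self._decision:
            return "rejected"
        # Two independent branches run in parallel and are joined afterwards.
        paid, notified = await asyncio.gather(self._pay(amount), self._notify())
        return f"{paid}; {notified}"

    async def _pay(self, amount):
        await asyncio.sleep(0.01)  # stands in for a slow payment step
        return f"paid {amount}"

    async def _notify(self):
        await asyncio.sleep(0.01)  # stands in for sending a notification
        return "employee notified"


async def main():
    wf = ExpenseWorkflow()
    task = asyncio.create_task(wf.run(120))
    await asyncio.sleep(0.01)  # simulated delay before the approver acts
    wf.approve(True)
    return await task

result = asyncio.run(main())
print(result)  # prints paid 120; employee notified
```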
There is often a need for monitoring and periodic maintenance of IT systems on top of infrastructure provisioning. Polling means executing an action at regular intervals to check for a state change, for example:
- Pinging a host to make sure it's online and responsive.
- Once-per-minute health checks of a production deployment.
- Polling an API for a specific resource to become available.
- Triggering and executing periodic backups.
- Pushing configuration updates when they become available.
- Failing over in an active-passive setup when the primary instance becomes unhealthy.
As monitoring is often an example of periodic execution of business logic, it can benefit from Temporal's distributed cron engine.
Temporal provides guaranteed execution with at-least-once semantics and automatic retries.
Polling configuration can be as straightforward or sophisticated as needed:
- Workflows can run on a cron schedule with a single configuration setting.
- Alternatively, you can manually control the delays between intervals with `sleep` commands. For example, you can switch to more frequent executions in case of detected downtime.
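The option of controlling the delays yourself might look like the loop below. It checks more often while a problem is detected. The `poll` function, the interval values, and the injected `sleep` are hypothetical. A Temporal Workflow would use the SDK's durable sleep (a timer) instead of `time.sleep`.

```python
# Illustrative sketch; poll() and its intervals are hypothetical. Inside a
# Temporal Workflow, the SDK's durable sleep would replace time.sleep.
import time


def poll(health_check, rounds, normal_interval=60.0, degraded_interval=5.0,
         sleep=time.sleep):
    """Run `rounds` health checks, polling faster while the target is unhealthy.

    A real monitor would loop indefinitely; `rounds` keeps the example bounded.
    """
    results = []
    for _ in range(rounds):
        healthy = health_check()
        results.append(healthy)
        sleep(normal_interval if healthy else degraded_interval)
    return results


statuses = iter([True, False, False, True])
delays = []  # record delays instead of actually sleeping
print(poll(lambda: next(statuses), rounds=4, sleep=delays.append))
print(delays)  # prints [60.0, 5.0, 5.0, 60.0]
```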
Temporal records the event history of every periodic Workflow execution, which gives you visibility into each run.
Scalability is another crucial advantage of using Temporal for periodic execution. Many use cases require periodic execution for a large number of entities. At Uber, some applications run recurring Workflows for each customer. Imagine hundreds of millions of parallel cron jobs that don't require a separate batch processing framework.
Temporal's support for long-running Activities and unlimited retries also makes it a great fit for monitoring use cases.
Imagine a system that manages a large number of compute clusters. It checks that each cluster is up and running, monitors its CPU and network utilization, and runs backups and software upgrades.
You can model these operational Activities as one indefinitely running Workflow per cluster, which can live anywhere from minutes to years, depending on the cluster's lifetime. Each Workflow would run periodic Activities, react to user commands via signals, and coordinate multiple potentially conflicting operations.
With Temporal, you usually implement business logic with programming languages like Java and Go. However, there are cases when using a domain-specific language (DSL) can be more appropriate.
For example, you may have a legacy system that uses some form of DSL for process definitions but is not operationally stable or scalable. It could be a home-grown solution or a system like Apache Airflow, various BPMN engines, or even AWS Step Functions.
An application can utilize the Temporal SDK to interpret the DSL definition. This automatically makes the DSL execution highly fault-tolerant, scalable, and durable since it's running on Temporal. This means that users can migrate their existing portfolio of internal DSL-based process definitions and take advantage of Temporal as an execution engine.
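A DSL interpreter of this kind can be quite small. The sketch assumes a hypothetical JSON-like format with `activity`, `sequence`, and `parallel` nodes. In a real deployment, the interpreter would run inside a Temporal Workflow, and each `activity` node would become an Activity call. Here, activities are plain async Python functions, and `parallel` branches run concurrently via `asyncio.gather`. The `interpret` function and the definition format are illustrative only.

```python
# Illustrative DSL interpreter; the node format and interpret() are
# hypothetical and are not part of any Temporal API.
import asyncio


async def interpret(node, activities, results):
    """Walk a DSL node tree, calling named activities and storing their results."""
    kind = node["type"]
    if kind == "activity":
        name = node["name"]
        results[name] = await activities[name]()
    elif kind == "sequence":
        for child in node["steps"]:
            await interpret(child, activities, results)
    elif kind == "parallel":
        await asyncio.gather(*(interpret(c, activities, results) for c in node["branches"]))
    else:
        raise ValueError(f"unknown node type: {kind}")
    return results


definition = {
    "type": "sequence",
    "steps": [
        {"type": "activity", "name": "extract"},
        {"type": "parallel", "branches": [
            {"type": "activity", "name": "transform_a"},
            {"type": "activity", "name": "transform_b"},
        ]},
        {"type": "activity", "name": "load"},
    ],
}

def make_activity(name):
    async def activity():
        await asyncio.sleep(0)  # stands in for real I/O
        return f"{name} ok"
    return activity

activities = {n: make_activity(n) for n in ("extract", "transform_a", "transform_b", "load")}
print(asyncio.run(interpret(definition, activities, {})))
```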
If your company uses multiple workflow engines internally, it can be very beneficial to unify them with Temporal. For one, it is more efficient to support a single product instead of many. Additionally, it's hard to overestimate the benefit of sharing Activities across the company.
On top of that, Temporal comes with unmatched scalability and stability characteristics.
A typical pattern is to have a Workflow instance per business entity:
- A Workflow that tracks the status of a single IoT device.
- A loyalty program that accumulates reward points per customer.
- A routine that manages a unique resource in a conflict-free manner.
Each flow responds to asynchronous events from a target entity, persists some corresponding state, and takes actions according to the defined rules.
This programming paradigm is commonly known as the Actor Model.
Temporal Workflows are suitable to implement scalable actor systems. A Workflow execution represents a single actor, uses signals for events, and automatically keeps track of the state using the backend service.
There can be tens of millions of actors running simultaneously, and each actor will be in charge of processing messages for its corresponding entity.
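The actor pattern can be sketched with one object per entity that processes messages from its own mailbox, in order. This is an in-memory illustration of the loyalty-program example above. In Temporal, each actor would be a Workflow Execution, for instance one per customer keyed by Workflow ID. Messages would arrive as signals, and the service would persist the state. `LoyaltyActor` and `ActorSystem` are hypothetical names.

```python
# In-memory illustration of the actor pattern; LoyaltyActor and ActorSystem
# are hypothetical and are not Temporal APIs.
from collections import deque


class LoyaltyActor:
    """One actor per customer: owns its state and handles one message at a time."""

    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.points = 0
        self.mailbox = deque()

    def handle(self, message):
        kind, amount = message
        if kind == "earn":
            self.points += amount
        elif kind == "redeem":
            self.points -= min(amount, self.points)  # never go negative
        else:
            raise ValueError(f"unknown message: {kind}")

    def drain(self):
        while self.mailbox:
            self.handle(self.mailbox.popleft())


class ActorSystem:
    def __init__(self):
        self.actors = {}

    def send(self, customer_id, message):
        actor = self.actors.get(customer_id)
        if actor is None:  # create lazily on first message, like starting a Workflow
            actor = self.actors[customer_id] = LoyaltyActor(customer_id)
        actor.mailbox.append(message)
        actor.drain()


system = ActorSystem()
system.send("alice", ("earn", 100))
system.send("bob", ("earn", 30))
system.send("alice", ("redeem", 40))
print({cid: a.points for cid, a in system.actors.items()})  # prints {'alice': 60, 'bob': 30}
```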