Timers, Timeouts, and the Art of Waiting in Temporal

There's a question we see developers hit surprisingly often when they're first building with Temporal: should I use a timer or a timeout for this?

It seems like it should be obvious: both involve time, both involve waiting. But mixing them up causes some of the sneakiest bugs we’ve encountered, the kind of bugs where everything looks fine until a Worker is down for 20 minutes during a deployment, and then suddenly Workflows that were mid-flight just... die, with no cleanup and no explanation.

So let's talk through timers, Activity timeouts, and Workflow timeouts:

what they actually do
why they work the way they do
and how to choose the right tool for the job

The Core Mental Model: Detection vs. Business Logic

Here's the most important thing to understand before anything else:

Timeouts detect failure. Timers implement business logic.

They are not interchangeable. They look similar on the surface - you're setting a duration and waiting - but they're fundamentally different tools answering different questions.

A timeout answers: "Has something gone wrong?" A timer answers: "Is it time to do the next thing?"

Think of it like the difference between an oven timer and a smoke detector. Your oven timer fires because dinner is ready. Your smoke detector fires because you burned dinner. You'd never swap those two: using a smoke detector as a timer (unless you like extra extra well done lasagna) would result in no dinner for the night, and using your oven timer as a smoke detector is likely to be wildly inaccurate.

Keep that analogy in mind. It'll save you.

Timers: Durable Waiting, for Free

Temporal SDKs expose timer APIs. In most SDKs, it's just a Workflow.sleep() call that replaces whatever language-native sleep you'd otherwise reach for.

The reason you use Temporal's timer instead of your language's built-in sleep is durability. Temporal timers are persisted on the server. If your Worker goes down, the process crashes, or the entire cluster has a bad day while a timer is pending, the Workflow picks up right where it left off when things come back online. The timer still fires. Your code still runs.

A regular time.sleep(3600) in a long-running process would be lost if the process died. A Temporal timer survives it.
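To make the durability point concrete, here is a toy model in plain Python. It is not the Temporal SDK and says nothing about how Temporal stores timers internally; it only assumes the timer is saved as an absolute fire time, so a restarted process can work out how much waiting is left. The names `PersistedTimer`, `start_timer`, and `remaining_wait` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistedTimer:
    """Hypothetical record of a timer that has been saved outside the process."""

    fire_at: float  # absolute time, in seconds


def start_timer(now: float, duration: float) -> PersistedTimer:
    # Store the deadline rather than the remaining duration, so the record
    # stays meaningful no matter when it is reloaded.
    return PersistedTimer(fire_at=now + duration)


def remaining_wait(timer: PersistedTimer, now: float) -> float:
    # After a crash and restart, the remaining wait is recomputed from the
    # saved deadline; a timer whose deadline has passed is due immediately.
    return max(0.0, timer.fire_at - now)
```

In this model, a one-hour timer started at t=0 and reloaded after a restart at t=1500 still has 2100 seconds to go, and one reloaded at t=4000 is already due. An in-process sleep has no such record to reload.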

There's also a scaling benefit here that's easy to miss. Unlike a traditional timer, which would block the thread until the timer fires, Temporal Workers consume zero additional resources while waiting for a timer to fire. A single Worker can have millions of concurrent timers in flight. This is one of those areas where Temporal's design pays dividends you don't have to work for. You get massive concurrency on time-based waits basically for free.

A few practical notes on timers:

Duration can be anything from seconds to years. Need to send a follow-up email 30 days after signup? Want to check in on a long-running process every 4 hours? How about scheduling monthly payments on your mortgage for the next 30 years? Just set the timer and wait.
Don't rely on sub-second precision. Temporal timers are durable, server-side constructs. They're not going to be accurate to the millisecond, so think of any duration you set as a minimum. A 10-second timer will fire in at least 10 seconds, but probably a bit more, due to scheduling latency.
Timers are the right tool for any "wait, then do something" business logic. Expiring a session, sending a reminder, escalating an approval, closing an order that hasn't been confirmed - these are all timer jobs, not timeout jobs.

It is very common with Temporal to wait for either a Timer or input via a Signal or Update, acting on whichever comes first. For example, in Java:

// In one line of code we can wait for wall-clock time or an event
boolean approved = Workflow.await(timeout, () -> approvalData != null);

You can see this pattern in detail in the Approval Design Pattern in the Temporal Design Patterns repository (kudos to Tao Guo for creating this).
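For readers who think in Python, the same "whichever comes first" shape can be sketched with plain asyncio. This is only an analogy: it is not the Temporal SDK, and unlike a Temporal Timer the wait below is lost if the process dies. The function names and the short durations are chosen purely for illustration.

```python
import asyncio


async def wait_for_approval(approved: asyncio.Event, timeout_seconds: float) -> bool:
    """Return True if the event is set before the timeout elapses, else False."""
    try:
        await asyncio.wait_for(approved.wait(), timeout=timeout_seconds)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple:
    # Case 1: the "signal" arrives before the timer fires.
    early = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, early.set)
    first = await wait_for_approval(early, timeout_seconds=1.0)

    # Case 2: nothing arrives, so the timer wins.
    never = asyncio.Event()
    second = await wait_for_approval(never, timeout_seconds=0.05)
    return first, second
```

Running `asyncio.run(demo())` exercises both branches: the first wait ends because the event was set, the second because the time ran out.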

How Activity Timeouts Work

Activities are where your code actually talks to the outside world: calling APIs, writing to databases, processing files, and so on.

The names Start-to-Close, Schedule-to-Close, and Schedule-to-Start may at first appear odd or foreign to you, but once you take a step back and look at the Event History, they should make a lot more sense.

Let's review the Events that get emitted during an Activity Execution in Temporal. An Activity Execution moves through three stages. The Scheduled Event is recorded when the Activity Task is added to the Task Queue. It is followed by the Started Event when the task is matched with the Worker that will execute it. Finally, the Activity enters a Closed state once execution has concluded.

[Figure: How Activity Timeouts Work]

Now, in the Event History you won’t see an Event called Activity Task Closed. Instead, you’ll see an Event describing how the Activity ended, such as Completed, Failed, Canceled, or Timed Out. Each of these outcomes, successful or not, represents a Closed state of the Activity: for whatever reason, the Activity has concluded execution. It is these three quasi-states, Scheduled, Started, and Closed, that hint at what the Timeouts below measure.
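As a rough illustration of how the three quasi-states map onto the timeout names, the sketch below computes the three intervals for a single attempt from its timestamps. It is plain Python with made-up names (`AttemptTimestamps`, `intervals`), and it assumes one attempt with no retries; with retries, Schedule-to-Close would span from the first Scheduled time to the final Close.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass(frozen=True)
class AttemptTimestamps:
    """Hypothetical timestamps for one Activity attempt."""

    scheduled: datetime  # task was added to the Task Queue
    started: datetime    # a Worker picked the task up
    closed: datetime     # the attempt completed, failed, or timed out


def intervals(t: AttemptTimestamps) -> Dict[str, timedelta]:
    # Each timeout name is literally "from this state to that state".
    return {
        "schedule_to_start": t.started - t.scheduled,
        "start_to_close": t.closed - t.started,
        "schedule_to_close": t.closed - t.scheduled,
    }
```

For an attempt scheduled at 12:00:00, started at 12:00:05, and closed at 12:00:25, this gives 5 seconds in the queue, 20 seconds of execution, and 25 seconds overall.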

In case you’re curious how this works: in the Temporal Service, each Timer and Activity Timeout is stored as an entry in Temporal's internal timer queue, which is essentially a sorted set (think skip list or min-heap) keyed by the fire_at timestamp. Temporal processes the timer queue and takes action when a timer is ready to fire. For more details, there’s a great architecture document describing how it works.
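The "sorted set keyed by fire time" idea can be sketched with Python's `heapq` module. This is a toy model for intuition only, not Temporal's implementation; the class name `TimerQueue` and its methods are invented for the example.

```python
import heapq
from typing import List, Tuple


class TimerQueue:
    """Toy min-heap of (fire_at, timer_id) pairs; the earliest deadline is on top."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def add(self, fire_at: float, timer_id: str) -> None:
        heapq.heappush(self._heap, (fire_at, timer_id))

    def pop_due(self, now: float) -> List[str]:
        # Pop every timer whose deadline has passed, earliest first.
        due: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            _, timer_id = heapq.heappop(self._heap)
            due.append(timer_id)
        return due
```

Because only the top of the heap needs to be inspected, checking for due timers stays cheap no matter how many timers are pending, which fits the earlier point that waiting timers cost very little.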

Four Knobs, Each for a Different Purpose

Because the outside world is unreliable, Temporal gives you four different timeout settings to control how you handle that unreliability and detect failures.

The four timeouts are:

Start-To-Close Timeout: set this to longer than the longest your Activity should ever run in a single attempt
Schedule-To-Close Timeout: set this when you have a hard deadline
Schedule-To-Start Timeout: rarely needed; useful for diagnosing Worker or queue problems
Heartbeat Timeout: essential for long-running Activities

When you set Timeouts on an Activity, you must set at least one of Start-To-Close or Schedule-To-Close. You can set more than one, or even all four. When an Activity Timeout fires, Temporal tries to cancel the running attempt and then either starts a new attempt or fails the Activity so the Workflow can move on. More on that below.
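The "at least one of these two" rule is easy to express as a check. The sketch below is plain Python with a hypothetical `ActivityTimeouts` record; the real SDKs enforce this rule themselves when you schedule an Activity, so this is only a way to see the rule written down.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ActivityTimeouts:
    """Hypothetical container for the four Activity timeout settings."""

    start_to_close: Optional[timedelta] = None
    schedule_to_close: Optional[timedelta] = None
    schedule_to_start: Optional[timedelta] = None
    heartbeat: Optional[timedelta] = None


def is_valid(timeouts: ActivityTimeouts) -> bool:
    # Schedule-To-Start and Heartbeat are optional extras; one of the two
    # "to-close" timeouts is what makes the configuration acceptable.
    return (
        timeouts.start_to_close is not None
        or timeouts.schedule_to_close is not None
    )
```

A configuration with only a Heartbeat Timeout fails the check, while adding either Start-To-Close or Schedule-To-Close makes it pass.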

All this talk of limited time, multiple tries, and getting killed makes me think of a “reset the day” survival movie – and the most apropos in this genre is Happy Death Day. In that movie, our heroine has someone out to murder her, but she keeps coming back to the start of the day when she dies. She can go through the day until she gets murdered at 9:00 PM. She doesn’t have unlimited restarts - eventually her injuries accumulate so she can’t keep retrying forever. This is a good analogy for how Activity Timeouts work.

[Figure: Our heroine Tree being stalked by the Killer in Happy Death Day]

Let's walk through each Timeout and how to use them.

Start-To-Close Timeout

The first is the Start-to-Close Timeout. This limits how long a single attempt of an Activity can run. The clock starts when a Worker picks up the Activity task and stops when it completes (or fails, or is cancelled).

This is the timeout you should almost always set. It bounds each individual execution, and importantly, it resets on every retry. If your Activity fails and retries, the new attempt gets a fresh Start-To-Close window.

From an Event History perspective, this is the time from when a single Activity Task is started to when it finishes.

[Figure: Start-To-Close Timeout]

Start-To-Close timeouts are like how the heroine of the movie has only so much time in her day to figure out who the murderer is before her day resets. Just like the day reset in the movie, when the Start-To-Close timeout fires, the Activity will be retried if allowed by the Retry Policy.

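import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities"; // assumed module path

const { processPayment } = proxyActivities<typeof activities>({
  startToCloseTimeout: "30 seconds",
});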

We recommend setting this to slightly longer than your Activity could ever take if it were to succeed on a single attempt. This Timeout lets you detect unusually long-running attempts and restart them after they would no longer reasonably succeed.

Schedule-To-Close Timeout

Next is the Schedule-to-Close Timeout. This limits the total time for an Activity execution, including all retries. The clock starts when the Activity is first scheduled and doesn't reset between attempts.

This is the limit our movie heroine has before she can’t try anymore: she’s taken too much time to figure out the murderer and she dies permanently. This we usually don’t want to happen - either to our movie heroine or our activities. But sometimes you need to give up, and go to slasher movie heroine heaven, or proceed with your business workflows. Hence, the Schedule-to-Close timeout.

[Figure: Schedule-To-Close Timeout]

Use this when you have a genuine business-level deadline: "this payment must be processed within 10 minutes, no matter how many retries it takes." If the 10-minute window closes, Temporal will stop retrying and fail the Activity.

One important consideration: if you set a Schedule-To-Close timeout that's shorter than a single Activity attempt might take, your retries can get killed prematurely even if each individual attempt was making reasonable progress. Set it to account for both execution time and the total time you're willing to spend retrying.

from datetime import timedelta
from temporalio import workflow

# Inside a Workflow method:
result = await workflow.execute_activity(
    process_payment,
    payment_info,
    schedule_to_close_timeout=timedelta(minutes=10),
    start_to_close_timeout=timedelta(seconds=30),
)

You can use both together: Start-To-Close caps each attempt, Schedule-To-Close caps the whole effort. That's often the right combination.
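To see how the two limits interact, here is a small simulation in plain Python. It is a simplified model, not Temporal's retry logic: it assumes a fixed delay between attempts (real Retry Policies typically back off), and the names `Outcome` and `simulate` are invented for the sketch. Each entry in `attempt_durations` is how long that attempt would need to succeed; use `float("inf")` for an attempt that hangs.

```python
from typing import List, NamedTuple


class Outcome(NamedTuple):
    status: str     # "completed", "schedule_to_close_timeout", or "attempts_exhausted"
    attempts: int   # number of attempts that were started
    elapsed: float  # simulated seconds since the Activity was first scheduled


def simulate(
    attempt_durations: List[float],
    start_to_close: float,
    schedule_to_close: float,
    retry_delay: float = 1.0,
) -> Outcome:
    elapsed = 0.0
    attempts = 0
    for index, duration in enumerate(attempt_durations):
        if index > 0:
            elapsed += retry_delay  # assumed fixed pause before each retry
        remaining = schedule_to_close - elapsed
        if remaining <= 0:
            break  # overall deadline already passed; no further attempts
        attempts += 1
        # Each attempt gets a fresh Start-To-Close window, capped by whatever
        # is left of the Schedule-To-Close budget.
        allowed = min(start_to_close, remaining)
        if duration <= allowed:
            return Outcome("completed", attempts, elapsed + duration)
        elapsed += allowed
    if elapsed >= schedule_to_close:
        return Outcome("schedule_to_close_timeout", attempts, schedule_to_close)
    return Outcome("attempts_exhausted", attempts, elapsed)
```

With a 30-second Start-To-Close and a 600-second Schedule-To-Close, two hung attempts followed by a 5-second success complete on the third attempt. Shrink Schedule-To-Close to 100 seconds with attempts that always hang, and the fourth attempt is cut short by the overall deadline rather than by its own window.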

Schedule-To-Start Timeout

This limits how long an Activity task can sit in the queue before a Worker picks it up. If no Worker claims the task within this window, it times out. This is the duration between the Scheduled and Started events in the history.

[Figure: Schedule-To-Start Timeout]

This one is rarely needed. Its main use case is failing an Activity that needs to succeed-or-fail in a certain time window - even if Workers are down or your task queue is unexpectedly backed up. If you're routing activities to specific task queues and want to detect quickly when the right Workers aren't available, this is your tool.

For most use cases: skip this one.

Heartbeat Timeout

Some Activities are long-running, such as:

processing a large file
polling an external system
running a batch process over multiple elements

For these, you need a way for Temporal to detect if the Activity has silently died. Heartbeating is the mechanism: your Activity code periodically calls a heartbeat function to say "still alive, still working." The Heartbeat Timeout defines how long Temporal will wait between heartbeats before considering the Activity lost.

This matters because without heartbeating, Temporal can only detect a crashed or hung Activity when the Start-To-Close Timeout expires. If you have a two-hour Activity and it hangs after five minutes, you'd wait nearly two hours to find out. With a 30-second Heartbeat Timeout, you'd know within about 30 seconds of the last heartbeat.
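The arithmetic behind that comparison is short enough to write out. The helper below is an illustrative calculation in plain Python, not an SDK call; the name `time_to_detect_hang` is made up, and it assumes the last heartbeat arrived right before the hang, which makes the result an upper bound on the detection delay.

```python
from datetime import timedelta
from typing import Optional


def time_to_detect_hang(
    hang_after: timedelta,
    start_to_close: timedelta,
    heartbeat_timeout: Optional[timedelta] = None,
) -> timedelta:
    """Upper bound on how long a hang goes unnoticed after it begins."""
    # Without heartbeats, only the Start-To-Close Timeout can catch the hang.
    until_start_to_close = start_to_close - hang_after
    if heartbeat_timeout is None:
        return until_start_to_close
    # With heartbeats, whichever timeout expires first catches it.
    return min(heartbeat_timeout, until_start_to_close)
```

For the two-hour Activity that hangs after five minutes, this gives 1 hour 55 minutes without heartbeating and 30 seconds with a 30-second Heartbeat Timeout.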

Heartbeats also let you pass checkpoint data, so if an Activity does fail and retry, it can resume from where it left off rather than starting over from scratch.

activity.RecordHeartbeat(ctx, progress)

If your Activity runs for more than a minute or two, seriously consider heartbeating.
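The checkpoint-and-resume idea can be shown without any SDK at all. In the plain-Python sketch below, `record_heartbeat` stands in for the SDK's heartbeat call and a dictionary stands in for the heartbeat details that Temporal would hand to the next attempt; both are assumptions made for illustration, as are the function names.

```python
from typing import Callable, List, Sequence


def process_batch(
    items: Sequence[str],
    resume_from: int,
    handle: Callable[[str], None],
    record_heartbeat: Callable[[int], None],
) -> None:
    """Process items starting at resume_from, checkpointing after each one."""
    for index in range(resume_from, len(items)):
        handle(items[index])
        # The heartbeat detail is the index of the next unprocessed item.
        record_heartbeat(index + 1)


def demo() -> dict:
    items = ["a", "b", "c", "d"]
    processed: List[str] = []
    state = {"checkpoint": 0, "crashed": False, "resume_points": []}

    def handle(item: str) -> None:
        # Simulate a single crash the first time item "c" is handled.
        if item == "c" and not state["crashed"]:
            state["crashed"] = True
            raise RuntimeError("simulated crash")
        processed.append(item)

    def record_heartbeat(next_index: int) -> None:
        state["checkpoint"] = next_index

    for _ in range(2):  # first attempt plus one retry
        state["resume_points"].append(state["checkpoint"])
        try:
            process_batch(items, state["checkpoint"], handle, record_heartbeat)
            break
        except RuntimeError:
            continue

    return {"processed": processed, "resume_points": state["resume_points"]}
```

The retry starts at index 2 instead of 0, so items "a" and "b" are not processed a second time.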

Zombies Detest a Beating Heart

Now, some of you may say to yourself, “Heartbeating sounds more complicated, I can just let my long-running Activities time out instead.” And it’s this trap that leads to zombies eating your brainz.

Without heartbeating, a crashed or hung Activity may become a zombie. The Worker that was executing it has died or stalled, but Temporal has no way of knowing which. A Timeout causes two things to happen: the Temporal Service reschedules the Activity for another Worker to execute, and a cancellation request is sent to the Activity that timed out. The important part is that cancellation requests are delivered via the Heartbeat mechanism. This means that if you aren’t heartbeating, your timed-out Activity won’t receive the cancellation and may continue executing.

But wait! As with all zombie movies, it gets worse!

What if that Activity never becomes unstuck? Well, then it’s going to bog down your Workers. Every Worker has a finite number of Slots that are reserved while execution is taking place and released when execution has concluded. As stuck Activities pile up, your Workers have fewer and fewer free Slots, until eventually every Slot is occupied and the Worker deadlocks. At this point, the only thing to do is to reboot the Worker process.

So defeat Zombies with a beating heart.

Here There Be Monsters: Workflow Timeouts

Here's where a lot of developers get into trouble: using Workflow-level timeouts to implement business logic, or using them as a substitute for timers.

The official Temporal guidance is blunt about this: we generally do not recommend setting Workflow timeouts. Workflows are designed to be long-running and resilient. Setting a timeout can undermine that design.

There are three Workflow timeouts:

Workflow Execution Timeout: Maximum time for the entire execution, including retries and Continue-As-New. Default is infinite.
Workflow Run Timeout: Maximum time for a single run (not counting retries). Defaults to the Execution Timeout.
Workflow Task Timeout: Maximum time a Worker has to process a single Workflow task. Default is 10 seconds. Don't increase this without a specific reason.

The critical reason to treat these with care: Workflow timeouts are enforced server-side, and your Workflow code gets no notification. When the timeout fires, the server terminates the execution. Your Workers never hear about it. There is no opportunity to run cleanup logic, send notifications, cancel in-flight activities gracefully, or do anything at all.

A Temporal community member asked about this not long ago. They wanted to perform cleanup when a Workflow timed out. The answer was clear: use a timer inside the Workflow. Workflow timeouts are like a landlord who terminates your lease with zero notice. A timer inside your Workflow gives you the chance to handle the end of the lease gracefully.

There are a few cases where a Workflow Execution Timeout makes sense, such as capping a Temporal Workflow (for example, a scheduled job) that should stop running after a certain amount of total time has passed so the next run can begin. That's arguably a legitimate use case. For anything involving business logic that should happen when time is up, use a timer instead.

A Common Mistake, and How to Fix It

Let me walk through a real pattern I've seen go wrong.

A developer wants their Workflow to give up after 48 hours if it hasn't finished; most critically, it needs to receive a signal before then. They set a workflowRunTimeout of 48 hours.

The problem: that timeout fires even if the Workers are down. A signal could arrive while the Workers are offline, and the server will time out the Workflow before the Workers ever get a chance to process the signal. The signal is lost, the Workflow is terminated, and there's no record of what happened or why. Also, the Workflow might want to handle that timeout and do something useful with it, like set an informative response.

The solution is to use timers instead:

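import asyncio
from datetime import timedelta

from temporalio import workflow

# Method on a Workflow class; self.signal_received is set by a signal handler
async def my_workflow(self) -> str:
    # Timer-based expiry: survives Worker restarts,
    # allows cleanup, can be extended via signal
    try:
        # wait_condition returns None once the condition is true and
        # raises asyncio.TimeoutError if the timeout elapses first
        await workflow.wait_condition(
            lambda: self.signal_received,
            timeout=timedelta(hours=48),
        )
    except asyncio.TimeoutError:
        # Timer fired before signal arrived, handle it gracefully
        await workflow.execute_activity(
            notify_requester_of_timeout,
            schedule_to_close_timeout=timedelta(minutes=2),
        )
        return "timed_out"

    return self.process_signal_result()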

This version does something the timeout version couldn't: it lets you react when the time limit is reached. You can notify stakeholders, record the outcome, cancel any pending work, and complete cleanly.

Sleeping Before Beginning: Workflow Start Delay

There is one other option to mention on the topic of “waiting in Temporal”: Workflow Start Delay. If your Workflow would otherwise begin by sleeping on a Timer, use a Start Delay instead. Here’s an example of using the start delay parameter:

workflowOptions := client.StartWorkflowOptions{
  // ...
  // Start the workflow in 12 hours
  StartDelay: 12 * time.Hour,
  // ...
}
workflowRun, err := c.ExecuteWorkflow(context.Background(), workflowOptions, YourWorkflowDefinition)
if err != nil {
  // ...
}

This starts the workflow in a dormant mode, and it “wakes up” when the start delay ends.

Think of this like Cthulhu: you summon it, it waits in the deep, and when the stars align it wakes and nothing can stop it.

The Decision You Actually Need to Make

Here's a simple cheat sheet:

What you're trying to do	What to use
Wait for N days, then send an email	Timer
Cap how long a single Activity attempt runs	Start-To-Close Timeout
Set a hard deadline across all retries	Schedule-To-Close Timeout
Detect if your task queue has no Workers	Schedule-To-Start Timeout
Detect a silently hung long-running Activity	Heartbeat Timeout
Run cleanup logic when a time limit is reached	Timer (cancel, log, compensate, then complete)
Cap a Temporal cron job's total run time	Workflow Execution Timeout
Start a workflow after a delay	Workflow Start Delay

When in doubt: if you're thinking about time because of something the business cares about, use a timer. If you're thinking about time because of something that might have gone wrong, use a timeout.

Wrapping Up

Temporal gives you an unusually rich set of time-management tools. The trade-off is that there are enough of them that you need to understand what each one does.

The short version:

Timers are for business logic: wait, then act.
Activity Timeouts are for failure detection: bound how long an operation can run and detect problems quickly.
Workflow Timeouts are for hard infrastructure limits: use sparingly, and never for anything your application code needs to react to.

And if you ever catch yourself adding a Workflow Execution Timeout to implement "the order expires after 24 hours" logic, reach for a timer instead, and let the Workflow react to the timer expiring, with business logic. Your future self (and your on-call rotation) will thank you.

Links and Further Reading:

Self-paced course: Crafting an Error Handling Strategy
Temporal Docs: Timers
Temporal Docs: Activity Timeouts
Temporal Docs: Workflow Timeouts
Temporal Docs: Heartbeating
4 Types of Activity Timeouts - Whiteboard Session with Maxim Fateev
Cancellation Scope Sample (Java)
For more spooky-themed Temporal guidance, check out this Temporal Anti-Patterns Blog