At Temporal, we have a recurring event dubbed Temporal University where the team explores the ins and outs of Temporal’s architecture. I have done a few presentations but one stands out as my personal favorite: Small-Step Operational Semantics of Temporal’s Durable Executions.

Small-step operational semantics (SOS) are often used for the specification and verification of programming languages — although SOS can be used for the specification and verification of entire systems.

In this post, I will design SagaLang, a tiny “programming language” for the development of sagas to illustrate the ideas behind SOS.

Sagas

In the paper Sagas from 1987, H. Garcia-Molina and K. Salem introduce sagas as long-running transactions, that is, a sequential composition of transactions with transaction-like guarantees:

A saga is a sequence of steps, any partial execution is undesirable.

A saga shall guarantee to execute observably equivalent to exactly once or observably equivalent to not at all.

In the absence of failure, this guarantee is trivially met. However, in the presence of failure, a saga must implement failure mitigation strategies:

  • Forward recovery attempts to mitigate any partial execution by moving forward, a common failure mitigation strategy that comes in the form of retries.

  • Backward recovery attempts to mitigate any partial execution by rolling backward, a common failure mitigation strategy that comes in the form of compensation.

A common failure mitigation strategy is to combine forward recovery and backward recovery. Attempt forward recovery and subsequently attempt backward recovery if the failure persists.

Informally, this failure mitigation strategy amounts to “Let’s try as hard as we can to make this happen (retries). If we can’t, let’s not leave a mess (compensation).”

SagaLang

SagaLang is a tiny programming language that allows you to specify a saga as a sequence of transactions along with their allotted retries and corresponding compensations.

SagaLang is minimal, with no variables, no conditionals, no loops, and no exception handling. Just enough to demonstrate the ideas of SOS.

SOS relates programs (the definition) to computation (the execution) by specifying how the program determines the execution one step at a time:

  • A program is a sequence of instructions, sometimes called statements, generating a computation.

  • A computation is a sequence of states and state transitions constrained by the program.

SOS bridges the gap between the program and its computation: SOS provides a set of rules that describe how the program executes, step by step.

In this blog post, I’ll introduce SagaLang in three iterations. First, a basic execution model, then sagas with forward recovery, and lastly sagas with backward recovery.

1. Basic SagaLang

The first iteration of SagaLang is straightforward: A Saga S is either a transaction s followed by another Saga, or the empty sequence ε.

S = s; S | ε

To get started, let’s define a Saga P as the sequence of three transactions a, b, and c.

P = a; b; c; ε

SOS is often presented as inference rules, which consist of an upper part (above the line) and a lower part (below the line):

  • The part above the line is called the antecedent, which represents the conditions that must be met.

  • The part below the line is called the consequent, which represents the state transition.

Here, each transition is labeled. The label allows us to build a trace, that is, a history of the execution.

Anatomy of an inference rule white text

Figure 1. Anatomy of an inference rule

Figure 2. shows the interference rules for the basic version of SagaLang, three rules are sufficient to specify its execution.

Interference rules for Basic SagaLang

Figure 2. Interference rules for Basic SagaLang

  • ① For a non-empty saga ⟨s; S⟩, if the transaction s executes successfully, we proceed with the continuation ⟨S⟩.
  • ② For an empty saga ε, without any condition, we proceed to complete the saga as a success.
  • ③ For a non-empty saga ⟨s; S⟩, if the transaction s fails, we proceed to complete the saga as a failure.

In the absence of a failure, our saga executes trivially exactly once and to completion:

trace(P) = a • b • c • success

However, in the presence of failure, our saga simply terminates without any attempt at correction. For example, if P fails after transaction b attempting to execute transaction c, the trace of P reveals the partial application:

trace(P) = a • b • failure

So far, our saga does not have transaction-like execution guarantees. Not even close.

2. SagaLang w/ Forward Recovery

The next iteration of SagaLang includes forward recovery, so we do not give up so easily in the presence of failure: A Saga S is either a tuple of a transaction s and a natural number n followed by another Saga, or the empty sequence ε.

S = (s × n); S | ε

Our previous sample saga P is now a sequence of three transactions a, b, and c and their retry budgets:

P = (a × 1); (b × 1); (c × 1); ε

Figure 3. shows the interference rules for this version of SagaLang, four inference rules are necessary to specify its execution.

SagaLang w/ Forward Recovery

Figure 3. SagaLang w/ Forward Recovery

Here, I’ll only discuss the major differences to the previous version:

  • ③ For a non-empty saga ⟨s; S⟩, if the transaction s fails and we still have retry budget, i.e. n > 0, we proceed with the continuation ⟨(s × n-1); S⟩.
  • ④ For a non-empty saga ⟨s; S⟩, if the transaction s fails and we do not have retry budget, i.e. n = 0, we proceed to complete the saga as a failure.

As before, in the absence of a failure, our saga executes trivially exactly once and to completion. Even in the presence of failure, our saga has a chance to execute observably equivalent to exactly once, for example:

trace(P) = a • a • b • c • success

However, in the presence of failure, we still may run out of our retry budget ultimately terminating without any further attempts of correction.

trace(P) = a • a • failure

But we need the cooperation of the developer: Forward recovery relies on the principle of idempotence, that is, the effect of invoking and potentially executing a multiple times must be equivalent to the impact of executing a exactly once — something our SOS does not express.

a … • a ≈ a

So far, our saga stands a better chance at success, yet still does not have transaction-like execution guarantees.

3. SagaLang w/ Backward Recovery

The last iteration of SagaLang includes backward recovery, now we do not give up in the presence of failure until we absolutely have to: A Saga S is either a triple of a transaction s, a compensation c, and a natural number n followed by another Saga, or the empty sequence ε.

S = (s × c × n); S | ε

Our previous sample saga P is now a sequence of three transactions a, b, and c, their compensating transactions, and their retry budgets:

P = (a × ¬a × 1); (b × ¬b × 1); (c × ¬c × 1); ε

Figure 4. shows the interference rules for this version SagaLang, our inference rules are much more involved to specify its execution.

SagaLang w/ Backward Recovery

Figure 4. SagaLang w/ Backward Recovery

Again, I’ll only discuss the major differences to the previous version:

  • ① For a non-empty saga ⟨(s × c × n); S⟩, if the transaction s executes successfully, we proceed with the continuation ⟨S⟩ and add the compensation c to the sequence of compensations ⟨C⟩.
  • ④ For a non-empty saga ⟨s; S⟩, if the transaction s fails and we do not have retry budget, i.e. n = 0, we proceed to replace the saga with the accumulated sequence of compensations ⟨C⟩.
  • ⑦ For a non-empty compensation ⟨c; C⟩, if the compensation c fails, we proceed to complete the saga as a failure.

Of course, we could add “forward recovery to the backward recovery,” that is, we could add retries to the compensations. However, that is a trivial, mechanical extension and would not add any new insights into SOS.

As before, in the absence of a failure, our saga executes trivially exactly once and to completion, while in the presence of failure, our saga has a chance to execute observably equivalent to exactly once.

However, now even in the presence of failure and no retry budget left, we have a chance to execute observably equivalent to not at all:

trace(P) = a • b • b • ¬a • success

But once again we need the cooperation of the developer: Backward recovery relies on the principle that the effect of executing a and later executing its compensation ¬a is equivalent to not executing a in the first place — again something our SOS does not express.

a • ¬a ≈ ε

However, if any compensation fails, all is lost 💀

Conclusion

Small-step operational semantics give meaning to definitions by providing a precise specification of their execution.

SOS is often used for the specification and verification of programming languages but they are equally capable as a thinking and communication tool.

And in case you want to dive deeper into sagas themselves, check out my recent webinar on Temporal’s Youtube channel.

https://youtu.be/0W8BtIwh824