The saga pattern is a distributed systems design pattern for a task that spans machine or microservice boundaries and must execute all of its steps; partial execution is not acceptable. A common real-life example used to explain when the saga pattern is useful is trip planning. If you’re planning on attending Replay, for example, you’d need to book a conference ticket, an airplane ticket, and a hotel. If you fail to acquire any one of these things, you’ll miss out on meeting fun people in backend engineering face-to-face.
Below the surface, there are two main ways microservices can talk to one another that make your saga possible: choreography and orchestration.
Choreography is analogous to ants in an ant colony. Like ants, each microservice has local knowledge, and shares information about state changes with other services via chemical signals called pheromones (I mean, via message passing). Just as an ant trail to food emerges organically from pheromones, the overall behavior of a system of choreographed microservices emerges organically from each microservice’s instructions.
One tenet drilled into every software engineer’s head is the value of decoupling. Choreography embodies this idea and is straightforward to implement. It can be a popular, easy choice for systems that are incrementally moving from a monolith to a microservices architecture. However, if your tasks have any sort of ordering requirement, such as ordered steps in your saga, choreography can get unwieldy fairly quickly. Suppose we want to book the plane first so that the hotel can know your flight number and pick you up from the airport. Then we book our conference ticket (maybe there’s a discount with certain hotels). Each service would then respond to a message emitted by the service before it: the flight service reacts to the trip request, the hotel service reacts to the flight booking, and the conference service reacts to the hotel booking.
However, just from looking at each microservice’s individual codebase, it’s difficult to understand the order the system should follow, since that ordering is distributed throughout the code. This leads to all sorts of higher-level business logic diagrams that need to be kept in sync with the code…but wouldn’t it be better if the code were just easier to read in the first place? It can also be difficult to debug the exact sequence of events that led to a bug, since control flow is not immediately clear. So, unless all of your microservices are truly independent of one another and don’t have any sort of “happens before” logic, consider using orchestration instead.
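To make the point concrete, here is a minimal sketch of the flight, hotel, conference ordering under choreography. It is an illustration, not a production design: `EventBus`, the event names, the `register_*` functions, and the sample values such as `XY123` are all hypothetical, and a synchronous in-process bus stands in for a real message broker.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process stand-in for a real message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.history = []  # event names, in the order they were published

    def subscribe(self, event, handler):
        self._handlers[event].append(handler)

    def publish(self, event, payload):
        self.history.append(event)
        # Deliver synchronously to every subscriber of this event.
        for handler in list(self._handlers[event]):
            handler(payload)


def register_flight_service(bus):
    def on_trip_requested(payload):
        # Made-up flight number, for illustration only.
        bus.publish("flight_booked", {**payload, "flight_number": "XY123"})

    bus.subscribe("trip_requested", on_trip_requested)


def register_hotel_service(bus):
    def on_flight_booked(payload):
        # The hotel needs the flight number to arrange the airport pickup.
        pickup = payload["flight_number"]
        bus.publish("hotel_booked", {**payload, "pickup_for": pickup})

    bus.subscribe("flight_booked", on_flight_booked)


def register_conference_service(bus):
    def on_hotel_booked(payload):
        bus.publish("conference_ticket_booked", {**payload, "ticket": "general"})

    bus.subscribe("hotel_booked", on_hotel_booked)


def plan_trip(attendee):
    bus = EventBus()
    # Registration order does not matter; the saga's ordering is implied
    # only by which event each service chooses to listen for.
    register_conference_service(bus)
    register_hotel_service(bus)
    register_flight_service(bus)
    bus.publish("trip_requested", {"attendee": attendee})
    return bus.history
```

Notice that no single function states "flight, then hotel, then conference ticket". To recover that order, you have to read every service and trace which event each one subscribes to.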
Orchestration, on the other hand, is like an air traffic control tower directing planes, where the planes are microservices. One service, a “super microservice” if you will, functions as the message broker, sending messages directly to individual microservices to tell them what to do, just as planes wait for permission to take off.
Because orchestration centralizes control flow, debugging and understanding control flow are much simpler. Additionally, since each step doesn’t need to keep track of which “happens before” messages it needs to listen to, the code for individual microservices is much simpler. Orchestration also shines in situations where many services need to interact in a single saga step. The glaring Achilles’ heel of this method is that bane of all distributed systems: the message broker is a single point of failure.
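For comparison, here is the same trip sketched with a central orchestrator. Again, this is only an illustration under assumptions: the step functions, `SAGA_STEPS`, `orchestrate_trip`, and the sample values are hypothetical names, and plain function calls stand in for messages sent to separate services.

```python
def book_flight(trip):
    # Made-up flight number, for illustration only.
    return {**trip, "flight_number": "XY123"}


def book_hotel(trip):
    # The hotel uses the flight number to arrange the airport pickup.
    return {**trip, "pickup_for": trip["flight_number"]}


def book_conference_ticket(trip):
    return {**trip, "ticket": "general"}


# The entire ordering of the saga lives in this one list.
SAGA_STEPS = [book_flight, book_hotel, book_conference_ticket]


def orchestrate_trip(attendee):
    trip = {"attendee": attendee, "completed": []}
    for step in SAGA_STEPS:
        # The orchestrator tells each service what to do, in order.
        trip = step(trip)
        trip["completed"] = trip["completed"] + [step.__name__]
    return trip
```

Each step knows nothing about what happens before or after it. Changing the order means editing `SAGA_STEPS`, not hunting through every service. The cost is also visible: if the process running `orchestrate_trip` dies mid-loop, nothing else knows how far the saga got.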
Putting it all together
So to summarize, choreography:
- Is decentralized and decoupled
- Is good for highly independent microservices
- Is “easier” to implement, at least initially
- Is an easy choice for converting established monoliths to microservices
- Can make control flow unclear
- Can be challenging to debug
And orchestration:
- Has one service issuing “commands” to execute microservices
- Makes control flow easier to understand
- Is easier to build in greenfield applications
- Makes debugging and failure handling clearer
- Is “harder” to implement initially, but pays dividends later
- Has a single point of failure (the message broker)
The interesting tradeoff between these two approaches is that you want to reach for the light, agile option (choreography) in the early days to avoid over-architecting your project, but, counterintuitively, orchestration is often easier to build when you use it from the start.
So, what does Temporal do?
Temporal automatically orchestrates for you, but also avoids that crucial drawback of a single point of failure. How is such a thing possible? Internally, Temporal records your program’s progress in a log. If the machine running your program goes offline, your entire program’s history will have been saved, so another machine can start up exactly where your program left off, as if nothing happened. This makes Temporal completely horizontally scalable.
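The log-and-resume idea can be sketched in a few lines. To be clear, this is a toy illustration of the general technique and not Temporal’s actual implementation or API: `run_saga`, `make_steps`, `MachineCrash`, and the in-memory `log` dictionary (standing in for durable storage) are all hypothetical.

```python
class MachineCrash(Exception):
    """Hypothetical signal that the worker running the saga went offline."""


def run_saga(steps, log):
    """Run each (name, fn) step at most once, recording results in `log`.

    `log` stands in for durable storage that survives a machine failure.
    Returns the names of the steps executed during this call.
    """
    executed = []
    for name, fn in steps:
        if name in log:
            continue  # already recorded by an earlier run; do not redo it
        log[name] = fn(log)
        executed.append(name)
    return executed


def make_steps(crash_on_hotel):
    def book_flight(log):
        return "XY123"  # made-up flight number

    def book_hotel(log):
        if crash_on_hotel:
            raise MachineCrash("worker went offline")
        return "pickup for " + log["book_flight"]

    def book_conference_ticket(log):
        return "general"

    return [
        ("book_flight", book_flight),
        ("book_hotel", book_hotel),
        ("book_conference_ticket", book_conference_ticket),
    ]


def demo():
    log = {}
    try:
        run_saga(make_steps(crash_on_hotel=True), log)
    except MachineCrash:
        pass  # the first "machine" is gone, but the log survives
    # A second "machine" resumes from the same log.
    resumed = run_saga(make_steps(crash_on_hotel=False), log)
    return resumed, log
```

In `demo`, the second run skips `book_flight` because its result is already in the log, then finishes the remaining two steps. The flight is not booked twice, and the saga still completes.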
To bring this idea back to the saga pattern, an important component of the pattern is driving towards completion of all steps of the saga. Because Temporal ensures no progress is ever lost, it will pick up exactly where it left off, even after an outage of unknown length, and complete the saga with no extra code or heavy lifting on your part.
Additionally, unlike some orchestration engines, in Temporal, the logic of your workflow is expressed entirely in code, so you don’t have to deal with JSON or building graphs with your mouse. In essence, nothing additional is needed to make a robust, failure-resilient application other than the business logic of your application itself.
Choreography and orchestration provide different approaches to coordinating communication between microservices. Choreography is decoupled but can make debugging and control flow difficult to follow. Orchestration is more observable, debuggable, scalable, and centralized, but results in a single point of failure. Temporal uses orchestration under the covers, but by design safeguards against a single point of failure, allowing you to focus on writing your code with the confidence that it is failure resilient.