Because it's October—the spookiest time of year—I've used some of my favorite spooky movies to explain Temporal best practices and their corresponding anti-patterns in a fun, top 10 list format. If you haven't already seen it, please also check out Spooky Stories: Chilling Temporal Anti-Patterns (part 1).
If you prefer video/audio format, you can also check out the on-demand version of the original webinar.
4. The MEGA Workflow
What does Steve McQueen have to do with Temporal? Sometimes people build workflows that try to do everything: model 50 different processes, automate everything that a whole team or organization might want to do in a single workflow. And that can be very hard to manage, and might also eat a car or a train. ;-)
There is a better way! Instead of a single "blob" process, model processes that are simple and straightforward. If a process kicks off another process which kicks off yet another process, you're probably looking at three workflows rather than one big workflow that does it all.
How can you tell when you might be facing a "blob" workflow? Here are some questions to ask yourself:
- Can you keep this whole process in your head at one time?
- Does it adhere to the principles of Domain-Driven Design?
- Can it be supported with a two-pizza team?
- If something goes wrong, is it very easy to track down by workflow name which part of the application was the culprit?
If you answered "no" to one or more of these, the workflow is likely to be nerve-racking to operate in production and hard to maintain long-term. It might be better to take certain parts of it and push them into a "sub" workflow or a workflow that you call separately. Another approach is to abstract concepts away, such as moving a particularly complex piece into its own function, out of the main Workflow code.
There's also a new feature in Temporal called Nexus, which was highlighted during the Replay 2024 keynote. Nexus makes it a lot easier to break complex processes into simpler steps and to manage processes across teams.
In short: Workflows should model one process. Break sub-processes into separate workflows, use standard distributed systems patterns to manage them, and check out Nexus.
3. Arguing With Yourself
Let Workflows Manage Process Status
Sometimes people really love state machines, and they want to have one in their Temporal Workflow. There are valid use cases for this, but I've noticed that the state machine and the workflow can each end up holding their own list of valid next states, and sometimes the two conflict. At minimum, it's duplicate maintenance. At worst, the state machine may block valid workflow progress that could otherwise continue if there were no state machine. This can happen when the state machine has a bug or hits a state it didn't anticipate, for example.
A great resource about the general area of state management and Temporal is our State Machines Simplified technical guide.
If you're going to use Temporal to model your processes and their state, you probably don't need an extra state machine. Instead, use Temporal's code-first development to make state management really easy.
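To make the code-first idea concrete, here is a minimal sketch in plain Python. It does not use the Temporal SDK, and every name in it (`Order`, `reserve_inventory`, `charge_payment`, `ship`, `order_process`) is hypothetical. In a real application the process function would be a Workflow and the steps would be Activities; the point here is only that sequential code already encodes which step comes next.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical order record, used only for this illustration."""
    order_id: str
    events: list = field(default_factory=list)


def reserve_inventory(order: Order) -> None:
    order.events.append("inventory_reserved")


def charge_payment(order: Order) -> None:
    order.events.append("payment_charged")


def ship(order: Order) -> None:
    order.events.append("shipped")


def order_process(order: Order) -> list:
    """The process state is simply how far execution has progressed.

    There is no separate table of allowed transitions to keep in sync:
    the only valid next step is the next line of code.
    """
    reserve_inventory(order)
    charge_payment(order)
    ship(order)
    return order.events
```

Because the order of the steps lives in one place, there is no second definition of "valid next state" that could drift out of sync with it.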
In short: Avoid split-brain; use Temporal to manage process state.
2. Hiding Behind the Chainsaws
Erroring Workflows and Activity Error Handling
This one comes from a classic Geico ad about horror movie characters making decisions that get them killed too soon. With Temporal, don't make decisions that kill your Workflows too soon!
By default, when you call an Activity and it fails, Temporal retries it with no limit on the number of attempts. This is fantastic when you're calling an external system such as an API or a database, because you don't need to write any retry code. You just write the code to call the external system, and Temporal automatically retries the Activity until it works. This is one of Temporal's core strengths, and it can really simplify the code you write.
There are instances where this approach is suboptimal, however. For example, a workflow needs to finish really quickly or respond super fast so as not to hold up other work. Examples might be a workflow that blocks a UI response, or workflows that must finish within a certain time, such as a daily report. For this type of use case, Temporal allows you to customize the Retry Policies—for example, to only make 3 retry attempts, or stop retrying after X seconds.
If your business process is fine with the Workflow sometimes responding in seconds and sometimes taking 10 minutes or more, don't customize your retry policies! If you do, and the Activity uses up its retries, the failure surfaces to your Workflow even if a later retry would have succeeded, and unless you handle it, your entire Workflow will fail.
So ask yourself: Do you have to try that Activity only 3 times? Would it be better if the Workflow eventually succeeded? If so, don't worry about customization, and let Temporal's smart defaults handle it for you!
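The difference between a capped and an uncapped retry policy can be sketched in a few lines of plain Python. This is a simulation, not Temporal SDK code: `TransientError`, `make_flaky_call`, and `run_with_retries` are hypothetical names, and the waiting and backoff a real retry policy applies between attempts are left out.

```python
import itertools
from typing import Callable, Optional


class TransientError(Exception):
    """Stands in for a temporary outage of an external system."""


def make_flaky_call(failures_before_success: int) -> Callable[[], str]:
    """Return a hypothetical call that fails a fixed number of times, then works."""
    attempts = itertools.count(1)

    def call() -> str:
        if next(attempts) <= failures_before_success:
            raise TransientError("service temporarily unavailable")
        return "ok"

    return call


def run_with_retries(call: Callable[[], str],
                     max_attempts: Optional[int] = None) -> str:
    """Retry `call` until it succeeds or `max_attempts` is used up.

    max_attempts=None models an uncapped policy.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return call()
        except TransientError:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # retries exhausted: the caller now has to deal with it
```

With a service that recovers on its sixth attempt, `run_with_retries(make_flaky_call(5))` eventually returns `"ok"`, while `run_with_retries(make_flaky_call(5), max_attempts=3)` raises `TransientError` even though the outage was only temporary.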
In short: If it fits your business processes, use the default settings to let Temporal Activities infinitely retry. Your workflows will then automatically succeed and won't die too soon, unlike the young people in this commercial.
1. Hiding In a Room With One Exit
Use Compensation to Give Yourself a Successful Outcome
I'm sure you've all seen a movie where someone's running away from a bad guy, and they run into a room and they hide, and then the bad guy walks through the one door and then they're trapped. Workflows can be like this too! You can write a Workflow that optimistically assumes everything will work out, but what if something doesn't work?
When first starting out, it can be tempting to say, "Well, if something goes wrong, I'll just fail the Workflow, and that'll be the end of that." But an interesting thing about Temporal is we make it really easy to elevate your thinking: If something goes wrong, can we still set things right?
As a concrete example, let's say you're writing a process that does booking for a rental car, a hotel, and a flight. Let's further suppose that the rental car booking went fine, the hotel booking went fine, but something goes wrong with the flight and it can't be booked. If you don't give yourself a way out, a way to succeed, you might stop there and say the process failed. But now, the user has an error, and also two reservations they can't use.
Fortunately, there is a technique in programming called compensation, sometimes talked about in the context of using the Saga Pattern. This means that you effectively "undo" the actions that preceded the failed action. In our booking example, we would fail on the airline reservation, and then un-reserve the hotel and rental car.
As mentioned in the previous point, Activities handle technical failures very well, for example when an API is down or a database is temporarily slow. However, Temporal also allows you to think at a higher level about how to manage business failures, such as not being able to make a booking because the plane is full, or a money transfer from Account A to Account B failing because Account B doesn't exist or is invalid.
Temporal allows you to ask a business stakeholder, "If I can't do step 2, what should I do about step 1?" The correct thing to do in these situations is to cancel the reservations, or to put the money back into Account A. From a business point of view, these outcomes count as "success." And Temporal lets you program the answer into your Workflows so that they can always succeed.
In short: Give yourself an out. Use compensation to have a backup plan in case your main plan fails. (And watch the Scream movies. :-))
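The booking example can be sketched as a small compensation loop. This is plain Python rather than Temporal SDK code, and `BookingError`, `book_trip`, and the step functions are hypothetical names chosen for illustration. In a real Workflow, each action and each compensation would typically be an Activity.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


class BookingError(Exception):
    """A business failure, e.g. the plane is full."""


def book_trip(steps: List[Step]) -> str:
    """Run each action in order, remembering how to undo it.

    If an action fails, run the compensations for the steps that already
    succeeded, most recent first, and return a clean business outcome.
    """
    compensations: List[Callable[[], None]] = []
    try:
        for action, compensation in steps:
            action()
            compensations.append(compensation)
    except BookingError:
        for compensation in reversed(compensations):
            compensation()
        return "trip cancelled, nothing left reserved"
    return "trip booked"


log: List[str] = []


def fail_flight() -> None:
    raise BookingError("plane is full")


result = book_trip([
    (lambda: log.append("book car"), lambda: log.append("cancel car")),
    (lambda: log.append("book hotel"), lambda: log.append("cancel hotel")),
    (fail_flight, lambda: log.append("cancel flight")),
])
```

Here the flight booking fails, so the hotel and then the rental car are cancelled, and the process ends with a defined outcome instead of an error plus two unusable reservations. The flight's own compensation never runs, because that step never succeeded.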
But that's not all! There's also…
Bonus Spookiness
These are not necessarily Temporal anti-patterns, but more horror-themed suggestions.
11. Bonus: Wandering into a Dark Alley
Use Metrics and Visibility Built Into Temporal
Pro-tip: Don't be the horror actress who walks down a dark alley. Temporal offers lots of ways to help you get out of the dark and understand how your Workflows are working:
- Temporal SDK Metrics to monitor individual workers and your code's behavior
- Temporal Cloud Metrics to measure the health and performance of Temporal infrastructure
- Temporal Web UI with Workflow Execution state and metadata for debugging purposes
- Temporal Visibility which allows you to set Search Attributes on your workflows and view, filter, and search for Workflow Executions that have a certain status or important attribute
12. Bonus: Vampires Sucking Up All Your Resources
You don't want vampires to suck up all of your blood, nor do you want Temporal to run out of capacity. For folks who are self-hosting, you can have a workload on your Temporal server that ends up using a lot more resources than you might expect, and this can cause major performance problems.
If you find yourself impacted by this, use rate limits to prevent "Noisy Neighbor" problems. The community posts "Rate limit configuration and best practices" and "Rate limiting by Namespace" point to strategies you can use here.
(Alternatively, move to Temporal Cloud and our SaaS configuration will automatically make sure vampires aren't sucking up your resources.)
13. Bonus: Splitting the Party!
If you're a fan of the X-Files, you know that Mulder (the guy who believes in aliens) and Scully (the skeptical scientist) frequently don't stick together, which leaves Mulder seeing aliens and Scully not, and it drives me nuts.
So don't go it alone, don't split the party. Join the community, work with us, let's build some awesome applications together!