FireHydrant: Durable alerting and faster incident response with Temporal

The customer

FireHydrant builds incident management and alerting software that helps engineering teams define the services they own, ingest alerts from their observability tools, route and escalate notifications, open incidents directly from alerts, and manage the entire incident lifecycle through to retrospectives.

Signals, one of FireHydrant’s flagship products, competes with PagerDuty and sits at the center of that workflow.

Why Temporal?

The team wanted a platform they could trust during bad days, not just the easy ones. Visualization of executions mattered because it lets engineers see exactly what happened for a given alert rather than guess.

“Being able to visualize exactly what happened gives us confidence the system behaved as expected.”

Offloading state management to Temporal was equally important, so the team could spend more attention on customer problems and less on orchestration plumbing. After more than a year of running Signals, they found Temporal a strong fit wherever they needed more observability or a single place to unify work.

The challenge

Signals was new work, but some surrounding runbook logic lived in a generic async job processor. During incident bursts, many events could arrive in a short window, and the system risked re-evaluating the same rules repeatedly. That led to duplicate work and queuing complexity right when teams needed clarity.

Troubleshooting third-party integration failures also took longer than it should have, because it wasn’t obvious which customers or incidents were affected. Adopting a workflow mindset was a challenge of its own: the team had to move away from thinking in terms of “a simple queue” toward Workflows, Activities, and the guarantees between them.

Before Signals, the team had experimented with self-hosting Temporal. Once they scoped the product and saw how extensive it would be, they launched on Temporal Cloud to minimize operational risk while building expertise. Over time, they moved more workloads into Cloud and, with growing familiarity, felt confident evaluating hosting choices again.

The solution

The team modeled the alert journey as a Workflow. When a monitoring provider reports an issue, a single Workflow instance becomes the durable thread that ties everything together: identify customer configuration, deliver notifications, wait for acknowledgements, escalate if there’s no response, and record outcomes.

That same thread is where support starts when someone asks why a message didn’t arrive. Engineers open the Workflow’s history and work back from the alert.
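As a rough illustration of that thread, the sketch below walks an escalation chain on a logical clock and records every step in a history list. It is plain Python, not Temporal SDK code and not FireHydrant's implementation: the names `run_alert`, `ack_timeout_s`, and `ack_at_s` are hypothetical, and in a real Workflow the wait would presumably be a durable timer raced against an acknowledgement signal rather than an integer comparison.

```python
from typing import List, Optional, Tuple

# (logical time in seconds, event type, responder)
Event = Tuple[int, str, str]


def run_alert(
    escalation_chain: List[str],
    ack_timeout_s: int,
    ack_at_s: Optional[int] = None,
) -> Tuple[str, List[Event]]:
    """Notify each responder in turn and escalate when nobody acknowledges.

    ack_at_s is the (hypothetical) time at which an acknowledgement arrives,
    or None if it never does. Returns the outcome and the recorded history.
    """
    history: List[Event] = []
    now = 0
    for responder in escalation_chain:
        history.append((now, "notified", responder))
        deadline = now + ack_timeout_s
        if ack_at_s is not None and ack_at_s <= deadline:
            # The acknowledgement landed inside this responder's window.
            history.append((ack_at_s, "acknowledged", responder))
            return "acknowledged", history
        # No response in time: advance the clock and move down the chain.
        now = deadline
        history.append((now, "timed_out", responder))
    return "unacknowledged", history


outcome, history = run_alert(["primary", "secondary"], ack_timeout_s=300, ack_at_s=450)
for entry in history:
    print(entry)
# (0, 'notified', 'primary')
# (300, 'timed_out', 'primary')
# (300, 'notified', 'secondary')
# (450, 'acknowledged', 'secondary')
```

Reading the printed history from the bottom up mirrors the support workflow described above: start at the outcome and work back to the alert.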

Under the hood, long-lived Workflows coordinate short, idempotent Activities such as “send Slack message,” “send SMS,” or “create Slack channel.” This keeps side effects small and traceable while the Workflow owns timing and recovery. To avoid duplicate work during bursts, the Workflow evaluates rules and executes only those that have just become true. If another event arrives mid-evaluation, the design prevents unnecessary rework instead of piling up jobs in a queue.
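The "only those that have just become true" idea can be sketched as edge-triggered evaluation: remember which rules were true last time and act only on the difference. This is a minimal illustration under assumptions, not FireHydrant's code. `RuleEvaluator`, the rule names, and the state fields (`severity`, `alert_count`) are all hypothetical.

```python
from typing import Callable, Dict, List, Set

Rule = Callable[[dict], bool]


class RuleEvaluator:
    """Tracks which rules are currently true and reports only new transitions."""

    def __init__(self, rules: Dict[str, Rule]) -> None:
        self.rules = rules
        self.active: Set[str] = set()  # rules that were true at the last evaluation

    def evaluate(self, state: dict) -> List[str]:
        """Return the names of rules that flipped from false to true."""
        now_true = {name for name, predicate in self.rules.items() if predicate(state)}
        newly_true = sorted(now_true - self.active)
        self.active = now_true
        return newly_true


# Hypothetical rules for illustration only.
rules = {
    "page_on_call": lambda s: s["severity"] == "critical",
    "open_channel": lambda s: s["alert_count"] >= 3,
}

evaluator = RuleEvaluator(rules)

# A burst of three events arriving in a short window.
burst = [
    {"severity": "critical", "alert_count": 1},
    {"severity": "critical", "alert_count": 2},
    {"severity": "critical", "alert_count": 3},
]
fired = [evaluator.evaluate(state) for state in burst]
print(fired)  # [['page_on_call'], [], ['open_channel']]
```

In this sketch the second event triggers nothing because no rule changed state, which is the duplicate work a plain queue of "re-evaluate everything" jobs would repeat. Note that a rule that turns false and later true again would fire a second time here; whether that is desirable depends on the product's semantics.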

Signals also spans languages. FireHydrant’s Ruby code can create Workflows that Go Workers execute, and vice versa. The team standardized on Protocol Buffers for parameters so cross-language calls are safe and predictable.

Temporal Cloud handled the heavy lifting from day one. Instead of running their own cluster while ramping up a complex new product, the team focused on behavior and user experience. As they grew comfortable with Temporal and their own needs, they moved additional services to Cloud and kept the door open to self-hosting where it makes sense.

The results

Stayed up when others couldn’t

During a Google Cloud authentication outage and the more recent AWS outage, many systems struggled to send notifications. But Signals kept moving. Durable Workflows, established connections, and clear escalation logic meant alerts continued their path and reached the people who needed to respond.

“Temporal did very well during the recent AWS outage and we were able to continue serving alerts to our customers.”

Faster debugging and support

Tying executions to specific alerts and incidents shortened the path from “something failed” to “which customer saw it and why.” What once felt like a black box became an execution you can open, read, and reason about.

Clearer engineering patterns

Workflows and Activities give reviewers a common mental model. Determinism encourages careful thinking earlier in the process, surfaces potential problems before they reach production, and makes stepwise rollouts easier to plan.

“Once the code is written in the pattern, we know it will run to completion and we can track it every time.”

Confidence to standardize

The experience with Signals made Temporal the default choice for new areas where observability and unification help. As the platform matured, the team moved additional workloads to Temporal Cloud and now evaluates hosting based on what each service needs rather than habit.

The takeaways

The FireHydrant team shared a few learnings:

Model the practical flow. Treat the alert lifecycle as a single durable thread so Operations, Support, and Engineering share the same source of truth.
Reduce duplication where it starts. Evaluate rules in one place and do only the work that just became necessary.
Make visibility a feature. Execution history and graphs shorten MTTR and build team confidence.
Choose the hosting path that lowers risk today and preserves options later. Cloud first gave the team room to build the product they wanted.
