How Emergent runs 1 billion+ agent Actions per month on Temporal Cloud

Industry: AI
Use Case: Agents & infra
Company Size: 51-250
SDK: Go
Temporal: Cloud

The customer

Millions of people around the world use Emergent to build software. Emergent Agents coordinate multiple autonomous agents to handle everything from customization to full app creation, removing the need for a dedicated software engineer. A user can provide a prompt (for example, “Build me a project management tool with Kanban boards”) and Emergent’s agent system turns that prompt into a working app by generating a plan, provisioning a cloud environment, writing code, running tests, and iterating through failures. Each build takes 10–30 minutes and involves dozens of LLM calls, hundreds of tool executions, and multiple specialized agents working together on the same task. At scale, the platform processes over 1 billion Temporal Actions per month.

The challenge

“The agent was running inside the same sandbox it depended on. If a build spiked resources, the Pod died, the agent got stuck until we detected it and respawned it, and we were stuck building recovery logic by hand.”

Emergent’s original architecture ran the agent loop inside the same Kubernetes Pod that served as the code execution sandbox. This tightly coupled design created three critical problems:

The agent kept taking itself down. When a build got heavy (installing dependencies, running test suites, anything that spiked CPU or memory), the Pod would get OOM-killed, and the agent would die with it. That meant losing 15 minutes of work and then piecing things back together with custom checkpointing, state serialization, and recovery code.
Running experiments was painful. Because the agent lived inside the sandbox Pod, even small changes meant rebuilding the agent and pushing the new version into sandboxes before the team could try anything. That made fast iteration and parallel experiments much harder than they should have been.
The agent only existed when the sandbox did. If there was no sandbox yet, there was no agent. Provisioning a new environment (spinning up a Pod, pulling images, installing dependencies) could take 2–8 minutes, and during that window the agent couldn’t plan, summarize, or coordinate anything.

The results

“The boring parts (retry logic, state persistence, failure recovery, long waits) are all Temporal. Our code focuses on agent behavior.”

Emergent now processes over 1 billion Temporal Actions per month on Temporal Cloud, with the Workflow code identical across local development, staging, and production environments.

Zero-downtime agent recovery. Infrastructure failures (node crashes, Pod recycling, deploys) are invisible to users. Workflows resume automatically on healthy Workers with full state intact.
Unlimited human-in-the-loop waits. Users can ask a question at 11 PM, go to bed, and answer in the morning. The agent continues seamlessly. The cost savings from not keeping idle sandboxes running were significant.
Safe deployments via Workflow Replay. Temporal's Workflow Replay validates that new code is compatible with existing Workflow histories before deployment. In a system with long-running Workflows spanning hours or days, this safety net enables shipping agent Workflow changes multiple times per day.
Multi-agent coordination at scale. The orchestrator agent can spawn multiple subagents as Child Workflows, with each one running in its own failure domain and with its own timeout and execution history. That made it much easier to coordinate parallel work, isolate failures, and propagate cancellation cleanly across the entire agent tree.

The takeaways

“Building reliable agents is about more than the autonomous loop. You need environment provisioning, tool execution, failure recovery, human-in-the-loop interactions, and coordination across multiple agents. Temporal gave us the foundation to handle all of that cleanly, so we could focus on making the agents better.”

Emergent’s experience demonstrates that building reliable AI agents is fundamentally a distributed systems problem. The non-deterministic nature of LLMs makes Durable Execution not just useful but essential: agents need to crash, wait, recover, and coordinate across complex Workflows without custom infrastructure code. Temporal provided the foundation that let Emergent focus on agent intelligence rather than agent survival.
