Temporal Sandbox Orchestration Harness: The missing layer for running agents

Behind the scenes, your favorite AI agents and applications are Temporal Workflows. Many of our customers are running large-scale agentic workloads on Temporal Cloud to get the durability and reliability benefits that Temporal provides, allowing agent context to persist indefinitely.

Recently, we’ve noticed that customers running AI agents share an emerging requirement: secure, isolated, and temporary sandbox compute environments for executing untrusted, agent-generated code.

We’ve seen customers take many different approaches to integrating sandboxes with Temporal on their own. That’s why we’re introducing reference materials to standardize the interfaces, patterns, and best practices for using sandboxes inside Temporal Workflows. We’re starting with a set of Temporal Code Exchange Samples for common use cases. Read on to learn more.

What are sandboxes?#

Every serious coding agent needs a sandbox. The agent generates code, runs tests, edits files, and installs packages. Because agent-generated code is inherently untrusted, that work can’t happen on your servers. You need an environment isolated enough to contain the risk, and flexible enough to spin up and tear down frequently.

Sandbox providers such as Modal, E2B, Daytona, GKE Agent Sandbox, and Amazon Bedrock AgentCore Runtime all provide fast, isolated execution environments for this untrusted agent-generated code. Sandboxes generally boot quickly (some in under a second), allow for snapshotting their state, and provide process, filesystem, and network isolation. The hard runtime problem of isolated, quick-booting compute is solved, but the problem of integrating these environments into your agentic Workflows is not.

Agents are Temporal Workflows#

Temporal Workflows already provide the durable context that agents need. Temporal Workflows can persist indefinitely, survive infrastructure failures, and maintain state across any number of agent interactions. When you run an agent as a Temporal Workflow, the conversation or agent context never loses its place.

Naturally, Temporal customers want to extend the same durability that benefits their agents to the sandboxes that their agents create. It’s important to track the state of the sandbox so it exists when the agent needs it, and is torn down when it’s no longer needed, so you’re not burning compute spend on idle or orphaned sandboxes.

Every team building a coding agent ends up writing a similar layer on top for managing sandboxes: provisioning sandboxes, routing execution, persisting state, recovering from failures, and cleaning everything up afterward. Sandbox orchestration is a missing layer in agent infrastructure, and right now, everyone is rebuilding it from scratch.

Example agent in a Workflow (pseudocode)#

func CodingAgent(ctx workflow.Context) error {
    sbx := newSandbox(ctx, modal)        // see real example below
    defer sbx.Stop(ctx)

    inbox := workflow.GetSignalChannel(ctx, "msg")
    var history []any                    // conversation transcript

    for {                                // conversation loop
        var msg string
        inbox.Receive(ctx, &msg)
        history = append(history, msg)

        for {                            // tool loop
            resp := callLLM(ctx, history)
            history = append(history, resp)
            if len(resp.ToolCalls) == 0 {
                break                    // final answer; back to waiting
            }
            for _, call := range resp.ToolCalls {
                out := sbx.ExecuteCommand(ctx, call.Cmd)
                history = append(history, out)
            }
        }
    }
}

The orchestration problem#

A Sandbox Orchestration Harness is what connects sandbox compute runtimes to the agent’s Durable Execution lifecycle. Teams building agents that use sandboxes are all writing some version of the following functionality:

Provisioning on demand. The agent needs a sandbox. Something has to create it, configure it, and hand back a connection.
Driving execution from agent intent. The LLM decides to run pytest. That decision has to reach the sandbox, execute, and return results.
Persisting state across long runs. A coding agent working for hours or days needs its workspace to survive process restarts, worker migrations, and infrastructure failures.
Recovering from sandbox failures. The agent’s conversation state is fine (it’s in the Workflow), but the sandbox is gone. Something has to re-provision and restore.
Cleaning up. Sandboxes are meant to be temporary environments, but what “temporary” means is different for every use case. Tearing down the sandbox at the appropriate time for your use case is critical to providing the right level of persistence while balancing spend on sandbox compute.
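
The responsibilities above can be sketched in plain Go, independent of any provider. This is a minimal sketch under stated assumptions, not the harness’s implementation: Provider, ErrSandboxGone, Harness, and fakeProvider are all hypothetical names, and the workspace restore step is left out to keep it short. It shows lazy provisioning, a single re-provision when the sandbox has vanished, and cleanup that leaves no orphans.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSandboxGone is a hypothetical error a provider returns when the
// sandbox no longer exists (crashed, evicted, or timed out).
var ErrSandboxGone = errors.New("sandbox gone")

// Provider is a hypothetical, minimal view of a sandbox backend.
type Provider interface {
	Create() (id string, err error)
	Exec(id, cmd string) (output string, err error)
	Destroy(id string) error
}

// Harness owns one sandbox on behalf of one agent run.
type Harness struct {
	p  Provider
	id string // empty until the first command needs a sandbox
}

func NewHarness(p Provider) *Harness { return &Harness{p: p} }

// Exec provisions lazily, runs the command, and re-provisions once if the
// sandbox disappeared since the last call.
func (h *Harness) Exec(cmd string) (string, error) {
	for attempt := 0; attempt < 2; attempt++ {
		if h.id == "" {
			id, err := h.p.Create()
			if err != nil {
				return "", err
			}
			h.id = id
		}
		out, err := h.p.Exec(h.id, cmd)
		if errors.Is(err, ErrSandboxGone) {
			h.id = "" // forget the dead sandbox; the next pass creates a new one
			continue
		}
		return out, err
	}
	return "", fmt.Errorf("exec %q: %w", cmd, ErrSandboxGone)
}

// Close tears the sandbox down; it is safe to call when none was created.
func (h *Harness) Close() error {
	if h.id == "" {
		return nil
	}
	err := h.p.Destroy(h.id)
	h.id = ""
	return err
}

// fakeProvider is an in-memory stand-in used only to exercise the harness.
type fakeProvider struct {
	next int
	live map[string]bool
}

func newFakeProvider() *fakeProvider { return &fakeProvider{live: map[string]bool{}} }

func (f *fakeProvider) Create() (string, error) {
	f.next++
	id := fmt.Sprintf("sbx-%d", f.next)
	f.live[id] = true
	return id, nil
}

func (f *fakeProvider) Exec(id, cmd string) (string, error) {
	if !f.live[id] {
		return "", ErrSandboxGone
	}
	return id + " ran: " + cmd, nil
}

func (f *fakeProvider) Destroy(id string) error {
	delete(f.live, id)
	return nil
}
```

In a Workflow, each Provider call would presumably run as an Activity, and Close would be deferred at the top of the Workflow function so teardown is tied to the Workflow’s lifetime.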

As the Durable Execution layer for agents, we believe Temporal is uniquely positioned to establish the best patterns for managing sandbox lifecycles within long-running agent Workflows.

What we’re sharing today#

Because we see so many questions about orchestrating sandboxes from within Workflows, we’re codifying base templates and best practices in a set of examples to help you get started. These are community code samples that provide a scalable-on-day-zero starting point for anyone wiring up sandbox providers with Temporal Workflows.

The examples are live on Temporal Code Exchange: https://temporal.io/code-exchange/temporal-sandbox-orchestration-harness

These code samples show the following:

Sandbox provisioning tied to Workflow lifecycle. The sandbox is created when the Workflow first needs it and destroyed when the Workflow ends. No orphans.
Durable Execution across sandbox operations. Tool calls (exec, read, write) are dispatched to the sandbox from Temporal Activities. Sandbox providers are robust to most failure modes, and Activity retries cover transient failures so a flaky network call doesn’t drop a tool call mid-Workflow.
Workspace persistence with auto-pause. The sandbox pauses when nothing is happening and resumes on the next command. Some providers implement this as “suspend.”
Workspace persistence with fork. The workspace is snapshotted and restored into a fresh sandbox, carrying state forward without keeping containers alive.
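
The fork pattern in the last item can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not any provider’s snapshot API: Sandbox, SnapshotStore, SnapshotAndStop, and Restore are hypothetical names, and the workspace is reduced to a map from path to file contents. It shows that the snapshot lives outside any running sandbox and that each restore yields an independent copy.

```go
package main

import "fmt"

// Sandbox is a hypothetical in-memory stand-in for a provider sandbox.
type Sandbox struct {
	ID      string
	Files   map[string]string // path -> contents
	Running bool
}

// SnapshotStore is a hypothetical store that holds workspace snapshots
// outside of any running sandbox.
type SnapshotStore struct {
	next  int
	snaps map[string]map[string]string
}

func NewSnapshotStore() *SnapshotStore {
	return &SnapshotStore{snaps: map[string]map[string]string{}}
}

// copyFiles returns an independent copy so forks cannot affect each other.
func copyFiles(src map[string]string) map[string]string {
	dst := make(map[string]string, len(src))
	for path, data := range src {
		dst[path] = data
	}
	return dst
}

// SnapshotAndStop saves the workspace, marks the sandbox stopped, and
// returns the snapshot ID. Nothing keeps running after this call.
func (s *SnapshotStore) SnapshotAndStop(sbx *Sandbox) string {
	s.next++
	id := fmt.Sprintf("snap-%d", s.next)
	s.snaps[id] = copyFiles(sbx.Files)
	sbx.Running = false
	return id
}

// Restore boots a fresh sandbox whose workspace starts from the snapshot.
func (s *SnapshotStore) Restore(snapID, newSandboxID string) (*Sandbox, error) {
	files, ok := s.snaps[snapID]
	if !ok {
		return nil, fmt.Errorf("unknown snapshot %q", snapID)
	}
	return &Sandbox{ID: newSandboxID, Files: copyFiles(files), Running: true}, nil
}
```

Because Restore copies the snapshot, two sandboxes restored from the same snapshot ID can diverge without affecting each other or the stored snapshot.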

The broader ecosystem provides the underlying runtime. These examples show how to drive any of them durably. Now let’s look at a specific example.

Provisioning a Modal sandbox (Go)#

Here we provision a sandbox that suspends itself when idle. WithIdleTimeout is the key knob: if no command arrives within the configured window (30 seconds here; the default is 5 minutes), the harness suspends the sandbox automatically.

For Modal specifically, “suspend” is implemented as snapshot + stop; the harness picks the right suspend path per provider. The next ExecuteCommand resumes (or restarts from snapshot) transparently. The agent can wait days between user messages without paying for an idle Modal container.

Swapping providers is just a config change: replace the Type and Config fields to target E2B, Daytona, or AgentCore instead.

import (
    "time"

    sandbox "github.com/temporalio/ephemeral-workers-poc/sdk"
    "github.com/temporalio/ephemeral-workers-poc/sdk/compute"
    "go.temporal.io/sdk/workflow"
)

// Inside a Workflow function, where ctx is the workflow.Context:

// Provision a Modal sandbox that suspends after 30 seconds without a command.
sbx, err := sandbox.NewSandbox(ctx, compute.ComputeProviderDetails{
    Type: compute.ComputeProviderTypeModal,
    Config: map[string]string{
        "image": "ubuntu:26.04",
    },
}, sandbox.WithIdleTimeout(30*time.Second))
if err != nil {
    return err
}
defer sbx.Stop(ctx) // stop the sandbox when the Workflow function returns

// If the sandbox was suspended, this call resumes it before running the command.
result, err := sbx.ExecuteCommand(ctx, "pytest tests/")
if err != nil {
    return err
}
// result.Stdout, result.Stderr, result.ExitCode
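
The idle-timeout behavior described above boils down to a small state machine. The following is a minimal sketch of that decision logic only, assuming hypothetical names (IdleTracker, Tick, OnCommand, simulateIdle); it is not how the harness implements WithIdleTimeout. Timestamps are passed in explicitly so the logic stays deterministic; inside a Workflow the clock and timers would come from the Workflow context rather than the system clock.

```go
package main

import "time"

// State is the coarse lifecycle state the tracker cares about.
type State string

const (
	Running   State = "running"
	Suspended State = "suspended"
)

// IdleTracker is a hypothetical model of the auto-suspend decision.
type IdleTracker struct {
	timeout    time.Duration
	lastActive time.Time
	state      State
}

func NewIdleTracker(timeout time.Duration, now time.Time) *IdleTracker {
	return &IdleTracker{timeout: timeout, lastActive: now, state: Running}
}

// Tick suspends the sandbox once it has been idle for at least the timeout.
func (t *IdleTracker) Tick(now time.Time) State {
	if t.state == Running && now.Sub(t.lastActive) >= t.timeout {
		t.state = Suspended // a real harness would call the provider here
	}
	return t.state
}

// OnCommand resumes a suspended sandbox and records the activity.
func (t *IdleTracker) OnCommand(now time.Time) State {
	if t.state == Suspended {
		t.state = Running // a real harness would resume or restore here
	}
	t.lastActive = now
	return t.state
}

// simulateIdle walks a fixed timeline and returns the state after each step.
func simulateIdle() []State {
	start := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	t := NewIdleTracker(30*time.Second, start)

	var trace []State
	trace = append(trace, t.Tick(start.Add(29*time.Second)))    // still inside the idle window
	trace = append(trace, t.Tick(start.Add(30*time.Second)))    // timeout reached: suspend
	trace = append(trace, t.OnCommand(start.Add(48*time.Hour))) // next command two days later: resume
	trace = append(trace, t.Tick(start.Add(48*time.Hour+10*time.Second)))
	return trace
}
```

The two-day gap in simulateIdle mirrors the case in the text: the sandbox stays suspended for the whole wait and is only brought back when the next command arrives.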

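One way to picture why a provider swap can stay a config change is a registry keyed by provider type. This is a hypothetical sketch, not the harness’s design: ProviderDetails only mirrors the Type and Config shape from the sample above, and the type strings, the "template" config key, and the launcher functions are placeholders invented for illustration.

```go
package main

import "fmt"

// ProviderDetails mirrors the Type/Config shape used in the sample above.
// The string values used with it here are placeholders, not real constants.
type ProviderDetails struct {
	Type   string
	Config map[string]string
}

// launcher starts a sandbox for one provider and returns a description.
type launcher func(cfg map[string]string) string

// launchers is a hypothetical registry keyed by provider type.
var launchers = map[string]launcher{
	"modal": func(cfg map[string]string) string {
		return "modal sandbox from image " + cfg["image"]
	},
	"e2b": func(cfg map[string]string) string {
		return "e2b sandbox from template " + cfg["template"]
	},
}

// Launch picks the launcher from configuration; the calling code is the
// same no matter which provider is selected.
func Launch(d ProviderDetails) (string, error) {
	start, ok := launchers[d.Type]
	if !ok {
		return "", fmt.Errorf("unknown sandbox provider %q", d.Type)
	}
	return start(d.Config), nil
}
```

With this shape, moving from one provider to another changes only the ProviderDetails value passed to Launch, and an unrecognized type fails fast with an error instead of silently falling back.
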
Let’s build together#

The Sandbox Orchestration Harness is infrastructure that every agent team needs but nobody should have to build from scratch. The patterns are converging. The providers are maturing. What’s missing is the orchestration layer that ties them together with lifecycle guarantees.

We’re starting with community code samples. Pick up the examples, try them with your sandbox provider, find the rough edges. If you’re building a sandbox orchestration harness yourself, extend our examples, file issues, or bring providers we haven’t covered.

The systems around the models are what make agents real. Sandbox orchestration is one of those systems. Let’s build the patterns together.

The sandbox orchestration examples are available on Temporal Code Exchange. For more on Temporal’s approach to agent infrastructure, see Orchestrating ambient agents with Temporal.
