Yesterday, OpenAI shipped sandbox support in the Agents SDK. Agents can now execute code, manipulate files, and run shell commands inside isolated sandbox environments — powered by providers like Modal, Daytona, Docker, and E2B.
We worked with OpenAI's engineering team to build a Temporal extension for this release. The result is a fully runnable demo that ships in OpenAI's official repository, at openai/openai-agents-python/examples/sandbox/extensions/temporal. It shows how to run sandbox agents as durable Temporal workflows — with session management, backend switching, and the ability to fork a running agent onto a completely different sandbox provider.
OpenAI's own blog post calls out durable execution as a key capability for sandbox agents. We couldn't agree more.
Why sandbox agents need durability#
A sandbox agent isn't a stateless API call. It creates files, installs dependencies, runs tests, and builds up a working environment over the course of a conversation. That conversation might span minutes or days. The sandbox is the agent's workspace — and the workspace is the work product.
This creates three infrastructure problems:
- State loss. The agent spent thirty minutes setting up a development environment in its sandbox. The host process crashes. The sandbox, the conversation, and the work product are gone. The user starts over.
- Resource waste. The user asks a question and walks away for an hour. The sandbox sits idle, consuming compute, waiting for input that may never come. Multiply this by hundreds of concurrent sessions, and the cost adds up fast.
- Operational rigidity. The agent is running in Docker locally and the user wants to switch to a cloud-hosted environment like Daytona or E2B. Or they want to fork the session to try two different approaches in parallel. None of this is possible when the agent's lifecycle is tied to a single process.
Durable execution solves all three. An agent running as a Temporal Workflow persists its state automatically, resumes from exactly where it left off after a crash or server restart, and consumes zero compute resources while waiting for input. The agent's lifecycle is managed by Temporal — not by the process that happens to be running it.
Demo sandbox agent architecture: three components#
To illustrate the value of giving your agent a sandbox, we've built a demo that shows the flexibility and safety a simple coding agent gains when it runs inside your choice of sandbox. The demo consists of three components that together turn an OpenAI SandboxAgent into a durable, multi-session system.
AgentWorkflow is a long-lived Temporal workflow that wraps the OpenAI SandboxAgent. It processes a user message, executes the agent turn (which may involve multiple tool calls, shell commands, and file operations inside the sandbox), and then idles — durably — waiting for the next message. The workflow persists indefinitely in Temporal. If the worker process crashes mid-turn, Temporal replays the workflow and resumes from the last completed state. Conversation history, sandbox session state, and the workspace snapshot all survive.
SessionManagerWorkflow orchestrates the lifecycle of agent session workflows. Rather than running a traditional server backed by a database to track agent sessions, the session manager tracks and manages this state durably, using the same Temporal workflow abstractions that run the agents themselves. It starts, stops, lists, renames, and forks sessions, responding directly to the TUI via signals and updates. All lifecycle operations route through the manager, so the session registry is always consistent.
TUI is a sample Textual-based terminal interface that demonstrates how external clients interact with durable workflows. It uses Temporal signals to send messages, queries to poll turn state in real time, and updates for transactional operations like forking. The TUI is a reference client: in production, this could be a web UI, a Slack bot, or an API gateway. The durable workflow doesn't care who sends the signal.
Inside the code#
Three code snippets from the demo illustrate the key ideas.
The durable idle loop#
The core of AgentWorkflow is a while loop that processes messages and idles between them:
@workflow.run
async def run(self, request: AgentRequest) -> AgentResponse:
    # ... initialization ...
    while not self._done:
        await workflow.wait_condition(
            lambda: (
                len(self._pending_messages) > 0
                or self._pause_requested
                or self._done
            ),
        )
        if self._done:
            break

        user_messages = list(self._pending_messages)
        self._pending_messages.clear()
        self._turn_id += 1
        self._turn_status = "thinking"
        try:
            agent = self._build_agent(manifest)
            await self._run_turn(agent, user_messages)
        finally:
            self._turn_status = "complete"
    return AgentResponse()
workflow.wait_condition is where the magic happens. When no messages are pending, the workflow consumes zero compute. It could idle for seconds or weeks. If the worker restarts, even while waiting for user input, the workflow resumes at the exact point where it left off with its full state intact. This is what "durable execution" means concretely: in-progress code that survives infrastructure failures. For this coding agent, that means no lost conversation history, no rerunning expensive shell commands, and no rebuilding the workspace from scratch.
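The other half of the pattern is how messages reach `_pending_messages`: signal handlers mutate workflow state, and `wait_condition` re-evaluates its predicate whenever state changes. Here is a plain-Python sketch of that state surface — handler names are illustrative, not the demo's exact API; in the real workflow these are methods on the `@workflow.defn` class, each marked `@workflow.signal`:

```python
class AgentMailbox:
    """Plain-Python sketch of the workflow state that wait_condition watches.

    In the demo these are methods on the @workflow.defn class, each marked
    @workflow.signal; the handler names here are illustrative.
    """

    def __init__(self) -> None:
        self._pending_messages: list[str] = []
        self._pause_requested = False
        self._done = False

    # @workflow.signal in the real workflow: appending a message flips the
    # wait_condition predicate to True, so the idle loop wakes and runs a turn.
    def user_message(self, text: str) -> None:
        self._pending_messages.append(text)

    # @workflow.signal: ask the loop to pause after the current turn.
    def pause(self) -> None:
        self._pause_requested = True

    # @workflow.signal: exit the loop cleanly.
    def done(self) -> None:
        self._done = True

    def has_work(self) -> bool:
        # Mirrors the lambda passed to workflow.wait_condition.
        return bool(self._pending_messages) or self._pause_requested or self._done
```

Because signals are just method calls on durable state, the TUI (or any client) can wake an agent that has been idle for days.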
Forking a session across backends#
The SessionManagerWorkflow can fork a running session — including onto a completely different sandbox backend:
@workflow.update
async def fork_session(self, request: ForkSessionRequest) -> str:
    # Pause the source workflow so its session stops naturally
    await workflow.execute_activity(
        pause_workflow,
        request.source_workflow_id,
        start_to_close_timeout=timedelta(minutes=11),
    )

    # Fetch the source workflow's state via activity
    workflow_snapshot = await workflow.execute_activity(
        query_workflow_snapshot,
        request.source_workflow_id,
        start_to_close_timeout=timedelta(seconds=30),
    )

    # Start the forked workflow with the source's state and history
    await workflow.start_child_workflow(
        AgentWorkflow.run,
        AgentRequest(
            messages=[],
            backend=target_config.type,
            snapshot=snapshot,
            history=workflow_snapshot.history,
            manifest=manifest,
        ),
        id=workflow_id,
        parent_close_policy=ParentClosePolicy.ABANDON,
    )
Forking is a first-class operation. It pauses the source workflow, snapshots the workspace, and starts a new workflow with identical conversation history but an independent lifecycle. You can fork onto a completely different backend — start in Docker, fork to Daytona — while preserving the workspace filesystem. This is only possible because Temporal manages both the agent state and the session lifecycle as durable, composable workflows.
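Stripped of the Temporal machinery, the fork is a pure state hand-off: copy the parent's conversation history, carry over the workspace snapshot, and swap the backend. A minimal sketch of that hand-off, with field names assumed from the snippet above:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentRequest:
    """Illustrative shape; field names follow the fork snippet above."""
    backend: str
    history: list = field(default_factory=list)
    snapshot: Optional[str] = None  # portable workspace snapshot id
    messages: list = field(default_factory=list)

def build_fork_request(source: AgentRequest, target_backend: str, snapshot_id: str) -> AgentRequest:
    # Same conversation history and workspace snapshot as the parent,
    # but an empty message queue and (optionally) a different backend.
    return AgentRequest(
        backend=target_backend,
        history=list(source.history),  # copied: the fork's history diverges from here
        snapshot=snapshot_id,
        messages=[],
    )
```

The copy matters: after the fork, the two sessions share a past but accumulate independent futures, which is what makes "try two approaches in parallel" safe.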
The bridge: one function makes sandbox operations durable#
The integration seam between OpenAI's SDK and Temporal is a single function call:
async def _run_turn(self, agent: SandboxAgent, user_messages: list[str]) -> None:
    hooks = _LiveStateHooks(self)
    run_config = RunConfig(
        sandbox=SandboxRunConfig(
            # That's it! This single line configures your SandboxAgent
            # to run durably as a Temporal workflow.
            client=temporal_sandbox_client(self._backend.value),
            options=self._resolve_sandbox_options(),
            session_state=self._sandbox_session_state,
            snapshot=self._snapshot,
        ),
        workflow_name="Temporal Sandbox workflow",
    )
    result = await Runner.run(
        agent,
        input_arg,
        run_config=run_config,
        hooks=hooks,
        previous_response_id=self._previous_response_id,
    )
temporal_sandbox_client() is the bridge. It wraps the sandbox client so that all sandbox operations — LLM API calls, the full sandbox lifecycle, running commands — execute as Temporal activities, making them durable and retryable. The Runner.run() call is pure OpenAI Agents SDK. From the agent's perspective, nothing changed. From the infrastructure's perspective, everything is now fault-tolerant, which makes your agent more reliable and simplifies your path to production.
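To see what "durable and retryable" buys each sandbox call, here is a plain-Python illustration of the retry semantics an activity wrapper provides. This is only a sketch of the concept: Temporal additionally records each completed result durably, so a replayed workflow never re-executes a finished call.

```python
import time

def call_with_retries(op, max_attempts: int = 3, backoff_s: float = 0.0):
    """Illustration of the retry semantics a Temporal activity gives each
    wrapped sandbox call. Temporal also persists results, so replay never
    re-runs a completed call; this sketch shows only the retry part."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return op()
        except Exception:
            if attempt >= max_attempts:
                raise  # retries exhausted; surface the failure
            time.sleep(backoff_s)  # Temporal applies a configurable backoff here
```

With Temporal, retry policies, timeouts, and backoff are configured per activity rather than hand-rolled like this.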
Multiple sandboxes: one worker can support various sandbox clients#
async def run_worker() -> None:
    # ... setup ...
    sandbox_clients: list[SandboxClientProvider] = [
        SandboxClientProvider("local", UnixLocalSandboxClient()),
        SandboxClientProvider("docker", DockerSandboxClient(docker.from_env())),
        SandboxClientProvider("daytona", DaytonaSandboxClient()),
        SandboxClientProvider("e2b", E2BSandboxClient()),
    ]

    plugin = OpenAIAgentsPlugin(
        model_params=ModelActivityParameters(
            start_to_close_timeout=timedelta(seconds=120),
        ),
        sandbox_clients=sandbox_clients,
    )

    temporal_client = await Client.connect("localhost:7233", plugins=[plugin])
    worker = Worker(
        temporal_client,
        task_queue=TASK_QUEUE,
        workflows=[AgentWorkflow, SessionManagerWorkflow],
        activities=[pause_workflow, query_workflow_snapshot, switch_workflow_backend],
    )
    await worker.run()
The existing OpenAIAgentsPlugin now accepts a list of sandbox_clients, so a single Temporal worker can serve any number of sandbox backends. Your worker can run many workflows, each targeting a different sandbox, and, as in this sample coding agent, a single workflow can even switch between sandboxes dynamically.
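Under the hood, routing a workflow's request to the right client presumably comes down to a lookup by backend name. A toy sketch of that dispatch — the real `SandboxClientProvider` lives in the SDK extension; this stand-in only mirrors the name-to-client pairing from the snippet above:

```python
class SandboxClientProvider:
    """Stand-in for the extension's provider type: a backend name paired
    with the client that serves it (mirrors the worker snippet above)."""
    def __init__(self, name: str, client: object) -> None:
        self.name = name
        self.client = client

def resolve_client(providers, backend: str):
    # Conceptually what happens when a workflow asks for
    # temporal_sandbox_client("docker"), temporal_sandbox_client("e2b"), etc.
    by_name = {p.name: p.client for p in providers}
    if backend not in by_name:
        raise ValueError(f"no sandbox client registered for backend {backend!r}")
    return by_name[backend]
```

Because the lookup happens per call rather than per worker, one worker process can host sessions on Docker, Daytona, and E2B simultaneously.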
What this unlocks#
The demo shows capabilities that go well beyond "run an agent":
- Orchestrate agents across sandbox providers. Run agents in Daytona, Docker, E2B, or local Unix environments. Switch between them mid-conversation with /switch — workspace files carry over via portable snapshots.
- Start, stop, snapshot, and fork sessions. Pause an agent and resume it later. Fork a session to try two different approaches. Snapshot workspace state and carry it to a different backend.
- Zero-cost idle. Agents waiting for human input consume no sandbox resources. The Temporal workflow persists on the server, not in a running process. Scale to thousands of concurrent sessions without thousands of idle sandboxes.
- Session persistence across restarts. Stop the worker, restart it, and resume your conversation. History, sandbox state, and the previous response ID are all preserved in the workflow.
Try it yourself#
The demo runs in three terminals:
# Terminal 1: Start Temporal
just temporal
# Terminal 2: Start the worker
just worker
# Terminal 3: Start the TUI
just tui
The local sandbox backend requires only Docker — no cloud API keys are needed for a first run, though you will need an OpenAI API key for LLM calls. Once you're in the TUI, use /switch to move the current session onto a different backend, /fork to fork a session, and /done to exit.
The full source is at openai/openai-agents-python/examples/sandbox/extensions/temporal.
What's next#
This is the first step, not the last. Temporal is becoming the durability layer for the entire agent ecosystem. We have integrations with Pydantic AI, Vercel's AI SDK, and Amazon Bedrock Strands, with more coming soon.
The pattern is the same across all of them: the agent framework handles the AI. Temporal handles the infrastructure. Your agents get durability, state management, and fault tolerance without changing how you write them.
The sandbox makes agents powerful. Durable Execution makes them powerful and production-ready.