Code Exchange - Durable Agentic Harness — crash-safe autonomous AI agents with human-in-the-loop approval

Temporal: The Durable OS for Agentic AI

Agents are easy to demo, hard to operate. Every production agent eventually hits the same wall — LLM calls flake, workers crash mid-tool-call, human approvals stall for hours, parallel work loses children on restart. The usual answer ("just add retries + Redis + a state machine") is the long road to badly reinventing Temporal.

This demo reframes Temporal not as a workflow engine but as a set of OS primitives for agent loops, running underneath an autonomous stock-trading agent built with the OpenAI Agents SDK:

OS primitive	Temporal equivalent	What it gives agents
Process scheduling	Workers + task queues	LLM/tool work dispatched durably
Autosave / journaling	Event history	Replay from the exact event after a crash
IPC / interrupts	Signals, Updates & Queries	Human-in-the-loop, mid-flight steering
Memory / state	Workflow state	Survives restarts — no Redis, no S3 checkpoints
Drivers	Activities	Side effects, retried by default; idempotency is up to the activity code
Long-lived sleep	`workflow.sleep()`	Pause days/weeks at zero CPU cost
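
The "autosave / journaling" row can be illustrated with a toy, stdlib-only journal. This is a sketch of the general record-and-replay idea, not Temporal's implementation or API; `Journal`, `ReplayContext`, and the step names are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, List, Optional


class Journal:
    """Toy append-only event history (hypothetical, not Temporal's data model)."""

    def __init__(self) -> None:
        self.events: List[Any] = []


class ReplayContext:
    """Runs steps against a journal: recorded results are replayed, new ones are executed."""

    def __init__(self, journal: Journal) -> None:
        self.journal = journal
        self.cursor = 0

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.journal.events):
            # This step already completed before the crash: reuse its recorded result.
            result = self.journal.events[self.cursor]
        else:
            # First time this step is reached: run it and record the result.
            result = fn()
            self.journal.events.append(result)
        self.cursor += 1
        return result


side_effects: List[str] = []  # records which steps really executed


def make_step(name: str) -> Callable[[], str]:
    def run() -> str:
        side_effects.append(name)
        return f"{name}-done"
    return run


def agent_run(ctx: ReplayContext, crash_before: Optional[int] = None) -> List[str]:
    results = []
    for i, name in enumerate(["fetch_market", "ask_llm", "place_order"]):
        if crash_before is not None and i == crash_before:
            raise RuntimeError("simulated worker crash")
        results.append(ctx.step(make_step(name)))
    return results


def demo() -> List[str]:
    journal = Journal()
    try:
        agent_run(ReplayContext(journal), crash_before=2)  # dies before place_order
    except RuntimeError:
        pass
    # A fresh "worker" replays the journal, then continues with the remaining step.
    return agent_run(ReplayContext(journal))
```

In this sketch the second run re-executes nothing that was already journaled, so `fetch_market` and `ask_llm` each run once even though the loop body is entered twice. The sketch only works because the loop visits the steps in the same order on every run, which is why replay-based systems require deterministic orchestration code.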

Who this is for / use cases:

The trading agent is the vehicle — the real subject is the pattern for any long-running, autonomous, or human-supervised agent. Reach for this when:

Crash-safe agent loops — agents that run for minutes to weeks and must survive worker restarts, deploys, and infra failures without losing in-flight state or re-doing side-effects (orders placed, emails sent, payments made).
Human-in-the-loop approvals — workflows that pause indefinitely at zero CPU cost waiting on a human to approve/reject a high-stakes action (a large trade, a refund, a production change), then resume exactly where they left off.
Parallel fan-out / fan-in — exploring N strategies, prompts, or candidates concurrently in isolated sandboxes and selecting a winner, with automatic cleanup of children if the parent restarts.
Auditable AI decisions — every LLM call, tool call, and signal is a queryable event in history, giving you a replayable audit trail for compliance and debugging ("why did the agent do X at tick 14?").
LLM/tool reliability: wrapping flaky model and API calls as activities so they are retried with backoff by default, instead of hand-rolling that logic. A retry only re-runs the call, so activities with side effects still need to be written idempotently.
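
Retrying a call that places an order or sends a payment is only safe if the call is idempotent. A minimal sketch of one common approach, an idempotency key, follows. It is plain Python rather than Temporal code, and `FakeBroker`, `retry`, and the key format are assumptions made up for this example.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], *, attempts: int = 3, base_delay: float = 0.0) -> T:
    """Call fn, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


class FakeBroker:
    """Hypothetical broker that deduplicates orders by idempotency key."""

    def __init__(self) -> None:
        self.orders: Dict[str, dict] = {}
        self.calls = 0
        self._fail_once = True

    def place_order(self, idempotency_key: str, symbol: str, qty: int) -> dict:
        self.calls += 1
        # Record the order only if this key has not been seen before.
        if idempotency_key not in self.orders:
            self.orders[idempotency_key] = {"symbol": symbol, "qty": qty}
        if self._fail_once:
            # Simulate the worst case: the order went through, but the reply was lost.
            self._fail_once = False
            raise ConnectionError("response lost after the order was recorded")
        return self.orders[idempotency_key]


def place_once(broker: FakeBroker) -> dict:
    # The key is derived from stable identifiers, so every retry sends the same key.
    key = "agent-run-1/tick-14"
    return retry(lambda: broker.place_order(key, "ACME", 10))
```

Here the broker is called twice but holds a single order, because the second attempt presents the same key. With a durable workflow engine the retry loop would come from the platform; the key still has to come from the application.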

If you're building agentic systems and finding yourself bolting on Redis, Celery, a retry library, and a state machine for approvals, this demo shows what those concerns look like when Temporal owns them instead.

What the agent does:

Discovers a strategy by fanning out N parallel sandboxed backtests in airgapped Docker containers (child workflows).
Lives through a tick loop: market + news context, LLM trade-intent via the OpenAI Agents SDK (activity_as_tool), a deterministic risk guardrail, and human-in-the-loop approval for large trades.
Survives chaos: kill the worker mid-trade and Temporal replays the workflow's event history on a healthy worker, resuming from the step where execution stopped.
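
The guardrail and approval steps of the tick loop can be sketched in plain Python. The thresholds, `TradeIntent`, `check_risk`, and `ApprovalGate` below are hypothetical names and values, not taken from the demo's source. In a Temporal workflow, the gate would typically be a signal or update handler combined with `workflow.wait_condition`; `asyncio.Event` stands in for that here.

```python
import asyncio
from dataclasses import dataclass

MAX_NOTIONAL = 50_000.0        # assumed hard limit: larger trades are always rejected
APPROVAL_THRESHOLD = 10_000.0  # assumed limit above which a human must approve


@dataclass(frozen=True)
class TradeIntent:
    symbol: str
    side: str
    quantity: int
    price: float

    @property
    def notional(self) -> float:
        return self.quantity * self.price


def check_risk(intent: TradeIntent) -> str:
    """Deterministic guardrail: same intent in, same verdict out, no I/O."""
    if intent.side not in ("buy", "sell") or intent.quantity <= 0:
        return "reject"
    if intent.notional > MAX_NOTIONAL:
        return "reject"
    if intent.notional > APPROVAL_THRESHOLD:
        return "needs_approval"
    return "allow"


class ApprovalGate:
    """Blocks a coroutine until a human decision arrives."""

    def __init__(self) -> None:
        self._decided = asyncio.Event()
        self._approved = False

    def decide(self, approved: bool) -> None:
        self._approved = approved
        self._decided.set()

    async def wait(self) -> bool:
        await self._decided.wait()
        return self._approved


async def handle_intent(intent: TradeIntent, gate: ApprovalGate) -> str:
    verdict = check_risk(intent)
    if verdict == "reject":
        return "rejected_by_guardrail"
    if verdict == "needs_approval" and not await gate.wait():
        return "rejected_by_human"
    return "executed"


async def run_large_trade(approved: bool) -> str:
    gate = ApprovalGate()
    intent = TradeIntent("ACME", "buy", 200, 100.0)  # notional 20,000: needs approval
    task = asyncio.create_task(handle_intent(intent, gate))
    await asyncio.sleep(0)   # let the task start; it is now parked on the gate
    gate.decide(approved)    # the "human" clicks approve or reject
    return await task
```

Keeping `check_risk` a pure function means the LLM's proposal is always filtered by logic that behaves identically on every run, which matters when the surrounding loop may be replayed.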

Stack: FastAPI (sole Temporal client) + SSE, React 18 / Vite / Tailwind UI, temporalio[openai-agents] workflows wired via OpenAIAgentsPlugin, Docker sandboxes for backtests, and Mockoon for offline/deterministic market, news, and broker data.

What it strips out (code you'd otherwise write yourself): no Celery, no Redis-backed queue, no hand-rolled retry policy, no "save progress to S3" code, no state machine for approvals, no orphan-child cleanup. Temporal owns all of it; what's left on top is just the agent logic.

What Linux did for processes, Temporal does for agent loops.

An autonomous stock-trading agent (OpenAI Agents SDK) that reframes Temporal as the Durable OS for agentic AI: workers = scheduling, event history = autosave, signals = human-in-the-loop. Kill the worker mid-trade and the agent replays from the exact line.