Durable Agentic Harness — crash-safe autonomous AI agents with human-in-the-loop approval
An autonomous stock-trading agent (OpenAI Agents SDK) that reframes Temporal as the Durable OS for agentic AI: workers = scheduling, event history = autosave, signals = human-in-the-loop. Kill the worker mid-trade and the agent replays from the exact line.
Temporal: The Durable OS for Agentic AI
Agents are easy to demo, hard to operate. Every production agent eventually hits the same wall — LLM calls flake, workers crash mid-tool-call, human approvals stall for hours, parallel work loses children on restart. The usual answer ("just add retries + Redis + a state machine") is the long road to badly reinventing Temporal.
This demo reframes Temporal not as a workflow engine but as OS primitives for agent loops, underneath an autonomous OpenAI-Agents-SDK stock-trading agent:
| OS primitive | Temporal equivalent | What it gives agents |
|---|---|---|
| Process scheduling | Workers + task queues | LLM/tool work dispatched durably |
| Autosave / journaling | Event history | Replay from the exact event after a crash |
| IPC / interrupts | Signals, Updates & queries | Human-in-the-loop, mid-flight steering |
| Memory / state | Workflow state | Survives restarts — no Redis, no S3 checkpoints |
| Drivers | Activities | Side effects, retried by default (idempotency is still the activity author's job) |
| Long-lived sleep | workflow.sleep() | Pause days/weeks at zero CPU cost |
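To make the "autosave / journaling" row concrete, here is a minimal toy model of replay in plain Python. It is not Temporal's API or implementation: `Replayer`, `WorkerCrash`, `agent_tick`, and the two step functions are hypothetical names invented for this sketch, and the journal is an in-memory list, where a real event history is persisted outside the worker process.

```python
from typing import Any, Callable


class WorkerCrash(Exception):
    """Simulated worker failure (hypothetical, for this sketch only)."""


class Replayer:
    """Runs steps against an append-only journal of (step name, result) pairs."""

    def __init__(self, journal: list[tuple[str, Any]]) -> None:
        self.journal = journal
        self.cursor = 0  # index of the next journal entry to consume

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.journal):
            # Replay: reuse the recorded result instead of re-running the side effect.
            recorded_name, result = self.journal[self.cursor]
            if recorded_name != name:
                raise RuntimeError(f"replay diverged: {recorded_name!r} != {name!r}")
        else:
            # First execution: run the step and record its result.
            result = fn()
            self.journal.append((name, result))
        self.cursor += 1
        return result


calls: list[str] = []  # tracks which side effects really executed


def fetch_quote() -> float:
    calls.append("fetch_quote")
    return 101.5


def place_order() -> str:
    calls.append("place_order")
    return "order-1"


def agent_tick(run: Replayer, crash_before_order: bool = False) -> str:
    run.step("fetch_quote", fetch_quote)
    if crash_before_order:
        raise WorkerCrash("worker killed mid-trade")
    return run.step("place_order", place_order)


journal: list[tuple[str, Any]] = []
try:
    agent_tick(Replayer(journal), crash_before_order=True)
except WorkerCrash:
    pass  # the journal outlives the crashed attempt

order_id = agent_tick(Replayer(journal))  # second attempt replays fetch_quote
print(calls, order_id)
```

Each side effect runs once even though the tick is executed twice: the second attempt reads `fetch_quote`'s result from the journal and only then continues to the order. The name check in `step` hints at why replayed code has to be deterministic.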
Who this is for / use cases:
The trading agent is the vehicle — the real subject is the pattern for any long-running, autonomous, or human-supervised agent. Reach for this when:
- Crash-safe agent loops — agents that run for minutes to weeks and must survive worker restarts, deploys, and infra failures without losing in-flight state or re-doing side-effects (orders placed, emails sent, payments made).
- Human-in-the-loop approvals — workflows that pause indefinitely at zero CPU cost waiting on a human to approve/reject a high-stakes action (a large trade, a refund, a production change), then resume exactly where they left off.
- Parallel fan-out / fan-in — exploring N strategies, prompts, or candidates concurrently in isolated sandboxes and selecting a winner, with automatic cleanup of children if the parent restarts.
- Auditable AI decisions — every LLM call, tool call, and signal is a queryable event in history, giving you a replayable audit trail for compliance and debugging ("why did the agent do X at tick 14?").
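- LLM/tool reliability: wrapping flaky model and API calls as activities so they are retried with backoff by default, instead of hand-rolling retry + backoff logic. Making the side effects themselves idempotent (for example, with an idempotency key on orders) remains your responsibility.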
If you're building agentic systems and finding yourself bolting on Redis, Celery, a retry library, and a state machine for approvals, this demo shows what those concerns look like when Temporal owns them instead.
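The human-in-the-loop approval pattern from the list above can be pictured with a small asyncio sketch. This is an in-memory toy, not Temporal code: `ApprovalGate`, `execute_trade`, and the 10,000 approval limit are assumptions made up for illustration, and the wait here would not survive a process restart the way a durable workflow wait does.

```python
import asyncio
from typing import Optional


class ApprovalGate:
    """Toy stand-in for a signal handler plus the state it sets."""

    def __init__(self) -> None:
        self._decided = asyncio.Event()
        self.approved: Optional[bool] = None

    def signal(self, approved: bool) -> None:
        # Called from the "human" side; analogous to delivering a signal.
        self.approved = approved
        self._decided.set()

    async def wait(self) -> bool:
        # Suspends the caller, without polling, until a decision arrives.
        await self._decided.wait()
        assert self.approved is not None
        return self.approved


async def execute_trade(notional: float, gate: ApprovalGate, limit: float = 10_000.0) -> str:
    # Only trades above the (hypothetical) limit need a human decision.
    if notional > limit and not await gate.wait():
        return "rejected"
    return "executed"


async def demo() -> list[str]:
    approve_gate, reject_gate = ApprovalGate(), ApprovalGate()
    approved = asyncio.create_task(execute_trade(50_000.0, approve_gate))
    rejected = asyncio.create_task(execute_trade(75_000.0, reject_gate))
    await asyncio.sleep(0)  # let both trades reach their gates
    assert not approved.done() and not rejected.done()  # parked, waiting on a human
    approve_gate.signal(True)
    reject_gate.signal(False)
    small = await execute_trade(500.0, ApprovalGate())  # under the limit: no wait
    return [await approved, await rejected, small]


results = asyncio.run(demo())
print(results)
```

In the Temporal Python SDK the analogous building blocks are a `@workflow.signal` handler that sets workflow state and `workflow.wait_condition` to park on it. How this demo wires them is not shown here.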
What the agent does:
- Discovers a strategy by fanning out N parallel sandboxed backtests in airgapped Docker containers (child workflows).
- Lives through a tick loop: market + news context, LLM trade-intent via the OpenAI Agents SDK (activity_as_tool), a deterministic risk guardrail, and human-in-the-loop approval for large trades.
- Survives chaos: kill the worker mid-trade and Temporal replays the decision history to resume from the exact line.
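A deterministic risk guardrail of the kind the tick loop describes could look roughly like the following. Everything here is an assumption for illustration: the `TradeIntent` fields, the `Verdict` values, the 25% position cap, and the 10,000 approval threshold are hypothetical and not taken from the demo's code.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"
    NEEDS_APPROVAL = "needs_approval"
    AUTO_APPROVE = "auto_approve"


@dataclass(frozen=True)
class TradeIntent:
    """Hypothetical shape of the LLM's proposed trade."""
    symbol: str
    side: str  # "buy" or "sell"
    quantity: int
    price: float

    @property
    def notional(self) -> float:
        return self.quantity * self.price


def risk_guardrail(
    intent: TradeIntent,
    cash: float,
    max_position_fraction: float = 0.25,  # assumed cap per trade
    approval_threshold: float = 10_000.0,  # assumed "large trade" cutoff
) -> Verdict:
    """Pure function of its inputs: no clock, randomness, or I/O."""
    # Malformed intents never reach the broker.
    if intent.side not in ("buy", "sell") or intent.quantity <= 0 or intent.price <= 0:
        return Verdict.REJECT
    # Buys may not exceed the allowed fraction of available cash.
    if intent.side == "buy" and intent.notional > cash * max_position_fraction:
        return Verdict.REJECT
    # Large but permissible trades are routed to a human.
    if intent.notional >= approval_threshold:
        return Verdict.NEEDS_APPROVAL
    return Verdict.AUTO_APPROVE


small = risk_guardrail(TradeIntent("AAPL", "buy", 10, 100.0), cash=100_000.0)
large = risk_guardrail(TradeIntent("AAPL", "buy", 150, 100.0), cash=100_000.0)
oversized = risk_guardrail(TradeIntent("AAPL", "buy", 300, 100.0), cash=100_000.0)
print(small, large, oversized)
```

Keeping the guardrail a pure function means the same intent always yields the same verdict, which is what lets the check be re-executed safely during replay.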
Stack: FastAPI (sole Temporal client) + SSE, React 18 / Vite / Tailwind UI,
temporalio[openai-agents] workflows wired via OpenAIAgentsPlugin, Docker sandboxes
for backtests, and Mockoon for offline/deterministic market·news·broker data.
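The fan-out / fan-in shape of strategy discovery can be sketched with plain asyncio. This is only an illustration: `run_backtest`, `discover`, the strategy names, and the PnL formula are hypothetical, and the real demo runs each backtest as a child workflow in a Docker sandbox, not as an in-process coroutine.

```python
import asyncio


async def run_backtest(strategy: str, prices: list[float]) -> tuple[str, float]:
    """Hypothetical stand-in for one sandboxed backtest; returns (strategy, pnl)."""
    await asyncio.sleep(0)  # yield control, as a real sandbox call would
    if strategy == "buy_and_hold":
        pnl = prices[-1] - prices[0]
    elif strategy == "sell_and_hold":
        pnl = prices[0] - prices[-1]
    elif strategy == "flat":
        pnl = 0.0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return strategy, pnl


async def discover(strategies: list[str], prices: list[float]) -> tuple[str, float]:
    # Fan out: one concurrent task per candidate strategy.
    results = await asyncio.gather(
        *(run_backtest(s, prices) for s in strategies), return_exceptions=True
    )
    # Fan in: drop failed candidates, then keep the highest PnL.
    ok = [r for r in results if not isinstance(r, BaseException)]
    if not ok:
        raise RuntimeError("every backtest failed")
    return max(ok, key=lambda r: r[1])


winner = asyncio.run(
    discover(["buy_and_hold", "sell_and_hold", "flat", "typo"], [100.0, 103.0, 108.0])
)
print(winner)
```

The unknown `"typo"` strategy fails in isolation and the remaining candidates still produce a winner, which is the property the sandboxed child workflows are meant to provide at larger scale.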
What it strips out — that you'd otherwise write yourself: no Celery, no Redis-backed queue, no hand-rolled retry policy, no "save progress to S3" code, no state machine for approvals, no orphan-child cleanup. Temporal owns all of it; what's left on top is just the agent logic.
What Linux did for processes, Temporal does for agent loops.
