AI isn’t hypothetical anymore. It shows up in day-to-day developer work and in customer-facing products. Most teams say they’re already using it, and a large share is either building AI/ML systems or learning how.
Where AI hits hardest today is code generation. It’s the top area of perceived impact, with other lifecycle stages trailing behind. That matters because the places where AI touches code are exactly where reliability problems surface first.
In our latest survey of over 150 developers and technical leaders — largely from enterprise-scale companies across North America and Europe — nearly half are still exploring or prototyping, while 38% say AI is essential or actively scaling in production. However, confidence is still low: only 13% feel very confident observing and debugging AI workflows at scale, and 62% report measurable time or revenue lost to reliability issues each year.
Meanwhile, day-to-day operations are still bumpy. Only about a quarter of teams describe workflow operations as “smooth,” and many report overhead, fragile long-running work, and messy recovery. You can feel that in incident reviews and in the glue code holding things together.
The near-term focus reflects this reality. Our State of Development 2025 study found that reliability and compliance top the 12–24 month priority list, followed closely by automation and debt reduction. If you’re planning where to invest, start there.
What “production AI” actually asks of your platform
AI systems stretch the parts of your stack that already creak:
- Non-deterministic steps: LLM calls, retrieval, and external APIs fail in partial, creative ways.
- Long-running, multi-actor flows: planning, tool use, human approvals, and compensations don’t fit a single request/response.
- Audit and cost: you need a record of prompts, decisions, and spend tied to business outcomes.
It’s familiar distributed-systems pain turned up to 11.
Five patterns that make AI shippable
1. Guarded tool calls
Wrap each LLM or tool step with timeouts, retries with backoff, circuit breaking, and explicit fallbacks. Persist inputs/outputs and attach cost metadata. Promote to human review when thresholds trip.
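As a rough illustration, here’s what a guarded step can look like in Python. Everything here (the `GuardedCall` wrapper, its thresholds, the in-memory `audit_log`) is a hypothetical sketch rather than any particular SDK’s API; a real step would also enforce a request timeout inside `fn` and persist the log durably.

```python
import random
import time
from dataclasses import dataclass, field


@dataclass
class GuardedCall:
    """Guard one LLM/tool step: retries with backoff, a simple circuit breaker,
    an explicit fallback, and an audit trail with cost metadata."""
    max_attempts: int = 3
    base_delay_s: float = 0.5
    failure_threshold: int = 5            # consecutive failures before the circuit opens
    consecutive_failures: int = 0
    audit_log: list = field(default_factory=list)

    def run(self, step_name, fn, *, fallback=None, cost_per_call_usd=0.0):
        # Circuit breaker: stop hammering a failing dependency and hand the run to a human.
        if self.consecutive_failures >= self.failure_threshold:
            raise RuntimeError(f"circuit open for {step_name}; route run to human review")

        for attempt in range(1, self.max_attempts + 1):
            try:
                result = fn()             # the non-deterministic step; fn should enforce its own timeout
                self.consecutive_failures = 0
                # Persist inputs/outputs and attach cost metadata (in-memory here for the sketch).
                self.audit_log.append({"step": step_name, "attempt": attempt,
                                       "result": result, "cost_usd": cost_per_call_usd})
                return result
            except Exception as exc:
                self.consecutive_failures += 1
                self.audit_log.append({"step": step_name, "attempt": attempt, "error": str(exc)})
                if attempt == self.max_attempts:
                    if fallback is not None:
                        return fallback() # explicit fallback beats a silent failure
                    raise
                # Exponential backoff with a little jitter before retrying.
                time.sleep(self.base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```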
2. Idempotent side-effects
Aim for write-once semantics with idempotency keys. When you can’t guarantee exactly-once execution, pair the write with a compensating action you can run automatically.
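A minimal sketch of both halves, using hypothetical in-memory stores (`processed_keys`, `ledger`) in place of a real database: the idempotency key turns a retried write into a no-op, and the compensating action reverses it if a later step fails.

```python
import uuid

# Hypothetical in-memory stores standing in for your database.
processed_keys: set[str] = set()
ledger: dict[str, float] = {}


def credit_account(account_id: str, amount: float, idempotency_key: str) -> None:
    """Apply the credit at most once: a retry with the same key is a safe no-op."""
    if idempotency_key in processed_keys:
        return
    ledger[account_id] = ledger.get(account_id, 0.0) + amount
    processed_keys.add(idempotency_key)


def compensate_credit(account_id: str, amount: float) -> None:
    """Compensating action: reverse the credit when a later step in the flow fails."""
    ledger[account_id] = ledger.get(account_id, 0.0) - amount


# A retried workflow step reuses the same key, so the credit lands exactly once.
key = str(uuid.uuid4())
credit_account("acct-42", 25.0, key)
credit_account("acct-42", 25.0, key)   # retry after a crash: no double credit
assert ledger["acct-42"] == 25.0
```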
3. Human checkpoints
Turn “DM me before sending” into a real approval step that pauses the run and resumes cleanly. Treat reviewer timeouts as first-class outcomes.
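One way to model that checkpoint, sketched with a hypothetical in-memory `approvals` store that a reviewer UI would write into. In a real durable workflow the pause would be a durable timer or signal rather than an in-process poll, but the shape is the same: a reviewer timeout comes back as an explicit outcome, not a swallowed error.

```python
import time
from enum import Enum


class Approval(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    TIMED_OUT = "timed_out"   # reviewer never answered: a first-class outcome

# Hypothetical store a reviewer UI would write decisions into.
approvals: dict[str, Approval] = {}


def wait_for_approval(run_id: str, timeout_s: float, poll_s: float = 1.0) -> Approval:
    """Pause the run until a reviewer decides, or return TIMED_OUT after the deadline."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        decision = approvals.get(run_id, Approval.PENDING)
        if decision is not Approval.PENDING:
            return decision
        time.sleep(poll_s)
    return Approval.TIMED_OUT


def send_campaign(run_id: str) -> str:
    decision = wait_for_approval(run_id, timeout_s=4.0)
    if decision is Approval.APPROVED:
        return "sent"
    if decision is Approval.TIMED_OUT:
        return "escalated"    # don't drop the work; route it somewhere visible
    return "cancelled"


approvals["run-7"] = Approval.APPROVED   # a reviewer clicking "approve" in some UI
print(send_campaign("run-7"))            # -> "sent"
```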
4. Deterministic plans, versioned prompts
Stabilize the orchestration while allowing variability inside steps. Record prompt and model versions so replays are faithful.
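A sketch of the bookkeeping, with a hypothetical versioned prompt registry and pinned model settings: the orchestration (which step runs, with which configuration) stays deterministic, the variability lives inside the LLM call, and every run records exactly what it used so a replay can be faithful.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical versioned prompt registry; new wording gets a new version, old ones stay frozen.
PROMPTS = {
    ("summarize", "v3"): "Summarize the following ticket in two sentences:\n{ticket}",
}


@dataclass(frozen=True)
class StepRecord:
    step: str
    prompt_version: str
    model: str
    temperature: float
    rendered_prompt: str


def run_summarize(ticket: str, llm_call) -> tuple[str, StepRecord]:
    """Deterministic orchestration: the step and its configuration are pinned and recorded."""
    version, model, temperature = "v3", "gpt-4o-mini", 0.2   # pinned, not chosen at runtime
    prompt = PROMPTS[("summarize", version)].format(ticket=ticket)
    record = StepRecord("summarize", version, model, temperature, prompt)
    output = llm_call(prompt, model=model, temperature=temperature)   # variability stays in here
    return output, record


# The record is what you persist so a later replay can reconstruct exactly what ran.
_, rec = run_summarize("Printer on floor 3 is offline.", lambda p, **kw: "stub summary")
print(json.dumps(asdict(rec), indent=2))
```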
5. Cost and policy budgets
Track tokens and spend per run. Degrade gracefully or halt when budgets or safety checks fail. Escalate instead of silently dropping work.

These patterns map directly to the operational issues teams report: overhead, long-running complexity, and brittle recovery.
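As a sketch of the last pattern, a per-run budget might look like this. The prices, thresholds, and model names are placeholders; the point is that the run degrades to a cheaper model as it approaches the cap and halts with an escalation, never by silently dropping work.

```python
class BudgetExceeded(Exception):
    """Raised so the orchestrator can escalate instead of silently dropping the run."""


class RunBudget:
    def __init__(self, max_usd: float, degrade_at: float = 0.8):
        self.max_usd = max_usd
        self.degrade_at = degrade_at
        self.spent_usd = 0.0

    def record(self, tokens: int, usd_per_1k_tokens: float) -> None:
        """Track spend per run; flat token pricing is a simplification for the sketch."""
        self.spent_usd += tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"run spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}")

    def pick_model(self) -> str:
        """Degrade gracefully: switch to a cheaper model once most of the budget is gone."""
        if self.spent_usd >= self.degrade_at * self.max_usd:
            return "small-cheap-model"      # placeholder names, not real model IDs
        return "large-default-model"


budget = RunBudget(max_usd=1.00)
try:
    for step_tokens in [20_000, 40_000, 80_000]:
        model = budget.pick_model()
        budget.record(step_tokens, usd_per_1k_tokens=0.01)
except BudgetExceeded as exc:
    print(f"halting and escalating to an operator: {exc}")
```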
Why Durable Execution becomes the backbone
Teams turn to Durable Execution for state, retries, visibility, and long-running task management. It’s tightly linked with orchestration adoption, and usage is already widespread — even more so in large companies. The benefit is simple: durable workflows complete or resume after crashes without ad-hoc glue.
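Stripped to its core, the mechanic looks like this sketch: completed steps are recorded in a durable history keyed by run ID, and a restart replays that history and skips work that already finished. A real Durable Execution engine does this with event sourcing, workers, and task queues rather than a JSON file; the file just keeps the idea concrete.

```python
import json
from pathlib import Path
from typing import Callable


def run_workflow(run_id: str, steps: list[tuple[str, Callable[[], str]]]) -> None:
    """Run steps in order, checkpointing each result so a crashed run resumes instead of restarting."""
    history_path = Path(f"{run_id}.history.json")             # stand-in for a durable event history
    history = json.loads(history_path.read_text()) if history_path.exists() else {}

    for name, step in steps:
        if name in history:                                   # finished before the crash: skip on replay
            continue
        history[name] = step()                                # side effects should be idempotent (pattern 2)
        history_path.write_text(json.dumps(history))          # checkpoint after every step


steps = [
    ("fetch_docs", lambda: "3 docs retrieved"),
    ("summarize", lambda: "summary written"),
    ("notify", lambda: "notification sent"),
]
run_workflow("run-42", steps)   # kill the process and rerun: completed steps are not re-executed
```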
If you’re already investing in agents, note the trend lines: OpenAI Agents SDK leads reported usage, followed by Google’s ADK and then LangChain — and there’s a native path to bring orchestration into those flows.
The takeaway
AI is here across the SDLC, and teams are at different stages: using assistants, building AI/ML into products, and building the reliability to run those systems at scale. The fastest way to make progress is to standardize on durable, observable workflows and treat reliability as a product requirement, not an afterthought.