I think having something like Temporal’s Durable Execution is going to become table stakes for building reliable agents, especially long-lived agents.
Industry: Artificial Intelligence
Use Case: Agents & Infra
Company Size: 51-250
SDK: Python
Temporal: Cloud
Replit is known as an industry leader in AI coding, although the company originally got its start selling a cloud integrated development environment (IDE) in 2016. They launched Replit Agent in 2024: an agent that automates software development, letting anyone build applications more easily. After a massive surge in adoption, Replit pivoted their business around the agent product. The company's momentum continues to grow with the recent release of Agent 3.
It's a pretty bad user experience to have the agent get super far into something and then hit a catastrophic error, and you lose everything and have to restart.
Replit Agent was initially built with custom orchestration and reliability logic. After the product launched, the Platform team faced the tasks of cleaning up tech debt and improving system reliability. Two main challenges were:
We launched Replit Agent in September 2024, and in November we looked into Temporal. It took a couple of weeks to get pretty much the whole thing migrated onto Temporal. So it’s very easy to use, with a very good DevEx.
An engineer on Replit’s Platform Team had familiarity with Temporal from past work experience. He understood Temporal as a leader in the orchestration space, and evaluated the product to see how easy it was to use.
Here’s what stood out to the team:
I appreciate the SDKs tend to be idiomatic for the language. The Python SDK really excels at that. For example, you can use regular “async/await.” And so it was really easy to just take our existing code, and start plugging pieces of it into the Workflow and Activities, and to get something working end-to-end.
The Platform team built a prototype on Temporal. After seeing good results, they migrated the control plane layer of Replit Agent.
Temporal solves Replit Agent’s orchestration challenges through the following architecture: Every Agent is its own Temporal Workflow. Workflow IDs are unique, so Temporal ensures there’s only one active Workflow per ID at a time, and therefore only one agent process per user session. Temporal coordinates all the steps of the agent lifecycle, like spinning up the agent and turning it off.
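The "one active Workflow per ID" guarantee can be pictured with a small plain-asyncio model. This is only a conceptual sketch and does not use the Temporal SDK: `RunRegistry`, `AlreadyRunningError`, `agent_session`, and the `agent-session-42` ID format are hypothetical names invented for illustration, and the registry lives in memory, whereas Temporal enforces this on the service side.

```python
import asyncio


class AlreadyRunningError(Exception):
    """Raised when a run with the same ID is still active."""


class RunRegistry:
    """Toy in-memory stand-in for an orchestrator that keys runs by ID."""

    def __init__(self):
        self._active = {}  # run ID -> asyncio.Task

    async def _run(self, run_id, coro_factory):
        try:
            return await coro_factory()
        finally:
            # Free the ID once the run has finished (or failed).
            self._active.pop(run_id, None)

    def start(self, run_id, coro_factory):
        """Start a run, refusing a second active run with the same ID.

        Must be called from inside a running event loop.
        """
        if run_id in self._active:
            raise AlreadyRunningError(run_id)
        task = asyncio.get_running_loop().create_task(
            self._run(run_id, coro_factory)
        )
        self._active[run_id] = task
        return task


async def agent_session(session_id):
    """Hypothetical agent lifecycle: spin up, do work, shut down."""
    await asyncio.sleep(0.01)
    return f"{session_id}: done"


async def demo():
    registry = RunRegistry()
    events = []
    run_id = "agent-session-42"  # assumed format: one ID per user session

    first = registry.start(run_id, lambda: agent_session("session-42"))
    try:
        # A second start for the same session is rejected while the first runs.
        registry.start(run_id, lambda: agent_session("session-42"))
    except AlreadyRunningError:
        events.append("duplicate rejected")

    events.append(await first)
    # In this toy model the ID becomes available again after completion.
    events.append(await registry.start(run_id, lambda: agent_session("session-42")))
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Deriving the run ID from the user session is what turns "unique ID" into "one agent process per session" in this model: any second attempt to start the same session collides on the ID instead of spawning a duplicate agent.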
The Workflow runs Activities, which contain failure-prone, non-deterministic logic. Temporal retries failed Activities, so this logic recovers automatically. With the Workflow Updates feature, messages can be pushed into a running Workflow, including human-in-the-loop interactions. For example, Replit Agent can pause and wait for the user to accept a consent message before the Workflow continues driving the agent.
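The two ideas in this paragraph, retrying a failure-prone step and pausing until a human responds, can be modeled in plain asyncio. This is a sketch for intuition only, not Temporal SDK code: `run_with_retries`, `AgentRun`, `accept_consent`, and `make_flaky_step` are hypothetical names, and unlike a Temporal Workflow this model keeps its state in memory, so it would not survive a process crash.

```python
import asyncio


async def run_with_retries(step, attempts=3, delay=0.01):
    """Retry a failure-prone async step, loosely like an Activity retry."""
    for attempt in range(1, attempts + 1):
        try:
            return await step()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the failure
            await asyncio.sleep(delay)


class AgentRun:
    """Toy model of a run that blocks until a human accepts consent."""

    def __init__(self):
        self._consent = asyncio.Event()
        self.log = []

    def accept_consent(self):
        """Message pushed into the run from outside (the 'update')."""
        self._consent.set()

    async def run(self, step):
        self.log.append("waiting for consent")
        await self._consent.wait()  # pause here until the user responds
        self.log.append("consent accepted")
        self.log.append(await run_with_retries(step))
        return self.log


def make_flaky_step(failures):
    """Build a step that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    async def step():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionError("transient failure")
        return f"step succeeded on attempt {calls['n']}"

    return step


async def demo():
    agent = AgentRun()  # created inside the loop so the Event binds to it
    task = asyncio.create_task(agent.run(make_flaky_step(failures=2)))
    await asyncio.sleep(0)  # let the run reach the consent wait
    still_waiting = not task.done()
    agent.accept_consent()  # the user clicks "accept"
    log = await task
    return still_waiting, log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The run makes no progress until `accept_consent` is called, and the two transient failures are absorbed by the retry loop rather than surfacing to the user.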
Temporal gives us a lot more confidence to build the product and know that it's not going to have lots of edge cases that lead to bad user experiences.
With Temporal, the Platform Team has improved Replit Agent’s orchestration and reliability, at increasingly high scale. They also cited additional unexpected benefits of adopting Temporal. For example, the team has more confidence to build great products, and they can move faster because non-infrastructure engineers don’t need to worry about plumbing.
Replit Agent is just one of many use cases. The Platform team also uses Temporal for:
Replit started out using the Python SDK, and has since adopted the TypeScript and Go SDKs for the other use cases.
In the time we've used it, we haven't had any major incidents that trace back to Temporal Cloud, which is great.
As hoped, the team hasn’t experienced any major incidents that trace back to Temporal Cloud. They also started using Multi-Region Replication, a feature that replicates their primary Namespaces to a backup region. This feature recently helped them avoid an incident during a cloud provider degradation in their primary region.
We've been able to scale up, and Temporal has never been the bottleneck. The agent has massively increased in its usage, and not having to rebuild our entire orchestration engine is great.
Replit Agent is a game-changer for application development. With Temporal, the Replit Platform Team can provide users with the experience they deserve, and do so with confidence.