What's old is new again. After decades in tech, I can tell you that few sayings have proven truer.
A few months ago, I was at an unconference-style meetup, deep in a small group discussion about protocols — MCP specifically. People were proposing new specs, debating transports, arguing message formats — all really great conversation. Then someone in the group, rather emphatically, said something quite true:
"We've solved a lot of these problems before when we, as an industry, moved to the cloud-native environment and microservices architectures."
This is something that he and I both know quite well, as we were both a part of that industry transformation.
Yes, some of what's happening in the AI space today is genuinely new. For example, back in the day, I didn't have LLM APIs at my fingertips — but a good bit of what we're dealing with isn't new at all. Yes, the setting is different and the nuances are real, but the core problems are very familiar. We'd do well to take notes from what we've built before.
In this piece, I'm going to call out three patterns… okay, three and a half 😉. For each one, I'll ground it in the microservices world, then draw the parallel to what we're seeing now in the AI agent space.
Big Isn't Better: The Case for Splitting Things Up
Back in the early 2000s, we were firmly in the three-tier architecture world: storage tier, services tier, UI tier. The whole thing was released as a single, metaphorical "binary," and new versions shipped every 12–18 months. Everything was tested together, bundled together, and deployed together — so you got the whole batch whether you needed all of it or not.
Then along came microservices, and everything changed. We broke those monoliths into smaller pieces, each service developed and released independently. Two-pizza teams took ownership of them. We built container-based platforms to run them efficiently, and we created systems that allowed for independent SDLCs.
Of course, we also had to combine these microservices into actual applications — and this is where orchestration tends to sneak in alongside decomposition. We start by breaking things apart, and pretty quickly we have to figure out how to stitch them back together again.
Now, let's look at the AI space.
After ChatGPT dropped in late 2022, there was very much a "bigger is better" race: bigger datasets, bigger models, larger and larger context windows. Agents started to emerge, leading to systems where LLMs make decisions about application flow based on the provided context. The prevailing intuition was: the more relevant context we can give the LLM, the better the outcome.
But then reality sobered us up, as it tends to do. We realized that even when we do a decent job filtering away bad or irrelevant data, LLMs exhibit recency bias — they get confused when there's just too much to keep straight. (Honestly... same.)
So now we have an apparent paradox: we want agents to do more and more, but giving them everything makes them worse instead of better.
And that's where microagents and orchestration take the stage.
Instead of one big monolithic agent, we build a set of microagents, each doing one thing and one thing well. (Yes, the Unix philosophy was onto something.) This gives us many of the same benefits we got from microservices: smaller, more focused scope, independent evolution, smaller blast radius, clear ownership. Two pizza teams… do coding agents eat pizza? 🤔
Then we stitch those microagents together into larger systems: agents orchestrating tools and other agents.
So what does that look like? Microagents inherently constrain scope, and smaller scope means a tighter context window, which means better LLM performance on those scoped tasks. Instead of one giant AI travel agent that handles planes, trains, automobiles, hotels, hostels, private rentals, and entertainment, you build a transport agent, a lodging agent, an entertainment agent. A higher-level travel agent then understands that a trip spans categories and delegates to the appropriate sub-agents to deliver a coherent whole. The LLM, making decisions at that higher level, is given tools and sub-agents that allow it to orchestrate at large without getting distracted by low-level details.
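To make the delegation concrete, here's a minimal, framework-free sketch. All the names (`TransportAgent`, `LodgingAgent`, `TravelAgent`) are hypothetical, and in a real system each `handle` would sit in front of an LLM with its own narrow context — this just shows the shape of the decomposition:

```python
# Toy sketch of microagent delegation: each sub-agent owns one narrow
# domain, and the top-level travel agent only routes between them.
# (Hypothetical names; in practice each handle() would invoke an LLM.)

class TransportAgent:
    def handle(self, request: str) -> str:
        return f"booked transport for: {request}"

class LodgingAgent:
    def handle(self, request: str) -> str:
        return f"booked lodging for: {request}"

class TravelAgent:
    """Orchestrates: decides which sub-agent owns each part of a trip."""
    def __init__(self) -> None:
        self.sub_agents = {"transport": TransportAgent(), "lodging": LodgingAgent()}

    def plan_trip(self, parts: dict[str, str]) -> list[str]:
        # In practice an LLM would decompose the trip; here routing is explicit.
        return [self.sub_agents[category].handle(request)
                for category, request in parts.items()]

plan = TravelAgent().plan_trip(
    {"transport": "SEA -> JFK", "lodging": "3 nights in Manhattan"}
)
```

The key property is that the orchestrator never sees lodging-level details; each sub-agent's context stays small.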
What about independent evolution? In the microservices world, as semantically versioned APIs evolved on their own schedules, we often struggled with keeping the orchestration up to date. But in the agent world, this is turning out to be less of a problem, because LLMs are actually quite good at negotiating interfaces. Instead of brittle contracts and translation glue code, the LLM can adapt between slightly mismatched inputs and outputs. Protocols like MCP and the support that model APIs have for structured outputs give LLMs the context they need to do this quite effectively.
Why Request/Response Isn't Enough Anymore
If you built microservices in the early days, you probably started with request/response. One service calls another over HTTP, waits, gets a response, and moves on. Even as we added WebSockets and some degree of asynchronicity, we were still largely thinking in request/response terms.
Entire ecosystems were built around this model. Netflix OSS gave us Hystrix for circuit breaking, Zuul for routing, and Eureka for service discovery. One of my favorite talks from that era was about the caching layers Netflix built specifically to make request/response systems more resilient. And it worked — but it took a lot of heavy lifting to get there.
Then we started moving toward event-driven architectures (EDA). Even though message-based systems had existed for a long time, it took a while for them to take real hold in microservices, in part because the scale of the problem increased. But eventually, Kafka became hugely popular, aided by Confluent evangelizing a whole host of patterns. Martin Kleppmann wrote what I'd consider a seminal book on the topic, Designing Data-Intensive Applications. But even now, I'd make an entirely unsubstantiated claim that EDA adoption is still lagging.
Let's come back to AI again and look at agents.
The evolution is eerily similar.
Let's start with MCP. Initially, it covered only tools invoked via request/response. Then, in November 2025, async tasks were introduced with a key new constraint: a task must continue to function no matter how long execution takes, and no matter what infrastructure failures occur along the way. The MCP spec refers to tasks as "durable state machines" — systems that live for extended periods of time, where generating results is decoupled from the request that kicked them off.
Or look at agent SDKs: OpenAI Agents SDK, Pydantic AI, Vercel AI SDK, for example. They give you a nice programming model: define an agent by selecting a model, providing system instructions, and supplying a set of tools (which are often themselves microagents). But tool invocation is synchronous request/response. For rapid prototyping, this is fine — but for production-scale systems with reliability requirements and long-running workflows, it's not sufficient. We need to evolve here.
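The shape of that programming model, stripped to its essentials, looks something like this (deliberately generic — this is not any specific SDK's API, just the common pattern: model + instructions + tools, with tool calls as plain synchronous function calls):

```python
# Generic shape of the agent-SDK programming model (not any real SDK's API):
# an agent is a model + instructions + tools, and tool invocation is a
# plain synchronous call.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    model: str                       # e.g. a model name -- just a label here
    instructions: str                # the system prompt
    tools: dict[str, Callable] = field(default_factory=dict)

    def call_tool(self, name: str, *args):
        # Synchronous request/response: fine for prototypes, but the caller
        # blocks for the tool's full duration -- and nothing survives a crash.
        return self.tools[name](*args)

def get_weather(city: str) -> str:
    return f"sunny in {city}"        # stand-in for a real API call

agent = Agent(model="some-model",
              instructions="You are a helpful assistant.",
              tools={"get_weather": get_weather})
result = agent.call_tool("get_weather", "Lisbon")
```

Notice that `call_tool` blocks until the tool returns — exactly the assumption that breaks once tools run for minutes or hours.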
While some things are similar, there is one notable difference: the time scale changed. In the microservices era, user interactions were generally measured in seconds. In the new era, where agents are able to work increasingly autonomously, workflows can take minutes or hours. Request/response just doesn't hold up there.
What all of this points to is event-driven architectures at runtime. You need them — but the tricky part is that EDA is hard both to build and to operate. This is where — and yes, I know this sounds self-serving, but I genuinely mean it — Temporal enters the picture. Temporal lets you write code as if everything is request/response. It's a far easier mental model for developers (turns out, for AI agents too 🤖), and Temporal turns that into an event-driven distributed system under the hood. It's the best of both worlds: code that is easy to reason about, coupled with durable runtime behaviors.
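To illustrate the idea (this is a toy, not Temporal's actual API): the workflow code below reads as plain sequential request/response, but every step's result is journaled, so a re-run after a crash replays completed steps from the journal instead of executing them again:

```python
# Toy illustration of durable execution (not Temporal's real API):
# sequential-looking code whose step results are journaled, so a re-run
# replays completed work instead of redoing it.
journal: dict[str, object] = {}
executions: list[str] = []   # records real executions, to show replay skips them

def step(name: str, fn, *args):
    if name in journal:      # already completed: replay from the journal
        return journal[name]
    executions.append(name)  # the side effect actually runs only once
    journal[name] = fn(*args)
    return journal[name]

def book_trip(city: str) -> str:
    # Reads like simple request/response; durability comes from step().
    flight = step("flight", lambda c: f"flight to {c}", city)
    hotel = step("hotel", lambda c: f"hotel in {c}", city)
    return f"{flight}; {hotel}"

first = book_trip("Tokyo")   # executes both steps
second = book_trip("Tokyo")  # simulated re-run after a crash: pure replay
```

The developer writes the first function; the journaling machinery (Temporal's event history, in the real system) is what makes the second run free.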
And the brilliant part is that this paradigm extends to agent SDKs. Temporal already integrates with OpenAI Agents SDK, Pydantic AI, and Vercel AI SDK — and, most recently, the Google ADK. These integrations preserve the simple developer ergonomics and implicitly deliver production-grade runtime behavior.
The Short-Term Memory Problem
One of the most foundational cloud-native patterns is the stateless service. By externalizing all long-lived state from the compute (into SQL, NoSQL, Redis), we were able to optimize the way infrastructure is managed. Kubernetes pods spin up and down in an elegantly self-healing way, even as state is externally preserved. But this placed a real burden on developers: you had to be deliberate about everything. Session state in Redis. Explicit reads and writes everywhere. In other words, you had to manage your service's memory yourself.
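The pattern in miniature (an in-memory dict stands in for Redis here — the point is that the handler itself keeps nothing between calls):

```python
# Sketch of the stateless-service pattern: the handler holds no state
# between calls; everything lives in an external store. A plain dict
# stands in for Redis.
store: dict[str, list[str]] = {}   # stand-in for an external store like Redis

def handle_message(session_id: str, message: str) -> int:
    # Explicit read: the service has no memory of prior calls.
    history = store.get(session_id, [])
    history.append(message)
    # Explicit write: persist before responding, or the state is lost.
    store[session_id] = history
    return len(history)

handle_message("s1", "hello")
count = handle_message("s1", "hi again")  # any replica could serve this call
```

Because no state lives in the process, any replica can serve any request — which is exactly why pods can come and go freely.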
Now let's look at agents.
The right data at the right time is the lifeblood of any effective agent, and that data comes from and goes to a number of different places.
Some of it is long-lived: corporate data stores brought into the agent's view via RAG or MCP, outputs persisted to external systems — summaries of legal cases, orders placed by an inventory management agent. Conceptually, this isn't that different from microservices accessing databases.
And then there's the conversation history: user inputs, LLM decisions, tool execution outputs. Even though we often call this short-term memory, it still needs to be preserved even when the agent's lifetime outlives that of the infrastructure it runs on.
Some frameworks address this directly. LangGraph and AWS AgentCore both provide APIs for capturing and retrieving agent memory. But just as we did in the microservices era, those APIs leave much of the burden on the developer.
But there's an interesting pattern here: notice that short-term memory collects a series of events. This harkens back to a fairly well-known but little-used pattern from the microservices era: event sourcing. Event sourcing records every event in a time-ordered log that can be replayed whenever needed. Applied to agents, that means you can reconstruct an agent's short-term memory — its exact state at any step — even if the underlying compute disappeared entirely.
At its core, Temporal's state preservation model is event sourcing, yet it goes one step further: it not only records events, it stores the results of side-effecting operations so they aren't re-executed on replay. No re-burning of tokens on LLM calls. No re-execution of side-effecting operations. What this gives you — from the developer's perspective — is short-term memory for free, no re-execution of expensive operations, and full reconstruction of agent state even in the face of non-deterministic LLMs and long-running, complex workflows.
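Here's the event-sourcing idea applied to agent memory, in a toy form (again, not Temporal's actual mechanism — `fake_llm` is a hypothetical stand-in): every event lands in a time-ordered log, and because side-effecting results are recorded in that log, replay rebuilds the conversation without re-invoking the model:

```python
# Event-sourcing sketch for agent short-term memory: every event goes into
# a time-ordered log, with side-effect results recorded, so replay rebuilds
# state without re-executing expensive calls.
log: list[tuple[str, str]] = []
llm_calls = 0   # counts real (non-replayed) model invocations

def fake_llm(prompt: str) -> str:   # hypothetical stand-in for an LLM call
    global llm_calls
    llm_calls += 1
    return f"answer({prompt})"

def run_turn(prompt: str) -> None:
    log.append(("user", prompt))
    # The *result* is recorded, not just the fact that a call happened.
    log.append(("llm", fake_llm(prompt)))

def rebuild_memory() -> list[tuple[str, str]]:
    # Replay: reconstruct the conversation purely from the log --
    # no tokens re-burned, even if the original compute is gone.
    return list(log)

run_turn("plan my trip")
memory = rebuild_memory()   # full history, zero additional LLM calls
```

The test of the pattern is the counter: after rebuilding memory, the model has still only been called once.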
The More Things Change…
If there's one thing that stands out looking across all of this, it's not just that we've "seen this before" — it's how we're seeing it again.
In the microservices era, we started with patterns that allowed us to experiment with building loosely coupled, highly distributed systems. Only after running into harsh realities — scale, reliability — did we evolve toward the patterns that actually worked. Monoliths, then decomposition. Request/response, then event-driven. Stateful services, then stateless with externalized memory.
In the agent space, we're compressing that same journey into a much shorter time frame. But we're still following the same shape.
The challenge isn't that we don't know how to build these systems. It's that default instincts are still pulling us toward bigger context, simpler request/response flows, managing memory ourselves — all of which feel natural, and all of which break down at scale. New problems don't always need new solutions.
But sometimes new settings relax old requirements too. In microservices, we spent a lot of time making systems more machine-friendly via explicit contracts and strict interfaces. In agent systems, we now have components — LLMs — that are surprisingly good at adapting and capable of negotiating ambiguity. Some of the old constraints may loosen, but the system-level problems don't go away. Coordination, state, time, failure: they're all still there.
So the opportunity isn't to start over. It's to be deliberate: where can we lean on what we already learned? Where do the new capabilities actually change the equation? And where are we just replaying old mistakes... faster?
The last time we went through this transition, it took years to sort out the patterns. This time we have the benefit of hindsight — but we don't have the luxury of a long runway. We simply must leverage our prior experience to meet the challenge.
If you're anything like me, you love talking with fellow devs — so if you want to go deeper on what it actually takes to run agents in prod, I'd love to see you at Replay. I'll be there: teaching an AI workshop, moderating a panel, and you'll surely find me all over the hallway track. Catch me there!