Durable multi-agentic AI architecture with Temporal

Multi-agent architecture enables several powerful patterns. Here, I’ll start from the basics, and describe how Temporal can be used to make building multi-agent systems simple, durable, and fun.

Why multi-agent architectures?#

AI agents are powerful, but a single agent with a limited context window can get overwhelmed by too many tools or too much to reason about. You may also want to limit the access a particular agent has, or take advantage of a particular model for an agent. In addition, general-purpose, do-everything agents can be more challenging to evaluate than special-purpose agents that do one task well.

Coordinating agents is a particularly good fit for Temporal, which is built for durable orchestration of many tasks as part of an overall process.

In this post, we’ll talk briefly about agent routing orchestration with Temporal, and then go deeper into task delegation.

Agentic term definitions#

Here are some definitions to help you out.

Term	Definition
Agent Routing	Switching goals and agents with a Routing Agent orchestrating the routing between agents.
Task Delegation	Delegation of tasks by an orchestrator agent to sub-agents specializing in those tasks.
Conversational Agents	Agents that can have multi-turn conversations with a human to refine the tasks, elicit arguments, and add context to inform the agent.
Automation Agents	Agents that are designed to take input, take action independently and intelligently, but without multiple conversation steps.
Proactive Agents	Agents that work autonomously, operating without a human trigger, and may be continuously monitoring and even taking action without human intervention.

Agent routing#

One multi-agent pattern is Agent Routing: switching goals and agents with a Routing Agent orchestrating the routing between agents.

[Figure: agent routing multi-agent pattern]

Here, our user is trying to plan a trip. They speak with the Routing Agent, which routes them to the Flights Booking agent to look for events and book flights.

Then, after booking flights, the system redirects the user to the Paid Time Off agent to check their PTO, and book their vacation time.
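
As a rough illustration of the routing idea, the sketch below maps a user’s goal to a specialist agent and records each hand-off. The goal keys, the `AGENTS` registry, and the `RoutingAgent` class are hypothetical stand-ins, not code from the Temporal Agent Demo; a real Routing Agent would typically use an LLM to pick the goal.

```python
from dataclasses import dataclass, field

# Hypothetical registry: goal name -> specialist agent name.
AGENTS = {
    "book_flights": "Flights Booking agent",
    "book_pto": "Paid Time Off agent",
}


@dataclass
class RoutingAgent:
    """Tracks the active agent and records each hand-off."""

    current: str = "Routing Agent"
    history: list = field(default_factory=list)

    def route(self, goal: str) -> str:
        # Unknown goals stay with the routing agent instead of failing.
        target = AGENTS.get(goal, "Routing Agent")
        self.history.append((self.current, target))
        self.current = target
        return target


router = RoutingAgent()
router.route("book_flights")  # the user wants flights first
router.route("book_pto")      # then vacation time
```

Keeping the hand-off history around is a design choice: it gives you a trail of which agent handled which part of the conversation.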

You can explore Agent Routing further with the Temporal Agent Demo here, and there’s a video of it in action below.

Task delegation#

A more powerful model for managing complexity with many agents is to delegate responsibilities to agents by task.

Think of this as having a team of specialists, with each agent being good at their part of a problem. We call this Task Delegation, and it mirrors how teams can work in real life. Think of the team that you work on at your company. Each of you has a distinct role, scoped responsibilities, and collaboration protocols. Your agents can use a similar structure.

[Figure: task delegation for managing complexity]

If you want to skip to the end and see the final result in action, check out this video:

A simple example: detection and repair with an analysis agent#

So let’s say I’m a wizard in the Harry Potter universe, and I work in Scribbulus Supplies, a paper and magical goods shop in Diagon Alley. The students and staff of Hogwarts get all their quills, parchment, school supplies, etc. from me.

Lately, I’ve noticed various problems with orders: not enough inventory, orders needing approval, etc. I don’t want to solve these problems by hand. I want to build a system that magically executes order repairs so I don’t have to worry about them (there are so many Quidditch matches to be seen this season).

Luckily, I know just how to fix this. Here’s how.

The repair process#

To start, let’s build a simple agent to detect if there are any problems in my orders system. I have the orders and inventory data in databases, and tools that can read the data.

[Figure: repair process in the agentic order system]

@activity.defn
async def analyze_order_problems(input: dict) -> dict:
    # <snip>
    # Define the messages for the LLM completion
    context_instructions = "You are a helpful assistant that detects and analyzes problems in orders. " \
    "Your task is to analyze the provided orders and identify any issues or anomalies. " \
    "You will receive a list of orders in JSON format, " \
    "each containing an 'order_id', 'order_date', 'status', 'items', and 'quantities'. " \
    "Look for common problems such as orders needing approval, orders stuck or delayed for various reasons for more than two weeks, " \
    "or other anomalies. " \
    "Ensure your response is valid JSON and does not contain any markdown formatting. " \
    "The response should be a JSON object with a key 'issues' that contains a list of detected issues, " \
    "each with an order_id, item with key 'issue' that describes the issue, " \
    "the customer_name the order is for, and  " \
    "a confidence_score of how sure you are there is a problem. " \
    "Feel free to include additional notes in 'additional_notes' if necessary. " \
    "If there are no issues, note that in additional_notes. " \
    "The list of orders to analyze is as follows: " \
    + json.dumps(orders_to_detect_json, indent=2)
    messages = [
        {
            "role": "system",
            "content": context_instructions
            + ". The current date is "
            + DATE_FOR_ANALYSIS.strftime("%B %d, %Y"),
        },
    ]

    try:
        completion_kwargs = {
            "model": llm_model,
            "messages": messages,
            "api_key": llm_key,
        }

        response = completion(**completion_kwargs)

        response_content = response.choices[0].message.content

    except Exception as e:
        activity.logger.error(f"Error in LLM completion: {str(e)}")
        raise

The agent produces output that looks like this:

{
	"issues": [
		{
			"customer_name": "Rubeus Hagrid",
			"order_id": "ORD-004-RHG",
			"issue": "Order contains restricted item requiring approval.",
			"confidence_score": 0.9,
			"additional_notes": "Order includes 'Dragon Egg - Norwegian Ridgeback' which requires Ministry approval."
		}
	]
}

So now, with the power of AI, we can determine that Hagrid’s order requires approval before packaging and shipping. The analysis agent reports a confidence score of 0.9: it’s 90% sure the order is stuck because it needs approval. I can run this Activity from a Temporal Workflow, where it executes durably and tolerates my database being down or slow to respond, or the LLM having temporary issues.
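
Because the prompt asks the model for raw JSON with an `issues` list, it can help to validate the reply before later steps rely on it. Here is a minimal sketch of such a check. The `parse_issues` helper and its required-key set are my own illustration (the post elides this part of the Activity), with the keys taken from the example output above.

```python
import json

# Keys each issue is expected to carry, per the prompt and the sample output.
REQUIRED_KEYS = {"order_id", "issue", "customer_name", "confidence_score"}


def parse_issues(response_content: str) -> list:
    """Parse the LLM reply and return only well-formed issues.

    Raises ValueError if the reply is not the JSON object we asked for,
    so a caller (for example a Temporal Activity) can fail and be retried.
    """
    try:
        data = json.loads(response_content)
    except json.JSONDecodeError as e:
        raise ValueError(f"LLM reply was not valid JSON: {e}") from e
    if not isinstance(data, dict) or not isinstance(data.get("issues"), list):
        raise ValueError("LLM reply is missing an 'issues' list")

    valid = []
    for issue in data["issues"]:
        # Skip entries that lack required fields.
        if not isinstance(issue, dict) or not REQUIRED_KEYS.issubset(issue):
            continue
        # Skip entries whose confidence score is not a number in [0, 1].
        score = issue["confidence_score"]
        if isinstance(score, (int, float)) and 0.0 <= score <= 1.0:
            valid.append(issue)
    return valid


reply = (
    '{"issues": ['
    '{"customer_name": "Rubeus Hagrid", "order_id": "ORD-004-RHG", '
    '"issue": "Needs approval.", "confidence_score": 0.9}, '
    '{"order_id": "ORD-INCOMPLETE"}'
    "]}"
)
issues = parse_issues(reply)  # the incomplete second entry is dropped
```

Raising on a malformed reply, rather than returning an empty list, lets the retry machinery treat a bad LLM response like any other transient failure.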

I can use this information to understand what’s going on with my magical orders — but it would be better if I could also do repairs.

Adding agentic repair#

Let’s add two more steps to our workflow: plan creation and plan execution.

Plan creation combines the problems we found, the inventory and order data, and the repair tools we give the agent, and produces a plan to repair the orders.

[Figure: plan creation from order and inventory data]

Here’s what the plan looks like for Hagrid’s order:

{
	"proposed_tools": {
		"ORD-004-RHG": [
			{
				"tool_name": "request_approval_tool",
				"confidence_score": 0.9,
				"tool_arguments": {
					"approval_request_contents": "Request to Approve Order",
					"approver": "approve-orders@diagonalley.co.uk",
					"order_id": "ORD-004-RHG"
				}				
			}
		]
	}
}

Plan execution is the step where we run the tools according to the plan. Temporal’s durability means a tool call that fails is retried instead of being lost.
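
To make the execution step concrete, here is a minimal sketch of dispatching a plan shaped like the one above to tool functions. The `TOOLS` registry, the stub body of `request_approval_tool`, and `execute_plan` are hypothetical; in the real system each tool call would run as a Temporal Activity so that it can be retried.

```python
from typing import Callable, Dict, List


def request_approval_tool(approval_request_contents: str, approver: str, order_id: str) -> str:
    # Stub for illustration: a real tool would actually send the request.
    return f"Approval requested from {approver} for {order_id}"


# Hypothetical registry: tool_name -> callable.
TOOLS: Dict[str, Callable[..., str]] = {
    "request_approval_tool": request_approval_tool,
}


def execute_plan(plan: dict) -> Dict[str, List[str]]:
    """Run every proposed tool for every order and collect the results."""
    results: Dict[str, List[str]] = {}
    for order_id, steps in plan.get("proposed_tools", {}).items():
        for step in steps:
            tool = TOOLS.get(step["tool_name"])
            if tool is None:
                # Refuse tools that are not registered rather than guessing.
                raise ValueError(f"Unknown tool: {step['tool_name']}")
            results.setdefault(order_id, []).append(tool(**step["tool_arguments"]))
    return results


plan = {
    "proposed_tools": {
        "ORD-004-RHG": [
            {
                "tool_name": "request_approval_tool",
                "confidence_score": 0.9,
                "tool_arguments": {
                    "approval_request_contents": "Request to Approve Order",
                    "approver": "approve-orders@diagonalley.co.uk",
                    "order_id": "ORD-004-RHG",
                },
            }
        ]
    }
}
results = execute_plan(plan)
```

Looking tools up in an explicit registry means the LLM can only propose tools you have chosen to expose; anything else fails loudly.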

We separate these steps so a human can review the plan and approve or reject it.

This means our process must be able to wait for human approval and accept human input. Temporal Workflows make this easy. Here’s what our Workflow looks like now:

    @workflow.run
    async def run(self, inputs: dict) -> str:
        # Execute the analysis agent
        await self.analyze_problems(self.context)

        # Execute the planning for this agent
        await self.create_plan(self.context)

        # Wait for the approval or reject signal
        await workflow.wait_condition(
            lambda: self.approved is not False or self.rejected is not False,
            timeout=timedelta(hours=12),
        )

        if self.rejected:
            return f"Repair REJECTED by user {self.context.get('rejected_by', 'unknown')}"

        # Proceed with the repair
        await self.execute_repair()
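
One detail worth keeping in mind: in the Temporal Python SDK, `workflow.wait_condition` raises `asyncio.TimeoutError` if the timeout elapses before the condition becomes true, so an approval wait really has three outcomes, not two. The sketch below models that with plain `asyncio` and no Temporal SDK; the `ApprovalGate` class is a hypothetical stand-in for the Workflow’s `approved` and `rejected` flags and the Signals that set them.

```python
import asyncio


class ApprovalGate:
    """Stand-in for a Workflow's approval state (hypothetical helper)."""

    def __init__(self) -> None:
        self.approved = False
        self.rejected = False
        self._decided = asyncio.Event()

    def approve(self) -> None:  # plays the role of an "approve" Signal
        self.approved = True
        self._decided.set()

    def reject(self) -> None:  # plays the role of a "reject" Signal
        self.rejected = True
        self._decided.set()

    async def wait(self, timeout: float) -> str:
        try:
            await asyncio.wait_for(self._decided.wait(), timeout)
        except asyncio.TimeoutError:
            return "TIMED_OUT"  # nobody answered in time
        return "REJECTED" if self.rejected else "APPROVED"


async def demo() -> list:
    outcomes = []

    gate = ApprovalGate()
    # Simulate a human approving shortly after the wait begins.
    asyncio.get_running_loop().call_later(0.01, gate.approve)
    outcomes.append(await gate.wait(timeout=1.0))

    # Simulate nobody responding before the timeout.
    outcomes.append(await ApprovalGate().wait(timeout=0.01))
    return outcomes


outcomes = asyncio.run(demo())
```

In a Workflow, the same idea would mean wrapping the approval wait in `try`/`except asyncio.TimeoutError` and deciding explicitly what a timeout should do (for example, escalate or treat it as a rejection).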

Adding conversational capability with MCP#

Up to this point, I’ve built agents that work in the background. Let’s connect these to a conversational agent via Model Context Protocol (MCP). For this example, I’ll use goose, created by our friends at Block.

@mcp.tool(description="Get the proposed tools for the repair workflow.",
          #tags={"repair", "order management", "workflow", "proposed tools"},
          )
async def get_proposed_tools(workflow_id: str, run_id: str) -> dict:
    """Return the proposed tools for the repair workflow. This is the result of the planning step.
    This should not be confused with the tools that are actually executed.
    This won't have results before the planning step is complete."""
    load_dotenv(override=True)
    user = os.environ.get("USER_NAME", "Harry.Potter")
    client = await get_temporal_client()
    handle = client.get_workflow_handle(workflow_id=workflow_id, run_id=run_id)

    try:
        planning_result: dict = await handle.query("GetRepairPlanningResult")
        proposed_tools_for_all_orders = planning_result.get("proposed_tools", {})
        additional_notes = planning_result.get("additional_notes", "")
    except Exception as e:
        # The Query raises until the planning step has produced a result.
        print(f"Error querying repair planning result: {e}")
        proposed_tools_for_all_orders = "No tools proposed yet."
        # Set this here too; otherwise the return below raises NameError.
        additional_notes = ""

    return {
        "proposed_tools": proposed_tools_for_all_orders,
        "additional_notes": additional_notes
    }

I created a Query in my workflow to enable this MCP tool:

    @workflow.query
    # Query handlers should be plain (non-async) methods that only read state.
    def GetRepairPlanningResult(self) -> dict:
        if "planning_result" not in self.context:
            raise ApplicationError("Planning result is not available yet.")
        return self.context["planning_result"]

Here you can see me chatting with goose:

[Figure: goose working with order repair]

So now I have a conversational agent that connects with my repair agents, building an agentic application that talks to a human to solve real problems.

I don’t have to use goose. I can connect to these tools from Slack, Claude, VS Code, or any other conversational agent that works with MCP.

These steps form a common pattern for agentic applications:

[Figure: agentic steps in order management; green boxes indicate agentic steps]

Building a proactive agent#

With Temporal, it’s easy to build proactive agents that can monitor and even act on their own. I’m going to set this one to start and look for problems. If problems are found, the agents will go deeper: analyzing problems and proposing solutions. I’ll set it up to notify me that problems are found, and wait for my review and approval.

Here’s what it looks like in the Temporal Workflow History:

[Figure: Temporal Workflow History for order management]

One of the great things about Temporal is that I can see every step, including the inputs, outputs, and errors for each. If a step fails, Temporal automatically retries it until it succeeds. I can quickly diagnose any problems and, if needed, fix bugs or prompts and redeploy.

Here’s what my Workflow looks like now:

@workflow.defn
class RepairAgentWorkflowProactive(RepairAgentWorkflow):
    def __init__(self) -> None:
        # <snip> initialization

    @workflow.run
    async def run(self, inputs: dict) -> str:
        while True:
            self.approved = False

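            try:
                # Sleep until the next cycle, or wake early if stop_waiting is set.
                await workflow.wait_condition(
                    lambda: self.stop_waiting is True,
                    timeout=timedelta(days=1),  # Wait a day for the next detect->analysis->repair cycle.
                )
            except asyncio.TimeoutError:
                # wait_condition raises on timeout (needs `import asyncio`);
                # here a timeout just means it is time for the next cycle.
                pass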

            # Execute the detection agent
            self.problems_found_confidence_score = await self.perform_detection(self.context)

            self.problems_found_confidence_score = self.context["detection_result"].get("confidence_score", 0.0)
            if self.problems_found_confidence_score < 0.5:
                analysis_notes = self.context["detection_result"].get("additional_notes", "")
                workflow.logger.info(f"Low confidence score from detection: {self.problems_found_confidence_score} ({analysis_notes}). No repair needed at this time.")
                self.set_workflow_status("NO-REPAIR-NEEDED")
                continue  # Skip to the next iteration if no repair is needed

            # Execute the analysis agent
            await self.analyze_problems(self.context)

            # Execute the planning for this agent
            await self.create_plan(self.context)

            # for low planning confidence scores, 
            # we will notify the user and wait for approval
            if self.planning_confidence_score <= 0.95:
                # Notify the user about the planned repairs
                await self.execute_notification()
                # Wait for the approval or reject signal
                await workflow.wait_condition(
                    lambda: self.approved is not False or self.rejected is not False,
                    timeout=timedelta(hours=20),  # Wait for up to 20 hours for approval
                )

                if self.rejected:
                    continue # Skip to the next iteration if repair is rejected

            else:
                # If the planning confidence score is high enough, we can proceed without waiting for approval
                self.approved = True
                self.context["approved_by"] = f"Agentically Approved with confidence score {self.context['planning_result'].get('overall_confidence_score', 0.0)}"

            # Proceed with the repair
            await self.execute_repair()

            # Create the report with the report agent
            report_summary = await self.generate_report()

Here’s our architecture model now:

[Figure: agentic architecture model]

Our system is now proactive, detecting and analyzing problems on its own, automatically repairing them if it’s sure of its repairs. If it’s not more than 95% confident, it notifies me and waits for my review and approval. I can interact with it via MCP, Temporal Workflow Signals and Queries, and goose, reviewing order problems and proposed repairs.
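
The confidence gating in this loop boils down to a small decision function. The sketch below restates it in isolation, using the 0.5 and 0.95 thresholds from the Workflow. The `next_step` name, the constant names, and most of the returned labels are my own (only `NO-REPAIR-NEEDED` appears in the Workflow code).

```python
from typing import Optional

DETECTION_THRESHOLD = 0.5      # below this, skip the cycle
AUTO_APPROVE_THRESHOLD = 0.95  # above this, repair without a human


def next_step(detection_score: float, planning_score: Optional[float] = None) -> str:
    """Mirror the Workflow's branching on the two confidence scores."""
    if detection_score < DETECTION_THRESHOLD:
        return "NO-REPAIR-NEEDED"
    if planning_score is None:
        # Problems look real, but there is no plan yet.
        return "ANALYZE-AND-PLAN"
    if planning_score <= AUTO_APPROVE_THRESHOLD:
        return "WAIT-FOR-HUMAN-APPROVAL"
    return "AUTO-APPROVE"


quiet_day = next_step(0.3)
unsure_plan = next_step(0.9, 0.95)
confident_plan = next_step(0.9, 0.99)
```

Note that a planning score of exactly 0.95 still goes to a human, matching the `<=` comparison in the Workflow.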

This system delegates tasks to multiple agents, each of which fills a specific role in the overall system and has only role-related access to the backing database. Temporal orchestrates all of these agents easily, transparently, and durably, and the agents are resilient because their external calls run as Temporal Activities with automatic retries.

Design patterns#

Here are the design patterns and decisions I made while building this system:

Lifecycle and interaction#

One question I get asked often is, “Should I build an agent as a Workflow or as an Activity?”

For me, the answer is: it depends on lifecycle and interaction:

Conversational Agents: interactive, long running, orchestration: Workflows
LLM Calls: call to external system: Activities
Simple Automation Agents: short-lived, no interaction: Activities
Long-Running Agents: long duration, often have interaction: Workflows
Proactive Agents: interactive, long-running: Workflows
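
Read as a rule of thumb, the list above says that anything long-running or interactive belongs in a Workflow, and short one-shot external calls belong in Activities. The tiny sketch below encodes that reading; `choose_primitive` is a hypothetical helper for illustration, and real designs may weigh other factors.

```python
def choose_primitive(long_running: bool, interactive: bool) -> str:
    """Long-running or interactive agents map to Workflows;
    short, non-interactive calls map to Activities."""
    return "Workflow" if long_running or interactive else "Activity"


examples = {
    "Conversational agent": choose_primitive(long_running=True, interactive=True),
    "LLM call": choose_primitive(long_running=False, interactive=False),
    "Simple automation agent": choose_primitive(long_running=False, interactive=False),
    "Long-running agent": choose_primitive(long_running=True, interactive=False),
    "Proactive agent": choose_primitive(long_running=True, interactive=True),
}
```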

DAPER#

This system implements a pattern that is useful for agents that solve problems and execute on a user’s behalf:

Detect: notice there are problems
Analyze: understand the nature of the problems
Plan: create a plan to fix the problems
Execute: execute the plan
Report: analyze the system and report on problem resolution
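
The five steps above can be sketched as a simple pipeline over a shared context dictionary. This is only an illustration of the ordering and the early exit when nothing is detected; the step bodies are stubs, and `run_daper` and the context keys are hypothetical names. In the real system each step is an agent orchestrated by a Temporal Workflow.

```python
from typing import Callable, Dict, List

Context = dict  # shared state passed from step to step


def detect(ctx: Context) -> None:
    ctx["problems_found"] = bool(ctx["orders"])


def analyze(ctx: Context) -> None:
    ctx["issues"] = [f"{order} needs approval" for order in ctx["orders"]]


def plan(ctx: Context) -> None:
    ctx["plan"] = [("request_approval_tool", order) for order in ctx["orders"]]


def execute(ctx: Context) -> None:
    ctx["executed"] = len(ctx["plan"])


def report(ctx: Context) -> None:
    ctx["report"] = f"Repaired {ctx['executed']} order(s)"


STEPS: Dict[str, Callable[[Context], None]] = {
    "detect": detect,
    "analyze": analyze,
    "plan": plan,
    "execute": execute,
    "report": report,
}


def run_daper(ctx: Context) -> List[str]:
    """Run Detect, Analyze, Plan, Execute, Report in order.

    Stops after Detect when no problems were found.
    """
    ran = []
    for name, step in STEPS.items():
        step(ctx)
        ran.append(name)
        if name == "detect" and not ctx["problems_found"]:
            break
    return ran


busy = {"orders": ["ORD-004-RHG"]}
steps_run = run_daper(busy)
idle_steps = run_daper({"orders": []})
```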

[Figure: A dapper gentleman]

Uses for this pattern could include:

Proactive issue detection
On-call support
Monitoring or SRE tooling
Transaction repair in flight

Try it out!#

Temporal gives agents and tools superpowers: durability, visibility, simplicity. Workflows enable interaction, orchestration, dynamic decision making, unlimited duration, and auto-save for state & memory. Activities enable reliable execution of external calls with automatic retries.

If you haven’t yet, check out the video of this system in action and check out the code here.
