Building a persistent conversational AI chatbot with Temporal

A carefree, infinitely scalable chatbot architecture

How do you build a chatbot that remembers every conversation, scales infinitely, and never loses context — even during deployments?

This is a problem that many companies have faced when building customer service chatbots that need to handle millions of concurrent conversations while maintaining perfect context awareness. Traditional chatbot architectures fail at scale: they lose conversation context during restarts, can’t scale horizontally due to stateful sessions, and struggle with long-running interactions.

Enter Temporal: the workflow orchestration platform that transforms how we think about conversational AI. By treating each conversation as a persistent workflow, we can build chatbots whose application layer is truly stateless, infinitely scalable, and resilient to failures.

In this post, I’ll walk you through building a telecommunications customer service chatbot that demonstrates these principles. Users can inquire about mobile devices, check subscriptions, manage consents, and get general support — all while the system maintains perfect conversation context across any number of pod restarts, deployments, or scaling events.

The key point? One Temporal Workflow per conversation, with automatic lifecycle management that ends conversations after ‘N’ minutes/hours/days of inactivity or when users explicitly say goodbye.

Requirements

Let’s start from the beginning: what does it take to implement a proper chatbot, and how would it typically be built without Temporal?

Functional requirements

Users must be authenticated.
An AI-powered chatbot that provides information on consents, subscriptions, devices, stock availability, and general user details.
The chatbot must be able to scale indefinitely to support new features and conversation scenarios.
Multilingual support (at minimum Spanish, French, and English).
Conversations can last as long as the user needs, without forced timeouts.
When a conversation ends, the system must generate a concise summary of the interaction and make it retrievable later.
Each conversation should produce a satisfaction score based on its context, and conversations should be retrievable by score.

Non-functional requirements

If a user loses their connection and reconnects later, the conversation should seamlessly continue.
Conversations must persist across server/backend restarts or updates.
The system should support effectively unlimited concurrent conversations.
All conversations must be tracked with full state transparency — allowing developers or operators to inspect not only live conversations but also the exact sequence of steps that led to the final state.
If any dependency (API, tool, database, or LLM) goes down and later recovers — regardless of the outage duration — the conversation must resume exactly where it left off, with full history intact. Dropping or resetting conversations is not acceptable.
The chatbot backend must be implemented in Java.

Now that we have a clear picture of what our chatbot must do, let’s take a step back and see how we would implement it without Temporal.

Design

Meeting all of these functional and non-functional requirements is not trivial. Traditional chatbot architectures typically rely on in-memory sessions, which leads to several challenges:

Session loss during server restarts or deployments.
Limited horizontal scalability because each session is tied to a specific server.
Difficulty maintaining long conversations, especially if users disconnect and reconnect.
Complex error handling when dependencies (APIs, databases, or LLMs) go down.

To illustrate this, here's a high-level design of a traditional chatbot system:

[Figure: high-level design of a traditional chatbot system]

In this setup, the conversation state is kept inside each application instance, which means several problems immediately appear:

State tied to a single app: if the application restarts or crashes, the user’s session is lost and the conversation context disappears.
Sticky sessions required: every user must always be routed back to the same instance to preserve their context. This reduces flexibility and scalability, and becomes problematic in real scenarios — for example, if a user loses their connection mid-flight and resumes from another country, they may no longer land on the same instance.
Difficult prompt management: all the business logic tends to be packed into a single, monolithic prompt. As use cases grow, this quickly becomes hard to maintain, and any error forces the conversation to restart from scratch.
Fragile error handling: if one external service call fails (e.g., fetching subscriptions), the user only sees an error and must manually start the process over. The system has no way of resuming the workflow automatically once the dependency recovers.
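To make the sticky-session problem concrete, here is a hypothetical illustration in Java (not part of the reference implementation). It assumes the simplest form of stickiness, hashing a session key over the number of running instances; real load balancers more often use cookies or consistent hashing, which soften the problem but do not remove it. Scaling from three to four instances is enough to move a large share of users onto an instance that has never seen their conversation.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative only: pin each session to an instance by hashing its key
// over the current number of instances (simple modulo routing).
final class StickyRouting {

    private StickyRouting() {}

    /** Index of the instance that "owns" this session under modulo routing. */
    static int instanceFor(String sessionKey, int instanceCount) {
        // floorMod keeps the index in [0, instanceCount) even for negative hash codes
        return Math.floorMod(sessionKey.hashCode(), instanceCount);
    }

    /** How many sessions land on a different instance after a scale event. */
    static long remappedSessions(List<String> sessionKeys, int instancesBefore, int instancesAfter) {
        return sessionKeys.stream()
                .filter(key -> instanceFor(key, instancesBefore) != instanceFor(key, instancesAfter))
                .count();
    }

    /** Builds hypothetical session keys "user-0" ... "user-(n-1)". */
    static List<String> sampleKeys(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> "user-" + i)
                .collect(Collectors.toList());
    }
}
```

When the conversation state lives inside the instance, every remapped user loses their context. In the Temporal design no instance owns the conversation, so this kind of remapping is harmless.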

While there are manual ways to work around some of these issues — storing state externally, implementing retries, etc. — they add significant complexity without fully solving the core problem. This is exactly why a workflow-orchestration approach like Temporal offers such a powerful alternative.

Now, let's take a closer look at what our state machine/conversation graph might look like when modeling the chatbot:

[Figure: conversation graph of the chatbot]

Based on the requirements, the chatbot needs to support the following scenarios:

Consents: the user can retrieve their accepted, rejected, or pending consents.
Subscriptions: the user can view their active subscriptions and even ask more complex questions about them — for example, whether any of their subscriptions are currently flagged for fraud investigation.
Product catalog: the chatbot returns the full catalog of available products (e.g., devices or mobile phones), including prices and stock levels.
Helper: if the user asks about something outside the supported scenarios, the chatbot should redirect them towards the available features.
Farewell: when the user ends the conversation, the chatbot should politely close the interaction and trigger two additional “closing states”:
  ○ Satisfaction score: evaluates customer satisfaction based on the entire conversation context.
  ○ Summary: generates a short summary of the conversation, enabling future retrieval of conversations by score or by scanning their summaries.

This graph-based approach makes the flow explicit and easy to maintain, and new functionality can be added as new states in the graph.

As readers may have noticed, this Workflow runs “indefinitely” for the duration of a user conversation, until the user says goodbye or an inactivity timeout indicates that they have stopped interacting. The Workflow is iterative: after processing each message (one Temporal Signal per message), it returns to an “initial” listening state.
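The graph above can be modelled as a small state machine. The sketch below is a plain-Java illustration with assumed state names (they are not taken from the reference implementation): every scenario state returns to LISTENING, while FAREWELL chains into the two closing states before the Workflow ends. The classification step that leaves LISTENING is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the conversation graph; state names are assumptions.
enum ConversationState {
    LISTENING, CONSENTS, SUBSCRIPTIONS, PRODUCT_CATALOG, HELPER,
    FAREWELL, SATISFACTION_SCORE, SUMMARY, ENDED;

    /** State that follows once this state's Activity has completed. */
    ConversationState next() {
        switch (this) {
            case CONSENTS:
            case SUBSCRIPTIONS:
            case PRODUCT_CATALOG:
            case HELPER:
                return LISTENING;          // answer sent, wait for the next Signal
            case FAREWELL:
                return SATISFACTION_SCORE; // first closing state
            case SATISFACTION_SCORE:
                return SUMMARY;            // second closing state
            case SUMMARY:
            case ENDED:
                return ENDED;              // the Workflow completes
            default:
                // LISTENING only moves when a message arrives and is classified
                return LISTENING;
        }
    }

    /** Follows next() from a state until the graph is back to listening or has ended. */
    static List<ConversationState> walkFrom(ConversationState start) {
        List<ConversationState> path = new ArrayList<>();
        ConversationState current = start;
        path.add(current);
        while (current != LISTENING && current != ENDED) {
            current = current.next();
            path.add(current);
        }
        return path;
    }
}
```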

The conversation session isn't stored in the application but in Temporal: each Workflow corresponds to a unique user and conversation. When a user sends a message, we first retrieve their active conversation (Workflow) or create a new one if none exists.

This approach enables stateless applications, eliminating the need for a load balancer with sticky sessions or app-managed sessions. Any application instance can handle any user request at any time, because the conversation state lives in Temporal rather than in the instance.

[Figure: stateless application architecture with Temporal]

Basically, the application acts as a proxy between the user and the Temporal Workflows, where all the actual work takes place. Communication with the user can be handled through a protocol like SSE or WebSockets, while communication with Temporal is managed via Workflow Signals.

From a workflow design perspective, each Activity/state in the Workflow interacts with the LLM, often requiring specific company data (tools) to provide the necessary context. In practice, an Activity implements a call to the LLM (a prompt) and uses one or more tools to enrich that call with user context.

Additionally, the full conversation history is stored as a Workflow property and passed as input to the LLM on every call. This ensures that the LLM always has complete context to deliver accurate responses to the user’s requests.
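A minimal sketch of what that Workflow property might look like, assuming a simple list of role-tagged turns rendered into the conversationHistory string that the prompts receive. The class name, role labels, and rendering format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical history holder. Kept as a field of the Workflow implementation,
// its contents are rebuilt by Temporal's replay after a worker restart.
final class ConversationHistory {

    /** One message in the conversation. */
    static final class Turn {
        final String role;  // "USER" or "ASSISTANT"
        final String text;

        Turn(String role, String text) {
            this.role = role;
            this.text = text;
        }
    }

    private final List<Turn> turns = new ArrayList<>();

    void addUser(String text) { turns.add(new Turn("USER", text)); }

    void addAssistant(String text) { turns.add(new Turn("ASSISTANT", text)); }

    List<Turn> turns() { return Collections.unmodifiableList(turns); }

    /** Renders every turn, one per line, as the conversationHistory prompt argument. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Turn t : turns) {
            if (sb.length() > 0) sb.append('\n');
            sb.append(t.role).append(": ").append(t.text);
        }
        return sb.toString();
    }
}
```

Because every Signal and Activity adds events to the Workflow's history, a conversation that runs for a very long time would eventually need Workflow.continueAsNew, carrying the history (or a trimmed version of it) into a fresh run; the post does not say whether the reference implementation does this.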

Advantages compared to a traditional implementation

This approach provides several key benefits when measured against both the functional requirements and the limitations of traditional architectures:

Simplified state management: Storing sessions in Temporal eliminates in-memory session management, reducing application complexity and enabling stateless apps.
Infinite scalability: Applications become stateless proxies. Any instance can pick up any request at any time, removing the need for sticky sessions and enabling true horizontal scaling.
Resilient Workflows: If an API, tool, or the LLM itself goes down, the Workflow simply pauses and resumes automatically when the dependency recovers, so the user never has to restart the conversation. This holds for app restarts, deployments, and dependency outages (e.g., APIs, LLMs) alike.
Maintainable design: Each Activity maps to a clear step in the conversation (e.g., fetch subscriptions, check devices, update consent). This modularity avoids the “giant monolithic prompt” problem, and adding new features (e.g., new conversation scenarios) is straightforward with Temporal's Workflow design.
Rich context management: By storing and replaying the conversation history, the LLM always has full awareness, resulting in more precise, coherent, and human-like interactions.
Transparent state inspection: Developers and operators can inspect any conversation Workflow at any time, including its history and decisions — something nearly impossible in traditional chatbot setups.
Robust error handling: Temporal automatically retries failed Activities (e.g., during API outages), ensuring seamless conversation continuity without manual intervention.
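The “pause and resume” behaviour comes from Temporal retrying failed Activities with exponential backoff. As a sketch, the code below computes the waits between attempts; its defaults mirror Temporal's documented default Activity retry policy (1 s initial interval, backoff coefficient 2.0, maximum interval of 100x the initial interval, unlimited attempts). It only models the arithmetic and does not call the Temporal SDK.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Arithmetic model of an exponential retry policy; no Temporal dependency.
final class RetrySchedule {

    private RetrySchedule() {}

    /** Wait before each of the next `retries` attempts, capped at maxInterval. */
    static List<Duration> delays(Duration initial, double coefficient, Duration maxInterval, int retries) {
        List<Duration> result = new ArrayList<>();
        double nextMillis = initial.toMillis();
        for (int i = 0; i < retries; i++) {
            long capped = (long) Math.min(nextMillis, maxInterval.toMillis());
            result.add(Duration.ofMillis(capped));
            nextMillis *= coefficient;  // grows geometrically until the cap applies
        }
        return result;
    }

    /** Same values as Temporal's default Activity retry policy. */
    static List<Duration> defaultPolicy(int retries) {
        Duration initial = Duration.ofSeconds(1);
        return delays(initial, 2.0, initial.multipliedBy(100), retries);
    }
}
```

In a worker these values would be configured with RetryOptions.newBuilder() (setInitialInterval, setBackoffCoefficient, setMaximumInterval, setMaximumAttempts) inside the ActivityOptions. Unlimited attempts are what let a conversation outlast an outage of any length, subject to any schedule-to-close timeout set on the Activity.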

In short, by combining Temporal Workflows with LLM-driven activities, we get a chatbot architecture that is scalable, fault-tolerant, and context-aware by design, far surpassing the fragility of traditional session-based implementations.

A note on MCPs

In my implementation, the tools interact directly with a REST API. However, adapting this design to consume services from an MCP server would be trivial. Instead of calling the REST API directly, each tool could be replaced by an authenticated MCP client that communicates with the MCP server providing the desired service.

In my case, the telco API I’m consuming does not expose any MCP services, which is why I have to integrate with it directly via REST.
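One way to keep that swap cheap is to have each tool depend on a small interface rather than on a concrete client. The sketch below is hypothetical (ConsentsSource and ConsentsTool are illustrative names, not classes from the reference implementation): a REST-backed source is what the post uses today, and an MCP-backed source could implement the same interface later without touching the prompt or the Activity.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical seam between a tool and its transport (REST today, MCP later).
interface ConsentsSource {
    List<String> acceptedConsents(String customerId);
}

final class ConsentsTool {
    private final ConsentsSource source;

    ConsentsTool(ConsentsSource source) {
        this.source = Objects.requireNonNull(source, "source");
    }

    /** Text handed back to the LLM; the wording is an illustrative choice. */
    String describeAcceptedConsents(String customerId) {
        List<String> consents = source.acceptedConsents(customerId);
        return consents.isEmpty()
                ? "No accepted consents for customer " + customerId
                : "Accepted consents for customer " + customerId + ": " + String.join(", ", consents);
    }
}
```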

Implementation walkthrough

Perhaps one of the most interesting classes from an implementation perspective is ChatbotService.java, since this is the class responsible for processing all incoming messages from the UI.

public String processMessage(String userEmail, String message) { 
    LOG.info("[ChatbotService] - processMessage | Processing message for user: " + userEmail); 
    IvrWorkflow workflow = getOrCreateWorkflowForUser(userEmail); 
    String requestId = generateRequestId(); 
    workflow.processMessage(message, requestId); 
    return waitForResponse(workflow, userEmail, requestId); 
}
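processMessage depends on two helpers the post does not show, generateRequestId and waitForResponse. The requestId ties the reply to this particular message, so the service can pick it out even if several messages are in flight; generateRequestId could be as simple as UUID.randomUUID().toString(). Assuming the Workflow exposes replies through something pollable, such as a query method keyed by requestId (an assumption, not confirmed by the post), waitForResponse could look roughly like this hypothetical helper.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper in the spirit of waitForResponse: keep probing for the
// answer to one requestId until it appears or the timeout elapses.
final class ResponseWaiter {

    private ResponseWaiter() {}

    static <T> Optional<T> await(Supplier<Optional<T>> probe, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<T> result = probe.get();  // e.g. a Workflow query keyed by requestId
            if (result.isPresent()) {
                return result;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();       // caller decides how to report the timeout
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt status
                return Optional.empty();
            }
        }
    }
}
```

Recent Temporal SDKs also support Workflow Updates (@UpdateMethod), which deliver a message and return a result in a single call; whether they would simplify this request/response pairing is a design question the post does not address.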

The method getOrCreateWorkflowForUser essentially handles “session-less” management. In other words, for each incoming message, it verifies whether we already have an open conversation/workflow for that user. If it exists, it is returned; if not, a new one is created and returned.

private IvrWorkflow getOrCreateWorkflowForUser(String userEmail) {
    LOG.debug("[ChatbotService] - getOrCreateWorkflowForUser | Getting workflow for user: " + userEmail);
    // Deterministic ID: every app instance maps the same user to the same Workflow
    String workflowId = "ivr-session-" + userEmail.replaceAll("[^a-zA-Z0-9]", "-");

    WorkflowOptions options = WorkflowOptions.newBuilder()
            .setTaskQueue(IVR_TASK_QUEUE)
            .setWorkflowId(workflowId)
            .build();

    IvrWorkflow workflow = workflowClient.newWorkflowStub(IvrWorkflow.class, options);

    try {
        // Starts a new conversation; throws if a Workflow with this ID is already running
        WorkflowClient.start(workflow::startSession, userEmail);
        LOG.info("[ChatbotService] - getOrCreateWorkflowForUser | New workflow started successfully for user: " + userEmail);
    } catch (WorkflowExecutionAlreadyStarted e) {
        // Usual path for every message after the first: reuse the running conversation
        LOG.info("[ChatbotService] - getOrCreateWorkflowForUser | Workflow already exists for user: " + userEmail + ", connecting to existing workflow");
    } catch (Exception e) {
        // Defensive fallback: also treat errors whose message reports an existing execution as "already started"
        String errorMessage = e.getMessage();
        if (errorMessage != null && (errorMessage.contains("ALREADY_EXISTS") || errorMessage.contains("already running"))) {
            LOG.info("[ChatbotService] - getOrCreateWorkflowForUser | Workflow already running for user: " + userEmail + ", connecting to existing workflow. Error type: " + e.getClass().getSimpleName());
        } else {
            LOG.error("[ChatbotService] - getOrCreateWorkflowForUser | Failed to start workflow for user: " + userEmail + ". Error type: " + e.getClass().getSimpleName(), e);
            throw new RuntimeException("Failed to create or connect to workflow for user: " + userEmail, e);
        }
    }

    return workflow;
}
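The Workflow ID is effectively the session key: because it is derived deterministically from the email, every application instance resolves the same user to the same conversation. One side effect of the sanitizing rule is that distinct emails can collapse to the same ID (for example, john.doe@example.com and john-doe@example.com). The sketch below reproduces the rule and adds a hypothetical variant that appends a short SHA-256 prefix of the raw email; it is an illustration, not part of the reference implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Reproduces the ID rule from getOrCreateWorkflowForUser and adds a
// hypothetical collision-resistant variant. Illustration only.
final class WorkflowIds {

    private WorkflowIds() {}

    /** Same rule as the post: every non-alphanumeric character becomes '-'. */
    static String sanitized(String userEmail) {
        return "ivr-session-" + userEmail.replaceAll("[^a-zA-Z0-9]", "-");
    }

    /** Hypothetical variant: append the first 4 bytes of SHA-256(email) as hex. */
    static String disambiguated(String userEmail) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(userEmail.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));  // 2 hex chars per byte
            }
            return sanitized(userEmail) + "-" + hex;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```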

Once we have an active conversation/Workflow, we proceed to process the message by invoking the Workflow's processMessage method, which is delivered to the running Workflow as a Signal.

@Override
public void processMessage(String message, String requestId) {
    var logger = Workflow.getLogger(IvrWorkflowImpl.class);
    logger.info("[IvrWorkflowImpl] - processMessage | Received message for requestId: " + requestId);
    // Signal handler: only enqueue; the session loop does the actual processing
    pendingRequests.add(new MessageRequest(message, requestId));
}

This method simply adds the message to a queue of pending requests, which will be processed asynchronously by IvrWorkflowImpl.

This class contains the main implementation of our Workflow. Let’s highlight three methods:

runSessionLoop → defines the “magic” that keeps the workflow running indefinitely (it only ends if the user explicitly says goodbye or if a configured inactivity timeout is reached).
processPendingRequests → iterates through the queue of pending messages waiting to be processed.
processMessageInternal → defines the actual workflow logic, i.e., the states we will transition through.
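The post does not reproduce these three methods, so the following is a plain-Java model of the control flow they describe, with hypothetical names and no Temporal dependency. In the Workflow itself, the wait at the top of runSessionLoop would typically be Workflow.await(timeout, condition), which returns false when the timeout elapses before the condition holds; here that result is passed in explicitly so the logic can be tested. The model also assumes the closing states run on an inactivity timeout as well as on farewell, since the requirements ask for a summary and a score for every conversation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Plain-Java model of the session loop. In a real Workflow, roughly:
//   while (!ended) {
//       boolean gotMessage = Workflow.await(inactivityTimeout, () -> !pendingRequests.isEmpty());
//       wakeUp(!gotMessage);
//   }
final class SessionLoopModel {

    /** Mirrors the MessageRequest(message, requestId) queued by the Signal handler. */
    static final class MessageRequest {
        final String message;
        final String requestId;

        MessageRequest(String message, String requestId) {
            this.message = message;
            this.requestId = requestId;
        }
    }

    private final Deque<MessageRequest> pendingRequests = new ArrayDeque<>();
    private final Function<String, String> classifier;  // returns a scenario label
    private final List<String> log = new ArrayList<>();
    private boolean ended;

    SessionLoopModel(Function<String, String> classifier) {
        this.classifier = classifier;
    }

    /** Signal handler: only enqueue. */
    void processMessage(String message, String requestId) {
        pendingRequests.add(new MessageRequest(message, requestId));
    }

    /** One loop iteration, run when the wait returns (message arrived or timeout). */
    void wakeUp(boolean timedOut) {
        if (ended) return;
        if (timedOut && pendingRequests.isEmpty()) {
            close("inactivity timeout");
            return;
        }
        // processPendingRequests: drain in arrival order; stop once the session closes
        while (!pendingRequests.isEmpty() && !ended) {
            MessageRequest request = pendingRequests.poll();
            String scenario = classifier.apply(request.message);  // processMessageInternal
            log.add(request.requestId + " -> " + scenario);
            if ("FAREWELL".equals(scenario)) {
                close("farewell");
            }
        }
    }

    private void close(String reason) {
        log.add("closing (" + reason + "): satisfaction score, summary");
        ended = true;
    }

    boolean isEnded() { return ended; }

    List<String> log() { return log; }
}
```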

Everything else is mostly boilerplate: the tool implementations (which consume a REST API) and the Activities, which are generally straightforward. For example, here is the Activity that determines which scenario we are in (ScenarioDispatcher/getScenario):

@ApplicationScoped
public class ScenarioDispatcherImpl implements ScenarioDispatcher {

    private static final Logger LOG = Logger.getLogger(ScenarioDispatcherImpl.class);

    @Inject
    ScenarioDispatcherPrompt prompt;

    @Override
    public String getScenario(String request, String conversationHistory) {
        LOG.info("[ScenarioDispatcherImpl] - getScenario | Determining scenario for request");
        try {
            String scenario = prompt.getScenario(request, conversationHistory);
            LOG.info("[ScenarioDispatcherImpl] - getScenario | Scenario determined: " + scenario);
            return scenario;
        } catch (ToolException e) {
            LOG.error("[ScenarioDispatcherImpl] - getScenario | ToolException occurred: " + e.getMessage());
            // Retryable ApplicationFailure: Temporal retries it per the Activity's retry policy
            throw ApplicationFailure.newFailure(e.getMessage(), e.getErrorType());
        } catch (Exception e) {
            LOG.error("[ScenarioDispatcherImpl] - getScenario | Error retrieving scenario for request, error: " + e.getMessage());
            throw ApplicationFailure.newFailure("Error retrieving scenario for request: " + request + ". Cause: " + e.getMessage(), "GetScenarioFailure");
        }
    }
}

@RegisterAiService
@ApplicationScoped
public interface ScenarioDispatcherPrompt {
    @SystemMessage("""
        You are a scenario classifier for the MasOrange chatbot system.

        Your task is to analyze the message and return exactly one of the following scenario labels:
        CONSENTS
        USER_DETAILS
        PRODUCT
        STOCK
        SUBSCRIPTIONS
        FAREWELL
        UNKNOWN

        Definitions of scenarios:
        CONSENTS          : Questions about consent, agreements, permissions, or privacy preferences
        USER_DETAILS      : Questions about user name, email, account details, address, or profile
        PRODUCT           : Questions about mobiles (brands like Nokia, iPhone, Samsung, etc.), models, specs or pricing
        STOCK             : Inquiries about product availability or units in stock
        SUBSCRIPTIONS     : Inquiries about plans, billing, subscription status, activation, cancellation
        FAREWELL          : User is saying goodbye, ending conversation, or expressing thanks and satisfaction (like "adiós", "gracias", "hasta luego", "bye", "that's all", "nothing else")
        UNKNOWN           : Message does not match any of the scenarios above

        Rules:
         - Respond with exactly one of the scenario labels above
         - Do not explain your answer
         - Do not call any tools
         - Do not greet the user or add any extra text
         - Use plain text only; do not use markdown or formatting syntax
        """)
    @UserMessage("""
        Conversation History: {{conversationHistory}}

        Current Message: {{message}}
        """)
    String getScenario(String message, String conversationHistory);
}

This is essentially a prompt that doesn’t need any tools — it only has to infer the scenario based on the user’s request. If no valid scenario is found, the request will be handled by the Helper (another state/Activity).
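Since getScenario returns free text from an LLM, the Workflow still has to map it onto one of the labels before choosing the next state, falling back to the Helper for anything else. The post does not show this step; below is a minimal, hypothetical sketch that tolerates stray whitespace, casing, and punctuation and treats anything unrecognised as UNKNOWN.

```java
import java.util.Locale;

// Hypothetical guard around the classifier's raw output.
enum Scenario {
    CONSENTS, USER_DETAILS, PRODUCT, STOCK, SUBSCRIPTIONS, FAREWELL, UNKNOWN;

    /** Maps raw LLM output to a label; anything unrecognised goes to UNKNOWN (the Helper). */
    static Scenario fromLabel(String raw) {
        if (raw == null) return UNKNOWN;
        String label = raw.trim()
                .toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z_]", "");  // drop quotes, dots, markdown asterisks, spaces
        for (Scenario s : values()) {
            if (s.name().equals(label)) return s;
        }
        return UNKNOWN;
    }
}
```

Stripping every non-label character is deliberately conservative: a chatty answer such as "I think it's STOCK" does not match any label and is routed to the Helper rather than guessed at.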

For instance, let's look at how the consents scenario is implemented:

@ApplicationScoped
public class ConsentsSupportImpl implements ConsentsSupport {

    private static final Logger LOG = Logger.getLogger(ConsentsSupportImpl.class);

    @Inject
    ConsentsSupportPrompt prompt;

    @Override
    public String retrieveConsents(boolean hasGreeted, String userId, String userName, String message, String conversationHistory) {
        LOG.info("[ConsentsSupportImpl] - retrieveConsents | Processing consents request for userId: " + userId + ", userName: " + userName + ", hasGreeted: " + hasGreeted);
        try {
            // The prompt's tools (ConsentsTools) fetch the customer's consents as needed
            String response = prompt.retrieveConsents(userId, userName, hasGreeted, message, conversationHistory);
            LOG.info("[ConsentsSupportImpl] - retrieveConsents | Successfully retrieved consents response for userId: " + userId);
            return response;
        } catch (ToolException e) {
            LOG.error("[ConsentsSupportImpl] - retrieveConsents | ToolException occurred: " + e.getMessage());
            throw ApplicationFailure.newFailure(e.getMessage(), e.getErrorType());
        } catch (Exception e) {
            LOG.error("[ConsentsSupportImpl] - retrieveConsents | Error retrieving consents for userId: " + userId + ", userName: " + userName + ", error: " + e.getMessage());
            throw ApplicationFailure.newFailure("Error retrieving consents for userId: " + userId + ", userName: " + userName, "RetrieveConsentsFailure", e);
        }
    }
}

@RegisterAiService(tools = {ConsentsTools.class})
@ApplicationScoped
public interface ConsentsSupportPrompt {

    @SystemMessage("""
        You are a customer support agent for MasOrange telecommunications company. Your job is to help users with their consents, based on their requests.

        You have access to the following tools:

        - Use `get-accepted-consents-for-a-customer` if the user asks about consents they have already accepted.
        - Use `get-pending-consents-for-a-customer` if the user asks about consents they still need to accept or are awaiting action.
        - Use `get-rejected-consents-for-a-customer` if the user asks about consents they have declined.

        Guidelines:
        - Always respond in the same language the user used, if any data is in a different language translate the response.
        - Always choose the most appropriate tool based on the user's question.
        - Reply to the user by his/her User Name only if greeted is false
        - Never ask the user to choose or confirm the tool.
        - Do not explain the tool usage, just provide the answer.
        - Always respond in the same language the customer used.
        - Be polite, helpful, and clear.
        - Never ask for credentials or passwords.
        - Do not use markdown syntax, better plain text.

        If the question is unclear or not related to consents, politely ask the user to clarify their request.
        """)

    @UserMessage("""
        Conversation History: {{conversationHistory}}

        Customer ID: {{customerId}}
        User Name: {{userName}}
        greeted: {{greeted}}

        Current Message: {{message}}
        """)
    String retrieveConsents(String customerId, String userName, boolean greeted, String message, String conversationHistory);
}

Here we can see that this Activity does use a tool (implemented in com.inorganic.tools.ConsentsTools.java). This tool exposes several services, and the prompt decides which one to invoke based on the request context.

All these details belong to the implementation layer, which in practice will depend partly on the library or API you are using to interact with LLMs.
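One implementation detail worth noting: both Activity implementations convert a ToolException into ApplicationFailure.newFailure(message, type), which Temporal retries according to the Activity's retry policy unless that type is listed as non-retryable in the retry options. That is what gives the outage behaviour described earlier, but it also means a permanent error (for example, an unknown customer) would be retried too. A minimal sketch of a classification helper, assuming the tool layer can report an HTTP status code (hypothetical; the status sets are illustrative, not taken from the reference implementation):

```java
import java.util.Set;

// Hypothetical retry classification for failed tool calls.
final class ToolErrorPolicy {

    // 408 Request Timeout and 429 Too Many Requests are transient despite being 4xx
    private static final Set<Integer> RETRYABLE_4XX = Set.of(408, 429);

    private ToolErrorPolicy() {}

    /** True for failures worth waiting out: server errors, timeouts, throttling. */
    static boolean isRetryable(int httpStatus) {
        if (httpStatus >= 500 && httpStatus <= 599) return true;
        return RETRYABLE_4XX.contains(httpStatus);
    }
}
```

For the non-retryable branch, the Activity would throw ApplicationFailure.newNonRetryableFailure(message, type) instead of newFailure, so the Workflow can react immediately (for example, by routing to the Helper) rather than waiting on retries that cannot succeed.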

Conclusions

Well, if you’ve made it this far — congratulations! I hope you found this article interesting. Together, we’ve explored how conversational chatbots can be implemented using Temporal Workflows. This approach provides scalability, resilience, and the ability to maintain full conversational context — something traditional architectures struggle to achieve.

The reference implementation, built in Java (with Quarkus) and available on GitHub, demonstrates the maturity of the Temporal Java SDK and Quarkus for production-ready chatbot systems. This approach not only meets the demanding requirements of modern customer service chatbots but also sets a new standard for building reliable, scalable conversational AI.

You can learn more and dive into the code on our GitHub.