What does it take to build a customer support experience your users won't hate? Ask Bitovi.

This blog is a guest post, written by Mark Repka, Software Engineer at Bitovi.

There you are, hands hovering above the keyboard with an exasperated breath whooshing out of your lungs. You've been in what feels like a verbal chess match with a customer support chatbot for the last fifteen minutes. You're trying to resolve a duplicate charge on your account and, although you have two entire years' worth of payment history, the chatbot has no idea who you are. It needs you to explain all your user context from the very beginning, and you're about ready to bang your head on the keyboard.

That's the problem we set out to solve at Bitovi.

Compassion backs our work, so we always strive to put ourselves in the customer's shoes. That's how we know better copy and a friendlier message won't cut it. An AI agent that actually knows your users, however? That'll get it done. So we created an AI agent that remembers users across sessions and handles their issues without dropping the thread the moment something breaks underneath.

We started with Temporal on day one

One thing that's unique about our experience is that we didn't arrive at Temporal after a painful incident. Temporal was already running elsewhere in our department, and the magical powers of platform engineering brought it to our team. So when it came time to build the agent, it was a familiar starting point.

The architecture we shipped was a ReAct agent loop built as a Temporal Workflow. The working context of the conversation lives in Workflow State. LLM inference via the AWS Bedrock Java SDK runs as Activities. That boundary is important: the nondeterministic part of the system is isolated inside an Activity, so the Workflow logic stays deterministic. Each conversation runs as a long-running entity Workflow with continue-as-new, so sessions can run indefinitely without hitting state limits. Tool executions are backed by Activities too, which gives us retry semantics and observability without extra work.
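To make the continue-as-new idea concrete, here is a minimal plain-Java sketch with no Temporal SDK dependency. It only models the hand-off: each run processes a bounded number of messages, trims its state, and passes that state to a fresh run. The names `ConversationState`, `runOnce`, `MAX_EVENTS_PER_RUN`, and `TURNS_TO_CARRY` are hypothetical and are not taken from the Bitovi code. In a real Workflow, the restart would be performed by the Temporal SDK rather than by a loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of continue-as-new. NOT Temporal SDK code; all names are hypothetical. */
final class ContinueAsNewSketch {

    /** State carried from one run to the next: only what the next run needs. */
    record ConversationState(String userId, List<String> recentTurns, int runNumber) {}

    /** Result of a single run: the state to carry over and how many messages it consumed. */
    record RunResult(ConversationState carryOver, int messagesConsumed) {}

    // Assumed per-run budget; a real Workflow would base this on its event-history size.
    static final int MAX_EVENTS_PER_RUN = 4;
    // Assumed number of raw turns kept when handing off to the next run.
    static final int TURNS_TO_CARRY = 2;

    /** Processes messages until the per-run budget is spent, then hands off a trimmed state. */
    static RunResult runOnce(ConversationState state, List<String> pending) {
        List<String> turns = new ArrayList<>(state.recentTurns());
        int consumed = 0;
        while (consumed < pending.size() && consumed < MAX_EVENTS_PER_RUN) {
            turns.add(pending.get(consumed));
            consumed++;
        }
        // Trim before the hand-off so the carried state stays bounded across runs.
        int from = Math.max(0, turns.size() - TURNS_TO_CARRY);
        List<String> trimmed = List.copyOf(turns.subList(from, turns.size()));
        ConversationState next =
                new ConversationState(state.userId(), trimmed, state.runNumber() + 1);
        return new RunResult(next, consumed);
    }

    /** Chains runs back to back, the way continue-as-new chains Workflow executions. */
    static ConversationState runConversation(String userId, List<String> messages) {
        ConversationState state = new ConversationState(userId, List.of(), 0);
        int offset = 0;
        while (offset < messages.size()) {
            RunResult result = runOnce(state, messages.subList(offset, messages.size()));
            state = result.carryOver();
            offset += result.messagesConsumed();
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> messages = List.of("m1", "m2", "m3", "m4", "m5", "m6");
        System.out.println(runConversation("user-1", messages));
    }
}
```

The point of the sketch is that the state handed to the next run is deliberately small, so the conversation can keep going without any single run growing without bound.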

"The Reasoning and Acting Agent Loop was structured in a Temporal Workflow with the Working Context of the Agent stored in Workflow State," says Bitovi's Mark Repka, Developer Consultant. "Temporal was helpful because it was durable and scalable, super easy to iterate."

That combination matters a lot early in a project when you're still figuring out what the system actually needs to be.

The problem retries can't solve

The first version of the agent worked, but as we kept building, we thought harder about the support experience we wanted to deliver. That's when we noticed a crucial gap. Customer support is more stateful than a single conversation because you need the full picture, which includes things like:

A player's history
Their preferences
The shape of their relationship with the product.

The issue we faced is that none of this fits into one context window.

We had assumed shorter, more standalone conversations would work and didn't account for needing to track a more complete picture of each user's account over time and across sessions rather than just within them.

Something we had already learned, and that applied here too, is that retry logic is not the same as resilience. Wrapping an HTTP call in retries does not give you a system that can survive a process dying mid-execution and resume exactly where it left off, and it does not give you memory that persists across sessions.
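The difference can be sketched in plain Java. The example below is an illustration only and does not use Temporal: `withRetries` is an ordinary in-process retry, while the hypothetical `Journal` class stands in for durable storage that records each completed step so a restarted flow can skip work it already finished. The step names, the `charge-123` value, and the simulated crash are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Contrast between an in-process retry and a journaled resume. Hypothetical names; not Temporal code. */
final class RetryVsResumeSketch {

    /** Plain retry: helps with a flaky call, but every attempt lives and dies with this process. */
    static <T> T withRetries(int maxAttempts, Supplier<T> call) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    /** Records completed step results; stands in for storage that outlives a crash. */
    static final class Journal {
        private final Map<String, String> completed = new HashMap<>();

        /** Runs the step only if no result is recorded for it, then records the result. */
        String runOnce(String stepName, Supplier<String> step) {
            if (completed.containsKey(stepName)) {
                return completed.get(stepName);
            }
            String result = step.get();
            completed.put(stepName, result);
            return result;
        }
    }

    /** A two-step flow; re-running it against the same journal skips finished steps. */
    static String handleDuplicateCharge(Journal journal, List<String> sideEffects, boolean crashMidway) {
        String charge = journal.runOnce("lookup", () -> {
            sideEffects.add("lookup");
            return "charge-123";
        });
        if (crashMidway) {
            throw new IllegalStateException("simulated process death");
        }
        return journal.runOnce("refund", () -> {
            sideEffects.add("refund");
            return "refunded " + charge;
        });
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        List<String> sideEffects = new ArrayList<>();
        try {
            handleDuplicateCharge(journal, sideEffects, true);
        } catch (IllegalStateException e) {
            System.out.println("first run died: " + e.getMessage());
        }
        // The second run resumes from the journal instead of repeating the lookup.
        System.out.println(handleDuplicateCharge(journal, sideEffects, false));
        System.out.println(sideEffects);
    }
}
```

No amount of retrying inside `withRetries` helps once the process is gone. Only the recorded progress lets the second run pick up after the lookup instead of starting over.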

We added memory without re-architecting

Our fix was AWS Bedrock AgentCore Memory because it ships with strategies for semantic memory, user preference memory, and session summaries. We evaluated building our own Temporal-backed memory extraction workflow, but AgentCore already covered what we needed and it would have taken considerably longer to build and maintain one ourselves.

The integration plugged into our existing Workflow cleanly. User preference memories get injected into every prompt alongside known account details. Semantic memories and session summaries get retrieved based on what applies to the current conversation. Memory extraction runs as a Temporal Activity, so it retries on failure and the result is durable.
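As a rough illustration of that split between "always injected" and "retrieved when relevant," here is a small plain-Java sketch. It does not call AgentCore. The `overlap` scoring is a deliberately naive stand-in for real memory retrieval, and `MemoryRecord`, `Kind`, `select`, and `render` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Sketch of choosing which memories go into a prompt. Hypothetical names; no AgentCore calls. */
final class MemoryInjectionSketch {

    enum Kind { USER_PREFERENCE, SEMANTIC, SESSION_SUMMARY }

    record MemoryRecord(Kind kind, String text) {}

    /** Naive relevance: how many distinct query words (longer than 3 chars) appear in the text. */
    static long overlap(String query, String text) {
        String haystack = text.toLowerCase(Locale.ROOT);
        return Arrays.stream(query.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(word -> word.length() > 3)
                .distinct()
                .filter(haystack::contains)
                .count();
    }

    /** Preferences are always included; other memories only if relevant, best matches first. */
    static List<MemoryRecord> select(List<MemoryRecord> all, String userMessage, int maxRetrieved) {
        List<MemoryRecord> selected = new ArrayList<>();
        for (MemoryRecord memory : all) {
            if (memory.kind() == Kind.USER_PREFERENCE) {
                selected.add(memory);
            }
        }
        all.stream()
                .filter(memory -> memory.kind() != Kind.USER_PREFERENCE)
                .filter(memory -> overlap(userMessage, memory.text()) > 0)
                .sorted(Comparator.comparingLong(
                        (MemoryRecord memory) -> overlap(userMessage, memory.text())).reversed())
                .limit(maxRetrieved)
                .forEach(selected::add);
        return selected;
    }

    /** Renders the selected memories as one labeled line each for the prompt. */
    static String render(List<MemoryRecord> selected) {
        StringBuilder section = new StringBuilder();
        for (MemoryRecord memory : selected) {
            section.append('[').append(memory.kind()).append("] ").append(memory.text()).append('\n');
        }
        return section.toString();
    }

    public static void main(String[] args) {
        List<MemoryRecord> all = List.of(
                new MemoryRecord(Kind.USER_PREFERENCE, "Prefers email follow-ups"),
                new MemoryRecord(Kind.SEMANTIC, "Reported a duplicate charge in March"),
                new MemoryRecord(Kind.SESSION_SUMMARY, "Asked about changing a shipping address"));
        System.out.print(render(select(all, "duplicate charge on my account", 2)));
    }
}
```

The design choice worth noticing is the asymmetry: preference memories skip the relevance filter entirely, while everything else has to earn its place in the prompt.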

One timing thing worth knowing: AgentCore takes a couple of minutes to process and extract memories after a session ends. To cover that gap, we keep a small slice of raw, recent conversation in Workflow State as a buffer. This is a deliberate call rather than a workaround.
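A buffer like that can be very simple. The sketch below is one possible shape for it in plain Java, assuming a fixed number of turns is enough to bridge the extraction delay. `RecentTurnBuffer` and its capacity are hypothetical and are not the structure used in the Bitovi Workflow.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Bounded buffer of the most recent raw conversation turns. Hypothetical sketch. */
final class RecentTurnBuffer {

    private final int capacity;
    private final Deque<String> turns = new ArrayDeque<>();

    RecentTurnBuffer(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Adds a turn, evicting the oldest one once the buffer is full. */
    void add(String turn) {
        if (turns.size() == capacity) {
            turns.removeFirst();
        }
        turns.addLast(turn);
    }

    /** Returns an immutable copy, oldest turn first, suitable for injecting into a prompt. */
    List<String> snapshot() {
        return List.copyOf(turns);
    }

    public static void main(String[] args) {
        RecentTurnBuffer buffer = new RecentTurnBuffer(3);
        for (String turn : List.of("t1", "t2", "t3", "t4", "t5")) {
            buffer.add(turn);
        }
        System.out.println(buffer.snapshot());
    }
}
```

Because the buffer is bounded, it adds a fixed, predictable amount to the state a Workflow has to carry, no matter how long the conversation runs.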

The best part is that none of this required rethinking our Workflow structure.

"Adding AgentCore Memory persistence as an Activity was such a clean drop-in," says Mark Repka, Developer Consultant. "The ease of work like this is vital because our engineers, like yours I'm sure, are busy. The right tooling makes their jobs much easier."

The foundation held, and we did not have to blow up what was already working.

How it actually works

The diagram below shows the ReAct loop in action. Each turn, the agent goes through a Thought step (the LLM deciding what to do next) followed by an Action step that calls the appropriate tool. The tool response comes back as an Observation, which feeds into the next Thought. The available tools reach out to a structured database for ticket data, a vector database for related policies, S3 for scoring rubrics, and separate LLM calls for rubric and policy adherence scoring.

[Figure: ReAct loop diagram]
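For readers who prefer code to diagrams, the Thought, Action, and Observation cycle can be sketched in a few lines of plain Java. This is a simplified model, not the Bitovi Workflow: the `think` function stands in for the LLM call, the `tools` map stands in for the tool Activities, and the `Thought` record, the `maxSteps` budget, and the escalation message are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Minimal model of a ReAct loop. Hypothetical names; the LLM and tools are plain functions here. */
final class ReActLoopSketch {

    /** What the Thought step decides: either call a tool, or finish with an answer. */
    record Thought(String tool, String input, String finalAnswer) {
        boolean isFinal() {
            return finalAnswer != null;
        }
    }

    static String run(Function<List<String>, Thought> think,
                      Map<String, Function<String, String>> tools,
                      int maxSteps) {
        List<String> context = new ArrayList<>();
        for (int step = 0; step < maxSteps; step++) {
            // Thought: decide what to do next from everything observed so far.
            Thought thought = think.apply(List.copyOf(context));
            if (thought.isFinal()) {
                return thought.finalAnswer();
            }
            // Action: call the chosen tool, or report that it does not exist.
            Function<String, String> tool = tools.get(thought.tool());
            String observation = (tool == null)
                    ? "unknown tool: " + thought.tool()
                    : tool.apply(thought.input());
            // Observation: feed the result into the next Thought.
            context.add("action=" + thought.tool() + " observation=" + observation);
        }
        return "escalate: step budget exhausted";
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> tools =
                Map.of("lookupTicket", ticketId -> "duplicate charge");
        // Scripted "model": look up the ticket first, then answer from the observation.
        Function<List<String>, Thought> think = context -> context.isEmpty()
                ? new Thought("lookupTicket", "T-1", null)
                : new Thought(null, null, "Found: " + context.get(0));
        System.out.println(run(think, tools, 5));
    }
}
```

The step budget is there so a confused model cannot loop forever. What happens when it runs out (here, a plain escalation string) is a policy decision for each team.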

What you can see in the Temporal UI is every one of those steps as a discrete, observable Activity. The event history below shows a completed workflow run, agentReActWorkflow, with 109 history events, 73 state transitions, and a total runtime of about 17 seconds. You can see the alternating pattern of ThoughtActivity, ActionActivity, and ObservationActivity across the timeline, with PersistActivity bookending the run.

[Figure: Temporal UI event history for the agentReActWorkflow run]

That level of visibility is something you're not going to get from a raw LLM call. Here, if something goes wrong mid-conversation, you still know exactly where.

Here is what the implementation looks like. This is code the Bitovi team shipped: the Thought step of a Reasoning and Acting agent built on Temporal. The agent supports potentially infinite runtime through continue-as-new, with long-term memory management and retrieval handled as Activities.

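// Standard and third-party imports. JSONException is assumed to come from org.json.
// Project-specific classes (ContextEntry, LabeledMemoryRecord, ToolRegistry, Config,
// BedrockConverse, ModelResponseWithUsage, ThoughtResponse, EventClient) live in the
// team's own packages and are not shown here.
import io.temporal.failure.ApplicationFailure;
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;
import org.json.JSONException;

public class ThoughtActivity {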
    public static ThoughtResponse execute(String promptTemplate, List<ContextEntry> context,
            List<LabeledMemoryRecord> memoryRecords)
            throws ApplicationFailure {
        try {
            // Convert ContextEntry list to XML strings for LLM prompt
            List<String> contextStrings = context.stream()
                    .map(ContextEntry::toXMLString)
                    .collect(Collectors.toList());

            // Get current date
            String currentDate = LocalDate.now().toString();

            // Get available tools as XML string
            String availableActions = ToolRegistry.getToolsAsXmlString();

            // Convert LabeledMemoryRecord list to XML strings for LLM prompt
            List<String> memoryRecordStrings = memoryRecords.stream()
                    .map(LabeledMemoryRecord::toXMLString)
                    .collect(Collectors.toList());

            // Format prompt with placeholders
            String systemPrompt = promptTemplate
                    .replace("{currentDate}", currentDate)
                    .replace("{previousSteps}", String.join("\n", contextStrings))
                    .replace("{memoryRecords}", String.join("\n", memoryRecordStrings))
                    .replace("{availableActions}", availableActions);

            // Call Bedrock with high-quality model
            Config config = new Config();
            String modelId = config.getProperty("AWS_MODEL_ID");

            ModelResponseWithUsage response = BedrockConverse.bedrockConverseWithUsage(
                    systemPrompt,
                    null, // No tool config needed for thought
                    modelId);

            String responseText = response.response();
            if (responseText == null || responseText.isEmpty()) {
                throw ApplicationFailure.newFailure("Empty response from model", "EmptyModelResponse");
            }

            return ThoughtResponse.from(response);

        } catch (JSONException e) {
            String errorMsg = "Error parsing JSON response: " + e.getMessage();
            System.err.println(errorMsg);
            EventClient.emitEvent("error", "Thought error: " + errorMsg);
            throw ApplicationFailure.newFailure("Failed to parse model response: " + e.getMessage(),
                    "ThoughtActivityError");
        } catch (ApplicationFailure e) {
            String errorMsg = "Error in thoughtActivity: " + e.getMessage();
            System.err.println(errorMsg);
            EventClient.emitEvent("error", "Thought error: " + errorMsg);
            throw ApplicationFailure.newFailure("thoughtActivity failed: " + e.getMessage(),
                    "ThoughtActivityError");
        }
    }
}

This is one file of three. For the full implementation including the memory workflow and the activities layer, the complete code is on GitHub (and the revision history is worth a look).

How we handle a nondeterministic component reliably

Running LLM inference in production means accepting that a core part of your system does not produce consistent outputs. We handled this with containment: inference lives inside Activities, the Workflow sees a clean input and output, and what happens inside the Activity stays inside the Activity.

Structured output with JSON Schema validation adds another layer. If the model's response does not parse cleanly into the expected Java objects, the Activity retries. We also run AI classifiers and rule-based checks alongside the agent to catch actions it should not take, including guards against prompt injection. If something fails validation, we retry or hand off to a human agent.
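The validate, retry, then hand off pattern can be sketched as follows. This is a simplified stand-in, not the production logic: the `name:argument` string format replaces real JSON Schema validation so the example needs only the standard library, the allow-list is the only rule-based check shown, and `ProposedAction`, `Outcome`, and `decide` are hypothetical names.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

/** Sketch of validate-then-retry with a human hand-off. Hypothetical names and formats. */
final class ValidatedActionSketch {

    record ProposedAction(String name, String argument) {}

    /** Either an approved action, or a hand-off to a human (action is null in that case). */
    record Outcome(boolean handedOff, ProposedAction action, int attempts) {}

    /** Parses "name:argument"; a stand-in for structured-output parsing with a schema. */
    static Optional<ProposedAction> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        int separator = raw.indexOf(':');
        if (separator <= 0 || separator == raw.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new ProposedAction(
                raw.substring(0, separator).trim(), raw.substring(separator + 1).trim()));
    }

    /** Asks the model up to maxAttempts times; only well-formed, allow-listed actions pass. */
    static Outcome decide(Supplier<String> model, Set<String> allowedActions, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<ProposedAction> parsed = parse(model.get());
            if (parsed.isPresent()
                    && allowedActions.contains(parsed.get().name())
                    && !parsed.get().argument().isEmpty()) {
                return new Outcome(false, parsed.get(), attempt);
            }
        }
        // Every attempt failed validation: stop retrying and hand off to a human.
        return new Outcome(true, null, maxAttempts);
    }

    public static void main(String[] args) {
        // Scripted model outputs: malformed, then disallowed, then valid.
        Iterator<String> outputs =
                List.of("garbage", "deleteAccount:u1", "refund:charge-123").iterator();
        System.out.println(decide(outputs::next, Set.of("refund", "lookupTicket"), 3));
    }
}
```

Treating a disallowed action the same way as a malformed one keeps the control flow simple: anything that fails a check costs one attempt, and running out of attempts always ends at a human.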

The user gets an answer without even knowing a retry happened: mission accomplished!

What we would tell you

If we were trying to pay our lessons learned forward (and we are), we'd tell you to think carefully about:

What needs to live in a single conversation
What needs to survive across sessions
What should be retrieved dynamically versus injected every time.

And, most importantly, to make those decisions intentionally. Do not wait until cost or context window pressure forces your hand.

Lastly, make sure to design for failures you haven't even had yet. Retry logic wraps a call; Durable Execution survives the process dying. For an agent that is expected to run indefinitely and actually remember its users, those aren't the same thing.

If any of the state management tradeoffs, the memory problem, or even the question of what actually belongs in a Workflow sounds familiar, we'd love to keep the conversation going and help you solve these issues. Check out some of our best work here and keep up with us on LinkedIn and Twitter.