Diving into the AI iceberg: What lies beneath your AI agents

This post is a collaboration between Temporal, Neo4j, Auth0, and Redpanda.

When you interact with AI agents, you're only seeing the tip of the iceberg: the chat interface, the responses, maybe a generated image. But beneath the surface lies a massive infrastructure stack that makes it all actually work: orchestration, data persistence, security, real-time streaming. It's the stuff users never see, but that the applications they use cannot survive without. This is how four teams came together to reveal what's beneath the AI iceberg.

How we got here

During his time at Temporal, my colleague Steve Androulakis developed a Deep Research Agent: an AI-powered research assistant designed for durable execution in long-running, multi-agent workflows. The agent was fully functional end-to-end: ask it a question and it will clarify your intent, search the web, and generate a comprehensive report.

But we realized something: this demo was a perfect canvas to show what else goes into making AI agents production-ready. Orchestration is essential, but so are data persistence, security, and observability. And Neo4j, Auth0, and Redpanda were the teams and experts for each of those areas. So we got together and asked: what if you integrated your tools into this existing agent to show the full stack of what's beneath the surface?

The result is a full Deep Research Agent that demonstrates the complete "AI Iceberg":

Takes a research topic from the user
Asks clarifying questions to refine the scope (human-in-the-loop)
Kicks off parallel web search across multiple sources
Persists conversations, Workflow history, and memory in Neo4j, enabling a result cache and resumable sessions
Protects the UI, tool calls, and agent access tokens across the stack with Auth0
Streams Workflow events through Redpanda for analytics, monitoring, and downstream automation

The whole process takes about 2–3 minutes and involves multiple agents working together: a Triage Agent, Clarifying Agent, Planner Agent, Image Generator Agent, Search Agent, Writer Agent, and PDF Generator Agent. It's complex, it's long-running, and it's exactly the kind of workflow that needs solid infrastructure beneath it. In this post, we'll walk through each layer of the stack and show you how it all fits together.

Temporal: Making the agent durable

Author: Melissa Herrera, Senior Developer Advocate at Temporal

Temporal is the backbone of the entire demo. It orchestrates the multi-agent research Workflow, ensuring that every step executes reliably even when things go wrong. Here's what Temporal handles in this demo:

Multi-agent coordination: The research pipeline involves seven different agents, each with a specific job. Temporal orchestrates the flow from Triage → Clarifying → Planner → Image Generator → Web Search → Writer → PDF Generator, managing the handoffs and ensuring each agent gets the input it needs.
Human-in-the-loop: When the Clarifying Agent generates follow-up questions, the Workflow pauses and waits for the user to respond. Temporal is designed exactly for these use cases. The Workflow can wait for minutes, hours, or days, and pick up right where it left off when the user responds.
Parallel execution: When it's time to search, Temporal kicks off multiple Search Agents in parallel, each querying different sources. This is handled through Activities, which are retryable and recoverable — Temporal tracks and manages all of it.
Automatic retries: LLM calls fail, web searches time out, API rate limits get hit. Temporal automatically retries failed Activities with configurable backoff, so transient failures don't derail the entire research session.
Durable execution: This is Temporal's forte. The research Workflow takes 2–3 minutes and involves managing dozens of API calls. If your agent crashes halfway through — say, after completing 15 web searches but before generating the report — Temporal picks up exactly where it left off. No repeated searches, no wasted API calls, no lost progress, no extra tokens spent.
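To make the durable-execution idea concrete, here is a minimal plain-Python sketch of the underlying principle: completed steps are recorded in a history, and a restarted run replays recorded results instead of executing the steps again. This is not Temporal's implementation or API. The `Journal` class, `fake_search`, and the step names are hypothetical stand-ins for illustration only.

```python
from typing import Any, Callable, Dict, List


class Journal:
    """Toy stand-in for an event history: remembers each completed step's result."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}

    def run_step(self, step_id: str, fn: Callable[[], Any]) -> Any:
        # Replay: a step that already completed returns its recorded result
        # instead of executing again.
        if step_id in self.results:
            return self.results[step_id]
        result = fn()
        self.results[step_id] = result
        return result


calls: List[str] = []  # records which searches were really executed


def fake_search(query: str) -> str:
    # Hypothetical search step; a real one would call an external API.
    calls.append(query)
    return f"results for {query}"


def research(journal: Journal, queries: List[str], crash_after: int = -1) -> List[str]:
    results = []
    for i, query in enumerate(queries):
        if i == crash_after:
            raise RuntimeError("worker crashed")  # simulate a mid-run failure
        results.append(journal.run_step(f"search-{i}", lambda q=query: fake_search(q)))
    return results


journal = Journal()
queries = ["temporal", "neo4j", "auth0", "redpanda"]

try:
    research(journal, queries, crash_after=2)  # fails after two completed searches
except RuntimeError:
    pass

# "Restart" with the same journal: the first two searches are replayed, not re-run.
final = research(journal, queries)
```

After the restart, `calls` contains each query exactly once, even though the research function ran twice. In the real system the history lives in the Temporal service rather than in process memory, which is what lets it survive a worker crash.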

Here's a simplified view of what the Workflow looks like:

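import asyncio
from datetime import timedelta

from temporalio import workflow

# The Activities (triage_agent, planner_agent, search_agent, writer_agent,
# pdf_generator) and the ResearchRequest / ResearchResult types live elsewhere.

@workflow.defn
class InteractiveResearchWorkflow:
    def __init__(self) -> None:
        self.clarifications_complete = False

    @workflow.signal
    def submit_clarifications(self) -> None:
        # Illustrative Signal: sent once the user has answered the questions
        self.clarifications_complete = True

    @workflow.run
    async def run(self, request: ResearchRequest) -> ResearchResult:
        # Triage: determine if clarifications are needed
        triage_result = await workflow.execute_activity(
            triage_agent,
            request.query,
            start_to_close_timeout=timedelta(minutes=2),
        )

        # Human-in-the-loop: wait for clarification answers
        if triage_result.needs_clarification:
            await workflow.wait_condition(lambda: self.clarifications_complete)

        # Plan: turn the request into a set of search queries (simplified)
        search_plan = await workflow.execute_activity(
            planner_agent,
            request.query,
            start_to_close_timeout=timedelta(minutes=2),
        )

        # Parallel searches
        search_results = await asyncio.gather(*[
            workflow.execute_activity(
                search_agent,
                search_query,
                start_to_close_timeout=timedelta(minutes=2),
            )
            for search_query in search_plan.queries
        ])

        # Generate report and artifacts (every Activity call needs a timeout)
        report = await workflow.execute_activity(
            writer_agent, search_results, start_to_close_timeout=timedelta(minutes=2)
        )
        pdf = await workflow.execute_activity(
            pdf_generator, report, start_to_close_timeout=timedelta(minutes=2)
        )

        return ResearchResult(report=report, pdf=pdf)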

The real code is more involved, but this captures the essence: Temporal lets you write complex, long-running Workflows in plain code while handling all the reliability concerns behind the scenes.

In the Temporal UI, you can see exactly what's happening beneath the surface: every agent invocation, every search Activity running in parallel, the full event history. When the Workflow completes, you have a complete picture of how your research was conducted.

The value: Without Temporal, a 2–3 minute multi-agent Workflow would be fragile. With Temporal, it's resilient, observable, and production-ready.

Now, with the Workflows orchestrated, we need somewhere to store conversation history and enable users to pick up where they left off. That's where Neo4j comes in.

Neo4j: The knowledge graph and memory

Author: Jeremy Adams, Senior Developer Advocate at Neo4j

The first time I ran the deep research agent demo, I realized that each session embodied a sizable investment of human time (the back and forth of the interview), plus a non-zero cost in tokens to do all the research, synthesize an article, and generate a hero image. What if someone asked the same question twice (or something very semantically similar) and all of that effort was spent again? If multiple users were on a system like this, there should be positive network effects where the effort of one person benefits others. We needed a shared memory and a retrieval strategy to go with it.

We couldn't have a black box memory, though. What if the system wasn't working well? We'd never know why. I wanted to capture the research results, traces of how we got there, and a semantic similarity shortcut (vector cosine similarity) for retrieval. That way we got the instant benefit of a "memory cache" with human explainability, plus headroom for making the memory more sophisticated: for example, jumping to a node via vector similarity and then traversing the graph, or short-circuiting an interview sooner once you see where things are going with enough confidence.

A knowledge graph with vector embeddings was the perfect structure for this kind of memory, and luckily Neo4j supports all of that. My implementation was straightforward (it used the Neo4j Python driver and some Cypher queries), but today I'd use the increasingly excellent neo4j-agent-memory package or the mcp-neo4j-memory MCP server based on it.

A side benefit of having all of the graph traces keep references to the durable Temporal Workflows was that the frontend UI could use the graph to get back on track after an inadvertent page reload or browser crash.

With the data persisted and Workflows running, we need to secure access to the application and the tools the agents use.
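As a rough illustration of the similarity shortcut described in this section, the sketch below checks a new query's embedding against stored ones and returns a cached report only when the cosine similarity clears a threshold. It is a plain-Python stand-in, not the demo's code. The `memory` records, the 0.9 threshold, and the index and property names in the Cypher string are all assumptions. The Cypher uses Neo4j's `db.index.vector.queryNodes` procedure, which reports its own similarity score, so a threshold would need to be tuned against that score rather than copied from this sketch.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical Cypher for the same lookup against a Neo4j vector index.
# The index name 'research_embeddings' and the property names are assumptions.
VECTOR_LOOKUP_CYPHER = """
CALL db.index.vector.queryNodes('research_embeddings', 1, $embedding)
YIELD node, score
RETURN node.report AS report, node.workflow_id AS workflow_id, score
"""


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def lookup_cached_report(
    query_embedding: Sequence[float],
    memory: List[Dict],
    threshold: float = 0.9,
) -> Optional[Tuple[str, float]]:
    """Return (report, score) for the closest stored entry, or None on a miss."""
    best: Optional[Tuple[str, float]] = None
    for entry in memory:
        score = cosine_similarity(query_embedding, entry["embedding"])
        if best is None or score > best[1]:
            best = (entry["report"], score)
    if best is not None and best[1] >= threshold:
        return best
    return None


# Tiny 3-dimensional "embeddings" for illustration; real ones come from a model.
memory = [
    {"embedding": [1.0, 0.0, 0.0], "report": "Report on graph databases"},
    {"embedding": [0.0, 1.0, 0.0], "report": "Report on event streaming"},
]

hit = lookup_cached_report([0.9, 0.1, 0.0], memory)   # close to the first entry
miss = lookup_cached_report([1.0, 1.0, 0.0], memory)  # equally far from both
```

A hit lets the agent skip the interview and research steps. A miss falls through to the full Workflow, whose result can then be stored for the next user.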

Auth0: The security layer

Author: Fred Patton, Senior Developer Advocate at Auth0

When you're building AI agents that access external tools, call APIs, and handle user data, security can't be an afterthought. The deep research agent makes authenticated requests, accesses external services, and maintains user sessions across long-running Workflows. All of that needs to be locked down. Auth0 secures three key layers in this demo:

User authentication: Before a user can submit a research query, they authenticate through Auth0. This establishes a secure session and ensures that only authorized users can access the research agent and their conversation history.
API and tool protection: The research agent calls external APIs (web searches, image generation, LLM providers). Auth0 manages the access tokens for these tool calls, ensuring that credentials are handled securely and requests are properly authorized.
Session continuity: Because these Workflows can run for several minutes (and users can return to previous sessions), Auth0 maintains a secure session state that integrates with Neo4j's conversation memory. When a user comes back to resume a research session, Auth0 verifies their identity before granting access to their previous work.

Authentication: locking down the UI and every API call

The app uses auth0-fastapi to configure a standard OAuth2 flow. On first visit, the user is redirected to Auth0's hosted login page; on return, an encrypted server-side session cookie is issued. Every protected endpoint declares a single FastAPI dependency — no token-parsing boilerplate repeated across handlers:

auth0_config = Auth0Config(
    domain=AUTH0_DOMAIN,
    clientId=AUTH0_CLIENT_ID,
    clientSecret=AUTH0_CLIENT_SECRET,
    authorization_params={
        "scope": "openid profile email offline_access",
        "audience": f"https://{AUTH0_DOMAIN}/me/",
    },
    mount_connected_account_routes=True,
    appBaseUrl=BASE_URL,
    secret=APP_SECRET_KEY,
)
auth0_client = AuthClient(auth0_config)

@app.post("/api/start-research")
async def start_research(
    request: StartResearchRequest,
    fastapi_request: Request,
    response: Response,
    _auth_session=Depends(auth0_client.require_session),  # one line, all endpoints
):
    ...

Every route is covered: start research, submit answers, fetch results, list conversations, personalization. require_session validates the encrypted cookie, transparently refreshes the access token when it expires (the offline_access scope provides the refresh token), and redirects to login if the session is absent.
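The three outcomes described above (valid session, expired token with a refresh token available, no session) can be sketched as a small decision function. This is not how auth0-fastapi implements require_session; the `Session` shape, the function name, and the returned action strings are hypothetical. The final branch, an expired token with no refresh token falling back to login, is an assumption rather than something stated in the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    access_token_expires_at: float  # Unix timestamp
    refresh_token: Optional[str] = None  # present when offline_access was granted


def session_action(session: Optional[Session], now: float) -> str:
    """Decide what a session-guarding dependency should do for a request."""
    if session is None:
        return "redirect_to_login"   # no valid cookie or session at all
    if now < session.access_token_expires_at:
        return "allow"               # access token still valid
    if session.refresh_token:
        return "refresh_then_allow"  # silently obtain a new access token
    return "redirect_to_login"       # assumption: nothing left to refresh with
```

Keeping this decision in one dependency is what lets every handler stay free of token-handling code.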

Auth0 for AI: Token Vault and delegated tool access

This is where Auth0 goes beyond standard login. The research agent can import a user's Google Doc to personalize its queries, but the backend should never store the user's Google credentials. Auth0's Token Vault solves this: it holds the OAuth token for the user's connected Google account, and the backend requests it on demand for a single operation, then discards it. The connection is established once via a "Connect Google" link that triggers Auth0's federated identity flow:

<a href="/auth/connect?connection=google-oauth2">Connect Google</a>

When the user later shares a Google Doc, the backend asks Token Vault for a live access token scoped to read-only Drive access, uses it to fetch the document, and never writes it to disk or a database:

# Request a short-lived Google token from Token Vault
# (auth0_client is the AuthClient configured earlier)
google_access_token = await auth0_client.client.get_access_token_for_connection(
    {"connection": "google-oauth2"},
    store_options={"request": request, "response": response},
)

# Use it immediately; Auth0 handles refresh if it has expired
doc_text = await export_google_doc_text(google_access_token, drive_file_id)
# token is never stored — it goes out of scope here

Auth0 manages token refresh automatically. If the Google token has expired, Token Vault refreshes it transparently before returning it. If the user hasn't connected Google at all, AccessTokenForConnectionError is raised and the endpoint returns a 403, at which point the frontend redirects the user to complete the connection.

Identity-keyed personalization

Once a user is authenticated, their Auth0 sub claim (a stable unique identifier regardless of login method) keys all personalization state in Neo4j:

user = await auth0_client.client.get_user(store_options={"request": request, "response": response})
user_id = user.get("sub")  # e.g. "google-oauth2|107..."

# Load this user's topic preferences and research interests
personalization = get_personalization_state(user_id)

This means a user can log in with Google on any device and always get their personalized context. Neo4j and Auth0 together make the agent feel like it knows the user.

The value: Enterprise-grade authentication and delegated service access, without building any of it from scratch. The Token Vault pattern in particular is the right answer to a question every AI agent builder eventually hits: how do I let my agent act on behalf of the user without holding their credentials?

Okay, so we have orchestration, memory, and security in place. But how do we know what's happening inside the Workflow in real-time? That's where Redpanda comes in.

Redpanda: The real-time event stream

Authors: Peter Corless (Principal Product Marketer) and Chandler Mayo (Developer Advocate Lead) at Redpanda Data

Multi-agent Workflows like the deep research agent create a fundamental observability challenge. When a request fans out across multiple agents running in parallel — all producing intermediate state — after-the-fact logging isn't enough. You need a real-time event stream that provides Workflow progress tracking. Redpanda captures Workflow events (clarifications, searches, report generation) in real time and streams them to external consumers for monitoring, analytics, or even custom UI integrations. With multi-agent Workflows, Redpanda delivers value at three levels:

Powers live UI updates so users aren't left staring at a blank screen during a 2–3 minute Workflow
Gives operations teams a single, durable, replayable record across all agents — making debugging and incident reconstruction tractable
Decouples downstream consumers (analytics, alerting, custom integrations) from the core agent logic so teams can extend the system without introducing fragility

Redpanda acts as the real-time event bus and central nervous system for the entire agentic system. Because every action is logged in an immutable stream, Redpanda provides a comprehensive audit trail that lets you reconstruct the exact sequence of events that led to any outcome or failure.

"Agents" in the Redpanda ecosystem are collections of Redpanda Connect pipelines. Redpanda Connect is the integration and automation layer that provides the core components for building, securing, and managing agents. It reduces the need for external data orchestration tools by coordinating agent actions and data flows through event streams.
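A minimal sketch of what such Workflow events might look like on the wire, and how an audit timeline could be rebuilt from them. Redpanda speaks the Kafka API, so a producer sends key/value byte pairs; keying by Workflow ID keeps one Workflow's events in a single partition, and therefore in order, under the default partitioner. The event type names, the `sequence` field, and the in-memory `log` list standing in for a topic are all assumptions for illustration, not the demo's actual schema.

```python
import json
from typing import Any, Dict, List, Tuple


def build_event(
    workflow_id: str, event_type: str, payload: Dict[str, Any], sequence: int
) -> Tuple[bytes, bytes]:
    """Return the (key, value) bytes a Kafka-API producer would send."""
    key = workflow_id.encode("utf-8")
    value = json.dumps(
        {
            "workflow_id": workflow_id,
            "type": event_type,
            "sequence": sequence,  # hypothetical per-Workflow ordering field
            "payload": payload,
        },
        sort_keys=True,
    ).encode("utf-8")
    return key, value


def replay(log: List[Tuple[bytes, bytes]], workflow_id: str) -> List[str]:
    """Reconstruct the ordered event types for one Workflow from the log."""
    wanted = workflow_id.encode("utf-8")
    events = [json.loads(value) for key, value in log if key == wanted]
    events.sort(key=lambda event: event["sequence"])
    return [event["type"] for event in events]


# An in-memory list stands in for a topic holding events from two Workflows.
log = [
    build_event("wf-1", "clarification_requested", {"question": "Which region?"}, 1),
    build_event("wf-2", "search_started", {"query": "graph databases"}, 1),
    build_event("wf-1", "search_started", {"query": "event streaming"}, 2),
    build_event("wf-1", "report_generated", {"pages": 4}, 3),
]

timeline = replay(log, "wf-1")
```

Because the stream is append-only, the same replay can serve a live progress view, an analytics job, and a post-incident review without any of them touching the agent code.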

Bringing it all together

Here's how all of the pieces connect in practice:

User logs in → Auth0 authenticates and secures the session
User submits a research query → Temporal kicks off the Workflow
Clarifying questions are generated → Workflow pauses, waiting for user input
User responds → Workflow continues, search plan is created
Parallel searches execute → Each search is a Temporal Activity, events stream to Redpanda
Report is generated → Results compiled by the Writer Agent
Conversation is saved → Neo4j stores the session for future retrieval
User can return anytime → Neo4j enables resumable conversations; Temporal ensures Workflow completion

Each layer does one thing well. Together, they create something production-ready.

Try it yourself

We've open-sourced the entire demo. Clone the repo, spin it up locally, and explore what's beneath the AI iceberg.

GitHub repository: github.com/temporal-community/ai-iceberg-demo

Quick start:

Clone the repo
Copy .env-sample to .env and add your API keys
Start Temporal: temporal server start-dev
Start the worker: uv run openai_agents/run_worker.py
Launch the UI: uv run ui/backend/main.py
Open http://localhost:8234 and start researching

What's beneath your AI agents?

Building this demo reinforced something we all suspected: the most impressive AI agents aren't just about the model. They're about the infrastructure beneath — the orchestration, the memory, the security, the visibility. When these layers work together, you get AI applications that are resilient, secure, observable, and actually ready for production. When they're missing, you get demos that break the moment something unexpected happens. Next time you interact with an AI agent, remember: there's a whole iceberg beneath the surface. And now you know what's down there.

This post was a collaboration between Temporal, Neo4j, Auth0, and Redpanda. Special thanks to everyone who contributed to the demo and the "Dive Into the AI Iceberg" meetup that brought us together. Connect with us on Temporal Community Slack in #topic-ai.