I’m going to start with a bold statement: if you’re building complex, procedural logic, especially for the new wave of agentic applications, you should stop using graphs.
I say this after spending over two decades building orchestration solutions. I’ve seen the patterns, I’ve tried the different approaches, and I’ve watched the industry rediscover the same lessons over and over. As AI agents become more sophisticated, I see teams reaching for graph-based frameworks to orchestrate their logic — essentially relearning the lessons of workflow engines from scratch. I’m writing this to help you skip a painful, expensive, and ultimately flawed phase of development.
First, let’s be clear about what we’re building. An AI application is a workflow — a series of steps with branching, looping, and dependencies that must execute reliably. Some, like a RAG pipeline, might follow a predictable path. Others, often called agents, are highly dynamic, using an LLM’s reasoning to decide the next steps at runtime. These agents behave like distributed systems where each tool is a remote hop, and a single network hiccup can cause an agent to lose context.
From an implementation standpoint, both require an engine that can handle this dynamic, data-driven execution. A static graph struggles with both, but it especially struggles with the dynamic nature of agents.
The core promise — and the core problem
The core idea of a workflow engine or orchestrator is simple: it guarantees that your code will execute to completion, even in the presence of failures like process crashes or network timeouts. This is not usually a property of normal application code. For a long time, representing your logic as a graph or an abstract syntax tree (AST), and augmenting that with persistence operations, was the only way to get this guarantee.
But that is no longer the case. A newer, more powerful abstraction we call Durable Execution — which I initially introduced while building the AWS Simple Workflow Service — now allows you to write normal, procedural code with these same “crashless” guarantees. It provides incremental execution, state persistence, and fault tolerance — tracking your application’s progress so it can pick up right where it left off after a failure, like having “the ultimate autosave.”
The paradigm is robust enough that multiple modern frameworks are built on it, including Temporal, Restate, and DBOS. There is no practical need to force procedural code into an unnatural graph-based representation. This post breaks down the severe limitations of the graph-based approach and shows why plain code is not only sufficient but vastly superior.
What is a programming language, anyway?
Whether it’s Python, a state machine, or a graph of nodes and edges, any system for expressing logic must provide a few key things:
- Control flow: The logic for sequencing, branching, and looping (if/else, for, while).
- State (memory): A way to store, access, and manipulate data during execution.
- Error handling: A way to manage failures, timeouts, and exceptions. While a subset of control flow, it’s so critical and complex it deserves its own discussion.
Let’s examine how graph-based systems handle each of these and see where they fall short.
1. Control flow: The illusion of simplicity
The most immediate problem with using a graph to represent control flow is that the flow itself almost always depends on data.
A conditional branch requires an if statement that evaluates an expression to true or false. A loop needs a for or while construct that iterates over a collection or checks a condition. This means that a graph is never enough on its own. You always have to mix the diagram with snippets of code or expressions that are evaluated separately. This creates a disconnect where the true logic is hidden away from the visual representation.
But the real breaking point for graphs is dynamic control flow.
Consider a common pattern in agentic applications: an LLM returns a list of tools to call based on a user’s prompt. The set of tools and their order isn’t known when you design the graph. It’s determined at runtime.
How do you represent this in a graph? A node that dynamically generates other nodes to execute? This concept of a mutable, dynamic graph is something that virtually no graph-based language supports well. Some graph frameworks try to paper over this with runtime-generated subgraphs, “router” nodes, or APIs that mutate the graph/state during execution. In practice, this still pushes real control logic into code blocks inside nodes and leaves the diagram as a thin wrapper around that code.
At this point, the graph is just linking a few big chunks of code together. So I ask: why do you need a graph to link pieces of code when you can just have code link pieces of code in a much cleaner way?
Look how simple this is with code:
def run_agentic_workflow(prompt: str) -> list[str]:
    """
    Dynamically executes tools recommended by an LLM.
    This logic is impossible to represent in a static graph.
    """
    # 1. First step is always to call the LLM
    recommended_tools = llm.get_tools_to_call(prompt)
    # recommended_tools could be ['search_api', 'calculate_results']

    results = []

    # 2. Dynamically iterate and execute the tools
    for tool_name in recommended_tools:
        if tool_name == "search_api":
            result = apis.search(query="some query")
            results.append(result)
        elif tool_name == "calculate_results":
            result = apis.calculate(data=results)  # Uses data from a previous step
            results.append(result)
        # ... and so on for other tools

    return results
Durable Execution orchestrators let you use normal language constructs (conditionals, loops, exception handling) to schedule tasks dynamically — the natural fit for agent behavior.
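To make that concrete, here is a minimal sketch of the same loop as a durable workflow, written against Temporal’s Python SDK (other Durable Execution frameworks look similar). The activity names are hypothetical stand-ins for the LLM call and the tools:

from datetime import timedelta

from temporalio import workflow


@workflow.defn
class AgenticWorkflow:
    @workflow.run
    async def run(self, prompt: str) -> list[str]:
        # Each activity result is journaled; after a crash the workflow
        # replays to this point instead of redoing finished steps.
        recommended_tools = await workflow.execute_activity(
            "get_tools_to_call",
            prompt,
            start_to_close_timeout=timedelta(seconds=30),
        )
        results: list[str] = []
        # Plain Python control flow decides which durable steps run next.
        for tool_name in recommended_tools:
            result = await workflow.execute_activity(
                tool_name,  # activity chosen by the LLM at runtime
                results,
                start_to_close_timeout=timedelta(minutes=5),
            )
            results.append(result)
        return results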
2. Data management: A world of hurt
Because the control flow logic is often separated from the graph structure, data management becomes a challenge. Graph-based systems typically fall back on two poor patterns:
- A global key-value store: A single, untyped dictionary (or map) where every node reads and writes data. This is equivalent to using only global variables. It’s impossible to reason about data scope, and a simple typo in a key name ('user_id' vs. 'userId') leads to a runtime error that static analysis can’t catch.
  - For example, one node might push a value: state['query_result'] = call_vector_db(). A subsequent node must know the exact string 'query_result' to retrieve it. If the first node changes the key, or the second node misspells it, the workflow fails at runtime.
- Node-to-node data passing: Data is explicitly passed along the edges of the graph. While slightly better, it often devolves into passing massive JSON blobs between nodes. To access nested data, you have to use string-based query languages like JSONPath ($.results[0].id).
This is fragile and hard to debug. Some graph frameworks try to mitigate this by letting you attach schemas to state (for example, Pydantic models). That can catch misspelled keys at the boundaries and is an improvement. But in the end, it doesn’t solve the deeper issues: unclear scope, brittle string selectors across edges, weak refactoring support, and limited compile-time checks when control flow rewires which node produces which fields.
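To see why, here is a minimal hypothetical sketch of schema-attached state using Pydantic. The typo is now caught, but only at runtime and only at this node’s boundary; nothing verifies at build time that some upstream node actually produced query_result on the path taken:

from pydantic import BaseModel


class AgentState(BaseModel):
    query_result: list[str] = []
    answer: str | None = None


state = AgentState()
state.query_result = ["doc-1", "doc-2"]  # declared field: fine
# state.query_results = [...] would now fail -- but at runtime, inside one
# node, not when the graph is wired together, and not when a refactor
# renames a field that a downstream node depends on.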
Compare these two styles:
1. Graph-based (YAML/JSON with JSONPath):
- id: step_2
  type: process_data
  # Fragile, untyped, easy to misspell
  input: $.data.user.credentials.token
2. Code-based (Python):
# Clean, strongly-typed, and your IDE can catch errors
process_data(user.credentials.token)
When your entire program relies on untyped, string-based expressions to access memory, you are setting yourself up for a world of runtime failures. Durable Execution abstracts away persistence of intermediate results — there’s no need for a global key-value store.
3. Error handling and compensations: Where graphs completely break down
Error handling massively complicates control flow, and this is where graphs struggle most. Complex applications require compensations — running actions to undo steps that have already succeeded when a subsequent step fails.
This pattern, often called the saga pattern in distributed systems, is an inherently dynamic control-flow problem.
Imagine you have three conditional steps: A, B, and C.
- Sometimes you run A, B, and C.
- Sometimes you just run A.
- Sometimes you run A and C.
If C fails, you need to compensate for whichever of A or B actually ran. The set of compensations is not static; it depends on the execution path. With just three functions, this is already complex to draw. With 100 nodes, many of which can run in parallel, a graph representing all possible compensation paths becomes an unmanageable monstrosity.
Some engines expose compensation handlers as first-class nodes or edges. They help for simple paths, but once compensations are data-dependent and path-dependent, authors are back to encoding the real logic in code anyway.
In code, compensations are straightforward.
def book_travel_workflow(details: TravelDetails):
    """
    Books a flight and hotel, ensuring compensation if any step fails.
    This dynamic compensation is nearly impossible to model in a graph.
    """
    compensations = []
    try:
        if details.needs_flight:
            flight_booking = book_flight(details.flight_info)
            # Add the inverse operation to our list
            compensations.append(lambda: cancel_flight(flight_booking.id))

        if details.needs_hotel:
            hotel_booking = book_hotel(details.hotel_info)
            # Add the inverse operation to our list
            compensations.append(lambda: cancel_hotel(hotel_booking.id))

        # ... more steps
        print("Workflow completed successfully!")
    except Exception as e:
        print(f"Workflow failed: {e}. Running compensations...")
        # Run all registered compensations in reverse order
        for compensate in reversed(compensations):
            compensate()
        # Re-raise so the caller still observes the failure
        raise
With Durable Execution, try/except/finally and idempotent activities give you natural, testable compensations.
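For instance, here is a minimal sketch of the same saga on Temporal’s Python SDK, reusing the TravelDetails type from the example above (the activity names are hypothetical). The compensations list is ordinary workflow state, and because the workflow itself is durable, it survives a process crash between the booking steps:

from datetime import timedelta

from temporalio import workflow


@workflow.defn
class BookTravelWorkflow:
    @workflow.run
    async def run(self, details: TravelDetails) -> None:
        compensations: list[tuple[str, str]] = []
        try:
            if details.needs_flight:
                flight = await workflow.execute_activity(
                    "book_flight", details.flight_info,
                    start_to_close_timeout=timedelta(minutes=2),
                )
                compensations.append(("cancel_flight", flight["id"]))
            if details.needs_hotel:
                hotel = await workflow.execute_activity(
                    "book_hotel", details.hotel_info,
                    start_to_close_timeout=timedelta(minutes=2),
                )
                compensations.append(("cancel_hotel", hotel["id"]))
        except Exception:
            # Undo whichever steps actually ran, newest first. Each cancel
            # activity should be idempotent so retries are safe.
            for activity_name, booking_id in reversed(compensations):
                await workflow.execute_activity(
                    activity_name, booking_id,
                    start_to_close_timeout=timedelta(minutes=2),
                )
            raise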
Now, try to represent this in a graph. If the car rental fails (step C), you must run compensations for hotel (B) and flight (A). If the hotel fails (step B), you must only compensate for flight (A). You would need to draw a complex web of conditional “failure” edges from every node to every previous node’s compensation. While some workflow engines provide compensation constructs, they still require designers to draw explicit compensation paths. The real issue is that modeling dynamic error paths in a static DAG becomes unwieldy and error-prone. With code, this logic is about 20 lines.
4. Parallelism, reusability, and other woes
The problems don’t stop there:
- Parallelism: While fanning out to parallel branches is easy to draw, coordinating work between them is not. That coordination requires shared state, and in the graph world shared state means data, which is usually just a bunch of key-value pairs. As we’ve seen, data management in graphs is already a weak point. When multiple parallel branches need to synchronize or share intermediate results to achieve a common goal, the graph notation provides no help — you’re back to managing global variables with all their associated problems.
- Asynchronous events: How does your workflow handle a user sending a new chat message to cancel a long-running task? This becomes even more complex when events can arrive out of order and require special handling based on their data. This requires dynamic logic that can interrupt the current flow, something graphs are ill-equipped to model. The common solution (a dedicated “wait for event” node) is clumsy and only works for the simplest cases. Durable Execution supports waiting for external events and user input as native operations (see the sketch after this list).
- Reusability: Graph-based logic is almost impossible to reuse. Because a node often relies on a loosely defined, ambient state, you can’t just pick it up and use it in another workflow. In real programming, we have functions, classes, and libraries — units with explicit inputs/outputs. Graphs have no mature equivalent.
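On the events point, here is a minimal hypothetical sketch using Temporal signals: the user’s cancel message is delivered asynchronously to the running workflow, and ordinary code decides how to react to it mid-loop:

from datetime import timedelta

from temporalio import workflow


@workflow.defn
class LongRunningTask:
    def __init__(self) -> None:
        self._cancel_requested = False

    @workflow.signal
    def request_cancel(self) -> None:
        # Signals can arrive at any time, even while an activity is running.
        self._cancel_requested = True

    @workflow.run
    async def run(self, steps: list[str]) -> str:
        for step in steps:
            if self._cancel_requested:
                return "cancelled by user"
            await workflow.execute_activity(
                step, start_to_close_timeout=timedelta(minutes=5)
            )
        return "completed"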
5. Code brings modern development practices for free
The “infrastructure-as-code” movement won for a reason, and the same principles apply here. Graph-based systems force you to abandon the mature ecosystem of tools that makes software development efficient and reliable. When your workflow is code, you get the entire ecosystem of modern software engineering for free.
- Source control: Use Git to track history, create branches, and collaborate. Reviewing a pull request for a YAML graph is painful; reviewing Python is a solved problem.
- Testing: Write unit and integration tests for your logic using familiar frameworks like Pytest or JUnit (see the test sketch after this list). You can sometimes unit-test a node in a DAG, but it’s rarely ergonomic; teams end up building simulators or wrappers before they can test anything meaningful.
- CI/CD: Integrate your workflow logic into existing automated build, test, and deployment pipelines.
- Developer tooling: Leverage IDEs for autocompletion, static analysis, refactoring, and debugging — tools that barely exist for graph-based systems.
- Observability: Durable Execution engines like Temporal integrate with distributed tracing (OpenTelemetry), enabling automatic generation of accurate execution traces for debugging.
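For example, the compensation workflow from earlier needs no simulator at all. This hypothetical Pytest test (assuming the workflow and its booking helpers live in a module named travel) checks that a hotel failure cancels the flight:

from types import SimpleNamespace
from unittest.mock import patch

import pytest

import travel  # hypothetical module containing book_travel_workflow


def test_hotel_failure_cancels_flight():
    cancelled = []

    def book_hotel_fails(info):
        raise RuntimeError("no rooms available")

    details = SimpleNamespace(
        needs_flight=True, needs_hotel=True, flight_info={}, hotel_info={}
    )
    with patch.object(travel, "book_flight", return_value=SimpleNamespace(id="F1")), \
         patch.object(travel, "book_hotel", side_effect=book_hotel_fails), \
         patch.object(travel, "cancel_flight", side_effect=cancelled.append):
        with pytest.raises(RuntimeError):
            travel.book_travel_workflow(details)

    # The only step that succeeded (the flight) was compensated.
    assert cancelled == ["F1"]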
When building AI agents, developers need to experiment rapidly with prompts, tool order, and control logic. Representing this in a static graph slows iteration, whereas code can be changed and tested quickly using standard dev tools.
The “But I can see it!” fallacy
The single most cited benefit of a graph is that you can create a visual picture of it. But this is a siren’s song. The picture is a lie.
- It doesn’t show the real control flow, which is hidden in data-dependent expressions.
- It doesn’t show data manipulation, which happens through untyped global maps or brittle selectors.
- It cannot represent dynamic steps or complex error handling, which is where the most critical logic often resides.
You pay a big price in complexity, verbosity, runtime errors, and an inability to implement non-trivial scenarios, all for a picture that isn’t even accurate.
And here’s the kicker: if you truly need a visual representation of your execution, code can produce a far better one. By instrumenting your code with standard tracing (like OpenTelemetry), you can generate a precise, hierarchical trace of what actually happened, not just a static ideal of what might happen.
Conclusion: Choose code
The next generation of AI applications demands a better approach. Durable Execution doesn’t render graph-based orchestration entirely obsolete (there might be valid use cases when porting legacy systems), but it massively broadens the set of problems we can handle reliably. Match the tool to the job: static graphs for simple pipelines; durable code for dynamic, data-driven agents.
Don’t assume that because others are using graphs, it’s the right or only way to write durable applications. The industry doesn’t need to repeat the painful lessons we learned building workflow engines over the past two decades. A graph is one of the worst ways to represent procedural code. Its perceived simplicity is an illusion that shatters the moment you encounter the dynamic, data-driven, and error-prone reality of building sophisticated systems.
The next time someone tells you that you must use a graph to achieve durability and resiliency, tell them about Durable Execution. It’s a proven paradigm that lets you write your business logic in plain, testable, reusable code. Pick up a modern framework that supports it, and see how much simpler, cleaner, and more robust your system becomes.
You absolutely can implement a graph-based engine on top of a Durable Execution system. The system makes any code durable. But that is a clunky, unnecessary, and leaky abstraction you simply don’t need.