From Prompts to Durable Workflow Engines

Livia
February 6, 2026 · 6 min read

How Claude Code’s Tasks Turn LLM Agents into a Real Orchestration Layer

Agentic coding tools have made rapid progress over the last year, but most have remained constrained by a fundamental limitation: state lives inside the prompt. Once the session ends, the plan disappears. Once the context window fills, earlier decisions degrade. Long-running or multi-agent work quickly collapses under its own entropy.

With the introduction of Tasks in Claude Code v2.1.16+, Anthropic has taken a deliberate step away from chat-centric agents and toward something closer to workflow engines. This update is not cosmetic. It redefines how planning, execution, and coordination are handled in agent-driven software development.

This article explores the architectural shift behind Tasks, shows how durable task graphs unlock long-horizon and parallel agent work, and walks through concrete implementation patterns you can use today in local development and CI pipelines.

The Core Shift: State Moves Out of the Context Window

Earlier versions of Claude Code relied on transient “to-dos” that lived inside the active session. This made the agent effective for short, linear work but fragile for anything resembling real software delivery. Session restarts, context compaction, or long dependency chains all resulted in partial or lost plans.

Tasks change this by externalizing state.

Instead of being held implicitly in the model’s reasoning trace, tasks are serialized to disk as structured JSON under a task list directory, ~/.claude/tasks/<id>. The LLM can forget its conversational history entirely and still rehydrate the plan from the filesystem. This mirrors a pattern familiar from build systems and distributed schedulers: planning state must be durable and inspectable.

Here is what a task looks like on disk:

{
  "id": 1,
  "subject": "Refactor the auth middleware",
  "description": "…",
  "status": "in_progress",
  "owner": "claude-security-guy",
  "blockedBy": [],
  "blocks": [2, 3]
}
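
Rehydrating that state yourself takes only a few lines. The sketch below assumes one JSON file per task under the directory above; the exact layout and field names are implementation details and may differ from what ships in your version.

import json
from pathlib import Path

# Assumed layout: one JSON file per task under ~/.claude/tasks/<list-id>.
task_dir = Path.home() / ".claude" / "tasks" / "my-project"

for task_file in sorted(task_dir.glob("*.json")):
    task = json.loads(task_file.read_text())
    print(f'#{task["id"]} [{task["status"]}] {task["subject"]} '
          f'(blocked by: {task.get("blockedBy", [])})')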

Conceptually, Claude Code now separates:

  • Reasoning context (bounded, ephemeral, compressible)
  • Execution state (durable, shared, recoverable)

That separation is what enables everything else that follows.

Tasks as a Dependency Graph

The second critical design decision is that Tasks are not flat items. They form a Directed Acyclic Graph (DAG).

Each task can declare what it depends on via blockedBy and what it unblocks via blocks. Execution eligibility is derived mechanically:

  • A task is runnable only when all upstream dependencies are complete.
  • Completion propagates downstream, unblocking the graph.
  • Status transitions are explicit rather than inferred from prose.

This matters because dependency management is where LLM agents most often fail. In a pure prompt-based system, the model must remember which steps are finished and reason correctly about ordering. A DAG lets the system enforce ordering externally, reducing hallucinated progress and premature execution.
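
To make the mechanics concrete, here is a minimal sketch of deriving runnable tasks from the graph. The field and status names follow the example above and are assumptions, not a documented schema.

def runnable_tasks(tasks):
    # A task is runnable only when every upstream dependency is complete.
    done = {t["id"] for t in tasks if t["status"] == "completed"}
    return [
        t for t in tasks
        if t["status"] == "pending"
        and all(dep in done for dep in t.get("blockedBy", []))
    ]

tasks = [
    {"id": 1, "status": "completed",   "blockedBy": []},
    {"id": 2, "status": "in_progress", "blockedBy": [1]},
    {"id": 3, "status": "pending",     "blockedBy": [2]},
    {"id": 4, "status": "pending",     "blockedBy": [2]},
]
print(runnable_tasks(tasks))  # empty until #2 completes; then #3 and #4 unblock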

From an architectural perspective, Claude Code is no longer just generating plans. It is executing against a constraint system.

Multi-Session Coordination Through Shared Task State

Tasks become significantly more powerful when combined with a shared task list identifier. By setting a single environment variable, multiple Claude Code sessions can point at the same task graph on disk.

This enables a pattern that looks less like “chatting with an assistant” and more like agent orchestration:

  • One session focuses on implementation
  • Another handles tests or migrations
  • A third reviews diffs or validates assumptions

All of them observe and update the same underlying task state. Coordination happens through the graph, not through conversational turn-taking.

This is an important distinction. Most multi-agent demos rely on agents talking to each other. Claude Code’s Tasks allow agents to coordinate by reading and writing shared state, which is how real distributed systems scale.

Why This Enables Long-Horizon Work

LLMs struggle with long-horizon tasks for two reasons:

  1. Context windows are finite
  2. Implicit plans decay over time

By moving the plan into a durable, explicit structure, Claude Code avoids both failure modes. Context can be compacted or reset without losing the roadmap. Execution can pause and resume hours or days later. The agent no longer has to remember what it decided; it can re-read it.

This is the same principle that allows long-running workflows in CI systems, data pipelines, and job schedulers. You do not keep the entire plan in memory. You persist and advance it step by step.

Practical Workflow: Shared Task Graph Across Sessions

A minimal setup for coordinated work across sessions looks like this.

First, start a Claude Code session and ask it to break a project down into tasks; it creates a task list:

$ export CLAUDE_CODE_TASK_LIST_ID=my-project

$ claude

> Break this feature into tasks and start working on them.

Tasks

   ✅ #1 Set up the database schema – done

   🔵 #2 Implement API endpoints – in progress

   ⚪ #3 Write unit tests – blocked by #2

   ⚪ #4 Update the frontend – blocked by #2

Any Claude Code session started afterwards with the same identifier will observe the same task graph. One session can decompose work into tasks and begin execution. Another can join later and pick up unblocked tasks without needing conversational handoff.

$ export CLAUDE_CODE_TASK_LIST_ID=my-project

$ claude

> Pick up any unblocked tasks.

 Reading the task list… Task #1 is complete.

 Task #2 is in progress and owned by another session.

 No unblocked tasks available. Waiting for #2 to complete.

When the first session finishes task #2, the second session sees #3 and #4 unblock and can claim one immediately. The task graph, not the chat history, becomes the source of truth.

Headless Mode and CI Integration

Claude Code’s Tasks are not limited to interactive use. Headless mode allows you to invoke Claude non-interactively, which is where durable task state becomes especially valuable.

A simple CI invocation might look like this:

export CLAUDE_CODE_TASK_LIST_ID=ci-$(git rev-parse --short HEAD)

claude -p "Run the test suite, summarize failures, and propose minimal fixes. If all tests pass, report green."

If you split analysis, patching, and re-testing into separate CI steps, all of them can reuse the same task list. The plan persists across invocations, which is something prompt-only agents cannot do reliably.
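
As a sketch, that split could be driven from a single script. Only the -p flag and the CLAUDE_CODE_TASK_LIST_ID variable come from the examples above; the phase prompts and the phase boundaries themselves are illustrative.

import os
import subprocess

# Derive a per-commit task list id, mirroring the export above.
sha = subprocess.run(
    ["git", "rev-parse", "--short", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()
env = {**os.environ, "CLAUDE_CODE_TASK_LIST_ID": f"ci-{sha}"}

# Illustrative phases; each headless invocation sees the same task graph.
phases = [
    "Run the test suite and record each failure as a task.",
    "Pick up any unblocked fix tasks and apply minimal patches.",
    "Re-run the test suite and mark remaining tasks complete or blocked.",
]
for prompt in phases:
    subprocess.run(["claude", "-p", prompt], env=env, check=True)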

At this point, Claude Code starts to resemble a lightweight orchestration substrate rather than a command-line chatbot.

Observability: Inspecting Task State on Disk

Because tasks live on disk, you can inspect them with ordinary tooling. While the exact file schema is an implementation detail, the presence of structured task artifacts enables basic observability without special APIs.

For example, listing task files and grepping for status fields is often enough to understand what is blocked and what is runnable. This opens the door to dashboards, metrics, or external schedulers layered on top of Claude Code without modifying it.
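
A rough status summary, for instance, is only a few lines of script. This again assumes one JSON file per task with a status field, which may not match the exact layout on your machine.

import json
from collections import Counter
from pathlib import Path

counts = Counter()
for task_file in (Path.home() / ".claude" / "tasks" / "my-project").glob("*.json"):
    counts[json.loads(task_file.read_text()).get("status", "unknown")] += 1

print(dict(counts))  # e.g. {"completed": 3, "in_progress": 1, "pending": 4}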

The key insight is that state is no longer opaque.

Parallelization Through Ready-Task Selection

Once tasks are represented as a DAG, parallelism becomes straightforward. Any task with no unmet dependencies can be executed independently. This allows you to fan out work across multiple agents or sessions without coordination through dialogue.

A small local dispatcher can:

  • Scan task files
  • Identify tasks whose dependencies are satisfied
  • Assign them to worker sessions

This pattern mirrors job queues and build systems. Claude Code supplies the durable graph. You supply the scheduling policy.
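
Here is a toy version of such a dispatcher. The file layout, field names, and claiming prompt are all assumptions; Claude Code does not ship this script, and nothing here handles concurrent claims.

import json
import os
import subprocess
from pathlib import Path

TASK_DIR = Path.home() / ".claude" / "tasks" / "my-project"

def load_tasks():
    # Assumed layout: one JSON file per task in the shared task list directory.
    return [json.loads(p.read_text()) for p in TASK_DIR.glob("*.json")]

def ready(tasks):
    # Runnable = pending, with every dependency already complete.
    done = {t["id"] for t in tasks if t["status"] == "completed"}
    return [t for t in tasks
            if t["status"] == "pending"
            and all(dep in done for dep in t.get("blockedBy", []))]

for task in ready(load_tasks()):
    # Hand each runnable task to a headless worker pointed at the same list.
    subprocess.Popen(
        ["claude", "-p", f"Claim task #{task['id']} and complete it."],
        env={**os.environ, "CLAUDE_CODE_TASK_LIST_ID": "my-project"},
    )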

Architectural Reframing: Claude Code as an Orchestration Layer

Taken together, these changes suggest a reframing.

Claude Code is no longer best understood as:

“An LLM that helps you write code”

It is better understood as:

“A workflow engine where an LLM handles planning and execution within explicit constraints”

The Tasks system introduces three properties that traditional chat-based agents lack:

  • Durable state
  • Explicit dependencies
  • Shared coordination primitives

Those are the same properties that underpin successful distributed systems. The LLM becomes one component in a larger execution model, not the sole keeper of truth.

What’s Still Missing

There are open challenges around concurrent writes, conflict resolution, and richer observability. File-based synchronization has limits, and more advanced setups will want transactional guarantees or event streams. There is also room for clearer separation between planner agents and executor agents to reduce context contamination over long runs.

But the direction is clear. Tasks are not a feature add-on. They are a structural commitment to agentic workflow engines that survive time, scale, and complexity.

In Closing: The New Workflow Engines

Most agent tooling today optimizes for impressive demos. Claude Code’s Tasks optimize for something less flashy but more important: reliability over time.

By externalizing state, enforcing dependency graphs, and enabling shared coordination, this update moves LLM agents closer to how real software is actually built: across many steps, many actors, and many opportunities for failure. That is what makes them real workflow engines.