The Agentic Coding Workflow: How to Structure Development When AI Does the Work
Agentic coding is not vibe coding with a longer leash. It is a specific way of structuring development work so that AI agents can execute reliably within defined boundaries. Here is what it actually looks like.
Agentic coding is the practice of directing AI agents to complete bounded, well-defined development tasks with minimal interruption, within a system designed to support that kind of autonomous execution. It is distinct from AI-assisted coding, where a human writes code with AI suggestions. It is also distinct from vibe coding, where a human accepts AI output without systematic review.
The distinction matters because agentic coding requires different infrastructure, different task design, and different verification than either of the other two modes. Teams that try to run agentic workflows on infrastructure designed for human coding, or with task specifications designed for human developers, consistently report poor results. The model is not the problem. The setup is.
This post describes what the agentic coding workflow actually looks like: how to structure tasks, what infrastructure is required, how verification works, and what the failure modes are when you skip the foundations.
What Makes a Task Suitable for Agentic Execution
Not all development work is suitable for agentic coding. The first skill in an agentic workflow is task scoping: identifying which tasks an agent can execute reliably and designing those tasks correctly.
A task is suitable for agentic execution when it has four properties.
Clear acceptance criteria that can be verified programmatically. An agent working on a task needs to know when it is done. "Implement the user authentication flow" is not a well-defined task for an agent. "Add JWT validation to the request handler in services/auth/handler.ts using the pattern in services/payments/handler.ts, with tests covering valid token, expired token, and malformed token cases that pass in CI" is a well-defined task.
The acceptance criteria need to be verifiable without human judgement at the completion stage. Tests that pass or fail, lint checks that succeed or fail, type checking that passes: these are programmatically verifiable. "The code looks clean and fits the style" is not.
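As an illustration, the JWT task above can be pinned to checks that pass or fail mechanically. The `validateJwt` stub and token fixtures below are hypothetical stand-ins for the real handler (a production validator would also verify the signature); the point is that each acceptance criterion is a machine-checkable assertion, not a judgment call:

```typescript
// Hypothetical sketch: one programmatic check per acceptance criterion.
// Signature verification is deliberately omitted to keep the sketch small.
type Verdict = "valid" | "expired" | "malformed";

function validateJwt(token: string, now: number): Verdict {
  const parts = token.split(".");
  if (parts.length !== 3) return "malformed";
  try {
    const payload = JSON.parse(Buffer.from(parts[1], "base64url").toString());
    if (typeof payload.exp !== "number") return "malformed";
    return payload.exp > now ? "valid" : "expired";
  } catch {
    return "malformed";
  }
}

// Token fixtures for the three cases named in the task specification.
const enc = (p: object) => Buffer.from(JSON.stringify(p)).toString("base64url");
const header = enc({ alg: "HS256" });

console.assert(validateJwt(`${header}.${enc({ exp: 2_000 })}.sig`, 1_000) === "valid");
console.assert(validateJwt(`${header}.${enc({ exp: 500 })}.sig`, 1_000) === "expired");
console.assert(validateJwt("not-a-jwt", 1_000) === "malformed");
```

Each of these assertions can run in CI with no human in the loop, which is exactly what makes the criteria agent-verifiable.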
Bounded scope that does not require cross-system coordination. Agents perform best on tasks contained within a clearly defined scope: a single service, a specific module, a bounded set of files. Tasks that require coordinating changes across many systems, making architectural decisions that affect multiple teams, or navigating dependencies that are not documented in the codebase are more likely to produce compounding errors.
This does not mean agents cannot work on complex tasks. It means complex tasks should be decomposed before being given to an agent. The decomposition is human work. The execution is agent work.
A codebase with sufficient context infrastructure. An agent working in a codebase with a maintained CLAUDE.md, relevant Skills, and clear module structure will produce consistent output. An agent working in a codebase with no context documentation will produce output that reflects whatever patterns it can infer from the files it can see locally. In a large or inconsistent codebase, those inferences are often wrong.
The context infrastructure requirement is not optional. It is the foundation that makes the rest of the workflow reliable.
A test suite the agent can run and interpret. The agent's primary verification mechanism is tests. If the tests run reliably, have clear failure messages, and cover the relevant behaviour, the agent can verify its own work. If the tests are flaky, slow, or require manual setup to run, the agent cannot use them effectively and the workflow degrades to the agent guessing whether its changes are correct.
Structuring the Workflow
An agentic coding workflow has three phases: specification, execution, and verification. The time distribution is counterintuitive: specification takes longer than most developers expect, execution is fast, and verification is thorough.
Specification. Before the agent starts, the task is written with enough precision that the agent can execute it without ambiguous decisions. This means: specific files and functions to modify, the pattern to follow, the tests that need to pass, the boundaries of the change, and any explicit constraints (what not to touch, what not to create).
For teams starting with agentic workflows, the specification phase often reveals that their task tracking system is not detailed enough. Jira tickets written for human developers rely on implicit context that the developer fills in from experience; an agent cannot fill that gap. Improving specification quality is a prerequisite for agentic coding, and it is also independently valuable: teams with more precise specifications report fewer misunderstandings and less rework even when humans do the implementation.
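Concretely, a specification for the JWT task from earlier might look like the following. The field names, file paths, and constraints are illustrative, not a prescribed format:

```
Task:        Add JWT validation to the request handler
Files:       services/auth/handler.ts (modify)
             services/auth/handler.test.ts (add)
Pattern:     follow services/payments/handler.ts
Acceptance:  tests for valid / expired / malformed tokens pass in CI
Boundaries:  do not touch shared middleware or the token issuing path
Constraints: no new dependencies
```

The test of a good specification is whether an agent could execute it without making a single ambiguous decision.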
Execution. The agent executes the task. In Claude Code, this means opening a session with the right context (CLAUDE.md loaded, relevant Skills active), providing the specification, entering Plan Mode to confirm the approach, and then running with Auto Accept once the direction is confirmed.
The developer is not entirely absent during execution. They are monitoring: is the agent's approach matching the specification? Are unexpected decisions being made? Are tests passing incrementally, or is the agent accumulating failures? Light supervision during execution does not mean the workflow is failing; it is appropriate oversight of an autonomous system.
Verification. When the agent completes the task and opens a PR, the verification focuses on three things: do the tests pass (automated), does the implementation fit the system (senior review of the interface and integration points), and does the change meet the acceptance criteria stated in the specification (the developer who wrote the spec confirms).
The verification is not a full line-by-line review of the implementation. It is a boundary-level check. The agent wrote the implementation. The human confirms that the implementation does what was specified and fits the system it is going into. This is a different skill from traditional code review and takes less time for bounded, well-specified tasks.
The Infrastructure Requirements
Three infrastructure elements are non-negotiable for a reliable agentic workflow.
A maintained CLAUDE.md with accurate system context. This is the foundation. An agent in a codebase with a maintained CLAUDE.md will navigate the system accurately and produce output that fits the architecture. An agent in a codebase without one will produce output that is locally plausible but may violate conventions or constraints that are not visible from the local context.
A Hooks layer for hard constraints. Before running agents in production codebases, define the hard constraints using Hooks. Files that agents should not modify. Commands that should not run in automated context. Operations that require logging before execution. Hooks enforce these constraints deterministically, which is what makes it responsible to run agents without constant supervision.
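As a sketch of what such a constraint looks like in practice: Claude Code hooks run a command that receives the tool call as JSON on stdin, and a blocking exit code rejects the operation. The payload shape, the `PreToolUse` wiring shown in the comment, and the protected paths below are assumptions to illustrate the idea; check the Hooks documentation for the exact contract.

```typescript
// Hypothetical PreToolUse guard: deny edits to protected files.
// Wired up in .claude/settings.json with something like (shape assumed):
//   { "hooks": { "PreToolUse": [ { "matcher": "Edit|Write",
//       "hooks": [{ "type": "command", "command": "tsx hooks/guard.ts" }] } ] } }

const PROTECTED = ["infra/terraform/", ".env", "db/migrations/"];

// Pure policy check, kept separate so it is easy to test in isolation.
function isProtected(filePath: string): boolean {
  return PROTECTED.some((p) => filePath.includes(p));
}

// Decide on a raw hook payload; the wrapper script would read stdin,
// call decide(), and exit with the blocking code on "deny".
function decide(payloadJson: string): "allow" | "deny" {
  const payload = JSON.parse(payloadJson || "{}");
  const filePath: string = payload?.tool_input?.file_path ?? "";
  return isProtected(filePath) ? "deny" : "allow";
}

console.assert(decide(JSON.stringify({ tool_input: { file_path: ".env" } })) === "deny");
```

Because the check is deterministic code rather than an instruction in a prompt, it holds regardless of how the agent interprets the task.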
A fast, reliable test suite. Agents iterate on test results. A test suite that takes forty-five minutes to run means the agent is making changes with forty-five-minute feedback cycles. A test suite with flaky tests means the agent cannot trust its own verification. Both conditions degrade the workflow significantly. Investing in test speed and reliability before running agents at scale is the right sequence.
The Failure Modes to Know
Scope creep. The most common failure mode: the agent interprets the task more broadly than intended and makes changes outside the defined scope. The fix is more precise task specification and, if the pattern is recurring, a Hook that blocks file modifications outside a defined path set for specific task types.
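The path-set check such a hook could apply is simple. The helper below is hypothetical; `allowedRoots` would come from the task specification, and a write-blocking hook would consult it before permitting a file modification:

```typescript
// Hypothetical per-task allowlist: a file is in scope only if it sits
// under one of the roots named in the task specification.
function withinScope(filePath: string, allowedRoots: string[]): boolean {
  return allowedRoots.some((root) => {
    const prefix = root.endsWith("/") ? root : root + "/";
    return filePath === root || filePath.startsWith(prefix);
  });
}

// A task scoped to the auth service:
const scope = ["services/auth"];
console.assert(withinScope("services/auth/handler.ts", scope) === true);
console.assert(withinScope("services/payments/handler.ts", scope) === false);
```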
Compounding errors in long sessions. If an agent works on a large task in a single long session, early errors can propagate into later steps in ways that are not obvious. The fix is frequent commits at checkpoints and breaking large tasks into a sequence of smaller, committable units. Each commit is a recovery point.
Confident wrongness. Agents produce confident output when they are wrong as well as when they are right. The verification step is designed to catch this. Skipping verification because the agent seems confident is the most dangerous mistake in an agentic workflow. The confidence is a property of the model, not a measure of correctness.
Context saturation. In a large codebase, the agent may saturate its context window with irrelevant files before it has read the relevant ones. The fix is explicit scoping: tell the agent which files are relevant before it starts searching, use .claudeignore to exclude irrelevant directories, and run the agent from the module directory rather than the repository root when possible.
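Assuming .claudeignore follows .gitignore-style patterns, an exclusion file for this purpose might look like the following; the specific directories are illustrative:

```
# Keep generated and vendored content out of the agent's context
node_modules/
dist/
vendor/
docs/generated/
*.snap
```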
Why This Is Worth the Setup Cost
The productivity argument for agentic coding is not primarily about speed. Individual developer speed has been improving with AI tools since 2023. The argument is about what developers do with their time when execution is delegated.
A developer who specifies tasks precisely, monitors agent execution lightly, and verifies outputs at boundaries is doing the work that compounds: improving context infrastructure, raising specification quality, strengthening test coverage, building the systems that make future agentic execution more reliable. The more of that work is done, the more reliably agents execute, and the more developer time shifts toward the judgment work that matters most.
This is a flywheel, not a shortcut. The teams that are furthest ahead with agentic workflows invested significantly upfront in context infrastructure and test quality. That investment paid back in agent reliability, which freed up developer time for more infrastructure investment. The compounding is real but it requires starting with the foundations rather than trying to shortcut to the outcomes.
I help engineering teams close the gap between "we use AI tools" and "AI actually changed how we deliver." Book a 20-minute call and I'll tell you where the leverage is.