The AI Adoption J-Curve Repeats at Every Level
Most teams survive one productivity dip after adopting AI tools and call the transformation done. Velocity is up, engineers are energised, leadership announces success. The J-curve, they believe, is behind them.
It is not behind them. It has barely started.
The J-curve in AI adoption does not happen once. It happens at every level of AI maturity, each time harder, each time with better data pointing in the wrong direction. The teams that understand this build through each dip. The teams that do not understand it plateau, mistake stability for success, and fall further behind with every quarter they stay stuck.
Here is what each curve looks like, and what separates the teams that push through from the teams that stop.
The First Dip: Tool Adoption Without Workflow Change
The first J-curve is the one most people know about. Engineers get access to Copilot, Claude, or Cursor. For the first few weeks, productivity dips. Developers are inserting a new step into an old workflow: generate suggestion, evaluate it, accept or rewrite. The overhead is real.
Some engineers push through and adapt. Others try the tools for a few weeks, find them disruptive, and quietly stop using them. The team reports mixed results. Leadership hears "it works for some people."
The mistake here is diagnosing the dip as a tool problem. It is a workflow problem. The engineers who adapted did not just use the tools more. They restructured how they work: starting from intent rather than blank files, reviewing and directing rather than writing from scratch. The ones who stopped tried to add the tools into an unchanged workflow and found the overhead not worth it.
Most teams exit this curve with uneven adoption: a cluster of enthusiastic users, the rest doing what they did before. Leadership counts the enthusiasts as a win. The team declares the AI rollout complete. They are at L2, and they think they have arrived.
The Second Dip: More Output, More Fragility
The second J-curve is the expensive one, because it arrives disguised as success.
L2 looks good. Velocity metrics are up. Engineers are productive. PRs are moving. This is often where leadership announces the AI transformation is complete.
What is actually happening: more code is entering a system that was not redesigned to handle more code.
The 2026 State of AI Benchmark from Cortex measured this pattern across engineering teams. 20 percent more pull requests per engineer. 23.5 percent more incidents per pull request. The velocity gain is real. The fragility gain is also real. They show up in different reports, on different timelines, attributed to different causes: growth, complexity, bad luck.
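Those two numbers compound. A quick back-of-the-envelope multiplication of the Cortex figures shows the total effect:

```python
# Back-of-the-envelope: how the Cortex figures compound.
prs_per_engineer = 1.20    # 20 percent more pull requests per engineer
incidents_per_pr = 1.235   # 23.5 percent more incidents per pull request

total_incident_growth = prs_per_engineer * incidents_per_pr
print(f"{(total_incident_growth - 1) * 100:.1f}% more incidents overall")
# → 48.2% more incidents overall
```

Nearly half again as many incidents, hiding behind a 20 percent velocity headline.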
The correct attribution is simpler. An engineering system designed for humans writing at human speed is now running at AI-assisted speed. The review process has not adapted. The test suite has not been rebuilt for the volume of output agents are about to produce. Quality measurement is still velocity-focused. The system is absorbing more output without the infrastructure to maintain quality at that volume.
Teams that diagnose this correctly invest in what I call the four layers: context infrastructure (a CLAUDE.md that tells AI tools how the system is structured), an agent-ready test suite (fast, deterministic, with output an agent can actually read and act on), review standards that are written down rather than held in senior engineers' heads, and quality metrics that track incident rate and change failure rate alongside velocity.
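As a sketch of that fourth layer: quality signals computed next to velocity, in the same report, so neither can drift unseen. The field names and sample numbers here are illustrative, not from any particular tool:

```python
from dataclasses import dataclass

# Hypothetical weekly snapshot; the fields are illustrative, not a real schema.
@dataclass
class WeeklySnapshot:
    merged_prs: int       # velocity
    failed_changes: int   # deploys rolled back or hotfixed
    incidents: int        # production incidents traced to changes

def quality_alongside_velocity(s: WeeklySnapshot) -> dict:
    """Report velocity and quality side by side, never in separate dashboards."""
    return {
        "velocity_prs": s.merged_prs,
        "change_failure_rate": s.failed_changes / s.merged_prs,
        "incidents_per_pr": s.incidents / s.merged_prs,
    }

report = quality_alongside_velocity(
    WeeklySnapshot(merged_prs=40, failed_changes=6, incidents=3)
)
```

The implementation is trivial; the discipline of keeping all three numbers in one place is the point.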
Teams that diagnose it incorrectly switch models. They try a different AI tool, conclude that agents are "not there yet" for their codebase, and go back to managed adoption. They are not wrong that agents do not work yet. They are wrong about why. The agents do not fail because of AI limitations. They fail because the infrastructure the agents need is not there.
The move from L2 to L3 is not a tool change. It is an infrastructure investment. It takes three to six months of deliberate work. Most teams do not make it because L2 is comfortable and the pressure is always to ship, not to rebuild the system underneath shipping.
The Third Dip: Where Agents Actually Live
Teams that reach L3 have the foundation: context in the codebase, tests agents can run, review standards that are explicit, quality signals that are reliable. They are ready to introduce agentic workflows. Agents picking up tickets, closing them with a merged PR, operating end-to-end within defined boundaries.
This is the third dip. And it is the one that requires the most precise thinking.
An agent picking up a ticket needs four things to close it without human involvement. Context to understand the system it is working in. A test suite it can run to verify its work. A CI pipeline with structured output it can interpret. And a ticket with acceptance criteria it can actually verify: not "improve the checkout flow" but "the checkout_total function should return the correct value when a discount code is applied, verified by this specific test passing."
Most teams have the first layer. Almost none have all four. They build a CLAUDE.md, run an agent on a real ticket, watch it fail, improve the CLAUDE.md, watch it fail differently, and conclude that agents do not work for their codebase. The correct conclusion is that layers two, three, and four are missing.
The third dip is about closing the feedback loop. Agents iterate by running tests, reading the output, adjusting, and running again. If the test suite is slow, the loop stalls. If the CI output is a wall of unstructured logs designed for humans to parse, the agent cannot diagnose the failure. If the ticket has no verifiable acceptance criteria, the agent has no way to know when it is done.
When all four layers are in place, the pattern looks different. The agent picks up a well-specified ticket. It reads the context infrastructure, makes an initial change, runs the tests, reads the structured output, adjusts. The loop closes in minutes. When CI passes, it submits a PR with a summary of what changed and why. The engineer reviews for architectural soundness, not to retrace every step the agent took. Merge, or request a change and the agent handles it.
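Stripped to its control flow, that loop looks something like the sketch below. The propose_fix step is a placeholder for the agent's actual edit cycle, and the pytest command assumes the fast, deterministic suite described above:

```python
import subprocess

MAX_ITERATIONS = 5
TEST_COMMAND = ["pytest", "-x", "-q"]  # assumes a fast, deterministic suite

def propose_fix(test_output: str) -> bool:
    """Placeholder for the agent's edit step: read the failure, change the code.
    Returns False when it cannot make further progress."""
    raise NotImplementedError  # the model call lives here

def agent_loop() -> bool:
    """Run tests, read output, adjust, repeat: the loop the third dip is about."""
    for _ in range(MAX_ITERATIONS):
        result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
        if result.returncode == 0:
            return True           # green: ready to open a PR
        if not propose_fix(result.stdout + result.stderr):
            return False          # stuck: escalate to a human
    return False                  # budget exhausted: escalate
```

Every condition in that loop maps to one of the four layers. A slow suite stalls the run step, unstructured logs break the read step, and a vague ticket means the green state proves nothing.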
Anthropic's analysis of more than 500,000 coding interactions found that 79 percent were automation-oriented: engineers and agents delegating entire tasks, not just autocompleting lines. The teams operating at this level are not doing more of what they did before. They have crossed into L4. They got there by building through three dips, not one.
What Leaders Get Wrong About Each Dip
The consistent mistake across all three curves is the same: misreading the dip as evidence that AI does not work, rather than as the cost of transition to the next level.
The first dip feels like a tool adoption problem. It is a workflow redesign problem.
The second dip feels like an AI quality problem. It is an infrastructure investment problem.
The third dip feels like an agent capability problem. It is a feedback loop problem.
At each level, the data available to the leader points in a misleading direction. The first dip produces frustrated engineers. The second produces rising incident rates with no obvious cause. The third produces agents that keep failing on real tickets. All of these feel like reasons to slow down or stop.
The leaders who push through understand that the dip is the price of the transition, not evidence against it. They invest in the infrastructure the next level requires rather than managing down expectations to match the current level's ceiling. And they protect the time to do it, because the J-curve at every level is vulnerable to sprint pressure, board questions, and performance reviews that treat the dip as a delivery failure.
Microsoft's 2025 Work Trend Index found that 81 percent of leaders expect agents to be moderately or extensively integrated into their engineering strategy within 12 to 18 months. Most of those teams are currently at L2. The leaders who understand that two more dips stand between them and that state will build through them with intention. The leaders who do not will announce each plateau as success and wonder why the gap to the leading teams keeps widening.
The J-curve is not a one-time event you survive. It is the repeating price of staying competitive as the ceiling of what is possible keeps moving.
I help engineering teams close the gap between "we use AI tools" and "AI actually changed how we deliver." Book a 20-minute call and I'll tell you where the leverage is.