
Lab vs. Reality: What METR's Study Can’t Tell You About AI Productivity in the Wild

METR's study found AI tooling slowed developers down. We found something more consequential: Developers are completing a lot more tasks with AI, but organizations aren't delivering any faster.

Naomi Lurie
5 min read · July 28, 2025

[Figure: A chart from the AI Productivity Paradox Report 2025 showing that AI boosts output while human review becomes the bottleneck]

The AI Productivity Debate Gets Complicated

The AI productivity debate took an unexpected turn in July when METR published findings that AI coding assistants made experienced developers 19% slower on complex tasks. Their controlled study of 16 seasoned open-source contributors sparked intense discussion across the developer community—and for good reason. Their findings challenge the widespread assumption that AI automatically boosts productivity.

METR's research deserves credit for bringing scientific rigor to a field often dominated by anecdotal claims. Their controlled methodology revealed important truths about AI's limitations with complex, brownfield codebases that require deep system knowledge and organizational context. Our telemetry from 10,000+ developers confirms this pattern: We see AI adoption consistently skewing toward newer hires who use these tools to navigate unfamiliar code, while more experienced engineers remain skeptical.

But for business leaders making AI investment decisions, METR's study answers only part of the question. While understanding individual task performance (and perception of AI) is valuable, the critical question for organizations isn't whether AI helps developers complete isolated assignments faster. It's whether AI helps businesses ship better software to customers more effectively.


The Missing Context: How Real Organizations Actually Work

METR's controlled experiment studied 16 experienced contributors to large open-source repositories, primarily using Cursor Pro with Claude 3.5/3.7 Sonnet, working on carefully designed tasks in an isolated environment. This approach yields clean, comparable data, but it falls short of capturing how software development actually happens in organizations.

Enterprise software delivery involves far more than individual coding speed. Code must be reviewed by teammates, pass through testing pipelines, navigate deployment processes, and integrate with work from dozens of other developers. A developer might very well complete some simple tasks faster with AI, but if that creates bottlenecks downstream, the organization sees no benefit.

Our analysis took a fundamentally different approach. Instead of controlled tasks, we analyzed telemetry from 1,255 teams and over 10,000 developers across multiple companies, tracking how AI adoption affects real work in natural settings over time. Rather than measuring isolated task completion, we examined the full software delivery pipeline, from initial coding through deployment to production. Our goal was to determine whether widespread AI adoption correlates with significant changes in common developer productivity metrics across velocity, speed, quality, and efficiency.
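To make the measurement concrete, here is a minimal sketch (in Python) of the kind of cohort comparison this involves. It is illustrative only, not Faros AI's actual pipeline; the record fields and the adoption split are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

# Hypothetical PR telemetry record; a real pipeline would pull these
# fields from Git providers and AI-assistant usage logs.
@dataclass
class PullRequest:
    author: str
    team: str
    opened_at: datetime
    merged_at: datetime

def cycle_time_hours(pr: PullRequest) -> float:
    """Hours from PR opened to merged -- one common 'speed' metric."""
    return (pr.merged_at - pr.opened_at).total_seconds() / 3600

def compare_cohorts(prs: list[PullRequest], high_adoption_teams: set[str]) -> dict:
    """Compare average PR cycle time for high- vs. low-AI-adoption teams.

    Which teams count as 'high adoption' would come from assistant usage
    telemetry; here the split is simply passed in.
    """
    high = [cycle_time_hours(p) for p in prs if p.team in high_adoption_teams]
    low = [cycle_time_hours(p) for p in prs if p.team not in high_adoption_teams]
    return {
        "high_adoption_avg_hours": mean(high) if high else float("nan"),
        "low_adoption_avg_hours": mean(low) if low else float("nan"),
    }
```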

[Figure: Comparing the METR and Faros AI study methodologies]

What We Discovered: The Power of Parallelization

The results of Faros AI's study revealed a benefit METR's methodology couldn't capture: AI is enabling developers to handle more concurrent workstreams effectively and deliver significantly higher throughput.

Our data shows that developers on high-AI-adoption teams interact with 9% more tasks and 47% more pull requests per day. This isn't traditional multitasking, which research has long shown to be counterproductive. Instead, it reflects a fundamental shift in how work gets done when AI agents can contribute to the workload.

With AI coding assistants, an engineer can initiate work on one feature while their AI agent simultaneously handles another. They can start a refactoring task, hand it off to AI for initial implementation, then review and iterate while AI tackles the next item in the backlog. The developer's role evolves from pure code production to orchestration and oversight across multiple parallel streams.

This parallelization explains why we also found 21% higher task completion rates and 98% more merged pull requests, even as individual task speeds might not improve dramatically. For businesses, this distinction matters enormously. Organizations don't optimize for how quickly developers complete single tasks; rather, they optimize for how much valuable software they ship to customers.
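A back-of-the-envelope way to see why this is possible is Little's Law, which relates throughput to work in progress (WIP) and cycle time: throughput = WIP / cycle time. If AI lets a developer sustain two workstreams instead of one at an unchanged per-PR cycle time, merged output roughly doubles. The numbers below are illustrative, not figures from either study.

```python
# Little's Law: throughput = WIP / cycle time.
# Illustrative numbers only -- not data from the METR or Faros AI studies.

cycle_time_days = 2.0   # per-PR cycle time, assumed unchanged by AI

wip_without_ai = 1.0    # one active workstream at a time
wip_with_ai = 2.0       # developer plus an AI agent working in parallel

throughput_without = wip_without_ai / cycle_time_days  # 0.5 merged PRs/day
throughput_with = wip_with_ai / cycle_time_days        # 1.0 merged PRs/day

print(f"Without AI: {throughput_without:.1f} merged PRs/day")
print(f"With AI:    {throughput_with:.1f} merged PRs/day (2x at the same cycle time)")
```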

[Figure: Key findings comparison between the METR and Faros AI studies]

Notably, while we identified this correlation with throughput and multi-tasking, the telemetry did not indicate a correlation between AI adoption and task or PR speed, as measured by their cycle times.


The Organizational Reality Check

Here's where our findings align with METR's concerns. Both studies reveal that AI introduces new complexities into software delivery:

  • Complexity challenges: AI-generated code tends to be more verbose and less incremental, as measured by a 154% increase in PR size
  • Code review bottlenecks: Our data shows 91% longer review times, no doubt influenced by the larger diff sizes and the increased throughput (a sketch of how these review metrics can be computed follows this list)
  • Quality concerns: We observed 9% more bugs per developer as AI adoption grows
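As referenced above, here is a minimal sketch of how the two review-related metrics could be derived from pull-request data. The dictionary keys are assumptions for the example, not a specific Git provider's API schema.

```python
from datetime import datetime

def pr_size(pr: dict) -> int:
    """Total lines changed -- the 'PR size' behind the 154% finding."""
    return pr["additions"] + pr["deletions"]

def review_time_hours(pr: dict) -> float:
    """Hours from first review request to final approval."""
    requested = datetime.fromisoformat(pr["review_requested_at"])
    approved = datetime.fromisoformat(pr["approved_at"])
    return (approved - requested).total_seconds() / 3600

# Example usage with made-up values:
pr = {
    "additions": 480,
    "deletions": 120,
    "review_requested_at": "2025-07-01T09:00:00",
    "approved_at": "2025-07-02T15:30:00",
}
print(pr_size(pr), "lines changed;", round(review_time_hours(pr), 1), "hours in review")
```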

These findings echo METR's observation that AI can create as many problems as it solves, particularly for complex work.

Our key insight: AI's impact depends entirely on organizational context. In METR's controlled environment, the organizational systems that could absorb AI's benefits simply didn't exist. In real companies, those systems determine whether AI adoption succeeds or fails.

Organizations can address these challenges through more strategic AI rollout and enablement, systematic workflow changes, and infrastructure improvements.

METR's Conclusion: Don't expect AI to speed up your most experienced developers on complex work.

Faros AI's Conclusion: Even when AI helps individual teams, organizational systems must change to capture business value.

Why Lab Results Don't Predict Business Outcomes

Both approaches provide valuable data on where AI helps and where it doesn't. Any disconnect isn't surprising when you consider the fundamental differences in what each approach measures:

METR measured: Individual developer performance on isolated, well-defined tasks with no downstream dependencies.

Faros AI measured: End-to-end software delivery performance across interdependent teams with real business constraints.

METR's environment: 16 experienced developers, primarily Cursor Pro with Claude 3.5/3.7 Sonnet, controlled tasks, no organizational systems.

Faros AI’s environment: 10,000+ developers across all experience levels, multiple AI tools (GitHub Copilot, Cursor, Claude Code, Windsurf, etc.), natural work settings, full organizational context.

For engineering leaders, the Faros AI study demonstrates that AI is unleashing increased velocity but existing workflows and structures are blocking it. Developers don't work in isolation—they work within systems of code review, testing, deployment, and cross-team coordination. Whatever impact AI has on individual productivity only translates to business value if it successfully navigates these organizational processes.


The Path Forward: Beyond Individual Productivity

Our qualitative fieldwork and operational insights suggest that companies achieving meaningful AI gains are redesigning workflows to harness AI's unique strengths. This means:

  • Workflow redesign: Adapting review processes to handle larger, AI-generated pull requests effectively
  • Strategic enablement: Providing role-specific training rather than assuming developers will figure it out
  • Infrastructure modernization: Upgrading testing and deployment pipelines to handle higher code velocity
  • Data-driven optimization: Using telemetry to identify where AI delivers the biggest productivity gains and focusing adoption accordingly (see the sketch after this list)
  • Cross-functional alignment: Ensuring AI adoption is even across interdependent teams to prevent dependencies from erasing gains
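As one example of the data-driven optimization above, here is a minimal sketch that checks whether team-level AI adoption and throughput move together. The team names and numbers are made up, and a correlation here is a starting point for investigation, not proof of cause.

```python
from statistics import correlation  # Python 3.10+

# Hypothetical per-team rollup: AI-adoption rate (0..1) and merged PRs
# per developer per week. All values are illustrative.
teams = [
    {"team": "payments", "ai_adoption": 0.8, "prs_per_dev_week": 5.1},
    {"team": "search",   "ai_adoption": 0.3, "prs_per_dev_week": 3.2},
    {"team": "infra",    "ai_adoption": 0.6, "prs_per_dev_week": 4.4},
    {"team": "mobile",   "ai_adoption": 0.2, "prs_per_dev_week": 3.0},
]

adoption = [t["ai_adoption"] for t in teams]
throughput = [t["prs_per_dev_week"] for t in teams]

print(f"Adoption/throughput correlation: {correlation(adoption, throughput):.2f}")
```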

Most importantly, successful organizations treat AI adoption as a catalyst for structural change. This approach focuses on how AI can reshape the organization of software development work, rather than on marginal gains for individual developers.

Building on METR's Foundation

METR's research provides crucial insights into AI's limitations, the importance of human expertise in complex problem-solving, and how AI tools will have to evolve to support brownfield codebases.

But the story doesn't end with individual task performance. The question for organizations is how to harness AI's strengths—particularly its ability to enable parallelization and handle routine work—while addressing its weaknesses through better systems, training, and workflow design.

The future of AI in software development won't be determined by whether it makes individual developers faster at isolated tasks. It will be determined by how organizations adapt their systems, processes, and culture to leverage AI as a force multiplier for human expertise.

Both lab studies and real-world telemetry have roles to play in understanding that future. For engineering leaders making investment decisions today, the real-world evidence points to a clear conclusion: AI's business impact depends far more on organizational readiness and strategic AI deployment than previously understood. 

The companies that recognize this distinction and invest accordingly will build the durable competitive advantages that matter in the age of AI-augmented software development.

Most organizations don't know why their AI gains are stalling. Faros AI can help. Book a meeting with an expert today.
