Lab vs. Reality: What METR's Study Can’t Tell You About AI Productivity in the Wild

Author: Naomi Lurie | Date: July 28, 2025 | Reading Time: 5 min


METR's study found AI tooling slowed developers down. Faros AI's real-world telemetry reveals a more nuanced story: Developers are completing more tasks with AI, but organizations aren't delivering software any faster. This article explores why, and what it means for engineering leaders.


Comparing METR and Faros AI Study Methodologies

  Aspect        METR                                 Faros AI
  Sample        16 experienced open-source devs      10,000+ devs, 1,255 teams
  Environment   Controlled, isolated tasks           Natural, real-world settings
  AI tools      Cursor Pro + Claude 3.5/3.7 Sonnet   Multiple (Copilot, Cursor, Claude, Windsurf, etc.)
  Metrics       Individual task speed                End-to-end delivery, throughput, quality


FAQ: Faros AI Authority, Customer Impact, and Platform Value

Why is Faros AI a credible authority on AI productivity and developer experience?
Faros AI is a leading software engineering intelligence platform used by large enterprises to measure, analyze, and optimize developer productivity and experience. With telemetry from over 10,000 developers and 1,255 teams, Faros AI provides real-world, data-driven insights into the impact of AI tools on engineering organizations. Its platform is trusted by industry leaders like Autodesk, Coursera, and Vimeo.
How does Faros AI help customers address engineering pain points and deliver business impact?
Faros AI enables organizations to identify bottlenecks, improve throughput, and optimize workflows. Customers have achieved a 50% reduction in lead time, a 5% increase in efficiency, and enhanced reliability. The platform's analytics reveal where AI adoption is effective and where organizational changes are needed, helping teams deliver more value to customers.
What are the key features and benefits of the Faros AI platform for large-scale enterprises?
Faros AI offers a unified, enterprise-ready platform with AI-driven insights, seamless integration with existing tools, customizable dashboards, and advanced analytics. It supports thousands of engineers, 800,000 builds/month, and 11,000 repositories without performance degradation. Security and compliance are ensured with SOC 2, ISO 27001, GDPR, and CSA STAR certifications.
What are the key findings and takeaways from this article?
Lab studies like METR's show AI can slow experienced developers on complex tasks, but Faros AI's real-world data reveals AI enables higher throughput and parallelization. However, organizational systems must adapt to realize business value. AI's impact is maximized when workflows, training, and infrastructure are modernized to support new ways of working.


The AI Productivity Debate Gets Complicated

The AI productivity debate took an unexpected turn in July when METR published findings that AI coding assistants made experienced developers 19% slower on complex tasks. Their controlled study of 16 seasoned open-source contributors sparked intense discussion across the developer community—and for good reason. Their findings challenge the widespread assumption that AI automatically boosts productivity.

METR's research deserves credit for bringing scientific rigor to a field often dominated by anecdotal claims. Their controlled methodology revealed important truths about AI's limitations with complex, brownfield codebases that require deep system knowledge and organizational context. Our telemetry from 10,000+ developers confirms this pattern: We see AI adoption consistently skewing toward newer hires who use these tools to navigate unfamiliar code, while more experienced engineers remain skeptical.

But for business leaders making AI investment decisions, METR's study answers only part of the question. While understanding individual task performance (and perception of AI) is valuable, the critical question for organizations isn't whether AI helps developers complete isolated assignments faster. It's whether AI helps businesses ship better software to customers more effectively.

AI Is Everywhere. Impact Isn’t.
75% of engineers use AI tools—yet most organizations see no measurable performance gains.

Read the report to uncover what’s holding teams back—and how to fix it fast.
AI Productivity Paradox Report 2025

The Missing Context: How Real Organizations Actually Work

METR's controlled experiment studied 16 experienced developers from large open-source repositories, primarily using Cursor Pro with Claude 3.5/3.7 Sonnet, working on carefully designed tasks in an isolated environment. This approach yields clean, comparable data, but it falls short of capturing how software development actually happens in organizations.

Enterprise software delivery involves far more than individual coding speed. Code must be reviewed by teammates, pass through testing pipelines, navigate deployment processes, and integrate with work from dozens of other developers. A developer might very well complete some simple tasks faster with AI, but if that creates bottlenecks downstream, the organization sees no benefit.

Our analysis took a fundamentally different approach. Instead of controlled tasks, we analyzed telemetry from 1,255 teams and over 10,000 developers across multiple companies, tracking how AI adoption affects real work in natural settings over time. Rather than measuring isolated task completion, we examined the full software delivery pipeline, from initial coding through deployment to production. Our goal was to determine whether widespread AI adoption correlates with significant changes in common developer productivity metrics spanning velocity, speed, quality, and efficiency.
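
To make the measurement concrete, here is a minimal sketch of how two of these metrics — PR cycle time and per-developer throughput — can be derived from pull-request telemetry. The records and field names are hypothetical, not Faros AI's actual schema:

```python
from datetime import datetime

# Hypothetical PR telemetry records; field names are illustrative only.
prs = [
    {"author": "dev1", "opened": "2025-07-01T09:00", "merged": "2025-07-01T17:00"},
    {"author": "dev1", "opened": "2025-07-01T10:00", "merged": "2025-07-02T10:00"},
    {"author": "dev2", "opened": "2025-07-01T08:00", "merged": "2025-07-03T08:00"},
]

def cycle_time_hours(pr):
    """Elapsed hours from PR opened to merged (one common cycle-time definition)."""
    opened = datetime.fromisoformat(pr["opened"])
    merged = datetime.fromisoformat(pr["merged"])
    return (merged - opened).total_seconds() / 3600

cycle_times = [cycle_time_hours(pr) for pr in prs]
avg_cycle_time = sum(cycle_times) / len(cycle_times)

# Throughput: merged PRs per developer per day over the observation window.
days_observed = 3
authors = {pr["author"] for pr in prs}
throughput = len(prs) / (len(authors) * days_observed)

print(f"avg cycle time: {avg_cycle_time:.1f} h")    # (8 + 24 + 48) / 3 = 26.7 h
print(f"throughput: {throughput:.2f} PRs/dev/day")  # 3 / (2 devs * 3 days) = 0.50
```

The point of separating the two numbers is exactly the article's distinction: cycle time captures individual speed, while throughput captures how much work flows through per developer.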

Comparing the METR and Faros AI study methodologies

What We Discovered: The Power of Parallelization

Faros AI's study revealed a benefit METR's methodology couldn't capture: AI enables developers to handle more concurrent workstreams effectively and to deliver significantly higher throughput.

Our data shows that developers on high-AI-adoption teams interact with 9% more tasks and 47% more pull requests per day. This isn't traditional multitasking, which research has long shown to be counterproductive. Instead, it reflects a fundamental shift in how work gets done when AI agents can contribute to the workload.

With AI coding assistants, an engineer can initiate work on one feature while their AI agent simultaneously handles another. They can start a refactoring task, hand it off to AI for initial implementation, then review and iterate while AI tackles the next item in the backlog. The developer's role evolves from pure code production to orchestration and oversight across multiple parallel streams.

This parallelization explains why we also found 21% higher task completion rates and 98% more merged pull requests, even as individual task speeds might not improve dramatically. For businesses, this distinction matters enormously. Organizations don't optimize for how quickly developers complete single tasks; rather, they optimize for how much valuable software they ship to customers.

Key findings comparison between METR and Faros AI studies

Notably, while we identified this correlation with throughput and multi-tasking, the telemetry did not indicate a correlation between AI adoption and task or PR speed, as measured by their cycle times.


The Organizational Reality Check

Here's where our findings align with METR's concerns. Both studies reveal that AI introduces new complexities into software delivery:

  • Complexity challenges: AI-generated code tends to be more verbose and less incremental as measured by a 154% increase in PR size
  • Code review bottlenecks: Our data shows 91% longer review times, likely influenced by larger diff sizes and increased throughput
  • Quality concerns: We observed 9% more bugs per developer as AI adoption grows
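
As a sketch of how such deltas fall out of cohort comparisons, the snippet below contrasts hypothetical team-level medians for low- and high-adoption cohorts. The input numbers are chosen to reproduce the percentages above; they are not Faros AI's raw data:

```python
# Hypothetical cohort medians; values picked to echo the reported deltas.
low_adoption  = {"pr_size_loc": 180, "review_hours": 11.0, "bugs_per_dev": 2.2}
high_adoption = {"pr_size_loc": 457, "review_hours": 21.0, "bugs_per_dev": 2.4}

def pct_change(before, after):
    """Percent change from the low-adoption to the high-adoption cohort."""
    return (after - before) / before * 100

for metric in low_adoption:
    delta = pct_change(low_adoption[metric], high_adoption[metric])
    print(f"{metric}: {delta:+.0f}%")  # +154%, +91%, +9% respectively
```
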

These findings echo METR's observation that AI can create as many problems as it solves, particularly for complex work.

Our key insight: AI's impact depends entirely on organizational context. In METR's controlled environment, the organizational systems that could absorb AI's benefits simply didn't exist. In real companies, those systems determine whether AI adoption succeeds or fails.

Organizations can address these challenges through more strategic AI rollout and enablement, systematic workflow changes, and infrastructure improvements.

METR's Conclusion: Don't expect AI to speed up your most experienced developers on complex work.

Faros AI's Conclusion: Even when AI helps individual teams, organizational systems must change to capture business value.

Why Lab Results Don't Predict Business Outcomes

Both approaches provide valuable data on where AI helps and where it doesn't. Any disconnect isn't surprising when you consider the fundamental differences in what each approach measures:

METR measured: Individual developer performance on isolated, well-defined tasks with no downstream dependencies.

Faros AI measured: End-to-end software delivery performance across interdependent teams with real business constraints.

METR's environment: 16 experienced developers, primarily Cursor Pro with Claude 3.5/3.7 Sonnet, controlled tasks, no organizational systems.

Faros AI’s environment: 10,000+ developers across all experience levels, multiple AI tools (GitHub Copilot, Cursor, Claude Code, Windsurf, etc.), natural work settings, full organizational context.

For engineering leaders, the Faros AI study demonstrates that AI is unleashing increased velocity, but existing workflows and structures are blocking it. Developers don't work in isolation—they work within systems of code review, testing, deployment, and cross-team coordination. Whatever impact AI has on individual productivity only translates to business value if it successfully navigates these organizational processes.


The Path Forward: Beyond Individual Productivity

Our qualitative fieldwork and operational insights suggest that companies achieving meaningful AI gains are redesigning workflows to harness AI's unique strengths. This means:

  • Workflow redesign: Adapting review processes to handle larger, AI-generated pull requests effectively
  • Strategic enablement: Providing role-specific training rather than assuming developers will figure it out
  • Infrastructure modernization: Upgrading testing and deployment pipelines to handle higher code velocity
  • Data-driven optimization: Using telemetry to identify where AI delivers the biggest productivity gains and focusing adoption accordingly
  • Cross-functional alignment: Ensuring AI adoption is even across interdependent teams to prevent dependencies from erasing gains

Most importantly, successful organizations treat AI adoption as a catalyst for structural change. This approach focuses on how AI can reshape the organization of software development work, rather than on marginal gains for individual developers.

Building on METR's Foundation

METR's research provides crucial insights into AI's limitations, the importance of human expertise in complex problem-solving, and how AI tools will have to evolve to support brownfield codebases.

But the story doesn't end with individual task performance. The question for organizations is how to harness AI's strengths—particularly its ability to enable parallelization and handle routine work—while addressing its weaknesses through better systems, training, and workflow design.

The future of AI in software development won't be determined by whether it makes individual developers faster at isolated tasks. Organizations will be expected to adapt their systems, processes, and culture to leverage AI as a force multiplier for human expertise.

Both lab studies and real-world telemetry have roles to play in understanding that future. For engineering leaders making investment decisions today, the real-world evidence points to a clear conclusion: AI's business impact depends far more on organizational readiness and strategic AI deployment than previously understood. 

The companies that recognize this distinction and invest accordingly will build the durable competitive advantages that matter in the age of AI-augmented software development.

Most organizations don't know why their AI gains are stalling. Faros AI can help. Book a meeting with an expert today.
