Developer sentiment surveys made sense before AI. Now they're misleading you.

Developer sentiment surveys track how engineers feel. But when AI coding tools cost real money and your CFO wants ROI, feelings aren't the answer. Here's what is.


What your CFO actually wants to know

"We had surveys. We had dashboards. But when my CIO asks for an economic case for our AI tools, none of that helps. You need a fundamentally different class of data to answer that question."

That's a VP of Engineering at a top-tier industrial manufacturing company, speaking in April 2026. His team had used one of the leading developer experience platforms. It had the best surveys. It had some tool telemetry. And when AI coding tools started consuming a meaningful line item in the engineering budget, none of it was sufficient to answer the questions that actually mattered.

He's not alone. Across enterprise engineering organizations, the same pattern is playing out: developer sentiment surveys built for a pre-AI world are being asked to do a job they were never designed to do. And the gap between what they can tell you and what you actually need to know is getting wider every quarter.

Why engineering teams built their measurement programs around developer surveys

Developer sentiment surveys made a lot of sense for a long time. Tools like DX (now part of Atlassian) and DORA-aligned pulse checks gave engineering leaders something genuinely valuable: a scalable way to understand how developers experienced their work. Where were the friction points? Were teams burning out? Was the toolchain getting in the way? These are real questions, and surveys answered them well.

There was also a practical reason surveys became the default. Connecting engineering data across a heterogeneous toolchain (Jira here, GitHub there, ADO somewhere else, plus CI/CD pipelines and incident management systems) is genuinely hard. Surveys sidestepped that complexity entirely. You didn't need to instrument anything or build a unified data model. You just asked your developers how they felt. For a long time, that felt like enough.

The developer experience discipline that emerged from this era was legitimate. Capturing developer sentiment helped organizations identify systemic problems: manual and slow pipelines; process overhead; too many meetings, interviews, and interruptions. Survey instruments like those in the DX platform gave engineering leaders a credible, structured way to bring those signals to leadership.

When the primary question was "how do we remove friction from our existing engineering process," surveys were the right instrument. They told you where developers were struggling. They gave you a feedback loop on changes you'd made. They were a meaningful part of how engineering leaders justified investments in tooling and process improvement.

That era is not entirely over. Sentiment still matters. But it's no longer sufficient on its own, and in some cases it's actively pointing leaders in the wrong direction.

What changed: AI made the questions harder

AI coding tools changed what engineering leaders are accountable for explaining. It's not just "are my developers happy and productive?" It's "are we getting economic value from this AI investment, and how do I prove it?"

A VP of Engineering at a large enterprise put it plainly: "Every AI tool conversation with my CIO comes down to the same question: what's the economic case? Sentiment data doesn't answer that."

This is the core problem. Developer sentiment surveys were built to measure developer experience, not to produce the economic analysis that CFOs and CIOs now expect. When GitHub Copilot, Cursor, Windsurf, or Claude Code costs real money at scale, the question changes from "do developers like this tool?" to "what is this tool actually delivering, and is it worth what we're paying?"

The "appearance of productivity" problem makes this worse. Developers overwhelmingly report that AI tools make them feel more productive. The 2025 Stack Overflow Developer Survey found that roughly 70% of developers using AI agents agreed they had increased productivity. DORA's 2025 State of AI-Assisted Software Development report, based on nearly 5,000 survey responses, found that over 80% of respondents said AI had enhanced their productivity.

These are not small numbers. And they are almost certainly telling you something true at the individual level: developers working with AI assistance do complete discrete tasks faster. The problem is that organizational outcomes don't follow automatically from individual sentiment, and survey instruments aren't designed to catch the gap between the two.

{{cta}}

When the survey says one thing and the data says something else

This is where developer sentiment surveys stop being a useful signal and start becoming a liability.

The AI Engineering Report 2026 analyzed two years of telemetry data from 22,000 developers across more than 4,000 teams. It did not rely on self-reported estimates. It measured what actually happened in the systems where software gets built, reviewed, and shipped. The findings are not what the surveys would have predicted.

Task throughput per developer is up 34% under high AI adoption. Epics completed per developer are up 66%. Those gains are real. But at every downstream stage, the quality signal tells a different story. Bugs per developer are up 54%. Incidents per pull request have increased 242%. Code churn, the rate at which recently written code is deleted and replaced, has risen 861% under high AI adoption. Median PR review time is up 441%, and 31% more pull requests are merging with no review at all.

How production code quality changes from low to high AI adoption. From the AI Engineering Report 2026 - The Acceleration Whiplash

Developers are reporting productivity gains. The telemetry is showing compounding quality costs. Both things are happening simultaneously. A survey cannot see that. It can only record what developers believe about their own experience.

The same gap showed up in independent research. A METR study found that experienced developers using AI tools on real tasks from their own codebases took 19% longer than those working without AI. Before the study, those same developers had predicted AI would make them 24% faster. After experiencing the slowdown, they still believed AI had sped them up. The gap between perception and measurement was nearly 40 percentage points.

This is not a bug in how surveys are designed. It is a feature. Surveys are designed to capture perception. In a world where perception tracks reality reasonably well, that's useful. In an AI-accelerated engineering environment where individual experience and organizational outcomes are diverging, it's a problem.

One engineering leader described the experience precisely: "We'd see the survey say one thing and the telemetry say something completely different. That gap is where the real questions live."

What developer surveys can and can't tell you: a practical distinction

Surveys remain a valid instrument for specific questions. They're not going away, and they shouldn't. The issue is one of scope, not validity.

What developer sentiment surveys measure well:
  • Developer experience and friction
  • Team morale and engagement signals
  • Perceived productivity and tool satisfaction
  • Friction in process or toolchain
  • Qualitative signals about working conditions

What they cannot measure:
  • Economic ROI of AI tools
  • Whether AI adoption is degrading code quality
  • Actual cycle time and delivery performance
  • Which AI tools produce the best outcomes for your teams
  • Organizational throughput vs. individual throughput

The VP who opened this article wasn't dismissing surveys. He was using them the way they're meant to be used: as one dimension. "We still want to capture sentiment and feedback," he said. "But you need to look at them together, because the survey sometimes says one thing and the data says something else. That gap is where the real conversations are."

That's the right framing. Surveys capture one dimension. The mistake is treating them as the primary instrument when you're trying to answer questions about measuring engineering productivity, AI tool ROI, or how to structure your engineering organization going forward.

{{cta}}

What you actually need to justify AI budgets today

The questions engineering leaders face in 2026 require a different class of data. Here's what that actually looks like in practice.

  • Tool-level attribution. If your organization runs Cursor, Copilot, and Claude Code simultaneously across different teams, you need to know which tool is producing what outcomes. Not which tool developers prefer. Which one is actually reducing cycle time, reducing incident rates, and improving delivery performance? That requires telemetry connected to delivery metrics, not a satisfaction survey.
  • Economic analysis, not sentiment scores. When you walk into a budget conversation, your CFO doesn't want to know that developers are happier with AI tools. They want to know what the output delta is, what it costs to produce it, and how to get the best gains at the lowest cost. That last question matters more every quarter. A mid-tier model from one vendor can outperform a premium model on the metrics that actually move your delivery performance. You can't find that out from a satisfaction survey. You find it out by connecting AI adoption data to lead time, throughput, change failure rate, and cost per delivery unit across tools and teams simultaneously. One way to frame that comparison is sketched after this list.
  • Adoption visibility across tools and teams. In large organizations, shadow AI adoption is real. Developers sign up for new tools independently, vibe-code applications that carry security risks, and experiment outside any governed process. You can't survey your way to that visibility. You need telemetry that spans the full tool landscape, regardless of what people choose to self-report.
  • Cross-system measurement. Most enterprise engineering organizations don't run a uniform toolchain. Teams use Jira, ADO, GitHub, and various combinations. A measurement program that only captures sentiment can paper over that heterogeneity. A telemetry-based approach has to actually connect those systems and normalize the data, which is what gives you a real view of developer productivity across teams that work differently.
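
To make the cost-per-delivery-unit comparison concrete, here is a minimal sketch in Python, assuming you can export per-team AI tool assignments, seat costs, and delivery metrics as flat tables. The file names and columns (adoption.csv, delivery.csv, tasks_completed, and so on) are hypothetical placeholders, not the schema of any particular platform.

```python
# Minimal sketch: cost per delivery unit, by AI tool.
# Assumes two hypothetical CSV exports, one row per team per month.
import pandas as pd

# adoption.csv: team, month, ai_tool, seats, seat_cost_usd
adoption = pd.read_csv("adoption.csv")
# delivery.csv: team, month, tasks_completed, incidents, median_lead_time_days
delivery = pd.read_csv("delivery.csv")

df = adoption.merge(delivery, on=["team", "month"])
df["license_spend"] = df["seats"] * df["seat_cost_usd"]

per_tool = df.groupby("ai_tool").agg(
    spend=("license_spend", "sum"),
    tasks=("tasks_completed", "sum"),
    incidents=("incidents", "sum"),
    lead_time_days=("median_lead_time_days", "median"),
)
per_tool["cost_per_task"] = per_tool["spend"] / per_tool["tasks"]
per_tool["incidents_per_task"] = per_tool["incidents"] / per_tool["tasks"]

# The comparison a budget conversation actually needs: outcomes per dollar, by tool.
print(per_tool.sort_values("cost_per_task"))
```

The specific columns matter less than the shape of the exercise: adoption, cost, and delivery data joined in one place, which is exactly what a sentiment survey cannot provide.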

Does this mean engineering organizations should stop using surveys?

No. The answer is not to abandon developer sentiment surveys. It's to understand what they're for.

Surveys are a valid instrument for understanding developer experience, identifying friction, and capturing the qualitative signals that telemetry can't surface. If your teams are struggling with unclear requirements, or if a tooling change is creating frustration that hasn't yet shown up in delivery metrics, a well-designed survey can catch that early.

The problem is treating surveys as a proxy for the questions that only telemetry can answer. When you ask "Is our AI investment working?", a survey will tell you how developers feel about their AI tools. It will not tell you whether those tools are generating a return on investment. When you ask "How do we justify the role of engineering in an AI-first organization?", a survey will tell you how engineers perceive their own value. It will not give you the economic case that survives a CFO conversation.

Telemetry and surveys serve different purposes. The organizations getting this right are using both, in the right proportions, for the right questions. They're not substituting one for the other.

The same caution extends to composite indexes built on top of self-reported data. DX's Developer Experience Index (DXI), for example, aggregates survey responses into a single score meant to represent the health of your engineering organization. The appeal is obvious: one number, easy to track, easy to present. But an index is only as reliable as its inputs. If the underlying data is heavily reliant on self-reported perception, the index inherits all of the same limitations. A rising DXI score tells you developers feel better about their work. It does not tell you whether your AI investment is producing a return, whether code quality is holding up, or whether your delivery performance is improving. In an AI-first engineering environment, those are the questions that determine your budget, your headcount, and your organizational structure. An index that can't answer them isn't a measurement program. It's a mood tracker with a dashboard.

What "different dimensions" means: building a measurement program that holds up

For engineering leaders who want to move beyond surveys as their primary instrument, here is what a complete measurement program looks like.

  • Start with telemetry that spans the full delivery workflow. That means connecting your source code management system, your issue tracker, your CI/CD pipeline, your incident management systems, and your AI coding tools into a unified data model. The Faros platform does this across heterogeneous toolchains, normalizing data from Jira, ADO, GitHub, and every major AI coding tool into a single schema.
  • Add AI adoption tracking that goes beyond "how many licenses are active." You want to know who is using which tools, how intensively, and what the downstream effect is on their team's delivery performance. That means tracking acceptance rates, code churn by tool, and connecting usage data to cycle time, bug and incident rates, and PR review patterns.
  • Continuously evaluate AI coding tools and models against your actual codebase. Vendor benchmarks and analyst reports tell you how models perform in general. They don't tell you how they perform on your repositories, your task types, your engineering context. The gap between the two can be significant. In a head-to-head evaluation on real tasks from internal codebases, a mid-tier model from one vendor outperformed a code-specialized model from another by more than 3x on successful task completion, at a comparable cost per outcome. That finding only surfaces when you test against your own code, not when you ask developers which tool they prefer. Model performance changes as vendors ship updates. The evaluation has to be continuous, not a one-time procurement decision.
  • Layer in the economic analysis. When you can connect AI tool usage to delivery throughput metrics, and you know the per-seat cost of each tool, you can build a defensible ROI case. Not "our developers said they were 30% more productive." But "teams at high AI adoption completed 34% more tasks per developer, their rework rates are healthy, and here is what that means in terms of engineering capacity and cost per unit of output." And when AI tool pricing increases, as it has been doing, you can simulate the net ROI impact before the renewal conversation happens, not after. A sketch of that simulation follows this list.
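
As an illustration of that last point, here is a minimal sketch of a renewal-price simulation. Every number in it is an invented placeholder: in practice the throughput lift comes from your telemetry, and the value per task from your own finance assumptions.

```python
# Minimal sketch: simulate the net ROI impact of an AI tool price increase
# before the renewal conversation. All inputs below are placeholders.

def net_roi(extra_tasks_per_dev_year, value_per_task, monthly_seat_cost, devs):
    """Annual value of the measured throughput lift minus annual license spend."""
    annual_value = extra_tasks_per_dev_year * value_per_task * devs
    annual_spend = monthly_seat_cost * devs * 12
    return annual_value - annual_spend, annual_value / annual_spend

# Hypothetical figures: 800 developers, a measured lift of 30 tasks/dev/year,
# an internal value estimate of $400 per task, and a seat price moving $19 -> $39.
current = net_roi(extra_tasks_per_dev_year=30, value_per_task=400,
                  monthly_seat_cost=19, devs=800)
proposed = net_roi(extra_tasks_per_dev_year=30, value_per_task=400,
                   monthly_seat_cost=39, devs=800)

print(f"current pricing:  net ${current[0]:,.0f}, ROI {current[1]:.1f}x")
print(f"proposed pricing: net ${proposed[0]:,.0f}, ROI {proposed[1]:.1f}x")
```

The model is deliberately simple. The hard part is not the arithmetic; it's having measured, defensible inputs to feed it.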

Keep surveys in the mix, but scope them correctly. Use them to understand developer experience signals that don't show up in telemetry. Use them to cross-check: when sentiment and data diverge, that's a signal worth investigating, not a number to average away.
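
One way to operationalize that cross-check is to flag the divergence explicitly rather than eyeballing two dashboards side by side. The sketch below assumes a per-team, per-quarter export that combines survey scores with telemetry; the file and column names are hypothetical.

```python
# Minimal sketch: flag teams where survey sentiment and delivery telemetry
# point in opposite directions. File and column names are hypothetical.
import pandas as pd

# One row per team per quarter, with the prior quarter's values in *_prev columns:
# team, sentiment_score, sentiment_score_prev, churn_rate, churn_rate_prev,
# incidents_per_pr, incidents_per_pr_prev
teams = pd.read_csv("team_quarter_metrics.csv")

sentiment_up = teams["sentiment_score"] > teams["sentiment_score_prev"]
quality_down = (teams["churn_rate"] > teams["churn_rate_prev"]) | (
    teams["incidents_per_pr"] > teams["incidents_per_pr_prev"]
)

# Teams where developers feel better while quality signals get worse:
# these are the conversations worth having, not numbers to average away.
divergent = teams[sentiment_up & quality_down]
print(divergent[["team", "sentiment_score", "churn_rate", "incidents_per_pr"]])
```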

The leaders who will make the right calls on AI investment, team structure, and engineering governance over the next two years are the ones who have both dimensions, who know which instrument answers which question, and who can walk into a budget conversation with data that survives scrutiny.

Developer sentiment surveys gave engineering organizations a foundation. For a long time, they also gave engineering leaders a way to avoid the harder data problem. Integrating your toolchain, normalizing data across systems, and connecting AI adoption to delivery outcomes is real work. Surveys were easier. And in an era where the primary question was developer experience, easier was defensible.

That's no longer the case. The questions engineering leaders face today (what is our AI investment actually producing, which tools are worth the cost, how do we justify the role of engineering in an organization where AI writes the code) cannot be answered with a survey. They require reliable, accurate, and granular telemetry. They require a unified data model. They require a way to query that data as new questions emerge, because the questions will keep emerging. Two years ago, no one was tracking the relationship between AI adoption, code churn, and incidents. Now those are among the most important signals in engineering.

The good news is that work is now more achievable than it's ever been. The organizations that do it will be able to walk into any budget conversation with data that survives scrutiny. The ones that don't will keep presenting sentiment scores to CFOs who are asking economic questions, and wondering why the answers don't land.

Faros's AI Engineering Report 2026 - The Acceleration Whiplash covers telemetry data from 22,000 developers across more than 4,000 teams, tracking two years of before-and-after AI adoption data across the full software delivery lifecycle.

{{whiplash}}

Naomi Lurie

Naomi Lurie is Head of Product Marketing at Faros. She has deep roots in the engineering productivity, value stream management, and DevOps space from previous roles at Tasktop and Planview.
