
Developer sentiment surveys track how engineers feel. But when AI coding tools cost real money and your CFO wants ROI, feelings aren't the answer. Here's what is.

"We had surveys. We had dashboards. But when my CIO asks for an economic case for our AI tools, none of that helps. You need a fundamentally different class of data to answer that question."
That's a VP of Engineering at a top-tier industrial manufacturing company, speaking in April 2026. His team had used one of the leading developer experience platforms. It had the best surveys. It had some tool telemetry. And when AI coding tools became a meaningful line item in the engineering budget, none of it was sufficient to answer the questions that actually mattered.
He's not alone. Across enterprise engineering organizations, the same pattern is playing out: developer sentiment surveys built for a pre-AI world are being asked to do a job they were never designed to do. And the gap between what they can tell you and what you actually need to know is getting wider every quarter.
Developer sentiment surveys made a lot of sense for a long time. Tools like DX (now part of Atlassian) and DORA-aligned pulse checks gave engineering leaders something genuinely valuable: a scalable way to understand how developers experienced their work. Where were the friction points? Were teams burning out? Was the toolchain getting in the way? These are real questions, and surveys answered them well.
There was also a practical reason surveys became the default. Connecting engineering data across a heterogeneous toolchain (Jira here, GitHub there, ADO somewhere else, plus CI/CD pipelines and incident management systems) is genuinely hard. Surveys sidestepped that complexity entirely. You didn't need to instrument anything or build a unified data model. You just asked your developers how they felt. For a long time, that felt like enough.
The developer experience discipline that emerged from this era was legitimate. Capturing developer sentiment helped organizations identify systemic problems: manual and slow pipelines; process overhead; too many meetings, interviews, and interruptions. Survey instruments like those in the DX platform gave engineering leaders a credible, structured way to bring those signals to leadership.
When the primary question was "how do we remove friction from our existing engineering process," surveys were the right instrument. They told you where developers were struggling. They gave you a feedback loop on changes you'd made. They were a meaningful part of how engineering leaders justified investments in tooling and process improvement.
That era is not entirely over. Sentiment still matters. But it's no longer sufficient on its own, and in some cases it's actively pointing leaders in the wrong direction.
AI coding tools changed what engineering leaders are accountable for explaining. It's not just "are my developers happy and productive?" It's "are we getting economic value from this AI investment, and how do I prove it?"
A VP of Engineering at a large enterprise put it plainly: "Every AI tool conversation with my CIO comes down to the same question: what's the economic case? Sentiment data doesn't answer that."
This is the core problem. Developer sentiment surveys were built to measure developer experience, not to produce the economic analysis that CFOs and CIOs now expect. When GitHub Copilot, Cursor, Windsurf, or Claude Code costs real money at scale, the question changes from "do developers like this tool?" to "what is this tool actually delivering, and is it worth what we're paying?"
The "appearance of productivity" problem makes this worse. Developers overwhelmingly report that AI tools make them feel more productive. The 2025 Stack Overflow Developer Survey found that roughly 70% of developers using AI agents agreed they had increased productivity. DORA's 2025 State of AI-Assisted Software Development report, based on nearly 5,000 survey responses, found that over 80% of respondents said AI had enhanced their productivity.
These are not small numbers. And they are almost certainly telling you something true at the individual level: developers working with AI assistance do complete discrete tasks faster. The problem is that organizational outcomes don't follow automatically from individual sentiment, and survey instruments aren't designed to catch the gap between the two.
{{cta}}
When the survey says one thing and the data says something else
This is where developer sentiment surveys stop being a useful signal and start becoming a liability.
The AI Engineering Report 2026 analyzed two years of telemetry data from 22,000 developers across more than 4,000 teams. It did not rely on self-reported estimates. It measured what actually happened in the systems where software gets built, reviewed, and shipped. The findings are not what the surveys would have predicted.
Task throughput per developer is up 34% under high AI adoption. Epics completed per developer are up 66%. Those gains are real. But at every downstream stage, the quality signal tells a different story. Bugs per developer are up 54%. Incidents per pull request have increased 242%. Code churn, the rate at which recently written code is deleted and replaced, has risen 861% under high AI adoption. Median PR review time is up 441%, and 31% more pull requests are merging with no review at all.
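Code churn in this sense is typically derived from commit history: for each line of code added, check whether that same line is deleted or rewritten within a short window after it lands. A minimal sketch of the idea, using an invented, simplified event shape rather than any particular platform's API (real pipelines would derive these events from git diff and blame history):

```python
from datetime import datetime, timedelta

# Hypothetical line-level change events: when a line was added, and when
# (if ever) it was later deleted or rewritten. Field names are illustrative.
changes = [
    {"added": datetime(2026, 1, 1), "removed": datetime(2026, 1, 10)},
    {"added": datetime(2026, 1, 2), "removed": None},
    {"added": datetime(2026, 1, 3), "removed": datetime(2026, 2, 20)},
    {"added": datetime(2026, 1, 4), "removed": datetime(2026, 1, 6)},
]

def churn_rate(changes, window_days=21):
    """Share of added lines deleted or rewritten within the window."""
    window = timedelta(days=window_days)
    churned = sum(
        1 for c in changes
        if c["removed"] is not None and c["removed"] - c["added"] <= window
    )
    return churned / len(changes)

print(churn_rate(changes))  # 0.5: two of four lines churned within 21 days
```

The window length (21 days here) is a policy choice, not a standard; what matters is measuring it consistently before and after AI adoption so the trend is comparable.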

Developers are reporting productivity gains. The telemetry is showing compounding quality costs. Both things are happening simultaneously. A survey cannot see that. It can only record what developers believe about their own experience.
The same gap showed up in independent research. A METR study found that experienced developers using AI tools on real tasks from their own codebases took 19% longer than those working without AI. Before the study, those same developers had predicted AI would make them 24% faster. After experiencing the slowdown, they still believed AI had sped them up. The gap between perception and measurement was nearly 40 percentage points.
This is not a bug in how surveys are designed. It is a feature. Surveys are designed to capture perception. In a world where perception tracks reality reasonably well, that's useful. In an AI-accelerated engineering environment where individual experience and organizational outcomes are diverging, it's a problem.
One engineering leader described the experience precisely: "We'd see the survey say one thing and the telemetry say something completely different. That gap is where the real questions live."
Surveys remain a valid instrument for specific questions. They're not going away, and they shouldn't. The issue is one of scope, not validity.
The VP who opened this article wasn't dismissing surveys. He was using them the way they're meant to be used: as one dimension. "We still want to capture sentiment and feedback," he said. "But you need to look at them together, because the survey sometimes says one thing and the data says something else. That gap is where the real conversations are."
That's the right framing. Surveys capture one dimension. The mistake is treating them as the primary instrument when you're trying to answer questions about measuring engineering productivity, AI tool ROI, or how to structure your engineering organization going forward.
{{cta}}
The questions engineering leaders face in 2026 require a different class of data. Here's what that actually looks like in practice.
The answer is not to abandon developer sentiment surveys. It's to understand what they're for.
Surveys are a valid instrument for understanding developer experience, identifying friction, and capturing the qualitative signals that telemetry can't surface. If your teams are struggling with unclear requirements, or if a tooling change is creating frustration that hasn't yet shown up in delivery metrics, a well-designed survey can catch that early.
The problem is treating surveys as a proxy for the questions that only telemetry can answer. When you ask "is our AI investment working?," a survey will tell you how developers feel about their AI tools. It will not tell you whether those tools are generating a return on investment. When you ask "how do we justify the role of engineering in an AI-first organization?," a survey will tell you how engineers perceive their own value. It will not give you the economic case that survives a CFO conversation.
Sensors and surveys serve different purposes. The organizations getting this right are using both, in the right proportions, for the right questions. They're not substituting one for the other.
The same caution extends to composite indexes built on top of self-reported data. DX's Developer Experience Index (DXI), for example, aggregates survey responses into a single score meant to represent the health of your engineering organization. The appeal is obvious: one number, easy to track, easy to present. But an index is only as reliable as its inputs. If the underlying data is heavily reliant on self-reported perception, the index inherits all of the same limitations. A rising DXI score tells you developers feel better about their work. It does not tell you whether your AI investment is producing a return, whether code quality is holding up, or whether your delivery performance is improving. In an AI-first engineering environment, those are the questions that determine your budget, your headcount, and your organizational structure. An index that can't answer them isn't a measurement program. It's a mood tracker with a dashboard.
For engineering leaders who want to move beyond surveys as their primary instrument, here is what a complete measurement program looks like.
Keep surveys in the mix, but scope them correctly. Use them to understand developer experience signals that don't show up in telemetry. Use them to cross-check: when sentiment and data diverge, that's a signal worth investigating, not a number to average away.
The leaders who will make the right calls on AI investment, team structure, and engineering governance over the next two years are the ones who have both dimensions, who know which instrument answers which question, and who can walk into a budget conversation with data that survives scrutiny.
Developer sentiment surveys gave engineering organizations a foundation. For a long time, they also gave engineering leaders a way to avoid the harder data problem. Integrating your toolchain, normalizing data across systems, and connecting AI adoption to delivery outcomes is real work. Surveys were easier. And in an era where the primary question was developer experience, easier was defensible.
That's no longer the case. The questions engineering leaders face today cannot be answered with a survey: What is our AI investment actually producing? Which tools are worth the cost? How do we justify the role of engineering in an organization where AI writes the code? Answering them requires reliable, accurate, and granular telemetry. It requires a unified data model. It requires a way to query that data as new questions emerge, because the questions will keep emerging. Two years ago, no one was tracking the relationship between AI adoption, code churn, and incidents. Now those are among the most important signals in engineering.
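What "queryable" means in practice: once delivery data lives in one model, a new question becomes a small query rather than a new instrumentation project. A hedged sketch, with invented per-team rollup fields standing in for whatever a real unified model would expose:

```python
from statistics import median

# Hypothetical per-team rollups from a unified engineering data model;
# cohort labels and metric names are invented for illustration.
teams = [
    {"ai_adoption": "high", "churn_pct": 38, "incidents_per_pr": 0.09},
    {"ai_adoption": "high", "churn_pct": 31, "incidents_per_pr": 0.12},
    {"ai_adoption": "low",  "churn_pct": 7,  "incidents_per_pr": 0.03},
    {"ai_adoption": "low",  "churn_pct": 9,  "incidents_per_pr": 0.04},
]

def by_adoption(teams, metric):
    """Median of `metric` per AI-adoption cohort: the shape of question a
    queryable model answers without new instrumentation."""
    levels = {t["ai_adoption"] for t in teams}
    return {
        level: median(t[metric] for t in teams if t["ai_adoption"] == level)
        for level in levels
    }

print(by_adoption(teams, "churn_pct"))  # {'high': 34.5, 'low': 8.0}
```

Swapping `"churn_pct"` for `"incidents_per_pr"` asks next quarter's question with the same one-line change, which is the point of the unified model.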
The good news is that this work is now more achievable than it has ever been. The organizations that do it will be able to walk into any budget conversation with data that survives scrutiny. The ones that don't will keep presenting sentiment scores to CFOs who are asking economic questions, and wondering why the answers don't land.
Faros's AI Engineering Report 2026 - The Acceleration Whiplash covers telemetry data from 22,000 developers across more than 4,000 teams, tracking two years of before-and-after AI adoption data across the full software delivery lifecycle.
{{whiplash}}



