Engineering leaders keep running into the same three problems
I've been having a lot of conversations with heads of engineering lately, and the same three problems come up in nearly every one. What surprised me is that none of them actually cared about whether AI is "transformational" or not. Instead, they cared about knowing where they stand, what to do once they know, and how to actually change the way their teams work.
How do we know if we're productive?
The pressure to be more productive is constant, but most leaders can't answer the underlying question: Compared to what? Is counting PRs per developer per week enough? Should we compare ourselves to our own history? To other companies? And even if comparing against a set of peer companies tells you that you're below average, what does that mean? After all, what's below average for one organization can be perfectly healthy for another.
The harder question is figuring out what to measure in the first place. For one company, the binding constraint is code review turnaround: bringing it from two days to six hours unblocks everything downstream. For another, it's environment provisioning, test flakiness, or the time between merge and deploy. A generic set of metrics is likely to overwhelm leaders and create more noise than signal. The metrics that matter are the ones tied to your actual bottlenecks, and most companies don't know what those are on their own; generic benchmarks won't surface them.
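To make "measure the actual bottleneck" concrete, here is a minimal sketch of computing review turnaround from pull request timestamps. The records and field names (opened_at, first_review_at) are assumptions for illustration, not any specific API; in practice you would pull these timestamps from your Git host and scope them to the teams and repos you care about.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records; the field names are placeholders, not a specific API.
prs = [
    {"opened_at": "2024-05-01T09:00:00", "first_review_at": "2024-05-03T10:30:00"},
    {"opened_at": "2024-05-02T14:00:00", "first_review_at": "2024-05-02T17:15:00"},
    {"opened_at": "2024-05-03T08:00:00", "first_review_at": "2024-05-06T09:00:00"},
]

def review_turnaround_hours(pr):
    """Hours from a PR being opened to its first review."""
    opened = datetime.fromisoformat(pr["opened_at"])
    reviewed = datetime.fromisoformat(pr["first_review_at"])
    return (reviewed - opened).total_seconds() / 3600

hours = sorted(review_turnaround_hours(pr) for pr in prs)
print(f"median turnaround: {median(hours):.1f}h, worst: {hours[-1]:.1f}h")
```

The number itself only helps if review turnaround really is your binding constraint; the same ten lines pointed at environment provisioning or merge-to-deploy time are just as easy to produce, and just as useless if that isn't where the work is actually stuck.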
What do we do with the data?
Once you have the right metrics and the data behind them, the next hurdle is acting on them. The common mistake is treating productivity as a pure engineering or procurement problem, where you build or buy a tool and ship the change. That overlooks two of the three levers actually available: products, processes, and people.
A process change, such as "code reviews complete within six business hours," can move the needle more than a new tool purchase. A people change, such as assigning specific AI skill files to C++ developers on a particular service, or pairing top performers with the team's slowest reviewers, can outperform a license rollout. And generic insights, like "teams using Cursor ship more PRs," don't translate into action. The useful version is specific: this group, on this codebase, with this setup, ships X% faster, and here's what you need to validate before replicating and standardizing the pattern across teams that look similar.
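As a toy illustration of what the specific version looks like, here is a sketch of a scoped cohort comparison rather than a generic "tool X users ship more" claim. The data shape, field names, and numbers are made up; the point is that the comparison is restricted to developers on one service with one setup, and the result is a hypothesis to validate, not a conclusion.

```python
from statistics import mean

# Hypothetical weekly records for developers on a single service:
# merged PR count plus whether the AI setup in question is in place.
weeks = [
    {"dev": "a", "prs_merged": 7, "has_setup": True},
    {"dev": "b", "prs_merged": 6, "has_setup": True},
    {"dev": "c", "prs_merged": 4, "has_setup": False},
    {"dev": "d", "prs_merged": 5, "has_setup": False},
]

with_setup = mean(w["prs_merged"] for w in weeks if w["has_setup"])
without_setup = mean(w["prs_merged"] for w in weeks if not w["has_setup"])
uplift_pct = (with_setup / without_setup - 1) * 100

# The output is the "ships X% faster" claim to validate before standardizing.
print(f"cohort with this setup merges {uplift_pct:.0f}% more PRs per week")
```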
How do we actually transform the work?
The third problem is the one most leaders care about and have the least visibility into: how to fundamentally change how engineering operates with AI, not just nudge a few metrics.
The naturally curious engineers, the so-called 100x crowd, will figure it out on their own: they'll find the right tools, build the right prompts, and pull ahead without much help. The real challenge is the other 80-90% of the team. Getting those engineers to 90x is what determines whether AI compounds across the organization or stays concentrated in a small group of power users.
That requires being deliberate about which tasks are best handled by humans, which by AI, and which by humans working with well-informed AI. It also requires teaching teams what good AI use looks like: applying agents to specific outcomes rather than spending tokens for the sake of it. Token consumption is an input metric; the outputs that matter are throughput, lead time, and quality. Treating token volume as a proxy for transformation produces budget spend without a meaningful change in how the work actually gets done. And you'll be hearing a lot more about the right things to look at in the very near future.
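In the meantime, one minimal way to keep the input-versus-output distinction honest is to put token spend next to the output metrics it is supposed to move. The per-team numbers, names, and thresholds below are invented for illustration; the shape of the check is the point.

```python
# Hypothetical rollout data per team: token spend is the input,
# lead time and throughput changes are the outputs that matter.
teams = [
    {"team": "payments", "tokens_m": 120, "lead_time_pct": -22, "throughput_pct": 18},
    {"team": "search",   "tokens_m": 140, "lead_time_pct": -2,  "throughput_pct": 1},
    {"team": "infra",    "tokens_m": 30,  "lead_time_pct": -15, "throughput_pct": 12},
]

for t in teams:
    # Heavy token use with flat outputs is budget burn, not transformation.
    outputs_flat = t["lead_time_pct"] > -5 and t["throughput_pct"] < 5
    if t["tokens_m"] > 100 and outputs_flat:
        print(f"{t['team']}: high token spend, no movement in lead time or throughput")
```

A check like this won't tell you what to change, but it will tell you where token spend is translating into different work and where it is just a line item.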







