Tokenmaxxing: Why AI token consumption isn't engineering productivity

Tokenmaxxing—treating AI token consumption as a productivity metric—is repeating the lines-of-code mistake. Data from 22,000 developers points to a better way to measure AI engineering impact.


TL;DR: Tokenmaxxing is the practice of treating AI token consumption as a proxy for engineering productivity; the more tokens an engineer burns, the more productive they're assumed to be. It's the AI-era version of measuring developers by lines of code, a vanity metric the industry abandoned decades ago.

Data from 22,000 developers across 4,000 teams shows the problem: AI usage is accelerating throughput (task completion up 34%, epics up 66%), but bugs per developer are up 54%, median review time is up 5x, and code churn has increased 861% in high AI adoption environments. Throughput measures what shipped. It doesn't measure what survived.

Token consumption is an input, not an outcome. Engineering leaders should measure AI's impact on throughput, efficiency, and quality—and treat the gap between rising consumption and flat (or diminishing) outcomes as the signal to act on.

How tokenmaxxing went mainstream

Earlier this month, news leaked that Meta runs an internal AI leaderboard called Claudeonomics, where 85,000 employees compete to be the top AI token consumer. Total consumption hit 60 trillion tokens in a single month, with the top user burning 281 billion on their own. Meta's CTO publicly endorsed one engineer "spending the equivalent of his salary on AI tokens" as a 10x productivity story.

Then, news broke that Uber exhausted its entire 2026 AI budget by April: $3.4 billion in R&D, gone in four months, most of it on Claude Code. Uber's CTO framed the overrun as a productivity win, noting that 11% of backend code is now AI-authored and 95% of engineers use AI tools monthly. Like Meta, Uber runs internal leaderboards ranking engineers by AI usage.

What tokenmaxxing actually measures

This practice of treating AI token consumption as a proxy for engineering productivity is called "tokenmaxxing." The premise is simple: the more tokens an engineer burns, through longer prompts, parallel agents, or higher reasoning tiers, the more productive they're presumed to be. Tokenmaxxing is the AI-era equivalent of measuring developer productivity by lines of code: a vanity metric the industry dismissed decades ago, now being reintroduced under a new (and equally flawed) frame.

Why enterprises default to consumption metrics

But if increased AI usage doesn't necessarily equate to improved productivity or better business outcomes, why has incentivizing higher token consumption become the norm?

For large companies with billions to spend, we believe the aim is maximum usage as a signal that they are "AI-forward." It's essentially a brute-force adoption strategy: encourage engineers to use AI as much as possible to disrupt old workflows and spark hyper-experimentation. Because these companies can afford the overhead, a high-velocity path to competitive advantage makes sense.

On the other hand, companies without billions to spend on AI engineering still face heavy top-down pressure to maximize adoption and prove ROI. When leadership is forced to demonstrate the value of AI coding tools, it is tempting to rely on consumption-based metrics as a proxy for productivity—primarily because truly quantifying AI’s impact remains a significant challenge for enterprises.

What 22,000 developers reveal about AI engineering productivity

We understand the instinct. We also think it's the wrong approach. Here’s why: 

After analyzing two years of data from 22,000 developers across 4,000 teams, we found that AI usage is now the standard and is meaningfully accelerating throughput: task completion up 34%, epics completed per developer up 66%, code-specific tasks up 210%. This is something to celebrate, but activity metrics and leading indicators only tell half the story. 

The downstream numbers tell the other half. Bugs per developer are up 54%. The incident-to-PR ratio has more than tripled. Median review time is up 5x. A staggering 31% more PRs are merging without any review at all. Code churn, the ratio of lines deleted to lines added in a given quarter, has increased 861% with high AI adoption. This shows that while the throughput numbers measure what was shipped, they do not tell you what survived.
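
To make the churn figure concrete, here's a minimal sketch of the ratio exactly as defined above. The function name and the numbers are hypothetical, chosen only for illustration:

```python
def code_churn(lines_added: int, lines_deleted: int) -> float:
    """Churn ratio for a period, per the definition above:
    lines deleted divided by lines added in the same quarter."""
    if lines_added == 0:
        return 0.0
    return lines_deleted / lines_added

# Hypothetical quarter: 100,000 lines added, 40,000 deleted -> churn of 0.4,
# meaning 40% as many lines were torn out as were written that quarter.
print(code_churn(100_000, 40_000))  # 0.4
```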

Even organizations with strong pre-AI engineering maturity show the same pattern. The gap between what AI is producing and what the engineering system can safely absorb is widening as adoption deepens.

How to measure AI's actual engineering impact

To get a better grasp on whether increased AI usage is actually producing the outcomes the company needs, we recommend engineering leaders maintain a balanced view of what productivity means: throughput, efficiency, and quality, each acting as a check on the others.

AI token usage is an input, not an outcome. Outcomes are productivity metrics: Are we delivering faster? Are we delivering more? Are our systems remaining safe, stable, and reliable? Measure AI's impact on these three fronts.

When building a dashboard, consider juxtaposing inputs against outputs. On the consumption side, include the usual suspects: seats activated, tokens consumed, and so on. On the other side, include the metrics that actually tell you whether AI is helping, normalized per unit of value delivered. At the same time, keep a close eye on AI's "bad habits," like doing more than asked (files touched per PR) and being too verbose (PR size), to ensure these behaviors aren't doing damage, wasting developer time, or driving rework rates high enough to negate AI's benefits.

If AI consumption is climbing and any of the impact-side metrics are trending the wrong way, you don't have a productivity story, just a volume story. Treat the gap between the two as the signal.
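
As a concrete illustration, here's a minimal sketch of that inputs-versus-impact view. Everything here is hypothetical: the QuarterSnapshot fields, the numbers, and the threshold logic are illustrative assumptions, not any product's schema.

```python
from dataclasses import dataclass

@dataclass
class QuarterSnapshot:
    # Consumption side: inputs
    tokens_consumed: int
    seats_activated: int
    # Impact side: raw outcomes, normalized below per unit of value
    prs_merged: int
    bugs_filed: int
    median_review_hours: float
    lines_added: int
    lines_deleted: int

def impact_view(q: QuarterSnapshot) -> dict[str, float]:
    """Normalize outcomes per merged PR so rising consumption can be
    compared against what it actually buys."""
    return {
        "tokens_per_merged_pr": q.tokens_consumed / q.prs_merged,
        "bugs_per_merged_pr": q.bugs_filed / q.prs_merged,
        "median_review_hours": q.median_review_hours,
        "churn_ratio": q.lines_deleted / q.lines_added,
    }

def volume_not_productivity(prev: dict[str, float], curr: dict[str, float]) -> bool:
    """The gap described above: consumption per unit of value is climbing
    while a quality signal trends the wrong way."""
    return curr["tokens_per_merged_pr"] > prev["tokens_per_merged_pr"] and (
        curr["bugs_per_merged_pr"] > prev["bugs_per_merged_pr"]
        or curr["churn_ratio"] > prev["churn_ratio"]
    )

# Two hypothetical quarters: tokens per PR more than doubles while bugs
# per PR and churn both rise, so this flags a volume story, not productivity.
q1 = impact_view(QuarterSnapshot(8_000_000_000, 400, 5_000, 900, 6.0, 2_000_000, 300_000))
q2 = impact_view(QuarterSnapshot(20_000_000_000, 450, 5_400, 1_500, 9.0, 2_600_000, 900_000))
print(volume_not_productivity(q1, q2))  # True
```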

Act on the gap, don't scale through it

Next, act on that gap. Don't scale through it. When usage is up and outcomes are flat (or worse, declining), the instinct is to add more to the mix: more AI tools, more reviewers, more enablement, more training. We'd suggest the opposite. Segment AI usage by team, work type, repo, and vendor, then determine which use cases are actually producing positive outcomes.
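
Here's a minimal sketch of that segmentation, assuming hypothetical per-PR telemetry; the team, vendor, and column names are illustrative, not any vendor's schema.

```python
import pandas as pd

# Hypothetical per-PR records joining AI-usage telemetry with SCM data.
prs = pd.DataFrame({
    "team": ["payments", "payments", "search", "search", "infra", "infra"],
    "ai_vendor": ["vendor_a", "vendor_b", "vendor_a", "vendor_b", "vendor_a", "vendor_b"],
    "tokens": [2_000_000, 500_000, 3_000_000, 250_000, 1_200_000, 800_000],
    "merged": [1, 1, 1, 0, 1, 1],
    "bugs_linked": [0, 1, 3, 0, 0, 2],
})

# Outcomes per segment: which team/vendor combinations actually produce results?
segments = prs.groupby(["team", "ai_vendor"]).agg(
    tokens=("tokens", "sum"),
    merged=("merged", "sum"),
    bugs=("bugs_linked", "sum"),
)
merged = segments["merged"].replace(0, float("nan"))  # avoid divide-by-zero
segments["tokens_per_merge"] = segments["tokens"] / merged
segments["bugs_per_merge"] = segments["bugs"] / merged

# Worst consumption-to-outcome ratios float to the top for review.
print(segments.sort_values("tokens_per_merge", ascending=False))
```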

In most enterprise orgs, the picture is uneven: a few workflows ship real gains, a long tail produces noise, and a handful actively make things worse. With that segmentation in hand, the picture becomes actionable. From there, you can rationalize tools, standardize on lower-tier models where possible (while sustaining gains), and reinvest the reclaimed spend in the scaffolding that makes AI work at scale: context provisioning, codebase standards, governance and guardrails, and the retrospective loops that make the next cycle smarter.

Since most companies can't afford tokenmaxxing as their AI strategy, pushing back against consumption-first directives is the only path to the outcomes leaders actually want: better quality, faster cycles, more predictable budgets, and an AI engineering system that isn't buckling under its own output. The Acceleration Whiplash data is already clear on this: more doesn't mean better. It's time our metrics caught up.

Neely Dunlap

Neely Dunlap is a content strategist at Faros who writes about AI and software engineering.
