TL;DR: AI token spend is the fastest-growing line item in software engineering, and most organizations have no way to connect it to outcomes. Faros’s Field Guide to Measuring Token Efficiency identifies three AI outcome signals and 11 guardrail metrics that tie AI spend to the decisions engineering leaders are being asked to make right now.
The questions Finance is asking about AI in software engineering
We are only halfway through 2026, and AI token spend is already breaking budgets. Engineering organizations are grappling with skyrocketing AI costs, often from practices like tokenmaxxing, and some have burned through an entire annual budget within a couple of months. With all this spend, Finance is already asking the hard questions about AI in software engineering: Is the spend justified? Is it going toward the right things? Is it advancing real business outcomes?
These questions show up in budget reviews, vendor renewals, and board-level conversations about whether AI engineering investments are producing results that justify the current token expenditure. Engineering leaders who can answer these questions are equipped to make better decisions about which tools are turning tokens into outcomes, which practices are blowing up budgets without results, and where to add controls before problems compound.
From AI adoption and token spend to measuring what AI actually shipped
The AI Engineering Report 2026 - Acceleration Whiplash documented what three years of AI adoption have actually produced at scale. AI agents are the new normal, and 60% of what they suggest is accepted into codebases. Throughput is up, but code quality is declining precipitously, and the gap between the two is widening.
Most engineering organizations cannot yet answer the questions this reality demands: Is AI delivering outcomes or slop? Are token budgets justified or reckless? Are we being strategic and efficient, or getting carried away by hype?
The answers to these questions now run through token spend, and they require a measurement foundation to trace AI dollars to shipped outcomes.
That’s why we wrote The Field Guide to Measuring Token Efficiency in AI Engineering. It provides the three AI outcome signals and 11 guardrail metrics you need to move from AI usage to measurement, and from token spend to accountable outcomes.
The four categories that connect AI dollars to decisions
Observability into AI’s impact is the necessary first step to optimizing and governing it. To understand the full picture, you need to measure key metrics across these four categories:
Outcomes: Is AI delivering real business outcomes?
This is the category most organizations have the least visibility into, and the one Finance cares most about. Most teams can tell you how many tokens they consumed last quarter; few can tell you what those tokens produced. Closing that gap is what turns a cost conversation into an investment conversation.
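As a rough illustration of closing that gap, the sketch below divides a team's quarterly token spend by one outcome proxy. The record shape, the field names, the sample figures, and the choice of merged PRs as the outcome are all assumptions made for this example; they are not the outcome signals defined in the guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamQuarter:
    """Hypothetical per-team record; field names are illustrative only."""
    team: str
    token_cost_usd: float  # assumed: billed token spend for the quarter
    merged_prs: int        # assumed outcome proxy, not the guide's definition


def cost_per_outcome(row: TeamQuarter) -> Optional[float]:
    """Return token dollars per merged PR, or None when nothing shipped."""
    if row.merged_prs == 0:
        return None  # spend with no shipped outcome; flag it rather than divide
    return row.token_cost_usd / row.merged_prs


# Made-up sample data for the sketch.
rows = [
    TeamQuarter("payments", 12000.0, 300),
    TeamQuarter("search", 9000.0, 0),
]
report = {r.team: cost_per_outcome(r) for r in rows}
```

A team that spends without shipping shows up as None instead of a misleading number, which is the kind of case a budget review would likely want surfaced first.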
Adoption: Are your tools being used to their full potential?
Most organizations are paying for AI tools that significant portions of their engineering teams barely touch. Before you can evaluate whether a tool is delivering value, you need to know who is actually using it, how deeply, and whether that usage pattern justifies what you are paying.
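One way to make the adoption question concrete is to compare licensed seats with active users and to restate the bill per active user. This is a minimal sketch under stated assumptions: the function names, the inputs, and the sample figures are hypothetical, and how "active" is defined (for example, any use within 30 days) is left to the reader.

```python
def seat_utilization(licensed_seats: int, active_users: int) -> float:
    """Share of paid seats with at least one active user (0.0 to 1.0)."""
    if licensed_seats <= 0:
        raise ValueError("licensed_seats must be positive")
    return active_users / licensed_seats


def cost_per_active_user(monthly_cost_usd: float, active_users: int) -> float:
    """Monthly tool cost spread over the people who actually use it."""
    if active_users <= 0:
        raise ValueError("no active users; the tool is unused")
    return monthly_cost_usd / active_users


# Made-up example: 100 seats, 40 active users, $3,900 per month.
utilization = seat_utilization(100, 40)
effective_cost = cost_per_active_user(3900.0, 40)
```

Reading the two numbers together shows how far the effective price per engaged engineer sits above the list price per seat.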
Productivity: What are your tools producing, and how efficiently?
The AI Engineering Report 2026 found epics per developer up 66% and task throughput up 33.7% under high adoption. Those gains are real, but lead time rose 480% over the same period. Understanding both sides of that equation, what is being produced and where the pipeline loses speed, is what separates a tool worth expanding from one worth cutting.
Quality: What must you stay vigilant about?
AI-generated code is often superficially convincing: well-named, idiomatic, stylistically consistent. Structural and logical failures sit underneath and tend to surface in production. The report found bugs per developer up 54%, the incidents-to-PR ratio up 242%, and PRs merged without review up 31%. Seeing these signals by team and repo is how you know where to add controls before problems compound.
In the guide, each of the 14 metrics across these four categories is mapped to its data source (version control, work management, AI tool telemetry, CI/CD, incident management) so you know exactly what instrumentation is required and where to start.
How to get ahead before the next AI budget conversation
Tool rationalization, vendor renegotiations, budget justification, and headcount strategy all require data in a connected, actionable form. The field guide gives you a concrete place to start: 14 metrics, each mapped to a decision and a data source, organized so you can begin with the category where your visibility is lowest and your decisions are most immediate.
Whether you are preparing for a vendor renewal, building the case for a tool expansion, or answering Finance’s questions about what AI spend is producing, this guide is designed to get you from “we think it’s working” to “here’s what the data shows.”
Get your copy of the Field Guide to Measuring Token Efficiency today.







