Running an AI engineering program starts with the right metrics

Track AI tool adoption, measure ROI, and manage spend across your entire engineering org. New: Experiments, MCP server, expanded AI tool coverage.


AI is the primary author. Now comes the hard part.

AI coding tools are now a standard part of how software gets built. But the organizations running them are discovering that tracking AI adoption was the easy part. Throughput is up, but so are incidents, unreviewed merges, and token spend that can spiral fast. The challenge has shifted from "should we use AI" to "how do we actually run this."

Our recent research on AI engineering impact, The Acceleration Whiplash, put data behind what engineering leaders are already sensing: AI has become the primary author of code, and the systems built around human-paced development weren't designed to absorb what it's producing. The productivity gains are real. So is the pressure to manage the program more deliberately.

Faros is built for that second problem. It models engineering as a system — across code, people, tools, and spend — and gives the leaders responsible for outcomes the visibility and control to manage it. Our latest release extends that capability in every direction.

This release introduces new capabilities: 

  • Experiments, which give engineering leaders a structured framework for measuring the impact of AI program changes
  • Expanded AI tool coverage across Claude Code, Amazon Q Developer, and Kiro
  • Role-based experiences and custom roles, so the most relevant insights are served to every stakeholder when they log in
  • The Faros MCP server, which lets AI agents like ChatGPT, Claude, and Perplexity access, analyze, and act on your engineering data in the conversational interfaces users now prefer

Experiments: the right unit of AI management is a controlled change, not a rollout

Most organizations think about AI management in terms of tools: we adopted Claude Code, we're evaluating Windsurf. But the real decisions are more granular. We switched models. We capped consumption for certain teams. We made prompting guidance mandatory. We introduced PR size limits. Did those changes improve outcomes? 

Experiments give you a structured before/after framework to find out. Define what you're testing and set an observation window. Faros tracks results across your key metrics, including speed (PR cycle time, time to first review), throughput (weekly tasks completed, PRs merged, and story points completed), and quality (PR size, bug ratio). Faros summarizes what moved, what didn't, and by how much.

You can scope an experiment to a group of teams or a single team. The setup is lightweight by design: the goal is a decision, not a research project.
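
To make this concrete, here is what an experiment definition could look like expressed as data. This is an illustrative sketch, not Faros's actual schema; every field name and value below is an assumption based on the capabilities described above.

```json
{
  "name": "Claude Sonnet vs. GPT Codex",
  "hypothesis": "Switching the default model improves speed without hurting quality",
  "scope": { "teams": ["payments", "checkout"] },
  "window": { "before": "2025-10-01/2025-12-31", "after": "2026-01-05/2026-03-31" },
  "metrics": [
    "pr_cycle_time",
    "time_to_first_review",
    "weekly_prs_merged",
    "pr_size",
    "bug_ratio"
  ]
}
```

However it is actually entered, the essentials are the same: a named change, a scoped population, an observation window, and the metrics that will decide the outcome.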

Some experiments to consider running right now:

  • Model switching. Your team is evaluating whether to move from GPT Codex to Claude Sonnet as your default model. Which produces better outcomes on your codebase?
  • Consumption caps. A high-spend team is burning tokens faster than their output justifies. Cap their consumption and measure whether throughput actually drops or whether the team adapts without missing a beat. Either answer sharpens your AI tool spend management.
  • Junior engineer development. Engineering organizations need engineers who develop enough independent judgment to eventually oversee AI-generated code, not just consume it. Restrict AI access for a cohort on specific task types and track how their code quality, rework rates, and review turnaround evolve over 6 and 12 months compared to a control group. The engineers most valuable in three years will be the ones who learned when to trust AI and when to push back.
Faros screenshot of experiment tracking. Experiment name: Claude Sonnet vs. GPT Codex. Result: Positive Impact. Timeline: Jan 5, 2026 – Mar 31, 2026. An AI-generated summary describes the findings, and metric changes such as PR cycle time, Weekly PRs merged, Bug ratio, and Weekly tasks completed are color-coded and detailed. At the bottom, a button reads "View Full Analysis".
Before and after switching from GPT Codex to Claude Sonnet — Faros experiments surface the productivity and quality changes that drive smarter AI tool spend management.

Complete AI coverage: Claude Code OTEL, Amazon Q Developer, and Kiro

Runaway spend happens in the gaps. If a tool isn't instrumented, you don't know what it's consuming or what it's producing.

This release closes the major remaining gaps.

Claude Code via OTEL. Faros already supports Claude Code for teams on the Claude Console and Claude Enterprise plan. The new OpenTelemetry path closes the gap for teams running Claude Code against Amazon Bedrock, Google Vertex AI, or any custom model provider. A handful of environment variables in a managed settings file — deployable via MDM — is all it takes. Once configured, Claude Code streams rich per-developer telemetry to Faros: sessions, lines added and removed, commits and pull requests, tokens consumed, cost, active time, and tool acceptance rates. One consistent view of Claude Code adoption and ROI across your entire developer base, regardless of how the tool is hosted, authenticated, or billed.
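
As a sketch of what that managed settings file can contain, the snippet below uses Claude Code's documented OpenTelemetry environment variables. The endpoint and authorization values are placeholders; substitute the collector or ingest details from your own setup.

```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://YOUR_COLLECTOR_HOST:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer YOUR_TOKEN"
  }
}
```

Push the file to developer machines through your MDM and telemetry starts flowing; no per-developer action is required.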

Amazon Q Developer and Kiro. Both are now fully supported in Faros. Teams standardized on AWS developer tooling can now track adoption, quantify time savings, and benchmark AI coding tool productivity gains for Q Developer and Kiro alongside Claude Code, Cursor, GitHub Copilot, and Windsurf, with the same metrics, on the same platform.

Role-based landing pages and custom roles: Faros for every leader, at any scale

Role-based experiences. Every out-of-the-box role in Faros — viewer, analyst, engineering manager, executive, and admin — has a tailored experience. Each role starts with a pre-built landing page designed for their needs: recommended dashboards, recently viewed content, documentation, support, and education resources. No setup required. Every organization can customize from there.

Custom roles. Organizations can also now define entirely new roles with curated experiences (think AI Officer, Security Officer, Product Manager), each with its own name, permission set, landing page, and favorites. For large enterprises where the same role title means different things in different parts of the business, this means modeling your actual stakeholders rather than approximating them with standard templates.

MCP server for engineering data: reason over and operationalize your insights

Your engineering data lives in Faros. Now your AI agent can work with it the same way it works with anything else: combine it with outside sources, cross-reference it, ask follow-up questions, and build automated workflows.

This is made possible by the Model Context Protocol, the open standard for connecting AI tools like Claude, ChatGPT, and Gemini to external data. When you connect Faros via MCP, your agent isn't pulling raw events from Git or Jira. It's querying data that has already been structured, normalized, unified, and attributed across your entire SDLC: lead time, deployment frequency, AI tool adoption, incident rates, token consumption. It also receives the metadata to understand what it's looking at. That's what makes the answers trustworthy enough to act on.

And once you're running experiments, the MCP server gives you a way to interrogate those results in real time, dig into the details behind the numbers, and build the automated monitoring workflows that keep running after the experiment closes.

How engineering leaders are using it

Here are four sample scenarios that show what this looks like in practice.

  1. Forecasting AI spend after a pricing change. A vendor just updated their pricing. An engineering leader opens their AI tool, asks Faros for current token consumption by team, pastes in the new rate card, and asks the agent to model what they'd pay under current usage versus projected growth. Then they ask which teams are generating the least net ROI per token. A budget analysis that would have taken a week now takes a conversation.
  2. Combining engineering data with sources that live outside Faros. The L&D team shares a spreadsheet of AI coding tool training completion by engineer, broken down by training type — prompt engineering, tool-specific courses for Java developers, general AI literacy. Cross it with Faros data on AI tool acceptance rates and throughput for that same cohort. See not just whether training moves the needle, but which kind moves it most, and for whom. Or: compare contractor billing records from finance with Faros metrics to get a cost-per-outcome view across your workforce. Identify where the economics of those relationships still make sense.
  3. Preparing for performance reviews. Review season is here. An engineering manager opens their AI tool, asks Faros for each direct report's contributions, collaboration signals, and AI utilization over the last two quarters, then uploads the peer feedback from their 360 reviews. They ask the agent to benchmark each engineer against peers at the same level across the org. A complete, objective picture of every team member — what they shipped, how they work, how they've adopted AI, and how they stack up — ready before the first conversation starts.
  4. Investigating quality risk and building a recurring workflow. A director just read The Acceleration Whiplash report and wants to know if her teams are experiencing the same quality decline. She opens Claude, connected to Faros, identifies which teams are seeing higher bug rates, drafts a Slack message to explore solutions, and sets up a weekly agent that alerts her if the trend continues. From seeing the problem to putting guardrails around it, all in a single chat.

Watch the demo to see how this investigation unfolds in a Claude conversation, from the first question to the recurring agent.

Setup and security

The MCP server works with any MCP-compatible client — Claude Desktop, Cursor, Claude Code, Windsurf, and others. Setup takes about a minute. Every query is scoped to the requesting user's existing Faros permissions: an exec sees executive-level data, a viewer sees their view. No new data exposure, no admin escalation required.
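
For stdio-based clients, connecting is typically a small entry in the client's MCP configuration file. The sketch below assumes a remote server bridged through the open-source mcp-remote package; the server URL is a placeholder, and the exact connection details come from the Faros documentation.

```json
{
  "mcpServers": {
    "faros": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://YOUR_FAROS_MCP_URL"]
    }
  }
}
```

Once connected, the client discovers the server's tools automatically, and every query runs under your existing Faros credentials.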

AI tool spend without outcomes is just a cost

The organizations getting AI right aren't the ones with the most tools or the highest adoption rates. They're the ones who can see what their AI program is producing and act on it.

Faros gives engineering leaders both: visibility into what AI is consuming across every tool in your stack, and the control to ensure it's working. That means complete AI coding tool productivity measurement, experiments that replace assumptions with evidence, and the ability to reason over your engineering data from inside the AI tools your teams already use.

To see these features in action, schedule a demo, or reach out to your Faros account team.
