You can now measure Claude Code token usage, costs by model, and output metrics like commits and PRs. Learn how engineering leaders connect these inputs to leading and lagging indicators like PR review time, lead time, and CFR to evaluate the true ROI of AI coding tool and model choices.

If you've spent any time on Reddit's AI development forums lately, you've seen the frustration firsthand. Developers hitting their Claude Code limits mid-session, burning through $20 in a day when they expected to spend that much in a month, and waking up early just to reset their 5-hour usage windows before the workday starts.
One developer put it bluntly: "4 hours of usage gone in 3 prompts. Used plan mode to refactor a frontend architecture. Worst part is I just re-subscribed to Claude Code after a few months of Codex usage. Used 11% of my weekly credits." Another observed that switching to Opus 4.5 caused their "current session to burn so quickly" compared to Sonnet.
While some of these issues have been addressed by a price correction and removal of Opus-specific caps, they're symptoms of a broader challenge facing engineering organizations: AI coding tools are becoming essential, but the costs are unpredictable, the limits are opaque, and the connection between usage and actual productivity remains murky.
There is some good news on this front: Token usage data is now available through the Claude Code API. You can track estimated costs, monitor tokens by model, and see usage patterns over time.
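If you want to pull that data programmatically rather than reading it off a dashboard, a minimal sketch might look like this. It assumes an organization-level admin API key and a Claude Code usage-report endpoint; confirm the exact path, parameter names, and response fields against the current Anthropic Admin API documentation.

```python
import os
import requests

# Assumption: the Admin API exposes a Claude Code usage report at this path.
# Verify the exact endpoint, parameters, and response schema in Anthropic's docs.
BASE_URL = "https://api.anthropic.com/v1/organizations/usage_report/claude_code"

headers = {
    "x-api-key": os.environ["ANTHROPIC_ADMIN_KEY"],  # org admin key, not a regular API key
    "anthropic-version": "2023-06-01",
}

# Hypothetical query: daily usage since the start of the month.
params = {"starting_at": "2025-12-01"}

resp = requests.get(BASE_URL, headers=headers, params=params)
resp.raise_for_status()

for record in resp.json().get("data", []):
    # Field names below are illustrative; adjust to the actual schema.
    print(
        record.get("date"),
        record.get("model"),
        record.get("estimated_cost"),
        record.get("total_tokens"),
    )
```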
But if you're only looking at tokens and dollars, you're missing the point. The real question isn't how much you're spending. It's whether that spend is delivering impact.
Engineering teams are deploying GitHub Copilot, Cursor, Windsurf, Claude Code, and other AI coding assistants under the watchful eyes of executives who expect significant productivity gains. The challenge is that most organizations lack the infrastructure to actually measure whether those gains are materializing.
Faros AI provides the measurement layer that connects AI tool usage to real engineering outcomes. The platform integrates data from source control, project management, CI/CD pipelines, incident tracking, security scanning, and HR systems to create a unified view of how AI tools are affecting your software delivery lifecycle. Rather than relying on vendor-reported metrics or developer self-assessments, Faros AI traces the actual downstream impact of AI-generated code on velocity, quality, security, and developer satisfaction.
The AI Transformation solution provides visibility across the full value journey, from initial pilot to large-scale rollout to ongoing optimization. You can track adoption metrics per developer and team, measure acceptance rates and time savings, identify unused licenses and power users, and compare tool effectiveness across different coding assistants. Critically, Faros AI applies causal analysis to separate AI's true effect from confounding factors like team composition, project complexity, and developer seniority, so you know whether performance changes are genuinely attributable to AI or driven by something else entirely.
For organizations seeking to rapidly assess their AI maturity and plan concrete next steps, the GAINS™ framework measures performance across ten dimensions that define engineering readiness for AI: adoption, usage, change management, velocity, quality, security, cost efficiency, satisfaction, onboarding, and organizational efficiency. Each dimension ties AI usage to business performance, quantifying what's working and where value is being lost.
Faros AI was recently recognized as the 2025 Microsoft Partner of the Year for Startups for its work helping enterprise software engineering organizations measure AI productivity gains. Companies like Autodesk, Discord, and Vimeo use Faros AI to become data-driven when it comes to engineering productivity, delivery, outcomes, and AI transformation.
Let's start with the mechanics. At the time of this writing, Claude Code operates on a 5-hour rolling window that begins with your first message in a session. Your token allocation depends on your plan: Pro users get approximately 44,000 tokens per window, Max5 users get around 88,000, and Max20 users receive roughly 220,000 tokens.
These limits reset every 5 hours, but here's where it gets complicated. Starting in August 2025, Anthropic introduced weekly limits on top of the 5-hour windows. This was a response to a small number of users who were, as Anthropic put it, consuming resources at unsustainable rates.
Model selection matters significantly. Opus 4.5 is a premium model with higher per-token costs than Sonnet 4.5 (about 1.7× on list pricing), and because usage limits are tied to underlying compute, heavy Opus use draws down your Pro or Max allocation much faster than Sonnet-only usage. If you're running complex, multi-file agentic workflows with Opus, you'll hit your limits much sooner than you might expect. One developer reported that features like "Explore agents" and "Plan agents" in recent updates were burning through tokens at rates they'd never seen before.
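To make that gap concrete, here's a back-of-the-envelope comparison. The per-million-token prices below are assumptions based on list pricing at the time of writing (Opus 4.5 at $5 input / $25 output, Sonnet 4.5 at $3 / $15), so check current pricing before budgeting.

```python
# Rough cost comparison for a single heavy agentic session.
# Prices are per million tokens, assumed from list pricing at time of writing.
PRICES = {
    "opus-4.5":   {"input": 5.00, "output": 25.00},
    "sonnet-4.5": {"input": 3.00, "output": 15.00},
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Hypothetical refactoring session: 2M input tokens (context reads), 300K output tokens.
for model in PRICES:
    print(model, round(session_cost(model, 2_000_000, 300_000), 2))
# opus-4.5 ~$17.50 vs sonnet-4.5 ~$10.50 -- roughly the 1.7x gap on the same workload.
```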
The Claude Code API now provides visibility into several key metrics. You can track estimated cost and tokens used over time, measure total tokens by model, and access usage patterns. Organizations already using Claude Code have also been able to track active users and sessions, acceptance rates, the number of Claude Code commits and pull requests, and lines of code added and removed.
This is useful data. But it's only the beginning of what you need to measure.
Here's something that doesn't get discussed enough: the AI coding tool landscape changes every few months. Models get updated, pricing structures shift, and capabilities expand in ways that can dramatically alter your cost-to-value equation.
Consider what has happened in 2025 alone. Anthropic released Opus 4.5 with significantly different resource consumption patterns. Weekly limits were introduced. Enterprise governance features and a Compliance API rolled out. Each of these changes affected how organizations should think about deployment, cost management, and measurement.
What worked for your team last quarter may not work next quarter. The governance structures and cost controls you set up six months ago probably need revisiting. Complacency is the enemy here. You need continuous monitoring, not a one-time evaluation.
This is true not just for Claude Code, but across the entire AI coding assistant landscape. GitHub Copilot, Cursor, Windsurf, and others are all evolving rapidly. The tool that delivers the best ROI today may not be the same one that delivers the best ROI in six months. Engineering leaders who treat AI tool selection as a set-it-and-forget-it decision are setting themselves up for unpleasant surprises.
Now we get to the uncomfortable truth. More code doesn't mean more value.
Faros AI's AI Productivity Paradox research analyzed telemetry from over 10,000 developers across 1,255 teams. The findings reveal a fundamental mismatch between individual output and organizational outcomes.
Teams with heavy AI tool usage completed 21% more tasks and merged 98% more pull requests. On the surface, that sounds like a win. But their PR review times increased by 91%. The code was getting written faster, but the bottleneck just moved downstream. Developers were throwing code over the wall faster, and the review queue on the other side kept getting longer.
This isn't just a Faros finding. A randomized controlled trial published in July 2025 found that experienced open-source developers using AI tools actually took 19% longer to complete tasks on their own repositories. Yet those same developers believed AI had made them 20% faster. The perception gap was nearly 40 percentage points.
The lesson here is clear: if you're only tracking token usage and cost per developer, you're measuring inputs, not outcomes. You might be optimizing for the wrong thing.
If you're already using Claude Code or other AI coding assistants, you should be capturing a comprehensive set of metrics. Here's what visibility looks like when you're doing it right.
Track active users and sessions over time to understand adoption patterns. Are developers actually using the tools consistently, or is usage sporadic?

A best practice is to also analyze this data by team to identify which groups are getting the most value and which might need additional enablement or training.
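As a rough illustration of that team-level cut, the sketch below assumes you've exported session-level usage events to a CSV with hypothetical developer, team, session_id, and date columns.

```python
import pandas as pd

# Hypothetical export of Claude Code session events; column names are illustrative.
sessions = pd.read_csv("claude_code_sessions.csv", parse_dates=["date"])

# Weekly active users and session counts per team.
weekly = (
    sessions
    .assign(week=sessions["date"].dt.to_period("W"))
    .groupby(["team", "week"])
    .agg(active_users=("developer", "nunique"), sessions=("session_id", "nunique"))
    .reset_index()
)

# Teams whose adoption is sporadic (active in fewer than half of the observed weeks)
# are candidates for extra enablement or training.
weeks_observed = weekly["week"].nunique()
coverage = weekly.groupby("team")["week"].nunique() / weeks_observed
print(coverage[coverage < 0.5].sort_values())
```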

Tool usage breakdown matters too. Claude Code uses different internal tools for different operations, and understanding which tools are being invoked can help you understand how developers are actually working with the AI. Are they primarily using it for multi-file edits, notebook interactions, or straightforward code generation?

Total tokens used by model gives you visibility into whether developers are appropriately selecting Sonnet versus Opus for their tasks. If most of your token consumption is going to Opus when Sonnet would suffice, you have an optimization opportunity.
Track estimated cost over time to spot trends and anomalies. Look at average estimated cost per commit to understand efficiency. If cost per commit is trending upward without a corresponding increase in commit complexity or value, something may be wrong with how developers are prompting or configuring their workflows.
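Here's a sketch of both cuts, again assuming hypothetical exports of usage records (model and token counts) and commit records (with an estimated cost attributed to each commit):

```python
import pandas as pd

# Hypothetical exports; column names are illustrative.
usage = pd.read_csv("claude_code_usage.csv")      # developer, model, tokens, estimated_cost, date
commits = pd.read_csv("claude_code_commits.csv")  # developer, commit_sha, session_cost, date

# Share of total tokens going to each model -- is Opus being used where Sonnet would do?
token_share = usage.groupby("model")["tokens"].sum()
print((token_share / token_share.sum()).round(2))

# Average estimated cost per commit, per developer, to spot efficiency outliers.
cost_per_commit = (
    commits.groupby("developer")
    .agg(total_cost=("session_cost", "sum"), commits=("commit_sha", "nunique"))
    .assign(cost_per_commit=lambda d: d["total_cost"] / d["commits"])
    .sort_values("cost_per_commit", ascending=False)
)
print(cost_per_commit.head(10))
```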

Acceptance rate tells you how often developers are actually using what Claude Code generates. A low acceptance rate might indicate poor prompt quality, misaligned model selection, or tasks that aren't well-suited for AI assistance.
Track the number of commits and pull requests originating from Claude Code sessions. Monitor lines of code added and removed to understand the scope of AI-generated changes. Look at PRs per team and PRs per developer to understand productivity distribution.
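A minimal sketch of those output metrics, assuming suggestion-level events and PR records with the illustrative fields noted in the comments:

```python
import pandas as pd

# Hypothetical exports; field names are illustrative.
suggestions = pd.read_csv("claude_code_suggestions.csv")  # developer, accepted (bool)
prs = pd.read_csv("claude_code_prs.csv")                  # developer, team, pr_number, lines_added, lines_removed

# Acceptance rate per developer: low values may signal prompt or model-fit problems.
acceptance = suggestions.groupby("developer")["accepted"].mean().sort_values()
print(acceptance.head(10))

# PRs per developer and change size, to see how AI-assisted output is distributed.
pr_stats = prs.groupby("developer").agg(
    prs=("pr_number", "nunique"),
    lines_added=("lines_added", "sum"),
    lines_removed=("lines_removed", "sum"),
)
print(pr_stats.describe())
```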

These metrics give you the "what" of AI tool usage. But to understand the "so what," you need to connect them to impact metrics.
To know whether your AI investment is working, you need to track both leading and lagging indicators. Leading indicators tell you if you're on the right track. Lagging indicators tell you if you've arrived.
Throughput metrics show you how work is flowing through your system. PR Merge Rate indicates how quickly code is moving from creation to integration. PR Review Time reveals whether AI-generated code is creating bottlenecks for reviewers. PR Size matters because larger PRs are harder to review and more likely to introduce defects, and AI tools have a tendency to generate oversized changes.

Pre-production quality metrics at this stage include code smells detected in AI-generated code and code coverage for AI-assisted changes. If AI-generated code is introducing more code smells or shipping with lower test coverage, you're trading short-term velocity for long-term maintenance burden.
Velocity metrics capture actual delivery outcomes. Task Throughput measures how many units of work are getting done. Lead Time tracks the end-to-end time from work starting to work shipping. Deployment Frequency indicates how often you're actually getting value to production.
Production quality metrics at this stage reveal the downstream consequences of your development practices. Change Failure Rate (CFR) tells you how often deployments cause problems. Mean Time to Recovery (MTTR) shows how quickly you can fix issues when they occur. Bugs per Developer and Incidents per Developer help you understand whether individual productivity gains are coming at the cost of quality. Rework Rate reveals whether AI-generated code is requiring more revision than human-authored code.
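As a minimal sketch, here's how a few of these lagging indicators fall out of deployment and incident records. The field names are hypothetical; in practice a platform like Faros AI derives them automatically from your CI/CD and incident tooling.

```python
from datetime import datetime, timedelta

# Hypothetical deployment and incident records.
deployments = [
    {"id": "d1", "at": datetime(2025, 12, 1, 10), "caused_failure": False},
    {"id": "d2", "at": datetime(2025, 12, 2, 15), "caused_failure": True},
    {"id": "d3", "at": datetime(2025, 12, 4, 9),  "caused_failure": False},
]
incidents = [
    {"opened": datetime(2025, 12, 2, 16), "resolved": datetime(2025, 12, 2, 19)},
]

# Change Failure Rate: share of deployments that caused a production problem.
cfr = sum(d["caused_failure"] for d in deployments) / len(deployments)

# Mean Time to Recovery: average time from incident open to resolution.
mttr = sum((i["resolved"] - i["opened"] for i in incidents), timedelta()) / len(incidents)

# Deployment frequency over the observed window.
window_days = (max(d["at"] for d in deployments) - min(d["at"] for d in deployments)).days or 1
deploys_per_day = len(deployments) / window_days

print(f"CFR: {cfr:.0%}, MTTR: {mttr}, deploys/day: {deploys_per_day:.2f}")
```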

The key is connecting these metrics across the full lifecycle. You want to see whether increases in AI usage and output are translating into improvements in delivery velocity and quality, or whether gains in one area are being offset by degradation in another.
Satisfaction metrics matter alongside telemetry. If developers are reporting that AI tools are frustrating to use, require excessive prompting, or generate code that needs heavy editing, that's signal you can't get from usage data alone.
Cost per developer is only part of the equation. The average cost for Claude Code runs around $6 per developer per day, with 90% of users staying below $12. For team deployments using the API, expect roughly $100-200 per developer per month with Sonnet 4.5, though there's significant variance based on usage intensity and whether developers are running multiple instances.
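Those two figures are roughly consistent with each other, as a quick extrapolation shows (the number of working days per month is an assumption):

```python
# Back-of-the-envelope: does ~$6/developer/day line up with $100-200/developer/month?
avg_daily_cost = 6.0         # reported average per active developer per day
p90_daily_cost = 12.0        # 90% of users stay below this
working_days_per_month = 21  # assumption

print(avg_daily_cost * working_days_per_month)  # ~$126/month, inside the $100-200 range
print(p90_daily_cost * working_days_per_month)  # ~$252/month for the heaviest users
```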
But here's the real question: is that spend worth it?
To answer that, you need to compare tool effectiveness across your portfolio. If you're using GitHub Copilot for some teams, Cursor for others, and Claude Code for others still, you need a unified view of how each tool is performing relative to cost.
A/B testing and cohort analysis help you isolate the impact of specific tools. One data protection company ran a bake-off between GitHub Copilot and Amazon Q Developer, measuring adoption, usage, and downstream productivity impacts. They found 2x higher adoption and user acceptance with their chosen tool, 3 additional hours saved per developer per week, and 40% higher ROI compared to the alternative. That kind of rigorous comparison is what separates organizations that are genuinely optimizing from those that are just guessing.
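A simple cohort comparison along those lines might look like the sketch below. The tool labels, seat prices, and CSV layout are illustrative, and a rigorous evaluation would also control for team composition and project mix, as Faros AI's causal analysis does.

```python
import pandas as pd

# Hypothetical per-developer metrics, tagged with the coding assistant each cohort uses.
df = pd.read_csv("tool_bakeoff.csv")  # developer, tool, prs_merged, review_hours, hours_saved_per_week

comparison = df.groupby("tool").agg(
    developers=("developer", "nunique"),
    prs_merged=("prs_merged", "mean"),
    review_hours=("review_hours", "mean"),
    hours_saved=("hours_saved_per_week", "mean"),
)
print(comparison)

# Pair with license cost per seat to get a rough value-per-dollar comparison.
seat_cost = {"tool_a": 19.0, "tool_b": 39.0}  # illustrative monthly prices
comparison["hours_saved_per_dollar"] = comparison["hours_saved"] / comparison.index.map(seat_cost)
print(comparison["hours_saved_per_dollar"])
```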
Connect usage to business outcomes wherever possible. A software company with 300 engineers used comprehensive AI coding assistant measurement to track not just adoption and productivity metrics, but downstream impacts on PR cycle times. The result was $8M in savings from productivity improvements, and leadership gained the ability to course-correct faster when adoption patterns weren't delivering expected results.
The right tool and model are worth paying more for, but only if they deliver impact. Opus 4.5 costs more than Sonnet. Claude Code may cost more than alternatives. That's fine if the incremental spend generates incremental value. But you can't know that without measuring both sides of the equation.
So what should engineering leaders actually do with Claude Code token limit, usage, and impact data?
Claude Code token limits are real constraints that engineering organizations need to understand and manage. But focusing solely on tokens and costs misses the bigger picture.
The developers complaining on Reddit about burning through their usage allocation aren't wrong to be frustrated. But the solution isn't just better token management. It's comprehensive visibility into whether AI coding tools are actually delivering value across the full software development lifecycle.
The AI Productivity Paradox is real. More code doesn't automatically mean more value. Individual output gains can be offset by downstream bottlenecks. Perception of productivity and reality of productivity often diverge.
The organizations that will get the most from AI coding tools are those that measure usage, cost, and impact together. They track leading indicators like PR merge rate and review time. They monitor lagging indicators like lead time, deployment frequency, and change failure rate. They connect the dots between what developers are doing with AI and what the organization is actually delivering.
A good tool and a good model are worth paying more for, if they deliver impact. But you can't know if they're delivering impact unless you're measuring it.
Ready to see how your AI coding tools are actually performing? Request a demo of Faros AI to get unified visibility into usage, cost, and productivity impact across your entire AI coding assistant portfolio.




