
A telemetry-informed companion to DORA's AI ROI calculator. Use these inputs to pressure-test your assumptions before presenting AI investment numbers to finance.

In April 2026, two reports landed within days of each other. DORA published The ROI of AI-Assisted Software Development, a careful financial framework with an interactive ROI of AI calculator that lets engineering leaders model the business case for AI tooling. Faros published The Acceleration Whiplash, a telemetry analysis of two years of data from 22,000 developers across 4,000 teams as they shifted from low to high AI adoption.
On the surface, the two reports look like they're saying different things. They're not. They're measuring different things. DORA captures how developers experience AI through structured surveys. Faros captures what engineering systems record when AI is at work. Both are real.
The DORA ROI of AI calculator is a useful tool. It does what it claims: turns assumptions about productivity, throughput, and instability into a financial model. The framework is sound. What it needs, and what every framework like it needs, is empirical anchors for the inputs that matter most. DORA's own methodology note acknowledges this directly: the calculator is meant to spark a conversation, not deliver a verdict, and they invite users to adjust the assumptions to match their reality. This piece is a companion to that invitation. Plug these values in before you bring the number to your CFO.
Before the divergences, the agreements. They're more numerous than the disputes, and they tell you what to take seriously.
Both reports find that individual developer effectiveness is up. DORA documents this through self-report. Faros's findings confirm it through telemetry: task throughput per developer rose 33.7% in environments with high AI adoption, and epics completed per developer rose 66.2%.
Both reports find that team-level throughput is up. Faros adds the granular finding here: tasks specifically involving code, those with an associated pull request, rose 210% per team. That's roughly six times more than general engineering task completion. AI is doing what AI tools are designed to do: accelerate the act of writing software.

Both reports find that instability rises during adoption. DORA frames this as the J-curve, the temporary dip in delivery performance before the system absorbs the new way of working. Faros measures it: incidents per pull request up 242.7%, monthly incidents up 57.9%, bugs per developer up 54%. The directional agreement is unambiguous. The magnitude and duration are where things get interesting.

Both reports identify a tax on senior engineers. DORA calls it the verification tax, the time senior reviewers spend confirming AI-generated work. Faros calls it the senior engineer tax, the cognitive load of reviewing code that looks idiomatic and well-named but conceals structural failures beneath the surface. Two independent methodologies, survey and telemetry, landed on the same finding from different angles. When two methods converge, it's worth taking seriously.
This is the floor. The disagreements are about magnitude, duration, and what surveys can't see.
Self-report and telemetry are not in conflict. They measure different things. Developers know how they feel about their work; Git knows what was merged. Both are true, but only one shows up in production.
Three places the two views diverge are worth understanding before you use the calculator.
The calculator asks you to estimate inputs. The question is whether your estimates come from how the work feels, or from what the systems show.
DORA's calculator pre-fills a baseline scenario for a 500-person engineering organization with $100M in revenue. At its defaults, it returns a positive first-year ROI of 39% and a payback period of roughly 0.7 years.
That's the number a CFO sees. That's the number that funds the AI tooling line item.
Now consider what happens when you swap individual inputs to match what telemetry shows. Each scenario below changes only the variables noted; every other input stays at DORA's pre-filled default, and each scenario's full parameter set is encoded in a calculator URL you can verify in 30 seconds.
Run the J-curve scenario in DORA's calculator. Change 'J-Curve productivity drop timeline' from 3 months to 12 months. Nothing else.
The calculator's most consequential input isn't deployment frequency or feature throughput. It's the duration of the J-curve, the recovery period during which delivery performance dips before stabilizing. A single change to that one assumption produces a $9.9M swing and flips ROI from positive to negative.
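The mechanics are easy to see in a toy model. The sketch below is not the DORA calculator's formula; it is a deliberately simplified structure with hypothetical dollar parameters, chosen only to show why the duration input dominates: every month spent in the dip both costs money and forfeits a month of recovered benefit.

```python
# Toy first-year model showing why J-curve duration dominates ROI.
# NOT the DORA calculator's internals; parameters are hypothetical and
# chosen only to illustrate the sensitivity, not to reproduce its output.

def first_year_net(jcurve_months: float,
                   monthly_benefit: float = 0.8e6,   # gain after recovery
                   monthly_dip_cost: float = 0.3e6,  # cost while in the dip
                   annual_tooling_cost: float = 2.0e6) -> float:
    """Benefit accrues only after recovery; dip cost accrues until then."""
    months_in_dip = min(jcurve_months, 12)
    benefit = (12 - months_in_dip) * monthly_benefit
    dip_cost = months_in_dip * monthly_dip_cost
    return benefit - dip_cost - annual_tooling_cost

for months in (3, 6, 12):
    print(f"J-curve of {months:>2} months: net ${first_year_net(months)/1e6:+.1f}M")
# Each month added to the J-curve removes a month of benefit AND adds a
# month of dip cost, which is why this one input swings the answer more
# than any other.
```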
The J-curve framing, in its original form, does not promise automatic recovery. It describes a period of learning, adaptation, and complementary investment — reskilling, process redesign, infrastructure work — after which productivity returns and exceeds the prior baseline. The recovery is conditional on the investments, not on the passage of time.
Where the calculator's three-month default needs scrutiny is not in invoking the J-curve. It is in operationalizing J-curve duration as a time input, without asking whether the conditions for recovery are present. The model recovers throughput and quality on the far side of the input regardless of what the org has actually done to earn that recovery.
Faros's two-year window is informative here. Across that window, quality metrics worsened as AI adoption deepened, and they did not stabilize and recover. That pattern is consistent with two readings: a J-curve substantially longer than three months, or a regime where the complementary investments needed to drive recovery — at the authoring layer, the review layer, and the guardrail layer — are not being made at most organizations. The second reading is the one engineering leaders should sit with. It implies that 'wait it out' is not a strategy, and that the calculator's time-based recovery assumption is doing work the underlying framing never claimed to support.
A reasonable objection: DORA's default assumes only a modest throughput gain, from 50 to 56 features per year. Faros's data shows much larger gains in epic completion.
Run the throughput scenario in DORA's calculator. Apply the +66.2% finding directly. Push 'Target number of features deployed per year' from 50 to 83. Keep the 12-month J-curve.
Even when you give DORA's calculator the most generous throughput assumption telemetry supports, the recovery time still dominates. The lesson isn't that throughput gains aren't real. They are. It's that quality cost over a realistic time horizon eats more of the benefit than the calculator's defaults suggest.
Run the quality scenario in DORA's calculator. Keep the J-Curve duration at 3 months. Change only the 'Target change failure rate'. The calculator's default assumes CFR rises from 5% to 6% during the J-curve. Faros didn't find statistically significant movement on CFR itself, but did find incidents-to-PR up 242.7%. As a conservative proxy, model CFR tripling from 5% to 15%.
ROI stays positive, but it's cut by more than half. Even at DORA's optimistic three-month recovery assumption, modeling realistic quality degradation reduces the financial case substantially. This is the scenario that says: even if you accept everything else DORA assumes, the quality cost alone is worth taking seriously.
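The arithmetic behind that sensitivity fits on a napkin. A minimal sketch, assuming a hypothetical per-failure cost (substitute your own incident data):

```python
# Back-of-the-envelope cost of change failures. The per-failure cost is
# a hypothetical placeholder; substitute your own incident cost data.
deployments_per_year = 50
cost_per_failed_change = 40_000  # hypothetical: response + rework + delay

for cfr in (0.05, 0.06, 0.15):  # baseline, DORA's default target, telemetry proxy
    annual_cost = deployments_per_year * cfr * cost_per_failed_change
    print(f"CFR {cfr:.0%}: ~${annual_cost:,.0f}/year in failure cost")
# Tripling CFR triples this line item. In a full model, deployment count
# and per-failure cost scale it further.
```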
Run the telemetry-informed scenario in DORA's calculator. Set every adjustable input to match telemetry. Deployments down from 50 to 44 (Faros sees deployment frequency down 11.7%). Features up from 50 to 83. CFR up from 5% to 15%. J-curve duration at 12 months.
The point of running this scenario isn't to argue AI doesn't pay back. It's that accepting a 39% first-year ROI sets expectations telemetry doesn't support, and the gap between expectation and outcome is where engineering leaders lose credibility with finance. A 1.2-year payback is something a CFO can plan around. A 0.7-year payback that turns into 1.6 because the J-curve input was set too short is something that erodes trust in every subsequent forecast. The slippage isn't in the math. It's in the inputs the user accepted without testing.
A $3.46M first-year loss on transformational technology isn't catastrophic. It's a realistic number. Companies routinely accept negative first-year returns on platform investments, infrastructure migrations, and major capability shifts. However, one caveat the DORA framing understates: for most organizations, year one is not ahead of you. Faros's dataset shows 80% of teams already past 50% AI adoption, and quality metrics have been degrading across the full two-year window. The $3.46M first-year loss is not a hypothetical investment cost; for many engineering organizations, it is a description of spend already absorbed, against regressions that have not yet recovered. The question is not whether to accept year-one losses. It is whether year two looks different, and what has to change for it to look better.

Of the inputs the calculator exposes, J-curve duration moves the answer the most. Throughput gains, deployment frequency direction, and even CFR move it less. Users deserve to know which inputs are load-bearing. The honest position is that no one knows yet how long the curve lasts, or whether it closes at all without intervention. So the practical move is to watch your own quality metrics against your adoption curve and treat the calculator's recovery assumption as a hypothesis you are testing, not a number you can trust.
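If you want to derive that ranking for your own model rather than take it on faith, a one-at-a-time sweep does it: perturb each input across a plausible range, hold the others at baseline, and rank by the size of the swing. A minimal sketch with a placeholder model and illustrative ranges:

```python
# One-at-a-time sensitivity sweep: perturb each input across a range,
# hold the rest at baseline, and rank inputs by the size of the swing.
from typing import Callable

def sweep(model: Callable[..., float], baseline: dict, ranges: dict) -> list:
    """Return (input_name, swing) pairs, largest swing first."""
    swings = []
    for name, (low, high) in ranges.items():
        lo = model(**{**baseline, name: low})
        hi = model(**{**baseline, name: high})
        swings.append((name, abs(hi - lo)))
    return sorted(swings, key=lambda s: s[1], reverse=True)

# Placeholder ROI model with illustrative coefficients -- not the DORA
# calculator's internals. Swap in your own model and ranges.
def roi(jcurve_months, features_gain, cfr):
    return features_gain * 120_000 - jcurve_months * 500_000 - cfr * 2_000_000

baseline = {"jcurve_months": 3, "features_gain": 6, "cfr": 0.06}
ranges = {
    "jcurve_months": (3, 12),   # DORA default vs. telemetry-informed
    "features_gain": (6, 33),   # +6 features vs. +66.2% on a base of 50
    "cfr": (0.05, 0.15),        # default vs. tripled proxy
}
for name, swing in sweep(roi, baseline, ranges):
    print(f"{name:>14}: ${swing / 1e6:.1f}M swing")
```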
The calculator currently bundles AI tooling costs at roughly $330 per user per year, combining license cost, additional usage cost, and infrastructure overhead. That figure is a defensible starting point for today. It's not a defensible static assumption for the next three years.
Token and inference costs are not stable. Agentic adoption, currently below 1% of pull requests in Faros's dataset, is rising, and reasoning-model calls draw 5 to 20× the tokens of a simple completion. Every frontier vendor, from Cursor and Copilot to Windsurf, Anthropic, and OpenAI, has already moved toward consumption pricing. The calculator's static cost assumption does not reflect where pricing is heading. Pressure-test cost upward the same way you pressure-test J-curve duration: given the trajectory the vendors are signaling, modeling 3× the current per-user cost over a three-year horizon is not aggressive; it is a baseline. The calculator should reflect what you will actually be funding, not a snapshot of what AI costs today.
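A quick way to stress-test that line item, assuming (as a deliberate what-if, not a forecast) that per-user cost compounds to 3× by year three:

```python
# Project the per-user tooling cost line under escalating consumption
# pricing. The 3x-by-year-three trajectory is a what-if for stress
# testing, not a vendor forecast.
base_cost_per_user = 330       # the calculator's bundled default, per year
users = 500                    # the calculator's baseline org size
growth = 3 ** (1 / 2)          # compounds to ~3x base cost by year three

for year in (1, 2, 3):
    per_user = base_cost_per_user * growth ** (year - 1)
    print(f"Year {year}: ${per_user:,.0f}/user -> ${per_user * users:,.0f} total")
```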
If you don't have telemetry on your own environment yet, the values below are reasonable defaults to begin with. They're drawn from Faros's two-year dataset across 22,000 developers and 4,000 teams. Use them as starting points for sensitivity testing, not as substitutes for measurement. The right inputs are the ones you can verify in your own systems.
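Collected in one place, as a starting-point sketch you can paste into your own model (the figures are Faros's, as cited throughout this piece; the variable names are ours):

```python
# Telemetry-informed starting inputs from Faros's two-year dataset
# (22,000 developers, 4,000 teams). Values are percentage deltas from
# low- to high-AI-adoption environments. Verify against your own systems.
FAROS_DEFAULTS = {
    "task_throughput_per_dev_pct": +33.7,   # tasks completed per developer
    "epics_per_dev_pct": +66.2,             # epics completed per developer
    "code_tasks_per_team_pct": +210.0,      # tasks with an associated PR
    "incidents_per_pr_pct": +242.7,
    "monthly_incidents_pct": +57.9,
    "bugs_per_dev_pct": +54.0,
    "deployment_frequency_pct": -11.7,
    "code_churn_pct": +861.0,
    "pr_size_pct": +51.3,
    "files_per_pr_pct": +59.7,
    "agentic_pr_share_pct": 1.0,            # below 1% of PRs today, rising
}
```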
Start with these, then go measure. The values that matter most are the ones grounded in your own engineering systems.
The most important convergence between DORA and Faros isn't in the numbers. It's in the recommendations. Despite different methods, both reports point to the same actions.
Track rework as a first-class metric; both reports flag it, and Faros quantifies code churn at +861% under high adoption. Track deployment frequency and lead time directly from CI/CD pipelines, not from work management systems (a minimal sketch follows below). Run experiments on tooling and measure the deltas; both reports endorse this. Work in small batches: Faros documents pull request size up 51.3% and files per pull request up 59.7%, which suggests AI tooling defaults are systematically violating the small-batch principle. And invest in agent context, guardrails, and quality gates at the authoring layer, not the review layer.
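For the CI/CD-sourced metrics, the computation is straightforward once you pull deploy events from the pipeline itself. A minimal sketch, assuming a hypothetical event shape; adapt it to whatever your pipeline actually exposes:

```python
# Compute deployment frequency and lead time from CI/CD deploy events
# rather than work-management tickets. The event shape is hypothetical.
from datetime import datetime, timedelta
from statistics import median

deploys = [  # (commit_time, deploy_time) pairs pulled from your pipeline
    (datetime(2026, 4, 1, 9), datetime(2026, 4, 1, 15)),
    (datetime(2026, 4, 2, 11), datetime(2026, 4, 4, 10)),
    (datetime(2026, 4, 7, 14), datetime(2026, 4, 8, 9)),
]

window_days = (deploys[-1][1] - deploys[0][1]).days or 1
deploy_frequency = len(deploys) / window_days                  # per day
lead_times_hours = [(d - c) / timedelta(hours=1) for c, d in deploys]

print(f"Deploy frequency: {deploy_frequency:.2f}/day")
print(f"Median lead time: {median(lead_times_hours):.1f} hours")
```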
And both reports are explicit on this point: do not rush to change headcount on the basis of first-year throughput numbers. The engineers absorbing the quality gap AI is creating are the ones you'll need most when the gap becomes visible.
The DORA calculator is a useful tool. It does what it claims. The framework is sound, the math is clear, and the team behind it has built one of the more accessible and well-structured financial models available for AI tooling decisions — and they're transparent about its limits.
What it asks of you is real assumptions about your own environment. J-curve duration matters most. Deployment frequency direction matters next. AI cost trajectory matters more than the static defaults suggest. Telemetry can provide all three.
Two reports, one calculator, real numbers. The calculator works. Use it with inputs that match what your engineering systems actually show, and the conversation with your CFO becomes one about realistic timelines and durable returns, not optimistic forecasts and missed targets. That's a better conversation to have. It's also one Faros can help you prepare for. Talk to our team.
Faros is the system for running engineering with AI. Faros gives engineering leaders visibility into how work operates across code, people, and systems, and control over how that work progresses through enforceable workflows and policy.
{{whiplash}}



