
The industry's most in-depth guide to measuring engineering productivity: what to track, how to collect data, and how to turn metrics into business impact at scale.

Engineering is an increasingly important and expensive function. According to BCG, the fastest-growing companies are spending more than 20% of their revenues on R&D and as much as 40% to 50% when trying to expand beyond their core products.
Engineering leaders are being asked to know their business thoroughly and explain what engineering is doing, how it relates to key company initiatives, and what resources they need and where. Put simply, they must be able to hold business-oriented conversations about deeply technical work.
Measuring engineering productivity is how leaders meet that expectation. This guide walks through what to measure, how to collect the data, how to make sense of it, and how to turn it into decisions that move the needle on business outcomes.
Engineering productivity is a measure of how efficiently an engineering organization delivers high-quality, functional software. It should be thought of as a multi-dimensional concept, not a single number; speed of delivery is part of the equation, but so are quality, collaboration, developer experience, resource utilization, and whether the team is building the right things in the first place. Optimizing for any one of these in isolation tends to degrade the others, which is why experienced engineering leaders treat measuring engineering productivity as a balance of competing factors rather than a metric to maximize.
For a small shop with twenty engineers, leaders can usually answer questions about what's getting done by walking around. As an organization scales, that line of sight disappears. A VP of Engineering at a 500-person organization cannot directly observe how time is being spent across thirty teams, which projects are slipping, where the bottlenecks are, or how each sub-org is performing relative to the others.
Engineering productivity programs exist to restore that line of sight at scale. They give senior leaders the visibility they need to make resource and strategy decisions, and they give line managers the data to coach teams and remove blockers. When the program is working, productivity metrics inform quarterly planning, budget allocation, talent reviews, and board reporting — not just engineering's internal retros.
There is no universal "correct way" to measure engineering productivity, because what and how you measure depends on your context: what you need to achieve, how you work, and what you value.
Your context will naturally change over time as you grow, evolve, and respond to market forces, and your program will evolve along with these changes. To build an engineering productivity program around this context, engineering organizations should follow five steps:

1. Identify what to measure, based on your goals, operating model, and culture.
2. Collect the relevant data from across your engineering systems.
3. Normalize and validate the data.
4. Analyze the data to find hotspots and their contributing factors.
5. Operationalize the insights in recurring decision-making.
In order to measure engineering productivity in this way, engineering leaders need visibility. At enterprise scale, steps 2 through 5 are typically run on a software engineering intelligence platform (SEIP): a category of tool that unifies engineering data across systems, normalizes it, and surfaces it as operational insight. As organizational complexity grows, the capabilities of your SEIP matter more: think modularity, customizability, and the extensibility needed to support enterprise realities.
The rest of this guide will help you identify the right path for you today and navigate the five main steps of a successful data-driven engineering productivity program.
The most useful framework for measuring engineering productivity in 2026 is SPACE. SPACE is a multi-dimensional model proposed by researchers at Microsoft, GitHub, and the University of Victoria, and it has become the dominant academic and industry reference for productivity measurement. SPACE stands for five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow.
SPACE is useful because it is holistic without being prescriptive. It combines system-generated telemetry with developer sentiment from surveys and interviews, and it explicitly recommends measuring multiple dimensions at once so that gains in one area aren't masking losses in another. Measuring multiple dimensions also makes metric gaming much harder.
If you're more familiar with the DORA metrics — lead time for changes, deployment frequency, change failure rate, time to restore service, and rework rate — know that they are a subset of SPACE. DORA captures performance and a slice of efficiency, but it does not address satisfaction, activity, or collaboration. DORA is a strong starting point, not a complete framework.
The challenge with SPACE in practice is that it offers a wide menu of possible metrics in each dimension. Choosing the right ones requires a clear view of three things: your goals, your operating model, and your engineering culture.
To decide which engineering productivity metrics you should be measuring, first take a beat to identify what's important to you, how you define success, and what productivity looks like to you. In selecting what to measure, consider three elements: your goals, your operating model, and your engineering culture.
The next three subsections walk through how each factor shapes which metrics to pick. Larger organizations will often have multiple internal groups facing different versions of these questions, in which case corporate strategy can guide where to focus first.
Below are typical goals and engineering productivity metrics based on the stage of your company; metrics are additive as a company progresses through the stages. Startups typically focus on lead time, cycle times, throughput, deployment frequency, percent delivered vs. committed, and bottlenecks; growth-stage companies add production stability and code quality; scale-ups add on-time roadmap delivery and SLO compliance; and mature organizations add cost, individual and team performance, and skill composition.
With most modern enterprises working on a global scale, engineering teams can be heavily outsourced, geographically distributed, and remote/hybrid, and may have either a centralized SDLC or multiple SDLCs. Your company's specific operating model determines the additional lenses through which you'll want to analyze your engineering productivity metrics.
Corporate and engineering culture will also influence the smallest unit of measurement, whether the individual or the team, and how the engineering productivity metrics are applied.
Once you've identified the common engineering productivity metrics associated with your goals, operating model, and culture, you'll then need to collect the relevant data.
Collecting engineering productivity data is the crucial first step to creating visibility, but doing so is more challenging than it seems. The single biggest obstacle to measuring engineering productivity at scale is data fragmentation. Engineering data lives in dozens of systems — Jira, GitHub, Jenkins, SonarQube, PagerDuty, Workday, Salesforce, Google Calendar, custom internal tools — and each tool tells a partial story. Cross-tool and cross-domain analysis usually means exporting data into spreadsheets and manually stitching it together, which is slow, brittle, and unscalable.
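To make the stitching problem concrete, here is a minimal sketch of the manual version, assuming hypothetical one-off CSV exports from Jira and GitHub (file and column names are illustrative, not a prescribed format): join issues to pull requests by the ticket key embedded in branch names.

```python
import pandas as pd

# Hypothetical one-off exports -- file and column names are illustrative only.
issues = pd.read_csv("jira_issues.csv")   # columns: key, team, created_at
prs = pd.read_csv("github_prs.csv")       # columns: number, branch, merged_at

# Recover the Jira key (e.g., "ABC-123") embedded in each PR's branch name.
prs["key"] = prs["branch"].str.extract(r"([A-Z]+-\d+)", expand=False)

# The fragile step: joining two systems on a recovered key.
joined = prs.merge(issues, on="key", how="inner")

# Example cross-tool metric: days from issue creation to PR merge, per team.
joined["created_at"] = pd.to_datetime(joined["created_at"])
joined["merged_at"] = pd.to_datetime(joined["merged_at"])
joined["days_to_merge"] = (joined["merged_at"] - joined["created_at"]).dt.days
print(joined.groupby("team")["days_to_merge"].median())
```

An SEIP performs this kind of join continuously inside a connected data model, instead of in one-off spreadsheets that break as soon as a branch-naming convention changes.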
By using an enterprise-grade SEIP like Faros, your metrics will be generated from a rich and complex combination of data sources, some standard and some bespoke, covering SaaS products and homegrown solutions, org structure data from HR systems, developer experience surveys, cost data from business systems, and much, much more.
Faros recommends a field-proven strategy for collecting engineering productivity data incrementally, creating valuable insight into productivity at each stage.
A step-wise approach can deliver quick wins, build trust, and gradually develop the data-driven mindset you need. Layer by layer, you will assemble the complete picture of engineering productivity as you've envisioned it:
Step 1: Baseline. Set aside concerns about data quality and data hygiene. Normalization and validation address those (we'll talk about this more in the next step) and are not a barrier to collection. The first step involves baselining the current state in support of your first or primary use case.
Step 2: Blend. Developer surveys capture developers' perceptions of how their team delivers. They provide insights into points of friction in the software delivery process and more descriptive feedback about what can be improved at the team or organizational level. Developer surveys are key to tracking employee engagement and satisfaction with the developer experience over time.
With your intelligence platform in place, the detailed and highly contextual feedback from developers can be lined up against the data you've collected from engineering systems and processes. Powerful insights come from blending qualitative insights from surveys with telemetry about systems, processes, and workflows. You'll also discover the next set of high-priority data sources to connect to for deeper analysis.
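As a small illustration of that blending, the sketch below lines up a hypothetical per-team survey friction score against median PR cycle time from system telemetry. All team names, scores, and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical per-team data -- names and numbers are illustrative only.
survey = pd.DataFrame({
    "team": ["payments", "search", "mobile"],
    "friction_score": [3.8, 2.1, 4.5],        # 1 (low friction) .. 5 (high)
})
telemetry = pd.DataFrame({
    "team": ["payments", "search", "mobile"],
    "median_cycle_time_days": [6.2, 2.4, 8.1],
})

blended = survey.merge(telemetry, on="team")

# Teams where perceived friction and measured slowness agree are the
# strongest candidates for a deeper look at their delivery pipeline.
print(blended.sort_values("friction_score", ascending=False))
print(blended["friction_score"].corr(blended["median_cycle_time_days"]))
```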
Step 3: Expand. Don't lose sight of the KPIs that act as checks and balances within your initiative. According to the authors of the SPACE framework, "productivity cannot be reduced to a single dimension… Only by examining a constellation of metrics in tension can we understand and influence developer productivity."
That's why, at this stage, you'll want to expand to data sources related to quality and reliability to prevent over-focusing on velocity. You've likely identified several of these metrics earlier; now that the basics are in place, you're ready to generate them.
Step 4: Align. The C-Suite expects the engineering department, like every other corporate function, to demonstrate its impact on corporate objectives. To that end, the next step is collecting business results data in support of quarterly and annual planning and OKR tracking.
Some of that information may be readily available in the task management systems you've connected, which will allow you to measure say/do ratios and on-time delivery of the product roadmap.
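The say/do ratio itself is simple arithmetic: work delivered divided by work committed for a planning period. A minimal sketch, with hypothetical numbers:

```python
def say_do_ratio(committed: int, delivered: int) -> float:
    """Fraction of committed work items actually delivered in the period."""
    if committed == 0:
        return 0.0
    return delivered / committed

# Hypothetical quarter: 40 roadmap items committed, 31 shipped on time.
print(f"say/do: {say_do_ratio(40, 31):.0%}")  # say/do: 78%
```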
That said, there is a lot more opportunity to illuminate costs and impacts by intersecting engineering data with business metrics pertaining to product usage, customer satisfaction, and financial performance.
To recap the data sources to connect at each stage of your program: baseline with task management, version control, and org structure data; blend in developer surveys; expand to quality and reliability sources such as code quality and incident management tools; and align by adding business systems for product usage, customer satisfaction, and financial data.
During the data collection phase, the data will be stored in the Faros canonical schema, where it is normalized into a single, connected data set that can be queried efficiently across the entire organization.
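The sketch below shows the flavor of such a canonical model: a handful of connected entity types keyed to common identifiers. It is an illustrative simplification, not the actual Faros schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative canonical entities -- a simplification, not the Faros schema.

@dataclass
class Team:
    uid: str
    name: str
    parent_uid: Optional[str]      # reporting structure, enabling rollups

@dataclass
class Task:
    uid: str
    source: str                    # "Jira", "Asana", ...
    team_uid: str
    status_category: str           # normalized: "Todo" | "InProgress" | "Done"
    created_at: datetime
    resolved_at: Optional[datetime]

@dataclass
class PullRequest:
    uid: str
    repo: str
    author_uid: str
    task_uid: Optional[str]        # link back to the work item
    merged_at: Optional[datetime]
```

Because every source is mapped into the same connected shape, a question like "cycle time by team across all task trackers" becomes a single query rather than a stitching project.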
Inevitably, your data sources will be a combination of vendor tools and homegrown sources: vendor tools are ingested through standard connectors, while homegrown systems are brought in through custom integrations.
Engineering productivity data is often examined based on reporting structure, with leaders tracking metrics for the sub-org and teams they manage. This essential information can be ingested from an HR source like Workday, which will update Faros when people join or leave the company or upon a major re-org.
Within Faros, the reporting structure is used for rollups and drill-downs because it represents how your teams, teams of teams, and groups are organized. It also enables comparisons, outlier identification, and team-tailored insights.
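A rollup along the reporting structure is, at bottom, a recursive aggregation over the org tree. A minimal sketch with a hypothetical hierarchy and illustrative numbers:

```python
# Hypothetical reporting structure: child node -> parent node.
parents = {"payments": "fintech", "billing": "fintech", "fintech": "engineering"}

# Per-team metric values (e.g., merged PRs this month) -- illustrative numbers.
merged_prs = {"payments": 42, "billing": 31}

def rollup(metric: dict[str, int], parents: dict[str, str]) -> dict[str, int]:
    """Propagate each team's value up through every ancestor in the org tree."""
    totals: dict[str, int] = {}
    for team, value in metric.items():
        node = team
        while node is not None:
            totals[node] = totals.get(node, 0) + value
            node = parents.get(node)  # None once we pass the root
    return totals

print(rollup(merged_prs, parents))
# {'payments': 42, 'fintech': 73, 'engineering': 73, 'billing': 31}
```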
In addition to the formal reporting hierarchy, Faros infers the mapping between teams, apps, and repos. (In the rare case that a source of truth for this mapping exists, it can be ingested directly instead.)
Furthermore, for any specific metric, your organization can choose the basis on which it is attributed to a team. Faros will auto-select the best attribution method based on experience and domain expertise, but as with any other metric behavior, this is configurable.
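As an illustration only — these option names are hypothetical, not Faros configuration — attribution for a given metric might be selectable along lines like these:

```python
from enum import Enum

class Attribution(Enum):
    """Hypothetical ways a metric could be attributed to a team."""
    TASK_ASSIGNEE_TEAM = "task_assignee_team"  # team of the person doing the work
    PR_AUTHOR_TEAM = "pr_author_team"          # team of the code author
    REPO_OWNING_TEAM = "repo_owning_team"      # team that owns the repository

# e.g., cycle time attributed by assignee, incident count by owning repo.
metric_attribution = {
    "cycle_time": Attribution.TASK_ASSIGNEE_TEAM,
    "incident_count": Attribution.REPO_OWNING_TEAM,
}
```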
The Faros schema is designed to represent all the relevant SDLC data in a cohesive, interconnected manner. To use it, you simply need to plug in your data.
The standard schema will be sufficient in 99% of cases. In rare edge cases where a certain data set does not fit perfectly within the schema, the schema can be extended by leveraging tags, custom columns, and custom metrics. If your use case is of general interest, Faros will consider making it a first-class concept in the schema.
Every sizable engineering organization inevitably has different teams working in different ways. They have different workflows running on different tools and pipelines, use different custom fields and statuses, and release on different cadences.
To analyze all this data through a single lens requires normalization, and to trust the data requires confidence in its quality. But how do you trust the numbers if there's bad data hygiene?
To put it bluntly, every organization has data hygiene issues when it comes to human-curated data. That's the bad news. The good news: machine-generated data from PRs, builds, and deployments is mostly clean by default, and coarse, high-level metrics are robust to hygiene issues thanks to the law of large numbers.
The guiding principle here is to start measuring to create high-level visibility and then let the teams identify and address pressing data hygiene issues once they see how they are skewing their metrics. Never ask teams to change how they manage projects or use Jira before they start actually generating metrics and insights. Instead, let Faros highlight the inconsistencies that teams should address.
Good data quality results from the organization's commitment to becoming data-driven. Normally, no one is incentivized to address these issues until they start impacting highly visible metrics. That is to say, once leaders start paying attention to metrics, the data hygiene issues will be fixed — but not before.
The road to improving data quality involves top-down and bottom-up motions: top-down, leaders paying attention to metrics creates the incentive to fix hygiene issues; bottom-up, teams address the specific inconsistencies that are skewing their own numbers.
If you treat data quality as a showstopper, you will never get visibility. And without visibility, you'll never address the data quality issues. You'll waste years trying to address data hygiene only to discover you were focused on fixing the wrong things. The simple dos and don'ts: do start measuring with coarse, high-level metrics, and do let visible metrics drive cleanup; don't block collection on hygiene, and don't force process changes before teams see their own data.
In this section, we'll explain how Faros handles sophisticated normalization for common scenarios without requiring upfront standardization. This includes different teams using different tools, different workflows within the same tool, and different definitions of what "good" looks like.
It is quite common for different teams or sub-orgs to use different tools: one team might manage tasks in Jira, another in Asana, and a third in GitLab. Another example is a company with multiple instances of the same tool.
Normalization is very simple in these cases. The Faros connectors normalize the data upon ingestion, automatically mapping corresponding data types to the right place in our canonical schema.
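A connector's job, in miniature: take each tool's native payload and emit the same canonical record. The sketch below maps simplified Jira and Asana payloads into one shape; the canonical field names are illustrative.

```python
from typing import Any

# Canonical task record every connector emits -- fields are illustrative.
def canonical_task(uid: str, title: str, status: str, source: str) -> dict[str, Any]:
    return {"uid": uid, "title": title, "status": status, "source": source}

def from_jira(issue: dict[str, Any]) -> dict[str, Any]:
    # Jira nests most fields under "fields".
    return canonical_task(
        uid=issue["key"],
        title=issue["fields"]["summary"],
        status=issue["fields"]["status"]["name"],
        source="Jira",
    )

def from_asana(task: dict[str, Any]) -> dict[str, Any]:
    # Asana exposes a flat task object with a completion flag.
    return canonical_task(
        uid=task["gid"],
        title=task["name"],
        status="Done" if task["completed"] else "InProgress",
        source="Asana",
    )
```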
Another common scenario is projects within a single tool, like Jira, that use different workflows, as expressed by statuses. Faros automatically deals with status transitions and provides the desired breakdowns at each level of analysis.
Every team in your organization might be using Jira, but they're using it very differently. Normalization is required to report effectively across this variance in tool usage.
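One common form of that normalization is mapping each project's custom statuses into a small set of canonical categories, so that cycle stages compare across teams. A minimal sketch, with hypothetical status names:

```python
# Map each project's custom statuses to canonical categories.
# The status names here are hypothetical examples of per-team variance.
STATUS_CATEGORIES = {
    "To Do": "Todo", "Backlog": "Todo", "Triage": "Todo",
    "In Progress": "InProgress", "In Development": "InProgress",
    "Code Review": "InProgress", "QA": "InProgress",
    "Done": "Done", "Released": "Done", "Won't Fix": "Done",
}

def categorize(status: str) -> str:
    # Fall back to "Unmapped" so gaps surface during validation
    # instead of silently skewing metrics.
    return STATUS_CATEGORIES.get(status, "Unmapped")

assert categorize("In Development") == categorize("Code Review") == "InProgress"
```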
The Faros approach is to be compatible with how people work today, especially at the very beginning of your program. To that end, data normalization can be handled in a couple of ways: in the metric queries themselves, or through transforms applied to the data.
At some point, if the maintenance of the queries or transforms becomes too complex and error-prone, Faros recommends introducing a few standard options. You don't force everyone to comply with a single behavior; rather, each team selects one of a handful of approved ways of doing things. This should cover the majority of team preferences while keeping the in-tool configurations manageable.
Let's face it: good and bad are relative.
Consider one product under active development and another product that is in maintenance mode. While you may want to measure the same things for these teams — for example, throughput — they will have different definitions of good. "Good" is also relative to a baseline, and their starting points may be wildly different.
The Faros approach is to make it easy for every role to understand how teams are performing relative to contextual goals.
Note: Popular frameworks like DORA publish annual benchmarks, but the way the metric is defined might not be applicable to how you work. For example, deployment frequency measures how often you deploy code changes to production. If your organization has a major product release four times a year, strict adherence to that definition won't give you the insight you seek. In this example, Faros recommends measuring deployment frequency to your pre-prod environments.
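A sketch of that adaptation: count weekly deploys to whichever environments you designate as meaningful. The record shapes and environment names below are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical deployment records: (environment, deploy date).
deploys = [
    ("staging", date(2025, 3, 3)), ("qa", date(2025, 3, 4)),
    ("staging", date(2025, 3, 10)), ("prod", date(2025, 3, 12)),
]

# Calibrate the DORA definition: count pre-prod environments, not just prod.
PRE_PROD = {"staging", "qa", "integration"}

weekly = Counter(d.isocalendar().week for env, d in deploys if env in PRE_PROD)
print(weekly)  # deploys per ISO week across pre-prod environments
```

Same metric, calibrated to the actual SDLC.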
As Faros begins to ingest and normalize data, it will identify gaps in data collection and mapping. Through troubleshooting and cleanup, you can fix these errors. Charts can also be tweaked, for example to use median instead of average.
During this phase, you may also discover places where processes are not being followed internally and need to address the issue with the relevant teams.
Once your data has been validated, you've paved the way to the next stage of analyzing it.
Once data is flowing and normalized, dashboards and scorecards begin to populate. The next question is what to do with them. Analysis is where measurement either pays off — by revealing contributing factors and pointing to changes — or fails, by drowning leaders in dashboards no one acts on.
The most useful frame for analysis is role-based. Different roles in the organization care about different questions, at different levels of aggregation, with different cuts of the data. A senior engineering leader looking at organizational health needs a different view than a line manager debugging a sprint, who needs a different view than a TPM tracking a cross-functional initiative.
Engineering productivity data exists to serve two distinct purposes: giving senior leaders the visibility to make resource and strategy decisions, and giving line managers and teams the data to improve how work gets done.
Each engineering role has its own analysis dimensions and its own set of questions to answer first when a measurement program goes live.
The day-one goal for any role is the same: get a sense of overall health, confirm or correct gut feelings, and identify hotspots. Deeper analysis comes after.
Now that you've baselined your data, identified hotspots, bottlenecks, or areas of friction, and begun uncovering their contributing factors, it's time to validate and contextualize the information.
No one knows your business like your developers and managers, so any statistical finding should be validated by the people involved. Faros recommends the following continuous improvement approach with four stages: Measure, Understand, Decide, Act.
The most underestimated step in engineering productivity programs is operationalization. Tools don't change companies; the use of tools in recurring decision-making does. Implementing a software engineering intelligence platform is similar to implementing Salesforce. Salesforce doesn't increase sales by being installed; it increases sales because the Head of Sales reviews metrics weekly, salespeople keep their pipelines current, QBRs are run on Salesforce data, and decisions are visibly tied to the numbers.
Engineering needs the same discipline. Without it, a measurement program produces dashboards no one looks at and reports no one references in meetings. With it, engineering metrics start showing up in planning sessions, retros, talent reviews, and board reports, and decisions get faster and more confident as a result.
For most organizations, getting there requires change management: modifying existing meeting protocols and practices to include integrated data. Four guiding principles support that transition.
World-class engineering organizations, from scaling startups to mega-enterprises, run on five operational pillars: Productivity, Delivery, Outcomes, Budgets, and Talent. Each pillar is supported by recurring meetings and decision processes, and each benefits from being fueled by integrated data instead of partial spreadsheets and gut feel.
Most organizations launch their measurement programs with the productivity pillar — that's where engineering's internal pain usually lives — and then expand outward to the other four. The remaining sections walk through the cadences and recommended metrics for each pillar.
Platform Engineering, Developer Experience, and Architecture teams run continuous initiatives to modernize technology, optimize workflows, and remove friction. Monthly operational reviews track key metrics, address challenges, and align on priorities. Project reviews track foundational transformations like migrations, modernization efforts, compliance initiatives, and tooling rollouts.
Delivery in most organizations runs on agile, with work segmented into sprints or development iterations. Stand-ups, sprint planning, and retros keep projects on track and quality high. Monthly product and tech reviews provide a structured forum for assessing progress, surfacing issues, and aligning priorities.
Outcomes are managed on a quarterly rhythm — setting, reviewing, and adjusting OKRs. Regular check-ins keep teams aligned to strategic goals, and cross-functional QBRs evaluate the quarter's performance. The outcomes pillar is where engineering data has the most direct impact on the C-suite conversation, because it's where engineering metrics get tied to business results.
Engineering budget planning runs on an annual cycle, with quarterly reviews and adjustments. The process forecasts financial needs, allocates resources, and sets goals for the upcoming year. Vendor and global sourcing reviews happen at their own cadences, often tied to contract renewal cycles. Periodic accounting events, such as capitalization reviews, also fall in this pillar.
Talent reviews and performance evaluations typically happen twice a year. Compensation reviews run in March or April for most organizations. Talent decisions benefit from objective data more than almost any other engineering process, because objective data reduces subjectivity, accelerates preparation, and grounds feedback in evidence rather than impression.
A note on individual-level metrics: whether to use them at all is a cultural question, and not every organization should. Where individual metrics are used, they should always be looked at in cohort context — comparing engineers of similar role, seniority, and tenure rather than across the whole organization — and they should be one input among several, not the basis for ranking decisions.
Engineering productivity programs that follow this sequence — measure, collect, normalize, analyze, operationalize — produce visibility that informs real decisions. They give engineering leaders the data to talk about their organizations in the same business terms as the rest of the C-suite, and they give teams the feedback loops to improve continuously. The right approach is the one that fits the organization's goals, operating model, and culture; the wrong approach is to wait for perfect conditions before starting at all.
For enterprise organizations, Faros is the only SEIP built to handle real-world complexity. Learn more about how we can help you improve engineering productivity at scale.
Measuring engineering productivity is a five-step discipline. Each step has a guiding principle and a concrete set of next actions.
DORA is a five-metric subset of SPACE focused specifically on software delivery performance. The DORA metrics are lead time for changes, deployment frequency, change failure rate, time to restore service, and rework rate (added in 2024 to measure unplanned deployments addressing user-facing bugs). SPACE is the broader framework, covering five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. Use DORA as a starting point for measuring delivery; use SPACE for measuring engineering productivity overall.
That depends on engineering culture, but we recommend the team as the smallest unit of measurement. Organizations with competitive, stack-ranking cultures are usually comfortable with individual metrics; organizations with collective-ownership cultures should stay at the team level. Where individual metrics are used, they should compare engineers within similar cohorts by role, seniority, and tenure, and they should never be the sole input into compensation or performance decisions. Most enterprise engineering organizations get more value from team-level measurement than from individual-level measurement.
Initial baseline visibility (task data, PR data, and org structure connected and producing dashboards) usually takes weeks, not months. Reaching a comprehensive program that covers all five operational pillars (productivity, delivery, outcomes, budgets, talent) is typically a multi-quarter journey, with each stage producing its own value before the next begins.
No, and waiting for clean data is the most common reason programs never launch. Machine-generated data from PRs, builds, and deployments is mostly clean by default. Coarse, high-level metrics are robust to hygiene issues thanks to the law of large numbers. Once leaders start paying attention to metrics, teams have a reason to clean up the underlying data, and they will. Visibility produces data quality, not the other way around.
Two tools handle data overload in engineering productivity measurement: visualization and AI. For visualization, use simplified scorecards that consolidate the top 5–10 KPIs the organization cares about. This lets senior leaders see which areas are healthy and which require attention, with drill-down available when needed. For AI, use statistical analysis and machine learning to identify problem areas in specific sub-orgs, repos, or stages of the SDLC, and surface team-tailored insights about what's helping or hurting performance. Both reduce the time between identifying an issue and acting on it.
Industry benchmarks let an organization see itself in context, answering three questions that are otherwise hard to answer in isolation: where to start, what to aim for, and how to justify investment in incremental improvement. Most popular benchmarks come from research that ties high engineering performance to better financial performance, which gives them credibility in business conversations. Common benchmarks include the DORA 5 metrics, cycle times, velocity, say/do ratios, planned vs. unplanned work, AI coding assistant impact, and staffing ratios. They are most useful for surfacing the performance areas where the gap between current state and industry norm is widest.
Startups should focus on lead time, cycle times, throughput, deployment frequency, percent delivered vs. committed, and bottlenecks. These are the metrics that surface friction in shipping new features. As the company grows, additional dimensions get added: production stability and code quality during the growth stage; on-time roadmap delivery and SLO compliance during scale-up; cost, individual and team performance, and skill composition at maturity. The metrics are additive: a mature company tracks everything a startup tracks, plus more.
The metrics are similar to those for insourced teams (velocity, throughput, lead time, cycle time, quality), but they should be sliced by contract type and vendor. The most useful additional metrics are productivity per dollar spent, activity per dollar spent, time spent vs. target hours, and quality of delivery (bugs per task). Tracking institutional knowledge capture is also important to prevent vendor lock-in.
Yes, but the standard definition needs adjustment. Strict DORA defines deployment frequency as deploys to production, which is meaningless for an organization that ships to production four times a year. The right adaptation is to measure deployment frequency to pre-production environments such as staging, QA, and integration. Same metric, calibrated to the actual SDLC.
Three practices reduce gaming. First, measure multiple dimensions in tension (velocity and quality, throughput and stability) so that gaming one metric shows up as degradation in another. Second, don't tie individual compensation directly to engineering metrics; the moment a metric becomes a target, it stops being a measurement. Third, focus on team-level metrics where possible, since team norms tend to self-correct against gaming behavior that an individual incentive structure would reward.
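One way to operationalize the first practice is a simple paired-metric check: flag any team whose velocity is rising while a paired quality metric deteriorates. The quarter-over-quarter numbers and thresholds below are hypothetical.

```python
# Hypothetical quarter-over-quarter changes per team (percent).
teams = {
    "payments": {"throughput_change": +25, "change_failure_rate_change": +40},
    "search":   {"throughput_change": +10, "change_failure_rate_change": -5},
}

# Pairing velocity with quality: a big throughput gain alongside a rising
# failure rate is a signal of possible gaming or corner-cutting, not a win.
for team, m in teams.items():
    if m["throughput_change"] > 15 and m["change_failure_rate_change"] > 10:
        print(f"{team}: velocity up but quality down, investigate before celebrating")
```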
Leading indicators predict future performance: PR review time, build reliability, developer satisfaction, planned vs. unplanned work. Lagging indicators report past performance: on-time delivery, customer-reported defects, mean time to restore. A balanced engineering productivity program tracks both. Leading indicators tell leaders where to act; lagging indicators tell them whether the actions worked.
AI coding assistants don't change the core engineering productivity metrics, but they do change benchmarks for some of them and add new metrics specific to AI assistant usage. The underlying productivity question, whether the organization is delivering high-quality, functional software efficiently, stays the same. New AI-specific metrics worth tracking include adoption, code acceptance rate, and downstream quality of AI-assisted code. The risk to watch for is over-rotating on activity metrics like lines of code or PRs opened, which AI inflates without corresponding increases in delivered value.
In most organizations, the program is owned by an engineering productivity, developer experience, or platform engineering team, with executive sponsorship from a VP of Engineering or CTO. The owning team needs both technical understanding (to work with the data and tooling) and organizational standing (to drive change management across teams). At least one dedicated data analyst, deeply familiar with the business and the engineering organization, is recommended.
The most common mistake is picking a single metric (usually velocity, story points, or lines of code) and trying to maximize it. This invariably degrades quality, satisfaction, and collaboration in ways that the chosen metric doesn't capture. The next most common mistake is delaying the program until data quality is "ready," which it never will be. Both mistakes share a root cause: treating engineering productivity as a number to optimize rather than a multi-dimensional reality to understand.



