Fast and Furious: Attempt to Merge
A guide to measuring CI Speed and CI Reliability and an introduction to the most important developer productivity metric you never knew existed.
February 7, 2024
It Ain’t Over Till It’s Over
In a previous blog post, we talked about the intricacies of measuring Build Time, an inner loop developer productivity metric. Keeping Build Time low is a constant battle, aimed at providing the developer with rapid feedback while they are iterating on their code changes.
But no dev task is complete until those code changes are successfully merged into the main branch via a Pull Request, in a process known as Continuous Integration (CI).
CI is often the last step in the process and ideally is a set-and-forget type of activity. Mentally, the engineer is ready to wrap up this task and move on to the next one. When CI breaks unexpectedly, it adds significant friction and frustration to the developer’s experience.
So, if CI is a critical factor impacting developer productivity, how do you measure it and what does good look like?
Let’s rev up and find out.
Taking an Outcome-Centric Approach to CI
The goal of the CI process is to act as a safety net after the developer has run local validations. It extensively tests the code to catch errors and bugs that could destabilize production systems and the customer experience.
While it’s understood that CI takes longer than local builds and is a considerably more expensive operation, it still needs to run quickly and smoothly so that the engineer’s time is used efficiently.
Therefore, there are two dimensions to CI metrics: CI Speed and CI Reliability.
While there’s no hard number, ‘good’ CI Speed can be defined as a run time that provides success or failure feedback to the developer while they are close to the code changes. The context and details are still fresh in their minds, and they have not switched yet to a new task.
If CI takes too long, developers are either stuck waiting (which is wasteful) or have already moved on to something else — increasing the “context-switching tax” (the cognitive and performance cost incurred when shifting focus from one task to another).
Also, the longer CI takes, the more likely the developer is to face merge conflicts and/or breakages caused by divergence from the main branch, which would only be detected post-merge.
CI Speed is calculated as the time from when all required checks are triggered to when they finish executing and the engineer receives an approval or denial to merge.
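As a rough illustration of that definition, here is a minimal Python sketch. The `CheckRun` record and its `started_at` / `completed_at` fields are hypothetical names chosen for this example, not the schema of any particular CI system. It assumes CI Speed for a PR runs from the earliest trigger of a required check to the latest completion.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class CheckRun(NamedTuple):
    """One required CI check on a pull request (hypothetical record shape)."""
    name: str
    started_at: datetime
    completed_at: datetime


def ci_speed(checks: list[CheckRun]) -> timedelta:
    """Time from the first required check being triggered to the last one finishing."""
    if not checks:
        raise ValueError("at least one required check is needed")
    first_trigger = min(c.started_at for c in checks)
    last_completion = max(c.completed_at for c in checks)
    return last_completion - first_trigger


# Made-up sample data: three required checks running partly in parallel.
checks = [
    CheckRun("build", datetime(2024, 2, 7, 10, 0), datetime(2024, 2, 7, 10, 12)),
    CheckRun("unit-tests", datetime(2024, 2, 7, 10, 1), datetime(2024, 2, 7, 10, 19)),
    CheckRun("lint", datetime(2024, 2, 7, 10, 0), datetime(2024, 2, 7, 10, 3)),
]
print(ci_speed(checks))  # 0:19:00
```

Because the checks overlap, the result is the wall-clock wait the engineer experiences (19 minutes here), not the sum of the individual check durations.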
CI Reliability means that if CI fails, it should only be due to legitimate errors, introduced by the code changes tested. It should not fail due to preventable and unrelated — and thus unacceptable — infrastructure issues.
CI infra failures, such as running out of disk space or hitting bad images or scripts, waste a lot of time. Both the engineer and the infra team get sucked into trying to resolve the issue at the expense of other important and strategic work.
Typically, an engineering org has far fewer infra engineers than product engineers, so you are unlikely ever to have enough infra team members to support a high frequency of failures. If you do the math, you’ll find that CI Reliability, measured after excluding legitimate code-related failures, needs to be at least 99.9%.
Here’s the calculation:
Let's say you have an engineering organization of 500 engineers. If each engineer submits an average of three new PRs per workweek, that means a total of 1,500 new PRs every week, or 300 new PRs per workday.
Now, imagine the company’s CI system has 99% reliability. That means that 3 PRs fail due to infrastructure stability issues every day (1% of the 300 daily PRs).
Beyond the frustration and productivity hit to the PR author, each of these failures will require the help of an infra engineer to troubleshoot. This has the potential to keep three members of the infra team busy for the day, every day, leaving them no bandwidth to focus on anything else that could enhance the productivity and efficiency of their organization.
It would be much better if CI failed for infra reasons only once or twice a week (99.9% reliability) or, even better, less than once a month (99.99% reliability).
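The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same back-of-the-envelope estimate. The function name and the five-day workweek default are assumptions made for this example.

```python
def infra_failures_per_day(
    engineers: int,
    prs_per_engineer_per_week: float,
    reliability: float,
    workdays_per_week: int = 5,  # assumption: a five-day workweek
) -> float:
    """Expected number of PRs per workday that fail for infra reasons."""
    prs_per_day = engineers * prs_per_engineer_per_week / workdays_per_week
    return prs_per_day * (1 - reliability)


for reliability in (0.99, 0.999, 0.9999):
    per_day = infra_failures_per_day(500, 3, reliability)
    print(f"{reliability:.2%} reliable: {per_day:.2f}/day, {per_day * 5:.2f}/week")
```

With the numbers from the example (500 engineers, three PRs each per week), this prints 3.00 failures per day at 99%, 0.30 per day (1.50 per week) at 99.9%, and 0.03 per day (0.15 per week) at 99.99%.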
Hence, every organization should want the CI process to be effective at catching valid errors and clean of invalid infra errors. So, how do you get there?
Three Metrics to Measure CI Reliability
As with every productivity metric, you often start by measuring what is quick and easy, so that you at least know directionally where you stand and where to focus your investigation and optimization efforts.
For CI Reliability, this typically involves three steps:
- Baselining your current state with Merge Success Rate.
- Understanding why CI is failing with CI Failure Rate by Type.
- Understanding CI's perceived reliability with Attempts to Merge.
Let’s break it down.
#1 Merge Success Rate
Measuring Merge Success Rate is an easy place to begin baselining your CI process: How often does a CI run complete without failing?
As defined by Semaphore, “The CI success rate is the number of successful CI runs divided by the total number of runs. A low success rate indicates that the CI/CD process is brittle, needs more maintenance, or that developers are merging untested code too often.”
If the success rate is lower than your target, typically 90%, it’s an indication that the process requires some attention.
Ideally, to focus your investigation, you’d want to be able to break down the success rate by repository, team, and technical criteria like runtime environment, platform, and language.
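A minimal sketch of that breakdown in Python follows, assuming each CI run is available as a record with a status and some grouping attributes. The record keys (`repo`, `team`, `status`) and the sample data are hypothetical.

```python
from collections import defaultdict


def success_rate_by(runs: list[dict], key: str) -> dict[str, float]:
    """Successful CI runs divided by total CI runs, grouped by the given key."""
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for run in runs:
        group = run[key]
        totals[group] += 1
        if run["status"] == "success":
            successes[group] += 1
    return {group: successes[group] / totals[group] for group in totals}


# Made-up sample data.
runs = [
    {"repo": "payments", "team": "checkout", "status": "success"},
    {"repo": "payments", "team": "checkout", "status": "success"},
    {"repo": "payments", "team": "checkout", "status": "failure"},
    {"repo": "payments", "team": "checkout", "status": "success"},
    {"repo": "search", "team": "discovery", "status": "success"},
    {"repo": "search", "team": "discovery", "status": "failure"},
]

rates = success_rate_by(runs, "repo")
print(rates)  # {'payments': 0.75, 'search': 0.5}

# Flag groups below the 90% target mentioned above.
print([repo for repo, rate in rates.items() if rate < 0.90])  # ['payments', 'search']
```

Passing `"team"` instead of `"repo"` gives the same breakdown per team. Other dimensions such as platform or language would work the same way if the records carried them.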
#2 CI Failure Rate by Type
The next step is understanding why CI fails: are these legitimate failures or unacceptable infra failures? CI Failure Rate by Type is a telemetry-based metric that can answer that question, but it requires some instrumentation.
There are different approaches to classifying CI errors. Some, like LinkedIn, classify every step of the CI pipeline. Cloning a repo or publishing the artifacts are infra steps while compiling the source or running the tests are mostly on the product teams.
Another approach is to classify errors using keywords or regexes matched against the error logs, e.g., failures that mention “git” or “disk space” are typically infra failures.
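To make the keyword approach concrete, here is a small Python sketch. The two patterns come from the examples above. The log lines are made up for illustration, and treating everything that matches no pattern as a product failure is a simplifying assumption. A real classifier would need a far longer pattern list and probably an "unknown" bucket.

```python
import re
from collections import Counter

# Patterns taken from the examples in the text; a real list would be much longer.
INFRA_PATTERNS = [
    re.compile(r"\bgit\b", re.IGNORECASE),
    re.compile(r"disk space", re.IGNORECASE),
]


def classify_failure(log_text: str) -> str:
    """Label a failed CI run as 'infra' if its log matches any infra pattern."""
    if any(pattern.search(log_text) for pattern in INFRA_PATTERNS):
        return "infra"
    return "product"  # assumption: anything unmatched is a product failure


def failure_rate_by_type(failure_logs: list[str]) -> dict[str, float]:
    """Share of failed runs attributed to each failure type."""
    if not failure_logs:
        raise ValueError("no failures to classify")
    counts = Counter(classify_failure(log) for log in failure_logs)
    return {kind: count / len(failure_logs) for kind, count in counts.items()}


# Made-up log excerpts, one per failed run.
failure_logs = [
    "git clone step failed",
    "runner ran out of disk space",
    "assertion failed in test_checkout",
    "compile error in cart module",
]
print(failure_rate_by_type(failure_logs))  # {'infra': 0.5, 'product': 0.5}
```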
This type of instrumentation takes time and effort, so you might be wondering if there is a shortcut to get a quick read on whether the reliability problems stem from infra or products.
The short answer: there is.
#3 Attempts to Merge
When CI fails, the knee-jerk reaction is to rerun it. This reaction often stems from distrust of a flaky CI system. The more a developer encounters infra failures when they run CI, the more prone they’ll be to simply try their luck and run it again.
Suppose you could measure the number of times a developer triggers the CI process on the same code, without making any changes. You would see how often engineers repeatedly attempt their CI jobs, assuming a failure is not due to their code changes or tests but rather due to infrastructure or test flakiness. That would tell you how your CI process is perceived.
If the average Attempts to Merge (ATM) for identical code is greater than a certain threshold (1.1 is a good value to target), it’s a good indication that your developers believe many of the errors stem from infra. And you should start your optimizations there ASAP.
ATM gives you a faster read on perceived reliability than waiting until you have meticulously classified all your CI errors by failure type.
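One possible way to compute ATM is sketched below in Python. It assumes that "identical code" can be approximated by the pair of PR number and commit SHA, so every extra CI run on the same pair counts as a repeat attempt. The record keys and sample data are hypothetical.

```python
from collections import Counter


def attempts_to_merge(runs: list[dict]) -> float:
    """Average number of CI runs per distinct (PR, commit SHA) pair."""
    if not runs:
        raise ValueError("no CI runs to analyze")
    # Assumption: the same PR at the same commit SHA means identical code.
    attempts = Counter((run["pr"], run["sha"]) for run in runs)
    return sum(attempts.values()) / len(attempts)


# Made-up sample data: PR 101 was rerun once and PR 103 twice without changes.
runs = [
    {"pr": 101, "sha": "a1"},
    {"pr": 101, "sha": "a1"},
    {"pr": 101, "sha": "b2"},
    {"pr": 102, "sha": "c3"},
    {"pr": 103, "sha": "d4"},
    {"pr": 103, "sha": "d4"},
    {"pr": 103, "sha": "d4"},
]

atm = attempts_to_merge(runs)
print(atm)        # 1.75
print(atm > 1.1)  # True: above the suggested threshold
```

An ATM of exactly 1.0 would mean nobody ever reruns CI on unchanged code. The further the value drifts above the 1.1 threshold, the more often developers are retrying rather than fixing.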
Furthermore, not only is ATM a shortcut, but we’d argue that it’s the best KTLO (keeping the lights on) metric you’ve never heard of.
How so? ATM allows you to associate the Merge Success Rate with the developer’s experience with the system. It tells you something about user behavior and their satisfaction. If it spikes, you must pay attention.
ATM is notably a compound metric, in that it provides insight into two dimensions of the SPACE framework: “Performance” and “Efficiency and Flow.”
- Performance: ATM measures the outcome of a system-level process, namely CI.
- Efficiency and Flow: ATM measures whether the developer (and the infra engineer) can do their work with minimal delays or interruptions.
It’s the type of sophisticated metric we’ve come to measure for our customer-facing products but rarely leverage for internal platforms and services.
This article introduced a comprehensive approach to measuring the Continuous Integration (CI) process, emphasizing its importance as a critical factor impacting developer productivity.
CI is not just about speed but also about reliability, ensuring that failures are due to legitimate code issues rather than preventable infrastructure problems.
A combination of speed and reliability metrics, namely CI Speed, Merge Success Rate, CI Failure Rate by Type, and Attempts to Merge, helps assess and monitor CI health and identify areas for improvement. These metrics are key to optimizing developer efficiency and minimizing disruptions, which ultimately contributes to a more productive development environment.
Want to get started with CI speed and reliability metrics? Chat with the Faros AI team about how we can help.