Does measuring software engineering performance actually deliver value?
The concept of measuring the performance of software development teams is nothing new, but it recently returned to the public consciousness with a little controversy, thanks to a McKinsey article. Guest author Jason English shares his perspective on why everyone hasn't already jumped on the measurement bandwagon.
Jason English, Intellyx (Guest)
November 3, 2023
Every enterprise in the world wants to maximize performance: delivering for customers better, faster, and cheaper than the competition.
Further, software company executives love to repeat the mantra that “every company is a software company” as often as possible.
Therefore, it stands to reason that management consulting firms would seek to apply their MBA statistical models to maximize performance of the software-producing function of any enterprise.
The concept of measuring the performance of software development teams is nothing new, but it returned to the public consciousness with a little controversy, thanks to a recent McKinsey piece titled “Yes, you can measure software developer productivity.”
Implement their methodology, the article says, and developers could realize a 20-to-30 percent reduction in customer-reported defects, a 20 percent improvement in employee experience scores, and a 60 percent improvement in customer satisfaction.
Sounds incredible! With results like that, why hasn’t everyone already jumped on their proposed measurement bandwagon?
Why measure developer productivity?
Compared to other process-oriented industries, the software industry has been rather undisciplined in its approach to measuring results. An ineffable ‘tiger team’ mentality arose, where we expected one genius developer or an expert team to lock themselves in the office with a couple pizzas and some Jolt Cola, and hammer out brilliant code.
This ‘code cowboy’ mentality predictably led to failure and heartbreak, as two-thirds of software projects consistently failed to meet budgets and timelines.
CEOs and CFOs were constantly frustrated by a lack of accountability. They wanted engineering orgs to take a page from the discipline of industrial supply chain optimization, so software development could realize the benefits of KPI measurements, Kanban-style workflows, and process automation that built everything else in our modern economy.
The DevOps movement evolved from Agile methodologies around 2008, and engineering organizations started looking at software delivery through a continuous improvement lens. We learned to empower dev teams to collaborate with empathy while ‘measuring what matters’ and ‘automating everything’ toward delivering customer value.
The release of The Phoenix Project book articulated the connection between DevOps and supply chain optimization, highlighting the Three Ways: flow/systems thinking, feedback loops, and a culture of continuous improvement reminiscent of the best-running Toyota car factories in Japan.
In an industrial supply chain scenario, planners could look for signals like supplier availability, work-in-process, and inventory turns as performance indicators. By comparison, software development deals with much less substantial signals — bits and bytes moving over the internet: the intellectual assets of ideas, requirements, and data.
If we are to achieve a new wave of industrialization in the software industry, clearly coming to grips with the data that feeds the software supply chain is our first priority.
Where measurements meet incentives
The McKinsey model was built atop two currently popular frameworks: DORA (DevOps Research and Assessment) metrics, popularized by Google and many other companies invested in the DevOps movement; and SPACE metrics (satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow), developed by researchers at GitHub and Microsoft.
On top of that, McKinsey added a set of new ‘opportunity-focused’ metrics: developer velocity benchmarks, contribution analysis, talent capability score, and inner/outer loop time spent.
Interestingly, their “inner/outer loop” metric uniquely prioritizes time spent on the “inner loop” building (coding and testing) software, instead of the “outer loop” time spent on integration, integration testing, releasing, and deployment.
But what if that outer loop is a vitally important part of certain roles in the engineering org? To avoid technical debt, we need architects focused on system design, and SREs capable of tracking down root causes of issues in deployment.
This wonderfully vitriolic response in The Pragmatic Engineer, by Kent Beck and Gergely Orosz, offers a perfect example of how a measurement initiative that started with decent results eventually went astray:
“At Facebook we [Kent here] instituted the sorts of surveys McKinsey recommends. That was good for about a year. The surveys provided valuable feedback about the current state of developer sentiment.
Then folks decided that they wanted to make the survey results more legible so they could track trends over time. They computed an overall score from the survey. Very reasonable thing to do. That was good for another year. A 4.5 became a 4. What happened?
Then those scores started cropping up in performance reviews, just as a "and they are doing such a good job that their score is 4.5". That was good for another year.
Then those scores started getting rolled up. A manager’s score was the average of their reports’ scores. A director's score would be the average of their reporting managers’ scores.
Now things started getting unhinged. Directors put pressure on managers for better scores. Managers started negotiating with individual contributors for better survey scores. “Give me a 5 & I’ll make sure you get an ‘exceeds expectations’.” Directors started cutting managers & teams with poor scores, whether those cuts made organizational sense or not.”
Whoa. How orgs act upon development metrics is as important as the measurements themselves. Nobody wants to see performance improvement goals create a zero-sum game that disheartens valued technical talent.
On the positive side, McKinsey’s article can only spur more thought and discussion among the development community about how engineering orgs can deliver more predictable metrics, like the ones CEOs and CFOs expect to see from other groups such as sales and customer service.
Developer enablement metrics for success at Autodesk
You already know Autodesk—if you’ve ever seen a really cool modern building, or a hyper-realistic 3D animated film, chances are, their software was used by professionals to help design or create it.
Autodesk supports a suite of highly refined and specialized CAD and design tools, but as they started migrating to a common cloud-and-microservices-based architecture to improve scalability and automate deployment infrastructure, delivery time became unpredictable, with teams stymied by environment availability and service interdependencies.
“If ten teams are doing well and only one team is doing poorly, you are only as good as your weakest link,” said Ben Cochran, VP of the newly formed Developer Enablement team, reporting directly to the CTO.
With an eye to improving developer experience and morale across their system, rather than at an individual level, the team adopted the DORA metrics of deployment frequency, mean time to recovery (MTTR), lead time, and change failure rate (CFR) as Autodesk's foundation for productivity measurement.
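To make those four measurements concrete, here is a minimal Python sketch of how they could be computed from a list of deployment records. It is an illustration only, not Autodesk's or DORA's implementation: the `Deployment` record, its field names, the sample data, and the choice of a median for lead time are all assumptions, and real teams define these metrics in slightly different ways. Note that it aggregates at the team or system level, never per individual.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment (field names are assumptions)."""
    committed_at: datetime                  # when the change was first committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute team-level versions of the four DORA metrics over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_seconds = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    failures = [d for d in deployments if d.failed]
    recovery_seconds = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        # None when nothing failed (or nothing has been restored yet)
        "mean_time_to_recovery_hours": (
            sum(recovery_seconds) / len(recovery_seconds) / 3600
            if recovery_seconds
            else None
        ),
    }


# Made-up sample data: four deployments in one week, two of which failed.
sample = [
    Deployment(datetime(2023, 11, 1, 9), datetime(2023, 11, 1, 17)),
    Deployment(datetime(2023, 11, 2, 9), datetime(2023, 11, 3, 9),
               failed=True, restored_at=datetime(2023, 11, 3, 11)),
    Deployment(datetime(2023, 11, 4, 10), datetime(2023, 11, 4, 14)),
    Deployment(datetime(2023, 11, 5, 8), datetime(2023, 11, 6, 8),
               failed=True, restored_at=datetime(2023, 11, 6, 12)),
]

metrics = dora_metrics(sample, period_days=7)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

The sketch deliberately keeps no author or owner field on the record, so the numbers can only describe how the delivery system is behaving, which matches the system-level emphasis described above.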
The software team's output velocity and business outcomes improved, but in the macro view, what made all the difference was creating an environment of collaboration and shared learning that removes roadblocks, rather than taking punitive measures based on the measurements.
The Intellyx Take
For engineers, too much emphasis on monitoring and metrics can feel like Big Brother is looking over your shoulder, inhibiting creative problem solving. Conversely, a lack of measurement also means that problems aren’t getting reliably solved.
Poor development performance metrics overlook the constant competitive imperative for achieving more productivity with fewer resources, and can eventually result in layoffs or draconian performance measures being put in place.
Success at measurement depends on a balancing act between innovation and efficiency, while aligning team members with high-value business outcomes and eliminating administrative toil from the development process.
Even if there’s healthy disagreement about the details of McKinsey’s developer performance model, it’s useful to get everyone talking about how to mature the discipline of software development.
Said Vitaly Gordon, CEO of Faros.ai, in a recent blog: “McKinsey speaks the language of the C-Suite well. If they can get executives to commit time and effort to removing friction from the engineering experience based on what the data is telling us, I am all for it.”
Image source: Mike G., Flickr CC2.0 license.
©2023 Intellyx LLC. Intellyx retains editorial control of this document. At the time of writing, Faros.ai is an Intellyx client. No AI was used in the writing of this story.