
How GitHub Copilot Fixes Flaky Tests in CI

A step-by-step example of GitHub Copilot fixing a flaky test: analyze logs, propose a PR, validate the solution.

Yandry Perez Clemente
5 min read
July 16, 2025

I recently hit one of the most frustrating problems in software development: a flaky test. Flaky tests break trust in continuous integration (CI) pipelines and slow down developers. Instead of debugging it myself, I asked GitHub Copilot to fix it. 

How can GitHub Copilot fix a flaky test?

GitHub Copilot can fix flaky tests because it has access to the codebase, CI logs, and failed runs. All you need to do is direct it to the failure.

Steps Copilot took:

  1. Analyzed the CI logs → identified the race condition causing the flakiness
  2. Proposed a pull request with the fix
  3. Validated the fix → I ran the test 100 times with Copilot’s fix (100/100 passed) vs. without it (~23/100 passed)

The flaky test hasn’t reappeared since merging the fix.
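
To make the validation step concrete, here is a minimal sketch of a stress-test loop along the same lines. It assumes a Jest-based TypeScript project, and the test file name is a hypothetical placeholder; it is not the exact procedure Copilot ran.

```typescript
// stress-test.ts: run a single (possibly flaky) test many times and report the pass rate.
// Assumes Jest is installed; "scheduler.test.ts" is a hypothetical placeholder name.
import { spawnSync } from "node:child_process";

const RUNS = 100;
const TEST_FILE = "scheduler.test.ts";

let passed = 0;
for (let i = 0; i < RUNS; i++) {
  // Spawn a fresh Jest process per run so no state leaks between attempts.
  const result = spawnSync("npx", ["jest", TEST_FILE, "--silent"], { encoding: "utf8" });
  if (result.status === 0) passed++;
}

console.log(`${passed}/${RUNS} runs passed`); // e.g. ~23/100 before the fix, 100/100 after
```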

Why use Copilot for flaky tests?

  • Saves developers time by skipping manual debugging
  • Provides reproducible validation (stress-testing the fix)
  • Improves CI reliability and developer confidence

This example shows how GitHub Copilot can diagnose and repair flaky tests automatically, turning a frustrating CI failure into a quick success.

More details in my video below: 

Video: "How GitHub Copilot (Agent) Helped Me Fix Flaky Tests & Unreliable CI - Experience Report | Faros AI" (https://www.youtube.com/embed/inYn4Os9zMU)

Full Transcript: Using GitHub Copilot to fix flaky tests

“Today I want to tell you about a pretty nice success story that I had with GitHub Copilot. 

I merged some code the other day, and after a while, I got an email from the continuous integration saying that one of the tests had failed. 

When I looked into that test failure, I realized that the test that was failing was completely unrelated to the change that I had made. So this seemed to indicate that this test was flaky.

So I just figured, hey, since GitHub Copilot should have access to the logs in this continuous integration run and the code itself, maybe I just put the link to the failed action here and I just simply said, hey, investigate this possibly flaky test. And I just went on to do whatever I was doing that day.

I came back and, to my very positive surprise, GitHub Copilot had identified the root cause of the flakiness and had proposed a fix. So I told it to run the flaky test 100 times. So it did three validation scenarios and then ran each 100 times, getting a 100% success rate. That was very promising.

Just to be super sure, I then told GitHub Copilot to run the flaky test without the fix to get the success rate before the fix. So it did the same thing, it ran the test 100 times and it got a success rate of 23%. As you know, this is very bad for developer happiness—when you're trying to merge your code and have to retry and retry and retry.

I took a look at the fix and indeed it had to do with how to handle the fake timers and the real timers in the unit test framework that we use, which is kind of not trivial to fix. 

So I was very pleased that Copilot, without any back and forth, was able to fix my problem and we never heard about this flaky test since.”
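
The video is an experience report rather than a code walkthrough, so the actual diff isn't shown. For readers who haven't hit the fake-timer problem before, the general shape of this kind of fix in Jest (the framework here is an assumption, and the debounce module is hypothetical) looks something like this:

```typescript
// Illustrative sketch of a fake-vs-real-timer fix, not the actual change Copilot made.
import { debounce } from "./debounce"; // hypothetical module under test

describe("debounce", () => {
  beforeEach(() => {
    jest.useFakeTimers(); // deterministic: the test no longer depends on wall-clock timing
  });

  afterEach(() => {
    jest.useRealTimers(); // restore real timers so later tests aren't affected
  });

  it("invokes the callback once after the wait period", () => {
    const callback = jest.fn();
    const debounced = debounce(callback, 200);

    debounced();
    debounced();

    // Flaky version: await new Promise((r) => setTimeout(r, 250)), which races under CI load.
    jest.advanceTimersByTime(200); // deterministic replacement for real waiting

    expect(callback).toHaveBeenCalledTimes(1);
  });
});
```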

Ending flaky test frustration with GitHub Copilot

Flaky tests used to mean lost hours, broken momentum, and eroded trust in your CI pipeline. But with GitHub Copilot or similar AI coding tools, flaky tests become just another problem AI can tackle—quickly and reliably—to keep developers moving forward.

For a deeper dive into the hidden costs of flaky tests and why it’s worth investing in fixing them, my colleague at Faros AI, Ron Meldiner, wrote a must-read article on the topic.  

If you’re interested in broader perspectives on AI in software development, I also regularly publish my thoughts on AI and share hands-on experiences with AI coding tools. Follow me on LinkedIn for more tips on using AI coding agents.

Yandry Perez Clemente

Yandry Perez is a senior software engineer at Faros AI.

