A step-by-step example of GitHub Copilot fixing a flaky test: analyze logs, propose a PR, validate the solution.
I recently hit one of the most frustrating problems in software development: a flaky test. Flaky tests break trust in continuous integration (CI) pipelines and slow down developers. Instead of debugging it myself, I asked GitHub Copilot to fix it.
GitHub Copilot can fix flaky tests because it has access to the codebase, CI logs, and failed runs. All you need to do is direct it to the failure.
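Steps Copilot took: it analyzed the logs from the failed CI run, identified the root cause of the flakiness, proposed a fix in a pull request, and validated the fix by running the test 100 times in each of three scenarios.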
The flaky test hasn’t reappeared since merging the fix.
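The validation step, running the same test many times and comparing pass rates with and without the fix, is easy to reproduce by hand. Below is a minimal sketch in Python, not what Copilot actually ran: the `pass_rate` helper is a hypothetical name, and it assumes the test can be started as a command that exits with code 0 on success.

```python
import subprocess
import sys


def pass_rate(command, runs=100):
    """Run `command` `runs` times; return the fraction of runs that exit with code 0."""
    passed = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            passed += 1
    return passed / runs


# Stand-in for a real test command (assumption): a Python process that always exits 0.
always_passes = [sys.executable, "-c", "pass"]
print(f"{pass_rate(always_passes, runs=5):.0%} of runs passed")
```

To use it on a real project, replace the stand-in command with whatever invokes the single flaky test in your test runner, then run it once on the branch with the fix and once without it.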
This example shows how GitHub Copilot can diagnose and repair flaky tests automatically, turning a frustrating CI failure into a quick success. Watch the video below for a walkthrough.
<iframe width="445" height="791" src="https://www.youtube.com/embed/inYn4Os9zMU" title="How GitHub Copilot (Agent) Helped Me Fix Flaky Tests & Unreliable CI - Experience Report | Faros AI" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
“Today I want to tell you about a pretty nice success story that I had with GitHub Copilot.
I merged some code the other day, and after a while, I got an email from the continuous integration saying that one of the tests had failed.
When I looked into that test failure, I realized that the test that was failing was completely unrelated to the change that I had made. So this seemed to indicate that this test was flaky.
So I just figured, hey, since GitHub Copilot should have access to the logs in this continuous integration run and the code itself, maybe I just put the link to the failed action here and I just simply said, hey, investigate this possibly flaky test. And I just went on to do whatever I was doing that day.
I came back and, to my very positive surprise, GitHub Copilot had identified the root cause of the flakiness and had proposed a fix. So I told it to run the flaky test 100 times. It set up three validation scenarios and ran each one 100 times, getting a 100% success rate. That was very promising.
Just to be super sure, I then told GitHub Copilot to run the flaky test without the fix to get the success rate before the fix. So it did the same thing, it ran the test 100 times and it got a success rate of 23%. As you know, this is very bad for developer happiness—when you're trying to merge your code and have to retry and retry and retry.
I took a look at the fix and indeed it had to do with how to handle the fake timers and the real timers in the unit test framework that we use, which is kind of not trivial to fix.
So I was very pleased that Copilot, without any back and forth, was able to fix my problem, and we haven't heard about this flaky test since.”
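The post does not show the actual fix or name the test framework, so the following is only a generic illustration of the class of problem, written in Python. It sketches why a test tied to the real clock can be flaky and how a manually advanced fake clock makes it deterministic. `FakeClock` and `TtlCache` are hypothetical names invented for this example.

```python
import time


class FakeClock:
    """Manually advanced clock, standing in for a test framework's fake timers."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class TtlCache:
    """Hypothetical code under test: entries expire `ttl` seconds after being set."""

    def __init__(self, ttl, now=time.monotonic):
        self._ttl = ttl
        self._now = now  # injectable time source; defaults to the real clock
        self._entries = {}

    def set(self, key, value):
        self._entries[key] = (value, self._now() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        return value if self._now() < expires_at else None


def test_expiry_with_real_clock():
    # Timing-dependent version (defined but not run here): the first assertion
    # fails whenever scheduling delays add 10 ms or more between set() and get().
    cache = TtlCache(ttl=0.05)
    cache.set("k", "v")
    time.sleep(0.04)
    assert cache.get("k") == "v"
    time.sleep(0.02)
    assert cache.get("k") is None


def test_expiry_with_fake_clock():
    # Deterministic version: time only moves when the test advances it.
    clock = FakeClock()
    cache = TtlCache(ttl=5.0, now=clock.now)
    cache.set("k", "v")
    clock.advance(4.9)
    assert cache.get("k") == "v"
    clock.advance(0.2)
    assert cache.get("k") is None


test_expiry_with_fake_clock()
```

The design point is that the code under test and the test itself must agree on a single source of time. When one side reads a fake clock and the other reads the real one, the outcome depends on how fast the machine happens to run.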
Flaky tests used to mean lost hours, broken momentum, and eroding trust in your CI pipeline. With GitHub Copilot or similar AI coding tools, they become just another problem that AI can help tackle quickly, keeping developers moving forward.
For a deeper dive into the hidden costs of flaky tests and why it’s worth investing in fixing them, my colleague at Faros AI, Ron Meldiner, wrote a must-read article on the topic.
If you’re interested in broader perspectives on AI in software development, I also publish my thoughts on AI and share hands-on experiences with AI coding tools frequently. Follow me on LinkedIn for more tips on using AI coding agents.