Asjad Khan

What Is a Flaky Test in Software Testing, and How to Fix It

What is a flaky test? Learn why flaky tests cause false failures and waste CI/CD time, and how to fix them to keep your test suite reliable.


Few things frustrate QA engineers more than a test you were confident about suddenly failing for no clear reason. You rerun it, and it passes. Run the same test suite later in the day, and it fails again. Nothing in the code or environment has changed, but the results don’t match.

That’s a flaky test summed up: it slows down reviews, blocks pull requests, and erodes confidence in your CI/CD process. Teams end up rerunning pipelines in loops, chasing failures that don’t point to real bugs. Meanwhile, feature work slows down.

As a QA engineer, you’ve almost certainly felt the impact of flaky tests. In this article, we’ll discuss what causes them and walk through how to resolve them when you encounter them.

What Is a Flaky Test?

A flaky test is an automated test that produces inconsistent outcomes. In one run it’s green, and the next it’s red, without any changes to the code or test environment.

The major problem here is trust. A failing test might expose a real bug, or it might just be a false alarm. Over time, this uncertainty leads teams to stop relying on their test results.

For example, consider a login flow that is thoroughly tested end-to-end. Even though the code and environment remain unchanged, it still produces inconsistent results when executed. The issue isn’t the login feature, but a test order dependency: the test behaves differently when it runs after another test that leaves shared data behind. Instead of catching real defects, flaky tests create confusion and slow down the delivery process.

Knowing about flaky tests is just one part of the process. To understand why they become such a critical issue, you also need to consider the impact they have across the engineering team and the business as a whole.

Why Flakiness Is Costly for Your Business

Flaky tests affect the entire business. A test that fails randomly might appear to be a minor glitch, but at scale it slows down releases, inflates CI/CD costs, and forces teams into manual work. What starts as a small blip in the test suite quickly turns into lost time, higher costs, and reduced delivery speed.

Here's how:

1. Delayed Pull Requests

When a flaky test blocks a pull request, developers either rerun the jobs until they pass or request manual overrides. Both options slow down the review process and add friction to the development workflow.

2. Higher CI/CD Costs

Each rerun consumes extra computing resources. Even a handful of flaky tests can force entire pipelines to rerun, consuming more CPU, memory, and build time than planned. Over time, these wasted cycles accumulate, slowing pipelines and increasing CI/CD costs without improving software quality.

3. Loss of Trust Amongst Teams

When test failures don’t match expectations, developers stop trusting the results. Once that happens, they rely less on automation and more on manual checks, rerunning builds, debugging locally, or verifying features by hand. This slows down the development process and drains engineering time that should be spent on building new functionality. Over time, the test suite shifts from being a source of confidence to a source of friction.

4. Impact on Customers

Flakiness increases the risk of pushing defective code into production. When trust in the test results erodes, developers are more likely to overlook real issues in the CI pipeline, which can lead to bugs that directly impact users. Such production defects can mean customer dissatisfaction, revenue loss, and damage to your brand.

5. Why Flakiness Hurts More Than You Think

Flaky tests look small in isolation, but when it comes to numbers, they can be huge:

  • A team running 10,000 automated tests daily with a 5% flakiness rate sees 500 false failures every day.
  • If each false failure wastes even five minutes of engineer time, that’s more than 40 hours, an entire engineer’s work week, lost every single day to this loop.

Once you observe the real cost of flakiness, the next question becomes clear: what causes flaky tests, and how can you fix them when they occur?

Main Causes of Flaky Tests

Flaky tests don’t come from a single source. They can slip in through timing issues, the way tests are written, or even the test environment itself. Understanding these root causes is the first step toward reducing them.

Asynchronous Behavior

Sometimes the application under test doesn’t respond at the same speed every time. For example, a login form might take an extra second to load its fields, or an API call might return slower under load. If the test checks too early, the element isn’t there yet, so it fails. Run the same test again when the timing lines up, and it passes. These kinds of race conditions are one of the most common causes of intermittent test failures.

Here's a simple example:

// Flaky: timing might not line up
await page.click("#login");
await page.fill("#username", "testuser"); // element may not be ready yet

External Services

Tests that depend on APIs, databases, or third-party tools inherit their instability. If the service lags or drops, the test fails, even though nothing in your code is wrong.

Random and Unpredictable Data

Random inputs, generated values, or leftover records can cause different results across runs. Without controlled test data, results won’t be consistent.

Environment Issues

System clocks, environment variables, or limited resources in CI can change how a test behaves from one run to the next.

Leaking State

When setup and teardown aren’t handled cleanly, data leaks between tests. The same test might pass on its own but fail if it runs right after another.

Test Interdependencies

Two tests that work fine when run individually can fail when run back-to-back, because one leaves behind a state that the other doesn’t expect.

// Test A: creates a user
it("creates a user", async () => {
  await createUser("alice");
});

// Test B: assumes "alice" exists, but might run before Test A
it("deletes a user", async () => {
  await deleteUser("alice"); // fails if Test A hasn't run
});
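
One way to break the dependency, keeping the same hypothetical createUser and deleteUser helpers, is for each test to set up and clean up its own data; a sketch:

// Each test creates the state it needs and cleans it up afterwards
describe("user deletion", () => {
  beforeEach(async () => {
    await createUser("alice"); // known starting state for every test
  });

  afterEach(async () => {
    await deleteUser("alice").catch(() => {}); // best-effort cleanup; ignore "already deleted"
  });

  it("deletes a user", async () => {
    await deleteUser("alice"); // no longer depends on another test running first
  });
});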

Flakiness shows up most often in end-to-end testing. UI layers, network calls, and external systems add moving parts that are harder to control. A single slow response or unhandled async operation can make a stable feature look broken.

Understanding the causes explains why flakiness seeps into a test suite, but it doesn’t, by itself, stop tests from passing and failing unpredictably. The next step is learning how to detect flaky tests before they waste more time in your CI/CD pipelines.

So far, we’ve looked at how flaky tests occur and why they are hard to predict. The real challenge lies in managing them at scale, so let’s look at how mature teams do that.

A Framework for Managing Flakiness at Scale

As teams scale, flakiness scales along with them. A single unreliable test might seem manageable when dealt with alone, but multiply it by hundreds of test cases, multiple pipelines, and multiple teams, and the impact compounds.

Each false failure adds to the noise, slows the pace of the feedback loop, and starts eroding trust in CI results. Therefore, at scale, flakiness demands visibility, ownership and measurement across the entire delivery process.

Detection: Finding Flakiness Before It Slows You Down

You can’t manage what you haven’t detected. Detecting flakiness isn’t about a single build failure; it’s about how tests behave over time and across environments.

Compare Results Over Time

If the same commit sometimes fails and sometimes passes, that’s the first sign. A flaky test generates noise even when neither the code nor the environment has changed, something a reliable test won’t do.

Tools like Playwright can automatically retry failed tests, but retries alone aren’t enough; the key is to study patterns over time and connect the dots: when tests fail and how often.

Tip: In Playwright, enabling test retries with the --retries flag is a good starting point for studying the pattern and detecting flaky tests.

For example, you can give failing tests 4 retry attempts:

    npx playwright test --retries=4

This will show an instability pattern over various retries, which will help you compare results over time.
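
If you prefer not to pass the flag on every run, the same retry count can also be set in the Playwright config; a minimal sketch:

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: 4, // equivalent to --retries=4 on the CLI
});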

Analyze Patterns

Detection is about seeing how frequently a test fails and getting context around those failures.

While studying these patterns, ask questions like:

  • Does the same test fail on specific branches?
  • Does it correlate with certain commits or authors?
  • Does it appear during peak CI usage times?

Use Historical Visibility

Mature teams track test history to confirm whether a failure is a one-off glitch in the code or part of a pattern.

Currents, a Playwright-focused test analytics platform, simplifies this with automatic flaky test detection and its Test Explorer, which displays flaky behavior across branches, environments, and builds.

Measurement

Detection is important, but once you have detected flakiness, it becomes even more important to quantify it. The data shows how big the problem is and where it impacts the business the most.

Here’s how you can measure:

  • Flakiness Rate: Track how often a test fails versus how often it runs (see the sketch after this list).
  • Impact Metrics: Count how many pull requests were blocked, how many test reruns were triggered by flaky failures, and how much CI time was lost.
  • Set Thresholds: Teams can agree on an acceptable flakiness threshold, usually under 1-2% across suites.
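
As a rough illustration of the Flakiness Rate metric above, here is a minimal sketch of how it could be computed from recorded run results (the RunResult shape and threshold value are assumptions for this example):

// One record per execution of a given test
type RunResult = { testId: string; passed: boolean };

// Flakiness rate = failed runs / total runs
function flakinessRate(runs: RunResult[]): number {
  if (runs.length === 0) return 0;
  const failures = runs.filter((run) => !run.passed).length;
  return failures / runs.length;
}

// Flag tests that exceed an agreed threshold (1-2% per the text above)
const FLAKINESS_THRESHOLD = 0.02;
const exceedsBudget = (runs: RunResult[]) => flakinessRate(runs) > FLAKINESS_THRESHOLD;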

Currents automatically calculates a Flakiness Rate for each test you run, providing history and trends across projects. Turning flakiness into measured data helps the team prioritize better.

Prioritize

Depending on the scope of the project, not every flaky test deserves immediate attention. Some are critical enough to block a release, and some are just background noise.

  • Fix high-impact flakes first: Focus on the tests covering core workflows (login, payments, deployment, etc.)
  • Quarantine selectively: Move non-critical flaky tests out of the main pipeline so they don’t slow every PR.
  • Assign ownership: Each flaky test should have a clear owner; it’s usually the team responsible for the related feature.
    • To designate an owner of a test, add an annotation with type: owner, for example: testInfo.annotations.push({ type: "owner", description: "johnsmith" }); (a fuller sketch follows below).
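
A minimal sketch of the same annotation in the context of a full Playwright test (the test name and owner value are placeholders):

import { test } from "@playwright/test";

test("checkout applies a discount code", async ({ page }, testInfo) => {
  // "johnsmith" is a placeholder; use the owning engineer or team
  testInfo.annotations.push({ type: "owner", description: "johnsmith" });

  // ...the actual test steps for the owned feature go here
});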

This value will appear in various areas of your Currents dashboard so that you can quickly identify who owns the test and prioritize better.

Prioritization and clear ownership ensure that flaky tests are never left unattended and help the team focus on what matters most before moving to the resolution stage.

Resolution

After identifying and prioritizing flaky tests, the next step is fixing the root cause, not just getting the test to pass once.

Here are some steps that you can perform:

  • Reproduce locally: Run the test repeatedly (e.g., --repeat-each=20) to confirm instability. This flag re-runs each test sequentially within the same environment, helping you verify whether the failures are consistent or intermittent, a quick way to separate flakiness from actual bugs.
  • Use logs and artifacts: Screenshots, videos, and traces often reveal dependency issues.
  • Traces: Traces are a great way to debug your tests when they fail on CI.

Here’s how you can use Playwright’s .zip trace locally:

npx playwright show-trace path/to/trace.zip
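
Collecting those artifacts in the first place is a configuration choice. One common setup, sketched under the assumption that you only want heavy artifacts for failures and retries:

// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    trace: "on-first-retry", // record a trace the first time a test is retried
    screenshot: "only-on-failure", // keep screenshots only for failed tests
    video: "retain-on-failure", // keep videos only when a test fails
  },
});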

Currents automatically aggregates these artifacts (logs, screenshots, videos, and traces) across builds.

By running tests locally multiple times and comparing artifacts from each failure, you can start identifying patterns, whether it’s timing delays, environment changes, or unstable dependencies, and focus on the actual source of flakiness.

Prevention

The most effective teams build prevention into their testing culture. Flakiness prevention isn’t a one-time job; it requires constant effort and intervention.

  • Establish ownership: Every test belongs to someone; assign owners by feature or component.
  • Define flakiness budgets: Building on the thresholds defined earlier, a flakiness budget sets how much instability a team is willing to tolerate before taking corrective measures. For example, allowing up to 1% flakiness per suite per week before a fix becomes mandatory. This turns flakiness tracking from a metric into an accountability process.
  • Monitor continuously: Track trends to study the patterns.
  • Review regularly: Include flaky test reviews in sprint retrospectives or share them async in Slack every couple weeks.

Currents makes prevention of flaky tests practical by tracking flakiness trends, offering pre-test actions like skip and post-test actions like quarantine and add tag through Currents Actions, and collecting and presenting test result data for retrospectives.

Even with the right prevention strategies in place, flakiness will still appear from time to time. What differentiates smarter teams from the rest is how they apply these testing techniques and how tightly they monitor.

Techniques That Strengthen Flakiness Management

You can also utilize some additional techniques to further strengthen your flaky test management.

Test Burn-in

Run new or suspicious tests repeatedly (e.g., 100 times) before merging them into the main test suite. This exposes instability early and prevents unreliable tests from being merged and destabilizing the main CI pipeline.
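
With Playwright, a simple way to burn in a new spec before merging is to repeat it many times locally or in a dedicated CI job (the file path below is just an example):

npx playwright test tests/new-checkout.spec.ts --repeat-each=100

If all repetitions pass, you have far more confidence the test is stable; any intermittent failure surfaces before the test reaches the main suite.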

Continuous Monitoring and Alerts

Monitoring via dashboards and alerts keeps teams aware of flakiness trends. Currents can automatically tell you when test results are deteriorating.

Test Case Management

Use tags and other metadata, like team, feature, and environment, to group flaky tests. This makes it easier to filter, assign, and prioritize fixes.
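
One lightweight way to do this in Playwright is to put tags in the test title and filter on them; a sketch with arbitrary tag names:

import { test } from "@playwright/test";

// Tags in the title make the test easy to filter, group, and assign
test("checkout total updates @flaky @team-payments @feature-checkout", async ({ page }) => {
  // ...test steps for the tagged feature
});

You can then include or exclude those tests at run time, for example:

npx playwright test --grep "@flaky"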

Flakiness is inevitable, but it can be contained and minimized with precision. Once you’ve built the right processes and structure for your suite, the next step is knowing how to tackle an individual flaky test when you encounter one.

How to Fix a Flaky Test

Solve flaky tests with a repeatable process: reproduce, isolate, fix, and prevent. That usually means improving test isolation, stabilizing the test environment, and redesigning fragile tests.

Here are the steps you can follow to study the issue and fix it:

1. Reproduce the Issue

  • Run the test in isolation.
  • Repeat the test execution multiple times to confirm that it fails intermittently.
  • If it only fails inside CI but passes locally, the cause may be environmental factors:
    • Limited CPU or memory resources on the CI runner
    • Differences in environment variables
    • Clock or timezone mismatches
    • Network latency in shared environments

2. Stabilize Selectors and Waits

Most UI flakiness comes from unstable selectors or async timing.

  • Prefer stable locators (data attributes, ARIA roles) over fragile text or XPath.
  • Avoid blind sleeps (e.g., waitForTimeout(3000)) as the primary strategy, since they mask timing variability.
  • Wait for observable state: element visible, network idle, or a specific DOM/text change.
  • Use targeted retries or wait-for-response where the async operation is known to be flaky, rather than blanket test-level retries.
  • Read the Debugging Playwright Timeouts article for more tips on stabilizing your selectors and waits.

Here's how you can do it in Playwright:

// stable locator + visible assertion
await expect(page.locator('[data-testid="submit"]')).toBeVisible({
  timeout: 5000,
});

// wait for network activity if the page triggers fetches
await page.waitForLoadState("networkidle");

// wait for a specific API response before asserting
await page.waitForResponse(
  (resp) => resp.url().includes("/api/login") && resp.status() === 200
);

3. Control Dependencies

  • Eliminate reliance on external systems unless you’re testing them directly.
  • Mock or stub APIs and services to avoid noise from outages or network slowness (see the sketch after this list).
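
For example, a minimal Playwright sketch that stubs an API route so the test no longer depends on the real service (the endpoint, payload, and selectors are placeholders):

import { test, expect } from "@playwright/test";

test("shows a payment confirmation", async ({ page }) => {
  // Intercept the network call and return a canned response instead of hitting the real API
  await page.route("**/api/payments", (route) =>
    route.fulfill({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify({ status: "confirmed" }),
    })
  );

  await page.goto("/checkout"); // assumes a baseURL is configured
  await expect(page.locator('[data-testid="confirmation"]')).toBeVisible();
});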

4. Fix Test Data and Isolation

  • Use fixed instead of random values; when randomness is necessary, seed it so failures are reproducible (see the sketch after this list).
  • Clean setup and teardown for each test.
  • Avoid test order dependency: one test’s state should not affect the next test.
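
For example, if you generate data with a library such as @faker-js/faker (an assumption for this sketch, not something the article prescribes), seeding makes every run produce the same “random” values:

import { faker } from "@faker-js/faker";
import { test } from "@playwright/test";

test.beforeEach(() => {
  faker.seed(42); // same seed, same generated data on every run
});

test("registers a new user", async ({ page }) => {
  const email = faker.internet.email(); // reproducible thanks to the seed
  // ...use the email in the registration flow under test
});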

5. Quarantine Unreliable Tests

  • If a flaky test blocks merges, quarantine it from the main pipeline rather than letting it keep stalling other PRs. Move it to a “flaky” suite and mark the PR with a note so reviewers know why it was excluded.
  • Track quarantined tests in your issue tracker and attach the reproduction steps and artifacts. Currents allows you to quarantine them while still tracking their results.

6. Inspect Logs and Artifacts

  • Collect screenshots, videos, console logs, network traces, and stack traces for every failure. Don’t rely on memory or a single test run.
  • Use tracing where available (Playwright traces, Cypress videos). Traces reveal the timing and sequence of events, making race conditions apparent.
  • Centralize artifacts so they are accessible after CI jobs terminate. Currents stores all artifacts centrally, so you don’t lose evidence when a build is retried.

7. Monitor After Fixing

  • Don’t assume “fixed.” Track a test’s Flakiness Rate (failures/runs) over time. Set thresholds and alerts (e.g., notify when flakiness > 1% over 7 days).
  • Add automation: create a ticket or auto-quarantine a test when it exceeds a flakiness threshold.
  • Use continuous monitoring in Currents to expose flakiness trends, and catch regressions before they block your workflow.

The aim of fixing flaky tests isn’t achieving perfection; you’ll never reach zero flakiness. The goal is to know how to manage it. Detect it early, resolve what you can, and quarantine the rest so your test suite stays reliable. With the right process and monitoring, flakiness becomes predictable and manageable instead of disruptive.

Final Thoughts

Flaky tests aren’t going anywhere. Every growing codebase and every complex test suite faces them. Therefore, it's essential to understand how you will handle flakes.

Teams that ignore flakiness end up with bloated CI/CD pipelines, delayed pull requests, and developers rerunning tests just to get them to pass. Teams that manage it, by isolating, fixing, quarantining, and monitoring, are the ones that protect their tests’ reliability and keep delivery on pace.

In the end, it’s simple: you won’t reach zero flakiness, but what matters is building a system that detects, contains, and reduces it before it hampers the development process and software quality.

That’s where tools like Currents become valuable. Centralized artifacts, history across runs, and continuous monitoring give you the visibility and methods to manage flaky tests in a simple, organized way. With that in place, flakiness becomes a risk you can learn to control.



Trademarks and logos mentioned in this text belong to their respective owners.
