The Chromium Chronicle #2: Fighting Test Flakiness
Episode 2: by Vasilii in Munich (May, 2019)
Flaky tests are a common problem in Chrome. They hurt other developers' productivity and tend to get disabled over time, and disabled tests mean diminishing test coverage.
Triaging Stage
The OWNERS of a directory are responsible for fixing its flaky tests. If you receive a bug about a flaky test, spend a few minutes and comment on the bug with what went wrong. If you have an old flaky test and it's unclear what went wrong, try simply re-enabling the test. Reassign the bug ASAP if it's clearly a problem in another component; the owners of that component will have better judgement about the failure.
Debugging Stage
A number of command-line flags are useful for fixing flaky tests. For example, --enable-pixel-output-in-tests will render the actual browser UI.
Have fallback tools ready in case the debugger makes the flakiness disappear: running under a debugger changes the timing, and the test may never fail there. In that case, log statements or base::debug::StackTrace can be handy.
Keep in mind common reasons for EXPECT_* failures besides bugs in production code:
- Incorrect expectations (e.g. assuming a secure page means HTTPS; it can be localhost instead).
- Race conditions due to tests not waiting for the proper event.
Test the behavior, not the implementation.
// It takes 2 round trips between the UI and the background thread to complete.
The two round trips may become three in the future, which would make the test flaky even though nothing is broken. Only the state of the store is relevant, so use an observer for the store instead.
Beware of common patterns such as the following:
// Wait until things settle down.
A snippet like the one above in a browser test is almost surely incorrect. Many events must happen in different processes and threads before some UI appears, and nothing guarantees they have all finished by the time the test moves on.
A correct fix waits for the specific event the test depends on, for example by calling WaitUntilCredentialPromptVisible().
That fix is correct under the assumption that WaitUntilCredentialPromptVisible() doesn't actually check the UI. Browser tests should not depend on external UI events like "focus lost" or "window became foreground". Imagine an implementation where the prompt appears only when the browser window is active. Such an implementation would be correct; however, a test that checks the actual window would be flaky.
Post-fix Stage
Once the test is fixed, run it hundreds of times locally. Keep an eye on the Flakiness Portal.