Test failures vs. errors, and preconditions

I’m running a single stb-tester script repeatedly to understand the reproducibility of an intermittent defect. I’ve run the test 50 times:

Good! It takes me 2 minutes tops to verify that each of those 5 failures really is the defect we’re interested in, thanks to stb-tester’s great triaging UI.

What about 1,000 runs?

It turns out that running the same test script 1,000 times against a consumer-grade set-top box will reveal some interesting behaviours. Some of the resulting failures might be genuine defects in the system-under-test; others merely show that the test script doesn’t handle every eventuality. I wrote the test script in a hurry to investigate this particular defect, so it isn’t robust to things like a dialog popping up that says “I couldn’t connect to the network”. I don’t care if the IT team scheduled some maintenance over the same weekend when I left this test running, and the network went down for a few minutes. All I want to know is: Is my defect real? What is its reproducibility rate? Or maybe: Has it really been fixed in the latest release?

So I take my test script and I add one line, the with stbt.as_precondition(...) line, and indent the setup steps under it:

import stbt, mainmenu, channels, power

def test_that_the_on_screen_id_is_shown_after_booting():
    channel = 100

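    # Setup steps: a UITestFailure or failed assert in this block is
    # reported as a test error (PreconditionError), not a test failure.
    with stbt.as_precondition("Tune to channel %s" % channel):
        mainmenu.close_any_open_menu()
        channels.goto_channel(channel)
        power.cold_reboot()
        assert channels.is_on_channel(channel)

    # The check this test exists for: a MatchTimeout here is a test failure.
    stbt.wait_for_match("on-screen-id.png")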

Now when I run the test script 1,000 times I get results like this:

Much better! Now I can ignore all the yellow results (which I call test errors) and focus on investigating the few red results (test failures).
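Ignoring the errors also changes how the reproducibility rate is worked out. The following is a minimal sketch of one way to do the arithmetic, assuming the errors are simply left out of the denominator. The function name and the outcome counts are hypothetical and chosen only for illustration; they are not results from the runs described here.

```python
from collections import Counter


def reproducibility_rate(outcomes):
    """Fraction of valid runs (pass or fail) that failed.

    Runs that ended in a test error are excluded, because they say
    nothing about whether the defect reproduced.
    """
    counts = Counter(outcomes)
    valid_runs = counts["pass"] + counts["fail"]
    if valid_runs == 0:
        return None  # No valid runs, so the rate is undefined.
    return counts["fail"] / valid_runs


# Hypothetical outcomes, for illustration only.
outcomes = ["pass"] * 90 + ["fail"] * 5 + ["error"] * 5
rate = reproducibility_rate(outcomes)
print("Reproducibility: %.1f%%" % (rate * 100))  # Reproducibility: 5.3%
```

Counting the errors as passes would give 5 in 100 instead of 5 in 95, so the difference is small here but grows with the number of errors.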

Test failures and errors

A test fails when it detects a defect in the system-under-test.

A test error is a problem with the test script or test infrastructure.

Test failures

wait_for_match raises a UITestFailure exception if it doesn’t find a match. (Specifically it raises a MatchTimeout, which is a subclass of UITestFailure.)

Stb-tester’s other core functions also raise UITestFailure exceptions when the system-under-test’s behaviour doesn’t match the expected behaviour.

You can also use Python’s assert statement in your test scripts; a failing assert raises AssertionError.

Stb-tester considers all of these test failures, and reports them in red.

Test errors

Stb-tester will treat any other exception raised by your Python script as a test error (reported in yellow). These include mistakes in your Python script (SyntaxError, TypeError, ValueError), as well as the PreconditionError raised by stbt.as_precondition.
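The rule described above can be sketched as a small classification function. This is an illustration of the rule, not stb-tester's own code: the three exception classes are local stand-ins for the stbt classes of the same names, so that the example runs without stb-tester installed, and classify is a hypothetical helper.

```python
class UITestFailure(Exception):
    """Stand-in for stbt.UITestFailure."""


class MatchTimeout(UITestFailure):
    """Stand-in for stbt.MatchTimeout, a subclass of UITestFailure."""


class PreconditionError(Exception):
    """Stand-in for stbt.PreconditionError (not a UITestFailure)."""


def classify(exc):
    """Map the exception raised by a test run to a result.

    `exc` is None if the test script finished without raising.
    """
    if exc is None:
        return "pass"
    if isinstance(exc, (UITestFailure, AssertionError)):
        return "fail"   # Reported in red.
    return "error"      # Any other exception: reported in yellow.


print(classify(MatchTimeout()))                     # fail
print(classify(AssertionError()))                   # fail
print(classify(TypeError()))                        # error
print(classify(PreconditionError("Tune failed")))   # error
```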

You can use the interactive filter in stb-tester’s results to filter out test errors and other results you aren’t interested in.

Preconditions

Most test scripts will perform a series of operations to get the system-under-test into a certain state, and then check the behaviour that is the purpose of the test.

Sometimes it is useful to treat failures that happen during those initial setup operations as test errors, for example when you are investigating a single intermittent defect, as in the example at the beginning of this article. In such cases, use with stbt.as_precondition to turn test failures into errors.
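To make the idea of turning failures into errors concrete, here is a minimal sketch of how such a context manager could be written. It is an assumption about the general technique, not stb-tester's actual implementation: UITestFailure and PreconditionError are local stand-ins for the stbt classes, and the original_exception attribute and failing_setup function are hypothetical.

```python
from contextlib import contextmanager


class UITestFailure(Exception):
    """Stand-in for stbt.UITestFailure."""


class PreconditionError(Exception):
    """Stand-in for stbt.PreconditionError."""

    def __init__(self, message, original_exception):
        super().__init__(message)
        # Hypothetical attribute: keep the failure that triggered the error.
        self.original_exception = original_exception


@contextmanager
def as_precondition(message):
    """Re-raise test failures inside the `with` block as PreconditionError."""
    try:
        yield
    except (UITestFailure, AssertionError) as e:
        raise PreconditionError(message, e) from e
    # Any other exception propagates unchanged; it is already a test error.


def failing_setup():
    with as_precondition("Tune to channel 100"):
        assert False, "Not on channel 100"


try:
    failing_setup()
except PreconditionError as e:
    print("Test error:", e)  # Test error: Tune to channel 100
```

The failed assert would normally count as a test failure; wrapped in the context manager it surfaces as a PreconditionError, which is reported as a test error.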

But in more general soak and functional tests, it is better not to use stbt.as_precondition to hide potential defects. Write your test script so that it expects, and can correctly react to, every legitimate behaviour of the system-under-test. That way, any misbehaviour of the system-under-test will be flagged as a failure; problems with the test infrastructure itself will still appear as errors.