Testing on-screen keyboards with Stb-tester

02 Oct 2020.

Stb-tester v32 added new APIs that make it much easier to navigate on-screen keyboards from your test scripts. In this tutorial we’ll model the behaviour of YouTube’s search keyboard on the Apple TV, and we’ll write a Page Object (a Python class) to navigate the keyboard from our test scripts.

By the end of this tutorial we will have implemented a Page Object with an enter_text method, so you can write a test script like this:

page = Search()
page.enter_text("Peppa Pig")

Modelling the keyboard

First we will specify a Directed Graph that describes the behaviour of the keyboard under test. A “graph” (in the computer-science sense of the word) consists of “nodes” connected by “edges”. Each key on our keyboard will be a node in our graph, and the possible transitions from each key to its neighbours will be the edges:

Figure 1: YouTube's search keyboard on Apple TV, showing the outgoing transitions from key "a".

Each edge specifies the button that you need to press on the remote control to trigger that transition: KEY_RIGHT, KEY_LEFT, KEY_UP, or KEY_DOWN.

Stb-tester will use this graph to calculate the shortest path to the target. For example, if the current selection is on “a”, to type the letter “p” Stb-tester would press KEY_RIGHT 3 times and KEY_DOWN twice (and then KEY_OK to type the selected letter). Note that there can be more than one shortest path.

Figure 2: Shortest path from "a" to "p".
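To make the shortest-path idea concrete, here is a minimal sketch in plain Python that does not use Stb-tester at all. It builds a graph for the 6x6 letter grid only and runs a breadth-first search over it. The names `build_graph` and `shortest_path` are made up for this illustration, and we are not claiming this is how stbt.Keyboard is implemented internally.

```python
from collections import deque

ROWS = ["abcdef", "ghijkl", "mnopqr", "stuvwx", "yz1234", "567890"]
MOVES = {"KEY_RIGHT": (1, 0), "KEY_LEFT": (-1, 0),
         "KEY_DOWN": (0, 1), "KEY_UP": (0, -1)}


def build_graph(rows):
    """Return {key: {button: neighbour}} for a rectangular grid of keys."""
    graph = {}
    for y, row in enumerate(rows):
        for x, name in enumerate(row):
            edges = {}
            for button, (dx, dy) in MOVES.items():
                nx, ny = x + dx, y + dy
                # Only add an edge if the neighbour is inside the grid.
                if 0 <= ny < len(rows) and 0 <= nx < len(rows[ny]):
                    edges[button] = rows[ny][nx]
            graph[name] = edges
    return graph


def shortest_path(graph, start, target):
    """Breadth-first search; returns the list of buttons to press."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, presses = queue.popleft()
        if node == target:
            return presses
        for button, neighbour in graph[node].items():
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, presses + [button]))
    return None  # The target is not reachable from the start key.


path = shortest_path(build_graph(ROWS), "a", "p")
print(path)
```

The search returns one of the several equally short paths. Whichever one it picks, it contains 3 presses of KEY_RIGHT and 2 of KEY_DOWN, matching the example above.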

To model this keyboard’s behaviour in our Python code, we can use the stbt.Keyboard API to specify each key’s name and region (its position on the screen; we’ll use this later) using Keyboard.add_key, and the transitions between keys using Keyboard.add_transition, like this:

import stbt

kb = stbt.Keyboard()
kb.add_key(name="a", region=stbt.Region(x=125, y=175, width=50, height=50))
kb.add_key(name="b", region=stbt.Region(x=175, y=175, width=50, height=50))
# "b" is immediately to the right of "a" (same y, larger x):
kb.add_transition("a", "b", "KEY_RIGHT")
# ...and so on for all the other keys...

Hold on, hold on. This keyboard is laid out in a regular grid, so instead of typing each key one by one, let’s use Keyboard.add_grid. Our keyboard has 3 different grids, shown below in different colours:

Figure 3: The keys are laid out in a 3x1 grid (yellow), a 6x6 grid (red), and another 3x1 grid (blue).

We specify all these keys like this:

kb = stbt.Keyboard()
kb.add_grid(stbt.Grid(region=stbt.Region(x=145, y=125, right=410, bottom=160),
                      data=[["abc", "ABC", "#+-"]]))
kb.add_grid(stbt.Grid(region=stbt.Region(x=125, y=175, right=425, bottom=475),
                      data=["abcdef",
                            "ghijkl",
                            "mnopqr",
                            "stuvwx",
                            "yz1234",
                            "567890"]))
kb.add_grid(stbt.Grid(region=stbt.Region(x=125, y=480, right=425, bottom=520),
                      data=[[" ", "DELETE", "CLEAR"]]))

Much easier, isn’t it! Keyboard.add_grid is only suitable if all the cells in the grid are the same size. You don’t need to be super-precise with the region coordinates — just make sure the centre of each key is inside the right cell.

Note that stbt.Grid’s data parameter is a list of lists. Actually it’s a list of iterables — that’s why we can provide a list of strings because iterating over a string yields one character at a time. We could also have specified it like this:

        data=[
            ["a", "b", "c", "d", "e", "f"],
            ["g", "h", "i", "j", "k", "l"],
            ...etc...
        ]

…but the first way is easier to type and easier to read. For the top and bottom grids we do have to use a list of lists because the key names are longer than a single character.
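If the "list of iterables" point seems abstract, this small plain-Python check (no Stb-tester needed) shows why the two spellings are equivalent, and why multi-character key names have to be wrapped in a list. The variable names are ours, chosen just for this illustration.

```python
as_strings = ["abcdef", "ghijkl"]
as_lists = [["a", "b", "c", "d", "e", "f"],
            ["g", "h", "i", "j", "k", "l"]]

# Iterating over a string yields one character at a time, so expanding each
# row gives exactly the same cells as the explicit list of lists.
expanded = [list(row) for row in as_strings]
print(expanded == as_lists)  # True

# A bare multi-character string would be split into one cell per character,
# whereas a list keeps the whole name together as a single cell.
print(list("ABC"))    # ['A', 'B', 'C']
print(list(["ABC"]))  # ['ABC']
```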

Keyboard.add_grid will add all the keys and the transitions between them (within the grid). It won’t add transitions that go outside of the grid, so we need to add those explicitly, like this:

# abc ABC #+-
# ↕ ↕ ↕ ↕ ↕ ↕
# a b c d e f
kb.add_transition("a", "abc", "KEY_UP")
kb.add_transition("b", "abc", "KEY_UP")
kb.add_transition("c", "ABC", "KEY_UP")
kb.add_transition("d", "ABC", "KEY_UP")
kb.add_transition("e", "#+-", "KEY_UP")
kb.add_transition("f", "#+-", "KEY_UP")

# 5 6 7 8 9 0
# ↕ ↕ ↕ ↕ ↕ ↕
# SPC DEL CLR
kb.add_transition("5", " ", "KEY_DOWN")
kb.add_transition("6", " ", "KEY_DOWN")
kb.add_transition("7", "DELETE", "KEY_DOWN")
kb.add_transition("8", "DELETE", "KEY_DOWN")
kb.add_transition("9", "CLEAR", "KEY_DOWN")
kb.add_transition("0", "CLEAR", "KEY_DOWN")

Note that, by default, Keyboard.add_transition adds the opposite transition automatically, for example KEY_UP for KEY_DOWN or KEY_LEFT for KEY_RIGHT.

You may have noticed that some keys have two possible transitions for the same remote-control button — for example pressing KEY_UP from “SPACE” can land on “5” or on “6”. This reflects the keyboard-under-test’s real behaviour: It remembers which key you came from before navigating down onto “SPACE”, and it returns to the same column when you go back up. stbt.Keyboard doesn’t keep track of this state, so we just accept that both of those two keys (“5” and “6”) are valid targets.
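One way to picture these two behaviours (the automatic opposite transition, and several valid targets for the same button) is a graph whose edges map each button to a set of targets. The following is a rough plain-Python sketch under that assumption. The `add_transition` function and the `edges` structure here are stand-ins written for this illustration, not Stb-tester's own code.

```python
from collections import defaultdict

OPPOSITE = {"KEY_UP": "KEY_DOWN", "KEY_DOWN": "KEY_UP",
            "KEY_LEFT": "KEY_RIGHT", "KEY_RIGHT": "KEY_LEFT"}

# edges[source][button] is a *set* of possible targets, because the same
# button can lead to more than one key.
edges = defaultdict(lambda: defaultdict(set))


def add_transition(source, target, button, symmetrical=True):
    """Record source --button--> target, plus the opposite edge if any."""
    edges[source][button].add(target)
    if symmetrical and button in OPPOSITE:
        edges[target][OPPOSITE[button]].add(source)


add_transition("5", " ", "KEY_DOWN")
add_transition("6", " ", "KEY_DOWN")

# Pressing KEY_UP from the space key can land on either "5" or "6".
print(sorted(edges[" "]["KEY_UP"]))  # ['5', '6']
```

When navigating, either target counts as a valid outcome of pressing KEY_UP from the space key, so the path can be re-planned from wherever the selection actually lands.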

Modes

This keyboard has three different modes: Lowercase, uppercase, and symbols. It’s best to think of each mode as an entirely different keyboard, with transitions that change between them: Pressing KEY_OK on one of the mode keys (like “ABC”) will go to that mode.

Some keys might appear in more than one mode. It’s important to model these as different keys — even though they look the same, they are different because they are connected to different keys. For example there is a “SPACE” key in all of the modes, but pressing KEY_UP from it will go to a totally different key:

Figure 5: The "SPACE" keys in the 3 modes are modelled as 3 different keys.

The same is true for the “DELETE” and “CLEAR” keys, the mode keys (“abc”, “ABC”, and “#+-”), and the number keys.

To tell these apart in our model we specify mode when we call Keyboard.add_key, Keyboard.add_transition, or Keyboard.add_grid, like this:

top_grid = stbt.Grid(region=stbt.Region(x=145, y=125, right=410, bottom=160),
                     data=[["abc", "ABC", "#+-"]])
bottom_grid = stbt.Grid(region=stbt.Region(x=125, y=480, right=425, bottom=520),
                        data=[[" ", "DELETE", "CLEAR"]])
middle_region = stbt.Region(x=125, y=175, right=425, bottom=475)
middle_grids = {
    "lowercase": stbt.Grid(region=middle_region,
                           data=["abcdef",
                                 "ghijkl",
                                 "mnopqr",
                                 "stuvwx",
                                 "yz1234",
                                 "567890"]),
    "uppercase": stbt.Grid(region=middle_region,
                           data=["ABCDEF",
                                 "GHIJKL",
                                 "MNOPQR",
                                 "STUVWX",
                                 "YZ1234",
                                 "567890"]),
    "symbols": stbt.Grid(region=middle_region,
                         data=["!@#$%&",
                               "~*\\/?^",
                               "_`;:|=",
                               "éñ[]{}",
                               "çü.,+-",
                               "<>()'\""]),
}

kb = stbt.Keyboard()
for mode in ["lowercase", "uppercase", "symbols"]:
    kb.add_grid(top_grid, mode=mode)
    kb.add_grid(bottom_grid, mode=mode)
    g = middle_grids[mode]
    kb.add_grid(g, mode=mode)

    # Transitions between grids:
    #
    # abc ABC #+-  (top grid)
    # ↕ ↕ ↕ ↕ ↕ ↕
    # a b c d e f  (first row of middle grid)
    kb.add_transition(g[0, 0].data, "abc", "KEY_UP", mode=mode)
    kb.add_transition(g[1, 0].data, "abc", "KEY_UP", mode=mode)
    kb.add_transition(g[2, 0].data, "ABC", "KEY_UP", mode=mode)
    kb.add_transition(g[3, 0].data, "ABC", "KEY_UP", mode=mode)
    kb.add_transition(g[4, 0].data, "#+-", "KEY_UP", mode=mode)
    kb.add_transition(g[5, 0].data, "#+-", "KEY_UP", mode=mode)

    # 5 6 7 8 9 0  (last row of middle grid)
    # ↕ ↕ ↕ ↕ ↕ ↕
    # SPC DEL CLR  (bottom grid)
    kb.add_transition(g[0, 5].data, " ", "KEY_DOWN", mode=mode)
    kb.add_transition(g[1, 5].data, " ", "KEY_DOWN", mode=mode)
    kb.add_transition(g[2, 5].data, "DELETE", "KEY_DOWN", mode=mode)
    kb.add_transition(g[3, 5].data, "DELETE", "KEY_DOWN", mode=mode)
    kb.add_transition(g[4, 5].data, "CLEAR", "KEY_DOWN", mode=mode)
    kb.add_transition(g[5, 5].data, "CLEAR", "KEY_DOWN", mode=mode)

Now we just need to add the transitions between modes: If we’re in lowercase mode with the selection on “ABC”, pressing KEY_OK takes us to uppercase mode with the selection still on “ABC” (see Figure 4) — and so on for the other mode keys:

for source_mode in ["lowercase", "uppercase", "symbols"]:
    for name, target_mode in [("abc", "lowercase"),
                              ("ABC", "uppercase"),
                              ("#+-", "symbols")]:
        kb.add_transition(kb.find_key(name=name, mode=source_mode),
                          kb.find_key(name=name, mode=target_mode),
                          "KEY_OK")

Note that in this keyboard we can identify a key unambiguously by its name + mode. Some keyboards might have the same key twice in two different places in the same mode (for example two “shift” keys) — in that case you would model this as two separate keys with the same name & mode, but different region.

This keyboard has another way of changing modes: Pressing KEY_PLAY cycles through the modes. For example from “a” to “A” to “!” and back to “a”; or from “b” to “B” to “@”, etc.

Figure 6: Mode change from any key by pressing `KEY_PLAY`.

To model this we need to add a transition from each and every key in the keyboard. We can use Keyboard.find_keys to loop over the keys we have already added to the model, and Keyboard.find_key (singular) to find the corresponding target for each transition, like this:

for source_mode, target_mode in [("lowercase", "uppercase"),
                                 ("uppercase", "symbols"),
                                 ("symbols", "lowercase")]:
    for key in kb.find_keys(mode=source_mode):
        target = kb.find_key(region=key.region, mode=target_mode)
        kb.add_transition(key, target, "KEY_PLAY")

Identifying the currently selected key

We have modelled the keyboard’s behaviour. Now, to use that model we need to understand the current state of the device under test: Which key is currently selected?

With Stb-tester, the way we extract this information from the screen is to write a Page Object (a Python class) that does the necessary image-processing. Our Page Object class will have two properties:

is_visible: Returns True if the keyboard is visible and focused.
selection: Returns the currently selected key.

We can answer both of these questions (Is the keyboard visible? And which key is selected?) with stbt.find_selection_from_background. Let’s start with a simple example that only understands the lowercase keyboard:

class Search(stbt.FrameObject):
    """The YouTube search keyboard on Apple TV."""

    @property
    def is_visible(self):
        return bool(self.selection)

    @property
    def selection(self):
        match = stbt.find_selection_from_background(
            "lowercase-background.png",
            max_size=(115, 70),
            frame=self._frame,
            mask=stbt.Region(x=125, y=125, right=425, bottom=520))
        if match:
            return kb.find_key(region=match.region, mode="lowercase")
        else:
            return None

stbt.find_selection_from_background compares the video frame (captured from the device under test) against the specified reference image (“lowercase-background.png”). This reference image is a screenshot of the keyboard without any selection. Thus, any differences between the frame (which does show a white rectangle around the selected key) and the reference image are going to tell us where the selection is. If the differences span a larger region than the size of the biggest key (max_size above), then it means that we’re looking at a different screen — not the keyboard.
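The idea can be illustrated with a deliberately simplified plain-Python sketch. It treats images as small 2D lists of pixel values and compares them for exact equality, whereas a real implementation has to tolerate video noise and compression artefacts. The function `find_selection` is a hypothetical stand-in written for this explanation, not the actual algorithm behind stbt.find_selection_from_background.

```python
def find_selection(frame, background, max_size):
    """Return (x, y, width, height) of the differing area, or None.

    `frame` and `background` are equal-sized 2D lists of pixel values.
    """
    diffs = [(x, y)
             for y, (frame_row, bg_row) in enumerate(zip(frame, background))
             for x, (a, b) in enumerate(zip(frame_row, bg_row))
             if a != b]
    if not diffs:
        return None  # Identical images: nothing is selected.
    xs = [x for x, _ in diffs]
    ys = [y for _, y in diffs]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    if width > max_size[0] or height > max_size[1]:
        return None  # Too much has changed: probably a different screen.
    return (min(xs), min(ys), width, height)


# An 8x6 "screenshot" with no selection, and a frame with a 3x2 highlight.
background = [[0] * 8 for _ in range(6)]
frame = [row[:] for row in background]
for y in range(1, 3):
    for x in range(2, 5):
        frame[y][x] = 255

selection = find_selection(frame, background, max_size=(4, 3))
print(selection)  # (2, 1, 3, 2)

# A completely different screen differs everywhere, so it is rejected.
other = find_selection([[255] * 8 for _ in range(6)], background,
                       max_size=(4, 3))
print(other)  # None
```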

You may need to create this selection-less image by merging two different screenshots together. This video shows how to do it in the GNU Image Manipulation Program, a free open-source cross-platform image editor:

Step by step instructions:

1. Open both screenshots. They must have the selection on different, non-overlapping keys.
2. Drag one of the open image tabs onto the other tab and drop it into the Layers window so that it’s above the existing layer.
3. Use the rectangle selection tool to select the part of the image that contains the selection.
4. Choose Edit > Clear (or press Delete on your keyboard). The layer underneath will show through the deleted region, showing the same key but without the white rectangle.
5. Use File > Export As… to save the image in PNG format to your test-pack.
6. Don’t forget to commit the image to git! (Until you do, your IDE will show a lint error underneath the filename to remind you.)

Now, to recognize all three modes, we need to create similar (selection-less) reference images for the other modes: “uppercase-background.png” and “symbols-background.png”. Finally, we update our Search.selection property so that it looks like this:

    @property
    def selection(self):
        for mode in ["lowercase", "uppercase", "symbols"]:
            match = stbt.find_selection_from_background(
                mode + "-background.png",
                max_size=(115, 70),
                frame=self._frame,
                mask=stbt.Region(x=125, y=125, right=425, bottom=520))
            if match:
                return kb.find_key(region=match.region, mode=mode)

        return None

Tip

You can visualise & debug your Page Object’s properties in the Object Repository tab of your Stb-tester Portal:

Our *Search* page object in Stb-tester's object repository.

To learn more about Page Objects see Object Repository in the Stb-tester manual, and the stbt.FrameObject API reference documentation.

Navigating the keyboard

Now we have all the pieces we need to navigate the on-screen keyboard:

A way to tell which button is currently selected.
A graph that tells us the shortest path from each key to any other key.

We’ll add a method to our Search Page Object called enter_text. This will take a text parameter, and it will type the given text into the on-screen keyboard by calling Keyboard.enter_text:

class Search(stbt.FrameObject):
    ...

    def enter_text(self, text):
        return kb.enter_text(text, page=self)

Keyboard.enter_text will use the selection property of its page parameter to see which button is currently selected. Then it will loop over each letter in text: Find a key in our model that matches that letter, navigate to it, and press KEY_OK to type the letter.

Key.name versus Key.text

Keyboard.enter_text searches for a key with a text attribute that matches the desired letter. A key’s text defaults to its name if the name is a single character. This heuristic makes it convenient to specify most keyboards; longer names like “CLEAR” typically don’t enter any text when pressed. If you do have a key that types longer text when pressed (for example “@gmail.com”) you can add it like this:

kb.add_key(name="@gmail.com", text="@gmail.com",
           region=stbt.Region(...), mode=...)

Or if it’s part of a grid, you can specify it like this:

kb.add_grid(stbt.Grid(
    region=...,
    data=[
        [{"name": "@gmail.com", "text": "@gmail.com"}, {"name": "another key", ...}],
        ...
    ]))

Some of the entries in data can be dicts like the example above and others can be strings, like we had seen in earlier examples — but remember that a stbt.Grid is only suitable if all the cells in the grid are the same size. For irregular-shaped keyboards you might have to specify each key (and its region) using Keyboard.add_key.
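To recap the name-versus-text rule from the start of this section, here is a tiny plain-Python sketch of the heuristic as we have described it. The function `default_text` is hypothetical, and returning `None` for keys that type nothing is an assumption of this sketch rather than documented Stb-tester behaviour.

```python
def default_text(name, text=None):
    """Return the text a key types, following the rule described above."""
    if text is not None:
        return text  # Explicitly specified, for example "@gmail.com".
    if len(name) == 1:
        return name  # Single-character names type themselves.
    return None      # Longer names such as "CLEAR" type nothing.


print(default_text("a"))                              # a
print(default_text("CLEAR"))                          # None
print(default_text("@gmail.com", text="@gmail.com"))  # @gmail.com
```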

We can also navigate to a single key using Keyboard.navigate_to. For example, here is the implementation of a clear method that we can provide so that our test scripts can clear any text that has been entered into the Search page:

class Search(stbt.FrameObject):
    ...

    def clear(self):
        kb.navigate_to("CLEAR", page=self)
        stbt.press_and_wait("KEY_OK")
        return self.refresh()

Some keyboards have an explicit “SEARCH” button that you have to press after typing the text. For those keyboards, our Page Object’s enter_text method would look like this:

    def enter_text(self, text):
        page = self
        page = kb.enter_text(text, page)
        page = kb.navigate_to("SEARCH", page)
        stbt.press_and_wait("KEY_OK")
        return page.refresh()

Common mistake: Using an outdated page instance

Keyboard.enter_text and Keyboard.navigate_to take a Page Object instance in their page parameter. This instance has a selection property that reflects the position of the selection at the time the instance was created. If this instance is out of date (because the selection has moved since that time), then stbt.Keyboard will calculate a path from the wrong start position to your target node.

This is because Stb-tester’s Page Objects are immutable: An instance of the Page Object reflects the state of the device-under-test at the time the instance was created.

The following code won’t work:

    # EXAMPLE OF BAD CODE -- DON'T COPY
    def enter_text(self, text):
        kb.enter_text(text, page=self)
        kb.navigate_to("SEARCH", page=self)  # <-- self.selection is outdated!
        stbt.press_and_wait("KEY_OK")

To get the latest state, you can create a new instance of the Page Object like this:

page = Search()

Or like this:

page = self.refresh()

(where self is an instance of our Search Page Object.)

For this purpose, Keyboard.enter_text and Keyboard.navigate_to return a new page instance that reflects the state of the device-under-test after the text has been entered (or the navigation completed). We can use their return value instead of calling self.refresh(). Here’s the corrected example:

    # FIXED EXAMPLE
    def enter_text(self, text):
        page = self
        page = kb.enter_text(text, page)
        page = kb.navigate_to("SEARCH", page)
        stbt.press_and_wait("KEY_OK")
        return page.refresh()

Note that we have made our method return an updated page instance. This is consistent with Keyboard.enter_text’s behaviour, and it allows any testcases that call our enter_text method to use the same pattern.
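If the immutability point is hard to picture, the following plain-Python sketch mimics it with a fake device and a frozen dataclass. `FakeDevice`, `Page`, and `capture` are invented for this illustration. They only imitate the snapshot behaviour described above and are not part of Stb-tester.

```python
from dataclasses import dataclass


class FakeDevice:
    """Stand-in for the device under test: it only tracks the selection."""

    def __init__(self):
        self.selection = "a"

    def press_right(self):
        # Move the selection to the next letter.
        self.selection = chr(ord(self.selection) + 1)


@dataclass(frozen=True)
class Page:
    """Immutable snapshot of the device state at construction time."""
    selection: str

    @classmethod
    def capture(cls, device):
        return cls(selection=device.selection)


device = FakeDevice()
old_page = Page.capture(device)
device.press_right()
new_page = Page.capture(device)

print(old_page.selection)  # a  (stale: the selection has since moved)
print(new_page.selection)  # b  (fresh snapshot)
```

Planning a route from `old_page.selection` would start from the wrong key, which is exactly the mistake in the bad example above.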

Simple keyboards (no modes)

Many on-screen keyboards don’t have modes: there’s just a single uppercase or lowercase keyboard. Or maybe your keyboard does have several modes, but you don’t need to test them: you just want your test script to type in a search term, and the case doesn’t matter. (Typically the different modes are only really necessary for login keyboards where you type a password.)

In this case, you don’t need to specify mode when you call Keyboard.add_key, Keyboard.add_grid, or Keyboard.add_transition. The key’s name alone, or the name + region, will be enough to identify a key unambiguously. You will need to make your enter_text method convert to lowercase or uppercase, as appropriate, like this:

class Search(stbt.FrameObject):
    ...

    def enter_text(self, text):
        return kb.enter_text(text.lower(), page=self)

Shift modes

If your keyboard has a “shift” mode, where pressing KEY_OK on an uppercase letter types the letter and changes to the lowercase mode, you can model this by specifying a transition from every key in the uppercase mode, to the same key (by region) in the lowercase mode. The code would look somewhat similar to the example earlier in this article for changing modes by pressing KEY_PLAY.
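As a rough illustration of that shape, here is a plain-Python model of the idea with a three-key keyboard. `Key`, `find_key`, and the `transitions` dict are simplified stand-ins written for this sketch, and regions are reduced to an (x, y) pair. With the real API the loop would presumably use Keyboard.find_keys, Keyboard.find_key, and Keyboard.add_transition, as in the KEY_PLAY example.

```python
from collections import namedtuple

Key = namedtuple("Key", "name mode region")

# The same three positions exist in both modes.
REGIONS = {"a": (125, 175), "b": (175, 175), "c": (225, 175)}
keys = [Key(n, "lowercase", r) for n, r in REGIONS.items()]
keys += [Key(n.upper(), "uppercase", r) for n, r in REGIONS.items()]


def find_key(region, mode):
    """Find the single key at this position in the given mode."""
    matches = [k for k in keys if k.region == region and k.mode == mode]
    assert len(matches) == 1, "key must be unambiguous"
    return matches[0]


# One-way transitions: pressing KEY_OK on any uppercase key lands on the
# key at the same position in lowercase mode.
transitions = {}
for key in keys:
    if key.mode == "uppercase":
        transitions[(key, "KEY_OK")] = find_key(key.region, "lowercase")

target = transitions[(Key("B", "uppercase", (175, 175)), "KEY_OK")]
print(target.name, target.mode)  # b lowercase
```

Note that there is no reverse edge here: pressing KEY_OK on a lowercase key does not return to uppercase mode.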

See the full code from this tutorial here.