Testing on-screen keyboards with Stb-tester
02 Oct 2020.
Stb-tester v32 added new APIs that make it much easier to navigate on-screen keyboards from your test scripts. In this tutorial we’ll model the behaviour of YouTube’s search keyboard on the Apple TV, and we’ll write a Page Object (a Python class) to navigate the keyboard from our test scripts.
By the end of this tutorial we will have implemented a Page Object with an enter_text method, so you can write a test script like this:
Modelling the keyboard
First we will specify a Directed Graph that describes the behaviour of the keyboard under test. A “graph” (in the computer-science sense of the word) consists of “nodes” connected by “edges”. Each key on our keyboard will be a node in our graph, and the possible transitions from each key to its neighbours will be the edges:
Each edge specifies the button that you need to press on the remote control to trigger that transition:
Stb-tester will use this graph to calculate the shortest path to the target. For example, if the current selection is on “a”, to type the letter “p” Stb-tester would press KEY_RIGHT 3 times and KEY_DOWN twice (and then KEY_OK to type the selected letter). Note that there can be more than one shortest path to the same key.
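To make the shortest-path idea concrete, here is a minimal plain-Python sketch. It does not use or reproduce Stb-tester's implementation. It assumes the letters sit in a 6-column grid, which is consistent with the “a” to “p” example above; the `ROWS` layout and the `build_graph` and `shortest_path` helpers are illustrative names, not part of stbt.

```python
from collections import deque

# Assumed layout: letters in a 6-column grid (hypothetical, for illustration).
ROWS = ["abcdef", "ghijkl", "mnopqr", "stuvwx"]


def build_graph(rows):
    """Return {key: {button: neighbouring key}} for a regular grid."""
    graph = {}
    for y, row in enumerate(rows):
        for x, key in enumerate(row):
            edges = {}
            if x > 0:
                edges["KEY_LEFT"] = rows[y][x - 1]
            if x < len(row) - 1:
                edges["KEY_RIGHT"] = rows[y][x + 1]
            if y > 0:
                edges["KEY_UP"] = rows[y - 1][x]
            if y < len(rows) - 1:
                edges["KEY_DOWN"] = rows[y + 1][x]
            graph[key] = edges
    return graph


def shortest_path(graph, start, target):
    """Breadth-first search: return the list of buttons to press."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        key, presses = queue.popleft()
        if key == target:
            return presses
        for button, neighbour in graph[key].items():
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, presses + [button]))
    raise ValueError("no path from %r to %r" % (start, target))


presses = shortest_path(build_graph(ROWS), "a", "p")
print(presses.count("KEY_RIGHT"), presses.count("KEY_DOWN"))  # 3 2
```

The order in which the sketch interleaves the presses is arbitrary, because several paths of the same length lead from “a” to “p”.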
To model this keyboard’s behaviour in our Python code, we can use the stbt.Keyboard API to specify each key’s name and region (its position on the screen; we’ll use this later) using Keyboard.add_key, and the transitions between keys using Keyboard.add_transition, like this:
Hold on, hold on. This keyboard is laid out in a regular grid, so instead of typing each key one by one, let’s use Keyboard.add_grid. Our keyboard has 3 different grids, shown below in different colours:
We specify all these keys like this:
Much easier, isn’t it! Keyboard.add_grid is only suitable if all the cells in the grid are the same size. You don’t need to be super-precise with the region coordinates — just make sure the centre of each key is inside the right cell.
Note that stbt.Grid’s data parameter is a list of lists. Actually it’s a list of iterables, which is why we can provide a list of strings: iterating over a string yields one character at a time. We could also have specified it like this:
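The equivalence is ordinary Python behaviour, independent of Stb-tester. A minimal illustration, using two assumed rows of letters (the variable names are hypothetical):

```python
rows_as_strings = ["abcdef", "ghijkl"]
rows_as_lists = [["a", "b", "c", "d", "e", "f"],
                 ["g", "h", "i", "j", "k", "l"]]

# Iterating over a string yields one character at a time, so expanding each
# row produces the same cells whichever way the data was written.
expanded = [list(row) for row in rows_as_strings]
print(expanded == rows_as_lists)  # True
```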
…but the first way is easier to type and easier to read. For the top and bottom grids we do have to use a list of lists because the key names are longer than a single character.
Keyboard.add_grid will add all the keys and the transitions between them (within the grid). It won’t add transitions that go outside of the grid, so we need to add those explicitly, like this:
Note that, by default, Keyboard.add_transition adds the opposite transition automatically: for example, adding a KEY_DOWN transition from one key to another also adds the KEY_UP transition back.
You may have noticed that some keys have two possible transitions for the same remote-control button. For example, pressing KEY_UP from “SPACE” can land on “5” or on “6”. This reflects the keyboard-under-test’s real behaviour: it remembers which key you came from before navigating down onto “SPACE”, and it returns to the same column when you go back up. stbt.Keyboard doesn’t keep track of this state, so we just accept that both keys (“5” and “6”) are valid targets.
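Both behaviours described above (the automatic opposite transition, and a button with more than one valid target) can be illustrated with a toy model. This is a plain-Python sketch, not the stbt.Keyboard class; `KeyboardModel`, `OPPOSITE`, and the `edges` dictionary are hypothetical names chosen for the example.

```python
OPPOSITE = {"KEY_UP": "KEY_DOWN", "KEY_DOWN": "KEY_UP",
            "KEY_LEFT": "KEY_RIGHT", "KEY_RIGHT": "KEY_LEFT"}


class KeyboardModel:
    """Toy stand-in for a keyboard graph (not the real stbt.Keyboard)."""

    def __init__(self):
        # Maps (source key, button) to the set of possible target keys.
        self.edges = {}

    def add_transition(self, source, target, button, symmetrical=True):
        self.edges.setdefault((source, button), set()).add(target)
        if symmetrical and button in OPPOSITE:
            # Also record the way back, using the opposite button.
            back = (target, OPPOSITE[button])
            self.edges.setdefault(back, set()).add(source)


kb = KeyboardModel()
# Going down from either "5" or "6" lands on "SPACE"...
kb.add_transition("5", "SPACE", "KEY_DOWN")
kb.add_transition("6", "SPACE", "KEY_DOWN")
# ...so going up from "SPACE" has two acceptable targets.
print(sorted(kb.edges[("SPACE", "KEY_UP")]))  # ['5', '6']
```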
This keyboard has three different modes: lowercase, uppercase, and symbols. It’s best to think of each mode as an entirely different keyboard, with transitions that change between them: pressing KEY_OK on one of the mode keys (like “ABC”) will go to that mode.
Some keys might appear in more than one mode. It’s important to model these as different keys: even though they look the same, they are different because they are connected to different keys. For example there is a “SPACE” key in all of the modes, but pressing KEY_UP from it will go to a totally different key:
The same is true for the “DELETE” and “CLEAR” keys, the mode keys (“abc”, “ABC”, and “#+-”), and the number keys.
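One way to picture “same name, different key” is to identify each key by the pair (name, mode). The following plain-Python sketch is only an illustration: the `Key` tuple and the `up_from` table are hypothetical, and the targets listed are assumptions rather than the real keyboard's full behaviour.

```python
from collections import namedtuple

# A key is identified by its name together with its mode.
Key = namedtuple("Key", ["name", "mode"])

# Assumed excerpt: where KEY_UP leads from "SPACE" in two of the modes.
up_from = {
    Key("SPACE", "lowercase"): Key("5", "lowercase"),
    Key("SPACE", "uppercase"): Key("5", "uppercase"),
}

lower_space = Key("SPACE", "lowercase")
upper_space = Key("SPACE", "uppercase")

# The two "SPACE" keys look alike on screen but are distinct nodes, and
# their neighbours are distinct nodes too.
print(lower_space == upper_space)                    # False
print(up_from[lower_space] == up_from[upper_space])  # False
```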
Now we just need to add the transitions between modes: if we’re in lowercase mode with the selection on “ABC”, pressing KEY_OK takes us to uppercase mode with the selection still on “ABC” (see Figure 4), and so on for the other modes.
Note that in this keyboard we can identify a key unambiguously by its name + mode. Some keyboards might have the same key twice in two different places in the same mode (for example two “shift” keys) — in that case you would model this as two separate keys with the same name & mode, but different region.
This keyboard has another way of changing modes: pressing a particular button on the remote control cycles through the modes. For example from “a” to “A” to “!” and back to “a”; or from “b” to “B” to “@”, etc.
To model this we need to add a transition from each and every key in the keyboard. We can use Keyboard.find_keys to loop over the keys we have already added to the model, and Keyboard.find_key (singular) to find the corresponding target for each transition, like this:
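The looping pattern can be sketched in plain Python, without the stbt API. Everything here is an assumption made for illustration: keys are small dictionaries, a region is simplified to a (column, row) cell, and `find_key` and `next_mode` are hypothetical helpers. The remote-control button that triggers the cycle is deliberately left unnamed.

```python
MODES = ["lowercase", "uppercase", "symbols"]

# Assumed excerpt of the model: name, mode, and a simplified region.
keys = [
    {"name": "a", "mode": "lowercase", "region": (0, 0)},
    {"name": "b", "mode": "lowercase", "region": (1, 0)},
    {"name": "A", "mode": "uppercase", "region": (0, 0)},
    {"name": "B", "mode": "uppercase", "region": (1, 0)},
    {"name": "!", "mode": "symbols", "region": (0, 0)},
    {"name": "@", "mode": "symbols", "region": (1, 0)},
]


def find_key(keys, region, mode):
    """Return the one key at `region` in `mode`; fail if it isn't unique."""
    matches = [k for k in keys if k["region"] == region and k["mode"] == mode]
    if len(matches) != 1:
        raise ValueError("expected 1 key, found %d" % len(matches))
    return matches[0]


def next_mode(mode):
    """Return the mode that follows `mode`, wrapping around at the end."""
    return MODES[(MODES.index(mode) + 1) % len(MODES)]


# One transition per key: same position on screen, next mode in the cycle.
cycle = {}
for key in keys:
    target = find_key(keys, key["region"], next_mode(key["mode"]))
    cycle[(key["name"], key["mode"])] = (target["name"], target["mode"])

print(cycle[("a", "lowercase")])  # ('A', 'uppercase')
```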
Identifying the currently selected key
We have modelled the keyboard’s behaviour. Now, to use that model we need to understand the current state of the device under test: Which key is currently selected?
With Stb-tester, the way we extract this information from the screen is to write a Page Object (a Python class) that does the necessary image-processing. Our Page Object class will have two properties:
is_visible: Returns True if the keyboard is visible and focused.
selection: Returns the currently selected key.
We can answer both of these questions (Is the keyboard visible? And which key is selected?) with stbt.find_selection_from_background. Let’s start with a simple example that only understands the lowercase keyboard:
stbt.find_selection_from_background compares the video frame (captured from the device under test) against the specified reference image (“lowercase-background.png”). This reference image is a screenshot of the keyboard without any selection. Thus, any differences between the frame (which does show a white rectangle around the selected key) and the reference image are going to tell us where the selection is. If the differences span a larger region than the size of the biggest key (max_size above), then it means that we’re looking at a different screen, not the keyboard.
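The underlying idea (diff the frame against a selection-less background, then check the size of the differing area) can be sketched in plain Python. This is not how stbt.find_selection_from_background is implemented; the `find_selection` helper, the use of nested lists of greyscale values, and the threshold value are all assumptions made for the illustration.

```python
def find_selection(frame, background, max_size, threshold=25):
    """Return (x, y, width, height) of the differing area, or None.

    `frame` and `background` are equal-sized 2D lists of greyscale values.
    `max_size` is the (width, height) of the biggest key.
    """
    xs, ys = [], []
    for y, (frame_row, bg_row) in enumerate(zip(frame, background)):
        for x, (a, b) in enumerate(zip(frame_row, bg_row)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # identical to the background: nothing is selected
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    if width > max_size[0] or height > max_size[1]:
        return None  # differences too large: probably a different screen
    return (min(xs), min(ys), width, height)


background = [[0] * 8 for _ in range(6)]
frame = [row[:] for row in background]
for y in range(1, 3):
    for x in range(2, 5):
        frame[y][x] = 255  # a 3x2 highlight whose top-left corner is (2, 1)

print(find_selection(frame, background, max_size=(4, 3)))  # (2, 1, 3, 2)
```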
You may need to create this selection-less image by merging two different screenshots together. This video shows how to do it in the GNU Image Manipulation Program, a free open-source cross-platform image editor:
Step by step instructions:
- Open both screenshots. They must have the selection on different, non-overlapping keys.
- Drag one of the open image tabs onto the other tab and drop it into the Layers window so that it’s above the existing layer.
- Use the rectangle selection tool to select the part of the image that contains the selection.
- Choose Edit > Clear (or press Delete on your keyboard). The layer underneath will show through the deleted region, showing the same key but without the white rectangle.
- Use File > Export As… to save the image in PNG format to your test-pack.
- Don’t forget to commit the image to git! (Until you do, your IDE will show a lint error underneath the filename to remind you.)
Now, to recognize all three modes, we need to create similar (selection-less) reference images for the other modes: “uppercase-background.png” and “symbols-background.png”. Finally, we update our Page Object so that it looks like this:
You can visualise & debug your Page Object’s properties in the Object Repository tab of your Stb-tester Portal:
Navigating the keyboard
Now we have all the pieces we need to navigate the on-screen keyboard:
- A way to tell which button is currently selected.
- A graph that tells us the shortest path from each key to any other key.
We’ll add a method to our Search Page Object called enter_text. This will take a text parameter, and it will type the given text into the on-screen keyboard by calling stbt.Keyboard.enter_text:
Keyboard.enter_text will use the selection property of its page parameter to see which button is currently selected. Then it will loop over each letter in text: find a key in our model that matches that letter, navigate to it, and press KEY_OK to type the letter.
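The loop described above can be sketched in plain Python. This is a simplification under stated assumptions, not Stb-tester's code: it handles a single regular grid with an assumed 6-column layout, it replaces the graph search with coordinate arithmetic, and it tracks the selection itself instead of reading it from the screen. `ROWS`, `POSITION`, and this `enter_text` function are hypothetical names.

```python
# Assumed layout: letters in a 6-column grid (hypothetical, for illustration).
ROWS = ["abcdef", "ghijkl", "mnopqr", "stuvwx"]
POSITION = {key: (x, y)
            for y, row in enumerate(ROWS)
            for x, key in enumerate(row)}


def enter_text(text, selection):
    """Return the key presses that type `text`, starting from `selection`."""
    presses = []
    for letter in text:
        x0, y0 = POSITION[selection]
        x1, y1 = POSITION[letter]
        dx, dy = x1 - x0, y1 - y0
        # Navigate to the target key...
        presses += ["KEY_RIGHT" if dx > 0 else "KEY_LEFT"] * abs(dx)
        presses += ["KEY_DOWN" if dy > 0 else "KEY_UP"] * abs(dy)
        # ...then press OK to type it.
        presses.append("KEY_OK")
        selection = letter
    return presses


print(enter_text("pa", selection="a"))
```

A real keyboard with several grids, modes, or irregular keys needs the graph model, which is why the tutorial builds one.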
Key.name versus Key.text
Keyboard.enter_text searches for a key with a text attribute that matches the desired letter. A key’s text defaults to its name if the name is a single character. This heuristic makes it convenient to specify most keyboards; longer names like “CLEAR” typically don’t enter any text when pressed. If you do have a key that types longer text when pressed (for example “@gmail.com”) you can add it like this:
Or if it’s part of a grid, you can specify it like this:
Some of the entries in data can be dicts like the example above and others can be strings, as we saw in earlier examples. Remember, though, that a stbt.Grid is only suitable if all the cells in the grid are the same size. For irregular-shaped keyboards you might have to specify each key (and its region) using Keyboard.add_key.
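The name-versus-text default can be sketched as follows. This is a plain-Python illustration of the heuristic, not stbt code: `make_key` is a hypothetical helper, the key name “GMAIL” is invented for the example, and treating longer names as typing an empty string is an assumption.

```python
def make_key(entry):
    """Normalise a grid entry (a string or a dict) to {"name", "text"}."""
    if isinstance(entry, dict):
        name = entry["name"]
        text = entry.get("text")
    else:
        name, text = entry, None
    if text is None:
        # Default: a single-character name types itself; a longer name
        # (like "CLEAR") is assumed not to type anything.
        text = name if len(name) == 1 else ""
    return {"name": name, "text": text}


# A row can mix plain strings and dicts.
row = ["a", "CLEAR", {"name": "GMAIL", "text": "@gmail.com"}]
print([make_key(entry)["text"] for entry in row])  # ['a', '', '@gmail.com']
```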
We can also navigate to a single key using Keyboard.navigate_to. For example, here is the implementation of a clear method that we can provide so that our test scripts can clear any text that has been entered into the Search page:
Some keyboards have an explicit “SEARCH” button that you have to press after typing the text. For those keyboards, our Page Object’s enter_text method would look like this:
Common mistake: Using an outdated page instance
Keyboard.enter_text and Keyboard.navigate_to take a Page Object instance in their page parameter. This instance has a selection property that reflects the position of the selection at the time the instance was created. If this instance is out of date (because the selection has moved since that time), then stbt.Keyboard will calculate a path from the wrong start position to your target.
This is because Stb-tester’s Page Objects are immutable: An instance of the Page Object reflects the state of the device-under-test at the time the instance was created.
The following code won’t work:
To get the latest state, you can create a new instance of the Page Object like this:
Or like this:
(Here self is an instance of our Search Page Object.)
For this purpose, Keyboard.enter_text and Keyboard.navigate_to return a new page instance that reflects the state of the device-under-test after the text has been entered (or the navigation completed). We can use their return value instead of calling self.refresh(). Here’s the corrected code:
Note that we have made our method return an updated page instance. This is consistent with Keyboard.enter_text’s behaviour, and it allows any testcases that call our enter_text method to use the same pattern.
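The staleness problem can be reproduced without a device. The sketch below is plain Python and only mimics the idea of an immutable snapshot: `FakeDevice` and `PageSnapshot` are hypothetical stand-ins, not Stb-tester classes, and the fake device simply moves its selection to “b” on any key press.

```python
from dataclasses import dataclass


class FakeDevice:
    """Stand-in for the device under test (hypothetical)."""

    def __init__(self):
        self.selected = "a"

    def press(self, button):
        # Pretend that any key press moves the selection from "a" to "b".
        self.selected = "b"


@dataclass(frozen=True)
class PageSnapshot:
    """Immutable view of the screen at the moment it was captured."""
    selection: str

    @classmethod
    def capture(cls, device):
        return cls(selection=device.selected)


device = FakeDevice()
page = PageSnapshot.capture(device)
device.press("KEY_RIGHT")

stale = page.selection               # the old instance still says "a"
page = PageSnapshot.capture(device)  # create a new instance instead
fresh = page.selection               # now "b"
print(stale, fresh)  # a b
```

Planning a route from the stale value would start from “a” even though the device has already moved on, which is the mistake described in this section.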
Simple keyboards (no modes)
Many on-screen keyboards don’t have modes — there’s just a single, uppercase or lowercase keyboard. Or maybe your keyboard does have several modes, but you don’t care to test them — you just want your test script to type in a search term, and the case doesn’t matter. (Typically the different modes are only really necessary for login keyboards where you type a password.)
In this case, you don’t need to specify mode when you call Keyboard.add_key, Keyboard.add_grid, or Keyboard.add_transition. The key’s name alone, or the name + region, will be enough to identify a key unambiguously. You will need to make your enter_text method convert to lowercase or uppercase, as appropriate, like this:
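The case conversion itself is a one-liner; the sketch below also checks that every character exists on the keyboard before any navigation starts. It is a plain-Python illustration: `KEYS` (an assumed lowercase-only keyboard where the space key types a space) and `normalise` are hypothetical names, not part of stbt.

```python
# Assumed set of characters that this (hypothetical) keyboard can type.
KEYS = set("abcdefghijklmnopqrstuvwxyz0123456789") | {" "}


def normalise(text):
    """Lower-case `text` and check that every character has a key."""
    text = text.lower()
    missing = sorted(set(text) - KEYS)
    if missing:
        raise ValueError("no key for: %r" % missing)
    return text


print(normalise("Peppa Pig"))  # peppa pig
```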
If your keyboard has a “shift” mode, where pressing KEY_OK on an uppercase letter types the letter and changes to the lowercase mode, you can model this by specifying a transition from every key in the uppercase mode to the same key (by region) in the lowercase mode. The code would look somewhat similar to the example earlier in this article for changing modes.
See the full code from this tutorial here.