Using segmentation to find GUI elements
In image processing, “segmentation” means finding which pixels belong to foreground objects (such as text or other GUI elements) versus background pixels. Stb-tester v33 adds a new API called segment (pronounced like the verb: segMENT) that finds the location of distinct foreground elements.
For example, let’s find the location of each “poster” or “tile” in this screenshot:
By default, segment starts from the top of the screen and travels down, finding distinct rows:
import stbt

frame = stbt.get_frame()
regions = stbt.segment(frame)
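To build intuition for what a top-to-bottom pass does, here is a minimal pure-Python sketch. It is not stb-tester's implementation: it assumes we already have a foreground mask (1 for foreground pixels, 0 for background), and the names find_runs and segment_rows are hypothetical helpers invented for this illustration.

```python
def find_runs(flags):
    """Return (start, end) pairs for each consecutive run of truthy values.

    `end` is exclusive, like a Python slice.
    """
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                      # a new run begins here
        elif not flag and start is not None:
            runs.append((start, i))        # the run ended just before i
            start = None
    if start is not None:
        runs.append((start, len(flags)))   # run extends to the last element
    return runs


def segment_rows(mask):
    """Vertical pass: find bands of rows that contain any foreground pixel."""
    return find_runs([any(row) for row in mask])


# Toy foreground mask (hypothetical data): 1 = foreground, 0 = background.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
rows = segment_rows(mask)
print(rows)
```

In this toy mask the pass finds two bands: rows 1 to 2 and row 4. Each band is separated from the next by at least one row of pure background.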
Next, we can discard rows that are too small, and then run segment again on each remaining row, this time horizontally: starting from the left of the row and moving right.
tiles = []
for row in regions:
    # Skip rows that are too short to contain a tile
    if row.height > 100:
        tiles.extend(stbt.segment(frame, region=row,
                                  initial_direction=stbt.Direction.HORIZONTAL))
We can combine both of these steps into a single call to segment by specifying steps=2. The steps alternate between vertical and horizontal directions:
regions = stbt.segment(frame, steps=2)
Then we can discard regions that are too small to be a tile:
tiles = [r for r in stbt.segment(frame, steps=2)
         if r.height > 100]
Alternatively, if we know where the tiles are, we could have limited the search by specifying the region parameter of segment, instead of searching the whole frame.
Note that some of the regions aren’t the same height, because the bottom of the poster is too similar to the background color. If we set narrow=False, then segment will keep the top and bottom coordinates of the row from the previous step:
tiles = [r for r in stbt.segment(frame, steps=2, narrow=False)
         if r.height > 100 and r.width > 40]
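The effect of narrow can be illustrated with a small pure-Python model. This is a sketch under assumptions, not stb-tester's actual algorithm: it works on a toy foreground mask (1 for foreground, 0 for background), and find_runs and segment_two_steps are hypothetical names invented here. The narrow flag only mimics the behaviour described above, where narrow=False keeps the top and bottom of the row found in the previous step.

```python
def find_runs(flags):
    """Return (start, end) pairs (end exclusive) for each run of truthy values."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def segment_two_steps(mask, narrow=True):
    """Vertical pass, then horizontal pass; returns (x, y, right, bottom) boxes."""
    boxes = []
    width = len(mask[0])
    for top, bottom in find_runs([any(row) for row in mask]):
        band = mask[top:bottom]
        # Horizontal pass: which columns of this band contain foreground?
        columns = [any(row[x] for row in band) for x in range(width)]
        for left, right in find_runs(columns):
            y1, y2 = top, bottom
            if narrow:
                # Shrink to the rows where this element itself has foreground
                ys = [y for y in range(top, bottom) if any(mask[y][left:right])]
                y1, y2 = ys[0], ys[-1] + 1
            boxes.append((left, y1, right, y2))
    return boxes


# Toy mask (hypothetical data): the second tile is one row shorter, as if its
# bottom edge blended into the background.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
narrowed = segment_two_steps(mask, narrow=True)
full_height = segment_two_steps(mask, narrow=False)
print(narrowed)
print(full_height)
```

With narrow=True the second box ends one row earlier than the first. With narrow=False both boxes share the top and bottom of the band they were found in, which is why the tiles come out the same height.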