A post-mortem report on the stb-tester ONE's infrared transmitter reliability
26 May 2016.
This is a technical report on our investigation into an intermittent problem with the stb-tester ONE’s infrared transmitter.
If you’re a customer of ours and you have an stb-tester ONE, we will be posting you a replacement infrared transmitter free of charge within the next few days. All you have to do is install the v24.7 software update on your stb-tester ONE, unplug the old infrared transmitter, and plug in the new one. The new transmitter is perfectly reliable – we have tested more than 500,000 keypresses without a single missed keypress.
This doesn’t affect our customers with larger test rigs (who are using the RedRat irNetBox instead of our USB infrared transmitters).
Technical summary: Our old infrared transmitter used the FTDI FT232R USB-to-serial chip in bit-bang mode. We proved that the FT232R’s clock is unreliable in bit-bang mode, even in the newer “C” revision of the chip. The FTDI FT230X chip doesn’t have such problems.
Read on for details of the defect, the solution, and our test methodology.
Reproducing the problem: Missed keypresses
We became aware that the infrared transmitter we ship with the stb-tester ONE wasn’t entirely reliable: When your test script called stbt.press, sometimes the device-under-test wouldn’t react – it didn’t see the keypress.
When facing an intermittent defect, the first thing we need is statistically significant data on the reproducibility of the defect. Otherwise we won’t know if our changes are helping or hurting, or making any difference at all. So we picked a Roku box as our target (the Roku is very reliable for a consumer electronics device) and wrote a test script that sends 20 keypresses, checking that every keypress had the expected effect:
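A minimal sketch of a testcase along these lines, written so that it runs without any hardware. The `FakeRoku` class, its `miss_rate` parameter, this stand-in `wait_until`, and the key names `KEY_DOWN` and `KEY_UP` are assumptions made for illustration; a real test-pack would call stb-tester's own `press` and `wait_until` and its own Roku helpers instead.

```python
import random


class FakeRoku:
    """Hypothetical stand-in for the Roku and the infrared link.

    It drops each keypress with probability `miss_rate`, which lets the
    testcase below run without a real device.
    """

    MENU = ["Home", "My Feed"]

    def __init__(self, miss_rate=0.0, seed=0):
        self.index = 0
        self.miss_rate = miss_rate
        self.rng = random.Random(seed)

    def to_roku_home(self):
        self.index = 0

    def press(self, key):
        if self.rng.random() < self.miss_rate:
            return  # simulate a keypress the device never saw
        step = {"KEY_DOWN": 1, "KEY_UP": -1}[key]
        self.index = min(max(self.index + step, 0), len(self.MENU) - 1)

    def find_selection(self):
        return self.MENU[self.index]


def wait_until(condition):
    # Stand-in only: a real implementation would poll until a timeout.
    return condition()


def run_keypress_test(device):
    """Send 20 keypresses, asserting that each one had the expected effect."""
    device.to_roku_home()
    for _ in range(10):
        device.press("KEY_DOWN")
        assert wait_until(lambda: device.find_selection() == "My Feed")
        device.press("KEY_UP")
        assert wait_until(lambda: device.find_selection() == "Home")
```

The testcase stops at the first assertion that fails, so a single missed keypress fails the whole run.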
Starting from the Roku home screen where the menu selection is on “Home”, this testcase presses DOWN and checks that the menu selection has moved to “My Feed”. Then it presses UP and checks that the selection has moved back to “Home”. It repeats this 10 times for a total of 20 keypresses. press and wait_until are functions provided by stb-tester; to_roku_home and find_selection are Roku-specific helper functions that we have defined elsewhere in our test-pack. The whole testcase only takes a few seconds; you can watch a single run of the testcase (where every keypress succeeded) in the video below:
After running the testcase repeatedly for a few hours we had 1,368 test-runs, and 79 of those failed. Each test-run sends 20 keypresses, so we saw 79 missed keypresses out of approximately 25,800 keypresses: 0.3% of all keypresses were missed.
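To get a feel for how precise that 0.3% figure is, one option is to put a confidence interval around it. The sketch below uses the Wilson score interval, which is our choice for illustration and not something the original analysis is known to have used; the inputs are the counts quoted above.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = failures / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    )
    return centre - half_width, centre + half_width


missed, sent = 79, 25_800  # counts quoted in the text
rate = missed / sent
low, high = wilson_interval(missed, sent)
print(f"{rate:.2%} missed (95% interval {low:.2%} to {high:.2%})")
```

With these counts the interval comes out at roughly 0.25% to 0.38%, so the sample is large enough that a later fix showing zero misses over a similar number of keypresses would be a real change and not noise.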
In the video below you can watch me spot-check some of the test failures, just to make sure that the failures are really caused by missed keypresses and not some other issue.
Running the same testcase against a different Roku model gave much worse results: About 4% missed keypresses. So it seems that some infrared receivers are more susceptible than others. At least we’ve found a good device to test our future fixes against!
Now: Is it really the fault of our infrared transmitter? It could be caused by our infrared config file (which describes the protocol used by the Roku’s remote control), or by the Roku itself. To check, we ran the same test using the RedRat3 infrared transmitter. RedRat is a company based near Cambridge, UK, who specialise in infrared and other remote control technologies; they are the gold standard in infrared control automation. The RedRat3 is a USB infrared transmitter, but unlike ours, it outputs a high-power signal, so it isn’t usually suitable for a test lab with dozens of set-top boxes running independent testcases, where one transmitter’s signal could reach neighbouring devices.
Using the same test script, we didn’t see a single failure in over 30,000 keypresses using the RedRat3. This rules out the Roku and our infrared config file, leaving only our USB infrared transmitter hardware or software to blame.
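A short calculation shows why a clean run of 30,000 keypresses is convincing. The sketch below assumes that each keypress is missed independently with a fixed probability, which is a simplification; the function names are ours.

```python
def prob_zero_misses(miss_rate, keypresses):
    """Chance of seeing no misses at all if each keypress is missed
    independently with probability `miss_rate`."""
    return (1 - miss_rate) ** keypresses


def upper_bound_zero_failures(keypresses, confidence=0.95):
    """Largest miss rate still consistent (at the given confidence) with
    observing zero misses in `keypresses` attempts."""
    return 1 - (1 - confidence) ** (1 / keypresses)


# If the RedRat3 shared the 0.3% miss rate, a clean run would be near impossible.
print(prob_zero_misses(0.003, 30_000))

# Upper bound on the RedRat3's true miss rate after 30,000 clean keypresses.
print(f"{upper_bound_zero_failures(30_000):.4%}")
```

Under that independence assumption, zero misses in 30,000 keypresses puts the true miss rate below about 0.01% with 95% confidence, more than an order of magnitude below the 0.3% seen with our transmitter.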
We also tested the RedRat irNetBox and we didn’t see any missed keypresses either. So our customers with larger test rigs aren’t affected by this issue, as they use the RedRat irNetBox instead of our USB infrared transmitters.
Analysing the output from our infrared transmitter
Infrared signals are a sequence of pulses and spaces. This is the signal for the Roku’s DOWN button. It’s a long header pulse & space, followed by a sequence of shorter pulses & spaces:
During each pulse (in blue) the infrared transmitter is switching on and off at a certain carrier frequency, usually 38 kHz. You can’t actually see the carrier signal in these diagrams. During the spaces, the infrared transmitter isn’t sending anything at all.
We set up a RedRat3 as an infrared receiver, and used the mode2 tool from lirc to log the pulses & spaces for each keypress as we ran our testcase.
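As a rough sketch of how such a log can be processed, the code below parses mode2-style text, assuming the usual one-entry-per-line output of the form `pulse <microseconds>` or `space <microseconds>`, and splits it into keypresses at long gaps. The 20 ms gap threshold and the sample values (NEC-style timings) are illustrative assumptions, not captured data.

```python
def parse_mode2(text):
    """Return a list of (kind, microseconds) from mode2-style output.

    Lines that are not of the form "pulse N" or "space N" are ignored.
    """
    durations = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pulse", "space") and parts[1].isdigit():
            durations.append((parts[0], int(parts[1])))
    return durations


def split_keypresses(durations, gap_us=20_000):
    """Split the stream into keypresses wherever a space is at least gap_us."""
    keypresses, current = [], []
    for kind, us in durations:
        if kind == "space" and us >= gap_us:
            if current:
                keypresses.append(current)
            current = []
        else:
            current.append((kind, us))
    if current:
        keypresses.append(current)
    return keypresses


# Illustrative input only: two truncated signals separated by a long gap.
SAMPLE = """\
space 1000000
pulse 9000
space 4500
pulse 560
space 560
pulse 560
space 40000
pulse 9000
space 4500
pulse 560
"""

keypresses = split_keypresses(parse_mode2(SAMPLE))
print(len(keypresses), [len(k) for k in keypresses])
```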
Here are a good keypress and a bad keypress from one of the test-runs:
In the bad signal (the bottom one), about halfway through, it looks like the pulses start slightly earlier than the good signal. It also seems that the effect is cumulative, so towards the end of the signal the pulses get earlier and earlier.
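The cumulative effect can be made concrete by comparing where each edge falls in the ideal and the actual signal. This is a minimal sketch with made-up durations (in microseconds), not our measured data: two slightly short entries are enough to shift every later edge.

```python
from itertools import accumulate


def edge_drift(ideal_us, actual_us):
    """Offset of each edge in the actual signal relative to the ideal one.

    An edge is the end of a pulse or space; a negative value means the
    edge arrived early.
    """
    ideal_edges = accumulate(ideal_us)
    actual_edges = accumulate(actual_us)
    return [actual - ideal for ideal, actual in zip(ideal_edges, actual_edges)]


# Hypothetical durations: the 4th and 6th entries are each 20 us short.
ideal = [9000, 4500, 560, 560, 560, 1690, 560]
actual = [9000, 4500, 560, 540, 560, 1670, 560]

print(edge_drift(ideal, actual))
```

Every edge after a short pulse or space inherits the error, so the offsets only grow towards the end of the signal.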
After looking at several bad signals there were no other obvious differences, so we turned to some statistical measures. Here’s a histogram showing the mean difference between the ideal & actual pulse or space, within a given keypress:
The x axis is in microseconds. The green and red lines represent the good & bad keypresses from the stb-tester ONE’s infrared transmitter. There’s no obvious difference between the two groups of signals – maybe this is because in a bad keypress it only takes one bad pulse to cause the Roku to miss the keypress, but for a given keypress we’re measuring the average error across all the pulses & spaces for that keypress.
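The following sketch illustrates why a per-keypress average can hide the problem. The durations are invented for the example: one signal has a single badly wrong pulse, the other has small jitter on every entry.

```python
from statistics import mean


def error_summary(ideal_us, actual_us):
    """Mean error and worst single error for one keypress, in microseconds."""
    errors = [actual - ideal for ideal, actual in zip(ideal_us, actual_us)]
    return {
        "mean_error": mean(errors),
        "max_abs_error": max(abs(e) for e in errors),
    }


ideal = [560] * 20

# Hypothetical keypress with one pulse that is 200 us too short.
one_bad_pulse = [560] * 19 + [360]

# Hypothetical keypress where every entry is off by 10 us, alternating sign.
small_jitter = [570, 550] * 10

print(error_summary(ideal, one_bad_pulse))
print(error_summary(ideal, small_jitter))
```

The single 200 µs error is diluted to a mean of only 10 µs, which is the same size as harmless jitter, so a histogram of the worst single error per keypress might separate good and bad keypresses better than a histogram of means.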
What is clear, however, is that there is a lot of variability to the duration of these pulses & spaces. Compare to the blue data, which is using a RedRat3 as the transmitter: Its pulses & spaces have much less variation from the ideal duration.
A histogram of the total signal length shows a similar curve, which confirms that variations in the length of one pulse have a knock-on effect on the rest of the signal:
At this point in the investigation, our best theory is: There is too much variability in our infrared transmitter’s output, and this is causing the missed keypresses. This could explain why some set-top boxes are affected more than others – perhaps their decoding algorithm is more or less tolerant of variations in the signal.
A minimal testcase: Generating a plain carrier frequency
Can we see the same variability in the carrier signal itself? The data we captured with mode2, above, doesn’t tell us the carrier frequency, so we’ll need an oscilloscope.
After purchasing a Hantek DSO-2090 and making a few patches to OpenHantek, we could see the raw signal coming out of our infrared transmitter. Here is what happens when we send a single pulse of a fixed length, repeatedly:
You can see that the total length of the pulse varies quite a bit, and the carrier signal within the pulse isn’t clean at all.
Our infrared transmitter used the FTDI FT232R chip, which is a USB to serial converter. The chip has two modes of operation: RS-232, and bit-bang. In bit-bang mode you specify a clock rate, and send it a sequence of ones and zeros; the chip writes these ones and zeros to its output pins at the specified clock rate. Unlike RS-232, there are no control bits (start bit, stop bit, parity bit) so we have complete control over the signal.
In theory we would set the clock rate to twice the infrared protocol’s carrier frequency, and send 1-0-1-0-1-0-etc to generate the carrier signal. In practice, the FT232R’s bit-bang mode only works at certain clock rates so we use a higher clock rate and send, for example, 1-1-1-1-1-0-0-0-0-0-1-1-1-1-1-etc (or however many ones we need to generate the right carrier frequency).
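A sketch of how such a bit pattern could be built for one pulse. It assumes one byte per output sample with the infrared LED on bit 0, and the 380 kHz sample rate is a made-up figure chosen so that the arithmetic works out evenly; neither is taken from our driver or from the FT232R datasheet.

```python
def carrier_pattern(sample_rate_hz, carrier_hz, pulse_us):
    """Build the byte sequence for one pulse of carrier.

    Returns (pattern, actual_carrier_hz). Each byte is one output sample;
    0x01 drives the (assumed) LED pin high, 0x00 drives it low.
    """
    # Number of samples to hold the pin high, then low, per carrier cycle.
    half = max(1, round(sample_rate_hz / (2 * carrier_hz)))
    # The carrier we really get once `half` is rounded to a whole number.
    actual_carrier_hz = sample_rate_hz / (2 * half)
    cycles = round(pulse_us * 1e-6 * actual_carrier_hz)
    one_cycle = bytes([0x01] * half + [0x00] * half)
    return one_cycle * cycles, actual_carrier_hz


# Hypothetical clock at 10x the carrier: 1-1-1-1-1-0-0-0-0-0 per cycle.
pattern, actual = carrier_pattern(380_000, 38_000, pulse_us=560)
print(len(pattern), actual, pattern[:10].hex())

# Clock at exactly 2x the carrier: 1-0 per cycle.
pattern2, actual2 = carrier_pattern(76_000, 38_000, pulse_us=560)
print(len(pattern2), actual2, pattern2[:4].hex())
```

When the available clock rates are not an exact multiple of twice the carrier, the rounding in `half` moves the carrier away from the target, which is one reason to prefer a chip that accepts an arbitrary clock rate.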
To analyse the chip’s behaviour in the simplest way possible, we wrote a small C program using libftdi that sets the clock rate and sends 1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0-1-0 (20 ones and zeros). Then we re-wrote our test program so that it uses FTDI’s own D2XX SDK instead of libftdi. This is what we used to generate the signal you saw in the oscilloscope video, above.
We bought some new chips directly from FTDI, just in case our manufacturer was using counterfeit chips. Nope, the output is still bad. (Sorry manufacturer for doubting your supply chain skills.)
FTDI’s FT232R Errata Technical Note acknowledges that this is a known problem with Revision A of the chip, but claims that it has been fixed in Revision B. Our chips are Revision C, which is supposed to be identical to Revision B. But we have proved that bit-bang mode is still unreliable in Revision C of the FT232R chip. We sent our code & data to FTDI support, who acknowledged the bug.
The solution: FTDI FT230X
The FTDI FT230X is a similar USB-to-serial chip that also has a bit-bang mode, and most importantly, has a stable clock. Even better, we can set any clock rate we want (to match the desired carrier frequency of our infrared signal) so that we can minimise the necessary USB bandwidth (we only need to send 1-0-1-0… instead of 1-1-1-1-1-0-0-0-0-0-1-1-1-1-1…).
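The bandwidth saving can be estimated with a few lines of arithmetic. The sketch assumes one byte per output sample and that spaces are sent as zero bytes at the same clock rate; the signal durations are illustrative NEC-style values, not a captured Roku signal.

```python
def bytes_for_signal(durations_us, carrier_hz, samples_per_half_cycle):
    """Bytes that must be sent over USB to play one whole signal."""
    sample_rate_hz = 2 * carrier_hz * samples_per_half_cycle
    return round(sample_rate_hz * sum(durations_us) / 1_000_000)


# Hypothetical signal: header pulse and space, 32 short pulse/space pairs,
# and a trailing pulse (49.9 ms in total).
signal_us = [9000, 4500] + [560, 560] * 32 + [560]

exact_clock = bytes_for_signal(signal_us, 38_000, samples_per_half_cycle=1)
faster_clock = bytes_for_signal(signal_us, 38_000, samples_per_half_cycle=5)
print(exact_clock, faster_clock, faster_clock / exact_clock)
```

Under these assumptions the amount of data scales directly with the number of samples per half-cycle, so clocking at exactly twice the carrier is the smallest stream that can still represent it.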
With the FT230X, the signal as seen on an oscilloscope is perfectly stable, and our Roku testcase passes 100% of the time – we have tested more than 500,000 keypresses without seeing a single missed keypress.
We have written a lirc driver for the FT230X, which is now on the lirc master branch, in case anyone else wants to build their own USB infrared transmitter based on the FT230X. We are proud to have created the first reliable, low-powered infrared transmitter on the market in a USB dongle form-factor.
If you’re a customer of ours and you have an stb-tester ONE, we will be posting you a replacement infrared transmitter within the next few days. All you have to do is install the v24.7 software update on your stb-tester ONE, unplug the old infrared transmitter, and plug in the new one.