Identifying Failure

Example 2.3.1
The Unsynchronised Measuring Tapes
Two builders are on a job site. Builder A measures lengths and calls them out to Builder B, an apprentice, whose job is to cut a piece of wood to each length. Builder A measures 525mm and calls out the number. Builder B measures out that length, cuts a piece of wood, and passes it to Builder A. The piece is too short by 3mm. Builder B re-measures the piece, and his tape reads the specified length. This pattern continues until Builder A, fed up with Builder B’s mistakes, investigates.
LQ: Why are Builder B’s pieces too short?
EE1: Builder B’s pieces are too short.
EE2: Builder B insists he’s measuring correctly.
RA1: Builder B isn’t measuring correctly.
RA2: Builder B isn’t cutting at the measured length.
ER1: Builder B is an apprentice builder.


At this point, this is all the information we have, and so these are the only two plausible answers.

Builder A, believing his apprentice incompetent, uses his measuring tape to show Builder B that the piece is not the specified length. Builder B, surprised, uses his own measuring tape to show that it is the specified length. To the surprise of both, their measuring tapes are not synchronised: the two tapes give different readings for the same piece. This introduces an unexpected Rival Answer, which immediately becomes the best explanation:

RA3: The Builders’ measuring tapes are not synchronised.

Given Builder B’s unshakeable certainty that he is measuring correctly, and how unlikely it is that he would cut incorrectly in exactly the same way every time, RA3 becomes the most plausible answer.
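The reasoning behind RA3 can be illustrated with a small sketch: a careless cutter would tend to produce errors that vary from piece to piece, whereas a mismatch between tapes would tend to produce the same error every time. The Python below is only an illustration under stated assumptions. The function names, the 0.5mm tolerance, and every requested length other than 525mm are hypothetical, and the mismatch between the tapes is assumed to be a constant 3mm offset.

```python
from statistics import mean, pstdev

# Assumption: Builder B's tape reads 3 mm more than Builder A's tape for the
# same piece. The 3 mm figure comes from the example; treating it as a
# constant offset is an assumption made for this sketch.
TAPE_OFFSET_MM = 3


def cut_with_tape_b(requested_mm, offset_mm=TAPE_OFFSET_MM):
    """Length on Builder A's tape of a piece that Builder B has measured
    and cut accurately using his own tape."""
    return requested_mm - offset_mm


def shortfalls(requested, measured_by_a):
    """How far each piece falls short of the requested length, in mm."""
    return [r - m for r, m in zip(requested, measured_by_a)]


def looks_systematic(errors, tolerance_mm=0.5):
    """True when the errors have a non-zero average but barely vary,
    which points to a consistent cause rather than careless cutting."""
    return abs(mean(errors)) > tolerance_mm and pstdev(errors) <= tolerance_mm


# Only 525 comes from the example; the other lengths are invented.
requested = [525, 610, 1200, 300]
measured_by_a = [cut_with_tape_b(r) for r in requested]
errors = shortfalls(requested, measured_by_a)

print(errors)                             # [3, 3, 3, 3]
print(looks_systematic(errors))           # True
print(looks_systematic([3, -1, 0, 4]))    # False: erratic errors suggest careless cutting
```

A constant shortfall does not by itself show which tape is wrong. It only shows that the cause is consistent, which is why RA3 displaces RA1 and RA2 once the two tapes are compared.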

Example 2.3.2
The Unsynchronised Measuring Tapes, formalised
LQ: Why are Builder B’s pieces too short?

EE1: Builder B’s pieces are too short.
EE2: Builder B insists he’s measuring correctly.

RA1: Builder B isn’t measuring correctly.
RA2: Builder B isn’t cutting at the measured length.
RA3: The Builders’ measuring tapes are not synchronised.

ER1: Builder B is an apprentice builder.

BE: Builder B’s pieces are too short because the two builders’ measuring tapes are not synchronised.


In this case, neither builder knew that something systematic was going wrong. At first it seemed a case of incompetence or error, and only after a pattern emerged did they launch an investigation into the cause of the unexpected behaviour. The case is intriguing (and it is a real case) because of the unstated expectation that measuring tapes are synchronised. In this case, the unstated expectation turned out to be so important that it nearly cost Builder B his job.

Sometimes what appears to be a mistake is in fact a systematic problem in need of troubleshooting. And sometimes we troubleshoot with an eye to potential mistakes, and an effort to avoid them. It can take a lot of creative thinking to troubleshoot, and to help us make sense of what’s involved, we will, of course, start with a distinction.

As with our distinction between explanation and prediction, troubleshooting questions come in two sorts: one is backward-looking, and one is forward-looking. Let’s call the first sort explanatory troubleshooting and the second sort predictive troubleshooting.

Explanatory Troubleshooting asks “What went wrong?” The aim is to rule things out and improve performance and reliability.

Predictive Troubleshooting asks “What might go wrong?” The aim is preparation and prevention. Nobody wants a catastrophe. (This leads to a Recommendation species called “Request for Prediction”, which we will treat later.)

Last modified: Wednesday, 7 February 2018, 9:51 PM