It's important to evaluate highway safety programs scientifically to find out whether they're reducing losses and, if so, by how much. But sometimes sound data and analyses are incorrectly interpreted to arrive at unsound conclusions.
This is the caution offered by the University of Toronto's Ezra Hauer, who supplies examples of the pitfalls of misinterpreting scientific findings. One example involves allowing drivers to turn right on red lights after stopping, which was almost universally adopted in response to the oil crisis of the mid-1970s. Several studies of this practice, conducted in 1976-77, found associated crash increases, but the results of individual studies weren't statistically significant. So even though every study pointed to crash increases, policymakers could conclude that safety wasn't being compromised.
But effects that are statistically nonsignificant aren't the same as no effects at all. The initial datasets were small, which means very large effects would have been needed to achieve statistical significance. Later on, after right turn on red had been allowed for years and sample sizes were larger, research did show a statistically significant 20 percent increase in right-turn crashes. The effect on pedestrian crashes was worse (see "Right-turn-on-red laws raise intersection toll," Dec. 9, 1980).
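The arithmetic behind this point can be sketched with hypothetical crash counts (the counts below are illustrative, not the actual study data). With equal exposure before and after the change, a simple two-sample Poisson comparison shows that the same 20 percent increase is statistically nonsignificant when counts are small but clearly significant once more data accumulate:

```python
import math

def z_test_rate_increase(before, after):
    """Two-sample Poisson comparison with equal exposure.

    Under the null hypothesis of no change, (after - before) has
    mean 0 and variance (before + after); returns the z statistic
    and a one-sided p-value from the normal approximation."""
    z = (after - before) / math.sqrt(before + after)
    p = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # one-sided tail
    return z, p

# The same 20% increase at two sample sizes (hypothetical counts):
z_small, p_small = z_test_rate_increase(100, 120)    # small early study
z_large, p_large = z_test_rate_increase(1000, 1200)  # larger later dataset
print(round(z_small, 2), round(p_small, 3))  # p above 0.05: "nonsignificant"
print(round(z_large, 2), round(p_large, 6))  # p far below 0.05: significant
```

The effect size is identical in both cases; only the amount of data differs, which is why the early small studies could not distinguish a real 20 percent increase from zero.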
By then, right turn on red had become firmly entrenched, and the practice remains in effect despite the documented adverse safety consequences.
Hauer cites two other examples, one involving whether wider paved shoulders alongside roads would reduce crashes more effectively than narrower shoulders, and another case involving whether raising speed limits produced an increase in motor vehicle deaths. In both cases, initial studies didn't find statistically significant effects. Subsequent analyses based on more data did find definitive effects.
In each case, the confusion involved a statistical exercise known as null hypothesis testing, which long has been regarded as a safeguard against spurious findings. The process involves trying to determine the size of an effect (for example, the increase in crashes after right turn on red), but the estimated size always is subject to variability. So statistical tests are used to determine whether the estimated effect is large enough to imply that the true effect is greater than zero.
When estimated effects are nonsignificant, it means the researchers couldn't be confident that the true effects were different from zero. But when separate studies find positive effects that, individually, aren't statistically different from zero, they collectively add weight to the conjecture that the true effect is greater than zero.
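One standard way to make this pooling of evidence concrete is Stouffer's method, which combines the z-scores of independent studies: the combined statistic is the sum of the individual z-scores divided by the square root of the number of studies. The four study z-scores below are hypothetical, chosen only to show how several individually nonsignificant results in the same direction can be jointly significant:

```python
import math

def one_sided_p(z):
    """One-sided p-value from the standard normal distribution."""
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

def stouffer_combined_z(z_scores):
    """Stouffer's method for independent studies:
    combined z = sum(z_i) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Four hypothetical studies, all pointing toward a crash increase,
# each individually nonsignificant (one-sided p > 0.05):
zs = [1.2, 1.4, 1.1, 1.3]
print([round(one_sided_p(z), 3) for z in zs])

z_comb = stouffer_combined_z(zs)
print(round(z_comb, 2), round(one_sided_p(z_comb), 4))  # jointly significant
```

This is the logic of meta-analysis: consistent direction across studies is itself evidence, even when no single study clears the significance threshold.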
A problem arises when descriptors such as "insignificant" or "unimportant" are substituted for a statistical finding of "nonsignificant." Then common understanding of the findings changes from "effects could be zero" to "effects are zero." When such misrepresentations are used to guide highway safety policy decisions, promising programs are ended prematurely and unsafe practices continue indefinitely, costing lives and money.
What to do? Hauer says "the ritual [of null hypothesis testing] is so pervasively misapplied as to be simply unfit for use." Instead he says researchers should estimate the magnitude of each effect and supply measures of the precision of the estimates.
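Hauer's prescription can be illustrated with a rate ratio and its confidence interval, in place of a bare significant/nonsignificant verdict. The sketch below uses the usual log-scale normal approximation for a crash rate ratio with equal exposure; the counts are hypothetical:

```python
import math

def rate_ratio_ci(before, after, conf_z=1.96):
    """Point estimate and approximate 95% confidence interval for a
    crash rate ratio (after/before, equal exposure), using the
    log-scale normal approximation: se(log RR) = sqrt(1/a + 1/b)."""
    rr = after / before
    se = math.sqrt(1 / before + 1 / after)
    lo = rr * math.exp(-conf_z * se)
    hi = rr * math.exp(conf_z * se)
    return rr, lo, hi

# A hypothetical small study: report the magnitude and its precision.
rr, lo, hi = rate_ratio_ci(100, 120)
print(f"rate ratio {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Here the interval spans 1.0, so a significance test would call the result "nonsignificant" even though the best estimate is a 20 percent increase. Reporting the estimate with its interval preserves that information for policymakers instead of collapsing it to "no effect."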
Institute chief operating officer Adrian Lund agrees, saying "promising countermeasures shouldn't be discarded just because the data are insufficient. Researchers should continue to gather data and evaluate results until the findings are definitive. In the meantime, policymakers shouldn't cite inconclusive data to tout their own pet programs or to discredit programs they oppose. This amounts to willy-nilly policymaking, not policy based on science."