Upper-bound significance testing

One of the alternatives to checking against null

Testing for importance using significance

As part of designing a study that will use linear regression analysis, you should consider which hypothesis would be the best one to test against for significance. You want a hypothesis that fits the theory you're intending to support or debunk, and that also works with the sort of data that you're expecting to have.

The page on this site titled “Insignificance is not failure” discusses why null-hypothesis significance testing is not really the universal litmus test for significance that so many researchers in the humanities treat it as. An example of an alternative test given on that page is an upper-bound significance test, one in which you're attempting to demonstrate that a correlation is too small in effect to be important. This page looks at that sort of analysis in a little more detail.

What would it look like if we designed our experiment from the beginning to check for an upper bound? That is, what if we were trying to demonstrate that, with 95% confidence, a correlation is lower than some value that we have interpreted as the minimum important correlation?

Figure 01

First, we'd mark our upper bound (which equates to the minimum important correlation) on the vertical baseline of our ‘sideways’ graph of the analysis. Then, the population probability curve on the right side of the graph would be positioned so that 95% of the area between it and the vertical baseline is below that minimum important correlation. The line that bisects the population probability curve is therefore the maximum sample correlation that demonstrates, with ‘p<.05’, that the population correlation does not meet your criteria for importance. See Figure 01.

As with null-hypothesis testing, the sample probability curve on the left side of the ‘sideways’ graph is centered on the population correlation we're expecting to see. The farther that expected size of effect is from the minimum important correlation, the better off we are, as can be seen in Figure 01: one way to make the statistical power larger would be to move the sample probability curve down, away from the upper bound. In this example, the statistical power is a somewhat dismal 16%.

So, arranging things on a sideways graph to help understand an upper-bound significance test is fairly straightforward. However, when it comes to doing the actual calculation, things become somewhat more complicated. More specifically, they are less symmetric.

Figure 02

When considering correlations near zero (which is where null-hypothesis significance testing is most relevant), you can assume that the probability distributions are symmetric about some mean value. But when looking at larger correlations, you can no longer assume that.

This is because correlations can never have a magnitude above 1. So, as the correlation of interest moves higher up the vertical baseline, the probability distribution begins to “pile up” against the maximum value of 1, and this distorts the curve, making it asymmetric.

Figure 02 shows this. The line marking the mean of the curve no longer perfectly bisects the area of the sample correlation probability curve. When one side of the curve is truncated, the probability of the remaining values increases, distorting the distribution of area about the mean. This means we can no longer assume that 50% of the area is on either side of the correlation that defines the curve.
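The asymmetry described above can be seen in a small simulation. The following is a minimal sketch, not the method used to draw the figures: it assumes a bivariate normal population, and the sample size of 20, the 4000 trials, and all function names are hypothetical choices made for illustration. It compares how far the 5th and 95th percentiles of the sample correlations sit from the median, for a population correlation of 0 and of 0.8.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_sample_correlations(rho, n, trials, seed=0):
    """Draw `trials` samples of size n from a bivariate normal population
    with correlation rho; return the sample correlations, sorted."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scale = math.sqrt(1.0 - rho * rho)
    results = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y shares a fraction rho of x, plus independent noise
        ys = [rho * x + scale * rng.gauss(0.0, 1.0) for x in xs]
        results.append(pearson_r(xs, ys))
    return sorted(results)

def tail_widths(sorted_rs):
    """Distance from the median down to the 5th percentile,
    and from the median up to the 95th percentile."""
    k = len(sorted_rs)
    median = sorted_rs[k // 2]
    low = sorted_rs[int(0.05 * k)]
    high = sorted_rs[int(0.95 * k)]
    return median - low, high - median

near_zero = tail_widths(simulate_sample_correlations(rho=0.0, n=20, trials=4000))
near_one = tail_widths(simulate_sample_correlations(rho=0.8, n=20, trials=4000))
print("rho=0.0 lower/upper tail widths:", near_zero)
print("rho=0.8 lower/upper tail widths:", near_one)
```

Under these assumptions, the two tail widths should come out roughly equal for a population correlation of 0, while for 0.8 the lower tail should be clearly longer than the upper one, which is squeezed against the ceiling at 1.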

To do the calculations, you have to correct for this distorting truncation to the probability curves. You can do this by transforming the data; typically you would transform each correlation r into its Fisher z-score, using the following equation: z = 0.5 × ln((1 + r) / (1 - r)), which is the inverse hyperbolic tangent of r.

Figure 03

What the Fisher transformation does is effectively remove the upper and lower bounds, at +1 and -1, that are a consequence of the nature of correlations. Figure 03 shows one way of understanding what the transformation does. The perfectly straight diagonal line represents untransformed correlations; it's the line where each value of r is plotted against itself. The long curving line shows the results of Fisher transformations; it's the line where z(r) is plotted against r. Near the origin the two lines are almost identical, but as the correlation magnitude approaches 1, the transformed values split away and extend toward infinity. This effectively removes the truncating bounds, so the probability curves have nothing to “bunch up” against.
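The behavior of the two lines can be checked numerically. This is a small sketch using only Python's standard library; the function names `fisher_z` and `inverse_fisher_z` are hypothetical, chosen here for illustration. It relies on the fact that the Fisher transformation is the inverse hyperbolic tangent.

```python
import math

def fisher_z(r):
    """Fisher transformation of a correlation r, with -1 < r < 1.
    Equivalent to 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Map a Fisher z-score back to a correlation between -1 and +1."""
    return math.tanh(z)

# Near zero, z is almost equal to r; near 1, z grows without bound.
for r in (0.05, 0.30, 0.60, 0.90, 0.99):
    print(f"r = {r:4.2f}  ->  z = {fisher_z(r):.4f}")
```

Calculations are then done on the z values, and any result that needs to be read as a correlation is converted back with the inverse function.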

Type I errors

Figure 04

The visual interpretation of Type I errors for this type of analysis differs from that for a null-hypothesis significance test. For a Type I error in a null-hypothesis significance test, the population is assumed to have a correlation of zero. While the same sort of worst-case scenario is used in an upper-bound significance test for understanding Type I errors, the specific worst case used is different. For a Type I error, instead of being equal to zero, the worst-case population correlation is assumed to be infinitesimally higher than the correlation selected as the boundary of importance.

Type I errors are false positives. In this case, that means that you would identify a population correlation as being below the minimum important correlation, when in fact it's equal to or above that minimum. Figure 04 shows a population correlation that is infinitesimally higher than the minimum important correlation. The line that bisects the population probability curve is the maximum correlation that would meet your hypothesis that the correlation is not important. So, the area below that line, and between the sample probability curve and the vertical baseline, is equal to the worst-case probability of getting an apparently unimportant sample correlation for a population that, in fact, has an important correlation.

Much like with the null-hypothesis significance test, we know that the area that covers the Type I error is 5%, since it is symmetric with the area that's between the population probability curve and the vertical baseline, and also above the value chosen as the minimum important population correlation. The population probability curve was positioned vertically such that the area above the minimum important correlation was 5% of the total, giving the analysis ‘p<.05’. Both of the 5% areas are shaded in Figure 04.
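The positioning described above can be sketched as a calculation on Fisher-transformed values. This is an illustration under stated assumptions rather than the calculation behind the figures: it assumes the usual approximation that a Fisher z-score from a sample of size n is normally distributed with standard error 1/sqrt(n - 3), which this page does not state, and the function names, the bound of 0.3, and the sample size of 50 are all hypothetical.

```python
import math
from statistics import NormalDist

def upper_bound_critical_r(bound, n, alpha=0.05):
    """Largest sample correlation that is significantly below `bound`.
    Assumes Fisher z-scores are normal with standard error 1/sqrt(n - 3)."""
    se = 1.0 / math.sqrt(n - 3)
    # Step down from the bound far enough to leave only alpha above it.
    z_crit = math.atanh(bound) - NormalDist().inv_cdf(1.0 - alpha) * se
    return math.tanh(z_crit)

def worst_case_type_i_rate(bound, n, alpha=0.05):
    """Chance of a sample correlation at or below the critical value
    when the population correlation sits right at the bound."""
    se = 1.0 / math.sqrt(n - 3)
    z_crit = math.atanh(upper_bound_critical_r(bound, n, alpha))
    return NormalDist(mu=math.atanh(bound), sigma=se).cdf(z_crit)

# Hypothetical example: minimum important correlation 0.3, sample of 50.
r_crit = upper_bound_critical_r(bound=0.3, n=50)
print("maximum 'unimportant' sample correlation:", round(r_crit, 4))
print("worst-case Type I rate:", round(worst_case_type_i_rate(0.3, 50), 4))
```

By construction, the worst-case Type I rate comes back as the chosen alpha, which is the symmetry between the two shaded 5% areas.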

Type II errors

Figure 05

Type II errors are false negatives. In this case, that means you would fail to properly identify a population correlation as being below the minimum important correlation. Figure 05 shows the same analysis as Figure 01. The difference in Figure 05 is that instead of the statistical power being shaded, the area equal to beta is shaded. Any sample correlation that is higher than the maximum correlation that is significantly below the importance cut-off will lead to a Type II error. The population does have an unimportant correlation, but the sample analysis fails to identify it.

One important thing to note is that the two areas related to Type II errors in these tests are reversed from those for Type II errors with null-hypothesis significance tests. In the latter, the statistical power was the area farther from the null-line. In upper-bound tests, the statistical power is the area nearer to the null-line. This is because null-hypothesis tests are effectively a specialized sort of lower-bound test, one in which the lower bound is equal to zero.
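The split between statistical power and beta can be sketched in the same way. As before, this is a hedged illustration: it assumes Fisher z-scores are normally distributed with standard error 1/sqrt(n - 3), an approximation not stated on this page, and the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def upper_bound_power(expected, bound, n, alpha=0.05):
    """Probability that a sample lands at or below the critical value,
    given a population whose correlation equals `expected`.
    Assumes Fisher z-scores are normal with standard error 1/sqrt(n - 3)."""
    se = 1.0 / math.sqrt(n - 3)
    z_crit = math.atanh(bound) - NormalDist().inv_cdf(1.0 - alpha) * se
    # Power is the area of the sample curve on the null-line side of z_crit.
    return NormalDist(mu=math.atanh(expected), sigma=se).cdf(z_crit)

# Hypothetical example: bound of 0.3, sample of 50, three expected effects.
for expected in (0.0, 0.15, 0.3):
    power = upper_bound_power(expected, bound=0.3, n=50)
    print(f"expected = {expected:4.2f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

In this sketch the power shrinks as the expected correlation moves up toward the bound, and equals alpha when the two coincide, because the transformed curves are symmetric.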

Figure 06

Of course, these figures assume that the correlations have undergone a Fisher transformation. For comparison purposes, Figure 06 is an example of untransformed correlations. It's also a different analysis than the one used in the earlier figures, and shows a statistical power of 39.2%. What's particularly interesting is that the expected population correlation has been moved up so that it is not only close to +1, but also equal to the upper bound. Without the truncation inherent in correlation analyses, the statistical power would have been only 5%. But that inherent truncation has an uneven effect on the two non-transformed probability curves, so they are no longer symmetric: their respective mean lines no longer slice through identical areas at identical locations, and the areas bounded by them are no longer equal.

So, while with non-truncated curves alpha would have been exactly equal to the statistical power, at 5%, with the truncation, the statistical power has grown to almost 40%. Presumably, this increase would continue as the upper-bound approached the maximum possible correlation of +1, and should achieve 100% when the upper-bound equals +1. (Which makes sense—every untransformed correlation is at or below 1 in magnitude.)