### Why does *alpha* trump *beta*?

Statistical power (the complement of *beta*, the Type II error rate) doesn't come up very often in humanities papers that involve null hypothesis significance testing. I believe that's because most researchers are fishing for results instead of testing specific hypotheses, but that's not the point at the moment. The point is that without a consideration of the statistical power of an experiment, there can be no discussion of Type II errors.

The example graphic (Figure 01) illustrates a Type II error. It shows a linear regression done at a 95% confidence level with a statistical power of 84%.

Remember that a Type II error occurs when a population correlation exists but the sample fails to find it. So, in the example graphic, a Type II error would be any sample correlation below the minimum statistically significant correlation. That is what the shaded area on the left side of the graphic covers: the sample correlations that would constitute a Type II error.
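The numbers behind a setup like Figure 01 can be sketched in a few lines. This is not the original figure's calculation; the sample size (n = 100) and expected population correlation (ρ = 0.29) are assumptions chosen so that the power of a two-sided test at 95% confidence comes out near the 84% used here, via the standard Fisher z-transform approximation:

```python
# A sketch of a Figure 01-style design: find the minimum statistically
# significant sample correlation, then the power for an assumed
# population correlation. n and rho are illustrative assumptions.
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 100        # assumed sample size
rho = 0.29     # assumed expected population correlation

se = 1.0 / math.sqrt(n - 3)    # std. error on the Fisher z scale
z_crit = 1.959964 * se         # two-sided 95% cutoff on the z scale
r_crit = math.tanh(z_crit)     # minimum significant sample correlation

# Power: probability a sample lands beyond the cutoff, given the
# sampling distribution is centred on atanh(rho).
power = 1.0 - norm_cdf((z_crit - math.atanh(rho)) / se)
beta = 1.0 - power             # Type II error rate

print(f"minimum significant r ≈ {r_crit:.3f}")
print(f"power ≈ {power:.0%}, beta ≈ {beta:.0%}")
```

Any sample correlation below `r_crit` (about 0.20 under these assumptions) falls in the shaded region and would be declared non-significant.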

If your sample correlation is low enough that it falls within the shaded area on the left side of the graph, your result is not statistically significant. That, I suspect, is the point at which many researchers stop analyzing their data, because sample correlations that are not significantly different from null are largely unpublishable. They are not even viewed as ‘negative results’; rather, they're viewed as non-results.

But is that appropriate? If you calculate a statistical power of 84% and then fail to get a statistically significant sample correlation, there is a 16% chance that you just had bad luck. But there's an 84% chance that something is wrong with the effect size you were expecting.
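The 16% figure is just the Type II error rate (beta = 1 − power), and it can be checked by simulation. This is a sketch, not the original study design: the sample size (n = 100) and population correlation (ρ = 0.29) are assumptions corresponding to an 84%-power design, and the simulation counts how often such a design misses a real effect:

```python
# Monte Carlo check of the 16%/84% split: draw correlated samples from
# a population where the effect IS real, and count how often the test
# fails to reach significance. n, rho, and trials are assumptions.
import math
import random

random.seed(42)                    # reproducible runs

n, rho, trials = 100, 0.29, 5000
z_crit = 1.96 / math.sqrt(n - 3)   # significance cutoff, Fisher z scale

misses = 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1) for x in xs]
    # Pearson sample correlation.
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    if math.atanh(r) < z_crit:     # failed to reach significance
        misses += 1

print(f"Type II rate ≈ {misses / trials:.0%}")  # expect roughly 16%
```

Roughly one run in six misses the real correlation, which is exactly the "bad luck" case: the effect exists, but that sample fails to show it.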

Is finding that your expected effect size was wrong a useful result? That depends on how you arrived at that effect size. (It also assumes you did in fact consider statistical power in the design of your study, which often doesn't seem to be the case in humanities research.) In general, you probably estimated the expected effect size based on one of three things:

- The results of previous studies of the same correlation.
- The results of previous studies of similar correlations, combined with accepted theories about the relation between those correlations and the one you're studying.
- An interpretation of what magnitude of result would qualify as “important”.

If you are doing a “replication study”, then you used the first approach. And when you fail to find a statistically significant result, that means there's an 84% chance that those previous studies were wrong. That would seem to be worth further investigation, but it is not something you see very often (if ever) in the humanities literature.

If you aren't doing a replication study, you likely used the second approach. So, when you fail to find a statistically significant result, per NHST, the implication is that either the previous related studies were wrong, or the accepted theories that relate those studies to yours are wrong. Again, while that seems like a potentially interesting result, you rarely see such a thing explored in the humanities literature.

In either of those cases, by not exploring the negative result, you are effectively saying that an 84% unlikelihood of a Type II error isn't compelling. Yet a 95% unlikelihood of a Type I error is considered not merely compelling but definitive. (Compare Figure 01 and Figure 02.) So, if a 95% confidence level is enough to get a positive result published, what is the required level of confidence to get a negative result published? There doesn't seem to be one. The utility of negative results from studies with high statistical power remains, as far as I know, largely untapped.

But what about the third case, where you estimated your expected effect size based on an interpretation of what magnitude of result would qualify as “important”? That's discussed in the next section.