In our previous post, we introduced the concept of statistical significance, beginning with two important points.
First, statistical methods are applied in order to estimate or measure an unknown. A sample of data is analyzed, and the results are used to draw conclusions about a larger population. This is known as statistical inference. We are interested in the results from the sample only because of what they tell us about the larger population. If we cared only about the sample itself, there would be no need for statistical methods.
Second, using statistics means employing the scientific method. Because the standard of proof in science is so high, absolute certainty is rarely, if ever, attainable. Instead, conclusions are drawn in terms of probabilities.
We concluded that post with a hypothetical example: what if the coin toss that began the overtime period in Super Bowl 51 was not fair? Could the coin have been tampered with and, if so, how could we tell? One obvious way would be to physically inspect the coin, for example to see whether it was weighted more heavily on one side. But because a coin has only two sides, we know that a “fair” coin should land on heads 50% of the time and on tails 50% of the time. We could therefore test the coin and see for ourselves.
We are now entering the realm of statistical significance. If we tossed the coin twice and got heads both times, would that indicate the coin was biased toward heads? What if we tossed it four times and got 3 heads? And if we tossed it twice and got 1 head and 1 tail, would that mean it was fair? Of course, we would not draw conclusions about the coin from so few tosses. With so little data, outcomes like these can easily occur just by chance, and they do not necessarily indicate either fairness or bias.
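To put numbers on how easily these small-sample outcomes arise by chance, here is a minimal sketch that computes exact binomial probabilities for a fair coin (probability of heads assumed to be exactly 1/2). The function names `prob_exactly` and `prob_at_least` are hypothetical helpers written for this illustration, not part of any library.

```python
from fractions import Fraction
from math import comb


def prob_exactly(k: int, n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability of exactly k heads in n tosses when P(heads) = p (binomial formula)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability of k or more heads in n tosses."""
    return sum(prob_exactly(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    # The three scenarios from the paragraph above, assuming a fair coin.
    print("2 heads in 2 tosses:", prob_exactly(2, 2))          # 1/4
    print("3 or more heads in 4 tosses:", prob_at_least(3, 4))  # 5/16
    print("1 head, 1 tail in 2 tosses:", prob_exactly(1, 2))    # 1/2
```

Even a perfectly fair coin lands heads twice in a row a quarter of the time, and produces 3 or more heads in 4 tosses nearly a third of the time, which is why so few tosses tell us very little.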
Although this example is simplistic, it is analogous to the use of statistics in fair lending analysis. The question in such an analysis is whether, in a representative sample of data (the tosses, in the coin example), the outcomes suggest some type of bias or simply reflect random variation. In fair lending, the question becomes: were applicants treated fairly, or does the data suggest some type of discriminatory preference or bias?
Now, let’s bring it all together. In fair lending, when a result is “statistically significant,” the conclusion being drawn is that the observed correlation between outcomes and a factor such as race or gender is unlikely to have occurred just by chance. This, in turn, suggests that similar differences exist in the larger population, i.e., in the institution’s lending practices as a whole. Likewise, if the coin were “fair,” we would expect heads and tails to occur, on average, at roughly the same frequency over repeated tosses. If they did not occur about equally (assuming an adequate number of tosses), we could then determine whether the difference between what was expected (in this case, 50-50) and what actually occurred is statistically significant or simply the result of random chance.
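One standard way to make that determination for a coin is an exact two-sided binomial test. The sketch below assumes a fair coin as the baseline and uses hypothetical toss counts; `two_sided_p_value` is an illustrative name, not a library function. Because a fair coin's distribution is symmetric, the two-sided probability is twice the chance of a result at least as lopsided as the one observed, in the direction observed.

```python
from fractions import Fraction
from math import comb


def two_sided_p_value(heads: int, tosses: int) -> float:
    """Exact two-sided binomial test against a fair coin (P(heads) = 1/2).

    Returns the probability that a fair coin gives a split at least as far
    from 50-50 as the observed one, in either direction (capped at 1).
    """
    if not 0 <= heads <= tosses:
        raise ValueError("heads must be between 0 and tosses")
    # Fold the observation onto the upper tail; symmetry makes this valid for p = 1/2.
    extreme = max(heads, tosses - heads)
    upper_tail = Fraction(sum(comb(tosses, i) for i in range(extreme, tosses + 1)), 2**tosses)
    return float(min(Fraction(1), 2 * upper_tail))


if __name__ == "__main__":
    # Hypothetical results, for illustration only.
    for heads, tosses in [(3, 4), (60, 100), (530, 1000)]:
        print(f"{heads} heads in {tosses} tosses: {two_sided_p_value(heads, tosses):.4f}")
```

Exact fractions are used so the tail sum does not lose precision for larger numbers of tosses; a statistics package would typically offer an equivalent built-in test.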
Statistical significance is determined and expressed as a probability: the probability of seeing a result at least as extreme as the one observed if chance alone were at work. This probability is known as the “p value.” In most cases, a p value of less than 5% is deemed “statistically significant,” that is, sufficient to conclude that what is being observed is unlikely to be merely random.
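The meaning of a p value can also be illustrated by brute force: simulate a fair coin many times and count how often it produces a result at least as extreme as the one observed. This is a minimal sketch under stated assumptions. The toss counts are hypothetical, the 5% cutoff follows the convention described above, and `simulated_p_value` and `is_significant` are illustrative names; simulation only approximates the exact calculation.

```python
import random


def simulated_p_value(heads: int, tosses: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the two-sided p value by simulating a fair coin.

    Counts the fraction of simulated experiments whose number of heads is at
    least as far from tosses / 2 as the observed count.
    """
    if tosses < 1:
        raise ValueError("tosses must be at least 1")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed_distance = abs(heads - tosses / 2)
    as_extreme = 0
    for _ in range(trials):
        # Each random bit is one fair toss; counting the 1 bits counts the heads.
        simulated_heads = bin(rng.getrandbits(tosses)).count("1")
        if abs(simulated_heads - tosses / 2) >= observed_distance:
            as_extreme += 1
    return as_extreme / trials


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Apply the conventional 5% threshold: significant if p < alpha."""
    return p_value < alpha


if __name__ == "__main__":
    for heads in (55, 60, 65):  # hypothetical results out of 100 tosses
        p = simulated_p_value(heads, 100)
        print(f"{heads}/100 heads: p is about {p:.4f}, significant: {is_significant(p)}")
```

With these hypothetical numbers, 60 heads in 100 tosses gives a p value of roughly 0.057, just above the 5% cutoff, while 65 heads falls well below it.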