A fundamental assumption in fair lending regression analysis is that the model is correctly specified and contains all the relevant variables.

It is well understood that omitting important variables can create problems, and we have demonstrated this in previous posts. The typical consequence of leaving out an important variable in a fair lending regression analysis is that the model suggests a discriminatory practice when, in fact, the observed disparity is the result of a non-discriminatory factor that was not accounted for.

But what about the opposite situations? Could mistakenly including irrelevant variables lead to a conclusion of discrimination? And could excluding important variables *mask* discrimination that actually exists? The short answer is that both are possible. Let’s address each question separately.

With respect to the first question, adding variables that are truly irrelevant should, in theory, have little or no impact on the regression results. The key words here are *truly irrelevant*, meaning that the variable has no correlation with the outcome being measured, the target group indicator variable, or the other covariates.

To the extent there is correlation, however, this may not hold, and adding such a variable could distort or bias the results. It is important to understand that this is true even if the correlation arises purely by chance in the sample under review. As an example, suppose a lender had a pricing structure in which all loans with terms of 5 years or less were priced at the same rate.

Suppose that in our model, however, we controlled for terms of 1, 2, 3, 4, and 5 years individually. Depending on the sample distribution, doing so could affect the results and lead the model to suggest discrimination when there actually was none.
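To make the term example concrete, here is a minimal Python sketch. Everything in it is hypothetical: the 14 loan records, the rates, and the small `ols` helper are invented for illustration, and the sample is deliberately contrived so that chance variation in rates happens to line up with term and group. It fits one model with only a target group indicator and a second model that adds a separate indicator for each term.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination. Illustrative helper only."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Hypothetical sample: (term in years, target group flag, rate in %).
# Policy prices every term of 5 years or less the same, so the rate
# differences below are noise, not policy.
loans = (
    [(1, 0, 5.80)] + [(1, 1, 5.90)] * 3
    + [(2, 0, 5.95), (2, 1, 5.95)]
    + [(3, 0, 5.95), (3, 1, 5.95)]
    + [(4, 0, 5.95), (4, 1, 5.95)]
    + [(5, 0, 6.00)] * 3 + [(5, 1, 6.10)]
)
rates = [rate for _, _, rate in loans]

# Model 1: intercept + target group indicator (consistent with the policy).
X_pooled = [[1.0, float(g)] for _, g, _ in loans]
gap_pooled = ols(X_pooled, rates)[1]

# Model 2: also adds indicators for terms 2-5 (term 1 is the reference).
X_terms = [[1.0, float(g)] + [float(t == k) for k in range(2, 6)]
           for t, g, _ in loans]
gap_with_terms = ols(X_terms, rates)[1]

print(f"Estimated gap, group indicator only: {gap_pooled:.3f}")
print(f"Estimated gap, with term indicators: {gap_with_terms:.3f}")
```

In this constructed sample both groups have the same average rate, so the first model estimates essentially no gap. The second model estimates a gap of about 0.05 percentage points against the target group, even though term plays no role in pricing. A real analysis would also look at standard errors and sample sizes, which this sketch leaves out.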

As for the second question, excluding relevant variables could also result in a failure to detect discrimination that is actually present. Again, as an example, assume a lender prices loans for condominiums higher than loans for typical single-family structures.

If all or most of the higher-priced loans in the control group were for condominiums, a model that did not take property type into account could indicate no disparity in pricing, whereas a model that controlled for property type would (correctly) indicate a disparity.
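The condominium scenario can be sketched with simple averages. The rate sheet and loan counts below are hypothetical, chosen so that the effect is easy to see. The overall difference in average rates is what a regression with only a target group indicator would estimate, and the comparison within each property type stands in for a model that controls for property type.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical rate sheet (in %): condos are priced 0.50 above
# single-family, and target group borrowers are charged 0.25 more
# within each property type.
RATES = {
    ("control", "single_family"): 6.00,
    ("control", "condo"): 6.50,
    ("target", "single_family"): 6.25,
    ("target", "condo"): 6.75,
}

# Hypothetical loan counts: control group loans skew toward condos.
COUNTS = {
    ("control", "single_family"): 40,
    ("control", "condo"): 60,
    ("target", "single_family"): 90,
    ("target", "condo"): 10,
}

# One (group, property type, rate) record per loan.
loans = [(group, prop, RATES[group, prop])
         for (group, prop), n in COUNTS.items()
         for _ in range(n)]


def avg_rate(group, prop=None):
    """Average rate for a group, optionally within one property type."""
    return mean([r for g, p, r in loans
                 if g == group and (prop is None or p == prop)])


# Ignoring property type: compare overall average rates.
raw_gap = avg_rate("target") - avg_rate("control")

# Controlling for property type: compare like with like.
gap_by_type = {prop: avg_rate("target", prop) - avg_rate("control", prop)
               for prop in ("single_family", "condo")}

print(f"Gap ignoring property type: {raw_gap:.2f}")
for prop, gap in gap_by_type.items():
    print(f"Gap within {prop}: {gap:.2f}")
```

With these numbers the overall gap is zero, because the control group's concentration in higher-priced condominium loans offsets the 0.25 point surcharge that target group borrowers pay within each property type.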

The key takeaway here is that fair lending compliance entails more than just running regression analysis. Lenders must understand their lending practices and enforce policy discipline in order to manage fair lending risk. In addition, with respect to statistical methods, principles that are sound in theory do not always hold in practice, because the methods employed are always limited by data availability and the sample under review.