# The Consequences of Using Proxies For Fair Lending Regression Analysis

In a whitepaper entitled “BISG Methodology and Its Impact on Regression Analysis,” we studied the use of proxy methodologies in regression analysis designed to test for potential discriminatory practices. In that study, we ran a number of simulations to test various aspects of proxy methods, including their accuracy and different ways to quantify the treatment variable.

In previous posts, we presented some of these aspects conceptually and in less technical terms. Below, we present a more concise summary of what using a proxy implies for fair lending regression analysis.

Recall that a proxy is simply an estimated value that provides an indication of the classification of each borrower in terms of whether they are a target or control group applicant. The proxy is used in the regression equation in place of the actual race/ethnicity/gender of the applicant as the treatment variable. As the proxy is an estimated value, it introduces measurement error. The most important implications relative to a fair lending assessment can be summarized as follows:

1.  If there are no actual disparities, i.e., everyone was treated the same, then both the actual and the estimated regression coefficients are zero (apart from sampling noise), whether a proxy or the actual value is used in the equation. That is, if there really is no discrimination across groups, it doesn’t matter whether an applicant is correctly classified or not. If every applicant was treated the same, the impact of group classification is zero. Therefore, in this instance, there is no bias from using a proxy.
2.  If there are actual disparities, for example, if female borrowers were treated differently, using a proxy instead of the actual gender of the applicant would likely serve only to mask discrimination. The measurement error inherent in the proxy would cause the estimated disparity to be understated relative to the actual disparity. Therefore, the magnitude of any statistically significant disparity is likely lower than it would be if actual data on each applicant were available.
3.  The better the BISG (Bayesian Improved Surname Geocoding) methodology works, the less bias there is. If we classify 100% of applicants correctly, then we estimate the actual regression coefficient. In this case, the regression results would be identical whether we use the proxy or the actual group designation. Conversely, the worse the proxy methodology performs, the less precise the measurement. And as the precision lessens, the odds become increasingly stacked against statistically finding discrimination, even when it exists.
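
The three points above can be illustrated with a small simulation. This is a minimal sketch and not the whitepaper's actual simulation design. It assumes a 30% target-group share, a proxy that assigns each applicant to the wrong group with a fixed probability (a simplification, since BISG produces probabilities rather than hard classifications), and a continuous outcome with standard normal noise. The function name `estimated_disparity` and all parameter values are hypothetical.

```python
import numpy as np

def estimated_disparity(true_effect, accuracy, n=200_000, target_share=0.3, seed=0):
    """OLS slope of the outcome on a noisy group proxy (illustrative setup only)."""
    rng = np.random.default_rng(seed)
    actual = rng.random(n) < target_share      # true group membership
    flipped = rng.random(n) >= accuracy        # applicants the proxy misclassifies
    proxy = np.where(flipped, ~actual, actual).astype(float)
    # Outcome depends on the ACTUAL group, not on the proxy (assumed noise: N(0, 1))
    outcome = true_effect * actual + rng.normal(0.0, 1.0, n)
    # Slope of a one-variable OLS regression: cov(x, y) / var(x)
    return np.cov(proxy, outcome)[0, 1] / np.var(proxy, ddof=1)

for accuracy in (1.0, 0.9, 0.7):
    no_disparity = estimated_disparity(0.0, accuracy)
    disparity = estimated_disparity(0.5, accuracy)
    print(f"accuracy={accuracy:.1f}  true effect 0.0: {no_disparity:+.3f}  "
          f"true effect 0.5: {disparity:+.3f}")
```

Under these assumptions, the estimate stays near zero at every accuracy level when the true effect is zero. When the true effect is 0.5, the estimate is recovered at 100% accuracy and shrinks toward zero as classification accuracy falls.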