The use of proxy methods in fair lending analysis – and in particular fair lending enforcement actions – has drawn much attention. In an article entitled “BISG Methodology and Its Impact on Regression Analysis,” we studied the use of proxy methodologies as applied in regression analysis designed to test for potential discriminatory practices.

In that study, we ran a number of simulations to test various aspects of proxy methods, including their accuracy and different ways of quantifying the treatment variable. Below, we explain in simple terms how using proxies for target and control group designations, rather than actual data, can affect the measured differences in treatment.

There has been much “buzz” about, and criticism of, proxy methods used in fair lending. Detractors have pointed to the rate of error in distinguishing protected and non-protected class designations when compared to having actual data. While it is true that the use of proxies introduces bias in the regression coefficient, this bias is toward zero, meaning the bias is __in favor of NOT finding discrimination__. The use of a proxy therefore works in the institution’s favor when testing for potential disparate treatment. This is due to the measurement error introduced by the proxy.

We can illustrate this with a simple example:

Suppose we have a sample of data in which minorities received a 10% rate while non-minorities received a 5% rate (other factors equal). If we observed the **actual** race of everyone in the sample, all minorities would pay 10% and all non-minorities 5%, and we would get a regression coefficient (the measured difference in treatment based on race) of 5.0. In other words, a disparity of 5.0 percentage points exists, with minorities charged the higher rate.

Let’s now assume that in this same sample we do not have race information and must use a proxy in our model instead of the applicant’s actual race. We further assume that the proxy assigns applicants to the target and control groups with 80% accuracy. This means that 1 out of 5 minorities is misclassified as non-minority, which works in the institution’s favor since it appears some non-minorities received the higher rate. Also, 20% of non-minorities are misclassified as minorities. This also works in the institution’s favor, as it appears some minorities paid the lower rate.

Assuming for simplicity that the two groups are the same size, the data then shows that 80% of those classified as minorities were charged a 10.0 rate and 20% a 5.0 rate, for a weighted average of 9.0. Likewise, 80% of those classified as non-minorities were charged 5.0 and 20% were charged 10.0, for a weighted average of 6.0. The data now shows a disparity of only 3.0 (a 9.0 average for minorities and a 6.0 average for non-minorities) as opposed to the actual disparity of 5.0.
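
The weighted-average arithmetic in this example can be sketched in a few lines of Python. This is only an illustration, not the code used in our study: the function name `proxy_group_averages` and its parameters are hypothetical, and it assumes equally sized groups and the same misclassification rate in both directions.

```python
def proxy_group_averages(accuracy, minority_rate=10.0, non_minority_rate=5.0):
    """Average rates observed in each proxy-assigned group.

    Assumptions (ours, for illustration): both true groups are the same
    size, and the proxy is equally accurate for each group.
    """
    error = 1.0 - accuracy
    # Proxy "minority" group: correctly classified minorities plus
    # misclassified non-minorities.
    proxy_minority_avg = accuracy * minority_rate + error * non_minority_rate
    # Proxy "non-minority" group: correctly classified non-minorities plus
    # misclassified minorities.
    proxy_non_minority_avg = accuracy * non_minority_rate + error * minority_rate
    disparity = proxy_minority_avg - proxy_non_minority_avg
    return proxy_minority_avg, proxy_non_minority_avg, disparity


minority_avg, non_minority_avg, disparity = proxy_group_averages(0.8)
print(round(minority_avg, 2), round(non_minority_avg, 2), round(disparity, 2))
```

Under these assumptions the measured disparity works out to (2 × accuracy − 1) times the true disparity, so any accuracy below 100% shrinks the measured gap.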

Further, if we change our accuracy rate from 80% to 50%, the disparity appears to go away completely in this scenario. One-half of those classified as minorities were charged 10.0 and one-half 5.0, and the same holds for non-minorities. This equals a weighted average of 7.5 for both groups (0.5 × 10 + 0.5 × 5 = 5.0 + 2.5 = **7.5**). Additionally, if we classified fewer than 50% correctly, it would appear there was a disparity in favor of minorities. Generally, therefore, the use of proxies as opposed to actual data is more likely to mask a disparity (or discrimination) than to suggest one that does not exist.
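
For readers who want to see the effect on simulated data, the following is a minimal sketch and not the simulation from our article. The function name `simulate_measured_disparity`, the sample size, and the seed are all illustrative assumptions, as are the equal group sizes and symmetric proxy error. With a single binary group indicator and no other controls, the difference in group means is the same number a simple regression coefficient would report.

```python
import random


def simulate_measured_disparity(accuracy, n=100_000, seed=0):
    """Difference in mean rates between proxy-assigned groups.

    Hypothetical setup: half the sample is minority and pays 10.0, the
    other half pays 5.0, and the proxy label is correct with
    probability `accuracy` for every applicant.
    """
    rng = random.Random(seed)
    totals = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for i in range(n):
        is_minority = (i % 2 == 0)  # equal group sizes
        rate = 10.0 if is_minority else 5.0
        # Keep the true label with probability `accuracy`, otherwise flip it.
        correct = rng.random() < accuracy
        proxy_minority = is_minority if correct else not is_minority
        totals[proxy_minority] += rate
        counts[proxy_minority] += 1
    return totals[True] / counts[True] - totals[False] / counts[False]


for accuracy in (1.0, 0.8, 0.5, 0.3):
    print(accuracy, round(simulate_measured_disparity(accuracy), 2))
```

The measured gap should sit near the true 5.0 at perfect accuracy, shrink as accuracy falls, hover around zero at 50%, and turn negative below that, mirroring the arithmetic in the example.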