To determine whether discrimination based on race or ethnicity has occurred, particularly in fair lending, it is important to know the race or ethnicity of the applicant(s). However, for some non-mortgage products, lenders are not allowed to collect these data. Analysts must therefore rely on proxies for race and ethnicity.
About the BISG Methodology
In recent years, Bayesian Improved Surname Geocoding (BISG) has become the standard proxy method because it combines geographic and surname information to give probabilities for each applicant’s race and ethnicity. This methodology has been found to predict race and ethnicity more accurately than previous methodologies.
However, its impact on regression coefficients has not been extensively studied. In a white paper entitled “BISG Methodology and Its Impact on Regression Analysis,” we examined both the accuracy of the methodology and its impact on the estimated coefficients used to test for discrimination. Below, we discuss one aspect of the study: the accuracy of the BISG methodology.
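As a rough illustration of how BISG combines surname and geographic information, the sketch below applies Bayes' rule as the method is commonly described: the surname-based probability of each race is multiplied by the likelihood of the applicant's geography given that race, and the products are rescaled to sum to one. This relies on the usual BISG assumption that surname and geography are independent once race is known. The function name and the input numbers are hypothetical and are not taken from the white paper.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    p_race_given_surname: dict mapping race -> P(race | surname)
    p_geo_given_race:     dict mapping race -> P(geography | race)
    Returns a dict mapping race -> P(race | surname, geography).
    """
    # Multiply the surname-based prior by the geographic likelihood.
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no race has positive probability")
    # Rescale so the probabilities sum to one.
    return {race: value / total for race, value in unnormalized.items()}


# Made-up inputs for illustration only (not real Census figures).
surname_probs = {"white": 0.60, "black": 0.30, "hispanic": 0.10}
geo_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.001}

posterior = bisg_posterior(surname_probs, geo_likelihood)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

In this made-up case the surname alone points to white, but the geography is four times as likely for a black resident, so the combined probability favors black.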
Results and Impact of BISG Proxy Matching
To conduct our assessment, we use the BISG probabilities to assign each customer a race/ethnicity. The assignment follows a cut-off rule, and the choice of cut-off is somewhat arbitrary.
For example, we could classify an individual as black if the BISG probability of black is greater than 50% (or some other cut-off such as 20% or 80%). We then compare the assigned race to the true race, which is known in our sample. A lower threshold results in most black applicants being correctly identified; however, many white applicants are then incorrectly categorized as black.
As the threshold increases, more black applicants are misclassified as white while more white applicants are correctly identified. Overall, our analysis indicated that a threshold of 50%-60% resulted in the highest overall percentage of customers correctly identified.
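The cut-off rule and the trade-off described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and the ten records are invented toy data chosen only to show how per-group and overall accuracy move as the threshold changes.

```python
def classify(prob_black, threshold):
    """Assign 'black' when the BISG probability exceeds the cut-off."""
    return "black" if prob_black > threshold else "white"


def accuracy_by_threshold(records, thresholds):
    """Compute per-group and overall accuracy for each cut-off.

    records: list of (prob_black, true_race) pairs, where true_race is
    'black' or 'white'. Assumes both groups appear at least once.
    """
    results = {}
    for t in thresholds:
        correct = {"black": 0, "white": 0}
        totals = {"black": 0, "white": 0}
        for prob_black, true_race in records:
            totals[true_race] += 1
            if classify(prob_black, t) == true_race:
                correct[true_race] += 1
        results[t] = {
            "black": correct["black"] / totals["black"],
            "white": correct["white"] / totals["white"],
            "overall": sum(correct.values()) / len(records),
        }
    return results


# Invented toy data: (BISG probability of black, true race).
toy_records = [
    (0.95, "black"), (0.70, "black"), (0.55, "black"), (0.15, "black"),
    (0.60, "white"), (0.30, "white"), (0.25, "white"), (0.10, "white"),
    (0.05, "white"), (0.02, "white"),
]

sweep = accuracy_by_threshold(toy_records, [0.2, 0.5, 0.8])
for t, acc in sweep.items():
    print(f"cut-off {t:.0%}: black {acc['black']:.0%}, "
          f"white {acc['white']:.0%}, overall {acc['overall']:.0%}")
```

On this toy data, the 20% cut-off identifies three of four black records but only half of the white records, while the 80% cut-off identifies every white record and only one black record. The accuracy figures reported below come from the study's sample, not from this sketch.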
Using a sample of 1,984 black or white customers (511 black and 1,473 white), we find that we can correctly identify race for roughly 85% of those for whom a BISG probability was available.
For a similar sample of 1,485 Hispanic or white customers (71 Hispanic and 1,414 white), we correctly identify ethnicity for 96% of them.
The accuracy of BISG will vary depending on the sample, as some geographic areas and surnames provide more accurate information than others. In general, we find that BISG is fairly accurate. In future posts, we will examine how even relatively small misclassification errors affect regression coefficients and tests of discrimination.