We have previously discussed some of the limitations of regression analysis and the importance of an appropriate sample. In today’s post we delve a little deeper into these issues and offer a broader perspective to reinforce these points.
It is always helpful to step back and refocus on what a fair lending analysis is meant to accomplish. The objective is to answer a simple question: Is there evidence to suggest that the lender discriminated (whether intentionally or not) in its credit practices?
To answer it, we need to determine whether loan applicants were treated differently and, if so, whether they were treated differently on the basis of protected class status.
Delving Into the Question
Setting aside the scientific rigor that would be required to establish cause and effect, let’s approach this as a research question. How would one determine whether applicants were treated differently with respect to loan decisioning or pricing, and that the difference was due solely to protected class status?
Simply put, all variation related to factors other than protected class status would have to be accounted for; hence the use of regression analysis.
The point of using regression is to control for all the variation in the measurement variable (such as loan approval or denial) that is related to factors other than protected class, thereby isolating the true effect of protected class status. This is not easy to accomplish in a cross-sectional research design such as a typical fair lending analysis.
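To make the idea of “controlling for” other factors concrete, here is a minimal Python sketch on simulated loan pricing data. Everything in it is hypothetical: the variable names (`protected`, `credit_score`, `rate`), the effect sizes, and the assumption that credit score is the only legitimate pricing factor. By construction, protected class status has no direct effect on the rate, but protected applicants have lower average credit scores, so a raw comparison of average rates shows a gap that largely disappears once credit score is included in an ordinary least squares (OLS) regression.

```python
import numpy as np

# Hypothetical simulated data, NOT real lending data.
rng = np.random.default_rng(0)
n = 5000

protected = rng.integers(0, 2, size=n)  # 1 = protected class member (hypothetical flag)
# Assumption: credit scores differ by group, which creates a confounder.
credit_score = 700 - 20 * protected + rng.normal(0, 40, size=n)
# Assumption: the rate depends only on credit score; protected status has no direct effect.
rate = 8.0 - 0.01 * (credit_score - 700) + rng.normal(0, 0.5, size=n)

# Raw (unadjusted) gap in average rate between the two groups.
raw_gap = rate[protected == 1].mean() - rate[protected == 0].mean()

# OLS with an intercept, the protected flag, and the credit score control.
X = np.column_stack([np.ones(n), protected, credit_score])
coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
adjusted_gap = coef[1]  # protected-flag coefficient, holding credit score fixed

print(f"raw gap:      {raw_gap:.3f}")
print(f"adjusted gap: {adjusted_gap:.3f}")
```

Under these assumptions the raw gap should land near 0.2 while the adjusted coefficient should be close to zero. Dropping `credit_score` from `X` would make the protected coefficient equal the raw difference in means exactly, which is the omitted-variable problem the regression is meant to guard against.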
A Quick Example
To illustrate this, let’s consider a research design at the opposite end of the spectrum, one in which cause and effect can actually be established scientifically: an experiment. Suppose we wanted to test whether a certain plant fertilizer is effective in accelerating plant growth.
To establish that the treatment worked, we would again want to account for all of the variation in plant growth unrelated to the treatment, thereby ruling out alternative explanations for any improvement in growth.
For simplicity, let’s assume that our test is limited to two plants: one that gets the treatment and one that does not. The first thing we would want to do is control the environment. We would not want our results to be tainted by unusual weather, such as excessive rain, drought, or heat, so we would place both plants in the same greenhouse. We would also give them the same soil composition, amount of water, sunlight, and so forth, keeping everything the same for both plants except the treatment.
Our subjects would also have to be similar; we would not want to compare a tomato plant to a blueberry bush, since doing so would introduce variation that could not be accounted for. Instead, we would choose plants of the same species and age in order to isolate the effect of the treatment.
Getting Back to Fair Lending
We are in essence trying to accomplish the same thing in fair lending. However, we cannot control the environment, nor can we typically choose our subjects. As a result, regression becomes the method used to control for “the environment,” that is, the variation in the measurement variable related to factors other than our “treatment” variable (protected class status).
It should be clear at this point that an experiment is far superior for establishing cause and effect, but also that the same kind of careful thought should go into formulating a fair lending regression analysis.
One simple step toward this end is stratifying the sample to be analyzed. For example, combining different “species” of loans, each with its own unique circumstances, in a single analysis and then attempting to control for that variation through regression simply introduces extraneous variation, which makes parameter estimation more difficult. In addition, some of these differences are outside of the lender’s control.
There is sometimes a desire to analyze as large a volume of loans as possible, but that goal can still be met by analyzing the data in separate samples. Doing so reduces the variation that the model has to control for and produces more precise and accurate estimates.
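The stratification point can also be sketched with simulated data. In this hypothetical Python example, two loan “species” (coded `product` 0 and 1) are priced with different base rates and different sensitivity to credit score. A pooled model that forces one intercept and one slope on both products leaves those differences in the residuals, which inflates the standard error of the protected class coefficient; fitting each product separately does not. The product structure, effect sizes, and the helper `fit_ols` are assumptions for illustration only, and a pooled model with product indicators and interactions would be another way to capture the same structure.

```python
import numpy as np

def fit_ols(X, y):
    """Fit OLS and return (coefficients, standard errors). Hypothetical helper."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

rng = np.random.default_rng(1)
n = 4000
product = rng.integers(0, 2, size=n)    # two hypothetical loan "species"
protected = rng.integers(0, 2, size=n)  # hypothetical protected-class flag
score_c = rng.normal(0, 40, size=n)     # credit score, centered at 700

# Assumption: each product has its own base rate and credit-score slope,
# and protected status has no direct effect on the rate.
base = np.where(product == 0, 4.0, 9.0)
slope = np.where(product == 0, -0.005, -0.02)
rate = base + slope * score_c + rng.normal(0, 0.3, size=n)

X = np.column_stack([np.ones(n), protected, score_c])

# Pooled: one intercept and one credit-score slope forced on both products.
_, pooled_se = fit_ols(X, rate)

# Stratified: a separate model for each product.
strat_se = {}
for p in (0, 1):
    mask = product == p
    _, se = fit_ols(X[mask], rate[mask])
    strat_se[p] = se[1]  # SE of the protected-class coefficient

print(f"pooled SE of protected coefficient: {pooled_se[1]:.4f}")
for p, se in strat_se.items():
    print(f"product {p} SE of protected coefficient: {se:.4f}")
```

Under these assumptions the pooled standard error comes out several times larger than either stratified one, because the unmodeled product differences end up in the residual variance rather than being removed by the design.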
In summary, regression is an effective and powerful tool for fair lending analysis. The regression model is, however, simply a mathematical equation that relies on a set of assumptions. This means the old adage of “garbage in, garbage out” always applies: as with any tool, it must be applied correctly.