# Issues in Regression Modeling For Fair Lending Underwriting Analysis (Part 1 of 2)

Evaluating loan application outcomes (approval or denial) in the context of fair lending is referred to as an “underwriting analysis.” Regression modeling is commonly employed in such analyses.

Because the outcome of interest is dichotomous (an application is either approved or denied), the functional form usually chosen is a non-linear one such as Logit or Probit. In the last few decades, Logit has emerged as the model of choice in these analyses because its results are easy to interpret through the “odds ratio.”

In non-linear models such as Logit or Probit, the coefficients themselves have no direct interpretation; with Logit, however, a simple transformation converts the coefficients to odds ratios which can be directly interpreted. There is no such direct transformation of the coefficients available in Probit.
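As a minimal sketch of that transformation: exponentiating a Logit coefficient gives the corresponding odds ratio. The coefficient value below is hypothetical and chosen only for illustration; it does not come from any actual model.

```python
import math

# Hypothetical Logit coefficient on a target-group indicator
# (illustrative value only, not an estimate from real data).
beta_target = 0.6931

# Exponentiating a Logit coefficient gives the odds ratio associated with a
# one-unit change in that variable, holding the other variables fixed.
odds_ratio = math.exp(beta_target)

print(round(odds_ratio, 2))
```

A coefficient of zero corresponds to an odds ratio of 1.0, that is, no difference in odds between the two groups.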

The odds ratio provides a simple, convenient, and intuitive measurement in evaluating loan decision outcomes. For example, when evaluating denial rates between a target and control group, an odds ratio of 2.0 means the odds of denial for the target group are 2.0 times those of the control group. This is simple and easy to understand.
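To make the arithmetic concrete, here is a small sketch that computes an odds ratio directly from two denial rates. The rates are hypothetical and were picked so that the odds ratio comes out near 2.0; the helper name `odds` is ours.

```python
def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical denial rates, chosen only for illustration.
denial_rate_target = 0.40
denial_rate_control = 0.25

# Ratio of the odds of denial (target relative to control).
odds_ratio = odds(denial_rate_target) / odds(denial_rate_control)

# Ratio of the denial rates themselves, shown for comparison.
rate_ratio = denial_rate_target / denial_rate_control

print(round(odds_ratio, 2), round(rate_ratio, 2))
```

Note that the odds ratio is a ratio of odds, not of denial rates: in this example the two ratios are not the same number.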

The odds ratio does, however, have some shortcomings. (We have discussed these in previous posts, and we will expand on them, along with other issues in underwriting analysis generally, in a forthcoming white paper.) Here we raise just a few precautionary notes about evaluating results.

The first thing to be aware of with non-linear models such as Logit or Probit is that different software packages may do calculations differently. These models use Maximum Likelihood Estimation (MLE) to estimate parameters, and there are different ways the calculations can be done, so regressions run on identical datasets can produce different results. Additionally, issues with the data, such as collinearity, may be handled differently. This can also affect the results, and some differences can be rather stark.

Second, it is important to understand how calculations are being done when examining and presenting results. For example, marginal effects (which we have discussed in a previous post) can be calculated a number of different ways.

One way is to hold all the variables in the model constant at the means and then compute the predicted probability assuming the individual is in the target group (Pt) and again assuming the individual is in the control group (Pc); the marginal effect is then Pt – Pc.
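A minimal sketch of this "at the means" calculation for a Logit model follows. The intercept, coefficients, and the single control variable (a debt-to-income ratio) are all hypothetical and serve only to show the mechanics.

```python
import math

def logistic(z):
    """Logit link inverse: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted Logit model of denial (illustrative values only).
intercept = -1.0
beta_target = 0.7   # coefficient on the target-group indicator
beta_dti = 2.0      # coefficient on a debt-to-income ratio
mean_dti = 0.35     # assumed sample mean of the debt-to-income ratio

# Linear index with the control variable held at its mean.
index_at_means = intercept + beta_dti * mean_dti

p_t = logistic(index_at_means + beta_target)  # Pt: treated as target group
p_c = logistic(index_at_means)                # Pc: treated as control group

marginal_effect_at_means = p_t - p_c
print(round(marginal_effect_at_means, 4))
```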

Another is to calculate this marginal effect for every individual in the sample and then take the average of these effects. This method is known as average marginal effects or average partial effects. In a model that contains only a single dummy variable, these two methods will yield identical results.

However, if the model contains continuous variables, then the results may differ slightly. Different model specifications from the same dataset may yield very different results, but part of the difference could be purely computational, and that is hard to recognize unless one is aware of how the calculations are being done.
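The comparison between the two methods can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of any package's routine: the coefficients are hypothetical, the continuous covariate is simulated, and the names `effect`, `mem`, and `ame` are ours.

```python
import math
import random

def logistic(z):
    """Logit link inverse: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical Logit coefficients (illustrative values only).
intercept, beta_target = -1.0, 0.7

# Simulated continuous covariate, standing in for a real control variable.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
mean_x = sum(x) / len(x)

def effect(xi, beta_x):
    """Pt - Pc for one applicant with covariate value xi."""
    base = intercept + beta_x * xi
    return logistic(base + beta_target) - logistic(base)

# Model with a continuous covariate (beta_x = 1.0).
mem = effect(mean_x, 1.0)                          # effect at the means
ame = sum(effect(xi, 1.0) for xi in x) / len(x)    # average of individual effects

# Model with only the dummy variable (beta_x = 0.0): every applicant has the
# same Pt and Pc, so the two methods coincide.
mem_dummy = effect(mean_x, 0.0)
ame_dummy = sum(effect(xi, 0.0) for xi in x) / len(x)

print(round(mem, 4), round(ame, 4))
print(round(mem_dummy, 4), round(ame_dummy, 4))
```

How far apart the two figures are in the continuous case depends on the assumed coefficients and on the spread of the covariate, so the size of the gap here should not be read as typical.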

Often the odds ratio for Logit and the marginal effects are reported together. The odds ratio computed from the marginal effects can be different from that generated by the software application, depending upon how the calculations are being done. This again reinforces why it is critical to understand particular calculation methods.
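One way such a discrepancy can arise is sketched below, again with hypothetical coefficients and a simulated covariate. The odds ratio implied by Pt and Pc evaluated at the means matches the exponentiated coefficient, while the odds ratio implied by the sample-average Pt and Pc does not. This is one possible source of a mismatch, not necessarily the one at work in any given software output.

```python
import math
import random

def logistic(z):
    """Logit link inverse: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical Logit coefficients (illustrative values only).
intercept, beta_target, beta_x = -1.0, 0.7, 1.0

# Simulated continuous covariate, standing in for a real control variable.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
mean_x = sum(x) / len(x)

# Odds ratio as typically reported alongside Logit output: exp(coefficient).
reported_or = math.exp(beta_target)

# Odds ratio implied by Pt and Pc evaluated at the covariate mean.
base_at_mean = intercept + beta_x * mean_x
or_at_means = odds(logistic(base_at_mean + beta_target)) / odds(logistic(base_at_mean))

# Odds ratio implied by Pt and Pc averaged over every applicant in the sample.
pt_avg = sum(logistic(intercept + beta_x * xi + beta_target) for xi in x) / len(x)
pc_avg = sum(logistic(intercept + beta_x * xi) for xi in x) / len(x)
or_from_averages = odds(pt_avg) / odds(pc_avg)

print(round(reported_or, 3), round(or_at_means, 3), round(or_from_averages, 3))
```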

In Part 2 of this post, we will address two other common issues with regard to underwriting regression modeling.