Binary choice models arise when the dependent variable y in a regression model (the measurement variable, or what is to be explained) takes only two possible values, generally 0 or 1.
For example, it is common in a fair lending analysis of underwriting to regress denial (y = 1 if denied, 0 if approved) on a target group indicator variable and other explanatory factors to measure denial rates conditional on those factors.
One way to proceed is to simply ignore the fact that y is binary and estimate the regression by Ordinary Least Squares (OLS). The OLS model is commonly used in fair lending analysis of pricing. However, with y as a 0/1 outcome variable, the OLS model is referred to as the linear probability model (LPM), since the predicted value of y is the estimated probability that y = 1 given x. The same principle applies to other types of analysis where y is dichotomous.
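To make the LPM interpretation concrete, here is a minimal sketch in plain Python. The data and the names `applications` and `ols_simple` are made up for illustration and do not come from any actual lending file. With a single 0/1 regressor, the OLS intercept is the control group's denial rate and the slope is the difference in denial rates between the two groups.

```python
# Hypothetical toy data: each pair is (target_group, denied).
applications = [
    (1, 1), (1, 1), (1, 0), (1, 0), (1, 0),
    (0, 1), (0, 0), (0, 0), (0, 0), (0, 0),
]

def ols_simple(x, y):
    """Return (intercept, slope) from OLS of y on a constant and x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

group = [g for g, _ in applications]
denied = [d for _, d in applications]
intercept, slope = ols_simple(group, denied)

# With a single 0/1 regressor, the LPM reproduces the group denial rates:
# intercept = control-group rate, intercept + slope = target-group rate.
print(f"P(denied | control) = {intercept:.2f}")
print(f"P(denied | target)  = {intercept + slope:.2f}")
```

In a real analysis the regression would include the other explanatory factors as well, so the slope on the target group indicator would be a conditional rather than a raw difference.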
There are two drawbacks to this approach: (1) the error term is heteroskedastic, meaning its variance is not constant across observations, which can affect hypothesis testing, and (2) the predicted probabilities may fall outside the [0, 1] interval.
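Both drawbacks can be seen in a small sketch. The debt-to-income values and outcomes below are invented for illustration, and the variable names are assumptions rather than anything from the original analysis. The code fits the LPM by OLS, lists the fitted values that leave [0, 1], and computes the error variance implied by a 0/1 outcome, p(x)(1 - p(x)), which changes with x.

```python
# Hypothetical toy data: debt-to-income ratio (percent) and denial outcome.
dti    = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
denied = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(dti)
mean_x = sum(dti) / n
mean_y = sum(denied) / n

# OLS slope and intercept for a regression of denied on a constant and dti.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(dti, denied))
         / sum((x - mean_x) ** 2 for x in dti))
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * x for x in dti]

# Drawback (2): fitted "probabilities" are not confined to [0, 1].
out_of_range = [(x, round(p, 3)) for x, p in zip(dti, fitted) if p < 0 or p > 1]
print("Out-of-range fitted values:", out_of_range)

# Drawback (1): for a 0/1 outcome, Var(error | x) = p(x) * (1 - p(x)),
# so the error variance changes with x. Shown only where 0 < p < 1.
implied_var = {x: p * (1 - p) for x, p in zip(dti, fitted) if 0 < p < 1}
for x, v in implied_var.items():
    print(f"dti={x}: implied error variance = {v:.3f}")
```

In this toy sample the lowest and highest debt-to-income values produce fitted values below 0 and above 1, and the implied variance is largest for fitted probabilities near 0.5.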
There are methods to account for the heteroskedastic disturbance term, but to address the latter deficiency, Logit and Probit estimation have become the standard since they constrain the predicted probabilities to the [0, 1] interval. These have also become the models of choice for fair lending analysis with respect to underwriting. This is particularly true of the Logit model.
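The constraint comes from the functional form: Logit passes the linear index through the logistic distribution function and Probit through the standard normal distribution function, both of which map any real number into the unit interval. The sketch below illustrates this with hypothetical index values; the function names `logit_prob` and `probit_prob` are chosen here for illustration and are not from any particular library.

```python
import math

def logit_prob(z):
    """Logistic CDF: the Logit model's P(y = 1) for a linear index z."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_prob(z):
    """Standard normal CDF: the Probit model's P(y = 1) for a linear index z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical values of the linear index z = x'b, negative through positive.
index_values = [-4.0, -1.5, 0.0, 0.5, 1.5, 4.0]
logit_probs = [logit_prob(z) for z in index_values]
probit_probs = [probit_prob(z) for z in index_values]

for z, pl, pp in zip(index_values, logit_probs, probit_probs):
    print(f"z={z:+.1f}  logit={pl:.4f}  probit={pp:.4f}")
```

Both functions are increasing in z and equal 0.5 at z = 0. Their coefficients are on different scales, so Logit and Probit estimates are not directly comparable with each other or with LPM coefficients without converting them to effects on the probability.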
In terms of testing parameter significance, there is often little difference between the LPM, Logit and Probit models. However, this depends on the data being analyzed and on its distribution and composition, such as the sample sizes and the degree of variation in the variables.
In addition, although Logit and Probit are the preferred functional forms for these types of analyses, issues can arise in Logit and Probit modeling that are not always readily apparent without close examination. These issues can result in biased parameter estimates and can therefore affect the conclusions that are drawn, which is potentially problematic for fair lending analysis.
In Part 2 of this post, we will address some of these issues more specifically. We will also compare the models and show how they perform under different fair lending scenarios.