In previous posts, we have discussed some of the challenges of using regression and other statistical methods in fair lending reviews, as well as the current emphasis on quantitative methods. Over the last 10 years, regulatory and enforcement agencies, as well as institutions themselves, have gravitated more and more toward statistical methods for assessing fair lending performance and issues.

There are likely a number of reasons for this, but the most obvious is simplification. This may seem contradictory, since for many people the mere mention of statistics evokes memories of fear and dread from high school or college classes. Perhaps a better way to put it is that using statistics to evaluate fair lending provides “packaging”: a measurement tool that removes (or gives the illusion of removing) subjectivity. Despite the complexities, this makes statistics very attractive, because (again, seemingly) it provides a definitive answer to the question being asked.

## The Use of Regression & Other Statistical Methods

Agencies and financial institutions alike are drawn to statistical methods for fair lending assessment. Institutions know that the agencies use statistical methods and place a great deal of value on them as a tool, so institutions naturally have an incentive to use these methods too. From the agency perspective, and to some degree from the institution’s as well, statistics and, in particular, “regression modeling” carry an air of sophistication that is deemed far more authoritative than a traditional file review.

On the agency side, examiners need a tool with which to reach a conclusion in an examination and then support that conclusion. They also need a way to quantify damages if there is a finding of discrimination, and statistical methods provide a clear path to doing both. On the institution side, management yearns for certainty (after all, risk management is about reducing uncertainty) and wants to know that it is “ok.” Again, statistical methods appear to provide such an answer.

The reality, however, is that this simplification is often made possible only by “loose” application and interpretation of these methods. The issue is rarely whether the mathematics or equations are correct; software takes care of that today. The issue is how these tools are applied to the problem and how conclusions are drawn from them.

As the use of statistics has become commonplace, most practitioners and examiners understand the basic concepts needed to interpret statistical results. That understanding, however, is largely limited to the question of “statistical significance.” Most understand that when there is a “disparity” (e.g., higher loan pricing for females than for males) and the disparity is “statistically significant,” that is a bad thing and is taken to mean discrimination. Other jargon may be bandied about as well, such as t-tests, p-values, and the variables used in a regression model. The problem is that being able to repeat the jargon is one thing; understanding what it really means in a particular situation is another. And statistical significance on its own leaves a tremendous amount unsaid.
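To make the jargon concrete, here is a minimal sketch of what a “statistically significant” pricing disparity amounts to mechanically. It assumes Python with NumPy, uses synthetic (made-up) APRs rather than real loan data, and defines a hypothetical helper `welch_t` that computes Welch’s two-sample t statistic. Note what the test does and does not tell you: it measures whether two group averages differ by more than chance would explain, not why they differ.

```python
import numpy as np

def welch_t(a: np.ndarray, b: np.ndarray) -> float:
    """Welch's two-sample t statistic for the difference in means (a minus b)."""
    var_mean_a = a.var(ddof=1) / a.size  # variance of the sample mean of a
    var_mean_b = b.var(ddof=1) / b.size  # variance of the sample mean of b
    return float((a.mean() - b.mean()) / np.sqrt(var_mean_a + var_mean_b))

# Synthetic, hypothetical APRs in percent; NOT real loan data.
rng = np.random.default_rng(0)
female = rng.normal(loc=6.10, scale=0.50, size=400)
male = rng.normal(loc=6.00, scale=0.50, size=600)

disparity = female.mean() - male.mean()
t_stat = welch_t(female, male)
print(f"raw disparity: {disparity:.3f} points, t = {t_stat:.2f}")
# In large samples, |t| above about 1.96 corresponds to p < 0.05 (two-sided).
# The test says the gap is unlikely to be pure chance; it says nothing about WHY.
```

Nothing in this calculation accounts for rate-sheet factors, so a significant raw disparity here is a starting point for questions, not an answer.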

## One Critical Fact About Regression & Fair Lending Analysis

In any type of fair lending review, the goal is to isolate the effect of protected class status by keeping all else equal. In regression analysis, this is referred to as “holding other factors constant.” In practice, it is done by including relevant criteria in the model as explanatory variables, which are intended to account for any differences that are independent of protected class status but affect the outcome being measured. In a pricing analysis, for example, loans are typically priced using the attributes on the rate sheet, and those attributes must be accounted for in order to truly measure the effect of class status.
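The sketch below illustrates “holding other factors constant” under a deliberately simple, hypothetical setup: synthetic loans are priced only on credit score (standing in for the rate sheet), but the protected-class group has lower average scores. The raw group gap in rates looks like a disparity; adding score as an explanatory variable shrinks the group coefficient to roughly zero. The helper `ols` and all of the numbers are illustrative assumptions, not a recommended fair lending model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: a 0/1 protected-class indicator and credit scores that
# are, on average, 30 points lower for the protected group.
group = rng.integers(0, 2, size=n)
score = rng.normal(720 - 30 * group, 40, size=n)

# Assumed "rate sheet": price depends ONLY on score; no group effect is built in.
rate = 7.0 - 0.01 * (score - 700) + rng.normal(0, 0.25, size=n)

def ols(y: np.ndarray, *cols: np.ndarray) -> np.ndarray:
    """Ordinary least squares coefficients, with an intercept as the first column."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

raw_gap = ols(rate, group)[1]         # group coefficient, nothing held constant
adj_gap = ols(rate, group, score)[1]  # group coefficient, holding score constant
print(f"raw gap: {raw_gap:.3f}  gap holding score constant: {adj_gap:.3f}")
```

By construction, the raw gap in this toy setup is about 0.30 points in expectation and the adjusted gap is near zero, because the rate sheet fully explains pricing. Real data rarely cooperate this neatly.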

Any fair lending review carries an inference of causality. In other words, after accounting for relevant factors, any remaining differences are attributed to membership in the target group and are therefore assumed to be discriminatory.

As noted earlier, sophisticated methods such as econometric analyses (e.g., regression) tend to be viewed as authoritative, and the causal link is therefore assumed to have been established. Such methods are certainly powerful and are used in the sciences and in many other fields that require statistical analysis. What is often ignored, however, is that a cross-sectional analysis, which is what the typical fair lending regression is, is not robust enough to establish a causal link scientifically. In fact, the cross-sectional design is the **weakest** of all research designs for establishing causality, with the exception of the one-shot case study. A cross-sectional analysis can establish correlation or independence. If two measures move together (such as loan approval being more likely for borrowers with higher credit scores), they are correlated. If they do not move together, they are considered independent. What a cross-sectional analysis **cannot** do is establish cause and effect.

Here’s why: if A and B are correlated, there are three possible explanations concerning causation: A causes B, B causes A, or the correlation is spurious (driven by some other factor) or simply due to chance. In fair lending, the direction of causation is easy to settle, since a lending outcome cannot change a person’s race, ethnicity, or gender. To establish causation, however (i.e., that the differences noted were discriminatory), all other alternative explanations have to be ruled out, and in a cross-sectional analysis that is simply not possible.

## Another Common Issue

Related to the point above, a problem that plagues fair lending work is omitted variables. A fundamental assumption of the regression model is that it contains all relevant explanatory variables. In real-world applied work across all disciplines, this is often not the case, and it is particularly true of fair lending analysis.

We have covered this before and will refrain from repeating a detailed explanation here. In a nutshell, however, if an important explanatory factor is left out of the model and that factor happens to be correlated with a factor that is in the model, much if not all of the omitted factor’s effect will be attributed to the included one.
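For readers who want the formal statement, this is the standard textbook omitted-variable-bias result for ordinary least squares. In the simplest case, suppose the true model has one included factor x (for instance, a protected-class indicator) and one omitted factor z (for instance, payment history), with an error term uncorrelated with both:

```latex
y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \varepsilon_i

\text{If } z \text{ is omitted and } y \text{ is regressed on } x \text{ alone:}

\operatorname{plim}\, \hat{\beta}_1 = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
```

The estimated effect of x absorbs the omitted factor’s effect whenever that factor matters (β₂ ≠ 0) and is correlated with x (Cov(x, z) ≠ 0), so even a true β₁ of zero can show up as a sizable disparity. With additional included variables, the ratio is replaced by the coefficient on x from regressing z on all of the included variables.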

As an example, suppose a bank places heavy weight in its underwriting (approval/denial decisions) on a borrower’s payment history with the bank. The bank would view a borrower with a low score but a good payment history with the bank differently from another borrower with a low score and no payment history with the bank. If these decisions were modeled using only credit score, with no information on payment history, the model could attribute the effect of payment history to some other attribute (including protected class status) that happens to be correlated with it. These types of situations are encountered frequently in applied fair lending work.
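A small simulation can show the payment-history scenario in action. The sketch below is hypothetical throughout: it assumes synthetic applicants, an assumed underwriting rule in which approval depends only on credit score and payment history, and an assumed lower rate of payment history among protected-class applicants (30% vs. 60%). The helper `lpm` fits a simple linear probability model by least squares; it is an illustration, not a recommended specification.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical applicants: protected-class indicator and credit score are independent.
group = rng.integers(0, 2, size=n)
score = rng.normal(680, 50, size=n)

# Assumption: protected-class applicants are less likely to have a payment
# history with the bank (30% vs. 60%).
history = (rng.random(n) < np.where(group == 1, 0.3, 0.6)).astype(float)

# Assumed underwriting rule: approval depends on score and history ONLY.
latent = 0.02 * (score - 680) + 1.5 * history + rng.normal(0, 1, size=n)
approved = (latent > 0.5).astype(float)

def lpm(y: np.ndarray, *cols: np.ndarray) -> np.ndarray:
    """Linear probability model: OLS of a 0/1 outcome, intercept first."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

naive_gap = lpm(approved, group, score)[1]          # payment history omitted
full_gap = lpm(approved, group, score, history)[1]  # payment history included
print(f"group coefficient, history omitted:  {naive_gap:+.3f}")
print(f"group coefficient, history included: {full_gap:+.3f}")
```

Group membership has no direct effect on approval in this setup, yet with payment history omitted the group coefficient comes out negative (roughly 12 percentage points lower approval probability in expectation). Once history is included, the coefficient falls to near zero.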

## Model Specification Issues

The points above hold true even with a perfect model and perfect data. That is, the model could be correctly specified, with data that are perfectly accurate and complete, and the challenges above would remain.

In practice, of course, neither the model nor the data is perfect. Data quality issues, limited sample sizes, and other complications often lead to violations of the model assumptions. These do not always invalidate the results, but they can affect how the results should be interpreted.

## The Bottom Line

The point here is that statistical methods are **tools** in the process of evaluating fair lending risk and compliance. They are **one** of the methods that should be employed, when appropriate, to evaluate fair lending. In doing so, however, one should understand which data are and are not available and the policy criteria used in generating those data, and should pair the results with a clear interpretation that includes any qualifiers.

One of the chief pitfalls to avoid is ignoring data or information that helps explain the results of statistical tests. This is especially true when the institution uses policy factors that are not available electronically and are therefore left out of the model.

A key benefit of statistical analysis is that it helps focus attention on areas of greatest risk. As part of the fair lending practitioner’s toolkit, statistical methods can be used to target file reviews and policy analysis within a fair lending evaluation.

Bearing in mind that statistical methods are **a** tool, but not the **only** tool, for fair lending evaluation will help keep their use and application in the right perspective.